
Contents

Cloud Adoption Framework


Getting started
Begin a migration journey
Understand the innovation journey
Enable successful cloud adoption
Strategy
Overview
Motivations
Business outcomes
Overview
Fiscal outcomes
Agility outcomes
Global reach outcomes
Customer engagement outcomes
Performance outcomes
Business outcome template
Align efforts to learning metrics
Business justification
Build a business justification
Create a financial model
Understand cloud accounting
First cloud adoption project
Suggested skills
Plan
Overview
Digital estate
The 5 Rs of rationalization
What is a digital estate?
Digital estate planning
Gather inventory data
Rationalize the digital estate
Align cost models to forecast costs
Initial organization alignment
Cloud adoption plan
Overview
Prerequisites
Deploy the template to Azure DevOps
Prioritize and define workloads
Align assets to workloads
Review rationalization decisions
Establish iterations and release plans
Estimate timelines
Suggested skills
Adapt for the cloud
Ready
Overview
Azure setup guide
Before you start
Organize your resources
Manage access
Manage costs and billing
Plan governance, security, and compliance
Establish monitoring and reporting
Stay current with Azure
Deploy a migration landing zone
Landing zone considerations
Overview
Azure fundamental concepts
Review compute options
Review networking options
Review storage options
Review data options
Role-based access controls
Create hybrid cloud consistency
Expanded scope
Terraform landing zones
Network boundary security
Best practices
Overview
Resource organization
Scaling with multiple subscriptions
Naming and tagging
Networking
Plan virtual networks
Best practices for network security
Best practices for networking migrated workloads
Perimeter networks
Hub and spoke network topology
Identity and access controls
Identity management best practices
Secure privileged access
Choose an authentication method
Storage
Storage security guide
Databases
Database security best practices
Choose a deployment option in Azure SQL
Cost management
Track costs
Optimize your cloud investment
Create and manage budgets
Export cost data
Optimize costs from recommendations
Monitor usage and spending
Suggested skills
Adopt
Migrate
Overview
Azure migration guide
Before you start
Prerequisites for Azure migration
Assess your digital estate
Migrate workload assets
Migration-focused cost control mechanisms
Optimize and transform
Secure and manage
Obtain assistance
Expanded scope
Expanded scope checklist
Balance the portfolio
Skills readiness
VMware host migration
SQL Server migration
Multiple datacenters
Data requirements exceed network capacity
Governance and compliance
Best practices
Overview
Assessing workloads
Assess on-premises workloads
Migrating workloads
Set up networking for migrated workloads
Application migration
Overview
Deploy a migration infrastructure
Windows Server workloads
Linux workloads
SQL Server workloads
ASP.NET/PHP/Java apps
Scale a migration
Migrating VMware hosts
Overview
Prerequisites
Secure your environment
Private cloud management
Private cloud networking
VMware platform
Azure core integration
Migration and disaster recovery options
Migrate workload VMs to Private Cloud vCenter
Migrate data using Azure Data Box
Back up workload VMs
Set up Private Cloud as disaster recovery site using Zerto
Set up Private Cloud as disaster recovery site using VMware SRM
Migrate data platforms
Overview
Azure Database Migration Guide
SQL Server migration to Azure SQL DB
SQL Server migration to Azure SQL DB Managed Instance
SQL Server migration to SQL Server on Azure VMs
SQL Server migration to Azure SQL DW
MySQL
PostgreSQL
MariaDB
MongoDB
Cassandra
Oracle
DB2
SAP ASE
Access
Azure Database Migration Service (DMS) tutorials
Azure Database Migration Service
Migrate SQL Server to Azure SQL DB offline
Migrate SQL Server to Azure SQL DB online
Migrate SQL Server to Azure SQL DB Managed Instance offline
Migrate SQL Server to Azure SQL DB Managed Instance online
Migrate AWS RDS SQL Server to Azure SQL DB or Azure SQL DB Managed Instance online
Migrate MySQL to Azure DB for MySQL online
Migrate AWS RDS MySQL to Azure DB for MySQL online
Migrate PostgreSQL to Azure DB for PostgreSQL online
Migrate AWS RDS PostgreSQL to Azure DB for PostgreSQL online
Migrate MongoDB to Azure Cosmos DB Mongo API offline
Migrate MongoDB to Azure Cosmos DB Mongo API online
Migrate Oracle to Azure DB for PostgreSQL online
Migrate mainframes
Overview
Myths and facts
Switch from mainframes to Azure
Mainframe application migration
Optimize workloads
Costing and sizing workloads
Secure and manage workloads
Securing and managing workloads after migration
Additional best practices
Azure database security best practices
Azure data security and encryption best practices
Azure identity management and access control security best practices
Azure network security best practices
Azure operational security best practices
Azure PaaS best practices
Azure Service Fabric security best practices
Best practices for Azure VM security
Implementing a secure hybrid network architecture in Azure
IoT security best practices
Securing PaaS databases in Azure
Securing PaaS web and mobile applications using Azure App Service
Securing PaaS web and mobile applications using Azure Storage
Security best practices for IaaS workloads in Azure
Migration considerations
Overview
Prerequisites
Overview
Decisions that affect migration
Environment planning checklist
Align roles and responsibilities
Agile change management
Migration backlog review
Assess assets and workloads
Assess assets before migration
Keep priorities aligned
Evaluate workload readiness
Architect workloads
Estimate cloud costs
Understand partnership options
Manage change
Approve architecture changes
Migrate individual assets
Overview
Promotion models
Remediate assets
Replicate assets
Replicate options
Stage workloads
Optimize and promote workloads
Overview
Business change plan
Business testing
Benchmark and resize assets
Prepare for promotion
Promote to production
Decommission retired assets
Conduct retrospectives
Secure and manage workloads in production
Overview
Innovate
Overview
Azure innovation guide
Before you start
Prepare for customer feedback
Democratize data
Engage through apps
Empower adoption
Interact with devices
Predict and influence
Considerations
Overview
Business value consensus
Customer adoption
Feedback loops
Build with customer empathy
Measure for customer impact
Learn with customers
Customer challenges and blockers
Digital invention
Develop digital inventions
Democratize data
Engage via apps
Empower adoption
Interact with devices
Predict and influence
Innovation best practices
Overview and Azure toolchain
Democratize data
Overview
Sharing data with experts
Quickly generate data insights
Sharing data with coworkers and partners
Embed reports in a website or portal
Create new workspaces in Power BI
Govern data
Classify data
Secure data
Annotate data with data catalog
Document data sources with data catalog
Centralize data
Create and query an Azure SQL Data Warehouse
Best practices for loading data into Azure SQL Data Warehouse
Visualize warehouse data using Power BI
Reference architecture - Enterprise BI with SQL Data Warehouse
Manage enterprise big data with Azure Data Lake Storage
What is a data lake
Collect data
Migrate on-premises data to Azure from SQL, Oracle, or NoSQL platforms
Integrate cloud data sources with SQL Data Warehouse
Load on-premises data into Azure SQL Data Warehouse
Integrate data - data factory to OLAP
Ingest Stream Analytics into SQL Data Warehouse
Reference architecture - Ingest and analysis of new feeds
Data virtualization with Azure SQL Data Warehouse and Polybase
Engage via apps
Overview
Citizen developers
Creating apps in PowerApps
Create your first workflow with Microsoft Flow
Using AI Builder
Compliance and data privacy for citizen developer solutions
Data loss prevention policies for citizen developer solutions
Intelligent experiences
Modern web apps
Infusing intelligence
ChatBots
Cloud-native applications
Microservices architecture
Containers
Spring Boot microservices
Event-driven applications
Empower adoption
Overview
Shared solution
Getting started with a shared repository - GitHub and Git
Getting started with a shared backlog
Synchronize PowerApps with Azure DevOps
Feedback loops
Manage feedback with Azure DevOps
Continuous integration
Continuous integration with Azure Pipelines and GitHub
Reliable testing
Manage and track test plans
Solution deployment
Continuous deployment with Azure Pipelines and GitHub
Integrated metrics
Monitor ASP.NET applications
Monitor .NET Core applications
Monitor Node.js applications
Monitor Mobile applications
Monitor Web applications
Monitor VMs hosting traditional applications
Interact with devices
Overview
Mobile experience
Extend a legacy claims-processing app with a web and mobile experience
Optimize reports to share data on a mobile app
Extend PowerApps canvas app to a mobile experience
Extend Microsoft Flow to add a mobile experience
Secure mobile experiences
Mixed reality
Develop mixed reality experiences with Unity
Quickstarts to add Azure Spatial Anchors to a mixed reality solution
Integrated reality and IoT
Visualize sensor data with Azure IoT in Power BI
Visualize sensor data with Azure IoT hub in a web solution
Securing an IoT solution
Get started with Azure Sphere
Create a deployment with Azure Sphere
Get started with Azure Kinect DK
Build your first Azure Kinect DK app
Adjusted reality
Azure Digital Twins + HoloLens - Adjusting virtual reality
Get started with Azure Digital Twins
Monitor a building with Digital Twins
Azure IoT for cloud-to-device communications guide
Azure IoT configuration for cloud-to-device communications
Predict and influence
Overview
Patterns
Azure Machine Learning workflow
Hadoop R Scaler solution
Azure Machine Learning with Azure SQL Data Warehouse
Predictions
Deploy predictions
Setting up HDInsight clusters
Interactions
Azure Machine Learning from experiment to operational services
Operationalize ML Services on Azure HDInsight
Govern
Overview
Methodology
Benchmark
Initial governance foundation
Governance foundation improvements
Governance guides
Overview
Standard enterprise governance guide
Overview
Narrative
Initial corporate policy
Prescriptive guidance
Improve the security baseline discipline
Improve resource consistency discipline
Improve cost management discipline
Multicloud scenarios
Governance guide for complex enterprises
Overview
Narrative
Initial corporate policy
Prescriptive guidance
Improve the identity baseline discipline
Improve the security baseline discipline
Improve the resource consistency discipline
Improve the cost management discipline
Multicloud scenarios
Multiple layers of governance
Governance considerations
Evaluate corporate policy
Cloud-ready corporate policy and compliance
Make corporate policy cloud-ready
Understand business risks
Evaluate risk tolerance
Define corporate policy
Align design with policy
Establish policy adherence processes
Regulatory compliance
Cloud security readiness
Cloud policy review
Data classification
Disciplines of cloud governance
Implement disciplines of cloud governance
Cost management
Overview of cost management
Download the template
Understand business risks
Risk tolerance metrics and indicators
Sample cost management policies
Policy compliance processes
Improve cost management
Azure tools for cost management
Security baseline
Overview of the security baseline
Download the template
Understand business risks
Risk tolerance metrics and indicators
Sample security baseline policies
Policy compliance processes
Improve the security baseline
Cloud-native security baseline
Additional Azure security guidance
Azure tools for security baseline
Identity baseline
Overview of the identity baseline
Download the template
Understand business risks
Risk tolerance metrics and indicators
Sample identity baseline policies
Policy compliance processes
Improve the identity baseline
Azure tools for identity baseline
Resource consistency
Overview of resource consistency
Download the template
Understand business risks
Risk tolerance metrics and indicators
Sample resource consistency policies
Policy compliance processes
Improve resource consistency
Azure tools for resource consistency
Resource access management
Governance design for a simple workload
Governance design for multiple teams
Deployment acceleration
Overview of deployment acceleration
Download the template
Understand business risks
Risk tolerance metrics and indicators
Sample deployment acceleration policies
Policy compliance processes
Improve deployment acceleration
Azure tools for deployment acceleration
Manage
Overview
Azure management guide
Before you start
Inventory and visibility
Operational compliance
Protect and recover
Enhanced baseline
Platform specialization
Workload specialization
Best practices
Azure Server Management
Introduction to Azure Server Management
Getting ready for cloud operations
Getting started with cloud operations
Overview
Configure the service for a single VM
Configure the service for an entire subscription
Configure at scale with automation
Set up basic alerts
Ongoing cloud operations
Overview
Enable guest configuration policy
Critical changes (tracking and alerting)
Update schedules
Common policies in Azure
Review of tools and services
Monitoring
Overview
Monitoring cloud models
Data collection
Alerting
Monitoring platforms overview
Centralize management operations
Establish an operational fitness review
Improving platform or workload resiliency
Resiliency checklist for Azure services
Failure mode analysis
Recover from a region wide service disruption
Recover from data corruption or accidental deletion
Management considerations
Overview
Business alignment
Define criticality
Understand business impact
Establish business commitments
Management disciplines
Inventory and visibility
Operational compliance
Protect and recover
Platform operations
Workload operations
Advanced management and system design
Organize
Managing organization alignment
Required cloud capabilities
Cloud strategy
Cloud adoption
Cloud governance
Central IT
Cloud operations
Cloud center of excellence
Cloud platform
Cloud automation
Establish teams
Align the RACI matrix
Building technical skills
Creating a cost-conscious organization
Antipatterns - IT fiefdoms and IT silos
Reference
Cloud Adoption Framework roadmap
Operating model
Overview
Terminology
Decision guides
Overview
Subscriptions
Identity
Policy enforcement
Resource consistency
Resource tagging
Encryption
Software-defined networks
Overview
PaaS-only
Cloud-native
Cloud DMZ
Hybrid
Hub and spoke model
Logging and reporting
Migration tools
Infrastructure
Virtual machines
Deploy a basic workload
Cloud Operating Model
Azure enterprise scaffold
Virtual Datacenter (VDC)
How does Azure work?
Azure Architecture Center
The Cloud Adoption Framework is the One Microsoft approach to cloud adoption in Azure, consolidating and sharing best
practices from Microsoft employees, partners, and customers. The framework gives customers a set of tools, guidance, and
narratives that help shape technology, business, and people strategies for driving desired business outcomes during their adoption
effort. This guidance aligns to the following phases of the cloud adoption lifecycle, ensuring easy access to the right guidance at
the right time.

Strategy
Define business justification and expected outcomes.

Plan
Align actionable adoption plans to business outcomes.

Ready
Prepare the cloud environment for the planned changes.

Migrate
Migrate and modernize existing workloads.

Innovate
Develop new cloud-native or hybrid solutions.

Govern
Govern the environment and workloads.
Manage
Operations management for cloud and hybrid solutions.

Understand the lifecycle


Each of the phases captured above is part of a broad cloud adoption lifecycle. The following image ties together each phase to
demonstrate the overall lifecycle. The Cloud Adoption Framework is a full lifecycle framework, supporting customers throughout
each phase.

Intent
The cloud fundamentally changes how enterprises procure, use, and secure technology resources. Traditionally, enterprises
assumed ownership of and responsibility for all aspects of technology, from infrastructure to software. By moving to the cloud,
enterprises can provision and consume resources only when they're needed. Although the cloud offers tremendous flexibility in
design choices, enterprises need a proven and consistent methodology for adopting cloud technologies. The Microsoft Cloud
Adoption Framework for Azure meets that need, helping guide decisions throughout cloud adoption.
However, cloud adoption is only a means to an end. Successful cloud adoption starts well before a cloud platform vendor is
selected. It begins when business and IT decision makers realize that the cloud can accelerate a specific business transformation
objective. The Cloud Adoption Framework can help them align strategies for business, culture, and technical change to achieve
their desired business outcomes.
The Cloud Adoption Framework provides technical guidance for Microsoft Azure. Because enterprise customers might still be in
the process of choosing a cloud vendor or may have an intentional multi-cloud strategy, the framework provides cloud-agnostic
guidance for strategic decisions whenever possible.

Intended audience
This guidance affects the business, technology, and culture of enterprises. The affected roles include line-of-business leaders,
business decision makers, IT decision makers, finance, enterprise administrators, IT operations, IT security and compliance, IT
governance, workload development owners, and workload operations owners. Each role uses its own vocabulary, and each has
different objectives and key performance indicators. A single set of content can't address all audiences effectively.
Enter the Cloud Architect. The Cloud Architect serves as the thought leader and facilitator to bring these audiences together.
We've designed this collection of guides to help Cloud Architects facilitate the right conversations with the right audiences and
drive decision-making. Business transformation that's empowered by the cloud depends on the Cloud Architect role to help guide
decisions throughout the business and IT.
Each section of the Cloud Adoption Framework represents a different specialization or variant of the Cloud Architect role. These
sections also create opportunities to share cloud architecture responsibilities across a team of Cloud Architects. For example, the
governance section is designed for Cloud Architects who have a passion for mitigating technical risks. Some cloud providers refer
to these specialists as cloud custodians; we prefer the term cloud guardian or, collectively, the cloud governance team.

How to use the Microsoft Cloud Adoption Framework for Azure


If your enterprise is new to Azure, begin by reading Get started with the Cloud Adoption Framework. This overview provides best
practices for your enterprise's digital transformation as it walks you through each step of the process.
Get started
Begin a cloud migration journey in Azure

Use the Microsoft Cloud Adoption Framework for Azure to begin a cloud migration journey. This framework
provides comprehensive guidance for transitioning legacy application workloads to the cloud using innovative
cloud-based technologies.

Executive summary
The Cloud Adoption Framework helps customers undertake a simplified cloud adoption journey. This framework
contains detailed information about an end-to-end cloud adoption journey, starting with targeted business
outcomes, and then aligning cloud readiness and assessments with clearly defined business objectives. Those
outcomes are achieved through a defined path for cloud adoption. With migration-based adoption, the defined
path focuses largely on migrating on-premises workloads to the cloud. Sometimes this journey includes
modernization of workloads to increase the return on investment from the migration effort.
This framework is designed primarily for cloud architects and the cloud strategy teams leading cloud adoption
efforts. However, many topics in this framework are relevant to other roles across the business and IT. Cloud
architects frequently serve as facilitators to engage each of the relevant roles. This executive summary is
designed to prepare the various roles before facilitating conversations.

NOTE
This guidance is currently a public preview. Terminology, approaches, and guidance are being thoroughly tested with
customers, partners, and Microsoft teams during this preview. As such, the TOC and guidance may change slightly over
time.

Motivations
Cloud migrations can help companies achieve their desired business outcomes. Clear communication of
motivations, business drivers, and measurements of success are important foundations for making wise decisions
throughout cloud migration efforts. The following table classifies motivations to facilitate this conversation. It is
assumed that most companies will have motivations across each classification. The objective of this table is not to limit outcomes, but instead to make it easier to prioritize overall objectives and motivations:
Critical business events:
Datacenter exit
Mergers, acquisition, or divestiture
Reductions in capital expenses
End of support for mission-critical technologies
Response to regulatory compliance changes
Meet new data sovereignty requirements
Reduce disruptions and improve IT stability

Migration motivations:
Cost savings
Reduction in vendor or technical complexity
Optimization of internal operations
Increase business agility
Prepare for new technical capabilities
Scale to meet market demands
Scale to meet geographic or market demands

Innovation motivations:
Prepare for new technical capabilities
Build new technical capabilities
Modernize security posture and controls
Scale to meet geographic or market demands
Improve customer experiences and engagements
Transform products or services
Disrupt the market with new products or services

When a response to critical business events is the highest priority, it is important to engage in cloud
implementation early, often in parallel with strategy and planning efforts. Taking such an approach requires a
growth mindset and a willingness to iteratively improve processes, based on direct lessons learned.
When migration motivations are a priority, strategy and planning will play a vital role early in the process.
However, it is highly suggested that implementation of the first workload be conducted in parallel with planning, to help the team understand and plan for any learning curves associated with the cloud.
When innovation motivations are the highest priority, strategy and planning will require additional investments early in the process to ensure balance in the portfolio and wise alignment of the investments made during cloud adoption.
For more information about realizing innovation motivations, see Understand the innovation journey.
Preparing all participants across the migration effort with an awareness of the motivations will ensure wiser
decisions. The following migration approach outlines how Microsoft suggests customers guide those decisions in a consistent way.

Migration approach
The Cloud Adoption Framework establishes a high-level construct of Plan, Ready, Adopt to group the types of
effort required across any cloud adoption. This executive summary builds on that high-level flow to establish
iterative processes that can facilitate lift-shift-optimize efforts and modernization efforts in a single approach
across all cloud migration activities.
This approach consists of two methodologies or areas of focus: Cloud Strategy & Planning and Cloud
Implementation. The motivation or desired business outcome for a cloud migration often determines how much
a team should invest in strategy and planning and implementation. Those motivations can also influence
decisions to execute each sequentially or in parallel.

Cloud implementation
Cloud implementation is an iterative process for migrating and modernizing the digital estate in alignment with
targeted business outcomes and change management controls. During each iteration, workloads are migrated or
modernized in alignment with the strategy and plan. Decisions regarding IaaS, PaaS, or hybrid are made during
the assess phase to optimize control and execution. Those decisions will drive the tools used during the Migrate
phase. This model can be used with minimal strategy and planning. However, to ensure the greatest business
returns, it is highly suggested that both IT and the business align on a clear strategy and plan to guide
implementation activities.
The focus of this effort is the migration or modernization of workloads. A workload is a collection of
infrastructure, applications, and data that collectively supports a common business goal, or the execution of a
common business process. Examples of workloads could include things like a line-of-business application, an HR
payroll solution, a CRM solution, a financial document approval workflow, or a business intelligence solution.
Workloads may also include shared technical resources like a data warehouse that supports several other
solutions. In some cases, a workload could be represented by a single asset like a self-contained server,
application, or data platform.
Cloud migrations are often considered a single project within a broader program to streamline IT operations,
costs, or complexity. The cloud implementation methodology helps align the technical efforts within a series of
workload migrations to higher-level business values outlined in the cloud strategy and plan.
Getting started: To get started with a cloud implementation, the Azure migration guide and Azure setup guide
outline the tools and high-level processes needed to be successful in the execution of a cloud implementation.
Migrating your first workload using those guides will help the team overcome initial learning curves early in the
planning process. Afterwards, additional considerations should be given to the expanded scope checklist,
migration best practices, and migration considerations, to align the baseline guidance with your effort's unique
constraints, processes, team structures, and objectives.

Cloud strategy and planning


Cloud strategy and planning is a methodology that focuses on aligning business outcomes, priorities, and
constraints to establish a clear migration strategy and plan. The resultant plan (or migration backlog) outlines the
approach to migration and modernization across the IT portfolio, which may span entire datacenters, multiple
workloads, or miscellaneous collections of infrastructure, applications, and data. Proper management of the IT
portfolio across cloud implementation efforts will help drive the desired business outcomes.
Getting started: The remainder of this article prepares the reader for the proper application of the Cloud
Adoption Framework's Cloud strategy and planning methodology. It also outlines additional resources and links
that can help the reader adopt this approach to guide cloud implementation efforts.
Methodology explained
The Cloud Adoption Framework's cloud strategy and planning methodology is based on an incremental
approach to cloud implementation that aligns to agile technology strategies, cultural maturity based on growth
mindset approaches, and strategies driven by business outcomes. This methodology consists of the following
high-level components that guide the implementation of each strategy.
As depicted in the image above, this framework aligns strategic decisions to a small number of contained
processes, which operate within an iterative model. While described in a linear document, each of the following
processes is expected to mature in parallel with iterations of the cloud implementation. The links for each process
will aid in defining the end state and the means of maturing toward the desired end state:
Plan: When technical implementation is aligned with clear business objectives, it's much easier to measure
and align success across multiple cloud implementation efforts, regardless of technical decisions.
Ready: Preparing the business, culture, people, and environment for coming changes leads to success in each
effort and accelerates implementation and change projects.
Adopt: Ensure proper implementation of desired changes, across IT and business processes, to achieve
business outcomes.
Migrate: Iterative execution of the cloud implementation methodology adhering to the tested process
of Assess, Migrate, Optimize, and Secure & Manage to create a repeatable process for migrating
workloads.
Innovate: Drive business value through innovation activities that unlock new technical skills and
expanded business capabilities.
Govern: Align corporate policy to tangible risks, mitigated through policy, process, and cloud-based
governance tooling.
Manage: Expand IT operations to ensure cloud-based solutions can be operated through secure, cost-effective processes using modern, cloud-first operations tools.
Organize: Align people and teams to deliver proper cloud operations and adoption.
Throughout this migration experience this framework will be used to address ambiguity, manage change, and
guide cross-functional teams through the realization of business outcomes.
Common cultural changes resulting from adherence to this methodology
The effort to realize the desired business outcomes may trigger slight changes to the culture of IT, to security, and
to some degree the culture of the business. The following are a few common cultural changes seen in this
process:
The IT and security teams are likely to adopt new skills to support workloads in the cloud.
Execution of a cloud migration encourages iterative or agile approaches.
Inclusion of cloud governance also tends to inspire DevOps approaches.
Creation of a cloud strategy team can lead to tighter integration between business and IT leaders.
Collectively, these changes tend to lead to greater business and IT agility.
Cultural change is not a goal of cloud migration or the Cloud Adoption Framework, but it is a commonly
experienced outcome. Cultural changes are not directly guided; instead, subtle changes to the culture are embedded in the suggested process improvements and approaches throughout the guidance.
Common technical efforts associated with this methodology
During implementation of the cloud strategy and plan, the IT team will focus a large percentage of their time on the migration of existing digital assets to the cloud. During this effort, minimal code changes are expected; changes can often be limited to configuration. In many cases, a strong business justification can be made for
modernization as part of the cloud migration.
Common workload examples
Cloud strategy and planning often target a broad collection of workloads and applications. Within the portfolio,
common application or workload types are typically migrated. The following are a few examples:
Line-of-business applications
Customer-facing applications
Third-party applications
Data analytics platforms
Globally distributed solutions
Highly scalable solutions
Common technologies migrated
The technologies migrated to the cloud constantly expand as cloud providers add new capabilities. The following
are a few examples of the technologies commonly seen in a migration effort:
Windows and SQL Server
Linux and Open Source (OSS) databases
Unstructured/NoSQL databases
SAP on Azure
Analytics (Data Warehouse, Data Lake)

Next steps: Lifecycle solution


The Cloud Adoption Framework is a lifecycle solution. It is designed to help readers who are just beginning their journey as well as readers who are deep into their migration. As such, content is very context- and audience-specific. Next steps are best aligned to the high-level process the reader would like to improve next.
Strategy
Plan
Ready
Migrate
Innovate
Govern
Manage
Organize
Innovate through cloud adoption

Cloud migration is an excellent option for your existing workloads. But creating new products and services
requires a different approach. The innovate methodology in the Cloud Adoption Framework establishes an
approach that guides the development of new products and services.

Motivations behind innovation


Innovation isn't the right adoption path for every workload. An innovation path typically requires a larger
investment in custom code and data management than other paths. Innovation also takes longer than migration
or other forms of modernization. Follow this path when you're targeting the following business outcomes:
Prepare for new technical capabilities.
Scale to meet market demands.
Scale to meet geographic demands.
Build new technical capabilities.
Improve customer experiences and engagements.
Transform products or services.
Disrupt the market with new products or services.

Workloads associated with cloud innovation


Candidate workloads for cloud innovation include:
Custom-built applications.
Technology-based experiences.
Integration of physical products and technology using IoT.
Ambient intelligence: Integration of nonintrusive technology into an environment.
Cognitive Services: Big Data, AI, Machine Learning, and predictive solutions.

Next steps
Begin your innovation journey using the innovate methodology.
Begin your innovation journey
The Cloud Adoption Framework is a free self-service tool that guides readers through various cloud adoption efforts. The
framework helps customers succeed at realizing business objectives that can be enabled by Microsoft Azure. However, this
content also recognizes that the reader may be addressing broad business, culture, or technical challenges that sometimes require a cloud-neutral position. Therefore, each section of this guidance begins with an Azure-first approach, and then
follows with cloud-neutral theory that can scale across many business and technical decisions.
Throughout this framework, enablement is a core theme. The following checklist itemizes fundamental cloud adoption principles
that ensure an adoption journey is considered successful by both IT and the business:
Plan: Establishing clear business outcomes, a clearly defined digital estate plan, and well-understood adoption backlogs.
Ready: Ensure the readiness of staff through skills and learning plans.
Operate: Define a manageable operating model to guide activities during and long after adoption.
Organize: Align people and teams to deliver proper cloud operations and adoption.
Govern: Align proper governance disciplines to consistently apply cost management, risk mitigation, compliance, and
security baselines across all cloud adoption.
Manage: Ongoing operational management of the IT portfolio to minimize interruptions to business processes and
ensure stability of the IT portfolio.
Support: Align proper partnership and support options.
Another core theme is security, which is a critical quality attribute for a successful cloud adoption. Security is integrated
throughout this framework to provide integrated guidance on maintaining confidentiality, integrity, and availability assurances for
your cloud workloads.

Additional tools
In addition to the Cloud Adoption Framework, Microsoft covers additional topics that can enable success. This article highlights a
few common tools that can significantly improve success beyond the scope of the Cloud Adoption Framework. Establishing cloud
governance, resilient architectures, technical skills, and a DevOps approach are each important to the success of any cloud
adoption effort. Bookmark this page as a resource to revisit throughout any cloud adoption journey.

Cloud Governance
Understand business risks and map those risks to proper policies and processes. Using cloud governance tools and the Five
Disciplines of Cloud Governance minimizes risks and improves the likelihood of success. Cloud governance helps control
costs, create consistency, improve security, and accelerate deployment.

Reliable Architecture (Resiliency)


Building a reliable application in the cloud is different from traditional application development. While historically you may
have purchased higher-end hardware to scale up, in a cloud environment you scale out instead of up. Instead of trying to
prevent failures altogether, the goal is to minimize the effects of a single failing component.
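
To make that principle concrete, the following minimal sketch shows one common resiliency pattern: retrying a transient failure with exponential backoff and jitter so a single flaky dependency does not take down the whole request path. This is an illustrative example only; the TransientError type and the operation being wrapped are hypothetical placeholders, not part of the framework's guidance.

    import random
    import time

    class TransientError(Exception):
        """Placeholder for whatever transient exception the dependency raises."""

    def call_with_retries(operation, max_attempts=4, base_delay=0.5):
        """Call a flaky dependency, retrying transient failures with exponential
        backoff and jitter instead of letting one failure break the caller."""
        for attempt in range(1, max_attempts + 1):
            try:
                return operation()
            except TransientError:
                if attempt == max_attempts:
                    raise  # surface the error only after retries are exhausted
                # Back off exponentially (0.5s, 1s, 2s, ...) plus random jitter so
                # many callers don't retry in lockstep against the same component.
                time.sleep(base_delay * 2 ** (attempt - 1) + random.uniform(0, 0.25))

In practice, the same idea is usually paired with timeouts and circuit breakers so that a failing component degrades one feature rather than the entire application.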

Technical Skills Building


The greatest tool to aid in cloud adoption success is a well-trained team. Expand the skills of existing business and technical
team members with the help of focused learning paths.

DevOps Approach
Microsoft's historic transformation is rooted firmly in a Growth Mindset approach to culture and a DevOps approach to
technical execution. The Cloud Adoption Framework embeds both throughout the framework. To accelerate DevOps adoption, review the available DevOps learning content.

Azure Architecture Center


Architecture solutions, reference architectures, example scenarios, best practices, and cloud design patterns to aid in the
architecture of solutions running on Azure.

Azure Pricing Calculator


Calculate the cost of the various Azure components required to create or migrate a chosen solution.

Next steps
Armed with an understanding of the top enabling aspects of the Cloud Adoption Framework, the likelihood of success in a
Migrate or Innovate effort will be that much higher.
Migrate
Innovate
The cloud delivers fundamental technology benefits that can help your enterprise execute multiple business strategies. By using
cloud-based approaches, you can improve business agility, reduce costs, accelerate time to market, and enable expansion into
new markets. To take advantage of this great potential, start by documenting your business strategy in a way that's both
understandable to cloud technicians and palatable to your business stakeholders.

Cloud adoption strategy process


The exercises in this section can help you document your business strategy efficiently. By using this approach, you can drive
adoption efforts that capture targeted business value in a cross-functional model. You can then map your cloud adoption
strategy to specific cloud capabilities and business strategies to reach your desired state of transformation.

Motivations
Meet with key stakeholders and executives to document the motivations behind cloud adoption.

Business outcomes
Engage motivated stakeholders and executives to document specific business outcomes.

Business justification
Develop a business case to validate the financial model that supports your motivations and outcomes.

Choose the right first project


Your first cloud adoption project will help align motivations with technical effort. This article can help you choose your first
project wisely.

To help build out your cloud adoption strategy, download the Microsoft Cloud Adoption Plan template, and then track the output
of each exercise.

Next steps
Start building your cloud adoption strategy by documenting the motivations behind cloud adoption.
Document motivations
Motivations: Why are we moving to the cloud?

"Why are we moving to the cloud?" is a common question for business and technical stakeholders alike. If the
answer is, "Our board (or CIO, or C -level executives) told us to move to the cloud," it's unlikely that the business
will achieve the desired outcomes.
This article discusses a few motivations behind cloud migration that can help produce more successful business
outcomes. These options help facilitate a conversation about motivations and, ultimately, business outcomes.

Motivations
Business transformations that are supported by cloud adoption can be driven by various motivations. It's likely
that several motivations apply at the same time. The goal of the lists in the following table is to help spark ideas
about which motivations are relevant. From there, you can prioritize and assess the potential impacts of the
motivations. In this article, we recommend that your cloud adoption team meet with various executives and
business leaders using the list below to understand which of these motivations are affected by the cloud
adoption effort.

Critical business events:
Datacenter exit
Merger, acquisition, or divestiture
Reduction in capital expenses
End of support for mission-critical technologies
Response to regulatory compliance changes
New data sovereignty requirements
Reduction of disruptions and improvement of IT stability

Migration:
Cost savings
Reduction in vendor or technical complexity
Optimization of internal operations
Increase in business agility
Preparation for new technical capabilities
Scaling to meet market demands
Scaling to meet geographic demands

Innovation:
Preparation for new technical capabilities
Building new technical capabilities
Scaling to meet market demands
Scaling to meet geographic demands
Improved customer experiences and engagements
Transformation of products or services
Market disruption with new products or services

Classify your motivations


Your motivations for cloud adoption will likely fall into multiple categories. As you're building the list of
motivations, trends will likely emerge. Motivations tend to be associated more with one classification than with
others. Use the predominant classification to help guide the development of your cloud adoption strategy.
When a response to critical business events is the highest priority, it's important to engage early in cloud
implementation, often in parallel with strategy and planning efforts. Taking this approach requires a growth
mindset and a willingness to iteratively improve processes, based on direct lessons learned.
When migration is the highest priority, strategy and planning will play a vital role early in the process. We
recommend that you implement the first workload in parallel with planning, to help the team understand and
anticipate any learning curves that are associated with cloud adoption.
When innovation is the highest priority, strategy and planning will require additional investments early in the
process to ensure balance in the portfolio and wise alignment of the investment made during cloud adoption.
For further information and guidance, see Understand the innovation journey.
To ensure wiser decision-making, all participants in the migration process should have a clear awareness of their
motivations. The following section outlines how customers can guide and effect wiser decisions through
consistent, strategic methodologies.

Motivation-driven strategies
This section highlights the Migration and Innovation motivations and their corresponding strategies.
Migration
The Migration motivations listed near the top of the Motivations table are the most common, but not necessarily
the most significant, reasons for adopting the cloud. These outcomes are important to achieve, but they're most
effectively used to transition to other, more useful worldviews. This important first step to cloud adoption is often
called a cloud migration. The framework refers to the strategy for executing a cloud migration by using the term
Migrate.
Some motivations align well with a migrate strategy. The motives at the top of this list will likely have
significantly less business impact than those toward the bottom of the list.
Cost savings.
Reduction in vendor or technical complexity.
Optimization of internal operations.
Increasing business agility.
Preparing for new technical capabilities.
Scaling to meet market demands.
Scaling to meet geographic demands.
Innovation
Data is the new commodity. Modern apps are the supply chain that drives that data into various experiences. In
today's business market, it's hard to find a transformative product or service that isn't built on top of data,
insights, and customer experiences. The motivations that appear lower in the Innovation list align to a
technology strategy referred to in this framework as Innovate.
The following list includes motivations that cause an IT organization to focus more on an innovate strategy than
a migrate strategy.
Increasing business agility.
Preparing for new technical capabilities.
Building new technical capabilities.
Scaling to meet market demands.
Scaling to meet geographic demands.
Improving customer experiences and engagements.
Transforming products or services.

Next steps
Understanding projected business outcomes helps facilitate the conversations that you need to have as you
document your motivations and supporting metrics, in alignment with your business strategy. Next, read an
overview of business outcomes that are commonly associated with a move to the cloud.
Overview of business outcomes
What business outcomes are associated with
transformation journeys?

The most successful transformation journeys start with a business outcome in mind. Cloud adoption can be a
costly and time-consuming effort. Fostering the right level of support from IT and other areas of the business is
crucial to success. The Microsoft business outcome framework is designed to help customers identify business
outcomes that are concise, defined, and drive observable results or change in business performance, supported
by a specific measure.
During any cloud transformation, the ability to speak in terms of business outcomes supports transparency and
cross-functional partnerships. The business outcome framework starts with a simple template to help
technically minded individuals document and gain consensus. This template can be used with several business
stakeholders to collect a variety of business outcomes, which could each be influenced by a company's
transformation journey. Feel free to use this template electronically or, better still, draw it on a whiteboard to
engage business leaders and stakeholders in outcome-focused discussions.
To learn more about business outcomes and the business outcome template, see documenting business
outcomes, or download the business outcome template.

Prepare for conversations with different personas


The following are a few business outcomes that tend to trigger conversations with various personas:
Finance leadership: Increase profitability while driving compliance.
Marketing: Acquire and retain customers, build reputation.
Sales: Accelerate sales, improve customer lifetime value.
Human Resources: Retain, recruit, and empower employees.

Sample outcomes by category


Speaking in business outcomes can feel like a foreign language to many technically minded individuals. To help
ease translation, Microsoft curates a set of business outcome examples in the business outcome framework. Use
these samples to help inspire and demonstrate business outcomes that are based on actual transformation
journeys.
To help you find business outcomes more easily, we've separated them into the following categories. This
approach tends to drive consensus-building conversations across business units.
Fiscal outcomes
Financial or fiscal performance is the cleanest business outcome for many business leaders, but not the only
one.
View samples of fiscal outcomes.
Agility outcomes
Today's fast-changing business environment places a premium on time. The ability to respond to and drive
market change quickly is the fundamental measure of business agility.
View samples of agility outcomes.
Reach outcomes
In a constantly shrinking market, global reach (ability to support global customers and users) can be measured
by compliance in geographies that are relevant to the business.
View outcomes related to global reach.
Customer engagement outcomes
Social marketplaces are redefining winners and losers at an unheard-of pace. Responding to user needs is a key
measure of customer engagement.
Learn more about customer engagement outcomes.
Performance outcomes
Performance and reliability are assumed. When either falters, reputation damage can be painful and long-lasting.
Learn more about performance outcomes.
Each of the business outcomes listed in the preceding categories can help facilitate a focused conversation
among your business and technical team members. However, you shouldn't limit your conversations to these
generic samples. Understanding the unique needs of your own business, and building outcomes that match,
maximizes the value of a cloud transformation.

Next steps
Learn more about fiscal outcomes.
Fiscal outcomes
Examples of fiscal outcomes

At the top level, fiscal conversations consist of three basic concepts:


Revenue: Will more money come into the business as a result of the sale of goods or services?
Cost: Will less money be spent in the creation, marketing, sales, or delivery of goods or services?
Profit: Although they're rare, some transformations can both increase revenue and decrease costs. This is a
profit outcome.
The remainder of this article explains these fiscal outcomes in the context of a cloud transformation.

NOTE
The following examples are hypothetical and should not be considered a guarantee of returns when adopting any cloud
strategy.

Revenue outcomes
New revenue streams
The cloud can help create opportunities to deliver new products to customers or deliver existing products in a new
way. New revenue streams are innovative, entrepreneurial, and exciting for many people in the business world.
New revenue streams are also prone to failure and are considered by many companies to be high risk. When
revenue-related outcomes are proposed by IT, there will likely be resistance. To add credibility to these outcomes,
partner with a business leader who's a proven innovator. Validation of the revenue stream early in the process
helps avoid roadblocks from the business.
Example: A company has been selling books for over a hundred years. An employee of the company realizes
that the content can be delivered electronically. The employee creates a device that can be sold in the
bookstore, which allows the same books to be downloaded directly, driving $X in new book sales.
Revenue increases
With global scale and digital reach, the cloud can help businesses to increase revenues from existing revenue
streams. Often, this type of outcome comes from an alignment with sales or marketing leadership.
Example: A company that sells widgets could sell more widgets, if the salespeople could securely access the
company's digital catalog and stock levels. Unfortunately, that data is only in the company's ERP system, which
can be accessed only via a network-connected device. Creating a service façade to interface with the ERP and
exposing the catalog list and nonsensitive stock levels to an application in the cloud would allow the
salespeople to access the data they need while onsite with a customer. Extending on-premises Active Directory
using Azure Active Directory (Azure AD) and integrating role-based access into the application would allow
the company to help ensure that the data stays safe. This simple project could affect revenue from an existing
product line by x%.
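As a purely illustrative sketch of the role-based access portion of this example, the façade might enforce an app role before returning only nonsensitive catalog fields. The role name, field list, and function below are hypothetical, and validation of the caller's Azure AD token by a standard OIDC/JWT library is assumed to have already happened.

    # Hypothetical sketch: assumes the caller's Azure AD token was already
    # validated and decoded into a claims dictionary by an OIDC/JWT library.
    SAFE_FIELDS = {"sku", "description", "list_price", "in_stock"}

    def get_catalog_item(claims: dict, erp_record: dict) -> dict:
        """Return only nonsensitive catalog data to callers in the sales role."""
        # App roles assigned in Azure AD appear in the token's "roles" claim.
        if "Sales.Read" not in claims.get("roles", []):
            raise PermissionError("Caller is not assigned the sales reader role.")
        # Expose catalog and stock fields only; cost and supplier data stay in the ERP.
        return {key: value for key, value in erp_record.items() if key in SAFE_FIELDS}
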
Profit increases
Seldom does a single effort simultaneously increase revenue and decrease costs. However, when it does, align the
outcome statements from one or more of the revenue outcomes with one or more of the cost outcomes to
communicate the desired outcome.
Cost outcomes
Cost reduction
Cloud computing can reduce capital expenses for hardware and software, setting up datacenters, running on-site
datacenters, and so on. The costs of racks of servers, round-the-clock electricity for power and cooling, and IT
experts for managing the infrastructure add up fast. Shutting down a datacenter can reduce capital expense
commitments. This is commonly referred to as "getting out of the datacenter business." Cost reduction is typically
measured in dollars in the current budget, which could span one to five years depending on how the CFO
manages finances.
Example #1: A company's datacenter consumes a large percentage of the annual IT budget. IT chooses to
conduct a cloud migration and transitions the assets in that datacenter to infrastructure as a service (IaaS)
solutions, creating a three-year cost reduction.
Example #2: A holding company recently acquired a new company. In the acquisition, the terms dictate that
the new entity should be removed from the current datacenters within six months. Failure to do so will result in
a fine of 1 million USD per month to the holding company. Moving the digital assets to the cloud in a cloud
migration could allow for a quick decommission of the old assets.
Example #3: An income tax company that caters to consumers experiences 70 percent of its annual revenue
during the first three months of the year. The remainder of the year, its large IT investment sits relatively
dormant. A cloud migration could allow IT to deploy the compute/hosting capacity required for those three
months. During the remaining nine months, the IaaS costs could be significantly reduced by shrinking the
compute footprint.
Example: Coverdell
Coverdell modernizes their infrastructure to drive record cost savings with Azure. Coverdell's decision to invest in
Azure, and to unite their network of websites, applications, data, and infrastructure within this environment, led to
more cost savings than the company could have ever expected. The migration to an Azure-only environment
eliminated 54,000 USD in monthly costs for colocation services. With the company's new, united infrastructure
alone, Coverdell expects to save an estimated 1 million USD over the next two to three years.

"Having access to the Azure technology stack opens the door for some scalable, easy-to-implement, and
highly available solutions that are cost effective. This allows our architects to be much more creative with the
solutions they provide."
Ryan Sorensen
Director of Application Development and Enterprise Architecture
Coverdell

Cost avoidance
Terminating a datacenter can also provide cost avoidance, by preventing future refresh cycles. A refresh cycle is
the process of buying new hardware and software to replace aging on-premises systems. In Azure, hardware and
OS are routinely maintained, patched, and refreshed at no additional cost to customers. This allows a CFO to
remove planned future spend from long-term financial forecasts. Cost avoidance is measured in dollars. It differs
from cost reduction, generally focusing on a future budget that has not been fully approved yet.
Example: A company's datacenter is up for a lease renewal in six months. The datacenter has been in service
for eight years. Four years ago, all servers were refreshed and virtualized, costing the company millions of
dollars. Next year, the company plans to refresh the hardware and software again. Migrating the assets in that
datacenter as part of a cloud migration would allow cost avoidance by removing the planned refresh from next
year's forecasted budget. It could also produce cost reduction by decreasing or eliminating the real estate lease
costs.
Capital expenses vs. operating expenses
Before you discuss cost outcomes, it's important to understand the two primary cost options: capital expenses and
operating expenses.
The following terms will help you understand the differences between capital expenses and operating expenses
during business discussions about a transformation journey.
Capital is the money and assets owned by a business to contribute to a particular purpose, such as increasing
server capacity or building an application.
Capital expenditures generate benefits over a long period. These expenditures are generally nonrecurring
and result in the acquisition of permanent assets. Building an application could qualify as a capital expenditure.
Operating expenditures are ongoing costs of doing business. Consuming cloud services in a pay-as-you-go
model could qualify as an operating expenditure.
Assets are economic resources that can be owned or controlled to produce value. Servers, data lakes, and
applications can all be considered assets.
Depreciation is a decrease in the value of an asset over time. More relevant to the capital expense versus
operating expense conversation, depreciation is how the costs of an asset are allocated across the periods in
which they are used. For instance, if you build an application this year but it's expected to have an average shelf
life of five years (like most commercial apps), the cost of the dev team and necessary tools required to create
and deploy the code base would be depreciated evenly over five years.
Valuation is the process of estimating how much a company is worth. In most industries, valuation is based
on the company's ability to generate revenue and profit, while respecting the operating costs required to create
the goods that provide that revenue. In some industries, such as retail, or in some transaction types, such as
private equity, assets and depreciation can play a large part in the company's valuation.
It's often a safe bet that various executives, including the chief information officer (CIO), debate the best use of
capital to grow the company in the desired direction. Giving the CIO a means of converting contentious capital
expense conversations into clear accountability for operating expenses could be an attractive outcome by itself. In
many industries, chief financial officers (CFOs) are actively seeking ways of better associating fiscal accountability
to the cost of goods being sold.
However, before you associate any transformation journey with this type of capital versus operating expense
conversion, it's wise to meet with members of the CFO or CIO teams to see which cost structure the business
prefers. In some organizations, reducing capital expenses in favor of operating expenses is a highly undesirable
outcome. As previously mentioned, this approach is sometimes seen in retail, holding, and private equity
companies that place higher value on traditional asset accounting models, which place little value on IP. It's also
seen in organizations that had negative experiences when they outsourced IT staff or other functions in the past.
If an operating expense model is desirable, the following example could be a viable business outcome:
Example: The company's datacenter is currently depreciating at x USD per year for the next three years. It is
expected to require an additional y USD to refresh the hardware next year. We can convert the capital expenses
to an operating expense model at an even rate of z USD per month, allowing for better management of and
accountability for the operating costs of technology.
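The conversion described in this example is simple arithmetic. The following minimal sketch uses hypothetical figures in place of the x, y, and z placeholders above; it only illustrates the shape of the calculation, and any real conversion should be validated with the finance office.

```python
# Hypothetical figures standing in for the x, y, and z placeholders in the example above.
remaining_depreciation_per_year = 400_000   # "x USD per year" for the next three years
years_remaining = 3
planned_refresh_next_year = 900_000         # "y USD" hardware refresh that would otherwise be required

# Total capital spend that the cloud migration would replace.
total_planned_capital = (remaining_depreciation_per_year * years_remaining
                         + planned_refresh_next_year)

# Spread that spend evenly across the same period as an operating expense ("z USD per month").
even_monthly_operating_expense = total_planned_capital / (years_remaining * 12)

print(f"Capital spend replaced: {total_planned_capital:,.0f} USD")
print(f"Equivalent operating expense: {even_monthly_operating_expense:,.0f} USD per month")
```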

Next steps
Learn more about agility outcomes.
Agility outcomes
Examples of agility outcomes

As discussed in the business outcomes overview, several potential business outcomes can serve as the foundation
for any transformation journey conversation with the business. This article focuses on the timeliest business
measure: business agility. Understanding your company's market position and competitive landscape can help you
articulate the business outcomes that are the target of the business's transformation journey.
Traditionally, chief information officers (CIOs) and IT teams were considered a source of stability in core mission-
critical processes. This is still true. Few businesses can function well when their IT platform is unstable. However,
in today's business world, much more is expected. IT can expand beyond a simple cost center by partnering with
the business to provide market advantages. Many CIOs and executives assume that stability is simply a baseline
for IT. For these leaders, business agility is the measure of IT's contribution to the business.

Why is agility so important?


Markets change at a faster pace today than ever before. As of 2015, only 57 companies from the original Fortune
500 list were still on the list roughly six decades later, an 88.6 percent turnover rate. This represents market change
at a previously unheard-of rate. IT agility, or even business agility, is unlikely to determine whether an organization
stays on the Fortune 500, but these figures help us understand the pace at which markets continue to change.
For incumbents and upstarts alike, business agility can be the difference between success or failure of a business
initiative. Quickly adapting to market changes can help ring-fence existing customers or claim market share from
competitors. The agility-related outcomes in the next sections can help communicate the value of the cloud during
a transformation.

Time-to-market outcome
During cloud-enabled innovation efforts, time to market is a key measure of IT's ability to address market change.
In many cases, a business leader might have existing budget for the creation of an application or the launch of a
new product. Clearly communicating a time-to-market benefit can motivate that leader to redirect budget to IT's
transformation journey.
Example 1: The European division of a US-based company needs to comply with the GDPR by
protecting customer data in a database that supports UK operations. The existing version of SQL Server doesn't
support the necessary row-level security. An in-place upgrade would be too disruptive. Using Azure SQL to
replicate and upgrade the database, the customer adds the necessary compliance measure in a matter of
weeks.
Example 2: A logistics company has discovered an untapped segment of the market, but it needs a new
version of their flagship application to capture this market share. Their larger competitor has made the
same discovery. Through the execution of a cloud-enabled application innovation effort, the company
embraces customer obsession and a DevOps-driven development approach to beat their slower, legacy
competitor by x months. This jump on market entrance secured the customer base.
Aurora Health Care
Healthcare system transforms online services into a friendly digital experience. To transform its digital services,
Aurora Health Care migrated its websites to the Microsoft Azure platform and adopted a strategy of continuous
innovation.

"As a team, we're focused on high-quality solutions and speed. Choosing Azure was a very transformative
decision for us."
Jamey Shiels
Vice President of Digital Experience
Aurora Health Care

Provision time
When business demands new IT services or scale to existing services, acquisition and provision of new hardware
or virtual resources can take weeks. After cloud migration, IT can more easily enable self-service provisioning,
allowing the business to scale in hours.
Example: A consumer packaged goods company requires the creation and tear-down of hundreds of database
clusters per year to fulfill operational demands of the business. The on-premises virtual hosts can provision
quickly, but the process of recovering virtual assets is slow and requires significant time from the team. As
such, the legacy on-premises environment suffers from bloat and can seldom keep up with demand. After
cloud migration, IT can more easily provide scripted self-provisioning of resources, with a chargeback
approach to billing. Together, these changes allow the business to move as quickly as it needs while remaining
accountable for the cost of the resources it demands. In the cloud, deployments are limited only by the
business's budget.

Next steps
Learn more about reach outcomes.
Reach outcomes
Examples of global reach outcomes

As discussed in business outcomes, several potential business outcomes can serve as the foundation for any
transformation journey conversation with the business. This article focuses on a common business measure:
reach. Understanding the company's globalization strategy will help to better articulate the business outcomes
that are the target of a business's transformation journey.
Across the Fortune 500 and smaller enterprises, globalization of services and customer base has been a focus for
over three decades. As the world shrinks, it is increasingly likely for any business to engage in global commerce.
Supporting global operations is challenging and costly. Hosting datacenters around the world can consume more
than 80 percent of an annual IT budget. By themselves, wide area networks using private lines to connect those
datacenters can cost millions of dollars per year.
Cloud solutions move the cost of globalization to the cloud provider. In Azure, customers can quickly deploy
resources in the same region as customers or operations without having to buy and provision a datacenter.
Microsoft owns one of the largest wide area networks in the world, connecting datacenters around the globe.
Connectivity and global operating capacity are available to global customers on demand.

Global access
Expanding into a new market can be one of the most valuable business outcomes during a transformation. The
ability to quickly deploy resources in market without a longer-term commitment allows sales and operations
leaders to explore options that wouldn't have been considered in the past.
Example: A cosmetics manufacturer has identified a trend. Some products are being shipped to the Asia
Pacific region even though no sales teams are operating in that region. The minimum systems required by a
remote sales force are small, but latency prevents a remote access solution. To capitalize on this trend, the VP
of sales would like to experiment with sales teams in Japan and Korea. Because the company has undergone a
cloud migration, it was able to deploy the necessary systems in both Japan and Korea within days. This allowed
the VP of Sales to grow revenue in the region by x percent within three months. Those two markets continue to
outperform other parts of the world, leading to sales operations throughout the region.

Data sovereignty
Operating in new markets introduces additional governance constraints. The GDPR is one example of a governance
requirement that carries significant financial penalties for noncompliance. Azure provides compliance offerings that help customers
meet compliance obligations across regulated industries and global markets. For more information, see the
overview of Microsoft Azure compliance.
Example: A US-based utilities provider was awarded a contract to provide utilities in Canada. Canadian data
sovereignty law requires that Canadian data stay in Canada. This company had been working their way
through a cloud-enabled application innovation effort for years. As a result, their software was able to be
deployed through fully scripted DevOps processes. With a few minor changes to the code base, they were able
to deploy a working copy of the code to an Azure datacenter in Canada, meeting data sovereignty compliance
and keeping the customer.

Next steps
Learn more about customer engagement outcomes.
Customer engagement outcomes
Examples of customer engagement outcomes

As discussed in the business outcomes overview, several potential business outcomes can serve as the foundation
for any transformation journey conversation with the business. This article focuses on a common business
measure: customer engagement. Understanding the needs of customers and the ecosystem around customers
helps with articulating the business outcomes that are the target of a business's transformation journey.
During cloud-enabled data innovation efforts, customer engagement is assumed. Aggregating data, testing
theories, advancing insights, and informing cultural change: each of these disruptive functions requires a high
degree of customer engagement. During a cloud-enabled application innovation effort, this type of customer
engagement is a maturation goal.
Customer engagement outcomes are all about meeting and exceeding customer expectations. As a baseline for
customer engagements, customers assume that products and services are performant and reliable. When they are
not, it's easy for an executive to understand the business value of performance and reliability outcomes. For more
advanced companies, speed of integrating learnings and observations is a fundamental business outcome.
The following are examples and outcomes related to customer engagement:

Cycle time
During customer-obsessed transformations, like a cloud-enabled application innovation effort, customers respond
to direct engagement and to seeing their needs met quickly by the development team. Cycle time is a
Six Sigma term that refers to the duration from the start to finish of a function. For business leaders who are
customer-obsessed and investing heavily in improving customer engagement, cycle time can be a strong business
outcome.
Example: A services company that provides business-to-business (B2B) services is attempting to hold on to
market share in a competitive market. Customers who've left for a competing service provider have stated that
their overly complex technical solution interferes with their business processes and is the primary reason for
leaving. In this case, cycle time is imperative. Today, it takes 12 months for a feature to go from request to
release. If it's prioritized by the executive team, that cycle can be reduced to six to nine months. Through a
cloud-enabled application innovation effort, cloud-native application models and Azure DevOps integration,
the team was able to cut cycle time down to one month, allowing the business and application development
teams to interact more directly with customers.

ExakTime
Labor management breaks free of on-premises constraints with cloud technology. With Microsoft Azure,
ExakTime is moving toward streamlined agile product development, while the company's clients enjoy a more
robust and easier-to-use product, full of new features.
"Now, a developer can sit down at his machine, have an idea, spin up a web service or an Azure instance, test
out his idea, point it at test data, and get the concept going. In the time that it would have taken to provision
just the stuff to do a test, we can actually write the functionality."
Wayne Wise
Vice President of Software Development
ExakTime

Next steps
Learn more about performance outcomes.
Performance outcomes
Examples of performance outcomes

As discussed in business outcomes, several potential business outcomes can serve as the foundation for any
transformation journey conversation with the business. This article focuses on a common business measure:
performance.
In today's technological society, customers assume that applications will perform well and always be available.
When this expectation isn't met, it causes reputation damage that can be costly and long-lasting.

Performance
The biggest cloud computing services run on a worldwide network of secure datacenters, which are regularly
upgraded to the latest generation of fast and efficient computing hardware. This provides several benefits over a
single corporate datacenter, such as reduced network latency for applications and greater economies of scale.
Transform your business and reduce costs with an energy-efficient infrastructure that spans more than 100 highly
secure facilities worldwide, linked by one of the largest networks on earth. Azure has more global regions than
any other cloud provider. This translates into the scale that's required to bring applications closer to users around
the world, preserve data residency, and provide comprehensive compliance and resiliency options for customers.
Example 1: A services company was working with a hosting provider that hosted multiple operational
infrastructure assets. Those systems suffered from frequent outages and poor performance. The company
migrated its assets to Azure to take advantage of the SLA and performance controls of the cloud. The
downtime that it suffered cost it approximately 15,000 USD per minute of outage. With four to eight hours
of outage per month, it was easy to justify this organizational transformation, as the rough calculation after
these examples shows.
Example 2: A consumer investment company was in the early stages of a cloud-enabled application
innovation effort. Agile processes and DevOps were maturing well, but application performance was spiky.
As a more mature transformation, the company started a program to monitor and automate sizing based
on usage demands. The company was able to eliminate sizing issues by using Azure performance
management tools, resulting in a surprising 5 percent increase in transactions.
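To show why the justification in Example 1 was easy, here's a rough calculation that uses only the figures stated in that example; the arithmetic is the only thing added here.

```python
# Rough calculation for Example 1: business losses from outages at the stated rate.
cost_per_minute_usd = 15_000          # stated cost of downtime per minute
outage_hours_per_month = (4, 8)       # low and high end of the stated monthly range

for hours in outage_hours_per_month:
    monthly_loss = cost_per_minute_usd * hours * 60
    print(f"{hours} hours of outage per month: about {monthly_loss:,.0f} USD in losses")
# With these figures, monthly losses range from 3,600,000 to 7,200,000 USD.
```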

Reliability
Cloud computing makes data backup, disaster recovery, and business continuity easier and less expensive,
because data can be mirrored at multiple redundant sites on the cloud provider's network.
One of IT's crucial functions is ensuring that corporate data is never lost and applications stay available despite
server crashes, power outages, or natural disasters. You can keep your data safe and recoverable by backing it up
to Azure.
Azure Backup is a simple solution that decreases your infrastructure costs while providing enhanced security
mechanisms to protect your data against ransomware. With one solution, you can protect workloads that are
running in Azure and on-premises across Linux, Windows, VMware, and Hyper-V. You can ensure business
continuity by keeping your applications running in Azure.
Azure Site Recovery makes it simple to test disaster recovery by replicating applications between Azure regions.
You can also replicate on-premises VMware and Hyper-V virtual machines and physical servers to Azure to stay
available if the primary site goes down. And you can recover workloads to the primary site when it's up and
running again.
Example: An oil and gas company used Azure technologies to implement a full site recovery. The company
chose not to fully embrace the cloud for day-to-day operations, but the cloud's disaster recovery and business
continuity (DRBC) features still protected their datacenter. As a hurricane formed hundreds of miles away, their
implementation partner started recovering the site to Azure. Before the storm touched down, all mission-
critical assets were running in Azure, preventing any downtime.

Next steps
Learn how to use the business outcome template.
Use the business outcome template
How to use the business outcome template

As discussed in the business outcomes overview, it can be difficult to bridge the gap between business and
technical conversations. This simple template is designed to help teams uniformly capture business outcomes to
be used later in the development of customer transformation journey strategies.
Download the business outcome template spreadsheet to begin brainstorming and tracking business outcomes.
Continue reading to learn how to use the template. Review the business outcomes section for ideas on potential
business outcomes that could come up in executive conversations.

Use the business outcome template


Introduced by Kate Johnson at the Microsoft Digital Transformation Academy, business outcomes focus on three
topics:
Aligning to stakeholders or business decision makers
Understanding business drivers and objectives
Mapping outcomes to specific solutions and technical capability

Figure 1 - Business outcomes visualized as a house with stakeholders, over business outcomes, over technical
capabilities.
The business outcome template focuses on simplified conversations that can quickly engage stakeholders without
getting too deep into the technical solution. By rapidly understanding and aligning the key performance indicators
(KPIs) and business drivers that are important to stakeholders, your team can think about high-level approaches
and transformations before diving into the implementation details.
An example can be found on the "Example Outcome" tab of the spreadsheet, as shown below. To track multiple
outcomes, add them to the "Collective Outcomes" tab.
Figure 2 - Example of a business outcome template.

Why is this template relevant?


Discovery is a fundamental tenet of enterprise architecture. If discovery is limited to technical discovery, the
solution is likely to miss many opportunities to improve the business. Enterprise architects, solution architects, and
other technically minded leaders can master the discovery process by using this template. In effective discovery
processes, these leaders consider five key aspects of the business outcome before leading a transformation
journey, as shown in the following image:

Figure 3 - Five areas of focus in discovery: stakeholders, outcomes, drivers, KPIs, and capabilities.
Stakeholders: Who in the organization is likely to see the greatest value in a specific business outcome? Who is
most likely to support this transformation, especially when things get tough or time consuming? Who has the
greatest stake in the success of this transformation? This person is a potential stakeholder.
Business outcomes: A business outcome is a concise, defined, and observable result or change in business
performance, supported by a specific measure. How does the stakeholder want to change the business? How will
the business be affected? What is the value of this transformation?
Business drivers: Business drivers capture the current challenge that's preventing the company from achieving
desired outcomes. They can also capture new opportunities that the business can capitalize on with the right
solution. How would you describe the current challenges or future state of the business? What business functions
would be changing to meet the desired outcomes?
KPIs: How will this change be measured? How does the business know if they are successful? How frequently will
this KPI be observed? Understanding each KPI helps enable incremental change and experimentation.
Capabilities: When you define any transformation journey, how will technical capabilities accelerate realization of
the business outcome? What applications must be included in the transformation to achieve business objectives?
How do various applications or workloads get prioritized to deliver on capabilities? How do parts of the solution
need to be expanded or rearchitected to meet each of the outcomes? Can execution approaches (or timelines) be
rearranged to prioritize high-impact business outcomes?

Next steps
Learn about aligning your technical efforts to meaningful learning metrics.
Align your technical efforts
How can we align efforts to meaningful learning
metrics?

The business outcomes overview discussed ways to measure and communicate the impact a transformation will
have on the business. Unfortunately, it can take years for some of those outcomes to produce measurable results.
The board and C-suite are unhappy with reports that show a 0% delta for long periods of time.
Learning metrics are interim, shorter-term metrics that can be tied back to longer-term business outcomes. These
metrics align well with a growth mindset and help position the culture to become more resilient. Rather than
highlighting the anticipated lack of progress toward a long-term business goal, learning metrics highlight early
indicators of success. The metrics also highlight early indicators of failure, which are likely to produce the greatest
opportunity for you to learn and adjust the plan.
As with much of the material in this framework, we assume you're familiar with the transformation journey that
best aligns with your desired business outcomes. This article will outline a few learning metrics for each
transformation journey to illustrate the concept.

Cloud migration
This transformation focuses on cost, complexity, and efficiency, with an emphasis on IT operations. The most easily
measured data behind this transformation is the movement of assets to the cloud. In this kind of transformation,
the digital estate is measured by virtual machines (VMs), racks or clusters that host those VMs, datacenter
operational costs, required capital expenses to maintain systems, and depreciation of those assets over time.
As VMs are moved to the cloud, dependence on on-premises legacy assets is reduced. The cost of asset
maintenance is also reduced. Unfortunately, businesses can't realize the cost reduction until clusters are
deprovisioned and datacenter leases expire. In many cases, the full value of the effort isn't realized until the
depreciation cycles are complete.
Always align with the CFO or finance office before making financial statements. However, IT teams can generally
estimate current monetary cost and future monetary cost values for each VM based on CPU, memory, and
storage consumed. You can then apply that value to each migrated VM to estimate the immediate cost savings
and future monetary value of the effort.
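The paragraph above describes a per-VM estimate. The following is a minimal sketch of that calculation; the resource rates and VM inventory are hypothetical placeholders, not Azure prices, and any figures you publish should first be validated with the finance office.

```python
# Minimal sketch: estimate a per-VM monthly cost from consumed CPU, memory, and storage,
# then apply it to migrated VMs as a learning metric. The unit rates below are
# hypothetical placeholders, not Azure prices.
ON_PREM_RATES = {"vcpu": 25.0, "memory_gb": 4.0, "storage_gb": 0.12}   # USD per month
CLOUD_RATES   = {"vcpu": 18.0, "memory_gb": 3.0, "storage_gb": 0.05}   # USD per month

def monthly_cost(vm, rates):
    """Estimate one VM's monthly cost from its consumed resources."""
    return (vm["vcpu"] * rates["vcpu"]
            + vm["memory_gb"] * rates["memory_gb"]
            + vm["storage_gb"] * rates["storage_gb"])

migrated_vms = [
    {"name": "web-01", "vcpu": 4, "memory_gb": 16, "storage_gb": 256},
    {"name": "sql-01", "vcpu": 8, "memory_gb": 64, "storage_gb": 1024},
]

for vm in migrated_vms:
    saving = monthly_cost(vm, ON_PREM_RATES) - monthly_cost(vm, CLOUD_RATES)
    print(f"{vm['name']}: estimated monthly delta {saving:,.2f} USD")
```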

Application innovation
Cloud-enabled application innovation focuses largely on the customer experience and the customer's willingness
to consume products and services provided by the company. It takes time for increments of change to affect
consumer or customer buying behaviors. But application innovation cycles tend to be much shorter than they are
in the other forms of transformation. The traditional advice is that you should start with an understanding of the
specific behaviors that you want to influence and use those behaviors as the learning metrics. For example, in an
e-commerce application, total purchases or add-on purchases could be the target behavior. For a video company,
time watching video streams could be the target.
The challenge with customer behavior metrics is that they can easily be influenced by outside variables. So it's
often important to include related statistics with the learning metrics. These related statistics can include release
cadence, bugs resolved per release, code coverage of unit tests, number of page views, page throughput, page
load time, and other app performance metrics. Each can show different activities and changes to the code base and
the customer experience to correlate with higher-level customer behavior patterns.
Data innovation
Changing an industry, disrupting markets, or transforming products and services can take years. In a cloud-
enabled data innovation effort, experimentation is key to measuring success. Be transparent by sharing prediction
metrics like percent probability, number of failed experiments, and number of models trained. Failures will
accumulate faster than successes. These metrics can be discouraging, and the executive team must understand the
time and investment needed to use these metrics properly.
On the other hand, some positive indicators are often associated with data-driven learning: centralization of
heterogeneous data sets, data ingress, and democratization of data. While the team is learning about the customer
of tomorrow, real results can be produced today. Supporting learning metrics could include:
Number of models available
Number of partner data sources consumed
Devices producing ingress data
Volume of ingress data
Types of data
An even more valuable metric is the number of dashboards created from combined data sources. This number
reflects the current-state business processes that are affected by new data sources. By sharing new data sources
openly, your business can take advantage of the data by using reporting tools like Power BI to produce
incremental insights and drive business change.

Next steps
After learning metrics are aligned, you're ready to start assessing the digital estate against those metrics. The
result will be a transformation backlog or migration backlog.
Assess the digital estate
Build a business justification for cloud migration

Cloud migrations can generate early return on investment (ROI) from cloud transformation efforts. But
developing a clear business justification with tangible, relevant costs and returns can be a complex process. This
article will help you think about what data you need to create a financial model that aligns with cloud migration
outcomes. First, let's dispel a few myths about cloud migration, so your organization can avoid some common
mistakes.

Dispelling cloud migration myths


Myth: The cloud is always cheaper. It's commonly believed that operating a datacenter in the cloud is always
cheaper than operating one on-premises. While this assumption might generally be true, it's not always the case.
Sometimes cloud operating costs are higher. These higher costs are often caused by poor cost governance,
misaligned system architectures, process duplication, atypical system configurations, or greater staffing costs.
Fortunately, you can mitigate many of these problems to create early ROI. Following the guidance in Build the
business justification can help you detect and avoid these misalignments. Dispelling the other myths described
here can help too.
Myth: Everything should go into the cloud. In fact, some business drivers might lead you to choose a hybrid
solution. Before you finalize a business model, it's smart to complete a first-round quantitative analysis, as
described in the digital estate articles. For more information on the individual quantitative drivers involved in
rationalization, see The 5 Rs of rationalization. Either approach will use easily obtained inventory data and a brief
quantitative analysis to identify workloads or applications that could result in higher costs in the cloud. These
approaches could also identify dependencies or traffic patterns that would necessitate a hybrid solution.
Myth: Mirroring my on-premises environment will help me save money in the cloud. During digital estate
planning, it's not unheard of for businesses to detect unused capacity of more than 50% of the provisioned
environment. If assets are provisioned in the cloud to match current provisioning, cost savings are hard to realize.
Consider reducing the size of the deployed assets to align with usage patterns rather than provisioning patterns.
Myth: Server costs drive business cases for cloud migration. Sometimes this assumption is true. For some
companies, it's important to reduce ongoing capital expenses related to servers. But it depends on several factors.
Companies with a five-year to eight-year hardware refresh cycle are unlikely to see fast returns on their cloud
migration. Companies with standardized or enforced refresh cycles can hit a break-even point quickly. In either
case, other expenses might be the financial triggers that justify the migration. Here are a few examples of costs that
are commonly overlooked when companies take a server-only or VM-only view of costs:
Costs of software for virtualization, servers, and middleware can be extensive. Cloud providers eliminate some
of these costs. Two examples of a cloud provider reducing virtualization costs are the Azure Hybrid Benefit and
Azure reservations programs.
Business losses caused by outages can quickly exceed hardware or software costs. If your current datacenter is
unstable, work with the business to quantify the impact of outages in terms of opportunity costs or actual
business costs.
Environmental costs can also be significant. For the average American family, a home is the biggest investment
and the highest cost in the budget. The same is often true for datacenters. Real estate, facilities, and utility costs
represent a fair portion of on-premises costs. When datacenters are retired, those facilities can be repurposed,
or your business could potentially be released from these costs entirely.
Myth: An operating expense model is better than a capital expense model. As explained in the fiscal
outcomes article, an operating expense model can be a good thing. But some industries view operating
expenditures negatively. Here are a few examples that would trigger tighter integration with the accounting and
business units regarding the operating expense conversation:
When a business sees capital assets as a driver for business valuation, capital expense reductions could be a
negative outcome. Though it's not a universal standard, this sentiment is most commonly seen in the retail,
manufacturing, and construction industries.
A private equity firm or a company that's seeking capital influx might consider operating expense increases as a
negative outcome.
If a business focuses heavily on improving sales margins or reducing cost of goods sold (COGS), operating
expenses could be a negative outcome.
More commonly, though, businesses view operating expenses as more favorable than capital expenses. For example, this
approach might be well received by businesses that are trying to improve cash flow, reduce capital investments, or
decrease asset holdings.
Before you provide a business justification that focuses on a conversion from capital expense to operating expense,
understand which is better for your business. Accounting and procurement can often help align the message to
financial objectives.
Myth: Moving to the cloud is like flipping a switch. Migration is a manually intensive technical
transformation. When developing a business justification, especially justifications that are time sensitive, consider
the following aspects that could increase the time it takes to migrate assets:
Bandwidth limitations: The amount of bandwidth between the current datacenter and the cloud provider will
drive timelines during migration.
Testing timelines: Testing applications with the business to ensure readiness and performance can be time
consuming. Aligning power users and testing processes is critical.
Migration timelines: The amount of time and effort required to implement the migration can increase costs
and cause delays. Allocating employees or contracting partners can also delay the process. The plan should
account for these allocations.
Technical and cultural impediments can slow cloud adoption. When time is an important aspect of the business
justification, the best mitigation is proper planning. During planning, two approaches can help mitigate timeline
risks:
Invest the time and energy in understanding technical adoption constraints. Though pressure to move quickly
might be high, it's important to account for realistic timelines.
If cultural or people impediments arise, they'll have more serious effects than technical constraints. Cloud
adoption creates change, which produces the desired transformation. Unfortunately, people sometimes fear
change and might need additional support to align with the plan. Identify key people on the team who are
opposed to change and engage them early.
To maximize readiness and mitigation of timeline risks, prepare executive stakeholders by firmly aligning business
value and business outcomes. Help those stakeholders understand the changes that will come with the
transformation. Be clear and set realistic expectations from the beginning. When people or technologies slow the
process, it will be easier to enlist executive support.

Build the business justification


The following process defines an approach to developing the business justification for cloud migrations. For more
information about the calculations and financial terms, see the article on financial models.
At the highest level, the formula for business justification is simple. But the subtle data points required to populate
the formula can be difficult to align. On a basic level, the business justification focuses on the return on investment
(ROI) associated with the proposed technical change. The generic formula for ROI is:
ROI = (gain from investment - initial investment) / initial investment
We can unpack this equation to get a migration-specific view of the formulas for the input variables on the right
side of the equation. The remaining sections of this article offer some considerations to take into account.
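As a minimal sketch of how the migration-specific inputs discussed in the following sections feed that formula, consider the placeholder figures below; they aren't guidance, only an illustration of the arithmetic.

```python
# Minimal sketch of the ROI formula applied to a migration, with hypothetical inputs.
# "Gain from investment" is built from the migration-specific revenue and cost deltas
# discussed in the sections below; none of these figures are guidance.
initial_investment = 250_000       # migration services, training, cloud costs during migration
annual_revenue_delta = 60_000      # new revenue unlocked by the migration
annual_cost_delta = 180_000        # net reduction in operating costs after migration
years_evaluated = 3

gain_from_investment = (annual_revenue_delta + annual_cost_delta) * years_evaluated
roi = (gain_from_investment - initial_investment) / initial_investment

print(f"ROI over {years_evaluated} years: {roi:.0%}")   # 188% with these placeholder inputs
```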

Migration-specific initial investment


Cloud providers like Azure offer calculators to estimate cloud investments. The Azure pricing calculator is one
example.
Some cloud providers also provide cost-delta calculators. The Azure Total Cost of Ownership (TCO) Calculator
is one example.
For more refined cost structures, consider a digital estate planning exercise.
Estimate the cost of migration.
Estimate the cost of any expected training opportunities. Microsoft Learn might be able to help mitigate those
costs.
At some companies, the time invested by existing staff members might need to be included in the initial costs.
Consult the finance office for guidance.
Discuss any additional costs or burden costs with the finance office for validation.

Migration-specific revenue deltas


This aspect is often overlooked by strategists creating a business justification for migration. In some areas, the
cloud can cut costs. But the ultimate goal of any transformation is to yield better results over time. Consider the
downstream effects to understand long-term revenue improvements. What new technologies will be available to
your business after the migration that can't be used today? What projects or business objectives are blocked by
dependencies on legacy technologies? What programs are on hold, pending high capital expenditures for
technology?
After you consider the opportunities unlocked by the cloud, work with the business to calculate the revenue
increases that could come from those opportunities.

Migration-specific cost deltas


Calculate any changes to costs that will come from the proposed migration. See the financial models article for
details about the types of cost deltas. Cloud providers often offer tools for cost-delta calculations. The Azure Total
Cost of Ownership (TCO) Calculator is one example.
Other examples of costs that might be reduced by a cloud migration:
Datacenter termination or reduction (environmental costs)
Reduction in power consumed (environmental costs)
Rack termination (physical asset recovery)
Hardware refresh avoidance (cost avoidance)
Software renewal avoidance (operational cost reduction or cost avoidance)
Vendor consolidation (operational cost reduction and potential soft-cost reduction)

When ROI results are surprising


If the ROI for a cloud migration doesn't match your expectations, you might want to revisit the common myths
listed at the beginning of this article.
But it's important to understand that a cost savings isn't always possible. Some applications cost more to operate
in the cloud than on-premises. These applications can significantly skew results in an analysis.
When the ROI is below 20%, consider a digital estate planning exercise, paying specific attention to rationalization.
During quantitative analysis, review each application to find workloads that skew the results. It might make sense
to remove those workloads from the plan. If usage data is available, consider reducing the size of VMs to match
usage.
If the ROI is still misaligned, seek help from your Microsoft sales representative or engage an experienced partner.

Next steps
Create a financial model for cloud transformation
Create a financial model for cloud transformation

Creating a financial model that accurately represents the full business value of any cloud transformation can be
complicated. Financial models and business justifications tend to vary for different organizations. This article
establishes some formulas and points out a few things that are commonly missed when strategists create
financial models.

Return on investment
Return on investment (ROI) is often an important criterion for the C-suite or the board. ROI is used to compare
different ways to invest limited capital resources. The formula for ROI is fairly simple. The details you'll need to
create each input to the formula might not be as simple. Essentially, ROI is the amount of return produced from
an initial investment. It's usually represented as a percentage:
ROI = (gain from investment - initial investment) / initial investment, expressed as a percentage
In the next sections, we'll walk through the data you'll need to calculate the initial investment and the gain from
investment (earnings).

Calculate initial investment


Initial investment is the capital expense and operating expense required to complete a transformation. The
classification of costs can vary depending on accounting models and CFO preference. But this category would
include items like professional services to transform, software licenses used only during the transformation, the
cost of cloud services during the transformation, and potentially the cost of salaried employees during the
transformation.
Add these costs to create an estimate of the initial investment.

Calculate the gain from investment


Calculating the gain from investment often requires a second formula that's specific to the business outcomes
and associated technical changes. Calculating earnings is harder than calculating cost reductions.
To calculate earnings, you need two variables:
Revenue deltas
Cost deltas
These variables are described in the following sections.

Revenue deltas
Revenue deltas should be forecast in partnership with business stakeholders. After the business stakeholders
agree on a revenue impact, it can be used to improve the earning position.
Cost deltas
Cost deltas are the amount of increase or decrease that will be caused by the transformation. Independent
variables can affect cost deltas. Earnings are largely based on hard costs like capital expense reductions, cost
avoidance, operational cost reductions, and depreciation reductions. The following sections describe some cost
deltas to consider.
Depreciation reduction or acceleration
For guidance on depreciation, speak with the CFO or finance team. The following information is meant to serve
as a general reference on the topic of depreciation.
When capital is invested in the acquisition of an asset, that investment could be used for financial or tax purposes
to produce ongoing benefits over the expected lifespan of the asset. Some companies see depreciation as a
positive tax advantage. Others see it as a committed, ongoing expense similar to other recurring expenses
attributed to the annual IT budget.
Speak with the finance office to find out if elimination of depreciation is possible and if it would make a positive
contribution to cost deltas.
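As a general reference only, the most common allocation method is straight-line depreciation, which spreads an asset's cost evenly across its expected lifespan (as in the five-year application example in the fiscal outcomes article). A minimal sketch with hypothetical figures:

```python
# Minimal sketch of straight-line depreciation: the cost of an asset is spread evenly
# across its expected lifespan. Figures are hypothetical; confirm the treatment with the
# finance office, since depreciation rules vary by company and jurisdiction.
def straight_line_depreciation(asset_cost, lifespan_years):
    """Return the annual depreciation expense for an asset."""
    return asset_cost / lifespan_years

application_build_cost = 500_000   # dev team and tooling to build and deploy an application
annual_expense = straight_line_depreciation(application_build_cost, lifespan_years=5)
print(f"Annual depreciation: {annual_expense:,.0f} USD for 5 years")   # 100,000 USD per year
```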
Physical asset recovery
In some cases, retired assets can be sold as a source of revenue. This revenue is often lumped into cost reduction
for simplicity. But it's truly an increase in revenue and can be taxed as such. Speak with the finance office to
understand the viability of this option and how to account for the resulting revenue.
Operational cost reductions
Recurring expenses required to operate a business are often called operating expenses. This is a broad category.
In most accounting models, it includes:
Software licensing.
Hosting expenses.
Electric bills.
Real estate rentals.
Cooling expenses.
Temporary staff required for operations.
Equipment rentals.
Replacement parts.
Maintenance contracts.
Repair services.
Business continuity and disaster recovery (BCDR) services.
Other expenses that don't require capital expense approvals.
This category provides one of the highest earning deltas. When you're considering a cloud migration, time
invested in making this list exhaustive is rarely wasted. Ask the CIO and finance team questions to ensure all
operational costs are accounted for.
Cost avoidance
When an operating expenditure is expected but not yet in an approved budget, it might not fit into a cost
reduction category. For example, if VMware and Microsoft licenses need to be renegotiated and paid next year,
they aren't fully qualified costs yet. Reductions in those expected costs are treated like operational costs for the
sake of cost-delta calculations. Informally, however, they should be referred to as "cost avoidance" until
negotiation and budget approval is complete.
Soft-cost reductions
At some companies, soft costs like reductions in operational complexity or reductions in full-time staff for
operating a datacenter could also be included in cost deltas. But including soft costs might not be a good idea.
When you include soft-cost reductions, you insert an undocumented assumption that the reduction will create
tangible cost savings. Technology projects rarely result in actual soft-cost recovery.
Headcount reductions
Time savings for staff are often included under soft-cost reduction. When those time savings map to actual
reduction of IT salary or staffing, they could be calculated separately as headcount reductions.
That said, the skills needed on-premises generally map to a similar (or higher-level) set of skills needed in the
cloud. So people aren't generally laid off after a cloud migration.
An exception occurs when operational capacity is provided by a third party or managed services provider (MSP).
If IT systems are managed by a third party, the operating costs could be replaced by a cloud-native solution or
cloud-native MSP. A cloud-native MSP is likely to operate more efficiently and potentially at a lower cost. If that's
the case, operational cost reductions belong in the hard-cost calculations.
Capital expense reductions or avoidance
Capital expenses are slightly different from operating expenses. Generally, this category is driven by refresh
cycles or datacenter expansion. An example of a datacenter expansion would be a new high-performance cluster
to host a big data solution or data warehouse. This expense would generally fit into a capital expense category.
More common are the basic refresh cycles. Some companies have rigid hardware refresh cycles, meaning assets
are retired and replaced on a regular cycle (usually every three, five, or eight years). These cycles often coincide
with asset lease cycles or the forecasted life span of equipment. When a refresh cycle hits, IT draws capital
expense to acquire new equipment.
If a refresh cycle is approved and budgeted, the cloud transformation could help eliminate that cost. If a refresh
cycle is planned but not yet approved, the cloud transformation could avoid a capital expenditure. Both
reductions would be added to the cost delta.

Next steps
Learn more about cloud accounting models.
Cloud accounting
What is cloud accounting?

The cloud changes how IT accounts for costs, as is described in Creating a financial model for cloud
transformation. Various IT accounting models are much easier to support because of how the cloud allocates
costs. So it's important to understand how to account for cloud costs before you begin a cloud transformation
journey. This article outlines the most common cloud accounting models for IT.

Traditional IT accounting (cost center model)


It's often accurate to consider IT a cost center. In the traditional IT accounting model, IT consolidates purchasing
power for all IT assets. As we pointed out in the financial models article, that purchasing power consolidation can
include software licenses, recurring charges for CRM licensing, purchase of employee desktops, and other large
costs.
When IT serves as a cost center, the perceived value of IT is largely viewed through a procurement management
lens. This perception makes it difficult for the board or other executives to understand the true value that IT
provides. Procurement costs tend to skew the view of IT by outweighing any other value added by the
organization. This view explains why IT is often lumped into the CFO's or COO's responsibilities. This perception
of IT is limited and can be short-sighted.

Central IT accounting (profit center model)


To overcome the cost center view of IT, some CIOs opted for a central IT model of accounting. In this type of
model, IT is treated like a competing business unit and a peer to revenue-producing business units. In some cases,
this model can be entirely logical. For example, some organizations have a professional IT services division that
generates a revenue stream. Frequently, central IT models don't generate significant revenue, making it difficult to
justify the model.
Regardless of the revenue model, central IT accounting models are unique because of how the IT unit accounts for
costs. In a traditional IT model, the IT team records costs and pays those costs from shared funds like operations
and maintenance (O&M) or a dedicated profit and loss (P&L) account.
In a central IT accounting model, the IT team marks up the services provided to account for overhead,
management, and other estimated expenses. It then bills the competing business units for the marked-up services.
In this model, the CIO is expected to manage the P&L associated with the sale of those services. This can create
inflated IT costs and contention between central IT and business units, especially when IT needs to cut costs or
isn't meeting agreed-upon SLAs. During times of technology or market change, any new technology would cause
a disruption to central IT's P&L, making transformation difficult.

Chargeback
One of the common first steps in changing IT's reputation as a cost center is implementing a chargeback model of
accounting. This model is especially common in smaller enterprises or highly efficient IT organizations. In the
chargeback model, any IT costs that are associated with a specific business unit are treated like an operating
expense in that business unit's budget. This practice reduces the cumulative cost effects on IT, allowing business
value to show more clearly.
In a legacy on-premises model, chargeback is difficult to realize because someone still has to carry the large
capital expenses and depreciation. The ongoing conversion from capital expenditures to operating expenses
associated with usage is a difficult accounting exercise. This difficulty is a major reason for the creation of the
traditional IT accounting model and the central IT accounting model. The operating expenses model of cloud cost
accounting is almost required if you want to efficiently deliver a chargeback model.
But you shouldn't implement this model without considering the implications. Here are a few consequences that
are unique to a chargeback model:
Chargeback results in a massive reduction of the overall IT budget. For IT organizations that are inefficient or
require extensive complex technical skills in operations or maintenance, this model can expose those expenses
in an unhealthy way.
Loss of control is a common consequence. In highly political environments, chargeback can result in loss of
control and staff being reallocated to the business. This could create significant inefficiencies and reduce IT's
ability to consistently meet SLAs or project requirements.
Difficulty accounting for shared services is another common consequence. If the organization has grown
through acquisition and is carrying technical debt as a result, it's likely that a high percentage of shared services
must be maintained to keep all systems working together effectively.
Cloud transformations include solutions to these and other consequences associated with a chargeback model.
But each of those solutions includes implementation and operating expenses. The CIO and CFO should carefully
weigh the pros and cons of a chargeback model before considering one.

Showback or awareness-back
For larger enterprises, a showback or awareness-back model is a safer first step in the transition from cost center
to value center. This model doesn't affect financial accounting. In fact, the P&Ls of each organization don't change.
The biggest shift is in mindset and awareness. In a showback or awareness-back model, IT manages the
centralized, consolidated buying power as an agent for the business. In reports back to the business, IT attributes
any direct costs to the relevant business unit, which reduces the perceived budget directly consumed by IT. IT also
plans budgets based on the needs of the associated business units, which allows IT to more accurately account for
costs associated to purely IT initiatives.
This model provides a balance between a true chargeback model and more traditional models of IT accounting.

Impact of cloud accounting models


The choice of accounting models is crucial in system design. The choice of accounting model can affect
subscription strategies, naming standards, tagging standards, and policy and blueprint designs.
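As an illustration of why tagging standards matter to the accounting model, here's a minimal sketch of tag-based cost attribution for a chargeback or showback report. The records and the BusinessUnit tag name are hypothetical; in practice the data would come from your cloud cost exports and your own tagging standard.

```python
# Minimal sketch of tag-based cost attribution for a chargeback or showback report.
# The records below are hypothetical; in practice they would come from a cost export
# where each resource carries a "BusinessUnit" tag defined by your tagging standard.
from collections import defaultdict

cost_records = [
    {"resource": "vm-sales-web", "tags": {"BusinessUnit": "Sales"}, "cost_usd": 1200.50},
    {"resource": "sql-hr-payroll", "tags": {"BusinessUnit": "HR"}, "cost_usd": 830.00},
    {"resource": "vm-sales-api", "tags": {"BusinessUnit": "Sales"}, "cost_usd": 410.25},
    {"resource": "storage-shared", "tags": {}, "cost_usd": 95.00},   # untagged: stays with central IT
]

costs_by_unit = defaultdict(float)
for record in cost_records:
    unit = record["tags"].get("BusinessUnit", "Central IT (unallocated)")
    costs_by_unit[unit] += record["cost_usd"]

for unit, cost in sorted(costs_by_unit.items()):
    print(f"{unit}: {cost:,.2f} USD")
```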
After you've worked with the business to make decisions about a cloud accounting model and global markets, you
have enough information to develop an Azure foundation.
Develop an Azure foundation
First cloud adoption project

There's a learning curve and a time commitment associated with cloud adoption planning. Even for experienced
teams, proper planning takes time: time to align stakeholders, time to collect and analyze data, time to validate
long-term decisions, and time to align people, processes, and technology. In the most productive adoption efforts,
planning grows in parallel with adoption, improving with each release and with each workload migration to the
cloud. It's important to understand the difference between a cloud adoption plan and a cloud adoption strategy.
You need a well-defined strategy to facilitate and guide the implementation of a cloud adoption plan.
The Cloud Adoption Framework for Azure outlines the processes for cloud adoption and the operation of
workloads hosted in the cloud. Each of the processes across the Define strategy, Plan, Ready, Adopt, and Operate
phases require slight expansions of technical, business, and operational skills. Some of those skills can come from
directed learning. But many of them are most effectively acquired through hands-on experience.
Starting a first adoption process in parallel with the development of the plan provides some benefits:
Establish a growth mindset to encourage learning and exploration
Provide an opportunity for the team to develop necessary skills
Create situations that encourage new approaches to collaboration
Identify skill gaps and potential partnership needs
Provide tangible inputs to the plan

First project criteria


Your first adoption project should align with your motivations for cloud adoption. Whenever possible, your first
project should also demonstrate progress toward a defined business outcome.

First project expectations


Your team's first adoption project is likely to result in a production deployment of some kind. But this isn't always
the case. Establish proper expectations early. Here are a few wise expectations to set:
This project is a source of learning.
This project might result in production deployments, but it will probably require additional effort first.
The output of this project is a set of clear requirements to provide a longer-term production solution.

First project examples


To support the preceding criteria, this list provides an example of a first project for each motivation category:
Critical business events: When a critical business event is the primary motivation, implementation of a
tool like Azure Site Recovery might be a good first project. During migration, you can use this tool to quickly
migrate datacenter assets. But during the first project, you could use it purely as a disaster recovery tool,
reducing dependencies on disaster recovery assets within the datacenter.
Migration motivations: When migration is the primary motivation, it's wise to start with the migration of
a noncritical workload. The Azure setup guide and the Azure migration guide can provide guidance for the
migration of your first workload.
Innovation motivations: When innovation is the primary motivation, creation of a targeted dev/test
environment can be a great first project.
Additional examples of first adoption projects include:
Disaster recovery and business continuity (DRBC): Beyond Azure Site Recovery, you can implement
multiple DRBC strategies as a first project.
Nonproduction: Deploy a nonproduction instance of a workload.
Archive: Cold storage can place a strain on datacenter resources. Moving that data to the cloud is a solid quick
win.
End of support (EOS): Migrating assets that have reached the end of support is another quick win that builds
technical skills. It could also provide some cost avoidance from expensive support contracts or licensing costs.
Virtual desktop infrastructure (VDI): Creating virtual desktops for remote employees can provide a quick win. In
some cases, this first adoption project could also reduce dependence on expensive private networks in favor of
commodity public internet connectivity.
Dev/test: Remove dev/test from on-premises environments to give developers control, agility, and self-service
capacity.
Simple apps (fewer than five): Modernize and migrate a simple app to quickly gain developer and operations
experience.
Performance labs: When you need high-scale performance in a lab setting, use the cloud to quickly and cost-
effectively provision those labs for a short time.
Data Platform: Creating a data lake with scalable compute for analytics, reporting, or machine learning
workloads, and migrating to managed databases using dump/restore methods or data migration services.

Next steps
After the first cloud adoption project has begun, the cloud strategy team can turn their attention to the longer-term
cloud adoption plan.
Build your cloud adoption plan
Skills readiness path during the Plan phase of a
migration journey

During the Plan phase of a migration journey, the objective is to develop the plans necessary to guide migration
implementation. This phase requires a few critical skills, including:
Establishing the vision.
Building the business justification.
Rationalizing the digital estate.
Creating a migration backlog (technical plan).
The following sections provide learning paths to develop each of these skills.

Establish the vision


The success of any cloud adoption effort is defined by the business vision. When the technical team doesn't
understand the motives and desired outcomes, it's hard for them to guide their efforts toward business success.
See these articles for information about documenting and articulating the business vision for the technical team:
Adoption motivations. Document and articulate the reasons behind the technical effort.
Business outcomes. Clearly articulate what's expected of the technical team in terms of business changes.
Learning metrics. Establish short-term metrics that can show progress toward longer-term business outcomes.

Build the business justification


Justifying the investment to adopt the cloud can require deeper analysis and an understanding of your
organization's accounting practices. The articles on business justification can help you develop these skills:
Cloud migration business case. Establish a business case for cloud migration.

Rationalize the digital estate


You can refine your business case by aligning it with your current and future digital estate
inventory. These articles can guide the development of a digital estate rationalization:
Incremental rationalization. An agile approach to rationalization that properly aligns late-bound technical
decisions.
The 5 Rs of rationalization. Understand the various rationalization options.

Create a migration backlog (technical plan)


Convert the business case and rationalized digital estate into an actionable migration plan to guide the technical
activities required to achieve the desired business outcomes.

Business planning skills


During the Ready phase, technical staff creates a migration landing zone capable of hosting, operating, and
governing workloads that have been migrated to the cloud. These learning paths can help you develop the
necessary skills:
Create an Azure account. The first step to using Azure is to create an account. Your account holds the Azure
services you provision and handles your personal settings, like identity, billing, and preferences.
Azure portal. Tour the Azure portal features and services, and customize the portal.
Introduction to Azure. Get started with Azure by creating and configuring your first virtual machine in the
cloud.
Introduction to security in Azure. Learn the basic concepts for protecting your infrastructure and data when you
work in the cloud. Understand what responsibilities are yours and what Azure takes care of for you.
Manage resources in Azure. Learn how to work with the Azure command line and web portal to create,
manage, and control cloud-based resources.
Create a VM. Create a virtual machine by using the Azure portal.
Azure networking. Learn the basics of Azure networking and how Azure networking helps you improve
resiliency and reduce latency.
Azure compute options. Learn about the Azure compute services.
Secure resources with RBAC. Use RBAC to secure resources.
Data storage options. Learn about the benefits of Azure data storage.

Organizational skills
Depending on the motivations and desired business outcomes of a cloud adoption effort, leaders might need to
establish new organizational structures or virtual teams (v-teams) to facilitate various functions. These articles will
help you develop the skills necessary to structure those teams to meet desired outcomes:
Initial organizational alignment. Overview of organizational alignment and various team structures to facilitate
specific goals.
Breaking down silos and fiefdoms. Understanding two common organizational antipatterns and ways to guide
a team to productive collaboration.

Deeper skills exploration


Beyond these initial options for developing skills, a variety of learning options is available.
Typical mappings of cloud IT roles
Microsoft and partners offer various options to help all audiences develop their skills with Azure services:
Microsoft IT Pro Center. Serves as a free online resource to help map your cloud career path. Learn what
industry experts suggest for your cloud role and the skills to get you there. Follow a learning curriculum at your
own pace to build the skills you need most to stay relevant.
We recommend turning knowledge of Azure into official recognition with Microsoft Azure certification training
and exams.

Microsoft Learn
Microsoft Learn is a new approach to learning. Readiness for the new skills and responsibilities that come with
cloud adoption doesn't come easily. Microsoft Learn provides a more rewarding approach to hands-on learning
that helps you achieve your goals faster. Earn points and levels, and achieve more!
Here is an example of a tailored learning path that aligns with the Strategy portion of the Cloud Adoption
Framework.
Learn the business value of Microsoft Azure: This learning experience takes you on a journey that begins by
showing you how digital transformation and the power of the cloud can transform your business. It covers how
Microsoft Azure cloud services can power your organization on a trusted cloud platform and wraps up by
illustrating how to make this journey real for your organization.
Learn more
To discover additional learning paths, browse the Microsoft Learn catalog. Use the Roles filter to align learning
paths with your role.
Cloud adoption plans convert the aspirational goals of a cloud adoption strategy into an actionable plan. The collective cloud
teams can use the cloud adoption plan to guide their technical efforts and align them with the business strategy.

Cloud adoption plan process


The following exercises will help you document your technology strategy. This approach captures prioritized tasks to drive
adoption efforts. The cloud adoption plan then maps to the metrics and motivations defined in the cloud adoption strategy.

Digital estate
Inventory and rationalize your digital estate based on assumptions that align with motivations and business outcomes.

Initial organizational alignment


Establish a plan for initial organizational alignment to support the adoption plan.

Skills readiness plan


Create a plan for addressing skills readiness gaps.

Cloud adoption plan


Develop a cloud adoption plan to manage change across the digital estate, skills, and organization.

Download the Cloud Adoption Framework strategy and planning template to track the outputs of each exercise as you build out
your cloud adoption strategy.

Next steps
Start building the cloud adoption plan with a focus on the digital estate.
Digital estate
Cloud rationalization

Cloud rationalization is the process of evaluating assets to determine the best way to migrate or modernize each
asset in the cloud. For more information about the process of rationalization, see What is a digital estate?.

Rationalization context
The "five Rs of rationalization" listed in this article are a great way to label a potential future state for any
workload that's being considered as a cloud candidate. However, this labeling process should be put into the
correct context before you attempt to rationalize an environment. Review the following myths to provide that
context:
Myth: It's easy to make rationalization decisions early in the process. Accurate rationalization
requires a deep knowledge of the workload and associated assets (apps, VMs, and data). Most importantly,
accurate rationalization decisions take time. We recommend using an incremental rationalization process.
Myth: Cloud adoption has to wait for all workloads to be rationalized. Rationalizing an entire IT
portfolio or even a single datacenter can delay the realization of business value by months or even years.
Full rationalization should be avoided when possible. Instead, use the power of 10 approach to release
planning to make wise decisions about the next 10 workloads that are slated for cloud adoption.
Myth: Business justification has to wait for all workloads to be rationalized. To develop a business
justification for a cloud adoption effort, make a few basic assumptions at the portfolio level. When
motivations are aligned to innovation, assume rearchitecture. When motivations are aligned to migration,
assume rehost. These assumptions can accelerate the business justification process. Assumptions are then
challenged and budgets refined during the assessment phase of each workload's adoption cycles.
Now review the following five Rs of rationalization to familiarize yourself with the long-term process. While
developing your cloud adoption plan, choose the option that best aligns with your motivations, business
outcomes, and current state environment. The goal in digital estate rationalization is to set a baseline, not to
rationalize every workload.

The five Rs of rationalization


The five Rs of rationalization that are listed here describe the most common options for rationalization.

Rehost
Also known as a lift and shift migration, a rehost effort moves a current state asset to the chosen cloud provider,
with minimal change to overall architecture.
Common drivers might include:
Reducing capital expense
Freeing up datacenter space
Achieving rapid return on investment in the cloud
Quantitative analysis factors:
VM size (CPU, memory, storage)
Dependencies (network traffic)
Asset compatibility
Qualitative analysis factors:
Tolerance for change
Business priorities
Critical business events
Process dependencies

Refactor
Platform as a service (PaaS) options can reduce the operational costs that are associated with many applications.
It's a good idea to slightly refactor an application to fit a PaaS-based model.
"Refactor" also refers to the application development process of refactoring code to enable an application to
deliver on new business opportunities.
Common drivers might include:
Faster and shorter updates
Code portability
Greater cloud efficiency (resources, speed, cost, managed operations)
Quantitative analysis factors:
Application asset size (CPU, memory, storage)
Dependencies (network traffic)
User traffic (page views, time on page, load time)
Development platform (languages, data platform, middle-tier services)
Database (CPU, memory, storage, version)
Qualitative analysis factors:
Continued business investments
Bursting options/timelines
Business process dependencies

Rearchitect
Some aging applications aren't compatible with cloud providers because of the architectural decisions that were
made when the application was built. In these cases, the application might need to be rearchitected before
transformation.
In other cases, applications that are cloud-compatible, but not cloud-native, might create cost efficiencies and
operational efficiencies by rearchitecting the solution into a cloud-native application.
Common drivers might include:
Application scale and agility
Easier adoption of new cloud capabilities
Mix of technology stacks
Quantitative analysis factors:
Application asset size (CPU, memory, storage)
Dependencies (network traffic)
User traffic (page views, time on page, load time)
Development platform (languages, data platform, middle tier services)
Database (CPU, memory, storage, version)
Qualitative analysis factors:
Growing business investments
Operational costs
Potential feedback loops and DevOps investments.

Rebuild
In some scenarios, the delta that must be overcome to carry an application forward can be too large to justify
further investment. This is especially true for applications that previously met the needs of a business but are now
unsupported or misaligned with the current business processes. In this case, a new code base is created to align
with a cloud-native approach.
Common drivers might include:
Accelerate innovation
Build apps faster
Reduce operational cost
Quantitative analysis factors:
Application asset size (CPU, memory, storage)
Dependencies (network traffic)
User traffic (page views, time on page, load time)
Development platform (languages, data platform, middle tier services)
Database (CPU, memory, storage, version)
Qualitative analysis factors:
Declining end-user satisfaction
Business processes limited by functionality
Potential cost, experience, or revenue gains

Replace
Solutions are typically implemented by using the best technology and approach available at the time. Sometimes
software as a service (SaaS) applications can provide all the necessary functionality for the hosted application. In
these scenarios, a workload can be scheduled for future replacement, effectively removing it from the
transformation effort.
Common drivers might include:
Standardizing around industry-best practices
Accelerating adoption of business process-driven approaches
Reallocating development investments into applications that create competitive differentiation or advantages
Quantitative analysis factors:
General operating cost reductions
VM size (CPU, memory, storage)
Dependencies (network traffic)
Assets to be retired
Database (CPU, memory, storage, version)
Qualitative analysis factors:
Cost-benefit analysis of the current architecture versus a SaaS solution
Business process maps
Data schemas
Custom or automated processes

Next steps
Collectively, you can apply these five Rs of rationalization to a digital estate to help you make rationalization
decisions about the future state of each application.
What is a digital estate?
What is a digital estate?

Every modern company has some form of digital estate. Much like a physical estate, a digital estate is an
abstract reference to a collection of tangible owned assets. In a digital estate, those assets include virtual
machines (VMs), servers, applications, data, and so on. Essentially, a digital estate is the collection of IT assets
that power business processes and supporting operations.
The importance of a digital estate is most obvious during the planning and execution of digital transformation
efforts. During transformation journeys, the cloud strategy teams use the digital estate to map the business
outcomes to release plans and technical efforts. That all starts with an inventory and measurement of the
digital assets that the organization owns today.

How can a digital estate be measured?


The measurement of a digital estate changes depending on the desired business outcomes.
Infrastructure migrations: When an organization is inward-facing and seeks to optimize costs,
operational processes, agility, or other aspects of their operations, the digital estate focuses on VMs,
servers, and workloads.
Application innovation: For customer-focused transformations, the lens is a bit different. The focus
should be placed on the applications, APIs, and transactional data that supports the customers. VMs
and network appliances often receive less focus.
Data-driven innovation: In today's digitally driven market, it's difficult to launch a new product or
service without a strong foundation in data. During cloud-enabled data innovation efforts, the focus is
more on the silos of data across the organization.
After an organization understands the most important form of transformation, digital estate planning becomes
much easier to manage.

TIP
Each type of transformation can be measured with any of the three views. Companies commonly complete all three
transformations in parallel. We strongly recommend that company leadership and the cloud strategy team agree
regarding the transformation that is most important for business success. That understanding serves as the basis for
common language and metrics across multiple initiatives.

How can a financial model be updated to reflect the digital estate?


An analysis of the digital estate drives cloud adoption activities. It also informs financial models by providing
cloud costing models, which in turn drive return on investment (ROI).
To complete the digital estate analysis, take the following steps:
1. Determine analysis approach.
2. Collect current state inventory.
3. Rationalize the assets in the digital estate.
4. Align assets to cloud offerings to calculate pricing.
Financial models and migration backlogs can be modified to reflect the rationalized and priced estate.
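As a purely illustrative sketch of step 4, the following Python snippet maps a small, hypothetical rationalized inventory to assumed price points to produce a rough monthly estimate. The asset names, sizing profiles, and prices are placeholders rather than actual Azure rates; real estimates should come from the pricing tools discussed later in this guide.

```python
# Hypothetical sketch: map rationalized assets to assumed price points to
# produce a rough monthly cost estimate. All names and prices are illustrative.
from dataclasses import dataclass

@dataclass
class Asset:
    name: str
    rationalization: str   # "rehost", "refactor", "retire", ...
    vcpus: int
    memory_gb: int

# Illustrative monthly price assumptions per sizing profile (not real Azure prices).
ASSUMED_MONTHLY_PRICE = {
    "small": 70.0,    # up to 2 vCPU / 8 GB
    "medium": 140.0,  # up to 4 vCPU / 16 GB
    "large": 280.0,   # anything bigger
}

def profile(asset: Asset) -> str:
    if asset.vcpus <= 2 and asset.memory_gb <= 8:
        return "small"
    if asset.vcpus <= 4 and asset.memory_gb <= 16:
        return "medium"
    return "large"

def estimate_monthly_cost(assets: list[Asset]) -> float:
    # Retired assets drop out of the estimate; everything else is priced.
    return sum(
        ASSUMED_MONTHLY_PRICE[profile(a)]
        for a in assets
        if a.rationalization != "retire"
    )

estate = [
    Asset("web01", "rehost", 2, 8),
    Asset("sql01", "refactor", 4, 16),
    Asset("legacy-fax", "retire", 2, 4),
]
print(f"Estimated monthly cost: ${estimate_monthly_cost(estate):,.2f}")
```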
Next steps
Before digital estate planning begins, determine which approach to use.
Approaches to digital estate planning
Approaches to digital estate planning

Digital estate planning can take several forms depending on the desired outcomes and size of the existing estate.
There are various approaches that you can take. It's important to set expectations regarding the approach early in
planning cycles. Unclear expectations often lead to delays associated with additional inventory-gathering exercises.
This article outlines three approaches to analysis.

Workload-driven approach
This workload-driven approach, also known as a top-down assessment, evaluates security aspects. Security includes the categorization of data (high,
medium, or low business impact), compliance, sovereignty, and security risk requirements. This approach assesses
high-level architectural complexity. It evaluates aspects such as authentication, data structure, latency
requirements, dependencies, and application life expectancy.
The top-down approach also measures the operational requirements of the application, such as service levels,
integration, maintenance windows, monitoring, and insight. When all of these aspects have been analyzed and
taken into consideration, the result is a score that reflects the relative difficulty of migrating the application to each
of the cloud platforms: IaaS, PaaS, and SaaS.
In addition, the top-down assessment evaluates the financial benefits of the application, such as operational
efficiencies, TCO, return on investment, and other appropriate financial metrics. The assessment also examines the
seasonality of the application (for example, are there times of the year when demand spikes?) and overall compute
load.
It also looks at the types of users it supports (casual/expert, always/occasionally logged on), and the required
scalability and elasticity. Finally, the assessment concludes by examining business continuity and resiliency
requirements, as well as dependencies for running the application if a disruption of service should occur.

TIP
This approach requires interviews and anecdotal feedback from business and technical stakeholders. Availability of key
individuals is the biggest risk to timing. The anecdotal nature of the data sources makes it more difficult to produce accurate
cost or timing estimates. Plan schedules in advance and validate any data that's collected.

Asset-driven approach
The asset-driven approach provides a plan based on the assets that support an application for migration. In this
approach, you pull statistical usage data from a configuration management database (CMDB) or other
infrastructure assessment tools.
This approach usually assumes an IaaS model of deployment as a baseline. In this process, the analysis evaluates
the attributes of each asset: memory, number of processors (CPU cores), operating system storage space, data
drives, network interface cards (NICs), IPv6, network load balancing, clustering, operating system version,
database version (if necessary), supported domains, and third-party components or software packages, among
others. The assets that you inventory in this approach are then aligned with workloads or applications for
grouping and dependency mapping purposes.
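The following is a minimal sketch of what such an asset record and workload grouping might look like in code. The field names are illustrative assumptions, not a real CMDB schema or assessment-tool export format.

```python
# Hypothetical sketch of the per-asset attributes an asset-driven analysis
# might pull from a CMDB export; field names are illustrative only.
from dataclasses import dataclass, field

@dataclass
class ServerRecord:
    hostname: str
    cpu_cores: int
    memory_gb: int
    os_disk_gb: int
    nic_count: int
    operating_system: str
    database_version: str | None = None
    workload: str | None = None           # filled in during grouping
    third_party_software: list[str] = field(default_factory=list)

def group_by_workload(records: list[ServerRecord]) -> dict[str, list[ServerRecord]]:
    """Group inventoried assets by the workload they were mapped to."""
    groups: dict[str, list[ServerRecord]] = {}
    for record in records:
        groups.setdefault(record.workload or "unmapped", []).append(record)
    return groups
```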
TIP
This approach requires a rich source of statistical usage data. The time that's needed to scan the inventory and collect data is
the biggest risk to timing. The low-level data sources can miss dependencies between assets or applications. Plan for at least
one month to scan the inventory. Validate dependencies before deployment.

Incremental approach
We strongly suggest an incremental approach, as we do for many processes in the Cloud Adoption Framework. In
the case of digital estate planning, that equates to a multiphase process:
Initial cost analysis: If financial validation is required, start with an asset-driven approach, described
earlier, to get an initial cost calculation for the entire digital estate, with no rationalization. This establishes a
worst-case scenario benchmark.
Migration planning: After you have assembled a cloud strategy team, build an initial migration backlog
using a workload-driven approach that's based on their collective knowledge and limited stakeholder
interviews. This approach quickly builds a lightweight workload assessment to foster collaboration.
Release planning: At each release, the migration backlog is pruned and reprioritized to focus on the most
relevant business impact. During this process, the next five to ten workloads are selected as prioritized
releases. At this point, the cloud strategy team invests the time in completing an exhaustive workload-
driven approach. Delaying this assessment until a release is aligned better respects the time of
stakeholders. It also delays the investment in full analysis until the business starts to see results from earlier
efforts.
Execution analysis: Before migrating, modernizing, or replicating any asset, assess it both individually and
as part of a collective release. At this point, the data from the initial asset-driven approach can be
scrutinized to ensure accurate sizing and operational constraints.

TIP
This incremental approach enables streamlined planning and accelerated results. It's important that all parties involved
understand the approach to delayed decision making. It's equally important that assumptions made at each stage be
documented to avoid loss of details.

Next steps
After an approach is selected, the inventory can be collected.
Gather inventory data
Gather inventory data for a digital estate

Developing an inventory is the first step in digital estate planning. In this process, a list of IT assets that support
specific business functions are collected for later analysis and rationalization. This article assumes that a bottom-
up approach to analysis is most appropriate for planning. For more information, see Approaches to digital estate
planning.

Take inventory of a digital estate


The inventory that supports a digital estate changes depending on the desired digital transformation and
corresponding transformation journey.
Cloud migration: We often recommend that during a cloud migration, you collect the inventory from
scanning tools that create a centralized list of all virtual machines and servers. Some tools can also create
network mappings and dependencies, which help define workload alignment.
Application innovation: Inventory during a cloud-enabled application innovation effort begins with the
customer. Mapping the customer experience from start to finish is a good place to begin. Aligning that map
to applications, APIs, data, and other assets creates a detailed inventory for analysis.
Data innovation: Cloud-enabled data innovation efforts focus on the product or service. An inventory
also includes a mapping of the opportunities for disrupting the market, as well as the capabilities needed.
Security: An inventory gives the security team the understanding it needs to assess, protect, and monitor the
organization's assets.

Accuracy and completeness of an inventory


An inventory is rarely complete in its first iteration. We strongly recommend that the cloud strategy team align
stakeholders and power users to validate the inventory. When possible, use additional tools like network and
dependency analysis to identify assets that are receiving traffic but are not in the inventory.

Next steps
After an inventory is compiled and validated, it can be rationalized. Inventory rationalization is the next step to
digital estate planning.
Rationalize the digital estate
Rationalize the digital estate

Cloud rationalization is the process of evaluating assets to determine the best approach to hosting them in the
cloud. After you've determined an approach and aggregated an inventory, cloud rationalization can begin. The
Cloud rationalization article discusses the most common rationalization options.

Traditional view of rationalization


It's easy to understand rationalization when you visualize the traditional process of rationalization as a complex
decision tree. Each asset in the digital estate is fed through a process that results in one of five answers (the five
Rs). For small estates, this process works well. For larger estates, it's inefficient and can lead to significant delays.
Let's examine the process to see why. Then we'll present a more efficient model.
Inventory: A thorough inventory of assets, including applications, software, hardware, operating systems, and
system performance metrics, is required for completing a full rationalization by using traditional models.
Quantitative analysis: In the decision tree, quantitative questions drive the first layer of decisions. Common
questions include the following: Is the asset in use today? If so, is it optimized and sized properly? What
dependencies exist between assets? These questions are vital to the classification of the inventory.
Qualitative analysis: The next set of decisions requires human intelligence in the form of qualitative analysis.
Often, the questions that come up here are unique to the solution and can be answered only by business
stakeholders and power users. These decisions typically delay the process, slowing things down considerably.
This analysis generally consumes 40 to 80 FTE hours per application.
For guidance about building a list of qualitative analysis questions, see Approaches to digital estate planning.
Rationalization decision: In the hands of an experienced rationalization team, the qualitative and quantitative
data creates clear decisions. Unfortunately, teams with a high degree of rationalization experience are expensive
to hire or take months to train.

Rationalization at enterprise scale


If this effort is time-consuming and daunting for a 50-VM digital estate, imagine the effort that's required to
drive business transformation in an environment with thousands of VMs and hundreds of applications. The
human effort required can easily exceed 1,500 FTE hours and nine months of planning.
While full rationalization is the end state and a great direction to move in, it seldom produces a high ROI (return
on investment) relative to the time and energy that's required.
When rationalization is essential to financial decisions, it's worth considering a professional services
organization that specializes in cloud rationalization to accelerate the process. Even then, full rationalization can
be a costly and time-consuming effort that delays transformation or business outcomes.
The rest of this article describes an alternative approach, known as incremental rationalization.

Incremental rationalization
The complete rationalization of a large digital estate is prone to risk and can suffer delays because of its
complexity. The assumption behind the incremental approach is that delayed decisions stagger the load on the
business to reduce the risk of roadblocks. Over time, this approach creates an organic model for developing the
processes and experience required to make qualified rationalization decisions more efficiently.
Inventory: Reduce discovery data points
Few organizations invest the time, energy, and expense in maintaining an accurate, real-time inventory of the full
digital estate. Loss, theft, refresh cycles, and employee onboarding often justify detailed asset tracking of end-
user devices. However, the ROI of maintaining an accurate server and application inventory in a traditional, on-
premises datacenter is often low. Most IT organizations have other more pressing issues to address than
tracking the usage of fixed assets in a datacenter.
In a cloud transformation, inventory directly correlates to operating costs. Accurate inventory data is required for
proper planning. Unfortunately, current environmental scanning options can delay decisions by weeks or
months. Fortunately, a few tricks can accelerate data collection.
Agent-based scanning is the most frequently cited delay. The robust data that's required for a traditional
rationalization can often only be collected with an agent running on each asset. This dependency on agents often
slows progress, because it can require feedback from security, operations, and administration functions.
In an incremental rationalization process, an agent-less solution could be used for an initial discovery to
accelerate early decisions. Depending on the level of complexity in the environment, an agent-based solution
might still be required. However, it can be removed from the critical path to business change.
Quantitative analysis: Streamline decisions
Regardless of the approach to inventory discovery, quantitative analysis can drive initial decisions and
assumptions. This is especially true when trying to identify the first workload or when the goal of rationalization
is a high-level cost comparison. In an incremental rationalization process, the cloud strategy team and the cloud
adoption teams limit the five Rs of rationalization to two concise decisions and only apply those quantitative
factors. This streamlines the analysis and reduces the amount of initial data that's required to drive change.
For example, if an organization is in the midst of an IaaS migration to the cloud, you can assume that most
workloads will either be retired or rehosted.
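A minimal sketch of that streamlined, two-outcome decision might look like the following. The thresholds and field names are assumptions for illustration only; any real decision should still be confirmed through the qualitative analysis described next.

```python
# Minimal sketch of the streamlined two-outcome decision described above.
# Thresholds and field names are assumptions for illustration only.
def initial_decision(cpu_avg_percent: float, network_connections_30d: int) -> str:
    """Return a provisional rationalization decision to be challenged later."""
    apparently_unused = cpu_avg_percent < 2.0 and network_connections_30d == 0
    return "retire (candidate)" if apparently_unused else "rehost (assumed)"

inventory = {
    "app-vm-01": (35.0, 1_200),
    "old-report-srv": (0.4, 0),
}
for name, (cpu, connections) in inventory.items():
    print(name, "->", initial_decision(cpu, connections))
```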
Qualitative analysis: Temporary assumptions
By reducing the number of potential outcomes, it's easier to reach an initial decision about the future state of an
asset. When you reduce the options, you also reduce the number of questions asked of the business at this early
stage.
For example, if the options are limited to rehosting or retiring, the business needs to answer only one question
during initial rationalization, which is whether to retire the asset.
"Analysis suggests that no users are actively using this asset. Is that accurate, or have we overlooked
something?" Such a binary question is typically much easier to run through qualitative analysis.
This streamlined approach produces baselines, financial plans, strategy, and direction. In later activities, each
asset goes through further rationalization and qualitative analysis to evaluate other options. All assumptions that
you make in this initial rationalization are tested before migration of the affected assets, as described in the next section.

Challenge assumptions
The outcome of the prior section is a rough rationalization that's full of assumptions. Next, it's time to challenge
some of those assumptions.
Retire assets
In a traditional on-premises environment, hosting small, unused assets seldom causes a significant impact on
annual costs. With a few exceptions, FTE effort that's required to analyze and retire the actual asset outweighs
the cost savings from pruning and retiring those assets.
However, when you move to a cloud accounting model, retiring assets can produce significant savings in annual
operating costs and up-front migration efforts.
It's not uncommon for organizations to retire 20% or more of their digital estate after completing a quantitative
analysis. We recommend doing further qualitative analysis before deciding on such an action. After it's
confirmed, the retirement of those assets can produce the first ROI victory in the cloud migration. In many cases,
this is one of the biggest cost-saving factors. As such, we recommend that the cloud strategy team oversee the
validation and retirement of assets, in parallel with the build phase of the migration process, to allow for an early
financial win.
Program adjustments
A company seldom embarks on just one transformation journey. The choice between cost reduction, market
growth, and new revenue streams is rarely a binary decision. As such, we recommend that the cloud strategy
team work with IT to identify assets on parallel transformation efforts that are outside of the scope of the
primary transformation journey.
In the IaaS migration example given in this article:
Ask the DevOps team to identify assets that are already part of a deployment automation and remove
those assets from the core migration plan.
Ask the Data and R&D teams to identify assets that are powering new revenue streams and remove them
from the core migration plan.
This program-focused qualitative analysis can be executed quickly and creates alignment across multiple
migration backlogs.
You might still need to consider some assets as rehost assets for a while. You can phase in later rationalization
after the initial migration.

Select the first workload


Implementing the first workload is key to testing and learning. It's the first opportunity to demonstrate and build
a growth mindset.
Business criteria
To ensure business transparency, identify a workload that is supported by a member of the cloud strategy team's
business unit. Preferably choose one in which the team has a vested stake and strong motivation to move to the
cloud.
Technical criteria
Select a workload that has minimum dependencies and can be moved as a small group of assets. We
recommend that you select a workload with a defined testing path to make validation easier.
The first workload is often deployed in an experimental environment with no operational or governance capacity.
It's important to select a workload that doesn't interact with secure data.
Qualitative analysis
The cloud adoption teams and the cloud strategy team can work together to analyze this small workload. This
collaboration creates a controlled opportunity to create and test qualitative analysis criteria. The smaller
population creates an opportunity to survey the affected users, and to complete a detailed qualitative analysis in
a week or less. For common qualitative analysis factors, see the specific rationalization target in the 5 Rs of
rationalization.
Migration
In parallel with continued rationalization, the cloud adoption team can begin migrating the small workload to
expand learning in the following key areas:
Strengthen skills with the cloud provider's platform.
Define the core services (and Azure standards) needed to fit the long-term vision.
Better understand how operations might need to change later in the transformation.
Understand any inherent business risks and the business' tolerance for those risks.
Establish a baseline or minimum viable product (MVP) for governance based on the business' risk tolerance.

Release planning
While the cloud adoption team is executing the migration or implementation of the first workload, the cloud
strategy team can begin prioritizing the remaining applications and workloads.
Power of 10
The traditional approach to rationalization attempts to meet all foreseeable needs. Fortunately, a plan for every
application is often not required to start a transformation journey. In an incremental model, the Power of 10
provides a good starting point. In this model, the cloud strategy team selects the first 10 applications to be
migrated. Those ten workloads should contain a mixture of simple and complex workloads.
Build the first backlogs
The cloud adoption teams and the cloud strategy team can work together on the qualitative analysis for the first
10 workloads. This effort creates the first prioritized migration backlog and the first prioritized release backlog.
This method enables the teams to iterate on the approach and provides sufficient time to create an adequate
process for qualitative analysis.
Mature the process
After the two teams agree on the qualitative analysis criteria, assessment can become a task within each
iteration. Reaching consensus on assessment criteria usually requires two to three releases.
After the assessment has moved into the incremental execution process of migration, the cloud adoption team
can iterate faster on assessment and architecture. At this stage, the cloud strategy team is also abstracted, which
reduces the drain on their time. This also enables the cloud strategy team to focus on prioritizing the applications
that are not yet in a specific release, which ensures tight alignment with changing market conditions.
Not all of the prioritized applications will be ready for migration. Sequencing is likely to change as the team does
deeper qualitative analysis and discovers business events and dependencies that might prompt reprioritization
of the backlog. Some releases might group together a small number of workloads. Others might just contain a
single workload.
The cloud adoption team is likely to run iterations that don't produce a complete workload migration. The
smaller the workload, and the fewer dependencies, the more likely a workload is to fit into a single sprint or
iteration. For this reason, we recommend that the first few applications in the release backlog be small and
contain few external dependencies.

End state
Over time, the combination of the cloud adoption team and the cloud strategy team will complete a full
rationalization of the inventory. However, this incremental approach enables the teams to get continually faster
at the rationalization process. It also helps the transformation journey to yield tangible business results sooner,
without as much upfront analysis effort.
In some cases, the financial model might be too tight to make a decision without additional rationalization. In
such cases, you might need a more traditional approach to rationalization.

Next steps
The output of a rationalization effort is a prioritized backlog of all assets that are affected by the chosen
transformation. This backlog is now ready to serve as the foundation for costing models of cloud services.
Align cost models with the digital estate
Align cost models with the digital estate to forecast
cloud costs

After you've rationalized a digital estate, you can align it to equivalent costing models with the chosen cloud
provider. Discussing cost models is difficult without focusing on a specific cloud provider. To provide tangible
examples in this article, Azure is the assumed cloud provider.
Azure pricing tools help you manage cloud spend with transparency and accuracy, so you can make the most of
Azure and other clouds. Providing the tools to monitor, allocate, and optimize cloud costs empowers customers to
accelerate future investments with confidence.
Azure Migrate: Azure Migrate is perhaps the most cost-effective approach to cost model alignment. It
supports digital estate inventory, limited rationalization, and cost calculation in a single tool.
Total cost of ownership (TCO) calculator: Lower the total cost of ownership of your on-premises
infrastructure with the Azure cloud platform. Use the Azure TCO calculator to estimate the cost savings you
can realize by migrating your application workloads to Azure. Provide a brief description of your on-
premises environment to get an instant report.
Azure pricing calculator: Estimate your expected monthly bill by using our pricing calculator. Track your
actual account usage and bill at any time using the billing portal. Set up automatic email billing alerts to
notify you if your spend goes above an amount you configure.
Azure Cost Management: Azure Cost Management, licensed by Microsoft subsidiary Cloudyn, is a
multicloud cost management solution that helps you use and manage Azure and other cloud resources
effectively. Collect cloud usage and billing data through application programming interfaces (APIs) from Azure,
Amazon Web Services, and Google Cloud Platform. With that data, gain full visibility into resource
consumption and costs across cloud platforms in a single, unified view. Continuously monitor cloud
consumption and cost trends. Track actual cloud spending against your budget to avoid overspending.
Detect spending anomalies and usage inefficiencies. Use historical data to improve your forecasting
accuracy for cloud usage and expenditures.
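Beyond these interactive tools, list prices can also be retrieved programmatically. The following is a hedged sketch that queries the public Azure Retail Prices API; the endpoint, filter syntax, and field names shown here reflect the publicly documented API and should be verified against current documentation before use.

```python
# Hedged sketch: pull list prices for a VM SKU from the public Azure Retail
# Prices API. Endpoint, filter syntax, and field names are assumptions based
# on the public API documentation; verify them before relying on the output.
import requests

PRICES_ENDPOINT = "https://prices.azure.com/api/retail/prices"

def list_prices(service_name: str, region: str, sku_contains: str) -> list[dict]:
    params = {
        "$filter": (
            f"serviceName eq '{service_name}' "
            f"and armRegionName eq '{region}'"
        )
    }
    items: list[dict] = []
    url = PRICES_ENDPOINT
    while url:
        response = requests.get(url, params=params, timeout=30)
        response.raise_for_status()
        payload = response.json()
        items.extend(
            item for item in payload.get("Items", [])
            if sku_contains in item.get("skuName", "")
        )
        url = payload.get("NextPageLink")
        params = None  # the next-page link already carries the filter
    return items

for item in list_prices("Virtual Machines", "eastus", "D2s v3")[:5]:
    print(item["skuName"], item["retailPrice"], item["unitOfMeasure"])
```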
Initial organization alignment

The most important aspect of any cloud adoption plan is the alignment of people who will make the plan a reality.
No plan is complete until you understand its people-related aspects.
True organizational alignment takes time. It will become important to establish long-term organizational
alignment, especially as cloud adoption scales across the business and IT culture. Alignment is so important that
an entire section has been dedicated to it in the Operate section of the Cloud Adoption Framework.
Full organization alignment is not a required component of the cloud adoption plan. However, some initial
organization alignment is needed. This article outlines a best-practice starting point for organizational alignment.
The guidance here can help complete your plan and get your teams ready for cloud adoption. When you're ready,
you can use the organization alignment section to customize this guidance to fit your organization.

Initial best-practice structure


To create a balance between speed and control, we recommend that during cloud adoption, at a minimum, you
have people accountable for cloud adoption and cloud governance. This might be a team of people sharing
responsibilities for each of these areas, or capabilities. It might also be individual people who are both accountable
for the outcomes and responsible for the work. In either scenario, cloud adoption and cloud governance are two
capabilities that involve natural friction between moving quickly and reducing risks. Here's how the two teams fit
together:

It's fairly intuitive that cloud adoption tasks require people to execute those tasks. So, few people are surprised that
a cloud adoption team is a requirement. However, those who are new to the cloud may not fully appreciate the
importance of a cloud governance team. This challenge often occurs early in adoption cycles. The cloud
governance team provides the necessary checks and balances to ensure that cloud adoption doesn't expose the
business to any new risks. When risks must be taken, this team ensures that proper processes and controls are
implemented to mitigate or govern those risks.
To learn more about cloud adoption, cloud governance, and other such capabilities, see the brief section on
understanding required cloud capabilities.

Map people to capabilities


Assuming that the suggested structure aligns to your cloud adoption plan, the next step is to map specific people
to the necessary capabilities. To do so, answer the following questions:
What person (or group of people) will be responsible for completing technical tasks in the cloud adoption plan?
What person will be accountable for the team's ability to deliver technical changes?
What person (or group of people) will be responsible for implementing protective governance mechanisms?
What person will be accountable for defining those governance controls?
Are there other capabilities or people that will have accountability or responsibility within the cloud adoption
plan?
After you've documented the answers to these questions, you can establish plans for skills readiness to define
plans to prepare these people for forthcoming work.

Next steps
Learn how to plan for cloud adoption.
Plan for cloud adoption
Plan for cloud adoption

A plan is an essential requirement for a successful cloud adoption. A cloud adoption plan is an iterative project
plan that helps a company transition from traditional IT approaches to modern, agile
approaches. This article series outlines how a cloud adoption plan helps companies balance their IT portfolio and
manage transitions over time. Through this process, business objectives can be clearly translated into tangible
technical efforts. Those efforts can then be managed and communicated in ways that make sense to business
stakeholders. However, adopting such a process may require some changes to traditional project-management
approaches.

Align strategy and planning


Cloud adoption plans start with a well-defined strategy. At a minimum, the strategy should outline the
motivations, business outcomes, and business justifications for cloud adoption. Those positive returns are then
balanced by the effort required to realize them.
The effort starts with the digital estate (proposed or existing), which translates the strategy into more tangible
workloads and assets. You can then map these tangible elements to technical work. From there, skilled people in a
proper organizational structure can execute the technical work. The cloud adoption plan combines all of these
topics into one plan that can be forecasted, budgeted, implemented, and managed by means of agile project-
management practices. This article series helps you build the plan and provides a few templates to make the job
easier.

Transition from sequential to iterative planning


Planning for cloud adoption can be a significant change for some organizations. IT organizations have long
focused on the application of linear or sequential models of project management, like the waterfall model. In
traditional IT, this approach was entirely logical. Most large IT projects started with a procurement request to
acquire expensive hardware resources. Capital expense requests, budget allocations, and equipment acquisition
often represented a large percentage of project execution. And, after it was acquired, the hardware itself became a
constraint on what could be delivered.
The acquisition models of the cloud change the core dependencies that made a sequential model necessary. The
replacement of acquisition cycles with an operating-expense approach helps businesses move more quickly and
with smaller financial commitments. This approach helps teams to engage in projects before all requirements are
well known. It also creates room for a growth mindset, which frees the team to experiment, learn, and deliver
without artificial constraints. For all these reasons and more, we highly recommend that teams use agile or
iterative approaches to cloud adoption planning.

Build your cloud adoption plan


This article series walks through each step of translating strategy and effort into an actionable cloud adoption plan:
1. Prerequisites: Confirm that all prerequisite steps have been completed before you create your plan.
2. Define and prioritize workloads: Prioritize your first 10 workloads to establish an initial adoption backlog.
3. Align assets: Identify which assets (proposed or existing) are required to support the prioritized workloads.
4. Review rationalization: Review rationalization decisions to refine adoption-path decisions: Migrate or
Innovate.
5. Define iterations and releases: Iterations are the time blocks allocated to do work. Releases are the definition
of the work to be done before triggering a change to production processes.
6. Estimate timelines: Establish rough timelines for release planning purposes, based on initial estimates (a minimal estimation sketch follows this list).
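As a minimal sketch of the kind of rough estimate used in step 6, the following snippet projects remaining iterations from an assumed backlog size and team velocity. The effort figures, velocity, and iteration length are hypothetical placeholders.

```python
# Illustrative sketch of rough timeline estimation: divide remaining estimated
# effort by observed team velocity to project how many iterations remain.
# Effort figures, velocity, and iteration length are hypothetical.
import math
from datetime import date, timedelta

remaining_effort_hours = 2_400       # sum of estimates for planned workloads
velocity_hours_per_iteration = 320   # completed hours observed per iteration
iteration_length = timedelta(weeks=2)

iterations_remaining = math.ceil(remaining_effort_hours / velocity_hours_per_iteration)
projected_finish = date.today() + iterations_remaining * iteration_length

print(f"Iterations remaining: {iterations_remaining}")
print(f"Projected finish date: {projected_finish.isoformat()}")
```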

Next steps
Before building your cloud adoption plan, ensure that all necessary prerequisites are in place.
Review prerequisites
Prerequisites for an effective cloud adoption plan

A plan is only as effective as the data that's put into it. For a cloud adoption plan to be effective, there are two
categories of input: strategic and tactical. The following sections outline the minimum data points required in each
category.

Strategic inputs
Accurate strategic inputs ensure that the work being done contributes to achievement of business outcomes. The
strategy section of the Cloud Adoption Framework provides a series of exercises to develop a clear strategy. The
outputs of those exercises feed the cloud adoption plan. Before developing the plan, ensure that the following
items are well defined as a result of those exercises:
Clear motivations: Why are we adopting the cloud?
Defined business outcomes: What results do we expect to see from adopting the cloud?
Business justification: How will the business measure success?
Every member of the team that implements the cloud adoption plan should be able to answer these three strategic
questions. Managers and leaders who are accountable for implementation of the plan should understand the
metrics behind each question and any progress toward realizing those metrics.

Tactical inputs
Accurate tactical inputs ensure that the work can be planned accurately and managed effectively. The plan section
of the Cloud Adoption Framework provides a series of exercises to develop planning artifacts before you develop
your plan. These artifacts provide answers to the following questions:
Digital estate rationalization: What are the top 10 priority workloads in the adoption plan? How many
additional workloads are likely to be in the plan? How many assets are being considered as candidates for
cloud adoption? Are the initial efforts focused more on migration or innovation activities?
Organization alignment: Who will do the technical work in the adoption plan? Who is accountable for
adherence to governance and compliance requirements?
Skills readiness: How many people are allocated to perform the required tasks? How well are their skills
aligned to cloud adoption efforts? Are partners aligned to support the technical implementation?
These questions are essential to the accuracy of the cloud adoption plan. At a minimum, the questions about
digital estate rationalization must be answered to create a plan. To provide accurate timelines, the questions about
organization and skills are also important.

Next steps
After the team is comfortable with the strategic inputs and the inputs for digital estate rationalization, the next step
of workload prioritization can begin.
Prioritize and define workloads
Cloud adoption plan and Azure DevOps

Azure DevOps is the set of cloud-based tools for Azure customers who manage iterative projects. It also includes
tools for managing deployment pipelines and other important aspects of DevOps.
In this article, you'll learn how to quickly deploy a backlog to Azure DevOps by using a cloud adoption plan
template. This template aligns cloud adoption efforts to a standardized process based on the guidance in the Cloud
Adoption Framework.

Create your cloud adoption plan


To deploy the cloud adoption plan, open the Azure DevOps Demo Generator. This tool will deploy the template to
your Azure DevOps tenant. Using the tool requires the following steps:
1. Verify that the Selected Template field is set to Cloud Adoption Plan. If it isn't, select Choose template to
choose the right template.
2. Select your Azure DevOps organization from the Select Organization drop-down list box.
3. Enter a name for your new project. The cloud adoption plan will have this name when it's deployed to your
Azure DevOps tenant.
4. Select Create Project to create a new project in your tenant, based on the plan template. A progress bar shows
your progress toward deploying the project.
5. When deployment is finished, select Navigate to project to see your new project.
After your project has been created, continue through this article series to see how you can modify the template to
align to your cloud adoption plan.
For additional support and guidance on this tool, see Azure DevOps Services Demo Generator.

Bulk edit the cloud adoption plan


When the plan project has been deployed, you can use Microsoft Excel to modify it. It's much easier to create new
workloads or assets in the plan by using Excel than by using the Azure DevOps browser experience.
To prepare your workstation for bulk editing, see Bulk add or modify work items with Excel.
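If your team prefers scripting over Excel, work items can also be created programmatically. The following is a hedged sketch that uses the Azure DevOps work item REST API; the organization name, project name, and personal access token are placeholders, and the endpoint and api-version should be verified against current Azure DevOps documentation.

```python
# Hedged sketch: create a Feature work item for a new workload by using the
# Azure DevOps work item REST API. The organization, project, and PAT values
# are placeholders; verify the endpoint and api-version against current docs.
import requests

ORGANIZATION = "your-org"           # placeholder
PROJECT = "Cloud Adoption Plan"     # placeholder
PAT = "your-personal-access-token"  # placeholder

def create_feature(title: str, tags: str) -> dict:
    url = (
        f"https://dev.azure.com/{ORGANIZATION}/{PROJECT}"
        "/_apis/wit/workitems/$Feature?api-version=7.0"
    )
    patch_document = [
        {"op": "add", "path": "/fields/System.Title", "value": title},
        {"op": "add", "path": "/fields/System.Tags", "value": tags},
    ]
    response = requests.post(
        url,
        json=patch_document,
        headers={"Content-Type": "application/json-patch+json"},
        auth=("", PAT),  # basic auth with an empty username and the PAT
    )
    response.raise_for_status()
    return response.json()

# Example usage: one feature per workload in the Power of 10 list.
# create_feature("Migrate payroll workload", "payroll")
```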

Use the cloud adoption plan


The cloud adoption plan organizes activities by activity type:
Epics: An epic represents an overall phase of the cloud adoption lifecycle.
Features: Features are used to organize specific objectives within each phase. For instance, migration of a
specific workload would be one feature.
User stories: User stories group work into logical collections of activities based on a specific goal.
Tasks: Tasks are the actual work to be done.
At each layer, activities are then sequenced based on dependencies. Activities are linked to articles in the Cloud
Adoption Framework to clarify the objective or task at hand.
The clearest view of the cloud adoption plan comes from the Epics backlog view. For help with changing to the
Epics backlog view, see the article on viewing a backlog. From this view, it's easy to plan and manage the work
required to complete the current phase of the adoption lifecycle.
NOTE
The current state of the cloud adoption plan focuses heavily on migration efforts. Tasks related to governance, innovation, or
operations must be populated manually.

Align the cloud adoption plan


The overview pages for the strategy and planning phases of the cloud adoption lifecycle each reference the Cloud
Adoption Framework strategy and planning template. That template organizes the decisions and data points that
will align the template for the cloud adoption plan with your specific plans for adoption. If you haven't done so
already, you might want to complete the exercises related to strategy and planning before aligning your new
project.
The following articles support alignment of the cloud adoption plan:
Workloads: Align features within the Cloud Migration epic to capture each workload to be migrated or
modernized. Add and modify those features to capture the effort to migrate your top 10 workloads.
Assets: Each asset (VM, application, or data) is represented by the user stories under each workload. Add and
modify those user stories to align with your digital estate.
Rationalization: As each workload is defined, the initial assumptions about that workload can be challenged.
This might result in changes to the tasks under each asset.
Create release plans: Iteration paths establish release plans by aligning efforts with various releases and
iterations.
Establish timelines: Defining start and end dates for each iteration creates a timeline to manage the overall
project.
These five articles help with each of the alignment tasks required to start managing your adoption efforts. The next
step gets you started on the alignment exercise.

Next steps
Start aligning your plan project by defining and prioritizing workloads.
Define and prioritize workloads
Prioritize and define workloads for a cloud adoption
plan

Establishing clear, actionable priorities is one of the secrets to successful cloud adoption. The natural temptation
is to invest time in defining all workloads that could potentially be affected during cloud adoption. But that's
counterproductive, especially early in the adoption process.
Instead, we recommend that your team focus on thoroughly prioritizing and documenting the first 10 workloads.
After implementation of the adoption plan begins, the team can maintain a list of the next 10 highest-priority
workloads. This approach provides enough information to plan for the next few iterations.
Limiting the plan to 10 workloads encourages agility and alignment of priorities as business criteria change. This
approach also makes room for the cloud adoption team to learn and to refine estimates. Most important, it
removes extensive planning as a barrier to effective business change.

What is a workload?
In the context of a cloud adoption, a workload is a collection of IT assets (servers, VMs, applications, data, or
appliances) that collectively support a defined process. Workloads can support more than one process.
Workloads can also depend on other shared assets or larger platforms. However, a workload should have
defined boundaries regarding the dependent assets and the processes that depend upon the workload. Often,
workloads can be visualized by monitoring network traffic among IT assets.
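As an illustrative sketch of that idea, the following snippet treats observed network conversations as edges and takes connected components as candidate workload boundaries. The traffic pairs and asset names are hypothetical.

```python
# Illustrative sketch: infer candidate workload boundaries by treating observed
# network conversations as edges and taking connected components. Traffic data
# and asset names are hypothetical.
from collections import defaultdict

observed_traffic = [
    ("web01", "app01"),
    ("app01", "sql01"),
    ("reports01", "sql02"),
]

adjacency: dict[str, set[str]] = defaultdict(set)
for source, target in observed_traffic:
    adjacency[source].add(target)
    adjacency[target].add(source)

def candidate_workloads(graph: dict[str, set[str]]) -> list[set[str]]:
    """Return connected components; each is a candidate workload boundary."""
    seen: set[str] = set()
    components: list[set[str]] = []
    for node in graph:
        if node in seen:
            continue
        stack, component = [node], set()
        while stack:
            current = stack.pop()
            if current in component:
                continue
            component.add(current)
            stack.extend(graph[current] - component)
        seen |= component
        components.append(component)
    return components

print(candidate_workloads(adjacency))
```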

Prerequisites
The strategic inputs from the prerequisites list make the following tasks much easier to accomplish. For help with
gathering the data discussed in this article, review the prerequisites.

Initial workload prioritization


During the process of incremental rationalization, your team should agree on a "Power of 10" approach, which
consists of 10 priority workloads. These workloads serve as an initial boundary for adoption planning.
If you decide that a digital estate rationalization isn't needed, we recommend that the cloud adoption teams and
the cloud strategy team agree on a list of 10 applications to serve as the initial focus of the migration. We
recommend further that these 10 workloads contain a mixture of simple workloads (fewer than 10 assets in a
self-contained deployment) and more complex workloads. Those 10 workloads will start the workload
prioritization process.

NOTE
The Power of 10 serves as an initial boundary for planning, to focus the energy and investment in early-stage analysis.
However, the act of analyzing and defining workloads is likely to cause changes in the list of priority workloads.

Add workloads to your cloud adoption plan


In the previous article, Cloud adoption plan and Azure DevOps, you created a cloud adoption plan in Azure
DevOps.
You can now represent the workloads in the Power of 10 list in your cloud adoption plan. The easiest way to do
this is via bulk editing in Microsoft Excel. To prepare your workstation for bulk editing, see Bulk add or modify
work items with Excel.
Step 5 in that article tells you to select Input list. Instead, select Query list. Then, from the Select a Query
drop-down list, select the Workload Template query. That query loads all the efforts related to the migration of
a single workload into your spreadsheet.
After the work items for the workload template are loaded, follow these steps to begin adding new workloads:
1. Copy all the items that have the Workload Template tag in the far right column.
2. Paste the copied rows below the last line item in the table.
3. Change the title cell for the new feature from Workload Template to the name of your new workload.
4. Paste the new workload name cell into the tag column for all rows below the new feature. Be careful not to
change the tags or the name of the rows related to the actual Workload Template feature. You will need those
work items when you add the next workload to the cloud adoption plan.
5. Skip to Step 8 in the bulk-editing instructions to publish the worksheet. This step creates all the work items
required to migrate your workload.
Repeat steps 1 through 5 for each of the workloads in the Power of 10 list.
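
If you prefer to script this step instead of bulk editing in Excel, the same kind of work items can be created through the Azure DevOps REST API. The following is a minimal sketch, not part of the official template: it assumes a personal access token in an AZDO_PAT environment variable, placeholder organization and project names, and it creates plain Feature work items rather than cloning the full Workload Template hierarchy.

```python
# Minimal sketch: create one Feature work item per "Power of 10" workload by
# calling the Azure DevOps work item REST API. Organization, project, workload
# names, and the AZDO_PAT environment variable are placeholders.
import base64
import json
import os

import requests

ORGANIZATION = "my-org"          # placeholder organization
PROJECT = "CloudAdoptionPlan"    # placeholder project that hosts the plan
PAT = os.environ["AZDO_PAT"]     # personal access token with work item write scope

auth = base64.b64encode(f":{PAT}".encode()).decode()
headers = {
    "Authorization": f"Basic {auth}",
    "Content-Type": "application/json-patch+json",
}

workloads = ["Payroll", "Customer portal", "Data warehouse"]  # example names only

for workload in workloads:
    # JSON Patch document that sets the fields of the new Feature.
    body = [
        {"op": "add", "path": "/fields/System.Title", "value": workload},
        {"op": "add", "path": "/fields/System.Tags", "value": workload},
    ]
    url = (
        f"https://dev.azure.com/{ORGANIZATION}/{PROJECT}"
        "/_apis/wit/workitems/$Feature?api-version=6.0"
    )
    response = requests.post(url, data=json.dumps(body), headers=headers)
    response.raise_for_status()
    print(f"Created feature {response.json()['id']} for workload '{workload}'")
```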

Define workloads
After initial priorities have been defined and workloads have been added to the plan, each of the workloads can
be defined via deeper qualitative analysis. Before including any workload in the cloud adoption plan, try to
provide the following data points for each workload.
Business inputs
Workload name: What is this workload called?
Workload description: In one sentence, what does this workload do?
Adoption motivations: Which of the cloud adoption motivations are affected by this workload?
Primary sponsor: Of those stakeholders affected, who is the primary sponsor requesting the preceding motivations?
Business unit: Which business unit is responsible for the cost of this workload?
Business processes: Which business processes will be affected by changes to the workload?
Business teams: Which business teams will be affected by changes?
Business stakeholders: Are there any executives whose business will be affected by changes?
Business outcomes: How will the business measure the success of this effort?
Metrics: What metrics will be used to track success?
Compliance: Are there any third-party compliance requirements for this workload?
Application owners: Who is accountable for the business impact of any applications associated with this workload?
Business freeze periods: Are there any times during which the business will not permit change?
Geographies: Are any geographies affected by this workload?

Technical inputs
Adoption approach: Is this adoption a candidate for migration or innovation?
Application ops lead: List the parties responsible for performance and availability of this workload.
SLAs: List any service-level agreements (RTO/RPO requirements).
Criticality: List the current application criticality.
Data classification: List the classification of data sensitivity.
Operating geographies: List any geographies in which the workload is or should be hosted.
Applications: Specify an initial list or count of any applications included in this workload.
VMs: Specify an initial list or count of any VMs or servers included in the workload.
Data sources: Specify an initial list or count of any data sources included in the workload.
Dependencies: List any asset dependencies not included in the workload.
User traffic geographies: List geographies that have a significant collection of user traffic.
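
Some teams also capture these data points in a structured record so the answers can be queried or exported alongside the plan. The following sketch is only illustrative; the field names are assumptions, not a prescribed schema.

```python
# Minimal sketch of a structured record for the workload data points above.
# Field names are illustrative; align them with your own template.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class WorkloadDefinition:
    # Business inputs
    name: str
    description: str
    adoption_motivations: List[str] = field(default_factory=list)
    primary_sponsor: Optional[str] = None
    business_unit: Optional[str] = None
    business_processes: List[str] = field(default_factory=list)
    compliance_requirements: List[str] = field(default_factory=list)
    freeze_periods: List[str] = field(default_factory=list)
    # Technical inputs
    adoption_approach: str = "migrate"  # or "innovate"
    criticality: Optional[str] = None
    data_classification: Optional[str] = None
    applications: List[str] = field(default_factory=list)
    vms: List[str] = field(default_factory=list)
    data_sources: List[str] = field(default_factory=list)
    dependencies: List[str] = field(default_factory=list)

# Example usage with placeholder values.
payroll = WorkloadDefinition(
    name="Payroll",
    description="Calculates and issues employee pay each month.",
    adoption_approach="migrate",
    criticality="High",
)
```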

Confirm priorities
Based on the assembled data, the cloud strategy team and the cloud adoption team should meet to reevaluate
priorities. Clarification of business data points might prompt changes in priorities. Technical complexity or
dependencies might result in changes related to staffing allocations, timelines, or sequencing of technical efforts.
After a review, both teams should be comfortable with confirming the resulting priorities. This set of
documented, validated, and confirmed priorities is the prioritized cloud adoption backlog.

Next steps
For any workload in the prioritized cloud adoption backlog, the team is now ready to align assets.
Align assets for prioritized workloads
Align assets to prioritized workloads
2 minutes to read

Workload is a conceptual description of a collection of assets: VMs, applications, and data sources. The previous
article, Prioritize and define workloads, gave guidance for collecting the data that will define the workload. Before
migration, a few of the technical inputs in that list require additional validation. This article helps with validation of
the following inputs:
Applications: List any applications included in this workload.
VMs and servers: List any VMs or servers included in the workload.
Data sources: List any data sources included in the workload.
Dependencies: List any asset dependencies not included in the workload.
There are several options for assembling this data. The following are a few of the most common approaches.

Alternative inputs: Migrate, Modernize, Innovate


The objective of the preceding data points is to capture relative technical effort and dependencies as an aid to
prioritization. Depending on the transition you want, you may need to gather alternative data points to support
proper prioritization.
Migrate: For pure migration efforts, the existing inventory and asset dependencies serve as a fair measure of
relative complexity.
Modernize: When the goal for a workload is to modernize applications or other assets, these data points are still
solid measures of complexity. However, it might be wise to add an input for modernization opportunities to the
workload documentation.
Innovate: When data or business logic is undergoing material change during a cloud adoption effort, it's
considered an innovate type of transformation. The same is true when you're creating new data or new business
logic. For any innovate scenarios, the migration of assets will likely represent the smallest amount of effort
required. For these scenarios, the team should devise a set of technical data inputs to measure relative complexity.

Azure Migrate
Azure Migrate provides a set of grouping functions that can speed up the aggregation of applications, VMs, data
sources, and dependencies. After workloads have been defined conceptually, they can be used as the basis for
grouping assets based on dependency mapping.
The Azure Migrate documentation provides guidance on how to group machines based on dependencies.

Configuration-management database
Some organizations have a well-maintained configuration-management database (CMDB) within their existing
operations-management tooling. They could alternatively use the CMDB to provide the input data points
discussed earlier.

Next steps
Review rationalization decisions based on asset alignment and workload definitions.
Review rationalization decisions
Review rationalization decisions
4 minutes to read

During initial strategy and planning phases, we suggest you apply an incremental rationalization approach to the
digital estate. But this approach embeds some assumptions into the resulting decisions. We advise the cloud
strategy team and the cloud adoption teams to review those decisions in light of expanded-workload
documentation. This review is also a good time to involve business stakeholders and the executive sponsor in
future state decisions.

IMPORTANT
Further validation of the rationalization decisions will occur during the assessment phase of migration. This validation focuses
on business review of the rationalization to align resources appropriately.

To validate rationalization decisions, use the following questions to facilitate a conversation with the business. The
questions are grouped by the likely rationalization alignment.

Innovation indicators
If the joint review of the following questions results in a "Yes" answer, a workload might be a better candidate for
innovation. Such a workload wouldn't be migrated via a lift and shift or modernize model. Instead, the business
logic or data structures would be re-created as a new or rearchitected application. This approach can be more
labor-intensive and time-consuming. But for a workload that represents significant business returns, the
investment is justified.
Do the applications in this workload create market differentiation?
Is there a proposed or approved investment aimed at improving the experiences associated with the
applications in this workload?
Does the data in this workload make new product or service offerings available?
Is there a proposed or approved investment aimed at taking advantage of the data associated with this
workload?
Can the effect of the market differentiation or new offerings be quantified? If so, does that return justify the
increased cost of innovation during cloud adoption?
The following two questions can help you include high-level technical scenarios in the rationalization review.
Answering "Yes" to either could identify ways of accounting for or reducing the cost associated with innovation.
Will the data structures or business logic change during the course of cloud adoption?
Is an existing deployment pipeline used to deploy this workload to production?
If the answer to either question is "Yes," the team should consider including this workload as an innovation
candidate. At a minimum, the team should flag this workload for architecture review to identify modernization
opportunities.

Migration indicators
Migration is a faster and cheaper way of adopting the cloud. But it doesn't take advantage of opportunities to
innovate. Before you invest in innovation, answer the following questions. They can help you determine if a
migration model is more applicable for a workload.
Is the source code supporting this application stable? Do you expect it to remain stable and unchanged during
the time frame of this release cycle?
Does this workload support production business processes today? Will it do so throughout the course of this
release cycle?
Is it a priority that this cloud adoption effort improves the stability and performance of this workload?
Is cost reduction associated with this workload an objective during this effort?
Is reducing operational complexity for this workload a goal during this effort?
Is innovation limited by the current architecture or IT operation processes?
If the answer to any of these questions is "Yes," you should consider a migration model for this workload. This
recommendation is true even if the workload is a candidate for innovation.
Challenges in operational complexity, costs, performance, or stability can hinder business returns. You can use the
cloud to quickly produce improvements related to those challenges. Where it's applicable, we suggest you use the
migration approach to first stabilize the workload. Then expand on innovation opportunities in the stable, agile
cloud environment. This approach provides short-term returns and reduces the cost required to drive long-term
change.

IMPORTANT
Migration models include incremental modernization. Using platform as a service (PaaS) architectures is a common aspect of
migration activities. So too are minor configuration changes that use those platform services. The boundary for migration is
defined as a material change to the business logic or supporting business structures. Such change is considered an
innovation effort.

Update the project plan


The skills required for a migration effort are different from the skills required for an innovation effort. During
implementation of a cloud adoption plan, we suggest that you assign migration and innovation efforts to different
teams. Each team has its own iteration, release, and planning cadences. Assigning separate teams provides the
process flexibility to maintain one cloud adoption plan while accounting for innovation and migration efforts.
When you manage the cloud adoption plan in Azure DevOps, that management is reflected by changing the
parent work item (or epic) from cloud migration to cloud innovation. This subtle change helps ensure all
participants in the cloud adoption plan can quickly track the required effort and changes to remediation efforts.
This tracking also helps align proper assignments to the relevant cloud adoption team.
For large, complex adoption plans with multiple distinct projects, consider updating the area path. Changing
the area path makes the workload visible only to the team assigned to that area path. This change can make work
easier for the cloud adoption team by reducing the number of visible tasks. But it adds complexity for the project
management processes.

Next steps
Define iterations and releases to begin planning work.
Establish iterations and release plans
4 minutes to read

Agile and other iterative methodologies are built on the concepts of iterations and releases. This article outlines
the assignment of iterations and releases during planning. Those assignments drive timeline visibility to make
conversations easier among members of the cloud strategy team. The assignments also align technical tasks in a
way that the cloud adoption team can manage during implementation.

Establish iterations
In an iterative approach to technical implementation, you plan technical efforts around recurring time blocks.
Iterations tend to be one-week to six-week time blocks. Consensus suggests that two weeks is the average
iteration duration for most cloud adoption teams. But the choice of iteration duration depends on the type of
technical effort, the administrative overhead, and the team's preference.
To begin aligning efforts to a timeline, we suggest that you define a set of iterations that last 6 to 12 months.

Understand velocity
Aligning efforts to iterations and releases requires an understanding of velocity. Velocity is the amount of work
that can be completed in any given iteration. During early planning, velocity is an estimate. After several iterations,
velocity becomes a highly valuable indicator of the commitments that the team can make confidently.
You can measure velocity in abstract terms like story points. You can also measure it in more tangible terms like
hours. For most iterative frameworks, we recommend using abstract measurements to avoid challenges in
precision and perception. Examples in this article represent velocity in hours per sprint. This representation makes
the topic more universally understood.
Example: A five-person cloud adoption team has committed to two-week sprints. Given current obligations like
meetings and support of other processes, each team member can consistently contribute 10 hours per week to
the adoption effort. For this team, the initial velocity estimate is 100 hours per sprint.

Iteration planning
Initially, you plan iterations by evaluating the technical tasks based on the prioritized backlog. Cloud adoption
teams estimate the effort required to complete various tasks. Those tasks are then assigned to the first available
iteration.
During iteration planning, the cloud adoption teams validate and refine estimates. They do so until they have
aligned all available velocity to specific tasks. This process continues for each prioritized workload until all efforts
align to a forecasted iteration.
In this process, the team validates the tasks assigned to the next sprint. The team updates its estimates based on
the team's conversation about each task. The team then adds each estimated task to the next sprint until the
available velocity is met. Finally, the team estimates additional tasks and adds them to the next iteration. The team
performs these steps until the velocity of that iteration is also exhausted.
The preceding process continues until all tasks are assigned to an iteration.
Example: Let's build on the previous example. Assume each workload migration requires 40 tasks. Also assume
you estimate each task to take an average of one hour. The combined estimation is approximately 40 hours per
workload migration. If these estimates remain consistent for all 10 of the prioritized workloads, those workloads
will take 400 hours.
The velocity defined in the previous example suggests that the migration of the first 10 workloads will take four
iterations, which is two months of calendar time. The first iteration will consist of 100 tasks that result in the
migration of two workloads. In the next iteration, a similar collection of 100 tasks will result in the migration of
three workloads.

WARNING
The preceding numbers of tasks and estimates are strictly used as an example. Technical tasks are seldom that consistent.
You shouldn't see this example as a reflection of the amount of time required to migrate a workload.
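
If it's helpful, the arithmetic from the preceding example can be captured in a few lines of code so the forecast updates as velocity or task estimates change. The numbers below are the same illustrative values, not a recommendation.

```python
# Worked version of the arithmetic in the example above. All numbers are
# illustrative, as the preceding warning notes.
import math

team_size = 5
hours_per_person_per_week = 10
sprint_length_weeks = 2
velocity = team_size * hours_per_person_per_week * sprint_length_weeks  # 100 hours per sprint

workloads = 10
tasks_per_workload = 40
hours_per_task = 1
total_hours = workloads * tasks_per_workload * hours_per_task  # 400 hours

iterations_needed = math.ceil(total_hours / velocity)      # 4 iterations
calendar_weeks = iterations_needed * sprint_length_weeks   # 8 weeks, roughly two months

print(f"Velocity: {velocity} hours per sprint")
print(f"Estimated effort: {total_hours} hours")
print(f"Forecast: {iterations_needed} iterations (~{calendar_weeks} weeks)")
```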

Release planning
Within cloud adoption, a release is defined as a collection of deliverables that produce enough business value to
justify the risk of disruption to business processes.
Releasing any workload-related changes into a production environment creates some changes to business
processes. Ideally, these changes are seamless, and the business sees the value of the changes with no significant
disruptions to service. But the risk of business disruption is present with any change and shouldn't be taken lightly.
To ensure a change is justified by its potential return, the cloud strategy team should participate in release
planning. Once tasks are aligned to sprints, the team can determine a rough timeline of when each workload will
be ready for production release. The cloud strategy team would review the timing of each release. The team would
then identify the inflection point between risk and business value.
Example: Continuing the previous example, the cloud strategy team has reviewed the iteration plan. The review
identified two release points. During the second iteration, a total of five workloads will be ready for migration.
Those five workloads will provide significant business value and will trigger the first release. The next release will
come two iterations later, when the next five workloads are ready for release.

Assign iteration paths and tags


For customers who manage cloud adoption plans in Azure DevOps, the previous processes are reflected by
assigning an iteration path to each task and user story. We also recommend tagging each workload with a specific
release. That tagging and assignment feed the automatic population of timeline reports.
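
If you want to script these assignments rather than set them manually, the same fields can be updated through the Azure DevOps REST API. The sketch below is illustrative only: the organization, project, iteration path, work item ID, and AZDO_PAT environment variable are placeholders.

```python
# Minimal sketch: assign an iteration path and a release tag to an existing work
# item through the Azure DevOps REST API. Organization, project, iteration path,
# work item ID, and the AZDO_PAT environment variable are placeholders.
import base64
import json
import os

import requests

ORGANIZATION = "my-org"
PROJECT = "CloudAdoptionPlan"
PAT = os.environ["AZDO_PAT"]

auth = base64.b64encode(f":{PAT}".encode()).decode()
headers = {
    "Authorization": f"Basic {auth}",
    "Content-Type": "application/json-patch+json",
}

work_item_id = 123                          # placeholder work item ID
iteration_path = f"{PROJECT}\\Iteration 3"  # placeholder iteration path
release_tag = "Release 1"                   # placeholder release tag

body = [
    {"op": "add", "path": "/fields/System.IterationPath", "value": iteration_path},
    # Note: this simple form overwrites any existing tags on the work item.
    {"op": "add", "path": "/fields/System.Tags", "value": release_tag},
]
url = (
    f"https://dev.azure.com/{ORGANIZATION}/{PROJECT}"
    f"/_apis/wit/workitems/{work_item_id}?api-version=6.0"
)
response = requests.patch(url, data=json.dumps(body), headers=headers)
response.raise_for_status()
```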

Next steps
Estimate timelines to properly communicate expectations.
Estimate timelines
Timelines in a cloud adoption plan
2 minutes to read

In the previous article in this series, workloads and tasks were assigned to releases and iterations. Those
assignments feed the timeline estimates in this article.
Work breakdown structures (WBS) are commonly used in sequential project-management tools. They represent
how dependent tasks will be completed over time. Such structures work well when tasks are sequential in nature.
The interdependencies in tasks found in cloud adoption make such structures difficult to manage. To fill this gap,
you can estimate timelines based on iteration-path assignments by hiding complexity.

Estimate timelines
To develop a timeline, start with releases. Those release objectives create a target date for any business impact.
Iterations aid in aligning those releases with specific time durations.
If more granular milestones are required in the timeline, use iteration assignment to indicate milestones. To do this
assignment, assume that the last instance of a workload-related task can serve as the final milestone. Teams also
commonly tag the final task as a milestone.
For any level of granularity, use the last day of the iteration as the date for each milestone. This ties completion of
workload adoption to a specific date. You can track the date in a spreadsheet or a sequential project-management
tool like Microsoft Project.
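
If you track the plan outside a project-management tool, the milestone dates can also be derived programmatically from the iteration assignments. The following sketch assumes a fixed iteration length and a placeholder start date; adjust both to your own cadence.

```python
# Minimal sketch: derive a milestone date for each workload from the end date
# of its final iteration. Iteration start date, length, and workload names are
# placeholders.
from datetime import date, timedelta

iteration_length = timedelta(weeks=2)
first_iteration_start = date(2020, 1, 6)  # placeholder start of Iteration 1

# Final iteration assigned to each workload (taken from the adoption plan).
final_iteration = {"Payroll": 2, "Customer portal": 4}

for workload, iteration in final_iteration.items():
    # Use the last day of the assigned iteration as the milestone date.
    milestone = first_iteration_start + iteration * iteration_length - timedelta(days=1)
    print(f"{workload}: milestone on {milestone.isoformat()}")
```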

Delivery plans in Azure DevOps


If you're using Azure DevOps to manage your cloud adoption plan, consider using the Microsoft Delivery Plans
extension. This extension can quickly create a visual representation of the timeline that is based on iteration and
release assignments.
Getting started on a skills readiness path
2 minutes to read

IT staff members might feel anxious about their roles and positions as they realize a different set of skills is needed
to support cloud solutions. Agile employees who explore and learn new cloud technologies don't need to have that
fear. They can lead the adoption of cloud services by helping the organization understand and embrace the
associated changes.

Figure 1 - Mapping of skills to IT roles in a cloud-hosted environment.


The Cloud Adoption Framework guides readers through the full adoption lifecycle. Throughout this framework,
readers are provided opportunities to build necessary skills. To help you get started on this journey, skills-
readiness articles are included in the following outline for easier access. Each of the following links maps to the
skills required to be successful in each of those adoption phases.
Strategy: Develop the skills needed to prepare an actionable migration plan. This includes business
justification and other required business-planning skills.
Plan: Develop the skills needed to prepare an actionable migration plan. This includes business justification
and other required business-planning skills.
Ready: Develop the skills needed to prepare the business, culture, people, and environment for coming
changes.
Adopt: Adoption skills are aligned to various technical efforts:
Migrate: Gain the skills required to implement the cloud migration plan.
Innovate: Gain the skills needed to deliver innovative new solutions.
Operate: Skills related to the operating model for cloud adoption are aligned to various opportunities to
gain skills:
Govern: Gain the skills needed to govern the cloud environment.
Manage: Gain the skills needed to manage a cloud environment.
Each of the previous learning paths shares opportunities across multiple media types to maximize knowledge
acquisition.
Microsoft Learn
Microsoft Learn is a new approach to learning. Readiness for the new skills and responsibilities that come with
cloud adoption doesn't come easily. Microsoft Learn provides a more rewarding approach to hands-on learning
that helps you achieve your goals faster. Earn points and levels, and achieve more. Here are a couple of examples of
tailored learning paths on Microsoft Learn that align to the Plan section of the Cloud Adoption Framework:
Evolve your DevOps practices: DevOps is the union of people, process, and products to enable continuous delivery
of value to your end users. Azure DevOps is a set of services that gives you the tools you need to do just that. With
Azure DevOps, you can build, test, and deploy any application, either to the cloud or on premises.
Azure for the Data Engineer: Explore how the world of data has evolved and how the advent of cloud technologies
is providing new opportunities for businesses to explore. You will learn about the various data platform technologies
that are available, and how a data engineer can use this technology to an organization's benefit.

Learn more
To discover additional learning paths, browse the Microsoft Learn catalog. Use the Roles filter to align learning
paths with your role.
Adapt existing roles, skills, and processes for the
cloud
3 minutes to read

At each phase of the IT industry's history, the most notable changes have often been marked by changes in staff
roles. One example is the transition from mainframe computing to client/server computing. The role of the
computer operator during this transition has largely disappeared, replaced by the system administrator role. When
virtualization arrived, the requirement for individuals working with physical servers was replaced with a need for
virtualization specialists.
Roles will likely change as institutions similarly shift to cloud computing. For example, datacenter specialists might
be replaced with cloud administrators or cloud architects. In some cases, though IT job titles haven't changed, the
daily work of these roles has changed significantly.
IT staff members might feel anxious about their roles and positions because they realize that they need a different
set of skills to support cloud solutions. But agile employees who explore and learn new cloud technologies
shouldn't fear. They can lead the adoption of cloud services and help the organization learn and embrace the
associated changes.
For guidance on building a new skill set, see the Skills readiness path.

Capture concerns
As the organization prepares for a cloud adoption effort, each team should document staff concerns as they arise
by identifying:
The type of concern. For example, workers might be resistant to the changes in job duties that come with the
adoption effort.
The impact if the concern isn't addressed. For example, resistance to adoption might result in workers being
slow to execute the required changes.
The area equipped to address the concern. For example, if workers in the IT department are reluctant to acquire
new skills, the IT stakeholder's area is best equipped to address this concern. Identifying the area might not be clear
for some concerns. In these cases, you might need to escalate to executive leadership.
IT staff members commonly have concerns about acquiring the training needed to support expanded functions
and new duties. Learning the training preferences of the team helps you prepare a plan. It also allows you to
address these concerns.

Identify gaps
Identifying gaps is another important aspect of organization readiness. A gap is a role, skill, or process that is
required for your digital transformation but doesn't currently exist in your enterprise.
1. Enumerate the responsibilities that come with the digital transformation. Emphasize new responsibilities and
existing responsibilities to be retired.
2. Identify the area that aligns with each responsibility. For each new responsibility, check how closely it aligns
with the area. Some responsibilities might span several areas. This crossover represents an opportunity for
better alignment that you should document as a concern. In the case where no area is identified as being
responsible, document this gap.
3. Identify the skills necessary to support each responsibility, and check if your enterprise has existing resources
with those skills. Where there are no existing resources, determine the training programs or talent acquisition
necessary to fill the gaps. Also determine the deadline by which you must support each responsibility to keep
your digital transformation on schedule.
4. Identify the roles that will execute these skills. Some of your existing workforce will assume parts of the roles.
In other cases, entirely new roles might be necessary.

Partner across teams


The skills necessary to fill the gaps in your organization's digital transformation are typically not confined to a
single role or even a single department. Skills will have relationships and dependencies that can span a single role
or multiple roles. Those roles might exist in several departments. For example, a workload owner might require
someone in an IT role to provision core resources like subscriptions and resource groups.
These dependencies represent new processes that your organization implements to manage the workflow among
roles. The preceding example shows several types of processes that support the relationship between the
workload owner and the IT role. For instance, you can create a workflow tool to manage the process or use an
email template.
Track these dependencies and make note of the processes that will support them. Also note whether the processes
currently exist. For processes that require tooling, ensure that the timeline for deploying any tools aligns with the
overall digital-transformation schedule.

Next steps
Ensuring proper support for the translated roles is a team effort. To act on this guidance, review the organizational
readiness introduction to identify the right team structures and participants.
Identify the right team structures
Before adoption can begin, you must create a landing zone to host the workloads that you plan to build in the cloud or migrate
to the cloud. This section of the framework guides you through the creation of a landing zone.

Landing zone exercises


The following exercises help guide you through the process of creating a landing zone to support cloud adoption.

Azure setup guide


Review the Azure setup guide to become familiar with the tools and approaches you need to use to create a landing zone.

First landing zone


Evaluate the Cloud Adoption Framework migrate landing zone blueprint. Use this blueprint to create your first migration-
ready landing zone for quick experimentation and learning.

Expand the blueprint


Use the landing zone considerations to identify and make any necessary modifications to the blueprint template.

Best practices
Validate landing zone modifications against the best practices sections to ensure the proper configuration of your current
and future landing zones.

Next steps
To get ready for cloud adoption, review the Azure setup guide.
Azure setup guide
2 minutes to read

Azure setup guide: Before you start


NOTE
This guide provides a starting point for readiness guidance in the Cloud Adoption Framework and is also available in Azure
Quickstart Center. See the tip in the article for a link.

Before you start


Before you start building and deploying solutions using Azure services, you need to prepare your environment.
In this guide, we introduce features that help you organize resources, control costs, and secure and manage your
organization. For more information, best practices, and considerations related to preparing your cloud
environment, see the Cloud Adoption Framework's readiness section.
You'll learn how to:
Organize resources: Set up a management hierarchy to consistently apply access control, policy, and
compliance to groups of resources and use tagging to track related resources.
Manage access: Use role-based access control to make sure that users have only the permissions they really
need.
Manage costs and billing: Identify your subscription type, understand how billing works, and see how you
can control costs.
Plan for governance, security, and compliance: Enforce and automate policies and security settings that
help you follow applicable legal requirements.
Use monitoring and reporting: Get visibility across resources to help find and fix problems, optimize
performance, or get insight to customer behavior.
Stay current with Azure: Track product updates so you can take a proactive approach to change
management.

TIP
For an interactive experience, view this guide in the Azure portal. Go to the Azure Quickstart Center in the Azure portal,
select Introduction to Azure Setup, and then follow the step-by-step instructions.

Next steps: Organize your resources to simplify how you apply settings
This guide provides interactive steps that let you try features as they're introduced. To come back to where you
left off, use the breadcrumb for navigation.
Organize your Azure resources
6 minutes to read

Organizing your cloud-based resources is critical to securing, managing, and tracking the costs related to your
workloads. To organize your resources, use the management hierarchies within the Azure platform, implement
well-thought-out naming conventions, and apply resource tagging.
Azure management groups and hierarchy
Naming standards
Resource tags
Azure provides four levels of management scope: management groups, subscriptions, resource groups, and
resources. The following image shows the relationship of these levels.

Management groups: These groups are containers that help you manage access, policy, and compliance for
multiple subscriptions. All subscriptions in a management group automatically inherit the conditions applied to
the management group.
Subscriptions: A subscription groups together user accounts and the resources that were created by those
user accounts. Each subscription has limits or quotas on the amount of resources you can create and use.
Organizations can use subscriptions to manage costs and the resources that are created by users, teams, or
projects.
Resource groups: A resource group is a logical container into which Azure resources like web apps, databases,
and storage accounts are deployed and managed.
Resources: Resources are instances of services that you create, like virtual machines, storage, or SQL
databases.

Scope of management settings


You can apply management settings, like policies and role-based access controls, at any of the management levels.
The level you select determines how widely the setting is applied. Lower levels inherit settings from higher levels.
For example, when you apply a policy to a subscription, that policy is also applied to all resource groups and
resources in that subscription.
Usually, it makes sense to apply critical settings at higher levels and project-specific requirements at lower levels.
For example, you might want to make sure all resources for your organization are deployed to certain regions. To
do that, apply a policy to the subscription that specifies the allowed locations. As other users in your organization
add new resource groups and resources, the allowed locations are automatically enforced. Learn more about
policies in the governance, security, and compliance section of this guide.
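
As an illustration of the allowed-locations example above, a policy assignment can also be created programmatically. The following sketch is an assumption-laden example, not part of this guide's official steps: the subscription ID, the GUID of the built-in "Allowed locations" definition, and the location list are placeholders you need to look up and adjust for your environment.

```python
# Minimal sketch: assign the built-in "Allowed locations" policy at subscription
# scope through the Azure Resource Manager REST API. Subscription ID, definition
# GUID, and locations are placeholders; look up the real definition ID first.
import requests
from azure.identity import DefaultAzureCredential

subscription_id = "<subscription-id>"
scope = f"/subscriptions/{subscription_id}"
definition_id = "/providers/Microsoft.Authorization/policyDefinitions/<allowed-locations-guid>"

token = DefaultAzureCredential().get_token("https://management.azure.com/.default").token

url = (
    f"https://management.azure.com{scope}/providers/Microsoft.Authorization"
    "/policyAssignments/allowed-locations?api-version=2021-06-01"
)
body = {
    "properties": {
        "displayName": "Allowed locations",
        "policyDefinitionId": definition_id,
        # Parameter name used by the built-in "Allowed locations" definition.
        "parameters": {"listOfAllowedLocations": {"value": ["eastus", "westus2"]}},
    }
}
response = requests.put(url, json=body, headers={"Authorization": f"Bearer {token}"})
response.raise_for_status()
```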
If you have only a few subscriptions, it's relatively simple to manage them independently. If the number of
subscriptions you use increases, consider creating a management group hierarchy to simplify the management of
your subscriptions and resources. For more information on how to manage multiple subscriptions, see scaling with
multiple Azure subscriptions.
As you plan your compliance strategy, work with people in your organization with these roles: security and
compliance, IT administration, enterprise architect, networking, finance, and procurement.

Create a management level


You can create a management group, additional subscriptions, or resource groups.
Create a management group
Create a management group to help you manage access, policy, and compliance for multiple subscriptions.
1. Go to Management groups.
2. Select Add management group.
Create a subscription
Use subscriptions to manage costs and resources that are created by users, teams, or projects.
1. Go to Subscriptions.
2. Select Add.
Create a resource group
Create a resource group to hold resources like web apps, databases, and storage accounts that share the same
lifecycle, permissions, and policies.
1. Go to Resource groups.
2. Select Add.
3. Select the Subscription that you want your resource group created under.
4. Enter a name for the Resource group.
5. Select a Region for the resource group location.
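
The preceding steps use the portal. If you later automate environment setup, the same resource group can be created with the Azure SDK for Python; the sketch below assumes the azure-identity and azure-mgmt-resource packages and uses placeholder names, region, and tags.

```python
# Minimal sketch: create a resource group with the Azure SDK for Python.
# The subscription ID, resource group name, region, and tags are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.resource import ResourceManagementClient

credential = DefaultAzureCredential()
client = ResourceManagementClient(credential, "<subscription-id>")

resource_group = client.resource_groups.create_or_update(
    "rg-migration-prod",
    {
        "location": "eastus",
        # Tags help track related resources, as described earlier in this guide.
        "tags": {"workload": "payroll", "environment": "prod"},
    },
)
print(resource_group.id)
```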

Learn more
To learn more, see:
Azure fundamentals
Scaling with multiple Azure subscriptions
Understand resource access management in Azure
Organize your resources with Azure management groups
Subscription service limits

Manage access to your Azure environment with role-
based access controls
2 minutes to read

Managing who can access your Azure resources and subscriptions is an important part of your Azure governance
strategy, and assigning group-based access rights and privileges is a good practice. Dealing with groups rather
than individual users simplifies maintenance of access policies, provides consistent access management across
teams, and reduces configuration errors. Azure role-based access control (RBAC) is the primary method of
managing access in Azure.
RBAC provides detailed access management of resources in Azure. It helps you manage who has access to Azure
resources, what they can do with those resources, and what scopes they can access.
When you plan your access control strategy, grant users the least privilege required to get their work done. The
following image shows a suggested pattern for assigning RBAC.

When you plan your access control methodology, we recommend that you work with people in your organizations
with the following roles: security and compliance, IT administration, and enterprise architect.
The Cloud Adoption Framework offers additional guidance on how to use role-based access control as part of your
cloud adoption efforts.


Grant resource group access


To grant a user access to a resource group:
1. Go to Resource groups.
2. Select a resource group.
3. Select Access control (IAM).
4. Select +Add > Add role assignment.
5. Select a role, and then assign access to a user, group, or service principal.

Grant subscription access


To grant a user access to a subscription:
1. Go to Subscriptions.
2. Select a subscription.
3. Select Access control (IAM).
4. Select +Add > Add role assignment.
5. Select a role, and then assign access to a user, group, or service principal.
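
The portal steps above can also be automated. The following sketch creates a role assignment through the Azure Resource Manager REST API; the subscription, resource group, principal object ID, and role definition GUID are placeholders, and the caller must already hold permission to create role assignments at that scope.

```python
# Minimal sketch: grant a group a built-in role on a resource group by creating
# a role assignment through the REST API. All IDs below are placeholders.
import uuid

import requests
from azure.identity import DefaultAzureCredential

subscription_id = "<subscription-id>"
scope = f"/subscriptions/{subscription_id}/resourceGroups/rg-migration-prod"
role_definition_id = (
    f"/subscriptions/{subscription_id}/providers/Microsoft.Authorization"
    "/roleDefinitions/<role-definition-guid>"
)
principal_id = "<azure-ad-group-object-id>"

token = DefaultAzureCredential().get_token("https://management.azure.com/.default").token

assignment_name = str(uuid.uuid4())  # role assignment names are GUIDs
url = (
    f"https://management.azure.com{scope}/providers/Microsoft.Authorization"
    f"/roleAssignments/{assignment_name}?api-version=2022-04-01"
)
body = {
    "properties": {
        "roleDefinitionId": role_definition_id,
        "principalId": principal_id,
    }
}
response = requests.put(url, json=body, headers={"Authorization": f"Bearer {token}"})
response.raise_for_status()
```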

Learn more
To learn more, see:
What is role-based access control (RBAC)?
Cloud Adoption Framework: Use role-based access control
Manage costs and billing for your Azure resources
2 minutes to read

Cost management is the process of effectively planning and controlling costs involved in your business. Cost
management tasks are typically performed by finance, management, and app teams. Azure Cost Management can
help you plan with cost in mind. It can also help you to analyze costs effectively and take action to optimize cloud
spending.
For more information on how to integrate cloud cost management processes throughout your organization, see
the Cloud Adoption Framework article on how to track costs across business units, environments, or projects.

Manage your costs with Azure Cost Management


Azure Cost Management provides a few ways to help you predict and manage costs:
Analyze cloud costs helps you explore and analyze your costs. You can view aggregated cost for your account
or view accumulated costs over time.
Monitor with budgets allows you to create a budget and then configure alerts to warn you when you're close
to exceeding it.
Optimize with recommendations helps identify idle and underused resources so you can take action to
reduce waste.
Manage invoices and payments gives you visibility to your cloud investment.
Predict and manage costs
1. Go to Cost Management + Billing.
2. Select Cost Management.
3. Explore the features that help to analyze and optimize cloud costs.
Manage invoices and payment methods
1. Go to Cost Management + Billing.
2. Select Invoices or Payment methods from the Billing section in the left pane.

Billing and subscription support


We offer 24-hour access every day for billing and subscription support to Azure customers. If you need help
understanding your Azure usage, create a support request.
Create a support request
To submit a new support request:
1. Go to Help + Support.
2. Select New support request.
View a support request
To view your support requests and their status:
1. Go to Help + Support.
2. Select All support requests.

Learn more
To learn more, see:
Azure billing and cost management documentation
Cloud Adoption Framework: Track costs across business units, environments, or projects
Cloud Adoption Framework: Cost management governance discipline

Governance, security, and compliance in Azure
3 minutes to read

As you establish corporate policy and plan your governance strategies, you can use tools and services like Azure
Policy, Azure Blueprints, and Azure Security Center to enforce and automate your organization's governance
decisions. Before you start your governance planning, use the Governance Benchmark tool to identify potential
gaps in your organization's cloud governance approach. For more information on how to develop governance
processes, see the Cloud Adoption Framework for Azure's governance guidance.
Azure Blueprints
Azure Policy
Azure Security Center
Azure Blueprints enables cloud architects and central information technology groups to define a repeatable set of
Azure resources that implements and adheres to an organization's standards, patterns, and requirements. Azure
Blueprints makes it possible for development teams to rapidly build and stand up new environments and trust that
they're building within organizational compliance, using a set of built-in components, such as networking, to
speed up development and delivery.
Blueprints are a declarative way to orchestrate the deployment of various resource templates and other artifacts
like:
Role assignments.
Policy assignments.
Azure Resource Manager templates.
Resource groups.

Create a blueprint
To create a blueprint:
1. Go to Blueprints - Getting started.
2. In the Create a Blueprint section, select Create.
3. Filter the list of blueprints to select the appropriate blueprint.
4. Enter the Blueprint name, and select the appropriate Definition location.
5. Click Next : Artifacts >> and review the artifacts included in the blueprint.
6. Click Save Draft.

Publish a blueprint
To publish a blueprint's artifacts to your subscription:
1. Go to Blueprints - Blueprint definitions.
2. Select the blueprint you created in the previous steps.
3. Review the blueprint definition and select Publish blueprint.
4. Provide a Version (such as 1.0) and any Change notes, then select Publish.

Learn more
To learn more, see:
Azure Blueprints
Cloud Adoption Framework: Resource consistency decision guide
Standards-based blueprints samples
Monitoring and reporting in Azure
4 minutes to read

Azure offers many services that together provide a comprehensive solution for collecting, analyzing, and acting on
telemetry from your applications and the Azure resources that support them. In addition, these services can extend
to monitoring critical on-premises resources to provide a hybrid monitoring environment.
Azure Monitor
Azure Service Health
Azure Advisor
Azure Security Center
Azure Monitor provides a single unified hub for all monitoring and diagnostics data in Azure. You can use it to get
visibility across your resources. With Azure Monitor, you can find and fix problems and optimize performance. You
also can understand customer behavior.
Monitor and visualize metrics. Metrics are numerical values available from Azure resources that help you
understand the health of your systems. Customize charts for your dashboards, and use workbooks for
reporting.
Query and analyze logs. Logs include activity logs and diagnostic logs from Azure. Collect additional logs
from other monitoring and management solutions for your cloud or on-premises resources. Log Analytics
provides a central repository to aggregate all this data. From there, you can run queries to help troubleshoot
issues or to visualize data.
Set up alerts and actions. Alerts proactively notify you of critical conditions. Corrective actions can be
taken based on triggers from metrics, logs, or service health issues. You can set up different notifications and
actions and send data to your IT service management tools.
Start monitoring your:
Applications
Containers
Virtual machines
Networks
To monitor other resources, find additional solutions in the Azure Marketplace.
To explore Azure Monitor, go to the Azure portal.
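
Log data collected by Azure Monitor can also be queried programmatically. The sketch below assumes the azure-monitor-query and azure-identity Python packages and a placeholder Log Analytics workspace ID; the Kusto query is only an example.

```python
# Minimal sketch: run a Log Analytics query against a workspace with the
# azure-monitor-query package. The workspace ID and the query are placeholders.
from datetime import timedelta

from azure.identity import DefaultAzureCredential
from azure.monitor.query import LogsQueryClient

client = LogsQueryClient(DefaultAzureCredential())

response = client.query_workspace(
    workspace_id="<log-analytics-workspace-id>",
    query="Heartbeat | summarize heartbeats = count() by Computer",
    timespan=timedelta(hours=24),
)

# Print every row of every returned table.
for table in response.tables:
    for row in table.rows:
        print(row)
```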

Learn more
To learn more, see Azure Monitor documentation.

Stay current with Microsoft Azure
2 minutes to read

Cloud platforms like Microsoft Azure change faster than many organizations are accustomed to. This pace of
change means that organizations have to adapt people and processes to a new cadence. If you're responsible for
helping your organization keep up with change, you might feel overwhelmed at times. The resources listed in this
section can help you stay up to date.
Top resources
Additional resources
The following resources can help you stay current with Azure:
Azure Service Health
Service Health and alerts provide timely notifications about ongoing service issues, planned
maintenance, and health advisories. This resource also includes information about features being
removed from Azure.
Azure Updates
Subscribe to Azure Updates to receive announcements about product updates. Brief summaries link to
further details, which makes the updates easy to follow.
Subscribe via RSS.
Azure Blog
The Azure Blog communicates the most important announcements for the Azure platform. Follow this
blog to stay up to date on critical information.
Subscribe via RSS.
Service-specific blogs
Individual Azure services publish blogs that you might want to follow if you rely on those services.
Many Azure service blogs are available. Find the ones you're interested in through a web search.
Azure Info Hub
This site is an unofficial resource that pulls together most of the resources listed here. Follow links to
individual services to get detailed information and find service-specific blogs.
Subscribe via RSS.
Deploy a migration landing zone
4 minutes to read

A migration landing zone is an environment that has been provisioned and prepared to host
workloads being migrated from an on-premises environment into Azure. A migration landing zone is the final
deliverable of the Azure setup guide. This article ties together all of the readiness subjects discussed in this guide
and applies the decisions made to the deployment of your first migration landing zone.
The following sections outline a landing zone commonly used to establish an environment that's suitable for use
during a migration. The environment or landing zone described in this article is also captured in an Azure
blueprint. You can use the Cloud Adoption Framework migrate landing zone blueprint to deploy the defined
environment with a single click.

Purpose of the blueprint


The Cloud Adoption Framework migrate landing zone blueprint creates a landing zone. That landing zone is
intentionally limited. It's designed to create a consistent starting point that provides room to learn infrastructure as
code. For some migration efforts, this landing zone might be sufficient to meet your needs. It's also likely that you
will need to change something in the blueprint to meet your unique constraints.

Blueprint alignment
The following image shows the Cloud Adoption Framework migrate landing zone blueprint in relation to
architectural complexity and compliance requirements.

The letter A sits inside of a curved line that marks the scope of this blueprint. That scope is meant to convey
that this blueprint covers limited architectural complexity but is built on relatively mid-line compliance
requirements.
Customers who have a high degree of complexity and stringent compliance requirements might be better
served by using a partner's extended blueprint or one of the standards-based blueprint samples.
Most customers' needs will fall somewhere between these two extremes. The letter B represents the process
outlined in the landing zone considerations articles. For customers in this space, you can use the decision
guides found in those articles to identify nodes to be added to the Cloud Adoption Framework migrate landing
zone blueprint. This approach allows you to customize the blueprint to fit your needs.

Use this blueprint


Before you use the Cloud Adoption Framework migration landing zone blueprint, review the following
assumptions, decisions, and implementation guidance.

Assumptions
The following assumptions or constraints were used when this initial landing zone was defined. If these
assumptions align with your constraints, you can use the blueprint to create your first landing zone. The blueprint
also can be extended to create a landing zone blueprint that meets your unique constraints.
Subscription limits: This adoption effort isn't expected to exceed subscription limits. Two common indicators
are an excess of 25,000 VMs or 10,000 vCPUs.
Compliance: No third-party compliance requirements are needed in this landing zone.
Architectural complexity: Architectural complexity doesn't require additional production subscriptions.
Shared services: There are no existing shared services in Azure that require this subscription to be treated like
a spoke in a hub and spoke architecture.
If these assumptions seem aligned with your current environment, then this blueprint might be a good place to
start building your landing zone.

Decisions
The following decisions are represented in the landing zone blueprint.

Migration tools: Azure Site Recovery will be deployed and an Azure Migrate project will be created. (Alternative approaches: Migration tools decision guide.)
Logging and monitoring: An Operational Insights workspace and a diagnostic storage account will be provisioned.
Network: A virtual network will be created with subnets for gateway, firewall, jumpbox, and landing zone. (Alternative approaches: Networking decisions.)
Identity: It's assumed that the subscription is already associated with an Azure Active Directory instance. (Alternative approaches: Identity management best practices.)
Policy: This blueprint currently assumes that no Azure policies are to be applied.
Subscription design: N/A - designed for a single production subscription. (Alternative approaches: Scaling subscriptions.)
Management groups: N/A - designed for a single production subscription. (Alternative approaches: Scaling subscriptions.)
Resource groups: N/A - designed for a single production subscription. (Alternative approaches: Scaling subscriptions.)
Data: N/A. (Alternative approaches: Choose the correct SQL Server option in Azure and Azure Data Store guidance.)
Storage: N/A. (Alternative approaches: Azure Storage guidance.)
Naming and tagging standards: N/A. (Alternative approaches: Naming and tagging best practices.)
Cost management: N/A. (Alternative approaches: Tracking costs.)
Compute: N/A. (Alternative approaches: Compute options.)
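
To illustrate the network decision above, the following sketch shows how a similar virtual network and subnets could be created with the Azure SDK for Python (azure-mgmt-network). The names and address ranges are placeholders and are not the blueprint's actual values; the blueprint itself deploys these resources for you.

```python
# Minimal sketch: create a virtual network with subnets similar to the network
# decision above. Subscription ID, names, and address ranges are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.network import NetworkManagementClient

client = NetworkManagementClient(DefaultAzureCredential(), "<subscription-id>")

poller = client.virtual_networks.begin_create_or_update(
    "rg-migration-prod",
    "vnet-landing-zone",
    {
        "location": "eastus",
        "address_space": {"address_prefixes": ["10.0.0.0/16"]},
        "subnets": [
            {"name": "GatewaySubnet", "address_prefix": "10.0.0.0/27"},
            {"name": "AzureFirewallSubnet", "address_prefix": "10.0.1.0/26"},
            {"name": "jumpbox", "address_prefix": "10.0.2.0/27"},
            {"name": "landing-zone", "address_prefix": "10.0.3.0/24"},
        ],
    },
)
vnet = poller.result()  # wait for the long-running operation to finish
print(vnet.id)
```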

Customize or deploy a landing zone from this blueprint


Learn more and download a reference sample of the Cloud Adoption Framework migrate landing zone blueprint
for deployment or customization from Azure Blueprints samples.
The blueprint samples are also available within the portal. For details of how to create a blueprint, see Azure
Blueprints.
For guidance on customization that should be made to this blueprint or the resulting landing zone, see the landing
zone considerations articles.

Next steps
After a migration landing zone is deployed, you're ready to migrate workloads to Azure. For guidance on the tools
and processes that are required to migrate your first workload, see the Azure migration guide.
Migrate your first workload with the Azure migration guide
Landing zone considerations
2 minutes to read

A landing zone is the basic building block of any cloud adoption environment. The term landing zone refers to an
environment that's been provisioned and prepared to host workloads in a cloud environment like Azure. A fully
functioning landing zone is the final deliverable of any iteration of the Cloud Adoption Framework's Ready
methodology.

This image shows the major considerations for implementing any landing zone deployment. The considerations
can be broken into three categories: hosting, Azure fundamentals, and governance.

Hosting considerations
All landing zones provide structure for hosting options. The structure is created explicitly through governance
controls or organically through the adoption of services within the landing zone. The following articles can help
you make decisions that will be reflected in the blueprint or other automation scripts that create your landing
zone:
Compute decisions. To minimize operational complexity, align compute options with the purpose of the
landing zone. This decision can be enforced by using automation toolchains, like Azure Policy initiatives and
landing zone blueprints.
Storage decisions. Choose the right Azure Storage solution to support your workload requirements.
Networking decisions. Choose the networking services, tools, and architectures to support your
organization's workload, governance, and connectivity requirements.
Database decisions. Determine which database technology is best suited for your workload requirements.

Azure fundamentals
Each landing zone is part of a broader solution for organizing resources across a cloud environment. Azure
fundamentals are the foundational building blocks for organization.
Azure fundamental concepts. Learn fundamental concepts and terms that are used to organize resources in
Azure, and how the concepts relate to one another.
Resource consistency decision guide. When you understand each of the fundamentals, the resource
organization decision guide can help you make decisions that shape the landing zone.

Governance considerations
The Cloud Adoption Framework's Govern methodologies establish a process for governing the environment as a
whole. However, there are many use cases that might require you to make governance decisions on a per-landing
zone basis. In many scenarios, governance baselines are enforced on a per-landing zone basis, even though the
baselines are established holistically. It's true for the first few landing zones that an organization deploys.
The following articles can help you make governance-related decisions about your landing zone. You can factor
each decision into your governance baselines.
Cost requirements. Based on an organization's motivation for cloud adoption and operational commitments
made about its environment, various cost management configurations might need to be changed for the
landing zone.
Monitoring decisions. Depending on the operational requirements for a landing zone, various monitoring
tools can be deployed. The monitoring decisions article can help you determine the most appropriate tools to
deploy.
Using role-based access control. Azure role-based access control (RBAC ) offers fine-grained, group-based
access management for resources that are organized around user roles.
Policy decisions. Azure Blueprints samples provide premade compliance blueprints, each with predefined
policy initiatives. Policy decisions help inform a selection of the best blueprint or policy initiative based on your
requirements and constraints.
Create hybrid cloud consistency. Create hybrid cloud solutions that give your organization the benefits of
cloud innovation while maintaining many of the conveniences of on-premises management.
Azure fundamental concepts

Learn fundamental concepts and terms that are used in Azure, and how the concepts relate to one another.

Azure terminology
It's helpful to know the following definitions as you begin your Azure cloud adoption efforts:
Resource: An entity that's managed by Azure. Examples include Azure virtual machines, virtual networks, and
storage accounts.
Subscription: A logical container for your resources. Each Azure resource is associated with only one
subscription. Creating a subscription is the first step in adopting Azure.
Azure account: The email address that you provide when you create an Azure subscription is the Azure
account for the subscription. The party that's associated with the email account is responsible for the monthly
costs that are incurred by the resources in the subscription. When you create an Azure account, you provide
contact information and billing details, like a credit card. You can use the same Azure account (email address)
for multiple subscriptions. Each subscription is associated with only one Azure account.
Account administrator: The party associated with the email address that's used to create an Azure
subscription. The account administrator is responsible for paying for all costs that are incurred by the
subscription's resources.
Azure Active Directory (Azure AD ): The Microsoft cloud-based identity and access management service.
Azure AD allows your employees to sign in and access resources.
Azure AD tenant: A dedicated and trusted instance of Azure AD. An Azure AD tenant is automatically created
when your organization first signs up for a Microsoft cloud service subscription like Microsoft Azure, Microsoft
Intune, or Office 365. An Azure tenant represents a single organization.
Azure AD directory: Each Azure AD tenant has a single, dedicated, and trusted directory. The directory
includes the tenant's users, groups, and apps. The directory is used to perform identity and access management
functions for tenant resources. A directory can be associated with multiple subscriptions, but each subscription
is associated with only one directory.
Resource groups: Logical containers that you use to group related resources in a subscription. Each resource
can exist in only one resource group. Resource groups allow for more granular grouping within a subscription.
Commonly used to represent a collection of assets required to support a workload, application, or specific
function within a subscription.
Management groups: Logical containers that you use for one or more subscriptions. You can define a
hierarchy of management groups, subscriptions, resource groups, and resources to efficiently manage access,
policies, and compliance through inheritance.
Region: A set of Azure datacenters that are deployed inside a latency-defined perimeter. The datacenters are
connected through a dedicated, regional, low-latency network. Most Azure resources run in a specific Azure
region.
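
To make these relationships concrete, the following sketch creates a resource group inside a subscription and lists the resources it contains. It's a minimal example, not part of the original article, and it assumes the azure-identity and azure-mgmt-resource Python packages plus hypothetical subscription and resource group names; adjust names and authentication to match your environment.

```python
# A minimal sketch, assuming azure-identity and azure-mgmt-resource are installed.
from azure.identity import DefaultAzureCredential
from azure.mgmt.resource import ResourceManagementClient

subscription_id = "<subscription-guid>"  # hypothetical placeholder
credential = DefaultAzureCredential()
resource_client = ResourceManagementClient(credential, subscription_id)

# A resource group lives in exactly one subscription and one region.
resource_client.resource_groups.create_or_update(
    "rg-payments-prod",  # hypothetical name
    {"location": "eastus2", "tags": {"workload": "payments", "env": "prod"}},
)

# Resources deployed later are associated with this group; listing them
# shows the subscription -> resource group -> resource hierarchy.
for res in resource_client.resources.list_by_resource_group("rg-payments-prod"):
    print(res.name, res.type, res.location)
```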

Azure subscription purposes


An Azure subscription serves several purposes. An Azure subscription is:
A legal agreement. Each subscription is associated with an Azure offer (such as a Free Trial or Pay-As-You-
Go). Each offer has a specific rate plan, benefits, and associated terms and conditions. You choose an Azure
offer when you create a subscription.
A payment agreement. When you create a subscription, you provide payment information for that
subscription, such as a credit card number. Each month, the costs incurred by the resources deployed to that
subscription are calculated and billed via that payment method.
A boundary of scale. Scale limits are defined for a subscription. The subscription's resources can't exceed the
set scale limits. For example, there's a limit on the number of virtual machines that you can create in a single
subscription.
An administrative boundary. A subscription can act as a boundary for administration, security, and policy.
Azure also provides other mechanisms to meet these needs, such as management groups, resource groups,
and role-based access control.

Azure subscription considerations


When you create an Azure subscription, you make several key choices about the subscription:
Who is responsible for paying for the subscription? By default, the party associated with the email address
that you provide when you create a subscription becomes the subscription's account administrator. That party is
responsible for paying all costs that are incurred by the subscription's resources.
Which Azure offer am I interested in? Each subscription is associated with a specific Azure offer. You can
choose the Azure offer that best meets your needs. For example, if you intend to use a subscription to run
nonproduction workloads, you might choose the Pay-As-You-Go Dev/Test offer or the Enterprise Dev/Test
offer.

NOTE
When you sign up for Azure, you might see the phrase create an Azure account. You create an Azure account when you
create an Azure subscription and associate the subscription with an email account.

Azure administrative roles


Azure defines three types of roles for administering subscriptions, identities, and resources:
Classic subscription administrator roles
Azure role-based access control (RBAC ) roles
Azure Active Directory (Azure AD ) administrator roles
The account administrator role for an Azure subscription is assigned to the email account that's used to create the
Azure subscription. The account administrator is the billing owner of the subscription. The account administrator
can manage the subscription details in the Azure Account Center.
By default, the service administrator role for a subscription also is assigned to the email account that's used to
create the Azure subscription. The service administrator has permissions to the subscription equivalent to the
RBAC -based Owner role. The service administrator also has full access to the Azure portal. The account
administrator can change the service administrator to a different email account.
When you create an Azure subscription, you can associate it with an existing Azure AD tenant. Otherwise, a new
Azure AD tenant with an associated directory is created. The role of Global Administrator in the Azure AD
directory is assigned to the email account that's used to create the Azure subscription.
An email account can be associated with multiple Azure subscriptions. The account administrator can transfer a
subscription to another account.
For a detailed description of the roles defined in Azure, see Classic subscription administrator roles, Azure RBAC
roles, and Azure AD administrator roles.
Subscriptions and regions
Every Azure resource is logically associated with only one subscription. When you create a resource, you choose
which Azure subscription to deploy that resource to. You can move a resource to another subscription later.
A subscription isn't tied to a specific Azure region. However, each Azure resource is deployed to only one region.
You can have resources in multiple regions that are associated with the same subscription.
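
If it helps to see these relationships programmatically, the following hedged sketch lists the subscriptions an identity can see and the regions each one can deploy to. It assumes the azure-identity and azure-mgmt-resource Python packages; the exact output depends on your tenant.

```python
# A minimal sketch, assuming azure-identity and azure-mgmt-resource are installed.
from azure.identity import DefaultAzureCredential
from azure.mgmt.resource import SubscriptionClient

credential = DefaultAzureCredential()
sub_client = SubscriptionClient(credential)

for sub in sub_client.subscriptions.list():
    print(f"{sub.display_name} ({sub.subscription_id}): {sub.state}")
    # Each subscription can deploy resources to many regions (locations).
    locations = sub_client.subscriptions.list_locations(sub.subscription_id)
    print("  regions:", ", ".join(loc.name for loc in locations))
```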

NOTE
Most Azure resources are deployed to a specific region. However, certain resource types are considered global resources,
such as policies that you set by using the Azure Policy services.

Related resources
The following resources provide detailed information about the concepts discussed in this article:
How does Azure work?
Resource access management in Azure
Azure Resource Manager overview
Role-based access control (RBAC ) for Azure resources
What is Azure Active Directory?
Associate or add an Azure subscription to your Azure Active Directory tenant
Topologies for Azure AD Connect
Subscriptions, licenses, accounts, and tenants for Microsoft's cloud offerings

Next steps
Now that you understand fundamental Azure concepts, learn how to scale with multiple Azure subscriptions.
Scale with multiple Azure subscriptions
Review your compute options

Determining the compute requirements for hosting your workloads is a key consideration as you prepare for your
cloud adoption. Azure compute products and services support a wide variety of workload computing scenarios
and capabilities. How you configure your landing zone environment to support your compute requirements
depends on your workload's governance, technical, and business requirements.

Identify compute services requirements


As part of your landing zone evaluation and preparation, you need to identify all compute resources that your
landing zone will need to support. This process involves assessing each of the applications and services that make
up your workloads to determine your compute and hosting requirements. After you identify and document your
requirements, you can create policies for your landing zone to control what resource types are allowed based on
your workload needs.
For each application or service you'll deploy to your landing zone environment, use the following decision tree as
a starting point to help you determine your compute services requirements:

NOTE
Learn more about how to assess compute options for each of your applications or services in the Azure application
architecture guide.

Key questions
Answer the following questions about your workloads to help you make decisions based on the Azure compute
services decision tree:
Are you building net new applications and services or migrating from existing on-premises
workloads? Developing new applications as part of your cloud adoption efforts allows you to take full
advantage of modern cloud-based hosting technologies from the design phase on.
If you're migrating existing workloads, can they take advantage of modern cloud technologies?
Migrating on-premises workloads requires analysis: Can you easily optimize existing applications and services
to take advantage of modern cloud technologies or will a lift and shift approach work better for your
workloads?
Can your applications or services take advantage of containers? If your applications are good candidates
for containerized hosting, you can take advantage of the resource efficiency, scalability, and orchestration
capabilities provided by Azure container services. Both Azure Disk Storage and Azure Files services can be
used for persistent storage for containerized applications.
Are your applications web-based or API -based, and do they use PHP, ASP.NET, Node.js, or similar
technologies? Web apps can be deployed to managed Azure App Service instances, so you don't have to
maintain virtual machines for hosting purposes.
Will you require full control over the OS and hosting environment of your workload? If you need to
control the hosting environment, including OS, disks, locally running software, and other configurations, you
can use Azure Virtual Machines to host your applications and services. In addition to choosing your virtual
machine sizes and performance tiers, your decisions regarding virtual disk storage will affect performance and
SLAs related to your infrastructure as a service (IaaS )-based workloads. For more information, see the Azure
Disk Storage documentation.
Will your workload involve high-performance computing (HPC ) capabilities? Azure Batch provides job
scheduling and autoscaling of compute resources as a platform service, so it's easy to run large-scale parallel
and HPC applications in the cloud.
Will your applications use a microservices architecture? Applications that use a microservices-based
architecture can take advantage of several optimized compute technologies. Self-contained, event-driven
workloads can use Azure Functions to build scalable, serverless applications that don't need an infrastructure.
For applications that require more control over the environment where microservices run, you can use
container services like Azure Container Instances, Azure Kubernetes Service, and Azure Service Fabric.
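
To illustrate the serverless option called out in the last question above, here's a hedged sketch of a small HTTP-triggered Azure Function. It assumes the Python v2 programming model (the azure-functions package); the route and logic are hypothetical and not part of the original guidance.

```python
# function_app.py - a minimal sketch using the Azure Functions Python v2 model.
# Assumes the azure-functions package; the route and payload are hypothetical.
import azure.functions as func

app = func.FunctionApp(http_auth_level=func.AuthLevel.FUNCTION)

@app.route(route="orders", methods=["POST"])
def create_order(req: func.HttpRequest) -> func.HttpResponse:
    # Event-driven, serverless handler: no VM or container to manage.
    order = req.get_json()
    return func.HttpResponse(f"Accepted order {order.get('id', 'unknown')}", status_code=202)
```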

NOTE
Most Azure compute services are used in combination with Azure Storage. Consult the storage decisions guidance for
related storage decisions.

Common compute scenarios


The following table illustrates a few common use scenarios and the recommended compute services for handling
them:

SCENARIO | COMPUTE SERVICE
I need to provision Linux and Windows virtual machines in seconds with the configurations of my choice. | Azure Virtual Machines
I need to achieve high availability by autoscaling to create thousands of VMs in minutes. | Virtual machine scale sets
I want to simplify the deployment, management, and operations of Kubernetes. | Azure Kubernetes Service (AKS)
I need to accelerate app development by using an event-driven serverless architecture. | Azure Functions
I need to develop microservices and orchestrate containers on Windows and Linux. | Azure Service Fabric
I want to quickly create cloud apps for web and mobile by using a fully managed platform. | Azure App Service
I want to containerize apps and easily run containers by using a single command. | Azure Container Instances
I need cloud-scale job scheduling and compute management with the ability to scale to tens, hundreds, or thousands of virtual machines. | Azure Batch
I need to create highly available, scalable cloud applications and APIs that can help me focus on apps instead of hardware. | Azure Cloud Services

Regional availability
Azure lets you deliver services at the scale you need to reach your customers and partners wherever they are. A
key factor in planning your cloud deployment is to determine which Azure region will host your workload
resources.
Some compute options, such as Azure App Service, are generally available in most Azure regions. However, some
compute services are supported only in select regions. Some virtual machine types and their associated storage
types have limited regional availability. Before you decide which regions you will deploy your compute resources
to, we recommend that you refer to the regions page to check the latest status of regional availability.
To learn more about the Azure global infrastructure, see the Azure regions page. You can also view products
available by region for specific details about the overall services that are available in each Azure region.
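
One way to check compute availability before you commit to a region is to query the resource SKUs API. The following hedged sketch, which assumes the azure-identity and azure-mgmt-compute Python packages and a hypothetical target region, lists VM sizes that are offered and unrestricted for your subscription in that region.

```python
# A minimal sketch, assuming azure-identity and azure-mgmt-compute are installed.
from azure.identity import DefaultAzureCredential
from azure.mgmt.compute import ComputeManagementClient

subscription_id = "<subscription-guid>"  # hypothetical placeholder
target_region = "eastus2"                # hypothetical target region

compute_client = ComputeManagementClient(DefaultAzureCredential(), subscription_id)

for sku in compute_client.resource_skus.list():
    if sku.resource_type != "virtualMachines":
        continue
    if target_region not in [loc.lower() for loc in (sku.locations or [])]:
        continue
    # Skip sizes that are restricted for this subscription in this region.
    if any(r.reason_code for r in (sku.restrictions or [])):
        continue
    print(sku.name)
```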

Data residency and compliance requirements


Legal and contractual requirements related to data storage often will apply to your workloads. These requirements
might vary based on the location of your organization, the jurisdiction where files and data are stored and
processed, and your applicable business sector. Components of data obligations to consider include data
classification, data location, and the respective responsibilities for data protection under the shared responsibility
model. Many compute solutions depend on linked storage resources. This requirement also might influence your
compute decisions. For help with understanding these requirements, see the white paper Achieving Compliant
Data Residency and Security with Azure.
Part of your compliance efforts might include controlling where your compute resources are physically located.
Azure regions are organized into groups called geographies. An Azure geography ensures that data residency,
sovereignty, compliance, and resiliency requirements are honored within geographical and political boundaries. If
your workloads are subject to data sovereignty or other compliance requirements, you must deploy your storage
resources to regions in a compliant Azure geography.

Establish controls for compute services


When you prepare your landing zone environment, you can establish controls that limit what resources each user
can deploy. The controls can help you manage costs and limit security risks, while still allowing developers and IT
teams to deploy and configure resources that are needed to support your workloads.
After you identify and document your landing zone's requirements, you can use Azure Policy to control the
compute resources that you allow users to create. Controls can take the form of allowing or denying the creation
of compute resource types. For example, you might restrict users to creating only Azure App Service or Azure
Functions resources. You also can use policy to control the allowable options when a resource is created, like
restricting what virtual machine SKUs can be provisioned or allowing only specific VM images.
Policies can be scoped to resources, resource groups, subscriptions, and management groups. You can include
your policies in Azure Blueprint definitions and apply them repeatedly throughout your cloud estate.
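
As one possible illustration of such a control, the sketch below defines and assigns a custom policy that denies VM sizes outside an allowed list. It assumes the azure-identity and azure-mgmt-resource Python packages, a hypothetical subscription scope, and an illustrative SKU list; the built-in "Allowed virtual machine size SKUs" policy is an alternative to defining your own.

```python
# A minimal sketch, assuming azure-identity and azure-mgmt-resource are installed.
from azure.identity import DefaultAzureCredential
from azure.mgmt.resource import PolicyClient

subscription_id = "<subscription-guid>"  # hypothetical placeholder
scope = f"/subscriptions/{subscription_id}"
policy_client = PolicyClient(DefaultAzureCredential(), subscription_id)

# Deny any VM whose size is not in the allowed list (illustrative SKUs only).
policy_rule = {
    "if": {
        "allOf": [
            {"field": "type", "equals": "Microsoft.Compute/virtualMachines"},
            {"not": {"field": "Microsoft.Compute/virtualMachines/sku.name",
                     "in": ["Standard_D2s_v3", "Standard_D4s_v3"]}},
        ]
    },
    "then": {"effect": "deny"},
}

definition = policy_client.policy_definitions.create_or_update(
    "allowed-vm-sizes-landing-zone",
    {"policy_rule": policy_rule, "mode": "All",
     "display_name": "Allowed VM sizes for this landing zone"},
)

policy_client.policy_assignments.create(
    scope,
    "allowed-vm-sizes-assignment",
    {"policy_definition_id": definition.id,
     "display_name": "Restrict VM sizes in the landing zone"},
)
```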
Review your network options

Designing and implementing Azure networking capabilities is a critical part of your cloud adoption efforts. You'll
need to make networking design decisions to properly support the workloads and services that will be hosted in
the cloud. Azure networking products and services support a wide variety of networking capabilities. How you
structure these services and the networking architectures you choose depends on your organization's workload,
governance, and connectivity requirements.

Identify workload networking requirements


As part of your landing zone evaluation and preparation, you need to identify the networking capabilities that your
landing zone needs to support. This process involves assessing each of the applications and services that make up
your workloads to determine their connectivity and network control requirements. After you identify and document
the requirements, you can create policies for your landing zone to control the allowed networking resources and
configuration based on your workload needs.
For each application or service you'll deploy to your landing zone environment, use the following decision tree as
a starting point to help you determine the networking tools or services to use:
Key questions
Answer the following questions about your workloads to help you make decisions based on the Azure networking
services decision tree:
Will your workloads require a virtual network? Managed platform as a service (PaaS ) resource types use
underlying platform network capabilities that don't always require a virtual network. If your workloads don't
require advanced networking features and you don't need to deploy infrastructure as a service (IaaS )
resources, the default native networking capabilities provided by PaaS resources might meet your workload
connectivity and traffic management requirements.
Will your workloads require connectivity between virtual networks and your on-premises
datacenter? Azure provides two solutions for establishing hybrid networking capabilities: Azure VPN
Gateway and Azure ExpressRoute. Azure VPN Gateway connects your on-premises networks to Azure through
site-to-site VPNs, similar to how you might set up and connect to a remote branch office. VPN Gateway has a
maximum bandwidth of 1.25 Gbps. Azure ExpressRoute offers higher reliability and lower latency by using a
private connection between Azure and your on-premises infrastructure. Bandwidth options for ExpressRoute
range from 50 Mbps to 100 Gbps.
Will you need to inspect and audit outgoing traffic by using on-premises network devices? For cloud-
native workloads, you can use Azure Firewall or cloud-hosted, third-party network virtual appliances (NVAs) to
inspect and audit traffic going to or coming from the public internet. However, many enterprise IT security
policies require internet-bound outgoing traffic to pass through centrally managed devices in the organization's
on-premises environment. Forced tunneling supports these scenarios. Not all managed services support
forced tunneling. Services and features like App Service Environment in Azure App Service, Azure API
Management, Azure Kubernetes Service (AKS ), Managed Instances in Azure SQL Database, Azure Databricks,
and Azure HDInsight support this configuration when the service or feature is deployed inside a virtual
network.
Will you need to connect multiple virtual networks? You can use virtual network peering to connect
multiple instances of Azure Virtual Network. Peering can support connections across subscriptions and
regions. For scenarios where you provide services that are shared across multiple subscriptions or need to
manage a large number of network peerings, consider adopting a hub and spoke networking architecture or
using Azure Virtual WAN. Virtual network peering provides connectivity only between two peered networks.
By default, it doesn't provide transitive connectivity across multiple peerings.
Will your workloads be accessible over the internet? Azure provides services that are designed to help
you manage and secure external access to your applications and services:
Azure Firewall
Network appliances
Azure Front Door Service
Azure Application Gateway
Azure Traffic Manager
Will you need to support custom DNS management? Azure DNS is a hosting service for DNS domains.
Azure DNS provides name resolution by using the Azure infrastructure. If your workloads require name
resolution that goes beyond the features that are provided by Azure DNS, you might need to deploy additional
solutions. If your workloads also require Active Directory services, consider using Azure Active Directory
Domain Services to augment Azure DNS capabilities. For more capabilities, you can also deploy custom IaaS
virtual machines to support your requirements.
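
To illustrate the virtual network peering option described in the questions above, the following hedged sketch peers a hub and a spoke virtual network in both directions. It assumes the azure-identity and azure-mgmt-network Python packages and hypothetical resource group and network names; remember that peering isn't transitive, so each pair of networks must be peered explicitly.

```python
# A minimal sketch, assuming azure-identity and azure-mgmt-network are installed.
from azure.identity import DefaultAzureCredential
from azure.mgmt.network import NetworkManagementClient

subscription_id = "<subscription-guid>"  # hypothetical placeholder
network_client = NetworkManagementClient(DefaultAzureCredential(), subscription_id)

hub = network_client.virtual_networks.get("rg-network", "vnet-hub")
spoke = network_client.virtual_networks.get("rg-network", "vnet-spoke")

def peer(rg, local_vnet, name, remote_vnet_id):
    # Peering must be created on both sides before traffic can flow.
    return network_client.virtual_network_peerings.begin_create_or_update(
        rg, local_vnet, name,
        {
            "remote_virtual_network": {"id": remote_vnet_id},
            "allow_virtual_network_access": True,
            "allow_forwarded_traffic": False,
        },
    ).result()

peer("rg-network", "vnet-hub", "hub-to-spoke", spoke.id)
peer("rg-network", "vnet-spoke", "spoke-to-hub", hub.id)
```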

Common networking scenarios


Azure networking is composed of multiple products and services that provide different networking capabilities. As
part of your networking design process, you can compare your workload requirements to the networking
scenarios in the following table to identify the Azure tools or services you can use to provide these networking
capabilities:

SCENARIO | NETWORKING PRODUCT OR SERVICE
I need the networking infrastructure to connect everything, from virtual machines to incoming VPN connections. | Azure Virtual Network
I need to balance inbound and outbound connections and requests to my applications or services. | Azure Load Balancer
I want to optimize delivery from application server farms while increasing application security with a web application firewall. | Azure Application Gateway; Azure Front Door Service
I need to securely use the internet to access Azure Virtual Network through high-performance VPN gateways. | Azure VPN Gateway
I want to ensure ultra-fast DNS responses and ultra-high availability for all my domain needs. | Azure DNS
I need to accelerate the delivery of high-bandwidth content to customers worldwide, from applications and stored content to streaming video. | Azure Content Delivery Network
I need to protect my Azure applications from DDoS attacks. | Azure DDoS Protection
I need to distribute traffic optimally to services across global Azure regions, while providing high availability and responsiveness. | Azure Traffic Manager; Azure Front Door Service
I need to add private network connectivity to access Microsoft cloud services from my corporate networks, as if they were on-premises and residing in my own datacenter. | Azure ExpressRoute
I want to monitor and diagnose conditions at a network-scenario level. | Azure Network Watcher
I need native firewall capabilities, with built-in high availability, unrestricted cloud scalability, and zero maintenance. | Azure Firewall
I need to connect business offices, retail locations, and sites securely. | Azure Virtual WAN
I need a scalable, security-enhanced delivery point for global microservices-based web applications. | Azure Front Door Service

Choose a networking architecture


After you identify the Azure networking services that you need to support your workloads, you also need to
design the architecture that will combine these services to provide your landing zone's cloud networking
infrastructure. The Cloud Adoption Framework Software Defined Networking decision guide provides details
about some of the most common networking architecture patterns used on Azure.
The following table summarizes the primary scenarios that these patterns support:

SCENARIO | SUGGESTED NETWORK ARCHITECTURE
All of the Azure-hosted workloads deployed to your landing zone will be entirely PaaS-based, won't require a virtual network, and aren't part of a wider cloud adoption effort that will include IaaS resources. | PaaS-only
Your Azure-hosted workloads will deploy IaaS-based resources like virtual machines or otherwise require a virtual network, but don't require connectivity to your on-premises environment. | Cloud-native
Your Azure-hosted workloads require limited access to on-premises resources, but you're required to treat cloud connections as untrusted. | Cloud DMZ
Your Azure-hosted workloads require limited access to on-premises resources, and you plan to implement mature security policies and secure connectivity between the cloud and your on-premises environment. | Hybrid
You need to deploy and manage a large number of VMs and workloads, potentially exceeding Azure subscription limits, you need to share services across subscriptions, or you need a more segmented structure for role, application, or permission segregation. | Hub and spoke
You have many branch offices that need to connect to each other and to Azure. | Azure Virtual WAN

Azure Virtual Datacenter


In addition to using one of these architecture patterns, if your enterprise IT group manages large cloud
environments, consider consulting the Azure Virtual Datacenter guidance when you design your Azure-based
cloud infrastructure. Azure Virtual Datacenter provides a combined approach to networking, security,
management, and infrastructure if your organization meets the following criteria:
Your enterprise is subject to regulatory compliance that requires centralized monitoring and audit capabilities.
Your cloud estate will consist of more than 10,000 IaaS VMs or an equivalent scale of PaaS services.
You need to enable agile deployment capabilities for workloads to support developer and operations teams
while maintaining common policy and governance compliance and central IT control over core services.
Your industry depends on a complex platform that requires deep domain expertise (for example, finance, oil
and gas, or manufacturing).
Your existing IT governance policies require tighter parity with existing features, even during early-stage
adoption.

Follow Azure networking best practices


As part of your networking design process, see these articles:
Virtual network planning. Learn how to plan for virtual networks based on your isolation, connectivity, and
location requirements.
Azure best practices for network security. Learn about Azure best practices that can help you enhance your
network security.
Best practices for networking when you migrate workloads to Azure. Get additional guidance about how to
implement Azure networking to support IaaS -based and PaaS -based workloads.
Review your storage options

Storage capabilities are critical for supporting workloads and services that are hosted in the cloud. As part of your
cloud adoption readiness preparations, review this article to help you plan for and address your storage needs.

Select storage tools and services to support your workloads


Azure Storage is the Azure platform's managed service for providing cloud storage. Azure Storage is composed of
several core services and supporting features. Storage in Azure is highly available, secure, durable, scalable, and
redundant. Review the scenarios and considerations described here to choose the relevant Azure services and the
correct architectures to fit your organization's workload, governance, and data storage requirements.
Key questions
Answer the following questions about your workloads to help you make decisions based on the Azure storage
decision tree:
Do your workloads require disk storage to support the deployment of infrastructure as a service
(IaaS ) virtual machines? Azure Disk Storage provides virtual disk capabilities for IaaS virtual machines.
Will you need to provide downloadable images, documents, or other media as part of your
workloads? Azure Blob storage provides the ability to host static files, which are then accessible for download
over the internet. You can make assets that are hosted in Blob storage public, or you can limit assets to
authorized users via Azure Active Directory (Azure AD ), shared keys, or shared access signatures.
Will you need a location to store virtual machine logs, application logs, and analytics data? You can
use Azure Blob storage to store Azure Monitor log data.
Will you need to provide a location for backup, disaster recovery, or archiving workload-related
data? Azure Disk Storage uses Azure Blob storage to provide backup and disaster recovery capabilities. You
can also use Blob storage as a location to back up other resources, like on-premises or IaaS VM -hosted SQL
Server data.
Will you need to support big data analytics workloads? Azure Data Lake Storage Gen 2 is built on top of
Azure Blob storage. Data Lake Storage Gen 2 can support large-enterprise data lake functionality. It also can
handle storing petabytes of information while sustaining hundreds of gigabits of throughput.
Will you need to provide cloud-native file shares? Azure has two primary services that provide cloud-
hosted file shares: Azure NetApp Files and Azure Files. Azure NetApp Files provides high-performance NFS
shares that are well suited to common enterprise workloads like SAP. Azure Files provides file shares
accessible over SMB 3.0 and HTTPS.
Will you need to support hybrid cloud storage for on-premises high-performance computing (HPC )
workloads? Avere vFXT for Azure is a hybrid caching solution that you can use to expand your on-premises
storage capabilities by using cloud-based storage. Avere vFXT for Azure is optimized for read-heavy HPC
workloads that involve compute farms of 1,000 to 40,000 CPU cores. Avere vFXT for Azure can integrate with
on-premises hardware network attached storage (NAS ), Azure Blob storage, or both.
Will you need to perform large-scale archiving and syncing of your on-premises data to the cloud?
Azure Data Box products are designed to help you move large amounts of data from your on-premises
environment to the cloud. Azure Data Box Gateway is a virtual device that resides on-premises. Data Box
Gateway helps you manage large-scale data migration to the cloud. If you need to analyze, transform, or filter
data before you move it to the cloud, you can use Azure Data Box Edge, an AI-enabled physical edge
computing device that's deployed to your on-premises environment. Data Box Edge accelerates processing and
the secure transfer of data to Azure.
Do you want to expand an existing on-premises file share to use cloud storage? Azure File Sync lets
you use the Azure Files service as an extension of file shares that are hosted on your on-premises Windows
Server machines. The syncing service transforms Windows Server into a quick cache of your Azure file share.
It allows your on-premises machines that access the share to use any protocol that's available on Windows
Server.
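
As a small illustration of the Blob storage question above, the following sketch uploads a downloadable asset to a container. It assumes the azure-identity and azure-storage-blob Python packages and hypothetical account, container, and file names; access can then be limited with Azure AD roles or shared access signatures.

```python
# A minimal sketch, assuming azure-identity and azure-storage-blob are installed.
from azure.identity import DefaultAzureCredential
from azure.storage.blob import BlobServiceClient, ContentSettings

service = BlobServiceClient(
    "https://stcontosoassets.blob.core.windows.net",  # hypothetical account URL
    credential=DefaultAzureCredential(),
)
container = service.get_container_client("static-assets")  # hypothetical container

with open("logo.png", "rb") as data:
    container.upload_blob(
        "images/logo.png",
        data,
        overwrite=True,
        content_settings=ContentSettings(content_type="image/png"),
    )
```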

Common storage scenarios


Azure offers multiple products and services for different storage capabilities. In addition to the storage
requirements decision tree shown earlier in this article, the following table describes a series of potential storage
scenarios and the recommended Azure services to address the scenario's requirements:
Block storage scenarios
SCENARIO | SUGGESTED AZURE SERVICES | CONSIDERATIONS FOR SUGGESTED SERVICES
I have bare-metal servers or VMs (Hyper-V or VMware) with direct attached storage running LOB applications. | Azure Disk Storage (Premium SSD) | For production services, the Premium SSD option provides consistent low latency coupled with high IOPS and throughput.
I have servers that will host web and mobile apps. | Azure Disk Storage (Standard SSD) | Standard SSD IOPS and throughput might be sufficient (at a lower cost than Premium SSD) for CPU-bound web and app servers in production.
I have an enterprise SAN or all-flash array (AFA). | Azure Disk Storage (Premium or Ultra SSD); Azure NetApp Files | Ultra SSD is NVMe-based and offers submillisecond latency with high IOPS and bandwidth. Ultra SSD is scalable up to 64 TiB. The choice of Premium SSD versus Ultra SSD depends on peak latency, IOPS, and scalability requirements.
I have high-availability (HA) clustered servers (such as SQL Server FCI or Windows Server failover clustering). | Azure Files (Premium); Azure Disk Storage (Premium or Ultra SSD) | Clustered workloads require multiple nodes to mount the same underlying shared storage for failover or HA. Premium file shares offer shared storage that's mountable via SMB. Shared block storage also can be configured on Premium SSD or Ultra SSD by using partner solutions.
I have a relational database or data warehouse workload (such as SQL Server or Oracle). | Azure Disk Storage (Premium or Ultra SSD) | The choice of Premium SSD versus Ultra SSD depends on peak latency, IOPS, and scalability requirements. Ultra SSD also reduces complexity by removing the need for storage pool configuration for scalability (see details).
I have a NoSQL cluster (such as Cassandra or MongoDB). | Azure Disk Storage (Premium SSD) | The Azure Disk Storage Premium SSD offering provides consistent low latency coupled with high IOPS and throughput.
I am running containers with persistent volumes. | Azure Files (Standard or Premium); Azure Disk Storage (Standard, Premium, or Ultra SSD) | File (RWX) and block (RWO) volume driver options are available for both Azure Kubernetes Service (AKS) and custom Kubernetes deployments. Persistent volumes can map to either an Azure Disk Storage disk or a managed Azure Files share. Choose premium versus standard options based on workload requirements for persistent volumes.
I have a data lake (such as a Hadoop cluster for HDFS data). | Azure Data Lake Storage Gen 2; Azure Disk Storage (Standard or Premium SSD) | The Data Lake Storage Gen 2 feature of Azure Blob storage provides server-side HDFS compatibility and petabyte scale for parallel analytics. It also offers HA and reliability. Software like Cloudera can use Premium or Standard SSD on master/worker nodes, if needed.
I have an SAP or SAP HANA deployment. | Azure Disk Storage (Premium or Ultra SSD) | Ultra SSD is optimized to offer submillisecond latency for tier-1 SAP workloads. Ultra SSD is now in preview. Premium SSD coupled with M-Series offers a general availability (GA) option.
I have a disaster recovery site with strict RPO/RTO that syncs from my primary servers. | Azure page blobs | Azure page blobs are used by replication software to enable low-cost replication to Azure without the need for compute VMs until failover occurs. For more information, see the Azure Disk Storage documentation. Note: Page blobs support a maximum of 8 TB.

File and object storage scenarios


SCENARIO | SUGGESTED AZURE SERVICES | CONSIDERATIONS FOR SUGGESTED SERVICES
I use Windows File Server. | Azure Files; Azure File Sync | With Azure File Sync, you can store rarely used data on cloud-based Azure file shares while caching your most frequently used files on-premises for fast, local access. You can also use multisite sync to keep files in sync across multiple servers. If you plan to migrate your workloads to a cloud-only deployment, Azure Files might be sufficient.
I have an enterprise NAS (such as NetApp Filers or Dell-EMC Isilon). | Azure NetApp Files; Azure Files (Premium) | If you have an on-premises deployment of NetApp, consider using Azure NetApp Files to migrate your deployment to Azure. If you use or will migrate to Windows Server or a Linux server, or you have basic functionality needs from a file share, consider using Azure Files. For continued on-premises access, use Azure File Sync to sync Azure file shares with on-premises file shares by using a cloud tiering mechanism.
I have a file share (SMB or NFS). | Azure Files (Standard or Premium); Azure NetApp Files | The choice of Premium versus Standard Azure Files tiers depends on IOPS, throughput, and your need for latency consistency. If you have an on-premises deployment of NetApp, consider using Azure NetApp Files. If you need to migrate your access control lists (ACLs) and timestamps to the cloud, Azure File Sync can bring all these settings to your Azure file shares as a convenient migration path.
I have an on-premises object storage system for petabytes of data (such as Dell-EMC ECS). | Azure Blob storage | Azure Blob storage provides premium, hot, cool, and archive tiers to match your workload performance and cost needs.
I have a DFSR deployment or another way of handling branch offices. | Azure Files; Azure File Sync | Azure File Sync offers multisite sync to keep files in sync across multiple servers and native Azure file shares in the cloud. Move to a fixed storage footprint on-premises by using cloud tiering. Cloud tiering transforms your server into a cache for the relevant files while scaling cold data in Azure file shares.
I have a tape library (either on-premises or offsite) for backup and disaster recovery or long-term data retention. | Azure Blob storage (cool or archive tiers) | An Azure Blob storage archive tier will have the lowest possible cost, but it might require hours to copy the offline data to a cool, hot, or premium tier of storage to allow access. Cool tiers provide instantaneous access at low cost.
I have file or object storage configured to receive my backups. | Azure Blob storage (cool or archive tiers); Azure File Sync | To back up data for long-term retention with lowest-cost storage, move data to Azure Blob storage and use cool and archive tiers. To enable fast disaster recovery for file data on a server (on-premises or on an Azure VM), sync shares to individual Azure file shares by using Azure File Sync. With Azure file share snapshots, you can restore earlier versions and sync them back to connected servers or access them natively in the Azure file share.
I run data replication to a disaster recovery site. | Azure Files; Azure File Sync | Azure File Sync removes the need for a disaster recovery server and stores files in native Azure SMB shares. Fast disaster recovery rebuilds any data on a failed on-premises server quickly. You can even keep multiple server locations in sync or use cloud tiering to store only relevant data on-premises.
I manage data transfer in disconnected scenarios. | Azure Data Box Edge or Azure Data Box Gateway | Using Data Box Edge or Data Box Gateway, you can copy data in disconnected scenarios. When the gateway is offline, it saves all files you copy in the cache, then uploads them when you're connected.
I manage an ongoing data pipeline to the cloud. | Azure Data Box Edge or Azure Data Box Gateway | Move data to the cloud from systems that are constantly generating data just by having them copy that data straight to the storage gateway. If they need to access that data later, it's right there where they put it.
I have bursts of quantities of data that arrive at the same time. | Azure Data Box Edge or Azure Data Box Gateway | Manage large quantities of data that arrive at the same time, like when an autonomous car pulls back into the garage, or a gene sequencing machine finishes its analysis. Copy all that data to Data Box Gateway at fast local speeds, and then let the gateway upload it as your network allows.

Plan based on data workloads


SCENARIO | SUGGESTED AZURE SERVICES | CONSIDERATIONS FOR SUGGESTED SERVICES
I want to develop a new cloud-native application that needs to persist unstructured data. | Azure Blob storage
I need to migrate data from an on-premises NetApp instance to Azure. | Azure NetApp Files
I need to migrate data from on-premises Windows File Server instances to Azure. | Azure Files
I need to move file data to the cloud but continue to primarily access the data from on-premises. | Azure Files; Azure File Sync
I need to support "burst compute" - NFS/SMB read-heavy, file-based workloads with data assets that reside on-premises while computation runs in the cloud. | Avere vFXT for Azure | IaaS scale-out NFS/SMB file caching
I need to move an on-premises application that uses a local disk or iSCSI. | Azure Disk Storage
I need to migrate a container-based application that has persistent volumes. | Azure Disk Storage; Azure Files
I need to move file shares that aren't Windows Server or NetApp to the cloud. | Azure Files; Azure NetApp Files | Consider protocol support, regional availability, performance requirements, snapshot and clone capabilities, and price sensitivity.
I need to transfer terabytes to petabytes of data from on-premises to Azure. | Azure Data Box Edge
I need to process data before transferring it to Azure. | Azure Data Box Edge
I need to support continuous data ingestion in an automated way by using local cache. | Azure Data Box Gateway

Learn more about Azure storage services


After you identify the Azure tools that best match your requirements, use the detailed documentation linked in the
following table to familiarize yourself with these services:

SERVICE | DESCRIPTION
Azure Blob storage | Azure Blob storage is Microsoft's object storage solution for the cloud. Blob storage is optimized for storing massive amounts of unstructured data. Unstructured data is data that doesn't adhere to a specific data model or definition, such as text or binary data. Blob storage is designed for: serving images or documents directly to a browser; storing files for distributed access; streaming video and audio; writing to log files; storing data for backup and restore, disaster recovery, and archiving; and storing data for analysis by an on-premises or Azure-hosted service.
Azure Data Lake Storage Gen 2 | Blob storage supports Azure Data Lake Storage Gen2, Microsoft's enterprise big data analytics solution for the cloud. Azure Data Lake Storage Gen2 offers a hierarchical file system as well as the advantages of Blob storage, including low-cost, tiered storage; high availability; strong consistency; and disaster recovery capabilities.
Azure Disk Storage | Azure Disk Storage offers persistent, high-performance block storage to power Azure virtual machines. Azure disks are highly durable, secure, and offer the industry's only single-instance SLA for VMs that use premium or ultra SSDs (learn more about disk types). Azure disks provide high availability with Availability Sets and Availability Zones that map to your Azure virtual machine fault domains. In addition, Azure disks are managed as a top-level resource in Azure. Azure Resource Manager capabilities like role-based access control (RBAC), policy, and tagging are provided by default.
Azure Files | Azure Files provides fully managed, native SMB file shares as a service, without the need to run a VM. You can mount an Azure Files share as a network drive to any Azure VM or on-premises machine.
Azure File Sync | Azure File Sync can be used to centralize your organization's file shares in Azure Files, while keeping the flexibility, performance, and compatibility of an on-premises file server. Azure File Sync transforms Windows Server into a quick cache of your Azure file share.
Azure NetApp Files | The Azure NetApp Files service is an enterprise-class, high-performance, metered file storage service. Azure NetApp Files supports any workload type and is highly available by default. You can select service and performance levels and set up snapshots through the service.
Azure Data Box Edge | Azure Data Box Edge is an on-premises network device that moves data into and out of Azure. Data Box Edge has AI-enabled edge compute to preprocess data during upload. Data Box Gateway is a virtual version of the device but with the same data transfer capabilities.
Azure Data Box Gateway | Azure Data Box Gateway is a storage solution that enables you to seamlessly send data to Azure. Data Box Gateway is a virtual device based on a virtual machine provisioned in your virtualized environment or hypervisor. The virtual device resides on-premises and you write data to it by using the NFS and SMB protocols. The device then transfers your data to Azure block blobs or Azure page blobs, or to Azure Files.
Avere vFXT for Azure | Avere vFXT for Azure is a filesystem caching solution for data-intensive high-performance computing (HPC) tasks. Take advantage of cloud computing's scalability to make your data accessible when and where it's needed, even for data that's stored in your own on-premises hardware.

Data redundancy and availability


Azure Storage has various redundancy options to help ensure durability and high availability based on customer
needs: locally redundant storage (LRS), zone-redundant storage (ZRS), geo-redundant storage (GRS), and
read-access geo-redundant storage (RA-GRS).
See Azure Storage redundancy to learn more about these capabilities and how you can decide on the best
redundancy option for your use cases. Also, service level agreements (SLAs) for storage services provide
guarantees that are financially backed. For more information, see SLA for managed disks, SLA for virtual
machines, and SLA for storage accounts.
For help with planning the right solution for Azure disks, see Backup and disaster recovery for Azure Disk
Storage.
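
The redundancy option is chosen per storage account through its SKU. The following hedged sketch creates a geo-redundant (GRS) general-purpose v2 account; it assumes the azure-identity and azure-mgmt-storage Python packages, a hypothetical resource group, and a globally unique account name.

```python
# A minimal sketch, assuming azure-identity and azure-mgmt-storage are installed.
from azure.identity import DefaultAzureCredential
from azure.mgmt.storage import StorageManagementClient

subscription_id = "<subscription-guid>"  # hypothetical placeholder
storage_client = StorageManagementClient(DefaultAzureCredential(), subscription_id)

poller = storage_client.storage_accounts.begin_create(
    "rg-storage",            # hypothetical resource group
    "stcontosobackup01",     # hypothetical; must be globally unique
    {
        "location": "eastus2",
        "kind": "StorageV2",
        # Standard_GRS gives geo-redundancy; swap for Standard_LRS, Standard_ZRS,
        # or Standard_RAGRS to match your durability and availability needs.
        "sku": {"name": "Standard_GRS"},
        "enable_https_traffic_only": True,
    },
)
account = poller.result()
print(account.name, account.sku.name)
```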

Security
To help you protect your data in the cloud, Azure Storage offers several best practices for data security and
encryption for data at rest and in transit. You can:
Secure the storage account by using RBAC and Azure AD.
Secure data in transit between an application and Azure by using client-side encryption, HTTPS, or SMB 3.0.
Set data to be automatically encrypted when it's written to Azure Storage by using storage service encryption.
Grant delegated access to the data objects in Azure Storage by using shared access signatures.
Use analytics to track the authentication method that someone is using when they access storage in Azure.
These security features apply to Azure Blob storage (block and page) and to Azure Files. Get detailed storage
security guidance in the Azure Storage security guide.
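
For example, delegated access can be granted with a shared access signature without handing out the account key to the caller. The sketch below is a hedged example that assumes the azure-storage-blob Python package and hypothetical account, container, and blob names; in production you would typically prefer Azure AD-based access or a user delegation SAS over an account-key SAS.

```python
# A minimal sketch, assuming the azure-storage-blob package is installed.
from datetime import datetime, timedelta
from azure.storage.blob import generate_blob_sas, BlobSasPermissions

sas_token = generate_blob_sas(
    account_name="stcontosoassets",          # hypothetical account
    container_name="reports",                # hypothetical container
    blob_name="q3-summary.pdf",              # hypothetical blob
    account_key="<storage-account-key>",     # placeholder; never hard-code in practice
    permission=BlobSasPermissions(read=True),
    expiry=datetime.utcnow() + timedelta(hours=1),  # short-lived, read-only access
)

url = f"https://stcontosoassets.blob.core.windows.net/reports/q3-summary.pdf?{sas_token}"
print(url)
```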
Storage service encryption provides encryption at rest and safeguards your data to meet your organization's
security and compliance commitments. Storage service encryption is enabled by default for all managed disks,
snapshots, and images in all the Azure regions. Starting June 10, 2017, all new managed disks, snapshots, images,
and new data written to existing managed disks are automatically encrypted at rest with keys managed by
Microsoft. Visit the FAQ for managed disks for more details.
Azure Disk Encryption allows you to encrypt managed disks that are attached to IaaS VMs as OS and data disks
at rest and in transit by using your keys stored in Azure Key Vault. For Windows, the drives are encrypted by using
industry-standard BitLocker encryption technology. For Linux, the disks are encrypted by using the dm-crypt
subsystem. The encryption process is integrated with Azure Key Vault to allow you to control and manage the disk
encryption keys. For more information, see Azure Disk Encryption for Windows and Linux IaaS VMs.
Regional availability
You can use Azure to deliver services at the scale that you need to reach your customers and partners wherever
they are. The managed disks and Azure Storage regional availability pages show the regions where these services
are available. Checking the regional availability of a service beforehand can help you make the right decision for
your workload and customer needs.
Managed disks are available in all Azure regions that have Premium SSD and Standard SSD offerings. Ultra SSD
is currently in public preview and is offered in only a single availability zone in the East US 2 region. Verify
regional availability when you plan mission-critical, top-tier workloads that require Ultra SSD.
Hot and cool blob storage, Data Lake Storage Gen2, and Azure Files storage are available in all Azure regions.
Archive blob storage, premium file shares, and premium block blob storage are limited to certain regions. We
recommend that you refer to the regions page to check the latest status of regional availability.
To learn more about Azure global infrastructure, see the Azure regions page. You can also consult the products
available by region page for specific details about what's available in each Azure region.

Data residency and compliance requirements


Legal and contractual requirements that are related to data storage often will apply to your workloads. These
requirements might vary based on the location of your organization, the jurisdiction of the physical assets that
host your data stores, and your applicable business sector. Components of data obligations to consider include
data classification, data location, and the respective responsibilities for data protection under the shared
responsibility model. For help with understanding these requirements, see the white paper Achieving Compliant
Data Residency and Security with Azure.
Part of your compliance efforts might include controlling where your database resources are physically located.
Azure regions are organized into groups called geographies. An Azure geography ensures that data residency,
sovereignty, compliance, and resiliency requirements are honored within geographical and political boundaries. If
your workloads are subject to data sovereignty or other compliance requirements, you must deploy your storage
resources to regions that are in a compliant Azure geography.
Review your data options

When you prepare your landing zone environment for your cloud adoption, you need to determine the data
requirements for hosting your workloads. Azure database products and services support a wide variety of data
storage scenarios and capabilities. How you configure your landing zone environment to support your data
requirements depends on your workload governance, technical, and business requirements.

Identify data services requirements


As part of your landing zone evaluation and preparation, you need to identify the data stores that your landing
zone needs to support. The process involves assessing each of the applications and services that make up your
workloads to determine their data storage and access requirements. After you identify and document these
requirements, you can create policies for your landing zone to control allowed resource types based on your
workload needs.
For each application or service you'll deploy to your landing zone environment, use the following decision tree as a
starting point to help you determine the appropriate data store services to use:
Key questions
Answer the following questions about your workloads to help you make decisions based on the Azure database
services decision tree:
Do you need full control or ownership of your database software or host OS? Some scenarios require
you to have a high degree of control or ownership of the software configuration and host servers for your
database workloads. In these scenarios, you can deploy custom infrastructure as a service (IaaS ) virtual
machines to fully control the deployment and configuration of data services. If you don't have these
requirements, platform as a service (PaaS )-managed database services might reduce your management and
operations costs.
Will your workloads use a relational database technology? If so, what technology do you plan to use?
Azure provides managed PaaS database capabilities for Azure SQL Database, MySQL, PostgreSQL, and
MariaDB.
Will your workloads use SQL Server? In Azure, you can have your workloads running in IaaS -based SQL
Server on Azure Virtual Machines or on the PaaS -based Azure SQL Database hosted service. Choosing which
option to use is primarily a question of whether you want to manage your database, apply patches, and take
backups, or if you want to delegate these operations to Azure. In some scenarios, compatibility issues might
require the use of IaaS -hosted SQL Server. For more information about how to choose the correct option for
your workloads, see Choose the right SQL Server option in Azure.
Will your workloads use key/value database storage? Azure Cache for Redis offers a high-performance
cached key/value data storage solution that can power fast, scalable applications. Azure Cosmos DB also
provides general-purpose key/value storage capabilities.
Will your workloads use document or graph data? Azure Cosmos DB is a multi-model database service
that supports a wide variety of data types and APIs. Azure Cosmos DB also provides document and graph
database capabilities.
Will your workloads use column-family data? Apache HBase in Azure HDInsight is built on Apache
Hadoop. It supports large amounts of unstructured and semi-structured data in a schema-less database that's
organized by column families.
Will your workloads require high-capacity data analytics capabilities? You can use Azure SQL Data
Warehouse to effectively store and query structured petabyte-scale data. For unstructured big data workloads,
you can use Azure Data Lake to store and analyze petabyte-size files and trillions of objects.
Will your workloads require search engine capabilities? You can use Azure Search to build AI-enhanced
cloud-based search indexes that can be integrated into your applications.
Will your workloads use time series data? Azure Time Series Insights is built to store, visualize, and query
large amounts of time series data, such as data generated by IoT devices.

NOTE
Learn more about how to assess database options for each of your application or services in the Azure application
architecture guide.
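
To make the key/value and document options above more concrete, here's a hedged sketch that stores and reads a small item in Azure Cosmos DB. It assumes the azure-cosmos Python package and hypothetical endpoint, database, container, and partition key values.

```python
# A minimal sketch, assuming the azure-cosmos package is installed.
from azure.cosmos import CosmosClient, PartitionKey

client = CosmosClient(
    "https://contoso-cosmos.documents.azure.com:443/",  # hypothetical endpoint
    credential="<account-key>",                          # placeholder
)

database = client.create_database_if_not_exists("appdb")
container = database.create_container_if_not_exists(
    id="settings",
    partition_key=PartitionKey(path="/tenantId"),
)

# Upsert and read back a simple key/value style item.
container.upsert_item({"id": "theme", "tenantId": "contoso", "value": "dark"})
item = container.read_item(item="theme", partition_key="contoso")
print(item["value"])
```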

Common database scenarios


The following table illustrates a few common use scenario requirements and the recommended database services
for handling them:

SCENARIO | DATA SERVICE
I need a globally distributed, multi-model database with support for NoSQL choices. | Azure Cosmos DB
I need a fully managed relational database that provisions quickly, scales on the fly, and includes built-in intelligence and security. | Azure SQL Database
I need a fully managed, scalable MySQL relational database that has high availability and security built in at no extra cost. | Azure Database for MySQL
I need a fully managed, scalable PostgreSQL relational database that has high availability and security built in at no extra cost. | Azure Database for PostgreSQL
I plan to host enterprise SQL Server apps in the cloud and have full control over the server OS. | SQL Server on Virtual Machines
I need a fully managed elastic data warehouse that has security at every level of scale at no extra cost. | Azure SQL Data Warehouse
I need data lake storage resources that are capable of supporting Hadoop clusters or HDFS data. | Azure Data Lake
I need high throughput and consistent, low-latency access for my data to support fast, scalable applications. | Azure Cache for Redis
I need a fully managed, scalable MariaDB relational database that has high availability and security built in at no extra cost. | Azure Database for MariaDB

Regional availability
Azure lets you deliver services at the scale you need to reach your customers and partners, wherever they are. A
key factor in planning your cloud deployment is to determine which Azure region will host your workload
resources.
Most database services are generally available in most Azure regions. However, there are a few regions, mostly
targeting governmental customers, that support only a subset of these products. Before you decide which regions
you will deploy your database resources to, we recommend that you refer to the regions page to check the latest
status of regional availability.
To learn more about Azure global infrastructure, see the Azure regions page. You can also view products available
by region for specific details about the overall services that are available in each Azure region.

Data residency and compliance requirements


Legal and contractual requirements related to data storage will often apply to your workloads. These
requirements might vary based on the location of your organization, the jurisdiction of the physical assets that host
your data stores, and your applicable business sector. Components of data obligations to consider include data
classification, data location, and the respective responsibilities for data protection under the shared responsibility
model. For help with understanding these requirements, see the white paper Achieving Compliant Data Residency
and Security with Azure.
Part of your compliance efforts might include controlling where your database resources are physically located.
Azure regions are organized into groups called geographies. An Azure geography ensures that data residency,
sovereignty, compliance, and resiliency requirements are honored within geographical and political boundaries. If
your workloads are subject to data sovereignty or other compliance requirements, you must deploy your storage
resources to regions in a compliant Azure geography.

Establish controls for database services


When you prepare your landing zone environment, you can establish controls that limit what data stores users can
deploy. Controls can help you manage costs and limit security risks, while still allowing developers and IT teams to
deploy and configure resources that are needed to support your workloads.
After you identify and document your landing zone's requirements, you can use Azure Policy to control the
database resources that you allow users to create. Controls can take the form of allowing or denying the creation
of database resource types. For example, you might restrict users to creating only Azure SQL Database resources.
You can also use policy to control the allowable options when a resource is created, such as restricting which SQL
Database SKUs can be provisioned or allowing only specific versions of SQL Server to be installed on an IaaS VM.
Policies can be scoped to resources, resource groups, subscriptions, and management groups. You can include your
policies in Azure Blueprint definitions and apply them repeatedly throughout your cloud estate.
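As a hedged example of what such a control might look like, the following Terraform sketch defines and assigns a custom Azure Policy that denies the creation of managed database services other than Azure SQL Database. The policy name, the list of denied resource types, and the subscription ID are assumptions you would adjust to your own requirements.

resource "azurerm_policy_definition" "allowed_database_services" {
  name         = "deny-non-sql-database-services"   # placeholder policy name
  policy_type  = "Custom"
  mode         = "All"
  display_name = "Deny database services other than Azure SQL Database"

  policy_rule = <<POLICY_RULE
{
  "if": {
    "field": "type",
    "in": [
      "Microsoft.DBforMySQL/servers",
      "Microsoft.DBforPostgreSQL/servers",
      "Microsoft.DBforMariaDB/servers",
      "Microsoft.DocumentDB/databaseAccounts"
    ]
  },
  "then": { "effect": "deny" }
}
POLICY_RULE
}

# In newer azurerm provider versions this resource is named
# azurerm_subscription_policy_assignment instead.
resource "azurerm_policy_assignment" "allowed_database_services" {
  name                 = "deny-non-sql-database-services"
  scope                = "/subscriptions/00000000-0000-0000-0000-000000000000"   # placeholder subscription
  policy_definition_id = azurerm_policy_definition.allowed_database_services.id
}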
Role-based access control

Group-based access rights and privileges are a good practice. Dealing with groups rather than individual users
simplifies maintenance of access policies, provides consistent access management across teams, and reduces
configuration errors. Assigning users to and removing users from appropriate groups helps keep the privileges of a
specific user current. Azure role-based access control (RBAC) offers fine-grained access management for resources
organized around user roles.
For an overview of recommended RBAC practices as part of an identity and security strategy, see Azure identity
management and access control security best practices.

Overview of role-based access control


By using role-based access control, you can separate duties within your team and grant only enough access for
specific Azure Active Directory (Azure AD) users, groups, service principals, or managed identities to perform their
jobs. Instead of giving everybody unrestricted access to your Azure subscription or resources, you can limit
permissions for each set of resources.
RBAC role definitions list operations that are permitted or disallowed for users or groups assigned to that role. A
role's scope specifies which resources these defined permissions apply to. Scopes can be specified at multiple
levels: management group, subscription, resource group, or resource. Scopes are structured in a parent/child
relationship.

For detailed instructions for assigning users and groups to specific roles and assigning roles to scopes, see
Manage access to Azure resources using RBAC.
When planning your access control strategy, use a least-privilege access model that grants users only the
permissions required to perform their work. The following diagram shows a suggested pattern for using RBAC
through this approach.
NOTE
The more specific and detailed the permissions you define, the more likely it is that your access controls will become
complex and difficult to manage, especially as your cloud estate grows in size. Avoid resource-specific
permissions. Instead, use management groups for enterprise-wide access control and resource groups for access control
within subscriptions. Also avoid user-specific permissions. Instead, assign access to groups in Azure AD.

Use built-in RBAC roles


Azure provides many built-in role definitions, with three core roles for providing access:
The Owner role can manage everything, including access to resources.
The Contributor role can manage everything except access to resources.
The Reader role can view everything but not make any changes.
Beginning from these core access levels, additional built-in roles provide more detailed controls for accessing
specific resource types or Azure features. For example, you can manage access to virtual machines by using the
following built-in roles:
The Virtual Machine Administrator Login role can view virtual machines in the portal and sign in as
administrator.
The Virtual Machine Contributor role can manage virtual machines, but it can't access them or the virtual
network or storage account they're connected to.
The Virtual Machine User Login role can view virtual machines in the portal and sign in as a regular user.
For another example of using built-in roles to manage access to particular features, see the discussion on
controlling access to cost-tracking features in Tracking costs across business units, environments, or projects.
For a complete list of available built-in roles, see Built-in roles for Azure resources.
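As a small, hedged example, the following Terraform sketch assigns the built-in Virtual Machine Contributor role to an existing Azure AD group at the scope of a single resource group. The group and resource group names are placeholders, and the group lookup uses azuread provider 2.x syntax.

# Look up an existing Azure AD group (azuread provider 2.x uses display_name).
data "azuread_group" "vm_operators" {
  display_name = "vm-operators"
}

data "azurerm_resource_group" "workload" {
  name = "example-workload-rg"
}

# Grant the built-in role at resource group scope only.
resource "azurerm_role_assignment" "vm_contributor" {
  scope                = data.azurerm_resource_group.workload.id
  role_definition_name = "Virtual Machine Contributor"
  principal_id         = data.azuread_group.vm_operators.object_id
}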

Use custom roles


Although the roles built in to Azure support a wide variety of access control scenarios, they might not meet all the
needs of your organization or team. For example, if you have a single group of users responsible for managing
virtual machines and Azure SQL Database resources, you might want to create a custom role to optimize
management of the required access controls.
The Azure RBAC documentation contains instructions on creating custom roles, along with details on how role
definitions work.
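As a hedged illustration of the shape of a custom role, the following Terraform sketch defines a role for a team that manages both virtual machines and Azure SQL Database resources. The role name, the specific actions, and the assignable scope are assumptions, not a recommended permission set.

data "azurerm_subscription" "current" {}

resource "azurerm_role_definition" "vm_and_sql_operator" {
  name        = "VM and SQL Database Operator"   # placeholder role name
  scope       = data.azurerm_subscription.current.id
  description = "Can manage virtual machines and Azure SQL Database resources."

  permissions {
    actions = [
      "Microsoft.Compute/virtualMachines/*",
      "Microsoft.Sql/servers/*",
      "Microsoft.Resources/subscriptions/resourceGroups/read"
    ]
    not_actions = []
  }

  assignable_scopes = [
    data.azurerm_subscription.current.id
  ]
}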
Separation of responsibilities and roles for large organizations
RBAC allows organizations to assign different teams to various management tasks within large cloud estates. It
can allow central IT teams to control core access and security features, while also giving software developers and
other teams large amounts of control over specific workloads or groups of resources.
Most cloud environments can also benefit from an access-control strategy that uses multiple roles and
emphasizes a separation of responsibilities between these roles. This approach requires that any significant
change to resources or infrastructure involves multiple roles to complete, ensuring that more than one person
must review and approve a change. This separation of responsibilities limits the ability of a single person to access
sensitive data or introduce vulnerabilities without the knowledge of other team members.
The following table illustrates a common pattern for dividing IT responsibilities into separate custom roles:

| GROUP | COMMON ROLE NAME | RESPONSIBILITIES |
|---|---|---|
| Security Operations | SecOps | Provides general security oversight. Establishes and enforces security policy such as encryption at rest. Manages encryption keys. Manages firewall rules. |
| Network Operations | NetOps | Manages network configuration and operations within virtual networks, such as routes and peerings. |
| Systems Operations | SysOps | Specifies compute and storage infrastructure options, and maintains resources that have been deployed. |
| Development, Test, and Operations | DevOps | Builds and deploys workload features and applications. Operates features and applications to meet service-level agreements (SLAs) and other quality standards. |

The breakdown of actions and permissions in these standard roles is often the same across your applications,
subscriptions, or entire cloud estate, even if these roles are performed by different people at different levels.
Accordingly, you can create a common set of RBAC role definitions to apply across different scopes within your
environment. Users and groups can then be assigned a common role, but only for the scope of resources, resource
groups, subscriptions, or management groups that they're responsible for managing.
For example, in a hub and spoke networking topology with multiple subscriptions, you might have a common set
of role definitions for the hub and all workload spokes. A hub subscription's NetOps role can be assigned to
members of the organization's central IT staff, who are responsible for maintaining networking for shared services
used by all workloads. A workload spoke subscription's NetOps role can then be assigned to members of that
specific workload team, allowing them to configure networking within that subscription to best support their
workload requirements. The same role definition is used for both, but scope-based assignments ensure that users
have only the access that they need to perform their job.
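A minimal sketch of this pattern, assuming placeholder subscription IDs and group object IDs, might assign the same role definition at both the hub and a spoke scope. The built-in Network Contributor role is used here for brevity; a shared custom NetOps role definition could be substituted.

variable "central_it_netops_group_object_id"    { type = string }   # placeholder Azure AD group
variable "workload_team_netops_group_object_id" { type = string }   # placeholder Azure AD group

# Central IT manages networking in the hub subscription.
resource "azurerm_role_assignment" "netops_hub" {
  scope                = "/subscriptions/11111111-1111-1111-1111-111111111111"   # hub subscription (placeholder)
  role_definition_name = "Network Contributor"
  principal_id         = var.central_it_netops_group_object_id
}

# The workload team manages networking only in its own spoke subscription.
resource "azurerm_role_assignment" "netops_spoke" {
  scope                = "/subscriptions/22222222-2222-2222-2222-222222222222"   # spoke subscription (placeholder)
  role_definition_name = "Network Contributor"
  principal_id         = var.workload_team_netops_group_object_id
}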
Create hybrid cloud consistency

This article guides you through the high-level approaches for creating hybrid cloud consistency.
Hybrid deployment models during migration can reduce risk and contribute to a smooth infrastructure transition.
Cloud platforms offer the greatest level of flexibility when it comes to business processes. Many organizations are
hesitant to make the move to the cloud. Instead, they prefer to keep full control over their most sensitive data.
Unfortunately, on-premises servers don't allow for the same rate of innovation as the cloud. A hybrid cloud
solution offers the speed of cloud innovation and the control of on-premises management.

Integrate hybrid cloud consistency


Using a hybrid cloud solution allows organizations to scale computing resources. It also eliminates the need to
make massive capital expenditures to handle short-term spikes in demand. Changes to your business can drive the
need to free up local resources for more sensitive data or applications. It's easier, faster, and less expensive to
deprovision cloud resources. You pay only for those resources your organization temporarily uses, instead of
having to purchase and maintain additional resources. This approach reduces the amount of equipment that might
remain idle over long periods of time. Hybrid cloud computing delivers the benefits of cloud computing (flexibility,
scalability, and cost efficiency) with the lowest possible risk of data exposure.

Figure 1 - Creating hybrid cloud consistency across identity, management, security, data, development, and
DevOps.
A true hybrid cloud solution must provide four components, each of which brings significant benefits:
Common identity for on-premises and cloud applications: This component improves user productivity by
giving users single sign-on (SSO) to all their applications. It also ensures consistency as applications and users
cross network or cloud boundaries.
Integrated management and security across your hybrid cloud: This component provides you with a
cohesive way to monitor, manage, and secure the environment, which enables increased visibility and control.
A consistent data platform for the datacenter and the cloud: This component creates data portability,
combined with seamless access to on-premises and cloud data services for deep insight into all data sources.
Unified development and DevOps across the cloud and on-premises datacenters: This component
allows you to move applications between the two environments as needed. Developer productivity improves
because both locations now have the same development environment.
Here are some examples of these components from an Azure perspective:
Azure Active Directory (Azure AD) works with on-premises Active Directory to provide common identity for all
users. SSO across on-premises and cloud applications makes it simple for users to safely access the applications and
assets they need. Admins can manage security and governance controls and also have the flexibility to adjust
permissions without affecting the user experience.
Azure provides integrated management and security services for both cloud and on-premises infrastructure.
These services include an integrated set of tools that are used to monitor, configure, and protect hybrid clouds.
This end-to-end approach to management specifically addresses real-world challenges that face organizations
considering a hybrid cloud solution.
Azure hybrid cloud provides common tools that ensure secure access to all data, seamlessly and efficiently.
Azure data services combine with Microsoft SQL Server to create a consistent data platform. A consistent
hybrid cloud model allows users to work with both operational and analytical data. The same services are
provided on-premises and in the cloud for data warehousing, data analysis, and data visualization.
Azure cloud services, combined with Azure Stack on-premises, provide unified development and DevOps.
Consistency across the cloud and on-premises means that your DevOps team can build applications that run in
either environment and can easily deploy to the right location. You also can reuse templates across the hybrid
solution, which can further simplify DevOps processes.

Azure Stack in a hybrid cloud environment


Azure Stack is a hybrid cloud solution that allows organizations to run Azure-consistent services in their
datacenter. It provides a simplified development, management, and security experience that's consistent with Azure
public cloud services. Azure Stack is an extension of Azure. You can use it to run Azure services from your on-
premises environments and then move to the Azure cloud if and when required.
With Azure Stack, you can deploy and operate both IaaS and PaaS by using the same tools and offering the same
experience as the Azure public cloud. Management of Azure Stack, whether through the web portal or through
PowerShell, has a look and feel that's consistent with Azure for IT administrators and end users.
Azure and Azure Stack open up new hybrid use cases for both customer-facing and internal line-of-business
applications:
Edge and disconnected solutions. To address latency and connectivity requirements, customers can process
data locally in Azure Stack and then aggregate it in Azure for further analytics. They can use common
application logic across both. Many customers are interested in this edge scenario across different contexts, like
factory floors, cruise ships, and mine shafts.
Cloud applications that meet various regulations. Customers can develop and deploy applications in
Azure, with full flexibility to deploy on-premises on Azure Stack to meet regulatory or policy requirements. No
code changes are needed. Application examples include global audit, financial reporting, foreign exchange
trading, online gaming, and expense reporting. Customers sometimes look to deploy different instances of the
same application to Azure or Azure Stack, based on business and technical requirements. While Azure meets
most requirements, Azure Stack complements the deployment approach where needed.
Cloud application model on-premises. Customers can use Azure web services, containers, serverless, and
microservice architectures to update and extend existing applications or build new ones. You can use consistent
DevOps processes across Azure in the cloud and Azure Stack on-premises. There's a growing interest in
application modernization, even for core mission-critical applications.
Azure Stack is offered via two deployment options:
Azure Stack integrated systems: Azure Stack integrated systems are offered through Microsoft and
hardware partners to create a solution that provides cloud-paced innovation balanced with simple
management. Because Azure Stack is offered as an integrated system of hardware and software, you get
flexibility and control while still adopting innovation from the cloud. Azure Stack integrated systems range in
size from 4 to 12 nodes. They're jointly supported by the hardware partner and Microsoft. Use Azure Stack
integrated systems to enable new scenarios for your production workloads.
Azure Stack Development Kit: The Microsoft Azure Stack Development Kit is a single-node deployment of
Azure Stack. You can use it to evaluate and learn about Azure Stack. You can also use the kit as a developer
environment, where you can develop by using APIs and tooling that are consistent with Azure. The Azure Stack
Development Kit isn't intended for use as a production environment.

Azure Stack one-cloud ecosystem


You can speed up Azure Stack initiatives by using the complete Azure ecosystem:
Azure ensures that most applications and services that are certified for Azure will work on Azure Stack. Several
ISVs are extending their solutions to Azure Stack. These ISVs include Bitnami, Docker, Kemp Technologies,
Pivotal Cloud Foundry, Red Hat Enterprise Linux, and SUSE Linux.
You can opt to have Azure Stack delivered and operated as a fully managed service. Several partners will have
managed service offerings across Azure and Azure Stack shortly. These partners include Tieto, Yourhosting,
Revera, Pulsant, and NTT. These partners deliver managed services for Azure via the Cloud Solution Provider
(CSP) program. They're extending their offerings to include hybrid solutions.
As an example of a complete, fully managed hybrid cloud solution, Avanade delivers an all-in-one offer. It
includes cloud transformation services, software, infrastructure, setup and configuration, and ongoing managed
services. This way customers can consume Azure Stack just as they do with Azure today.
Providers can help accelerate application modernization initiatives by building end-to-end Azure solutions for
customers. They bring deep Azure skill sets, domain and industry knowledge, and process expertise, such as
DevOps. Every Azure Stack cloud is an opportunity for a provider to design the solution and lead and influence
system deployment. They also can customize the included capabilities and deliver operational activities.
Examples of providers include Avanade, DXC, Dell EMC Services, InFront Consulting Group, HPE Pointnext,
and PwC (formerly PricewaterhouseCoopers).
Use Terraform to build your landing zones

Azure provides native services for deploying your landing zones. Other third-party tools can also help with this
effort. One such tool that customers and partners often use to deploy landing zones is HashiCorp's Terraform. This
section shows how to use a prototype landing zone to deploy fundamental logging, accounting, and security
capabilities for an Azure subscription.

Purpose of the landing zone


The Cloud Adoption Framework foundational landing zone for Terraform has a limited set of responsibilities and
features to enforce logging, accounting, and security. This landing zone uses standard components known as
Terraform modules to enforce consistency across resources deployed in the environment.

Use standard modules


Reuse of components is a fundamental principle of infrastructure as code. Modules are instrumental in defining
standards and consistency across resource deployment within and across environments. The modules used to
deploy this first landing zone are available in the official Terraform registry.

Architecture diagram
The first landing zone deploys the following components in your subscription:
Capabilities
The components deployed and their purpose include the following:

| COMPONENT | RESPONSIBILITY |
|---|---|
| Resource groups | Core resource groups needed for the foundation |
| Activity logging | Auditing of all subscription activities and archiving: storage account, Azure Event Hubs |
| Diagnostics logging | All operation logs kept for a specific number of days: storage account, Event Hubs |
| Log Analytics | Stores all the operation logs. Deploys common solutions for deep application best practices review: NetworkMonitoring, ADAssessment, ADReplication, AgentHealthAssessment, DnsAnalytics, KeyVaultAnalytics |
| Azure Security Center | Security hygiene metrics and alerts sent to email and phone number |

Use this blueprint


Before you use the Cloud Adoption Framework foundation landing zone, review the following assumptions,
decisions, and implementation guidance.

Assumptions
The following assumptions or constraints were considered when this initial landing zone was defined. If these
assumptions align with your constraints, you can use the blueprint to create your first landing zone. The blueprint
also can be extended to create a landing zone blueprint that meets your unique constraints.
Subscription limits: This adoption effort is unlikely to exceed subscription limits. Two common indicators are
an excess of 25,000 VMs or 10,000 vCPUs.
Compliance: No third-party compliance requirements are needed for this landing zone.
Architectural complexity: Architectural complexity doesn't require additional production subscriptions.
Shared services: There are no existing shared services in Azure that require this subscription to be treated like
a spoke in a hub and spoke architecture.
If these assumptions match your current environment, this blueprint might be a good way to start building your
landing zone.

Design decisions
The following decisions are represented in the Terraform landing zone:

| COMPONENT | DECISIONS | ALTERNATIVE APPROACHES |
|---|---|---|
| Logging and monitoring | Azure Monitor Log Analytics workspace is used. A diagnostics storage account and an event hub are provisioned. | |
| Network | N/A - Network is implemented in another landing zone. | Networking decisions |
| Identity | It's assumed that the subscription is already associated with an Azure Active Directory instance. | Identity management best practices |
| Policy | This landing zone currently assumes that no Azure policies are to be applied. | |
| Subscription design | N/A - Designed for a single production subscription. | Scaling subscriptions |
| Management groups | N/A - Designed for a single production subscription. | Scaling subscriptions |
| Resource groups | N/A - Designed for a single production subscription. | Scaling subscriptions |
| Data | N/A | Choose the correct SQL Server option in Azure and Azure Data Store guidance |
| Storage | N/A | Azure Storage guidance |
| Naming standards | When the environment is created, a unique prefix is also created. Resources that require a globally unique name (such as storage accounts) use this prefix. The custom name is appended with a random suffix. Tag usage is mandated as described in the following table. | Naming and tagging best practices |
| Cost management | N/A | Tracking costs |
| Compute | N/A | Compute options |

Tagging standards
The following set of minimum tags must be present on all resources and resource groups:

| TAG NAME | DESCRIPTION | KEY | EXAMPLE VALUE |
|---|---|---|---|
| Business Unit | Top-level division of your company that owns the subscription or workload the resource belongs to. | BusinessUnit | FINANCE, MARKETING, {Product Name}, CORP, SHARED |
| Cost Center | Accounting cost center associated with this resource. | CostCenter | Number |
| Disaster Recovery | Business criticality of the application, workload, or service. | DR | DR-ENABLED, NON-DR-ENABLED |
| Environment | Deployment environment of the application, workload, or service. | Env | Prod, Dev, QA, Stage, Test, Training |
| Owner Name | Owner of the application, workload, or service. | Owner | email |
| Deployment Type | Defines how the resources are being maintained. | deploymentType | Manual, Terraform |
| Version | Version of the blueprint deployed. | version | v0.1 |
| Application Name | Name of the application, service, or workload associated with the resource. | ApplicationName | "app name" |
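As a hedged sketch of how these tags might be applied consistently in Terraform, the following snippet declares the minimum tag set once and merges it into a resource. The tag values and resource names are illustrative placeholders only.

locals {
  required_tags = {
    BusinessUnit    = "SHARED"
    CostCenter      = "65182"
    DR              = "NON-DR-ENABLED"
    Env             = "Dev"
    Owner           = "owner@contoso.com"
    deploymentType  = "Terraform"
    version         = "v0.1"
    ApplicationName = "foundations"
  }
}

resource "azurerm_resource_group" "example" {
  name     = "example-operations-rg"   # placeholder name
  location = "southeastasia"

  # merge() keeps the mandatory baseline while letting a resource add its own tags.
  tags = merge(local.required_tags, { Role = "operations" })
}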

Customize and deploy your first landing zone


You can clone your Terraform foundation landing zone. It's easy to get started with the landing zone by modifying
the Terraform variables. In our example, we use blueprint_foundations.sandbox.auto.tfvars, so Terraform
automatically loads the variable values from this file for you.
Let's look at the different variable sections.
In this first object, we create two resource groups in the southeastasia region named -hub-core-sec and
-hub-operations along with a prefix added at runtime.

resource_groups_hub = {
  HUB-CORE-SEC = {
    name     = "-hub-core-sec"
    location = "southeastasia"
  }
  HUB-OPERATIONS = {
    name     = "-hub-operations"
    location = "southeastasia"
  }
}

Next, we specify the regions where we can set the foundations. Here, southeastasia is used to deploy all the
resources.

location_map = {
  region1 = "southeastasia"
  region2 = "eastasia"
}

Then, we specify the retention period for the operations logs and the Azure subscription logs. This data is stored in
separate storage accounts and an event hub, whose names are randomly generated because they must be unique.

azure_activity_logs_retention = 365
azure_diagnostics_logs_retention = 60

In tags_hub, we specify the minimum set of tags that are applied to all resources created.

tags_hub = {
  environment    = "DEV"
  owner          = "Arnaud"
  deploymentType = "Terraform"
  costCenter     = "65182"
  BusinessUnit   = "SHARED"
  DR             = "NON-DR-ENABLED"
}
Then, we specify the Log Analytics workspace name and a set of solutions that analyze the deployment. Here, we
retained Network Monitoring, Active Directory (AD) Assessment and Replication, Agent Health Assessment, DNS Analytics, and Key Vault Analytics.

analytics_workspace_name = "lalogs"

solution_plan_map = {
  NetworkMonitoring = {
    "publisher" = "Microsoft"
    "product"   = "OMSGallery/NetworkMonitoring"
  },
  ADAssessment = {
    "publisher" = "Microsoft"
    "product"   = "OMSGallery/ADAssessment"
  },
  ADReplication = {
    "publisher" = "Microsoft"
    "product"   = "OMSGallery/ADReplication"
  },
  AgentHealthAssessment = {
    "publisher" = "Microsoft"
    "product"   = "OMSGallery/AgentHealthAssessment"
  },
  DnsAnalytics = {
    "publisher" = "Microsoft"
    "product"   = "OMSGallery/DnsAnalytics"
  },
  KeyVaultAnalytics = {
    "publisher" = "Microsoft"
    "product"   = "OMSGallery/KeyVaultAnalytics"
  }
}

Next, we configure the alert parameters for Azure Security Center.

# Azure Security Center Configuration
security_center = {
  contact_email = "joe@contoso.com"
  contact_phone = "+6500000000"
}

Get started
After you've reviewed the configuration, you can deploy the configuration as you would deploy a Terraform
environment. We recommend that you use the rover, which is a Docker container that allows deployment from
Windows, Linux, or macOS. You can get started with the rover GitHub repository.

Next steps
The foundation landing zone lays the groundwork for a complex environment in a decomposed manner. This
edition provides a set of simple capabilities that can be extended by:
Adding other modules to the blueprint.
Layering additional landing zones on top of it.
Layering landing zones is a good practice for decoupling systems, versioning each component that you're using,
and allowing fast innovation and stability for your infrastructure as code deployment.
Future reference architectures will demonstrate this concept for a hub and spoke topology.
Review the foundation Terraform landing zone sample
The virtual datacenter: A network perspective

Overview
Migrating on-premises applications to Azure provides organizations with the benefits of a secure and cost-efficient
infrastructure, even if the applications are migrated with minimal changes. However, to make the most of the
agility possible with cloud computing, enterprises should evolve their architectures to take advantage of Azure
capabilities.
Microsoft Azure delivers hyper-scale services and infrastructure with enterprise-grade capabilities and reliability.
These services and infrastructure offer many choices in hybrid connectivity so customers can choose to access
them over the public internet or over a private network connection. Microsoft partners can also provide enhanced
capabilities by offering security services and virtual appliances that are optimized to run in Azure.
With the Microsoft Azure platform, customers can seamlessly extend their infrastructure into the cloud and build
multi-tier architectures.

What is the virtual datacenter?


In the beginning, the cloud was essentially a platform for hosting public-facing applications. Enterprises began to
understand the value of the cloud and started to move internal line-of-business applications to the cloud. These
types of applications brought additional security, reliability, performance, and cost considerations that required
additional flexibility in the way cloud services were delivered. This paved the way for new infrastructure and
networking services designed to provide this flexibility but also new features for scale, disaster recovery, and other
considerations.
Cloud solutions were first designed to host single, relatively isolated applications in the public spectrum. This
approach worked well for a few years. Then the benefits of cloud solutions became clear, and multiple large-scale
workloads were hosted on the cloud. Addressing security, reliability, performance, and cost concerns of
deployments in one or more regions became vital throughout the life cycle of the cloud service.
The following cloud deployment diagram shows an example of a security gap in the red box. The yellow box
shows room for optimizing network virtual appliances across workloads.
The virtual datacenter (VDC) is a concept born of the necessity for scaling to support enterprise workloads. This
scale must address the challenges introduced when supporting large-scale applications in the public cloud.
A VDC implementation doesn't just represent the application workloads in the cloud. It's also the network, security,
management, and infrastructure (for example, DNS and Directory Services). As more and more of an enterprise's
workloads move to Azure, it's important to think about the supporting infrastructure and objects these workloads
are placed in. Thinking carefully about how resources are structured can avoid the proliferation of hundreds of
"workload islands" that must be managed separately with independent data flow, security models, and compliance
challenges.
The VDC concept is a set of recommendations and high-level designs for implementing a collection of separate
but related entities. These entities often have common supporting functions, features, and infrastructure. By
viewing your workloads through the lens of the VDC, you can realize reduced cost from economies of scale,
optimized security through component and data flow centralization, along with easier operations, management,
and compliance audits.

NOTE
It's important to understand that the VDC is NOT a discrete Azure product, but the combination of various features and
capabilities to meet your exact requirements. The VDC is a way of thinking about your workloads and Azure usage to
maximize your resources and abilities in the cloud. It's a modular approach to building up IT services in Azure while
respecting the enterprise's organizational roles and responsibilities.

A VDC implementation can help enterprises get workloads and applications into Azure for the following scenarios:
Host multiple related workloads.
Migrate workloads from an on-premises environment to Azure.
Implement shared or centralized security and access requirements across workloads.
Mix DevOps and centralized IT appropriately for a large enterprise.

Who should implement a virtual datacenter?


Any Azure customer that has decided to adopt the cloud can benefit from the efficiency of configuring a set of
resources for common use by all applications. Depending on the size, even single applications can benefit from
using the patterns and components used to build a VDC implementation.
If your organization has centralized teams or departments for IT, networking, security, or compliance,
implementing a VDC can help enforce policy points and segregation of duties, and ensure uniformity of the underlying
common components while giving application teams as much freedom and control as is appropriate for your
requirements.
Organizations that are adopting DevOps can also use the VDC concepts to provide authorized pockets of Azure
resources. This method can ensure the DevOps groups have total control within that grouping, at either the
subscription level or within resource groups in a common subscription. At the same time, the network and security
boundaries stay compliant as defined by a centralized policy in the hub VNet and centrally managed resource
group.

Considerations for implementing a virtual datacenter


When designing a VDC implementation there are several pivotal issues to consider:
Identity and directory service
Identity and directory services are a key aspect of all datacenters, both on-premises and in the cloud. Identity is
related to all aspects of access and authorization to services within a VDC implementation. To help ensure that only
authorized users and processes access your Azure Account and resources, Azure uses several types of credentials
for authentication. These include passwords (to access the Azure account), cryptographic keys, digital signatures,
and certificates. Azure Multi-Factor Authentication is an additional layer of security for accessing Azure services.
Multi-Factor Authentication provides strong authentication with a range of easy verification options (phone call,
text message, or mobile app notification) and allows customers to choose the method they prefer.
Any large enterprise needs to define an identity management process that describes the management of individual
identities, their authentication, authorization, roles, and privileges within or across their VDC implementation. The
goals of this process should be to increase security and productivity while reducing cost, downtime, and repetitive
manual tasks.
Enterprise organizations may require a demanding mix of services for different lines of business, and employees
often have different roles when involved with different projects. The VDC requires good cooperation between
different teams, each with specific role definitions, to get systems running with good governance. The matrix of
responsibilities, access, and rights can be complex. Identity management in the VDC is implemented through
Azure Active Directory (Azure AD) and role-based access control (RBAC).
A directory service is a shared information infrastructure that locates, manages, administers, and organizes
everyday items and network resources. These resources can include volumes, folders, files, printers, users, groups,
devices, and other objects. Each resource on the network is considered an object by the directory server.
Information about a resource is stored as a collection of attributes associated with that resource or object.
All Microsoft online business services rely on Azure Active Directory (Azure AD ) for sign-on and other identity
needs. Azure Active Directory is a comprehensive, highly available identity and access management cloud solution
that combines core directory services, advanced identity governance, and application access management. Azure
AD can integrate with on-premises Active Directory to enable single sign-on for all cloud-based and locally hosted
on-premises applications. The user attributes of on-premises Active Directory can be automatically synchronized
to Azure AD.
A single global administrator isn't required to assign all permissions in a VDC implementation. Instead, each
specific department, group of users, or services in the Directory Service can have the permissions required to
manage their own resources within a VDC implementation. Structuring permissions requires balancing. Too many
permissions can impede performance efficiency, and too few or loose permissions can increase security risks.
Azure role-based access control (RBAC) helps address this problem by offering fine-grained access
management for resources in a VDC implementation.
Security infrastructure
Security infrastructure refers to the segregation of traffic in a VDC implementation's specific virtual network
segment. This infrastructure specifies how ingress and egress is controlled in a VDC implementation. Azure is
based on a multi-tenant architecture that prevents unauthorized and unintentional traffic between deployments by
using VNet isolation, access control lists (ACLs), load balancers, IP filters, and traffic flow policies. Network address
translation (NAT) separates internal network traffic from external traffic.
The Azure fabric allocates infrastructure resources to tenant workloads and manages communications to and from
virtual machines (VMs). The Azure hypervisor enforces memory and process separation between VMs and
securely routes network traffic to guest OS tenants.
Connectivity to the cloud
A VDC implementation requires connectivity to external networks to offer services to customers, partners, or
internal users. This need for connectivity refers not only to the Internet, but also to on-premises networks and
datacenters.
Customers control which services have access to, and are accessible from, the public internet. This access is
controlled by using Azure Firewall or other types of network virtual appliances (NVAs), custom routing policies by
using user-defined routes, and network filtering by using network security groups. We recommend that all
internet-facing resources also be protected by the Azure DDoS Protection Standard.
Enterprises may need to connect their VDC implementation to on-premises datacenters or other resources. This
connectivity between Azure and on-premises networks is a crucial aspect when designing an effective architecture.
Enterprises have two different ways to create this interconnection: transit over the Internet or via private direct
connections.
An Azure Site-to-Site VPN is an interconnection service between on-premises networks and a VDC
implementation in Azure. The link is established through secure encrypted connections (IPsec tunnels). Azure Site-
to-Site VPN connections are flexible, quick to create, and generally don't require any additional hardware
procurement. Based on industry standard protocols, most current network devices can create VPN connections to
Azure over the internet or existing connectivity paths.
ExpressRoute is an Azure connectivity service that enables private connections between a VDC implementation
and any on-premises networks. ExpressRoute connections don't go over the public Internet, and offer higher
security, reliability, and higher speeds (up to 100 Gbps) along with consistent latency. ExpressRoute is useful for
VDC implementations, as ExpressRoute customers can get the benefits of compliance rules associated with private
connections. With ExpressRoute Direct, you can connect directly to Microsoft routers at 10 or 100 Gbps.
Deploying ExpressRoute connections usually involves engaging with an ExpressRoute service provider
(ExpressRoute Direct being the exception). For customers that need to start quickly, it's common to initially use
Site-to-Site VPN to establish connectivity between a VDC implementation and on-premises resources. Once your
physical interconnection with your service provider is complete, then migrate connectivity over your ExpressRoute
connection.
For large numbers of VPN or ExpressRoute connections, Azure Virtual WAN is a networking service that
provides optimized and automated branch-to-branch connectivity through Azure. Virtual WAN lets you connect to
and configure branch devices to communicate with Azure. Connecting and configuring can be done either
manually, or by using preferred provider devices through a Virtual WAN partner. Using preferred provider devices
allows ease of use, simplification of connectivity, and configuration management. The Azure WAN built-in
dashboard provides instant troubleshooting insights that can help save you time, and gives you an easy way to
view large-scale Site-to-Site connectivity. Virtual WAN also provides security services with an optional Azure
Firewall and Firewall Manager in your WAN hub.
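A minimal Terraform sketch of a Site-to-Site VPN between an on-premises network and a hub VNet is shown below. The resource group, region, address spaces, gateway SKU, and the on-premises device address are placeholder assumptions; the shared key and GatewaySubnet ID are supplied as variables.

variable "gateway_subnet_id" { type = string }   # ID of the hub VNet's GatewaySubnet
variable "vpn_shared_key"    { type = string }   # pre-shared key agreed with the on-premises device

resource "azurerm_public_ip" "vpn_gateway" {
  name                = "hub-vpn-gateway-pip"
  location            = "southeastasia"
  resource_group_name = "hub-network-rg"
  allocation_method   = "Dynamic"
}

resource "azurerm_virtual_network_gateway" "hub" {
  name                = "hub-vpn-gateway"
  location            = "southeastasia"
  resource_group_name = "hub-network-rg"
  type                = "Vpn"
  vpn_type            = "RouteBased"
  sku                 = "VpnGw1"

  ip_configuration {
    public_ip_address_id          = azurerm_public_ip.vpn_gateway.id
    private_ip_address_allocation = "Dynamic"
    subnet_id                     = var.gateway_subnet_id
  }
}

resource "azurerm_local_network_gateway" "onprem" {
  name                = "onprem-gateway"
  location            = "southeastasia"
  resource_group_name = "hub-network-rg"
  gateway_address     = "203.0.113.10"     # public IP of the on-premises VPN device (placeholder)
  address_space       = ["10.10.0.0/16"]   # on-premises address ranges (placeholder)
}

resource "azurerm_virtual_network_gateway_connection" "onprem_to_hub" {
  name                       = "onprem-to-hub"
  location                   = "southeastasia"
  resource_group_name        = "hub-network-rg"
  type                       = "IPsec"
  virtual_network_gateway_id = azurerm_virtual_network_gateway.hub.id
  local_network_gateway_id   = azurerm_local_network_gateway.onprem.id
  shared_key                 = var.vpn_shared_key
}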
Connectivity within the cloud
Virtual Networks (VNets) and VNet peering are the basic networking connectivity services inside a VDC
implementation. A VNet guarantees a natural boundary of isolation for VDC resources. VNet peering allows
intercommunication between different VNets within the same Azure region, across regions, or even between
VNets in different subscriptions. Inside a VNet and between VNets, traffic flows can be controlled by sets of
security rules specified in network security groups, firewall policies (Azure Firewall or network virtual appliances),
and custom user-defined routes.
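The following hedged Terraform sketch shows hub-and-spoke VNet peering created in both directions. The VNet and resource group names are placeholders, and the gateway-transit settings assume the hub hosts the VPN or ExpressRoute gateway.

variable "hub_vnet_id"   { type = string }   # placeholder hub VNet resource ID
variable "spoke_vnet_id" { type = string }   # placeholder spoke VNet resource ID

resource "azurerm_virtual_network_peering" "hub_to_spoke" {
  name                      = "hub-to-spoke1"
  resource_group_name       = "hub-network-rg"
  virtual_network_name      = "hub-vnet"
  remote_virtual_network_id = var.spoke_vnet_id
  allow_forwarded_traffic   = true
  allow_gateway_transit     = true    # lets the spoke use the hub's gateway
}

resource "azurerm_virtual_network_peering" "spoke_to_hub" {
  name                      = "spoke1-to-hub"
  resource_group_name       = "spoke1-network-rg"
  virtual_network_name      = "spoke1-vnet"
  remote_virtual_network_id = var.hub_vnet_id
  allow_forwarded_traffic   = true
  use_remote_gateways       = true    # routes spoke traffic to on-premises through the hub gateway
}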
VNets are also anchor points for integrating Platform as a Service (PaaS) Azure products like Azure Storage, Azure
SQL, and other integrated public services that have public endpoints. With Service Endpoints and Private Link, you
can integrate your public services with your private network. You can even take your public services private, but
still enjoy the benefits of Azure managed PaaS services.
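As a brief, hedged example, the following Terraform sketch enables service endpoints for Azure Storage and Azure SQL on a workload subnet so that traffic to those services stays on the Azure backbone. Names and the address prefix are placeholders, and older azurerm provider versions use address_prefix instead of address_prefixes.

resource "azurerm_subnet" "workload" {
  name                 = "workload-subnet"
  resource_group_name  = "spoke1-network-rg"
  virtual_network_name = "spoke1-vnet"
  address_prefixes     = ["10.1.1.0/24"]

  # Extend the subnet's identity to these PaaS services over a direct connection.
  service_endpoints = [
    "Microsoft.Storage",
    "Microsoft.Sql"
  ]
}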

Virtual datacenter overview


Topologies
The virtual datacenter can be built on four general, high-level topologies, based on your needs and scale.
The topologies, at the highest level, are:
Flat is a model in which all resources are deployed in a single virtual network. Subnets allow for flow control and
segregation.

Mesh is a model using VNet Peering to connect all virtual networks directly to each other.

VNet Peering Hub and spoke is a model for designing a network topology for distributed applications/teams and
delegation.

Azure Virtual WAN is a model for large-scale branch offices and global WAN services.
As shown above, two of the design types are hub and spoke (VNet Peering hub-and-spoke and Azure Virtual
WAN ). Hub and spoke designs are optimal for communication, shared resources, and centralized security policy.
Hubs are built either using a VNet Peering hub (Hub Virtual Network in the diagram) or a Virtual WAN Hub
(Azure Virtual WAN in the diagram). Virtual WAN is good for large-scale branch-to-branch and branch-to-Azure
communications, or if you opt to avoid the complexities of building all the components individually in a VNet
Peering Hub. In some cases, a VNet Peering Hub design is dictated by your requirements. An example of a
dictating requirement would be the need to use a network virtual appliance in the hub.
In both hub and spoke topologies, the hub is the central network zone that controls and inspects ingress or egress
traffic between different zones: internet, on-premises, and the spokes. The hub and spoke topology gives the IT
department an effective way to enforce security policies in a central location. It also reduces the potential for
misconfiguration and exposure.
The hub often contains the common service components consumed by the spokes. The following examples are
common central services:
The Windows Active Directory infrastructure, required for user authentication of third parties that access from
untrusted networks before they get access to the workloads in the spoke. It includes the related Active Directory
Federation Services (AD FS).
A Domain Name System (DNS) service to resolve naming for the workload in the spokes, to access
resources on-premises and on the internet if Azure DNS isn't used.
A public key infrastructure (PKI), to implement single sign-on on workloads.
Flow control of TCP and UDP traffic between the spoke network zones and the internet.
Flow control between the spokes and on-premises.
If needed, flow control between one spoke and another.
The VDC reduces overall cost by using the shared hub infrastructure between multiple spokes.
The role of each spoke can be to host different types of workloads. The spokes also provide a modular approach
for repeatable deployments of the same workloads. Examples are dev and test, user acceptance testing, pre-
production, and production. The spokes can also segregate and enable different groups within your organization.
An example is DevOps groups. Inside a spoke, it's possible to deploy a basic workload or complex multi-tier
workloads with traffic control between the tiers.
Subscription limits and multiple hubs

IMPORTANT
Based on the size of your Azure deployments, a multiple hub strategy may be needed. When designing your hub and spoke
strategy, ask "can this design scale to use another hub VNet in this region?", also, "can this design scale to accommodate
multiple regions?" It's far better to plan for a design that scales and not need it, than to fail to plan and need it.
When to scale to a secondary (or more) hub will depend on myriad factors, usually based on inherent limits on scale. Be sure
to review the Subscription, VNet, and VM limits when designing for scale.
In Azure, every component, whatever the type, is deployed in an Azure Subscription. The isolation of Azure
components in different Azure subscriptions can satisfy the requirements of different LOBs, such as setting up
differentiated levels of access and authorization.
A single VDC implementation can scale up to a large number of spokes although, as with every IT system, there are
platform limits. The hub deployment is bound to a specific Azure subscription, which has restrictions and limits (for
example, a maximum number of VNet peerings; see Azure subscription and service limits, quotas, and constraints
for details). In cases where limits may be an issue, the architecture can scale up further by extending the model
from a single hub-spokes to a cluster of hub and spokes. Multiple hubs in one or more Azure regions can be
connected using VNet Peering, ExpressRoute, Virtual WAN, or site-to-site VPN.

The introduction of multiple hubs increases the cost and management effort of the system. It's only justified by
scalability, system limits, redundancy, regional replication for end-user performance, or disaster recovery. In
scenarios requiring multiple hubs, all the hubs should strive to offer the same set of services for operational ease.
Interconnection between spokes
Inside a single spoke, or a flat network design, it's possible to implement complex multi-tier workloads. Multi-tier
configurations can be implemented using subnets, one for every tier or application, in the same VNet. Traffic
control and filtering are done using network security groups and user-defined routes.
An architect might want to deploy a multi-tier workload across multiple virtual networks. With virtual network
peering, spokes can connect to other spokes in the same hub or different hubs. A typical example of this scenario is
the case where application processing servers are in one spoke, or virtual network. The database deploys in a
different spoke, or virtual network. In this case, it's easy to interconnect the spokes with virtual network peering
and, by doing that, avoid transiting through the hub. A careful architecture and security review should be done to
ensure that bypassing the hub doesn't bypass important security or auditing points that might exist only in the
hub.
Spokes can also be interconnected to a spoke that acts as a hub. This approach creates a two-level hierarchy: the
spoke in the higher level (level 0) becomes the hub of lower spokes (level 1) of the hierarchy. The spokes of a VDC
implementation are required to forward the traffic to the central hub so that the traffic can transit to its destination
in either the on-premises network or the public internet. An architecture with two levels of hubs introduces
complex routing that removes the benefits of a simple hub-spoke relationship.
Although Azure allows complex topologies, one of the core principles of the VDC concept is repeatability and
simplicity. To minimize management effort, the simple hub-spoke design is the VDC reference architecture that we
recommend.
Components
The virtual datacenter is made up of four basic component types: Infrastructure, Perimeter Networks,
Workloads, and Monitoring.
Each component type consists of various Azure features and resources. Your VDC implementation is made up of
instances of multiple component types and multiple variations of the same component type. For instance, you
may have many different, logically separated workload instances that represent different applications. You use
these different component types and instances to ultimately build the VDC.

The preceding high-level conceptual architecture of the VDC shows different component types used in different
zones of the hub-spokes topology. The diagram shows infrastructure components in various parts of the
architecture.
As good practice in general, access rights and privileges should be group-based. Dealing with groups rather than
individual users eases maintenance of access policies, provides a consistent way to manage them across teams,
and helps minimize configuration errors. Assigning users to and removing users from appropriate groups helps
keep the privileges of a specific user up to date.
Each role group should have a unique prefix on their names. This prefix makes it easy to identify which group is
associated with which workload. For example, a workload hosting an authentication service might have groups
named AuthServiceNetOps, AuthServiceSecOps, AuthServiceDevOps, and AuthServiceInfraOps.
Centralized roles, or roles not related to a specific service, might be prefaced with Corp. An example is
CorpNetOps.
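A short, hedged sketch of this naming convention, using the azuread Terraform provider (2.x syntax) and placeholder names, might create the per-workload role groups in a loop:

locals {
  workload_prefix = "AuthService"                              # placeholder workload prefix
  role_suffixes   = ["NetOps", "SecOps", "DevOps", "InfraOps"]
}

resource "azuread_group" "workload_roles" {
  for_each         = toset(local.role_suffixes)
  display_name     = "${local.workload_prefix}${each.value}"   # e.g., AuthServiceNetOps
  security_enabled = true
}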
Many organizations use a variation of the following groups to provide a major breakdown of roles:
The central IT group, Corp, has the ownership rights to control infrastructure components. Examples are
networking and security. The group needs to have the role of contributor on the subscription, control of the hub,
and network contributor rights in the spokes. Large organizations frequently split up these management
responsibilities between multiple teams. Examples are a network operations CorpNetOps group with exclusive
focus on networking and a security operations CorpSecOps group responsible for the firewall and security
policy. In this specific case, two different groups need to be created for assignment of these custom roles.
The dev-test group, AppDevOps, has the responsibility to deploy app or service workloads. This group takes
the role of virtual machine contributor for IaaS deployments or one or more PaaS contributor roles. See Built-
in roles for Azure resources. Optionally, the dev/test team might need visibility on security policies (network
security groups) and routing policies (user-defined routes) inside the hub or a specific spoke. In addition to the
role of contributor for workloads, this group would also need the role of network reader.
The operation and maintenance group, CorpInfraOps or AppInfraOps, has the responsibility of managing
workloads in production. This group needs to be a subscription contributor on workloads in any production
subscriptions. Some organizations might also evaluate if they need an additional escalation support team group
with the role of subscription contributor in production and the central hub subscription. The additional group
fixes potential configuration issues in the production environment.
The VDC is designed so that groups created for the central IT group, managing the hub, have corresponding
groups at the workload level. In addition to managing hub resources only, the central IT group is able to control
external access and top-level permissions on the subscription. Workload groups are also able to control resources
and permissions of their VNet independently from central IT.
The VDC is partitioned to securely host multiple projects across different Lines-of-Business (LOBs). All projects
require different isolated environments (Dev, UAT, production). Separate Azure subscriptions for each of these
environments can provide natural isolation.
The preceding diagram shows the relationship between an organization's projects, users, and groups and the
environments where the Azure components are deployed.
Typically in IT, an environment (or tier) is a system in which multiple applications are deployed and executed. Large
enterprises use a development environment (where changes are made and tested) and a production environment
(what end-users use). Those environments are separated, often with several staging environments in between
them to allow phased deployment (rollout), testing, and rollback if problems arise. Deployment architectures vary
significantly, but usually the basic process of starting at development (DEV) and ending at production (PROD) is
still followed.
A common architecture for these types of multi-tier environments consists of DevOps for development and
testing, UAT for staging, and production environments. Organizations can leverage single or multiple Azure AD
tenants to define access and rights to these environments. The previous diagram shows a case where two different
Azure AD tenants are used: one for DevOps and UAT, and the other exclusively for production.
The presence of different Azure AD tenants enforces the separation between environments. The same group of
users, such as central IT, needs to authenticate by using a different URI to access a different Azure AD tenant to
modify the roles or permissions of either the DevOps or production environments of a project. The presence of
different user authentications to access different environments reduces possible outages and other issues caused
by human errors.
Component type: Infrastructure
This component type is where most of the supporting infrastructure resides. It's also where your centralized IT,
security, and compliance teams spend most of their time.

Infrastructure components provide an interconnection for the different components of a VDC implementation, and
are present in both the hub and the spokes. The responsibility for managing and maintaining the infrastructure
components is typically assigned to the central IT and/or security team.
One of the primary tasks of the IT infrastructure team is to guarantee the consistency of IP address schemas
across the enterprise. The private IP address space assigned to a VDC implementation must be consistent and
must not overlap with private IP addresses assigned on your on-premises networks.
While NAT on the on-premises edge routers or in Azure environments can avoid IP address conflicts, it adds
complications to your infrastructure components. Simplicity of management is one of the key goals of the VDC, so
using NAT to handle IP concerns, while a valid solution, isn't recommended.
Infrastructure components have the following functionality:
Identity and directory services. Access to every resource type in Azure is controlled by an identity stored in a
directory service. The directory service stores not only the list of users, but also the access rights to resources in
a specific Azure subscription. These services can exist cloud-only, or they can be synchronized with on-premises
identity stored in Active Directory.
Virtual Network. Virtual Networks are one of the main components of the VDC, and enable you to create a traffic
isolation boundary on the Azure platform. A Virtual Network is composed of a single or multiple virtual
network segments, each with a specific IP network prefix (a subnet, either IPv4 or dual stack IPv4/IPv6). The
Virtual Network defines an internal perimeter area where IaaS virtual machines and PaaS services can establish
private communications. VMs (and PaaS services) in one virtual network can't communicate directly to VMs
(and PaaS services) in a different virtual network, even if both virtual networks are created by the same
customer, under the same subscription. Isolation is a critical property that ensures customer VMs and
communication remains private within a virtual network. Where cross-VNet connectivity is desired, the
following features describe how that can be accomplished.
VNet Peering. The fundamental feature used to create the infrastructure of the VDC is VNet Peering, a
mechanism that connects two virtual networks (VNets) in the same region through the Azure datacenter
network, or using the Azure world-wide backbone across regions.
Service Endpoints. Virtual Network (VNet) service endpoints extend your virtual network private address
space to include your PaaS space. The endpoints also extend the identity of your VNet to the Azure services
over a direct connection. Endpoints allow you to secure your critical Azure service resources to only your virtual
networks.
Private Link. Azure Private Link enables you to access Azure PaaS Services (for example, Azure Storage, Azure
Cosmos DB, and Azure SQL Database) and Azure hosted customer/partner services over a Private Endpoint in
your virtual network. Traffic between your virtual network and the service traverses over the Microsoft
backbone network, eliminating exposure from the public Internet. You can also create your own Private Link
Service in your virtual network (VNet) and deliver it privately to your customers. The setup and consumption
experience using Azure Private Link is consistent across Azure PaaS, customer-owned, and shared partner
services.
User-defined routes. Traffic in a virtual network is routed by default based on the system routing table. A
user-defined route is a custom routing table that network administrators can associate to one or more subnets
to override the behavior of the system routing table and define a communication path within a virtual network.
The presence of user-defined routes guarantees that egress traffic from the spoke transits through specific
custom VMs or network virtual appliances and load balancers present in both the hub and the spokes.
Network security groups. A network security group is a list of security rules that act as traffic filters on IP
sources, IP destinations, protocols, IP source ports, and IP destination ports (also called a layer-4 five-tuple).
The network security group can be applied to a subnet, to a virtual NIC associated with an Azure VM, or to both.
Network security groups are essential to implement correct flow control in the hub and in the spokes; a
minimal sketch of this five-tuple evaluation follows this list. The level of security afforded by the network
security group is a function of which ports you open, and for what purpose. Customers should apply additional
per-VM filters with host-based firewalls such as iptables or Windows Firewall.
DNS. The name resolution of resources in the VNets of a VDC implementation is provided through DNS.
Azure provides DNS services for both Public and Private name resolution. Private zones provide name
resolution both within a virtual network and across virtual networks. You can have private zones not only span
across virtual networks in the same region, but also across regions and subscriptions. For public resolution,
Azure DNS provides a hosting service for DNS domains, providing name resolution using Microsoft Azure
infrastructure. By hosting your domains in Azure, you can manage your DNS records using the same
credentials, APIs, tools, and billing as your other Azure services.
Management group, Subscription, and Resource Group management. A subscription defines a natural
boundary to create multiple groups of resources in Azure. This separation can be for function, role segregation,
or billing. Resources in a subscription are assembled together in logical containers known as resource groups.
The resource group represents a logical group to organize the resources of a VDC implementation. If your
organization has many subscriptions, you may need a way to efficiently manage access, policies, and
compliance for those subscriptions. Azure management groups provide a level of scope above subscriptions.
You organize subscriptions into containers called "management groups" and apply your governance conditions
to the management groups. All subscriptions within a management group automatically inherit the conditions
applied to the management group. To see these three features in a hierarchy view, read the Cloud Adoption
Framework page, Organizing your resources.
Role-based access control (RBAC). Through RBAC, you can map organizational roles and rights to access
specific Azure resources, which allows you to restrict users to only a certain subset of actions. If
you're using Azure Active Directory synchronized with an on-premises Active Directory, you can use the same
AD groups in Azure that you use on-premises. With RBAC, you can grant access by assigning the appropriate role
to users, groups, and applications within the relevant scope. The scope of a role assignment can be an Azure
subscription, a resource group, or a single resource. RBAC allows inheritance of permissions: a role assigned at
a parent scope also grants access to the children contained within it. Using RBAC, you can segregate duties and
grant users only the amount of access they need to perform their jobs. For example, use RBAC to let one
employee manage virtual machines in a subscription while another manages SQL databases within the same
subscription.
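As referenced in the network security group description above, the following minimal Python sketch illustrates layer-4 five-tuple evaluation in the style of an NSG. The rules, priorities, and sample packet are hypothetical, and source-port matching is omitted for brevity.

```python
# Minimal sketch of NSG-style rule evaluation: rules are checked in priority
# order and the first match decides the outcome. All values are hypothetical.
import ipaddress

rules = [
    {"priority": 100, "src": "10.2.0.0/16", "dst": "10.1.1.0/24",
     "protocol": "tcp", "dst_port": 443, "action": "Allow"},
    {"priority": 4096, "src": "0.0.0.0/0", "dst": "0.0.0.0/0",
     "protocol": "any", "dst_port": None, "action": "Deny"},
]

def evaluate(packet):
    for rule in sorted(rules, key=lambda r: r["priority"]):
        if (ipaddress.ip_address(packet["src"]) in ipaddress.ip_network(rule["src"])
                and ipaddress.ip_address(packet["dst"]) in ipaddress.ip_network(rule["dst"])
                and rule["protocol"] in ("any", packet["protocol"])
                and rule["dst_port"] in (None, packet["dst_port"])):
            return rule["action"]
    return "Deny"  # implicit deny if nothing matches

print(evaluate({"src": "10.2.4.7", "dst": "10.1.1.10", "protocol": "tcp", "dst_port": 443}))  # Allow
```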
Component type: Perimeter networks
Perimeter network (sometimes called a DMZ network) components enable network connectivity between your
cloud networks and your on-premises or physical datacenter networks, along with any connectivity to and from
the internet. It's also where your network and security teams likely spend most of their time.
Incoming packets should flow through the security appliances in the hub before reaching the back-end servers and
services in the spokes. Examples are the firewall, IDS, and IPS. Before they leave the network, internet-bound
packets from the workloads should also flow through the security appliances in the perimeter network. The
purposes of this flow are policy enforcement, inspection, and auditing.
Perimeter network components include the following features:
Virtual networks, user-defined routes, and network security groups
Network virtual appliances
Azure Load Balancer
Azure Application Gateway with web application firewall (WAF )
Public IPs
Azure Front Door with web application firewall (WAF )
Azure Firewall and Azure Firewall Manager
Standard DDoS Protection
Usually, the central IT and security teams have responsibility for requirement definition and operation of the
perimeter networks.
The preceding diagram shows the enforcement of two perimeters with access to the internet and an on-premises
network, both resident in the DMZ hub. In the DMZ hub, the perimeter network to internet can scale up to support
large numbers of LOBs, using multiple farms of Web Application Firewalls (WAFs) and/or Azure Firewalls. The
hub also allows for on-premises connectivity via VPN or ExpressRoute as needed.

NOTE
In the preceding diagram, in the "DMZ Hub", many of the following features can be bundled together in an Azure Virtual
WAN hub (for instance, VNets, UDRs, NSGs, VPN gateways, ExpressRoute gateways, Azure Load Balancers, Azure Firewall,
Firewall Manager, and DDoS). Using Virtual WAN hubs can make the creation of the hub VNet, and thus the VDC, much easier,
since most of the engineering complexity is handled for you by Azure when you deploy an Azure Virtual WAN hub.

Virtual networks. The hub is typically built on a virtual network with multiple subnets to host the different types
of services that filter and inspect traffic to or from the internet via Azure Firewall, NVAs, WAF, and Azure
Application Gateway instances.
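The following minimal Python sketch illustrates one way a hub address space could be carved into those subnets. The /22 address space, the /24 subnet size, and the two custom subnet names are assumptions for illustration; GatewaySubnet and AzureFirewallSubnet are the subnet names Azure expects for the virtual network gateway and Azure Firewall.

```python
# Minimal sketch: split an assumed hub VNet address space into per-service subnets.
import ipaddress

hub_space = ipaddress.ip_network("10.1.0.0/22")  # assumed hub VNet address space
roles = ["GatewaySubnet", "AzureFirewallSubnet", "snet-nva", "snet-appgw"]

# Split the /22 into four /24 subnets and assign one per role.
for role, subnet in zip(roles, hub_space.subnets(new_prefix=24)):
    print(f"{role:<22} {subnet}")
```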
User-defined routes. Using user-defined routes, customers can deploy firewalls, IDS/IPS, and other virtual
appliances, and route network traffic through these security appliances for security boundary policy enforcement,
auditing, and inspection. User-defined routes can be created in both the hub and the spokes to guarantee that
traffic transits through the specific custom VMs, Network Virtual Appliances, and load balancers used by a VDC
implementation. To guarantee that traffic generated from virtual machines residing in the spoke transits to the
correct virtual appliances, a user-defined route needs to be set in the subnets of the spoke by setting the front-end
IP address of the internal load balancer as the next-hop. The internal load balancer distributes the internal traffic to
the virtual appliances (load balancer back-end pool).
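The following minimal Python sketch illustrates the routing behavior just described: the most specific matching prefix wins, and a user-defined route pointing at the internal load balancer's front-end IP overrides the system route for the same prefix. All addresses are assumptions.

```python
# Minimal sketch of route selection with user-defined routes (UDRs) in a spoke subnet.
import ipaddress

routes = [
    {"prefix": "0.0.0.0/0",  "next_hop": "Internet",  "source": "system"},
    {"prefix": "10.0.0.0/8", "next_hop": "10.1.2.4",  "source": "UDR"},  # ILB fronting the NVAs (assumed address)
    {"prefix": "0.0.0.0/0",  "next_hop": "10.1.2.4",  "source": "UDR"},  # force internet-bound traffic through the firewall
]

def next_hop(destination):
    dest = ipaddress.ip_address(destination)
    matches = [r for r in routes if dest in ipaddress.ip_network(r["prefix"])]
    # Longest prefix wins; a UDR overrides a system route for the same prefix.
    return max(matches, key=lambda r: (ipaddress.ip_network(r["prefix"]).prefixlen, r["source"] == "UDR"))

print(next_hop("10.0.5.20")["next_hop"])   # on-premises bound -> 10.1.2.4 (the firewall ILB)
print(next_hop("52.168.0.1")["next_hop"])  # internet bound -> 10.1.2.4 via the 0.0.0.0/0 UDR
```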
Azure Firewall is a managed, cloud-based network security service that protects your Azure Virtual Network
resources. It's a stateful firewall as a service with built-in high availability and cloud scalability. You can centrally
create, enforce, and log application and network connectivity policies across subscriptions and virtual networks.
Azure Firewall uses a static public IP address for your virtual network resources. It allows outside firewalls to
identify traffic that originates from your virtual network. The service is fully integrated with Azure Monitor for
logging and analytics.
If you use the vWAN topology, Azure Firewall Manager is a security management service that provides
central security policy and route management for cloud-based security perimeters. It works with Azure Virtual
WAN Hub, a Microsoft-managed resource that lets you easily create hub and spoke architectures. When security
and routing policies are associated with such a hub, it's referred to as a secured virtual hub.
Network virtual appliances. In the hub, the perimeter network with access to the internet is normally managed
through an Azure Firewall instance or a farm of firewalls or web application firewall (WAF ).
Different LOBs commonly use many web applications. These applications tend to suffer from various
vulnerabilities and potential exploits. Web application firewalls are a special type of product used to detect attacks
against web applications, HTTP/HTTPS, in more depth than a generic firewall. Compared with traditional firewall
technology, WAFs have a set of specific features to protect internal web servers from threats.
An Azure Firewall or NVA firewall both use a common administration plane, with a set of security rules to protect
the workloads hosted in the spokes, and control access to on-premises networks. The Azure Firewall has scalability
built in, whereas NVA firewalls can be manually scaled behind a load balancer. Generally, a firewall farm has less
specialized software compared with a WAF, but has a broader application scope to filter and inspect any type of
traffic in egress and ingress. If an NVA approach is used, they can be found and deployed from the Azure
marketplace.
We recommend that you use one set of Azure Firewall instances, or NVAs, for traffic originating on the internet.
Use another for traffic originating on-premises. Using only one set of firewalls for both is a security risk as it
provides no security perimeter between the two sets of network traffic. Using separate firewall layers reduces the
complexity of checking security rules and makes it clear which rules correspond to which incoming network
request.
Azure Load Balancer offers a high availability Layer 4 (TCP, UDP ) service, which can distribute incoming traffic
among service instances defined in a load-balanced set. Traffic sent to the load balancer from front-end endpoints
(public IP endpoints or private IP endpoints) can be redistributed with or without address translation to a set of
back-end IP address pool (examples are Network Virtual Appliances or VMs).
Azure Load Balancer can probe the health of the various server instances as well, and when an instance fails to
respond to a probe, the load balancer stops sending traffic to the unhealthy instance. In the VDC, an external load
balancer is deployed to the hub and the spokes. In the hub, the load balancer is used to efficiently route traffic
across firewall instances, and in the spokes, load balancers are used to manage application traffic.
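A minimal sketch of that probe-driven behavior follows, with hypothetical back-end addresses and probe results: unhealthy firewall instances are removed from rotation and new flows are hashed across the remaining healthy instances.

```python
# Minimal sketch: only back-end instances that answer health probes receive new flows.
import hashlib

backend_pool = {"10.1.2.5": True, "10.1.2.6": False, "10.1.2.7": True}  # address -> last probe healthy?

def pick_backend(five_tuple):
    healthy = sorted(addr for addr, ok in backend_pool.items() if ok)
    if not healthy:
        raise RuntimeError("no healthy back-end instances")
    digest = hashlib.sha256("|".join(map(str, five_tuple)).encode()).hexdigest()
    return healthy[int(digest, 16) % len(healthy)]

flow = ("10.2.4.7", 50123, "10.1.2.100", 443, "tcp")  # src IP, src port, dst IP, dst port, protocol
print(pick_backend(flow))  # 10.1.2.6 is skipped because its probe failed
```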
Azure Front Door (AFD ) is Microsoft's highly available and scalable Web Application Acceleration Platform,
Global HTTP Load Balancer, Application Protection, and Content Delivery Network. Running in more than 100
locations at the edge of Microsoft's Global Network, AFD enables you to build, operate, and scale out your
dynamic web application and static content. AFD provides your application with world-class end-user
performance, unified regional/stamp maintenance automation, BCDR automation, unified client/user information,
caching, and service insights. The platform offers performance, reliability and support SLAs, compliance
certifications and auditable security practices developed, operated, and supported natively by Azure. A web
application firewall (WAF ) is also provided as part of the Front Door WAF SKU. This SKU provides protection to
web applications from common web vulnerabilities and exploits.
Application Gateway. Microsoft Azure Application Gateway is a dedicated virtual appliance providing application
delivery controller (ADC ) as a service, offering various layer 7 load-balancing capabilities for your application. It
allows you to optimize web farm productivity by offloading CPU intensive SSL termination to the application
gateway. It also provides other layer 7 routing capabilities including round robin distribution of incoming traffic,
cookie-based session affinity, URL path-based routing, and the ability to host multiple websites behind a single
Application Gateway. A web application firewall (WAF ) is also provided as part of the application gateway WAF
SKU. This SKU provides protection to web applications from common web vulnerabilities and exploits. Application
Gateway can be configured as an internet-facing gateway, an internal-only gateway, or a combination of both.
Public IPs. With some Azure features, you can associate service endpoints to a public IP address so that your
resource is accessible from the internet. This endpoint uses network address translation (NAT) to route traffic to
the internal address and port on the Azure virtual network. This path is the primary way for external traffic to pass
into the virtual network. You can configure public IP addresses to determine which traffic is passed in and how and
where it's translated onto the virtual network.
Azure DDoS Protection Standard provides additional mitigation capabilities over the Basic service tier that are
tuned specifically to Azure Virtual Network resources. DDoS Protection Standard is simple to enable and requires
no application changes. Protection policies are tuned through dedicated traffic monitoring and machine learning
algorithms. Policies are applied to public IP addresses associated to resources deployed in virtual networks.
Examples are Azure Load Balancer, Azure Application Gateway, and Azure Service Fabric instances. Near real-time,
system-generated logs are available through Azure Monitor views during an attack and for history. Application
layer protection can be added through the Azure Application Gateway web application firewall. Protection is
provided for IPv4 and IPv6 Azure public IP addresses.
At a detailed level, the hub and spoke topology uses VNet peering and UDRs to route traffic properly.

In the diagram, the UDR ensures traffic flows from the spoke to the firewall before transiting to on-premises
through the ExpressRoute gateway (assuming the firewall policy allows that flow ).
Component type: Monitoring
Monitoring components provide visibility and alerting from all the other component types. All teams should have
access to monitoring for the components and services they have access to. If you have a centralized help desk or
operations teams, they require integrated access to the data provided by these components.
Azure offers different types of logging and monitoring services to track the behavior of Azure-hosted resources.
Governance and control of workloads in Azure is based not just on collecting log data but also on the ability to
trigger actions based on specific reported events.
Azure Monitor. Azure includes multiple services that individually perform a specific role or task in the monitoring
space. Together, these services deliver a comprehensive solution for collecting, analyzing, and acting on system-
generated logs from your applications and the Azure resources that support them. They can also work to monitor
critical on-premises resources in order to provide a hybrid monitoring environment. Understanding the tools and
data that are available is the first step in developing a complete monitoring strategy for your applications.
There are two fundamental types of logs in Azure Monitor:
Metrics are numerical values that describe some aspect of a system at a particular point in time. They are
lightweight and capable of supporting near real-time scenarios. For many Azure resources, you'll see data
collected by Azure Monitor right in their Overview page in the Azure portal. As an example, look at any
virtual machine and you'll see several charts displaying performance metrics. Click on any of the graphs to
open the data in metrics explorer in the Azure portal, which allows you to chart the values of multiple
metrics over time. You can view the charts interactively or pin them to a dashboard to view them with other
visualizations.
Logs contain different kinds of data organized into records with different sets of properties for each type.
Telemetry such as events and traces are stored as logs in addition to performance data so that it can all be
combined for analysis. Log data collected by Azure Monitor can be analyzed with queries to quickly retrieve,
consolidate, and analyze collected data. Logs are stored and queried from Log Analytics. You can create and
test queries using Log Analytics in the Azure portal and then either directly analyze the data using these
tools or save queries for use with visualizations or alert rules.

Azure Monitor can collect data from a variety of sources. You can think of monitoring data for your applications in
tiers ranging from your application, any operating system, and the services it relies on, down to the Azure platform
itself. Azure Monitor collects data from each of the following tiers:
Application monitoring data: Data about the performance and functionality of the code you have written,
regardless of its platform.
Guest OS monitoring data: Data about the operating system on which your application is running. This OS
could be running in Azure, another cloud, or on-premises.
Azure resource monitoring data: Data about the operation of an Azure resource.
Azure subscription monitoring data: Data about the operation and management of an Azure subscription, as
well as data about the health and operation of Azure itself.
Azure tenant monitoring data: Data about the operation of tenant-level Azure services, such as Azure Active
Directory.
Custom sources: Logs sent from on-premises sources can be included as well; examples are on-premises
server events or network device syslog output.
Monitoring data is only useful if it can increase your visibility into the operation of your computing environment.
Azure Monitor includes several features and tools that provide valuable insights into your applications and other
resources that they depend on. Monitoring solutions and features such as Application Insights and Azure Monitor
for containers provide deep insights into different aspects of your application and specific Azure services.
Monitoring solutions in Azure Monitor are packaged sets of logic that provide insights for a particular application
or service. They include logic for collecting monitoring data for the application or service, queries to analyze that
data, and views for visualization. Monitoring solutions are available from Microsoft and partners to provide
monitoring for various Azure services and other applications.
With all of this rich data collected, it's important to take proactive action on events happening in your environment
where manual queries alone won't suffice. Alerts in Azure Monitor proactively notify you of critical conditions and
potentially attempt to take corrective action. Alert rules based on metrics provide near real-time alerting based on
numeric values, while rules based on logs allow for complex logic across data from multiple sources. Alert rules in
Azure Monitor use action groups, which contain unique sets of recipients and actions that can be shared across
multiple rules. Based on your requirements, action groups can perform such actions as using webhooks to have
alerts start external actions or to integrate with your ITSM tools.
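To make that alerting flow concrete, here is a minimal Python sketch of a metric alert rule fanning out to an action group. The threshold, recipients, and webhook URL are hypothetical, and this is only an illustration of the pattern, not the Azure Monitor API.

```python
# Minimal sketch: fire an alert when an aggregated metric crosses a threshold,
# then notify every target in a shared action group. All values are hypothetical.
from statistics import mean

action_group = {
    "email": ["ops-team@contoso.example"],
    "webhook": ["https://itsm.contoso.example/hooks/new-incident"],
}

def evaluate_metric_rule(samples, threshold, window=5):
    """Fire when the average of the last `window` samples exceeds the threshold."""
    return mean(samples[-window:]) > threshold

cpu_percent = [62, 71, 88, 93, 95, 97, 96]  # assumed metric samples
if evaluate_metric_rule(cpu_percent, threshold=90):
    for channel, targets in action_group.items():
        for target in targets:
            print(f"ALERT via {channel}: notify {target} (average CPU over threshold)")
```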
Azure Monitor also allows the creation of custom dashboards. Azure dashboards allow you to combine different
kinds of data, including both metrics and logs, into a single pane in the Azure portal. You can optionally share the
dashboard with other Azure users. Elements throughout Azure Monitor can be added to an Azure dashboard in
addition to the output of any log query or metrics chart. For example, you could create a dashboard that combines
tiles that show a graph of metrics, a table of activity logs, a usage chart from Application Insights, and the output of
a log query.
Finally, Azure Monitor data is a native source for Power BI. Power BI is a business analytics service that provides
interactive visualizations across a variety of data sources and is an effective means of making data available to
others within and outside your organization. You can configure Power BI to automatically import log data from
Azure Monitor to take advantage of these additional visualizations.
Azure Network Watcher provides tools to monitor, diagnose, and view metrics and enable or disable logs for
resources in an Azure virtual network. It's a multifaceted service that allows the following functionalities and more:
Monitor communication between a virtual machine and an endpoint.
View resources in a virtual network and their relationships.
Diagnose network traffic filtering problems to or from a VM.
Diagnose network routing problems from a VM.
Diagnose outbound connections from a VM.
Capture packets to and from a VM.
Diagnose problems with an Azure virtual network gateway and connections.
Determine relative latencies between Azure regions and internet service providers.
View security rules for a network interface.
View network metrics.
Analyze traffic to or from a network security group.
View diagnostic logs for network resources.
Component type: Workloads
Workload components are where your actual applications and services reside. It's where your application
development teams spend most of their time.
The workload possibilities are endless. The following are just a few of the possible workload types:
Internal LOB Applications: Line-of-business applications are computer applications critical to the ongoing
operation of an enterprise. LOB applications have some common characteristics:
Interactive by nature. Data is entered, and results or reports are returned.
Data driven - data intensive with frequent access to databases or other storage.
Integrated - offer integration with other systems within or outside the organization.
Customer facing web sites (Internet or Internal facing): Most applications that interact with the Internet are
web sites. Azure offers the capability to run a web site on an IaaS VM or from an Azure Web Apps site (PaaS ).
Azure Web Apps support integration with VNets that allow the deployment of the Web Apps in a spoke network
zone. Internal facing web sites don't need to expose a public internet endpoint because the resources are accessible
via private non-internet routable addresses from the private VNet.
Big Data/Analytics: When data needs to scale up to larger volumes, relational databases may not perform well
under the extreme load or unstructured nature of the data. Azure HDInsight is a managed, full-spectrum, open-
source analytics service in the cloud for enterprises. You can use open-source frameworks such as Hadoop, Apache
Spark, Apache Hive, LLAP, Apache Kafka, Apache Storm, R, and more. HDInsight supports deploying into a
location-based VNet, so a cluster can be deployed to a spoke of the VDC.
Events and Messaging: Azure Event Hubs is a hyperscale telemetry ingestion service that collects, transforms,
and stores millions of events. As a distributed streaming platform, it offers low latency and configurable time
retention, enabling you to ingest massive amounts of telemetry into Azure and read that data from multiple
applications. With Event Hubs, a single stream can support both real-time and batch-based pipelines.
You can implement a highly reliable cloud messaging service between applications and services through Azure
Service Bus. It offers asynchronous brokered messaging between client and server, structured first-in-first-out
(FIFO) messaging, and publish and subscribe capabilities.

These examples barely scratch the surface of the types of workloads you can create in Azure; everything from a
basic Web and SQL app to the latest in IoT, Big Data, Machine Learning, AI, and so much more.
Making the VDC highly available: multiple VDCs
So far, this article has focused on the design of a single VDC, describing the basic components and architectures
that contribute to resiliency. Azure features such as Azure load balancer, NVAs, availability zones, availability sets,
scale sets, along with other mechanisms contribute to a system that enables you to build solid SLA levels into your
production services.
However, because a single VDC is typically implemented within a single region, it may be vulnerable to any major
outage that affects that entire region. Customers that require high availability must protect the services through
deployments of the same project in two (or more) VDC implementations placed in different regions.
In addition to SLA concerns, there are several common scenarios where deploying multiple VDC implementations
makes sense:
Regional or global presence of your end users or partners.
Disaster recovery requirements.
A mechanism to divert traffic between datacenters for load or performance.
Regional/global presence
Azure datacenters are present in numerous regions worldwide. When selecting multiple Azure datacenters,
customers need to consider two related factors: geographical distances and latency. To offer the best user
experience, evaluate the geographical distance between each VDC implementation as well as the distance between
each VDC implementation and the end users.
The region in which VDC implementations are hosted must conform with regulatory requirements established by
any legal jurisdiction under which your organization operates.
Disaster recovery
The design of a disaster recovery plan depends on the types of workloads and the ability to synchronize state of
those workloads between different VDC implementations. Ideally, most customers desire a fast fail-over
mechanism, and this requirement may need application data synchronization between deployments running in
multiple VDC implementations. However, when designing disaster recovery plans, it's important to consider that
most applications are sensitive to the latency that can be caused by this data synchronization.
Synchronization and heartbeat monitoring of applications in different VDC implementations requires them to
communicate over the network. Multiple VDC implementations in different regions can be connected through:
Hub-to-Hub communication automatically built into Azure Virtual WAN Hubs across regions in the same
virtual WAN.
VNet Peering - VNet Peering can connect hubs across regions.
ExpressRoute private peering when the hubs in each VDC implementation are connected to the same
ExpressRoute circuit.
Multiple ExpressRoute circuits connected via your corporate backbone and your multiple VDC implementations
connected to the ExpressRoute circuits.
Site-to-Site VPN connections between the hub zone of your VDC implementations in each Azure Region.
Typically, vWAN hubs, VNet peering, or ExpressRoute connections are the preferred types of network connectivity,
due to their higher bandwidth and consistent latency levels when transiting through the Microsoft backbone.
We recommend that customers run network qualification tests to verify the latency and bandwidth of these
connections, and decide whether synchronous or asynchronous data replication is appropriate based on the result.
It's also important to weigh these results in view of the optimal recovery time objective (RTO ).
Disaster recovery: diverting traffic from one region to another
Both Azure Traffic Manager and Azure Front Door periodically check the service health of listening endpoints in
different VDC implementations and, if those endpoints fail, route automatically to the next closest VDC. Traffic
Manager uses real-time user measurements and DNS to route users to the closest VDC (or to the next closest during
a failure). Azure Front Door is a reverse proxy at over 100 Microsoft backbone edge sites, using anycast to route
users to the closest listening endpoint.
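A minimal Python sketch of that failover pattern follows, with hypothetical regions, latencies, and health states: users are sent to the closest VDC endpoint that is still passing health probes.

```python
# Minimal sketch: periodically probed endpoints, with users routed to the
# closest healthy VDC deployment. All values are hypothetical.
endpoints = [
    {"region": "westus",     "latency_ms": 18,  "healthy": False},  # primary VDC, currently failing probes
    {"region": "eastus2",    "latency_ms": 47,  "healthy": True},   # secondary VDC
    {"region": "westeurope", "latency_ms": 130, "healthy": True},
]

def route_user():
    healthy = [e for e in endpoints if e["healthy"]]
    if not healthy:
        raise RuntimeError("no healthy VDC endpoints")
    return min(healthy, key=lambda e: e["latency_ms"])

print(route_user()["region"])  # eastus2: the next closest healthy deployment
```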
Summary
The virtual datacenter is an approach to datacenter migration that creates a scalable architecture in Azure,
maximizing cloud resource use, reducing costs, and simplifying system governance. The VDC is most often based on
hub and spoke network topologies (using either VNet peering or Virtual WAN hubs), providing common shared
services in the hub and allowing specific applications and workloads in the spokes. The VDC also matches the
structure of company roles, where different departments such as central IT, DevOps, and operations and
maintenance all work together while performing their specific roles. The VDC satisfies the requirements for a "lift
and shift" migration, but also provides many advantages to native cloud deployments.

References
The following features were discussed in this document. Follow the links to learn more.
NETWORK FEATURES: Azure Virtual Networks, Network Security Groups, Service Endpoints, Private Link, User-Defined Routes, Network Virtual Appliances, Public IP Addresses, Azure DNS
LOAD BALANCING: Azure Front Door, Azure Load Balancer (L4), Application Gateway (L7), Azure Traffic Manager
CONNECTIVITY: VNet Peering, Virtual Private Network, Virtual WAN, ExpressRoute, ExpressRoute Direct
IDENTITY: Azure Active Directory, Multi-Factor Authentication, Role-Based Access Controls, Default Azure AD Roles
MONITORING: Network Watcher, Azure Monitor, Log Analytics
BEST PRACTICES: Management Group, Subscription Management, Resource Group Management, Azure Subscription Limits
SECURITY: Azure Firewall, Firewall Manager, Application Gateway WAF, Front Door WAF, Azure DDoS
OTHER AZURE SERVICES: Azure Storage, Azure SQL, Azure Web Apps, Cosmos DB, HDInsight, Event Hubs, Service Bus, Azure IoT, Azure Machine Learning

Next Steps
Explore VNet Peering, the underpinning technology for VDC hub and spoke designs
Implement Azure AD to get started with RBAC exploration
Develop a Subscription and Resource management model and RBAC model to meet the structure,
requirements, and policies of your organization. The most important activity is planning. As much as practical,
analyze how reorganizations, mergers, new product lines, and other considerations will affect your initial
models to ensure you can scale to meet future needs and growth.
Best practices for Azure readiness

A large part of cloud readiness is equipping staff with the technical skills needed to begin a cloud adoption effort
and prepare your migration target environment for the assets and workloads you'll move to the cloud. The
following topics provide best practices and additional guidance to help your team establish and prepare your
Azure environment.

Azure fundamentals
Use the following guidance when organizing and deploying your assets in the Azure environment:
Azure fundamental concepts. Learn fundamental concepts and terms used in Azure. Also learn how these
concepts relate to one another.
Recommended naming and tagging conventions. Review detailed recommendations for naming and tagging
your resources. These recommendations support enterprise cloud adoption efforts.
Scaling with multiple Azure subscriptions. Understand strategies for scaling with multiple Azure subscriptions.
Organize your resources with Azure management groups. Learn how Azure management groups can manage
resources, roles, policies, and deployment across multiple subscriptions.
Create hybrid cloud consistency. Create hybrid cloud solutions that provide the benefits of cloud innovation
while maintaining many of the conveniences of on-premises management.

Networking
Use the following guidance to prepare your cloud networking infrastructure to support your workloads:
Networking decisions. Choose the networking services, tools, and architectures that will support your
organization's workload, governance, and connectivity requirements.
Virtual network planning. Learn to plan virtual networks based on your isolation, connectivity, and location
requirements.
Best practices for network security. Learn best practices for addressing common network security issues by
using built-in Azure capabilities.
Perimeter networks. Also known as demilitarized zones (DMZs), perimeter networks enable secure connectivity
between your cloud networks and your on-premises or physical datacenter networks, along with any
connectivity to and from the internet.
Hub and spoke network topology. Hub and spoke is a networking model for efficient management of common
communication or security requirements for complicated workloads. It also addresses potential Azure
subscription limitations.

Identity and access control


Use the following guidance when designing your identity and access control infrastructure to improve the security
and management efficiency of your workloads:
Azure identity management and access control security best practices. Learn best practices for identity
management and access control using built-in Azure capabilities.
Best practices for role-based access control. Azure role-based access control (RBAC ) offers fine-grained group-
based access management for resources organized around user roles.
Securing privileged access for hybrid and cloud deployments in Azure Active Directory. Use Azure Active
Directory to help ensure that your organization's administrative access and admin accounts are secure across
your cloud and on-premises environment.

Storage
Azure Storage guidance. Select the right Azure Storage solution to support your usage scenarios.
Azure Storage security guide. Learn about security features in Azure Storage.

Databases
Choose the correct SQL Server option in Azure. Choose the PaaS or IaaS solution that best supports your SQL
Server workloads.
Database security best practices. Learn best practices for database security on the Azure platform.
Choose the right data store. Selecting the right data store for your requirements is a key design decision. There
are literally hundreds of implementations to choose from among SQL and NoSQL databases. Data stores are
often categorized by how they structure data and the types of operations they support. This article describes
several of the most common storage models.

Cost management
Tracking costs across business units, environments, and projects. Learn best practices for creating proper cost-
tracking mechanisms.
How to optimize your cloud investment with Azure Cost Management. Implement a strategy for cost
management and learn about the tools available for addressing cost challenges.
Create and manage budgets. Learn to create and manage budgets by using Azure Cost Management.
Export cost data. Learn to create and manage exported data in Azure Cost Management.
Optimize costs based on recommendations. Learn to identify underutilized resources and take action to reduce
costs by using Azure Cost Management and Azure Advisor.
Use cost alerts to monitor usage and spending. Learn to use Cost Management alerts to monitor your Azure
usage and spending.
Scale with multiple Azure subscriptions

Organizations often need more than one Azure subscription as a result of resource limits and other governance
considerations. Having a strategy for scaling your subscriptions is important.

Production and nonproduction workloads


When deploying your first production workload in Azure, you should start with two subscriptions: one for your
production environment and one for your nonproduction (dev/test) environment.

We recommend this approach for several reasons:


Azure has specific subscription offerings for dev/test workloads. These offerings provide discounted rates on
Azure services and licensing.
Your production and nonproduction environments will likely have different sets of Azure policies. Using
separate subscriptions makes it simple to apply each distinct policy set at the subscription level.
You might want certain types of Azure resources in a dev/test subscription for testing. With a separate
subscription, you can use those resource types without making them available in your production
environment.
You can use dev/test subscriptions as isolated sandbox environments. Such sandboxes allow admins and
developers to rapidly build up and tear down entire sets of Azure resources. This isolation can also help with
data protection and security concerns.
Acceptable cost thresholds will likely vary between production and dev/test subscriptions.

Other reasons for multiple subscriptions


Other situations might require additional subscriptions. Keep the following in mind as you expand your cloud
estate.
Subscriptions have different limits for different resource types. For example, the number of virtual
networks in a subscription is limited. When a subscription approaches any of its limits, you'll need to
create another subscription and put new resources there.
For more information, see Azure subscription and service limits, quotas, and constraints.
Each subscription can implement its own policies for deployable resource types and supported regions.
Subscriptions in public cloud regions and sovereign or government cloud regions have different
limitations. These are often driven by different data-classification levels between environments.
If you completely segregate different sets of users for security or compliance reasons, you might require
separate subscriptions. For example, national government organizations might need to limit a
subscription's access to citizens only.
Different subscriptions might have different types of offerings, each with its own terms and benefits.
Trust issues might exist between the owners of a subscription and the owner of resources to be deployed.
Using another subscription with different ownership can mitigate these issues; however, you must also
consider the networking and data protection issues that can arise in such a deployment.
Rigid financial or geopolitical controls might require separate financial arrangements for specific
subscriptions. These concerns might include considerations of data sovereignty, companies with multiple
subsidiaries, or separate accounting and billing for business units in different countries and different
currencies.
Azure resources created using the classic deployment model should be isolated in their own
subscription. The security for classic resources differs from that of resources deployed via Azure
Resource Manager. Azure policies can't be applied to classic resources.
Service admins using classic resources have the same permissions as role-based access control (RBAC )
owners of a subscription. It's difficult to sufficiently narrow these service admins' access in a subscription
that mixes classic resources and Resource Manager resources.
You might also opt to create additional subscriptions for other business or technical reasons specific to your
organization. There might be some additional costs for data ingress and egress between subscriptions.
You can move many types of resources from one subscription to another or use automated deployments to
migrate resources to another subscription. For more information, see Move Azure resources to another
resource group or subscription.

Manage multiple subscriptions


If you have only a few subscriptions, managing them independently is relatively simple. But if you have many
subscriptions, you should consider creating a management-group hierarchy to simplify managing your
subscriptions and resources.
Management groups allow efficient management of access, policies, and compliance for an organization's
subscriptions. Each management group is a container for one or more subscriptions.
Management groups are arranged in a single hierarchy. You define this hierarchy in your Azure Active Directory
(Azure AD ) tenant to align with your organization's structure and needs. The top level is called the root
management group. You can define up to six levels of management groups in your hierarchy. Each subscription
is contained by only one management group.
Azure provides four levels of management scope: management groups, subscriptions, resource groups, and
resources. Any access or policy applied at one level in the hierarchy is inherited by the levels below it. A
resource owner or subscription owner can't alter an inherited policy. This limitation helps improve governance.

NOTE
Tag inheritance is not currently available but will become available soon.

By relying on this inheritance model, you can arrange the subscriptions in your hierarchy so that each
subscription follows appropriate policies and security controls.
Any access or policy assignment on the root management group applies to all resources in the directory.
Carefully consider which items you define at this scope. Include only the assignments you must have.
When you initially define your management-group hierarchy, you first create the root management group. You
then move all existing subscriptions in the directory into the root management group. New subscriptions are
always created in the root management group. You can later move them to another management group.
When you move a subscription to an existing management group, it inherits the policies and role assignments
from the management-group hierarchy above it, as the sketch below illustrates. Once you have established multiple
subscriptions for your Azure workloads, you should create additional subscriptions to contain Azure services that
other subscriptions share.
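A minimal Python sketch of that inheritance model, with a hypothetical hierarchy and policy names: the effective policies for a subscription are the union of the assignments made at every management group above it.

```python
# Minimal sketch: walk from a subscription up to the root management group and
# collect the policy assignments inherited along the way. Names are hypothetical.
hierarchy = {                 # child -> parent
    "corp-mg": "root-mg",
    "online-mg": "root-mg",
    "sub-prod-001": "corp-mg",
    "sub-dev-001": "online-mg",
}
assignments = {
    "root-mg": ["require-cost-center-tag"],
    "corp-mg": ["allowed-regions-us-only"],
}

def effective_policies(scope):
    policies = []
    while scope is not None:
        policies = assignments.get(scope, []) + policies
        scope = hierarchy.get(scope)  # walk up toward the root management group
    return policies

print(effective_policies("sub-prod-001"))  # ['require-cost-center-tag', 'allowed-regions-us-only']
```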

For more information, see Organizing your resources with Azure management groups.

Tips for creating new subscriptions


Identify who will be responsible for creating new subscriptions.
Decide which resources will be in a subscription by default.
Decide what all standard subscriptions should look like. Considerations include RBAC access, policies, tags,
and infrastructure resources.
If possible, use a service principal to create new subscriptions. Define a security group that can request new
subscriptions via an automated workflow.
If you're an Enterprise Agreement (EA) customer, ask Azure support to block creation of non-EA
subscriptions for your organization.

Related resources
Azure fundamental concepts.
Organize your resources with Azure management groups.
Elevate access to manage all Azure subscriptions and management groups.
Move Azure resources to another resource group or subscription.

Next steps
Review recommended naming and tagging conventions to follow when deploying your Azure resources.
Recommended naming and tagging conventions

Organizing cloud-based assets in ways that aid operational management and support accounting requirements
is a common challenge in large cloud adoption efforts. By applying well-defined naming and metadata tagging
conventions to cloud-hosted resources, IT staff can quickly find and manage resources. Well-defined names and
tags also help to align cloud usage costs with business teams by using chargeback and showback accounting
mechanisms.
The Azure Architecture Center's guidance for naming rules and restrictions for Azure resources provides general
recommendations and platform limitations. The following discussion extends that guidance with more detailed
recommendations aimed specifically at supporting enterprise cloud adoption efforts.
Resource names can be difficult to change. Prioritize establishing a comprehensive naming convention before
you begin any large cloud deployment.

NOTE
Every business has different organizational and management requirements. These recommendations provide a starting
point for discussions within your cloud adoption teams.
As these discussions proceed, use the following template to capture the naming and tagging decisions you make when
you align these recommendations to your specific business needs.
Download the naming and tagging convention tracking template.

Naming and tagging resources


A naming and tagging strategy includes business and operational details as components of resource names and
metadata tags:
The business side of this strategy ensures that resource names and tags include the organizational
information needed to identify the teams using a resource, along with the business owners who are
responsible for resource costs.
The operational side ensures that names and tags include information that IT teams use to identify the
workload, application, environment, criticality, and other information useful for managing resources.
Resource naming
An effective naming convention assembles resource names by using important resource information as parts of
a resource's name. For example, using these recommended naming conventions, a public IP resource for a
production SharePoint workload is named like this: pip-sharepoint-prod-westus-001 .
From the name, you can quickly identify the resource's type, its associated workload, its deployment
environment, and the Azure region hosting it.
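A minimal Python sketch of assembling such a name from its components follows; the length ceiling used here is an assumption for illustration, since real limits vary by resource type.

```python
# Minimal sketch: build a resource name from the naming components discussed in
# this article. The 80-character ceiling is an assumed placeholder.
def build_name(prefix, workload, environment, region, instance):
    name = f"{prefix}-{workload}-{environment}-{region}-{instance:03d}"
    if len(name) > 80:  # real limits vary by resource type
        raise ValueError(f"name too long for most resource types: {name}")
    return name.lower()

print(build_name("pip", "sharepoint", "prod", "westus", 1))  # pip-sharepoint-prod-westus-001
```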
Naming scope
All Azure resource types have a scope that defines the level at which resource names must be unique. A resource
must have a unique name within its scope.
For example, a virtual network has a resource group scope, which means that there can be only one network
named vnet-prod-westus-001 in a given resource group. Other resource groups can have their own virtual
network named vnet-prod-westus-001 . Subnets, to give another example, are scoped to virtual networks, which
means that each subnet within a virtual network must be uniquely named.
Some resource names, such as PaaS services with public endpoints or virtual machine DNS labels, have global
scopes, which means that they must be unique across the entire Azure platform.
Resource names have length limits. Balancing the context embedded in a name with its scope and length is
important when you develop your naming conventions. For more information about naming rules for allowed
characters, scopes, and name lengths for resource types, see Naming conventions for Azure resources.
Recommended naming components
When you construct your naming convention, identify the key pieces of information that you want to reflect in a
resource name. Different information is relevant for different resource types. The following list provides
examples of information that is useful when you construct resource names.
Keep the length of naming components short to prevent exceeding resource name length limits.

Business unit. Top-level division of your company that owns the subscription or workload the resource belongs to. In smaller organizations, this component might represent a single corporate top-level organizational element. Examples: fin, mktg, product, it, corp.

Subscription type. Summary description of the purpose of the subscription that contains the resource. Often broken down by deployment environment type or specific workloads. Examples: prod, shared, client.

Application or service name. Name of the application, workload, or service that the resource is a part of. Examples: navigator, emissions, sharepoint, hadoop.

Deployment environment. The stage of the development lifecycle for the workload that the resource supports. Examples: prod, dev, qa, stage, test.

Region. The Azure region where the resource is deployed. Examples: westus, eastus2, westeurope, usgovia.

Recommended resource-type prefixes


Each workload can consist of many individual resources and services. Incorporating resource type prefixes into
your resource names makes it easier to visually identify application or service components.
The following list provides recommended Azure resource type prefixes to use when you define your naming
conventions.

RESOURCE TYPE RESOURCE NAME PREFIX

Resource group rg-

Availability set avail-

API management service api-

Virtual network vnet-

Virtual network gateway vnetgw-

Gateway connection cn-

Subnet snet-

Network security group nsg-

Route table route-

Virtual machine vm

VM storage account stvm

Public IP pip-

Load balancer lb-

NIC nic-

Key vault kv-

AKS cluster aks-

AKS container con-

Service Bus sb-

Service Bus queue sbq-

Service Bus topic sbt-

App Service plan plan-

Web app app-

Function app func-

Cloud service cld-

Azure SQL Database server sql-

Azure SQL database sqldb-

Cosmos DB database cosmos-

Azure Cache for Redis cache redis-

MySQL database mysql-

PostgreSQL database psql-

Azure SQL Data Warehouse sqldw-

SQL Server Stretch Database sqlstrdb-

Storage account st

Azure StorSimple ssimp

Azure Search srch-

Azure Cognitive Services cog-

Azure Machine Learning workspace mlw-

Azure Data Lake Storage dls

Azure Data Lake Analytics dla

Azure HDInsight - Spark hdis-

Azure HDInsight - Hadoop hdihd-

Azure HDInsight - R Server hdir-

Azure HDInsight - HBase hdihb-

Power BI Embedded pbi-

Azure Stream Analytics asa-

Azure Data Factory adf-

Event hub evh-

IoT hub iot-

Notification hubs ntf-

Notification Hubs namespace ntfns-

Metadata tags
When you apply metadata tags to your cloud resources, you can include information about those assets that
couldn't be included in the resource name. You can use that information to perform more sophisticated filtering
and reporting on resources. You want these tags to include context about the resource's associated workload or
application, operational requirements, and ownership information. This information can be used by IT or
business teams to find resources or generate reports about resource usage and billing.
What tags you apply to resources and what tags are required or optional differs among organizations. The
following list provides examples of common tags that capture important context and information about a
resource. Use this list as a starting point to establish your own tagging conventions.

Application name (tag key: ApplicationName). Name of the application, service, or workload the resource is associated with. Example value: {app name}.

Approver name (tag key: Approver). Person responsible for approving costs related to this resource. Example value: {email}.

Budget required/approved (tag key: BudgetAmount). Money allocated for this application, service, or workload. Example value: {$}.

Business unit (tag key: BusinessUnit). Top-level division of your company that owns the subscription or workload the resource belongs to. In smaller organizations, this tag might represent a single corporate or shared top-level organizational element. Example values: FINANCE, MARKETING, {Product Name}, CORP, SHARED.

Cost center (tag key: CostCenter). Accounting cost center associated with this resource. Example value: {number}.

Disaster recovery (tag key: DR). Business criticality of the application, workload, or service. Example values: Mission-critical, Critical, Essential.

End date of the project (tag key: EndDate). Date when the application, workload, or service is scheduled for retirement. Example value: {date}.

Environment (tag key: Env). Deployment environment of the application, workload, or service. Example values: Prod, Dev, QA, Stage, Test.

Owner name (tag key: Owner). Owner of the application, workload, or service. Example value: {email}.

Requester name (tag key: Requestor). User who requested the creation of this application. Example value: {email}.

Service class (tag key: ServiceClass). Service level agreement level of the application, workload, or service. Example values: Dev, Bronze, Silver, Gold.

Start date of the project (tag key: StartDate). Date when the application, workload, or service was first deployed. Example value: {date}.
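A minimal Python sketch of applying these conventions: build the tag set for a resource and check that the tags your organization treats as required are present. Which tags are required is an assumption for illustration.

```python
# Minimal sketch: define a resource's tags using the keys above and verify that
# an assumed set of required tags is present.
required_tags = {"ApplicationName", "BusinessUnit", "Env", "Owner", "CostCenter"}

resource_tags = {
    "ApplicationName": "navigator",
    "BusinessUnit": "FINANCE",
    "Env": "Prod",
    "Owner": "owner@contoso.example",
    "CostCenter": "12345",
    "DR": "Mission-critical",
}

missing = required_tags - set(resource_tags)
print("All required tags present." if not missing else f"Missing tags: {sorted(missing)}")
```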

Sample naming convention


The following section provides examples of naming schemes for common Azure resource types that are
deployed during an enterprise cloud deployment.
Subscriptions
ASSET TYPE SCOPE FORMAT EXAMPLES

Subscription Account/Enterprise <Business Unit>- mktg-prod-001


Agreement <Subscription type>- corp-shared-001
<###> fin-client-001

Resource groups
ASSET TYPE SCOPE FORMAT EXAMPLES

Resource group Subscription rg-<App or Service name>- rg-mktgsharepoint-


<Subscription type>- prod-001
<###> rg-acctlookupsvc-
share-001
rg-ad-dir-services-
shared-001

Virtual networking
ASSET TYPE SCOPE FORMAT EXAMPLES

Azure Virtual Network Resource group vnet-<Subscription type>- vnet-shared-


<Region>-<###> eastus2-001
vnet-prod-westus-
001
vnet-client-eastus2-
001

Virtual network virtual Virtual network vnetgw-v-<Subscription vnetgw-v-shared-


gateway type>-<Region>-<###> eastus2-001
vnetgw-v-prod-
westus-001
vnetgw-v-client-
eastus2-001

Virtual network local Virtual gateway vnetgw-l-<Subscription vnetgw-l-shared-


gateway type>-<Region>-<###> eastus2-001
vnetgw-l-prod-
westus-001
vnetgw-l-client-
eastus2-001

Site-to-site connections Resource group cn-<local gateway name>- cn-l-gw-shared-


to-<virtual gateway name> eastus2-001-to-v-
gw-shared-eastus2-
001
cn-l-gw-shared-
eastus2-001-to-
shared-westus-001
ASSET TYPE SCOPE FORMAT EXAMPLES

Virtual network connections Resource group cn-<subscription1> cn-shared-eastus2-


<region1>-to- to-shared-westus
<subscription2> cn-prod-eastus2-to-
<region2>- prod-westus

Subnet Virtual network snet-<subscription>- snet-shared-


<subregion>-<###> eastus2-001
snet-prod-westus-
001
snet-client-eastus2-
001

Network security group Subnet or NIC nsg-<policy name or nsg-weballow-001


appname>-<###> nsg-rdpallow-001
nsg-sqlallow-001
nsg-dnsbloked-001

Public IP Resource group pip-<vm name or app pip-dc1-shared-


name>-<Environment>- eastus2-001
<subregion>-<###> pip-hadoop-prod-
westus-001

Azure Virtual Machines


ASSET TYPE SCOPE FORMAT EXAMPLES

Azure Virtual Machines Resource group vm<policy name or vmnavigator001


appname><###> vmsharepoint001
vmsqlnode001
vmhadoop001

VM storage account Global stvm<performance type> stvmstcoreeastus20


<appname or prodname> 01
<region><###> stvmpmcoreeastus2
001
stvmstplmeastus200
1
stvmsthadoopeastus
2001

DNS label Global <A record of vm>. dc1.westus.cloudapp


[<region>.cloudapp.azure.c .azure.com
om] web1.eastus2.clouda
pp.azure.com

Azure Load Balancer Resource group lb-<app name or role> lb-navigator-prod-


<Environment><###> 001
lb-sharepoint-dev-
001
ASSET TYPE SCOPE FORMAT EXAMPLES

NIC Resource group nic-<##>-<vmname>- nic-01-dc1-shared-


<subscription><###> 001
nic-02-vmhadoop1-
prod-001
nic-02-vmtest1-
client-001

PaaS services
ASSET TYPE SCOPE FORMAT EXAMPLES

Azure Web Apps Global app-<App Name>- app-navigator-prod-


<Environment>-<###>. 001.azurewebsites.n
[{azurewebsites.net}] et
app-accountlookup-
dev-
001.azurewebsites.n
et

Azure Functions Global func-<App Name>- func-navigator-


<Environment>-<###>. prod-
[{azurewebsites.net}] 001.azurewebsites.n
et
func-accountlookup-
dev-
001.azurewebsites.n
et

Azure Cloud Services Global cld-<App Name>- cld-navigator-prod-


<Environment>-<###>. 001.azurewebsites.n
[{cloudapp.net}] et
cld-accountlookup-
dev-
001.azurewebsites.n
et

Azure Service Bus


ASSET TYPE SCOPE FORMAT EXAMPLES

Azure Service Bus Global sb-<App Name>- sb-navigator-prod


<Environment>. sb-emissions-dev
[{servicebus.windows.net}]

Azure Service Bus queues Service Bus sbq-<query descriptor> sbq-messagequery

Azure Service Bus topics Service Bus sbt-<query descriptor> sbt-messagequery

Databases
ASSET TYPE SCOPE FORMAT EXAMPLES

Azure SQL Database Server Global sql-<App Name>- sql-navigator-prod


<Environment> sql-emissions-dev

Azure SQL Database Azure SQL Database sqldb-<Database Name>- sqldb-users-prod


<Environment> sqldb-users-dev

Azure Cosmos DB Global cosmos-<App Name>- cosmos-navigator-


<Environment> prod
cosmos-emissions-
dev

Azure Cache for Redis Global redis-<App Name>- redis-navigator-prod


<Environment> redis-emissions-dev

Azure Database for MySQL Global mysql-<App Name>- mysql-navigator-


<Environment> prod
mysql-emissions-
dev

Azure Database for Global psql-<App Name>- psql-navigator-prod


PostgreSQL <Environment> psql-emissions-dev

Azure SQL Data Warehouse Global sqldw-<App Name>- sqldw-navigator-


<Environment> prod
sqldw-emissions-dev

SQL Server Stretch Azure SQL Database sqlstrdb-<App Name>- sqlstrdb-navigator-


Database <Environment> prod
sqlstrdb-emissions-
dev

Storage

ASSET TYPE | SCOPE | FORMAT | EXAMPLES
Azure Storage account - general use | Global | st<storage name><###> | stnavigatordata001, stemissionsoutput001
Azure Storage account - diagnostic logs | Global | stdiag<first 2 letters of subscription name and number><region><###> | stdiagsh001eastus2001, stdiagsh001westus001
Azure StorSimple | Global | ssimp<App Name><Environment> | ssimpnavigatorprod, ssimpemissionsdev

AI + Machine Learning

ASSET TYPE | SCOPE | FORMAT | EXAMPLES
Azure Search | Global | srch-<App Name>-<Environment> | srch-navigator-prod, srch-emissions-dev
Azure Cognitive Services | Resource group | cog-<App Name>-<Environment> | cog-navigator-prod, cog-emissions-dev
Azure Machine Learning workspace | Resource group | mlw-<App Name>-<Environment> | mlw-navigator-prod, mlw-emissions-dev

Analytics

ASSET TYPE | SCOPE | FORMAT | EXAMPLES
Azure Data Factory | Global | adf-<App Name>-<Environment> | adf-navigator-prod, adf-emissions-dev
Azure Data Lake Storage | Global | dls<App Name><Environment> | dlsnavigatorprod, dlsemissionsdev
Azure Data Lake Analytics | Global | dla<App Name><Environment> | dlanavigatorprod, dlaemissionsdev
Azure HDInsight - Spark | Global | hdis-<App Name>-<Environment> | hdis-navigator-prod, hdis-emissions-dev
Azure HDInsight - Hadoop | Global | hdihd-<App Name>-<Environment> | hdihd-hadoop-prod, hdihd-emissions-dev
Azure HDInsight - R Server | Global | hdir-<App Name>-<Environment> | hdir-navigator-prod, hdir-emissions-dev
Azure HDInsight - HBase | Global | hdihb-<App Name>-<Environment> | hdihb-navigator-prod, hdihb-emissions-dev
Power BI Embedded | Global | pbi-<App Name>-<Environment> | pbi-navigator-prod, pbi-emissions-dev

Data Streams / Internet of Things (IoT)

ASSET TYPE | SCOPE | FORMAT | EXAMPLES
Azure Stream Analytics | Resource group | asa-<App Name>-<Environment> | asa-navigator-prod, asa-emissions-dev
Azure IoT Hub | Global | iot-<App Name>-<Environment> | iot-navigator-prod, iot-emissions-dev
Azure Event Hubs | Global | evh-<App Name>-<Environment> | evh-navigator-prod, evh-emissions-dev
Azure Notification Hubs | Resource group | ntf-<App Name>-<Environment> | ntf-navigator-prod, ntf-emissions-dev
Azure Notification Hubs namespace | Global | ntfns-<App Name>-<Environment> | ntfns-navigator-prod, ntfns-emissions-dev
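
Naming formats like these are easiest to keep consistent when they're checked automatically before deployment. The following minimal Python sketch validates a few of the formats above with regular expressions; the pattern subset and helper function are illustrative only, not part of the official convention.

```python
import re

# Illustrative subset of the naming formats above, expressed as regular expressions.
NAME_PATTERNS = {
    "network_security_group": re.compile(r"^nsg-[a-z0-9]+-\d{3}$"),    # nsg-<policy name or appname>-<###>
    "storage_account_general": re.compile(r"^st[a-z0-9]+\d{3}$"),      # st<storage name><###>
    "sql_database_server": re.compile(r"^sql-[a-z0-9]+-(prod|dev)$"),  # sql-<App Name>-<Environment>
}

def is_valid_name(asset_type: str, name: str) -> bool:
    """Return True when the proposed name matches the convention for the asset type."""
    pattern = NAME_PATTERNS.get(asset_type)
    return bool(pattern and pattern.match(name))

print(is_valid_name("network_security_group", "nsg-weballow-001"))    # True
print(is_valid_name("storage_account_general", "stnavigatordata001")) # True
print(is_valid_name("sql_database_server", "sqldb-users-prod"))       # False: sqldb- is the database prefix
```
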
Best practices to set up networking for workloads
migrated to Azure

As you plan and design for migration, in addition to the migration itself, one of the most critical steps is the design
and implementation of Azure networking. This article describes best practices for networking when migrating to
IaaS and PaaS implementations in Azure.

IMPORTANT
The best practices and opinions described in this article are based on the Azure platform and service features available at the
time of writing. Features and capabilities change over time. Not all recommendations might be applicable for your
deployment, so select those that work for you.

Design virtual networks


Azure provides virtual networks (VNets):
Azure resources communicate privately, directly, and securely with each other over VNets.
You can configure endpoint connections on VNets for VMs and services that require internet communication.
A VNet is a logical isolation of the Azure cloud that's dedicated to your subscription.
You can implement multiple VNets within each Azure subscription and Azure region.
Each VNet is isolated from other VNets.
VNet address spaces are defined in CIDR notation and typically use the private ranges defined in RFC 1918. Public IP address ranges can also be specified in a VNet's address space, but they aren't directly accessible from the internet.
VNets can connect to each other using VNet peering. Peered VNets can be in the same or different regions, so resources in one VNet can connect to resources in other VNets.
By default, Azure routes traffic between subnets within a VNet, connected VNets, on-premises networks, and
the internet.
When planning your VNet topology, you should consider how to arrange IP address spaces, how to implement a
hub and spoke network, how to segment VNets into subnets, setting up DNS, and implementing Azure availability
zones.

Best practice: Plan IP addressing


When you create VNets as part of your migration, it's important to plan out your VNet IP address space.
You should assign an address space that isn't larger than a CIDR range of /16 for each VNet. A /16 provides 65,536 IP addresses, and assigning a larger block (a numerically smaller prefix, such as /15) wastes addresses that can't be used elsewhere.
It's important not to waste IP addresses, even if they're in the private ranges defined by RFC 1918.
The VNet address space shouldn't overlap with on-premises network ranges. Network address translation (NAT) shouldn't be needed in a well-planned design.
Overlapping addresses can cause networks that can't be connected and routing that doesn't work properly. If networks overlap, you'll need to redesign the network or fall back to NAT. A quick way to validate these rules during planning is shown after the links below.
Learn more:
Get an overview of Azure VNets.
Read the networking FAQ.
Learn about networking limitations.
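
The two planning rules above (no address space larger than /16, no overlap with on-premises ranges) can be sanity-checked with Python's standard ipaddress module. A minimal sketch, where the on-premises ranges and the proposed VNet space are examples only:

```python
import ipaddress

on_premises_ranges = [ipaddress.ip_network("10.0.0.0/16"),
                      ipaddress.ip_network("192.168.0.0/24")]
proposed_vnet = ipaddress.ip_network("10.245.16.0/20")

# Rule 1: don't assign an address space larger than a /16.
if proposed_vnet.prefixlen < 16:
    print(f"{proposed_vnet} is larger than a /16; the excess addresses are wasted.")

# Rule 2: the VNet address space must not overlap on-premises ranges.
overlaps = [r for r in on_premises_ranges if proposed_vnet.overlaps(r)]
if overlaps:
    print(f"{proposed_vnet} overlaps {overlaps}: redesign the address space or fall back to NAT.")
else:
    print(f"{proposed_vnet} doesn't overlap the on-premises ranges.")
```
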

Best practice: Implement a hub and spoke network topology


A hub and spoke network topology isolates workloads while sharing services such as identity and security.
The hub is an Azure VNet that acts as a central point of connectivity.
The spokes are VNets that connect to the hub VNet using VNet peering.
Shared services are deployed in the hub, while individual workloads are deployed as spokes.
Consider the following:
Implementing a hub and spoke topology in Azure centralizes common services such as connections to on-premises networks and firewalls, and provides isolation between VNets. The hub VNet provides a central point of connectivity to on-premises networks, and a place to host services used by workloads hosted in spoke VNets.
A hub and spoke configuration is typically used by larger enterprises. Smaller networks might consider a
simpler design to save on costs and complexity.
Spoke VNets can be used to isolate workloads, with each spoke managed separately from other spokes. Each
workload can include multiple tiers, and multiple subnets that are connected with Azure load balancers.
Hub and spoke VNets can be implemented in different resource groups, and even in different subscriptions.
When you peer virtual networks in different subscriptions, the subscriptions can be associated with the same or different Azure Active Directory (Azure AD) tenants. This allows for decentralized management of each workload, while sharing services maintained in the hub network.

Hub and spoke topology


Learn more:
Read about a hub and spoke topology.
Get network recommendations for running Azure Windows and Linux VMs.
Learn about VNet peering.

Best practice: Design subnets


To provide isolation within a VNet, you segment it into one or more subnets, and allocate a portion of the VNet's
address space to each subnet.
You can create multiple subnets within each VNet.
By default, Azure routes network traffic between all subnets in a VNet.
Your subnet decisions are based on your technical and organizational requirements.
You create subnets using CIDR notation.
When deciding on network range for subnets, it's important to note that Azure retains five IP addresses from
each subnet that can't be used. For example, if you create the smallest available subnet of /29 (with eight IP
addresses), Azure will retain five addresses, so you only have three usable addresses that can be assigned to
hosts on the subnet.
For most cases, use /28 as the smallest subnet.
Example:
The table shows an example of a VNet with an address space of 10.245.16.0/20 segmented into subnets, for a
planned migration.

SUBNET CIDR ADDRESSES USE

DEV-FE-EUS2 10.245.16.0/22 1019 Front-end/web tier VMs

DEV-APP-EUS2 10.245.20.0/22 1019 App-tier VMs

DEV-DB-EUS2 10.245.24.0/23 507 Database VMs

Learn more:
Learn about designing subnets.
Learn how a fictional company (Contoso) prepared their networking infrastructure for migration.
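
The reserved-address arithmetic behind the table above is easy to reproduce. A minimal Python sketch, assuming the same example address plan and the five addresses Azure retains per subnet:

```python
import ipaddress

AZURE_RESERVED_PER_SUBNET = 5  # network and broadcast addresses plus three Azure-reserved addresses

plan = {
    "DEV-FE-EUS2":  "10.245.16.0/22",
    "DEV-APP-EUS2": "10.245.20.0/22",
    "DEV-DB-EUS2":  "10.245.24.0/23",
}

for name, cidr in plan.items():
    subnet = ipaddress.ip_network(cidr)
    usable = subnet.num_addresses - AZURE_RESERVED_PER_SUBNET
    print(f"{name}: {cidr} -> {usable} usable addresses")
# DEV-FE-EUS2: 10.245.16.0/22 -> 1019 usable addresses
# DEV-APP-EUS2: 10.245.20.0/22 -> 1019 usable addresses
# DEV-DB-EUS2: 10.245.24.0/23 -> 507 usable addresses
```
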

Best practice: Set up a DNS server


Azure adds a DNS server by default when you deploy a VNet. This allows you to rapidly build VNets and deploy
resources. However, this DNS server only provides services to the resources on that VNet. If you want to connect
multiple VNets together, or connect to an on-premises server from VNets, you need additional name resolution
capabilities. For example, you might need Active Directory to resolve DNS names between virtual networks. To do
this, you deploy your own custom DNS server in Azure.
DNS servers in a VNet can forward DNS queries to the recursive resolvers in Azure. This enables you to
resolve host names within that VNet. For example, a domain controller running in Azure can respond to
DNS queries for its own domains, and forward all other queries to Azure.
DNS forwarding allows VMs to see both your on-premises resources (via the domain controller) and Azure-
provided host names (using the forwarder). Access to the recursive resolvers in Azure is provided using the
virtual IP address 168.63.129.16.
DNS forwarding also enables DNS resolution between VNets, and allows on-premises machines to resolve
host names provided by Azure.
To resolve a VM host name, the DNS server VM must reside in the same VNet, and be configured to
forward host name queries to Azure.
Because the DNS suffix is different in each VNet, you can use conditional forwarding rules to send DNS
queries to the correct VNet for resolution.
When you use your own DNS servers, you can specify multiple DNS servers for each VNet. You can also
specify multiple DNS servers per network interface (for Azure Resource Manager), or per cloud service (for
the classic deployment model).
DNS servers specified for a network interface or cloud service take precedence over DNS servers specified
for the VNet.
In the Azure Resource Manager deployment model, you can specify DNS servers for a VNet and a network
interface, but the best practice is to use the setting only on VNets.

DNS servers for VNet


Learn more:
Learn about name resolution when you use your own DNS server.
Learn about DNS naming rules and restrictions.

Best practice: Set up availability zones


Availability zones increase high-availability to protect your apps and data from datacenter failures.
Availability Zones are unique physical locations within an Azure region.
Each zone is made up of one or more datacenters equipped with independent power, cooling, and
networking.
To ensure resiliency, there's a minimum of three separate zones in all enabled regions.
The physical separation of availability zones within a region protects applications and data from datacenter
failures.
Zone-redundant services replicate your applications and data across availability zones to protect from single points of failure.
With availability zones, Azure offers an SLA of 99.99% VM uptime.
Availability zone
You can plan and build high-availability into your migration architecture by colocating compute, storage,
networking, and data resources within a zone, and replicating them in other zones. Azure services that
support availability zones fall into two categories:
Zonal services: You associate a resource with a specific zone (for example, VMs, managed disks, and IP addresses).
Zone-redundant services: The resource replicates automatically across zones (for example, zone-redundant storage and Azure SQL Database).
You can deploy a standard Azure Load Balancer with internet-facing workloads or app tiers to provide zonal fault tolerance.
Load balancer
Learn more:
Get an overview of availability zones.

Design hybrid cloud networking


For a successful migration, it's critical to connect on-premises corporate networks to Azure. This creates an always-
on connection known as a hybrid-cloud network, where services are provided from the Azure cloud to corporate
users. There are two options for creating this type of network:
Site-to-site VPN: You establish a site-to-site connection between your compatible on-premises VPN device
and an Azure VPN gateway that's deployed in a VNet. Any authorized on-premises resource can access VNets.
Site-to-site communications are sent through an encrypted tunnel over the internet.
Azure ExpressRoute: You establish an Azure ExpressRoute connection between your on-premises network and
Azure, through an ExpressRoute partner. This connection is private, and traffic doesn't go over the internet.
Learn more:
Learn more about hybrid-cloud networking.

Best practice: Implement a highly available site-to-site VPN


To implement a site-to-site VPN, you set up a VPN gateway in Azure.
A VPN gateway is a specific type of VNet gateway that sends encrypted traffic between an Azure VNet and an
on-premises location over the public internet.
A VPN gateway can also send encrypted traffic between Azure VNets over the Microsoft network.
Each VNet can have only one VPN gateway.
You can create multiple connections to the same VPN gateway. When you create multiple connections, all VPN
tunnels share the available gateway bandwidth.
Every Azure VPN gateway consists of two instances in an active-standby configuration.
For planned maintenance or unplanned disruption to the active instance, failover occurs and the standby
instance takes over automatically, and resumes the site-to-site or VNet-to-VNet connection.
The switchover causes a brief interruption.
For planned maintenance, connectivity should be restored within 10 to 15 seconds.
For unplanned issues, the connection recovery takes longer, about one to 1.5 minutes in the worst case.
Point-to-site (P2S) VPN client connections to the gateway will be disconnected, and users will need to reconnect from client machines.
When setting up a site-to-site VPN, you do the following:
You need a VNet whose address range doesn't overlap with the on-premises network to which the VPN will
connect.
You create a gateway subnet in the network.
You create a VPN gateway, specify the gateway type (VPN), and choose whether the gateway is policy-based or route-based. A route-based VPN is considered more capable and future-proof.
You create a local network gateway on-premises, and configure your on-premises VPN device.
You create a failover site-to-site VPN connection between the VNet gateway and the on-premises device. Using
route-based VPN allows for either active-passive or active-active connections to Azure. Route-based also
supports both site-to-site (from any computer) and point-to-site (from a single computer) connections
concurrently.
You specify the gateway SKU that you want to use. This will depend on your workload requirements,
throughputs, features, and SLAs.
Border Gateway Protocol (BGP) is an optional feature you can use with Azure ExpressRoute and route-based VPN gateways to propagate your on-premises BGP routes to your VNets.

Site-to-site VPN


Learn more:
Review compatible on-premises VPN devices.
Get an overview of VPN gateways.
Learn about highly available VPN connections.
Learn about planning and designing a VPN gateway.
Review VPN gateway settings.
Review gateway SKUs.
Read about setting up BGP with Azure VPN gateways.
Best practice: Configure a gateway for VPN Gateways
When you create a VPN gateway in Azure, you must use a special subnet named GatewaySubnet. When creating
this subnet note these best practices:
The gateway subnet can have a maximum prefix length of /29 (for example, 10.119.255.248/29). The current recommendation is to use a prefix length of /27 (for example, 10.119.255.224/27).
When you define the address space of the gateway subnet, use the very last part of the VNet address space.
When using the Azure GatewaySubnet, never deploy any VMs or other devices, such as Application Gateway, to the gateway subnet.
Don't assign a network security group (NSG) to this subnet. It will cause the gateway to stop functioning.
Learn more:
Use this tool to determine your IP address space.
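
To follow the recommendation of carving the gateway subnet from the very last part of the VNet address space, the candidate range can be computed rather than picked by hand. A minimal sketch with Python's ipaddress module, reusing the example /16 from above:

```python
import ipaddress

vnet = ipaddress.ip_network("10.119.0.0/16")

# Take the very last /27 of the VNet address space for the GatewaySubnet.
gateway_subnet = list(vnet.subnets(new_prefix=27))[-1]
print(gateway_subnet)  # 10.119.255.224/27
```
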

Best practice: Implement Azure Virtual WAN for branch offices


For multiple VPN connections, Azure Virtual WAN is a networking service that provides optimized and automated
branch-to-branch connectivity through Azure.
Virtual WAN allows you to connect and configure branch devices to communicate with Azure. This can be done
manually, or by using preferred provider devices through a Virtual WAN partner.
Using preferred provider devices allows for simple use, connectivity, and configuration management.
The Azure WAN built-in dashboard provides instant troubleshooting insights that save time, and provide an
easy way to track large-scale site-to-site connectivity.
Learn more: Learn about Azure Virtual WAN.
Best practice: Implement ExpressRoute for mission-critical connections
The Azure ExpressRoute service extends your on-premises infrastructure into the Microsoft cloud by creating
private connections between the virtual Azure datacenter and on-premises networks.
ExpressRoute connections can be over an any-to-any (IP VPN) network, a point-to-point Ethernet network, or through a connectivity provider. They don't go over the public internet.
ExpressRoute connections offer higher security, greater reliability, and higher speeds (up to 10 Gbps), along with consistent latency.
ExpressRoute is useful for virtual datacenters, as customers can get the benefits of compliance rules associated with private connections.
With ExpressRoute Direct, you can connect directly to Microsoft routers at 100 Gbps for larger bandwidth needs.
ExpressRoute uses BGP to exchange routes between on-premises networks, Azure instances, and Microsoft
public addresses.
Deploying ExpressRoute connections usually involves engaging with an ExpressRoute service provider. For a quick
start, it's common to initially use a site-to-site VPN to establish connectivity between the virtual datacenter and on-
premises resources, and then migrate to an ExpressRoute connection when a physical interconnection with your
service provider is established.
Learn more:
Read an overview of ExpressRoute.
Learn about ExpressRoute Direct.
Best practice: Optimize ExpressRoute routing with BGP communities
When you have multiple ExpressRoute circuits, you have more than one path to connect to Microsoft. As a result, suboptimal routing can happen: your traffic might take a longer path to reach Microsoft, and Microsoft might take a longer path to reach your network. The longer the network path, the higher the latency. Latency has a direct impact on app performance and user experience.
Example:
Let's review an example:
You have two offices in the US, one in Los Angeles and one in New York.
Your offices are connected on a WAN, which can be either your own backbone network or your service
provider's IP VPN.
You have two ExpressRoute circuits, one in US West and one in US East, that are also connected on the WAN.
Obviously, you have two paths to connect to the Microsoft network.
Problem:
Now imagine you have an Azure deployment (for example, Azure App Service) in both US West and US East.
You want users in each office to access their nearest Azure services for an optimal experience.
Thus you want to connect users in Los Angeles to Azure US West and users in New York to Azure US East.
This works for East Coast users, but not for those on the West Coast. The problem is:
On each ExpressRoute circuit, we advertise both prefixes in Azure US East (23.100.0.0/16) and Azure US
West (13.100.0.0/16).
Without knowing which prefix is from which region, prefixes aren't treated differently.
Your WAN network can assume that both prefixes are closer to US East than US West, and thus route
users from both offices to the ExpressRoute circuit in US East, providing a suboptimal experience for
users in the Los Angeles office.
BGP communities unoptimized connection
Solution:
To optimize routing for both office users, you need to know which prefix is from Azure US West and which is from
Azure US East. You can encode this information by using BGP community values.
You assign a unique BGP community value to each Azure region. For example, 12076:51004 for US East;
12076:51006 for US West.
Now that it's clear which prefix belongs to which Azure region, you can configure a preferred ExpressRoute
circuit.
Because you're using BGP to exchange routing information, you can use BGP's local preference to influence
routing.
In our example, you assign a higher local preference value to 13.100.0.0/16 in US West than in US East, and
similarly, a higher local preference value to 23.100.0.0/16 in US East than in US West.
This configuration ensures that when both paths to Microsoft are available, users in Los Angeles connect to Azure US West using the west circuit, and users in New York connect to Azure US East using the east circuit. Routing is optimized on both sides.
BGP communities optimized connection
Learn more:
Learn about optimizing routing.
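
The routing decision described above can be illustrated with a small, purely conceptual Python sketch. The regional BGP community values are the ones used in the example; the local-preference numbers and office-to-region mapping are hypothetical and would live in your WAN routers' configuration, not in code like this:

```python
# Conceptual illustration of choosing a circuit by BGP community and local preference.
REGION_BY_COMMUNITY = {"12076:51004": "US East", "12076:51006": "US West"}

# Prefixes advertised on both circuits, tagged with their regional community value.
routes = [("23.100.0.0/16", "12076:51004"), ("13.100.0.0/16", "12076:51006")]

NEAREST_REGION = {"Los Angeles": "US West", "New York": "US East"}

def local_preference(office: str, community: str) -> int:
    """Higher local preference wins; each office prefers prefixes from its nearby region."""
    return 200 if REGION_BY_COMMUNITY[community] == NEAREST_REGION[office] else 100

for office in ("Los Angeles", "New York"):
    for prefix, community in routes:
        print(f"{office}: {prefix} ({REGION_BY_COMMUNITY[community]}) "
              f"local preference {local_preference(office, community)}")
```
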

Secure VNets
The responsibility for securing VNets is shared between Microsoft and you. Microsoft provides many networking
features, as well as services that help keep resources secure. When designing security for VNets, best practices you
should follow include implementing a perimeter network, using filtering and security groups, securing access to
resources and IP addresses, and implementing attack protection.
Learn more:
Get an overview of best practices for network security.
Learn how to design for secure networks.

Best practice: Implement an Azure perimeter network


Although Microsoft invests heavily in protecting the cloud infrastructure, you must also protect your cloud services
and resource groups. A multilayered approach to security provides the best defense. Putting a perimeter network in
place is an important part of that defense strategy.
A perimeter network protects internal network resources from an untrusted network.
It's the outermost layer that's exposed to the internet. It generally sits between the internet and the enterprise
infrastructure, usually with some form of protection on both sides.
In a typical enterprise network topology, the core infrastructure is heavily fortified at the perimeters, with
multiple layers of security devices. The boundary of each layer consists of devices and policy enforcement
points.
Each layer can include a combination of network security solutions, including firewalls, denial of service (DoS) prevention, intrusion detection/intrusion prevention systems (IDS/IPS), and VPN devices.
Policy enforcement on the perimeter network can use firewall policies, access control lists (ACLs), or specific
routing.
As incoming traffic arrives from the internet, it's intercepted and handled by a combination of defense solutions that block attacks and harmful traffic, while allowing legitimate requests into the network.
Incoming traffic can route directly to resources in the perimeter network. The perimeter network resource can
then communicate with other resources deeper in the network, moving traffic forward into the network after
validation.
The following figure shows an example of a single subnet perimeter network in a corporate network, with two
security boundaries.

Perimeter network deployment


Learn more:
Learn about deploying a perimeter network between Azure and your on-premises datacenter.

Best practice: Filter VNet traffic with NSGs


Network security groups (NSGs) contain multiple inbound and outbound security rules that filter traffic going to and from resources. Filtering can be by source and destination IP address, port, and protocol.
NSGs contain security rules that allow or deny inbound network traffic to (or outbound network traffic from)
several types of Azure resources. For each rule, you can specify source and destination, port, and protocol.
NSG rules are evaluated by priority using five-tuple information (source, source port, destination, destination
port, and protocol) to allow or deny the traffic.
A flow record is created for existing connections. Communication is allowed or denied based on the connection
state of the flow record.
A flow record allows an NSG to be stateful. For example, if you specify an outbound security rule to any address
over port 80, you don't need an inbound security rule to respond to the outbound traffic. You only need to
specify an inbound security rule if communication is initiated externally.
The opposite is also true. If inbound traffic is allowed over a port, you don't need to specify an outbound security
rule to respond to traffic over the port.
Existing connections aren't interrupted when you remove a security rule that enabled the flow. Traffic flows are
interrupted when connections are stopped, and no traffic is flowing in either direction, for at least a few minutes.
When creating NSGs, create as few as possible, but as many as necessary.
Best practice: Secure north/south and east/west traffic
When securing VNets, it's important to consider attack vectors.
Using only subnet NSGs simplifies your environment, but only secures traffic into your subnet. This is known as
north/south traffic.
Traffic between VMs on the same subnet is known as east/west traffic.
It's important to use both forms of protection, so that if an attacker gains access from the outside, they'll be stopped when trying to attack machines located in the same subnet.
Use service tags on NSGs
A service tag represents a group of IP address prefixes. Using a service tag helps minimize complexity when you
create NSG rules.
You can use service tags instead of specific IP addresses when you create rules.
Microsoft manages the address prefixes associated with a service tag, and automatically updates the service tag
as addresses change.
You can't create your own service tag, or specify which IP addresses are included within a tag.
Service tags take the manual work out of assigning a rule to groups of Azure services. For example, if you want to
allow a VNet subnet containing web servers access to an Azure SQL Database, you could create an outbound rule
to port 1433, and use the Sql service tag.
This Sql tag denotes the address prefixes of the Azure SQL Database and Azure SQL Data Warehouse services.
If you specify Sql as the value, traffic is allowed or denied to those services.
If you only want to allow access to Sql in a specific region, you can specify that region. For example, if you want
to allow access only to Azure SQL Database in the East US region, you can specify Sql.EastUS as a service tag.
The tag represents the service, but not specific instances of the service. For example, the tag represents the
Azure SQL Database service, but doesn't represent a particular SQL database or server.
All address prefixes represented by this tag are also represented by the Internet tag.
Learn more:
Read about NSGs.
Review the service tags available for NSGs.
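
As an illustration of the Sql.EastUS example above, an outbound NSG rule that uses the service tag can be described declaratively. The sketch below is a Python dictionary shaped after the Azure Resource Manager securityRules schema; the rule name, priority, and source are example values, so verify the property names against your deployment tooling:

```python
# Outbound rule allowing web-tier traffic to Azure SQL Database in East US via a service tag.
allow_sql_eastus = {
    "name": "Allow-Sql-EastUS-Outbound",
    "properties": {
        "priority": 200,
        "direction": "Outbound",
        "access": "Allow",
        "protocol": "Tcp",
        "sourceAddressPrefix": "VirtualNetwork",
        "sourcePortRange": "*",
        "destinationAddressPrefix": "Sql.EastUS",  # service tag; prefixes are managed by Microsoft
        "destinationPortRange": "1433",
    },
}
```
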

Best practice: Use application security groups


Application security groups enable you to configure network security as a natural extension of an app structure.
You can group VMs and define network security policies based on application security groups.
Application security groups enable you to reuse your security policy at scale without manual maintenance of
explicit IP addresses.
Application security groups handle the complexity of explicit IP addresses and multiple rule sets, allowing you to
focus on your business logic.
Example:
Application security group example

NETWORK INTERFACE APPLICATION SECURITY GROUP

NIC1 AsgWeb

NIC2 AsgWeb

NIC3 AsgLogic

NIC4 AsgDb

In our example, each network interface belongs to only one application security group, but in fact an interface
can belong to multiple groups, in accordance with Azure limits.
None of the network interfaces have an associated NSG. NSG1 is associated to both subnets and contains the
following rules.

RULE NAME | PURPOSE | DETAILS
Allow-HTTP-Inbound-Internet | Allow traffic from the internet to the web servers. Inbound traffic from the internet is denied by the DenyAllInbound default security rule, so no additional rule is needed for the AsgLogic or AsgDb application security groups. | Priority: 100. Source: internet. Source port: *. Destination: AsgWeb. Destination port: 80. Protocol: TCP. Access: Allow.
Deny-Database-All | The AllowVNetInBound default security rule allows all communication between resources in the same VNet, so this rule is needed to deny traffic from all resources. | Priority: 120. Source: *. Source port: *. Destination: AsgDb. Destination port: 1433. Protocol: All. Access: Deny.
Allow-Database-BusinessLogic | Allow traffic from the AsgLogic application security group to the AsgDb application security group. The priority for this rule is higher than the Deny-Database-All rule and it's processed before that rule, so traffic from the AsgLogic application security group is allowed, and all other traffic is blocked. | Priority: 110. Source: AsgLogic. Source port: *. Destination: AsgDb. Destination port: 1433. Protocol: TCP. Access: Allow.

The rules that specify an application security group as the source or destination are only applied to the network
interfaces that are members of the application security group. If the network interface is not a member of an
application security group, the rule is not applied to the network interface, even though the network security
group is associated to the subnet.
Learn more:
Learn about application security groups.
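
The Allow-Database-BusinessLogic rule from the table above can be expressed with application security groups as the source and destination instead of IP addresses. The sketch below is a Python dictionary shaped after the Azure Resource Manager securityRules schema; the resource ID placeholders are hypothetical, so adapt them to your environment:

```python
# Inbound rule allowing the AsgLogic tier to reach the AsgDb tier on the SQL port.
allow_database_businesslogic = {
    "name": "Allow-Database-BusinessLogic",
    "properties": {
        "priority": 110,
        "direction": "Inbound",
        "access": "Allow",
        "protocol": "Tcp",
        "sourcePortRange": "*",
        "destinationPortRange": "1433",
        "sourceApplicationSecurityGroups": [{"id": "<resource ID of AsgLogic>"}],
        "destinationApplicationSecurityGroups": [{"id": "<resource ID of AsgDb>"}],
    },
}
```
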
Best practice: Secure access to PaaS using VNet service endpoints
VNet service endpoints extend your VNet private address space and identity to Azure services over a direct
connection.
Endpoints allow you to secure critical Azure service resources to your VNets only. Traffic from your VNet to the
Azure service always remains on the Microsoft Azure backbone network.
VNet private address space can be overlapping and thus cannot be used to uniquely identify traffic originating
from a VNet.
After service endpoints are enabled in your VNet, you can secure Azure service resources by adding a VNet rule
to the service resources. This provides improved security by fully removing public internet access to resources,
and allowing traffic only from your VNet.

Service endpoints
Learn more:
Learn about VNet service endpoints.

Best practice: Control public IP addresses


Public IP addresses in Azure can be associated with VMs, load balancers, application gateways, and VPN gateways.
Public IP addresses allow internet resources to communicate inbound to Azure resources, and Azure resources
to communicate outbound to the internet.
Public IP addresses are created with a basic or standard SKU, which have several differences. Standard SKUs
can be assigned to any service, but are most usually configured on VMs, load balancers, and application
gateways.
It's important to note that a basic SKU public IP address doesn't have an NSG automatically configured and is open by default; you need to create an NSG and assign rules to control access. Standard SKU public IP addresses are secure by default: inbound traffic is denied until you explicitly allow it with an NSG.
As a best practice, VMs shouldn't be configured with a public IP address.
If you need a port opened, it should only be for web services such as port 80 or 443.
Standard remote management ports such as SSH (22) and RDP (3389) should be set to deny, along with
all other ports, using NSGs.
A better practice is to put VMs behind an Azure load balancer or application gateway. Then if access to remote
management ports is needed, you can use just-in-time VM access in the Azure Security Center.
Learn more:
Public IP addresses in Azure
Manage virtual machine access using just-in-time

Take advantage of Azure security features for networking


Azure has platform security features that are easy to use, and provide rich countermeasures to common network
attacks. These include Azure Firewall, web application firewall, and Network Watcher.

Best practice: Deploy Azure Firewall


Azure Firewall is a managed cloud-based network security service that protects your VNet resources. It is a fully
stateful managed firewall with built-in high availability and unrestricted cloud scalability.

Azure Firewall
Azure Firewall can centrally create, enforce, and log application and network connectivity policies across
subscriptions and VNets.
Azure Firewall uses a static public IP address for your VNet resources, allowing outside firewalls to identify
traffic originating from your VNet.
Azure Firewall is fully integrated with Azure Monitor for logging and analytics.
As a best practice when creating Azure Firewall rules, use FQDN tags.
An FQDN tag represents a group of FQDNs associated with well-known Microsoft services.
You can use an FQDN tag to allow the required outbound network traffic through the firewall.
For example, to manually allow Windows Update network traffic through your firewall, you would need to
create multiple application rules. Using FQDN tags, you create an application rule, and include the Windows
Updates tag. With this rule in place, network traffic to Microsoft Windows Update endpoints can flow through
your firewall.
Learn more:
Get an overview of Azure Firewall.
Learn about FQDN tags.
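
As an illustration of the Windows Update example above, an Azure Firewall application rule that uses an FQDN tag can be described declaratively. The sketch below is a Python dictionary shaped after the applicationRuleCollections schema; the collection name, priority, and source range are example values, so verify the property names against your template or SDK version:

```python
# Application rule collection allowing workload subnets to reach Windows Update endpoints.
allow_windows_update = {
    "name": "allow-windows-update",
    "properties": {
        "priority": 100,
        "action": {"type": "Allow"},
        "rules": [
            {
                "name": "WindowsUpdate",
                "sourceAddresses": ["10.245.16.0/20"],  # example workload range
                "protocols": [{"protocolType": "Https", "port": 443},
                              {"protocolType": "Http", "port": 80}],
                "fqdnTags": ["WindowsUpdate"],  # Microsoft-managed group of FQDNs
            }
        ],
    },
}
```
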

Best practice: Deploy a web application firewall (WAF)


Web applications are increasingly targets of malicious attacks that exploit commonly known vulnerabilities. Exploits
include SQL injection attacks and cross-site scripting attacks. Preventing such attacks in application code can be
challenging, and can require rigorous maintenance, patching and monitoring at multiple layers of the application
topology. A centralized web application firewall helps make security management much simpler and helps app
administrators guard against threats or intrusions. A web app firewall can react to security threats faster, by
patching known vulnerabilities at a central location, instead of securing individual web applications. Existing
application gateways can be converted to a web application firewall enabled application gateway easily.
The web application firewall (WAF) is a feature of Azure Application Gateway.
WAF provides centralized protection of your web applications, from common exploits and vulnerabilities.
WAF protects without modification to back-end code.
It can protect multiple web apps at the same time behind an application gateway.
WAF is integrated with Azure Security Center.
You can customize WAF rules and rule groups to suit your app requirements.
As a best practice, you should use a WAF in front of any web-facing app, including apps on Azure VMs or in Azure App Service.
Learn more:
Learn about WAF.
Review WAF limitations and exclusions.

Best practice: Implement Azure Network Watcher


Azure Network Watcher provides tools to monitor resources and communications in an Azure VNet. For example,
you can monitor communications between a VM and an endpoint such as another VM or FQDN, view resources
and resource relationships in a VNet, or diagnose network traffic issues.

Network Watcher
With Network Watcher you can monitor and diagnose networking issues without logging into VMs.
You can trigger packet capture by setting alerts, and gain access to real-time performance information at the
packet level. When you see an issue, you can investigate it in detail.
As a best practice, use Network Watcher to review NSG flow logs.
NSG flow logs in Network Watcher allow you to view information about ingress and egress IP traffic
through an NSG.
Flow logs are written in JSON format.
Flow logs show outbound and inbound flows on a per-rule basis, the network interface (NIC) to which the flow applies, 5-tuple information about the flow (source/destination IP, source/destination port, and protocol), and whether the traffic was allowed or denied.
Learn more:
Get an overview of Network Watcher.
Learn more about NSG flow logs.
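
Because flow logs are plain JSON, they're straightforward to post-process outside the portal. A minimal Python sketch that counts allowed and denied flows, assuming the version 1 flow-tuple layout (timestamp, source IP, destination IP, source port, destination port, protocol, direction, decision):

```python
import json

def summarize_flow_log(flow_log_json: str) -> dict:
    """Count allowed (A) and denied (D) flow tuples in an NSG flow log document."""
    counts = {"allowed": 0, "denied": 0}
    log = json.loads(flow_log_json)
    for record in log.get("records", []):
        for rule in record["properties"]["flows"]:
            for flow in rule["flows"]:
                for tuple_str in flow["flowTuples"]:
                    decision = tuple_str.split(",")[7]
                    counts["allowed" if decision == "A" else "denied"] += 1
    return counts
```
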

Use partner tools in the Azure Marketplace


For more complex network topologies, you might use security products from Microsoft partners, in particular
network virtual appliances (NVAs).
An NVA is a VM that performs a network function, such as a firewall, WAN optimization, or other network
function.
NVAs bolster VNet security and network functions. They can be deployed for highly available firewalls, intrusion
prevention, intrusion detection, web application firewalls (WAFs), WAN optimization, routing, load balancing,
VPN, certificate management, Active Directory, and multi-factor authentication.
NVAs are available from numerous vendors in the Azure Marketplace.

Best practice: Implement firewalls and NVAs in hub networks


In the hub, the perimeter network (with access to the internet) is normally managed through an Azure firewall, a
firewall farm, or a web application firewall (WAF). Consider the following comparisons.

FIREWALL TYPE | DETAILS
WAFs | Web applications are common, and tend to suffer from vulnerabilities and potential exploits. WAFs are designed to detect attacks against web applications (HTTP/HTTPS), more specifically than a generic firewall. Compared with traditional firewall technology, WAFs have a set of specific features that protect internal web servers from threats.
Azure Firewall | Like NVA firewall farms, Azure Firewall uses a common administration mechanism and a set of security rules to protect workloads hosted in spoke networks, and to control access to on-premises networks. Azure Firewall has built-in scalability.
NVA firewalls | Like Azure Firewall, NVA firewall farms have a common administration mechanism and a set of security rules to protect workloads hosted in spoke networks, and to control access to on-premises networks. NVA firewalls can be manually scaled behind a load balancer. Though an NVA firewall has less specialized software than a WAF, it has a broader application scope to filter and inspect any type of traffic in egress and ingress. If you want to use NVAs, you can find them in the Azure Marketplace.

We recommend using one set of Azure Firewall instances (or NVAs) for traffic originating on the internet, and another for traffic originating on-premises.
Using only one set of firewalls for both is a security risk, as it provides no security perimeter between the two
sets of network traffic.
Using separate firewall layers reduces the complexity of checking security rules, and it's clear which rules
correspond to which incoming network request.
Learn more:
Learn about using NVAs in an Azure VNet.

Next steps
Review other best practices:
Best practices for security and management after migration.
Best practices for cost management after migration.
Perimeter networks

Perimeter networks enable secure connectivity between your cloud networks and your on-premises or physical
datacenter networks, along with any connectivity to and from the internet. They're also known as demilitarized
zones (DMZs).
For perimeter networks to be effective, incoming packets must flow through security appliances hosted in secure subnets before reaching back-end servers. Examples are firewalls, intrusion detection systems (IDS), and intrusion prevention systems (IPS). Before they leave the network, internet-bound packets from workloads should also flow through the security appliances in the perimeter network. The purposes of this flow are policy enforcement, inspection, and auditing.
Perimeter networks make use of the following Azure features and services:
Virtual networks, user-defined routes, and network security groups
Network virtual appliances (NVAs)
Azure Load Balancer
Azure Application Gateway and web application firewall (WAF)
Public IPs
Azure Front Door with web application firewall
Azure Firewall

NOTE
Azure reference architectures provide example templates that you can use to implement your own perimeter networks:
Implement a DMZ between Azure and your on-premises datacenter
Implement a DMZ between Azure and the internet

Usually, your central IT and security teams are responsible for defining requirements for operating your perimeter
networks.
The preceding diagram shows an example hub and spoke network topology that implements enforcement of two
perimeters with access to the internet and an on-premises network. Both perimeters reside in the DMZ hub. In the
DMZ hub, the perimeter network to the internet can scale up to support many lines of business (LOBs), by using
multiple farms of WAFs and Azure Firewall instances that help protect the spoke virtual networks. The hub also
allows for connectivity via VPN or Azure ExpressRoute as needed.

Virtual networks
Perimeter networks are typically built using a virtual network with multiple subnets to host the different types of
services that filter and inspect traffic to or from the internet via NVAs, WAFs, and Azure Application Gateway
instances.

User-defined routes
By using user-defined routes, customers can deploy firewalls, IDS/IPS, and other virtual appliances. Customers
can then route network traffic through these security appliances for security boundary policy enforcement,
auditing, and inspection. User-defined routes can be created to guarantee that traffic passes through the specified
custom VMs, NVAs, and load balancers.
In a hub and spoke network example, guaranteeing that traffic generated by virtual machines that reside in the
spoke passes through the correct virtual appliances in the hub requires a user-defined route defined in the subnets
of the spoke. This route sets the front-end IP address of the internal load balancer as the next hop. The internal
load balancer distributes the internal traffic to the virtual appliances (load balancer back-end pool).
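
The spoke-side route described above can be captured as a simple declarative definition. The sketch below is a Python dictionary shaped after the Azure Resource Manager route schema; the route name and the internal load balancer front-end IP (10.0.0.4) are example values:

```python
# User-defined route that sends all spoke traffic to the hub's internal load balancer,
# which then distributes it to the virtual appliances in the back-end pool.
force_through_hub = {
    "name": "default-via-hub-nva",
    "properties": {
        "addressPrefix": "0.0.0.0/0",
        "nextHopType": "VirtualAppliance",
        "nextHopIpAddress": "10.0.0.4",  # front-end IP of the hub's internal load balancer
    },
}
```
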

Azure Firewall
Azure Firewall is a managed cloud-based service that helps protect your Azure virtual network resources. It's a
fully stateful managed firewall with built-in high availability and unrestricted cloud scalability. You can centrally
create, enforce, and log application and network connectivity policies across subscriptions and virtual networks.
Azure Firewall uses a static public IP address for your virtual network resources. It allows outside firewalls to
identify traffic that originates from your virtual network. The service interoperates with Azure Monitor for logging
and analytics.

Network virtual appliances


Perimeter networks with access to the internet are typically managed through an Azure Firewall instance or a farm
of firewalls or web application firewalls.
Different LOBs commonly use many web applications. These applications tend to suffer from various vulnerabilities and potential exploits. A web application firewall detects attacks against web applications (HTTP/HTTPS) in more depth than a generic firewall. Compared with traditional firewall technology, web application firewalls have a set of specific features to help protect internal web servers from threats.
An Azure Firewall instance and a network virtual appliance firewall use a common administration plane with a set
of security rules to help protect the workloads hosted in the spokes and control access to on-premises networks.
Azure Firewall has built-in scalability, whereas NVA firewalls can be manually scaled behind a load balancer.
A firewall farm typically has less specialized software compared with a WAF, but it has a broader application scope
to filter and inspect any type of traffic in egress and ingress. If you use an NVA approach, you can find and deploy
the software from the Azure Marketplace.
Use one set of Azure Firewall instances (or NVAs) for traffic that originates on the internet and another set for
traffic that originates on-premises. Using only one set of firewalls for both is a security risk because it provides no
security perimeter between the two sets of network traffic. Using separate firewall layers reduces the complexity of
checking security rules and makes clear which rules correspond to which incoming network requests.

Azure Load Balancer


Azure Load Balancer offers a high-availability Layer 4 (TCP/UDP) service, which can distribute incoming traffic
among service instances defined in a load-balanced set. Traffic sent to the load balancer from front-end endpoints
(public IP endpoints or private IP endpoints) can be redistributed with or without address translation to a pool of
back-end IP addresses (such as NVAs or VMs).
Azure Load Balancer can also probe the health of the various server instances. When an instance fails to respond
to a probe, the load balancer stops sending traffic to the unhealthy instance.
As an example of using a hub and spoke network topology, you can deploy an external load balancer to both the
hub and the spokes. In the hub, the load balancer efficiently routes traffic to services in the spokes. In the spokes,
load balancers manage application traffic.

Azure Front Door Service


Azure Front Door Service is Microsoft's highly available and scalable web application acceleration platform and
global HTTPS load balancer. You can use Azure Front Door Service to build, operate, and scale out your dynamic
web application and static content. It runs in more than 100 locations at the edge of Microsoft's global network.
Azure Front Door Service provides your application with unified regional/stamp maintenance automation, BCDR
automation, unified client/user information, caching, and service insights. The platform offers performance,
reliability, and support SLAs. It also offers compliance certifications and auditable security practices that are
developed, operated, and supported natively by Azure.

Application Gateway
Azure Application Gateway is a dedicated virtual appliance that provides a managed application delivery controller (ADC). It offers various layer 7 load-balancing capabilities for your application.
Application Gateway allows you to optimize web farm productivity by offloading CPU-intensive SSL termination
to the application gateway. It also provides other layer 7 routing capabilities, including round-robin distribution of
incoming traffic, cookie-based session affinity, URL path-based routing, and the ability to host multiple websites
behind a single application gateway.
The application gateway WAF SKU includes a web application firewall. This SKU provides protection to web
applications from common web vulnerabilities and exploits. You can configure Application Gateway as an internet-
facing gateway, an internal-only gateway, or a combination of both.

Public IPs
With some Azure features, you can associate service endpoints to a public IP address so that your resource can be
accessed from the internet. This endpoint uses network address translation (NAT) to route traffic to the internal
address and port on the Azure virtual network. This path is the primary way for external traffic to pass into the
virtual network. You can configure public IP addresses to determine what traffic is passed in, and how and where
it's translated onto the virtual network.

Azure DDoS Protection Standard


Azure DDoS Protection Standard provides additional mitigation capabilities over the Basic service tier that are
tuned specifically to Azure virtual network resources. DDoS Protection Standard is simple to enable and requires
no application changes.
You can tune protection policies through dedicated traffic monitoring and machine-learning algorithms. Policies
are applied to public IP addresses associated to resources deployed in virtual networks. Examples are Azure Load
Balancer, Azure Application Gateway, and Azure Service Fabric instances.
Real-time telemetry is available through Azure Monitor views both during an attack and for historical purposes.
You can add application-layer protection by using the web application firewall in Azure Application Gateway.
Protection is provided for IPv4 Azure public IP addresses.
Hub and spoke network topology

Hub and spoke is a networking model for more efficient management of common communication or security
requirements. It also helps avoid Azure subscription limitations. This model addresses the following concerns:
Cost savings and management efficiency. Centralizing services that can be shared by multiple workloads,
such as network virtual appliances (NVAs) and DNS servers, in a single location allows IT to minimize
redundant resources and management effort.
Overcoming subscription limits. Large cloud-based workloads might require the use of more resources
than are allowed in a single Azure subscription. Peering workload virtual networks from different subscriptions
to a central hub can overcome these limits. For more information, see subscription limits.
Separation of concerns. You can split the deployment of individual workloads between central IT teams and workload teams.
Smaller cloud estates might not benefit from the added structure and capabilities that this model offers. But larger
cloud adoption efforts should consider implementing a hub and spoke networking architecture if they have any of
the concerns listed previously.

NOTE
The Azure Reference Architectures site contains example templates that you can use as the basis for implementing your own hub and spoke networks:
Implement a hub and spoke network topology in Azure
Implement a hub and spoke network topology with shared services in Azure

Overview
As shown in the diagram, Azure supports two types of hub and spoke design. It supports communication, shared
resources, and centralized security policy ("VNet Hub" in the diagram), or a virtual WAN type ("Virtual WAN" in
the diagram) for large-scale branch-to-branch and branch-to-Azure communications.
A hub is a central network zone that controls and inspects ingress or egress traffic between zones: internet, on-
premises, and spokes. The hub and spoke topology gives your IT department an effective way to enforce security
policies in a central location. It also reduces the potential for misconfiguration and exposure.
The hub often contains the common service components that the spokes consume. The following examples are
common central services:
The Windows Server Active Directory infrastructure, required for user authentication of third parties that gain access from untrusted networks before they get access to the workloads in the spoke. It includes the related Active Directory Federation Services (AD FS).
A DNS service to resolve naming for the workload in the spokes, to access resources on-premises and on the
internet if Azure DNS isn't used.
A public key infrastructure (PKI), to implement single sign-on on workloads.
Flow control of TCP and UDP traffic between the spoke network zones and the internet.
Flow control between the spokes and on-premises.
If needed, flow control between one spoke and another.
You can minimize redundancy, simplify management, and reduce overall cost by using the shared hub
infrastructure to support multiple spokes.
The role of each spoke can be to host different types of workloads. The spokes also provide a modular approach
for repeatable deployments of the same workloads. Examples are dev and test, user acceptance testing, staging,
and production.
The spokes can also segregate and enable different groups within your organization. An example is Azure DevOps
groups. Inside a spoke, it's possible to deploy a basic workload or complex multitier workloads with traffic control
between the tiers.

Subscription limits and multiple hubs


In Azure, every component, whatever the type, is deployed in an Azure subscription. The isolation of Azure
components in different Azure subscriptions can satisfy the requirements of different lines of business, such as
setting up differentiated levels of access and authorization.
A single hub and spoke implementation can scale up to a large number of spokes. But as with every IT system,
there are platform limits. The hub deployment is bound to a specific Azure subscription, which has restrictions and
limits. One example is a maximum number of virtual network peerings. For more information, see Azure
subscription and service limits, quotas, and constraints.
In cases where limits might be an issue, you can scale up the architecture further by extending the model from a
single hub and spoke to a cluster of hubs and spokes. You can interconnect multiple hubs in one or more Azure
regions by using virtual network peering, Azure ExpressRoute, a virtual WAN, or a site-to-site VPN.

The introduction of multiple hubs increases the cost and management overhead of the system. This is only
justified by scalability, system limits, or redundancy and regional replication for user performance or disaster
recovery. In scenarios that require multiple hubs, all the hubs should strive to offer the same set of services for
operational ease.

Interconnection between spokes


It's possible to implement complex multitier workloads in a single spoke. You can implement multitier
configurations by using subnets (one for every tier) in the same virtual network and by using network security
groups to filter the flows.
An architect might want to deploy a multitier workload across multiple virtual networks. With virtual network
peering, spokes can connect to other spokes in the same hub or in different hubs.
A typical example of this scenario is the case where application processing servers are in one spoke or virtual
network. The database deploys in a different spoke or virtual network. In this case, it's easy to interconnect the
spokes with virtual network peering and avoid transiting through the hub. The solution is to perform a careful
architecture and security review to ensure that bypassing the hub doesn't bypass important security or auditing
points that might exist only in the hub.

Spokes can also be interconnected to a spoke that acts as a hub. This approach creates a two-level hierarchy: the
spoke in the higher level (level 0) becomes the hub of lower spokes (level 1) of the hierarchy. The spokes of a hub
and spoke implementation are required to forward the traffic to the central hub so that the traffic can transit to its
destination in either the on-premises network or the public internet. An architecture with two levels of hubs
introduces complex routing that removes the benefits of a simple hub and spoke relationship.
Track costs across business units, environments, or
projects

Building a cost-conscious organization requires visibility and properly defined access (or scope) to cost-related
data. This best-practice article outlines decisions and implementation approaches to creating tracking
mechanisms.

Establish a well-managed environment hierarchy


Cost control, much like governance and other management constructs, depends on a well-managed environment.
Establishing such an environment (especially a complex one) requires consistent processes in the classification
and organization of all assets.
Assets (also known as resources) include all virtual machines, data sources, and applications deployed to the
cloud. Azure provides several mechanisms for classifying and organizing assets. Scaling with multiple Azure
subscriptions details options for organizing resources based on multiple criteria to establish a well-managed
environment. This article focuses on the application of Azure fundamental concepts to provide cloud cost visibility.
Classification
Tagging is an easy way to classify assets. Tagging associates metadata to an asset. That metadata can be used to
classify the asset based on various data points. When tags are used to classify assets as part of a cost
management effort, companies often need the following tags: business unit, department, billing code, geography,
environment, project, and workload or "application categorization." Azure Cost Management can use these tags to
create different views of cost data.
Tagging is a primary way to understand the data in any cost reporting. It's a fundamental part of any well-
managed environment. It's also the first step in establishing proper governance of any environment.
The first step in accurately tracking cost information across business units, environments, and projects is to define
a tagging standard. The second step is to ensure that the tagging standard is consistently applied. The following
articles can help you accomplish each of these steps:
Develop naming and tagging standards
Establish a governance MVP to enforce tagging standards
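
Consistency is easiest to verify programmatically once the tagging standard is defined. A minimal Python sketch that reports which required cost-tracking tags are missing from a resource; the required tag set here is illustrative, not prescriptive:

```python
# Required tags for cost tracking, per the (illustrative) tagging standard.
REQUIRED_TAGS = {"BusinessUnit", "Department", "BillingCode", "Environment", "Project"}

def missing_tags(resource_tags: dict) -> set:
    """Return the required tags that are absent from a resource's tag dictionary."""
    return REQUIRED_TAGS - set(resource_tags or {})

vm_tags = {"BusinessUnit": "Retail", "Environment": "prod", "Project": "navigator"}
print(missing_tags(vm_tags))  # {'Department', 'BillingCode'} (set order may vary)
```
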
Resource organization
There are several approaches to organizing assets. This section outlines a best practice based on the needs of a
large enterprise with cost structures spread across business units, geographies, and IT organizations. A similar
best practice for a smaller, less complex organization is available in the standard enterprise governance guide.
For a large enterprise, the following model for management groups, subscriptions, and resource groups will
create a hierarchy that allows each team to have the right level of visibility to perform their duties. When the
enterprise needs cost controls to prevent budget overrun, it can apply governance tooling like Azure Blueprints or
Azure Policy to the subscriptions within this structure to quickly block future cost errors.

In the preceding diagram, the root of the management group hierarchy contains a node for each business unit. In
this example, the multinational company needs visibility into the regional business units, so it creates a node for
geography under each business unit in the hierarchy.
Within each geography, there's a separate node for production and nonproduction environments to isolate cost,
access, and governance controls. To allow for more efficient operations and wiser operations investments, the
company uses subscriptions to further isolate production environments with varying degrees of operational
performance commitments. Finally, the company uses resource groups to capture deployable units of a function,
called applications.
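The hierarchy described above can also be scripted. The following Azure CLI sketch creates a simplified slice of that structure for a single hypothetical business unit; every group name and the subscription name are placeholders to be replaced with your own naming standard.

```azurecli
# Business unit node at the top of the hierarchy.
az account management-group create --name bu-retail --display-name "Retail"

# Geography node under the business unit.
az account management-group create --name bu-retail-emea --display-name "Retail - EMEA" --parent bu-retail

# Production and nonproduction nodes to isolate cost, access, and governance controls.
az account management-group create --name bu-retail-emea-prod --display-name "Retail - EMEA - Production" --parent bu-retail-emea
az account management-group create --name bu-retail-emea-nonprod --display-name "Retail - EMEA - Nonproduction" --parent bu-retail-emea

# Place an existing subscription under the production node.
az account management-group subscription add --name bu-retail-emea-prod --subscription "Retail EMEA Prod 01"
```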
The diagram shows best practices but doesn't include these options:
Many companies limit operations to a single geopolitical region. That approach reduces the need to diversify
governance disciplines or cost data based on local data-sovereignty requirements. In those cases, a geography
node is unnecessary.
Some companies prefer to further segregate development, testing, and quality control environments into
separate subscriptions.
When a company integrates a cloud center of excellence (CCoE) team, shared services subscriptions in each
geography node can reduce duplicated assets.
Smaller adoption efforts might have a much smaller management hierarchy. It's common to see a single root
node for corporate IT, with a single level of subordinate nodes in the hierarchy for various environments. This
isn't a violation of best practices for a well-managed environment. But it does make it more difficult to provide
a least-rights access model for cost control and other important functions.
The rest of this article assumes the use of the best-practice approach in the preceding diagram. However, the
following articles can help you apply the approach to a resource organization that best fits your company:
Scaling with multiple Azure subscriptions
Deploying a Governance MVP to govern well-managed environment standards

Provide the right level of cost access


Managing cost is a team activity. The organization readiness section of the Cloud Adoption Framework defines a
small number of core teams and outlines how those teams support cloud adoption efforts. This article expands on
the team definitions to define the scope and roles to assign to members of each team for the proper level of
visibility into cost management data.
Roles define what a user can do to various assets.
Scope defines the set of assets to which a user, group, service principal, or managed identity can apply those roles.
As a general best practice, we suggest a least-privilege model in assigning people to various roles and scopes.
Roles
Azure Cost Management supports the following built-in roles for each scope:
Owner. Can view costs and manage everything, including cost configuration.
Contributor. Can view costs and manage everything, including cost configuration, but excluding access control.
Reader. Can view everything, including cost data and configuration, but can't make any changes.
Cost Management Contributor. Can view costs and manage cost configuration.
Cost Management Reader. Can view cost data and configuration.
As a general best practice, members of all teams should be assigned the role of Cost Management Contributor.
This role grants access to create and manage budgets and exports to more effectively monitor and report on
costs. However, members of the cloud strategy team should be set to Cost Management Reader only. That's
because they're not involved in setting budgets within the Azure Cost Management tool.
Scope
The following scope and role settings will create the required visibility into cost management. This best practice
might require minor changes to align to asset organization decisions.
Cloud adoption team. Responsibilities for ongoing optimization changes require Cost Management
Contributor access at the resource group level.
Working environment. At a minimum, the cloud adoption team should already have Contributor
access to all affected resource groups, or at least those groups related to dev/test or ongoing
deployment activities. No additional scope setting is required.
Production environments. When proper separation of responsibility has been established, the cloud
adoption team probably won't continue to have access to the resource groups related to its projects. The
resource groups that support the production instances of their workloads will need additional scope to
give this team visibility into the production cost impact of its decisions. Setting the Cost Management
Contributor scope for production resource groups for this team will allow the team to monitor costs
and set budgets based on usage and ongoing investment in the supported workloads.
Cloud strategy team. Responsibilities for tracking costs across multiple projects and business units require
Cost Management Reader access at the root level of the management group hierarchy.
Assign Cost Management Reader access to this team at the management group. This will ensure
ongoing visibility into all deployments associated with the subscriptions governed by that management
group hierarchy.
Cloud governance team. Responsibilities for managing cost, budget alignment, and reporting across all
adoption efforts require Cost Management Contributor access at the root level of the management group
hierarchy.
In a well-managed environment, the cloud governance team likely has a higher degree of access
already, making additional scope assignment for Cost Management Contributor unnecessary.
Cloud center of excellence. Responsibility for managing costs related to shared services requires Cost
Management Contributor access at the subscription level. Additionally, this team might require Cost
Management Contributor access to resource groups or subscriptions that contain assets deployed by CCoE
automations to understand how those automations affect costs.
Shared services. When a cloud center of excellence is engaged, best practice suggests that assets
managed by the CCoE are supported from a centralized shared service subscription within a hub and
spoke model. In this scenario, the CCoE likely has contributor or owner access to that subscription,
making additional scope assignment for Cost Management Contributor unnecessary.
CCoE automation/controls. The CCoE commonly provides controls and automated deployment
scripts to cloud adoption teams. The CCoE has a responsibility to understand how these accelerators
affect costs. To gain that visibility, the team needs Cost Management Contributor access to any resource
groups or subscriptions running those accelerators.
Cloud operations team. Responsibility for managing ongoing costs of production environments requires
Cost Management Contributor access to all production subscriptions.
The general recommendation puts production and nonproduction assets in separate subscriptions that
are governed by nodes of the management group hierarchy associated with production environments.
In a well-managed environment, members of the operations team likely have owner or contributor
access to production subscriptions already, making the Cost Management Contributor role
unnecessary.
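These scope recommendations map directly to standard Azure role assignments. The following sketch is illustrative only; the user principal names, resource group, and management group ID are hypothetical, and the assignee could just as easily be a group object ID or service principal.

```azurecli
# Cloud adoption team: Cost Management Contributor on the production resource group for its workload.
az role assignment create \
  --assignee "adoption-lead@contoso.com" \
  --role "Cost Management Contributor" \
  --resource-group rg-payroll-prod

# Cloud strategy team: Cost Management Reader at the root of the management group hierarchy.
az role assignment create \
  --assignee "strategy-lead@contoso.com" \
  --role "Cost Management Reader" \
  --scope "/providers/Microsoft.Management/managementGroups/contoso-root"
```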

Additional cost management resources


Azure Cost Management is a well-documented tool for setting budgets and gaining visibility into cloud costs for
Azure or AWS. After you establish access to a well-managed environment hierarchy, the following articles can
help you use that tool to monitor and control costs.
Get started with Azure Cost Management
For more information on getting started with Azure Cost Management, see How to optimize your cloud
investment with Azure Cost Management.
Use Azure Cost Management
Create and manage budgets
Export cost data
Optimize costs based on recommendations
Use cost alerts to monitor usage and spending
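The budget capabilities linked above can also be scripted rather than configured in the portal. The following sketch assumes the az consumption budget commands are available in your CLI version and targets the currently selected subscription; the budget name, amount, and dates are placeholders.

```azurecli
# Create a monthly cost budget for the current subscription.
az consumption budget create \
  --budget-name payroll-prod-monthly \
  --amount 10000 \
  --category cost \
  --time-grain monthly \
  --start-date 2019-11-01 \
  --end-date 2020-10-31

# Review the budgets that already exist in this scope.
az consumption budget list --output table
```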
Use Azure Cost Management to govern AWS costs
AWS Cost and Usage report integration
Manage AWS costs
Establish access, roles, and scope
Understanding cost management scope
Setting scope for a resource group
Skills readiness path during the Ready phase of a
migration journey

During the Ready phase of a migration journey, the objective is to prepare for the journey ahead. This phase is
accomplished in two primary areas: organizational readiness and environmental (technical) readiness. Each area
might require new skills for both technical and nontechnical contributors. The following sections describe a few
options to help build the necessary skills.

Organizational readiness learning paths


Depending on the motivations and business outcomes associated with a cloud adoption effort, leaders might be
required to establish new organizational structures or virtual teams (V-teams) to facilitate various functions. The
following articles help to develop the skills that are necessary to structure those teams in accordance with desired
outcomes:
Initial organization alignment: Overview of organizational alignment and various team structures to facilitate
specific goals.
Break down silos and fiefdoms: Understand two common organizational antipatterns and ways to guide the
team to productive collaboration.

Environmental (technical) readiness learning paths


During the Ready phase, technical staff are called upon to create a migration landing zone that's capable of hosting,
operating, and governing workloads that were migrated to the cloud. Developing the necessary skills can be
accelerated with the following learning paths:
Create an Azure account: The first step to using Azure is to create an account. Your account holds the Azure
services you provision and handles your personal settings like identity, billing, and preferences.
Azure portal: Tour the Azure portal features and services, and customize the portal.
Introduction to Azure: Get started with Azure by creating and configuring your first virtual machine in the
cloud.
Introduction to security in Azure: Discuss the basic concepts for protecting your infrastructure and data when
you work in the cloud. Understand what responsibilities are yours and what Azure takes care of for you.
Manage resources in Azure: Learn how to work with the Azure command line and web portal to create,
manage, and control cloud-based resources.
Create a VM: Create a virtual machine by using the Azure portal.
Azure networking: Learn some of the Azure networking basics and how Azure networking helps improve
resiliency and reduce latency.
Azure compute options: Review the Azure compute services.
Secure resources with role-based access control (RBAC): Use RBAC to secure resources.
Data storage options: Benefits of Azure data storage.
During the Ready phase, architects are called upon to architect solutions that span all Azure environments. The
following skill-building resources can prepare architects for these tasks:
Foundations for cloud architecture: PluralSight course to help architect the right foundational solutions.
Microsoft Azure architecture: PluralSight course to ground architects in Azure architecture.
Designing migrations for Microsoft Azure: PluralSight course to help architects design a migration solution.

Deeper skills exploration


Beyond these initial options for developing skills, there are a variety of learning options available.
Typical mappings of cloud IT roles
Microsoft and partners offer a variety of options for all audiences to develop their skills with Azure services:
Microsoft IT Pro Career Center: Serves as a free online resource to help map your cloud career path. Learn
what industry experts suggest for your cloud role and the skills to get you there. Follow a learning curriculum at
your own pace to build the skills you need most to stay relevant.
Turn your knowledge of Azure into official recognition with Microsoft Azure certification training and exams.

Microsoft Learn
Microsoft Learn is a new approach to learning. Readiness for the new skills and responsibilities that come with
cloud adoption doesn't come easily. Microsoft Learn provides a more rewarding approach to hands-on learning
that helps you achieve your goals faster. Earn points and levels and achieve more.
The following examples are a few tailored learning paths on Microsoft Learn that align with the Ready portion of
the Cloud Adoption Framework:
Azure fundamentals: Learn cloud concepts such as High Availability, Scalability, Elasticity, Agility, Fault Tolerance,
and Disaster Recovery. Understand the benefits of cloud computing in Azure and how it can save you time and
money. Compare and contrast basic strategies for transitioning to the Azure cloud. Explore the breadth of services
available in Azure including compute, network, storage and security.
Manage resources in Azure: Learn how to work with the Azure command line and web portal to create, manage,
and control cloud-based resources.
Administer infrastructure resources in Azure: Learn how to create, manage, secure and scale virtual machine
resources.
Store data in Azure: Azure provides a variety of ways to store data: unstructured, archival, relational, and more.
Learn the basics of storage management in Azure, how to create a Storage Account, and how to choose the right
model for the data you want to store in the cloud.
Architect great solutions in Azure: Learn how to design and build secure, scalable, high-performing solutions in
Azure by examining the core principles found in every good architecture.

Learn more
For additional learning paths, browse the Microsoft Learn catalog. Use the Roles filter to align learning paths with
your role.
Any enterprise-scale cloud adoption plan will include workloads that do not warrant significant investments in the creation of
new business logic. Those workloads can be moved to the cloud through any number of approaches: lift and shift, lift and
optimize, or modernize. Each of these approaches is considered a migration. The following exercises will help establish the
iterative processes to assess, migrate, optimize, secure, and manage those workloads.

Getting started
To prepare you for this phase of the cloud adoption lifecycle, the framework suggests the following five exercises:

Migration prerequisite
Validate that a landing zone has been deployed and is ready to host the first few workloads that will be migrated to Azure. If a
cloud adoption strategy and cloud adoption plan have not been created, validate that both efforts are in progress.

Migrate your first workload


Use the Azure migration guide to direct the migration of your first workload. This will help you become familiar with the
tools and approaches needed to scale adoption efforts.

Expanded migration scenarios


Use the expanded scope checklist to identify scenarios that would require modifications to your future-state
architecture, migration processes, landing zone configurations, or migration tooling decisions.

Best practices
Validate any modifications against the best practices section to ensure proper implementation of expanded scope or
workload- and architecture-specific migration approaches.

Process improvements
Migration is a process-heavy activity. As migration efforts scale, use the migration considerations section to evaluate and
mature various aspects of your processes.
Iterative migration process
At its core, migration to the cloud consists of four simple phases: Assess, Migrate, Optimize, and Secure & Manage. This section
of the Cloud Adoption Framework teaches readers to maximize the return from each phase of the process and align those
phases with your cloud adoption plan. The following graphic illustrates those phases in an iterative approach:

Create a balanced cloud portfolio


Any balanced technology portfolio has a mixture of assets in various states. Some applications are scheduled for retirement and
given minimal support. Other applications or assets are supported in a maintenance state, but the features of those solutions are
stable. For newer business processes, changing market conditions will likely spur ongoing feature enhancements or
modernization. When opportunities to drive new revenue streams arise, new applications or assets are introduced into the
environment. At each stage of an asset's lifecycle, the impact any investment has on revenue and profit will change. The later the
lifecycle stage, the less likely a new feature or modernization effort will yield a strong return on investment.
The cloud provides various adoption mechanisms, each with similar degrees of investment and return. Building cloud-native
applications can significantly reduce operating expenses. Once a cloud-native application is released, development of new
features and solutions can iterate faster. Modernizing an application can yield similar benefits by removing legacy constraints
associated with on-premises development models. Unfortunately, these two approaches are labor-intensive and depend on the
size, skill, and experience of software development teams. Often, labor is misaligned—people with the skills and talent to
modernize applications would rather build new applications. In a labor-constrained market, large-scale modernization projects
can suffer from an employee satisfaction and talent issue. In a balanced portfolio, this approach should be reserved for
applications that would receive significant feature enhancements if they remained on-premises.

Envision an end state


An effective journey needs a target destination. Establish a rough vision of the end state before taking the first step. This
infographic outlines a starting point consisting of existing applications, data, and infrastructure, which defines the digital estate.
During the migration process, each asset is transitioned via one of the options on the right.

Migration implementation
These articles outline two journeys, each with a similar goal: to migrate a large percentage of existing assets to Azure.
However, the business outcomes and current state will significantly influence the processes required to get there. Those subtle
deviations result in two radically different approaches to reaching a similar end state.
To guide incremental execution during the transition to the end state, this model separates migration into two areas of focus.
Migration preparation: Establish a rough migration backlog based largely on the current state and desired outcomes.
Business outcomes: The key business objectives driving this migration.
Digital estate estimate: A rough estimate of the number and condition of workloads to be migrated.
Roles and responsibilities: A clear definition of the team structure, separation of responsibilities, and access requirements.
Change management requirements: The cadence, processes, and documentation required to review and approve changes.
These initial inputs shape the migration backlog. The output of the migration backlog is a prioritized list of applications to
migrate to the cloud. That list shapes the execution of the cloud migration process. Over time, it will also grow to include much of
the documentation needed to manage change.
Migration process: Each cloud migration activity is contained in one of the following processes, as it relates to the migration
backlog.
Assess: Evaluate an existing asset and establish a plan for migration of the asset.
Migrate: Replicate the functionality of an asset in the cloud.
Optimize: Balance the performance, cost, access, and operational capacity of a cloud asset.
Secure and manage: Ensure a cloud asset is ready for ongoing operations.
The information gathered during development of a migration backlog determines the complexity and level of effort required
within the cloud migration process during each iteration and for each release of functionality.

Transition to the end state


The goal is a smooth and partly automated migration to the cloud. The migration process uses the tools provided by a cloud
vendor to rapidly replicate and stage assets in the cloud. Once verified, a simple network change reroutes users to the cloud
solution. For many use cases, the technology to achieve this goal is largely available. There are example cases that demonstrate
the speed at which 10,000 VMs can be replicated in Azure.
However, an incremental migration approach is still required. In most environments, the long list of VMs to be migrated must be
decomposed into smaller units of work for a migration to be successful. There are many factors that limit the number of VMs
that can be migrated in a given period. Outbound network speed is one of the few technical limits; most of the limits are imposed
by the business's ability to validate and adapt to change.
The incremental migration approach of the Cloud Adoption Framework helps build an incremental plan that reflects and
documents technical and cultural limitations. The goal of this model is to maximize migration velocity while minimizing overhead
from both IT and the business. Provided below are two examples of an incremental migration execution based on the migration
backlog.

Azure migration guide


Narrative summary: This customer is migrating fewer than 1,000 VMs. Fewer than ten of the supported applications are
owned by an application owner outside of the IT organization. The remaining applications, VMs, and associated data are
owned and supported by members of the cloud adoption team. Members of the cloud adoption team have administrative
access to the production environments in the existing datacenter.

Complex scenario guide


Narrative summary: This customer's migration has complexity across the business, culture, and technology. This guide
includes multiple specific complexity challenges and ways to overcome those challenges.
These two journeys represent two extremes of experience for customers who invest in cloud migration. Most companies reflect a
combination of the two scenarios above. After reviewing the journeys, use the Cloud Adoption Framework migration model to
start the migration conversation and modify the baseline journeys to more closely meet your needs.

Next steps
Choose one of these journeys:
Azure migration guide
Expanded scope guide

Azure migration guide: Before you start


Before you start
Before you migrate resources to Azure, you need to choose the migration method and the features you'll use to
govern and secure your environment. This guide leads you through this decision process.

TIP
For an interactive experience, view this guide in the Azure portal. Go to the Azure Quickstart Center in the Azure portal,
select Migrate your environment to Azure, and then follow the step-by-step instructions.

Overview
When to use this guide
Migration options
This guide walks you through the basics of migrating applications and resources from your on-premises
environment to Azure. It is designed for migration scopes with minimal complexity. To determine the suitability
of this guide for your migration, see the When to use this guide tab.
When you migrate to Azure, you may migrate your applications as-is using IaaS-based virtual machine
solutions (known as a rehost or lift and shift migration), or you may have the flexibility to use managed services
and other cloud-native features to modernize your applications. See the Migration options tab for more
information on these choices. As you develop your migration strategy, you might consider:
Will my migrating applications work in the cloud?
What is the best strategy (with regard to technology, tools, and migrations) for my application? See the
Microsoft Cloud Adoption Framework's Migration tools decision guide for more information.
How do I minimize downtime during the migration?
How do I control costs?
How do I track resource costs and bill them accurately?
How do I ensure we remain compliant and meet regulations?
How do I meet legal requirements for data sovereignty in certain countries?
This guide helps answer these questions. It suggests the tasks and features to consider as you prepare to deploy
resources in Azure, including:
Configure prerequisites. Plan and prepare for migration.
Assess your technical fit. Validate the technical readiness and suitability for migration.
Manage costs and billing. Look at the costs of your resources.
Migrate your services. Perform the actual migration.
Organize your resources. Lock resources critical to your system and tag resources to track them.
Optimize and transform. Use the post-migration opportunity to review your resources.
Secure and manage. Ensure that your environment is secure and monitored properly.
Get assistance. Get help and support during your migration or post-migration activities.
To learn more about organizing and structuring your subscriptions, managing your deployed resources, and
complying with your corporate policy requirements, see Governance in Azure.

Prerequisites
Prerequisites for migrating to Azure
The resources in this section will help prepare your current environment for migration to Azure.
Overview
Understand migration approaches
Planning checklist
Reasons for migrating to Azure include removing risks associated with legacy hardware, reducing capital expense,
freeing up datacenter space, and quickly realizing return on investment (ROI).
Eliminate legacy hardware. You may have applications hosted on infrastructure that is nearing end of life or
support, whether on-premises or at a hosting provider. Migration to the cloud offers an attractive solution to
the challenge as the ability to migrate "as-is" allows the team to quickly resolve the current infrastructure
lifecycle challenge and then turn its attention to long-term planning for application lifecycle and optimization in
the cloud.
Address end-of-support for software. You may have applications that depend on other software or operating
systems that are nearing end of support. Moving to Azure may provide extended support options for these
dependencies or other migration options that minimize refactoring requirements to support your applications
going forward. For example, see extended support options for Windows Server 2008 and SQL Server 2008.
Reduce capital expense. Hosting your own server infrastructure requires considerable investment in
hardware, software, electricity, and personnel. Migrating to a cloud solution can provide significant reductions in
capital expense. To achieve the best capital expense reductions, a redesign of the solution may be required.
However, an "as-is" migration is a great first step.
Free up datacenter space. You may choose Azure in order to expand your datacenter capacity. One way to do
this is using the cloud as an extension of your on-premises capabilities.
Quickly realize return on investment. Making a return on investment (ROI) is much easier with cloud
solutions, as the cloud payment model provides great utilization insight and promotes a culture for realizing
ROI.
Each of the above scenarios may be entry points for extending your cloud footprint using another methodology
(rehost, refactor, rearchitect, rebuild, or replace).

Migration characteristics
The guide assumes that prior to this migration, your digital estate consists mostly of on-premises hosted
infrastructure and may include hosted business-critical applications. After a successful migration, your data estate
may look very much like it did on-premises but with the infrastructure hosted in cloud resources. Alternatively,
the ideal data estate is a variation of your current data estate, since it has aspects of your on-premises
infrastructure with components that have been refactored to optimize and take advantage of the cloud platform.
The focus of this migration journey is to achieve:
Remediation of legacy hardware end-of-life.
Reduction of capital expense.
Return on investment.

NOTE
An additional benefit of this migration journey is the extended software support model for Windows Server 2008, Windows
Server 2008 R2, SQL Server 2008, and SQL Server 2008 R2. For more information, see:
Windows Server 2008 and Windows Server 2008 R2.
SQL Server 2008 and SQL Server 2008 R2.
Assess the digital estate

In an ideal migration, every asset (infrastructure, app, or data) would be compatible with a cloud platform and
ready for migration. In reality, not everything should be migrated to the cloud. Furthermore, not every asset is
compatible with cloud platforms. Before migrating a workload to the cloud, it is important to assess the workload
and each related asset (infrastructure, apps, and data).
The resources in this section will help you assess your environment to determine its suitability for migration and
which methods to consider.
Tools
Scenarios and Stakeholders
Timelines
Cost Management
The following tools help you assess your environment to determine the suitability of migration and best approach
to use. For helpful information on choosing the right tools to support your migration efforts, see the Cloud
Adoption Framework's migration tools decision guide.

Azure Migrate
The Azure Migrate service assesses on-premises infrastructure, applications and data for migration to Azure. The
service assesses the migration suitability of on-premises assets, performs performance-based sizing, and provides
cost estimates for running on-premises assets in Azure. If you're considering lift and shift migrations, or are in the
early assessment stages of migration, this service is for you. After completing the assessment, Azure Migrate can
be used to execute the migration.

Create a new server migration project


To get started with a server migration assessment using Azure Migrate, follow these steps:
1. Select Azure Migrate.
2. In Overview, click Assess and migrate servers.
3. Select Add tools.
4. In Discover, assess and migrate servers, click Add tools.
5. In Migrate project, select your Azure subscription, and create a resource group if you don't have one.
6. In Project Details, specify the project name, and geography in which you want to create the project, and click
Next.
7. In Select assessment tool, select Skip adding an assessment tool for now > Next.
8. In Select migration tool, select Azure Migrate: Server Migration > Next.
9. In Review + add tools, review the settings, and click Add tools
10. After adding the tool, it appears in the Azure Migrate project > Servers > Migration tools.

Learn more
Azure Migrate overview
Migrate physical or virtualized servers to Azure
Azure Migrate in the Azure portal

Service Map
Service Map automatically discovers application components on Windows and Linux systems and maps the
communication between services. With Service Map, you can view your servers in the way that you think of them:
as interconnected systems that deliver critical services. Service Map shows connections between servers,
processes, inbound and outbound connection latency, and ports across any TCP-connected architecture, with no
configuration required other than the installation of an agent.
Azure Migrate uses Service Map to enhance the reporting capabilities and dependencies across the environment.
Full details of this integration are outlined in Dependency visualization. If you use the Azure Migrate service, there
are no additional steps required to configure and obtain the benefits of Service Map. The following instructions are
provided for your reference should you wish to use Service Map for other purposes or projects.
Enable dependency visualization using Service Map
To use dependency visualization, you need to download and install agents on each on-premises machine that you
want to analyze.
The Microsoft Monitoring Agent (MMA) needs to be installed on each machine.
The Microsoft Dependency agent needs to be installed on each machine.
In addition, if you have machines with no internet connectivity, you need to download and install Log Analytics
gateway on them.
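Both agents report to a Log Analytics workspace. If a workspace doesn't already exist, the following sketch creates one and retrieves the workspace ID and key requested during Microsoft Monitoring Agent setup; the resource group and workspace names are placeholders.

```azurecli
# Create a Log Analytics workspace to receive Service Map data.
az monitor log-analytics workspace create \
  --resource-group rg-migration-assess \
  --workspace-name law-servicemap \
  --location eastus2

# Workspace ID (customerId) used when onboarding the Microsoft Monitoring Agent.
az monitor log-analytics workspace show \
  --resource-group rg-migration-assess \
  --workspace-name law-servicemap \
  --query customerId --output tsv

# Primary key used when onboarding the agent.
az monitor log-analytics workspace get-shared-keys \
  --resource-group rg-migration-assess \
  --workspace-name law-servicemap \
  --query primarySharedKey --output tsv
```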
Learn more
Using Service Map solution in Azure
Azure Migrate and Service Map: Dependency visualization
Migrate assets (infrastructure, apps, and data)

In this phase of the journey, you use the output of the assess phase to initiate the migration of the environment.
This guide helps identify the appropriate tools to reach a "done state", including native tools, third-party tools, and
project management tools.
Native migration tools
Third-party migration tools
Project management tools
Cost management
The following sections describe the native Azure tools available to perform or assist with migration. For
information on choosing the right tools to support your migration efforts, see the Cloud Adoption Framework's
Migration tools decision guide.

Azure Migrate
Azure Migrate delivers a unified and extensible migration experience. Azure Migrate provides a one-stop,
dedicated experience to track your migration journey across the phases of assessment and migration to Azure. It
provides you the option to use the tools of your choice and track the progress of migration across these tools.
Azure Migrate provides the following functionality:
1. Enhanced assessment and migration capabilities:
Hyper-V assessments.
Improved VMware assessment.
Agentless migration of VMware virtual machines to Azure.
2. Unified assessment, migration, and progress tracking.
3. Extensible approach with ISV integration (such as Cloudamize).
To perform a migration using Azure Migrate follow these steps:
1. Search for Azure Migrate under All services. Select Azure Migrate to continue.
2. Select Add a tool to start your migration project.
3. Select the subscription, resource group, and geography to host the migration.
4. Select Select assessment tool > Azure Migrate: Server Assessment > Next.
5. Select Review + add tools, and verify the configuration. Click Add tools to initiate the job to create the
migration project and register the selected solutions.
Learn more
Azure Migrate tutorial - Migrate physical or virtualized servers to Azure

Azure Site Recovery


The Azure Site Recovery service can manage the migration of on-premises resources to Azure. It can also manage
and orchestrate disaster recovery of on-premises machines and Azure VMs for business continuity and disaster
recovery (BCDR) purposes.
The following steps outline the process to use Site Recovery to migrate:
TIP
Depending on your scenario, these steps may differ slightly. For more information, see the Migrate on-premises machines to
Azure article.

Prepare Azure Site Recovery service


1. In the Azure portal, select +Create a resource > Management Tools > Backup and Site Recovery.
2. If you haven't yet created a recovery vault, complete the wizard to create a Recovery Services vault resource.
3. In the Resource menu, select Site Recovery > Prepare Infrastructure > Protection goal.
4. In Protection goal, select what you want to migrate.
a. VMware: Select To Azure > Yes, with VMware vSphere Hypervisor.
b. Physical machine: Select To Azure > Not virtualized/Other.
c. Hyper-V: Select To Azure > Yes, with Hyper-V. If Hyper-V VMs are managed by VMM, select Yes.
Configure migration settings
1. Set up the source environment as appropriate.
2. Set up the target environment.
a. Click Prepare infrastructure > Target, and select the Azure subscription you want to use.
b. Specify the Resource Manager deployment model.
c. Site Recovery checks that you have one or more compatible Azure storage accounts and networks.
3. Set up a replication policy.
4. Enable replication.
5. Run a test migration (test failover).
Migrate to Azure using failover
1. In Settings > Replicated items select the machine > Failover.
2. In Failover select a Recovery Point to fail over to. Select the latest recovery point.
3. Configure any encryption key settings as required.
4. Select Shut down machine before beginning failover. Site Recovery will attempt to shut down virtual
machines before triggering the failover. Failover continues even if shutdown fails. You can follow the failover
progress on the Jobs page.
5. Check that the Azure VM appears in Azure as expected.
6. In Replicated items, right-click the VM and choose Complete Migration.
7. Perform any post-migration steps as required (see relevant information in this guide).

For more information, see:


Migrate on-premises machines to Azure

Azure Database Migration Service


The Azure Database Migration Service is a fully managed service that enables seamless migrations from multiple
database sources to Azure data platforms, with minimal downtime (online migrations). The Azure Database
Migration Service performs all of the required steps. You can initiate your migration projects with the assurance
that the process takes advantage of best practices recommended by Microsoft.
Create an Azure Database Migration Service instance
If this is the first time using Azure Database Migration Service, you need to register the resource provider for your
Azure subscription:
1. Select All services, then Subscriptions, and choose the target subscription.
2. Select Resource providers.
3. Search for migration, and then to the right of Microsoft.DataMigration, select Register.

After you register the resource provider, you can create an instance of Azure Database Migration Service.
1. Select +Create a resource and search the marketplace for Azure Database Migration Service.
2. Complete the Create Migration Service wizard, and select Create.
The service is now ready to migrate the supported source databases (for example, SQL Server, MySQL,
PostgreSQL, or MongoDB).
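If you prefer scripting, the same provisioning can be done with the Azure CLI, assuming the az dms commands are available in your CLI version. The names and the subnet path below are placeholders for an existing resource group and virtual network.

```azurecli
# Register the resource provider once per subscription.
az provider register --namespace Microsoft.DataMigration

# Create the Database Migration Service instance inside an existing subnet.
az dms create \
  --resource-group rg-data-migration \
  --name dms-contoso \
  --location eastus2 \
  --sku-name Premium_4vCores \
  --subnet "/subscriptions/<subscription-id>/resourceGroups/rg-network/providers/Microsoft.Network/virtualNetworks/vnet-migration/subnets/dms-subnet"
```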

For more information, see:


Azure Database Migration Service overview
Create an instance of the Azure Database Migration Service
Azure Migrate in the Azure portal
Azure portal: Create a migration project

Data Migration Assistant


The Data Migration Assistant (DMA) helps you upgrade to a modern data platform by detecting compatibility
issues that can affect database functionality in your new version of SQL Server or Azure SQL Database. DMA
recommends performance and reliability improvements for your target environment and allows you to move your
schema, data, and uncontained objects from your source server to your target server.

NOTE
For large migrations (in terms of number and size of databases), we recommend that you use the Azure Database Migration
Service, which can migrate databases at scale.

To get started with the Data Migration Assistant, follow these steps:
1. Download and install the Data Migration Assistant from the Microsoft Download Center.
2. Create an assessment by clicking the New (+) icon and select the Assessment project type.
3. Set the source and target server type. Click Create.
4. Configure the assessment options as required (recommend all defaults).
5. Add the databases to assess.
6. Click Next to start the assessment.
7. View results within the Data Migration Assistant tool set.
For an enterprise, we recommend following the approach outlined in Assess an enterprise and consolidate
assessment reports with DMA to assess multiple servers, combine the reports and then use provided Power BI
reports to analyze the results.
For more information, including detailed usage steps, see:
Data Migration Assistant overview
Assess an enterprise and consolidate assessment reports with DMA
Analyze consolidated assessment reports created by Data Migration Assistant with Power BI

SQL Server Migration Assistant


Microsoft SQL Server Migration Assistant (SSMA) is a tool designed to automate database migration to SQL
Server from Microsoft Access, DB2, MySQL, Oracle, and SAP ASE. The general concept is to collect, assess, and
then review with these tools. However, due to the variances in the process for each of the source systems, we
recommend reviewing the detailed SQL Server Migration Assistant documentation.
For more information, see:
SQL Server Migration Assistant overview

Database Experimentation Assistant


Database Experimentation Assistant (DEA) is a new A/B testing solution for SQL Server upgrades. It will assist in
evaluating a targeted version of SQL Server for a given workload. Customers who are upgrading from previous SQL
Server versions (SQL Server 2005 and above) to any new version of the SQL Server can use these analysis
metrics.
The Database Experimentation Assistant contains the following workflow activities:
Capture: The first step of SQL Server A/B testing is to capture a trace on your source server. The source server
usually is the production server.
Replay: The second step of SQL Server A/B testing is to replay the trace file that was captured to your target
servers. Then, collect extensive traces from the replays for analysis.
Analysis: The final step is to generate an analysis report by using the replay traces. The analysis report can
help you gain insight about the performance implications of the proposed change.
For more information, see:
Overview of Database Experimentation Assistant

Cosmos DB Data Migration Tool


The Azure Cosmos DB Data Migration tool can import data from various sources into Azure Cosmos DB collections
and tables. You can import from JSON files, CSV files, SQL, MongoDB, Azure Table storage, Amazon DynamoDB,
and even Azure Cosmos DB SQL API collections. The Data Migration tool can also be used when migrating from a
single partition collection to a multipartition collection for the SQL API.
For more information, see:
Cosmos DB Data Migration Tool
Migration-focused cost control mechanisms

The cloud introduces a few shifts in how we work, regardless of our role on the technology team. Cost is a great
example of this shift. In the past, only finance and IT leadership were concerned with the cost of IT assets
(infrastructure, apps, and data). The cloud empowers every member of IT to make and act on decisions that better
support the end user. However, with that power comes the responsibility to be cost conscious when making those
decisions.
This article introduces the tools that can help make wise cost decisions before, during, and after a migration to
Azure.
The tools in this article include:

Azure Migrate
Azure pricing calculator
Azure TCO calculator
Azure Cost Management
Azure Advisor

The processes described in this article may also require a partnership with IT managers, finance, or line-of-
business application owners.
Estimate VM costs prior to migration
Estimate and optimize VM costs during and after migration
Tips and tricks to optimize costs
Prior to migration of any asset (infrastructure, app, or data), there is an opportunity to estimate costs and refine
sizing based on observed performance criteria for those assets. Estimating costs serves two purposes: it allows for
cost control, and it provides a checkpoint to ensure that current budgets account for necessary performance
requirements.

Cost calculators
For manual cost calculations, there are two handy calculators that can provide a quick cost estimate based on the
architecture of the workload to be migrated.
The Azure pricing calculator provides cost estimates based on manually entered Azure products.
Sometimes decisions require a comparison of the future cloud costs and the current on-premises costs. The
Total Cost of Ownership (TCO) calculator can provide such a comparison.
These manual cost calculators can be used on their own to forecast potential spend and savings. They can also be
used in conjunction with Azure Migrate's cost forecasting tools to adjust the cost expectations to fit alternative
architectures or performance constraints.

Azure Migrate calculations


Prerequisites: The remainder of this tab assumes the reader has already populated Azure Migrate with a
collection of assets (infrastructure, apps, and data) to be migrated. The prior article on assessments provides
instructions on collecting the initial data. Once the data is populated, follow the next few steps to estimate monthly
costs based on the data collected.
Azure Migrate calculates monthly cost estimates based on data captured by the collector and service map. The
following steps will load the cost estimates:
1. Navigate to Azure Migrate Assessment in the portal.
2. In the project Overview page, select +Create assessment.
3. Click View all to review the assessment properties.
4. Create the group, and specify a group name.
5. Select the machines that you want to add to the group.
6. Click Create Assessment to create the group and the assessment.
7. After the assessment is created, view it in Overview > Dashboard.
8. In the Assessment Details section of the portal navigation, select Cost details.
The resulting estimate, pictured below, identifies the monthly costs of compute and storage, which often represent
the largest portion of cloud costs.

Figure 1 - Image of the Cost Details view of an assessment in Azure Migrate.

Additional resources
Set up and review an assessment with Azure Migrate
For a more comprehensive plan on cost management across larger numbers of assets (infrastructure, apps, and
data), see the Cloud Adoption Framework governance model. In particular, guidance on the Cost Management
discipline and the Cost Management improvement in the governance guide for complex enterprises.
Optimize and transform

Now that you have migrated your services to Azure, the next phase includes reviewing the solution for possible
areas of optimization. This could include reviewing the design of the solution, right-sizing the services, and
analyzing costs.
This phase is also an opportunity to optimize your environment and perform possible transformations of the
environment. For example, you may have performed a "rehost" migration, and now that your services are running
on Azure, you can revisit the solution's configuration or consumed services, and possibly perform some
"refactoring" to modernize and increase the functionality of your solution.
Right-size assets
Cost Management
All Azure services that provide a consumption-based cost model can be resized through the Azure portal, CLI, or
PowerShell. The first step in correctly sizing a service is to review its usage metrics. The Azure Monitor service
provides access to these metrics. You may need to configure the collection of the metrics for the service you are
analyzing, and allow an appropriate time to collect meaningful data based on your workload patterns.
1. Go to Monitor.
2. Select Metrics and configure the chart to show the metrics for the service to analyze.
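The same metrics can be reviewed from the command line. The following sketch pulls average CPU for a single virtual machine in one-hour buckets; the resource group and VM name are placeholders.

```azurecli
# Look up the resource ID of the virtual machine to analyze.
vmId=$(az vm show --resource-group rg-payroll-prod --name payroll-vm-01 --query id --output tsv)

# Review average CPU utilization to inform right-sizing decisions.
az monitor metrics list \
  --resource "$vmId" \
  --metric "Percentage CPU" \
  --interval PT1H \
  --aggregation Average \
  --output table
```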

The following are some common services that you can resize.

Resize a Virtual Machine


Azure Migrate performs a right-sizing analysis as part of its premigration assessment phase, and virtual machines
migrated using this tool will likely already be sized based on your premigration requirements.
However, for virtual machines created or migrated using other methods, or in cases where your post-migration
virtual machine requirements need adjustment, you may want to further refine your virtual machine sizing.
1. Go to Virtual machines.
2. Select the desired virtual machine from the list.
3. Select Size and the desired new size from the list. You may need to adjust the filters to find the size you need.
4. Select Resize.
Note that resizing production virtual machines has the potential to cause service disruptions. Try to apply the
correct sizing for your VMs before you promote them to production.
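The same resize can be performed from the Azure CLI, which helps when many machines need adjustment; the VM name and target size shown are examples only.

```azurecli
# List the sizes this VM can move to on its current hardware cluster.
az vm list-vm-resize-options --resource-group rg-payroll-prod --name payroll-vm-01 --output table

# Resize the VM; this restarts the machine, so schedule it for a maintenance window.
az vm resize --resource-group rg-payroll-prod --name payroll-vm-01 --size Standard_D2s_v3
```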

Learn more
Resize a Windows VM
Resize a Linux virtual machine using Azure CLI

Manage Reservations for Azure resources
Microsoft Azure VM sizing for maximum reservation usage
Partners can use the Partner Center to review the usage.
Resize a storage account
1. Go to Storage accounts.
2. Select the desired storage account.
3. Select Configure and adjust the properties of the storage account to match your requirements.
4. Select Save.
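A comparable adjustment can be scripted; the account name, replication option, and access tier below are placeholders for whatever your post-migration requirements call for.

```azurecli
# Adjust the replication option and default access tier of an existing storage account.
az storage account update \
  --resource-group rg-payroll-prod \
  --name stpayrollprod01 \
  --sku Standard_LRS \
  --access-tier Cool
```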

Resize a SQL Database


1. Go to either SQL databases, or SQL servers and then select the server.
2. Select the desired database.
3. Select Configure and the desired new service tier size.
4. Select Apply.
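The service tier can also be changed from the CLI; the server, database, and service objective shown are example values.

```azurecli
# Scale an Azure SQL Database to a different service objective after reviewing usage.
az sql db update \
  --resource-group rg-payroll-prod \
  --server sql-payroll-prod \
  --name payrolldb \
  --service-objective S2
```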
Secure and manage

After migrating your environment to Azure, it's important to consider the security and methods used to manage
the environment. Azure provides many features and capabilities to meet these needs in your solution.
Azure Monitor
Azure Service Health
Azure Advisor
Azure Security Center
Azure Backup
Azure Site Recovery
Azure Monitor maximizes the availability and performance of your applications by delivering a comprehensive
solution for collecting, analyzing, and acting on telemetry from your cloud and on-premises environments. It helps
you understand how your applications are performing and proactively identifies issues affecting them and the
resources they depend on.

Use and configure Azure Monitor


1. Go to Monitor in the Azure portal.
2. Select Metrics, Logs, or Service Health for overviews.
3. Select any of the relevant insights.
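Beyond reviewing the built-in views, a simple metric alert is often the first monitoring rule teams put in place. The following sketch is illustrative; the action group, email address, VM resource ID, and threshold are placeholders.

```azurecli
# Create an action group that emails the operations team.
az monitor action-group create \
  --resource-group rg-payroll-prod \
  --name ag-ops-email \
  --action email ops ops-team@contoso.com

# Alert when average CPU on the VM stays above 90 percent.
az monitor metrics alert create \
  --resource-group rg-payroll-prod \
  --name payroll-vm-high-cpu \
  --scopes "/subscriptions/<subscription-id>/resourceGroups/rg-payroll-prod/providers/Microsoft.Compute/virtualMachines/payroll-vm-01" \
  --condition "avg Percentage CPU > 90" \
  --action ag-ops-email \
  --description "Sustained high CPU on payroll-vm-01"
```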

Learn more
Azure Monitor overview.

Assistance
Obtain assistance during your journey to Azure
We know that getting the right support at the right time will accelerate your migration efforts. Review the
assistance avenues below to meet your needs.
Support Plans
Partners

Microsoft Support
Microsoft offers a basic support plan to all Azure customers. You have 24x7 access to billing and subscription
support, online self-help, documentation, whitepapers, and support forums.
If you need help from Microsoft Support while using Azure, follow these steps to create a support request:
1. Select Help + support in the Azure portal.
2. Select New support request to enter details about your issue and contact support.

To view your support requests, follow these steps:


1. Select Help + support in the Azure portal.
2. Select All support requests to view your support requests.

Need support engineer assistance for deep technical guidance?


1. Select Help + support in the Azure portal.
2. Select Support Plans to review the plans available to you.

Online communities
The following online communities provide community-based support:
MSDN forums
Stack Overflow
Expanded scope for cloud migration

The Azure migration guide in the Cloud Adoption Framework is the suggested starting point for readers who
are interested in a rehost migration to Azure, also known as a "lift and shift" migration. That guide walks you
through a series of prerequisites, tools, and approaches to migrating virtual machines to the cloud.
While this guide is an effective baseline to familiarize you with this type of migration, it makes several
assumptions. Those assumptions align the guide with many of the Cloud Adoption Framework's readers by
providing a simplified approach to migrations. This section of the Cloud Adoption Framework addresses some
expanded scope migration scenarios, which help guide efforts when those assumptions don't apply.

Cloud migration expanded scope checklist


The following checklist outlines the common areas of complexity that could require the scope of the migration
to be expanded beyond the Azure migration guide.
Business-driven scope expansion
Balance the portfolio: The cloud strategy team is interested in investing more heavily in migration
(rehosting existing workloads and applications with a minimum of modifications) or innovation (refactoring
or rebuilding those workloads and applications using modern cloud technology). Often, a balance between
the two priorities is the key to success. In this guide, the topic of balancing the cloud adoption portfolio is a
common one, addressed in each of the migrate processes.
Support global markets: The business operates in multiple geographic regions with disparate data
sovereignty requirements. To meet those requirements, additional considerations should be factored into the
prerequisite review and distribution of assets during migration.
Technology-driven scope expansion
VMware migration: Migrating VMware hosts can accelerate the overall migration process. Each migrated
VMware host can move multiple workloads to the cloud using a lift and shift approach. After migration,
those VMs and workloads can stay in VMware or be migrated to modern cloud capabilities.
SQL Server migration: Migrating SQL Servers can accelerate the overall migration process. Each SQL
Server migrated can move multiple databases and services, potentially accelerating multiple workloads.
Multiple datacenters: Migrating multiple datacenters adds a lot of complexity. During the Assess, Migrate,
Optimize, and Manage processes, additional considerations are discussed to prepare for more complex
environments.
Data requirements exceed network capacity: Companies frequently choose to migrate to the cloud
because the capacity, speed, or stability of an existing datacenter is no longer satisfactory. Unfortunately,
those same constraints add complexity to the migration process, requiring additional planning during the
assessment and migration processes.
Governance or compliance strategy: When governance and compliance are vital to the success of a
migration, additional alignment between IT governance teams and the cloud adoption team is required.
If any of these complexities are present in your scenario, then this section of the Cloud Adoption Framework
will likely provide the type of guidance needed to properly align scope in the migration processes.
Each of these scenarios is addressed by the various articles in this section of the Cloud Adoption Framework.

Next steps
Browse the table of contents on the left to address specific needs or scope changes. Alternatively, the first scope
enhancement on the list, Balance the portfolio, is a good starting point when reviewing these scenarios.
Balance the portfolio

Cloud adoption is a portfolio management effort, cleverly disguised as technical implementation. Like any
portfolio management exercise, balancing the portfolio is critical. At a strategic level, this means balancing
migration, innovation, and experimentation to get the most out of the cloud. When the cloud adoption effort leans
too far in one direction or another, complexity finds its way into the migration effort. This article will guide the
reader through approaches to achieve balance in the portfolio.

General scope expansion


This topic is strategic in nature. As such, the approach taken in this article is equally strategic. To ground the
strategy in data-driven decisions, this article assumes the reader has evaluated the existing digital estate (or is in
the process of doing so). The objective of this approach is to aid in evaluating workloads to ensure proper balance
across the portfolio through qualitative questions and portfolio refinement.
Document business outcomes
Before balancing the portfolio, it is important to document and share the business outcomes driving the cloud
migration effort. For a few examples of general business outcomes related to cloud migrations, see the Cloud
migration executive summary.
The following table can help document and share desired business outcomes. It's important to note that most
businesses are pursuing several outcomes at a time. The importance of this exercise is to clarify the outcomes that
are most directly related to the cloud migration effort:

| Outcome | Measured by | Goal | Time frame | Priority for this effort |
|---------|-------------|------|------------|--------------------------|
| Reduce IT costs | Datacenter budget | Reduce by $2M | 12 months | #1 |
| Datacenter exit | Exit from datacenters | 2 datacenters | 6 months | #2 |
| Increase business agility | Improve time to market | Reduce deployment time by six months | 2 years | #3 |
| Improve customer experience | Customer satisfaction (CSAT) | 10% improvement | 12 months | #4 |

IMPORTANT
The above table is a fictional example and should not be used to set priorities. In many cases, this table could be considered an
antipattern by placing cost savings above customer experiences.

The above table could accurately represent the priorities of the cloud strategy team and the cloud adoption team
overseeing a cloud migration. Due to short-term constraints, this team is placing a higher emphasis on IT cost
reduction and prioritizing a datacenter exit as a means to achieve the desired IT cost reductions. However, by
documenting the competing priorities in this table, the cloud adoption team is empowered to help the cloud
strategy team identify opportunities to better align implementation of the overarching portfolio strategy.
Move fast while maintaining balance
The guidance regarding incremental rationalization of the digital estate suggests an approach in which the
rationalization starts with an unbalanced position. The cloud strategy team should evaluate every workload for
compatibility with a rehost approach. Such an approach is suggested because it allows for the rapid evaluation of
a complex digital estate based on quantitative data. Making such an initial assumption allows the cloud adoption
team to engage quickly, reducing time to business outcomes. However, as stated in that article, qualitative
questions will provide the necessary balance in the portfolio. This article documents the process for creating the
promised balance.
Importance of sunset and retire decisions
The table in the documenting business outcomes section above is missing a key outcome that would support the
number one objective of reducing IT costs. When IT cost reductions rank anywhere in the list of business
outcomes, it is important to consider the potential to sunset or retire workloads. In some scenarios, cost savings
can come from NOT migrating workloads that don't warrant a short-term investment. Some customers have
reported total cost reductions in excess of 20% by retiring underutilized workloads.
To balance the portfolio, better reflecting sunset and retire decisions, the cloud strategy team and the cloud
adoption team are encouraged to ask the following questions of each workload within assess and migrate
processes:
Has the workload been used by end users in the past six months?
Is end-user traffic consistent or growing?
Will this workload be required by the business 12 months from now?
If the answer to any of these questions is "No", then the workload could be a candidate for retirement. If
retirement potential is confirmed with the app owner, then it may not make sense to migrate the workload. This
prompts for a few qualification questions:
Can a retirement plan or sunset plan be established for this workload?
Can this workload be retired prior to the datacenter exit?
If the answer to both of these questions is "Yes", then it would be wise to consider not migrating the workload.
This approach would help meet the objectives of reducing costs and exiting the datacenter.
If the answer to either question is "No", it may be wise to establish a plan for hosting the workload until it can be
retired. This plan could include moving the assets to a lower-cost datacenter or alternative datacenter, which
would also accomplish the objectives of reducing costs and exiting one datacenter.
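For larger estates, the screening questions above can be applied programmatically against the asset inventory. The following is a minimal sketch, assuming a simple inventory structure; the workload names and field names are hypothetical and not part of any official tooling.

```python
# Illustrative sketch: flag retirement candidates from a digital estate inventory.
# The inventory fields (used_last_6_months, traffic_trend, needed_in_12_months)
# are assumed names for the answers to the three screening questions above.

workloads = [
    {"name": "payroll-legacy", "used_last_6_months": False,
     "traffic_trend": "declining", "needed_in_12_months": False},
    {"name": "smarthotel360", "used_last_6_months": True,
     "traffic_trend": "growing", "needed_in_12_months": True},
]

def is_retirement_candidate(w):
    """A workload is a candidate for retirement if any screening answer is 'No'."""
    return not (w["used_last_6_months"]
                and w["traffic_trend"] in ("consistent", "growing")
                and w["needed_in_12_months"])

for w in workloads:
    if is_retirement_candidate(w):
        print(f"{w['name']}: review with the app owner before adding to the migration backlog")
    else:
        print(f"{w['name']}: keep in the migration backlog")
```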

Suggested prerequisites
The prerequisites specified in the baseline guide should still be sufficient for addressing this complexity topic.
However, the asset inventory and digital estate should be highlighted and bolded among those prerequisites, as
that data will drive the following activities.

Assess process changes


Balancing the portfolio requires additional qualitative analysis during the assess process, which will help drive
simple portfolio rationalization.
Suggested action during the assess process
Based on the data from the table in the documenting business outcomes section above, there is a likely risk of the
portfolio leaning too far into a migration-focused execution model. If customer experience were the top priority, an
innovation-heavy portfolio would be more likely. Neither is right or wrong, but leaning too far in one direction
commonly results in diminishing returns, adds unnecessary complexity, and increases execution time for
cloud adoption efforts.
To reduce complexity, you should follow a traditional approach to portfolio rationalization, but in an iterative
model. The following steps outline a qualitative model to such an approach:
The cloud strategy team maintains a prioritized backlog of workloads to be migrated.
The cloud strategy team and the cloud adoption team host a release planning meeting prior to the completion
of each release.
In the release planning meeting, the teams agree on the top 5 to 10 workloads in the prioritized backlog.
Outside of the release planning meeting, the cloud adoption team asks the following questions of application
owners and subject matter experts:
Could this application be replaced with a platform as a service (PaaS) equivalent?
Is this application a third-party application?
Has budget been approved to invest in ongoing development of the application in the next 12 months?
Would additional development of this application improve the customer experience? Create a
competitive differentiator? Drive additional revenue for the business?
Will the data within this workload contribute to a downstream innovation related to BI, Machine
Learning, IoT, or related technologies?
Is the workload compatible with modern application platforms like Azure App Service?
The answers to the above questions and any other required qualitative analysis would then influence
adjustments to the prioritized backlog. These adjustments may include:
If a workload could be replaced with a PaaS solution, it may be removed from the migration backlog
entirely. At a minimum, additional due diligence to decide between rehost and replace would be added
as a task, temporarily reducing that workload's priority from the migration backlog.
If a workload is (or should be) undergoing development advancement, then it may best fit into a
refactor-rearchitect-rebuild model. Since innovation and migration require different technical skills,
applications that align to a refactor-rearchitect-rebuild approach should be managed through an
innovation backlog rather than a migration backlog.
If a workload is part of a downstream innovation, then it may make sense to refactor the data platform,
but leave the application layers as a rehost candidate. Minor refactoring of a workload's data platform
can often be addressed in a migration or an innovation backlog. This rationalization outcome may result
in more detailed work items in the backlog, but otherwise no change to priorities.
If a workload isn't strategic but is compatible with modern, cloud-based application hosting platforms,
then it may be wise to perform minor refactoring on the application to deploy it as a modern app. This
can contribute to the overall savings by reducing the overall IaaS and OS licensing requirements of the
cloud migration.
If a workload is a third-party application and that workload's data isn't planned for use in a downstream
innovation, then it may be best to leave as a rehost option on the backlog.
These questions shouldn't be the extent of the qualitative analysis completed for each workload, but they help
guide a conversation about addressing the complexity of an imbalanced portfolio.
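If the backlog is exported from a spreadsheet or work-item tool, the adjustments above can be expressed as a small rationalization pass. The sketch below is illustrative only; the field names and the mapping rules simply mirror the bullets above and would need to be adapted to your own backlog format.

```python
# Illustrative sketch: adjust a migration backlog based on the qualitative questions above.
# All field names (paas_equivalent, third_party, funded_development, downstream_innovation,
# modern_platform_compatible) are assumed, not standard Cloud Adoption Framework fields.

def rationalize(workload):
    """Return the suggested backlog and approach for a single workload."""
    if workload.get("paas_equivalent"):
        return ("migration backlog (on hold)", "evaluate rehost vs. replace")
    if workload.get("funded_development"):
        return ("innovation backlog", "refactor/rearchitect/rebuild")
    if workload.get("downstream_innovation"):
        return ("migration backlog", "rehost app tier, refactor data platform")
    if workload.get("modern_platform_compatible"):
        return ("migration backlog", "minor refactor to a modern app platform")
    return ("migration backlog", "rehost")

backlog = [
    {"name": "osticket", "third_party": True},
    {"name": "smarthotel360", "funded_development": True},
    {"name": "reporting-portal", "downstream_innovation": True},
]

for item in backlog:
    target_backlog, approach = rationalize(item)
    print(f"{item['name']}: {target_backlog} -> {approach}")
```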

Migrate process changes


During migration, portfolio balancing activities can have a negative impact on migration velocity (the speed at which
assets are migrated). The following guidance will expand on why and how to align work to avoid interruptions to
the migration effort.
Suggested action during the migrate process
Portfolio rationalization requires diversity of technical effort. It is tempting for cloud adoption teams to match that
portfolio diversity within migration efforts. Business stakeholders often ask for a single cloud adoption team to
address the entire migration backlog. This is seldom an advisable approach; in many cases it can be
counterproductive.
These diverse efforts should be segmented across two or more cloud adoption teams. Using a two-team model as
an example mode of execution, Team 1 is the migration team and Team 2 is the innovation team. For larger
efforts, these teams could be further segmented to address other approaches, such as replace/PaaS efforts or minor
refactoring. The following outlines the skills and roles needed to rehost, refactor, or perform minor refactoring:
Rehost: Rehost requires team members to implement infrastructure-focused changes, generally using a tool like
Azure Site Recovery to migrate VMs or other assets to Azure. This work aligns well to datacenter admins or IT
implementors. The cloud migration team is well structured to deliver this work at high scale. In most scenarios,
this is the fastest approach to migrate existing assets.
Refactor: Refactor requires team members to modify source code, change the architecture of an application, or
adopt new cloud services. Generally, this effort uses development tools like Visual Studio and deployment
pipeline tools like Azure DevOps to redeploy modernized applications to Azure. This work aligns well to
application development roles or DevOps pipeline development roles. The cloud innovation team is best structured
to deliver this work. It can take longer to replace existing assets with cloud assets in this approach, but the apps
can take advantage of cloud-native features.
Minor refactoring: Some applications can be modernized with minor refactoring at the data or application level.
This work requires team members to deploy data to cloud-based data platforms or to make minor configuration
changes to the application. This may require limited support from data or application development subject matter
experts. However, this work is similar to the work conducted by IT implementors when deploying third-party
apps. This work could easily align with the cloud migration team or the cloud strategy team. While this effort is
not nearly as fast as a rehost migration, it takes less time to execute than refactor efforts.
During migration, efforts should be segmented in the three ways listed above and executed by the appropriate
team in the appropriate iteration. While you should diversify the portfolio, also ensure that efforts stay very
focused and segregated.

Optimize and promote process changes


No additional changes are required during Optimize and promote processes within the Migration effort.

Secure and manage process changes


No additional changes are required during Secure and manage processes within the Migration effort.

Next steps
Return to the expanded scope checklist to ensure your migration method is fully aligned.
Expanded scope checklist
Skills readiness for cloud migration

During a cloud migration, it is likely that employees, as well as some incumbent systems integration partners or
managed services partners, will need to develop new skills to be effective during migration efforts.
There are four distinct processes that are completed iteratively during the "Migrate" phase of any migration
journey. The following sections align the necessary skills for each of those processes with references to two
prerequisites for skilling resources.

Prerequisites skilling resources


Implementation of "Migrate" processes will build on the skills acquired during "Plan" and "Ready" phases of the
migration journey.

Assess skilling resources


The following tools can aid the team in execution of assess activities:
Balance the portfolio: Ensure balance and proper investment allocations across an application portfolio.
Build a business justification: Create and understand the business justification driving the cloud migration
effort.
Rationalize the digital estate: Rationalize assets in the digital estate.
Application portfolio assessment: Criteria for making decisions regarding migration or innovation options
within the application portfolio.
Assessing and Planning Microsoft Azure Migration: PluralSight course to aid in assessing on-premises
workloads
During Assess processes, architects will be called upon to design solutions for each workload. The following
skilling resources can prepare architects for these tasks:
Foundations for Cloud Architecture: PluralSight course to help architect the right foundational solutions
Microsoft Azure Architecture: PluralSight course to ground architects in Azure Architecture
Designing Migrations for Microsoft Azure: PluralSight course to help architects design a migration solution

Migrate skilling resources


The following tutorials and courses can prepare the team for migration activities:
Migrate to Azure: Using Azure Site Recovery to migrate VMs to Azure.
Rehost workloads to Azure: PluralSight course that teaches viewers how to rehost workloads to Azure
Migrating Physical and Virtual Servers to Azure: PluralSight course for migrating servers to Azure
Import and Export Data to Azure: PluralSight course on the movement of data to and from Azure

Optimize and promote process changes


The following tools can help the team optimize resources and promote to production:
Cost and sizing: Adjust sizing to align costs and budgets.
Promote a workload: Change network configuration to reroute production users to migrated workloads.
Secure and manage process changes
The following tools can help the team find ways to secure and manage migrated assets:
Secure and manage workloads in Azure: Best practices for securing and managing workloads in Azure.

Next steps
Return to the expanded scope checklist to ensure your migration method is fully aligned.
Expanded scope checklist
Accelerate migration with VMware hosts

Migrating entire VMware hosts can move multiple workloads and several assets in a single migration effort. The
following guidance expands the scope of the Azure migration guide through a VMware host migration. Most of
the effort required in this scope expansion occurs during the prerequisites and migration processes of a migration
effort.

Suggested prerequisites
When migrating your first VMware host to Azure, you must meet a number of prerequisites to prepare identity,
network, and management requirements. After these prerequisites are met, each additional host should require
significantly less effort to migrate. The following sections provide more detail about the prerequisites.
Secure your Azure environment
Implement the appropriate cloud solution for role-based access control and network connectivity in your Azure
environment. The secure your environment guide can help with this implementation.
Private cloud management
There are two required tasks and one optional task to establish private cloud management. Escalating private
cloud privileges and setting up workload DNS and DHCP are both required best practices.
If the objective is to migrate workloads by using Layer 2 stretched networks, a third best practice is also
required.
Private cloud networking
After the management requirements are established, you can establish private cloud networking by using the
following best practices:
VPN connection to Private Cloud
On-premises network connection with ExpressRoute
Azure virtual network connection with ExpressRoute
Configure DNS name resolution
Integration with the cloud adoption plan
After you've met the other prerequisites, you should include each VMware host in the cloud adoption plan. Within
the cloud adoption plan, add each host to be migrated as a distinct workload. Within each workload, add the VMs
to be migrated as assets. To add workloads and assets to the adoption plan in bulk, see adding/editing work items
with Excel.
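As noted above, workloads and assets can be added to the adoption plan in bulk. One way to prepare that bulk import is to generate a CSV from the host inventory, as in the following sketch. The host and VM names are fictional, and the column names are assumptions that should be adjusted to match your Azure DevOps adoption plan template.

```python
import csv

# Illustrative sketch: build a bulk-import CSV in which each VMware host becomes a
# workload (feature) and each VM on the host becomes an asset (user story).
# Host and VM names are fictional; adjust the columns to your adoption plan template.

hosts = {
    "vmware-host-01": ["WEBVM", "SQLVM"],
    "vmware-host-02": ["OSTICKETWEB", "OSTICKETMYSQL"],
}

with open("adoption-plan-import.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["Work Item Type", "Title", "Parent"])
    for host, vms in hosts.items():
        writer.writerow(["Feature", f"Workload: {host}", ""])
        for vm in vms:
            writer.writerow(["User Story", f"Asset: {vm}", f"Workload: {host}"])

print("Wrote adoption-plan-import.csv")
```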

Migrate process changes


During each iteration, the adoption team works through the backlog to migrate the highest priority workloads. The
process doesn't really change for VMware hosts. When the next workload on the backlog is a VMware host, the
only change will be the tool used.
You can use the following tools in the migration effort:
Native VMware tools
Azure Data Box
Alternatively, you can migrate workloads through a disaster recovery failover by using the following tools:
Back up workload virtual machines
Configure Private Cloud as disaster recovery site using Zerto
Configure Private Cloud as disaster recovery site using VMware SRM

Next steps
Return to the expanded scope checklist to ensure your migration method is fully aligned.
Expanded scope checklist
Accelerate migration by migrating multiple databases
or entire SQL Servers

Migrating entire SQL Server instances can accelerate workload migration efforts. The following guidance expands
the scope of the Azure migration guide by migrating an instance of SQL Server outside of a workload-focused
migration effort. This approach can seed the migration of multiple workloads with a single data-platform
migration. Most of the effort required in this scope expansion occurs during the prerequisites, assessment,
migration, and optimization processes of a migration effort.

Is this expanded scope right for you?


The approach recommended in the Azure migration guide is to migrate each data structure alongside associated
workloads as part of a single migration effort. The iterative approach to migration reduces discovery, assessment,
and other tasks that can create blockers and slow business value returns.
However, some data structures can be migrated more effectively through a separate data-platform migration. The
following are a few examples:
End of service: Quickly moving a SQL Server instance as an isolated iteration within a larger migration effort
can avoid end-of-service challenges. This guide will help integrate the migration of a SQL Server instance into the broader
migration process. However, if you are migrating or upgrading a SQL Server instance independently of any other cloud
adoption effort, the SQL Server End of Life overview or SQL Server migration documentation articles may
provide clearer guidance.
SQL Server services: The data structure is part of a broader solution that requires SQL Server running on a
virtual machine. This is common for solutions that use SQL Server services such as SQL Server Reporting
Services, SQL Server Integration Services, or SQL Server Analysis Services.
High density, low usage databases: The instance of SQL Server has a high density of databases. Each of
those databases has low transaction volumes, and requires little in the way of compute resources. You should
consider other, more modern solutions, but an infrastructure as a service (IaaS) approach might result in
significantly reduced operating cost.
Total cost of ownership: When applicable, you can apply Azure Hybrid Benefit to the list price, creating the
lowest cost of ownership for instances of SQL Server. This is especially common for customers who host SQL
Server in multicloud scenarios.
Migration accelerator: "Lift-and-shift" migration of a SQL Server instance can move several databases in one
iteration. This approach sometimes allows future iterations to focus more specifically on applications and VMs,
meaning that you can migrate more workloads in a single iteration.
VMware migration: A common on-premises architecture includes applications and VMs on a virtual host, and
databases on bare metal. In this scenario, you can migrate entire SQL Server instances to support your
migration of the VMware host to Azure VMware Service. For more information, see VMware host migration.
If none of the above criteria apply to this migration, it might be best to continue with the standard migration
process. In the standard process, data structures are migrated iteratively, alongside each workload.
If this guide aligns with your criteria, continue with this expanded scope guide as an effort within the standard
migration process. During the prerequisites phase, you can integrate the effort into the overall adoption plan.

Suggested prerequisites
Before performing a SQL Server migration, start with an expansion of the digital estate by including a data estate.
The data estate records an inventory of the data assets you're considering for migration. The following tables
outline an approach to recording the data estate.
Server inventory
The following is an example of a server inventory:

| SQL SERVER | PURPOSE | VERSION | CRITICALITY | SENSITIVITY | DATABASE COUNT | SSIS | SSRS | SSAS | CLUSTER | NUMBER OF NODES |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| sql-01 | Core apps | 2016 | Mission-critical | Highly confidential | 40 | N/A | N/A | N/A | Yes | 3 |
| sql-02 | Core apps | 2016 | Mission-critical | Highly confidential | 40 | N/A | N/A | N/A | Yes | 3 |
| sql-03 | Core apps | 2016 | Mission-critical | Highly confidential | 40 | N/A | N/A | N/A | Yes | 3 |
| sql-04 | BI | 2012 | High | XX | 6 | N/A | Confidential | Yes - multidimensional cube | No | 1 |
| sql-05 | Integration | 2008 R2 | Low | General | 20 | Yes | N/A | N/A | No | 1 |

Database inventory
The following is an example of a database inventory for one of the servers above:

| SERVER | DATABASE | CRITICALITY | SENSITIVITY | DATA MIGRATION ASSISTANT (DMA) RESULTS | DMA REMEDIATION | TARGET PLATFORM |
| --- | --- | --- | --- | --- | --- | --- |
| sql-01 | DB-1 | Mission-critical | Highly confidential | Compatible | N/A | Azure SQL Database |
| sql-01 | DB-2 | High | Confidential | Schema change required | Changes implemented | Azure SQL Database |
| sql-01 | DB-3 | High | General | Compatible | N/A | Azure SQL managed instance |
| sql-01 | DB-4 | Low | Highly confidential | Schema change required | Changes scheduled | Azure SQL managed instance |
| sql-01 | DB-5 | Mission-critical | General | Compatible | N/A | Azure SQL managed instance |
| sql-01 | DB-6 | High | Confidential | Compatible | N/A | Azure SQL Database |

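When the data estate grows beyond a handful of servers, keeping an inventory like the tables above in a machine-readable form makes it easier to group databases by target platform when planning iterations. The following sketch assumes field names that mirror the sample tables; it is not an official schema.

```python
from collections import defaultdict

# Illustrative sketch: group an inventory of databases (like the table above)
# by target platform to help plan migration iterations. Field names mirror the
# sample table and are assumptions, not a required schema.

databases = [
    {"server": "sql-01", "database": "DB-1", "dma_result": "Compatible",
     "target": "Azure SQL Database"},
    {"server": "sql-01", "database": "DB-2", "dma_result": "Schema change required",
     "target": "Azure SQL Database"},
    {"server": "sql-01", "database": "DB-3", "dma_result": "Compatible",
     "target": "Azure SQL managed instance"},
]

by_target = defaultdict(list)
for db in databases:
    by_target[db["target"]].append(db)

for target, items in by_target.items():
    needs_remediation = [d["database"] for d in items if d["dma_result"] != "Compatible"]
    print(f"{target}: {len(items)} databases, remediation needed for {needs_remediation or 'none'}")
```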
Integration with the cloud adoption plan


After this discovery process is complete, you can include it in the cloud adoption plan. Within the cloud adoption
plan, add each SQL Server instance to be migrated as a distinct workload. Within each workload, the databases
and services (SSIS, SSAS, SSRS) can each be added as assets. To add workloads and assets in bulk to the adoption
plan, see adding and editing work items with Excel.
After the workloads and assets are included in the plan, you and your team can continue with a standard migration
process by using the adoption plan. When the adoption team moves into the assessment, migration, and
optimization processes, factor in the changes discussed in the following sections.

Assessment process changes


If any database in the plan can be migrated to a platform as a service (PaaS) data platform, use DMA to evaluate
the compatibility of the selected database. When the database requires schema conversions, you should complete
those conversions as part of the assessment process, to avoid disruptions to the migration pipeline.
Suggested action during the assessment process
For databases that can be migrated to a PaaS solution, the following actions are completed during the assessment
process.
Assess with DMA: Use Data Migration Assistant to detect compatibility issues that can affect database
functionality in your target Azure SQL Database managed instance. Use DMA to recommend performance and
reliability improvements, and to move the schema, data, and uncontained objects from your source server to
your target server. For more information, see Data Migration Assistant.
Remediate and convert: Based on the output of DMA, convert the source data schema to remediate
compatibility issues. Test the converted data schema with the dependent applications.

Migrate process changes


During migration, you can choose from among many different tools and approaches. But each approach follows a
simple process: migrate schema, data, and objects. Then sync data to the target data source.
The target and source of the data structure and services can make these two steps rather complicated. The
following sections help you understand the best tooling choice based on your migration decisions.
Suggested action during the migrate process
The suggested path for migration and synchronization uses a combination of the following three tools. The
following sections outline more complex migration and synchronization options that allow for a broader variety of
target and source solutions.

| MIGRATION OPTION | PURPOSE |
| --- | --- |
| Azure Database Migration Service | Supports online (minimal downtime) and offline (one time) migrations at scale to an Azure SQL Database managed instance. Supports migration from: SQL Server 2005, SQL Server 2008 and SQL Server 2008 R2, SQL Server 2012, SQL Server 2014, SQL Server 2016, and SQL Server 2017. |
| Transactional replication | Transactional replication to an Azure SQL Database managed instance is supported for migrations from: SQL Server 2012 (SP2 CU8, SP3, or later), SQL Server 2014 (RTM CU10 or later, or SP1 CU3 or later), SQL Server 2016, and SQL Server 2017. |
| Bulk load | Use bulk load to an Azure SQL Database managed instance for data stored in: SQL Server 2005, SQL Server 2008 and SQL Server 2008 R2, SQL Server 2012, SQL Server 2014, SQL Server 2016, and SQL Server 2017. |

Guidance and tutorials for suggested migration process


Choosing the best guidance for migration by using Azure Database Migration Service is contingent on the source
and target platform of choice. The following table links to tutorials for each of the standard approaches for
migrating a SQL database by using Azure Database Migration Service.

| SOURCE | TARGET | TOOL | MIGRATION TYPE | GUIDANCE |
| --- | --- | --- | --- | --- |
| SQL Server | Azure SQL Database | Database Migration Service | Offline | Tutorial |
| SQL Server | Azure SQL Database | Database Migration Service | Online | Tutorial |
| SQL Server | Azure SQL Database managed instance | Database Migration Service | Offline | Tutorial |
| SQL Server | Azure SQL Database managed instance | Database Migration Service | Online | Tutorial |
| RDS SQL Server | Azure SQL Database (or managed instance) | Database Migration Service | Online | Tutorial |

Guidance and tutorials for various services to equivalent PaaS solutions


After moving databases from an instance of SQL Server by using Azure Database Migration Service, the schema and
data can be rehosted in a number of PaaS solutions. However, other required services might still be running on
that server. The following three tutorials aid in moving SSIS, SSAS, and SSRS to equivalent PaaS services on
Azure.

| SOURCE | TARGET | TOOL | MIGRATION TYPE | GUIDANCE |
| --- | --- | --- | --- | --- |
| SQL Server Integration Services | Azure Data Factory integration runtime | Azure Data Factory | Offline | Tutorial |
| SQL Server Analysis Services - tabular model | Azure Analysis Services | SQL Server Data Tools | Offline | Tutorial |
| SQL Server Reporting Services | Power BI Report Server | Power BI | Offline | Tutorial |

Guidance and tutorials for migration from SQL Server to an IaaS instance of SQL Server
After migrating databases and services to PaaS instances, you might still have data structures and services that are
not PaaS-compatible. When existing constraints prevent migrating data structures or services, the following
tutorial can help with migrating various assets in the data portfolio to Azure IaaS solutions.
Use this approach to migrate databases or other services on the instance of SQL Server.

| SOURCE | TARGET | TOOL | MIGRATION TYPE | GUIDANCE |
| --- | --- | --- | --- | --- |
| Single instance SQL Server | SQL Server on IaaS | Varied | Offline | Tutorial |

Optimization process changes


During optimization, you can test, optimize, and promote to production each data structure, service, or SQL Server
instance. This is the greatest impact of deviating from a per-workload migration model.
Ideally, you migrate the dependent workloads, applications, and VMs within the same iteration as the SQL Server
instance. When that ideal scenario occurs, you can test the workload along with the data source. After testing, you
can promote the data structure to production, and terminate the synchronization process.
Now let's consider the scenario in which there's a significant time gap between database migration and workload
migration. Unfortunately, this can be the biggest change to the optimization process during a non-workload-driven
migration. When you migrate multiple databases as part of a SQL Server migration, those databases might coexist
in both the cloud and on-premises, for multiple iterations. During that time, you need to maintain data
synchronization until those dependent assets are migrated, tested, and promoted.
Until all dependent workloads are promoted, you and your team are responsible for supporting the
synchronization of data from the source system to the target system. This synchronization consumes network
bandwidth, cloud costs, and most importantly, people's time. Proper alignment of the adoption plan across the
SQL Server migration workload, and all dependent workloads and applications, can reduce this costly overhead.
Suggested action during the optimization process
During optimization processes, complete the following tasks every iteration, until all data structures and services
have been promoted to production.
1. Validate synchronization of data.
2. Test any migrated applications.
3. Optimize the application and data structure to tune costs.
4. Promote the applications to production.
5. Test for continued on-premises traffic against the on-premises database.
6. Terminate the synchronization of any data promoted to production.
7. Terminate the original source database.
Until step 5 passes, you can't terminate databases and synchronization. Until all databases on an instance of SQL
Server have gone through all seven steps, you should treat the on-premises instance of SQL Server as production.
All synchronization should be maintained.
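Because synchronization must continue until every database completes all seven steps, some teams find it useful to track per-database progress explicitly. The following is a minimal sketch of that idea; the database names and completed-step counts are fictional examples.

```python
# Illustrative sketch: track which of the seven optimization steps each database has
# completed, and report whether synchronization (and the source database) can be retired.
# Database names and completed-step values are fictional examples.

STEPS = [
    "validate sync", "test apps", "tune cost", "promote apps",
    "confirm no on-prem traffic", "terminate sync", "terminate source database",
]

progress = {
    "DB-1": 7,  # all steps complete
    "DB-2": 4,  # promoted, but the on-premises traffic check has not yet passed
    "DB-3": 1,
}

for db, done in progress.items():
    if done >= 5:
        print(f"{db}: step 5 passed; synchronization can be terminated once steps 6-7 run")
    else:
        print(f"{db}: keep synchronizing (next step: {STEPS[done]})")

if all(done == len(STEPS) for done in progress.values()):
    print("All databases complete: the on-premises SQL Server is no longer production")
else:
    print("Treat the on-premises SQL Server as production; maintain all synchronization")
```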

Next steps
Return to the expanded scope checklist to ensure your migration method is fully aligned.
Expanded scope checklist
Multiple datacenters

Often the scope of a migration involves the transition of multiple datacenters. The following guidance will expand
the scope of the Azure migration guide to address multiple datacenters.

General scope expansion


Most of the effort required in this scope expansion occurs during the prerequisites, assess, and optimization
processes of a migration.

Suggested prerequisites
Before beginning the migration, you should create epics within the project management tool to represent each
datacenter to be migrated. It is then important to understand the business outcomes and motivations that
justify this migration. Those motivations can be used to prioritize the list of epics (or datacenters). For instance,
if migration is driven by a desire to exit datacenters before leases must be renewed, then each epic would be
prioritized based on lease renewal date.
Within each epic, the workloads to be assessed and migrated would be managed as features. Each asset within that
workload would be managed as a user story. The work required to assess, migrate, optimize, promote, secure, and
manage each asset would be represented as tasks.
Sprints or iterations would then consist of a series of tasks required to migrate the assets and user stories
committed to by the cloud adoption team. Releases would then consist of one or more workloads or features to be
promoted to production.
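Before loading this hierarchy into the project management tool, it can help to model it as data and confirm the prioritization logic. The sketch below is purely illustrative; datacenter names, lease dates, workloads, and assets are fictional, and the epic/feature/user story structure mirrors the description above.

```python
from datetime import date

# Illustrative sketch: one epic per datacenter, prioritized by lease renewal date;
# workloads become features and assets become user stories.
# All names and dates are fictional examples.

datacenters = [
    {"name": "DC-East", "lease_renewal": date(2020, 6, 30),
     "workloads": {"smarthotel360": ["WEBVM", "SQLVM"]}},
    {"name": "DC-West", "lease_renewal": date(2020, 3, 31),
     "workloads": {"osticket": ["OSTICKETWEB", "OSTICKETMYSQL"]}},
]

# Epics with the earliest lease renewal get the highest priority.
for priority, dc in enumerate(sorted(datacenters, key=lambda d: d["lease_renewal"]), start=1):
    print(f"Epic #{priority}: {dc['name']} (lease renews {dc['lease_renewal']})")
    for workload, assets in dc["workloads"].items():
        print(f"  Feature: {workload}")
        for asset in assets:
            print(f"    User story: {asset} (tasks: assess, migrate, optimize, promote, secure, manage)")
```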

Assess process changes


The biggest change to the assess process, when expanding scope to address multiple datacenters, is related to the
accurate recording and prioritization of workloads and dependencies across datacenters.
Suggested action during the assess process
Evaluate cross-datacenter dependencies: The dependency visualization tools in Azure Migrate can help
pinpoint dependencies. Use of this tool set prior to migration is a good general best practice. However, when
dealing with global complexity, it becomes a necessary step in the assessment process. Through dependency
grouping, the visualization can help identify the IP addresses and ports of any assets required to support the
workload.

IMPORTANT
Two important notes: First, a subject matter expert with an understanding of asset placement and IP address schemas is
required to identify assets that reside in a secondary datacenter. Second, it is important to evaluate both downstream
dependencies and clients in the visual to understand bidirectional dependencies.

Migrate process changes


Migrating multiple datacenters is similar to consolidating datacenters. After migration, the cloud becomes the
singular datacenter solution for multiple assets. The most likely scope expansion during the migration process is
the validation and alignment of IP addresses.
Suggested action during the migrate process
The following are activities that heavily affect the success of a cloud migration:
Evaluate network conflicts: When consolidating datacenters into a single cloud provider, there is a likelihood
of creating network, DNS, or other conflicts. During migration, it is important to test for conflicts to avoid
interruptions to production systems hosted in the cloud. A quick address-space overlap check is sketched after this list.
Update routing tables: Often, modifications to routing tables are required when consolidating networks or
datacenters.
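As mentioned in the first activity above, one quick way to catch address-space conflicts before they surface in production is to compare the CIDR ranges of the networks being consolidated. The following sketch uses Python's standard ipaddress module; the network names and ranges are fictional examples.

```python
import ipaddress

# Illustrative sketch: detect overlapping address spaces before consolidating
# multiple datacenter networks into one cloud environment.
# The CIDR ranges below are fictional examples.

networks = {
    "dc-east-prod": "10.1.0.0/16",
    "dc-west-prod": "10.1.128.0/17",   # overlaps dc-east-prod
    "azure-hub-vnet": "10.2.0.0/16",
}

ranges = {name: ipaddress.ip_network(cidr) for name, cidr in networks.items()}
names = list(ranges)

for i, a in enumerate(names):
    for b in names[i + 1:]:
        if ranges[a].overlaps(ranges[b]):
            print(f"Conflict: {a} ({ranges[a]}) overlaps {b} ({ranges[b]})")
```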

Optimize and promote process changes


During optimization, additional testing may be required.
Suggested action during the optimize and promote process
Prior to promotion, it is important to provide additional levels of testing during this scope expansion. During
testing, it is important to test for routing or other network conflicts. Further, it is important to isolate the deployed
application and retest to validate that all dependencies have been migrated to the cloud. In this case, isolation
means separating the deployed environment from production networks. Doing so can catch overlooked assets that
are still running on-premises.

Secure and manage process changes


Secure and manage processes should be unchanged by this scope expansion.

Next steps
Return to the Expanded Scope Checklist to ensure your migration method is fully aligned.
Expanded scope checklist
Data requirements exceed network capacity during a
migration effort

In a cloud migration, assets are replicated and synchronized over the network between the existing datacenter and
the cloud. It is not uncommon for the existing data size requirements of various workloads to exceed network
capacity. In such a scenario, the process of migration can be radically slowed, or in some cases, stopped entirely.
The following guidance will expand the scope of the Azure migration guide to provide a solution that works
around network limitations.

General scope expansion


Most of the effort required in this scope expansion occurs during the prerequisites, assess, and migrate
processes of a migration.

Suggested prerequisites
Validate network capacity risks: Digital estate rationalization is a highly recommended prerequisite, especially if
there are concerns about overburdening the available network capacity. During digital estate rationalization, an
inventory of digital assets is collected. That inventory should include existing storage requirements across the
digital estate. As outlined in replication risks: Physics of replication, that inventory can be used to estimate total
migration data size, which can be compared to total available migration bandwidth. If that comparison
doesn't align with the required time to business change, then this article can help accelerate migration velocity,
reducing the time required to migrate the datacenter.
Offline transfer of independent data stores: Pictured in the diagram below are examples of both online and
offline data transfers with Azure Data Box. These approaches could be used to ship large volumes of data to the
cloud prior to workload migration. In an offline data transfer, source data is copied to Azure Data Box, which is
then physically shipped to Microsoft for transfer into an Azure storage account as a file or a blob. This process can
be used to ship data that isn't directly tied to a specific workload, prior to other migration efforts. Doing so reduces
the amount of data that needs to be shipped over the network, in an effort to complete a migration within network
constraints.
This approach could be used to transfer data such as HDFS stores, backups, archives, file servers, and applications. Existing
technical guidance explains how to use this approach to transfer data from an HDFS store, or from disks by using
SMB, NFS, REST, or the data copy service to Data Box.
There are also third-party partner solutions that use Azure Data Box for a "Seed and Feed" migration, where a
large volume of data is moved via an offline transfer but is later synchronized at a lower scale over the network.
Assess process changes
If the storage requirements of a workload (or workloads) exceed network capacity, then Azure Data Box can still be
used in an offline data transfer.
Network transmission is the recommended approach unless the network is unavailable. The speed of transferring
data over the network, even when bandwidth is constrained, is typically faster than physically shipping the same
amount of data using an offline transfer mechanism such as Data Box.
If connectivity to Azure is available, an analysis should be conducted before using Data Box, especially if migration
of the workload is time sensitive. Data Box is only advisable when the time to transfer the necessary data exceeds
the time to populate, ship, and restore data using Data Box.
Suggested action during the assess process
Network Capacity Analysis: When workload-related data transfer requirements are at risk of exceeding network
capacity, the cloud adoption team would add an additional analysis task to the assess process, called network
capacity analysis. During this analysis, a member of the team with subject matter expertise regarding the local
network and network connectivity would estimate the amount of available network capacity and required data
transfer time. That available capacity would be compared to the storage requirements of all assets to be migrated
during the current release. If the storage requirements exceed the available bandwidth, then assets supporting the
workload would be selected for offline transfer.

IMPORTANT
At the conclusion of the analysis, the release plan may need to be updated to reflect the time required to ship, restore, and
synchronize the assets to be transferred offline.
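A rough, first-order version of the network capacity analysis can be done with simple arithmetic before engaging subject matter experts. The values below (data size, bandwidth, utilization, and offline turnaround time) are placeholder assumptions, not guidance.

```python
# Illustrative sketch: compare estimated over-the-network transfer time with an
# assumed offline (Data Box) turnaround. All input values are placeholder assumptions.

data_to_migrate_tb = 80          # total storage for assets in the current release
bandwidth_mbps = 500             # available migration bandwidth
utilization = 0.6                # fraction of the link usable for replication
offline_turnaround_days = 14     # assumed time to populate, ship, and restore

data_bits = data_to_migrate_tb * 1e12 * 8
effective_bps = bandwidth_mbps * 1e6 * utilization
network_days = data_bits / effective_bps / 86400

print(f"Network transfer estimate: {network_days:.1f} days")
print(f"Offline transfer estimate: {offline_turnaround_days} days")
if network_days > offline_turnaround_days:
    print("Consider offline transfer for assets that are not time-sensitive")
else:
    print("Network transfer is likely faster; offline transfer is probably unnecessary")
```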

Drift analysis: Each asset to be transferred offline should be analyzed for storage and configuration drift. Storage
drift is the amount of change in the underlying storage over time. Configuration drift is change in the configuration
of the asset over time. From the time the storage is copied to the time the asset is promoted to production, any
drift could be lost. If that drift needs to be reflected in the migrated asset, some form of synchronization would be
required, between the local asset and the migrated asset. This should be flagged for consideration during
migration execution.
Migrate process changes
When using offline transfer mechanisms, replication processes are not likely required. However, synchronization
processes may still be a requirement. Understanding the results of the drift analysis completed during the Assess
process will inform the tasks required during migration, if an asset is being transferred offline.
Suggested action during the migrate process
Copy storage: This approach could be used to transfer data such as HDFS stores, backups, archives, file servers, and applications.
Existing technical guidance explains how to use this approach to transfer data from an HDFS store, or from
disks by using SMB, NFS, REST, or the data copy service to Data Box.
There are also third-party partner solutions that use Azure Data Box for a "seed and sync" migration, where a large
volume of data is moved via an offline transfer but is later synchronized at a lower scale over the network.
Ship the device: Once the data is copied, the device can be shipped to Microsoft. Once received and imported,
the data is available in an Azure storage account.
Restore the asset: Verify the data is available in the storage account. Once verified, the data can be used as a blob
or in Azure Files. If the data is a VHD/VHDX file, the file can be converted to managed disks. Those managed disks
can then be used to instantiate a virtual machine, which creates a replica of the original on-premises asset.
Synchronization: If synchronization of drift is a requirement for a migrated asset, one of the third-party partner
solutions could be used to synchronize the files until the asset is restored.

Optimize and promote process changes


Optimize activities are not likely affected by this change in scope.

Secure and manage process changes


Secure and manage activities are not likely affected by this change in scope.

Next steps
Return to the expanded scope checklist to ensure your migration method is fully aligned.
Expanded scope checklist
Governance or compliance strategy

When governance or compliance is required throughout a migration effort, additional scope is required. The
following guidance expands the scope of the Azure migration guide to address different approaches to
governance or compliance requirements.

General scope expansion


Prerequisite activities are affected the most when governance or compliance is required. Additional adjustments
may be required during assessment, migration, and optimization.

Suggested prerequisites
Configuration of the base Azure environment could change significantly when integrating governance or
compliance requirements. To understand how prerequisites change, it's important to understand the nature of the
requirements. Prior to beginning any migration that requires governance or compliance, an approach should be
chosen and implemented in the cloud environment. The following are a few high-level approaches commonly seen
during migrations:
Common governance approach: For most organizations, the Cloud Adoption Framework governance model is
a sufficient approach that consists of a minimum viable product (MVP) implementation, followed by targeted
iterations of governance maturity to address tangible risks identified in the adoption plan. This approach provides
the minimum tooling needed to establish consistent governance, so the team can understand the tools. It then
expands on those tools to address common governance concerns.
ISO 27001 Compliance blueprints: For customers who are required to adhere to ISO compliance standards, the
ISO 27001 Shared Services blueprint samples can serve as a more effective MVP to produce richer governance
constraints earlier in the iterative process. The ISO 27001 App Service Environment/SQL Database Sample
expands on the blueprint to map controls and deploy a common architecture for an application environment. As
additional compliance blueprints are released, they will be referenced here as well.
Virtual Datacenter: A more robust governance starting point may be required. In such cases, consider the Azure
Virtual Datacenter (VDC). This approach is commonly suggested during enterprise-scale adoption efforts, and
especially for efforts that exceed 10,000 assets. It is also the de facto choice for complex governance scenarios
when any of the following are required: extensive third-party compliance requirements, deep domain expertise, or
parity with mature IT governance policies and compliance requirements.
Partnership option to complete prerequisites
Microsoft Services: Microsoft Services provides solution offerings that can align to the Cloud Adoption
Framework governance model, compliance blueprints, or Virtual Datacenter options to ensure the most
appropriate governance or compliance model. Use the Secure Cloud Insights (SCI) solution offering to establish a
data-driven picture of a customer deployment in Azure, validate the customer's Azure implementation
maturity, identify optimization opportunities in existing deployment architectures, and remove governance,
security, and availability risks. Based on customer insights, you should lead with the following approaches:
Cloud Foundation: Establish the customer's core Azure designs, patterns, and governance architecture with
the Hybrid Cloud Foundation (HCF) solution offering. Map the customer's requirements to the most
appropriate reference architecture. Implement a minimum viable product consisting of Shared Services and
IaaS workloads.
Cloud Modernization: Use the Cloud Modernization solution offering as a comprehensive approach to move
applications, data, and infrastructure to an enterprise-ready cloud, and to optimize and modernize after
cloud deployment.
Innovate with Cloud: Engage the customer through an innovative and unique cloud center of excellence (CCoE)
solution approach that builds a modern IT organization to enable agility at scale with DevOps while staying in
control. This approach implements an agile method to capture business requirements, reuses deployment packages
aligned with security, compliance, and service management policies, and keeps the Azure platform aligned with
operational procedures.

Assess process changes


During assessment, additional decisions are required to align to the required governance approach. The cloud
governance team should provide all members of the cloud adoption team with any policy statements, architectural
guidance, or governance/compliance requirements prior to the assessment of a workload.
Suggested action during the assess process
Governance and compliance assessment requirements are too customer-specific to provide general guidance on
the actual steps taken during assessment. However, the process should include tasks and time allocations for
"alignment to compliance/governance requirements". For additional understanding of these requirements, see the
following links:
For a deeper understanding of governance, review the Five Disciplines of Cloud Governance overview. This
section of the Cloud Adoption Framework also includes templates to document the policies, guidance, and
requirements for each of the five sections:
Cost Management
Security Baseline
Resource Consistency
Identity Baseline
Deployment Acceleration
For guidance on developing governance guidance based on the Cloud Adoption Framework governance model,
see Implementing a cloud governance strategy.

Optimize and promote process changes


During the optimization and promotion processes, the cloud governance team should invest time to test and
validate adherence to governance and compliance standards. Additionally, this step is a good time to inject
processes for the cloud governance team to curate templates that could provide additional deployment
acceleration for future projects.
Suggested action during the optimize and promote process
During this process, the project plan should include time allocations for the cloud governance team to execute a
compliance review for each workload planned for production promotion.

Next steps
As the final item on the expanded scope checklist, return to the checklist and reevaluate any additional scope
requirements for the migration effort.
Expanded scope checklist
Azure migration best practices

Azure provides several tools to help execute a migration effort. This section of the Cloud Adoption
Framework is designed to help readers implement those tools in alignment with best practices for migration.
These best practices are aligned to one of the processes within the Cloud Adoption Framework migration model
pictured below.
Expand any process in the table of contents on the left to see best practices typically required during that process.

NOTE
Digital estate planning and asset assessment represent two different levels of migration planning and assessment:
Digital estate planning: You plan or rationalize the digital estate during planning, to establish an overall migration
backlog. However, this plan is based on some assumptions and details that need to be validated before a workload can
be migrated.
Asset assessment: You assess a workload's individual assets before migration of the workload, to evaluate cloud
compatibility and understand architecture and sizing constraints. This process validates initial assumptions and provides
the details needed to migrate an individual asset.
Assess on-premises workloads for migration to Azure

This article shows how the fictional company Contoso assesses an on-premises app for migration to Azure. In the
example scenario, Contoso's on-premises SmartHotel360 app currently runs on VMware. Contoso assesses the
app's VMs using the Azure Migrate service, and the app's SQL Server database using Data Migration Assistant.

Overview
As Contoso considers migrating to Azure, the company needs a technical and financial assessment to determine
whether its on-premises workloads are good candidates for cloud migration. In particular, the Contoso team wants
to assess machine and database compatibility for migration. It wants to estimate capacity and costs for running
Contoso's resources in Azure.
To get started and to better understand the technologies involved, Contoso assesses two of its on-premises apps,
summarized in the following table. The company assesses migration scenarios that rehost and refactor apps.
Learn more about rehosting and refactoring in the migration examples overview.

| APP NAME | PLATFORM | APP TIERS | DETAILS |
| --- | --- | --- | --- |
| SmartHotel360 (manages Contoso travel requirements) | Runs on Windows with a SQL Server database | Two-tiered app. The front-end ASP.NET website runs on one VM (WEBVM) and the SQL Server runs on another VM (SQLVM). | VMs are VMware, running on an ESXi host managed by vCenter Server. You can download the sample app from GitHub. |
| osTicket (Contoso service desk app) | Runs on Linux/Apache with MySQL PHP (LAMP) | Two-tiered app. A front-end PHP website runs on one VM (OSTICKETWEB) and the MySQL database runs on another VM (OSTICKETMYSQL). | The app is used by customer service apps to track issues for internal employees and external customers. You can download the sample from GitHub. |

Current architecture
This diagram shows the current Contoso on-premises infrastructure:
Contoso has one main datacenter. The datacenter is located in the city of New York in the Eastern United States.
Contoso has three additional local branches across the United States.
The main datacenter is connected to the internet with a fiber Metro Ethernet connection (500 MBps).
Each branch is connected locally to the internet by using business-class connections with IPsec VPN tunnels
back to the main datacenter. The setup allows Contoso's entire network to be permanently connected and
optimizes internet connectivity.
The main datacenter is fully virtualized with VMware. Contoso has two ESXi 6.5 virtualization hosts that are
managed by vCenter Server 6.5.
Contoso uses Active Directory for identity management. Contoso uses DNS servers on the internal network.
The domain controllers in the datacenter run on VMware VMs. The domain controllers at local branches run on
physical servers.

Business drivers
Contoso's IT leadership team has worked closely with the company's business partners to understand what the
business wants to achieve with this migration:
Address business growth. Contoso is growing. As a result, pressure has increased on the company's
on-premises systems and infrastructure.
Increase efficiency. Contoso needs to remove unnecessary procedures and streamline processes for its
developers and users. The business needs IT to be fast and to not waste time or money, so the company can
deliver faster on customer requirements.
Increase agility. Contoso IT needs to be more responsive to the needs of the business. It must be able to react
faster than the changes that occur in the marketplace for the company to be successful in a global economy. IT
at Contoso must not get in the way or become a business blocker.
Scale. As the company's business grows successfully, Contoso IT must provide systems that can grow at the
same pace.

Assessment goals
The Contoso cloud team has identified goals for its migration assessments:
After migration, apps in Azure should have the same performance capabilities that apps have today in
Contoso's on-premises VMware environment. Moving to the cloud doesn't mean that app performance is less
critical.
Contoso needs to understand the compatibility of its applications and databases with Azure requirements.
Contoso also needs to understand its hosting options in Azure.
Contoso's database administration should be minimized after apps move to the cloud.
Contoso wants to understand not only its migration options, but also the costs associated with the
infrastructure after it moves to the cloud.

Assessment tools
Contoso uses Microsoft tools for its migration assessment. The tools align with the company's goals and should
provide Contoso with all the information it needs.

| TECHNOLOGY | DESCRIPTION | COST |
| --- | --- | --- |
| Data Migration Assistant | Contoso uses Data Migration Assistant to assess and detect compatibility issues that might affect its database functionality in Azure. Data Migration Assistant assesses feature parity between SQL sources and targets. It recommends performance and reliability improvements. | Data Migration Assistant is a free, downloadable tool. |
| Azure Migrate | Contoso uses the Azure Migrate service to assess its VMware VMs. Azure Migrate assesses the migration suitability of the machines. It provides sizing and cost estimates for running in Azure. | As of May 2018, Azure Migrate is a free service. |
| Service Map | Azure Migrate uses Service Map to show dependencies between machines that the company wants to migrate. | Service Map is part of Azure Monitor logs. Currently, Contoso can use Service Map for 180 days without incurring charges. |

In this scenario, Contoso downloads and runs Data Migration Assistant to assess the on-premises SQL Server
database for its travel app. Contoso uses Azure Migrate with dependency mapping to assess the app VMs before
migration to Azure.

Assessment architecture
Contoso is a fictional name that represents a typical enterprise organization.
Contoso has an on-premises datacenter (contoso-datacenter) and on-premises domain controllers
(CONTOSODC1, CONTOSODC2).
VMware VMs are located on VMware ESXi hosts running version 6.5 (contosohost1, contosohost2).
The VMware environment is managed by vCenter Server 6.5 (vcenter.contoso.com, running on a VM).
The SmartHotel360 travel app has these characteristics:
The app is tiered across two VMware VMs (WEBVM and SQLVM).
The VMs are located on VMware ESXi host contosohost1.contoso.com.
The VMs are running Windows Server 2008 R2 Datacenter with SP1.
The VMware environment is managed by vCenter Server (vcenter.contoso.com) running on a VM.
The osTicket service desk app:
The app is tiered across two VMs (OSTICKETWEB and OSTICKETMYSQL).
The VMs are running Ubuntu Linux Server 16.04-LTS.
OSTICKETWEB is running Apache 2 and PHP 7.0.
OSTICKETMYSQL is running MySQL 5.7.22.

Prerequisites
Contoso and other users must meet the following prerequisites for the assessment:
Owner or Contributor permissions for the Azure subscription, or for a resource group in the Azure subscription.
An on-premises vCenter Server instance running version 6.5, 6.0, or 5.5.
A read-only account in vCenter Server, or permissions to create one.
Permissions to create a VM on the vCenter Server instance by using an .ova template.
At least one ESXi host running version 5.5 or later.
At least two on-premises VMware VMs, one running a SQL Server database.
Permissions to install Azure Migrate agents on each VM.
The VMs should have direct internet connectivity.
You can restrict internet access to the required URLs.
If your VMs don't have internet connectivity, the Azure Log Analytics Gateway must be installed on them,
and agent traffic directed through it.
The FQDN of the VM running the SQL Server instance, for database assessment.
Windows Firewall running on the SQL Server VM should allow external connections on TCP port 1433
(default). This setup allows Data Migration Assistant to connect. A quick connectivity check is sketched after this list.
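The connectivity check referenced in the last prerequisite can be as simple as attempting a TCP connection from the machine that will run Data Migration Assistant. The host name in the sketch below is a placeholder; this is a convenience check only, not a substitute for configuring the firewall correctly.

```python
import socket

# Illustrative sketch: verify the machine running Data Migration Assistant can reach
# the SQL Server VM on TCP port 1433. "SQLVM" is a placeholder host name.

host, port = "SQLVM", 1433

try:
    with socket.create_connection((host, port), timeout=5):
        print(f"TCP connection to {host}:{port} succeeded; DMA should be able to connect")
except OSError as err:
    print(f"Could not reach {host}:{port}: {err}")
    print("Check Windows Firewall rules on the SQL Server VM and name resolution")
```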

Assessment overview
Here's how Contoso performs its assessment:
Step 1: Download and install Data Migration Assistant. Contoso prepares Data Migration Assistant for
assessment of the on-premises SQL Server database.
Step 2: Assess the database by using Data Migration Assistant. Contoso runs and analyzes the database
assessment.
Step 3: Prepare for VM assessment by using Azure Migrate. Contoso sets up on-premises accounts and
adjusts VMware settings.
Step 4: Discover on-premises VMs by using Azure Migrate. Contoso creates an Azure Migrate collector
VM. Then, Contoso runs the collector to discover VMs for assessment.
Step 5: Prepare for dependency analysis by using Azure Migrate. Contoso installs Azure Migrate agents
on the VMs, so the company can see dependency mapping between VMs.
Step 6: Assess the VMs by using Azure Migrate. Contoso checks dependencies, groups the VMs, and runs
the assessment. When the assessment is ready, Contoso analyzes the assessment in preparation for migration.

> [!NOTE]
> Assessments shouldn't be limited to using tooling to discover information about your environment. Schedule time to
> speak to business owners, end users, and other members of the IT department to get a full picture of what's
> happening in the environment and to understand things that tooling can't tell you.

Step 1: Download and install Data Migration Assistant


1. Contoso downloads Data Migration Assistant from the Microsoft Download Center.
Data Migration Assistant can be installed on any machine that can connect to the SQL Server instance;
Contoso doesn't need to run it, and shouldn't run it, on the SQL Server host machine itself.
2. Contoso runs the downloaded setup file (DownloadMigrationAssistant.msi) to begin the installation.
3. On the Finish page, Contoso selects Launch Microsoft Data Migration Assistant before finishing the
wizard.

Step 2: Run and analyze the database assessment for SmartHotel360


Now, Contoso can run an assessment to analyze its on-premises SQL Server database for the SmartHotel360 app.
1. In Data Migration Assistant, Contoso selects New > Assessment, and then gives the assessment a project
name.
2. For Source server type, Contoso selects SQL Server. For Target server type, Contoso selects SQL Server on
Azure Virtual Machines.
NOTE
Currently, Data Migration Assistant doesn't support assessment for migrating to an Azure SQL Database Managed
Instance. As a workaround, Contoso uses SQL Server on an Azure VM as the supposed target for the assessment.

3. In Select Target Version, Contoso selects SQL Server 2017 as the target version. Contoso needs to select
this version because it's the version that's used by the SQL Database Managed Instance.
4. Contoso selects reports to help it discover information about compatibility and new features:
Compatibility issues note changes that might break migration or that require a minor adjustment
before migration. This report keeps Contoso informed about any features currently in use that are
deprecated. Issues are organized by compatibility level.
New feature recommendation notes new features in the target SQL Server platform that can be used
for the database after migration. New feature recommendations are organized under the headings
Performance, Security, and Storage.
5. In Connect to a server, Contoso enters the name of the VM that's running the database and credentials to
access it. Contoso selects Trust server certificate to make sure the VM can access SQL Server. Then,
Contoso selects Connect.

6. In Add source, Contoso adds the database it wants to assess, and then selects Next to start the
assessment.
7. The assessment is created.

8. In Review results, Contoso views the assessment results.


Analyze the database assessment
Results are displayed as soon as they're available. If Contoso fixes issues, it must select Restart assessment to
rerun the assessment.
1. In the Compatibility issues report, Contoso checks for any issues at each compatibility level. Compatibility
levels map to SQL Server versions as follows:
100: SQL Server 2008/Azure SQL Database
110: SQL Server 2012/Azure SQL Database
120: SQL Server 2014/Azure SQL Database
130: SQL Server 2016/Azure SQL Database
140: SQL Server 2017/Azure SQL Database

2. In the Feature recommendations report, Contoso views performance, security, and storage features that
the assessment recommends after migration. A variety of features are recommended, including In-Memory
OLTP, columnstore indexes, Stretch Database, Always Encrypted, dynamic data masking, and transparent
data encryption.
NOTE
Contoso should enable transparent data encryption for all SQL Server databases. This is even more critical when a
database is in the cloud than when it's hosted on-premises. Transparent data encryption should be enabled only after
migration. If transparent data encryption is already enabled, Contoso must move the certificate or asymmetric key to
the master database of the target server. Learn how to move a transparent data encryption-protected database to
another SQL Server instance.

3. Contoso can export the assessment in JSON or CSV format.

NOTE
For large-scale assessments:
Run multiple assessments concurrently and view the state of the assessments on the All assessments page.
Consolidate assessments into a SQL Server database.
Consolidate assessments into a Power BI report.

Step 3: Prepare for VM assessment by using Azure Migrate


Contoso needs to create a VMware account that Azure Migrate can use to automatically discover VMs for
assessment, verify rights to create a VM, note the ports that need to be opened, and set the statistics settings level.
Set up a VMware account
VM discovery requires a read-only account in vCenter Server that has the following properties:
User type: At least a read-only user.
Permissions: For the datacenter object, select the Propagate to Child Objects checkbox. For Role, select
Read-only.
Details: The user is assigned at the datacenter level, with access to all objects in the datacenter.
To restrict access, assign the No access role, with the Propagate to child objects option selected, to the child
objects (vSphere hosts, datastores, VMs, and networks).
Verify permissions to create a VM
Contoso verifies that it has permissions to create a VM by importing a file in .ova format. Learn how to create and
assign a role with privileges.
Verify ports
The Contoso assessment uses dependency mapping. Dependency mapping requires an agent to be installed on
VMs that will be assessed. The agent must be able to connect to Azure from TCP port 443 on each VM. Learn
about connection requirements.

Step 4: Discover VMs


To discover VMs, Contoso creates an Azure Migrate project. Contoso downloads and sets up the collector VM.
Then, Contoso runs the collector to discover its on-premises VMs.
Create a project
Set up a new Azure Migrate project as follows.
1. In the Azure portal > All services, search for Azure Migrate.
2. Under Services, select Azure Migrate.
3. In Overview, under Discover, assess and migrate servers, click Assess and migrate servers.

4. In Getting started, click Add tools.


5. In Migrate project, select your Azure subscription, and create a resource group if you don't have one.
6. In Project Details, specify the project name and the geography in which you want to create the project.
United States, Asia, Europe, Australia, United Kingdom, Canada, India, and Japan are supported.
The project geography is used only to store the metadata gathered from on-premises VMs.
You can select any target region when you run a migration.
7. Click Next.
8. In Select assessment tool, select Azure Migrate: Server Assessment > Next.

9. In Select migration tool, select Skip adding a migration tool for now > Next.
10. In Review + add tools, review the settings, and click Add tools.
11. Wait a few minutes for the Azure Migrate project to deploy. You'll be taken to the project page. If you don't
see the project, you can access it from Servers in the Azure Migrate dashboard.
Download the collector appliance
1. In Migration Goals > Servers > Azure Migrate: Server Assessment, click Discover.
2. In Discover machines > Are your machines virtualized?, click Yes, with VMware vSphere
hypervisor.
3. Click Download to download the .OVA template file.

Verify the collector appliance


Before deploying the VM, Contoso checks that the OVA file is secure:
1. On the machine on which the file was downloaded, Contoso opens an administrator Command Prompt
window.
2. Contoso runs the following command to generate the hash for the OVA file:
C:\>CertUtil -HashFile <file_location> [Hashing Algorithm]

Example:
C:\>CertUtil -HashFile C:\AzureMigrate\AzureMigrate.ova SHA256

3. The generated hash should match the hash values listed in the Verify security section of the Assess VMware
VMs for migration tutorial.
Create the collector appliance
Now, Contoso can import the downloaded file to the vCenter Server instance and provision the collector appliance
VM:
1. In the vSphere Client console, Contoso selects File > Deploy OVF Template.
2. In the Deploy OVF Template Wizard, Contoso selects Source, and then specifies the location of the OVA file.
3. In Name and Location, Contoso specifies a display name for the collector VM. Then, it selects the
inventory location in which to host the VM. Contoso also specifies the host or cluster on which to run the
collector appliance.
4. In Storage, Contoso specifies the storage location. In Disk Format, Contoso selects how it wants to
provision the storage.
5. In Network Mapping, Contoso specifies the network in which to connect the collector VM. The network
needs internet connectivity to send metadata to Azure.
6. Contoso reviews the settings, and then selects Power on after deployment > Finish. A message that
confirms successful completion appears when the appliance is created.
Run the collector to discover VMs
Now, Contoso runs the collector to discover VMs. Currently, the collector supports only English (United States) as
the operating system language and collector interface language.
1. In the vSphere Client console, Contoso selects Open Console. Contoso accepts the licensing terms and specifies
the password preferences for the collector VM.
2. On the desktop, Contoso selects the Microsoft Azure Appliance Configuration Manager shortcut.
3. In Azure Migrate Collector, Contoso selects Set up prerequisites. Contoso accepts the license terms and
reads the third-party information.
4. The collector checks that the VM has internet access, that the time is synced, and that the collector service is
running. (The collector service is installed by default on the VM.) Contoso also installs the VMware vSphere
Virtual Disk Development Kit.

NOTE
It's assumed that the VM has direct access to the internet without using a proxy.

5. Contoso signs in to its Azure account, selects the subscription and the Migrate project created earlier, and
enters a name for the appliance so it can be identified in the Azure portal.
6. In Specify vCenter Server details, Contoso enters the name (FQDN) or IP address of the vCenter Server
instance and the read-only credentials used for discovery.
7. Contoso selects a scope for VM discovery. The collector can discover only VMs that are within the specified
scope. The scope can be set to a specific folder, datacenter, or cluster.

8. The collector now starts to discover and collect information about the Contoso environment.

Verify VMs in the portal


When collection is finished, Contoso checks that the VMs appear in the portal:
1. In the Azure Migrate project, Contoso selects Servers > Discovered Servers. Contoso checks that the VMs
that it wants to discover are shown.
2. Currently, the machines don't have the Azure Migrate agents installed. Contoso must install the agents to
view dependencies.

Step 5: Prepare for dependency analysis


To view dependencies between VMs that it wants to assess, Contoso downloads and installs agents on the app
VMs. Contoso installs agents on all VMs for its apps, both for Windows and Linux.
Take a snapshot
To keep a copy of the VMs before modifying them, Contoso takes a snapshot before the agents are installed.
Download and install the VM agents
1. In Machines, Contoso selects the machine. In the Dependencies column, Contoso selects Requires
installation.
2. In the Discover machines pane, Contoso:
Downloads the Microsoft Monitoring Agent (MMA) and the Microsoft Dependency agent for each
Windows VM.
Downloads the MMA and Dependency agent for each Linux VM.
3. Contoso copies the workspace ID and key. Contoso needs the workspace ID and key when it installs the
MMA.

Install the agents on Windows VMs


Contoso runs the installation on each VM.
Install the MMA on Windows VMs
1. Contoso double-clicks the downloaded agent.
2. In Destination Folder, Contoso keeps the default installation folder, and then selects Next.
3. In Agent Setup Options, Contoso selects Connect the agent to Azure Log Analytics > Next.

4. In Azure Log Analytics, Contoso pastes the workspace ID and key that it copied from the portal.

5. In Ready to Install, Contoso installs the MMA.


Install the Dependency agent on Windows VMs
1. Contoso double-clicks the downloaded Dependency agent.
2. Contoso accepts the license terms and waits for the installation to finish.
Install the agents on Linux VMs
Contoso runs the installation on each VM.
Install the MMA on Linux VMs
1. Contoso installs the Python ctypes library on each VM by using the following command:
sudo apt-get install python-ctypeslib

2. Contoso must run the command to install the MMA agent as root. To become root, Contoso runs the
following command, and then enters the root password:
sudo -i

3. Contoso installs the MMA:


Contoso enters the workspace ID and key in the command.
Commands are for 64-bit.
The workspace ID and primary key are located in the Log Analytics workspace in the Azure portal. Select
Settings, and then select the Connected Sources tab.
Run the following commands to download the Log Analytics agent, validate the checksum, and install
and onboard the agent:

wget https://raw.githubusercontent.com/Microsoft/OMS-Agent-for-Linux/master/installer/scripts/onboard_agent.sh && sh onboard_agent.sh -w 6b7fcaff-7efb-4356-ae06-516cacf5e25d -s k7gAMAw5Bk8pFVUTZKmk2lG4eUciswzWfYLDTxGcD8pcyc4oT8c6ZRgsMy3MmsQSHuSOcmBUsCjoRiG2x9A8Mg==

Install the Dependency Agent on Linux VMs


After the MMA is installed, Contoso installs the Dependency agent on the Linux VMs:
1. The Dependency agent is installed on Linux computers by using InstallDependencyAgent-Linux64.bin, a
shell script that has a self-extracting binary. Contoso runs the file by using sh, or it adds execute permissions
to the file itself.
2. Contoso installs the Linux Dependency agent as root:
wget --content-disposition https://aka.ms/dependencyagentlinux -O InstallDependencyAgent-Linux64.bin && sudo sh InstallDependencyAgent-Linux64.bin -s

Step 6: Run and analyze the VM assessment


Contoso can now verify machine dependencies and create a group. Then, it runs the assessment for the group.
Verify dependencies and create a group
1. To determine which machines to analyze, Contoso selects View Dependencies.

2. For SQLVM, the dependency map shows the following details:


Process groups or processes that have active network connections running on SQLVM during the
specified time period (an hour, by default).
Inbound (client) and outbound (server) TCP connections to and from all dependent machines.
Dependent machines that have the Azure Migrate agents installed are shown as separate boxes.
Machines that don't have the agents installed show port and IP address information.
3. For machines that have the agent installed (WEBVM), Contoso selects the machine box to view more
information. The information includes the FQDN, operating system, and MAC address.
4. Contoso selects the VMs to add to the group (SQLVM and WEBVM). Contoso holds the Ctrl key while
clicking to select multiple VMs.
5. Contoso selects Create Group, and then enters a name (smarthotelapp).

NOTE
To view more granular dependencies, you can expand the time range. You can select a specific duration or select start
and end dates.

Run an assessment
1. In Groups, Contoso opens the group (smarthotelapp), and then selects Create assessment.

2. To view the assessment, Contoso selects Manage > Assessments.


Contoso uses the default assessment settings, but the settings can be customized.
Analyze the VM assessment
An Azure Migrate assessment includes information about the compatibility of on-premises machines with Azure,
suggested right-sizing for the Azure VM, and estimated monthly Azure costs.

Review confidence rating

An assessment has a confidence rating of from 1 star to 5 stars (1 star is the lowest and 5 stars is the highest).
The confidence rating is assigned to an assessment based on the availability of data points that are needed
to compute the assessment.
The rating helps you estimate the reliability of the size recommendations that are provided by Azure
Migrate.
The confidence rating is useful when you are doing performance-based sizing, because Azure Migrate might not
have enough data points for utilization-based sizing. For "as on-premises" sizing, the confidence rating is
always 5 stars because Azure Migrate has all the data points it needs to size the VM.
Depending on the percentage of data points available, the confidence rating for the assessment is provided:

| Availability of data points | Confidence rating |
| --- | --- |
| 0%-20% | 1 star |
| 21%-40% | 2 stars |
| 41%-60% | 3 stars |
| 61%-80% | 4 stars |
| 81%-100% | 5 stars |

Verify Azure readiness

The assessment report shows the information that's summarized in the table. To show performance-based sizing,
Azure Migrate needs the following information. If the information can't be collected, sizing assessment might not
be accurate.
Utilization data for CPU and memory.
Read/write IOPS and throughput for each disk attached to the VM.
Network in/out information for each network adapter attached to the VM.

| Setting | Indication | Details |
| --- | --- | --- |
| Azure VM readiness | Indicates whether the VM is ready for migration. | Possible states: Ready for Azure, Ready with conditions, Not ready for Azure, Readiness unknown. If a VM isn't ready, Azure Migrate shows some remediation steps. |
| Azure VM size | For ready VMs, Azure Migrate provides an Azure VM size recommendation. | Sizing recommendation depends on assessment properties. If you used performance-based sizing, sizing considers the performance history of the VMs. If you used "as on-premises" sizing, sizing is based on the on-premises VM size, and utilization data isn't used. |
| Suggested tool | Because Azure machines are running the agents, Azure Migrate looks at the processes that are running inside the machine. It identifies whether the machine is a database machine. | |
| VM information | The report shows settings for the on-premises VM, including operating system, boot type, and disk and storage information. | |

Review monthly cost estimates


This view shows the total compute and storage cost of running the VMs in Azure. It also shows details for each
machine.

Cost estimates are calculated by using the size recommendations for a machine.
Estimated monthly costs for compute and storage are aggregated for all VMs in the group.

Clean up after assessment


When the assessment finishes, Contoso retains the Azure Migrate appliance to use in future evaluations.
Contoso turns off the VMware VM. Contoso will use it again when it evaluates additional VMs.
Contoso keeps the Contoso Migration project in Azure. The project currently is deployed in the
ContosoFailoverRG resource group in the East US Azure region.
The collector VM has a 180-day evaluation license. If this limit expires, Contoso will need to download the
collector and set it up again.

Conclusion
In this scenario, Contoso assesses its SmartHotel360 app database by using Data Migration Assistant. It
assesses the on-premises VMs by using the Azure Migrate service. Contoso reviews the assessments to make sure
that on-premises resources are ready for migration to Azure.

Next steps
After Contoso assesses this workload as a potential migration candidate, it can begin preparing its on-premises
infrastructure and its Azure infrastructure for migration. See the deploy Azure infrastructure article in the Cloud
Adoption Framework migrate best practices section for an example of how Contoso performs these processes.
Best practices to set up networking for workloads
migrated to Azure

As you plan and design for migration, in addition to the migration itself, one of the most critical steps is the design
and implementation of Azure networking. This article describes best practices for networking when migrating to
IaaS and PaaS implementations in Azure.

IMPORTANT
The best practices and opinions described in this article are based on the Azure platform and service features available at the
time of writing. Features and capabilities change over time. Not all recommendations might be applicable for your
deployment, so select those that work for you.

Design virtual networks


Azure provides virtual networks (VNets):
Azure resources communicate privately, directly, and securely with each other over VNets.
You can configure endpoint connections on VNets for VMs and services that require internet communication.
A VNet is a logical isolation of the Azure cloud that's dedicated to your subscription.
You can implement multiple VNets within each Azure subscription and Azure region.
Each VNet is isolated from other VNets.
VNets can contain private IP addresses (from the ranges defined in RFC 1918) and public IP addresses, expressed
in CIDR notation. Public IP addresses specified in a VNet's address space are not directly accessible from the internet.
VNets can connect to each other using VNet peering. Connected VNets can be in the same or different regions.
Thus resources in one VNet can connect to resources in other VNets.
By default, Azure routes traffic between subnets within a VNet, connected VNets, on-premises networks, and
the internet.
When planning your VNet topology, you should consider how to arrange IP address spaces, how to implement a
hub and spoke network, how to segment VNets into subnets, setting up DNS, and implementing Azure
availability zones.

Best practice: Plan IP addressing


When you create VNets as part of your migration, it's important to plan out your VNet IP address space.
You should assign an address space that isn't larger than a CIDR range of /16 for each VNet. VNets allow for
the use of 65536 IP addresses, and assigning a smaller prefix than /16 would result in the loss of IP addresses.
It's important not to waste IP addresses, even if they're in the private ranges defined by RFC 1918.
The VNet address space shouldn't overlap with on-premises network ranges.
Network Address Translation (NAT) shouldn't be used.
Overlapping addresses can cause networks that can't be connected and routing that doesn't work properly. If
networks overlap, you'll need to redesign the network or use network address translation (NAT).
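As a rough illustration, here's a minimal Azure CLI sketch of carving out a dedicated, non-overlapping address space for a migration VNet. The resource group name, VNet name, region, and address range are assumptions for this example, not prescribed values:

# Create a resource group and a VNet with a planned, non-overlapping address space.
# Names and ranges below are placeholders; adjust them to your own IP plan.
az group create --name contoso-migration-rg --location eastus2

az network vnet create \
  --resource-group contoso-migration-rg \
  --name contoso-vnet \
  --address-prefixes 10.245.16.0/20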
Learn more:
Get an overview of Azure VNets.
Read the networking FAQ.
Learn about networking limitations.

Best practice: Implement a hub and spoke network topology


A hub and spoke network topology isolates workloads while sharing services such as identity and security.
The hub is an Azure VNet that acts as a central point of connectivity.
The spokes are VNets that connect to the hub VNet using VNet peering.
Shared services are deployed in the hub, while individual workloads are deployed as spokes.
Consider the following:
Implementing a hub and spoke topology in Azure centralizes common services such as connections to on-
premises networks, firewalls, and isolation between VNets. The hub VNet provides a central point of
connectivity to on-premises networks, and a place to host services used by workloads hosted in spoke VNets.
A hub and spoke configuration is typically used by larger enterprises. Smaller networks might consider a
simpler design to save on costs and complexity.
Spoke VNets can be used to isolate workloads, with each spoke managed separately from other spokes. Each
workload can include multiple tiers, and multiple subnets that are connected with Azure load balancers.
Hub and spoke VNets can be implemented in different resource groups, and even in different subscriptions.
When you peer virtual networks in different subscriptions, the subscriptions can be associated to the same, or
different, Azure Active Directory (Azure AD) tenants. This allows for decentralized management of each
workload, while sharing services maintained in the hub network.
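As a sketch of how spokes attach to a hub, the following Azure CLI commands peer a hub and a spoke VNet in both directions. The VNet and resource group names are assumptions, and both VNets are presumed to be in the same subscription and resource group for simplicity:

# Peering must be created in both directions: hub to spoke, and spoke back to hub.
az network vnet peering create --resource-group contoso-migration-rg \
  --name hub-to-spoke --vnet-name hub-vnet --remote-vnet spoke-vnet --allow-vnet-access

az network vnet peering create --resource-group contoso-migration-rg \
  --name spoke-to-hub --vnet-name spoke-vnet --remote-vnet hub-vnet --allow-vnet-access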

Hub and spoke topology


Learn more:
Read about a hub and spoke topology.
Get network recommendations for running Azure Windows and Linux VMs.
Learn about VNet peering.

Best practice: Design subnets


To provide isolation within a VNet, you segment it into one or more subnets, and allocate a portion of the VNet's
address space to each subnet.
You can create multiple subnets within each VNet.
By default, Azure routes network traffic between all subnets in a VNet.
Your subnet decisions are based on your technical and organizational requirements.
You create subnets using CIDR notation.
When deciding on network range for subnets, it's important to note that Azure retains five IP addresses from
each subnet that can't be used. For example, if you create the smallest available subnet of /29 (with eight IP
addresses), Azure will retain five addresses, so you only have three usable addresses that can be assigned to
hosts on the subnet.
For most cases, use /28 as the smallest subnet.
Example:
The table shows an example of a VNet with an address space of 10.245.16.0/20 segmented into subnets, for a
planned migration.

| Subnet | CIDR | Addresses | Use |
| --- | --- | --- | --- |
| DEV-FE-EUS2 | 10.245.16.0/22 | 1019 | Front-end/web tier VMs |
| DEV-APP-EUS2 | 10.245.20.0/22 | 1019 | App-tier VMs |
| DEV-DB-EUS2 | 10.245.24.0/23 | 507 | Database VMs |
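For example, a subnet from the plan above could be created with the Azure CLI along these lines (the VNet and resource group names are assumptions carried over from the earlier sketch):

# Create the front-end subnet from the example plan; Azure reserves five addresses in each subnet.
az network vnet subnet create --resource-group contoso-migration-rg \
  --vnet-name contoso-vnet --name DEV-FE-EUS2 --address-prefixes 10.245.16.0/22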

Learn more:
Learn about designing subnets.
Learn how a fictional company (Contoso) prepared their networking infrastructure for migration.

Best practice: Set up a DNS server


Azure adds a DNS server by default when you deploy a VNet. This allows you to rapidly build VNets and deploy
resources. However, this DNS server only provides services to the resources on that VNet. If you want to connect
multiple VNets together, or connect to an on-premises server from VNets, you need additional name resolution
capabilities. For example, you might need Active Directory to resolve DNS names between virtual networks. To do
this, you deploy your own custom DNS server in Azure.
DNS servers in a VNet can forward DNS queries to the recursive resolvers in Azure. This enables you to
resolve host names within that VNet. For example, a domain controller running in Azure can respond to
DNS queries for its own domains, and forward all other queries to Azure.
DNS forwarding allows VMs to see both your on-premises resources (via the domain controller) and
Azure-provided host names (using the forwarder). Access to the recursive resolvers in Azure is provided
using the virtual IP address 168.63.129.16.
DNS forwarding also enables DNS resolution between VNets, and allows on-premises machines to resolve
host names provided by Azure.
To resolve a VM host name, the DNS server VM must reside in the same VNet, and be configured to
forward host name queries to Azure.
Because the DNS suffix is different in each VNet, you can use conditional forwarding rules to send DNS
queries to the correct VNet for resolution.
When you use your own DNS servers, you can specify multiple DNS servers for each VNet. You can also
specify multiple DNS servers per network interface (for Azure Resource Manager), or per cloud service (for
the classic deployment model).
DNS servers specified for a network interface or cloud service take precedence over DNS servers specified
for the VNet.
In the Azure Resource Manager deployment model, you can specify DNS servers for a VNet and a network
interface, but the best practice is to use the setting only on VNets.
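If you deploy custom DNS servers (for example, domain controllers that forward to Azure), you can point the VNet at them. A minimal sketch, assuming the hypothetical VNet name from earlier and illustrative DNS server addresses:

# Point the VNet at custom DNS servers; VMs pick up the change after a restart or DHCP lease renewal.
az network vnet update --resource-group contoso-migration-rg --name contoso-vnet \
  --dns-servers 10.245.16.4 10.245.16.5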

DNS servers for VNet


Learn more:
Learn about name resolution when you use your own DNS server.
Learn about DNS naming rules and restrictions.

Best practice: Set up availability zones


Availability zones increase high-availability to protect your apps and data from datacenter failures.
Availability Zones are unique physical locations within an Azure region.
Each zone is made up of one or more datacenters equipped with independent power, cooling, and
networking.
To ensure resiliency, there's a minimum of three separate zones in all enabled regions.
The physical separation of availability zones within a region protects applications and data from datacenter
failures.
Zone-redundant services replicate your applications and data across availability zones to protect from
single points of failure. With availability zones, Azure offers an SLA of 99.99% VM uptime.
Availability zone
You can plan and build high-availability into your migration architecture by colocating compute, storage,
networking, and data resources within a zone, and replicating them in other zones. Azure services that
support availability zones fall into two categories:
Zonal services: You associate a resource with a specific zone (for example, VMs, managed disks, and IP
addresses).
Zone-redundant services: The resource replicates automatically across zones. For example, zone-
redundant storage, Azure SQL Database.
You can deploy a standard Azure load balancer with internet-facing workloads or app tiers, to provide
zonal fault tolerance.
Load balancer
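As an illustration of a zonal deployment, the following sketch creates a VM pinned to availability zone 1. The VM name is hypothetical, and the image alias may vary with your Azure CLI version:

# Create a zonal VM in availability zone 1 of the target region.
az vm create --resource-group contoso-migration-rg --name webvm-zone1 \
  --image Ubuntu2204 --zone 1 --admin-username azureuser --generate-ssh-keys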
Learn more:
Get an overview of availability zones.

Design hybrid cloud networking


For a successful migration, it's critical to connect on-premises corporate networks to Azure. This creates an
always-on connection known as a hybrid-cloud network, where services are provided from the Azure cloud to
corporate users. There are two options for creating this type of network:
Site-to-site VPN: You establish a site-to-site connection between your compatible on-premises VPN device
and an Azure VPN gateway that's deployed in a VNet. Any authorized on-premises resource can access VNets.
Site-to-site communications are sent through an encrypted tunnel over the internet.
Azure ExpressRoute: You establish an Azure ExpressRoute connection between your on-premises network
and Azure, through an ExpressRoute partner. This connection is private, and traffic doesn't go over the internet.
Learn more:
Learn more about hybrid-cloud networking.

Best practice: Implement a highly available site-to-site VPN


To implement a site-to-site VPN, you set up a VPN gateway in Azure.
A VPN gateway is a specific type of VNet gateway that sends encrypted traffic between an Azure VNet and an
on-premises location over the public internet.
A VPN gateway can also send encrypted traffic between Azure VNets over the Microsoft network.
Each VNet can have only one VPN gateway.
You can create multiple connections to the same VPN gateway. When you create multiple connections, all VPN
tunnels share the available gateway bandwidth.
Every Azure VPN gateway consists of two instances in an active-standby configuration.
For planned maintenance or unplanned disruption to the active instance, failover occurs and the standby
instance takes over automatically, and resumes the site-to-site or VNet-to-VNet connection.
The switchover causes a brief interruption.
For planned maintenance, connectivity should be restored within 10 to 15 seconds.
For unplanned issues, the connection recovery takes longer, about one to 1.5 minutes in the worst case.
Point-to-site (P2S) VPN client connections to the gateway will be disconnected, and users will need
to reconnect from client machines.
When setting up a site-to-site VPN, you do the following:
You need a VNet whose address range doesn't overlap with the on-premises network to which the VPN will
connect.
You create a gateway subnet in the network.
You create a VPN gateway, and specify the gateway type (VPN) and whether the gateway is policy-based or
route-based. A route-based VPN is considered more capable and future-proof.
You create a local network gateway on-premises, and configure your on-premises VPN device.
You create a failover site-to-site VPN connection between the VNet gateway and the on-premises device.
Using route-based VPN allows for either active-passive or active-active connections to Azure. Route-based
also supports both site-to-site (from any computer) and point-to-site (from a single computer) connections
concurrently.
You specify the gateway SKU that you want to use. This will depend on your workload requirements,
throughputs, features, and SLAs.
Border Gateway Protocol (BGP) is an optional feature you can use with Azure ExpressRoute and route-based
VPN gateways to propagate your on-premises BGP routes to your VNets.
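The following Azure CLI sketch shows the main pieces behind those steps for a route-based site-to-site VPN. The names, address ranges, on-premises gateway IP, and pre-shared key placeholder are assumptions for illustration only:

# Gateway subnet at the end of the VNet address space, using a /27 prefix.
az network vnet subnet create --resource-group contoso-migration-rg \
  --vnet-name contoso-vnet --name GatewaySubnet --address-prefixes 10.245.31.224/27

# Public IP and route-based VPN gateway (gateway creation can take 30-45 minutes).
az network public-ip create --resource-group contoso-migration-rg --name contoso-vpn-pip
az network vnet-gateway create --resource-group contoso-migration-rg --name contoso-vpn-gw \
  --vnet contoso-vnet --public-ip-addresses contoso-vpn-pip \
  --gateway-type Vpn --vpn-type RouteBased --sku VpnGw1 --no-wait

# Local network gateway representing the on-premises VPN device, then the connection itself.
az network local-gateway create --resource-group contoso-migration-rg --name onprem-gw \
  --gateway-ip-address 203.0.113.10 --local-address-prefixes 192.168.0.0/16
az network vpn-connection create --resource-group contoso-migration-rg --name onprem-to-azure \
  --vnet-gateway1 contoso-vpn-gw --local-gateway2 onprem-gw --shared-key "<pre-shared-key>"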

Site-to-site VPN


Learn more:
Review compatible on-premises VPN devices.
Get an overview of VPN gateways.
Learn about highly available VPN connections.
Learn about planning and designing a VPN gateway.
Review VPN gateway settings.
Review gateway SKUs.
Read about setting up BGP with Azure VPN gateways.
Best practice: Configure a gateway for VPN Gateways
When you create a VPN gateway in Azure, you must use a special subnet named GatewaySubnet. When creating
this subnet note these best practices:
The gateway subnet can have a maximum prefix length of 29 (for example, 10.119.255.248/29). The current
recommendation is that you use a prefix length of 27 (for example, 10.119.255.224/27).
When you define the address space of the gateway subnet, use the very last part of the VNet address space.
When using the Azure GatewaySubnet, never deploy any VMs or other devices such as Application Gateway to
the gateway subnet.
Don't assign a network security group (NSG) to this subnet; it will cause the gateway to stop functioning.
Learn more:
Use this tool to determine your IP address space.

Best practice: Implement Azure Virtual WAN for branch offices


For multiple VPN connections, Azure Virtual WAN is a networking service that provides optimized and
automated branch-to-branch connectivity through Azure.
Virtual WAN allows you to connect and configure branch devices to communicate with Azure. This can be
done manually, or by using preferred provider devices through a Virtual WAN partner.
Using preferred provider devices allows for simple use, connectivity, and configuration management.
The Azure WAN built-in dashboard provides instant troubleshooting insights that save time, and provide an
easy way to track large-scale site-to-site connectivity.
Learn more: Learn about Azure Virtual WAN.
Best practice: Implement ExpressRoute for mission-critical connections
The Azure ExpressRoute service extends your on-premises infrastructure into the Microsoft cloud by creating
private connections between the virtual Azure datacenter and on-premises networks.
ExpressRoute connections can be over an any-to-any (IP VPN ) network, a point-to-point Ethernet network, or
through a connectivity provider. They don't go over the public internet.
ExpressRoute connections offer higher security, reliability, and higher speeds (up to 10 Gbps), along with
consistent latency.
ExpressRoute is useful for virtual datacenters, as customers can get the benefits of compliance rules associated
with private connections.
With ExpressRoute Direct, you can connect directly to Microsoft routers at 100 Gbps, for larger bandwidth
needs.
ExpressRoute uses BGP to exchange routes between on-premises networks, Azure instances, and Microsoft
public addresses.
Deploying ExpressRoute connections usually involves engaging with an ExpressRoute service provider. For a quick
start, it's common to initially use a site-to-site VPN to establish connectivity between the virtual datacenter and
on-premises resources, and then migrate to an ExpressRoute connection when a physical interconnection with
your service provider is established.
Learn more:
Read an overview of ExpressRoute.
Learn about ExpressRoute Direct.
Best practice: Optimize ExpressRoute routing with BGP communities
When you have multiple ExpressRoute circuits, you have more than one path to connect to Microsoft. As a result,
suboptimal routing can happen and your traffic might take a longer path to reach Microsoft, and Microsoft to your
network. The longer the network path, the higher the latency. Latency has a direct impact on app performance and
user experience.
Example:
Let's review an example:
You have two offices in the US, one in Los Angeles and one in New York.
Your offices are connected on a WAN, which can be either your own backbone network or your service
provider's IP VPN.
You have two ExpressRoute circuits, one in US West and one in US East, that are also connected on the WAN.
Obviously, you have two paths to connect to the Microsoft network.
Problem:
Now imagine you have an Azure deployment (for example, Azure App Service) in both US West and US East.
You want users in each office to access their nearest Azure services for an optimal experience.
Thus you want to connect users in Los Angeles to Azure US West and users in New York to Azure US East.
This works for East Coast users, but not for those on the West Coast. The problem is:
On each ExpressRoute circuit, we advertise both prefixes in Azure US East (23.100.0.0/16) and Azure
US West (13.100.0.0/16).
Without knowing which prefix is from which region, prefixes aren't treated differently.
Your WAN network can assume that both prefixes are closer to US East than US West, and thus route
users from both offices to the ExpressRoute circuit in US East, providing a suboptimal experience for
users in the Los Angeles office.
BGP communities unoptimized connection
Solution:
To optimize routing for both office users, you need to know which prefix is from Azure US West and which is from
Azure US East. You can encode this information by using BGP community values.
You assign a unique BGP community value to each Azure region. For example, 12076:51004 for US East;
12076:51006 for US West.
Now that it's clear which prefix belongs to which Azure region, you can configure a preferred ExpressRoute
circuit.
Because you're using BGP to exchange routing information, you can use BGP's local preference to influence
routing.
In our example, you assign a higher local preference value to 13.100.0.0/16 in US West than in US East, and
similarly, a higher local preference value to 23.100.0.0/16 in US East than in US West.
This configuration ensures that when both paths to Microsoft are available, users in Los Angeles connect to
Azure US West using the west circuit, and users in New York connect to Azure US East using the east circuit.
Routing is optimized on both sides.
BGP communities optimized connection
Learn more:
Learn about optimizing routing.

Secure VNets
The responsibility for securing VNets is shared between Microsoft and you. Microsoft provides many networking
features, as well as services that help keep resources secure. When designing security for VNets, best practices
you should follow include implementing a perimeter network, using filtering and security groups, securing access
to resources and IP addresses, and implementing attack protection.
Learn more:
Get an overview of best practices for network security.
Learn how to design for secure networks.

Best practice: Implement an Azure perimeter network


Although Microsoft invests heavily in protecting the cloud infrastructure, you must also protect your cloud
services and resource groups. A multilayered approach to security provides the best defense. Putting a perimeter
network in place is an important part of that defense strategy.
A perimeter network protects internal network resources from an untrusted network.
It's the outermost layer that's exposed to the internet. It generally sits between the internet and the enterprise
infrastructure, usually with some form of protection on both sides.
In a typical enterprise network topology, the core infrastructure is heavily fortified at the perimeters, with
multiple layers of security devices. The boundary of each layer consists of devices and policy enforcement
points.
Each layer can include a combination of network security solutions, including firewalls, denial of service (DoS)
prevention, intrusion detection/intrusion prevention systems (IDS/IPS), and VPN devices.
Policy enforcement on the perimeter network can use firewall policies, access control lists (ACLs), or specific
routing.
As incoming traffic arrives from the internet, it's intercepted and handled by a combination of defense solutions
to block attacks and harmful traffic, while allowing legitimate requests into the network.
Incoming traffic can route directly to resources in the perimeter network. The perimeter network resource can
then communicate with other resources deeper in the network, moving traffic forward into the network after
validation.
The following figure shows an example of a single subnet perimeter network in a corporate network, with two
security boundaries.

Perimeter network deployment


Learn more:
Learn about deploying a perimeter network between Azure and your on-premises datacenter.

Best practice: Filter VNet traffic with NSGs


Network security groups (NSGs) contain multiple inbound and outbound security rules that filter traffic going to
and from resources. Filtering can be by source and destination IP address, port, and protocol.
NSGs contain security rules that allow or deny inbound network traffic to (or outbound network traffic from)
several types of Azure resources. For each rule, you can specify source and destination, port, and protocol.
NSG rules are evaluated by priority using five-tuple information (source, source port, destination, destination
port, and protocol) to allow or deny the traffic.
A flow record is created for existing connections. Communication is allowed or denied based on the connection
state of the flow record.
A flow record allows an NSG to be stateful. For example, if you specify an outbound security rule to any
address over port 80, you don't need an inbound security rule to respond to the outbound traffic. You only
need to specify an inbound security rule if communication is initiated externally.
The opposite is also true. If inbound traffic is allowed over a port, you don't need to specify an outbound
security rule to respond to traffic over the port.
Existing connections aren't interrupted when you remove a security rule that enabled the flow. Traffic flows are
interrupted when connections are stopped, and no traffic is flowing in either direction, for at least a few
minutes.
When creating NSGs, create as few as possible, but as many as are necessary.
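As a minimal sketch of the pattern, assuming the hypothetical resource group, VNet, and subnet names from the earlier examples, an NSG with a single inbound rule could be created and associated with a subnet like this:

# Create an NSG, allow HTTPS inbound from the internet, and associate the NSG with a subnet.
az network nsg create --resource-group contoso-migration-rg --name web-nsg

az network nsg rule create --resource-group contoso-migration-rg --nsg-name web-nsg \
  --name Allow-HTTPS-Inbound --priority 100 --direction Inbound --access Allow \
  --protocol Tcp --source-address-prefixes Internet --destination-port-ranges 443

az network vnet subnet update --resource-group contoso-migration-rg \
  --vnet-name contoso-vnet --name DEV-FE-EUS2 --network-security-group web-nsg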
Best practice: Secure north/south and east/west traffic
When securing VNets, it's important to consider attack vectors.
Using only subnet NSGs simplifies your environment, but only secures traffic into your subnet. This is known
as north/south traffic.
Traffic between VMs on the same subnet is known as east/west traffic.
It's important to use both forms of protection, so that if a hacker gains access from the outside, they'll be
stopped when trying to attack machines located in the same subnet.
Use service tags on NSGs
A service tag represents a group of IP address prefixes. Using a service tag helps minimize complexity when you
create NSG rules.
You can use service tags instead of specific IP addresses when you create rules.
Microsoft manages the address prefixes associated with a service tag, and automatically updates the service
tag as addresses change.
You can't create your own service tag, or specify which IP addresses are included within a tag.
Service tags take the manual work out of assigning a rule to groups of Azure services. For example, if you want to
allow a VNet subnet containing web servers access to an Azure SQL Database, you could create an outbound rule
to port 1433, and use the Sql service tag.
This Sql tag denotes the address prefixes of the Azure SQL Database and Azure SQL Data Warehouse
services.
If you specify Sql as the value, traffic is allowed or denied to Sql.
If you only want to allow access to Sql in a specific region, you can specify that region. For example, if you want
to allow access only to Azure SQL Database in the East US region, you can specify Sql.EastUS as a service
tag.
The tag represents the service, but not specific instances of the service. For example, the tag represents the
Azure SQL Database service, but doesn't represent a particular SQL database or server.
All address prefixes represented by this tag are also represented by the Internet tag.
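For example, an outbound rule scoped to the regional Sql.EastUS service tag might look like the following sketch (the NSG and rule names are assumptions carried over from the earlier example):

# Allow outbound SQL traffic only to Azure SQL Database endpoints in East US, using a service tag.
az network nsg rule create --resource-group contoso-migration-rg --nsg-name web-nsg \
  --name Allow-SqlEastUS-Outbound --priority 110 --direction Outbound --access Allow \
  --protocol Tcp --destination-address-prefixes Sql.EastUS --destination-port-ranges 1433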
Learn more:
Read about NSGs.
Review the service tags available for NSGs.

Best practice: Use application security groups


Application security groups enable you to configure network security as a natural extension of an app structure.
You can group VMs and define network security policies based on application security groups.
Application security groups enable you to reuse your security policy at scale without manual maintenance of
explicit IP addresses.
Application security groups handle the complexity of explicit IP addresses and multiple rule sets, allowing you
to focus on your business logic.
Example:
Application security group example

| Network interface | Application security group |
| --- | --- |
| NIC1 | AsgWeb |
| NIC2 | AsgWeb |
| NIC3 | AsgLogic |
| NIC4 | AsgDb |

In our example, each network interface belongs to only one application security group, but in fact an interface
can belong to multiple groups, in accordance with Azure limits.
None of the network interfaces have an associated NSG. NSG1 is associated to both subnets and contains the
following rules.

| Rule name | Purpose | Details |
| --- | --- | --- |
| Allow-HTTP-Inbound-Internet | Allow traffic from the internet to the web servers. Inbound traffic from the internet is denied by the DenyAllInbound default security rule, so no additional rule is needed for the AsgLogic or AsgDb application security groups. | Priority: 100. Source: internet. Source port: *. Destination: AsgWeb. Destination port: 80. Protocol: TCP. Access: Allow. |
| Deny-Database-All | The AllowVNetInBound default security rule allows all communication between resources in the same VNet, so this rule is needed to deny traffic from all resources. | Priority: 120. Source: *. Source port: *. Destination: AsgDb. Destination port: 1433. Protocol: All. Access: Deny. |
| Allow-Database-BusinessLogic | Allow traffic from the AsgLogic application security group to the AsgDb application security group. The priority for this rule is higher than the Deny-Database-All rule, so it is processed before that rule. Traffic from the AsgLogic application security group is allowed, and all other traffic is blocked. | Priority: 110. Source: AsgLogic. Source port: *. Destination: AsgDb. Destination port: 1433. Protocol: TCP. Access: Allow. |

The rules that specify an application security group as the source or destination are only applied to the network
interfaces that are members of the application security group. If the network interface is not a member of an
application security group, the rule is not applied to the network interface, even though the network security
group is associated to the subnet.
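A brief sketch of how the groups and a rule from the example could be created with the Azure CLI. The NSG name NSG1 follows the example above; the resource group name is illustrative:

# Create the application security groups used in the example.
az network asg create --resource-group contoso-migration-rg --name AsgWeb
az network asg create --resource-group contoso-migration-rg --name AsgDb

# Allow HTTP from the internet to members of AsgWeb, referencing the ASG instead of IP addresses.
az network nsg rule create --resource-group contoso-migration-rg --nsg-name NSG1 \
  --name Allow-HTTP-Inbound-Internet --priority 100 --direction Inbound --access Allow \
  --protocol Tcp --source-address-prefixes Internet \
  --destination-asgs AsgWeb --destination-port-ranges 80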
Learn more:
Learn about application security groups.
Best practice: Secure access to PaaS using VNet service endpoints
VNet service endpoints extend your VNet private address space and identity to Azure services over a direct
connection.
Endpoints allow you to secure critical Azure service resources to your VNets only. Traffic from your VNet to the
Azure service always remains on the Microsoft Azure backbone network.
VNet private address space can be overlapping and thus cannot be used to uniquely identify traffic originating
from a VNet.
After service endpoints are enabled in your VNet, you can secure Azure service resources by adding a VNet
rule to the service resources. This provides improved security by fully removing public internet access to
resources, and allowing traffic only from your VNet.
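Enabling a service endpoint is a subnet-level setting. A minimal sketch, reusing the hypothetical subnet from earlier and the Microsoft.Sql endpoint:

# Enable the Azure SQL service endpoint on the subnet; add a VNet rule on the SQL server afterward.
az network vnet subnet update --resource-group contoso-migration-rg \
  --vnet-name contoso-vnet --name DEV-FE-EUS2 --service-endpoints Microsoft.Sql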

Service endpoints
Learn more:
Learn about VNet service endpoints.

Best practice: Control public IP addresses


Public IP addresses in Azure can be associated with VMs, load balancers, application gateways, and VPN
gateways.
Public IP addresses allow internet resources to communicate inbound to Azure resources, and Azure resources
to communicate outbound to the internet.
Public IP addresses are created with a basic or standard SKU, which have several differences. Standard SKUs
can be assigned to any service, but are most usually configured on VMs, load balancers, and application
gateways.
It's important to note that a basic public IP address doesn't have an NSG automatically configured. You need to
configure your own and assign rules to control access. Standard SKU IP addresses have an NSG and rules
assigned by default.
As a best practice, VMs shouldn't be configured with a public IP address.
If you need a port opened, it should only be for web services such as port 80 or 443.
Standard remote management ports such as SSH (22) and RDP (3389) should be set to deny, along
with all other ports, using NSGs.
A better practice is to put VMs behind an Azure load balancer or application gateway. Then if access to remote
management ports is needed, you can use just-in-time VM access in the Azure Security Center.
Learn more:
Public IP addresses in Azure
Manage virtual machine access using just-in-time

Take advantage of Azure security features for networking


Azure has platform security features that are easy to use, and provide rich countermeasures to common network
attacks. These include Azure Firewall, web application firewall, and Network Watcher.

Best practice: Deploy Azure Firewall


Azure Firewall is a managed cloud-based network security service that protects your VNet resources. It is a fully
stateful managed firewall with built-in high availability and unrestricted cloud scalability.

Azure Firewall
Azure Firewall can centrally create, enforce, and log application and network connectivity policies across
subscriptions and VNets.
Azure Firewall uses a static public IP address for your VNet resources, allowing outside firewalls to identify
traffic originating from your VNet.
Azure Firewall is fully integrated with Azure Monitor for logging and analytics.
As a best practice when creating Azure Firewall rules, use the FQDN tags to create rules.
An FQDN tag represents a group of FQDNs associated with well-known Microsoft services.
You can use an FQDN tag to allow the required outbound network traffic through the firewall.
For example, to manually allow Windows Update network traffic through your firewall, you would need to
create multiple application rules. Using FQDN tags, you create an application rule, and include the Windows
Updates tag. With this rule in place, network traffic to Microsoft Windows Update endpoints can flow through
your firewall.
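As a sketch of that rule, assuming an Azure Firewall named contoso-fw already exists and that the azure-firewall CLI extension is installed, the FQDN tag can be referenced like this (the collection and rule names are hypothetical):

# The Azure Firewall commands ship in a CLI extension.
az extension add --name azure-firewall

# Application rule that allows outbound Windows Update traffic by FQDN tag.
az network firewall application-rule create --resource-group contoso-migration-rg \
  --firewall-name contoso-fw --collection-name Allow-Windows-Update \
  --name WindowsUpdate --action Allow --priority 100 \
  --source-addresses 10.245.16.0/20 --protocols Http=80 Https=443 \
  --fqdn-tags WindowsUpdate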
Learn more:
Get an overview of Azure Firewall.
Learn about FQDN tags.

Best practice: Deploy a web application firewall (WAF)


Web applications are increasingly targets of malicious attacks that exploit commonly known vulnerabilities.
Exploits include SQL injection attacks and cross-site scripting attacks. Preventing such attacks in application code
can be challenging, and can require rigorous maintenance, patching and monitoring at multiple layers of the
application topology. A centralized web application firewall helps make security management much simpler and
helps app administrators guard against threats or intrusions. A web app firewall can react to security threats faster,
by patching known vulnerabilities at a central location, instead of securing individual web applications. Existing
application gateways can be converted to a web application firewall enabled application gateway easily.
The web application firewall (WAF) is a feature of Azure Application Gateway.
WAF provides centralized protection of your web applications from common exploits and vulnerabilities.
WAF protects without modification to back-end code.
It can protect multiple web apps at the same time behind an application gateway.
WAF is integrated with Azure Security Center.
You can customize WAF rules and rule groups to suit your app requirements.
As a best practice, you should use a WAF in front of any web-facing app, including apps hosted on Azure VMs or
in Azure App Service.
Learn more:
Learn about WAF.
Review WAF limitations and exclusions.

Best practice: Implement Azure Network Watcher


Azure Network Watcher provides tools to monitor resources and communications in an Azure VNet. For example,
you can monitor communications between a VM and an endpoint such as another VM or FQDN, view resources
and resource relationships in a VNet, or diagnose network traffic issues.

Network Watcher
With Network Watcher you can monitor and diagnose networking issues without logging into VMs.
You can trigger packet capture by setting alerts, and gain access to real-time performance information at the
packet level. When you see an issue, you can investigate it in detail.
As a best practice, use Network Watcher to review NSG flow logs.
NSG flow logs in Network Watcher allow you to view information about ingress and egress IP traffic
through an NSG.
Flow logs are written in JSON format.
Flow logs show outbound and inbound flows on a per-rule basis, the network interface (NIC) to which
the flow applies, 5-tuple information about the flow (source/destination IP, source/destination port, and
protocol), and whether the traffic was allowed or denied.
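Flow logs are enabled per NSG and written to a storage account. A minimal sketch, assuming the NSG from the earlier example and a pre-existing, illustratively named storage account:

# Turn on NSG flow logs; the storage account must already exist in the same region as the NSG.
az network watcher flow-log configure --resource-group contoso-migration-rg \
  --nsg web-nsg --enabled true --storage-account contosoflowlogs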
Learn more:
Get an overview of Network Watcher.
Learn more about NSG flow Logs.

Use partner tools in the Azure Marketplace


For more complex network topologies, you might use security products from Microsoft partners, in particular
network virtual appliances (NVAs).
An NVA is a VM that performs a network function, such as a firewall, WAN optimization, or other network
function.
NVAs bolster VNet security and network functions. They can be deployed for highly available firewalls,
intrusion prevention, intrusion detection, web application firewalls (WAFs), WAN optimization, routing, load
balancing, VPN, certificate management, Active Directory, and multi-factor authentication.
NVA is available from numerous vendors in the Azure Marketplace.

Best practice: Implement firewalls and NVAs in hub networks


In the hub, the perimeter network (with access to the internet) is normally managed through an Azure firewall, a
firewall farm, or a web application firewall (WAF ). Consider the following comparisons.

| Firewall type | Details |
| --- | --- |
| WAFs | Web applications are common, and tend to suffer from vulnerabilities and potential exploits. WAFs are designed to detect attacks against web applications (HTTP/HTTPS) more specifically than a generic firewall. Compared with traditional firewall technology, WAFs have a set of specific features that protect internal web servers from threats. |
| Azure Firewall | Like NVA firewall farms, Azure Firewall uses a common administration mechanism and a set of security rules to protect workloads hosted in spoke networks, and to control access to on-premises networks. Azure Firewall has built-in scalability. |
| NVA firewalls | Like Azure Firewall, NVA firewall farms have a common administration mechanism and a set of security rules to protect workloads hosted in spoke networks, and to control access to on-premises networks. NVA firewalls can be manually scaled behind a load balancer. Though an NVA firewall has less specialized software than a WAF, it has broader application scope to filter and inspect any type of traffic in egress and ingress. If you want to use NVAs, you can find them in the Azure Marketplace. |

We recommend using one set of Azure Firewalls (or NVAs) for traffic originating on the internet, and another for
traffic originating on-premises.
Using only one set of firewalls for both is a security risk, as it provides no security perimeter between the two
sets of network traffic.
Using separate firewall layers reduces the complexity of checking security rules, and it's clear which rules
correspond to which incoming network request.
Learn more:
Learn about using NVAs in an Azure VNet.

Next steps
Review other best practices:
Best practices for security and management after migration.
Best practices for cost management after migration.
Application migration patterns and examples

This section of the Cloud Adoption Framework provides examples of several common migration scenarios,
demonstrating how you can migrate on-premises infrastructure to the Microsoft Azure cloud.

Introduction
Azure provides access to a comprehensive set of cloud services. As developers and IT professionals, you can use
these services to build, deploy, and manage applications on a range of tools and frameworks, through a global
network of datacenters. As your business faces challenges associated with the digital shift, the Azure cloud helps
you to figure out how to optimize resources and operations, engage with your customers and employees, and
transform your products.
However, Azure recognizes that even with all the advantages that the cloud provides in terms of speed and
flexibility, minimized costs, performance, and reliability, many organizations are going to need to run on-premises
datacenters for some time to come. In response to cloud adoption barriers, Azure provides a hybrid cloud strategy
that builds bridges between your on-premises datacenters, and the Azure public cloud. For example, using Azure
cloud resources like Azure Backup to protect on-premises resources, or using Azure analytics to gain insights into
on-premises workloads.
As part of the hybrid cloud strategy, Azure provides growing solutions for migrating on-premises apps and
workloads to the cloud. With simple steps, you can comprehensively assess your on-premises resources to figure
out how they'll run in the Azure cloud. Then, with a deep assessment in hand, you can confidently migrate
resources to Azure. When resources are up and running in Azure, you can optimize them to retain and improve
access, flexibility, security, and reliability.

Migration patterns
Strategies for migration to the cloud fall into four broad patterns: rehost, refactor, rearchitect, or rebuild. The
strategy you adopt depends on your business drivers and migration goals. You might adopt multiple patterns. For
example, you could choose to rehost simple apps, or apps that aren't critical to your business, but rearchitect those
that are more complex and business-critical. Let's look at these patterns.

Rehost
Definition: Often referred to as a lift-and-shift migration. This option doesn't require code changes, and allows you to migrate your existing apps to Azure quickly. Each app is migrated as is, to reap the benefits of the cloud, without the risk and cost associated with code changes.
When to use: When you need to move apps quickly to the cloud. When you want to move an app without modifying it. When your apps are architected so that they can take advantage of Azure IaaS scalability after migration. When apps are important to your business, but you don't need immediate changes to app capabilities.

Refactor
Definition: Often referred to as "repackaging," refactoring requires minimal changes to apps, so that they can connect to Azure PaaS, and use cloud offerings. For example, you could migrate existing apps to Azure App Service or Azure Kubernetes Service (AKS). Or, you could refactor relational and nonrelational databases into options such as Azure SQL Database Managed Instance, Azure Database for MySQL, Azure Database for PostgreSQL, and Azure Cosmos DB.
When to use: If your app can easily be repackaged to work in Azure. If you want to apply innovative DevOps practices provided by Azure, or you're thinking about DevOps using a container strategy for workloads. For refactoring, you need to think about the portability of your existing code base, and available development skills.

Rearchitect
Definition: Rearchitecting for migration focuses on modifying and extending app functionality and the code base to optimize the app architecture for cloud scalability. For example, you could break down a monolithic application into a group of microservices that work together and scale easily. Or, you could rearchitect relational and nonrelational databases to a fully managed database solution, such as Azure SQL Database Managed Instance, Azure Database for MySQL, Azure Database for PostgreSQL, or Azure Cosmos DB.
When to use: When your apps need major revisions to incorporate new capabilities, or to work effectively on a cloud platform. When you want to use existing application investments, meet scalability requirements, apply innovative Azure DevOps practices, and minimize the use of virtual machines.

Rebuild
Definition: Rebuild takes things a step further by rebuilding an app from scratch using Azure cloud technologies. For example, you could build greenfield apps with cloud-native technologies like Azure Functions, Azure AI, Azure SQL Database Managed Instance, and Azure Cosmos DB.
When to use: When you want rapid development, and existing apps have limited functionality and lifespan. When you're ready to expedite business innovation (including DevOps practices provided by Azure), build new applications using cloud-native technologies, and take advantage of advancements in AI, blockchain, and IoT.

Migration example articles


The articles in this section provide examples of several common migration scenarios. Each of these examples
includes background information and detailed deployment scenarios that illustrate how to set up a migration
infrastructure and assess the suitability of on-premises resources for migration. More articles will be added to this
section over time.
Common migration and modernization project categories.
The articles in the series are summarized below.
Each migration scenario is driven by slightly different business goals that determine the migration strategy.
For each deployment scenario, we provide information about business drivers and goals, a proposed
architecture, steps to perform the migration, and recommendations for cleanup and next steps after migration is
complete.
Assessment
ARTICLE DETAILS

Assess on-premises resources for migration to Azure This article shows how to run an assessment of an on-
premises app running on VMware. In the example, the organization assesses the app VMs using the Azure Migrate
service, and the app SQL Server database using Data
Migration Assistant.

Infrastructure
ARTICLE DETAILS

Deploy Azure infrastructure This article shows how an organization can prepare its on-
premises infrastructure and its Azure infrastructure for
migration. The infrastructure example established in this article
is referenced in the other samples provided in this section.

Windows Server workloads


ARTICLE DETAILS

Rehost an app on Azure VMs This article provides an example of migrating on-premises app
VMs to Azure VMs using the Site Recovery service.

Rearchitect an app in Azure containers and Azure SQL Database: This article provides an example of migrating an app while rearchitecting the app web tier as a Windows container running in Azure Service Fabric, and the database with Azure SQL Database.

Linux workloads
ARTICLE DETAILS

Rehost a Linux app on Azure VMs and Azure Database for MySQL: This article provides an example of migrating a Linux-hosted app to Azure VMs by using Site Recovery. It migrates the app database to Azure Database for MySQL by using MySQL Workbench.

Rehost a Linux app on Azure VMs This example shows how to complete a lift and shift migration
of a Linux-based app to Azure VMs, using the Site Recovery
service.

SQL Server workloads


ARTICLE DETAILS

Rehost an app on an Azure VM and SQL Database Managed Instance: This article provides an example of a lift and shift migration to Azure for an on-premises app. This involves migrating the app front-end VM using Azure Site Recovery, and the app database to an Azure SQL Database Managed Instance using the Azure Database Migration Service.

Rehost an app on Azure VMs and in a SQL Server Always On availability group: This example shows how to migrate an app and data using Azure-hosted SQL Server VMs. It uses Site Recovery to migrate the app VMs, and the Azure Database Migration Service to migrate the app database to a SQL Server cluster that's protected by an Always On availability group.

ASP.NET, PHP, and Java apps


ARTICLE DETAILS

Refactor an app in an Azure web app and Azure SQL Database This example shows how to migrate an on-premises
Windows-based app to an Azure web app and migrates the
app database to an Azure SQL Server instance with the Data
Migration Assistant.

Refactor a Linux app to multiple regions using Azure App Service, Azure Traffic Manager, and Azure Database for MySQL: This example shows how to migrate an on-premises Linux-based app to an Azure web app in multiple Azure regions using Azure Traffic Manager, integrated with GitHub for continuous delivery. The app database is migrated to an Azure Database for MySQL instance.

Rebuild an app in Azure: This article provides an example of rebuilding an on-premises app using a range of Azure capabilities and managed services, including Azure App Service, Azure Kubernetes Service (AKS), Azure Functions, Azure Cognitive Services, and Azure Cosmos DB.

Refactor Team Foundation Server on Azure DevOps Services This article shows an example migration of an on-premises
Team Foundation Server deployment to Azure DevOps
Services in Azure.

Migration scaling
ARTICLE DETAILS

Scale a migration to Azure: This article shows how an example organization prepares to scale to a full migration to Azure.
Demo apps
The example articles provided in this section use two demo apps: SmartHotel360 and osTicket.
SmartHotel360: This app was developed by Microsoft as a test app that you can use when working with
Azure. It's provided as open source and you can download it from GitHub. It's an ASP.NET app connected to a
SQL Server database. In the scenarios discussed in these articles, the current version of this app is deployed to
two VMware VMs running Windows Server 2008 R2, and SQL Server 2008 R2. These app VMs are hosted on-
premises and managed by vCenter Server.
osTicket: An open-source service desk ticketing app that runs on Linux. You can download it from GitHub. In
the scenarios discussed in these articles, the current version of this app is deployed on-premises to two
VMware VMs running Ubuntu 16.04 LTS, using Apache 2, PHP 7.0, and MySQL 5.7.
Deploy a migration infrastructure

This article shows how the fictional company Contoso prepares its on-premises infrastructure for migration,
sets up an Azure infrastructure in preparation for migration, and runs the business in a hybrid environment.
When you use this example to help plan your own infrastructure migration efforts, keep the following in mind:
The provided sample architecture is specific to Contoso. Review your own organization's business needs,
structure, and technical requirements when making important infrastructure decisions about subscription
design or networking architecture.
Whether you need all the elements described in this article depends on your migration strategy. For example,
if you're building only cloud-native apps in Azure, you might need a less complex networking structure.

Overview
Before Contoso can migrate to Azure, it's critical to prepare an Azure infrastructure. Generally, there are six
broad areas Contoso needs to think about:
Step 1: Azure subscriptions. How will Contoso purchase Azure, and interact with the Azure platform and
services?
Step 2: Hybrid identity. How will it manage and control access to on-premises and Azure resources after
migration? How does Contoso extend or move identity management to the cloud?
Step 3: Disaster recovery and resilience. How will Contoso ensure that its apps and infrastructure are
resilient if outages and disasters occur?
Step 4: Networking. How should Contoso design a networking infrastructure, and establish connectivity
between its on-premises datacenter and Azure?
Step 5: Security. How will it secure the hybrid/Azure deployment?
Step 6: Governance. How will Contoso keep the deployment aligned with security and governance
requirements?

Before you start


Before we start looking at the infrastructure, you might want to read some background information about the
Azure capabilities we discuss in this article:
Several options are available for purchasing Azure access, including Pay-As-You-Go, Enterprise Agreements
(EA), Open Licensing from Microsoft resellers, or from Microsoft Partners known as Cloud Solution
Providers (CSPs). Learn about purchase options, and read about how Azure subscriptions are organized.
Get an overview of Azure identity and access management. In particular, learn about Azure AD and
extending on-premises Active Directory to the cloud. There's a useful downloadable e-book about identity
and access management (IAM) in a hybrid environment.
Azure provides a robust networking infrastructure with options for hybrid connectivity. Get an overview of
networking and network access control.
Get an introduction to Azure Security, and read about creating a plan for governance.

On-premises architecture
Here's a diagram showing the current Contoso on-premises infrastructure.
Contoso has one main datacenter located in the city of New York in the Eastern United States.
There are three additional local branches across the United States.
The main datacenter is connected to the internet with a fiber Metro Ethernet connection (500 Mbps).
Each branch is connected locally to the internet using business class connections, with IPSec VPN tunnels
back to the main datacenter. This allows the entire network to be permanently connected, and optimizes
internet connectivity.
The main datacenter is fully virtualized with VMware. Contoso has two ESXi 6.5 virtualization hosts,
managed by vCenter Server 6.5.
Contoso uses Active Directory for identity management, and DNS servers on the internal network.
The domain controllers in the datacenter run on VMware VMs. The domain controllers at local branches run
on physical servers.

Step 1: Buy and subscribe to Azure


Contoso needs to figure out how to buy Azure, how to architect subscriptions, and how to license services and
resources.
Buy Azure
Contoso is going with an Enterprise Agreement (EA). This entails an upfront monetary commitment to Azure,
which entitles Contoso to benefits such as flexible billing options and optimized pricing.
Contoso estimated what its yearly Azure spend will be. When it signed the agreement, Contoso paid for the
first year in full.
Contoso needs to use all commitments before the year is over, or lose the value for those dollars.
If for some reason Contoso exceeds its commitment and spends more, Microsoft will invoice them for the
difference.
Any cost incurred above the commitment will be at the same rates as those in the Contoso contract. There
are no penalties for going over.
Manage subscriptions
After paying for Azure, Contoso needs to figure out how to manage Azure subscriptions. Contoso has an EA,
and thus no limit on the number of Azure subscriptions it can set up.
An Azure Enterprise Enrollment defines how a company shapes and uses Azure services, and defines a
core governance structure.
As a first step, Contoso has defined a structure known as an enterprise scaffold for Enterprise Enrollment.
Contoso used this article to help understand and design a scaffold.
For now, Contoso has decided to use a functional approach to manage subscriptions.
Inside the enterprise it will use a single IT department that controls the Azure budget. This will be the
only group with subscriptions.
Contoso will extend this model in the future, so that other corporate groups can join as departments
in the Enterprise Enrollment.
Inside the IT department Contoso has structured two subscriptions, Production and Development.
If Contoso requires additional subscriptions in the future, it needs to manage access, policies and
compliance for those subscriptions. Contoso will do that by introducing Azure management groups,
as an additional layer above subscriptions.

Examine licensing
With subscriptions configured, Contoso can look at Microsoft licensing. The licensing strategy will depend on
the resources that Contoso wants to migrate into Azure and how Azure VMs and services are selected and
deployed.
Azure Hybrid Benefit
When deploying VMs in Azure, standard images include a license that will charge Contoso by the minute for the
software being used. However, Contoso has been a long-term Microsoft customer, and has maintained EAs and
open licenses with Software Assurance (SA).
Azure Hybrid Benefit provides a cost-effective method for Contoso's migration, by allowing it to save on Azure
VMs and SQL Server workloads by converting or reusing Windows Server Datacenter and Standard edition
licenses covered with Software Assurance. This will enable Contoso to pay a lower base compute rate for VMs
and SQL Server. Learn more.
License Mobility
License Mobility through SA gives Microsoft Volume Licensing customers like Contoso the flexibility to deploy
eligible server apps with active SA on Azure. This eliminates the need to purchase new licenses. With no
associated mobility fees, existing licenses can easily be deployed in Azure. Learn more.
Reserve instances for predictable workloads
Predictable workloads are those that always need to be available with VMs running. For example, line-of-
business apps such as an SAP ERP system. On the other hand, unpredictable workloads are those that are
variable, such as VMs that are on during high demand and off when demand is low.
By committing to reserved instances, in which specific VM instances are maintained for extended durations,
Contoso can get both a discount and prioritized capacity. Using Azure Reserved Instances together with
Azure Hybrid Benefit, Contoso can save up to 82% off regular pay-as-you-go pricing (as of April 2018).
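The headline figure combines two separate discounts. A small illustrative calculation, using invented placeholder rates rather than Azure list prices, shows how the two programs compound:

```python
# Hypothetical illustration of how Reserved Instance and Azure Hybrid Benefit
# savings combine. The rates below are made-up placeholders, not Azure prices.

HOURS_PER_MONTH = 730

payg_rate = 0.40             # hypothetical pay-as-you-go rate for a Windows VM ($/hour)
windows_license_part = 0.10  # hypothetical portion of that rate covering the Windows license
ri_discount = 0.60           # hypothetical 3-year reserved instance discount on compute

# Azure Hybrid Benefit removes the Windows license component (covered by existing
# licenses with Software Assurance); the RI discount then applies to the compute part.
compute_rate = payg_rate - windows_license_part
effective_rate = compute_rate * (1 - ri_discount)

savings = 1 - effective_rate / payg_rate
print(f"Pay-as-you-go:       ${payg_rate * HOURS_PER_MONTH:,.2f}/month")
print(f"RI + Hybrid Benefit: ${effective_rate * HOURS_PER_MONTH:,.2f}/month")
print(f"Combined savings:    {savings:.0%}")  # ~70% with these placeholder numbers
```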

Step 2: Manage hybrid identity


Giving and controlling user access to Azure resources with identity and access management (IAM) is an
important step in pulling together an Azure infrastructure.
Contoso decides to extend its on-premises Active Directory into the cloud, rather than build a new separate
system in Azure.
It creates an Azure-based Active Directory to do this.
Contoso doesn't have Office 365 in place, so it needs to provision a new Azure AD.
Office 365 uses Azure AD for user management. If Contoso was using Office 365, it would already have an
Azure AD tenant, and can use that as the primary directory.
Learn more about Azure AD for Office 365, and learn how to add a subscription to an existing Azure AD
tenant.
Create an Azure AD
Contoso is using the Azure AD Free edition that's included with an Azure subscription. Contoso admins set up a
directory as follows:
1. In the Azure portal, they navigate to Create a resource > Identity > Azure Active Directory.
2. In Create directory, they specify a name for the directory, an initial domain name, and region in which
the Azure AD directory should be created.
NOTE
The directory that's created has an initial domain name in the form domainname.onmicrosoft.com. The name
can't be changed or deleted. Instead, Contoso needs to add its registered domain name to Azure AD.

Add the domain name


To use its standard domain name, Contoso admins need to add it as a custom domain name to Azure AD. This
option allows them to assign familiar user names. For example, a user can log in with the email address
billg@contoso.com, rather than billg@contosomigration.onmicrosoft.com.
To set up a custom domain name they add it to the directory, add a DNS entry, and then verify the name in
Azure AD.
1. In Custom domain names > Add custom domain, they add the domain.
2. To use a DNS entry in Azure they need to register it with their domain registrar.
In the Custom domain names list, they note the DNS information for the name. It's using an MX
entry.
They need access to the name server to do this. They log into the Contoso.com domain, and create a
new MX record for the DNS entry provided by Azure AD, using the details noted.
3. After the DNS records propagate, in the domain details, they select Verify to check the custom domain name.
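Before selecting Verify, admins can check that the MX record has propagated. A rough sketch using the third-party dnspython package; the expected exchange value below is a placeholder for whatever record Azure AD actually displays:

```python
# Quick propagation check for the Azure AD verification record before clicking Verify.
# Requires the third-party dnspython package (pip install dnspython).
# EXPECTED_EXCHANGE is a placeholder; use the exact record value Azure AD displays.
import dns.resolver

DOMAIN = "contoso.com"
EXPECTED_EXCHANGE = "contoso-com.mail.protection.outlook.com."  # placeholder value

answers = dns.resolver.resolve(DOMAIN, "MX")
for record in answers:
    print(f"priority={record.preference} exchange={record.exchange}")

found = any(str(r.exchange).lower() == EXPECTED_EXCHANGE for r in answers)
print("Verification record found" if found
      else "Record not visible yet; wait for DNS propagation")
```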
Set up on-premises and Azure groups and users
Now that the Azure AD is up and running, Contoso admins need to add employees to on-premises Active
Directory groups that will synchronize to Azure Active Directory. They should use on-premises group names
that match the names of resource groups in Azure. This makes it easier to identify matches for synchronization
purposes.
Create resource groups in Azure
Azure resource groups gather Azure resources together. Using a resource group ID allows Azure to perform
operations on the resources within the group.
An Azure subscription can have multiple resource groups, but a resource group can only exist within a single
subscription.
In addition, a single resource group can have multiple resources, but a resource can only belong to a single
resource group.
Contoso admins set up Azure resource groups as summarized in the following table.

RESOURCE GROUP DETAILS

ContosoCobRG: This group contains all resources related to continuity of business (COB). It includes vaults that Contoso will use for the Azure Site Recovery and Azure Backup services. It will also include resources used for migration, including Azure Migrate and the Azure Database Migration Service.

ContosoDevRG: This group contains development and test resources.

ContosoFailoverRG: This group serves as a landing zone for failed-over resources.

ContosoNetworkingRG: This group contains all networking resources.

ContosoRG: This group contains resources related to production apps and databases.

They create resource groups as follows:


1. In the Azure portal > Resource groups, they add a group.
2. For each group they specify a name, the subscription to which the group belongs, and the region.
3. Resource groups appear in the Resource groups list.
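The same resource groups can also be created programmatically. A minimal sketch with the Azure SDK for Python (azure-identity and azure-mgmt-resource), using the names from the table above; the subscription ID is a placeholder:

```python
# Create the Contoso resource groups with the Azure SDK for Python.
# Requires: pip install azure-identity azure-mgmt-resource
from azure.identity import DefaultAzureCredential
from azure.mgmt.resource import ResourceManagementClient

subscription_id = "<subscription-id>"  # placeholder
client = ResourceManagementClient(DefaultAzureCredential(), subscription_id)

resource_groups = [
    "ContosoCobRG",         # continuity of business (Site Recovery, Backup, migration tools)
    "ContosoDevRG",         # development and test resources
    "ContosoFailoverRG",    # landing zone for failed-over resources
    "ContosoNetworkingRG",  # networking resources
    "ContosoRG",            # production apps and databases
]

for name in resource_groups:
    rg = client.resource_groups.create_or_update(name, {"location": "eastus2"})
    print(f"Created {rg.name} in {rg.location}")
```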

Scale resource groups

In future, Contoso will add other resource groups based on needs. For example, they could define a resource
group for each app or service, so that they can be managed and secured independently.
Create matching security groups on-premises
1. In the on-premises Active Directory, Contoso admins set up security groups with names that match the
names of the Azure resource groups.

2. For management purposes, they create an additional group that will be added to all of the other groups.
This group will have rights to all resource groups in Azure. A limited number of Global Admins will be
added to this group.
Synchronize Active Directory
Contoso wants to provide a common identity for accessing resources on-premises and in the cloud. To do this, it
will integrate the on-premises Active Directory with Azure AD. With this model:
Users and organizations can take advantage of a single identity to access on-premises applications and cloud
services such as Office 365, or thousands of other sites on the internet.
Admins can use the groups in Active Directory to implement Role Based Access Control (RBAC ) in Azure.
To facilitate integration, Contoso uses the Azure AD Connect tool. When you install and configure the tool on a
domain controller, it synchronizes the local on-premises Active Directory identities to Azure AD.
Download the tool
1. In the Azure portal, Contoso admins go to Azure Active Directory > Azure AD Connect, and
download the latest version of the tool to the server they're using for synchronization.

2. They start the AzureADConnect.msi installation, with Use express settings. This is the most common
installation, and can be used for a single-forest topology, with password hash synchronization for
authentication.
3. In Connect to Azure AD, they specify the credentials for connecting to the Azure AD (in the form
admin@contoso.com or admin@contoso.onmicrosoft.com).

4. In Connect to AD DS, they specify credentials for the on-premises Active Directory (in the form
CONTOSO\admin or contoso.com\admin).

5. In Ready to configure, they select Start the synchronization process when configuration
completes to start the sync immediately. Then they install.
Note that:
Contoso has a direct connection to Azure. If your on-premises Active Directory is behind a proxy, read
this article.
After the first synchronization, on-premises Active Directory objects are visible in the Azure AD directory.
The Contoso IT team is represented in each group, based on its role.

Set up RBAC
Azure role-based access control (RBAC ) enables fine-grained access management for Azure. Using RBAC, you
can grant only the amount of access that users need to perform tasks. You assign the appropriate RBAC role to
users, groups, and applications at a scope level. The scope of a role assignment can be a subscription, a resource
group, or a single resource.
Contoso admins now assign roles to the Active Directory groups that they synchronized from on-premises.
1. In the ContosoCobRG resource group, they select Access control (IAM) > Add role assignment.
2. In Add role assignment > Role > Contributor, they select the ContosoCobRG group from the list.
The group then appears in the Selected members list.
3. They repeat this with the same permissions for the other resource groups (except for
ContosoAzureAdmins), by adding the Contributor permission to the group that matches each
resource group.
4. For the ContosoAzureAdmins group, they assign the Owner role.
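These assignments can also be scripted. A rough sketch with azure-mgmt-authorization; the group object ID and subscription ID are placeholders, and the role definition GUID shown is the commonly documented built-in Contributor role:

```python
# Assign the Contributor role to a synchronized Azure AD group at resource group scope.
# Requires: pip install azure-identity azure-mgmt-authorization
# The IDs below are placeholders; look up the group's object ID in Azure AD.
import uuid
from azure.identity import DefaultAzureCredential
from azure.mgmt.authorization import AuthorizationManagementClient

subscription_id = "<subscription-id>"                 # placeholder
group_object_id = "<contosocobrg-group-object-id>"    # placeholder Azure AD group object ID

# Well-known ID of the built-in Contributor role.
contributor_role = (
    f"/subscriptions/{subscription_id}/providers/Microsoft.Authorization/"
    "roleDefinitions/b24988ac-6180-42a0-ab88-20f7382dd24c"
)
scope = f"/subscriptions/{subscription_id}/resourceGroups/ContosoCobRG"

auth_client = AuthorizationManagementClient(DefaultAzureCredential(), subscription_id)
assignment = auth_client.role_assignments.create(
    scope,
    str(uuid.uuid4()),  # role assignment name must be a GUID
    {"role_definition_id": contributor_role, "principal_id": group_object_id},
)
print(f"Assigned Contributor at {assignment.scope}")
```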

Step 3: Design for resiliency


Set up regions
Azure resources are deployed within regions.
Regions are organized into geographies, and data residency, sovereignty, compliance and resiliency
requirements are honored within geographical boundaries.
A region is composed of a set of datacenters. These datacenters are deployed within a latency-defined
perimeter, and connected through a dedicated regional low-latency network.
Each Azure region is paired with a different region for resiliency.
Read about Azure regions, and understand how regions are paired.
Contoso has decided to go with the East US 2 (located in Virginia) as the primary region, and Central US
(located in Iowa) as the secondary region. There are a couple of reasons for this:
The Contoso datacenter is located in New York, and Contoso considered latency to the closest datacenter.
The East US 2 region has all the services and products that Contoso needs to use. Not all Azure regions are
the same in terms of the products and services available. You can review Azure products by region.
Central US is the Azure paired region for East US 2.
As it thinks about the hybrid environment, Contoso needs to consider how to build resilience and a disaster
recovery strategy into the region design. Broadly, strategies range from a single-region deployment, which
relies on Azure platform features such as fault domains and regional pairing for resilience, through to a full
Active-Active model in which cloud services and database are deployed and servicing users from two regions.
Contoso has decided to take a middle road. It will deploy apps and resources in a primary region, and keep a full
copy of the infrastructure in the secondary region, so that it's ready to act as a full backup in case of complete
app disaster, or regional failure.
Set up availability
Availability sets:
Availability sets help protect apps and data from a local hardware and networking outage within a datacenter.
Availability sets distribute Azure VMs across different physical hardware within a datacenter.
Fault domains represent underlying hardware with a common power source and network switch within the
datacenter. VMs in an availability set are distributed across different fault domains to minimize outages
caused by a single hardware or networking failure.
Update domains represent underlying hardware that can undergo maintenance or be rebooted at the same
time. Availability sets also distribute VMs across multiple update domains to ensure at least one instance will
be running at all times.
Contoso will implement availability sets whenever VM workloads require high availability. Learn more.
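As a sketch, an availability set for a highly available web tier could be created with azure-mgmt-compute; the resource group and set name below are illustrative, and the subscription ID is a placeholder:

```python
# Create an availability set for a VM workload that requires high availability.
# Requires: pip install azure-identity azure-mgmt-compute
from azure.identity import DefaultAzureCredential
from azure.mgmt.compute import ComputeManagementClient

subscription_id = "<subscription-id>"  # placeholder
compute_client = ComputeManagementClient(DefaultAzureCredential(), subscription_id)

avset = compute_client.availability_sets.create_or_update(
    "ContosoRG",                # production resource group
    "SMARTHOTEL-FE-AVSET",      # illustrative name for a web-tier availability set
    {
        "location": "eastus2",
        "platform_fault_domain_count": 2,   # separate power/network hardware
        "platform_update_domain_count": 5,  # hosts that can be rebooted together
        "sku": {"name": "Aligned"},         # required for VMs with managed disks
    },
)
print(f"Availability set {avset.name}: {avset.platform_fault_domain_count} fault domains")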
Availability zones:
Availability zones help protect apps and data from failures affecting an entire datacenter within a region.
Each availability zone represents a unique physical location within an Azure region.
Each zone is made up of one or more datacenters equipped with independent power, cooling, and
networking.
There's a minimum of three separate zones in all enabled regions.
The physical separation of zones within a region protects applications and data from datacenter failures.
Contoso will deploy availability zones as apps call for scalability, high-availability, and resiliency. Learn more.
Set up backup
Azure Backup:
Azure Backup allows you to back up and restore Azure VM disks.
Azure Backup allows automated backups of VM disk images, stored in Azure Storage.
Backups are application-consistent, ensuring backed-up data is transactionally consistent and that
applications will boot up post-restore.
Azure Backup supports locally redundant storage (LRS) to replicate multiple copies of your backup data
within a datacenter, in case of a local hardware failure.
In the event of a regional outage, Azure Backup also supports geo-redundant storage (GRS), replicating your
backup data to a secondary paired region.
Azure Backup encrypts data in transit using AES-256. Backed-up data at rest is encrypted using Storage
Service Encryption (SSE).
Contoso will use Azure Backup with GRS on all production VMs to ensure workload data is backed up and can
be quickly restored in case of outage or other disruption. Learn more.
Set up disaster recovery
Azure Site Recovery:
Azure Site Recovery helps ensure business continuity by keeping business apps and workloads running during
regional outages.
Azure Site Recovery continually replicates Azure VMs from a primary to a secondary region, ensuring
functional copies in both locations.
In the event of an outage in the primary region, your application or service fails over to VM instances
replicated in the secondary region, minimizing potential disruption.
When operations return to normal, your applications or services can fail back to VMs in the primary region.
Contoso will implement Azure Site Recovery for all production VMs used in mission-critical workloads,
ensuring minimal disruption during an outage in the primary region. Learn more.

Step 4: Design a network infrastructure


With the regional design in place, Contoso is ready to consider a networking strategy. It needs to think about
how the on-premises datacenter and Azure connect and communicate with each other, and how to design the
network infrastructure in Azure. Specifically Contoso needs to:
Plan hybrid network connectivity. Figure out how it's going to connect networks across on-premises and
Azure.
Design an Azure network infrastructure. Decide how it will deploy networks over regions. How will
networks communicate within the same region, and across regions?
Design and set up Azure networks. Set up Azure networks and subnets, and decide what will reside in
them.
Plan hybrid network connectivity
Contoso considered a number of architectures for hybrid networking between Azure and the on-premises
datacenter. For more information, see Choose a solution for connecting an on-premises network to Azure.
As a reminder, the Contoso on-premises network infrastructure currently consists of the datacenter in New York,
and local branches in the eastern portion of the US. All locations have a business class connection to the
internet. Each of the branches is then connected to the datacenter via an IPSec VPN tunnel over the internet.

Here's how Contoso decided to implement hybrid connectivity:


1. Set up a new site-to-site VPN connection between the Contoso datacenter in New York and the two Azure
regions in East US 2 and Central US.
2. Branch office traffic bound for Azure virtual networks will route through the main Contoso datacenter.
3. As Contoso scales up Azure deployment, it will establish an ExpressRoute connection between the datacenter
and the Azure regions. When this happens, Contoso will retain the VPN site-to-site connection for failover
purposes only.
Learn more about choosing between a VPN and ExpressRoute hybrid solution.
Verify ExpressRoute locations and support.
VPN only:

VPN and ExpressRoute:

Design the Azure network infrastructure


It's critical that Contoso puts networks in place in a way that makes the hybrid deployment secure and scalable.
To do this, Contoso is taking a long-term approach, and is designing virtual networks (VNets) to be resilient
and enterprise-ready. Learn more about planning VNets.
To connect the two regions, Contoso has decided to implement a hub-to-hub network model:
Within each region, Contoso will use a hub and spoke model.
To connect networks and hubs, Contoso will use Azure network peering.
Network peering
Azure provides network peering to connect VNets and hubs. Global peering allows connections between
VNets/hubs in different regions. Local peering connects VNets in the same region. VNet peering provides
several advantages:
Network traffic between peered VNets is private.
Traffic between the VNets is kept on the Microsoft backbone network. No public internet, gateways, or
encryption is required in the communication between the VNets.
Peering provides a default, low-latency, high-bandwidth connection between resources in different VNets.
Learn more about network peering.
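As a sketch, a global peering between the two regional hubs could be created with azure-mgmt-network. The hub names follow the naming convention defined later in this article, and the subscription ID and resource group are placeholders; a mirror-image peering is also needed from the other hub back:

```python
# Peer the East US 2 hub to the Central US hub (global VNet peering, one direction shown).
# Requires: pip install azure-identity azure-mgmt-network
from azure.identity import DefaultAzureCredential
from azure.mgmt.network import NetworkManagementClient

subscription_id = "<subscription-id>"  # placeholder
network_client = NetworkManagementClient(DefaultAzureCredential(), subscription_id)

remote_hub_id = (
    f"/subscriptions/{subscription_id}/resourceGroups/ContosoNetworkingRG/"
    "providers/Microsoft.Network/virtualNetworks/VNET-HUB-CUS"
)

peering = network_client.virtual_network_peerings.begin_create_or_update(
    "ContosoNetworkingRG",        # resource group holding the networks
    "VNET-HUB-EUS2",              # local hub VNet
    "peer-hub-eus2-to-hub-cus",   # illustrative peering name
    {
        "remote_virtual_network": {"id": remote_hub_id},
        "allow_virtual_network_access": True,
        "allow_forwarded_traffic": True,
    },
).result()
print(f"Peering state: {peering.peering_state}")
```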
Hub-to-hub across regions
Contoso will deploy a hub in each region. A hub is a virtual network (VNet) in Azure that acts as a central point
of connectivity to your on-premises network. The hub VNets will connect to each other using global VNet
peering. Global VNet peering connects VNets across Azure regions.
The hub in each region is peered to its partner hub in the other region.
The hub is peered to every network in its region, and can connect to all network resources.

Hub and spoke model within a region


Within each region, Contoso will deploy VNets for different purposes, as spoke networks from the region hub.
VNets within a region use peering to connect to their hub, and to each other.
Design the hub network
Within the hub and spoke model that Contoso has chosen, it needs to think about how traffic from the on-
premises datacenter, and from the internet, will be routed. Here's how Contoso has decided to handle routing
for both the East US 2 and Central US hubs:
Contoso is designing a network known as "reverse c", as this is the path that the packets follow from the
inbound to outbound network.
The network architecture has two boundaries, an untrusted front-end perimeter zone and a back-end trusted
zone.
A firewall will have a network adapter in each zone, controlling access to trusted zones.
From the internet:
Internet traffic will hit a load-balanced public IP address on the perimeter network.
This traffic is routed through the firewall, and subject to firewall rules.
After network access controls are implemented, traffic will be forwarded to the appropriate location in
the trusted zone.
Outbound traffic from the VNet will be routed to the internet using user-defined routes. The traffic is
forced through the firewall, and inspected in line with Contoso policies.
From the Contoso datacenter:
Incoming traffic over VPN site-to-site (or ExpressRoute) hits the public IP address of the Azure VPN
gateway.
Traffic is routed through the firewall and subject to firewall rules.
After applying firewall rules, traffic is forwarded to an internal load balancer (Standard SKU ) on the
trusted internal zone subnet.
Outbound traffic from the trusted subnet to the on-premises datacenter over VPN is routed through
the firewall, and rules applied, before going over the VPN site-to-site connection.
Design and set up Azure networks
With a network and routing topology in place, Contoso is ready to set up Azure networks and subnets.
Contoso will implement a Class A private network in Azure (10.0.0.0 to 10.255.255.255). This works, since
on-premises it currently uses the Class B private address space 172.16.0.0/16, so Contoso can be sure there
won't be any overlap between address ranges.
It's going to deploy VNets in the primary and secondary regions.
Contoso will use a naming convention that includes the prefix VNET and the region abbreviation EUS2 or
CUS. Using this standard, the hub networks will be named VNET-HUB-EUS2 (East US 2) and VNET-HUB-CUS (Central US).
Contoso doesn't have an IPAM solution, so it needs to plan for network routing without NAT.
Virtual networks in East US 2
East US 2 is the primary region that Contoso will use to deploy resources and services. Here's how Contoso will
architect networks within it:
Hub: The hub VNet in East US 2 is the central point of primary connectivity to the on-premises datacenter.
VNets: Spoke VNets in East US 2 can be used to isolate workloads if required. In addition to the Hub VNet,
Contoso will have two spoke VNets in East US 2:
VNET-DEV-EUS2. This VNet will provide the development and test team with a fully functional
network for dev projects. It will act as a production pilot area, and will rely on the production
infrastructure to function.
VNET-PROD-EUS2. Azure IaaS production components will be located in this network.
Each VNet will have its own unique address space, with no overlap. Contoso intends to configure
routing without requiring NAT.
Subnets:
There will be a subnet in each network for each app tier.
Each subnet in the Production network will have a matching subnet in the Development VNet.
In addition, the Production network has a subnet for domain controllers.
VNets in East US 2 are summarized in the following table.

VNET RANGE PEER

VNET-HUB-EUS2 10.240.0.0/20 VNET-HUB-CUS, VNET-DEV-EUS2, VNET-PROD-EUS2

VNET-DEV-EUS2 10.245.16.0/20 VNET-HUB-EUS2

VNET-PROD-EUS2 10.245.32.0/20 VNET-HUB-EUS2, VNET-PROD-CUS

Subnets in the East US 2 Hub network (VNET-HUB-EUS2)


SUBNET/ZONE CIDR USABLE IP ADDRESSES

IB-UntrustZone 10.240.0.0/24 251

IB-TrustZone 10.240.1.0/24 251

OB-UntrustZone 10.240.2.0/24 251

OB-TrustZone 10.240.3.0/24 251

GatewaySubnet 10.240.10.0/24 251
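The usable-address counts above reflect the fact that Azure reserves five IP addresses in every subnet (the network address, the default gateway, two addresses for Azure DNS, and the broadcast address). A quick check with Python's standard ipaddress module, using subnets from these tables:

```python
# Azure reserves 5 IP addresses per subnet, so a /24 yields 251 usable IPs and a /22 yields 1019.
import ipaddress

AZURE_RESERVED = 5

for name, cidr in [
    ("IB-UntrustZone", "10.240.0.0/24"),
    ("IB-TrustZone", "10.240.1.0/24"),
    ("GatewaySubnet", "10.240.10.0/24"),
    ("PROD-FE-EUS2", "10.245.32.0/22"),
]:
    net = ipaddress.ip_network(cidr)
    print(f"{name:15} {cidr:17} usable: {net.num_addresses - AZURE_RESERVED}")
```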

Subnets in the East US 2 Dev network (VNET-DEV-EUS2)


The Development VNet is used by the development team as a production pilot area. It has three subnets.

SUBNET CIDR ADDRESSES IN SUBNET

DEV-FE-EUS2 10.245.16.0/22 1019 Front-ends/web tier VMs

DEV-APP-EUS2 10.245.20.0/22 1019 App-tier VMs

DEV-DB-EUS2 10.245.24.0/23 507 Database VMs

Subnets in the East US 2 Production network (VNET-PROD-EUS2)


Azure IaaS components are located in the Production network. Each app tier has its own subnet. Subnets match
those in the Development network, with the addition of a subnet for domain controllers.

SUBNET CIDR ADDRESSES IN SUBNET

PROD-FE-EUS2 10.245.32.0/22 1019 Front-ends/web tier VMs

PROD-APP-EUS2 10.245.36.0/22 1019 App-tier VMs

PROD-DB-EUS2 10.245.40.0/23 507 Database VMs

PROD-DC-EUS2 10.245.42.0/24 251 Domain controller VMs
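As a sketch, the production VNet and the subnets listed above could be deployed with azure-mgmt-network; the resource group is assumed to be ContosoNetworkingRG from earlier in this article, and the subscription ID is a placeholder:

```python
# Create the East US 2 production VNet with its app-tier and domain controller subnets.
# Requires: pip install azure-identity azure-mgmt-network
from azure.identity import DefaultAzureCredential
from azure.mgmt.network import NetworkManagementClient

subscription_id = "<subscription-id>"  # placeholder
network_client = NetworkManagementClient(DefaultAzureCredential(), subscription_id)

vnet = network_client.virtual_networks.begin_create_or_update(
    "ContosoNetworkingRG",
    "VNET-PROD-EUS2",
    {
        "location": "eastus2",
        "address_space": {"address_prefixes": ["10.245.32.0/20"]},
        "subnets": [
            {"name": "PROD-FE-EUS2", "address_prefix": "10.245.32.0/22"},
            {"name": "PROD-APP-EUS2", "address_prefix": "10.245.36.0/22"},
            {"name": "PROD-DB-EUS2", "address_prefix": "10.245.40.0/23"},
            {"name": "PROD-DC-EUS2", "address_prefix": "10.245.42.0/24"},
        ],
    },
).result()
print(f"Created {vnet.name} with {len(vnet.subnets)} subnets")
```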


Virtual networks in Central US (secondary region)
Central US is Contoso's secondary region. Here's how Contoso will architect networks within it:
Hub: The hub VNet in Central US is the central point of connectivity to the on-premises datacenter, and the
spoke VNets in Central US can be used to isolate workloads if required, managed separately from other
spokes.
VNets: Contoso will have two VNets in Central US:
VNET-PROD-CUS. This VNet is a production network, similar to VNET-PROD-EUS2.
VNET-ASR-CUS. This VNet will act as a location in which VMs are created after failover from on-
premises, or as a location for Azure VMs that are failed over from the primary to the secondary
region. This network is similar to the production networks, but without any domain controllers on it.
Each VNet in the region will have its own address space, with no overlap. Contoso will configure
routing without NAT.
Subnets: The subnets will be architected in a similar way to those in East US 2. The exception is that
Contoso doesn't need a subnet for domain controllers.
The VNets in Central US are summarized in the following table.

VNET RANGE PEER

VNET-HUB-CUS 10.250.0.0/20 VNET-HUB-EUS2, VNET-ASR-CUS, VNET-PROD-CUS

VNET-ASR-CUS 10.255.16.0/20 VNET-HUB-CUS, VNET-PROD-CUS

VNET-PROD-CUS 10.255.32.0/20 VNET-HUB-CUS, VNET-ASR-CUS, VNET-PROD-EUS2
Subnets in the Central US Hub network (VNET-HUB-CUS)

SUBNET CIDR USABLE IP ADDRESSES

IB-UntrustZone 10.250.0.0/24 251

IB-TrustZone 10.250.1.0/24 251

OB-UntrustZone 10.250.2.0/24 251

OB-TrustZone 10.250.3.0/24 251

GatewaySubnet 10.250.10.0/24 251

Subnets in the Central US Production network (VNET-PROD-CUS)


In parallel with the production network in the primary East US 2 region, there's a production network in the
secondary Central US region.

SUBNET CIDR ADDRESSES IN SUBNET

PROD-FE-CUS 10.255.32.0/22 1019 Front-ends/web-tier VMs

PROD-APP-CUS 10.255.36.0/22 1019 App-tier VMs

PROD-DB-CUS 10.255.40.0/23 507 Database VMs

PROD-DC-CUS 10.255.42.0/24 251 Domain controller VMs

Subnets in the Central US failover/recovery network (VNET-ASR-CUS)


The VNET-ASR-CUS network is used for failover between regions. Site Recovery will be used to
replicate and fail over Azure VMs between the regions. It also functions as a Contoso datacenter-to-Azure
network for protected workloads that remain on-premises but fail over to Azure for disaster recovery.
VNET-ASR-CUS has the same basic subnet layout as the production VNet in East US 2, but doesn't need a
domain controller subnet.

SUBNET CIDR ADDRESSES IN SUBNET

ASR-FE-CUS 10.255.16.0/22 1019 Front-ends/web-tier VMs

ASR-APP-CUS 10.255.20.0/22 1019 App-tier VMs

ASR-DB-CUS 10.255.24.0/23 507 Database VMs

Configure peered connections


The hub in each region will be peered to the hub in the other region, and to all VNets within the hub region. This
allows for hubs to communicate, and to view all VNets within a region. Note that:
Peering creates a two-sided connection. One from the initiating peer on the first VNet, and another one on
the second VNet.
In a hybrid deployment, traffic that passes between peers needs to be visible from the VPN connection
between the on-premises datacenter and Azure. To enable this, there are some specific settings that must be
set on peered connections.
For any connections from spoke VNets through the hub to the on-premises datacenter, Contoso needs to allow
traffic to be forwarded, and to traverse the VPN gateways.
Domain controller

For the domain controllers in the VNET-PROD-EUS2 network, Contoso wants traffic to flow both between the
EUS2 hub/production network, and over the VPN connection to on-premises. To do this, Contoso admins
must allow the following (see the sketch after this list):
1. Allow forwarded traffic and Allow gateway transit configurations on the peered connection. In our
example this would be the VNET-HUB-EUS2 to VNET-PROD-EUS2 connection.
2. Allow forwarded traffic and Use remote gateways on the other side of the peering, on the VNET-PROD-EUS2
to VNET-HUB-EUS2 connection.

3. On-premises they'll set up a static route that directs the local traffic to route across the VPN tunnel to the
VNet. The configuration would be completed on the gateway that provides the VPN tunnel from Contoso
to Azure. They use RRAS for this.
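The two peering settings in steps 1 and 2 map to flags on each side of the hub-to-spoke peering. A rough sketch with azure-mgmt-network, assuming the networks live in ContosoNetworkingRG; the subscription ID is a placeholder:

```python
# Hub-to-spoke peering settings that let spoke traffic use the hub's VPN gateway.
# Requires: pip install azure-identity azure-mgmt-network
from azure.identity import DefaultAzureCredential
from azure.mgmt.network import NetworkManagementClient

subscription_id = "<subscription-id>"  # placeholder
rg = "ContosoNetworkingRG"
client = NetworkManagementClient(DefaultAzureCredential(), subscription_id)

def vnet_id(name):
    return (f"/subscriptions/{subscription_id}/resourceGroups/{rg}/"
            f"providers/Microsoft.Network/virtualNetworks/{name}")

# Hub side: allow forwarded traffic and offer gateway transit to the spoke.
client.virtual_network_peerings.begin_create_or_update(
    rg, "VNET-HUB-EUS2", "peer-hub-to-prod",
    {
        "remote_virtual_network": {"id": vnet_id("VNET-PROD-EUS2")},
        "allow_forwarded_traffic": True,
        "allow_gateway_transit": True,
    },
).result()

# Spoke side: allow forwarded traffic and use the hub's remote gateway.
client.virtual_network_peerings.begin_create_or_update(
    rg, "VNET-PROD-EUS2", "peer-prod-to-hub",
    {
        "remote_virtual_network": {"id": vnet_id("VNET-HUB-EUS2")},
        "allow_forwarded_traffic": True,
        "use_remote_gateways": True,
    },
).result()
print("Hub and spoke peerings configured")
```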

Production networks

A spoke network in one region can't see a spoke network in another region through the hubs.
For Contoso's production networks in both regions to see each other, Contoso admins need to create a direct
peered connection between VNET-PROD-EUS2 and VNET-PROD-CUS.
Set up DNS
When you deploy resources in virtual networks, you have a couple of choices for domain name resolution. You
can use name resolution provided by Azure, or provide DNS servers for resolution. The type of name resolution
you use depends on how your resources need to communicate with each other. Get more information about the
Azure DNS service.
Contoso admins have decided that the Azure DNS service isn't a good choice in the hybrid environment.
Instead, they will use the on-premises DNS servers.
Since this is a hybrid network all the VMs on-premises and in Azure need to be able to resolve names to
function properly. This means that custom DNS settings must be applied to all the VNets.
Contoso currently has DCs deployed in the Contoso datacenter and at the branch offices. The primary
DNS servers are CONTOSODC1 (172.16.0.10) and CONTOSODC2 (172.16.0.11).
When the VNets are deployed, the on-premises domain controllers will be set to be used as DNS servers
in the networks.
To configure this, when using custom DNS on the VNet, Azure's recursive resolver IP address (such as
168.63.129.16) must be added to the DNS list. To do this, Contoso configures DNS server settings on
each VNet. For example, the custom DNS settings for the VNET-HUB-EUS2 network would be as
follows:
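The screenshot referenced above showed the VNet's custom DNS server list. A rough equivalent with azure-mgmt-network, updating the hub VNet's DHCP options; the subscription ID is a placeholder, and the DNS entries are the on-premises domain controllers plus the Azure recursive resolver mentioned above:

```python
# Point the hub VNet at the on-premises domain controllers (and the Azure recursive
# resolver noted above) by updating the VNet's DNS server list.
# Requires: pip install azure-identity azure-mgmt-network
from azure.identity import DefaultAzureCredential
from azure.mgmt.network import NetworkManagementClient
from azure.mgmt.network.models import DhcpOptions

subscription_id = "<subscription-id>"  # placeholder
rg = "ContosoNetworkingRG"
client = NetworkManagementClient(DefaultAzureCredential(), subscription_id)

vnet = client.virtual_networks.get(rg, "VNET-HUB-EUS2")
vnet.dhcp_options = DhcpOptions(dns_servers=[
    "172.16.0.10",    # CONTOSODC1 (on-premises)
    "172.16.0.11",    # CONTOSODC2 (on-premises)
    "168.63.129.16",  # Azure recursive resolver
])
client.virtual_networks.begin_create_or_update(rg, "VNET-HUB-EUS2", vnet).result()
print("Updated DNS servers on VNET-HUB-EUS2")
```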

In addition to the on-premises domain controllers, Contoso is going to deploy four more domain controllers to
support the Azure networks, two in each region. Here's what Contoso will deploy in Azure.

REGION DC VNET SUBNET IP ADDRESS

EUS2 CONTOSODC3 VNET-PROD-EUS2 PROD-DC-EUS2 10.245.42.4

EUS2 CONTOSODC4 VNET-PROD-EUS2 PROD-DC-EUS2 10.245.42.5

CUS CONTOSODC5 VNET-PROD-CUS PROD-DC-CUS 10.255.42.4

CUS CONTOSODC6 VNET-PROD-CUS PROD-DC-CUS 10.255.42.5

After deploying the new domain controllers, Contoso needs to update the DNS settings on networks in
both regions to include the new domain controllers in the DNS server list.
Set up domain controllers in Azure
After updating network settings, Contoso admins are ready to build out the domain controllers in Azure.
1. In the Azure portal, they deploy a new Windows Server VM to the appropriate VNet.
2. They create availability sets in each location for the VM. Availability sets do the following:
Ensure that the Azure fabric separates the VMs into different infrastructures in the Azure Region.
Allow Contoso to be eligible for the 99.95% SLA for VMs in Azure. Learn more.

3. After the VM is deployed, they open the network interface for the VM. They set the private IP address to
static, and specify a valid address.
4. Now, they attach a new data disk to the VM. This disk contains the Active Directory database, and the
sysvol share.
The size of the disk will determine the number of IOPS that it supports.
Over time the disk size might need to increase as the environment grows.
The drive shouldn't be set to Read/Write for host caching. Active Directory databases don't support
this.

5. After the disk is added, they connect to the VM over Remote Desktop, and open Server Manager.
6. Then in File and Storage Services, they run the New Volume Wizard, ensuring that the drive is given
the letter F: or above on the local VM.
7. In Server Manager, they add the Active Directory Domain Services role. Then, they configure the VM
as a domain controller.

8. After the VM is configured as a DC and rebooted, they open DNS Manager and configure the Azure
DNS resolver as a forwarder. This allows the DC to forward DNS queries it can't resolve in the Azure
DNS.

9. Now, they update the custom DNS settings for each VNet with the appropriate domain controller for the
VNet region. They include on-premises DCs in the list.
Set up Active Directory
Active Directory is a critical service in networking, and must be configured correctly. Contoso admins will build
Active Directory sites for the Contoso datacenter, and for the EUS2 and CUS regions.
1. They create two new sites (AZURE -EUS2, and AZURE -CUS ) along with the datacenter site
(ContosoDatacenter).
2. After creating the sites, they create subnets in the sites, to match the VNets and datacenter.
3. Then, they create two site links to connect everything. The domain controllers are then moved to
their respective sites.

4. After everything is configured, the Active Directory replication topology is in place.

5. With everything complete, a list of the domain controllers and sites are shown in the on-premises Active
Directory Administrative Center.
Step 5: Plan for governance
Azure provides a range of governance controls across services and the Azure platform. For more information,
see the Azure governance options.
As they configure identity and access control, Contoso has already begun to put some aspects of governance
and security in place. Broadly, there are three areas it needs to consider:
Policy: Azure Policy applies and enforces rules and effects over your resources, so that resources stay
compliant with corporate requirements and SLAs.
Locks: Azure allows you to lock subscriptions, resource groups, and other resources, so that they can be
modified only by those with authority to do so.
Tags: Resources can be controlled, audited, and managed with tags. Tags attach metadata to resources,
providing information about resources or owners.
Set up policies
The Azure Policy service evaluates your resources, scanning for those not compliant with the policy definitions
you have in place. For example, you might have a policy that only allows certain types of VMs, or requires
resources to have a specific tag.
Policies specify a policy definition, and a policy assignment specifies the scope in which a policy should be
applied. The scope can range from a management group to a resource group. Learn about creating and
managing policies.
Contoso wants to get started with a couple of policies:
It wants a policy to ensure that resources can be deployed in the EUS2 and CUS regions only.
It wants to limit VM SKUs to approved SKUs only. The intention is to ensure that expensive VM SKUs aren't
used.
Limit resources to regions
Contoso uses the built-in policy definition Allowed locations to limit resource regions.
1. In the Azure portal, select All services, and search for Policy.
2. Select Assignments > Assign policy.
3. In the policy list, select Allowed locations.
4. Set Scope to the name of the Azure subscription, and select the two regions in the allowed list.
5. By default, the policy effect is set to Deny, meaning that if someone starts a deployment in the subscription
that isn't in EUS2 or CUS, the deployment will fail. Here's what happens if someone in the Contoso
subscription tries to set up a deployment in West US.
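The same assignment can be scripted. A sketch using azure-mgmt-resource's PolicyClient; the subscription ID is a placeholder, and the definition GUID shown is the commonly published ID for the built-in Allowed locations policy, which you should confirm in your own tenant:

```python
# Assign the built-in "Allowed locations" policy at subscription scope,
# restricting deployments to East US 2 and Central US.
# Requires: pip install azure-identity azure-mgmt-resource
from azure.identity import DefaultAzureCredential
from azure.mgmt.resource import PolicyClient

subscription_id = "<subscription-id>"  # placeholder
scope = f"/subscriptions/{subscription_id}"

# GUID of the built-in "Allowed locations" definition; confirm it in your tenant.
allowed_locations_definition = (
    "/providers/Microsoft.Authorization/policyDefinitions/"
    "e56962a6-4747-49cd-b67b-bf8b01975c4c"
)

policy_client = PolicyClient(DefaultAzureCredential(), subscription_id)
assignment = policy_client.policy_assignments.create(
    scope,
    "allowed-locations",  # assignment name
    {
        "policy_definition_id": allowed_locations_definition,
        "parameters": {"listOfAllowedLocations": {"value": ["eastus2", "centralus"]}},
    },
)
print(f"Assigned policy: {assignment.name}")
```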

Allow specific VM SKUs


Contoso will use the built-in policy definition Allowed virtual machine SKUs to limit the types of VMs that can
be created in the subscription.

Check policy compliance


Policies go into effect immediately, and Contoso can check resources for compliance.
1. In the Azure portal, select the Compliance link.
2. The compliance dashboard appears. You can drill down for further details.

Set up locks
Contoso has long been using the ITIL framework for the management of its systems. One of the most
important aspects of the framework is change control, and Contoso wants to make sure that change control is
implemented in the Azure deployment.
Contoso is going to implement locks as follows:
Any production or failover component must be in a resource group that has a ReadOnly lock. This means
that to modify or delete production items, the lock must be removed.
Nonproduction resource groups will have CanNotDelete locks. This means that authorized users can read or
modify a resource, but cannot delete it.
Learn more about locks.
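A sketch of how those two lock levels could be applied with azure-mgmt-resource's ManagementLockClient; the subscription ID is a placeholder and the resource group selection is illustrative:

```python
# Apply a ReadOnly lock to production/failover resource groups and a
# CanNotDelete lock to nonproduction groups.
# Requires: pip install azure-identity azure-mgmt-resource
from azure.identity import DefaultAzureCredential
from azure.mgmt.resource import ManagementLockClient

subscription_id = "<subscription-id>"  # placeholder
lock_client = ManagementLockClient(DefaultAzureCredential(), subscription_id)

locks = {
    "ContosoRG": "ReadOnly",          # production
    "ContosoFailoverRG": "ReadOnly",  # failover
    "ContosoDevRG": "CanNotDelete",   # nonproduction
}

for rg, level in locks.items():
    lock_client.management_locks.create_or_update_at_resource_group_level(
        rg,
        f"{level.lower()}-lock",
        {"level": level, "notes": "Change control enforced per Contoso ITIL process"},
    )
    print(f"{rg}: {level} lock applied")
```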
Set up tagging
To track resources as they're added, it will be increasingly important for Contoso to associate resources with an
appropriate department, customer, and environment.
In addition to providing information about resources and owners, tags will enable Contoso to aggregate and
group resources, and to use that data for chargeback purposes.
Contoso needs to visualize its Azure assets in a way that makes sense for the business. For example by role or
department. Note that resources don't need to reside in the same resource group to share a tag. Contoso will
create a simple tag taxonomy so that everyone uses the same tags.

TAG NAME VALUE

CostCenter: 12345. Must be a valid cost center from SAP.

BusinessUnit: Name of the business unit (from SAP). Matches CostCenter.

ApplicationTeam: Email alias of the team that owns support for the app.

CatalogName: Name of the app or SharedServices, per the service catalog that the resource supports.

ServiceManager: Email alias of the ITIL Service Manager for the resource.

COBPriority: Priority set by the business for BCDR. Values of 1-5.

ENV: DEV, STG, and PROD are the possible values, representing development, staging, and production.

For example:

After creating the tag, Contoso will go back and create new policy definitions and assignments, to enforce the
use of the required tags across the organization.
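To keep tag values consistent before those enforcement policies exist, the taxonomy can also be expressed and validated in code. A small illustrative helper; the validation rules are simplified interpretations of the table above, not an official schema:

```python
# Minimal validation helper for the Contoso tag taxonomy described above.
# The rules are simplified interpretations of the table, for illustration only.
REQUIRED_TAGS = {
    "CostCenter", "BusinessUnit", "ApplicationTeam",
    "CatalogName", "ServiceManager", "COBPriority", "ENV",
}
ALLOWED_ENV = {"DEV", "STG", "PROD"}

def validate_tags(tags: dict) -> list:
    """Return a list of problems with a proposed tag set (empty list means valid)."""
    problems = [f"missing tag: {t}" for t in REQUIRED_TAGS - tags.keys()]
    if tags.get("ENV") and tags["ENV"] not in ALLOWED_ENV:
        problems.append(f"ENV must be one of {sorted(ALLOWED_ENV)}")
    if tags.get("COBPriority") and tags["COBPriority"] not in {"1", "2", "3", "4", "5"}:
        problems.append("COBPriority must be a value from 1 to 5")
    return problems

example = {
    "CostCenter": "12345", "BusinessUnit": "Hotels", "ApplicationTeam": "apps@contoso.com",
    "CatalogName": "SmartHotel360", "ServiceManager": "itsm@contoso.com",
    "COBPriority": "1", "ENV": "PROD",
}
print(validate_tags(example) or "tags valid")
```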

Step 6: Consider security


Security is crucial in the cloud, and Azure provides a wide array of security tools and capabilities. These help you
to create secure solutions, on the secure Azure platform. Read Confidence in the trusted cloud to learn more
about Azure security.
There are a few aspects for Contoso to consider:
Azure Security Center: Azure Security Center provides unified security management and advanced threat
protection across hybrid cloud workloads. With Security Center, you can apply security policies across your
workloads, limit your exposure to threats, and detect and respond to attacks. Learn more.
Network security groups (NSGs): An NSG is a filter (firewall) that contains a list of security rules which,
when applied, allow or deny network traffic to resources connected to Azure VNets. Learn more.
Data encryption: Azure Disk Encryption is a capability that helps you encrypt your Windows and Linux
IaaS virtual machine disks. Learn more.
Work with the Azure Security Center
Contoso is looking for a quick view into the security posture of its new hybrid cloud, and specifically its Azure
workloads. As a result, Contoso has decided to implement Azure Security Center starting with the following
features:
Centralized policy management
Continuous assessment
Actionable recommendations
Centralize policy management
With centralized policy management, Contoso will ensure compliance with security requirements by centrally
managing security policies across the entire environment. It can simply and quickly implement a policy which
applies to all of its Azure resources.

Assess and action


Contoso will take advantage of the continuous security assessment which monitors the security of machines,
networks, storage, data, and applications; to discover potential security issues.
Security Center will analyze the security state of Contoso's compute, infrastructure, and data resources, and
of Azure apps and services.
Continuous assessment helps the Contoso operations team to discover potential security issues, such as
systems with missing security updates or exposed network ports.
In particular Contoso wants to make sure all of the VMs are protected. Security Center helps with this,
verifying VM health, and making prioritized and actionable recommendations to remediate security
vulnerabilities before they're exploited.
Work with NSGs
Contoso can limit network traffic to resources in a virtual network using network security groups.
A network security group contains a list of security rules that allow or deny inbound or outbound network
traffic based on source or destination IP address, port, and protocol.
When applied to a subnet, rules are applied to all resources in the subnet. In addition to network interfaces,
this includes instances of Azure services deployed in the subnet.
Application security groups (ASGs) enable you to configure network security as a natural extension of an app
structure, allowing you to group VMs and define network security policies based on those groups.
Application security groups mean that Contoso can reuse the security policy at scale, without manual
maintenance of explicit IP addresses. The platform handles the complexity of explicit IP addresses and
multiple rule sets, allowing you to focus on your business logic.
Contoso can specify an application security group as the source and destination in a security rule.
After a security policy is defined, Contoso can create VMs, and assign the VM NICs to a group.
Contoso will implement a mix of NSGs and ASGs. Contoso is concerned about NSG management. It's also
worried about the overuse of NSGs, and the added complexity for operations staff. Here's what Contoso will do:
All traffic into and out of all subnets (north-south) will be subject to an NSG rule, except for the
GatewaySubnets in the Hub networks.
Any firewall or domain controller will be protected by both subnet NSGs and NIC NSGs.
All production applications will have ASGs applied.
Contoso has built a model of how this will look for its applications.
The NSGs associated with the ASGs will be configured with least privilege to ensure that only allowed packets
can flow from one part of the network to its destination.

ACTION NAME SOURCE TARGET PORT

Allow AllowInternetToFE VNET-HUB-EUS2/IB-TrustZone APP1-FE 80, 443

Allow AllowWebToApp APP1-FE APP1-APP 80, 443

Allow AllowAppToDB APP1-APP APP1-DB 1433

Deny DenyAllInbound Any Any Any
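As a sketch, part of this least-privilege rule set could be expressed with azure-mgmt-network, using application security groups (ASGs) for the app tiers. Only the web-to-app allow rule and the final deny rule are shown; the other rules follow the same shape. The resource group, NSG name, and subscription ID are placeholders:

```python
# Sketch of the least-privilege rule set above using an NSG plus application security groups.
# Requires: pip install azure-identity azure-mgmt-network
from azure.identity import DefaultAzureCredential
from azure.mgmt.network import NetworkManagementClient

subscription_id = "<subscription-id>"  # placeholder
rg, location = "ContosoNetworkingRG", "eastus2"
client = NetworkManagementClient(DefaultAzureCredential(), subscription_id)

# Create ASGs for the app's web and application tiers.
asgs = {}
for name in ("APP1-FE", "APP1-APP"):
    asgs[name] = client.application_security_groups.begin_create_or_update(
        rg, name, {"location": location}
    ).result()

# NSG with an allow rule from the web-tier ASG to the app-tier ASG, and a final deny.
client.network_security_groups.begin_create_or_update(
    rg, "NSG-PROD-APP-EUS2",
    {
        "location": location,
        "security_rules": [
            {
                "name": "AllowWebToApp",
                "priority": 200, "direction": "Inbound", "access": "Allow",
                "protocol": "Tcp",
                "source_application_security_groups": [{"id": asgs["APP1-FE"].id}],
                "destination_application_security_groups": [{"id": asgs["APP1-APP"].id}],
                "source_port_range": "*", "destination_port_ranges": ["80", "443"],
            },
            {
                "name": "DenyAllInbound",
                "priority": 4096, "direction": "Inbound", "access": "Deny",
                "protocol": "*",
                "source_address_prefix": "*", "destination_address_prefix": "*",
                "source_port_range": "*", "destination_port_range": "*",
            },
        ],
    },
).result()
print("NSG and ASG rules created")
```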

Encrypt data
Azure Disk Encryption integrates with Azure Key Vault to help control and manage the disk-encryption keys and
secrets in a key vault subscription. It ensures that all data on VM disks is encrypted at rest in Azure Storage.
Contoso has determined that specific VMs require encryption.
Contoso will apply encryption to VMs that hold customer, confidential, or PII data.

Conclusion
In this article, Contoso set up an Azure infrastructure and policies for Azure subscriptions, hybrid identity, disaster
recovery, networking, governance, and security.
Not all of the steps that Contoso completed here are required for a migration to the cloud. In this case, it wanted
to plan a network infrastructure that can be used for all types of migrations, and is secure, resilient, and scalable.
With this infrastructure in place, Contoso is ready to move on and try out migration.

Next steps
After setting up their Azure infrastructure, Contoso is ready to begin migrating workloads to the cloud. See the
migration patterns and examples overview section for a selection of scenarios using this sample infrastructure
as a migration target.
Rehost an on-premises app on Azure VMs

This article demonstrates how the fictional company Contoso rehosts a two-tier Windows .NET front-end app
running on VMware VMs, by migrating the app VMs to Azure VMs.
The SmartHotel360 app used in this example is provided as open source. If you'd like to use it for your own testing
purposes, you can download it from GitHub.

Business drivers
The IT Leadership team has worked closely with business partners to understand what they want to achieve with
this migration:
Address business growth. Contoso is growing, and as a result there is pressure on their on-premises systems
and infrastructure.
Limit risk. The SmartHotel360 app is critical for the Contoso business. It wants to move the app to Azure with
zero risk.
Extend. Contoso doesn't want to modify the app, but does want to ensure that it's stable.

Migration goals
The Contoso cloud team has pinned down goals for this migration. These goals are used to determine the best
migration method:
After migration, the app in Azure should have the same performance capabilities as it does today in VMware.
The app will remain as critical in the cloud as it is on-premises.
Contoso doesn't want to invest in this app. It is important to the business, but in its current form Contoso
simply wants to move it safely to the cloud.
Contoso doesn't want to change the ops model for this app. Contoso does want to interact with it in the cloud in
the same way that it does now.
Contoso doesn't want to change any app functionality. Only the app location will change.

Solution design
After pinning down goals and requirements, Contoso designs and reviews a deployment solution, and identifies the
migration process, including the Azure services that Contoso will use for the migration.
Current app
The app is tiered across two VMs (WEBVM and SQLVM).
The VMs are located on VMware ESXi host contosohost1.contoso.com (version 6.5).
The VMware environment is managed by vCenter Server 6.5 (vcenter.contoso.com), running on a VM.
Contoso has an on-premises datacenter (contoso-datacenter), with an on-premises domain controller
(contosodc1).
Proposed architecture
Since the app is a production workload, the app VMs in Azure will reside in the production resource group
ContosoRG.
The app VMs will be migrated to the primary Azure region (East US 2) and placed in the production network
(VNET-PROD-EUS2).
The web front-end VM will reside in the front-end subnet (PROD-FE-EUS2) in the production network.
The database VM will reside in the database subnet (PROD-DB-EUS2) in the production network.
The on-premises VMs in the Contoso datacenter will be decommissioned after the migration is done.

Database considerations
As part of the solution design process, Contoso did a feature comparison between Azure SQL Database and SQL
Server. The following considerations helped them to decide to go with SQL Server running on an Azure IaaS VM:
Using an Azure VM running SQL Server seems to be an optimal solution if Contoso needs to customize the
operating system or the database server, or if it might want to colocate and run third-party apps on the same
VM.
With Software Assurance, in the future Contoso can exchange existing licenses for discounted rates on a SQL
Database Managed Instance using the Azure Hybrid Benefit for SQL Server. This can save up to 30% on
Managed Instance.
Solution review
Contoso evaluates the proposed design by putting together a pros and cons list.

CONSIDERATION   DETAILS

Pros            Both the app VMs will be moved to Azure without changes, making the
                migration simple.

                Since Contoso is using a lift and shift approach for both app VMs, no
                special configuration or migration tools are needed for the app database.

                Contoso can take advantage of their investment in Software Assurance,
                using the Azure Hybrid Benefit.

                Contoso will retain full control of the app VMs in Azure.

Cons            WEBVM and SQLVM are running Windows Server 2008 R2. The operating system
                is supported by Azure for specific roles (July 2018). Learn more.

                The web and data tiers of the app will remain a single point of failure.

                SQLVM is running SQL Server 2008 R2, which isn't in mainstream support.
                However, it is supported for Azure VMs (July 2018). Learn more.

                Contoso will need to continue supporting the app as Azure VMs rather than
                moving to a managed service such as Azure App Service and Azure SQL
                Database.

Migration process
Contoso will migrate the app front-end and database VMs to Azure VMs with the Azure Migrate Server Migration
tool agentless method.
As a first step, Contoso prepares and sets up Azure components for Azure Migrate Server Migration, and
prepares the on-premises VMware infrastructure.
They already have the Azure infrastructure in place, so Contoso just needs to configure the replication of
the VMs through the Azure Migrate Server Migration tool.
With everything prepared, Contoso can start replicating the VMs.
After replication is enabled and working, Contoso will migrate the VM by failing it over to Azure.

Azure services
SERVICE                          DESCRIPTION                                   COST

Azure Migrate Server Migration   The service orchestrates and manages          During replication to Azure, Azure Storage
                                 migration of your on-premises apps and        charges are incurred. Azure VMs are created,
                                 workloads, and AWS/GCP VM instances.          and incur charges, when failover occurs.
                                                                               Learn more about charges and pricing.

Prerequisites
Here's what Contoso needs to run this scenario.
REQUIREMENTS           DETAILS

Azure subscription     Contoso created subscriptions in an earlier article in this series. If you
                       don't have an Azure subscription, create a free account.

                       If you create a free account, you're the administrator of your subscription
                       and can perform all actions.

                       If you use an existing subscription and you're not the administrator, you
                       need to work with the admin to assign you Owner or Contributor permissions.

                       If you need more granular permissions, review this article.

Azure infrastructure   Learn how Contoso set up an Azure infrastructure.

                       Learn more about specific prerequisites for Azure Migrate Server Migration.

On-premises servers    On-premises vCenter Servers should be running version 5.5, 6.0, or 6.5.

                       ESXi hosts should be running version 5.5, 6.0, or 6.5.

                       One or more VMware VMs should be running on the ESXi host.

Scenario steps
Here's how Contoso admins will run the migration:
Step 1: Prepare Azure for Azure Migrate Server Migration. They add the Server Migration tool to their
Azure Migrate project.
Step 2: Prepare on-premises VMware for Azure Migrate Server Migration. They prepare accounts for
VM discovery, and prepare to connect to Azure VMs after failover.
Step 3: Replicate VMs. They set up replication, and start replicating VMs to Azure storage.
Step 4: Migrate the VMs with Azure Migrate Server Migration. They run a test failover to make sure
everything's working, and then run a full failover to migrate the VMs to Azure.

Step 1: Prepare Azure for the Azure Migrate Server Migration tool
Here are the Azure components Contoso needs to migrate the VMs to Azure:
A VNet in which Azure VMs will be located when they're created during failover.
The Azure Migrate Server Migration tool provisioned.
They set these up as follows:
1. Set up a network: Contoso already set up a network that can be used for Azure Migrate Server Migration when
they deployed the Azure infrastructure.
The SmartHotel360 app is a production app, and the VMs will be migrated to the Azure production
network (VNET-PROD-EUS2) in the primary East US 2 region.
Both VMs will be placed in the ContosoRG resource group, which is used for production resources.
The app front-end VM (WEBVM) will migrate to the front-end subnet (PROD-FE-EUS2), in the
production network.
The app database VM (SQLVM) will migrate to the database subnet (PROD-DB-EUS2), in the
production network.
2. Provision the Azure Migrate Server Migration tool: With the network and storage account in place, Contoso
now creates a Recovery Services vault (ContosoMigrationVault), and places it in the ContosoFailoverRG
resource group in the primary East US 2 region.
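The vault creation could also be scripted. Here is a minimal Az PowerShell sketch using the names from the scenario:

New-AzRecoveryServicesVault -Name "ContosoMigrationVault" `
    -ResourceGroupName "ContosoFailoverRG" -Location "eastus2"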

Need more help?


Learn about setting up Azure Migrate Server Migration tool.
Prepare to connect to Azure VMs after failover
After failover, Contoso wants to connect to the Azure VMs. To do this, Contoso admins do the following before
migration:
1. For access over the internet, they:
Enable RDP on the on-premises VM before failover.
Ensure that TCP and UDP rules are added for the Public profile.
Check that RDP is allowed in Windows Firewall > Allowed Apps for all profiles.
2. For access over site-to-site VPN, they:
Enable RDP on the on-premises machine.
Allow RDP in the Windows Firewall -> Allowed apps and features, for Domain and Private
networks.
Set the operating system's SAN policy on the on-premises VM to OnlineAll.
In addition, when they run a failover they need to check the following:
There should be no Windows updates pending on the VM when triggering a failover. If there are, they won't be
able to log into the VM until the update completes.
After failover, they can check Boot diagnostics to view a screenshot of the VM. If this doesn't work, they
should verify that the VM is running, and review these troubleshooting tips.
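The RDP preparation steps above can be scripted on the on-premises Windows VM. This is a rough sketch, not a definitive procedure; command availability varies by Windows version, so treat it as illustrative.

# Enable RDP and allow it through Windows Firewall for all profiles.
Set-ItemProperty -Path 'HKLM:\SYSTEM\CurrentControlSet\Control\Terminal Server' `
    -Name 'fDenyTSConnections' -Value 0
Enable-NetFirewallRule -DisplayGroup 'Remote Desktop'

# Set the SAN policy to OnlineAll so data disks come online after failover.
# (Requires Windows Server 2016 or later; on older systems run "san policy=OnlineAll" in DISKPART.)
Set-StorageSetting -NewDiskPolicy OnlineAll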
Need more help?
Learn about preparing VMs for migration

Step 3: Replicate the on-premises VMs


Before Contoso admins can run a migration to Azure, they need to set up and enable replication.
With discovery completed, you can begin replication of VMware VMs to Azure.
1. In the Azure Migrate project > Servers, Azure Migrate: Server Migration, click Replicate.
2. In Replicate, > Source settings > Are your machines virtualized?, select Yes, with VMware vSphere.
3. In On-premises appliance, select the name of the Azure Migrate appliance that you set up > OK.

4. In Virtual machines, select the machines you want to replicate.


If you've run an assessment for the VMs, you can apply VM sizing and disk type (premium/standard)
recommendations from the assessment results. To do this, in Import migration settings from an
Azure Migrate assessment?, select the Yes option.
If you didn't run an assessment, or you don't want to use the assessment settings, select the No option.
If you selected to use the assessment, select the VM group, and assessment name.
5. In Virtual machines, search for VMs as needed, and check each VM you want to migrate. Then click Next:
Target settings.
6. In Target settings, select the subscription, and target region to which you'll migrate, and specify the
resource group in which the Azure VMs will reside after migration. In Virtual Network, select the Azure
VNet/subnet to which the Azure VMs will be joined after migration.
7. In Azure Hybrid Benefit, select the following:
Select No if you don't want to apply Azure Hybrid Benefit. Then click Next.
Select Yes if you have Windows Server machines that are covered with active Software Assurance or
Windows Server subscriptions, and you want to apply the benefit to the machines you're migrating. Then
click Next.
8. In Compute, review the VM name, size, OS disk type, and availability set. VMs must conform with Azure
requirements.
VM size: If you're using assessment recommendations, the VM size dropdown will contain the
recommended size. Otherwise Azure Migrate picks a size based on the closest match in the Azure
subscription. Alternatively, pick a manual size in Azure VM size.
OS disk: Specify the OS (boot) disk for the VM. The OS disk is the disk that has the operating system
bootloader and installer.
Availability set: If the VM should be in an Azure availability set after migration, specify the set. The set
must be in the target resource group you specify for the migration.
9. In Disks, specify whether the VM disks should be replicated to Azure, and select the disk type (standard
SSD/HDD or premium-managed disks) in Azure. Then click Next.
You can exclude disks from replication.
If you exclude disks, they won't be present on the Azure VM after migration.
10. In Review and start replication, review the settings, and click Replicate to start the initial replication for
the servers.

NOTE
You can update replication settings any time before replication starts, in Manage > Replicating machines. Settings can't be
changed after replication starts.

Step 4: Migrate the VMs


Contoso admins run a quick test failover, and then a full failover to migrate the VMs.
Run a test failover
1. In Migration goals > Servers > Azure Migrate: Server Migration, click Test migrated servers.
2. Right-click the VM to test, and click Test migrate.

3. In Test Migration, select the Azure VNet in which the Azure VM will be located after the migration. We
recommend you use a nonproduction VNet.
4. The Test migration job starts. Monitor the job in the portal notifications.
5. After the migration finishes, view the migrated Azure VM in Virtual Machines in the Azure portal. The
machine name has a suffix -Test.
6. After the test is done, right-click the Azure VM in Replicating machines, and click Clean up test
migration.

Migrate the VMs


Now Contoso admins run a full failover to complete the migration.
1. In the Azure Migrate project > Servers > Azure Migrate: Server Migration, click Replicating servers.

2. In Replicating machines, right-click the VM > Migrate.


3. In Migrate > Shut down virtual machines and perform a planned migration with no data loss,
select Yes > OK.
By default Azure Migrate shuts down the on-premises VM, and runs an on-demand replication to
synchronize any VM changes that occurred since the last replication occurred. This ensures no data loss.
If you don't want to shut down the VM, select No.
4. A migration job starts for the VM. Track the job in Azure notifications.
5. After the job finishes, you can view and manage the VM from the Virtual Machines page.
Need more help?
Learn about running a test failover.
Learn about migrating VMs to Azure.

Clean up after migration


With migration complete, the SmartHotel360 app tiers are now running on Azure VMs.
Now, Contoso needs to complete these cleanup steps:
After the migration is complete, stop replication.
Remove the WEBVM machine from the vCenter inventory.
Remove the SQLVM machine from the vCenter inventory.
Remove WEBVM and SQLVM from local backup jobs.
Update internal documentation to show the new location, and IP addresses for the VMs.
Review any resources that interact with the VMs, and update any relevant settings or documentation to reflect
the new configuration.

Review the deployment


With the app now running, Contoso now needs to fully operationalize and secure it in Azure.
Security
The Contoso security team reviews the Azure VMs, to determine any security issues.
To control access, the team reviews the network security groups (NSGs) for the VMs. NSGs are used to ensure
that only traffic allowed to the app can reach it.
The team also considers securing the data on the disk using Azure Disk Encryption and Key Vault.
For more information, see Security best practices for IaaS workloads in Azure.

BCDR
For business continuity and disaster recovery (BCDR ), Contoso takes the following actions:
Keep data safe: Contoso backs up the data on the VMs using the Azure Backup service. Learn more.
Keep apps up and running: Contoso replicates the app VMs in Azure to a secondary region using Site Recovery.
Learn more.
Licensing and cost optimization
1. Contoso has existing licensing for its VMs and will take advantage of the Azure Hybrid Benefit by converting
the existing Azure VMs to this pricing (see the sketch after this list).
2. Contoso will enable Azure Cost Management licensed by Cloudyn, a Microsoft subsidiary. It's a multicloud cost
management solution that helps to use and manage Azure and other cloud resources. Learn more about Azure
Cost Management.
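As a rough sketch of the license conversion mentioned in item 1, an existing Azure VM can be switched to Azure Hybrid Benefit pricing with Az PowerShell. The VM and resource group names follow this scenario.

# Convert an existing Azure VM to Azure Hybrid Benefit pricing.
$vm = Get-AzVM -ResourceGroupName "ContosoRG" -Name "WEBVM"
$vm.LicenseType = "Windows_Server"
Update-AzVM -ResourceGroupName "ContosoRG" -VM $vm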

Conclusion
In this article, Contoso rehosted the SmartHotel360 app in Azure by migrating the app VMs to Azure VMs using
the Azure Migrate Server Migration tool.
Rearchitect an on-premises app to an Azure
container and Azure SQL Database

This article demonstrates how the fictional company Contoso rearchitects a two-tier Windows .NET app running
on VMware VMs as part of a migration to Azure. Contoso migrates the app front-end VM to an Azure Windows
container, and the app database to an Azure SQL database.
The SmartHotel360 app used in this example is provided as open source. If you'd like to use it for your own testing
purposes, you can download it from GitHub.

Business drivers
The Contoso IT leadership team has worked closely with business partners to understand what they want to
achieve with this migration:
Address business growth. Contoso is growing, and as a result there is pressure on its on-premises systems
and infrastructure.
Increase efficiency. Contoso needs to remove unnecessary procedures, and streamline processes for
developers and users. The business needs IT to be fast and not waste time or money, thus delivering faster on
customer requirements.
Increase agility. Contoso IT needs to be more responsive to the needs of the business. It must be able to react
faster to changes in the marketplace, to enable success in a global economy. It mustn't get in the way,
or become a business blocker.
Scale. As the business grows successfully, Contoso IT must provide systems that are able to grow at the same
pace.
Reduce costs. Contoso wants to minimize licensing costs.

Migration goals
The Contoso cloud team has pinned down goals for this migration. These goals were used to determine the best
migration method.

GOALS         DETAILS

App reqs      The app in Azure will remain as critical as it is today.

              It should have the same performance capabilities as it currently does in
              VMware.

              Contoso wants to stop supporting Windows Server 2008 R2, on which the app
              currently runs, and is willing to invest in the app.

              Contoso wants to move away from SQL Server 2008 R2 to a modern PaaS
              database platform, which will minimize the need for management.

              Contoso wants to take advantage of its investment in SQL Server licensing
              and Software Assurance where possible.

              Contoso wants to be able to scale up the app web tier.

Limitations   The app consists of an ASP.NET app and a WCF service running on the same
              VM. Contoso wants to split this across two web apps using Azure App Service.

Azure reqs    Contoso wants to move the app to Azure, and run it in a container to extend
              app life. It doesn't want to start completely from scratch to implement the
              app in Azure.

DevOps        Contoso wants to move to a DevOps model using Azure DevOps Services for
              code builds and release pipeline.

Solution design
After pinning down goals and requirements, Contoso designs and reviews a deployment solution, and identifies the
migration process, including the Azure services that Contoso will use for the migration.
Current app
The SmartHotel360 on-premises app is tiered across two VMs (WEBVM and SQLVM).
The VMs are located on VMware ESXi host contosohost1.contoso.com (version 6.5).
The VMware environment is managed by vCenter Server 6.5 (vcenter.contoso.com), running on a VM.
Contoso has an on-premises datacenter (contoso-datacenter), with an on-premises domain controller
(contosodc1).
The on-premises VMs in the Contoso datacenter will be decommissioned after the migration is done.
Proposed architecture
For the database tier of the app, Contoso compared Azure SQL Database with SQL Server using this article.
It decided to go with Azure SQL Database for a few reasons:
Azure SQL Database is a relational-database managed service. It delivers predictable performance at
multiple service levels, with near-zero administration. Advantages include dynamic scalability with no
downtime, built-in intelligent optimization, and global scalability and availability.
Contoso uses the lightweight Data Migration Assistant (DMA) to assess and migrate the on-premises
database to Azure SQL.
With Software Assurance, Contoso can exchange its existing licenses for discounted rates on a SQL
Database, using the Azure Hybrid Benefit for SQL Server. This could provide savings of up to 30%.
SQL Database provides several security features including Always Encrypted, dynamic data masking, and
row-level security/threat detection.
For the app web tier, Contoso has decided to convert it to a Windows container using Azure DevOps
Services.
Contoso will deploy the app using Azure Service Fabric, and pull the Windows container image from the
Azure Container Registry (ACR ).
A prototype for extending the app to include sentiment analysis will be implemented as another service
in Service Fabric, connected to Cosmos DB. This will read information from Tweets, and display on the
app.
To implement a DevOps pipeline, Contoso will use Azure DevOps for source code management (SCM ), with
Git repos. Automated builds and releases will be used to build code, and deploy it to the Azure Container
Registry and Azure Service Fabric.

Solution review
Contoso evaluates the proposed design by putting together a pros and cons list.

CONSIDERATION   DETAILS

Pros            The SmartHotel360 app code will need to be altered for migration to Azure
                Service Fabric. However, the effort is minimal, using the Service Fabric
                SDK tools for the changes.

                With the move to Service Fabric, Contoso can start to develop
                microservices to add to the application quickly over time, without risk
                to the original code base.

                Windows Containers offer the same benefits as containers in general. They
                improve agility, portability, and control.

                Contoso can take advantage of its investment in Software Assurance using
                the Azure Hybrid Benefit for both SQL Server and Windows Server.

                After the migration it will no longer need to support Windows Server
                2008 R2. Learn more.

                Contoso can configure the web tier of the app with multiple instances, so
                that it's no longer a single point of failure.

                It will no longer depend on the aging SQL Server 2008 R2.

                SQL Database supports Contoso's technical requirements. Contoso admins
                assessed the on-premises database using the Data Migration Assistant and
                found it compatible.

                SQL Database has built-in fault tolerance that Contoso doesn't need to
                set up. This ensures that the data tier is no longer a single point of
                failure.

Cons            Containers are more complex than other migration options. The learning
                curve on containers could be an issue for Contoso. They introduce a new
                level of complexity that provides a lot of value in spite of the curve.

                The operations team at Contoso will need to ramp up to understand and
                support Azure, containers, and microservices for the app.

                If Contoso uses the Data Migration Assistant instead of Azure Database
                Migration Service to migrate the database, it won't have the
                infrastructure ready for migrating databases at scale.

Migration process
1. Contoso provisions the Azure Service Fabric cluster for Windows.
2. It provisions an Azure SQL instance, and migrates the SmartHotel360 database to it.
3. Contoso converts the web tier VM to a Docker container using the Service Fabric SDK tools.
4. It connects the Service Fabric cluster and the ACR, and deploys the app using Azure Service Fabric.
Azure services
SERVICE                          DESCRIPTION                                       COST

Data Migration Assistant (DMA)   Assesses and detects compatibility issues that    A downloadable tool, free of charge.
                                 might affect database functionality in Azure.
                                 DMA assesses feature parity between SQL sources
                                 and targets, and recommends performance and
                                 reliability improvements.

Azure SQL Database               Provides an intelligent, fully managed            Cost based on features, throughput,
                                 relational cloud database service.                and size. Learn more.

Azure Container Registry         Stores images for all types of container          Cost based on features, storage, and
                                 deployments.                                      usage duration. Learn more.

Azure Service Fabric             Builds and operates always-on, scalable, and      Cost based on size, location, and
                                 distributed apps.                                 duration of the compute nodes. Learn
                                                                                   more.

Azure DevOps                     Provides a continuous integration and
                                 continuous deployment (CI/CD) pipeline for app
                                 development. The pipeline starts with a Git
                                 repository for managing app code, a build
                                 system for producing packages and other build
                                 artifacts, and a Release Management system to
                                 deploy changes in dev, test, and production
                                 environments.

Prerequisites
Here's what Contoso needs to run this scenario:

REQUIREMENTS              DETAILS

Azure subscription        Contoso created subscriptions earlier in this article series. If you
                          don't have an Azure subscription, create a free account.

                          If you create a free account, you're the administrator of your
                          subscription and can perform all actions.

                          If you use an existing subscription and you're not the administrator,
                          you need to work with the admin to assign you Owner or Contributor
                          permissions.

Azure infrastructure      Learn how Contoso previously set up an Azure infrastructure.

Developer prerequisites   Contoso needs the following tools on a developer workstation:

                          - Visual Studio 2017 Community Edition: Version 15.5
                          - .NET workload enabled
                          - Git
                          - Service Fabric SDK v 3.0 or later
                          - Docker CE (Windows 10) or Docker EE (Windows Server) set to use
                            Windows Containers

Scenario steps
Here's how Contoso runs the migration:
Step 1: Provision a SQL Database instance in Azure. Contoso provisions a SQL instance in Azure. After the
front-end web VM is migrated to an Azure container, the container instance with the app web front-end will
point to this database.
Step 2: Create an Azure Container Registry (ACR). Contoso provisions an enterprise container registry for
the docker container images.
Step 3: Provision Azure Service Fabric. It provisions a Service Fabric Cluster.
Step 4: Manage service fabric certificates. Contoso sets up certificates for Azure DevOps Services access to
the cluster.
Step 5: Migrate the database with DMA. It migrates the app database with the Data Migration Assistant.
Step 6: Set up Azure DevOps Services. Contoso sets up a new project in Azure DevOps Services, and
imports the code into the Git Repo.
Step 7: Convert the app. Contoso converts the app to a container using Azure DevOps and SDK tools.
Step 8: Set up build and release. Contoso sets up the build and release pipelines to create and publish the
app to the ACR and Service Fabric Cluster.
Step 9: Extend the app. After the app is public, Contoso extends it to take advantage of Azure capabilities, and
republishes it to Azure using the pipeline.

Step 1: Provision an Azure SQL Database


Contoso admins provision an Azure SQL database.
1. They select to create a SQL Database in Azure.
2. They specify a database name to match the database running on the on-premises VM
(SmartHotel.Registration). They place the database in the ContosoRG resource group. This is the resource
group they use for production resources in Azure.

3. They set up a new SQL Server instance (sql-smarthotel-eus2) in the primary region.
4. They set the pricing tier to match server and database needs, and they select to save money with Azure
Hybrid Benefit because they already have a SQL Server license.
5. For sizing, they use vCore-based purchasing, and set the limits for the expected requirements.

6. Then they create the database instance.


7. After the instance is created, they open the database, and note details they need when they use the Data
Migration Assistant for migration.
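The same provisioning could be done with Az PowerShell. Here is a minimal sketch of the steps above, using the server and database names from the scenario; the admin credentials and the vCore sizing values are illustrative assumptions.

# Credentials for the new logical SQL server (assumed admin login).
$cred = Get-Credential

# Create the logical server in the primary region.
New-AzSqlServer -ResourceGroupName "ContosoRG" -ServerName "sql-smarthotel-eus2" `
    -Location "eastus2" -SqlAdministratorCredentials $cred

# Create the database with vCore-based purchasing; LicenseType BasePrice applies Azure Hybrid Benefit.
New-AzSqlDatabase -ResourceGroupName "ContosoRG" -ServerName "sql-smarthotel-eus2" `
    -DatabaseName "SmartHotel.Registration" -Edition "GeneralPurpose" `
    -VCore 2 -ComputeGeneration "Gen5" -LicenseType "BasePrice"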

Need more help?


Get help provisioning a SQL Database.
Learn about vCore resource limits.

Step 2: Create an ACR and provision an Azure Container


The Azure container is created using the exported files from the Web VM. The container is housed in the Azure
Container Registry (ACR ).
1. Contoso admins create a Container Registry in the Azure portal.

2. They provide a name for the registry (contosoacreus2), and place it in the primary region, in the resource
group they use for their infrastructure resources. They enable access for admin users, and set it as a
premium SKU so that they can use geo-replication.
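The registry described above could also be created with Az PowerShell. A minimal sketch, assuming ContosoInfraRG as the infrastructure resource group name:

# Create a premium-SKU registry with the admin user enabled, as described in step 2.
New-AzContainerRegistry -ResourceGroupName "ContosoInfraRG" -Name "contosoacreus2" `
    -Location "eastus2" -Sku "Premium" -EnableAdminUser
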
Step 3: Provision Azure Service Fabric
The SmartHotel360 container will run in the Azure Service Fabric Cluster. Contoso admins create the Service
Fabric Cluster as follows:
1. Create a Service Fabric resource from the Azure Marketplace.
2. In Basics, they provide a unique DNS name for the cluster, and credentials for accessing the underlying cluster VMs.
They place the resource in the production resource group (ContosoRG) in the primary East US 2 region.

3. In Node type configuration, they input a node type name, durability settings, VM size, and app endpoints.
4. In Create Key Vault, they create a new Key Vault in their infrastructure resource group, to house the
certificate.
5. In Access policies, they enable access to the key vault for Azure Virtual Machines for deployment.

6. They specify a name for the certificate.


7. In the summary page, they copy the link that's used to download the certificate. They need this to connect to
the Service Fabric Cluster.
8. After validation passes, they provision the cluster.
9. In the Certificate Import Wizard, they import the downloaded certificate to dev machines. The certificate is
used to authenticate to the cluster.

10. After the cluster is provisioned, they connect to the Service Fabric Cluster Explorer.
11. They need to select the correct certificate.

12. The Service Fabric Explorer loads, and the Contoso Admin can manage the cluster.
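Cluster provisioning can also be scripted. Here is a minimal Az PowerShell sketch under assumed names (cluster contososmarthotelsf, admin user contosoadmin); unlike the portal flow above, this cmdlet creates the Key Vault and certificate for you.

# Prompt for the VM admin password used by the cluster nodes.
$password = Read-Host -Prompt "VM admin password" -AsSecureString

# Create a three-node Windows Service Fabric cluster suitable for containers.
New-AzServiceFabricCluster -ResourceGroupName "ContosoRG" -Location "eastus2" `
    -Name "contososmarthotelsf" -ClusterSize 3 `
    -VmUserName "contosoadmin" -VmPassword $password `
    -CertificateSubjectName "contososmarthotelsf.eastus2.cloudapp.azure.com" `
    -OS WindowsServer2016DatacenterwithContainers
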

Step 4: Manage Service Fabric certificates


Contoso needs cluster certificates to allow Azure DevOps Services access to the cluster. Contoso admins set this
up.
1. They open the Azure portal and browse to the Key Vault.
2. They open the certificates, and copy the thumbprint of the certificate that was created during the
provisioning process.

3. They copy it to a text file for later reference.


4. Now, they add a client certificate that will become an Admin client certificate on the cluster. This allows
Azure DevOps Services to connect to the cluster for the app deployment in the release pipeline. To do this,
they open Key Vault in the portal, and select Certificates > Generate/Import.

5. They enter the name of the certificate, and provide an X.509 distinguished name in Subject.

6. After the certificate is created, they download it locally in PFX format.

7. Now, they go back to the certificates list in the Key Vault, and copy the thumbprint of the client certificate
that's just been created. They save it in the text file.

8. For Azure DevOps Services deployment, they need to determine the Base64 value of the certificate. They do
this on the local developer workstation using PowerShell. They paste the output into a text file for later use.

[System.Convert]::ToBase64String([System.IO.File]::ReadAllBytes("C:\path\to\certificate.pfx"))

9. Finally, they add the new certificate to the Service Fabric cluster. To do this, in the portal they open the
cluster, and select Security.

10. They select Add > Admin Client, and paste in the thumbprint of the new client certificate. Then they select
Add. This can take up to 15 minutes.
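Adding the admin client certificate can also be done with Az PowerShell. A minimal sketch, assuming the cluster name used earlier; the thumbprint placeholder stands in for the value collected in the text file.

# Register the client certificate thumbprint as an admin client on the cluster.
Add-AzServiceFabricClientCertificate -ResourceGroupName "ContosoRG" `
    -Name "contososmarthotelsf" -Thumbprint "<client-certificate-thumbprint>" -Admin
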

Step 5: Migrate the database with DMA


Contoso admins can now migrate the SmartHotel360 database using DMA.
Install DMA
1. They download the tool from the Microsoft Download Center to the on-premises SQL Server VM (SQLVM ).
2. They run setup (DownloadMigrationAssistant.msi) on the VM.
3. On the Finish page, they select Launch Microsoft Data Migration Assistant before finishing the wizard.
Configure the firewall
To connect to the Azure SQL Database, Contoso admins set up a firewall rule to allow access.
1. In the Firewall and virtual networks properties for the database, they allow access to Azure services, and
add a rule for the client IP address of the on-premises SQL Server VM.
2. A server-level firewall rule is created.
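The firewall configuration above could be scripted with Az PowerShell. A minimal sketch; the client IP address is illustrative, and the server name follows the scenario.

# Allow connections from Azure services (creates the AllowAllWindowsAzureIps rule).
New-AzSqlServerFirewallRule -ResourceGroupName "ContosoRG" -ServerName "sql-smarthotel-eus2" `
    -AllowAllAzureIPs

# Add a server-level rule for the on-premises SQL Server VM's public client IP.
New-AzSqlServerFirewallRule -ResourceGroupName "ContosoRG" -ServerName "sql-smarthotel-eus2" `
    -FirewallRuleName "AllowSQLVM" -StartIpAddress "203.0.113.25" -EndIpAddress "203.0.113.25"
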

Need more help?


Learn about creating and managing firewall rules for Azure SQL Database.
Migrate
Contoso admins now migrate the database.
1. In DMA, they create a new project (SmartHotelDB) and select Migration.
2. They select the source server type as SQL Server, and the target as Azure SQL Database.

3. In the migration details, they add SQLVM as the source server, and the SmartHotel.Registration database.

4. They receive an error which seems to be associated with authentication. However, after investigating, the
issue turns out to be the period (.) in the database name. As a workaround, they decide to provision a new SQL database
named SmartHotel-Registration to resolve the issue. When they run DMA again, they're able to
select SmartHotel-Registration, and continue with the wizard.
5. In Select Objects, they select the database tables, and generate a SQL script.

6. After DMA creates the script, they select Deploy schema.


7. DMA confirms that the deployment succeeded.

8. Now they start the migration.


9. After the migration finishes, Contoso can verify that the database is running on the Azure SQL instance.

10. They delete the extra SQL database SmartHotel.Registration in the Azure portal.

Step 6: Set up Azure DevOps Services


Contoso needs to build the DevOps infrastructure and pipelines for the application. To do this, Contoso admins
create a new Azure DevOps project, import their code, and then build and release pipelines.
1. In the Contoso Azure DevOps account, they create a new project (ContosoSmartHotelRearchitect), and
select Git for version control.

2. They import the Git Repo that currently holds their app code. It's in a public repo and you can download it.

3. After the code is imported, they connect Visual Studio to the repo, and clone the code using Team Explorer.
4. After the repository is cloned to the developer machine, they open the solution file for the app. The web app
and WCF service each have a separate project within the file.

Step 7: Convert the app to a container


The on-premises app is a traditional three-tier app:
It contains WebForms and a WCF Service connecting to SQL Server.
It uses Entity Framework to integrate with the data in the SQL database, exposing it through a WCF service.
The WebForms application interacts with the WCF service.
Contoso admins will convert the app to a container using Visual Studio and the SDK Tools, as follows:
1. Using Visual Studio, they review the open solution file (SmartHotel.Registration.sln) in the SmartHotel360-
internal-booking-apps\src\Registration directory of the local repo. Two apps are shown. The web front-
end SmartHotel.Registration.Web and the WCF service app SmartHotel.Registration.WCF.

2. They right-click the web app > Add > Container Orchestrator Support.
3. In Add Container Orchestrator Support, they select Service Fabric.

4. They repeat the process for SmartHotel.Registration.WCF app.


5. Now, they check how the solution has changed.
The new app is SmartHotel.RegistrationApplication/
It contains two services: SmartHotel.Registration.WCF and SmartHotel.Registration.Web.

6. Visual Studio created the Docker file, and pulled down the required images locally to the developer machine.
7. A manifest file (ServiceManifest.xml) is created and opened by Visual Studio. This file tells Service Fabric
how to configure the container when it's deployed to Azure.

8. Another manifest file (ApplicationManifest.xml) contains the application configuration for the containers.

9. They open the ApplicationParameters/Cloud.xml file, and update the connection string to connect the
app to the Azure SQL database. The connection string can be located in the database in the Azure portal.

10. They commit the updated code and push to Azure DevOps Services.
Step 8: Build and release pipelines in Azure DevOps Services
Contoso admins now configure Azure DevOps Services to perform the build and release process that puts DevOps
practices into action.
1. In Azure DevOps Services, they select Build and release > New pipeline.

2. They select Azure DevOps Services Git and the relevant repo.
3. In Select a template, they select the Service Fabric with Docker support template.

4. They change the Action Tag images to Build an image, and configure the task to use the provisioned ACR.
5. In the Push images task, they configure the image to be pushed to the ACR, and select to include the latest
tag.
6. In Triggers, they enable continuous integration, and add the master branch.

7. They select Save and Queue to start a build.


8. After the build succeeds, they move onto the release pipeline. In Azure DevOps Services they select
Releases > New pipeline.

9. They select the Azure Service Fabric deployment template, and name the Stage (SmartHotelSF).

10. They provide a pipeline name (ContosoSmartHotel360Rearchitect). For the stage, they select 1 job, 1
task to configure the Service Fabric deployment.

11. Now, they select New to add a new cluster connection.


12. In Add Service Fabric service connection, they configure the connection, and the authentication settings
that will be used by Azure DevOps Services to deploy the app. The cluster endpoint can be located in the
Azure portal, and they add tcp:// as a prefix.
13. The certificate information they collected is input in Server Certificate Thumbprint and Client
Certificate.

14. They select the pipeline > Add an artifact.

15. They select the project and build pipeline, using the latest version.
16. Note that the lightning bolt on the artifact is checked.

17. In addition, note that the continuous deployment trigger is enabled.

18. They select Save > Create a release.


19. After the deployment finishes, SmartHotel360 will now be running in Service Fabric.

20. To connect to the app, they direct traffic to the public IP address of the Azure load balancer in front of the
Service Fabric nodes.

Step 9: Extend the app and republish


After the SmartHotel360 app and database are running in Azure, Contoso wants to extend the app.
Contoso's developers are prototyping a new .NET Core application which will run on the Service Fabric cluster.
The app will be used to pull sentiment data from Cosmos DB.
This data will be in the form of tweets that are processed using a serverless Azure function and the Azure
Cognitive Services Text Analytics API.
Provision Azure Cosmos DB
As a first step, Contoso admins provision an Azure Cosmos database.
1. They create an Azure Cosmos DB resource from the Azure Marketplace.
2. They provide a database name (contososmarthotel), select the SQL API, and place the resource in the
production resource group, in the primary East US 2 region.

3. In Getting Started, they select Data Explorer, and add a new collection.
4. In Add Collection they provide IDs and set storage capacity and throughput.

5. In the portal, they open the new database > Collection > Documents and select New Document.
6. They paste the following JSON code into the document window. This is sample data in the form of a single
tweet.

{
    "id": "2ed5e734-8034-bf3a-ac85-705b7713d911",
    "tweetId": 927750234331580911,
    "tweetUrl": "https://twitter.com/status/927750237331580911",
    "userName": "CoreySandersWA",
    "userAlias": "@CoreySandersWA",
    "userPictureUrl": "",
    "text": "This is a tweet about #SmartHotel360",
    "language": "en",
    "sentiment": 0.5,
    "retweet_count": 1,
    "followers": 500,
    "hashtags": [
        ""
    ]
}
7. They locate the Cosmos DB endpoint, and the authentication key. These are used in the app to connect to
the collection. In the database, they select Keys, and copy the URI and primary key to Notepad.
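Provisioning the account and retrieving the keys noted in the last step could also be scripted with the Az.CosmosDB module. A minimal sketch; the account name follows the scenario, while the database and container names (SentimentDB, tweets) and the throughput value are assumptions for illustration.

# Create the Cosmos DB account with the SQL API in the production resource group.
New-AzCosmosDBAccount -ResourceGroupName "ContosoRG" -Name "contososmarthotel" `
    -Location "eastus2" -ApiKind "Sql"

# Create a database and a container to hold the tweet documents.
New-AzCosmosDBSqlDatabase -ResourceGroupName "ContosoRG" -AccountName "contososmarthotel" `
    -Name "SentimentDB"
New-AzCosmosDBSqlContainer -ResourceGroupName "ContosoRG" -AccountName "contososmarthotel" `
    -DatabaseName "SentimentDB" -Name "tweets" -PartitionKeyKind Hash -PartitionKeyPath "/id" `
    -Throughput 400

# Retrieve the endpoint keys the app uses to connect to the collection.
Get-AzCosmosDBAccountKey -ResourceGroupName "ContosoRG" -Name "contososmarthotel" -Type "Keys"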

Update the sentiment app


With the Cosmos DB provisioned, Contoso admins can configure the app to connect to it.
1. In Visual Studio, they open file ApplicationModern\ApplicationParameters\cloud.xml in Solution Explorer.
2. They fill in the following two parameters:

<Parameter Name="SentimentIntegration.CosmosDBEndpoint" Value="[URI]" />

<Parameter Name="SentimentIntegration.CosmosDBAuthKey" Value="[Key]" />

Republish the app


After extending the app, Contoso admins republish it to Azure using the pipeline.
1. They commit and push their code to Azure DevOps Services. This kicks off the build and release pipelines.
2. After the build and deployment finishes, SmartHotel360 will now be running in Service Fabric. The Service
Fabric Management console now shows three services.

3. They can now click through the services to see that the SentimentIntegration app is up and running.
Clean up after migration
After migration, Contoso needs to complete these cleanup steps:
Remove the on-premises VMs from the vCenter inventory.
Remove the VMs from local backup jobs.
Update internal documentation to show the new locations for the SmartHotel360 app. Show the database as
running in Azure SQL database, and the front end as running in Service Fabric.
Review any resources that interact with the decommissioned VMs, and update any relevant settings or
documentation to reflect the new configuration.

Review the deployment


With the migrated resources in Azure, Contoso needs to fully operationalize and secure their new infrastructure.
Security
Contoso admins need to ensure that their new SmartHotel-Registration database is secure. Learn more.
In particular, they should update the container to use SSL with certificates.
They should consider using Key Vault to protect secrets for their Service Fabric apps. Learn more.
Backups
Contoso needs to review backup requirements for the Azure SQL Database. Learn more.
Contoso admins should consider implementing failover groups to provide regional failover for the database.
Learn more.
They can take advantage of geo-replication for the ACR premium SKU. Learn more.
Contoso needs to consider deploying the web app in the main East US 2 and Central US region when Web App
for Containers becomes available. Contoso admins could configure Traffic Manager to ensure failover in case of
regional outages.
Cosmos DB backs up automatically. Contoso read about this process to learn more.
Licensing and cost optimization
After all resources are deployed, Contoso should assign Azure tags based on infrastructure planning.
All licensing is built into the cost of the PaaS services that Contoso is consuming. This will be deducted from the
EA.
Contoso will enable Azure Cost Management licensed by Cloudyn, a Microsoft subsidiary. It's a multicloud cost
management solution that helps you to use and manage Azure and other cloud resources. Learn more about
Azure Cost Management.

Conclusion
In this article, Contoso rearchitected the SmartHotel360 app in Azure by migrating the app front-end VM to Service
Fabric. The app database was migrated to an Azure SQL database.
Rehost an on-premises Linux app to Azure VMs

This article shows how the fictional company Contoso rehosts a two-tier Linux-based Apache/MySQL/PHP
(LAMP) app, using Azure IaaS VMs.
osTicket, the service desk app used in this example, is provided as open source. If you'd like to use it for your own
testing purposes, you can download it from GitHub.

Business drivers
The IT Leadership team has worked closely with business partners to understand what they want to achieve with
this migration:
Address business growth. Contoso is growing, and as a result there's pressure on the on-premises systems
and infrastructure.
Limit risk. The service desk app is critical for the Contoso business. Contoso wants to move it to Azure with
zero risk.
Extend. Contoso doesn't want to change the app right now. It simply wants to ensure that the app is stable.

Migration goals
The Contoso cloud team has pinned down goals for this migration, to determine the best migration method:
After migration, the app in Azure should have the same performance capabilities as it does today in their on-
premises VMware environment. The app will remain as critical in the cloud as it is on-premises.
Contoso doesn't want to invest in this app. It is important to the business, but in its current form Contoso
simply wants to move it safely to the cloud.
Contoso doesn't want to change the ops model for this app. It wants to interact with the app in the cloud in the
same way that they do now.
Contoso doesn't want to change app functionality. Only the app location will change.
Having completed a couple of Windows app migrations, Contoso wants to learn how to use a Linux-based
infrastructure in Azure.

Solution design
After pinning down goals and requirements, Contoso designs and reviews a deployment solution, and identifies the
migration process, including the Azure services that Contoso will use for the migration.
Current app
The osTicket app is tiered across two VMs (OSTICKETWEB and OSTICKETMYSQL).
The VMs are located on VMware ESXi host contosohost1.contoso.com (version 6.5).
The VMware environment is managed by vCenter Server 6.5 (vcenter.contoso.com), running on a VM.
Contoso has an on-premises datacenter (contoso-datacenter), with an on-premises domain controller
(contosodc1).
Proposed architecture
Since the app is a production workload, the VMs in Azure will reside in the production resource group
ContosoRG.
The VMs will be migrated to the primary region (East US 2) and placed in the production network
(VNET-PROD-EUS2):
The web VM will reside in the front-end subnet (PROD-FE-EUS2).
The database VM will reside in the database subnet (PROD-DB-EUS2).
The on-premises VMs in the Contoso datacenter will be decommissioned after the migration is done.

Solution review
Contoso evaluates the proposed design by putting together a pros and cons list.

CONSIDERATION   DETAILS

Pros            Both the app VMs will be moved to Azure without changes, making the
                migration simple.

                Since Contoso is using a lift and shift approach for both app VMs, no
                special configuration or migration tools are needed for the app database.

                Contoso will retain full control of the app VMs in Azure.

                The app VMs are running Ubuntu 16.04-LTS, which is an endorsed Linux
                distribution. Learn more.

Cons            The web and data tier of the app will remain a single point of failure.

                Contoso will need to continue supporting the app as Azure VMs rather than
                moving to a managed service such as Azure App Service and Azure Database
                for MySQL.

                Contoso is aware that by keeping things simple with a lift and shift VM
                migration, they're not taking full advantage of the features provided by
                Azure Database for MySQL (built-in high availability, predictable
                performance, simple scaling, automatic backups, and built-in security).
Migration process
Contoso will migrate as follows:
As a first step, Contoso prepares and sets up Azure components for Azure Migrate Server Migration, and
prepares the on-premises VMware infrastructure.
They already have the Azure infrastructure in place, so Contoso just needs to configure the replication of
the VMs through the Azure Migrate Server Migration tool.
With everything prepared, Contoso can start replicating the VMs.
After replication is enabled and working, Contoso will migrate the VM by failing it over to Azure.

Azure services
SERVICE                          DESCRIPTION                                   COST

Azure Migrate Server Migration   The service orchestrates and manages          During replication to Azure, Azure Storage
                                 migration of your on-premises apps and        charges are incurred. Azure VMs are created,
                                 workloads, and AWS/GCP VM instances.          and incur charges, when failover occurs.
                                                                               Learn more about charges and pricing.

Prerequisites
Here's what Contoso needs for this scenario.

REQUIREMENTS           DETAILS

Azure subscription     Contoso created subscriptions in an earlier article in this series. If you
                       don't have an Azure subscription, create a free account.

                       If you create a free account, you're the administrator of your subscription
                       and can perform all actions.

                       If you use an existing subscription and you're not the administrator, you
                       need to work with the admin to assign you Owner or Contributor permissions.

                       If you need more granular permissions, review this article.

Azure infrastructure   Learn how Contoso set up an Azure infrastructure.

                       Learn more about specific prerequisites for Azure Migrate Server Migration.

On-premises servers    The on-premises vCenter Server should be running version 5.5, 6.0, or 6.5.

                       An ESXi host should be running version 5.5, 6.0, or 6.5.

                       One or more VMware VMs should be running on the ESXi host.

On-premises VMs        Review Linux machines that are endorsed to run on Azure.

Scenario steps
Here's how Contoso will complete the migration:
Step 1: Prepare Azure for Azure Migrate Server Migration. They add the Server Migration tool to their
Azure Migrate project.
Step 2: Prepare on-premises VMware for Azure Migrate Server Migration. They prepare accounts for
VM discovery, and prepare to connect to Azure VMs after failover.
Step 3: Replicate VMs. They set up replication, and start replicating VMs to Azure storage.
Step 4: Migrate the VMs with Azure Migrate Server Migration. They run a test failover to make sure
everything's working, and then run a full failover to migrate the VMs to Azure.

Step 1: Prepare Azure for the Azure Migrate Server Migration tool
Here are the Azure components Contoso needs to migrate the VMs to Azure:
A VNet in which Azure VMs will be located when they're created during failover.
The Azure Migrate Server Migration tool provisioned.
They set these up as follows:
1. Set up a network: Contoso already set up a network that can be used for Azure Migrate Server Migration when
they deployed the Azure infrastructure.
The osTicket app is a production app, and the VMs will be migrated to the Azure production
network (VNET-PROD-EUS2) in the primary East US 2 region.
Both VMs will be placed in the ContosoRG resource group, which is used for production resources.
The app front-end VM (OSTICKETWEB) will migrate to the front-end subnet (PROD-FE-EUS2), in the
production network.
The app database VM (OSTICKETMYSQL) will migrate to the database subnet (PROD-DB-EUS2), in the
production network.
2. Provision the Azure Migrate Server Migration tool: With the network and storage account in place,
Contoso now creates a Recovery Services vault (ContosoMigrationVault), and places it in the
ContosoFailoverRG resource group in the primary East US 2 region.

Need more help?


Learn about setting up Azure Migrate Server Migration tool.
Prepare to connect to Azure VMs after failover
After failover to Azure, Contoso wants to be able to connect to the replicated VMs in Azure. To do this, there's a
couple of things that the Contoso admins need to do:
To access Azure VMs over the internet, they enable SSH on the on-premises Linux VM before migration. For
Ubuntu this can be completed using the following command: sudo apt-get install openssh-server -y.
After they run the migration (failover), they can check Boot diagnostics to view a screenshot of the VM.
If this doesn't work, they'll need to check that the VM is running, and review these troubleshooting tips.
Need more help?
Learn about preparing VMs for migration

Step 3: Replicate the on-premises VMs


Before Contoso admins can run a migration to Azure, they need to set up and enable replication.
With discovery completed, you can begin replication of VMware VMs to Azure.
1. In the Azure Migrate project > Servers, Azure Migrate: Server Migration, click Replicate.

2. In Replicate, > Source settings > Are your machines virtualized?, select Yes, with VMware vSphere.
3. In On-premises appliance, select the name of the Azure Migrate appliance that you set up > OK.

4. In Virtual machines, select the machines you want to replicate.


If you've run an assessment for the VMs, you can apply VM sizing and disk type (premium/standard)
recommendations from the assessment results. To do this, in Import migration settings from an
Azure Migrate assessment?, select the Yes option.
If you didn't run an assessment, or you don't want to use the assessment settings, select the No option.
If you selected to use the assessment, select the VM group, and assessment name.

5. In Virtual machines, search for VMs as needed, and check each VM you want to migrate. Then click Next:
Target settings.
6. In Target settings, select the subscription, and target region to which you'll migrate, and specify the
resource group in which the Azure VMs will reside after migration. In Virtual Network, select the Azure
VNet/subnet to which the Azure VMs will be joined after migration.
7. In Azure Hybrid Benefit, select the following:
Select No if you don't want to apply Azure Hybrid Benefit. Then click Next.
Select Yes if you have Windows Server machines that are covered with active Software Assurance or
Windows Server subscriptions, and you want to apply the benefit to the machines you're migrating. Then
click Next.
8. In Compute, review the VM name, size, OS disk type, and availability set. VMs must conform with Azure
requirements.
VM size: If you're using assessment recommendations, the VM size dropdown will contain the
recommended size. Otherwise Azure Migrate picks a size based on the closest match in the Azure
subscription. Alternatively, pick a manual size in Azure VM size.
OS disk: Specify the OS (boot) disk for the VM. The OS disk is the disk that has the operating system
bootloader and installer.
Availability set: If the VM should be in an Azure availability set after migration, specify the set. The set
must be in the target resource group you specify for the migration.
9. In Disks, specify whether the VM disks should be replicated to Azure, and select the disk type (standard
SSD/HDD or premium-managed disks) in Azure. Then click Next.
You can exclude disks from replication.
If you exclude disks, they won't be present on the Azure VM after migration.
10. In Review and start replication, review the settings, and click Replicate to start the initial replication for
the servers.

NOTE
You can update replication settings any time before replication starts, in Manage > Replicating machines. Settings can't be
changed after replication starts.

Step 4: Migrate the VMs


Contoso admins run a quick test failover, and then a full failover to migrate the VMs.
Run a test failover
1. In Migration goals > Servers > Azure Migrate: Server Migration, click Test migrated servers.

2. Right-click the VM to test, and click Test migrate.

3. In Test Migration, select the Azure VNet in which the Azure VM will be located after the migration. We
recommend you use a nonproduction VNet.
4. The Test migration job starts. Monitor the job in the portal notifications.
5. After the migration finishes, view the migrated Azure VM in Virtual Machines in the Azure portal. The
machine name has a suffix -Test.
6. After the test is done, right-click the Azure VM in Replicating machines, and click Clean up test
migration.

Migrate the VMs


Now Contoso admins run a full failover to complete the migration.
1. In the Azure Migrate project > Servers > Azure Migrate: Server Migration, click Replicating servers.

2. In Replicating machines, right-click the VM > Migrate.


3. In Migrate > Shut down virtual machines and perform a planned migration with no data loss, select
Yes > OK.
By default Azure Migrate shuts down the on-premises VM, and runs an on-demand replication to
synchronize any VM changes that occurred since the last replication occurred. This ensures no data loss.
If you don't want to shut down the VM, select No.
4. A migration job starts for the VM. Track the job in Azure notifications.
5. After the job finishes, you can view and manage the VM from the Virtual Machines page.
Connect the VM to the database
As the final step in the migration process, Contoso admins update the connection string of the application to point to
the app database running on the OSTICKETMYSQL VM.
1. They make an SSH connection to the OSTICKETWEB VM using PuTTY or another SSH client. The VM is
private, so they connect using the private IP address.

2. They need to make sure that the OSTICKETWEB VM can communicate with the OSTICKETMYSQL VM.
Currently the configuration is hardcoded with the on-premises IP address 172.16.0.43, so they update it to
the private IP address of OSTICKETMYSQL in Azure.

3. They restart the service with systemctl restart apache2.

4. Finally, they update the DNS records for OSTICKETWEB and OSTICKETMYSQL, on one of the Contoso
domain controllers.
Need more help?
Learn about running a test failover.
Learn about migrating VMs to Azure.

Clean up after migration


With migration complete, the osTicket app tiers are now running on Azure VMs.
Now, Contoso needs to clean up as follows:
Remove the on-premises VMs from the vCenter inventory.
Remove the on-premises VMs from local backup jobs.
Update their internal documentation to show the new location, and IP addresses for OSTICKETWEB and
OSTICKETMYSQL.
Review any resources that interact with the VMs, and update any relevant settings or documentation to reflect
the new configuration.
Contoso used the Azure Migrate service with dependency mapping to assess the VMs for migration. Admins
should remove the Microsoft Monitoring Agent, and the Microsoft Dependency agent they installed for this
purpose, from the VM.

Review the deployment


With the app now running, Contoso needs to fully operationalize and secure their new infrastructure.
Security
The Contoso security team reviews the OSTICKETWEB and OSTICKETMYSQL VMs to identify any security
issues.
The team reviews the network security groups (NSGs) for the VMs to control access. NSGs are used to ensure
that only traffic allowed to the application can pass.
The team also considers securing the data on the VM disks using Disk encryption and Azure Key Vault.
For more information, see Security best practices for IaaS workloads in Azure.
BCDR
For business continuity and disaster recovery, Contoso takes the following actions:
Keep data safe. Contoso backs up the data on the VMs using the Azure Backup service. Learn more.
Keep apps up and running. Contoso replicates the app VMs in Azure to a secondary region using Site
Recovery. Learn more.
Licensing and cost optimization
After deploying resources, Contoso assigns Azure tags as defined during the Azure infrastructure deployment.
Contoso has no licensing issues with the Ubuntu servers.
Contoso will enable Azure Cost Management licensed by Cloudyn, a Microsoft subsidiary. It's a multicloud cost
management solution that helps you to use and manage Azure and other cloud resources. Learn more about
Azure Cost Management.
Rehost an on-premises Linux app to Azure VMs and
Azure Database for MySQL

This article shows how the fictional company Contoso rehosts a two-tier Linux-based Apache/MySQL/PHP
(LAMP) app, migrating it from on-premises to Azure using Azure VMs and Azure Database for MySQL.
osTicket, the service desk app used in this example, is provided as open source. If you'd like to use it for your own
testing, you can download it from GitHub.

Business drivers
The IT Leadership team has worked closely with business partners to understand what they want to achieve:
Address business growth. Contoso is growing, and as a result there's pressure on the on-premises systems
and infrastructure.
Limit risk. The service desk app is critical for the business. Contoso wants to move it to Azure with zero risk.
Extend. Contoso doesn't want to change the app right now. It simply wants to keep the app stable.

Migration goals
The Contoso cloud team has pinned down goals for this migration, in order to determine the best migration
method:
After migration, the app in Azure should have the same performance capabilities as it does today in their on-
premises VMware environment. The app will remain as critical in the cloud as it is on-premises.
Contoso doesn't want to invest in this app. It's important to the business, but in its current form Contoso simply
wants to move it safely to the cloud.
Having completed a couple of Windows app migrations, Contoso wants to learn how to use a Linux-based
infrastructure in Azure.
Contoso wants to minimize database admin tasks after the application is moved to the cloud.

Proposed architecture
In this scenario:
The app is tiered across two VMs (OSTICKETWEB and OSTICKETMYSQL).
The VMs are located on VMware ESXi host contosohost1.contoso.com (version 6.5).
The VMware environment is managed by vCenter Server 6.5 (vcenter.contoso.com), running on a VM.
Contoso has an on-premises datacenter (contoso-datacenter), with an on-premises domain controller
(contosodc1).
The web tier app on OSTICKETWEB will be migrated to an Azure IaaS VM.
The app database will be migrated to the Azure Database for MySQL PaaS service.
Since Contoso is migrating a production workload, the resources will reside in the production resource group
ContosoRG.
The resources will be replicated to the primary region (East US 2), and placed in the production network
(VNET-PROD-EUS2):
The web VM will reside in the front-end subnet (PROD-FE-EUS2).
The database instance will reside in the database subnet (PROD-DB-EUS2).
The app database will be migrated to Azure Database for MySQL using MySQL tools.
The on-premises VMs in the Contoso datacenter will be decommissioned after the migration is done.

Migration process
Contoso will complete the migration process as follows:
To migrate the web VM:
1. As a first step, Contoso sets up the Azure and on-premises infrastructure needed to deploy Site Recovery.
2. After preparing the Azure and on-premises components, Contoso sets up and enables replication for the web
VM.
3. After replication is up-and-running, Contoso migrates the VM by failing it over to Azure.
To migrate the database:
1. Contoso provisions a MySQL instance in Azure.
2. Contoso sets up MySQL Workbench, and backs up the database locally.
3. Contoso then restores the database from the local backup to Azure.
Azure services
SERVICE: Azure Site Recovery
DESCRIPTION: The service orchestrates and manages migration and disaster recovery for Azure VMs, and for
on-premises VMs and physical servers.
COST: During replication to Azure, Azure Storage charges are incurred. Azure VMs are created, and incur
charges, when failover occurs. Learn more about charges and pricing.

SERVICE: Azure Database for MySQL
DESCRIPTION: The database is based on the open-source MySQL Server engine. It provides a fully managed,
enterprise-ready community MySQL database as a service for app development and deployment.

Prerequisites
Here's what Contoso needs for this scenario.

REQUIREMENT: Azure subscription
DETAILS: Contoso created subscriptions during an earlier article. If you don't have an Azure subscription,
create a free account. If you create a free account, you're the administrator of your subscription and can
perform all actions. If you use an existing subscription and you're not the administrator, you need to work with
the admin to assign you Owner or Contributor permissions. If you need more granular permissions, review this
article.

REQUIREMENT: Azure infrastructure
DETAILS: Contoso set up the Azure infrastructure as described in Azure infrastructure for migration. Learn
more about specific network and storage requirements for Site Recovery.

REQUIREMENT: On-premises servers
DETAILS: The on-premises vCenter server should be running version 5.5, 6.0, or 6.5. An ESXi host running
version 5.5, 6.0, or 6.5. One or more VMware VMs running on the ESXi host.

REQUIREMENT: On-premises VMs
DETAILS: Review Linux VM requirements that are supported for migration with Site Recovery. Verify
supported Linux file and storage systems. VMs must meet Azure requirements.

Scenario steps
Here's how Contoso admins will complete the migration:
Step 1: Prepare Azure for Site Recovery. They create an Azure storage account to hold replicated data, and
create a Recovery Services vault.
Step 2: Prepare on-premises VMware for Site Recovery. They prepare accounts for VM discovery and
agent installation, and prepare to connect to Azure VMs after failover.
Step 3: Provision the database. In Azure, they provision an instance of Azure Database for MySQL.
Step 4: Replicate VMs. They configure the Site Recovery source and target environment, set up a replication
policy, and start replicating VMs to Azure storage.
Step 5: Migrate the database. They set up migration with MySQL tools.
Step 6: Migrate the VMs with Site Recovery. Lastly, they run a test failover to make sure everything's
working, and then run a full failover to migrate the VMs to Azure.

Step 1: Prepare Azure for the Site Recovery service


Contoso needs a couple of Azure components for Site Recovery:
A VNet in which failed-over resources are located. Contoso already created the VNet during the Azure
infrastructure deployment.
A new Azure storage account to hold replicated data.
A Recovery Services vault in Azure.
The Contoso admins create a storage account and vault as follows:
1. They create a storage account (contosovmsacc20180528) in the East US 2 region.
The storage account must be in the same region as the Recovery Services vault.
They use a general purpose account, with standard storage, and LRS replication.

2. With the network and storage account in place, they create a vault (ContosoMigrationVault), and place it in
the ContosoFailoverRG resource group, in the primary East US 2 region.
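For reference, the same storage account and vault could be created with the Azure CLI. This is a minimal sketch using the names and region from the scenario; the walkthrough itself uses the portal.

```bash
# Create the storage account that will hold replicated data (same region as the vault).
az storage account create \
  --name contosovmsacc20180528 \
  --resource-group ContosoFailoverRG \
  --location eastus2 \
  --sku Standard_LRS \
  --kind StorageV2

# Create the Recovery Services vault used for the migration.
az backup vault create \
  --name ContosoMigrationVault \
  --resource-group ContosoFailoverRG \
  --location eastus2
```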
Need more help?
Learn about setting up Azure for Site Recovery.

Step 2: Prepare on-premises VMware for Site Recovery


Contoso admins prepare the on-premises VMware infrastructure as follows:
They create an account on the vCenter server, to automate VM discovery.
They create an account that allows automatic installation of the Mobility service on VMware VMs that will be
replicated.
They prepare on-premises VMs, so that they can connect to Azure VMs when they're created after the
migration.
Prepare an account for automatic discovery
Site Recovery needs access to VMware servers to:
Automatically discover VMs. At least a read-only account is required.
Orchestrate replication, failover, and failback. You need an account that can run operations such as creating and
removing disks, and turning on VMs.
Contoso admins set up the account as follows:
1. They create a role at the vCenter level.
2. They then assign that role the required permissions.
Prepare an account for Mobility service installation
The Mobility service must be installed on each VM that Contoso wants to migrate.
Site Recovery can do an automatic push installation of this component when you enable replication for the
VMs.
For automatic installation, Site Recovery needs an account with permissions to access the VM.
Account details are input during replication setup.
The account can be a domain or local account, as long as it has installation permissions.
Prepare to connect to Azure VMs after failover
After failover to Azure, Contoso wants to be able to connect to the Azure VMs. To do this, Contoso admins need to
do the following:
To access over the internet, they enable SSH on the on-premises Linux VM before the migration. For Ubuntu
this can be completed using the following command: sudo apt-get install openssh-server -y.
After the failover, they should check Boot diagnostics to view a screenshot of the VM.
If this doesn't work, they need to verify that the VM is running, and review these troubleshooting tips.
Need more help?
Learn about creating and assigning a role for automatic discovery.
Learn about creating an account for push installation of the Mobility service.

Step 3: Provision Azure Database for MySQL


Contoso admins provision a MySQL database instance in the primary East US 2 region.
1. In the Azure portal, they create an Azure Database for MySQL resource.

2. They add the name contosoosticket for the Azure database. They add the database to the production
resource group ContosoRG, and specify credentials for it.
3. The on-premises MySQL database is version 5.7, so they select this version for compatibility. They use the
default sizes, which match their database requirements.
4. For Backup Redundancy Options, they select to use Geo-Redundant. This option allows them to restore
the database in their secondary Central US region if an outage occurs. They can only configure this option
when they provision the database.

5. In the VNET-PROD-EUS2 network > Service endpoints, they add a service endpoint (a database subnet)
for the SQL service.
6. After adding the subnet, they create a virtual network rule that allows access from the database subnet in
the production network.
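For reference, the database instance can also be provisioned with the Azure CLI. This is a hedged sketch that mirrors the scenario's choices (version 5.7, geo-redundant backup); the admin user name, password, and SKU are illustrative, and the classic single-server command shown here has since been superseded by az mysql flexible-server create.

```bash
# Provision the Azure Database for MySQL instance in the production resource group.
az mysql server create \
  --name contosoosticket \
  --resource-group ContosoRG \
  --location eastus2 \
  --admin-user osticketadmin \
  --admin-password '<strong-password>' \
  --sku-name GP_Gen5_2 \
  --version 5.7 \
  --geo-redundant-backup Enabled
```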

Step 4: Replicate the on-premises VMs


Before they can migrate the web VM to Azure, Contoso admins set up and enable replication.
Set a protection goal
1. In the vault, under the vault name (ContosoVMVault), they set a replication goal (Getting Started > Site
Recovery > Prepare infrastructure).
2. They specify that their machines are located on-premises, that they're VMware VMs, and that they want to
replicate to Azure.
Confirm deployment planning
To continue, they confirm that they've completed deployment planning by selecting Yes, I have done it. Contoso
is only migrating a single VM in this scenario and doesn't need deployment planning.
Set up the source environment
Contoso admins now configure the source environment. To do this, they use an OVF template to deploy a Site
Recovery configuration server as a highly available, on-premises VMware VM. After the configuration server is up
and running, they register it in the vault.
The configuration server runs several components:
The configuration server component that coordinates communications between on-premises and Azure and
manages data replication.
The process server that acts as a replication gateway. It receives replication data; optimizes it with caching,
compression, and encryption; and sends it to Azure storage.
The process server also installs Mobility Service on VMs you want to replicate and performs automatic
discovery of on-premises VMware VMs.
Contoso admins do this as follows:
1. They download the OVF template from Prepare Infrastructure > Source > Configuration Server.
2. They import the template into VMware to create the VM, and deploy the VM.
3. When they turn on the VM for the first time, it boots up into a Windows Server 2016 installation experience.
They accept the license agreement, and enter an administrator password.
4. After the installation finishes, they sign in to the VM as the administrator. At first sign-in, the Azure Site
Recovery Configuration Tool runs by default.
5. In the tool, they specify a name to use for registering the configuration server in the vault.
6. The tool checks that the VM can connect to Azure.
7. After the connection is established, they sign in to the Azure subscription. The credentials must have access
to the vault in which they'll register the configuration server.
8. The tool performs some configuration tasks and then reboots.
9. They sign in to the machine again, and the Configuration Server Management Wizard starts automatically.
10. In the wizard, they select the NIC to receive replication traffic. This setting can't be changed after it's
configured.
11. They select the subscription, resource group, and vault in which to register the configuration server.

12. Now, they download and install MySQL Server, and VMware PowerCLI.
13. After validation, they specify the FQDN or IP address of the vCenter server or vSphere host. They leave the
default port, and specify a friendly name for the vCenter server.
14. They input the account that they created for automatic discovery, and the credentials that Site Recovery will
use to automatically install the Mobility Service.

15. After registration finishes, in the Azure portal, they check that the configuration server and VMware server
are listed on the Source page in the vault. Discovery can take 15 minutes or more.
16. With everything in place, Site Recovery connects to VMware servers, and discovers VMs.
Set up the target
Now Contoso admins input target replication settings.
1. In Prepare infrastructure > Target, they select the target settings.
2. Site Recovery checks that there's an Azure storage account and network in the specified target.
Create a replication policy
With the source and target set up, Contoso admins are ready to create a replication policy.
1. In Prepare infrastructure > Replication Settings > Replication Policy > Create and Associate, they
create a policy ContosoMigrationPolicy.
2. They use the default settings:
RPO threshold: Default of 60 minutes. This value defines how often recovery points are created. An
alert is generated if continuous replication exceeds this limit.
Recovery point retention: Default of 24 hours. This value specifies how long the retention window
is for each recovery point. Replicated VMs can be recovered to any point in a window.
App-consistent snapshot frequency: Default of one hour. This value specifies the frequency at
which application-consistent snapshots are created.

3. The policy is automatically associated with the configuration server.

Need more help?


You can read a full walkthrough of all these steps in Set up disaster recovery for on-premises VMware VMs.
Detailed instructions are available to help you set up the source environment, deploy the configuration server,
and configure replication settings.
Learn more about the Azure Guest agent for Linux.
Enable replication for the Web VM
Now Contoso admins can start replicating the OSTICKETWEB VM.
1. In Replicate application > Source > +Replicate they select the source settings.
2. They indicate that they want to enable virtual machines, and select the source settings, including the vCenter
server, and the configuration server.

3. Now they specify the target settings. These include the resource group and network in which the Azure VM
will be located after failover, and the storage account in which replicated data will be stored.
4. They select OSTICKETWEB for replication.

5. In the VM properties, they select the account that should be used to automatically install the Mobility
Service on the VM.
6. In Replication settings > Configure replication settings, they check that the correct replication policy is
applied, and select Enable Replication. The Mobility service will be automatically installed.
7. They track replication progress in Jobs. After the Finalize Protection job runs, the machine is ready for
failover.
Need more help?
You can read a full walkthrough of all these steps in Enable replication.

Step 5: Migrate the database


Contoso admins migrate the database using backup and restore, with MySQL tools. They install MySQL
Workbench, back up the database from OSTICKETMYSQL, and then restore it to Azure Database for MySQL
Server.
Install MySQL Workbench
1. They check the prerequisites and download MySQL Workbench.
2. They install MySQL Workbench for Windows in accordance with the installation instructions.
3. In MySQL Workbench, they create a MySQL connection to OSTICKETMYSQL.

4. They export the database as osticket, to a local self-contained file.


5. After the database has been backed up locally, they create a connection to the Azure Database for MySQL
instance.

6. Now, they can import (restore) the database in the Azure Database for MySQL instance, from the self-
contained file. A new schema (osticket) is created for the instance.
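The same backup and restore can also be scripted with the standard MySQL client tools instead of MySQL Workbench. The following is a minimal sketch; the source host, user names, and file name are hypothetical, and single-server Azure Database for MySQL expects the user name in the user@servername form.

```bash
# Back up the osticket database from the on-premises VM to a self-contained file.
mysqldump -h 172.16.0.43 -u osticketuser -p --databases osticket > osticket-backup.sql

# Restore the backup to the Azure Database for MySQL instance.
mysql -h contosoosticket.mysql.database.azure.com \
      -u osticketadmin@contosoosticket -p < osticket-backup.sql
```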
Step 6: Migrate the VMs with Site Recovery
Finally, Contoso admins run a quick test failover, and then migrate the VM.
Run a test failover
Running a test failover helps verify that everything's working as expected, before the migration.
1. They run a test failover to the latest available point in time (Latest processed).
2. They select Shut down machine before beginning failover, so that Site Recovery attempts to shut down
the source VM before triggering the failover. Failover continues even if shutdown fails.
3. Test failover runs:
A prerequisites check runs to make sure all of the conditions required for migration are in place.
Failover processes the data, so that an Azure VM can be created. If they select the latest recovery point, a
recovery point is created from the data.
An Azure VM is created using the data processed in the previous step.
4. After the failover finishes, the replica Azure VM appears in the Azure portal. They check that the VM is the
appropriate size, that it's connected to the right network, and that it's running.
5. After verifying, they clean up the failover, and record and save any observations.
Migrate the VM
To migrate the VM, Contoso admins create a recovery plan that includes the VM, and fail the plan over to Azure.
1. They create a plan, and add OSTICKETWEB to it.
2. They run a failover on the plan. They select the latest recovery point, and specify that Site Recovery should
try to shut down the on-premises VM before triggering the failover. They can follow the failover progress on
the Jobs page.

3. During the failover, vCenter Server issues commands to stop the two VMs running on the ESXi host.
4. After the failover, they verify that the Azure VM appears as expected in the Azure portal.

5. After checking the VM, they complete the migration. This stops replication for the VM, and stops Site
Recovery billing for the VM.
Need more help?
Learn about running a test failover.
Learn how to create a recovery plan.
Learn about failing over to Azure.
Connect the VM to the database
As the final step in the migration process, Contoso admins update the connection string of the app to point to the
Azure Database for MySQL.
1. They make an SSH connection to the OSTICKETWEB VM using PuTTY or another SSH client. The VM is
private so they connect using the private IP address.
2. They update settings so that the OSTICKETWEB VM can communicate with the OSTICKETMYSQL
database, as shown in the sketch after this list. Currently the configuration is hardcoded with the on-premises
IP address 172.16.0.43.
Before the update:
After the update:

3. They restart the service with systemctl restart apache2.

4. Finally, they update the DNS records for OSTICKETWEB, on one of the Contoso domain controllers.
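As with the earlier scenario, the exact edit depends on where osTicket keeps its database settings. The following is a minimal sketch, assuming the values live in include/ost-config.php; the file path, server name, and account are hypothetical, and single-server Azure Database for MySQL expects the user@servername user name format.

```bash
# Hypothetical path and values, for illustration only.
CONFIG=/var/www/osticket/include/ost-config.php

# Point the app at the Azure Database for MySQL instance instead of the old on-premises IP.
sudo sed -i "s/172\.16\.0\.43/contosoosticket.mysql.database.azure.com/g" "$CONFIG"

# The database user may also need to be updated to the user@servername form.

# Restart Apache so the app picks up the new settings.
sudo systemctl restart apache2
```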
Clean up after migration
With migration complete, the osTicket app tiers are running on Azure VMs.
Now, Contoso needs to do the following:
Remove the VMware VMs from the vCenter inventory.
Remove the on-premises VMs from local backup jobs.
Update internal documentation to show the new locations and IP addresses.
Review any resources that interact with the on-premises VMs, and update any relevant settings or
documentation to reflect the new configuration.
Contoso used the Azure Migrate service with dependency mapping to assess the OSTICKETWEB VM for
migration. They should now remove the agents (the Microsoft Monitoring Agent and the Microsoft
Dependency agent) they installed for this purpose, from the VM.

Review the deployment


With the app now running, Contoso needs to fully operationalize and secure their new infrastructure.
Security
The Contoso security team reviews the VM and database to identify any security issues.
They review the network security groups (NSGs) for the VM, to control access. NSGs are used to ensure that
only traffic allowed to the application can pass.
They consider securing the data on the VM disks using Disk encryption and Azure Key Vault.
Communication between the VM and the database instance isn't configured for SSL. They will need to enable
SSL to help protect database traffic in transit.
For more information, see Security best practices for IaaS workloads in Azure.
BCDR
For business continuity and disaster recovery, Contoso takes the following actions:
Keep data safe. Contoso backs up the data on the app VM using the Azure Backup service. Learn more. They
don't need to configure backup for the database. Azure Database for MySQL automatically creates and stores
server backups. They selected to use geo-redundancy for the database, so it's resilient and production-ready.
Keep apps up and running. Contoso replicates the app VMs in Azure to a secondary region using Site
Recovery. Learn more.
Licensing and cost optimization
After deploying resources, Contoso assigns Azure tags, in accordance with decisions they made during the
Azure infrastructure deployment.
There are no licensing issues for the Contoso Ubuntu servers.
Contoso will enable Azure Cost Management licensed by Cloudyn, a Microsoft subsidiary. It's a multicloud cost
management solution that helps you to use and manage Azure and other cloud resources. Learn more about
Azure Cost Management.
Rehost an on-premises app on an Azure VM and
SQL Database Managed Instance

This article shows how the fictional company Contoso migrates a two-tier Windows .NET front-end app running
on VMware VMs to an Azure VM using the Azure Site Recovery service. It also shows how Contoso migrates the
app database to Azure SQL Database Managed Instance.
The SmartHotel360 app used in this example is provided as open source. If you'd like to use it for your own testing
purposes, you can download it from GitHub.

Business drivers
Contoso's IT leadership team has worked closely with the company's business partners to understand what the
business wants to achieve with this migration:
Address business growth. Contoso is growing. As a result, pressure has increased on the company's on-
premises systems and infrastructure.
Increase efficiency. Contoso needs to remove unnecessary procedures, and to streamline processes for its
developers and users. The business needs IT to be fast and to not waste time or money, so the company can
deliver faster on customer requirements.
Increase agility. Contoso IT needs to be more responsive to the needs of the business. It must be able to react
faster than the changes that occur in the marketplace for the company to be successful in a global economy. IT
at Contoso must not get in the way or become a business blocker.
Scale. As the company's business grows successfully, Contoso IT must provide systems that can grow at the
same pace.

Migration goals
The Contoso cloud team has identified goals for this migration. The company uses migration goals to determine
the best migration method.
After migration, the app in Azure should have the same performance capabilities that the app has today in
Contoso's on-premises VMware environment. Moving to the cloud doesn't mean that app performance is less
critical.
Contoso doesn't want to invest in the app. The app is critical and important to the business, but Contoso simply
wants to move the app in its current form to the cloud.
Database administration tasks should be minimized after the app is migrated.
Contoso doesn't want to use an Azure SQL Database for this app. It's looking for alternatives.

Solution design
After pinning down their goals and requirements, Contoso designs and reviews a deployment solution, and
identifies the migration process, including the Azure services that it will use for the migration.
Current architecture
Contoso has one main datacenter (contoso-datacenter). The datacenter is located in the city of New York in
the Eastern United States.
Contoso has three additional local branches across the United States.
The main datacenter is connected to the internet with a fiber Metro Ethernet connection (500 MBps).
Each branch is connected locally to the internet by using business-class connections with IPsec VPN tunnels
back to the main datacenter. The setup allows Contoso's entire network to be permanently connected and
optimizes internet connectivity.
The main datacenter is fully virtualized with VMware. Contoso has two ESXi 6.5 virtualization hosts that are
managed by vCenter Server 6.5.
Contoso uses Active Directory for identity management. Contoso uses DNS servers on the internal network.
Contoso has an on-premises domain controller (contosodc1).
The domain controllers run on VMware VMs. The domain controllers at local branches run on physical servers.
The SmartHotel360 app is tiered across two VMs (WEBVM and SQLVM) that are located on a VMware ESXi
version 6.5 host (contosohost1.contoso.com).
The VMware environment is managed by vCenter Server 6.5 (vcenter.contoso.com) running on a VM.

Proposed architecture
In this scenario, Contoso wants to migrate its two-tier on-premises travel app as follows:
Migrate the app database (SmartHotelDB ) to an Azure SQL Database Managed Instance.
Migrate the front-end WebVM to an Azure VM.
The on-premises VMs in the Contoso datacenter will be decommissioned when the migration is finished.

Database considerations
As part of the solution design process, Contoso did a feature comparison between Azure SQL Database and SQL
Server Managed Instance. The following considerations helped them to decide to go with Managed Instance.
Managed Instance aims to deliver almost 100% compatibility with the latest on-premises SQL Server version.
Microsoft recommends Managed Instance for customers running SQL Server on-premises or on IaaS VMs who
want to migrate their apps to a fully managed service with minimal design changes.
Contoso is planning to migrate a large number of apps from on-premises to IaaS. Many of these are ISV
provided. Contoso realizes that using Managed Instance will help ensure database compatibility for these apps,
rather than using SQL Database which might not be supported.
Contoso can simply do a lift and shift migration to Managed Instance using the fully automated Azure Database
Migration Service. With this service in place, Contoso can reuse it for future database migrations.
SQL Managed Instance supports SQL Server Agent, which is an important requirement for the SmartHotel360 app.
Contoso needs this compatibility, otherwise it will have to redesign maintenance plans required by the app.
With Software Assurance, Contoso can exchange their existing licenses for discounted rates on a SQL Database
Managed Instance using the Azure Hybrid Benefit for SQL Server. This can allow Contoso to save up to 30% on
Managed Instance.
SQL Managed Instance is fully contained in the virtual network, so it provides greater isolation and security for
Contoso's data. Contoso can get the benefits of the public cloud, while keeping the environment isolated from
the public Internet.
Managed Instance supports many security features, including Always Encrypted, dynamic data masking,
row-level security, and threat detection.
Solution review
Contoso evaluates the proposed design by putting together a pros and cons list.

CONSIDERATION: Pros
DETAILS:
WEBVM will be moved to Azure without changes, making the migration simple.
SQL Managed Instance supports Contoso's technical requirements and goals.
Managed Instance will provide 100% compatibility with their current deployment, while moving them away
from SQL Server 2008 R2.
They can take advantage of their investment in Software Assurance by using the Azure Hybrid Benefit for SQL
Server and Windows Server.
They can reuse the Azure Database Migration Service for additional future migrations.
SQL Managed Instance has built-in fault tolerance that Contoso doesn't need to configure. This ensures that
the data tier is no longer a single point of failover.

CONSIDERATION: Cons
DETAILS:
The WEBVM is running Windows Server 2008 R2. Although this operating system is supported by Azure, it is
no longer a supported platform. Learn more.
The web tier remains a single point of failover with only WEBVM providing services.
Contoso will need to continue supporting the app web tier as a VM rather than moving to a managed service,
such as Azure App Service.
For the data tier, Managed Instance might not be the best solution if Contoso wants to customize the operating
system or the database server, or if they want to run third-party apps along with SQL Server. Running SQL
Server on an IaaS VM could provide this flexibility.

Migration process
Contoso will migrate the web and data tiers of its SmartHotel360 app to Azure by completing these steps:
1. Contoso already has its Azure infrastructure in place, so it just needs to add a couple of specific Azure
components for this scenario.
2. The data tier will be migrated by using the Azure Database Migration Service. This service connects to the
on-premises SQL Server VM across a site-to-site VPN connection between the Contoso datacenter and
Azure. The service then migrates the database.
3. The web tier will be migrated using a lift and shift migration with Site Recovery. The process entails
preparing the on-premises VMware environment, setting up and enabling replication, and migrating the
VMs by failing them over to Azure.

Azure services
SERVICE: Azure Database Migration Service
DESCRIPTION: The Azure Database Migration Service enables seamless migration from multiple database
sources to Azure data platforms with minimal downtime.
COST: Learn about supported regions and Database Migration Service pricing.

SERVICE: Azure SQL Database Managed Instance
DESCRIPTION: Managed Instance is a managed database service that represents a fully managed SQL Server
instance in the Azure cloud. It uses the same code as the latest version of SQL Server Database Engine, and has
the latest features, performance improvements, and security patches.
COST: Using a SQL Database Managed Instance running in Azure incurs charges based on capacity. Learn
more about Managed Instance pricing.

SERVICE: Azure Site Recovery
DESCRIPTION: The Site Recovery service orchestrates and manages migration and disaster recovery for
Azure VMs, and on-premises VMs and physical servers.
COST: During replication to Azure, Azure Storage charges are incurred. Azure VMs are created and incur
charges when failover occurs. Learn more about Site Recovery charges and pricing.

Prerequisites
Contoso and other users must meet the following prerequisites for this scenario:

REQUIREMENT: Azure subscription
DETAILS: You should have already created a subscription when you performed the assessment in the first
article in this series. If you don't have an Azure subscription, create a free account. If you create a free account,
you're the administrator of your subscription and can perform all actions. If you use an existing subscription
and you're not the administrator of the subscription, you need to work with the admin to assign you Owner or
Contributor permissions. If you need more granular permissions, see Use role-based access control to manage
Site Recovery access.

REQUIREMENT: Azure infrastructure
DETAILS: Contoso set up their Azure infrastructure as described in Azure infrastructure for migration.

REQUIREMENT: Site Recovery (on-premises)
DETAILS: Your on-premises vCenter Server instance should be running version 5.5, 6.0, or 6.5. An ESXi host
running version 5.5, 6.0, or 6.5. One or more VMware VMs running on the ESXi host. VMs must meet Azure
requirements. Supported network and storage configuration.

REQUIREMENT: Database Migration Service
DETAILS: For the Azure Database Migration Service, you need a compatible on-premises VPN device. You
must be able to configure the on-premises VPN device. It must have an external-facing public IPv4 address.
The address can't be located behind a NAT device. Make sure you have access to your on-premises SQL Server
database. Windows Firewall should be able to access the source database engine. Learn how to configure
Windows Firewall for Database Engine access. If there's a firewall in front of your database machine, add rules
to allow access to the database and files via SMB port 445. The credentials that are used to connect to the
source SQL Server instance and to the target Managed Instance must be members of the sysadmin server role.
You need a network share in your on-premises database that the Azure Database Migration Service can use to
back up the source database. Make sure that the service account running the source SQL Server instance has
write permissions on the network share. Make a note of a Windows user and password that has full control
permissions on the network share. The Azure Database Migration Service impersonates these user credentials
to upload backup files to the Azure Storage container. The SQL Server Express installation process sets the
TCP/IP protocol to Disabled by default. Make sure that it's enabled.

Scenario steps
Here's how Contoso plans to set up the deployment:
Step 1: Set up a SQL Database Managed Instance. Contoso needs an existing managed instance to which
the on-premises SQL Server database will migrate.
Step 2: Prepare the Azure Database Migration Service. Contoso must register the database migration
provider, create an instance, and then create an Azure Database Migration Service project. Contoso also must
set up a shared access signature (SAS ) uniform resource identifier (URI) for the Azure Database Migration
Service. An SAS URI provides delegated access to resources in Contoso's storage account, so Contoso can
grant limited permissions to storage objects. Contoso sets up an SAS URI, so the Azure Database Migration
Service can access the storage account container to which the service uploads the SQL Server backup files.
Step 3: Prepare Azure for Site Recovery. Contoso must create a storage account to hold replicated data for
Site Recovery. It also must create an Azure Recovery Services vault.
Step 4: Prepare on-premises VMware for Site Recovery. Contoso will prepare accounts for VM discovery
and agent installation to connect to Azure VMs after failover.
Step 5: Replicate VMs. To set up replication, Contoso configures the Site Recovery source and target
environments, sets up a replication policy, and starts replicating VMs to Azure Storage.
Step 6: Migrate the database using the Azure Database Migration Service. Contoso migrates the
database.
Step 7: Migrate the VMs by using Site Recovery. Contoso runs a test failover to make sure everything's
working. Then, Contoso runs a full failover to migrate the VMs to Azure.

Step 1: Prepare a SQL Database Managed Instance


To set up an Azure SQL Database Managed Instance, Contoso needs a subnet that meets the following
requirements:
The subnet must be dedicated. It must be empty, and it can't contain any other cloud service. The subnet can't be
a gateway subnet.
After the Managed Instance is created, Contoso should not add resources to the subnet.
The subnet can't have a network security group associated with it.
The subnet must have a user-defined route table. The only route assigned should be 0.0.0.0/0 next-hop internet.
Optional custom DNS: If custom DNS is specified on the Azure virtual network, Azure's recursive resolvers IP
address (such as 168.63.129.16) must be added to the list. Learn how to configure custom DNS for a Managed
Instance.
The subnet must not have a service endpoint (storage or SQL ) associated with it. Service endpoints should be
disabled on the virtual network.
The subnet must have a minimum of 16 IP addresses. Learn how to size the Managed Instance subnet.
In Contoso's hybrid environment, custom DNS settings are required. Contoso configures DNS settings to use
one or more of the company's Azure DNS servers. Learn more about DNS customization.
Set up a virtual network for the Managed Instance
Contoso admins set up the virtual network as follows:
1. They create a new virtual network (VNET-SQLMI-EUS2) in the primary East US 2 region. They add the virtual
network to the ContosoNetworkingRG resource group.
2. They assign an address space of 10.235.0.0/24. They ensure that the range doesn't overlap with any other
networks in its enterprise.
3. They add two subnets to the network:
SQLMI-DS-EUS2 (10.235.0.0/25)
SQLMI-SAW-EUS2 (10.235.0.128/29). This subnet is used to attach a directory to the Managed
Instance.
4. After the virtual network and subnets are deployed, they peer networks as follows:
Peer VNET-SQLMI-EUS2 with VNET-HUB-EUS2 (the hub virtual network for East US 2).
Peer VNET-SQLMI-EUS2 with VNET-PROD-EUS2 (the production network).

5. They set custom DNS settings. DNS points first to Contoso's Azure domain controllers. Azure DNS is
secondary. The Contoso Azure domain controllers are located as follows:
Located in the PROD-DC-EUS2 subnet, in the East US 2 production network (VNET-PROD-EUS2)
CONTOSODC3 address: 10.245.42.4
CONTOSODC4 address: 10.245.42.5
Azure DNS resolver: 168.63.129.16
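The portal steps above can also be expressed with the Azure CLI. This is a minimal sketch using the names and address ranges from the scenario; it assumes both virtual networks are in the same subscription, and a matching peering is also needed in the opposite direction.

```bash
# Create the Managed Instance virtual network and its first subnet.
az network vnet create \
  --name VNET-SQLMI-EUS2 \
  --resource-group ContosoNetworkingRG \
  --location eastus2 \
  --address-prefixes 10.235.0.0/24 \
  --subnet-name SQLMI-DS-EUS2 \
  --subnet-prefixes 10.235.0.0/25

# Add the second subnet.
az network vnet subnet create \
  --name SQLMI-SAW-EUS2 \
  --resource-group ContosoNetworkingRG \
  --vnet-name VNET-SQLMI-EUS2 \
  --address-prefixes 10.235.0.128/29

# Peer the new network with the hub network.
# If the hub network lives in a different resource group, pass its full resource ID to --remote-vnet.
az network vnet peering create \
  --name SQLMI-to-Hub \
  --resource-group ContosoNetworkingRG \
  --vnet-name VNET-SQLMI-EUS2 \
  --remote-vnet VNET-HUB-EUS2 \
  --allow-vnet-access
```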
Need more help?
Get an overview of SQL Database Managed Instance.
Learn how to create a virtual network for a SQL Database Managed Instance.
Learn how to set up peering.
Learn how to update Azure Active Directory DNS settings.
Set up routing
The Managed Instance is placed in a private virtual network. Contoso needs a route table for the virtual network to
communicate with the Azure Management Service. If the virtual network can't communicate with the service that
manages it, the virtual network becomes inaccessible.
Contoso considers these factors:
The route table contains a set of rules (routes) that specify how packets sent from the Managed Instance should
be routed in the virtual network.
The route table is associated with subnets in which Managed Instances are deployed. Each packet that leaves a
subnet is handled based on the associated route table.
A subnet can be associated with only one route table.
There are no additional charges for creating route tables in Microsoft Azure.
To set up routing, Contoso admins do the following:
1. They create a user-defined route table in the ContosoNetworkingRG resource group.

2. To comply with Managed Instance requirements, after the route table (MIRouteTable) is deployed, they add
a route that has an address prefix of 0.0.0.0/0. The Next hop type option is set to Internet.
3. They associate the route table with the SQLMI-DB-EUS2 subnet (in the VNET-SQLMI-EUS2 network).
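A minimal Azure CLI sketch of the same routing setup; the table and subnet names are taken from the steps above, and the subnet name should match the Managed Instance subnet created earlier.

```bash
# Create the user-defined route table required by the Managed Instance.
az network route-table create \
  --name MIRouteTable \
  --resource-group ContosoNetworkingRG \
  --location eastus2

# Add the mandatory 0.0.0.0/0 route with next hop Internet.
az network route-table route create \
  --name default-to-internet \
  --resource-group ContosoNetworkingRG \
  --route-table-name MIRouteTable \
  --address-prefix 0.0.0.0/0 \
  --next-hop-type Internet

# Associate the route table with the Managed Instance subnet.
az network vnet subnet update \
  --name SQLMI-DB-EUS2 \
  --resource-group ContosoNetworkingRG \
  --vnet-name VNET-SQLMI-EUS2 \
  --route-table MIRouteTable
```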

Need more help?


Learn how to set up routes for a Managed Instance.
Create a Managed Instance
Now, Contoso admins can provision a SQL Database Managed Instance:
1. Because the Managed Instance serves a business app, they deploy the Managed Instance in the company's
primary East US 2 region. They add the Managed Instance to the ContosoRG resource group.
2. They select a pricing tier, compute size, and storage for the instance. Learn more about Managed Instance
pricing.
3. After the Managed Instance is deployed, two new resources appear in the ContosoRG resource group:
A virtual cluster in case Contoso has multiple Managed Instances.
The SQL Server Database Managed Instance.
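For reference, a Managed Instance can also be provisioned with the Azure CLI. The sketch below uses the scenario's resource names where they are known; the instance name, admin credentials, capacity, and edition values are illustrative, and deployment can take several hours.

```bash
# Provision the Managed Instance in the dedicated subnet created earlier.
# If the virtual network lives in a different resource group, pass the full subnet resource ID instead.
az sql mi create \
  --name contosomi \
  --resource-group ContosoRG \
  --location eastus2 \
  --admin-user contosoadmin \
  --admin-password '<strong-password>' \
  --vnet-name VNET-SQLMI-EUS2 \
  --subnet SQLMI-DS-EUS2 \
  --capacity 8 \
  --storage 256GB \
  --edition GeneralPurpose \
  --family Gen5
```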

Need more help?


Learn how to provision a Managed Instance.

Step 2: Prepare the Azure Database Migration Service


To prepare the Azure Database Migration Service, Contoso admins need to do a few things:
Register the Azure Database Migration Service provider in Azure.
Provide the Azure Database Migration Service with access to Azure Storage for uploading the backup files that
are used to migrate a database. To provide access to Azure Storage, they create an Azure Blob storage container.
They generate an SAS URI for the Blob storage container.
Create an Azure Database Migration Service project.
Then, they complete the following steps:
1. They register the database migration provider under its subscription.

2. They create a Blob storage container. Contoso generates an SAS URI so that the Azure Database Migration
Service can access it.

3. They create an Azure Database Migration Service instance.

4. They place the Azure Database Migration Service instance in the PROD-DC-EUS2 subnet of the
VNET-PROD-DC-EUS2 virtual network.
The Azure Database Migration Service is placed here because the service must be in a virtual
network that can access the on-premises SQL Server VM via a VPN gateway.
The VNET-PROD-EUS2 is peered to VNET-HUB-EUS2 and is allowed to use remote gateways.
The Use remote gateways option ensures that the Azure Database Migration Service can
communicate as required.
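A hedged Azure CLI sketch of the storage preparation: creating the Blob container and generating a SAS token that the Database Migration Service can use. The storage account name, container name, and expiry date are hypothetical; the commands need appropriate storage credentials (for example, an account key or --auth-mode login).

```bash
# Create the container that will hold the SQL Server backup files.
az storage container create \
  --name dms-backups \
  --account-name contosodmsstorage

# Generate a SAS token scoped to the container (read/write/list), valid until the given expiry.
az storage container generate-sas \
  --name dms-backups \
  --account-name contosodmsstorage \
  --permissions rwl \
  --expiry 2024-12-31T23:59:00Z \
  --output tsv
```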

Need more help?


Learn how to set up the Azure Database Migration Service.
Learn how to create and use SAS.

Step 3: Prepare Azure for the Site Recovery service


Several Azure elements are required for Contoso to set up Site Recovery for migration of its web tier VM
(WEBVM):
A virtual network in which failed-over resources are located.
A storage account to hold replicated data.
A Recovery Services vault in Azure.
Contoso admins set up Site Recovery as follows:
1. Because the VM is a web front end to the SmartHotel360 app, Contoso fails over the VM to its existing
production network (VNET-PROD-EUS2) and subnet (PROD-FE-EUS2). The network and subnet are
located in the primary East US 2 region. Contoso set up the network when it deployed the Azure
infrastructure.
2. They create a storage account (contosovmsacc20180528). Contoso uses a general-purpose account.
Contoso selects standard storage and locally redundant storage replication.
3. With the network and storage account in place, they create a vault (ContosoMigrationVault). Contoso
places the vault in the ContosoFailoverRG resource group, in the primary East US 2 region.

Need more help?


Learn how to set up Azure for Site Recovery.

Step 4: Prepare on-premises VMware for Site Recovery


To prepare VMware for Site Recovery, Contoso admins must complete these tasks:
Prepare an account on the vCenter Server instance or vSphere ESXi host. The account automates VM discovery.
Prepare an account that allows automatic installation of the Mobility Service on VMware VMs that Contoso
wants to replicate.
Prepare on-premises VMs to connect to Azure VMs when they're created after failover.
Prepare an account for automatic discovery
Site Recovery needs access to VMware servers to:
Automatically discover VMs. A minimum of a read-only account is required.
Orchestrate replication, failover, and failback. Contoso needs an account that can run operations such as
creating and removing disks and turning on VMs.
Contoso admins set up the account by completing these tasks:
1. They create a role at the vCenter level.
2. They assign the required permissions to that role.
Need more help?
Learn how to create and assign a role for automatic discovery.
Prepare an account for Mobility Service installation
The Mobility Service must be installed on the VM that Contoso wants to replicate. Contoso considers these factors
about the Mobility Service:
Site Recovery can do an automatic push installation of this component when Contoso enables replication for
the VM.
For automatic push installation, Contoso must prepare an account that Site Recovery uses to access the VM.
This account is specified when replication is configured in the Azure console.
Contoso must have a domain or local account with permissions to install on the VM.
Need more help?
Learn how to create an account for push installation of the Mobility Service.
Prepare to connect to Azure VMs after failover
After failover to Azure, Contoso wants to be able to connect to the replicated VMs in Azure. To connect to the
replicated VMs in Azure, Contoso admins must complete a few tasks on the on-premises VM before the migration:
1. For access over the internet, they enable RDP on the on-premises VM before failover. They ensure that TCP and
UDP rules are added for the Public profile, and that RDP is allowed in Windows Firewall > Allowed Apps
for all profiles.
2. For access over Contoso's site-to-site VPN, they enable RDP on the on-premises machine. They allow RDP in
Windows Firewall > Allowed apps and features for Domain and Private networks.
3. They set the operating system's SAN policy on the on-premises VM to OnlineAll.
Contoso admins also need to check these items when they run a failover:
There should be no Windows updates pending on the VM when a failover is triggered. If Windows updates are
pending, Contoso users can't sign in to the virtual machine until the update is finished.
After failover, admins should check Boot diagnostics to view a screenshot of the VM. If they can't view the
boot diagnostics, they should check that the VM is running, and then review troubleshooting tips.

Step 5: Replicate the on-premises VMs to Azure


Before running a migration to Azure, Contoso admins need to set up and enable replication for the on-premises
VM.
Set a replication goal
1. In the vault, under the vault name (ContosoVMVault), they set a replication goal (Getting Started > Site
Recovery > Prepare infrastructure).
2. They specify that the machines are located on-premises, that they're VMware VMs, replicating to Azure.
Confirm deployment planning
To continue, Contoso admins confirm that they've completed deployment planning. They select Yes, I have done
it. In this deployment, Contoso is migrating only a single VM, so deployment planning isn't needed.
Set up the source environment
Now, Contoso admins configure the source environment. To set up its source environment, they download an OVF
template, and use it to deploy the configuration server and its associated components as a highly available, on-
premises VMware VM. Components on the server include:
The configuration server that coordinates communications between the on-premises infrastructure and Azure.
The configuration server manages data replication.
The process server that acts as a replication gateway. The process server:
Receives replication data.
Optimizes replication data by using caching, compression, and encryption.
Sends replication data to Azure Storage.
The process server also installs the Mobility Service on the VMs that will be replicated. The process server
performs automatic discovery of on-premises VMware VMs.
After the configuration server VM is created and started, Contoso registers the server in the vault.
To set up the source environment Contoso admins do the following:
1. They download the OVF template from the Azure portal ( Prepare Infrastructure > Source >
Configuration Server).
2. They import the template into VMware to create and deploy the VM.
3. When they turn on the VM for the first time, it starts in a Windows Server 2016 installation experience. They
accept the license agreement and enter an administrator password.
4. When the installation is finished, they sign in to the VM as the administrator. At first sign-in, the Azure
Site Recovery Configuration Tool runs automatically.
5. In the Site Recovery Configuration Tool, they enter a name to use to register the configuration server in the
vault.
6. The tool checks the virtual machine's connection to Azure. After the connection is established, they select
Sign in to sign in to the Azure subscription. The credentials must have access to the vault in which the
configuration server is registered.
7. The tool performs some configuration tasks, and then reboots. They sign in to the machine again. The
Configuration Server Management Wizard starts automatically.
8. In the wizard, they select the NIC to receive replication traffic. This setting can't be changed after it's
configured.
9. They select the subscription, resource group, and Recovery Services vault in which to register the
configuration server.
10. They download and install MySQL Server and VMware PowerCLI. Then, they validate the server settings.
11. After validation, they enter the FQDN or IP address of the vCenter Server instance or vSphere host. They
leave the default port, and enter a display name for the vCenter Server instance in Azure.
12. They specify the account created earlier so that Site Recovery can automatically discover VMware VMs that
are available for replication.
13. They enter credentials, so the Mobility Service is automatically installed when replication is enabled. For
Windows machines, the account needs local administrator permissions on the VMs.

14. When registration is finished, in the Azure portal, they verify again that the configuration server and
VMware server are listed on the Source page in the vault. Discovery can take 15 minutes or more.
15. Site Recovery connects to VMware servers by using the specified settings, and discovers VMs.
Set up the target
Now, Contoso admins configure the target replication environment:
1. In Prepare infrastructure > Target, they select the target settings.
2. Site Recovery checks that there's a storage account and network in the specified target.
Create a replication policy
When the source and target are set up, Contoso admins create a replication policy and associate the policy with
the configuration server:
1. In Prepare infrastructure > Replication Settings > Replication Policy > Create and Associate, they
create the ContosoMigrationPolicy policy.
2. They use the default settings:
RPO threshold: Default of 60 minutes. This value defines how often recovery points are created. An
alert is generated if continuous replication exceeds this limit.
Recovery point retention: Default of 24 hours. This value specifies how long the retention window is
for each recovery point. Replicated VMs can be recovered to any point in a window.
App-consistent snapshot frequency: Default of 1 hour. This value specifies the frequency at which
application-consistent snapshots are created.

3. The policy is automatically associated with the configuration server.


Need more help?
You can read a full walkthrough of these steps in Set up disaster recovery for on-premises VMware VMs.
Detailed instructions are available to help you set up the source environment, deploy the configuration server,
and configure replication settings.
Enable replication
Now, Contoso admins can start replicating WebVM.
1. In Replicate application > Source > Replicate, they select the source settings.
2. They indicate that they want to enable virtual machines, select the vCenter Server instance, and set the
configuration server.

3. They specify the target settings, including the resource group and network in which the Azure VM will be
located after failover. They specify the storage account in which replicated data will be stored.
4. They select WebVM for replication. Site Recovery installs the Mobility Service on each VM when replication
is enabled.
5. They check that the correct replication policy is selected, and enable replication for WEBVM. They track
replication progress in Jobs. After the Finalize Protection job runs, the machine is ready for failover.
6. In Essentials in the Azure portal, they can see status for the VMs that are replicating to Azure:

Need more help?


You can read a full walkthrough of these steps in Enable replication.

Step 6: Migrate the database


Contoso admins need to create an Azure Database Migration Service project, and then migrate the database.
Create an Azure Database Migration Service project
1. They create an Azure Database Migration Service project. They select the SQL Server source server type,
and Azure SQL Database Managed Instance as the target.

2. The Migration Wizard opens.


Migrate the database
1. In the Migration Wizard, they specify the source VM on which the on-premises database is located. They
enter the credentials to access the database.
2. They select the database to migrate (SmartHotel.Registration):

3. For the target, they enter the name of the Managed Instance in Azure, and the access credentials.

4. In New Activity > Run Migration, they specify settings to run migration:
Source and target credentials.
The database to migrate.
The network share created on the on-premises VM. The Azure Database Migration Service takes
source backups to this share.
The service account that runs the source SQL Server instance must have write permissions on this
share.
The FQDN path to the share must be used.
The SAS URI that provides the Azure Database Migration Service with access to the storage account
container to which the service uploads the backup files for migration.

5. They save the migration settings, and then run the migration.
6. In Overview, they monitor the migration status.
7. When migration is finished, they verify that the target databases exist on the Managed Instance.

Step 7: Migrate the VM


Contoso admins run a quick test failover, and then migrate the VM.
Run a test failover
Before migrating WEBVM, a test failover helps ensure that everything works as expected. Admins complete the
following steps:
1. They run a test failover to the latest available point in time (Latest processed).
2. They select Shut down machine before beginning failover. With this option selected, Site Recovery
attempts to shut down the source VM before it triggers the failover. Failover continues, even if shutdown fails.
3. Test failover runs: a. A prerequisites check runs to make sure that all the conditions required for migration are in
place. b. Failover processes the data so that an Azure VM can be created. If the latest recovery point is selected,
a recovery point is created from the data. c. An Azure VM is created by using the data processed in the preceding
step.
4. When the failover is finished, the replica Azure VM appears in the Azure portal. They verify that everything is
working properly: the VM is the appropriate size, it's connected to the correct network, and it's running.
5. After verifying the test failover, they clean up the failover, and record any observations.
Migrate the VM
1. After verifying that the test failover worked as expected, Contoso admins create a recovery plan for
migration, and add WEBVM to the plan:

2. They run a failover on the plan, selecting the latest recovery point. They specify that Site Recovery should try
to shut down the on-premises VM before it triggers the failover.

3. After the failover, they verify that the Azure VM appears as expected in the Azure portal.
4. After verifying, they complete the migration. This finishes the migration process, stops replication for the VM,
and stops Site Recovery billing for the VM.

Update the connection string


As the final step in the migration process, Contoso admins update the connection string of the application to point
to the migrated database that's running on Contoso's Managed Instance.
1. In the Azure portal, they find the connection string by selecting Settings > Connection Strings.

2. They update the string with the user name and password of the SQL Database Managed Instance.
3. After the string is configured, they replace the current connection string in the web.config file of the application.
4. After updating the file and saving it, they restart IIS on WEBVM by running IISRESET /RESTART in a
Command Prompt window.
5. After IIS is restarted, the app uses the database that's running on the SQL Database Managed Instance.
6. At this point, they can shut down the on-premises SQLVM machine. The migration is complete.
Need more help?
Learn how to run a test failover.
Learn how to create a recovery plan.
Learn how to fail over to Azure.

Clean up after migration


With the migration complete, the SmartHotel360 app is running on an Azure VM and the SmartHotel360 database
is available in the Azure SQL Database Managed Instance.
Now, Contoso needs to do the following cleanup tasks:
Remove the WEBVM machine from the vCenter Server inventory.
Remove the SQLVM machine from the vCenter Server inventory.
Remove WEBVM and SQLVM from local backup jobs.
Update internal documentation to show the new location and IP address for WEBVM.
Remove SQLVM from internal documentation. Alternatively, Contoso can revise the documentation to show
SQLVM as deleted and no longer in the VM inventory.
Review any resources that interact with the decommissioned VMs. Update any relevant settings or
documentation to reflect the new configuration.

Review the deployment


With the migrated resources in Azure, Contoso needs to fully operationalize and secure its new infrastructure.
Security
The Contoso security team reviews the Azure VMs and SQL Database Managed Instance to check for any security
issues with its implementation:
The team reviews the network security groups that are used to control access for the VM. Network security
groups help ensure that only traffic that is allowed to the app can pass.
Contoso's security team also is considering securing the data on the disk by using Azure Disk Encryption
and Azure Key Vault.
The team enables threat detection on the Managed Instance. Threat detection sends an alert to Contoso's
security team/service desk system to open a ticket if a threat is detected. Learn more about threat detection
for Managed Instance.
To learn more about security practices for VMs, see Security best practices for IaaS workloads in Azure.
BCDR
For business continuity and disaster recovery (BCDR), Contoso takes the following actions:
Keep data safe: Contoso backs up the data on the VMs using the Azure Backup service. Learn more.
Keep apps up and running: Contoso replicates the app VMs in Azure to a secondary region using Site Recovery.
Learn more.
Contoso learns more about managing SQL Managed Instance, including database backups.
Licensing and cost optimization
Contoso has existing licensing for WEBVM. To take advantage of Azure Hybrid Benefit pricing, Contoso
converts the existing Azure VM.
Contoso enables Azure Cost Management licensed by Cloudyn, a Microsoft subsidiary. Cost Management is a
multicloud cost management solution that helps Contoso use and manage Azure and other cloud resources.
Learn more about Azure Cost Management.

Conclusion
In this article, Contoso rehosts the SmartHotel360 app in Azure by migrating the app front-end VM to Azure by
using the Site Recovery service. Contoso migrates the on-premises database to an Azure SQL Database Managed
Instance by using the Azure Database Migration Service.
Rehost an on-premises app on Azure VMs and SQL
Server Always On availability groups

This article demonstrates how the fictional company Contoso rehosts a two-tier Windows .NET app running on
VMware VMs as part of a migration to Azure. Contoso migrates the app front-end VM to an Azure VM, and the
app database to an Azure SQL Server VM, running in a Windows Server failover cluster with SQL Server Always
On availability groups.
The SmartHotel360 app used in this example is provided as open source. If you'd like to use it for your own testing
purposes, you can download it from GitHub.

Business drivers
The IT leadership team has worked closely with business partners to understand what they want to achieve with
this migration:
Address business growth. Contoso is growing, and as a result there is pressure on on-premises systems and
infrastructure.
Increase efficiency. Contoso needs to remove unnecessary procedures, and streamline processes for developers and users. The business needs IT to be fast and not waste time or money, so that it can deliver faster on customer requirements.
Increase agility. Contoso IT needs to be more responsive to the needs of the business. It must be able to react faster to changes in the marketplace, to enable success in a global economy. IT mustn't get in the way or become a business blocker.
Scale. As the business grows successfully, Contoso IT must provide systems that are able to grow at the same
pace.

Migration goals
The Contoso cloud team has pinned down goals for this migration. These goals were used to determine the best
migration method:
After migration, the app in Azure should have the same performance capabilities as it does today in VMware.
The app will remain as critical in the cloud as it is on-premises.
Contoso doesn't want to invest in this app. It is important to the business, but in its current form Contoso simply wants to move it safely to the cloud.
The on-premises database for the app has had availability issues. Contoso would like to deploy it in Azure as a
high-availability cluster, with failover capabilities.
Contoso wants to upgrade from their current SQL Server 2008 R2 platform, to SQL Server 2017.
Contoso doesn't want to use an Azure SQL Database for this app, and is looking for alternatives.

Solution design
After pinning down their goals and requirements, Contoso designs and reviews a deployment solution, and
identifies the migration process, including the Azure services that it will use for the migration.
Current architecture
The app is tiered across two VMs (WEBVM and SQLVM ).
The VMs are located on VMware ESXi host contosohost1.contoso.com (version 6.5).
The VMware environment is managed by vCenter Server 6.5 (vcenter.contoso.com ), running on a VM.
Contoso has an on-premises datacenter (contoso-datacenter), with an on-premises domain controller
(contosodc1).
Proposed architecture
In this scenario:
Contoso will migrate the app front-end WEBVM to an Azure IaaS VM.
The front-end VM in Azure will be deployed in the ContosoRG resource group (used for production
resources).
It will be located in the Azure production network (VNET-PROD-EUS2) in the primary East US 2 region.
The app database will be migrated to an Azure SQL Server VM.
It will be located in Contoso's Azure database network (PROD-DB-EUS2) in the primary East US 2 region.
It will be placed in a Windows Server failover cluster with two nodes, that uses SQL Server Always On
availability groups.
In Azure the two SQL Server VM nodes in the cluster will be deployed in the ContosoRG resource
group.
The VM nodes will be located in the Azure production network (VNET-PROD-EUS2) in the primary East US 2 region.
VMs will run Windows Server 2016 with SQL Server 2017 Enterprise Edition. Contoso doesn't have
licenses for this operating system, so it will use an image in the Azure Marketplace that provides the
license as a charge to their Azure EA commitment.
Apart from unique names, both VMs use the same settings.
Contoso will deploy an internal load balancer which listens for traffic on the cluster, and directs it to the
appropriate cluster node.
The internal load balancer will be deployed in the ContosoNetworkingRG (used for networking
resources).
The on-premises VMs in the Contoso datacenter will be decommissioned after the migration is done.

Database considerations
As part of the solution design process, Contoso did a feature comparison between Azure SQL Database and SQL
Server. The following considerations helped them to decide to go with an Azure IaaS VM running SQL Server:
Using an Azure VM running SQL Server seems to be an optimal solution if Contoso needs to customize the
operating system or the database server, or if it might want to colocate and run third-party apps on the same
VM.
Using the Data Migration Assistant, Contoso can easily assess and migrate to an Azure SQL Database.
Solution review
Contoso evaluates their proposed design by putting together a pros and cons list.

Pros:
- WEBVM will be moved to Azure without changes, making the migration simple.
- The SQL Server tier will run on SQL Server 2017 and Windows Server 2016. This retires the current Windows Server 2008 R2 operating system, and running SQL Server 2017 supports Contoso's technical requirements and goals. It provides 100% compatibility while moving away from SQL Server 2008 R2.
- Contoso can take advantage of its investment in Software Assurance, using the Azure Hybrid Benefit.
- A high-availability SQL Server deployment in Azure provides fault tolerance, so that the app data tier is no longer a single point of failure.

Cons:
- WEBVM is running Windows Server 2008 R2. The operating system is supported by Azure for specific roles (July 2018). Learn more.
- The web tier of the app will remain a single point of failure.
- Contoso will need to continue supporting the web tier as an Azure VM rather than moving to a managed service such as Azure App Service.
- With the chosen solution, Contoso will need to continue managing two SQL Server VMs rather than moving to a managed platform such as Azure SQL Database Managed Instance. In addition, with Software Assurance, Contoso could exchange existing licenses for discounted rates on Azure SQL Database Managed Instance.
Azure services
Data Migration Assistant (DMA):
- Description: DMA runs locally from the on-premises SQL Server machine, and migrates the database across a site-to-site VPN to Azure.
- Cost: DMA is a free, downloadable tool.

Azure Site Recovery:
- Description: Site Recovery orchestrates and manages migration and disaster recovery for Azure VMs, and on-premises VMs and physical servers.
- Cost: During replication to Azure, Azure Storage charges are incurred. Azure VMs are created, and incur charges, when failover occurs. Learn more about charges and pricing.

Migration process
Contoso admins will migrate the app VMs to Azure.
They'll migrate the front-end VM to Azure VM using Site Recovery:
As a first step, they'll prepare and set up Azure components, and prepare the on-premises VMware
infrastructure.
With everything prepared, they can start replicating the VM.
After replication is enabled and working, they migrate the VM by failing it over to Azure.
They'll migrate the database to a SQL Server cluster in Azure, using the Data Migration Assistant (DMA).
As a first step they'll need to provision SQL Server VMs in Azure, set up the cluster and an internal load
balancer, and configure Always On availability groups.
With this in place, they can migrate the database.
After the migration, they'll enable Always On protection for the database.

Prerequisites
Here's what Contoso needs to do for this scenario.

Azure subscription:
- Contoso already created a subscription in an earlier article in this series. If you don't have an Azure subscription, create a free account.
- If you create a free account, you're the administrator of your subscription and can perform all actions.
- If you use an existing subscription and you're not the administrator, you need to work with the admin to assign you Owner or Contributor permissions.
- If you need more granular permissions, review this article.

Azure infrastructure:
- Learn how Contoso set up an Azure infrastructure.
- Learn more about specific network and storage requirements for Site Recovery.

Site Recovery (on-premises):
- The on-premises vCenter server should be running version 5.5, 6.0, or 6.5.
- An ESXi host running version 5.5, 6.0, or 6.5.
- One or more VMware VMs running on the ESXi host.
- VMs must meet Azure requirements.
- Supported network and storage configuration.
- VMs you want to replicate must meet Azure requirements.

Scenario steps
Here's how Contoso will run the migration:
Step 1: Prepare a cluster. Create a cluster for deploying two SQL Server VM nodes in Azure.
Step 2: Deploy and set up the cluster. Prepare an Azure SQL Server cluster. Databases are migrated into this
existing cluster.
Step 3: Deploy the load balancer. Deploy a load balancer to balance traffic to the SQL Server nodes.
Step 4: Prepare Azure for Site Recovery. Create an Azure storage account to hold replicated data, and a
Recovery Services vault.
Step 5: Prepare on-premises VMware for Site Recovery. Prepare accounts for VM discovery and agent
installation. Prepare on-premises VMs so that users can connect to Azure VMs after migration.
Step 6: Replicate VMs. Enable VM replication to Azure.
Step 7: Install DMA. Download and install the Data Migration Assistant.
Step 8: Migrate the database with DMA. Migrate the database to Azure.
Step 9: Protect the database. Create an Always On availability group for the cluster.
Step 10: Migrate the web app VM. Run a test failover to make sure everything's working as expected. Then
run a full failover to Azure.

Step 1: Prepare a SQL Server Always On availability group cluster


Contoso admins set up the cluster as follows:
1. They create two SQL Server VMs by selecting SQL Server 2017 Enterprise Windows Server 2016 image in
the Azure Marketplace.

2. In the Create virtual machine Wizard > Basics, they configure:


Names for the VMs: SQLAOG1 and SQLAOG2.
Since machines are business-critical, they enable SSD for the VM disk type.
They specify machine credentials.
They deploy the VMs in the primary EAST US 2 region, in the ContosoRG resource group.
3. In Size, they start with the D2s_v3 SKU for both VMs. They'll scale later as they need to.
4. In Settings, they do the following:
Since these VMs are critical databases for the app, they use managed disks.
They place the machines in the production network of the EAST US 2 primary region (VNET-PROD-EUS2), in the database subnet (PROD-DB-EUS2).
They create a new availability set: SQLAOGAVSET, with two fault domains and five update domains.

5. In SQL Server settings, they limit SQL connectivity to the virtual network (private), on default port 1433.
For authentication they use the same credentials as they use onsite (contosoadmin).

Need more help?


Get help provisioning a SQL Server VM.
Learn about configuring VMs for different SQL Server SKUs.

Step 2: Deploy and set up the cluster


Here's how Contoso admins set up the cluster:
1. They set up an Azure storage account to act as the cloud witness.
2. They add the SQL Server VMs to the Active Directory domain in the Contoso on-premises datacenter.
3. They create the cluster in Azure.
4. They configure the cloud witness.
5. Lastly, they enable SQL Always On availability groups.
Set up a storage account as cloud witness
To set up a cloud witness, Contoso needs an Azure Storage account that will hold the blob file used for cluster
arbitration. The same storage account can be used to set up cloud witness for multiple clusters.
Contoso admins create a storage account as follows:
1. They specify a recognizable name for the account (contosocloudwitness).
2. They deploy a general-purpose account with LRS.
3. They place the account in a third region - South Central US. They place it outside the primary and
secondary region so that it remains available in case of regional failure.
4. They place it in their resource group that holds infrastructure resources, ContosoInfraRG.

5. When they create the storage account, primary and secondary access keys are generated for it. They need
the primary access key to create the cloud witness. The key appears under the storage account name >
Access Keys.
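
As a rough sketch (not the exact steps Contoso used), the witness account and its primary key could also be created and retrieved with Az PowerShell; the storage kind shown here is an assumption:

```powershell
# A minimal sketch, assuming the Az.Storage module. The account kind is an
# assumption; any general-purpose account with LRS works as a cloud witness.
New-AzStorageAccount -ResourceGroupName "ContosoInfraRG" -Name "contosocloudwitness" `
    -SkuName Standard_LRS -Location "southcentralus" -Kind StorageV2
$witnessKey = (Get-AzStorageAccountKey -ResourceGroupName "ContosoInfraRG" `
    -Name "contosocloudwitness")[0].Value   # primary access key used for the cloud witness
```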
Add SQL Server VMs to Contoso domain
1. Contoso adds SQLAOG1 and SQLAOG2 to the contoso.com domain.
2. Then, on each VM they install the Windows Failover Cluster Feature and Tools.
Set up the cluster
Before setting up the cluster, Contoso admins take a snapshot of the OS disk on each machine.

1. Then, they run a script they've put together to create the Windows Failover Cluster (a sketch of such a script follows these steps).
2. After they've created the cluster, they verify that the VMs appear as cluster nodes.
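
A minimal sketch of what that cluster-creation script might look like, assuming the failover clustering feature is installed on both nodes; the cluster name and static address are placeholders, not values from this scenario:

```powershell
# Run from one of the SQL Server VMs after both are domain-joined.
Install-WindowsFeature Failover-Clustering -IncludeManagementTools
New-Cluster -Name "SQLAOGCLUSTER" -Node "SQLAOG1","SQLAOG2" -NoStorage `
    -StaticAddress "<unused IP in PROD-DB-EUS2>"   # placeholder: an unused IP in the database subnet
Get-ClusterNode   # verify that SQLAOG1 and SQLAOG2 appear as cluster nodes
```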

Configure the cloud witness


1. Contoso admins configure the cloud witness using the Quorum Configuration Wizard in Failover Cluster
Manager.
2. In the wizard they select to create a cloud witness with the storage account.
3. After the cloud witness is configured, it appears in the Failover Cluster Manager snap-in.
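
The same quorum configuration can be scripted; a sketch, assuming the witness account key retrieved earlier is in $witnessKey:

```powershell
# Configure the cluster quorum to use the cloud witness storage account.
Set-ClusterQuorum -CloudWitness -AccountName "contosocloudwitness" -AccessKey $witnessKey
```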

Enable SQL Server Always On availability groups


Contoso admins can now enable Always On:
1. In SQL Server Configuration Manager, they enable Always On availability groups for the SQL Server
(MSSQLSERVER) service.
2. They restart the service for changes to take effect.
With Always On enabled, Contoso can set up the Always On availability group that will protect the SmartHotel360
database.
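
Enabling Always On can also be scripted with the SqlServer PowerShell module; a hedged sketch, assuming the default instance on each node and sufficient permissions to restart the service:

```powershell
# Enable Always On availability groups for the default instance on each node.
# -Force suppresses the confirmation prompt; the cmdlet restarts the SQL Server
# service so that the change takes effect.
Import-Module SqlServer
Enable-SqlAlwaysOn -ServerInstance "SQLAOG1" -Force
Enable-SqlAlwaysOn -ServerInstance "SQLAOG2" -Force
```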
Need more help?
Read about cloud witness and setting up a storage account for it.
Get instructions for setting up a cluster and creating an availability group.

Step 3: Deploy the Azure Load Balancer


Contoso admins now want to deploy an internal load balancer that sits in front of the cluster nodes. The load
balancer listens for traffic, and directs it to the appropriate node.
They create the load balancer as follows:
1. In the Azure portal > Networking > Load Balancer, they set up a new internal load balancer: ILB-PROD-DB-EUS2-SQLAOG.
2. They place the load balancer in the production network VNET-PROD-EUS2, in the database subnet PROD-DB-EUS2.
3. They assign it a static IP address: 10.245.40.100.
4. As a networking element, they deploy the load balancer in the networking resource group
ContosoNetworkingRG.
After the internal load balancer is deployed, they need to set it up. They create a back-end address pool, set up a
health probe, and configure a load balancing rule.
Add a back-end pool
To distribute traffic to the VMs in the cluster, Contoso admins set up a back-end address pool that contains the IP
addresses of the NICs for VMs that will receive network traffic from the load balancer.
1. In the load balancer settings in the portal, Contoso admins add a back-end pool: ILB-PROD-DB-EUS-SQLAOG-BEPOOL.
2. They associate the pool with availability set SQLAOGAVSET. The VMs in the set (SQLAOG1 and
SQLAOG2) are added to the pool.
Create a health probe
Contoso admins create a health probe so that the load balancer can monitor the app health. The probe dynamically
adds or removes VMs from the load balancer rotation, based on how they respond to health checks.
They create the probe as follows:
1. In the load balancer settings in the portal, Contoso creates a health probe: SQLAlwaysOnEndPointProbe.
2. They set the probe to monitor VMs on TCP port 59999.
3. They set an interval of 5 seconds between probes, and a threshold of 2. If two probes fail, the VM will be
considered unhealthy.
Configure the load balancer to receive traffic
Now, Contoso admins set up a load balancer rule to define how traffic is distributed to the VMs.
The front-end IP address handles incoming traffic.
The back-end IP pool receives the traffic.
They create the rule as follows:
1. In the load balancer settings in the portal, they add a new load balancing rule:
SQLAlwaysOnEndPointListener.
2. They set a front-end listener to receive incoming SQL client traffic on TCP 1433.
3. They specify the back-end pool to which traffic will be routed, and the port on which VMs listen for traffic.
4. They enable floating IP (direct server return). This is always required for SQL Always On.
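
The probe and rule described above could also be added with Az PowerShell. A hedged sketch against the existing internal load balancer, not the exact commands used in this scenario:

```powershell
# Add the health probe to the existing internal load balancer and push the change.
$ilb = Get-AzLoadBalancer -Name "ILB-PROD-DB-EUS2-SQLAOG" -ResourceGroupName "ContosoNetworkingRG"
$ilb | Add-AzLoadBalancerProbeConfig -Name "SQLAlwaysOnEndPointProbe" `
    -Protocol Tcp -Port 59999 -IntervalInSeconds 5 -ProbeCount 2 | Set-AzLoadBalancer

# Re-read the load balancer, then add the rule on TCP 1433 with floating IP enabled.
$ilb = Get-AzLoadBalancer -Name "ILB-PROD-DB-EUS2-SQLAOG" -ResourceGroupName "ContosoNetworkingRG"
$probe = Get-AzLoadBalancerProbeConfig -LoadBalancer $ilb -Name "SQLAlwaysOnEndPointProbe"
$ilb | Add-AzLoadBalancerRuleConfig -Name "SQLAlwaysOnEndPointListener" `
    -FrontendIpConfiguration $ilb.FrontendIpConfigurations[0] `
    -BackendAddressPool $ilb.BackendAddressPools[0] -Probe $probe `
    -Protocol Tcp -FrontendPort 1433 -BackendPort 1433 -EnableFloatingIP | Set-AzLoadBalancer
```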
Need more help?
Get an overview of Azure Load Balancer.
Learn about creating a load balancer.

Step 4: Prepare Azure for the Site Recovery service


Here are the Azure components Contoso needs to deploy Site Recovery:
A VNet in which VMs will be located when they're created during failover.
An Azure storage account to hold replicated data.
A Recovery Services vault in Azure.
Contoso admins set these up as follows:
1. Contoso already created a network/subnet they can use for Site Recovery when they deployed the Azure
infrastructure.
The SmartHotel360 app is a production app, and WEBVM will be migrated to the Azure production network (VNET-PROD-EUS2) in the primary East US 2 region.
WEBVM will be placed in the ContosoRG resource group, which is used for production resources, and in the production subnet (PROD-FE-EUS2).
2. Contoso admins create an Azure storage account (contosovmsacc20180528) in the primary region.
They use a general-purpose account, with standard storage, and LRS replication.
The account must be in the same region as the vault.
3. With the network and storage account in place, they now create a Recovery Services vault
(ContosoMigrationVault), and place it in the ContosoFailoverRG resource group, in the primary East US
2 region.
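
As a rough sketch, the replication storage account and the vault could be created with Az PowerShell; the storage kind shown is an assumption:

```powershell
# Replication storage account in the primary region (standard storage, LRS).
New-AzStorageAccount -ResourceGroupName "ContosoRG" -Name "contosovmsacc20180528" `
    -SkuName Standard_LRS -Location "eastus2" -Kind StorageV2

# Recovery Services vault for Site Recovery, in the failover resource group.
New-AzRecoveryServicesVault -Name "ContosoMigrationVault" `
    -ResourceGroupName "ContosoFailoverRG" -Location "eastus2"
```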

Need more help?


Learn about setting up Azure for Site Recovery.

Step 5: Prepare on-premises VMware for Site Recovery


Here's what Contoso admins prepare on-premises:
An account on the vCenter server or vSphere ESXi host, to automate VM discovery.
An account that allows automatic installation of the Mobility service on VMware VMs that you want to
replicate.
On-premises VM settings, so that Contoso can connect to the replicated Azure VM after failover.
Prepare an account for automatic discovery
Site Recovery needs access to VMware servers to:
Automatically discover VMs.
Orchestrate replication, failover, and failback.
At least a read-only account is required. You need an account that can run operations such as creating and
removing disks, and turning on VMs.
Contoso admins set up the account as follows:
1. They create a role at the vCenter level.
2. They then assign that role the required permissions.
Prepare an account for Mobility service installation
The Mobility service must be installed on each VM.
Site Recovery can do an automatic push installation of this component when replication is enabled for the VM.
You need an account that Site Recovery can use to access the VM for the push installation. You specify this
account when you set up replication in the Azure console.
The account can be domain or local, with permissions to install on the VM.
Prepare to connect to Azure VMs after failover
After failover, Contoso wants to be able to connect to Azure VMs. To do this, Contoso admins do the following
before migration:
1. For access over the internet they:
Enable RDP on the on-premises VM before failover.
Ensure that TCP and UDP rules are added for the Public profile.
Check that RDP is allowed in Windows Firewall > Allowed Apps for all profiles.
2. For access over site-to-site VPN, they:
Enable RDP on the on-premises machine.
Allow RDP in Windows Firewall > Allowed apps and features, for Domain and Private networks.
Set the operating system's SAN policy on the on-premises VM to OnlineAll.
In addition, when they run a failover they need to check the following:
There should be no Windows updates pending on the VM when triggering a failover. If there are, users won't be
able to log into the VM until the update completes.
After failover, they can check Boot diagnostics to view a screenshot of the VM. If this doesn't work, they
should verify that the VM is running, and review these troubleshooting tips.
Need more help?
Learn about creating and assigning a role for automatic discovery.
Learn about creating an account for push installation of the Mobility service.

Step 6: Replicate the on-premises VMs to Azure with Site Recovery


Before they can run a migration to Azure, Contoso admins need to set up and enable replication.
Set a replication goal
1. In the vault, under the vault name (ContosoVMVault), they select a replication goal (Getting Started > Site Recovery > Prepare infrastructure).
2. They specify that their machines are located on-premises, running on VMware, and replicating to Azure.
Confirm deployment planning
To continue, they need to confirm that they have completed deployment planning, by selecting Yes, I have done it.
In this scenario, Contoso is only migrating a single VM and doesn't need deployment planning.
Set up the source environment
Contoso admins need to configure their source environment. To do this, they download an OVF template and use it
to deploy the Site Recovery configuration server as a highly available, on-premises VMware VM. After the
configuration server is up and running, they register it in the vault.
The configuration server runs several components:
The configuration server component that coordinates communications between on-premises and Azure and
manages data replication.
The process server that acts as a replication gateway. It receives replication data; optimizes it with caching,
compression, and encryption; and sends it to Azure storage.
The process server also installs Mobility Service on VMs you want to replicate and performs automatic
discovery of on-premises VMware VMs.
Contoso admins perform these steps as follows:
1. In the vault, they download the OVF template from Prepare Infrastructure > Source > Configuration
Server.
2. They import the template into VMware to create and deploy the VM.
3. When they turn on the VM for the first time, it boots up into a Windows Server 2016 installation experience.
They accept the license agreement, and enter an administrator password.
4. After the installation finishes, they sign in to the VM as the administrator. At first sign-in, the Azure Site
Recovery Configuration Tool runs by default.
5. In the tool, they specify a name to use for registering the configuration server in the vault.
6. The tool checks that the VM can connect to Azure. After the connection is established, they sign in to the
Azure subscription. The credentials must have access to the vault in which you want to register the
configuration server.
7. The tool performs some configuration tasks and then reboots.
8. They sign in to the machine again, and the Configuration Server Management Wizard starts automatically.
9. In the wizard, they select the NIC to receive replication traffic. This setting can't be changed after it's
configured.
10. They select the subscription, resource group, and vault in which to register the configuration server.

11. They then download and install MySQL Server, and VMware PowerCLI.
12. After validation, they specify the FQDN or IP address of the vCenter server or vSphere host. They leave the
default port, and specify a friendly name for the vCenter server.
13. They specify the account that they created for automatic discovery, and the credentials that are used to
automatically install the Mobility Service. For Windows machines, the account needs local administrator
privileges on the VMs.

14. After registration finishes, in the Azure portal, they double check that the configuration server and VMware
server are listed on the Source page in the vault. Discovery can take 15 minutes or more.
15. Site Recovery then connects to VMware servers using the specified settings, and discovers VMs.
Set up the target
Now Contoso admins specify target replication settings.
1. In Prepare infrastructure > Target, they select the target settings.
2. Site Recovery checks that there's an Azure storage account and network in the specified target.
Create a replication policy
Now, Contoso admins can create a replication policy.
1. In Prepare infrastructure > Replication Settings > Replication Policy > Create and Associate, they
create a policy ContosoMigrationPolicy.
2. They use the default settings:
RPO threshold: Default of 60 minutes. This value defines how often recovery points are created. An
alert is generated if continuous replication exceeds this limit.
Recovery point retention: Default of 24 hours. This value specifies how long the retention window
is for each recovery point. Replicated VMs can be recovered to any point in a window.
App-consistent snapshot frequency: Default of one hour. This value specifies the frequency at
which application-consistent snapshots are created.

3. The policy is automatically associated with the configuration server.
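
For reference, a policy with these same default values could be created with Az PowerShell (a sketch, assuming the vault context has already been set with Set-AzRecoveryServicesAsrVaultContext):

```powershell
# Replication policy for the VMware-to-Azure scenario with the default values above.
New-AzRecoveryServicesAsrPolicy -VMwareToAzure -Name "ContosoMigrationPolicy" `
    -RecoveryPointRetentionInHours 24 `
    -ApplicationConsistentSnapshotFrequencyInHours 1 `
    -RPOWarningThresholdInMinutes 60
```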


Enable replication
Now Contoso admins can start replicating WEBVM.
1. In Replicate application > Source > +Replicate they select the source settings.
2. They indicate that they want to enable virtual machines, and select the vCenter server and the configuration server.

3. Now, they specify the target settings, including the resource group and VNet, and the storage account in
which replicated data will be stored.

4. They select WEBVM for replication, check the replication policy, and enable replication. Site Recovery installs the Mobility service on the VM when replication is enabled.

5. They track replication progress in Jobs. After the Finalize Protection job runs, the machine is ready for
failover.
6. In Essentials in the Azure portal, they can see the status for the VMs that are replicating to Azure.

Need more help?


You can read a full walkthrough of all these steps in Set up disaster recovery for on-premises VMware VMs.
Detailed instructions are available to help you set up the source environment, deploy the configuration server,
and configure replication settings.
You can learn more about enabling replication.

Step 7: Install the Data Migration Assistant (DMA)


Contoso admins will migrate the SmartHotel360 database to Azure VM SQLAOG1 using the DMA. They set up
DMA as follows:
1. They download the tool from the Microsoft Download Center to the on-premises SQL Server VM (SQLVM ).
2. They run setup (DownloadMigrationAssistant.msi) on the VM.
3. On the Finish page, they select Launch Microsoft Data Migration Assistant before finishing the wizard.

Step 8: Migrate the database with DMA


1. In the DMA they run a new migration, SmartHotel.
2. They select the Target server type as SQL Server on Azure Virtual Machines.

3. In the migration details, they add SQLVM as the source server, and SQLAOG1 as the target. They specify
credentials for each machine.
4. They create a local share for the database and configuration information. It must be accessible with write
access by the SQL Service account on SQLVM and SQLAOG1.

5. Contoso selects the logins that should be migrated, and starts the migration. After it finishes, DMA shows
the migration as successful.
6. They verify that the database is running on SQLAOG1.

DMA connects to the on-premises SQL Server VM across a site-to-site VPN connection between the Contoso
datacenter and Azure, and then migrates the database.

Step 9: Protect the database with Always On


With the app database running on SQLAOG1, Contoso admins can now protect it using Always On availability
groups. They configure Always On using SQL Management Studio, and then assign a listener using Windows
clustering.
Create an Always On availability group
1. In SQL Management Studio, they right-click Always On High Availability to start the New Availability Group Wizard.
2. In Specify Options, they name the availability group SHAOG. In Select Databases, they select the
SmartHotel360 database.
3. In Specify Replicas, they add the two SQL nodes as availability replicas, and configure them to provide
automatic failover with synchronous commit.

4. They configure a listener for the group (SHAOG) and port. The IP address of the internal load balancer is
added as a static IP address (10.245.40.100).

5. In Select Data Synchronization, they enable automatic seeding. With this option, SQL Server automatically creates the secondary replicas for every database in the group, so Contoso doesn't have to manually back up and restore them. After validation, the availability group is created.
6. Contoso ran into an issue when creating the group. They aren't using Active Directory Windows Integrated
security, and thus need to grant permissions to the SQL login to create the Windows Failover Cluster roles.

7. After the group is created, Contoso can see it in SQL Management Studio.
Configure a listener on the cluster
As a last step in setting up the SQL deployment, Contoso admins configure the internal load balancer as the listener on the cluster, and bring the listener online. They use a script to do this, along the lines of the sketch below.
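
A sketch of that script, based on Microsoft's published guidance for pointing an availability group listener at an internal load balancer; the cluster network name and the listener IP resource name are assumptions that must be replaced with the values shown in Failover Cluster Manager:

```powershell
# Point the listener's clustered IP resource at the internal load balancer IP
# and set the probe port that the load balancer health probe uses.
$ClusterNetworkName = "Cluster Network 1"       # assumption: check Failover Cluster Manager
$IPResourceName     = "SHAOG_10.245.40.100"     # assumption: IP resource of the SHAOG listener
$ILBIP              = "10.245.40.100"           # static IP of the internal load balancer

Import-Module FailoverClusters
Get-ClusterResource $IPResourceName | Set-ClusterParameter -Multiple @{
    "Address"    = $ILBIP
    "ProbePort"  = 59999
    "SubnetMask" = "255.255.255.255"
    "Network"    = $ClusterNetworkName
    "EnableDhcp" = 0
}
```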

Verify the configuration


With everything set up, Contoso now has a functional availability group in Azure that uses the migrated database.
Admins verify this by connecting to the internal load balancer in SQL Management Studio.

Need more help?


Learn about creating an availability group and listener.
Manually set up the cluster to use the load balancer IP address.
Learn more about creating and using SAS.

Step 10: Migrate the VM with Site Recovery


Contoso admins run a quick test failover, and then migrate the VM.
Run a test failover
Running a test failover helps ensure that everything's working as expected before the migration.
1. They run a test failover to the latest available point in time (Latest processed).
2. They select Shut down machine before beginning failover, so that Site Recovery attempts to shut down
the source VM before triggering the failover. Failover continues even if shutdown fails.
3. Test failover runs:
A prerequisites check runs to make sure all of the conditions required for migration are in place.
Failover processes the data, so that an Azure VM can be created. If the latest recovery point is selected, a recovery point is created from the data.
An Azure VM is created using the data processed in the previous step.
4. After the failover finishes, the replica Azure VM appears in the Azure portal. They check that the VM is the
appropriate size, that it's connected to the right network, and that it's running.
5. After verifying, they clean up the failover, and record and save any observations.
Run a failover
1. After verifying that the test failover worked as expected, Contoso admins create a recovery plan for
migration, and add WEBVM to the plan.
2. They run a failover on the plan. They select the latest recovery point, and specify that Site Recovery should
try to shut down the on-premises VM before triggering the failover.

3. After the failover, they verify that the Azure VM appears as expected in the Azure portal.

4. After verifying the VM in Azure, they complete the migration to finish the migration process, stop
replication for the VM, and stop Site Recovery billing for the VM.

Update the connection string


As the final step in the migration process, Contoso admins update the connection string of the application to point to the migrated database running on the SHAOG listener. This configuration is changed on the WEBVM, which is now running in Azure, and is located in the web.config file of the ASP.NET application.
1. They locate the file at C:\inetpub\SmartHotelWeb\web.config, and change the name of the server to reflect the FQDN of the availability group listener: shaog.contoso.com.

2. After updating the file and saving it, they restart IIS on WEBVM. They do this by running IISRESET /RESTART in a Command Prompt window.
3. After IIS has been restarted, the application uses the database running on the SQL Server Always On availability group.
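
A hypothetical sketch of scripting this change on WEBVM; it assumes the old server name appears literally in the connection string, which may not match the actual file contents:

```powershell
# Point the connection string at the availability group listener, then restart IIS.
# Assumes "SQLVM" is the literal server name currently in web.config.
$configPath = "C:\inetpub\SmartHotelWeb\web.config"
(Get-Content $configPath) -replace "SQLVM", "shaog.contoso.com" | Set-Content $configPath
iisreset /restart
```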
Need more help?
Learn about running a test failover.
Learn how to create a recovery plan.
Learn about failing over to Azure.
Clean up after migration
After migration, the SmartHotel360 app is running on an Azure VM, and the SmartHotel360 database is located in
the Azure SQL cluster.
Now, Contoso needs to complete these cleanup steps:
Remove the on-premises VMs from the vCenter inventory.
Remove the VMs from local backup jobs.
Update internal documentation to show the new locations and IP addresses for VMs.
Review any resources that interact with the decommissioned VMs, and update any relevant settings or
documentation to reflect the new configuration.
Add the two new VMs (SQLAOG1 and SQLAOG2) to production monitoring systems.
Review the deployment
With the migrated resources in Azure, Contoso needs to fully operationalize and secure their new infrastructure.
Security
The Contoso security team reviews the Azure VMs WEBVM, SQLAOG1 and SQLAOG2 to determine any security
issues.
The team reviews the network security groups (NSGs) for the VM to control access. NSGs are used to ensure
that only traffic allowed to the application can pass.
The team considers securing the data on the disk using Azure Disk Encryption and Key Vault.
The team should evaluate transparent data encryption (TDE), and then enable it on the SmartHotel360 database running on the new SQL AOG. Learn more.
For more information, see Security best practices for IaaS workloads in Azure.

BCDR
For business continuity and disaster recovery (BCDR), Contoso takes the following actions:
To keep data safe, Contoso backs up the data on the WEBVM, SQLAOG1 and SQLAOG2 VMs using the Azure
Backup service. Learn more.
Contoso will also learn about how to use Azure Storage to back up SQL Server directly to blob storage. Learn
more.
To keep apps up and running, Contoso replicates the app VMs in Azure to a secondary region using Site
Recovery. Learn more.
Licensing and cost optimization
1. Contoso has existing licensing for their WEBVM and will take advantage of the Azure Hybrid Benefit. Contoso
will convert the existing Azure VMs to take advantage of this pricing.
2. Contoso will enable Azure Cost Management licensed by Cloudyn, a Microsoft subsidiary. It's a multicloud cost
management solution that helps you to use and manage Azure and other cloud resources. Learn more about
Azure Cost Management.

Conclusion
In this article, Contoso rehosted the SmartHotel360 app in Azure by migrating the app front-end VM to Azure
using the Site Recovery service. Contoso migrated the app database to a SQL Server cluster provisioned in Azure,
and protected it in a SQL Server Always On availability group.
Refactor an on-premises app to an Azure App
Service web app and Azure SQL database

This article demonstrates how the fictional company Contoso refactors a two-tier Windows .NET app running on
VMware VMs as part of a migration to Azure. They migrate the app front-end VM to an Azure App Service web
app, and the app database to an Azure SQL database.
The SmartHotel360 app used in this example is provided as open source. If you'd like to use it for your own testing
purposes, you can download it from GitHub.

Business drivers
The IT leadership team has worked closely with business partners to understand what they want to achieve with
this migration:
Address business growth. Contoso is growing, and there is pressure on on-premises systems and
infrastructure.
Increase efficiency. Contoso needs to remove unnecessary procedures, and streamline processes for developers and users. The business needs IT to be fast and not waste time or money, so that it can deliver faster on customer requirements.
Increase agility. Contoso IT needs to be more responsive to the needs of the business. It must be able to react faster to changes in the marketplace, to enable success in a global economy. It mustn't get in the way or become a business blocker.
Scale. As the business grows successfully, Contoso IT must provide systems that are able to grow at the same
pace.
Reduce costs. Contoso wants to minimize licensing costs.

Migration goals
The Contoso cloud team has pinned down goals for this migration. These goals were used to determine the best
migration method.

App:
- The app in Azure will remain as critical as it is today.
- It should have the same performance capabilities as it currently does in VMware.
- The team doesn't want to invest in the app. For now, admins will simply move the app safely to the cloud.
- The team wants to stop supporting Windows Server 2008 R2, on which the app currently runs.
- The team also wants to move away from SQL Server 2008 R2 to a modern PaaS database platform, which will minimize the need for management.
- Contoso wants to take advantage of its investment in SQL Server licensing and Software Assurance where possible.
- In addition, Contoso wants to mitigate the single point of failure on the web tier.

Limitations:
- The app consists of an ASP.NET app and a WCF service running on the same VM. They want to split this across two web apps using the Azure App Service.

Azure:
- Contoso wants to move the app to Azure, but doesn't want to run it on VMs. Contoso wants to use Azure PaaS services for both the web and data tiers.

DevOps:
- Contoso wants to move to a DevOps model, using Azure DevOps for their builds and release pipelines.

Solution design
After pinning down goals and requirements, Contoso designs and reviews a deployment solution, and identifies the
migration process, including the Azure services that will be used for migration.
Current app
The SmartHotel360 on-premises app is tiered across two VMs (WEBVM and SQLVM ).
The VMs are located on VMware ESXi host contosohost1.contoso.com (version 6.5).
The VMware environment is managed by vCenter Server 6.5 (vcenter.contoso.com ), running on a VM.
Contoso has an on-premises datacenter (contoso-datacenter), with an on-premises domain controller
(contosodc1).
The on-premises VMs in the Contoso datacenter will be decommissioned after the migration is done.
Proposed solution
For the database tier of the app, Contoso compared Azure SQL Database with SQL Server using this article.
Contoso decided to go with Azure SQL Database for a few reasons:
Azure SQL Database is a relational-database managed service. It delivers predictable performance at
multiple service levels, with near-zero administration. Advantages include dynamic scalability with no
downtime, built-in intelligent optimization, and global scalability and availability.
Contoso can use the lightweight Data Migration Assistant (DMA) to assess and migrate the on-premises
database to Azure SQL.
With Software Assurance, Contoso can exchange existing licenses for discounted rates on a SQL
Database, using the Azure Hybrid Benefit for SQL Server. This could provide savings of up to 30%.
SQL Database provides security features such as Always Encrypted, dynamic data masking, and row-level security/threat detection.
For the app web tier, Contoso has decided to use Azure App Service. This PaaS service enables them to deploy the app with just a few configuration changes. Contoso will use Visual Studio to make the changes, and deploy two web apps: one for the website, and one for the WCF service.
To meet requirements for a DevOps pipeline, Contoso has selected to use Azure DevOps for Source Code Management (SCM) with Git repos. Automated builds and releases will be used to build the code, and deploy it to the Azure App Service.
Solution review
Contoso evaluates their proposed design by putting together a pros and cons list.

Pros:
- The SmartHotel360 app code won't need to be altered for migration to Azure.
- Contoso can take advantage of its investment in Software Assurance, using the Azure Hybrid Benefit for both SQL Server and Windows Server.
- After the migration, Windows Server 2008 R2 won't need to be supported. Learn more.
- Contoso can configure the web tier of the app with multiple instances, so that it's no longer a single point of failure.
- The database will no longer depend on the aging SQL Server 2008 R2.
- SQL Database supports the technical requirements. Contoso assessed the on-premises database using the Data Migration Assistant and found that it's compatible.
- Azure SQL Database has built-in fault tolerance that Contoso doesn't need to set up. This ensures that the data tier is no longer a single point of failure.

Cons:
- Azure App Service only supports one app deployment for each web app. This means that two web apps must be provisioned (one for the website and one for the WCF service).
- If Contoso uses the Data Migration Assistant instead of Azure Database Migration Service to migrate the database, it won't have the infrastructure ready for migrating databases at scale. Contoso will need to build another region to ensure failover if the primary region is unavailable.

Proposed architecture
Migration process
1. Contoso provisions an Azure SQL instance, and migrates the SmartHotel360 database to it.
2. Contoso provisions and configures web apps, and deploys the SmartHotel360 app to them.

Azure services
Data Migration Assistant (DMA):
- Description: Contoso will use DMA to assess and detect compatibility issues that might affect their database functionality in Azure. DMA assesses feature parity between SQL sources and targets, and recommends performance and reliability improvements.
- Cost: It's a downloadable tool, free of charge.

Azure SQL Database:
- Description: An intelligent, fully managed relational cloud database service.
- Cost: Based on features, throughput, and size. Learn more.

Azure App Service:
- Description: Create powerful cloud apps using a fully managed platform.
- Cost: Based on size, location, and usage duration. Learn more.

Azure DevOps:
- Description: Provides a continuous integration and continuous deployment (CI/CD) pipeline for app development. The pipeline starts with a Git repository for managing app code, a build system for producing packages and other build artifacts, and a Release Management system to deploy changes in dev, test, and production environments.

Prerequisites
Here's what Contoso needs to run this scenario:

Azure subscription:
- Contoso created subscriptions during an earlier article. If you don't have an Azure subscription, create a free account.
- If you create a free account, you're the administrator of your subscription and can perform all actions.
- If you use an existing subscription and you're not the administrator, you need to work with the admin to assign you Owner or Contributor permissions.

Azure infrastructure:
- Learn how Contoso set up an Azure infrastructure.

Scenario steps
Here's how Contoso will run the migration:
Step 1: Provision a SQL Database instance in Azure. Contoso provisions a SQL instance in Azure. After the app website is migrated to Azure, the WCF service web app will point to this instance.
Step 2: Migrate the database with DMA. Contoso migrates the app database with the Data Migration
Assistant.
Step 3: Provision web apps. Contoso provisions the two web apps.
Step 4: Set up Azure DevOps. Contoso creates a new Azure DevOps project, and imports the Git repo.
Step 5: Configure connection strings. Contoso configures connection strings so that the web tier web app,
the WCF service web app, and the SQL instance can communicate.
Step 6: Set up build and release pipelines. As a final step, Contoso sets up build and release pipelines to create the app, and deploys them to two separate web apps.

Step 1: Provision an Azure SQL Database


1. Contoso admins select to create a SQL Database in Azure.
2. They specify a database name to match the database running on the on-premises VM
(SmartHotel.Registration). They place the database in the ContosoRG resource group. This is the resource
group they use for production resources in Azure.

3. They set up a new SQL Server instance (sql-smarthotel-eus2) in the primary region.
4. They set the pricing tier to match their server and database needs. They also select to save money with the Azure Hybrid Benefit, because they already have a SQL Server license.
5. For sizing, they use vCore-based purchasing, and set the limits for their expected requirements.

6. Then they create the database instance.


7. After the instance is created, they open the database, and note details they need when they use the Data
Migration Assistant for migration.
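
As a rough Az PowerShell sketch of the same provisioning (the admin credentials and vCore sizing values here are placeholders, not Contoso's actual settings):

```powershell
# Logical SQL server in the primary region, then the database with vCore sizing
# and Azure Hybrid Benefit (LicenseType BasePrice). Values are illustrative.
New-AzSqlServer -ResourceGroupName "ContosoRG" -ServerName "sql-smarthotel-eus2" `
    -Location "eastus2" -SqlAdministratorCredentials (Get-Credential)
New-AzSqlDatabase -ResourceGroupName "ContosoRG" -ServerName "sql-smarthotel-eus2" `
    -DatabaseName "SmartHotel.Registration" -Edition "GeneralPurpose" `
    -VCore 2 -ComputeGeneration "Gen5" -LicenseType "BasePrice"
```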

Need more help?


Get help provisioning a SQL Database.
Learn about vCore resource limits.

Step 2: Migrate the database with DMA


Contoso admins will migrate the SmartHotel360 database using DMA.
Install DMA
1. They download the tool from the Microsoft Download Center to the on-premises SQL Server VM (SQLVM ).
2. They run setup (DownloadMigrationAssistant.msi) on the VM.
3. On the Finish page, they select Launch Microsoft Data Migration Assistant before finishing the wizard.
Migrate the database with DMA
1. In the DMA, they create a new project (SmartHotelDB ) and select Migration.
2. They select the source server type as SQL Server, and the target as Azure SQL Database.

3. In the migration details, they add SQLVM as the source server, and the SmartHotel.Registration database.

4. They receive an error that seems to be associated with authentication. However, after investigating, they find that the issue is the period (.) in the database name. As a workaround, they provision a new SQL database named SmartHotel-Registration to resolve the issue. When they run DMA again, they're able to select SmartHotel-Registration, and continue with the wizard.
5. In Select Objects, they select the database tables, and generate a SQL script.

6. After DMA creates the script, they select Deploy schema.


7. DMA confirms that the deployment succeeded.

8. Now they start the migration.


9. After the migration finishes, Contoso admins can verify that the database is running on the Azure SQL
instance.

10. They delete the extra SQL database SmartHotel.Registration in the Azure portal.

Step 3: Provision web apps


With the database migrated, Contoso admins can now provision the two web apps.
1. They select Web App in the portal.

2. They provide an app name (SHWEB-EUS2), run it on Windows, and place it in the production resource group ContosoRG. They create a new web app and Azure App Service plan.

3. After the web app is provisioned, they repeat the process to create a web app for the WCF service (SHWCF-EUS2).
4. After they're done, they browse to the address of the apps to check they've been created successfully.
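
A hedged sketch of the equivalent provisioning with Az PowerShell; the App Service plan name and tier are assumptions:

```powershell
# One App Service plan (name and tier are placeholders), then the two web apps.
New-AzAppServicePlan -ResourceGroupName "ContosoRG" -Name "contoso-smarthotel-plan" `
    -Location "eastus2" -Tier "Standard"
New-AzWebApp -ResourceGroupName "ContosoRG" -Name "SHWEB-EUS2" `
    -Location "eastus2" -AppServicePlan "contoso-smarthotel-plan"
New-AzWebApp -ResourceGroupName "ContosoRG" -Name "SHWCF-EUS2" `
    -Location "eastus2" -AppServicePlan "contoso-smarthotel-plan"
```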

Step 4: Set up Azure DevOps


Contoso needs to build the DevOps infrastructure and pipelines for the application. To do this, Contoso admins
create a new DevOps project, import the code, and then set up build and release pipelines.
1. In the Contoso Azure DevOps account, they create a new project (ContosoSmartHotelRefactor), and
select Git for version control.
2. They import the Git Repo that currently holds their app code. It's in a public repo and you can download it.

3. After the code is imported, they connect Visual Studio to the repo, and clone the code using Team Explorer.
4. After the repository is cloned to the developer machine, they open the solution file for the app. The web app and WCF service each have a separate project within the file.

Step 5: Configure connection strings


Contoso admins need to make sure the web apps and database can all communicate. To do this, they configure
connection strings in the code and in the web apps.
1. In the web app for the WCF service (SHWCF-EUS2) > Settings > Application settings, they add a new connection string named DefaultConnection.
2. The connection string is pulled from the SmartHotel-Registration database, and should be updated with
the correct credentials.
3. Using Visual Studio, they open the SmartHotel.Registration.wcf project from the solution file. The
connectionStrings section of the web.config file for the WCF service SmartHotel.Registration.Wcf should
be updated with the connection string.

4. The client section of the web.config file for the SmartHotel.Registration.Web should be changed to point to
the new location of the WCF service. This is the URL of the WCF web app hosting the service endpoint.

5. After the changes are in the code, admins need to commit the changes. Using Team Explorer in Visual
Studio, they commit and sync.
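
For reference, the portal step that adds the DefaultConnection string could also be scripted with Az PowerShell; a sketch with placeholder credential values:

```powershell
# Set the DefaultConnection string on the WCF web app. The server, database,
# user, and password values are placeholders to be replaced with real ones.
$connStrings = @{
    DefaultConnection = @{
        Type  = "SQLAzure"
        Value = "Server=tcp:sql-smarthotel-eus2.database.windows.net,1433;Database=SmartHotel-Registration;User ID=<user>;Password=<password>;Encrypt=True;"
    }
}
Set-AzWebApp -ResourceGroupName "ContosoRG" -Name "SHWCF-EUS2" -ConnectionStrings $connStrings
```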

Step 6: Set up build and release pipelines in Azure DevOps


Contoso admins now configure Azure DevOps to perform the build and release process.
1. In Azure DevOps, they select Build and release > New pipeline.
2. They select Azure Repos Git and the relevant repo.

3. In Select a template, they select the ASP.NET template for their build.
4. The name ContosoSmartHotelRefactor-ASP.NET-CI is used for the build. They select Save & Queue.

5. This kicks off the first build. They select the build number to watch the process. After it's finished they can
see the process feedback, and select Artifacts to review the build results.

6. The folder Drop contains the build results.


The two zip files are the packages that contain the apps.
These files are used in the release pipeline for deployment to Azure App Service.
7. They select Releases > +New pipeline.

8. They select the deployment template for Azure App Service.


9. They name the release pipeline ContosoSmartHotel360Refactor, and specify the name of the WCF web app (SHWCF-EUS2) for the Stage name.

10. Under the stages, they select 1 job, 1 task to configure deployment of the WCF service.

11. They verify the subscription is selected and authorized, and select the App service name.
12. On the pipeline > Artifacts, they select +Add an artifact, and select to build with the
ContosoSmarthotel360Refactor pipeline.
13. They select the lightning bolt on the artifact to enable the continuous deployment trigger.

14. The continuous deployment trigger should be set to Enabled.


15. Now, they move back to the Stage 1 job, 1 task, and select Deploy Azure App Service.

16. In Select a file or folder, they locate the SmartHotel.Registration.Wcf.zip file that was created during the build, and select Save.

17. They select Pipeline > Stages +Add, to add an environment for SHWEB-EUS2. They select another Azure App Service deployment.

18. They repeat the process to publish the web app (SmartHotel.Registration.Web.zip) file to the correct web
app.

19. After it's saved, the release pipeline will show as follows.

20. They move back to Build, and select Triggers > Enable continuous integration. This enables the pipeline so that when changes are committed to the code, a full build and release occurs.
21. They select Save & Queue to run the full pipeline. A new build is triggered that in turn creates the first
release of the app to the Azure App Service.

22. Contoso admins can follow the build and release pipeline process from Azure DevOps. After the build
completes, the release will start.

23. After the pipeline finishes, both sites have been deployed and the app is up and running online.
At this point, the app is successfully migrated to Azure.

Clean up after migration


After migration, Contoso needs to complete these cleanup steps:
Remove the on-premises VMs from the vCenter inventory.
Remove the VMs from local backup jobs.
Update internal documentation to show the new locations for the SmartHotel360 app. Show the database as
running in Azure SQL database, and the front end as running in two web apps.
Review any resources that interact with the decommissioned VMs, and update any relevant settings or
documentation to reflect the new configuration.

Review the deployment


With the migrated resources in Azure, Contoso needs to fully operationalize and secure their new infrastructure.
Security
Contoso needs to ensure that their new SmartHotel-Registration database is secure. Learn more.
In particular, Contoso should update the web apps to use SSL with certificates.
Backups
Contoso needs to review backup requirements for the Azure SQL Database. Learn more.
Contoso also needs to learn about managing SQL Database backups and restores. Learn more about automatic
backups.
Contoso should consider implementing failover groups to provide regional failover for the database. Learn
more.
Contoso needs to consider deploying the web app in the main East US 2 and Central US region for resilience.
Contoso could configure Traffic Manager to ensure failover in case of regional outages.
Licensing and cost optimization
After all resources are deployed, Contoso should assign Azure tags based on their infrastructure planning.
All licensing is built into the cost of the PaaS services that Contoso is consuming. This will be deducted from the
EA.
Contoso will enable Azure Cost Management licensed by Cloudyn, a Microsoft subsidiary. It's a multicloud cost
management solution that helps you to use and manage Azure and other cloud resources. Learn more about
Azure Cost Management.

Conclusion
In this article, Contoso refactored the SmartHotel360 app in Azure by migrating the app front-end VM to two
Azure App Service web apps. The app database was migrated to an Azure SQL database.
Refactor a Linux app to multiple regions using Azure
App Service, Traffic Manager, and Azure Database for
MySQL

This article shows how the fictional company Contoso refactors a two-tier Linux-based Apache MySQL PHP
(LAMP) app, migrating it from on-premises to Azure using Azure App Service with GitHub integration and Azure
Database for MySQL.
osTicket, the service desk app used in this example, is provided as open source. If you'd like to use it for your own
testing purposes, you can download it from GitHub.

Business drivers
The IT Leadership team has worked closely with business partners to understand what they want to achieve:
Address business growth. Contoso is growing and moving into new markets. It needs additional customer
service agents.
Scale. The solution should be built so that Contoso can add more customer service agents as the business
scales.
Improve resiliency. In the past, issues with the system affected internal users only. With the new business
model, external users will be affected, and Contoso needs the app up and running at all times.

Migration goals
The Contoso cloud team has pinned down goals for this migration, in order to determine the best migration
method:
The application should scale beyond current on-premises capacity and performance. Contoso is moving the
application to take advantage of Azure's on-demand scaling.
Contoso wants to move the app code base to a continuous delivery pipeline. As app changes are pushed to
GitHub, Contoso wants to deploy those changes without tasks for operations staff.
The application must be resilient with capabilities for growth and failover. Contoso wants to deploy the app in
two different Azure regions, and set it up to scale automatically.
Contoso wants to minimize database admin tasks after the app is moved to the cloud.

Solution design
After pinning down their goals and requirements, Contoso designs and reviews a deployment solution, and
identifies the migration process, including the Azure services that will be used for the migration.

Current architecture
The app is tiered across two VMs (OSTICKETWEB and OSTICKETMYSQL ).
The VMs are located on VMware ESXi host contosohost1.contoso.com (version 6.5).
The VMware environment is managed by vCenter Server 6.5 (vcenter.contoso.com ), running on a VM.
Contoso has an on-premises datacenter (contoso-datacenter), with an on-premises domain controller
(contosodc1).
Proposed architecture
Here's the proposed architecture:
The web tier app on OSTICKETWEB will be migrated by building an Azure App Service in two Azure regions.
Azure App Service for Linux will be implemented using the PHP 7.0 Docker container.
The app code will be moved to GitHub, and the Azure App Service web app will be configured for continuous
delivery with GitHub.
Azure App Service web apps will be deployed in both the primary (East US 2) and secondary (Central US) regions.
Traffic Manager will be set up in front of the two web apps in both regions.
Traffic Manager will be configured in priority mode to force the traffic through East US 2.
If the Azure App Service web app in East US 2 goes offline, users can access the failed over app in Central US.
The app database will be migrated to the Azure Database for MySQL service using MySQL Workbench tools.
The on-premises database will be backed up locally, and restored directly to Azure Database for MySQL.
The database will reside in the primary East US 2 region, in the database subnet (PROD-DB-EUS2) in the
production network (VNET-PROD-EUS2):
Since they're migrating a production workload, Azure resources for the app will reside in the production
resource group ContosoRG.
The Traffic Manager resource will be deployed in Contoso's infrastructure resource group ContosoInfraRG.
The on-premises VMs in the Contoso datacenter will be decommissioned after the migration is done.

Migration process
Contoso will complete the migration process as follows:
1. As a first step, Contoso admins set up the Azure infrastructure, including provisioning Azure App Service,
setting up Traffic Manager, and provisioning an Azure Database for MySQL instance.
2. After preparing the Azure infrastructure, they migrate the database using MySQL Workbench.
3. After the database is running in Azure, they set up a private GitHub repository for Azure App Service with
continuous delivery, and load it with the osTicket app.
4. In the Azure portal, they load the app from GitHub to the Docker container running on Azure App Service.
5. They tweak DNS settings, and configure autoscaling for the app.
Azure services
Azure App Service. The service runs and scales applications using the Azure PaaS service for websites. Cost: pricing is based on the size of the instances and the features required. Learn more.
Traffic Manager. A load balancer that uses DNS to direct users to Azure, or to external websites and services. Cost: pricing is based on the number of DNS queries received and the number of monitored endpoints. Learn more.
Azure Database for MySQL. The database is based on the open-source MySQL Server engine. It provides a fully managed, enterprise-ready community MySQL database, as a service for app development and deployment. Cost: pricing is based on compute, storage, and backup requirements. Learn more.

Prerequisites
Here's what Contoso needs to run this scenario.

Azure subscription. Contoso created subscriptions earlier in this article series. If you don't have an Azure subscription, create a free account. If you create a free account, you're the administrator of your subscription and can perform all actions. If you use an existing subscription and you're not the administrator, you need to work with the admin to assign you Owner or Contributor permissions.
Azure infrastructure. Contoso set up their Azure infrastructure as described in Azure infrastructure for migration.

Scenario steps
Here's how Contoso will complete the migration:
Step 1: Provision Azure App Service. Contoso admins will provision web apps in the primary and secondary
regions.
Step 2: Set up Traffic Manager. They set up Traffic Manager in front of the web apps, for routing and load
balancing traffic.
Step 3: Provision MySQL. In Azure, they provision an instance of Azure Database for MySQL.
Step 4: Migrate the database. They migrate the database using MySQL Workbench.
Step 5: Set up GitHub. They set up a private GitHub repository for the app web sites/code.
Step 6: Deploy the web apps. They deploy the web apps from GitHub.

Step 1: Provision Azure App Service


Contoso admins provision two web apps (one in each region) using Azure App Service.
1. They create a web app resource in the primary East US 2 region (osticket-eus2) from the Azure
Marketplace.
2. They put the resource in the production resource group ContosoRG.

3. They create a new App Service plan in the primary region (APP-SVP-EUS2), using the standard size.
4. They select a Linux OS with PHP 7.0 runtime stack, which is a Docker container.

5. They create a second web app (osticket-cus), and an Azure App Service plan for the Central US region.
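The same provisioning can be scripted. A minimal Azure CLI sketch is shown below; the plan name for the Central US region (APP-SVP-CUS) is an assumption, and the exact PHP runtime string depends on the CLI version (az webapp list-runtimes shows the current values):

# Linux App Service plan and web app in the primary region
az appservice plan create --resource-group ContosoRG --name APP-SVP-EUS2 --location eastus2 --sku S1 --is-linux
az webapp create --resource-group ContosoRG --plan APP-SVP-EUS2 --name osticket-eus2 --runtime "PHP|7.0"

# Linux App Service plan and web app in the secondary region (plan name assumed)
az appservice plan create --resource-group ContosoRG --name APP-SVP-CUS --location centralus --sku S1 --is-linux
az webapp create --resource-group ContosoRG --plan APP-SVP-CUS --name osticket-cus --runtime "PHP|7.0"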
Need more help?
Learn about Azure App Service web apps.
Learn about Azure App Service on Linux.

Step 2: Set up Traffic Manager


Contoso admins set up Traffic Manager to direct inbound web requests to the web apps running on the osTicket
web tier.
1. They create a Traffic Manager resource (osticket.trafficmanager.net) from the Azure Marketplace. They
use priority routing so that East US 2 is the primary site. They place the resource in their infrastructure
resource group (ContosoInfraRG). Note that Traffic Manager is global and not bound to a specific location.
2. Now, they configure Traffic Manager with endpoints. They add the East US 2 web app as the primary site
(osticket-eus2), and the Central US app as secondary (osticket-cus).

3. After adding the endpoints, they can monitor them.
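For reference, the same profile and endpoints could be created with the Azure CLI. This is a sketch only; the monitoring protocol, port, and path are assumptions rather than values from this scenario:

# Priority-routed Traffic Manager profile (DNS name osticket resolves to osticket.trafficmanager.net)
az network traffic-manager profile create --resource-group ContosoInfraRG --name osticket --routing-method Priority --unique-dns-name osticket --ttl 30 --protocol HTTP --port 80 --path "/"

# East US 2 web app as the priority 1 endpoint, Central US as priority 2
az network traffic-manager endpoint create --resource-group ContosoInfraRG --profile-name osticket --name osticket-eus2 --type azureEndpoints --priority 1 --target-resource-id $(az webapp show --resource-group ContosoRG --name osticket-eus2 --query id --output tsv)
az network traffic-manager endpoint create --resource-group ContosoInfraRG --profile-name osticket --name osticket-cus --type azureEndpoints --priority 2 --target-resource-id $(az webapp show --resource-group ContosoRG --name osticket-cus --query id --output tsv)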


Need more help?
Learn about Traffic Manager.
Learn about routing traffic to a priority endpoint.

Step 3: Provision Azure Database for MySQL


Contoso admins provision a MySQL database instance in the primary East US 2 region.
1. In the Azure portal, they create an Azure Database for MySQL resource.

2. They add the name contosoosticket for the Azure database. They add the database to the production
resource group ContosoRG, and specify credentials for it.
3. The on-premises MySQL database is version 5.7, so they select this version for compatibility. They use the
default sizes, which match their database requirements.
4. For Backup Redundancy Options, they select to use Geo-Redundant. This option allows them to restore
the database in their secondary Central US region if an outage occurs. They can only configure this option
when they provision the database.

5. They set up connection security. In the database > Connection security, they set up firewall rules to allow
Azure services to access the database.
6. They add the local workstation client IP address to the start and end IP addresses. This allows the web apps
to access the MySQL database, along with the database client that's performing the migration.
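A rough Azure CLI equivalent of these provisioning steps is shown below. The administrator name, password, and SKU are placeholders, not values from this scenario; the 0.0.0.0 firewall rule is what the portal's "Allow access to Azure services" switch creates:

# Azure Database for MySQL server, version 5.7, with geo-redundant backups
az mysql server create --resource-group ContosoRG --name contosoosticket --location eastus2 --admin-user <admin-user> --admin-password <password> --version 5.7 --sku-name GP_Gen5_2 --geo-redundant-backup Enabled

# Allow Azure services, plus the workstation that will run the migration
az mysql server firewall-rule create --resource-group ContosoRG --server-name contosoosticket --name AllowAzureServices --start-ip-address 0.0.0.0 --end-ip-address 0.0.0.0
az mysql server firewall-rule create --resource-group ContosoRG --server-name contosoosticket --name AllowWorkstation --start-ip-address <workstation-ip> --end-ip-address <workstation-ip>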
Step 4: Migrate the database
Contoso admins migrate the database using backup and restore, with MySQL tools. They install MySQL
Workbench, back up the database from OSTICKETMYSQL, and then restore it to Azure Database for MySQL
Server.
Install MySQL Workbench
1. They check the prerequisites and download MySQL Workbench.
2. They install MySQL Workbench for Windows in accordance with the installation instructions. The machine
on which they install it must have access to the OSTICKETMYSQL VM, and to Azure, via the internet.
3. In MySQL Workbench, they create a MySQL connection to OSTICKETMYSQL.

4. They export the database as osticket, to a local self-contained file.


5. After the database has been backed up locally, they create a connection to the Azure Database for MySQL
instance.

6. Now, they can import (restore) the database in the Azure Database for MySQL instance, from the self-
contained file. A new schema (osticket) is created for the instance.
7. After data is restored, it can be queried using Workbench, and appears in the Azure portal.

8. Finally, they need to update the database information on the web apps. On the MySQL instance, they open
Connection Strings.
9. In the strings list, they locate the web app settings, and select to copy them.

10. They open a Notepad window and paste the string into a new file, and update it to match the osticket
database, MySQL instance, and credentials settings.

11. They can verify the server name and login from Overview in the MySQL instance in the Azure portal.
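The walkthrough uses MySQL Workbench, but the same backup and restore can be done from the command line with the standard MySQL tools. This is a sketch, assuming the single-server sign-in format user@servername:

# Export the on-premises osticket database (--databases includes the CREATE DATABASE statement)
mysqldump --host=OSTICKETMYSQL --user=<local-admin> --password --databases osticket > osticket.sql

# Import it into the Azure Database for MySQL instance
mysql --host=contosoosticket.mysql.database.azure.com --user=<admin-user>@contosoosticket --password < osticket.sql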

Step 5: Set up GitHub


Contoso admins create a new private GitHub repo, and set up a connection to the osTicket database in Azure
Database for MySQL. Then, they load the web app into Azure App Service.
1. They browse to the OsTicket software public GitHub repo, and fork it to the Contoso GitHub account.

2. After forking, they navigate to the include folder, and find the ost-config.php file.
3. The file opens in the browser and they edit it.

4. In the editor, they update the database details, specifically DBHOST and DBUSER.

5. Then they commit the changes.

6. For each web app (osticket-eus2 and osticket-cus), they modify the Application settings in the Azure
portal.
7. They enter the connection string with the name osticket, and copy the string from notepad into the value
area. They select MySQL in the dropdown list next to the string, and save the settings.
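The same connection string setting can also be applied from the Azure CLI rather than the portal; the value is the string prepared in Notepad in the previous steps:

# Set the osticket MySQL connection string on both web apps
az webapp config connection-string set --resource-group ContosoRG --name osticket-eus2 --connection-string-type MySql --settings osticket="<connection string>"
az webapp config connection-string set --resource-group ContosoRG --name osticket-cus --connection-string-type MySql --settings osticket="<connection string>"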

Step 6: Configure the web apps


As the final step in the migration process, Contoso admins configure the web apps with the osTicket web sites.
1. In the primary web app (osticket-eus2), they open Deployment options and set the source to GitHub.

2. They select the deployment options.


3. After setting the options, the configuration shows as pending in the Azure portal.

4. After the configuration is updated and the osTicket web app is loaded from GitHub to the Docker container
running on Azure App Service, the site shows as Active.

5. They repeat the above steps for the secondary web app (osticket-cus).
6. After the site is configured, it's accessible via the Traffic Manager profile. The DNS name is the new location
of the osTicket app. Learn more.
7. Contoso wants a DNS name that's easy to remember. They create an alias record (CNAME)
osticket.contoso.com, which points to the Traffic Manager name, in the DNS on their domain controllers.

8. They configure both the osticket-eus2 and osticket-cus web apps to allow the custom hostnames.
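As a rough CLI sketch of the same configuration (the GitHub organization name is a placeholder, and connecting App Service to GitHub still requires the GitHub account to be authorized once):

# Point the web app at the forked GitHub repo for continuous delivery
az webapp deployment source config --resource-group ContosoRG --name osticket-eus2 --repo-url https://github.com/<contoso-org>/osTicket --branch master

# Add the custom hostname once the CNAME record exists
az webapp config hostname add --resource-group ContosoRG --webapp-name osticket-eus2 --hostname osticket.contoso.com

The same two commands are repeated for osticket-cus.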

Set up autoscaling
Finally, they set up automatic scaling for the app. This ensures that as agents use the app, the app instances
increase and decrease according to business needs.
1. In App Service APP-SRV-EUS2, they open Scale Unit.
2. They configure a new autoscale setting with a single rule that increases the instance count by one when the
CPU percentage for the current instance is above 70% for 10 minutes.

3. They configure the same setting on APP-SRV-CUS to ensure that the same behavior applies if the app fails
over to the secondary region. The only difference is that they set the default instance count to 1, since this is for
failovers only.
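An equivalent autoscale setting could be scripted as follows. This is a sketch only; the setting name, instance limits, and the CpuPercentage metric name are assumptions used to illustrate the rule described above:

# Autoscale setting on the primary App Service plan (names and limits are illustrative)
az monitor autoscale create --resource-group ContosoRG --name osticket-eus2-autoscale --resource APP-SVP-EUS2 --resource-type Microsoft.Web/serverfarms --min-count 1 --max-count 5 --count 1

# Add one instance when average CPU stays above 70% for 10 minutes
az monitor autoscale rule create --resource-group ContosoRG --autoscale-name osticket-eus2-autoscale --condition "CpuPercentage > 70 avg 10m" --scale out 1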

Clean up after migration


With migration complete, the osTicket app is refactored to run in an Azure App Service web app with
continuous delivery using a private GitHub repo. The app's running in two regions for increased resilience. The
osTicket database is running in Azure Database for MySQL after migration to the PaaS platform.
For clean up, Contoso needs to do the following:
Remove the VMware VMs from the vCenter inventory.
Remove the on-premises VMs from local backup jobs.
Update internal documentation to show the new locations and IP addresses.
Review any resources that interact with the on-premises VMs, and update any relevant settings or
documentation to reflect the new configuration.
Reconfigure monitoring to point at the osticket.trafficmanager.net URL, to track that the app is up and running.

Review the deployment


With the app now running, Contoso needs to fully operationalize and secure their new infrastructure.
Security
The Contoso security team reviewed the app to determine any security issues. They identified that the
communication between the osTicket app and the MySQL database instance isn't configured for SSL. They will
need to configure SSL to ensure that database traffic can't be intercepted. Learn more.
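Enforcing SSL on the server side is a single setting; a hedged CLI sketch is shown below. The osTicket configuration would also need to be updated to connect over SSL, which isn't shown here:

# Require SSL for all connections to the Azure Database for MySQL server
az mysql server update --resource-group ContosoRG --name contosoosticket --ssl-enforcement Enabled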
Backups
The osTicket web apps don't contain state data and thus don't need to be backed up.
They don't need to configure backup for the database. Azure Database for MySQL automatically creates and
stores server backups. They selected to use geo-redundancy for the database, so it's resilient and production-
ready. Backups can be used to restore the server to a point in time. Learn more.
Licensing and cost optimization
There are no licensing issues for the PaaS deployment.
Contoso will enable Azure Cost Management licensed by Cloudyn, a Microsoft subsidiary. It's a multicloud cost
management solution that helps you use and manage Azure and other cloud resources. Learn more about
Azure Cost Management.
Rebuild an on-premises app on Azure
20 minutes to read • Edit Online

This article demonstrates how the fictional company Contoso rebuilds a two-tier Windows .NET app running on
VMware VMs as part of a migration to Azure. Contoso migrates the app's front-end VM to an Azure App Service
web app. The app back end is built using microservices deployed to containers managed by Azure Kubernetes
Service (AKS ). The site interacts with Azure Functions to provide pet photo functionality.
The SmartHotel360 app used in this example is provided as open source. If you'd like to use it for your own testing
purposes, you can download it from GitHub.

Business drivers
The IT leadership team has worked closely with business partners to understand what they want to achieve with
this migration:
Address business growth. Contoso is growing, and wants to provide differentiated experiences for customers
on Contoso websites.
Be agile. Contoso must be able to react faster than changes in the marketplace, to enable success in a
global economy.
Scale. As the business grows successfully, the Contoso IT team must provide systems that are able to grow at
the same pace.
Reduce costs. Contoso wants to minimize licensing costs.

Migration goals
The Contoso cloud team has pinned down app requirements for this migration. These requirements were used to
determine the best migration method:
The app in Azure is still as critical as it is today. It should perform well and scale easily.
The app shouldn't use IaaS components. Everything should be built to use PaaS or serverless services.
The app builds should run in cloud services, and containers should reside in a private, enterprise-wide container
registry in the cloud.
The API service used for pet photos should be accurate and reliable in the real world, since decisions made by
the app must be honored in their hotels. Any pet granted access is allowed to stay at the hotels.
To meet requirements for a DevOps pipeline, Contoso will use Azure DevOps for source code management
(SCM ), with Git Repos. Automated builds and releases will be used to build code and deploy to Azure App
Service, Azure Functions, and AKS.
Different CI/CD pipelines are needed for microservices on the back end, and for the web site on the front end.
The back-end services have a different release cycle from the front-end web app. To meet this requirement, they
will deploy two different pipelines.
Contoso needs management approval for all front-end website deployment, and the CI/CD pipeline must
provide this.

Solution design
After pinning down goals and requirements, Contoso designs and reviews a deployment solution, and identifies the
migration process, including the Azure services that will be used for the migration.
Current app
The SmartHotel360 on-premises app is tiered across two VMs (WEBVM and SQLVM ).
The VMs are located on VMware ESXi host contosohost1.contoso.com (version 6.5).
The VMware environment is managed by vCenter Server 6.5 (vcenter.contoso.com ), running on a VM.
Contoso has an on-premises datacenter (contoso-datacenter), with an on-premises domain controller
(contosodc1).
The on-premises VMs in the Contoso datacenter will be decommissioned after the migration is done.
Proposed architecture
The front end of the app is deployed as an Azure App Service web app in the primary Azure region.
An Azure function provides uploads of pet photos, and the site interacts with this functionality.
The pet photo function uses the Azure Cognitive Services Vision API and Cosmos DB.
The back end of the site is built using microservices. These will be deployed to containers managed on
Azure Kubernetes Service (AKS).
Containers will be built using Azure DevOps, and pushed to the Azure Container Registry (ACR).
For now, Contoso will manually deploy the web app and function code using Visual Studio.
Microservices will be deployed using a PowerShell script that calls Kubernetes command-line tools.

Solution review
Contoso evaluates the proposed design by putting together a pros and cons list.

Pros
Using PaaS and serverless solutions for the end-to-end deployment significantly reduces the management time that Contoso must provide.
Moving to a microservice architecture allows Contoso to easily extend the solution over time.
New functionality can be brought online without disrupting any of the existing solution's code bases.
The web app will be configured with multiple instances, with no single point of failure.
Autoscaling will be enabled so that the app can handle differing traffic volumes.
With the move to PaaS services, Contoso can retire out-of-date solutions running on the Windows Server 2008 R2 operating system.
Cosmos DB has built-in fault tolerance, which requires no configuration by Contoso. This means that the data tier is no longer a single point of failover.

Cons
Containers are more complex than other migration options. The learning curve could be an issue for Contoso. They introduce a new level of complexity that provides a lot of value in spite of the curve.
The operations team at Contoso needs to ramp up to understand and support Azure, containers, and microservices for the app.
Contoso hasn't fully implemented DevOps for the entire solution. Contoso needs to consider that for the deployment of services to AKS, Azure Functions, and Azure App Service.

Migration process
1. Contoso provisions ACR, AKS, and Cosmos DB.
2. They provision the infrastructure for the deployment, including Azure App Service web app, storage
account, function, and API.
3. After the infrastructure is in place, they'll build their microservices container images using Azure DevOps,
which pushes them to the ACR.
4. Contoso will deploy these microservices to AKS using a PowerShell script.
5. Finally, they'll deploy the function and web app.
Azure services
AKS. Simplifies Kubernetes management, deployment, and operations. Provides a fully managed Kubernetes container orchestration service. Cost: AKS is a free service. Pay only for the virtual machines, and the associated storage and networking resources consumed. Learn more.
Azure Functions. Accelerates development with an event-driven, serverless compute experience. Scale on demand. Cost: pay only for consumed resources. The plan is billed based on per-second resource consumption and executions. Learn more.
Azure Container Registry. Stores images for all types of container deployments. Cost: based on features, storage, and usage duration. Learn more.
Azure App Service. Quickly build, deploy, and scale enterprise-grade web, mobile, and API apps running on any platform. Cost: App Service plans are billed on a per-second basis. Learn more.

Prerequisites
Here's what Contoso needs for this scenario:

Azure subscription. Contoso created subscriptions during an earlier article. If you don't have an Azure subscription, create a free account. If you create a free account, you're the administrator of your subscription and can perform all actions. If you use an existing subscription and you're not the administrator, you need to work with the admin to assign you Owner or Contributor permissions.
Azure infrastructure. Learn how Contoso set up an Azure infrastructure.
Developer prerequisites. Contoso needs the following tools on a developer workstation:
- Visual Studio 2017 Community Edition version 15.5, with the .NET workload enabled
- Git
- Azure PowerShell
- Azure CLI
- Docker CE (Windows 10) or Docker EE (Windows Server), set to use Windows Containers

Scenario steps
Here's how Contoso will run the migration:
Step 1: Provision AKS and ACR. Contoso provisions the managed AKS cluster and Azure container registry
using PowerShell.
Step 2: Build Docker containers. They set up CI for Docker containers using Azure DevOps, and push them
to the ACR.
Step 3: Deploy back-end microservices. They deploy the rest of the infrastructure that will be used by back-
end microservices.
Step 4: Deploy front-end infrastructure. They deploy the front-end infrastructure, including blob storage for
the pet photos, the Cosmos DB, and the Vision API.
Step 5: Migrate the back end. They deploy the microservices to run on AKS, which migrates the back end.
Step 6: Publish the front end. They publish the SmartHotel360 app to the App Service, and the function app
that will be called by the pet service.

Step 1: Provision back-end resources


Contoso admins run a deployment script to create the managed Kubernetes cluster using AKS and the Azure
Container Registry (ACR).
The instructions for this section use the SmartHotel360-Azure-backend repository.
The SmartHotel360-Azure-backend GitHub repository contains all of the software for this part of the
deployment.
Ensure prerequisites
1. Before they start, Contoso admins ensure that all prerequisite software is installed on the dev machine they're
using for the deployment.
2. They clone the repository locally to the dev machine using Git:
git clone https://github.com/Microsoft/SmartHotel360-Azure-backend.git

Provision AKS and ACR


The Contoso admins provision as follows:
1. They open the folder using Visual Studio Code, and move to the /deploy/k8s directory, which contains the
script gen-aks-env.ps1.
2. They run the script to create the managed Kubernetes cluster, using AKS and ACR.
3. With the file open, they update the $location parameter to eastus2, and save the file.

4. They select View > Integrated Terminal to open the integrated terminal in Visual Studio Code.
5. In the PowerShell Integrated terminal, they sign into Azure using the Connect-AzureRmAccount command.
Learn more about getting started with PowerShell.

6. They authenticate Azure CLI by running the az login command, and following the instructions to
authenticate using their web browser. Learn more about logging in with Azure CLI.
7. They run the following command, passing the resource group name of ContosoRG, the name of the AKS
cluster smarthotel-aks-eus2, and the new registry name.

.\gen-aks-env.ps1 -resourceGroupName ContosoRg -orchestratorName smarthotelakseus2 -registryName smarthotelacreus2

8. Azure creates another resource group, containing the resources for the AKS cluster.

9. After the deployment is finished, they install the kubectl command-line tool. The tool is already installed on
the Azure CloudShell.

az aks install-cli

10. They verify the connection to the cluster by running the kubectl get nodes command. The node has the
same name as the VM in the automatically created resource group.
11. They run the following command to start the Kubernetes Dashboard:

az aks browse --resource-group ContosoRG --name smarthotelakseus2

12. A browser tab opens to the Dashboard. This is a tunneled connection using the Azure CLI.
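The gen-aks-env.ps1 script wraps this provisioning; for orientation, a stripped-down Azure CLI equivalent might look like the sketch below. The node count is an assumption, and the script also handles details (such as granting the cluster pull access to the registry) that aren't shown here:

# Container registry and managed AKS cluster in East US 2
az acr create --resource-group ContosoRG --name smarthotelacreus2 --sku Standard --location eastus2
az aks create --resource-group ContosoRG --name smarthotelakseus2 --location eastus2 --node-count 2 --generate-ssh-keys

# Merge the cluster credentials into the local kubeconfig and verify the nodes
az aks get-credentials --resource-group ContosoRG --name smarthotelakseus2
kubectl get nodes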

Step 2: Configure the back-end pipeline


Create an Azure DevOps project and build
Contoso creates an Azure DevOps project, and configures a CI Build to create the container and then pushes it to
the ACR. The instructions in this section use the SmartHotel360-Azure-Backend repository.
1. From visualstudio.com, they create a new organization (contosodevops360.visualstudio.com), and
configure it to use Git.
2. They create a new project (SmartHotelBackend) using Git for version control, and Agile for the workflow.
3. They import the GitHub repo.

4. In Pipelines, they select Build, and create a new pipeline using Azure Repos Git as a source, from the
repository.
5. They select to start with an empty job.

6. They select Hosted Linux Preview for the build pipeline.

7. In Phase 1, they add a Docker Compose task. This task builds the container images with Docker Compose.
8. They repeat and add another Docker Compose task. This one pushes the containers to ACR.

9. They select the first task (to build), and configure the build with the Azure subscription, authorization, and
the ACR.
10. They specify the path of the docker-compose.yaml file, in the src folder of the repo. They select to build
service images and include the latest tag. When the action changes to Build service images, the name of
the Azure DevOps task changes to Build services automatically.

11. Now, they configure the second Docker task (to push). They select the subscription and the
smarthotelacreus2 ACR.
12. Again, they specify the path to the docker-compose.yaml file, and select Push service images and include the
latest tag. When the action changes to Push service images, the name of the Azure DevOps task changes
to Push services automatically.

13. With the Azure DevOps tasks configured, Contoso saves the build pipeline, and starts the build process.
14. They select the build job to check progress.
15. After the build finishes, the ACR shows the new repos, which are populated with the containers used by the
microservices.
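The same check can be made from the command line; the repository name below is a placeholder for any of the microservice images:

# List the repositories the CI build pushed, and the tags for one of them
az acr repository list --name smarthotelacreus2 --output table
az acr repository show-tags --name smarthotelacreus2 --repository <microservice-repository> --output table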

Deploy the back-end infrastructure


With the AKS cluster created and the Docker images built, Contoso admins now deploy the rest of the
infrastructure that will be used by back-end microservices.
Instructions in the section use the SmartHotel360-Azure-Backend repo.
In the /deploy/k8s/arm folder, there's a single script to create all items.
They deploy as follows:
1. They open a developer command prompt, and use the command az login for the Azure subscription.
2. They use the deploy.cmd file to deploy the Azure resources in the ContosoRG resource group and EUS2
region, by typing the following command:

.\deploy.cmd azuredeploy ContosoRG -c eastus2


3. In the Azure portal, they capture the connection string for each database, to be used later.

Create the back-end release pipeline


Now, Contoso admins do the following:
Deploy the NGINX ingress controller to allow inbound traffic to the services.
Deploy the microservices to the AKS cluster.
As a first step they update the connection strings to the microservices using Azure DevOps. They then configure
a new Azure DevOps Release pipeline to deploy the microservices.
The instructions in this section use the SmartHotel360-Azure-Backend repo.
Some of the configuration settings (for example, Active Directory B2C) aren't covered in this article. For more
information about these settings, review the repo above.
They create the pipeline:
1. Using Visual Studio they update the /deploy/k8s/config_local.yml file with the database connection
information they noted earlier.
2. They open Azure DevOps, and in the SmartHotel360 project, in Releases, they select +New Pipeline.

3. They select Empty Job to start the pipeline without a template.


4. They provide the stage and pipeline names.

5. They add an artifact.


6. They select Git as the source type, and specify the project, source, and master branch for the SmartHotel360
app.

7. They select the task link.

8. They add a new Azure PowerShell task so that they can run a PowerShell script in an Azure environment.
9. They select the Azure subscription for the task, and select the deploy.ps1 script from the Git repo.

10. They add arguments to the script. The script will delete all cluster content (except ingress and ingress
controller), and deploy the microservices.

11. They set the preferred Azure PowerShell version to the latest, and save the pipeline.
12. They move back to the Release page, and manually create a new release.
13. They select the release after creating it, and in Actions, they select Deploy.

14. When the deployment is complete, they run the following command to check the status of services, using
the Azure Cloud Shell: kubectl get services.

Step 3: Provision front-end services


Contoso admins need to deploy the infrastructure that will be used by the front-end apps. They create a blob
storage container for storing the pet images; the Cosmos database to store documents with the pet information;
and the Vision API for the website.
Instructions for this section use the SmartHotel360-public-web repo.
Create blob storage containers
1. In the Azure portal, they open the storage account that was created and select Blobs.
2. They create a new container (Pets) with the public access level set to container. Users will upload their pet
photos to this container.

3. They create a second new container named settings. A file with all the front-end app settings will be placed
in this container.

4. They capture the access details for the storage account in a text file, for future reference.
Provision a Cosmos database
Contoso admins provision a Cosmos database to be used for pet information.
1. They create an Azure Cosmos DB in the Azure Marketplace.

2. They specify a name (contosomarthotel), select the SQL API, and place it in the production resource group
ContosoRG, in the main East US 2 region.
3. They add a new collection to the database, with default capacity and throughput.

4. They note the connection information for the database, for future reference.
Provision Computer Vision
Contoso admins provision the Computer Vision API. The API will be called by the function, to evaluate pictures
uploaded by users.
1. They create a Computer Vision instance in the Azure Marketplace.

2. They provision the API (smarthotelpets) in the production resource group ContosoRG, in the main East US
2 region.
3. They save the connection settings for the API to a text file for later reference.
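For reference, these front-end resources could also be created from the Azure CLI. This is a sketch only: the storage account name is a placeholder, container names must be lowercase, and the Cognitive Services SKU shown is an assumption:

# Blob containers for pet photos (public) and front-end settings (private)
az storage container create --account-name <storage-account-name> --name pets --public-access container
az storage container create --account-name <storage-account-name> --name settings

# Cosmos DB account using the SQL API
az cosmosdb create --resource-group ContosoRG --name contosomarthotel --kind GlobalDocumentDB

# Computer Vision resource for the pet checker function
az cognitiveservices account create --resource-group ContosoRG --name smarthotelpets --kind ComputerVision --sku S1 --location eastus2 --yes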

Provision the Azure web app


Contoso admins provision the web app using the Azure portal.
1. They select Web App in the portal.
2. They provide an app name (smarthotelcontoso), run it on Windows, and place it in the production
resource group ContosoRG. They create a new Application Insights instance for app monitoring.
3. After they're done, they browse to the address of the app to check it's been created successfully.
4. Now, in the Azure portal they create a staging slot for the code. The pipeline will deploy to this slot. This
ensures that code isn't put into production until admins perform a release.
Provision the Azure function app
In the Azure portal, Contoso admins provision the Function App.
1. They select Function App.

2. They provide an app name (smarthotelpetchecker). They place the app in the production resource group
ContosoRG. They set the hosting plan to the Consumption plan, and place the app in the East US 2 region. A
new storage account is created, along with an Application Insights instance for monitoring.
3. After the app is deployed, they browse to the app address to check it's been created successfully.
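A hedged CLI sketch of the same provisioning is shown below; the App Service plan name is illustrative, the storage account name is a placeholder, and a Standard or higher plan is assumed because the staging slot requires it:

# Web app, plan, and staging slot for the front end (plan name is illustrative)
az appservice plan create --resource-group ContosoRG --name smarthotel-frontend-plan --location eastus2 --sku S1
az webapp create --resource-group ContosoRG --plan smarthotel-frontend-plan --name smarthotelcontoso
az webapp deployment slot create --resource-group ContosoRG --name smarthotelcontoso --slot staging

# Consumption-plan function app for the pet checker (storage account name is a placeholder)
az functionapp create --resource-group ContosoRG --name smarthotelpetchecker --consumption-plan-location eastus2 --storage-account <storage-account-name>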

Step 4: Set up the front-end pipeline


Contoso admins create two different projects for the front-end site.
1. In Azure DevOps, they create a project SmartHotelFrontend.
2. They import the SmartHotel360 front end Git repository into the new project.
3. For the function app, they create another Azure DevOps project (SmartHotelPetChecker), and import the
PetChecker Git repository into this project.
Configure the web app
Now Contoso admins configure the web app to use Contoso resources.
1. They connect to the Azure DevOps project, and clone the repository locally to the development machine.
2. In Visual Studio, they open the folder to show all the files in the repo.

3. They update the configuration changes as required.


When the web app starts up, it looks for the SettingsUrl app setting.
This variable must contain a URL pointing to a configuration file.
By default, the setting used is a public endpoint.
4. They update the /config-sample.json/sample.json file.
This is the configuration file for the web when using the public endpoint.
They edit the urls and pets_config sections with the values for the AKS API endpoints, storage accounts,
and Cosmos database.
The URLs should match the DNS name of the new web app that Contoso will create.
For Contoso, this is smarthotelcontoso.eastus2.cloudapp.azure.com.

5. After the file is updated, they rename it smarthotelsettingsurl, and upload it to the blob storage they
created earlier.

6. They select the file to get the URL. The URL is used by the app when it pulls down the configuration files.

7. In the appsettings.Production.json file, they update the SettingsURL to the URL of the new file.
Deploy the website to Azure App Service
Contoso admins can now publish the website.
1. They open Azure DevOps, and in the SmartHotelFrontend project, in Builds and Releases, they select
+New Pipeline.
2. They select Azure DevOps Git as a source.
3. They select the ASP.NET Core template.
4. They review the pipeline, and check that Publish Web Projects and Zip Published Projects are selected.

5. In Triggers, they enable continuous integration, and add the master branch. This ensures that each time the
solution has new code committed to the master branch, the build pipeline starts.
6. They select Save & Queue to start a build.
7. After the build completes, they configure a release pipeline using Azure App Service Deployment.
8. They provide a Stage name Staging.

9. They add an artifact and select the build they just configured.
10. They select the lightning bolt icon on the artifact, and enable continuous deployment.
11. In Environment, they select 1 job, 1 task under Staging.
12. After selecting the subscription, and app name, they open the Deploy Azure App Service task. The
deployment is configured to use the staging deployment slot. This automatically builds code for review and
approval in this slot.
13. In the Pipeline, they add a new stage.
14. They select Azure App Service deployment with slot, and name the environment Prod.
15. They select 1 job, 2 tasks, and select the subscription, app service name, and the staging slot.
16. They remove the Deploy Azure App Service to Slot task from the pipeline. It was placed there by the previous
steps.
17. They save the pipeline. On the pipeline, they select Post-deployment conditions.
18. They enable Post-deployment approvals, and add a dev lead as the approver.

19. In the Build pipeline, they manually kick off a build. This triggers the new release pipeline, which deploys the
site to the staging slot. For Contoso, the URL for the slot is
https://smarthotelcontoso-staging.azurewebsites.net/ .

20. After the build finishes, and the release deploys to the slot, Azure DevOps emails the dev lead for approval.
21. The dev lead selects View approval, and can approve or reject the request in the Azure DevOps portal.

22. The lead makes a comment and approves. This starts the swap of the staging and prod slots, and moves
the build into production.
23. The pipeline completes the swap.

24. The team checks the prod slot to verify that the web app is in production at
https://smarthotelcontoso.azurewebsites.net/ .

Deploy the PetChecker Function app


Contoso admins deploy the app as follows.
1. They clone the repository locally to the development machine by connecting to the Azure DevOps project.
2. In Visual Studio, they open the folder to show all the files in the repo.
3. They open the src/PetCheckerFunction/local.settings.json file, and add the app settings for storage, the
Cosmos database, and the Computer Vision API.

4. They commit the code, and sync it back to Azure DevOps, pushing their changes.
5. They add a new Build pipeline, and select Azure DevOps Git for the source.
6. They select the ASP.NET Core (.NET Framework) template.
7. They accept the defaults for the template.
8. In Triggers, they select Enable continuous integration, and select Save & Queue to start a build.
9. After the build succeeds, they build a Release pipeline, adding Azure App Service deployment with slot.
10. They name the environment Prod, and select the subscription. They set the App type to Function App,
and the app service name as smarthotelpetchecker.
11. They add an artifact Build.
12. They enable Continuous deployment trigger, and select Save.
13. They select Queue new build to run the full CI/CD pipeline.
14. After the function is deployed, it appears in the Azure portal, with the Running status.
15. They browse to the app to test that the Pet Checker app is working as expected, at
http://smarthotel360public.azurewebsites.net/Pets.
16. They select the avatar to upload a picture.

17. The first photo they want to check is of a small dog.


18. The app returns a message of acceptance.

Review the deployment


With the migrated resources in Azure, Contoso now needs to fully operationalize and secure the new
infrastructure.
Security
Contoso needs to ensure that the new databases are secure. Learn more.
The app needs to be updated to use SSL with certificates. The container instance should be redeployed to
answer on 443.
Contoso should consider using Key Vault to protect secrets for their apps. Learn more.
Backups and disaster recovery
Contoso needs to review backup requirements for the Azure SQL Database. Learn more.
Contoso should consider implementing SQL failover groups to provide regional failover for the database. Learn
more.
Contoso can use geo-replication for the ACR premium SKU. Learn more.
Cosmos DB backs up automatically. Contoso can learn more about this process.
Licensing and cost optimization
After all resources are deployed, Contoso should assign Azure tags based on their infrastructure planning.
All licensing is built into the cost of the PaaS services that Contoso is consuming. This will be deducted from the
EA.
Contoso will enable Azure Cost Management licensed by Cloudyn, a Microsoft subsidiary. It's a multicloud cost
management solution that helps you use and manage Azure and other cloud resources. Learn more about
Azure Cost Management.

Conclusion
In this article, Contoso rebuilds the SmartHotel360 app in Azure. The on-premises app front-end VM is rebuilt to
Azure App Service web apps. The application back end is built using microservices deployed to containers
managed by Azure Kubernetes Service (AKS ). Contoso enhanced app functionality with a pet photo app.

Suggested skills
Microsoft Learn is a new approach to learning. Readiness for the new skills and responsibilities that come with
cloud adoption doesn't come easily. Microsoft Learn provides a more rewarding approach to hands-on learning
that helps you achieve your goals faster. Earn points and levels, and achieve more!
Here are a couple of examples of tailored learning paths on Microsoft Learn that align with the Contoso
SmartHotel360 app in Azure.
Deploy a website to Azure with Azure App Service: Web apps in Azure allow you to publish and manage your
website easily without having to work with the underlying servers, storage, or network assets. Instead, you can
focus on your website features and rely on the robust Azure platform to provide secure access to your site.
Process and classify images with the Azure Cognitive Vision Services: Azure Cognitive Services offers pre-built
functionality to enable computer vision functionality in your applications. Learn how to use the Cognitive Vision
Services to detect faces, tag and classify images, and identify objects.
Refactor a Team Foundation Server deployment to
Azure DevOps Services
15 minutes to read • Edit Online

This article shows how the fictional company Contoso refactors their on-premises Team Foundation Server (TFS)
deployment by migrating it to Azure DevOps Services in Azure. Contoso's development team has used TFS for
team collaboration and source control for the past five years. Now, they want to move to a cloud-based solution for
dev and test work, and for source control. Azure DevOps Services will play a role as they move to an Azure
DevOps model, and develop new cloud-native apps.

Business drivers
The IT Leadership team has worked closely with business partners to identify future goals. Partners aren't overly
concerned with dev tools and technologies, but they have captured these points:
Software: Regardless of the core business, all companies are now software companies, including Contoso.
Business leadership is interested in how IT can help lead the company with new working practices for users, and
experiences for their customers.
Efficiency: Contoso needs to streamline process and remove unnecessary procedures for developers and
users. This will allow the company to deliver on customer requirements more efficiently. The business needs IT
to move fast, without wasting time or money.
Agility: Contoso IT needs to respond to business needs, and react more quickly than the marketplace to enable
success in a global economy. IT mustn't be a blocker for the business.

Migration goals
The Contoso cloud team has pinned down goals for the migration to Azure DevOps Services:
The team needs a tool to migrate the data to the cloud. Few manual processes should be needed.
Work item data and history for the last year must be migrated.
They don't want to set up new user names and passwords. All current system assignments must be maintained.
They want to move away from Team Foundation Version Control (TFVC) to Git for source control.
The cutover to Git will be a "tip migration" that imports only the latest version of the source code. It will happen
during a downtime when all work will be halted as the codebase shifts. They understand that only the current
master branch history will be available after the move.
They're concerned about the change and want to test it before doing a full move. They want to retain access to
TFS even after the move to Azure DevOps Services.
They have multiple collections, and want to start with one that has only a few projects to better understand the
process.
They understand that TFS collections are a one-to-one relationship with Azure DevOps Services organizations,
so they'll have multiple URLs. However, this matches their current model of separation for code bases and
projects.

Proposed architecture
Contoso will move their TFS projects to the cloud, and no longer host their projects or source control on-
premises.
TFS will be migrated to Azure DevOps Services.
Currently Contoso has one TFS collection named ContosoDev, which will be migrated to an Azure DevOps
Services organization called contosodevmigration.visualstudio.com.
The projects, work items, bugs and iterations from the last year will be migrated to Azure DevOps Services.
Contoso will use their Azure Active Directory, which they set up when they deployed their Azure infrastructure
at the beginning of their migration planning.

Migration process
Contoso will complete the migration process as follows:
1. There's a lot of preparation involved. As a first step, Contoso needs to upgrade their TFS implementation to a
supported level. Contoso is currently running TFS 2017 Update 3, but to use database migration it needs to run
a supported 2018 version with the latest updates.
2. After upgrading, Contoso will run the TFS migration tool, and validate their collection.
3. Contoso will build a set of preparation files, and perform a migration dry run for testing.
4. Contoso will then run another migration, this time a full migration that includes work items, bugs, sprints, and
code.
5. After the migration, Contoso will move their code from TFVC to Git.

Prerequisites
Here's what Contoso needs to run this scenario.
Azure subscription. Contoso created subscriptions in an earlier article in this series. If you don't have an Azure subscription, create a free account. If you create a free account, you're the administrator of your subscription and can perform all actions. If you use an existing subscription and you're not the administrator, you need to work with the admin to assign you Owner or Contributor permissions. If you need more granular permissions, review this article.
Azure infrastructure. Contoso set up their Azure infrastructure as described in Azure infrastructure for migration.
On-premises TFS server. The on-premises server needs to either be running TFS 2018 Update 2, or be upgraded to it as part of this process.

Scenario steps
Here's how Contoso will complete the migration:
Step 1: Create an Azure storage account. This storage account will be used during the migration process.
Step 2: Upgrade TFS. Contoso will upgrade their deployment to TFS 2018 Update 2.
Step 3: Validate collection. Contoso will validate the TFS collection in preparation for migration.
Step 4: Build preparation file. Contoso will create the migration files using the TFS Migration Tool.

Step 1: Create a storage account


1. In the Azure portal, Contoso admins create a storage account (contosodevmigration).
2. They place the account in the secondary region they use for failover (Central US). They use a general-
purpose standard account with locally redundant storage.
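The equivalent Azure CLI command is shown below as a sketch; the resource group name is a placeholder because this article doesn't call one out:

# General-purpose, locally redundant storage account in the failover region
az storage account create --resource-group <resource-group> --name contosodevmigration --location centralus --sku Standard_LRS --kind StorageV2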
Need more help?
Introduction to Azure storage.
Create a storage account.

Step 2: Upgrade TFS


Contoso admins upgrade the TFS server to TFS 2018 Update 2. Before they start:
They download TFS 2018 Update 2.
They verify the hardware requirements, and read through the release notes and upgrade gotchas.
They upgrade as follows:
1. To start, they back up their TFS server (running on a VMware VM) and take a VMware snapshot.
2. The TFS installer starts, and they choose the install location. The installer needs internet access.

3. After the installation finishes, the Server Configuration Wizard starts.


4. After verification, the Wizard completes the upgrade.

5. They verify the TFS installation by reviewing projects, work items, and code.
NOTE
Some TFS upgrades need to run the Configure Features Wizard after the upgrade completes. Learn more.

Need more help?


Learn about upgrading TFS.

Step 3: Validate the TFS collection


Contoso admins run the TFS Migration Tool against the ContosoDev collection database to validate it before
migration.
1. They download and unzip the TFS Migration Tool. It's important to download the version for the TFS
update that's running. The version can be checked in the admin console.

2. They run the tool to perform the validation, by specifying the URL of the project collection:

`TfsMigrator validate /collection:http://contosotfs:8080/tfs/ContosoDev`

3. The tool shows an error.


4. The log files are located in the Logs folder, alongside the tool location. A log file is generated
for each major validation. TfsMigration.log holds the main information.

5. They find this entry, related to identity.

6. They run TfsMigrator validate /help at the command line, and see that the command /tenantDomainName
seems to be required to validate identities.

7. They run the validation command again, and include this value, along with their Azure AD name:
TfsMigrator validate /collection:http://contosotfs:8080/tfs/ContosoDev /tenantDomainName:contosomigration.onmicrosoft.com

8. An Azure AD sign-in screen appears, and they enter the credentials of a Global Admin user.
9. The validation passes, and is confirmed by the tool.

Step 4: Create the migration files


With the validation complete, Contoso admins can use the TFS Migration Tool to build the migration files.
1. They run the prepare step in the tool.
TfsMigrator prepare /collection:http://contosotfs:8080/tfs/ContosoDev /tenantDomainName:contosomigration.onmicrosoft.com /accountRegion:cus

Prepare does the following:


Scans the collection to find a list of all users and populates the identity map log (IdentityMapLog.csv).
Prepares the connection to Azure Active Directory to find a match for each identity.
Contoso has already deployed Azure AD and synchronized it using Azure AD Connect, so Prepare
should be able to find the matching identities and mark them as Active.
2. An Azure AD sign-in screen appears, and they enter the credentials of a Global Admin.

3. Prepare completes, and the tool reports that the import files have been generated successfully.

4. They can now see that both the IdentityMapLog.csv and the import.json file have been created in a new
folder.
5. The import.json file provides import settings. It includes information such as the desired organization name,
and storage account information. Most of the fields are populated automatically. Some fields required user
input. Contoso opens the file, and adds the Azure DevOps Services organization name to be created:
contosodevmigration. With this name, their Azure DevOps Services URL will be
contosodevmigration.visualstudio.com.

NOTE
The organization must be created before the migration. It can be changed after the migration is done.

6. They review the identity log map file that shows the accounts that will be brought into Azure DevOps
Services during the import.
Active identities refer to identities that will become users in Azure DevOps Services after the import.
On Azure DevOps Services, these identities will be licensed, and show up as a user in the organization
after migration.
These identities are marked as Active in the Expected Import Status column in the file.
Step 5: Migrate to Azure DevOps Services
With preparation in place, Contoso admins can now focus on the migration. After running the migration, they'll
switch from using TFVC to Git for version control.
Before they start, the admins schedule downtime with the dev team, to take the collection offline for migration.
These are the steps for the migration process:
1. Detach the collection. Identity data for the collection resides in the TFS server configuration database while
the collection is attached and online. When a collection is detached from the TFS server, it takes a copy of that
identity data, and packages it with the collection for transport. Without this data, the identity portion of the
import cannot be executed. It's recommended that the collection stay detached until the import has been
completed, as there's no way to import the changes which occurred during the import.
2. Generate a backup. The next step of the migration process is to generate a backup that can be imported into
Azure DevOps Services. Data-tier Application Component Packages (DACPAC) is a SQL Server feature that
allows database changes to be packaged into a single file, and deployed to other instances of SQL. It can also be
restored directly to Azure DevOps Services, and is therefore used as the packaging method for getting
collection data into the cloud. Contoso will use the SqlPackage.exe tool to generate the DACPAC. This tool is
included in SQL Server Data Tools.
3. Upload to storage. After the DACPAC is created, they upload it to Azure Storage. After it's uploaded, they get a
shared access signature (SAS) to allow the TFS Migration Tool access to the storage.
4. Fill out the import. Contoso can then fill out missing fields in the import file, including the DACPAC setting. To
start with they'll specify that they want to do a dry run import, to check that everything's working properly
before the full migration.
5. Do a dry run. Dry run imports help test collection migration. Dry runs have limited life, and are deleted before
a production migration runs. They're deleted automatically after a set duration. A note about when the dry run
will be deleted is included in the success email received after the import finishes. Take note and plan accordingly.
6. Complete the production migration. With the dry run migration completed, Contoso admins do the final
migration by updating the import.json file, and running import again.
Detach the collection
Before starting, Contoso admins take a local SQL Server backup, and VMware snapshot of the TFS server, before
detaching.
1. In the TFS Admin console, they select the collection they want to detach (ContosoDev).

2. In General, they select Detach Collection.


3. In the Detach Team Project Collection Wizard > Servicing Message, they provide a message for users who
might try to connect to projects in the collection.

4. In Detach Progress, they monitor progress and select Next when the process finishes.

5. In Readiness Checks, when checks finish they select Detach.


6. They select Close to finish up.

7. The collection is no longer referenced in the TFS Admin console.


Generate a DACPAC
Contoso creates a backup (DACPAC) for import into Azure DevOps Services.
SqlPackage.exe in SQL Server Data Tools is used to create the DACPAC. There are multiple versions of
SqlPackage.exe installed with SQL Server Data Tools, located under folders with names such as 120, 130, and
140. It's important to use the right version to prepare the DACPAC.
TFS 2018 imports need to use SqlPackage.exe from the 140 folder or higher. For CONTOSOTFS, this file is
located in the folder: C:\Program Files (x86)\Microsoft Visual Studio\2017\Enterprise\Common7\IDE\Extensions\Microsoft\SQLDB\DAC\140.
Contoso admins generate the DACPAC as follows:
1. They open a command prompt and navigate to the SqlPackage.exe location. They type the following
command to generate the DACPAC:

SqlPackage.exe /sourceconnectionstring:"Data Source=SQLSERVERNAME\INSTANCENAME;Initial Catalog=Tfs_ContosoDev;Integrated Security=True" /targetFile:C:\TFSMigrator\Tfs_ContosoDev.dacpac /action:extract /p:ExtractAllTableData=true /p:IgnoreUserLoginMappings=true /p:IgnorePermissions=true /p:Storage=Memory

2. The following message appears after the command runs.

3. They verify the properties of the DACPAC file.


Upload the file to storage
After the DACPAC is created, Contoso uploads it to Azure Storage.
1. They download and install Azure Storage Explorer.

2. They connect to their subscription and locate the storage account they created for the migration
(contosodevmigration). They create a new blob container, azuredevopsmigration.

3. They specify the DACPAC file for upload as a block blob.


4. After the file is uploaded, they select the file name > Generate SAS. They expand the blob containers under
the storage account, select the container with the import files, and select Get Shared Access Signature.

5. They accept the defaults and select Create. This enables access for 24 hours.
6. They copy the Shared Access Signature URL, so that it can be used by the TFS Migration Tool.
NOTE
The migration must happen within the allowed time window or permissions will expire. Don't generate an SAS key
from the Azure portal. Keys generated that way are account-scoped, and won't work with the import.
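As a point of comparison, the same upload and SAS generation can be scripted with Azure PowerShell. The following is a minimal sketch only, assuming the Az.Storage module and access to the storage account key; Contoso's documented run used Azure Storage Explorer, and the account, container, and file names simply mirror this example.

# Hypothetical sketch: upload the DACPAC and create a container-scoped SAS valid for seven days.
$ctx = New-AzStorageContext -StorageAccountName 'contosodevmigration' -StorageAccountKey '<account-key>'
Set-AzStorageBlobContent -File 'C:\TFSMigrator\Tfs_ContosoDev.dacpac' -Container 'azuredevopsmigration' -Blob 'Tfs_ContosoDev.dacpac' -Context $ctx
# A container-scoped SAS with read/list permissions, not an account-scoped key from the portal.
New-AzStorageContainerSASToken -Name 'azuredevopsmigration' -Permission rl -ExpiryTime (Get-Date).AddDays(7) -FullUri -Context $ctx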

Fill in the import settings


Earlier, Contoso admins partially filled out the import specification file (import.json). Now, they need to add the
remaining settings.
They open the import.json file, and fill out the following fields:
Location: The SAS URL that was generated above.
Dacpac: The name of the DACPAC file uploaded to the storage account, including the ".dacpac" extension.
ImportType: Set to DryRun for now.
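For illustration, the three settings above might look like the following fragment of import.json. This is a sketch only; the real file contains the other fields Contoso filled out earlier, and the exact schema comes from the TFS Migration Tool that generated it.

{
  "Location": "<SAS URL copied from Azure Storage Explorer>",
  "Dacpac": "Tfs_ContosoDev.dacpac",
  "ImportType": "DryRun"
}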

Do a dry run migration


Contoso admins start with a dry run migration, to make sure everything's working as expected.
1. They open a command prompt, and navigate to the TfsMigration location ( C:\TFSMigrator ).
2. As a first step they validate the import file. They want to be sure the file is formatted properly, and that the
SAS key is working.
TfsMigrator import /importFile:C:\TFSMigrator\import.json /validateonly

3. The validation returns an error that the SAS key needs a longer expiry time.

4. They use Azure Storage Explorer to create a new SAS key with expiry set to seven days.

5. They update the import.json file and run the validation again. This time it completes successfully.
TfsMigrator import /importFile:C:\TFSMigrator\import.json /validateonly

6. They start the dry run:


TfsMigrator import /importFile:C:\TFSMigrator\import.json

7. A message is issued to confirm the migration. Note the length of time for which the staged data will be
maintained after the dry run.

8. The Azure AD sign-in prompt appears, and should be completed with a Contoso admin sign-in.
9. A message shows information about the import.

10. After 15 minutes or so, they browse to the URL, and see the following information:
11. After the migration finishes, a Contoso dev lead signs into Azure DevOps Services to check that the dry
run worked properly. After authentication, Azure DevOps Services needs a few details to confirm the
organization.
12. In Azure DevOps Services, the Dev Lead can see that the projects have been migrated to Azure DevOps
Services. There's a notice that the organization will be deleted in 15 days.

13. The Dev Lead opens one of the projects and opens Work Items > Assigned to me. This shows that work
item data has been migrated, along with identity.
14. The Dev Lead also checks other projects and code, to confirm that the source code and history have been
migrated.

Run the production migration


With the dry run complete, Contoso admins move on to the production migration. They delete the dry run, update
the import settings, and run import again.
1. In the Azure DevOps Services portal, they delete the dry run organization.
2. They update the import.json file to set the ImportType to ProductionRun.
3. They start the migration as they did for the dry run:
TfsMigrator import /importFile:C:\TFSMigrator\import.json
4. A message shows to confirm the migration, and warns that data could be held in a secure location as a
staging area for up to seven days.

5. In Azure AD Sign In, they specify a Contoso Admin sign-in.


6. A message shows information about the import.

7. After around 15 minutes, they browse to the URL, and see the following information:
8. After the migration finishes, a Contoso dev lead logs into Azure DevOps Services to check that the
migration worked properly. After logging in, the dev lead can see that projects have been migrated.

9. The Dev Lead opens one of the projects and opens Work Items > Assigned to me. This shows that work
item data has been migrated, along with identity.
10. The Dev Lead checks other work item data to confirm.

11. The Dev Lead also checks other projects and code, to confirm that the source code and history have been
migrated.
Move source control from TFVC to Git
With migration complete, Contoso wants to move from TFVC to Git for source code management. They need to
import the source code currently in their Azure DevOps Services organization as Git repos in the same
organization.
1. In the Azure DevOps Services portal, they open one of the TFVC repos ( $/PolicyConnect) and review it.

2. They select the Source dropdown > Import.


3. In Source type they select TFVC, and specify the path to the repo. They've decided not to migrate the
history.

NOTE
Due to differences in how TFVC and Git store version control information, we recommend that Contoso not migrate
history. This is the approach that Microsoft took when it migrated Windows and other products from centralized
version control to Git.
4. After the import, admins review the code.

5. They repeat the process for the second repository ( $/SmartHotelContainer).

6. After reviewing the source, the Dev Leads agree that the migration to Azure DevOps Services is done. Azure
DevOps Services now becomes the source for all development within teams involved in the migration.
Need more help?
Learn more about importing from TFVC.

Clean up after migration


With migration complete, Contoso needs to do the following:
Review the post-import article for information about additional import activities.
Either delete the TFVC repos, or place them in read-only mode. The code bases mustn't be used, but can be
referenced for their history.

Post-migration training
Contoso will need to provide Azure DevOps Services and Git training for relevant team members.
Scale a migration to Azure

This article demonstrates how the fictional company Contoso performs a migration at scale to Azure. They
consider how to plan and perform a migration of more than 3000 workloads, 8000 databases, and over 10,000
VMs.

Business drivers
The IT leadership team has worked closely with business partners to understand what they want to achieve with
this migration:
Address business growth. Contoso is growing, causing pressure on on-premises systems and infrastructure.
Increase efficiency. Contoso needs to remove unnecessary procedures, and streamline processes for
developers and users. The business needs IT to be fast and not waste time or money, so that it can deliver
faster on customer requirements.
Increase agility. Contoso IT needs to be more responsive to the needs of the business. It must be able to react
faster than the changes in the marketplace, to enable success in a global economy. It mustn't get in the way,
or become a business blocker.
Scale. As the business grows successfully, the Contoso IT team must provide systems that are able to grow at
the same pace.
Improve cost models. Contoso wants to lessen capital requirements in the IT budget. Contoso wants to use
cloud abilities to scale and reduce the need for expensive hardware.
Lower licensing costs. Contoso wants to minimize cloud costs.

Migration goals
The Contoso cloud team has pinned down goals for this migration. These goals were used to determine the best
migration method.

REQUIREMENTS | DETAILS
Move to Azure quickly | Contoso wants to start moving apps and VMs to Azure as quickly as possible.
Compile a full inventory | Contoso wants a complete inventory of all apps, databases, and VMs in the organization.
Assess and classify apps | Contoso wants to take full advantage of the cloud. As a default, Contoso assumes that all services will run as PaaS. IaaS will be used where PaaS isn't appropriate.
Train and move to DevOps | Contoso wants to move to a DevOps model. Contoso will provide Azure and DevOps training, and reorganize teams as necessary.

After pinning down goals and requirements, Contoso reviews the IT footprint, and identifies the migration process.

Current deployment
After planning and setting up an Azure infrastructure and trying out different proof-of-concept (POC ) migration
combinations as detailed in the table above, Contoso is ready to embark on a full migration to Azure at scale.
Here's what Contoso wants to migrate.

ITEM | VOLUME | DETAILS
Workloads | More than 3,000 apps | Apps run on VMs. Apps are Windows, SQL-based, and OSS LAMP.
Databases | Around 8,500 | Databases include SQL Server, MySQL, and PostgreSQL.
VMs | More than 35,000 | VMs run on VMware hosts and are managed by vCenter Servers.

Migration process
Now that Contoso has pinned down business drivers and migration goals, it determines a four-pronged
approach for the migration process:
Phase 1: Assess. Discover the current assets, and figure out whether they're suitable for migration to Azure.
Phase 2: Migrate. Move the assets to Azure. How they move apps and objects to Azure will depend on the app
and what they want to achieve.
Phase 3: Optimize. After moving resources to Azure, Contoso needs to improve and streamline them for
maximum performance and efficiency.
Phase 4: Secure and manage. With everything in place, Contoso now uses Azure security and management
resources and services to govern, secure, and monitor its cloud apps in Azure.
These phases aren't serial across the organization. Each piece of Contoso's migration project will be at a different
stage of the assessment and migration process. Optimization, security, and management will be ongoing over time.

Phase 1: Assess
Contoso kicks off the process by discovering and assessing on-premises apps, data, and infrastructure. Here's what
Contoso will do:
Contoso needs to discover apps, map dependencies across apps, and decide on migration order and priority.
As Contoso assesses, it will build out a comprehensive inventory of apps and resources. Along with the new
inventory, Contoso will use and update the existing Configuration Management Database (CMDB) and Service
Catalog.
The CMDB holds technical configurations for Contoso apps.
The Service Catalog documents the operational details of apps, including associated business partners,
and Service Level Agreements (SLAs).
Discover apps
Contoso runs thousands of apps across a range of servers. In addition to the CMDB and Service Catalog, Contoso
needs discovery and assessment tools.
The tools must provide a mechanism that can feed assessment data into the migration process.
Assessment tools must provide data that helps build up an intelligent inventory of Contoso's physical and
virtual resources. Data should include profile information, and performance metrics.
When discovery is complete, Contoso should have a complete inventory of assets, and metadata associated
with them. This inventory will be used to define the migration plan.
Identify classifications
Contoso identifies some common categories to classify assets in the inventory. These classifications are critical to
Contoso's decision making for migration. The classification list helps to establish migration priorities, and identify
complex issues.

CATEGORY | ASSIGNED VALUE | DETAILS
Business group | List of business group names | Which group is responsible for the inventory item?
POC candidate | Y/N | Can the app be used as a POC or early adopter for cloud migration?
Technical debt | None/Some/Severe | Is the inventory item running or using an out-of-support product, platform, or operating system?
Firewall implications | Y/N | Does the app communicate with the Internet/outside traffic? Does it integrate with a firewall?
Security issues | Y/N | Are there known security issues with the app? Does the app use unencrypted data or out-of-date platforms?
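One lightweight way to capture these classifications alongside the CMDB export is as structured records appended to a CSV. The following PowerShell sketch is illustrative only; the asset name, property names, and output path are hypothetical and not part of Contoso's actual tooling.

# Hypothetical sketch: record the classification values for a single inventory item.
$classification = [pscustomobject]@{
    AssetName            = 'smarthotel-web-01'     # example asset name
    BusinessGroup        = 'Hotel operations'
    PocCandidate         = 'Y'
    TechnicalDebt        = 'Some'
    FirewallImplications = 'Y'
    SecurityIssues       = 'N'
}
$classification | Export-Csv -Path 'C:\Assessment\classifications.csv' -Append -NoTypeInformation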

Discover app dependencies


As part of the assessment process, Contoso needs to identify where apps are running, and figure out the
dependencies and connections between app servers. Contoso maps the environment in steps.
1. As a first step, Contoso discovers how servers and machines map to individual apps, network locations, and
groups.
2. With this information, Contoso can clearly identify apps that have few dependencies, and are thus suitable for a
quick migration.
3. Contoso can use mapping to help them identify more complex dependencies and communications between app
servers. Contoso can then group these servers logically to represent apps, and plan a migration strategy based
on these groups.
With mapping completed, Contoso can ensure that all app components are identified and accounted for when
building the migration plan.
Evaluate apps
As the last step in the discovery and assessment process, Contoso can evaluate assessment and mapping results to
figure out how to migrate each app in the Service Catalog.
To capture this evaluation process, they add a couple of additional classifications to the inventory.

CATEGORY | ASSIGNED VALUE | DETAILS
Business group | List of business group names | Which group is responsible for the inventory item?
POC candidate | Y/N | Can the app be used as a POC or early adopter for cloud migration?
Technical debt | None/Some/Severe | Is the inventory item running or using an out-of-support product, platform, or operating system?
Firewall implications | Y/N | Does the app communicate with the Internet/outside traffic? Does it integrate with a firewall?
Security issues | Y/N | Are there known security issues with the app? Does the app use unencrypted data or out-of-date platforms?
Migration strategy | Rehost/Refactor/Rearchitect/Rebuild | What kind of migration is needed for the app? How will the app be deployed in Azure? Learn more.
Technical complexity | 1-5 | How complex is the migration? This value should be defined by Contoso DevOps and relevant partners.
Business criticality | 1-5 | How important is the app for the business? For example, a small workgroup app might be assigned a score of one, while a critical app used across the org might be assigned a score of five. This score will affect the migration priority level.
Migration priority | 1/2/3 | What's the migration priority for the app?
Migration risk | 1-5 | What's the risk level for migrating the app? This value should be agreed on by Contoso DevOps and relevant partners.

Figure out costs


To figure out costs and the potential savings of Azure migration, Contoso can use the Total Cost of Ownership
(TCO ) calculator to calculate and compare the TCO for Azure to a comparable on-premises deployment.
Identify assessment tools
Contoso decides which tool to use for discovery, assessment, and building the inventory. Contoso identifies a mix
of Azure tools and services, native app tools and scripts, and partner tools. In particular, Contoso is interested in
how Azure Migrate can be used to assess at scale.
Azure Migrate
The Azure Migrate service helps you to discover and assess on-premises VMware VMs, in preparation for
migration to Azure. Here's what Azure Migrate does:
1. Discover: Discover on-premises VMware VMs.
Azure Migrate supports discovery from multiple vCenter Servers (serially), and can run discoveries in
separate Azure Migrate projects.
Azure Migrate performs discovery by means of a VMware VM running the Migrate Collector. The same
collector can discover VMs on different vCenter servers, and send data to different projects.
2. Assess readiness: Assess whether on-premises machines are suitable for running in Azure. Assessment
includes:
Size recommendations: Get size recommendations for Azure VMs, based on the performance history of
on-premises VMs.
Estimated monthly costs: Get estimated costs for running on-premises machines in Azure.
3. Identify dependencies: Visualize dependencies of on-premises machines, to create optimal machine groups for
assessment and migration.
Migrate at scale

Contoso needs to use Azure Migrate correctly given the scale of this migration.
Contoso will do an app-by-app assessment with Azure Migrate. This ensures that Azure Migrate returns timely
data to the Azure portal.
Contoso admins read about deploying Azure Migrate at scale
Contoso notes the Azure Migrate limits summarized in the following table.

ACTION LIMIT

Create Azure Migrate project 10,000 VMs

Discovery 10,000 VMs

Assessment 10,000 VMs

Contoso will use Azure Migrate as follows:


In vCenter Contoso will organize VMs into folders. This will make it easy for them to focus as they run an
assessment against VMs in a specific folder.
Azure Migrate uses Azure Service Map to assess dependencies between machines. This requires agents to be
installed on VMs to be assessed.
Contoso will use automated scripts to install the required Windows or Linux agents.
By scripting, Contoso can push the installation to VMs within a vCenter folder.
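A minimal sketch of that push-install approach with VMware PowerCLI is shown below. The vCenter name, folder name, installer path, and silent-install switch are placeholders; the actual agent package and its arguments depend on the agents Contoso deploys for dependency visualization.

# Hypothetical sketch: run an agent installer inside each Windows VM in a single vCenter folder.
Connect-VIServer -Server 'vcenter.contoso.com'
$guestCred = Get-Credential -Message 'Guest OS administrator account'
$vms = Get-Folder -Name 'Migration-Wave-01' | Get-VM
foreach ($vm in $vms) {
    Invoke-VMScript -VM $vm -ScriptType Powershell `
        -ScriptText 'Start-Process -FilePath "C:\Temp\AgentSetup.exe" -ArgumentList "/quiet" -Wait' `
        -GuestCredential $guestCred
}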
Database tools
In addition to Azure Migrate, Contoso will focus on using tools specifically for database assessment. Tools such as
the Data Migration Assistant will help assess SQL Server databases for migration.
The Data Migration Assistant (DMA) can help Contoso to figure out whether on-premises databases are
compatible with a range of Azure database solutions, such as Azure SQL Database, SQL Server running on an
Azure IaaS VM, and Azure SQL Managed Instance.
In addition to DMA, Contoso has some other scripts that they use to discover and document the SQL Server
databases. These are located in the GitHub repo.
Partner assessment tools
There are several other partner tools which can help Contoso in assessing the on-premises environment for
migration to Azure. Learn more about Azure Migration partners.
Phase 2: Migrate
With their assessment complete Contoso needs to identify tools to move their apps, data, and infrastructure to
Azure.
Migration strategies
There are four broad migration strategies that Contoso can consider.

STRATEGY | DETAILS | USAGE
Rehost | Often referred to as a lift and shift migration, this is a no-code option for migrating existing apps to Azure quickly. An app is migrated as-is, with the benefits of the cloud, without the risks or costs associated with code changes. | Contoso can rehost less-strategic apps, requiring no code changes.
Refactor | Also referred to as "repackaging", this strategy requires minimal app code or configuration changes to connect the app to Azure PaaS, and take better advantage of cloud capabilities. | Contoso can refactor strategic apps to retain the same basic functionality, but move them to run on an Azure platform such as Azure App Service. This requires minimum code changes. On the other hand, Contoso will have to maintain a VM platform since this won't be managed by Microsoft.
Rearchitect | This strategy modifies or extends an app code base to optimize the app architecture for cloud capabilities and scale. It modernizes an app into a resilient, highly scalable, independently deployable architecture. | Azure services can accelerate the process, scale applications with confidence, and manage apps with ease.
Rebuild | This strategy rebuilds an app from scratch using cloud-native technologies. Azure platform as a service (PaaS) provides a complete development and deployment environment in the cloud. It eliminates some expense and complexity of software licenses, and removes the need for an underlying app infrastructure, middleware, and other resources. | Contoso can rewrite critical apps from the ground up, to take advantage of cloud technologies such as serverless compute or microservices. Contoso will manage the app and services it develops, and Azure manages everything else.

Data must also be considered, especially with the volume of databases that Contoso has. Contoso's default
approach is to use PaaS services such as Azure SQL Database to take full advantage of cloud features. By moving
to a PaaS service for databases, Contoso will only have to maintain data, leaving the underlying platform to
Microsoft.
Evaluate migration tools
Contoso is primarily using a couple of Azure services and tools for the migration:
Azure Site Recovery: Orchestrates disaster recovery, and migrates on-premises VMs to Azure.
Azure Database Migration Service: Migrates on-premises databases such as SQL Server, MySQL, and Oracle
to Azure.
Azure Site Recovery
Azure Site Recovery is the primary Azure service for orchestrating disaster recovery and migration from within
Azure, and from on-premises sites to Azure.
1. Site Recovery enables and orchestrates replication from your on-premises sites to Azure.
2. When replication is set up and running, on-premises machines can be failed over to Azure, completing the
migration.
Contoso already completed a POC to see how Site Recovery can help them to migrate to the cloud.
Use Site Recovery at scale

Contoso plans to perform multiple lift and shift migrations. To ensure this works, Site Recovery will be replicating
batches of around 100 VMs at a time. To figure out how this will work, Contoso needs to perform capacity
planning for the proposed Site Recovery migration.
Contoso needs to gather information about their traffic volumes. In particular:
Contoso needs to determine the rate of change for VMs it wants to replicate.
Contoso also needs to take network connectivity from the on-premises site to Azure into account.
In response to capacity and volume requirements, Contoso will need to allocate sufficient bandwidth based on
the daily data change rate for the required VMs, to meet its recovery point objective (RPO ).
Lastly, they need to figure out how many servers are needed to run the Site Recovery components that are
needed for the deployment.
Gather on-premises information

Contoso can use the Site Recovery Deployment Planner tool to complete these steps:
Contoso can use the tool to remotely profile VMs without an impact on the production environment. This helps
pinpoint bandwidth and storage requirements for replication and failover.
Contoso can run the tool without installing any Site Recovery components on-premises.
The tool gathers information about compatible and incompatible VMs, disks per VM, and data churn per disk. It
also identifies network bandwidth requirements, and the Azure infrastructure needed for successful replication
and failover.
Contoso needs to ensure that they run the planner tool on a Windows Server machine that matches the
minimum requirements for the Site Recovery configuration server. The configuration server is a Site Recovery
machine that's needed in order to replicate on-premises VMware VMs.
Identify Site Recovery requirements

In addition to the VMs being replicated, Site Recovery requires several components for VMware migration.

COMPONENT | DETAILS
Configuration server | Usually a VMware VM set up using an OVF template. The configuration server component coordinates communications between on-premises and Azure, and manages data replication.
Process server | Installed by default on the configuration server. The process server component receives replication data; optimizes it with caching, compression, and encryption; and sends it to Azure storage. The process server also installs Azure Site Recovery Mobility Service on VMs you want to replicate, and performs automatic discovery of on-premises machines. Scaled deployments need additional, standalone process servers to handle large volumes of replication traffic.
Mobility Service | The Mobility Service agent is installed on each VMware VM that will be migrated with Site Recovery.

Contoso needs to figure out how to deploy these components, based on capacity considerations.
COMPONENT | CAPACITY REQUIREMENTS
Maximum daily change rate | A single process server can handle a daily change rate up to 2 TB. Since a VM can only use one process server, the maximum daily data change rate that's supported for a replicated VM is 2 TB.
Maximum throughput | A standard Azure storage account can handle a maximum of 20,000 requests per second, and input/output operations per second (IOPS) across a replicating VM should be within this limit. For example, if a VM has 5 disks, and each disk generates 120 IOPS (8K size) on the VM, then it will be within the Azure per disk IOPS limit of 500. Note that the number of storage accounts needed is equal to the total source machine IOPS divided by 20,000 (see the sketch after this table). A replicated machine can only belong to a single storage account in Azure.
Configuration server | Based on Contoso's estimate of replicating 100-200 VMs together, and the configuration server sizing requirements, Contoso estimates that it needs a configuration server machine as follows: CPU: 16 vCPUs (2 sockets × 8 cores @ 2.5 GHz); Memory: 32 GB; Cache disk: 1 TB; Data change rate: 1 TB to 2 TB. In addition to sizing requirements, Contoso will need to make sure that the configuration server is optimally located, on the same network and LAN segment as the VMs that will be migrated.
Process server | Contoso will deploy a standalone dedicated process server with the ability to replicate 100-200 VMs: CPU: 16 vCPUs (2 sockets × 8 cores @ 2.5 GHz); Memory: 32 GB; Cache disk: 1 TB; Data change rate: 1 TB to 2 TB. The process server will be working hard, and as such should be located on an ESXi host that can handle the disk I/O, network traffic, and CPU required for the replication. Contoso will consider a dedicated host for this purpose.
Networking | Contoso has reviewed the current site-to-site VPN infrastructure, and decided to implement Azure ExpressRoute. The implementation is critical because it will lower latency, and improve bandwidth to Contoso's primary East US 2 Azure region. Monitoring: Contoso will need to carefully monitor data flowing from the process server. If the data overloads the network bandwidth, Contoso will consider throttling the process server bandwidth.
Azure storage | For migration, Contoso must identify the right type and number of target Azure storage accounts. Site Recovery replicates VM data to Azure storage, and can replicate to standard or premium (SSD) storage accounts. To decide about storage, Contoso must review storage limits, and factor in expected growth and increased usage over time. Given the speed and priority of migrations, Contoso has decided to use premium SSDs. Contoso has also decided to use managed disks for all VMs that are deployed to Azure. The IOPS required will determine whether the disks will be Standard HDD, Standard SSD, or Premium SSD.
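As a worked example of the storage-account guidance above, using the 20,000 requests-per-second figure from the table (the total IOPS value is hypothetical):

# Hypothetical sketch: storage accounts needed = total source machine IOPS / 20,000.
$totalSourceIops = 60000                              # assumed combined IOPS for one replication batch
$accountLimit    = 20000                              # standard storage account limit cited above
[math]::Ceiling($totalSourceIops / $accountLimit)     # 3 storage accounts for this batch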

Azure Database Migration Service


The Azure Database Migration Service is a fully managed service that enables seamless migrations from multiple
database sources to Azure data platforms with minimal downtime.
DMS integrates functionality of existing tools and services. It uses the Data Migration Assistant (DMA), to
generate assessment reports that pinpoint recommendations about database compatibility and any required
modifications.
DMS uses a simple, self-guided migration process, with intelligent assessment that helps address potential
issues before the migration.
DMS can migrate at scale from multiple sources to the target Azure database.
DMS provides support from SQL Server 2005 to SQL Server 2017.
DMS isn't the only Microsoft database migration tool. Get a comparison of tools and services.
Use DMS at scale

Contoso will use DMS when migrating from SQL Server.


When provisioning DMS, Contoso needs to size it correctly and set it to optimize performance for data
migrations. Contoso will select the "business-critical tier with 4 vCores" option, thus allowing the service to
take advantage of multiple vCPUs for parallelization and faster data transfer.

Another scaling tactic for Contoso is to temporarily scale up the Azure SQL or MySQL Database target
instance to the Premium tier SKU during the data migration. This minimizes database throttling that could
affect data transfer activities when using lower-level SKUs.
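A sketch of provisioning a suitably sized instance with Azure PowerShell follows. The cmdlet comes from the Az.DataMigration module; the SKU name, virtual network, subnet, and resource names are placeholders that should be checked against the current DMS documentation before use.

# Hypothetical sketch: create a DMS instance with enough vCores for parallel data transfer.
$vnet   = Get-AzVirtualNetwork -ResourceGroupName 'ContosoNetworkRG' -Name 'ContosoVNet'
$subnet = Get-AzVirtualNetworkSubnetConfig -Name 'DmsSubnet' -VirtualNetwork $vnet
New-AzDataMigrationService -ResourceGroupName 'ContosoMigrationRG' -ServiceName 'contoso-dms' `
    -Location 'eastus2' -Sku '<4 vCores SKU name>' -VirtualSubnetId $subnet.Id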
Use other tools
In addition to DMS, Contoso can use other tools and services to identify VM information.
They have scripts to help with manual migrations. These are available in the GitHub repo.
Various partner tools can also be used for migration.

Phase 3: Optimize
After Contoso moves resources to Azure, they need to streamline them to improve performance, and maximize
ROI with cost management tools. Given that Azure is a pay-for-use service, it's critical for Contoso to understand
how systems are performing, and to ensure they're sized properly.
Azure Cost Management
To make the most of their cloud investment, Contoso will take advantage of the free Azure Cost Management tool.
This licensed solution built by Cloudyn, a Microsoft subsidiary, allows Contoso to manage cloud spending with
transparency and accuracy. It provides tools to monitor, allocate, and trim cloud costs.
Azure Cost Management provides simple dashboard reports to help with cost allocation, showbacks and
chargebacks.
Cost Management can optimize cloud spending by identifying underutilized resources that Contoso can then
manage and adjust.
Learn more about Azure Cost Management.

Native tools
Contoso will also use scripts to locate unused resources.
During large migrations, there are often leftover pieces of data such as virtual hard drives (VHDs), which incur a
charge, but provide no value to the company. Scripts are available in the GitHub repo.
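For example, a short PowerShell sketch along the same lines can list managed disks that aren't attached to any VM, as candidates for review rather than automatic deletion (the query runs against whatever subscription the session is signed in to):

# Hypothetical sketch: find managed disks with no owning VM in the current subscription.
Get-AzDisk |
    Where-Object { -not $_.ManagedBy } |
    Select-Object Name, ResourceGroupName, DiskSizeGB, Location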
Contoso will take advantage of work done by Microsoft's IT department, and consider implementing the Azure
Resource Optimization (ARO ) Toolkit.
Contoso can deploy an Azure Automation account with preconfigured runbooks and schedules to its
subscription, and start saving money. Azure resource optimization happens automatically on a subscription
after a schedule is enabled or created, including optimization on new resources.
This provides decentralized automation capabilities to reduce costs. Features include:
Autosnooze Azure VMs based on low CPU.
Schedule Azure VMs to snooze and unsnooze.
Schedule Azure VMs to snooze or unsnooze in ascending and descending order using Azure tags.
Bulk deletion of resource groups on-demand.
Get started with the ARO toolkit in this GitHub repo.
Partner optimization tools
Partner tools such as Hanu and Scalr can be used.

Phase 4: Secure and manage


In this phase, Contoso uses Azure security and management resources to govern, secure, and monitor cloud apps
in Azure. These resources help you run a secure and well-managed environment while using products available in
the Azure portal. Contoso begins using these services during migration and, with Azure hybrid support, continues
using many of them for a consistent experience across the hybrid cloud.
Security
Contoso will rely on the Azure Security Center for unified security management and advanced threat protection
across hybrid cloud workloads.
The Security Center provides full visibility into, and control over, the security of cloud apps in Azure.
Contoso can quickly detect and take action in response to threats, and reduce security exposure by enabling
adaptive threat protection.
Learn more about the Security Center.
Monitoring
Contoso needs visibility into the health and performance of the newly migrated apps, infrastructure, and data now
running in Azure. Contoso will use built-in Azure cloud monitoring tools such as Azure Monitor, Log Analytics
workspace, and Application Insights.
Using these tools, Contoso can easily collect data from sources and gain rich insights. For example, Contoso can
gauge CPU, disk, and memory utilization for VMs, view applications and network dependencies across multiple
VMs, and track application performance.
Contoso will use these cloud monitoring tools to take action and integrate with service solutions.
Learn more about Azure monitoring.
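As an illustration of the kind of data these tools return, the following Azure PowerShell sketch pulls hourly average CPU for one migrated VM over the last day (the VM and resource group names are placeholders):

# Hypothetical sketch: retrieve average CPU utilization for a migrated VM.
$vm = Get-AzVM -ResourceGroupName 'ContosoMigrationRG' -Name 'smarthotel-web-01'
Get-AzMetric -ResourceId $vm.Id -MetricName 'Percentage CPU' -TimeGrain 01:00:00 `
    -StartTime (Get-Date).AddDays(-1) -EndTime (Get-Date) -AggregationType Average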
Business continuity and disaster recovery
Contoso will need a business continuity and disaster recovery (BCDR ) strategy for their Azure resources.
Azure provides built-in BCDR features to keep data safe and apps/services up and running.
In addition to built-in features, Contoso wants to ensure that it can recover from failures, avoid costly business
disruptions, meet compliance goals, and protect data against ransomware and human errors. To do this:
Contoso will deploy Azure Backup as a cost-efficient solution for backup of Azure resources. Because it's
built-in, Contoso can set up cloud backups in a few simple steps.
Contoso will set up disaster recovery for Azure VMs using Azure Site Recovery for replication, failover,
and failback between Azure regions that it specifies. This ensures that apps running on Azure VMs will
remain available in a secondary region of Contoso's choosing if an outage occurs in the primary region.
Learn more.
Conclusion
In this article, Contoso planned for an Azure migration at scale. They divided the migration process into four
stages: from assessment and migration, through to optimization, security, and management after migration was
complete. It's important to plan a migration project as a whole process, but to migrate systems within an
organization by breaking sets down into classifications and numbers that make sense for the business. By
assessing data and applying classifications, a project can be broken down into a series of smaller migrations,
which can run safely and rapidly. The sum of these smaller migrations quickly turns into a large, successful
migration to Azure.
VMware host migration best practices for Azure

Migration of an entire VMware host to Azure may accelerate the standard migration methodology outlined in the
Cloud Adoption Framework and pictured below.

Migration processes
The expanded scope article on VMware host migration outlines the approach to integrate VMware host migrations
with other Azure migration efforts to reduce complexity and standardize the process.

Migration best practices


The table of contents to the left outlines a number of best practices across multiple Microsoft web properties,
which can guide the execution of VMware host migration to Azure VMware Solutions (AVS ). Bookmark this page
for quick reference to the full list of best practices.
SQL Server migration best practices for Azure

Migration of an entire SQL Server to Azure may accelerate the standard migration methodology outlined in the
Cloud Adoption Framework and pictured below.

Migration processes
The expanded scope article on SQL Server migration outlines the approach to integrate SQL Server migrations
with other Azure migration efforts to reduce complexity and standardize the process.

Migration best practices


The table of contents to the left outlines a number of best practices from across Microsoft which can guide the
execution of SQL Server migration using the Azure Database Migration Guide, Azure Database Migration Service
(DMS), or other tools. Bookmark this page for quick reference to the full list of best practices.
Mainframe migration overview

Many companies and organizations benefit from moving some or all their mainframe workloads, applications, and
databases to the cloud. Azure provides mainframe-like features at cloud scale without many of the drawbacks
associated with mainframes.
The term mainframe generally refers to a large computer system, but the vast majority of mainframes currently
deployed are IBM System Z servers or IBM plug-compatible systems running MVS, DOS, VSE, OS/390, or z/OS.
Mainframe systems continue to be used in many industries to run vital information systems, and they have a place
in highly specific scenarios, such as large, high-volume, transaction-intensive IT environments.
Migrating to the cloud enables companies to modernize their infrastructure. With cloud services you can make
mainframe applications, and the value that they provide, available as a workload whenever your organization needs
it. Many workloads can be transferred to Azure with only minor code changes, such as updating the names of
databases. You can migrate more complex workloads using a phased approach.
Most Fortune 500 companies are already running Azure for their critical workloads. Azure's significant bottom-line
incentives motivate many migration projects. Companies typically move development and test workloads to Azure
first, followed by DevOps, email, and disaster recovery as a service.

Intended audience
If you're considering a migration or the addition of cloud services as an option for your IT environment, this guide
is for you.
This guidance helps IT organizations start the migration conversation. You may be more familiar with Azure and
cloud-based infrastructures than you are with mainframes, so this guide starts with an overview of how
mainframes work, and continues with various strategies for determining what and how to migrate.

Mainframe architecture
In the late 1950s, mainframes were designed as scale-up servers to run high-volume online transactions and batch
processing. Because of this, mainframes have software for online transaction forms (sometimes called green
screens) and high-performance I/O systems for processing batch runs.
Mainframes have a reputation for high reliability and availability, and are known for their ability to run huge online
transactions and batch jobs. A transaction results from a piece of processing initiated by a single request, typically
from a user at a terminal. Transactions can also come from multiple other sources, including web pages, remote
workstations, and applications from other information systems. A transaction can also be triggered automatically at
a predefined time as the following figure shows.
A typical IBM mainframe architecture includes these common components:
Front-end systems: Users can initiate transactions from terminals, web pages, or remote workstations.
Mainframe applications often have custom user interfaces that can be preserved after migration to Azure.
Terminal emulators are still used to access mainframe applications, and are also called green-screen
terminals.
Application tier: Mainframes typically include a customer information control system (CICS ), a leading
transaction management suite for the IBM z/OS mainframe that is often used with IBM Information
Management System (IMS ), a message-based transaction manager. Batch systems handle high-throughput
data updates for large volumes of account records.
Code: Programming languages used by mainframes include COBOL, Fortran, PL/I, and Natural. Job control
language (JCL ) is used to work with z/OS.
Database tier: A common relational database management system (DBMS) for z/OS is IBM DB2. It
manages data structures called dbspaces that contain one or more tables and are assigned to storage pools
of physical data sets called dbextents. Two important database components are the directory that identifies
data locations in the storage pools, and the log that contains a record of operations performed on the
database. Various flat-file data formats are supported. DB2 for z/OS typically uses virtual storage access
method (VSAM ) datasets to store the data.
Management tier: IBM mainframes include scheduling software such as TWS -OPC, tools for print and
output management such as CA-SAR and SPOOL, and a source control system for code. Secure access
control for z/OS is handled by resource access control facility (RACF ). A database manager provides access
to data in the database and runs in its own partition in a z/OS environment.
LPAR: Logical partitions, or LPARs, are used to divide compute resources. A physical mainframe is
partitioned into multiple LPARs.
z/OS: A 64-bit operating system that is most commonly used for IBM mainframes.
IBM systems use a transaction monitor such as CICS to track and manage all aspects of a business transaction.
CICS manages the sharing of resources, the integrity of data, and prioritization of execution. CICS authorizes users,
allocates resources, and passes database requests by the application to a database manager, such as IBM DB2.
For more precise tuning, CICS is commonly used with IMS/TM (formerly IMS/Data Communications or IMS/DC ).
IMS was designed to reduce data redundancy by maintaining a single copy of the data. It complements CICS as a
transaction monitor by maintaining state throughout the process and recording business functions in a data store.

Mainframe operations
The following are typical mainframe operations:
Online: Workloads include transaction processing, database management, and connections. They are often
implemented using IBM DB2, CICS, and z/OS connectors.
Batch: Jobs run without user interaction, typically on a regular schedule such as every weekday morning.
Batch jobs can be run on systems based on Windows or Linux by using a JCL emulator such as Micro Focus
Enterprise Server or BMC Control-M software.
Job control language (JCL): Specifies resources needed to process batch jobs. JCL conveys this
information to z/OS through a set of job control statements. Basic JCL contains six types of statements: JOB,
ASSGN, DLBL, EXTENT, LIBDEF, and EXEC. A job can contain several EXEC statements (steps), and each
step could have several LIBDEF, ASSGN, DLBL, and EXTENT statements.
Initial program load (IPL ): Refers to loading a copy of the operating system from disk into a processor's
real storage and running it. IPLs are used to recover from downtime. An IPL is like booting the operating
system on Windows or Linux VMs.

Next steps
Myths and facts
Mainframe myths and facts

Mainframes figure prominently in the history of computing and remain viable for highly specific workloads. Most
agree that mainframes are a proven platform with long-established operating procedures that make them reliable,
robust environments. Software runs based on usage, measured in million instructions per second (MIPS ), and
extensive usage reports are available for chargebacks.
The reliability, availability, and processing power of mainframes have taken on almost mythical proportions. To
evaluate the mainframe workloads that are most suitable for Azure, you first want to distinguish the myths from
the reality.

Myth: Mainframes never go down and have a minimum of five 9s of availability
Mainframe hardware and operating systems are viewed as reliable and stable. But the reality is that downtime
must be scheduled for maintenance and reboots (referred to as initial program loads or IPLs). When these tasks
are considered, a mainframe solution often has closer to two or three 9s of availability, which is equivalent to that
of high-end, Intel-based servers.
Mainframes also remain as vulnerable to disasters as any other servers do, and require uninterruptible power
supply (UPS ) systems to handle these types of failures.

Myth: Mainframes have limitless scalability


A mainframe's scalability depends on the capacity of its system software, such as the customer information control
system (CICS ), and the capacity of new instances of mainframe engines and storage. Some large companies that
use mainframes have customized their CICS for performance, and have otherwise outgrown the capability of the
largest available mainframes.

Myth: Intel-based servers are not as powerful as mainframes


The new core-dense, Intel-based systems have as much compute capacity as mainframes.

Myth: The cloud can't accommodate mission-critical applications for large companies such as financial institutions
Although there may be some isolated instances where cloud solutions fall short, it is usually because the
application algorithms cannot be distributed. These few examples are the exceptions, not the rule.

Summary
By comparison, Azure offers an alternative platform that is capable of delivering equivalent mainframe
functionality and features, and at a much lower cost. In addition, the total cost of ownership (TCO ) of the cloud's
subscription-based, usage-driven cost model is far less expensive than mainframe computers.

Next steps
Make the switch from mainframes to Azure
Make the switch from mainframes to Azure

As an alternative platform for running traditional mainframe applications, Azure offers hyperscale compute and
storage in a high availability environment. You get the value and agility of a modern, cloud-based platform without
the costs associated with a mainframe environment.
This section provides technical guidance for making the switch from a mainframe platform to Azure.

MIPS vs. vCPUs


No universal mapping formula exists for determining the number of virtual central processing units
(vCPUs) needed to run mainframe workloads. However, the metric of a million instructions per second (MIPS) is
often mapped to vCPUs on Azure. MIPS measures the overall compute power of a mainframe by providing a
constant value of the number of cycles per second for a given machine.
A small organization might require less than 500 MIPS, while a large organization typically uses more than 5,000
MIPS. At $1,000 per single MIPS, a large organization spends approximately $5 million annually to deploy a
5,000-MIPS infrastructure. The annual cost estimate for a typical Azure deployment of this scale is approximately
one-tenth the cost of a MIPS infrastructure. For details, see Table 4 in the Demystifying Mainframe-to-Azure
Migration white paper.
An accurate calculation of MIPS to vCPUs with Azure depends on the type of vCPU and the exact workload you
are running. However, benchmark studies provide a good basis for estimating the number and type of vCPUs you
will need. A recent HPE zREF benchmark provides the following estimates:
288 MIPS per Intel-based core running on HP Proliant servers for online (CICS ) jobs.
170 MIPS per Intel core for COBOL batch jobs.
This guide estimates 200 MIPS per vCPU for online processing and 100 MIPS per vCPU for batch processing.

NOTE
These estimates are subject to change as new virtual machine (VM) series become available in Azure.
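As a worked example of those planning figures (the MIPS values below are hypothetical; the 200 and 100 MIPS-per-vCPU ratios are the estimates quoted above):

# Hypothetical sketch: rough vCPU estimate from a MIPS baseline.
$onlineMips  = 3000
$batchMips   = 2000
$onlineVcpus = [math]::Ceiling($onlineMips / 200)    # online (CICS-style) processing -> 15 vCPUs
$batchVcpus  = [math]::Ceiling($batchMips / 100)     # batch processing -> 20 vCPUs
"Estimated vCPUs: online $onlineVcpus, batch $batchVcpus"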

High availability and failover


Mainframe systems often offer five 9s availability (99.999 percent) when mainframe coupling and Parallel Sysplex
are used. Yet system operators still need to schedule downtime for maintenance and initial program loads (IPLs).
The actual availability approaches two or three 9s, comparable to high end, Intel-based servers.
By comparison, Azure offers commitment-based service level agreements (SLAs), where multiple 9s availability is
the default, optimized with local or geo-based replication of services.
Azure provides additional availability by replicating data from multiple storage devices, either locally or in other
geographic regions. In the event of an Azure-based failure, compute resources can access the replicated data on
either the local or regional level.
When you use Azure platform as a service (PaaS) resources, such as Azure SQL Database and Azure Cosmos
DB, Azure can automatically handle failovers. When you use Azure infrastructure as a service (IaaS), failover
relies on specific system functionality, such as SQL Server Always On features, failover clustering instances, and
availability groups.

Scalability
Mainframes typically scale up, while cloud environments scale out. Mainframes can scale out with the use of a
coupling facility (CF ), but the high cost of hardware and storage makes mainframes expensive to scale out.
A CF also offers tightly coupled compute, whereas the scale-out features of Azure are loosely coupled. The cloud
can scale up or down to match exact user specifications, with compute power, storage, and services scaling on
demand under a usage-based billing model.

Backup and recovery


Mainframe customers typically maintain disaster recovery sites or make use of an independent mainframe
provider for disaster contingencies. Synchronization with a disaster recovery site is usually done through offline
copies of data. Both options incur high costs.
Automated geo-redundancy is also available through the mainframe coupling facility, albeit at great expense, and
is usually reserved for mission-critical systems. In contrast, Azure has easy-to-implement and cost-effective
options for backup, recovery, and redundancy at local or regional levels, or via geo-redundancy.

Storage
Part of understanding how mainframes work involves decoding various overlapping terms. For example, central
storage, real memory, real storage, and main storage all generally refer to storage attached directly to the
mainframe processor.
Mainframe hardware includes processors and many other devices, such as direct-access storage devices (DASDs),
magnetic tape drives, and several types of user consoles. Tapes and DASDs are used for system functions and by
user programs.
Types of physical storage for mainframes include:
Central storage: Located directly on the mainframe processor, this is also known as processor or real storage.
Auxiliary storage: Located separately from the mainframe, this type includes storage on DASDs and is also
known as paging storage.
The cloud offers a range of flexible, scalable options, and you will pay only for those options that you need. Azure
Storage offers a massively scalable object store for data objects, a file system service for the cloud, a reliable
messaging store, and a NoSQL store. For VMs, managed and unmanaged disks provide persistent, secure disk
storage.

Mainframe development and testing


A major driver in mainframe migration projects is the changing face of application development. Organizations
want their development environment to be more agile and responsive to business needs.
Mainframes typically have separate logical partitions (LPARs) for development and testing, such as QA and staging
LPARs. Mainframe development solutions include compilers (COBOL, PL/I, Assembler) and editors. The most
common is the Interactive System Productivity Facility (ISPF ) for the z/OS operating system that runs on IBM
mainframes. Others include ROSCOE Programming Facility (RPF ) and Computer Associates tools, such as CA
Librarian and CA-Panvalet.
Emulation environments and compilers are available on x86 platforms, so development and testing can typically be
among the first workloads to migrate from a mainframe to Azure. The availability and widespread use of DevOps
tools in Azure is accelerating the migration of development and testing environments.
When solutions are developed and tested on Azure and are ready for deployment to the mainframe, you will need
to copy the code to the mainframe and compile it there.

Next steps
Mainframe application migration
Mainframe application migration

When migrating applications from mainframe environments to Azure, most teams follow a pragmatic approach:
reuse wherever and whenever possible, and then start a phased deployment where applications are rewritten or
replaced.
Application migration typically involves one or more of the following strategies:
Rehost: You can move existing code, programs, and applications from the mainframe, and then recompile
the code to run in a mainframe emulator hosted in a cloud instance. This approach typically starts with
moving applications to a cloud-based emulator, and then migrating the database to a cloud-based database.
Some engineering and refactoring are required along with data and file conversions.
Alternatively, you can rehost using a traditional hosting provider. One of the principal benefits of the cloud is
outsourcing infrastructure management. You can find a datacenter provider that will host your mainframe
workloads for you. This model may buy time, reduce vendor lock in, and produce interim cost savings.
Retire: All applications that are no longer needed should be retired before migration.
Rebuild: Some organizations choose to completely rewrite programs using modern techniques. Given the
added cost and complexity of this approach, it's not as common as a lift and shift approach. Often after this
type of migration, it makes sense to begin replacing modules and code using code transformation engines.
Replace: This approach replaces mainframe functionality with equivalent features in the cloud. Software as
a service (SaaS ) is one option, which is using a solution created specifically for an enterprise concern, such
as finance, human resources, manufacturing, or enterprise resource planning. In addition, many industry-
specific apps are now available to solve problems that custom mainframe solutions previously solved.
You should consider starting by planning the workloads that you want to migrate first, and then determine
the requirements for moving associated applications, legacy codebases, and databases.

Mainframe emulation in Azure


Azure cloud services can emulate traditional mainframe environments, enabling you to reuse existing mainframe
code and applications. Common server components that you can emulate include online transaction processing
(OLTP ), batch, and data ingestion systems.
OLTP systems
Many mainframes have OLTP systems that process thousands or millions of updates for huge numbers of users.
These applications often use transaction processing and screen-form handling software, such as customer
information control system (CICS ), information management systems (IMS ), and terminal interface processor
(TIP ).
When moving OLTP applications to Azure, emulators for mainframe transaction processing (TP ) monitors are
available to run as infrastructure as a service (IaaS ) using virtual machines (VMs) on Azure. The screen handling
and form functionality can also be implemented by web servers. This approach can be combined with database
APIs, such as ActiveX data objects (ADO ), open database connectivity (ODBC ), and Java database connectivity
(JDBC ) for data access and transactions.
Time -constrained batch updates
Many mainframe systems perform monthly or annual updates of millions of account records, such as those used in
banking, insurance, and government. Mainframes handle these types of workloads by offering high-throughput
data handling systems. Mainframes batch jobs are typically serial in nature and depend on the input/output
operations per second (IOPS ) provided by the mainframe backbone for performance.
Cloud-based batch environments use parallel compute and high-speed networks for performance. If you need to
optimize batch performance, Azure provides various compute, storage, and networking options.
Data ingestion systems
Mainframes ingest large batches of data from retail, financial services, manufacturing, and other solutions for
processing. With Azure, you can use simple command-line utilities such as AzCopy for copying data to and from
storage locations. You can also use the Azure Data Factory service, enabling you to ingest data from disparate data
stores to create and schedule data-driven workflows.
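For example, a single AzCopy (v10) command can move a folder of extracted files into a blob container; the storage account, container, and SAS token below are placeholders:

# Hypothetical sketch: copy a local folder of extracted mainframe data into Blob storage.
azcopy copy "C:\extracts\daily" "https://<storageaccount>.blob.core.windows.net/ingest?<SAS-token>" --recursive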
In addition to emulation environments, Azure provides platform as a service (PaaS ) and analytics services that can
enhance existing mainframe environments.

Migrate OLTP workloads to Azure


The lift and shift approach is the no-code option for quickly migrating existing applications to Azure. Each
application is migrated as is, which provides the benefits of the cloud without the risks or costs of making code
changes. Using an emulator for mainframe transaction processing (TP ) monitors on Azure supports this approach.
TP monitors are available from various vendors and run on virtual machines, an infrastructure as a service (IaaS )
option on Azure. The following before and after diagrams show a migration of an online application backed by
IBM DB2, a relational database management system (DBMS ), on an IBM z/OS mainframe. DB2 for z/OS uses
virtual storage access method (VSAM ) files to store the data and Indexed Sequential Access Method (ISAM ) for
flat files. This architecture also uses CICS for transaction monitoring.

On Azure, emulation environments are used to run the TP manager and the batch jobs that use JCL. In the data
tier, DB2 is replaced by Azure SQL Database, although Microsoft SQL Server, DB2 LUW, or Oracle Database can
also be used. An emulator supports IMS, VSAM, and SEQ. The mainframe's system management tools are
replaced by Azure services, and software from other vendors, that run in VMs.
The screen handling and form entry functionality is commonly implemented using web servers, which can be
combined with database APIs, such as ADO, ODBC, and JDBC for data access and transactions. The exact line-up
of Azure IaaS components to use depends on the operating system you prefer. For example:
Windows-based VMs: Internet Information Services (IIS) along with ASP.NET for the screen handling and
business logic. Use ADO.NET for data access and transactions.
Linux-based VMs: Java-based application servers, such as Apache Tomcat, for screen
handling and Java-based business functionality. Use JDBC for data access and transactions.

Migrate batch workloads to Azure


Batch operations in Azure differ from the typical batch environment on mainframes. Mainframe batch jobs are
typically serial in nature and depend on the IOPS provided by the mainframe backbone for performance. Cloud-
based batch environments use parallel computing and high-speed networks for performance.
To optimize batch performance using Azure, consider the compute, storage, networking, and monitoring options as
follows.
Compute
Use:
VMs with the highest clock speed. Mainframe applications are often single-threaded and mainframe CPUs
have a very high clock speed.
VMs with large memory capacity to allow caching of data and application work areas.
VMs with higher density vCPUs to take advantage of multithreaded processing if the application supports
multiple threads.
Parallel processing, as Azure easily scales out for parallel processing, delivering more compute power for a
batch run.
Storage
Use:
Azure premium SSD or Azure ultra SSD for maximum available IOPS.
Striping with multiple disks for more IOPS per storage size.
Partitioning for storage to spread IO over multiple Azure storage devices.
Networking
Use Azure Accelerated Networking to minimize latency.
Monitoring
Use monitoring tools such as Azure Monitor, Azure Application Insights, and Azure logs so that administrators can monitor the performance of batch runs and eliminate bottlenecks. A minimal parallelization sketch of a batch run follows these options.
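To make the contrast with serial, IOPS-bound mainframe batch concrete, the following sketch (plain Python, standard library only; the input file and the per-record processing function are hypothetical) fans a batch run out across multiple processes, which is the pattern that scaled-out Azure compute rewards:

    # Minimal sketch: fan a batch run out across processes instead of running it serially.
    # The input file and process_record function are placeholders for illustration only.
    from concurrent.futures import ProcessPoolExecutor

    def process_record(line: str) -> str:
        # Placeholder for per-record business logic (for example, an interest calculation).
        return line.strip().upper()

    def run_batch(input_path: str, output_path: str, workers: int = 8) -> None:
        with open(input_path) as src:
            records = src.readlines()
        # chunksize keeps inter-process overhead low for large record sets.
        with ProcessPoolExecutor(max_workers=workers) as pool:
            results = list(pool.map(process_record, records, chunksize=1000))
        with open(output_path, "w") as dst:
            dst.writelines(result + "\n" for result in results)

    if __name__ == "__main__":
        run_batch("accounts_in.dat", "accounts_out.dat")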

Migrate development environments


The cloud's distributed architectures rely on a different set of development tools that provide the advantage of
modern practices and programming languages. To ease this transition, you can use a development environment
with other tools that are designed to emulate IBM z/OS environments. The following list shows options from
Microsoft and other vendors:

z/OS: Windows, Linux, or UNIX
CICS: Azure services offered by Micro Focus, Oracle, GT Software (Fujitsu), TmaxSoft, Raincode, and NTT Data, or rewrite using Kubernetes
IMS: Azure services offered by Micro Focus and Oracle
Assembler: Azure services from Raincode and TmaxSoft; or COBOL, C, or Java, or map to operating system functions
JCL: JCL, PowerShell, or other scripting tools
COBOL: COBOL, C, or Java
Natural: Natural, COBOL, C, or Java
FORTRAN and PL/I: FORTRAN, PL/I, COBOL, C, or Java
REXX and PL/I: REXX, PowerShell, or other scripting tools

Migrate databases and data


Application migration usually involves rehosting the data tier. You can migrate SQL Server, open-source, and other relational databases to fully managed solutions on Azure, such as Azure SQL Database Managed Instance, Azure Database for PostgreSQL, and Azure Database for MySQL, with Azure Database Migration Service.
For example, depending on what the mainframe data tier uses, you can migrate as follows:
For IBM DB2 or an IMS database, use Azure SQL Database, SQL Server, DB2 LUW, or Oracle Database on Azure.
For VSAM and other flat files, use Indexed Sequential Access Method (ISAM) flat files for Azure SQL, SQL Server, DB2 LUW, or Oracle.
For generation data groups (GDGs), migrate to files on Azure that use a naming convention and file name extensions that provide similar functionality to GDGs.
The IBM data tier includes several key components that you must also migrate. For example, when you migrate a
database, you also migrate a collection of data contained in pools, each containing dbextents, which are z/OS
VSAM data sets. Your migration must include the directory that identifies data locations in the storage pools. Also,
your migration plan must consider the database log, which contains a record of operations performed on the
database. A database can have one, two (dual or alternate), or four (dual and alternate) logs.
Database migration also includes these components:
Database manager: Provides access to data in the database. The database manager runs in its own partition
in a z/OS environment.
Application requester: Accepts requests from applications before passing them to an application server.
Online resource adapter: Includes application requester components for use in CICS transactions.
Batch resource adapter: Implements application requester components for z/OS batch applications.
Interactive SQL (ISQL): Runs as a CICS application and provides an interface that enables users to enter SQL statements or operator commands.
CICS application: Runs under the control of CICS, using available resources and data sources in CICS.
Batch application: Runs process logic without interactive communication with users to, for example, produce
bulk data updates or generate reports from a database.

Optimize scale and throughput for Azure


Generally speaking, mainframes scale up, while the cloud scales out. To optimize scale and throughput of mainframe-style applications running on Azure, it is important that you understand how mainframes can separate and isolate applications. A z/OS mainframe uses a feature called logical partitions (LPARs) to isolate and manage the resources for a specific application on a single instance.
For example, a mainframe might use one logical partition (LPAR) for a CICS region with associated COBOL programs, and a separate LPAR for DB2. Additional LPARs are often used for the development, testing, and
staging environments.
On Azure, it's more common to use separate VMs to serve this purpose. Azure architectures typically deploy VMs
for the application tier, a separate set of VMs for the data tier, another set for development, and so on. Each tier of
processing can be optimized using the most suitable type of VMs and features for that environment.
In addition, each tier can also provide appropriate disaster recovery services. For example, production and
database VMs might require a hot or warm recovery, while the development and testing VMs support a cold
recovery.
The following figure shows a possible Azure deployment using a primary and a secondary site. In the primary site,
the production, staging, and testing VMs are deployed with high availability. The secondary site is for backup and
disaster recovery.

Perform a staged mainframe-to-Azure migration


Moving solutions from a mainframe to Azure may involve a staged migration, whereby some applications are
moved first, and others remain on the mainframe temporarily or permanently. This approach typically requires
systems that allow applications and databases to interoperate between the mainframe and Azure.
A common scenario is to move an application to Azure while keeping the data used by the application on the
mainframe. Specific software is used to enable the applications on Azure to access data from the mainframe.
Fortunately, a wide range of solutions provide integration between Azure and existing mainframe environments,
support for hybrid scenarios, and migration over time. Microsoft partners, independent software vendors, and
system integrators can help you on your journey.
One option is Microsoft Host Integration Server, a solution that provides the distributed relational database
architecture (DRDA) required for applications in Azure to access data in DB2 that remains on the mainframe.
Other options for mainframe-to-Azure integration include solutions from IBM, Attunity, Codit, other vendors, and
open source options.

Partner solutions
If you are considering a mainframe migration, the partner ecosystem is available to assist you.
Azure provides a proven, highly available, and scalable infrastructure for systems that currently run on
mainframes. Some workloads can be migrated with relative ease. Other workloads that depend on legacy system
software, such as CICS and IMS, can be rehosted using partner solutions and migrated to Azure over time.
Regardless of the choice you make, Microsoft and our partners are available to assist you in optimizing for Azure
while maintaining mainframe system software functionality.

Learn more
For more information, see the following resources:
Get started with Azure
Deploy IBM DB2 pureScale on Azure
Host Integration Server documentation
Best practices for costing and sizing workloads
migrated to Azure

As you plan and design for migration, focusing on costs ensures the long-term success of your Azure migration.
During a migration project, it's critical that all teams (such as finance, management, and application development
teams) understand associated costs.
Before migration, estimating your migration spend, with a baseline for monthly, quarterly, and yearly budget targets, is critical to success.
After migration, you should optimize costs, continually monitor workloads, and plan for future usage patterns.
Migrated resources might start out as one type of workload, but shift to another type over time, based on
usage, costs, and shifting business requirements.
This article describes best practices for costing and sizing before and after migration.

IMPORTANT
The best practices and opinions described in this article are based on Azure platform and service features available at the
time of writing. Features and capabilities change over time. Not all recommendations might be applicable for your
deployment, so select what works for you.

Before migration
Before you move your workloads to the cloud, estimate the monthly cost of running them in Azure. Proactively
managing cloud costs helps you adhere to your operating expense budget. If budget is limited, take this into
account before migration. Consider converting workloads to Azure serverless technologies, where appropriate, to
reduce costs.
The best practices in this section help you to estimate costs, perform right-sizing for VMs and storage, use Azure Hybrid Benefit, use reserved VM instances, and estimate cloud spending across subscriptions.

Best practice: Estimate monthly workload costs


To forecast your monthly bill for migrated workloads, there are several tools you can use.
Azure pricing calculator: You select the products you want to estimate, for example VMs and storage. You
input costs into the pricing calculator, to build an estimate.
Azure pricing calculator
Azure Migrate: To estimate costs, you need to review and account for all the resources required to run
your workloads in Azure. To acquire this data, you create inventory of your assets, including servers, VMs,
databases, and storage. You can use Azure Migrate to collect this information.
Azure Migrate discovers and assesses your on-premises environment to provide an inventory.
Azure Migrate can map and show you dependencies between VMs so that you have a complete picture.
An Azure Migrate assessment contains estimated cost.
Compute costs: Using the Azure VM size recommended when you create an assessment, Azure
Migrate uses the Billing API to calculate estimated monthly VM costs. The estimation considers the
operating system, software assurance, reserved instances, VM uptime, location, and currency
settings. It aggregates the cost across all VMs in the assessment, and calculates a total monthly
compute cost.
Storage cost: Azure Migrate calculates total monthly storage costs by aggregating the storage costs
of all VMs in an assessment. You can calculate the monthly storage cost for a specific machine by
aggregating the monthly cost of all disks attached to it.
Azure Migrate assessment
Learn more:
Use the Azure pricing calculator.
Get an overview of Azure Migrate.
Read about Azure Migrate assessments.
Learn more about the Azure Database Migration Service.

Best practice: Right-size VMs


You can choose various options when you deploy Azure VMs to support workloads. Each VM type has specific
features and different combinations of CPU, memory, and disks. VMs are grouped as shown below:

General purpose: Balanced CPU-to-memory. Good for testing and development, small- to medium-size databases, and low- to medium-volume traffic web servers.
Compute-optimized: High CPU-to-memory. Good for medium-volume traffic web servers, network appliances, batch processes, and app servers.
Memory-optimized: High memory-to-CPU. Good for relational databases, medium- to large-size caches, and in-memory analytics.
Storage optimized: High disk throughput and IO. Good for big data, SQL, and NoSQL databases.
GPU optimized: Specialized VMs with single or multiple GPUs. Good for heavy graphics and video editing.
High performance: Fastest and most powerful CPUs, with optional high-throughput network interfaces (RDMA). Good for critical high-performance apps.

It's important to understand the pricing differences between these VMs, and the long-term budget effects.
Each type has several VM series within it.
Additionally, when you select a VM within a series, you can only scale the VM up and down within that series.
For example, a DSv2_2 can scale up to DSv2_4, but it can't be changed to a different series such as Fsv2_2.
Learn more:
Learn more about VM types and sizing, and map sizes to types.
Plan VM sizing.
Review a sample assessment for the fictional Contoso company.

Best practice: Select the right storage


Tuning and maintaining on-premises storage (SAN or NAS), and the networks to support them, can be costly and time-consuming. File (storage) data is commonly migrated to the cloud to help alleviate operational and management headaches. Microsoft provides several options for moving data to Azure, and you need to make decisions about those options. Picking the right storage type for data can save your organization several thousand dollars every month. A few considerations:
Data that isn't accessed much, and isn't business-critical, doesn't need to be placed on the most expensive
storage.
Conversely, important business-critical data should be located on higher tier storage options.
During migration planning, take an inventory of data and classify it by importance, in order to map it to the
most suitable storage. Consider budget and costs, as well as performance. Cost shouldn't necessarily be the
main decision-making factor. Picking the least expensive option could expose the workload to performance and
availability risks.
Storage data types
Azure provides different types of storage data.

Blobs: Optimized to store massive amounts of unstructured objects, such as text or binary data. Access data from everywhere over HTTP/HTTPS. Use for streaming and random access scenarios, for example, to serve images and documents directly to a browser, stream video and audio, and store backup and disaster recovery data.
Files: Managed file shares accessed over SMB 3.0. Use when migrating on-premises file shares, and to provide multiple access/connections to file data.
Disks: Based on page blobs. Disk type (speed): Standard (HDD or SSD) or Premium (SSD). Disk management: Unmanaged (you manage disk settings and storage) or Managed (you select the disk type and Azure manages the disk for you). Use Premium disks for VMs. Use managed disks for simple management and scaling.
Queues: Store and retrieve large numbers of messages accessed via authenticated calls (HTTP or HTTPS). Use to connect app components with asynchronous message queueing.
Tables: Store tables. Now part of the Azure Cosmos DB Table API.

Access tiers
Azure storage provides different options for accessing block blob data. Selecting the right access tier helps ensure
that you store block blob data in the most cost-effective manner.

Hot: Higher storage cost than Cool; lower access charges than Cool. This is the default tier. Use for data in active use that's accessed frequently.
Cool: Lower storage cost than Hot; higher access charges than Hot. Store for a minimum of 30 days. Use for short-term data that is available but accessed infrequently.
Archive: Used for individual block blobs. The most cost-effective option for storage, but data access is more expensive than Hot and Cool. Use for data that can tolerate several hours of retrieval latency and will remain in the tier for at least 180 days.

Storage account types


Azure provides different types of storage accounts and performance tiers.

General Purpose v2 Standard: Supports blobs (block, page, append), files, disks, queues, and tables. Supports the Hot, Cool, and Archive access tiers. ZRS is supported. Use for most scenarios and most types of data. Standard storage accounts can be HDD- or SSD-based.
General Purpose v2 Premium: Supports Blob storage data (page blobs). Supports the Hot, Cool, and Archive access tiers. ZRS is supported. Stored on SSD. Microsoft recommends using for all VMs.
General Purpose v1: Access tiering isn't supported. Doesn't support ZRS. Use if apps need the Azure classic deployment model.
Blob: Specialized storage account for storing unstructured objects. Provides block blobs and append blobs only (no File, Queue, Table, or Disk storage services). Provides the same durability, availability, scalability, and performance as General Purpose v2. You can't store page blobs in these accounts, and therefore can't store VHD files. You can set an access tier to Hot or Cool.

Storage redundancy options


Storage accounts can use different types of redundancy for resilience and high availability.

Locally redundant storage (LRS): Protects against a local outage by replicating within a single storage unit to a separate fault domain and update domain. Keeps multiple copies of your data in one datacenter. Provides at least 99.999999999% (11 9's) durability of objects over a given year. Consider it if your app stores data that can be easily reconstructed.
Zone-redundant storage (ZRS): Protects against a datacenter outage by replicating across three storage clusters in a single region. Each storage cluster is physically separated and located in its own availability zone. Provides at least 99.9999999999% (12 9's) durability of objects over a given year by keeping multiple copies of your data across multiple datacenters or regions. Consider it if you need consistency, durability, and high availability. Might not protect against a regional disaster when multiple zones are permanently affected.
Geographically redundant storage (GRS): Protects against an entire region outage by replicating data to a secondary region hundreds of miles away from the primary. Provides at least 99.99999999999999% (16 9's) durability of objects over a given year. Replica data isn't available unless Microsoft initiates a failover to the secondary region. If failover occurs, read and write access is available.
Read-access geographically redundant storage (RA-GRS): Similar to GRS. Provides at least 99.99999999999999% (16 9's) durability of objects over a given year. Provides 99.99% read availability by allowing read access from the secondary region used for GRS.

Learn more:
Review Azure Storage pricing.
Learn about Azure Import/Export for migrating large amounts of data to Azure Blobs and Files.
Compare blobs, files, and disk storage data types.
Learn more about access tiers.
Review different types of storage accounts.
Learn about storage redundancy, LRS, ZRS, GRS, and Read-access GRS.
Learn more about Azure Files.

Best practice: Take advantage of Azure Hybrid benefits


Due to years of software investment in systems such as Windows Server and SQL Server, Microsoft is in a unique
position to offer customers value in the cloud, with substantial discounts that other cloud providers can't
necessarily provide.
An integrated Microsoft on-premises/Azure product portfolio generates competitive and cost advantages. If you currently have an operating system or other software licensing through Software Assurance (SA), you can take those licenses with you to the cloud with Azure Hybrid Benefit.
Learn more:
Take a look at the Hybrid Benefit Savings Calculator.
Learn more about Hybrid Benefit for Windows Server.
Review pricing guidance for SQL Server Azure VMs.

Best practice: Use reserved VM instances


Most cloud platforms are set up as pay-as-you-go. This model presents disadvantages, since you don't necessarily
know how dynamic workloads will be. When you specify clear intentions for a workload, you contribute to
infrastructure planning.
Using Azure Reserved VM Instances, you prepay for a one-year or three-year term VM instance.
Prepayment provides a discount on the resources you use.
You can significantly reduce VM, SQL database compute, Azure Cosmos DB, or other resource costs by up to
72% on pay-as-you-go prices.
Reservations provide a billing discount, and don't affect the runtime state of your resources.
You can cancel reserved instances.

Azure reserved VMs


Learn more:
Learn about Azure Reservations.
Read the reserved instances FAQ.
Get pricing guidance for SQL Server Azure VMs.

Best practice: Aggregate cloud spend across subscriptions


It's inevitable that eventually you'll have more than one Azure subscription. For example, you might need an
additional subscription to separate development and production boundaries, or you might have a platform that
requires a separate subscription for each client. Having the ability to aggregate data reporting across all the
subscriptions into a single platform is a valuable feature.
To do this, you can use Azure Cost Management APIs. Then, after aggregating data into a single source such as
Azure SQL, you can use tools like Power BI to surface the aggregated data. You can create aggregated
subscription reports, and granular reports. For example, for users who need proactive insights into cost
management, you can create specific views of costs based on department, resource group, and so on. You don't need to provide them with full access to Azure billing data.
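As a minimal sketch of that aggregation step (assuming the azure-identity and requests Python packages; the subscription IDs are placeholders and the API version shown is an assumption to verify against the current Consumption API reference), usage detail rows can be pulled per subscription and combined before loading them into Azure SQL or Power BI:

    # Minimal sketch: pull usage details for several subscriptions with the
    # Azure Consumption REST API, then combine them for downstream reporting.
    # Subscription IDs are placeholders; verify the current api-version before use.
    import requests
    from azure.identity import DefaultAzureCredential

    SUBSCRIPTIONS = ["<subscription-id-1>", "<subscription-id-2>"]
    API_VERSION = "2019-10-01"  # assumption; check the Consumption API docs

    credential = DefaultAzureCredential()
    token = credential.get_token("https://management.azure.com/.default").token
    headers = {"Authorization": f"Bearer {token}"}

    all_rows = []
    for sub in SUBSCRIPTIONS:
        url = (
            f"https://management.azure.com/subscriptions/{sub}"
            f"/providers/Microsoft.Consumption/usageDetails?api-version={API_VERSION}"
        )
        while url:
            payload = requests.get(url, headers=headers).json()
            all_rows.extend(payload.get("value", []))
            url = payload.get("nextLink")  # follow paging until exhausted

    print(f"Collected {len(all_rows)} usage records across {len(SUBSCRIPTIONS)} subscriptions")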
Learn more:
Get an overview of the Azure Consumption API.
Learn about connecting to Azure Consumption Insights in Power BI Desktop.
Learn how to manage access to billing information for Azure using role-based access control (RBAC ).

After migration
After a successful migration of your workloads, and a few weeks of collecting consumption data, you'll have a clear idea of resource costs.
As you analyze data, you can start to generate a budget baseline for Azure resource groups and resources.
Then, as you understand where your cloud budget is being spent, you can analyze how to further reduce your
costs.
Best practices in this section include using Azure Cost Management for cost budgeting and analysis, monitoring
resources and implementing resource group budgets, and optimizing monitoring, storage, and VMs.

Best practice: Use Azure Cost Management


Microsoft provides Azure Cost Management to help you track spending:
Helps you to monitor and control Azure spending, and optimize use of resources.
Reviews your entire subscription and all of its resources, and makes recommendations.
Provides a full API, to integrate external tools and financial systems for reporting.
Tracks resource usage and manages cloud costs with a single, unified view.
Provides rich operational and financial insights to help you make informed decisions.
In Cost Management, you can:
Create a budget: Create a budget for financial accountability.
You can account for the services you consume or subscribe to for a specific period (monthly,
quarterly, annually) and a scope (subscriptions/resource groups). For example, you can create an
Azure subscription budget for a monthly, quarterly, or annual period.
After you create a budget, it's shown in cost analysis. Viewing your budget against current
spending is one of the first steps needed when analyzing your costs and spending.
Email notifications can be sent when budget thresholds are reached.
You can export cost management data to Azure Storage for analysis.

Azure Cost Management budget


Do a cost analysis: Get a cost analysis to explore and analyze your organizational costs, to help you
understand how costs are accrued, and identify spending trends.
Cost analysis is available to EA users.
You can view cost analysis data for various scopes, including by department, account, subscription or
resource group.
You can get a cost analysis that shows total costs for the current month, and accumulated daily costs.
Azure Cost Management analysis
Get recommendations: Get Advisor recommendations that show you how you can optimize and improve
efficiency.
Learn more:
Get an overview of Azure Cost Management.
Learn how to optimize your cloud investment with Azure Cost Management.
Learn how to use Azure Cost Management reports.
Get a tutorial on optimizing costs from recommendations.
Review the Azure Consumption API.

Best practice: Monitor resource utilization


In Azure you pay for what you use, when resources are consumed, and you don't pay when they aren't. For VMs,
billing occurs when a VM is allocated, and you aren't charged after a VM is deallocated. With this in mind you
should monitor VMs in use, and verify VM sizing.
Continually evaluate your VM workloads to determine baselines.
For example, if your workload is used heavily Monday through Friday, 8am to 6pm, but hardly used outside
those hours, you could downgrade VMs outside peak times. This might mean changing VM sizes, or using
virtual machine scale sets to autoscale VMs up or down.
Some companies "snooze", VMs by putting them on a calendar that specifies when they should be available,
and when they're not needed.
In addition to VM monitoring, you should monitor other networking resources such as ExpressRoute and
virtual network gateways for under and over use.
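A minimal "snooze" sketch (assuming the azure-mgmt-compute and azure-identity Python packages; the subscription ID, resource group, and snooze tag convention are placeholders) deallocates tagged VMs so that compute charges stop outside peak hours:

    # Minimal sketch: deallocate VMs tagged for snoozing so compute billing stops.
    # Subscription ID, resource group, and the "snooze" tag are placeholder conventions.
    from azure.identity import DefaultAzureCredential
    from azure.mgmt.compute import ComputeManagementClient

    subscription_id = "<subscription-id>"
    resource_group = "<resource-group>"

    compute = ComputeManagementClient(DefaultAzureCredential(), subscription_id)

    for vm in compute.virtual_machines.list(resource_group):
        tags = vm.tags or {}
        if tags.get("snooze") == "nightly":
            # begin_deallocate is the long-running-operation form in recent SDK versions
            # (older versions expose it as deallocate).
            poller = compute.virtual_machines.begin_deallocate(resource_group, vm.name)
            poller.result()  # wait for the VM to release its compute allocation
            print(f"Deallocated {vm.name}")

A script like this can be driven from Azure Automation or a scheduled pipeline so that the calendar, not an administrator, decides when the VMs run.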
You can monitor VM usage using Microsoft tools such as Azure Cost Management, Azure Monitor, and Azure
Advisor. Third-party tools are also available.
Learn more:
Get an overview of Azure Monitor and Azure Advisor.
Get Advisor cost recommendations.
Learn how to optimize costs from recommendations, and prevent unexpected charges.
Learn about the Azure Resource Optimization (ARO) Toolkit.

Best practice: Implement resource group budgets


Often, resource groups are used to represent cost boundaries. Together with this usage pattern, the Azure team continues to develop new and enhanced ways to track and analyze resource spending at different levels, including the ability to create budgets at the resource group and resource level.
A resource group budget helps you track the costs associated with a resource group.
You can trigger alerts and run a wide variety of playbooks as the budget is reached or exceeded.
Learn more:
Learn how to manage costs with Azure Budgets.
Follow a tutorial to create and manage an Azure budget.

Best practice: Optimize Azure Monitor retention


As you move resources into Azure and enable diagnostic logging for them, you generate a lot of log data. Typically
this log data is sent to a storage account that's mapped to a Log Analytics workspace.
The longer the log data retention period, the more data you'll have.
Not all log data is equal, and some resources will generate more log data than others.
Due to regulations and compliance, it's likely that you'll need to retain log data for some resources longer than
others.
You should walk a careful line between optimizing your log storage costs, and keeping the log data you need.
We recommend evaluating and setting up the logging immediately after completing a migration, so that you
aren't spending money retaining logs of no importance.
Learn more:
Learn about monitoring usage and estimated costs.

Best practice: Optimize storage


If you followed best practices for selecting storage before migration, you are probably reaping some benefits.
However, there are probably additional storage costs that you can still optimize. Over time blobs and files become
stale. Data might not be used anymore, but regulatory requirements might mean that you need to keep it for a
certain period. As such, you might not need to store it on the high-performance storage that you used for the
original migration.
Identifying and moving stale data to cheaper storage areas can have a huge impact on your monthly storage
budget and cost savings. Azure provides many ways to help you identify and then store this stale data.
Take advantage of access tiers for general-purpose v2 storage, moving less important data from Hot to the Cool and Archive tiers (see the sketch after this list).
Use StorSimple to help move stale data based on customized policies.
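As a minimal sketch of the tiering step (assuming the azure-storage-blob and azure-identity Python packages; the storage account, container name, and 90-day threshold are placeholder choices), blobs that haven't changed recently can be demoted to the Cool tier:

    # Minimal sketch: move blobs that haven't been modified recently to the Cool tier.
    # Account URL, container name, and the 90-day threshold are placeholder choices.
    from datetime import datetime, timedelta, timezone
    from azure.identity import DefaultAzureCredential
    from azure.storage.blob import BlobServiceClient

    service = BlobServiceClient(
        account_url="https://<storage-account>.blob.core.windows.net",
        credential=DefaultAzureCredential(),
    )
    container = service.get_container_client("migrated-files")
    cutoff = datetime.now(timezone.utc) - timedelta(days=90)

    for blob in container.list_blobs():
        if blob.last_modified < cutoff:
            blob_client = container.get_blob_client(blob.name)
            blob_client.set_standard_blob_tier("Cool")  # or "Archive" for colder data
            print(f"Moved {blob.name} to Cool")

Blob lifecycle management policies can achieve the same result declaratively once the classification rules are stable.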
Learn more:
Learn more about access tiers.
Get an overview of StorSimple, and StorSimple pricing.

Best practice: Automate VM optimization


The ultimate goal of running a VM in the cloud is to maximize the CPU, memory, and disk that it uses. If you
discover VMs that aren't optimized, or have frequent periods when VMs aren't used, it makes sense to either shut
them down, or downscale them using virtual machine scale sets.
You can optimize a VM with Azure Automation, virtual machine scale sets, auto-shutdown, and scripted or third-
party solutions.
Learn more:
Learn how to use vertical autoscaling.
Schedule a VM autostart.
Learn how to start or stop VMs off hours in Azure Automation.
Get more information about Azure Advisor and the Azure Resource Optimization (ARO) Toolkit.

Best practice: Use Logic Apps and runbooks with the Budgets API
Azure provides a REST API that has access to your tenant billing information.
You can use the Budgets API to integrate external systems and workflows that are triggered by metrics that
you build from the API data.
You can pull usage and resource data into your preferred data analysis tools.
The Azure Resource Usage and RateCard APIs can help you accurately predict and manage your costs.
The APIs are implemented as a Resource Provider and are included in the APIs exposed by the Azure Resource
Manager.
The Budgets API can be integrated with Azure Logic Apps and runbooks; a minimal request sketch follows this list.
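As a rough sketch of calling the Budgets API directly (Python with the requests and azure-identity packages; the budget name, amounts, dates, api-version, and payload shape are assumptions to verify against the current Microsoft.Consumption/budgets reference before use):

    # Rough sketch: create or update a subscription-scoped budget through the Budgets API.
    # The api-version, payload shape, and all values shown are assumptions to verify
    # against the current Microsoft.Consumption/budgets reference before use.
    import requests
    from azure.identity import DefaultAzureCredential

    subscription_id = "<subscription-id>"
    scope = f"/subscriptions/{subscription_id}"
    budget_name = "monthly-migration-budget"
    api_version = "2019-10-01"  # assumption

    token = DefaultAzureCredential().get_token("https://management.azure.com/.default").token
    url = (
        f"https://management.azure.com{scope}"
        f"/providers/Microsoft.Consumption/budgets/{budget_name}?api-version={api_version}"
    )

    body = {
        "properties": {
            "category": "Cost",
            "amount": 10000,
            "timeGrain": "Monthly",
            "timePeriod": {"startDate": "2020-01-01T00:00:00Z", "endDate": "2020-12-31T00:00:00Z"},
            "notifications": {
                "actual_over_80_percent": {
                    "enabled": True,
                    "operator": "GreaterThan",
                    "threshold": 80,
                    "contactEmails": ["finance@contoso.com"],
                }
            },
        }
    }

    response = requests.put(url, headers={"Authorization": f"Bearer {token}"}, json=body)
    response.raise_for_status()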
Learn more:
Learn more about the Budgets API.
Get insights into Azure usage with the Billing API.

Best practice: Implement serverless technologies


VM workloads are often migrated "as is" to avoid downtime. However, VMs may host tasks that are intermittent, taking a short period to run or, alternatively, many hours. For example, VMs that run scheduled tasks, such as Windows Task Scheduler jobs or PowerShell scripts. When these tasks aren't running, you're nevertheless absorbing VM and disk storage costs.
After migration, and after a thorough review of these types of tasks, you might consider migrating them to serverless technologies such as Azure Functions or Azure Batch jobs. With this solution, you no longer need to manage and maintain the VMs, bringing additional cost savings.
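As a minimal sketch of the serverless replacement for a scheduled-task VM (Python, using the Azure Functions v1 programming model where the schedule lives in function.json; the 2 AM schedule and the work performed are placeholders):

    # Minimal sketch: a timer-triggered Azure Function replacing a scheduled-task VM.
    # Pair this with a function.json binding such as:
    #   { "bindings": [ { "name": "mytimer", "type": "timerTrigger",
    #                     "direction": "in", "schedule": "0 0 2 * * *" } ] }
    # The schedule and the work performed are placeholders.
    import logging
    import azure.functions as func

    def main(mytimer: func.TimerRequest) -> None:
        if mytimer.past_due:
            logging.warning("Timer is running late")
        # Placeholder for the job the VM used to run (cleanup, report generation, etc.)
        logging.info("Nightly maintenance job executed")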
Learn more:
Learn about Azure Functions.
Learn about Azure Batch.

Next steps
Review other best practices:
Best practices for security and management after migration.
Best practices for networking after migration.
Best practices for securing and managing workloads
migrated to Azure

As you plan and design for migration, in addition to thinking about the migration itself, you need to consider your
security and management model in Azure after migration. This article describes planning and best practices for
securing your Azure deployment after migrating, and for ongoing tasks to keep your deployment running at an
optimal level.

IMPORTANT
The best practices and opinions described in this article are based on the Azure platform and service features available at the
time of writing. Features and capabilities change over time.

Secure migrated workloads


After migration, the most critical task is to secure migrated workloads from internal and external threats. These
best practices help you to do that:
Work with Azure Security Center: Learn how to work with the monitoring, assessments, and recommendations
provided by Azure Security Center.
Encrypt your data: Get best practices for encrypting your data in Azure.
Set up antimalware: Protect your VMs from malware and malicious attacks.
Secure web apps: Keep sensitive information secure in migrated web apps.
Review subscriptions: Verify who can access your Azure subscriptions and resources after migration.
Work with logs: Review your Azure auditing and security logs on a regular basis.
Review other security features: Understand and evaluate advanced security features that Azure offers.

Best practice: Follow Azure Security Center recommendations


Microsoft works hard to ensure that Azure tenant admins have the information needed to enable security features
that protect workloads from attacks. Azure Security Center provides unified security management. From the
Security Center, you can apply security policies across workloads, limit threat exposure, and detect and respond to
attacks. Security Center analyzes resources and configurations across Azure tenants and makes security
recommendations, including:
Centralized policy management: Ensure compliance with company or regulatory security requirements by
centrally managing security policies across all your hybrid cloud workloads.
Continuous security assessment: Monitor the security posture of machines, networks, storage and data
services, and applications to discover potential security issues.
Actionable recommendations: Remediate security vulnerabilities before they can be exploited by attackers
with prioritized and actionable security recommendations.
Prioritized alerts and incidents: Focus on the most critical threats first with prioritized security alerts and
incidents.
In addition to assessments and recommendations, the Azure Security Center provides other security features that
can be enabled for specific resources.
Just-in-time (JIT) access. Reduce your network attack surface with just in time, controlled access to
management ports on Azure VMs.
Having VM RDP port 3389 open on the internet exposes VMs to continual bad actor activity. Azure IP
addresses are well-known, and hackers continually probe them for attacks on open 3389 ports.
Just in time uses network security groups (NSGs) and incoming rules that limit the amount of time that
a specific port is open.
With just in time enabled, Security Center checks that a user has role-based access control (RBAC) write access permissions for a VM. In addition, specify rules for how users can connect to VMs. If permissions are OK, an access request is approved and Security Center configures NSGs to allow inbound traffic to the selected ports for the amount of time you specify. NSGs return to their previous state when the time expires.
Adaptive application controls. Keep software and malware off VMs by controlling which apps run on them
using dynamic allow lists.
Adaptive application controls allow you to approve apps, and prevent rogue users or administrators from installing unapproved or unvetted software apps on your VMs.
You can block or alert attempts to run malicious apps, avoid unwanted or malicious apps, and
ensure compliance with your organization's app security policy.
File Integrity Monitoring. Ensure the integrity of files running on VMs.
You don't need to install software to cause VM issues. Changing a system file can also cause VM failure
or performance degradation. File integrity Monitoring examines system files and registry settings for
changes, and notifies you if something is updated.
Security Center recommends which files you should monitor.
Learn more:
Learn more about Azure Security Center.
Learn more about just in time VM access.
Learn about applying adaptive application controls.
Get started with File Integrity Monitoring.

Best practice: Encrypt data


Encryption is an important part of Azure security practices. Ensuring that encryption is enabled at all levels helps
prevent unauthorized parties from gaining access to sensitive data, including data in transit and at rest.
Encryption for IaaS
Virtual machines: For VMs, you can use Azure Disk Encryption to encrypt your Windows and Linux IaaS VM
disks.
Disk encryption uses BitLocker for Windows, and dm-crypt for Linux, to provide volume encryption for the OS and data disks.
You can use an encryption key created by Azure, or you can supply your own encryption keys,
safeguarded in Azure Key Vault.
With Disk Encryption, IaaS VM data is secured at rest (on the disk) and during VM boot.
Azure Security Center alerts you if you have VMs that aren't encrypted.
Storage: Protect data at rest that's stored in Azure Storage.
Data stored in Azure storage accounts can be encrypted using Microsoft-generated AES keys that are
FIPS 140-2 compliant, or you can use your own keys.
Storage Service Encryption is enabled for all new and existing storage accounts and can't be disabled.
Encryption for PaaS
Unlike IaaS, where you manage your own VMs and infrastructure, in a PaaS model the platform and infrastructure are managed by the provider, leaving you to focus on core app logic and capabilities. With so many different types of PaaS services, each service is evaluated individually for security purposes. As an example, let's see how we might
enable encryption for Azure SQL Database.
Always Encrypted: Use the Always Encrypted Wizard in SQL Server Management Studio to protect data at
rest.
You create an Always Encrypted key to encrypt individual column data.
Always Encrypted keys can be stored as encrypted in database metadata, or stored in trusted key stores
such as Azure Key Vault.
App changes will probably be needed to use this feature.
Transparent data encryption (TDE): Protect the Azure SQL Database with real-time encryption and
decryption of the database, associated backups, and transaction log files at rest.
TDE allows encryption activities to take place without changes at the app layer.
TDE can use encryption keys provided by Microsoft, or you can provide your own keys using Bring Your
Own Key support.
Learn more:
Learn about Azure Disk Encryption for IaaS VMs.
Enable encryption for IaaS Windows VMs.
Learn about Azure Storage Service Encryption for data at rest.
Read an overview of Always Encrypted.
Read about TDE for Azure SQL Database.
Learn about TDE with Bring Your Own Key.

Best practice: Protect VMs with antimalware


Older VMs migrated to Azure, in particular, may not have the appropriate level of antimalware installed. Azure provides a free endpoint solution that helps protect VMs from viruses, spyware, and other malware.
Microsoft Antimalware for Azure generates alerts when known malicious or unwanted software tries to install
itself.
It's a single agent solution that runs in the background without human intervention.
In Azure Security Center, you can easily identify VMs that don't have endpoint protection running, and install
Microsoft Antimalware as needed.
Antimalware for VMs
Learn more:
Learn about Microsoft Antimalware.

Best practice: Secure web apps


Migrated web apps face a couple of issues:
Most legacy web applications tend to have sensitive information inside configuration files. Files containing
such information can present security issues when apps are backed up, or when app code is checked into or
out of source control.
In addition, when you migrate web apps residing in a VM, you are likely moving that machine from an on-
premises network and firewall-protected environment to an environment facing the internet. Make sure that
you set up a solution that does the same work as your on-premises protection resources.
Azure provides a couple of solutions:
Azure Key Vault: Today, web app developers are taking steps to ensure that sensitive information isn't leaked
from these files. One method to secure information is to extract it from files and put it into an Azure Key Vault.
You can use Key Vault to centralize storage of app secrets, and control their distribution. It avoids the
need to store security information in app files.
Apps can securely access information in the vault using URIs, without needing custom code.
Azure Key Vault allows you to lock down access via Azure security controls and to seamlessly implement 'rolling keys'. Microsoft does not see or extract your data. A minimal retrieval sketch follows this list.
App Service Environment: If an app you migrate needs extra protection, you can consider adding an App
Service Environment and web application firewall to protect the app resources.
The Azure App Service Environment provides a fully isolated and dedicated environment in which to run App Service apps such as Windows and Linux web apps, Docker containers, mobile apps, and functions.
It's useful for apps that run at very high scale, require isolation and secure network access, or have high memory utilization.
Web application firewall: A feature of Azure Application Gateway that provides centralized protection for
web apps.
It protects web apps without requiring back-end code modifications.
It protects multiple web apps at the same time behind an application gateway.
A web application firewall can be monitored using Azure Monitor, and is integrated into Azure Security
Center.
Azure Key Vault
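A minimal retrieval sketch (assuming the azure-identity and azure-keyvault-secrets Python packages; the vault name and secret name are placeholders) shows how a migrated app can read a connection string from Key Vault instead of a configuration file:

    # Minimal sketch: read a secret from Key Vault instead of a configuration file.
    # Vault URL and secret name are placeholders.
    from azure.identity import DefaultAzureCredential
    from azure.keyvault.secrets import SecretClient

    client = SecretClient(
        vault_url="https://<vault-name>.vault.azure.net",
        credential=DefaultAzureCredential(),
    )

    # The app never stores the value on disk; it is fetched at run time.
    db_connection_string = client.get_secret("migrated-app-db-connection").value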
Learn more:
Get an overview of Azure Key Vault.
Learn about web application firewall.
Get an introduction to App Service Environments.
Learn how to configure a web app to read secrets from Key Vault.

Best practice: Review subscriptions and resource permissions


As you migrate your workloads and run them in Azure, staff with workload access move around. Your security
team should review access to your Azure tenant and resource groups on a regular basis. Azure has offerings for
identity management and access control security, including role-based access control (RBAC) to authorize permissions to access Azure resources.
RBAC assigns access permissions for security principals. Security principals represent users, groups (a set of
users), service principals (identity used by apps and services), and managed identities (an Azure Active
Directory identity automatically managed by Azure).
RBAC can assign roles to security principals, such as owner, contributor, and reader, and role definitions (a collection of permissions) that define the operations that can be performed by the roles.
RBAC can also set scopes that set the boundary for a role. Scope can be set at several levels, including a
management group, subscription, resource group, or resource.
Ensure that admins with Azure access are only able to access resources that you want to allow. If the
predefined roles in Azure aren't granular enough, you can create custom roles to separate and limit access
permissions.
Access control - IAM
Learn more:
About RBAC.
Learn to manage access using RBAC and the Azure portal.
Learn about custom roles.

Best practice: Review audit and security logs


Azure Active Directory (Azure AD) provides activity logs that appear in Azure Monitor. The logs capture the operations performed in your Azure tenancy, when they occurred, and who performed them.
Audit logs show the history of tasks in the tenant. Sign-in activity logs show who carried out the tasks.
Access to security reports depends on your Azure AD license. In Free and Basic, you get a list of risky users
and sign-ins. In Premium 1 and Premium 2 editions you get underlying event information.
You can route activity logs to various endpoints for long-term retention and data insights.
Make it a common practice to review the logs, or integrate your security information and event management (SIEM) tools to automatically review abnormalities. If you're not using Premium 1 or 2, you'll need to do a lot of analysis yourself or by using your SIEM system. Analysis includes looking for risky sign-ins and events, and other user attack patterns.
Azure AD Users and Groups
Learn more:
Learn about Azure AD activity logs in Azure Monitor.
Learn how to audit activity reports in the Azure AD portal.

Best practice: Evaluate other security features


Azure provides other security features that provide advanced security options. Some of these best practices
require add-on licenses and premium options.
Implement Azure AD administrative units (AUs). Delegating administrative duties to support staff can be tricky with just basic Azure access control. Giving support staff access to administer all the groups in Azure AD might not be the ideal approach for organizational security. Using AUs allows you to segregate Azure resources into containers in a similar way to on-premises organizational units (OUs). To use AUs, the AU admin must have a premium Azure AD license. Learn more.
Use multi-factor authentication. If you have a premium Azure AD license, you can enable and enforce multi-factor authentication on your admin accounts. Phishing is the most common way that account credentials are compromised. Once a bad actor has admin account credentials, there's no stopping them from far-reaching actions, such as deleting all your resource groups. You can establish multi-factor authentication in several ways, including with email, an authenticator app, and phone text messages. As an administrator, you can select the least intrusive option. Multi-factor authentication integrates with threat analytics and conditional access policies to randomly require a multi-factor authentication challenge response. Learn more about security guidance, and how to set up multi-factor authentication.
Implement conditional access. In most small and medium-size organizations, Azure admins and the support team are probably located in a single geography. In this case, most logins will come from the same areas. If the IP addresses of these locations are fairly static, it makes sense that you shouldn't see administrator logins from outside these areas. Even if a remote bad actor compromises an admin's credentials, you can implement security features like conditional access, combined with multi-factor authentication, to prevent login from remote locations or from spoofed locations with random IP addresses. Learn more about conditional access, and review best practices for conditional access in Azure AD.
Review Enterprise Application permissions. Over time, admins select Microsoft and third-party links
without knowing their impact on the organization. Links can present consent screens that assign permissions
to Azure apps, and might allow access to read Azure AD data, or even full access to manage your entire Azure
subscription. You should regularly review the apps to which your admins and users have allowed access to
Azure resources. Ensure that these apps have only the permissions that are necessary. Additionally, quarterly
or semi-annually you can email users with a link to app pages so that they're aware of the apps to which
they've allowed access to their organizational data. Learn more about application types, and how to control app
assignments in Azure AD.

Manage migrated workloads


In this section we'll recommend some best practices for Azure management, including:
Manage resources: Best practices for Azure resource groups and resources, including smart naming,
preventing accidental deletion, managing resource permissions, and effective resource tagging.
Use blueprints: Get a quick overview on using blueprints for building and managing your deployment
environments.
Review architectures: Review sample Azure architectures to learn from as you build your post-migration
deployments.
Set up management groups: If you have multiple subscriptions, you can gather them into management groups,
and apply governance settings to those groups.
Set up access policies: Apply compliance policies to your Azure resources.
Implement a BCDR strategy: Put together a business continuity and disaster recovery (BCDR ) strategy to keep
data safe, your environment resilient, and resources up and running when outages occur.
Manage VMs: Group VMs into availability sets for resilience and high availability. Use managed disks for ease of VM disk and storage management.
Monitor resource usage: Enable diagnostic logging for Azure resources, build alerts and playbooks for
proactive troubleshooting, and use the Azure dashboard for a unified view of your deployment health and
status.
Manage support and updates: Understand your Azure support plan and how to implement it, get best
practices for keeping VMs up-to-date, and put processes in place for change management.

Best practice: Name resource groups


Ensuring that your resource groups have meaningful names that admins and support team members can easily recognize and navigate will drastically improve productivity and efficiency.
We recommend following Azure naming conventions.
If you're synchronizing your on-premises Active Directory to Azure AD using Azure AD Connect, consider
matching the names of security groups on-premises to the names of resource groups in Azure.
Resource group naming
Learn more:
Learn about naming conventions.

Best practice: Implement delete locks for resource groups


The last thing you need is for a resource group to disappear because it was deleted accidentally. We recommend
that you implement delete locks so that this doesn't happen.
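A minimal sketch of applying a CanNotDelete lock with the Python SDK (assuming the azure-mgmt-resource and azure-identity packages; the subscription ID and resource group name are placeholders, and the exact import path and parameter shape can vary by SDK version):

    # Minimal sketch: apply a CanNotDelete lock to a resource group.
    # Subscription ID and resource group name are placeholders; the import path
    # and parameter shape may differ slightly between azure-mgmt-resource versions.
    from azure.identity import DefaultAzureCredential
    from azure.mgmt.resource import ManagementLockClient

    locks = ManagementLockClient(DefaultAzureCredential(), "<subscription-id>")

    locks.management_locks.create_or_update_at_resource_group_level(
        resource_group_name="rg-migrated-workloads",
        lock_name="do-not-delete",
        parameters={"level": "CanNotDelete", "notes": "Protects migrated production workloads"},
    )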

Delete locks
Learn more:
Learn about locking resources to prevent unexpected changes.

Best practice: Understand resource access permissions


A subscription owner has access to all the resource groups and resources in your subscription.
Add people sparingly to this valuable assignment. Understanding the ramifications of these types of
permissions is important in keeping your environment secure and stable.
Make sure you place resources in appropriate resource groups:
Match resources with a similar lifecycle together. Ideally, you shouldn't need to move a resource when
you need to delete an entire resource group.
Resources that support a function or workload should be placed together for simplified management.
Learn more:
Learn about organizing subscriptions and resource groups.

Best practice: Tag resources effectively


Often, using only a resource group name related to resources won't provide enough metadata for effective
implementation of mechanisms such as internal billing or management within a subscription.
As a best practice, you should use Azure tags to add useful metadata that can be queried and reported on.
Tags provide a way to logically organize resources with properties that you define. Tags can be applied to
resource groups or resources directly.
Tags can be applied on a resource group or on individual resources. Resource group tags aren't inherited by
the resources in the group.
You can automate tagging using PowerShell or Azure Automation, or tag individual groups and resources yourself (a minimal tagging sketch follows this list). If you have a request and change management system in place, you can easily use the information in the request to populate your company-specific resource tags.
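A minimal tagging sketch (assuming the azure-mgmt-resource and azure-identity Python packages; the subscription ID, resource group name, location, and tag names are placeholder conventions) applies cost-center and environment tags at resource group creation:

    # Minimal sketch: create (or update) a resource group with cost-reporting tags.
    # Subscription ID, resource group name, location, and tag values are placeholders.
    from azure.identity import DefaultAzureCredential
    from azure.mgmt.resource import ResourceManagementClient

    resources = ResourceManagementClient(DefaultAzureCredential(), "<subscription-id>")

    resources.resource_groups.create_or_update(
        "rg-migrated-workloads",
        {
            "location": "eastus",
            "tags": {"costCenter": "finance-1234", "environment": "production", "owner": "app-team"},
        },
    )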

Tagging
Learn more:
Learn about tagging and tag limitations.
Review PowerShell and CLI examples to set up tagging, and to apply tags from a resource group to its
resources.
Read Azure tagging best practices.
Best practice: Implement blueprints
Just as a blueprint allows engineers and architects to sketch a project's design parameters, Azure Blueprints enables
cloud architects and central IT groups to define a repeatable set of Azure resources that implements and adheres
to an organization's standards, patterns, and requirements. Using Azure Blueprints, development teams can
rapidly build and create new environments that meet organizational compliance requirements, and that have a set
of built-in components, such as networking, to speed up development and delivery.
Use blueprints to orchestrate the deployment of resource groups, Azure Resource Manager templates, and
policy and role assignments.
Blueprints are stored in a globally distributed Azure Cosmos DB instance. Blueprint objects are replicated to multiple Azure regions. Replication provides low latency, high availability, and consistent access to blueprints, regardless of the region to which a blueprint deploys resources.
Learn more:
Read about blueprints.
Review a blueprint example used to accelerate AI in healthcare.

Best practice: Review Azure reference architectures


Building secure, scalable, and manageable workloads in Azure can be daunting. With continual changes, it can be
difficult to keep up with different features for an optimal environment. Having a reference to learn from can be
helpful when designing and migrating your workloads. Azure and Azure partners have built several sample
reference architectures for various types of environments. These samples are designed to provide ideas that you
can learn from and build on.
Reference architectures are arranged by scenario. They contain best practices and advice on management,
availability, scalability, and security. The Azure App Service Environment provides a fully isolated and dedicated
environment in which to run App Service apps, including Windows and Linux web apps, Docker containers,
mobile apps, and functions. App Service adds the power of Azure to your application, with security, load balancing,
autoscaling, and automated management. You can also take advantage of its DevOps capabilities, such as
continuous deployment from Azure DevOps and GitHub, package management, staging environments, custom
domain, and SSL certificates. App Service is useful for apps that need isolation and secure network access, and
those that use high amounts of memory and other resources that need to scale.
Learn more:
Learn about Azure reference architectures.
Review Azure example scenarios.

Best practice: Manage resources with Azure management groups


If your organization has multiple subscriptions, you need to manage access, policies, and compliance for them.
Azure management groups provide a level of scope above subscriptions.
You organize subscriptions into containers called management groups and apply governance conditions to
them.
All subscriptions in a management group automatically inherit the management group conditions.
Management groups provide large-scale enterprise-grade management, no matter what type of subscriptions
you have.
For example, you can apply a management group policy that limits the regions in which VMs can be created.
This policy is then applied to all management groups, subscriptions, and resources under that management
group.
You can build a flexible structure of management groups and subscriptions, to organize your resources into a
hierarchy for unified policy and access management.
The following diagram shows an example of creating a hierarchy for governance using management groups.

Management groups
Learn more:
Learn more about organizing resources into management groups.

Best practice: Deploy Azure Policy


Azure Policy is a service in Azure that you use to create, assign, and manage policies.
Policies enforce different rules and effects over your resources, so those resources stay compliant with your
corporate standards and service level agreements.
Azure Policy evaluates your resources, scanning for those not compliant with your policies.
For example, you could create a policy that allows only a specific SKU size for VMs in your environment. Azure
Policy will evaluate this setting when creating and updating resources, and when scanning existing resources.
Azure provides some built-in policies that you can assign, or you can create your own.

Azure Policy
Learn more:
Get an overview of Azure Policy.
Learn about creating and managing policies to enforce compliance.

Best practice: Implement a BCDR strategy


Planning for business continuity and disaster recovery (BCDR) is a critical exercise that you should complete as part of your Azure migration planning process. In legal terms, your contracts may include a force majeure clause that excuses obligations due to a greater force such as hurricanes or earthquakes. However, you also have obligations around your ability to ensure that services will continue to run, and recover where necessary, when disaster strikes. Your ability to do this can make or break your company's future.
Broadly, your BCDR strategy must consider:
Data backup: How to keep your data safe so that you can recover it easily if outages occur.
Disaster recovery: How to keep your apps resilient and available if outages occur.
Set up BCDR
When migrating to Azure, it's important to understand that although the Azure platform provides built-in resiliency capabilities, you need to design your Azure deployment to take advantage of the Azure features and services that provide high availability, disaster recovery, and backup.
Your BCDR solution will depend on your company objectives and is influenced by your Azure deployment strategy. Infrastructure as a service (IaaS) and platform as a service (PaaS) deployments present different
challenges for BCDR.
Once in place, your BCDR solutions should be tested regularly to check that your strategy remains viable.
Back up an IaaS deployment
In most cases, an on-premises workload is retired after migration, and your on-premises strategy for backing up
data must be extended or replaced. If you migrate your entire datacenter to Azure, you'll need to design and
implement a full backup solution using Azure technologies, or third-party integrated solutions.
For workloads running on Azure IaaS VMs, consider these backup solutions:
Azure Backup: Provides application-consistent backups for Azure Windows and Linux VMs.
Storage snapshots: Takes snapshots of blob storage.
Azure Backup
Azure Backup creates data recovery points that are stored in Azure Storage. Azure Backup can back up Azure VM disks and Azure Files (preview). Azure Files provides file shares in the cloud, accessible via SMB.
You can use Azure Backup to back up VMs in a couple of ways.
Direct backup from VM settings. You can back up VMs with Azure Backup directly from the VM options in
the Azure portal. You can back up the VM once per day, and you can restore the VM disk as needed. Azure
Backup takes app-aware data snapshots (VSS), and no agent is installed on the VM.
Direct backup in a Recovery Services vault. You can back up your IaaS VMs by deploying an Azure Backup
Recovery Services vault. This provides a single location to track and manage backups as well as granular
backup and restore options. Backup is up to three times a day, at the file/folder level. It isn't app-aware and
Linux isn't supported. Install the Microsoft Azure Recovery Services (MARS) agent on each VM that you want
to back up using this method.
Protect the VM to Azure Backup Server. Azure Backup Server is provided free with Azure Backup. The VM
is backed up to local Azure Backup Server storage. You then back up the Azure Backup Server to Azure in a
vault. Backup is app-aware, with full granularity over backup frequency and retention. You can back up at the
app level, for example by backing up SQL Server or SharePoint.
For security, Azure Backup encrypts data in flight using AES 256 and sends it over HTTPS to Azure. Backed-up
data at rest in Azure is encrypted using Storage Service Encryption (SSE), so your data is protected both in transit
and in storage.
Azure Backup
Learn more:
Learn about different types of backups.
Plan a backup infrastructure for Azure VMs.
Storage snapshots
Azure VMs are stored as page blobs in Azure Storage.
Snapshots capture the blob state at a specific point in time.
As an alternative backup method for Azure VM disks, you can take a snapshot of storage blobs and copy them
to another storage account.
You can copy an entire blob, or use an incremental snapshot copy to copy only delta changes and reduce
storage space.
As an extra precaution, you can enable soft delete for blob storage accounts. With this feature enabled, a blob
that's deleted is marked for deletion but not immediately purged. During the interim period, the blob can be
restored.
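As a rough illustration, the following sketch uses the azure-storage-blob Python package to take a point-in-time snapshot of an unmanaged disk blob and to enable soft delete on the account. The connection string, container, and blob names are hypothetical placeholders, and the retention period is an example value.

```python
from azure.storage.blob import BlobServiceClient, RetentionPolicy

# Placeholder connection string for the storage account that holds the VHDs.
conn_str = "<storage account connection string>"
service = BlobServiceClient.from_connection_string(conn_str)

# Take a point-in-time snapshot of the page blob backing a VM disk.
disk_blob = service.get_blob_client(container="vhds", blob="myvm-osdisk.vhd")
snapshot = disk_blob.create_snapshot()
print(f"Snapshot created: {snapshot['snapshot']}")

# Enable soft delete so deleted blobs can be restored for 14 days.
service.set_service_properties(
    delete_retention_policy=RetentionPolicy(enabled=True, days=14)
)
```

A snapshot taken this way can then be copied to another storage account for an extra layer of protection, for example as part of a scheduled backup job.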
Learn more:
Learn about Azure blob storage.
Learn how to create a blob snapshot.
Review a sample scenario for blob storage backup.
Read about soft delete.
Disaster recovery and forced failover (preview) in Azure Storage
Third-party backup
In addition, you can use third-party solutions to back up Azure VMs and storage containers to local storage or
other cloud providers. Learn more about backup solutions in the Azure marketplace.
Set up disaster recovery for IaaS apps
In addition to protecting data, BCDR planning must consider how to keep apps and workloads available in case of
disaster. For workloads running on Azure IaaS VMs and Azure Storage, consider these solutions:
Azure Site Recovery
Azure Site Recovery is the primary Azure service for ensuring that Azure VMs can be brought online and VM
apps made available when outages occur.
Site Recovery replicates VMs from a primary to secondary Azure region. When disaster strikes, you fail VMs over
from the primary region, and continue accessing them as normal in the secondary region. When operations
return to normal, you can fail back VMs to the primary region.

Site Recovery
Learn more:
Review disaster recovery scenarios for Azure VMs.
Learn how to set up disaster recovery for an Azure VM after migration.

Best practice: Use managed disks and availability sets


Azure uses availability sets to logically group VMs together, and to isolate VMs in a set from other resources. VMs
in an availability set are spread across multiple fault domains with separate subsystems, to protect against local
failures, and are also spread across multiple update domains so that not all VMs in a set reboot at the same time.
Azure managed disks simplify disk management for Azure IaaS VMs by managing the storage
accounts associated with the VM disks.
We recommend that you use managed disks where possible. You only have to specify the type of storage you
want to use and the size of disk you need, and Azure creates and manages the disk for you, behind the scenes.
You can convert existing unmanaged disks to managed disks.
You should create VMs in availability sets for high resilience and availability. When planned or unplanned
outages occur, availability sets ensure that at least one of your VMs in the set continues to be available.
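As a minimal sketch, the following uses the azure-mgmt-compute Python package to create an availability set with the "Aligned" SKU, which is required when the member VMs use managed disks. The subscription ID, resource group, set name, and region are hypothetical placeholders, and fault/update domain counts should match your region's limits.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.compute import ComputeManagementClient

subscription_id = "00000000-0000-0000-0000-000000000000"  # placeholder
compute_client = ComputeManagementClient(DefaultAzureCredential(), subscription_id)

# "Aligned" availability sets are required when the member VMs use managed disks.
availability_set = compute_client.availability_sets.create_or_update(
    resource_group_name="rg-migrated-workload",
    availability_set_name="avset-web-tier",
    parameters={
        "location": "eastus2",
        "platform_fault_domain_count": 2,
        "platform_update_domain_count": 5,
        "sku": {"name": "Aligned"},
    },
)
print(availability_set.id)
```

You would then reference this availability set when creating each VM in the tier so that the platform spreads them across fault and update domains.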
Managed disks
Learn more:
Get an overview of managed disks.
Learn about converting disks to managed.
Learn how to manage the availability of Windows VMs in Azure.

Best practice: Monitor resource usage and performance


You might have moved your workloads to Azure for its immense scaling capabilities. However, moving your
workload doesn't mean that Azure will automatically implement scaling without your input. As an example:
If your marketing organization pushes a new TV advertisement that drives 300% more traffic, this could cause
site availability issues. Your newly migrated workload might hit assigned limits and crash.
Another example might be a distributed denial-of-service (DDoS) attack on your migrated workload. In this
case you might not want to scale, but to prevent the source of the attacks from reaching your resources.
These two cases have different resolutions, but for both you need an insight into what's happening with usage and
performance monitoring.
Azure Monitor can surface these metrics and help you respond with alerts, autoscaling, event hubs, logic apps,
and more.
In addition to Azure Monitor, you can integrate your third-party SIEM application to monitor Azure logs for
auditing and performance events.
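For instance, the following sketch uses the azure-monitor-query Python package to pull hourly average CPU for a migrated VM over the past day, the kind of signal you might use to drive scaling or alerting decisions. The VM resource ID shown is a hypothetical placeholder.

```python
from datetime import timedelta

from azure.identity import DefaultAzureCredential
from azure.monitor.query import MetricsQueryClient, MetricAggregationType

# Hypothetical resource ID of a migrated VM.
vm_resource_id = (
    "/subscriptions/00000000-0000-0000-0000-000000000000"
    "/resourceGroups/rg-migrated-workload"
    "/providers/Microsoft.Compute/virtualMachines/myvm"
)

client = MetricsQueryClient(DefaultAzureCredential())
response = client.query_resource(
    vm_resource_id,
    metric_names=["Percentage CPU"],
    timespan=timedelta(days=1),
    granularity=timedelta(hours=1),
    aggregations=[MetricAggregationType.AVERAGE],
)

# Print one row per hourly data point.
for metric in response.metrics:
    for series in metric.timeseries:
        for point in series.data:
            print(point.timestamp, point.average)
```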
Azure Monitor
Learn more:
Learn about Azure Monitor.
Get best practices for monitoring and diagnostics.
Learn about autoscaling.
Learn how to route Azure data to a SIEM tool.

Best practice: Enable diagnostic logging


Azure resources generate a fair number of logging metrics and telemetry data.
By default, most resource types don't have diagnostic logging enabled.
By enabling diagnostic logging across your resources, you can query logging data, and build alerts and
playbooks based on it.
When you enable diagnostic logging, each resource will have a specific set of categories. You select one or
more logging categories, and a location for the log data. Logs can be sent to a storage account, event hub, or to
Azure Monitor logs.
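The following sketch shows one way to enable a diagnostic setting with the azure-mgmt-monitor Python package, routing a resource's logs and metrics to a Log Analytics workspace. The resource and workspace IDs are hypothetical placeholders, and the available log categories vary by resource type (the "AuditEvent" category shown here applies to a key vault), so check what each resource supports.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.monitor import MonitorManagementClient

subscription_id = "00000000-0000-0000-0000-000000000000"  # placeholder
monitor_client = MonitorManagementClient(DefaultAzureCredential(), subscription_id)

# Hypothetical IDs; substitute the resource and workspace you want to connect.
resource_id = (
    "/subscriptions/00000000-0000-0000-0000-000000000000"
    "/resourceGroups/rg-migrated-workload"
    "/providers/Microsoft.KeyVault/vaults/kv-migrated"
)
workspace_id = (
    "/subscriptions/00000000-0000-0000-0000-000000000000"
    "/resourceGroups/rg-monitoring"
    "/providers/Microsoft.OperationalInsights/workspaces/law-central"
)

# Send the selected log category and all metrics to Azure Monitor Logs.
monitor_client.diagnostic_settings.create_or_update(
    resource_uri=resource_id,
    name="send-to-log-analytics",
    parameters={
        "workspace_id": workspace_id,
        "logs": [{"category": "AuditEvent", "enabled": True}],
        "metrics": [{"category": "AllMetrics", "enabled": True}],
    },
)
```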
Diagnostic logging
Learn more:
Learn about collecting and consuming log data.
Learn what's supported for diagnostic logging.

Best practice: Set up alerts and playbooks


With diagnostic logging enabled for Azure resources, you can start to use logging data to create custom alerts.
Alerts proactively notify you when conditions are found in your monitoring data. You can then address issues
before system users notice them. You can alert on things like metric values, log search queries, activity log
events, platform health, and website availability.
When alerts are triggered, you can run a Logic App Playbook. A playbook helps you to automate and
orchestrate a response to a specific alert. Playbooks are based on Azure Logic Apps. You can use Logic App
templates to create playbooks, or create your own.
As a simple example, you can create an alert that triggers when a port scan happens against a network security
group. You can set up a playbook that runs and locks down the IP address of the scan origin.
Another example might be an app with a memory leak. When the memory usage gets to a certain point, a
playbook can recycle the process.
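To illustrate the shape of such an alert, the sketch below builds the kind of metric alert rule definition you might submit to Azure Monitor; it is shown only as a Python dictionary that follows the ARM metric alert schema, with hypothetical VM and action group IDs and an example 80 percent CPU threshold. You can create an equivalent rule through the portal, the CLI, or the Monitor SDK.

```python
import json

# Hypothetical IDs for the VM being watched and the action group to notify.
vm_id = (
    "/subscriptions/00000000-0000-0000-0000-000000000000"
    "/resourceGroups/rg-migrated-workload"
    "/providers/Microsoft.Compute/virtualMachines/myvm"
)
action_group_id = (
    "/subscriptions/00000000-0000-0000-0000-000000000000"
    "/resourceGroups/rg-monitoring"
    "/providers/microsoft.insights/actionGroups/ops-oncall"
)

# Alert when average CPU stays above 80% over a 15-minute window.
high_cpu_alert = {
    "location": "global",
    "properties": {
        "severity": 2,
        "enabled": True,
        "scopes": [vm_id],
        "evaluationFrequency": "PT5M",
        "windowSize": "PT15M",
        "criteria": {
            "odata.type": "Microsoft.Azure.Monitor.SingleResourceMultipleMetricCriteria",
            "allOf": [
                {
                    "criterionType": "StaticThresholdCriterion",
                    "name": "HighCpu",
                    "metricName": "Percentage CPU",
                    "operator": "GreaterThan",
                    "threshold": 80,
                    "timeAggregation": "Average",
                }
            ],
        },
        "actions": [{"actionGroupId": action_group_id}],
    },
}

print(json.dumps(high_cpu_alert, indent=2))
```

The action group referenced in the rule is what connects the alert to a notification or an automated response such as a playbook.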

Alerts
Learn more:
Learn about alerts.
Learn about security playbooks that respond to Security Center alerts.

Best practice: Use the Azure dashboard


The Azure portal is a web-based unified console that allows you to build, manage, and monitor everything from
simple web apps to complex cloud applications. It includes a customizable dashboard and accessibility options.
You can create multiple dashboards and share them with others who have access to your Azure subscriptions.
With this shared model, your team has visibility into the Azure environment, allowing them to be proactive
when managing systems in the cloud.

Azure dashboard
Learn more:
Learn how to create a dashboard.
Learn about dashboard structure.

Best practice: Understand support plans


At some point, you will need to collaborate with your support staff or Microsoft support staff. Having a set of
policies and procedures for support during scenarios such as disaster recovery is vital. In addition, your admins
and support staff should be trained on implementing those policies.
In the unlikely event that an Azure service issue affects your workload, admins should know how to submit a
support ticket to Microsoft in the most appropriate and efficient way.
Familiarize yourself with the various support plans offered for Azure. They range from the Developer plan,
intended for trial and nonproduction environments, to Premier support with a response time of less than 15
minutes.
Support plans
Learn more:
Get an overview of Azure support plans.
Learn about service level agreements (SLAs).

Best practice: Manage updates


Keeping Azure VMs updated with the latest operating system and software updates is a massive chore. The ability
to surface all VMs, to figure out which updates they need, and to automatically push those updates is extremely
valuable.
You can use Update Management in Azure Automation to manage operating system updates for Windows and
Linux machines that are deployed in Azure, on-premises, and in other cloud providers.
Use Update Management to quickly assess the status of available updates on all agent computers, and manage
update installation.
You can enable Update Management for VMs directly from an Azure Automation account. You can also update
a single VM from the VM page in the Azure portal.
In addition, Azure VMs can be registered with System Center Configuration Manager. You could then migrate
the Configuration Manager workload to Azure, and do reporting and software updates from a single web
interface.
Updates
Learn more:
Learn about update management in Azure.
Learn how to integrate Configuration Manager with update management.
Frequently asked questions about Configuration Manager in Azure.

Implement a change management process


As with any production system, making any type of change can affect your environment. A change management
process that requires requests to be submitted in order to make changes to production systems is a valuable
addition in your migrated environment.
You can build best practice frameworks for change management to raise awareness in administrators and
support staff.
You can use Azure Automation to help with configuration management and change tracking for your migrated
workflows.
When enforcing a change management process, you can use audit logs to link Azure change logs to existing
change requests. If you see a change made without a corresponding change request, you can investigate what
went wrong in the process.
Azure has a change tracking solution in Azure Automation:
The solution tracks changes to Windows and Linux software and files, Windows registry keys, Windows
services, and Linux daemons.
Changes on monitored servers are sent to the Azure Monitor service in the cloud for processing.
Logic is applied to the received data and the cloud service records the data.
On the Change Tracking dashboard, you can easily see the changes that were made in your server
infrastructure.
Change management
Learn more:
Learn about Change Tracking.
Learn about Azure Automation capabilities.

Next steps
Review other best practices:
Best practices for networking after migration.
Best practices for cost management after migration.
Cloud Adoption Framework migration model

This section of the Cloud Adoption Framework explains the principles behind its migration model. Wherever
possible, this content attempts to maintain a vendor-neutral position while guiding you through the processes and
activities that can be applied to any cloud migration, regardless of your chosen cloud vendor.

Understand migration motivations


Cloud migration is a portfolio management effort, cleverly disguised as a technical implementation. During the
migration process, you will decide to move some assets, invest in others, and retire obsolete or unused assets.
Some assets will be optimized, refactored, or replaced entirely as part of this process. Each of these decisions
should align with the motivations behind your cloud migration. The most successful migrations also go a step
further and align these decisions with desired business outcomes.
The Cloud Adoption Framework migration model depends on your organization having completed a process of
business readiness for cloud adoption. Make sure you have reviewed Plan and Ready guidance in the Cloud
Adoption Framework to determine the business drivers or other justification for a cloud migration, as well as any
required organizational planning or training required before executing a migration process at scale.

NOTE
While business planning is important, a growth mindset is equally important. In parallel with broader business planning
efforts by the cloud strategy team, it's suggested that the cloud adoption team begin migrating a first workload as a
precursor to wider scale migration efforts. This initial migration will allow the team to gain practical experience with the
business and technical issues involved in a migration.

Envision an end state


It's important to establish a rough vision of your end state before starting your migration efforts. The diagram
below shows an on-premises starting point of infrastructure, applications, and data, which defines your digital
estate. During the migration process, those assets are transitioned using one of the five migration strategies
described in The five Rs of rationalization.

Migration and modernization of workloads range from simple rehost (also called lift and shift) migrations using
infrastructure as a service (IaaS) capabilities that don't require code and app changes, through refactoring with
minimal changes, to rearchitecting to modify and extend code and app functionality to take advantage of cloud
technologies.
Cloud-native strategies and platform as a service (PaaS) strategies rebuild on-premises workloads using Azure
platform offerings and managed services. Workloads that have equivalent fully managed software as a service
(SaaS) cloud-based offerings can often be fully replaced by these services as part of the migration process.

NOTE
During the public preview of the Cloud Adoption Framework, this section of the framework emphasizes a rehost migration
strategy. Although PaaS and SaaS solutions are discussed as alternatives when appropriate, the migration of virtual machine-
based workloads using IaaS capabilities is the primary focus.
Other sections and future iterations of this content will expand on other approaches. For a high-level discussion on
expanding the scope of your migration to include more complicated migration strategies, see the article balancing the
portfolio.

Incremental migration
The Cloud Adoption Framework migration model is based on an incremental cloud transformation process. It
assumes that your organization will start with an initial, limited-scope, cloud migration effort, which we refer to
commonly as the first workload. This effort will expand iteratively to include more workloads as your Operations
teams refine and improve your migration processes.
Cloud migration tools like Azure Site Recovery can migrate entire datacenters consisting of tens of thousands of
VMs. However, the business and existing IT operations can seldom handle such a rapid pace of change. As such,
many organizations break up a migration effort into multiple iterations, moving one workload (or a collection of
workloads) per iteration.
The principles behind this incremental model are based on the execution of processes and prerequisites referenced
in the following infographic.

The consistent application of these principles represents an end goal for your cloud migration processes and
should not be viewed as a required starting point. As your migration efforts mature, refer to the guidance in this
section to help define the best process to support your organizational needs.

Next steps
Begin learning about this model by investigating the prerequisites to migration.
Prerequisites to migration
Prerequisites for migration

Prior to beginning any migrations, your migration target environment must be prepared for the coming changes.
In this case, environment refers to the technical foundation in the cloud. Environment also means the business
environment and mindset driving the migration. Likewise, the environment includes the culture of the teams
executing the changes and those receiving the output. Lack of preparation for these changes is the most common
reason for failure of migrations. This series of articles walks you through suggested prerequisites to prepare the
environment.

Objective
Ensure business, culture, and technical readiness prior to beginning an iterative migration plan.

Review business drivers


Before beginning any cloud migration, review the Plan and Ready guidance in the Cloud Adoption Framework to
ensure your organization is prepared for cloud adoption and migration processes. In particular, review the
business requirements and expected outcomes driving the migration:
Getting started: Migrate
Why are we moving to the cloud?

Definition of done
Prerequisites are completed when the following are true:
Business readiness. The cloud strategy team has defined and prioritized a high-level migration backlog
representing the portion of the digital estate to be migrated in the next two or three releases. The cloud
strategy team and the cloud adoption team have agreed to an initial strategy for managing change.
Culture readiness. The roles, responsibilities, and expectations of the cloud adoption team, cloud strategy
team, and affected users have been agreed on regarding the workloads to be migrated in the next two or three
releases.
Technical readiness. The landing zone (or allocated hosting space in the cloud) that will receive the migrated
assets meets minimum requirements to host the first migrated workload.
Caution

Preparation is key to the success of a migration. However, too much preparation can lead to analysis paralysis,
where too much time spent on planning can seriously delay a migration effort. The processes and prerequisites
defined in this section are meant to help you make decisions, but don't let them block you from making
meaningful progress.
Choose a relatively simple workload for your initial migration. Use the processes discussed in this section as you
plan and implement this first migration. This first migration effort will quickly demonstrate cloud principles to
your team and force them to learn about how the cloud works. As your team gains experience, integrate these
learnings as you take on larger and more complex migrations.

Accountability during prerequisites


Two teams are accountable for readiness during the prerequisites phase:
Cloud strategy team: This team is responsible for identifying and prioritizing the first two or three workloads
to serve as migration candidates.
Cloud adoption team: This team is responsible for validating readiness of the technical environment and the
feasibility of migrating the proposed workloads.
A single member of each team should be identified as accountable for each of the three definitions of done
statements in the prior section.

Responsibilities during prerequisites


In addition to the high-level accountability, there are actions that an individual or group needs to be directly
responsible for. The following are a few such responsibilities that affect these activities:
Business prioritization. Make business decisions regarding the workloads to be migrated and general timing
constraints. For more information, see Cloud migration business motivations.
Change management readiness. Establish and communicate the plan for tracking technical change during
migration.
Business user alignment. Establish a plan for readying the business user community for migration execution.
Digital estate inventory and analysis. Execution of the tools required to inventory and analyze the digital
estate. See the Cloud Adoption Framework discussion of the digital estate for more information.
Cloud readiness. Evaluate the target deployment environment to ensure that it complies with requirements of
the first few workload candidates. See the Azure setup guide for more information.
The remaining articles in this series help with the execution of each.

Next steps
With a general understanding of the prerequisites, you are ready to address the first prerequisite early migration
decisions.
Early migration decisions
Decisions that affect migration

During migration, several factors affect decisions and execution activities. This article explains the central theme of
those decisions and explores a few questions that carry through the discussions of migration principles in this
section of the Cloud Adoption Framework guidance.

Business outcomes
The objective or goal of any adoption effort can have a significant impact on the suggested approach to execution.
Migration. Urgent business drivers, speed of adoption, or cost savings are examples of operational outcomes.
These outcomes are central to efforts that drive business value from transitive change in IT or operations
models. The Migrate section of the Cloud Adoption Framework focuses heavily on Migration focused business
outcomes.
Application innovation. Improving customer experience and growing market share are examples of
incremental outcomes. The outcomes result from a collection of incremental changes focused on the needs and
desires of current customers.
Data-driven innovation. New products or services, especially those that come from the power of data, are
examples of disruptive outcomes. These outcomes are the result of experimentation and predictions that use
data to disrupt status quo in the market.
No business would pursue just one of these outcomes. Without operations, there are no customers, and vice versa.
Cloud adoption is no different. Companies commonly work to achieve each of these outcomes, but trying to focus
on all of them simultaneously can spread your efforts too thin and slow progress on work that could most benefit
your business needs.
This prerequisite isn't a demand for you to pick one of these three goals, but instead to help your cloud strategy
team and your cloud adoption team establish a set of operational priorities that will guide execution for the next
three to six months. These priorities are set by ranking each of the three itemized options from most significant to
least significant, as they relate to the efforts this team can contribute to in the next one or two quarters.
Act on migration outcomes
If operational outcomes rank highest in the list, this section of the Cloud Adoption Framework will work well for
your team. This section assumes that you need to prioritize speed and cost savings as your primary key
performance indicators (KPIs), in which case a migration-focused model of adoption is well aligned with those
outcomes. A migration-focused model is heavily predicated on lift and shift migration of infrastructure as a service
(IaaS) assets to deplete a datacenter and to produce cost savings. In such a model, modernization may occur, but it
is a secondary focus until the primary migration mission is realized.
Act on application innovations
If market share and customer experience are your primary drivers, this may not be the best section of the Cloud
Adoption Framework to guide your teams' efforts. Application innovation requires a plan that focuses on the
modernization and transition of workloads, regardless of the underlying infrastructure. In such a case, the
guidance in this section can be informative but may not be the best approach to guide core decisions.
Act on data innovations
If data, experimentation, research and development (R&D), or new products are your priority for the next six
months or so, this may not be the best section of the Cloud Adoption Framework to guide your teams' efforts. Any
data innovation effort could benefit from guidance regarding the migration of existing source data. However, the
broader focus of that effort would be on the ingress and integration of additional data sources. Extending that
guidance with predictions and new experiences is much more important than the migration of IaaS assets.

Balance the portfolio


This section of the Cloud Adoption Framework establishes the theory to help readers understand different
approaches to addressing change within a balanced portfolio. The article on balancing the portfolio is one example
of an expanded scope, designed to help act on this theory.

Effort
Migration effort can vary widely depending on the size and complexities of the workloads involved. A smaller
workload migration involving a few hundred virtual machines (VMs) is a tactical process, potentially being
implemented using automated tools such as Azure Migrate. Conversely, a large enterprise migration of tens of
thousands of workloads requires a highly strategic process and can involve extensive refactoring, rebuilding, and
replacing of existing applications integrating platform as a service (PaaS) and software as a service (SaaS)
capabilities. Identifying and balancing the scope of your planned migrations is critical.
Before making any decisions that could have a long-term impact on the current migration program, it is vital that
you create consensus on the following decisions.
Effort type
In any migration of significant scale (>250 VMs), assets are migrated using a variety of transition options,
discussed in the five Rs of rationalization: Rehost, Refactor, Rearchitect, Rebuild, and Replace.
Some workloads are modernized through a rebuild or rearchitect process, creating more modern applications with
new features and technical capabilities. Other assets go through a refactor process, for instance a move to
containers or other more modern hosting and operational approaches that don't necessarily affect the solutions
codebase. Commonly, virtual machines and other assets that are more well-established go through a rehost
process, transitioning those assets from the datacenter to the cloud. Some workloads could potentially be
migrated to the cloud but should instead be replaced with software as a service (SaaS) offerings that meet
the same business need, for example by using Office 365 as an alternative to migrating Exchange Server instances.
In the majority of scenarios, some business event creates a forcing function that causes a high percentage of assets
to temporarily migrate using the rehost process, followed by a more significant secondary transition using one of
the other migration strategies after they are in the cloud. This process is commonly known as a cloud transition.
During the process of rationalizing the digital estate, these types of decisions are applied to each asset to migrate.
However, the prerequisite needed at this time is to make a baseline assumption. Of the five migration strategies,
which best aligns with the business objectives or business outcomes driving this migration effort? This decision
serves as a guiding assumption throughout the migration effort.
Effort scale
Scale of the migration is the next important prerequisite decision. The process required to migrate 1,000 assets
is different from the process required to move 10,000 assets. Before beginning any migration effort, it is important
to answer the following questions:
How many assets support the migrating workloads today? Assets would include data structures,
applications, VMs, and necessary IT appliances. It's recommended that you choose a relatively small workload
for your first migration candidate.
Of those assets, how many are planned for migration? It is common for a percentage of assets to be
terminated during a migration process, due to lack of sustained end-user dependency.
What are the top-down estimates of the migratable assets scale? For the workloads included for
migration, estimate the number of supporting assets such as applications, virtual machines, data sources, and
IT appliances. See the digital estate section of the Cloud Adoption Framework for guidance on identifying
relevant assets.
Effort timing
Often, migrations are driven by a compelling business event that is time sensitive. For instance, one common
driver is the termination or renewal of a third-party hosting contract. Although there are many potential business
events necessitating a migration, they all share one commonality: an end date. It is important to understand the
timing of any approaching business events, so activities and velocity can be planned and validated properly.

Recap
Before proceeding, document the following assumptions and share them with the cloud strategy team and the
cloud adoption teams:
Business outcomes.
Roles, documented and refined for the Assess, Migrate, Optimize, and Secure and Manage migration processes.
Definition of done, documented and refined separately for the Assess, Migrate, Optimize, and Secure and
Manage migration processes.
Effort type.
Effort scale.
Effort timing.

Next steps
After the process is understood among the team, it's time to review technical prerequisites. The migration
environment planning checklist helps to ensure that the technical foundation is ready for migration.
Review the migration planning checklist
Migration environment planning checklist: validate
environmental readiness prior to migration

As an initial step in the migration process, you need to create the right environment in the cloud to receive, host,
and support migrating assets. This article provides a list of things to validate in the current environment prior to
migration.
The following checklist aligns with the guidance found in the Ready section of the Cloud Adoption Framework.
Review that section for guidance regarding execution of any of the following.

Effort type assumption


This article and checklist assume a rehost or cloud transition approach to cloud migration.

Governance alignment
The first and most important decision regarding any migration-ready environment is the choice of governance
alignment. Has a consensus been achieved regarding alignment of governance with the migration foundation? At
a minimum, the cloud adoption team should understand whether this migration is landing in a single environment
with limited governance, a fully governed environment factory, or some variant in between. For more options and
guidance on governance alignment, see the article on Governance and compliance alignment.

Cloud readiness implementation


Whether you choose to align with a broader cloud governance strategy or not for your initial migration, you will
need to ensure your cloud deployment environment is configured to support your workloads.
If you're planning to align your migration with a cloud governance strategy from the start, you'll need to apply the
Five Disciplines of Cloud Governance to help inform decisions on policies, toolchains, and enforcement
mechanisms that will align your cloud environment with overall corporate requirements. Consult the Cloud
Adoption Framework actionable governance design guides for examples of how to implement this model using
Azure services.
If your initial migrations are not closely aligned with a broader cloud governance strategy, the general issues of
organization, access, and infrastructure planning still need to be managed. Consult the Azure setup guide for help
with these cloud readiness decisions.
Caution

We highly recommend that you develop a governance strategy for anything beyond your initial workload
migration.
Regardless of your level of governance alignment, you will need to make decisions related to the following topics.
Resource organization
Based on the governance alignment decision, an approach to the organization and deployment of resources
should be established prior to migration.
Nomenclature
A consistent approach for naming resources, along with consistent naming schemas, should be established prior
to migration.
Resource governance
A decision regarding the tools to govern resources should be made prior to migration. The tools do not need to be
fully implemented, but a direction should be selected and tested. The cloud governance team should define and
require the implementation of a minimum viable product (MVP) for governance tooling prior to migration.

Network
Your cloud-based workloads will require the provisioning of virtual networks to support end-user and
administrative access. Based on resource organization and resource governance decisions, you should select a
network approach and align it with IT security requirements. Further, your networking decisions should be aligned with
any hybrid network constraints required to operate the workloads in the migration backlog and support any
access to resources hosted on-premises.

Identity
Cloud-based identity services are a prerequisite for offering identity and access management (IAM) for your cloud
resources. Align your identity management strategy with your cloud adoption plans before proceeding. For
example, when migrating existing on-premises assets, consider supporting a hybrid identity approach using
directory synchronization to allow a consistent set of user credentials across your on-premises and cloud
environments during and after the migration.

Next steps
If the environment meets the minimum requirements, it may be deemed approved for migration readiness.
Cultural complexity and change management helps to align roles and responsibilities to ensure proper
expectations during execution of the plan.
Cultural complexity and change management
Prepare for cultural complexity: aligning roles and
responsibilities

An understanding of the culture required to operate the existing datacenters is important to the success of any
migration. In some organizations, datacenter management is contained within centralized IT operations teams. In
these centralized teams, roles and responsibilities tend to be well defined and well understood throughout the
team. For larger enterprises, especially those bound by third-party compliance requirements, the culture tends to
be more nuanced and complex. Cultural complexity can lead to roadblocks that are difficult to understand and time
consuming to overcome.
In either scenario, it's wise to invest in the documentation of roles and responsibilities required to complete a
migration. This article outlines some of the roles and responsibilities seen in a datacenter migration, to serve as a
template for documentation that can drive clarity throughout execution.

Business functions
In any migration, there are a few key functions that are best executed by the business, whenever possible. Often, IT
is capable of completing the following tasks. However, engaging members of the business could aid in reducing
barriers later in the adoption process. It also ensures mutual investment from key stakeholders throughout the
migration process.

PROCESS | ACTIVITY | DESCRIPTION
Assess | Business goals | Define the desired business outcomes of the migration effort.
Assess | Priorities | Ensure alignment with changing business priorities and market conditions.
Assess | Justification | Validate assumptions that drive changing business justifications.
Assess | Risk | Help the cloud adoption team understand the impact of tangible business risks.
Assess | Approve | Review and approve the business impact of proposed architecture changes.
Optimize | Change plan | Define a plan for consumption of change within the business, including periods of low activity and change freezes.
Optimize | Testing | Align power users capable of validating performance and functionality.
Secure and manage | Interruption impact | Aid the cloud adoption team in quantifying the impact of a business process interruption.
Secure and manage | Service-level agreement (SLA) validation | Aid the cloud adoption team in defining service-level agreements and acceptable tolerances for business outages.

Ultimately, the cloud adoption team is accountable for each of these activities. However, establishing
responsibilities and a regular cadence with the business for the completion of these activities on an established
rhythm can improve stakeholder alignment and cohesiveness with the business.

Common roles and responsibilities


Each process within the discussion of the Cloud Adoption Framework migration principles includes a process
article outlining specific activities to align roles and responsibilities. For clarity during execution, a single
accountable party should be assigned for each activity, along with any responsible parties required to support
those activities. However, the following list contains a series of common roles and responsibilities that have a
higher degree of impact on migration execution. These roles should be identified early in the migration effort.

NOTE
In the following table, the accountable party column is a suggested starting point for aligning roles. That column
should be customized to fit existing processes for efficient execution. Ideally, a single person should be named as
the accountable party.

PROCESS | ACTIVITY | DESCRIPTION | ACCOUNTABLE PARTY
Prerequisite | Digital estate | Align the existing inventory to basic assumptions, based on business outcomes. | Cloud strategy team
Prerequisite | Migration backlog | Prioritize the sequence of workloads to be migrated. | Cloud strategy team
Assess | Architecture | Challenge initial assumptions to define the target architecture based on usage metrics. | Cloud adoption team
Assess | Approval | Approve the proposed architecture. | Cloud strategy team
Migrate | Replication access | Access to existing on-premises hosts and assets to establish replication processes. | Cloud adoption team
Optimize | Ready | Validate that the system meets performance and cost requirements prior to promotion. | Cloud adoption team
Optimize | Promote | Permissions to promote a workload to production and redirect production traffic. | Cloud adoption team
Secure and manage | Ops transition | Document production systems prior to production operations. | Cloud adoption team

Caution

For these activities, permissions and authorization heavily influence the accountable party, who must have direct
access to production systems in the existing environment or must have means of securing access through other
responsible actors. Determining this accountable party directly affects the promotion strategy during the migrate
and optimize processes.

Next steps
When the team has a general understanding of roles and responsibilities, it's time to begin preparing the technical
details of the migration. Understanding technical complexity and change management can help prepare the cloud
adoption team for the technical complexity of migration by aligning to an incremental change management
process.
Technical complexity and change management
Prepare for technical complexity: agile change
management

When an entire datacenter can be deprovisioned and re-created with a single line of code, traditional processes
struggle to keep up. The guidance throughout the Cloud Adoption Framework is built on practices like IT service
management (ITSM), The Open Group Architecture Framework (TOGAF), and others. However, to ensure agility
and responsiveness to business change, this framework molds those practices to fit agile methodologies and
DevOps approaches.
When shifting to an agile model where flexibility and iteration are emphasized, technical complexity and change
management are handled differently than they are in a traditional waterfall model focusing on a linear series of
migration steps. This article outlines a high-level approach to change management in an agile-based migration
effort. At the end of this article, you should have a general understanding of the levels of change management
and documentation involved in an incremental migration approach. Additional training and decisions are required
to select and implement agile practices based on that understanding. The intention of this article is to prepare
cloud architects for a facilitated conversation with project management to explain the general concept of change
management in this approach.

Address technical complexity


When changing any technical system, complexity and interdependency inject risk into project plans. Cloud
migrations are no exception. When moving thousands—or tens of thousands—of assets to the cloud, these risks
are amplified. Detecting and mapping all dependencies across a large digital estate could take years. Few
businesses can tolerate such a long analysis cycle. To balance the need for architectural analysis and business
acceleration, the Cloud Adoption Framework focuses on an INVEST model for product backlog management. The
following sections summarize this type of model.

INVEST in workloads
The term workload appears throughout the Cloud Adoption Framework. A workload is a unit of application
functionality that can be migrated to the cloud. It could be a single application, a layer of an application, or a
collection of applications. The definition is flexible and may change at various phases of migration. The Cloud
Adoption Framework uses the INVEST acronym to help define a workload.
INVEST is a common acronym in many agile methodologies for writing user stories or product backlog items,
both of which are units of output in agile project management tools. The measurable unit of output in a migration
is a migrated workload. The Cloud Adoption Framework modifies the INVEST acronym a bit to create a construct
for defining workloads:
Independent: A workload should not have any inaccessible dependencies. For a workload to be considered
migrated, all dependencies should be accessible and included in the migration effort.
Negotiable: As additional discovery is performed, the definition of a workload changes. The architects
planning the migration could negotiate factors regarding dependencies. Examples of negotiation points could
include prerelease of features, making features accessible over a hybrid network, or packaging all
dependencies in a single release.
Valuable: Value in a workload is measured by the ability to provide users with access to a production
workload.
Estimable: Dependencies, assets, migration time, performance, and cloud costs should all be estimable and
should be estimated prior to migration.
Small: The goal is to package workloads in a single sprint. However, this may not always be feasible. Instead,
teams are encouraged to plan sprints and releases to minimize the time required to move a workload to
production.
Testable: There should always be a defined means of testing or validating completion of the migration of a
workload.
This acronym is not intended as a basis for rigid adherence but should help guide the definition of the term
workload.

Migration backlog: Aligning business priorities and timing


The migration backlog allows you to track your top-level portfolio of migratable workloads. Prior to migration,
the cloud strategy team and the cloud adoption team are encouraged to perform a review of the current digital
estate, and agree to a prioritized list of workloads to be migrated. This list forms the basis of the initial migration
backlog.
Initially, workloads on the migration backlog are unlikely to meet the INVEST criteria outlined in the previous
section. Instead, they serve as a logical grouping of assets from an initial inventory as a placeholder for future
work. Those placeholders may not be technically accurate, but they serve as the basis for coordination with the
business.

The migration, release, and iteration backlogs track different levels of activity during migration processes.
In any migration backlog, the change management team should strive to obtain the following information for any
workload in the plan. At a minimum, this data should be available for any workloads prioritized for migration in
the next two or three releases.
Migration backlog data points
Business impact. Understanding of the impact to the business of missing the expected timeline or reducing
functionality during freeze windows.
Relative business priority. A ranked list of workloads based on business priorities.
Business owner. Document the one individual responsible for making business decisions regarding this
workload.
Technical owner. Document the one individual responsible for technical decisions related to this workload.
Expected timelines. When the migration is scheduled for completion.
Workload freezes. Time frames in which the workload should be ineligible for change.
Workload name.
Initial inventory. Any assets required to provide the functionality of the workload, including VMs, IT
appliances, data, applications, deployment pipelines, and others. This information is likely to be inaccurate.
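As a lightweight illustration (not part of the framework itself), these data points could be captured as a simple record per workload in whatever tooling backs your migration backlog. The field names, sample workload, and dates below are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class MigrationBacklogItem:
    """One workload in the migration backlog, capturing the data points above."""
    workload_name: str
    business_priority: int             # relative rank; 1 = highest priority
    business_impact: str               # impact of missing timelines or freeze windows
    business_owner: str                # single person accountable for business decisions
    technical_owner: str               # single person accountable for technical decisions
    expected_completion: Optional[date] = None
    freeze_windows: List[str] = field(default_factory=list)    # periods ineligible for change
    initial_inventory: List[str] = field(default_factory=list)  # VMs, data, apps, appliances


backlog = [
    MigrationBacklogItem(
        workload_name="Payroll",
        business_priority=1,
        business_impact="Payroll run is blocked if the workload is unavailable at month end",
        business_owner="Finance lead",
        technical_owner="Payroll application owner",
        expected_completion=date(2020, 10, 30),
        freeze_windows=["Month-end close"],
        initial_inventory=["vm-payroll-web01", "vm-payroll-sql01"],
    )
]

# Work the backlog in business-priority order.
print(sorted(backlog, key=lambda item: item.business_priority)[0].workload_name)
```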

Release backlog: Aligning business change and technical coordination


In the context of a migration, a release is an activity that deploys one or more workloads into production. A
release generally covers several iterations or technical work. However, it represents a single iteration of business
change. After one or more workloads have been prepared for production promotion, a release occurs. The
decision to package a release is made when the workloads migrated represent enough business value to justify
injecting change into a business environment. Releases are executed in conjunction with a business change plan,
after business testing has been completed. The cloud strategy team is responsible for planning and overseeing the
execution of a release to ensure that the desired business change is released.
A release backlog is the future state plan that defines a coming release. Release backlog is the pivot point between
business change management (migration backlog) and technical change management (sprint backlog). A release
backlog consists of a list of workloads from the migration backlog that align to a specific subset of business
outcome realization. Definition and submission of a release backlog to the cloud adoption team serve as a trigger
for deeper analysis and migration planning. After the cloud adoption team has verified the technical details
associated with a release, it can choose to commit to the release, establishing a release timeline based on current
knowledge.
Given the degree of analysis required to validate a release, the cloud strategy team should maintain a running list
of the next two to four releases. The team should also attempt to validate as much of the following information as
possible, before defining and submitting a release. A disciplined cloud strategy team capable of maintaining the
next four releases can significantly increase the consistency and accuracy of release timeline estimates.
Release backlog data points
A partnership between the cloud strategy team and the cloud adoption team collaborates to add the following
data points for any workloads in the release backlog:
Refined inventory. Validation of required assets to be migrated. Often validated through log or monitoring
data at the host, network, or OS level to ensure an accurate understanding of network and hardware
dependencies of each asset under standard load.
Usage patterns. An understanding of the patterns of usage from end users. These patterns often include an
analysis of end-user geographical distribution, network routes, seasonal usage spikes, daily/hourly usage
spikes, and end-user composition (internal versus external).
Performance expectations. Analysis of available log data capturing throughput, pageviews, network routes,
and other performance data required to replicate the end-user experience.
Dependencies. Analysis of network traffic and application usage patterns to identify any additional workload
dependencies, which should be factored into sequencing and environmental readiness. Don't include a
workload in a release until one of the following criteria can be met:
All dependent workloads have been migrated.
Network and security configurations have been implemented to allow the workload to access all
dependencies in alignment with existing performance expectations.
Desired migration approach. At the migration backlog level, the assumed migration effort is the only
consideration used in analysis. For instance, if the business outcome is an exit from an existing datacenter, all
migrations are assumed to be a rehost scenario in the migration backlog. In the release backlog, the cloud
strategy team and the cloud adoption team should evaluate the long-term value of additional features,
modernization, and continued development investments to evaluate whether a more modern approach should
be involved.
Business testing criteria. After a workload is added to the migration backlog, testing criteria should be
mutually agreed on. In some cases, testing criteria can be limited to a performance test with a defined power
user group. However, for statistical validation, an automated performance test is desired and should be
included. The existing instance of the application often has no automated testing capabilities. Should this prove
accurate, it is not uncommon for the cloud architects to work with power users to create a baseline load test
against the existing solution to establish a benchmark to be used during migration.
Release backlog cadence
In mature migrations, releases come in a regular cadence. The velocity of the cloud adoption team often
normalizes, producing a release every two to four iterations (approximately every one or two months). However,
this should be an organic outcome. Creating artificial release cadences can negatively affect the cloud adoption
team's ability to achieve consistent throughput.
To stabilize business impact, the cloud strategy team should establish a monthly release process with the business
to maintain regular dialogue but should also establish the expectation that it will be several months before a
regular release cadence can be predicted.

Sprint or iteration backlog: Aligning technical change and effort


A sprint, or iteration, is a consistent, time-bound unit of work. In the migration process, this is often measured in
two-week increments. However, it's not unheard of to have one-week or four-week iterations. Creating time-
bound iterations forces consistent intervals of effort completion and allows for more frequent adjustment to
plans, based on new learnings. During any given sprint, there are usually tasks for the assessment, migration, and
optimization of workloads defined in the migration backlog. Those units of work should be tracked and managed
in the same project-management tool as the migration and release backlog, to drive consistency across each level
of change management.
A sprint backlog, or iteration backlog, consists of the technical work to be completed in a single sprint or iteration,
dealing with migrating individual assets. That work should be derived from the list of workloads being migrated.
When using tools like Azure DevOps (previously Visual Studio Online) for project management, the work items in
a sprint would be children of the product backlog items in a release backlog and the epics in a migration backlog.
Such a parent-child relationship allows for clarity at all levels of change management.
Within a single sprint or iteration, the cloud adoption team would work to deliver the committed amount of
technical work, driving toward the migration of a defined workload. This is the end result of the change
management strategy. When complete, these efforts can be tested by validating production readiness of a
workload staged in the cloud.
Large or complex sprint structures
For a small migration with a self-contained migration team, a single sprint could include all four phases of a
migration for a single workload (assess, migrate, optimize, and secure and manage). More commonly, each of
these processes is shared by multiple teams through distinct work items across numerous sprints. Depending on the effort
type, effort scale, and roles, these sprints can take a few different shapes.
Migration factory. Large-scale migrations sometimes require an approach that resembles a factory in the
execution model. In this model, various teams are allocated to the execution of a specific migration process (or
subset of the process). After completion, the output of one team's sprint populates the backlog for the next
team. This is an efficient approach for large-scale rehost migrations of many potential workloads involving
thousands of virtual machines moving through phases of assessment, architecture, remediation, and
migration. However, for this approach to work, a new homogenous environment with streamlined change
management and approval processes is a must.
Migration waves. Another approach that works well for large migrations is a wave model. In this model,
division of labor isn't nearly as clear. Teams dedicate themselves to the migration process execution of
individual workloads. However, the nature of each sprint changes. In one sprint, the team may complete
assessment and architecture work. In another sprint, it may complete the migration work. In yet another sprint,
the focus would be on optimization and production release. This approach allows a core team to stay aligned
to workloads, seeing them through the process in its entirety. When using this approach, the diversity of skills
and context switching could reduce the potential velocity of the team, slowing the migration effort.
Additionally, roadblocks during approval cycles can cause significant delays. With this model, it is important to
maintain options in the release backlog to keep the team moving during blocked periods. It is also important to
cross-train team members and to ensure that skill sets align with the theme of each sprint.
Sprint backlog data points
The outcome of a sprint captures and documents the changes made to a workload, thus closing the change-
management loop. When completed, at a minimum, the following should be documented. Throughout the
execution of a sprint, this documentation should be completed in tandem with the technical work items.
Assets deployed. Any assets deployed to the cloud to host the workload.
Remediation. Any changes to the assets to prepare for cloud migration.
Configuration. Chosen configuration of any assets deployed, including any references to configuration
scripts.
Deployment model. Approach used to deploy the asset to the cloud, including references to any deployment
scripts or tools.
Architecture. Documentation of the architecture deployed to the cloud.
Performance metrics. Output of automated testing or business testing performed to validate performance at
the time of deployment.
Unique requirements or configuration. Any unique aspects of the deployment, configuration, or technical
requirements necessary to operate the workload.
Operational approval. Sign-off of validating operational readiness from the application owner and the IT
operations staff responsible for managing the workload post deployment.
Architecture approval. Sign-off from the workload owner and the cloud adoption team to validate any
architecture changes required to host each asset.

Next steps
After change management approaches have been established, it's time to address the final prerequisite, the
migration backlog review.
Migration backlog review
Migration backlog review

The actionable output of the plan phase is a migration backlog, which influences all of the prerequisites discussed
so far. Development of the migration backlog should be completed as a first prerequisite. This article serves as a
milestone to complete prerequisite activities. The cloud strategy team is accountable for the care and maintenance
of the digital estate. However, the realization of the resultant backlog is the responsibility of every member of the
migration effort. As a final prerequisite, the cloud strategy team and the cloud adoption team should review and
understand the migration backlog. During that review, the members of both teams must gain sufficient knowledge
to articulate the following key points in the migration backlog.

Business outcomes and metrics


Every member of the team should understand the desired business outcomes. Migrations take time. It's easy for
team members to become distracted by urgent but less important activities during migration. Establishing and
reinforcing the desired outcomes helps the team understand the priority and relative importance of the migration,
enabling better decision-making over time.
Tracking migration progress is equally important to the motivation of the team and to continued stakeholder
support. Progress can be tracked through migration KPIs and learning metrics. Regardless of how the effort is
tracked, it is important that the team is aware of these metrics so that they can evaluate performance during
subsequent iterations.

Business priorities
Sometimes, prioritizing one workload over another may seem illogical to the cloud adoption team. Understanding
the business priorities that drove those decisions can help maintain the team's motivation. It also allows the team
to make a stronger contribution to the prioritization process.

Core assumptions
The article on digital estate rationalization discusses the agility and time-saving impact of basic assumptions when
evaluating a digital estate. To fully realize those values, the cloud adoption team needs to understand the
assumptions and the reasons that they were established. That knowledge better equips the cloud adoption team to
challenge those assumptions.

Next steps
With a general understanding of the digital estate and migration backlog, the team is ready to move beyond
prerequisites and to begin assessing workloads.
Assess workloads
Assess assets prior to migration

Many of your existing workloads are ideal candidates for cloud migration, but not every asset is compatible with
cloud platforms and not all workloads can benefit from hosting in the cloud. Digital estate planning allows you to
generate an overall migration backlog of potential workloads to migrate. However, this planning effort is high-
level. It relies on assumptions made by the cloud strategy team and does not dig deeply into technical
considerations.
As a result, before migrating a workload to the cloud it's critical to assess the individual assets associated with that
workload for their migration suitability. During this assessment, your cloud adoption team should evaluate
technical compatibility, required architecture, performance/sizing expectations, and dependencies to ensure that
the migrated workload can be deployed to the cloud effectively.
The Assess process is the first of four incremental activities that occur within an iteration. As discussed in the
prerequisite article regarding technical complexity and change management, a decision should be made in
advance to determine how this phase is executed. In particular, will assessments be completed by the cloud
adoption team during the same sprint as the actual migration effort? Alternatively, will a wave or factory model be
used to complete assessments in a separate iteration? If the answer to this basic process question can't be
answered by every member of the team, it may be wise to revisit the "Prerequisites" section.

Objective
Assess a migration candidate, evaluating the workload, associated assets, and dependencies prior to migration.

Definition of done
This process is complete when the following are known about a single migration candidate:
The path from on-premises to cloud, including the decision on a production promotion approach, has been defined.
Any required approvals, changes, cost estimates, or validation processes have been completed to allow the
cloud adoption team to execute the migration.

Accountability during assessment


The cloud adoption team is accountable for the entire assessment process. However, members of the cloud
strategy team have a few responsibilities, as listed in the following section.

Responsibilities during assessment


In addition to the high-level accountability, there are actions that an individual or group needs to be directly
responsible for. The following are a few activities that require assignments to responsible parties:
Business priority. The team understands the purpose for migrating this workload, including any intended
impact to the business.
A member of the cloud strategy team should carry final responsibility for this activity, under the
direction of the cloud adoption team.
Stakeholder alignment. The team aligns expectations and priorities with internal stakeholders, identifying
success criteria for the migration. What does success look like post-migration?
Cost. The cost of the target architecture has been estimated, and the overall budget has been adjusted.
Migration support. The team has decided how the technical work of the migration will be completed,
including decisions regarding partner or Microsoft support.
Evaluation. The workload is evaluated for compatibility and dependencies.
This activity should be assigned to a subject matter expert who is familiar with the architecture and
operations of the candidate workload.
Architect. The team has agreed on the final state architecture for the migrated workload.
Backlog alignment. The cloud adoption team reviews requirements and commits to the migration of the
candidate workload. After commitment, the release backlog and iteration backlog are to be updated
accordingly.
Work breakdown structure or work-back schedule. The team establishes a schedule of major milestones
identifying goals for when planning, implementation, and review processes are completed.
Final approval. Any necessary approvers have reviewed the plan and have signed off on the approach to
migrate the asset.
To avoid surprises later in the process, at least one representative of the business should be involved in
the approval process.
Caution

This full list of responsibilities and actions can support large and complex migrations involving multiple roles with
varying levels of responsibility, and requiring a detailed approval process. Smaller and simpler migration efforts
may not require all of the roles and actions described here. To determine which of these activities add value and which
are unnecessary, your cloud adoption team and the cloud strategy team should use this complete process as part
of your first workload migration. After the workload has been verified and tested, the team can evaluate this
process and choose which actions to use moving forward.

Next steps
With a general understanding of the assessment process, you are ready to begin the process by aligning business
priorities.
Align business priorities
Business priorities: Maintaining alignment

Transformation is often defined as a dramatic or spontaneous change. At the board level, change can look like a
dramatic transformation. However, for those who work through the process of change in an organization,
transformation is a bit misleading. Under the surface, transformation is better described as a series of properly
executed transitions from one state to another.
The amount of time required to rationalize or transition a workload will vary, depending on the technical
complexity involved. However, even when this process can be applied to a single workload or group of
applications quickly, it takes time to produce substantial changes among a user base. It takes longer for changes to
propagate through various layers of existing business processes. If transformation is expected to shape the behavior
patterns of consumers, it can take even longer to produce significant results.
Unfortunately, the market doesn't wait for businesses to transition. Consumer behavior patterns change on their
own, often unexpectedly. The market's perception of a company and its products can be swayed by social media or
a competitor's positioning. Fast and unexpected market changes require companies to be nimble and responsive.
The ability to execute processes and technical transitions requires a consistent, stable effort. Quick decisions and
nimble actions are needed to respond to market conditions. These two are at odds, making it easy for priorities to
fall out of alignment. This article describes approaches to maintaining transitional alignment during migration
efforts.

How can business and technical priorities stay aligned during a migration?
The cloud adoption team and the cloud governance team focus on the execution of the current iteration and
current release. Iterations provide stable increments of technical work, thus avoiding costly disruptions that would
otherwise slow the progress of migration efforts. Releases ensure that the technical effort and energy stay focused
on the business objectives of the workload migration. A migration project could require many releases over an
extended period. By the time it is completed, market conditions have likely changed significantly.
In parallel, the cloud strategy team focuses on executing the business change plan and preparing for the next
release. The cloud strategy team generally looks at least one release ahead, and it monitors for changing market
conditions and adjusts the migration backlog accordingly. This focus of managing transformation and adjusting
the plan creates natural pivots around the technical work. When business priorities change, adoption is only one
release behind, creating technical and business agility.

Business alignment questions


The following questions can help the cloud strategy team shape and prioritize the migration backlog to help
ensure that the transformation effort best aligns with current business needs.
Has the cloud adoption team identified a list of workloads ready for migration?
Has the cloud adoption team selected a single candidate for an initial migration from that list of workloads?
Do the cloud adoption team and the cloud governance team have all of the necessary data regarding the
workload and cloud environment to be successful?
Does the candidate workload deliver the most relevant impact for the business in the next release?
Are there other workloads that are better candidates for migration?
Tangible actions
During the execution of the business change plan, the cloud strategy team monitors for positive and negative
results. When those observations require technical change, the adjustments are added as work items to the release
backlog to be prioritized in the next iteration.
When the market changes, the cloud strategy team works with the business to understand how to best respond to
the changes. When that response requires a change in migration priorities, the migration backlog is adjusted. This
moves up workloads that were previously lower in priority.

Next steps
With properly aligned business priorities, the cloud adoption team can confidently begin to evaluate workloads to
develop architecture and migration plans.
Evaluate workloads
Evaluate workload readiness

This activity focuses on evaluating readiness of a workload to migrate to the cloud. During this activity, the cloud
adoption team validates that all assets and associated dependencies are compatible with the chosen deployment
model and cloud provider. During the process, the team documents any efforts required to remediate
compatibility issues.

Evaluation assumptions
Most of the content discussing principles in the Cloud Adoption Framework is cloud agnostic. However, the
readiness evaluation process must be largely specific to each cloud platform. The following guidance
assumes an intention to migrate to Azure. It also assumes use of Azure Migrate (also known as Azure Site
Recovery) for replication activities. For alternative tools, see replication options.
This article doesn't capture all possible evaluation activities. It is assumed that each environment and business
outcome will dictate specific requirements. To help accelerate the creation of those requirements, the remainder of
this article shares a few common evaluation activities related to infrastructure, database, and network evaluation.

Common infrastructure evaluation activities


VMware requirements: Review the Azure Site Recovery requirements for VMware.
Hyper-V requirements: Review the Azure Site Recovery requirements for Hyper-V.
Be sure to document any discrepancies in host configuration, replicated VM configuration, storage requirements,
or network configuration.

Common database evaluation activities


Document the Recovery Point Objectives and Recovery Time Objectives of the current database deployment.
These are used in architecture activities to aid in decision-making.
Document any requirements for high-availability configuration. For assistance understanding SQL Server
requirements, see the SQL Server High Availability Solutions Guide.
Evaluate PaaS compatibility. The Azure Data Migration Guide maps on-premises databases to compatible
Azure PaaS solutions, like Azure Cosmos DB or Azure Database for MySQL, PostgreSQL, or MariaDB.
When PaaS compatibility is an option without the need for any remediation, consult the team responsible for
architecture activities. PaaS migrations can produce significant time savings and reductions in the total cost of
ownership (TCO) of most cloud solutions.
When PaaS compatibility is an option but remediation is required, consult the teams responsible for
architecture activities and remediation activities. In many scenarios, the advantages of PaaS migrations for
database solutions can outweigh the increase in remediation time.
Document the size and rate of change for each database to be migrated.
When possible, document any applications or other assets that make calls to each database.
NOTE
Synchronization of any asset consumes bandwidth during the replication processes. A very common pitfall is to overlook the
bandwidth consumption required to keep assets synchronized between the point of replication and release. Databases are
common consumers of bandwidth during release cycles, and databases with large storage footprints or a high rate of change
are especially concerning. Consider an approach of replicating the data structure, with controlled updates before user
acceptance testing (UAT) and release. In such scenarios, alternatives to Azure Site Recovery may be more appropriate. For
more detail, see guidance from the Azure Data Migration Guide.

Common network evaluation activities


Calculate the total storage for all VMs to be replicated during the iterations leading up to a release.
Calculate the drift or change rate of storage for all VMs to be replicated during the iterations leading up to a
release.
Calculate the bandwidth requirements needed for each iteration by summing total storage and drift (a worked sketch follows the note below).
Calculate unused bandwidth available on the current network to validate per iteration alignment.
Document bandwidth needed to reach anticipated migration velocity. If any remediation is required to provide
necessary bandwidth, notify the team responsible for remediation activities.

NOTE
Total storage directly affects bandwidth requirements during initial replication. However, storage drift continues from the
point of replication until release. This means that drift has a cumulative effect on available bandwidth.
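To make these calculations concrete, the following sketch works through the math above with made-up numbers; the storage, drift, and bandwidth figures are assumptions for illustration only.

```python
# Illustrative bandwidth estimate for one iteration (all figures are assumptions).
total_storage_gb = 4_000       # total disk space of assets replicated this iteration
daily_drift_gb = 120           # estimated storage change rate (drift) per day
iteration_days = 14            # length of the iteration
available_mbps = 500           # unused uplink bandwidth available for migration

def gb_to_megabits(gb: float) -> float:
    # Note the bits-versus-bytes conversion: 1 GB = 8,000 megabits (decimal units).
    return gb * 8_000

# Data to move: the initial replication plus cumulative drift over the iteration.
megabits_to_move = gb_to_megabits(total_storage_gb + daily_drift_gb * iteration_days)
hours_required = megabits_to_move / available_mbps / 3_600

print(f"~{hours_required:.0f} hours of a saturated {available_mbps} Mbps uplink "
      f"to replicate this iteration")  # roughly 25 hours with these sample numbers
```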

Next steps
After the evaluation of a system is complete, the outputs feed the development of a new cloud architecture.
Architect workloads prior to migration
Architect workloads prior to migration

This article expands on the assessment process by reviewing activities associated with defining the architecture of
a workload within a given iteration. As discussed in the article on incremental rationalization, some architectural
assumptions are made during any business transformation that requires a migration. This article clarifies those
assumptions, shares a few roadblocks that can be avoided, and identifies opportunities to accelerate business
value by challenging those assumptions. This incremental model for architecture allows teams to move faster and
to obtain business outcomes sooner.

Architecture assumptions prior to migration


The following assumptions are typical for any migration effort:
IaaS. It is commonly assumed that migrating workloads primarily involves the movement of virtual machines
from a physical datacenter to a cloud datacenter via an IaaS migration, requiring a minimum of
redevelopment or reconfiguration. This is known as a lift and shift migration. (Exceptions follow.)
Architecture consistency. Changes to core architecture during a migration considerably increase complexity.
Debugging a changed system on a new platform introduces many variables that can be difficult to isolate. For
this reason, workloads should undergo only minor changes during migration and any changes should be
thoroughly tested.
Retirement test. Migrations and the hosting of assets consume operational and potential capital expenses. It
is assumed that any workloads being migrated have been reviewed to validate ongoing usage. The choice to
retire unused assets produces immediate cost savings.
Resize assets. It is assumed that few on-premises assets fully use their allocated resources. Prior to
migration, it is assumed that assets will be resized to best fit actual usage requirements (a simple sizing sketch
follows this list).
Business continuity and disaster recovery (BCDR) requirements. It is assumed that an agreed-on SLA
for the workload has been negotiated with the business prior to release planning. These requirements are
likely to produce minor architecture changes.
Migration downtime. Likewise, downtime to promote the workload to production can have an adverse effect
on the business. Sometimes, the solutions that must transition with minimum downtime need architecture
changes. It is assumed that a general understanding of downtime requirements has been established prior to
release planning.
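For the resize assumption, here is a hedged illustration of the underlying logic: choose the smallest target size whose capacity covers observed peak usage plus some headroom. The size catalog and all numbers below are hypothetical, not actual Azure VM sizes.

```python
# Hypothetical right-sizing sketch: choose the smallest size that covers observed
# peak usage plus a headroom factor. Sizes are illustrative, not real SKUs.
SIZES = [
    ("small", 2, 8),     # (name, vCPUs, memory in GB)
    ("medium", 4, 16),
    ("large", 8, 32),
]

def right_size(peak_cpu_cores: float, peak_memory_gb: float, headroom: float = 0.3) -> str:
    needed_cpu = peak_cpu_cores * (1 + headroom)
    needed_mem = peak_memory_gb * (1 + headroom)
    for name, vcpus, memory_gb in SIZES:
        if vcpus >= needed_cpu and memory_gb >= needed_mem:
            return name
    return "no fit: consider a larger size or scaling out"

# A VM allocated 8 cores and 32 GB, but observed peaking at 2.5 cores and 10 GB,
# fits comfortably in the hypothetical "medium" size.
print(right_size(peak_cpu_cores=2.5, peak_memory_gb=10))  # -> medium
```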

Roadblocks that can be avoided


The itemized assumptions can create roadblocks that could slow progress or cause later pain points. The
following are a few roadblocks to watch for, prior to the release:
Paying for technical debt. Some aging workloads carry with them a high amount of technical debt. This can
lead to long-term challenges by increasing hosting costs with any cloud provider. When technical debt
unnaturally increases hosting costs, alternative architectures should be evaluated.
User traffic patterns. Existing solutions may depend on existing network routing patterns. These patterns
could slow performance considerably. Further, introduction of new hybrid wide area network (WAN) solutions
can take weeks or even months. Prepare early in the architecture process for these roadblocks by considering
traffic patterns and changes to any core infrastructure services.

Accelerate business value


Some scenarios could require a different architecture than the assumed IaaS rehosting strategy. The following
are a few examples:
PaaS alternatives. PaaS deployments can reduce hosting costs, and they can also reduce the time required to
migrate certain workloads. For a list of approaches that could benefit from a PaaS conversion, see the article
on evaluating assets.
Scripted deployments/DevOps. If a workload has an existing DevOps deployment or other forms of scripted
deployment, the cost of changing those scripts could be lower than the cost of migrating the asset.
Remediation efforts. The remediation efforts required to prepare a workload for migration can be extensive. In
some cases, it makes more sense to modernize the solution than it does to remediate underlying compatibility
issues.
In each of these itemized scenarios, an alternative architecture could be the best possible solution.

Next steps
After the new architecture is defined, accurate cost estimations can be calculated.
Estimate cloud costs
Estimate cloud costs

During migration, there are several factors that can affect decisions and execution activities. To help understand
which of those options are best for different situations, this article discusses various options for estimating cloud
costs.

Digital estate size


The size of your digital estate directly affects migration decisions. Migrations that involve fewer than 250 VMs can
be estimated much more easily than a migration involving 10,000+ VMs. It's highly recommended that you select
a smaller workload as your first migration. This gives your team a chance to learn how to estimate the costs of a
simple migration effort before attempting to estimate larger and more complicated workload migrations.
However, note that smaller, single-workload migrations can still involve a widely varying number of supporting
assets. If your migration involves under 1,000 VMs, a tool like Azure Migrate is likely sufficient to gather data on
the inventory and forecast costs. Additional cost-estimate tooling options are described in the article on digital
estate cost calculations.
For 1,000+ unit digital estates, it's still possible to break down an estimate into four or five actionable iterations,
making the estimation process manageable. For larger estates or when a higher degree of forecast accuracy is
required, a more comprehensive approach, like that outlined in the "Digital estate" section of the Cloud Adoption
Framework, will likely be required.

Accounting models
If you are familiar with traditional IT procurement processes, estimation in the cloud may seem foreign. When
adopting cloud technologies, acquisition shifts from a rigid, structured capital expense model to a fluid operating
expense model. In the traditional capital expense model, the IT team would attempt to consolidate buying power
for multiple workloads across various programs to centralize a pool of shared IT assets that could support each of
those solutions. In the operating expenses cloud model, costs can be directly attributed to the support needs of
individual workloads, teams, or business units. This approach allows for a more direct attribution of costs to the
supported internal customer. When estimating costs, it's important to first understand how much of this new
accounting capability will be used by the IT team.
For those wanting to replicate the legacy capital expense approach to accounting, use the outputs of either
approach suggested in the "Digital estate size" section above to get an annual cost basis. Next, multiply that
annual cost by the company's typical hardware refresh cycle. Hardware refresh cycle is the rate at which a
company replaces aging hardware, typically measured in years. Annual run rate multiplied by hardware refresh
cycle creates a cost structure similar to a capital expense investment pattern.
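As a purely illustrative example of that calculation (the figures are made up):

```python
# Illustrative comparison only; both inputs are assumptions.
annual_run_rate = 180_000        # estimated annual cloud cost from the digital estate estimate
hardware_refresh_years = 4       # how often the company typically replaces aging hardware

capex_equivalent = annual_run_rate * hardware_refresh_years
print(f"Cost over one refresh cycle: {capex_equivalent:,}")   # 720,000, comparable to a capital expense outlay
```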

Next steps
After estimating costs, migration can begin. However, it would be wise to review partnership and support options
before beginning any migration.
Understanding partnership options
Understand partnership options

During migration, the cloud adoption team performs the actual migration of workloads to the cloud. Unlike the
collaborative and problem-solving tasks when defining the digital estate or building the core cloud infrastructure,
migration tends to be a series of repetitive execution tasks. Beyond the repetitive aspects, there are likely testing
and tuning efforts that require deep knowledge of the chosen cloud provider. The repetitive nature of this process
can sometimes be best addressed by a partner, reducing strain on full-time staff. Additionally, partners may be
able to better align deep technical expertise when the repetitive processes encounter execution anomalies.
Partners tend to be closely aligned with a single cloud vendor or a small number of cloud vendors. To better
illustrate partnership options, the remainder of this article assumes that Microsoft Azure is the chosen cloud
provider.
During plan, build, or migrate, a company generally has four execution partnership options:
Guided self-service. The existing technical team executes the migration, with help from Microsoft.
FastTrack for Azure. Use the Microsoft FastTrack for Azure program to accelerate migration.
Solutions Partner. Get connected with Azure Solutions Partners or Cloud Solutions Partners (CSPs) to
accelerate migration.
Supported self-service. Execution is completed by the existing technical staff with support from Microsoft.

Guided self-service
If an organization is planning an Azure migration on its own, Microsoft is always there to assist throughout the
journey. To help fast-track migration to Azure, Microsoft and its partners have developed an extensive set of
architectures, guides, tools, and services to reduce risk and to speed migration of virtual machines, applications,
and databases. These tools and services support a broad selection of operating systems, programming languages,
frameworks, and databases.
Assessment and migration tools. Azure provides a wide range of tools to be used in different phases of
your cloud transformation, including assessing your existing infrastructure. For more information, refer to the
"Assess" section in the "Migration" chapter that follows.
Microsoft Cloud Adoption Framework. This framework presents a structured approach to cloud adoption
and migration. It is based on best practices across many Microsoft-supported customer engagements and is
organized as a series of steps, from architecture and design to implementation. For each step, supporting
guidance helps you with the design of your application architecture.
Cloud design patterns. Azure provides some useful cloud design patterns for building reliable, scalable,
secure workloads in the cloud. Each pattern describes the problem that the pattern addresses, considerations
for applying the pattern, and an example based on Azure. Most of the patterns include code samples or
snippets that show how to implement the pattern on Azure. However, they are relevant to any distributed
system, whether hosted on Azure or on other cloud platforms.
Cloud fundamentals. Fundamentals help teach the basic approaches to implementation of core concepts.
This guide helps technicians think about solutions that go beyond a single Azure service.
Example scenarios. The guide provides references from real customer implementations, outlining the tools,
approaches, and processes that past customers have followed to accomplish specific business goals.
Reference architectures. Reference architectures are arranged by scenario, with related architectures
grouped together. Each architecture includes best practices, along with considerations for scalability, availability,
manageability, and security. Most also include a deployable solution.
FastTrack for Azure
FastTrack for Azure provides direct assistance from Azure engineers, working hand in hand with partners, to help
customers build Azure solutions quickly and confidently. FastTrack brings best practices and tools from real
customer experiences to guide customers from setup, configuration, and development to production of Azure
solutions, including:
Datacenter migration
Windows Server on Azure
Linux on Azure
SAP on Azure
Business continuity and disaster recovery (BCDR)
High-performance computing*
Cloud-native apps
DevOps
App modernization
Cloud-scale analytics**
Intelligent apps
Intelligent agents**
Data modernization to Azure
Security and management
Globally distributed data
IoT***
*Limited preview in United States, Canada, United Kingdom, and Western Europe
**Limited preview in United Kingdom and Western Europe
***Available in H2 2019
During a typical FastTrack for Azure engagement, Microsoft helps to define the business vision to plan and
develop Azure solutions successfully. The team assesses architectural needs and provides guidance, design
principles, tools, and resources to help build, deploy, and manage Azure solutions. The team matches skilled
partners for deployment services on request and periodically checks in to ensure that deployment is on track and
to help remove blockers.
The main phases of a typical FastTrack for Azure engagement are:
Discovery. Identify key stakeholders, understand the goal or vision for problems to be solved, and then assess
architectural needs.
Solution enablement. Learn design principles for building applications, review architecture of applications
and solutions, and receive guidance and tools to drive proof of concept (PoC) work through to production.
Continuous partnership. Azure engineers and program managers check in every so often to ensure that
deployment is on track and to help remove blockers.

Microsoft Services offerings aligned to Cloud Adoption Framework approaches
Assess: Microsoft Services uses a unified, data- and tool-driven approach consisting of architectural workshops,
Azure real-time information, security and identity threat models, and various tools to provide insights into the
challenges, risks, recommendations, and issues of an existing Azure environment, with a key outcome such as a
high-level modernization roadmap.
Adopt: Through Microsoft Services' Azure Cloud Foundation, establish your core Azure designs, patterns and
governance architecture by mapping your requirements to the most appropriate reference architecture and plan,
design and deploy the infrastructure, management, security, and identity required for workloads.
Migrate/Optimize: Microsoft Services' Cloud Modernization Solution offers a comprehensive approach to move
applications and infrastructure to Azure, as well as to optimize and modernize after cloud deployment, backed by
streamlined migration.
Innovate: Microsoft Services' Cloud center of excellence (CCoE) solution offers a DevOps coaching engagement
and uses DevOps principles combined with prescriptive cloud-native service management and security controls to
help drive business innovation, increase agility, and reduce time to value within a secure, predictable, and flexible
services delivery and operations management capability.

Azure Support
If you have questions or need help, create a support request. If your support request requires deep technical
guidance, visit Azure Support Plans to align the best plan for your needs.

Azure Solutions Partner


Microsoft Certified Solution Providers specialize in providing up-to-date, Microsoft technology–based customer
solutions all over the world. Optimize your business in the cloud with help from an experienced partner.
Get help from partners with ready-made or custom Azure solutions and partners who can help deploy and
manage those solutions:
Find a Cloud Solutions Partner. A certified CSP can help take full advantage of the cloud by assessing
business goals for cloud adoption, identifying the right cloud solution that meets business needs and helps the
business become more agile and efficient.
Find a Managed Service Partner. An Azure managed service partner (MSP) helps a business transition to
Azure by guiding all aspects of the cloud journey. From consulting to migrations and operations management,
cloud MSPs show customers all the benefits that come with cloud adoption. They also act as a one-stop shop
for common support, provisioning, and the billing experience, all with a flexible pay-as-you-go (PAYG) business
model.

Next steps
After a partner and support strategy is selected, the release and iteration backlogs can be updated to reflect
planned efforts and assignments.
Manage change using release and iteration backlogs
Manage change in an incremental migration effort

This article assumes that migration processes are incremental in nature, running parallel to the govern process.
However, the same guidance could be used to populate initial tasks in a work breakdown structure for traditional
waterfall change management approaches.

Release backlog
A release backlog consists of a series of assets (VMs, databases, files, and applications, among others) that must
be migrated before a workload can be released for production usage in the cloud. During each iteration, the cloud
adoption team documents and estimates the efforts required to move each asset to the cloud. See the "Iteration
backlog" section that follows.

Iteration backlog
An iteration backlog is a list of the detailed work required to migrate a specific number of assets from the existing
digital estate to the cloud. The entries on this list are often stored in an agile management tool, like Azure
DevOps, as work items.
Prior to starting the first iteration, the cloud adoption team specifies an iteration duration, usually two to four
weeks. This time box is important to create a start and finish time period for each set of committed activities.
Maintaining consistent execution windows makes it easy to gauge velocity (pace of migration) and alignment to
changing business needs.
Prior to each iteration, the team reviews the release backlog, estimating the effort and priorities of assets to be
migrated. It then commits to deliver a specific number of agreed-on migrations. After this is agreed to by the
cloud adoption team, the list of activities becomes the current iteration backlog.
During each iteration, team members work as a self-organizing team to fulfill commitments in the current
iteration backlog.
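As a minimal sketch of how velocity can be gauged from those consistent execution windows (the sample numbers are hypothetical):

```python
# Hypothetical velocity calculation across completed two-week iterations.
assets_completed_per_iteration = [12, 18, 22, 25]   # sample history of migrated assets
remaining_assets_in_release_backlog = 340           # sample backlog size

velocity = sum(assets_completed_per_iteration) / len(assets_completed_per_iteration)
iterations_remaining = remaining_assets_in_release_backlog / velocity

print(f"Average velocity: {velocity:.1f} assets per iteration")
print(f"Roughly {iterations_remaining:.0f} iterations (~{iterations_remaining * 2:.0f} weeks) remain")
```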

Next steps
After an iteration backlog is defined and accepted by the cloud adoption team, change management approvals
can be finalized.
Approve architecture changes prior to migration
Approve architecture changes before migration

During the assess process of migration, each workload is evaluated, architected, and estimated to develop a future
state plan for the workload. Some workloads can be migrated to the cloud with no change to the architecture.
Maintaining on-premises configuration and architecture can reduce risk and streamline the migration process.
Unfortunately, not every application can run in the cloud without changes to the architecture. When architecture
changes are required, this article can help classify the change and can provide some guidance on the proper
approval activities.

Business impact and approval


During migration, some things are likely to change in ways that impact the business. Although change sometimes
can't be avoided, surprises as a result of undisclosed or undocumented changes should be. To maintain
stakeholder support throughout the migration effort, it's important to avoid surprises. Surprising application
owners or business stakeholders can slow or halt a cloud adoption effort.
Prior to migration, it is important to prepare the workload's business owner for any changes that could affect
business processes, such as changes to:
Service-level agreements.
Access patterns or security requirements that impact the end user.
Data retention practices.
Core application performance.
Even when a workload can be migrated with minimal to no change, there could still be a business impact.
Replication processes can slow the performance of production systems. Changes to the environment in
preparation for migration have the potential to cause routing or network performance limitations. There are many
additional impacts that could result from replication, staging, or promotion activities.
Regular approval activities can help minimize or avoid surprises as a result of change or performance-driven
business impacts. The cloud adoption team should execute a change approval process at the end of the
assessment process, before beginning the migration process.

Existing culture
Your IT teams likely have existing mechanisms for managing change involving your on-premises assets. Typically
these mechanisms are governed by traditional Information Technology Infrastructure Library–based (ITIL-based)
change management processes. In many enterprise migrations, these processes involve a Change Advisory Board
(CAB) that is responsible for reviewing, documenting, and approving all IT-related requests for changes (RFC).
The CAB generally includes experts from multiple IT and business teams, offering a variety of perspectives and
detailed review for all IT-related changes. A CAB approval process is a proven way to reduce risk and minimize the
business impact of changes involving stable workloads managed by IT operations.

Technical approval
A lack of organizational readiness to approve technical change is among the most common reasons for cloud
migration failure. More projects are stalled by a series of technical approvals than any deficit in a cloud platform.
Preparing the organization for technical change approval is an important requirement for migration success. The
following are a few best practices to ensure that the organization is ready for technical approval.
ITIL Change Advisory Board challenges
Every change management approach has its own set of controls and approval processes. Migration is a series of
continuous changes that start with a high degree of ambiguity and develop additional clarity through the course of
execution. As such, migration is best governed by agile-based change management approaches, with the cloud
strategy team serving as a product owner.
However, the scale and frequency of change during a cloud migration don't fit well with the nature of ITIL
processes. The requirements of a CAB approval can risk the success of a migration, slowing or stopping the effort.
Further, in the early stages of migration, ambiguity is high and subject matter expertise tends to be low. For the
first several workload migrations or releases, the cloud adoption team is often in a learning mode. As such, it could
be difficult for the team to provide the types of data needed to pass a CAB approval.
The following best practices can help the CAB maintain a degree of comfort during migration without becoming a
painful blocker.
Standardize change
It is tempting for a cloud adoption team to consider detailed architectural decisions for each workload being
migrated to the cloud. It is equally tempting to use cloud migration as a catalyst to refactor past architectural
decisions. For organizations that are migrating a few hundred VMs or a few dozen workloads, either approach can
be properly managed. When migrating a datacenter consisting of 1,000 or more assets, each of these approaches
is considered a high-risk antipattern that significantly reduces the likelihood of success. Modernizing, refactoring,
and rearchitecting every application require diverse skill sets and a significant variety of changes, and these tasks
create dependencies on human efforts at scale. Each of these dependencies injects risk into the migration effort.
The article on digital estate rationalization discusses the agility and time-saving impact of basic assumptions when
rationalizing a digital estate. There is an additional benefit of standardized change. By choosing a default
rationalization approach to govern the migration effort, the Change Advisory Board or product owner can review
and approve the application of one change to a long list of workloads. This reduces technical approval of each
workload to those that require a significant architecture change to be cloud compatible.
Clarify expectations and roles of approvers
Before the first workload is assessed, the cloud strategy team should document and communicate the expectations
of anyone involved in the approval of change. This simple activity can avoid costly delays when the cloud adoption
team is fully engaged.
Seek approval early
When possible, technical change should be detected and documented during the assessment process. Regardless
of approval processes, the cloud adoption team should engage approvers early. The sooner that change approval
can begin, the less likely an approval process is to block migration activities.

Next steps
With the help of these best practices, it should be easier to integrate proper, low-risk approval into migration
efforts. After workload changes are approved, the cloud adoption team is ready to migrate workloads.
Migrate workloads
Execute a migration

After a workload has been assessed, it can be migrated to the cloud. This series of articles explains the various
activities that may be involved in the execution of a migration.

Objective
The objective of a migration is to migrate a single workload to the cloud.

Definition of done
The migration phase is complete when a workload is staged and ready for testing in the cloud, including all
dependent assets required for the workload to function. During the optimize process, the workload is prepared for
production usage.
This definition of done can vary, depending on your testing and release processes. The next article in this series
covers deciding on a promotion model and can help you understand when it would be best to promote a migrated
workload to production.

Accountability during migration


The cloud adoption team is accountable for the entire migration process. However, members of the cloud strategy
team have a few responsibilities, as discussed in the following section.

Responsibilities during migration


In addition to the high-level accountability, there are actions that an individual or group needs to be directly
responsible for. The following are a few activities that require assignments to responsible parties:
Remediation. Resolve any compatibility issues that prevent the workload from being migrated to the cloud.
As discussed in the prerequisite article regarding technical complexity and change management, a
decision should be made in advance to determine how this activity is to be executed. In particular, will
remediation be completed by the cloud adoption team during the same sprint as the actual migration
effort? Alternatively, will a wave or factory model be used to complete remediation in a separate
iteration? If the answer to this basic process question can't be answered by every member of the team, it
may be wise to revisit the section on prerequisites.
Replication. Create a copy of each asset in the cloud to synchronize VMs, data, and applications with
resources in the cloud.
Depending on the promotion model, different tools may be required to complete this activity.
Staging. After all assets for a workload have been replicated and verified, the workload can be staged for
business testing and execution of a business change plan.

Next steps
With a general understanding of the migration process, you are ready to decide on a promotion model.
Decide on a promotion model
Promotion models: single-step, staged, or flight

Workload migration is often discussed as a single activity. In reality, it is a collection of smaller activities that
facilitate the movement of a digital asset to the cloud. One of the last activities in a migration is the promotion of
an asset to production. Promotion is the point at which the production system changes for end users. It can often
be as simple as changing the network routing, redirecting end users to the new production asset. Promotion is
also the point at which IT operations or cloud operations change the focus of operational management processes
from the previous production system to the new production systems.
There are several promotion models. This article outlines three of the most common ones used in cloud
migrations. The choice of a promotion model changes the activities seen within the migrate and optimize
processes. As such, the promotion model should be decided early in a release.

Impact of promotion model on migrate and optimize activities


In each of the following promotion models, the chosen migration tool replicates and stages the assets that make
up a workload. After staging, each model treats the asset a bit differently.
Single-step promotion. In a single-step promotion model, the staging process doubles as the promotion
process. After all assets are staged, end-user traffic is rerouted and staging becomes production. In such a case,
promotion is part of the migration process. This is the fastest migration model. However, this approach makes
it more difficult to integrate robust testing or optimization activities. Further, this type of model assumes that
the migration team has access to the staging and production environment, which compromises separation of
duty requirements in some environments.

NOTE
The table of contents for this site lists the promotion activity as part of the optimize process. In a single-step model,
promotion occurs during the migrate process. When using this model, roles and responsibilities should be updated
to reflect this.

Staged. In a staged promotion model, the workload is considered migrated after it is staged, but it is not yet
promoted. Prior to promotion, the migrated workload undergoes a series of performance tests, business tests,
and optimization changes. It is then promoted at a future date in conjunction with a business test plan. This
approach improves the balance between cost and performance, while making it easier to obtain business
validation.
Flight. The flight promotion model combines single-step and staged models. In a flight model, the assets in
the workload are treated like production after landing in staging. After a condensed period of automated
testing, production traffic is routed to the workload. However, it is a subset of the traffic. That traffic serves as
the first flight of production and testing. Assuming the workload performs from a feature and performance
perspective, additional traffic is migrated. After all production traffic has been moved onto the new assets, the
workload is considered fully promoted.
The chosen promotion model affects the sequence of activities to be performed. It also affects the roles and
responsibilities of the cloud adoption team. It may even impact the composition of a sprint or multiple sprints.

Single-step promotion
This model uses migration automation tools to replicate, stage, and promote assets. The assets are replicated into
a contained staging environment controlled by the migration tool. After all assets have been replicated, the tool
can execute an automated process to promote the assets into the chosen subscription in a single step. While in
staging, the tool continues to replicate the asset, minimizing loss of data between the two environments. After an
asset is promoted, the linkage between the source system and the replicated system is severed. In this approach, if
additional changes occur in the initial source systems, the changes are lost.
Pros. Positive benefits of this approach include:
This model introduces less change to the target systems.
Continuous replication minimizes data loss.
If a staging process fails, it can quickly be deleted and repeated.
Replication and repeated staging tests enable an incremental scripting and testing process.
Cons. Negative aspects of this approach include:
Assets staged within the tool's isolated sandbox don't allow for complex testing models.
During replication, the migration tool consumes bandwidth in the local datacenter. Staging a large volume of
assets over an extended duration has an exponential impact on available bandwidth, hurting the migration
process and potentially affecting performance of production workloads in the on-premises environment.

Staged promotion
In this model, the staging sandbox managed by the migration tool is used for limited testing purposes. The
replicated assets are then deployed into the cloud environment, which serves as an extended staging environment.
The migrated assets run in the cloud, while additional assets are replicated, staged, and migrated. When full
workloads become available, richer testing is initiated. When all assets associated with a subscription have been
migrated, the subscription and all hosted workloads are promoted to production. In this scenario, there is no
change to the workloads during the promotion process. Instead, the changes tend to be at the network and
identity layers, routing users to the new environment and revoking access of the cloud adoption team.
Pros. Positive benefits of this approach include:
This model provides more accurate business testing opportunities.
The workload can be studied more closely to better optimize performance and cost of the assets.
A larger number of assets can be replicated within similar time and bandwidth constraints.
Cons. Negative aspects of this approach include:
The chosen migration tool can't facilitate ongoing replication after migration.
A secondary means of data replication is required to synchronize data platforms during the staged time frame.

Flight promotion
This model is similar to the staged promotion model. However, there is one fundamental difference. When the
subscription is ready for promotion, end-user routing happens in stages or flights. At each flight, additional users
are rerouted to the production systems.
Pros. Positive benefits of this approach include:
This model mitigates the risks associated with a big migration or promotion activity. Errors in the migrated
solution can be identified with less impact to business processes.
It allows for monitoring of workload performance demands in the cloud environment for an extended duration,
increasing accuracy of asset-sizing decisions.
Larger numbers of assets can be replicated within similar time and bandwidth constraints.
Cons. Negative aspects of this approach include:
The chosen migration tool can't facilitate ongoing replication after migration.
A secondary means of data replication is required to synchronize data platforms during the staged time frame.
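The routing mechanics differ by platform, but the control loop behind a flight promotion is straightforward. The following is a platform-agnostic sketch; route_traffic_percent and current_error_rate are placeholders for whatever routing and monitoring capabilities you actually use, and the thresholds are assumptions.

```python
import time

# Conceptual flight-promotion loop; not tied to any specific routing or monitoring service.
FLIGHTS = [5, 25, 50, 100]     # share of production traffic routed to the migrated workload per flight
ERROR_THRESHOLD = 0.01         # assumed maximum acceptable error rate before rolling back

def route_traffic_percent(percent: int) -> None:
    """Placeholder: shift the given share of end users to the migrated workload."""
    print(f"Routing {percent}% of production traffic to the migrated workload")

def current_error_rate() -> float:
    """Placeholder: query monitoring for the migrated workload's error rate."""
    return 0.002  # stubbed value for illustration

for percent in FLIGHTS:
    route_traffic_percent(percent)
    time.sleep(1)              # in practice, observe each flight for hours or days
    if current_error_rate() > ERROR_THRESHOLD:
        route_traffic_percent(0)   # roll all traffic back to the original production system
        raise RuntimeError(f"Flight at {percent}% failed validation; promotion halted")

print("All production traffic moved; the workload is fully promoted")
```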

Next steps
After a promotion model is defined and accepted by the cloud adoption team, remediation of assets can begin.
Remediating assets prior to migration
Remediate assets prior to migration

During the assessment process of migration, the team seeks to identify any configurations that would make an
asset incompatible with the chosen cloud provider. Remediate is a checkpoint in the migration process to ensure
that those incompatibilities have been resolved. This article discusses a few common remediation tasks for
reference. It also establishes a skeleton process for deciding whether remediation is a wise investment.

Common remediation tasks


In any corporate environment, technical debt exists. Some of this is healthy and expected. Architecture decisions
that were well suited for an on-premises environment may not be entirely suitable in a cloud platform. In either
case, common remediation tasks may be required to prepare assets for migration. The following are a few
examples:
Minor host upgrades. Occasionally, an outdated host needs to be upgraded prior to replication.
Minor guest OS upgrades. It is more likely that an OS will need patching or upgrading prior to replication.
SLA modifications. Backup and recovery change significantly in a cloud platform. It is likely that assets will
need minor modifications to their backup processes to ensure continued function in the cloud.
PaaS migration. In some cases, a PaaS deployment of a data structure or application may be required to
accelerate deployment. Minor modifications may be required to prepare the solution for PaaS deployment.
PaaS code changes. It is not uncommon for custom applications to require minor code modifications to be
PaaS ready. Examples could include methods that write to local disk or use of in-memory session state, among
others.
Application configuration changes. Migrated applications may require changes to variables, such as
network paths to dependent assets, service account changes, or updates to dependent IP addresses.
Minor changes to network paths. Routing patterns may need to be modified to properly route user traffic to
the new assets.

NOTE
This isn't production routing to the new assets, but rather configuration to allow for proper routing to the assets in
general.

Large-scale remediation tasks


When a datacenter is properly maintained, patched, and updated, there is likely to be little need for remediation.
Remediation-rich environments tend to be common among large enterprises, organizations that have been
through large IT downsizing, some legacy managed service environments, and acquisition-rich environments. In
each of these types of environments, remediation may consume a large portion of the migration effort. When the
following remediation tasks frequently appear and are negatively affecting migration speed or consistency, it may
be wise to break out remediation into a parallel effort and team (similar to how cloud adoption and cloud
governance run in parallel).
Frequent host upgrades. When large numbers of hosts must be upgraded to complete the migration of a
workload, the migration team is likely to suffer from delays. It may be wise to break out affected applications
and address the remediations prior to including affected applications in any planned releases.
Frequent guest OS upgrades. Large enterprises commonly have servers running on outdated versions of
Linux or Windows. Aside from the apparent security risks of operating an outdated OS, there are also
incompatibility issues that prevent affected workloads from being migrated. When a large number of VMs
require OS remediation, it may be wise to break out these efforts into a parallel iteration.
Major code changes. Older custom applications may require significantly more modifications to prepare
them for PaaS deployment. When this is the case, it may be wise to remove them from the migration backlog
entirely, managing them in a wholly separate program.

Decision framework
Remediation for smaller workloads can be straightforward, which is one of the reasons it's recommended that
you choose a smaller workload for your initial migration. However, as your migration efforts mature and you begin
to tackle larger workloads, remediation can be a time-consuming and costly process. For example, remediation
efforts for a Windows Server 2003 migration involving a 5,000+ VM pool of assets can delay a migration by
months. When such large-scale remediation is required, the following questions can help guide decisions:
Have all workloads affected by the remediation been identified and notated in the migration backlog?
For workloads that are not affected, will a migration produce a similar return on investment (ROI)?
Can the affected assets be remediated in alignment with the original migration timeline? What impact would
timeline changes have on ROI?
Is it economically feasible to remediate the assets in parallel with migration efforts?
Is there sufficient bandwidth on staff to remediate and migrate? Should a partner be engaged to execute one
or both tasks?
If these questions don't yield favorable answers, a few alternative approaches that move beyond a basic IaaS
rehosting strategy may be worth considering:
Containerization. Some assets can be hosted in a containerized environment without remediation. This could
produce less-than-favorable performance and doesn't resolve security or compliance issues.
Automation. Depending on the workload and remediation requirements, it may be more profitable to script
the deployment to new assets using a DevOps approach.
Rebuild. When remediation costs are very high and business value is equally high, a workload may be a good
fit as a candidate for rebuilding or rearchitecting.

Next steps
After remediation is complete, replication activities are ready.
Replicate assets
What role does replication play in the migration
process?

On-premises datacenters are filled with physical assets like servers, appliances, and network devices. However,
each server is only a physical shell. The real value comes from the binary running on the server. The applications
and data are the purpose for the datacenter. Those are the primary binaries to migrate. Powering these
applications and data stores are other digital assets and binary sources, like operating systems, network routes,
files, and security protocols.
Replication is the workhorse of migration efforts. It is the process of copying a point-in-time version of various
binaries. The binary snapshots are then copied to a new platform and deployed onto new hardware, in a process
referred to as seeding. When executed properly, the seeded copy of the binary should behave identically to the
original binary on the old hardware. However, that snapshot of the binary is immediately out of date and
misaligned with the original source. To keep the new binary and the old binary aligned, a process referred to as
synchronization continuously updates the copy stored in the new platform. Synchronization continues until the
asset is promoted in alignment with the chosen promotion model. At that point, the synchronization is severed.

Required prerequisites to replication


Prior to replication, the new platform and hardware must be prepared to receive the binary copies. The article on
prerequisites outlines minimum environment requirements to help create a safe, robust, performant platform to
receive the binary replicas.
The source binaries must also be prepared for replication and synchronization. The articles on assessment,
architecture, and remediation each address the actions necessary to ensure that the source binary is ready for
replication and synchronization.
A toolchain that aligns with the new platform and source binaries must be implemented to execute and manage
the replication and synchronization processes. The article on replication options outlines various tools that could
contribute to a migration to Azure.

Replication risks - physics of replication


When planning for the replication of any binary source to a new destination, there are a few fundamental laws to
seriously consider during planning and execution.
Speed of light. When moving high volumes of data, fiber is still the fastest option. Unfortunately, those
cables can only move data at two-thirds the speed of light. This means that there is no method for
instantaneous or unlimited replication of data.
Speed of WAN pipeline. More consequential than the speed of data movement is the uplink bandwidth,
which defines the volume of data per second that can be carried over a company's existing WAN to the target
datacenter.
Speed of WAN expansion. If budgets allow, additional bandwidth can be added to a company's WAN
solution. However, it can take weeks or months to procure, provision, and integrate additional fiber
connections.
Speed of disks. If data could move faster and there was no limit to the bandwidth between the source binary
and the target destination, physics would still be a limiter. Data can be replicated only as quickly as it can be
read from source disks. Reading every one or zero from every spinning disk in a datacenter takes time.
Speed of human calculations. Disks and light move faster than human decision processes. When a group of
humans is required to collaborate and make decisions together, the results will come even more slowly.
Replication can never overcome delays related to human intelligence.
Each of these laws of physics drive the following risks that commonly affect migration plans:
Replication time. Advanced replication tools can't overcome basic physics—replication requires time and
bandwidth. Plans should include realistic timelines that reflect the amount of time it takes to replicate binaries.
Total available migration bandwidth is the amount of up-bound bandwidth, measured in megabits per second
(Mbps) or gigabits per second (Gbps), that is not consumed by other higher priority business needs. Total
migration storage is the total disk space, measured in gigabytes or terabytes, required to store a snapshot of
all assets to be migrated. An initial estimate of time can be calculated by dividing the total migration storage
by the total available migration bandwidth; note the conversion from bits to bytes. (A worked example follows this list.) See the following entry,
"Cumulative effect of disk drift," for a more accurate calculation of time.
Cumulative effect of disk drift. From the point of replication to the promotion of an asset to production, the
source and destination binaries must remain synchronized. Drift in binaries consumes additional bandwidth,
as all changes to the binary must be replicated on a recurring basis. During synchronization, all binary drift
must be included in the calculation for total migration storage. The longer it takes to promote an asset to
production, the more cumulative drift will occur. The more assets being synchronized, the more bandwidth
consumed. With each asset being held in a synchronization state, a bit more of the total available migration
bandwidth is lost.
Time to business change. As mentioned in the previous entry, "Cumulative effect of disk drift,"
synchronization time has a cumulative negative effect on migration speed. Prioritization of the migration
backlog and advanced preparation for the business change plan are crucial to the speed of migration. The
most significant test of business and technical alignment during a migration effort is the pace of promotion.
The faster an asset can be promoted to production, the less impact disk drift will have on bandwidth and the
more bandwidth/time that can be allocated to replication of the next workload.
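A worked example can make the estimate above concrete. The following sketch is a minimal calculation, assuming hypothetical values for migration storage, uplink bandwidth, and daily drift; substitute measured numbers from your own environment.

```python
# First-order replication-time estimate, as described in the risk list above.
# All input values below are hypothetical placeholders.

total_migration_storage_tb = 50       # snapshot size of all assets to be migrated
available_bandwidth_mbps = 500        # uplink capacity not consumed by higher-priority traffic
daily_drift_percent = 2               # assumed rate of change while assets stay synchronized

storage_bits = total_migration_storage_tb * 10**12 * 8          # terabytes -> bits
seed_seconds = storage_bits / (available_bandwidth_mbps * 10**6)
seed_days = seed_seconds / 86400

# Drift consumes bandwidth for as long as assets remain in a synchronized state.
drift_bits_per_day = storage_bits * (daily_drift_percent / 100)
drift_share = drift_bits_per_day / (available_bandwidth_mbps * 10**6 * 86400)

print(f"Initial seeding: roughly {seed_days:.1f} days")
print(f"Bandwidth consumed by drift during synchronization: roughly {drift_share:.0%}")
```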

Next steps
After replication is complete, staging activities can begin.
Staging activities during a migration
Replication options

Before any migration, you should ensure that primary systems are safe and will continue to run without issues.
Any downtime disrupts users or customers, and it costs time and money. Migration is not as simple as turning off
the virtual machines on-premises and copying them across to Azure. Migration tools must take into account
asynchronous or synchronous replication to ensure that live systems can be copied to Azure with no downtime.
Most of all, systems must be kept in lockstep with on-premises counterparts. You might want to test migrated
resources in isolated partitions in Azure, to ensure that workloads work as expected.
The content within the Cloud Adoption Framework assumes that Azure Migrate (or Azure Site Recovery) is the
most appropriate tool for replicating assets to the cloud. However, there are other options available. This article
discusses those options to help enable decision-making.

Azure Site Recovery (also known as Azure Migrate)


Azure Site Recovery orchestrates and manages disaster recovery for Azure VMs, on-premises VMs, and physical
servers. You can also use Site Recovery to manage migration of machines on-premises and other cloud providers
to Azure. Replicate on-premises machines to Azure or Azure VMs to a secondary region. Then, you fail the VM
over from the primary site to the secondary and complete the migration process. With Azure Site Recovery, you
can achieve various migration scenarios:
Migrate from on-premises to Azure. Migrate on-premises VMware VMs, Hyper-V VMs, and physical
servers to Azure. To do this, complete almost the same steps as you would for full disaster recovery. Simply
don't fail machines back from Azure to the on-premises site.
Migrate between Azure regions. Migrate Azure VMs from one Azure region to another. After the migration
is complete, configure disaster recovery for the Azure VMs now in the secondary region to which you
migrated.
Migrate from other cloud to Azure. You can migrate your compute instances provisioned on other cloud
providers to Azure VMs. Site Recovery treats those instances as physical servers for migration purposes.
Azure Site Recovery moving assets to Azure or other clouds
After you have assessed on-premises and cloud infrastructure for migration, Azure Site Recovery contributes to
your migration strategy by replicating on-premises machines. With the following easy steps, you can set up
migration of on-premises VMs, physical servers, and cloud VM instances to Azure:
Verify prerequisites.
Prepare Azure resources.
Prepare on-premises VM or cloud instances for migration.
Deploy a configuration server.
Enable replication for VMs.
Test failover to make sure everything's working.
Run a one-time failover to Azure.

Azure Database Migration Service


This service helps reduce the complexity of your cloud migration by using a single comprehensive service instead
of multiple tools. Azure Database Migration Service is designed as a seamless, end-to-end solution for moving on-
premises SQL Server databases to the cloud. It is a fully managed service designed to enable seamless migrations
from multiple database sources to Azure data platforms with minimal downtime. It integrates some of the
functionality of existing tools and services, providing customers with a comprehensive, highly available solution.
The service uses the Data Migration Assistant to generate assessment reports that provide recommendations to
guide you through the changes required prior to performing a migration. It's up to you to perform any required
remediation. When you are ready to begin the migration process, the Azure Database Migration Service performs
all of the associated steps. You can fire and forget your migration projects with peace of mind, knowing that the
process takes advantage of best practices as determined by Microsoft.

Next steps
After replication is complete, staging activities can begin.
Staging activities during a migration
Understand staging activities during a migration

As described in the article on promotion models, staging is the point at which assets have been migrated to the
cloud. However, they are not yet ready to be promoted to production. This is often the last step in the migrate
process of a migration. After staging, the workload is managed by an IT operations or cloud operations team to
prepare it for production usage.

Deliverables
Staged assets may not be ready for use in production. There are several production readiness checks that should
be finalized before this stage is considered complete. The following is a list of deliverables often associated with
completion of asset staging.
Automated testing. Any automated tests available to validate workload performance should be run before
concluding the staging process. After the asset leaves staging, synchronization with the original source system
is terminated, which makes it harder to redeploy the replicated assets after they are staged for optimization.
Migration documentation. Most migration tools can produce an automated report of the assets being
migrated. Before concluding the staging activity, all migrated assets should be documented for clarity.
Configuration documentation. Any changes made to an asset (during remediation, replication, or staging)
should be documented for operational readiness.
Backlog documentation. The migration backlog should be updated to reflect the workload and assets
staged.

Next steps
After staged assets are tested and documented, you can proceed to optimization activities.
Optimize migrated workloads
Optimize migrated workloads

After a workload and its supporting assets have been migrated to the cloud, the workload must be prepared before it can be
promoted to production. In this process, activities ready the workload, size the dependent assets, and prepare the
business for when the migrated cloud-based workload enters production usage.
The objective of optimization is to prepare a migrated workload for promotion to production usage.

Definition of done
The optimization process is complete when a workload has been properly configured and sized, and is being used in
production.

Accountability during optimization


The cloud adoption team is accountable for the entire optimization process. However, members of the cloud
strategy team, the cloud operations team, and the cloud governance team should also be responsible for activities
within this process.

Responsibilities during optimization


In addition to the high-level accountability, there are actions that an individual or group needs to be directly
responsible for. The following are a few activities that require assignments to responsible parties:
Business testing. Resolve any compatibility issues that prevent the workload from completing its migration to
the cloud.
Power users from within the business should participate heavily in testing of the migrated workload.
Depending on the degree of optimization attempted, multiple testing cycles may be required.
Business change plan. Development of a plan for user adoption, changes to business processes, and
modification to business KPIs or learning metrics as a result of the migration effort.
Benchmark and optimize. Study of the business testing and automated testing to benchmark performance.
Based on usage, the cloud adoption team refines sizing of the deployed assets to balance cost and performance
against expected production requirements.
Ready for production. Prepare the workload and environment for the support of the workload's ongoing
production usage.
Promote. Redirect production traffic to the migrated and optimized workload. This activity represents the
completion of a release cycle.
In addition to core activities, there are a few parallel activities that require specific assignments and execution
plans:
Decommission. Generally, cost savings can be realized from a migration, when the previous production assets
are decommissioned and properly disposed of.
Retrospective. Every release creates an opportunity for deeper learning and adoption of a growth mindset.
When each release cycle is completed, the cloud adoption team should evaluate the processes used during
migration to identify improvements.

Next steps
With a general understanding of the optimization process, you are ready to begin the process by establishing a
business change plan for the candidate workload.
Business change plan
Business change plan

Traditionally, IT has overseen the release of new workloads. During a major transformation, like a datacenter
migration or a cloud migration, a similar pattern of IT-led adoption could be applied. However, the traditional
approach might miss opportunities to realize additional business value. For this reason, before a migrated
workload is promoted to production, implementing a broader approach to user adoption is suggested. This article
outlines the ways in which a business change plan adds to a standard user adoption plan.

Traditional user adoption approach


User adoption plans focus on how users will adopt a new technology or change to a given technology. This
approach is time tested for introducing users to new tools. In a typical user adoption plan, IT focuses on the
installation, configuration, maintenance, and training associated with the technical changes being introduced to
the business environment.
Although approaches may vary, general themes are present in most user adoption plans. These themes are
typically based on a risk control and facilitation approach that aligns to incremental improvement. The Eason
Matrix, illustrated in the figure below, represents the drivers behind those themes across a spectrum of adoption
types.

Eason Matrix of user adoption types.


These themes are often based on the assumption that introduction of new solutions to users should focus largely
on risk control and facilitation of change. Additionally, IT has focused mostly on risk from the technology change
and facilitation of that change.

Create business change plans


A business change plan looks beyond the technical change and assumes that every release in a migration effort
drives some level of business process change. It looks upstream and downstream from the technical changes. The
following questions help participants think about user adoption from a business change perspective, to maximize
business impact:
Upstream questions. Upstream questions look at impacts or changes that come before user adoption happens:
Has an expected business outcome been quantified?
Does the business impact map to defined learning metrics?
Which business processes and teams take advantage of this technical solution?
Who in the business can best align power users for testing and feedback?
Have the affected business leaders been involved in the prioritization and migration planning?
Are there any critical events or dates for the business that could be affected by this change?
Does the business change plan maximize impact but minimize business disruption?
Is downtime expected? Has a downtime window been communicated to end users?
Downstream questions. After the adoption is complete, the business change can begin. Unfortunately, this is
where many user adoption plans end. Downstream questions help the cloud strategy team maintain a focus on
transformation after technical change is completed:
Are business users responding well to the changes?
Is performance meeting expectations now that the technical change has been adopted?
Are business processes or customer experiences changing in the anticipated ways?
Are additional changes required to realize learning metrics?
Did the changes align to the targeted business outcomes? If not, why not?
Are additional changes required to contribute to business outcomes?
Have any negative effects been observed as a result of this change?
The business change plan varies from company to company. The goal of these questions is to help better
integrate the business into the change associated with each release. By looking at each release not as a technology
change to be adopted but instead as a business change plan, business outcomes become more attainable.

Next steps
After business change is documented and planned, business testing can begin.
Guidance for business testing (UAT) during migration

References
Eason, K. (1988) Information technology and organizational change, New York: Taylor and Francis.
Guidance for business testing (UAT) during migration

Traditionally seen as an IT function, user acceptance testing during a business transformation can be orchestrated
solely by IT. However, this function is often most effectively executed as a business function. IT then supports this
business activity by facilitating the testing, developing testing plans, and automating tests when possible. Although
IT can often serve as a surrogate for testing, there is no replacement for firsthand observation of real users
attempting to take advantage of a new solution in the context of a real or replicated business process.

NOTE
When available, automated testing is a much more effective and efficient means of testing any system. However, cloud
migrations often focus most heavily on legacy systems or at least stable production systems. Often, those systems aren't
managed by thorough and well-maintained automated tests. This article assumes that no such tests are available at the
time of migration.

Second to automated testing is testing of the process and technology changes by power users. Power users are
the people that commonly execute a real-world process that requires interactions with a technology tool or set of
tools. They could be represented by an external customer using an e-commerce site to acquire goods or services.
Power users could also be represented by a group of employees executing a business process, such as a call center
servicing customers and recording their experiences.
The goal of business testing is to solicit validation from power users to certify that the new solution performs in
line with expectations and does not impede business processes. If that goal isn't met, the business testing serves
as a feedback loop that can help define why and how the workload isn't meeting expectations.

Business activities during business testing


During business testing, the first iteration is manually driven directly with customers. This is the purest but most
time-consuming form of feedback loop.
Identify power users. The business generally has a better understanding of the power users who are most
affected by a technical change.
Align and prepare power users. Ensure that power users understand the business objectives, desired
outcomes, and expected changes to business processes. Prepare them and their management structure for the
testing process.
Engage in feedback loop interpretation. Help the IT staff understand the impact of various points of
feedback from power users.
Clarify process change. When transformation could trigger a change to business processes, communicate
the change and any downstream impacts.
Prioritize feedback. Help the IT team prioritize feedback based on the business impact.
At times, IT may employ analysts or product owners who can serve as proxies for the itemized business testing
activities. However, business participation is highly encouraged and is likely to produce favorable business
outcomes.

IT activities during business testing


IT serves as one of the recipients of the business testing output. The feedback loops exposed during business
testing eventually become work items that define technical change or process change. As a recipient, IT is expected
to aid in facilitation, collection of feedback, and management of resultant technical actions. The typical activities IT
performs during business testing include:
Provide structure and logistics for business testing.
Aid in facilitation during testing.
Provide a means and process for recording feedback.
Help the business prioritize and validate feedback.
Develop plans for acting on technical changes.
Identify existing automated tests that could streamline the testing by power users.
For changes that could require repeated deployment or testing, study testing processes, define benchmarks,
and create automation to further streamline power user testing.

Next steps
In conjunction with business testing, optimization of migrated assets can refine cost and workload performance.
Benchmark and resize cloud assets
Benchmark and resize cloud assets

Monitoring usage and spending is critically important for cloud infrastructures. Organizations pay for the
resources they consume over time. When usage exceeds agreement thresholds, unexpected cost overages can
quickly accumulate. Cost Management reports monitor spending to analyze and track cloud usage, costs, and
trends. Using over-time reports, you can detect anomalies that differ from normal trends. Inefficiencies in cloud deployment
are visible in optimization reports and in cost-analysis reports.
In the traditional on-premises models of IT, requisition of IT systems is costly and time consuming. The processes
often require lengthy capital expenditure review cycles and may even require an annual planning process. As such,
it is common practice to buy more than is needed. It is equally common for IT administrators to then
overprovision assets in preparation for anticipated future demands.
In the cloud, the accounting and provisioning models eliminate the time delays that lead to overbuying. When an
asset needs additional resources, it can be scaled up or out almost instantly. This means that assets can safely be
reduced in size to minimize resources and costs consumed. During benchmarking and optimization, the cloud
adoption team seeks to find the balance between performance and costs, provisioning assets to be no larger and
no smaller than necessary to meet production demands.

Should assets be optimized during or after the migration?


When should an asset be optimized—during or after the migration? The simple answer is both. However, that's
not entirely accurate. To explain, take a look at two basic scenarios for optimizing resource sizing:
Planned resizing. Often, an asset is clearly oversized and underutilized and should be resized during
deployment. Determining if an asset has been successfully resized in this case requires user acceptance testing
after migration. If a power user does not experience performance or functionality losses during testing, you can
conclude the asset has been successfully sized.
Optimization. In cases where the need for optimization is unclear, IT teams should use a data-driven approach
to resource size management. Using benchmarks of the asset's performance, an IT team can make educated
decisions regarding the most appropriate size, services, scale, and architecture of a solution. They can then
resize and test performance theories post-migration.
During the migration, use educated guesses and experiment with sizing. However, true optimization of resources
requires data based on actual performance in a cloud environment. For true optimization to occur, the IT team
must first implement approaches to monitoring performance and resource utilization.

Benchmark and optimize with Azure Cost Management


Azure Cost Management, licensed by Cloudyn, a Microsoft subsidiary, manages cloud spend with transparency
and accuracy. This service monitors, benchmarks, allocates, and optimizes cloud costs.
Historical data can help manage costs by analyzing usage and costs over time to identify trends, which are then
used to forecast future spending. Cost Management also includes useful projected cost reports. Cost allocation
manages costs by analyzing costs based on tagging policies. Use cost allocation for showback/chargeback to show
resource utilization and associated costs to influence consumption behaviors or charge tenant customers. Access
control helps manage costs by ensuring that users and teams access only the Cost Management data that they
need. Alerting helps manage costs through automatic notification when unusual spending or overspending occurs.
Alerts can also notify other stakeholders automatically for spending anomalies and overspending risks. Various
reports support alerts based on budget and cost thresholds.
Improve efficiency
Determine optimal VM usage, identify idle VMs, or remove idle VMs and unattached disks with Cost
Management. Using information in sizing optimization and inefficiency reports, create a plan to downsize or
remove idle VMs.
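The downsizing decision itself is straightforward once utilization data has been exported from those reports. The following sketch is illustrative only; the input format, thresholds, and VM names are assumptions, and the utilization figures would come from your Cost Management or monitoring exports.

```python
# Flag idle and oversized VMs from exported utilization data (illustrative sketch).
# The input records, thresholds, and names below are hypothetical.

vm_utilization = [
    {"name": "app-vm-01",   "avg_cpu_percent": 4},
    {"name": "app-vm-02",   "avg_cpu_percent": 38},
    {"name": "batch-vm-07", "avg_cpu_percent": 72},
]

IDLE_THRESHOLD = 5      # below this average CPU, treat the VM as idle
RESIZE_THRESHOLD = 40   # below this, consider a smaller size

for vm in vm_utilization:
    if vm["avg_cpu_percent"] < IDLE_THRESHOLD:
        action = "review for removal (idle)"
    elif vm["avg_cpu_percent"] < RESIZE_THRESHOLD:
        action = "candidate for a smaller size"
    else:
        action = "leave as-is"
    print(f'{vm["name"]}: {action}')
```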

Next steps
After a workload has been tested and optimized, it is time to ready the workload for promotion.
Getting a migrated workload ready for production promotion
Prepare a migrated application for production
promotion

After a workload is promoted, production user traffic is routed to the migrated assets. Readiness activities provide
an opportunity to prepare the workload for that traffic. The following are a few business and technology
considerations to help guide readiness activities.

Validate the business change plan


Transformation happens when business users or customers take advantage of a technical solution to execute
processes that drive the business. Readiness is a good opportunity to validate the business change plan and to
ensure proper training for the business and technical teams involved. In particular, ensure that the following
technology-related aspects of the change plan are properly communicated:
End-user training is completed (or at least planned).
Any outage windows have been communicated and approved.
Production data has been synchronized and validated by end users.
Validate promotion and adoption timing; ensure timelines and changes have been communicated to end users.

Final technical readiness tests


Ready is the last step prior to production release. That means it is also the last chance to test the workload. The
following are a few tests that are suggested during this phase:
Network isolation testing. Test and monitor network traffic to ensure proper isolation and no unexpected
network vulnerabilities. Also validate that any network routing to be severed during cutover is not experiencing
unexpected traffic.
Dependency testing. Ensure that all workload application dependencies have been migrated and are
accessible from the migrated assets.
Business continuity and disaster recovery (BCDR) testing. Validate that any backup and recovery SLAs
are established. If possible, perform a full recovery of the assets from the BCDR solution.
End-user route testing. Validate traffic patterns and routing for end-user traffic. Ensure that network
performance aligns with expectations. (A minimal validation sketch follows this list.)
Final performance check. Ensure that performance testing has been completed and approved by end users.
Execute any automated performance testing.
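Some of these checks can be scripted. The sketch below uses only the Python standard library to resolve a hypothetical production host name and confirm that the migrated endpoint responds; the host name and expected status code are assumptions, and a real readiness test would cover far more scenarios.

```python
# Minimal route and availability check for a migrated workload (sketch).
# The host name and expected status below are hypothetical placeholders.
import socket
import urllib.request

host = "workload.contoso.com"   # DNS name that will carry production traffic after cutover
expected_status = 200

# Confirm the name resolves, for example to the new cloud-hosted endpoint.
resolved_ip = socket.gethostbyname(host)
print(f"{host} resolves to {resolved_ip}")

# Confirm the endpoint answers end-user requests.
with urllib.request.urlopen(f"https://{host}/", timeout=10) as response:
    print(f"HTTP status: {response.status}")
    assert response.status == expected_status, "Endpoint did not return the expected status"
```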

Final business validation


After the business change plan and technical readiness have been validated, the following final steps can complete
the business validation:
Cost validation (plan versus actual). Testing is likely to produce changes in sizing and architecture. Ensure
that actual deployment pricing still aligns with the original plan.
Communicate and execute cutover plan. Prior to cutover, communicate the cutover and execute
accordingly.

Next steps
After all readiness activities have been completed, it's time to promote the workload.
What is required to promote a migrated resource to production?
What is required to promote a migrated resource to
production?

Promotion to production marks the completion of a workload's migration to the cloud. After the asset and all of its
dependencies are promoted, production traffic is rerouted. The rerouting of traffic makes the on-premises assets
obsolete, allowing them to be decommissioned.
The process of promotion varies according to the workload's architecture. However, there are several consistent
prerequisites and a few common tasks. This article describes each and serves as a kind of prepromotion checklist.

Prerequisite processes
Each of the following processes should be executed, documented, and validated prior to production deployment:
Assess: The workload has been assessed for cloud compatibility.
Architect: The structure of the workload has been properly architected to align with the chosen cloud provider.
Replicate: The assets have been replicated to the cloud environment.
Stage: The replicated assets have been restored in a staged instance of the cloud environment.
Business testing: The workload has been fully tested and validated by business users.
Business change plan: The business has shared a plan for the changes to be made in accordance with the
production promotion; this should include a user adoption plan, changes to business processes, users that
require training, and timelines for various activities.
Ready: Generally, a series of technical changes must be made before promotion.

Best practices to execute prior to promotion


The following technical changes will likely need to be completed and documented as part of the promotion
process:
Domain alignment. Some corporate policies require separate domains for staging and production. Ensure
that all assets are joined to the proper domain.
User routing. Validate that users are accessing the workload through proper network routes; verify consistent
performance expectations.
Identity alignment. Validate that the users being rerouted to the application have proper permissions within
the domain to host the application.
Performance. Perform a final validation of workload performance to minimize surprises.
Validation of business continuity and disaster recovery. Validate that proper backup and recovery
processes are functioning as expected.
Data classification. Validate data classification to ensure that proper protections and policies have been
implemented.
Chief information security officer (CISO) verification. Validate that the information security officer has
reviewed the workload, business risks, risk tolerance, and mitigation strategies.

Final step: Promote


Workloads will require varying levels of detailed review and promotion processes. However, network realignment
serves as the common final step for all promotion releases. When everything else is ready, update DNS records or
IP addresses to route traffic to the migrated workload.

Next steps
Promotion of a workload signals the completion of a release. However, in parallel with migration, retired assets
need to be decommissioned, taking them out of service.
Decommission retired assets
Decommission retired assets

After a workload is promoted to production, the assets that previously hosted the production workload are no
longer required to support business operations. At that point, the older assets are considered retired. Retired
assets can then be decommissioned, reducing operational costs. Decommissioning a resource can be as simple as
turning off the power to the asset and disposing of the asset responsibly. Unfortunately, decommissioning
resources can sometimes have undesired consequences. The following guidance can aid in properly
decommissioning retired resources, with minimal business interruptions.

Cost savings realization


When cost savings are the primary motivation for a migration, decommissioning is an important step. Until an
asset is decommissioned, it continues to consume power, environmental support, and other resources that drive
costs. After the asset is decommissioned, the cost savings can start to be realized.

Continued monitoring
After a migrated workload is promoted, the assets to be retired should continue to be monitored to validate that
no additional production traffic is being routed to the wrong assets.

Testing windows and dependency validation


Even with the best planning, production workloads may still contain dependencies on assets that are presumed
retired. In such cases, turning off a retired asset could cause unexpected system failures. As such, the termination
of any assets should be treated with the same level of rigor as a system maintenance activity. Proper testing and
outage windows should be established to facilitate the termination of the resource.

Holding period and data validation


It's not uncommon for migrations to miss data during replication processes. This is especially true for older data
that isn't used on a regular basis. After a retired asset has been turned off, it is still wise to maintain the asset for a
while to serve as a temporary backup of the data. Companies should allow at least 30 days for holding and testing
before destroying retired assets.

Next steps
After retired assets are decommissioned, the migration is completed. This creates a good opportunity to improve
the migration process, and a retrospective engages the cloud adoption team in a review of the release in an effort
to learn and improve.
Retrospective
How do retrospectives help build a growth mindset?

"Culture eats strategy for breakfast." The best migration plan can easily be undone, if it doesn't have executive
support and encouragement from leadership. Learning, growing, and even failure are at the heart of a growth
mindset. They are also at the heart of any transformation effort.
Humility and curiosity have never been more important than they are during a business transformation.
Embracing digital transformation requires both in ample supply. These traits are strengthened by regular
introspection and an environment of encouragement. When employees are encouraged to take risks, they find
better solutions. When employees are allowed to fail and learn, they succeed. Retrospectives are an opportunity
for such investigation and growth.
Retrospectives reinforce the principles of a growth mindset: experimentation, testing, learning, sharing, growing,
and empowering. They provide a safe place for team members to share the challenges faced in the current sprint.
And they allow the team to discuss and collaborate on ways to overcome those challenges. Retrospectives
empower the team to create sustainable growth.

Retrospective structure
A quick search on any search engine will offer many different approaches and tools for running a retrospective.
Depending on the maturity of the culture and experience level of the team, these could prove useful. However, the
general structure of a retrospective remains roughly the same. During these meetings, each member of the team is
expected to contribute a thought regarding three basic questions:
What went well?
What could have been better?
What did we learn?
Although these questions are simple in nature, they require employees to pause and reflect on their work over the
last iteration. This small pause for introspection is the primary building block of a growth mindset. The humility
and honesty produced when sharing the answers can become infectious beyond the time set aside for the
retrospective meeting.

Leadership's role in a retrospective


The topic of leadership involvement in a retrospective is highly debated. Many technical teams suggest that
leaders of any level should not be involved in the process, since it could discourage transparency and open
dialogue. Others suggest that retrospectives are a good place for leaders to stay connected and to find ways to
provide additional support. This decision is best left to the team and its leadership structure.
If leaders are involved in the retrospective, one role is highly encouraged. The leader's primary duty in a
retrospective is to make the team feel safe. Creating a growth mindset within a culture requires employees to be
free to share their failures and successes without fear of rebuke. Leaders who applaud the courage and humility
required to admit shortcomings are more likely to see a growth mindset established in their teams. Conversely, when leaders
take punitive action based on the data points shared in a retrospective, this tool is likely to become an ineffective formality.

Lessons learned
Highly effective teams don't just run retrospective meetings. They live retrospective processes. The lessons learned
and shared in these meetings can influence process, shape future work, and help the team execute more
effectively. Lessons learned in a retrospective should help the team grow organically. The primary byproducts of a
retrospective are an increase in experimentation and a refinement of the lessons learned by the team.
That new growth is most tangibly represented in changes to the release or iteration backlog.
The retrospective marks the end of a release or iteration. As teams gain experience and learn lessons, they
adjust the release and iteration backlog to reflect new processes and experiments to be tested. This
starts the next iteration through the migration processes.

Next steps
The Secure and Manage section of this content can help prepare the reader for the transition from migration to
operations.
Secure monitoring and management tools
Secure monitoring and management tools

After a migration is complete, migrated assets should be managed by controlled IT operations. This article does
not represent a deviation from operational best practices. Instead, the following should be considered a minimum
viable product for securing and managing migrated assets, either from IT operations or independently as IT
operations come online.

Monitoring
Monitoring is the act of collecting and analyzing data to determine the performance, health, and availability of your
business workload and the resources that it depends on. Azure includes multiple services that individually perform
a specific role or task in the monitoring space. Together, these services deliver a comprehensive solution for
collecting, analyzing, and acting on telemetry from your workload applications and the Azure resources that
support them. Gain visibility into the health and performance of your apps, infrastructure, and data in Azure with
cloud monitoring tools, such as Azure Monitor, Log Analytics, and Application Insights. Use these cloud
monitoring tools to take action and integrate with your service management solutions:
Core monitoring. Core monitoring provides fundamental, required monitoring across Azure resources. These
services require minimal configuration and collect core telemetry that the premium monitoring services use.
Deep application and infrastructure monitoring. Azure services provide rich capabilities for collecting and
analyzing monitoring data at a deeper level. These services build on core monitoring and take advantage of
common functionality in Azure. They provide powerful analytics with collected data to give you unique insights
into your applications and infrastructure.
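As one hedged example of putting that telemetry to work, the sketch below uses the azure-monitor-query and azure-identity Python packages to pull recent heartbeat records for migrated machines from a Log Analytics workspace. The workspace ID and the query are assumptions; adjust both to the tables your environment actually collects.

```python
# Query a Log Analytics workspace for recent VM heartbeats (sketch).
# Requires the azure-monitor-query and azure-identity packages; the workspace ID
# and query below are placeholders.
from datetime import timedelta

from azure.identity import DefaultAzureCredential
from azure.monitor.query import LogsQueryClient

workspace_id = "<log-analytics-workspace-id>"
query = "Heartbeat | summarize LastSeen = max(TimeGenerated) by Computer"

client = LogsQueryClient(DefaultAzureCredential())
response = client.query_workspace(workspace_id, query, timespan=timedelta(hours=24))

for table in response.tables:
    for row in table.rows:
        print(list(row))
```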
Learn more about Azure Monitor for monitoring migrated assets.

Security monitoring
Rely on the Azure Security Center for unified security monitoring and advanced threat notification across your
hybrid cloud workloads. The Security Center gives full visibility into and control over the security of cloud
applications in Azure. Quickly detect and take action to respond to threats and reduce exposure by enabling
adaptive threat protection. The built-in dashboard provides instant insights into security alerts and vulnerabilities
that require attention. Azure Security Center can help with many functions, including:
Centralized policy monitoring. Ensure compliance with company or regulatory security requirements by
centrally managing security policies across hybrid cloud workloads.
Continuous security assessment. Monitor the security of machines, networks, storage and data services, and
applications to discover potential security issues.
Actionable recommendations. Remediate security vulnerabilities before they can be exploited by attackers.
Include prioritized and actionable security recommendations.
Advanced cloud defenses. Reduce threats with just-in-time access to management ports and safe lists to
control applications running on your VMs.
Prioritized alerts and incidents. Focus on the most critical threats first, with prioritized security alerts and
incidents.
Integrated security solutions. Collect, search, and analyze security data from a variety of sources, including
connected partner solutions.
Learn more about Azure Security Center for securing migrated assets.
Service health monitoring
Azure Service Health provides personalized alerts and guidance when Azure service issues affect you. It can notify
you, help you understand the impact of issues, and keep you updated as the issue is resolved. It can also help you
prepare for planned maintenance and changes that could affect the availability of your resources.
Service health dashboard. Check the overall health of your Azure services and regions, with detailed updates
on any current service issues, upcoming planned maintenance, and service transitions.
Service health alerts. Configure alerts that will notify you and your teams in the event of a service issue like
an outage or upcoming planned maintenance.
Service health history. Review past service issues and download official summaries and reports from
Microsoft.
Learn more about Azure Service Health for staying informed about the health of your migrated resources.

Protect assets and data


Azure Backup provides a means of protecting VMs, files, and data. Azure Backup can help with many functions,
including:
Backing up VMs.
Backing up files.
Backing up SQL Server databases.
Recovering protected assets.
Learn more about Azure Backup for protecting migrated assets.

Optimize resources
Azure Advisor is your personalized guide to Azure best practices. It analyzes your configuration and usage
telemetry and offers recommendations to help you optimize your Azure resources for high availability, security,
performance, and cost. Advisor’s inline actions help you quickly and easily remediate your recommendations and
optimize your deployments.
Azure best practices. Optimize migrated resources for high availability, security, performance, and cost.
Step-by-step guidance. Remediate recommendations efficiently with guided quick links.
New recommendations alerts. Stay informed about new recommendations, such as additional opportunities
to rightsize VMs and save money.
Learn more about Azure Advisor for optimizing your migrated resources.

Suggested skills
Microsoft Learn is a new approach to learning. Readiness for the new skills and responsibilities that come with
cloud adoption doesn't come easily. Microsoft Learn provides a more rewarding approach to hands-on learning
that helps you achieve your goals faster. Earn points and levels, and achieve more!
Here is an example of a tailored learning path on Microsoft Learn that's aligned with the Secure and Manage
portion of the Cloud Adoption Framework:
Secure your cloud data: Azure was designed for security and compliance. Learn how to leverage the built-in
services to store your app data securely to ensure that only authorized services and clients have access to it.
All IT portfolios contain a few workloads and ideas that could significantly improve a company's position in the market. Most
cloud adoption efforts focus on the migration and modernization of existing workloads. It's innovation, however, that can provide
the greatest business value. Cloud adoption-related innovation can unlock new technical skills and expanded business
capabilities.
This section of the Cloud Adoption Framework focuses on the elements of your portfolio that drive the greatest return on
investment.

Get started
To prepare you for this phase of the cloud adoption lifecycle, the framework suggests the following exercises:

Business value consensus


Before you decide on technical solutions, identify how new innovation can drive business value. Map that value to your cloud
strategy. In this incremental methodology, business value is represented by a hypothesis about customer needs.

Azure innovation guide


Azure includes a number of cloud tools that can accelerate the deployment of innovative solutions. Depending on your
hypothesis, you might consider various combinations of tools. The creation of a minimum viable product (MVP) with basic
tools is suggested.

Best practices
Your architectural decisions should follow best practices for each tool in the toolchain. By adhering to such guidance, you can
better accelerate solution development and provide a reference for solid architectural designs.

Feedback loops
During each iteration, the solutions under development offer a way for your teams to learn alongside customers. Fast and
accurate feedback loops with your customers can help you better test, measure, learn, and ultimately reduce the time to
market impact. Learn how Azure and GitHub accelerate feedback loops.

Methodology summary
The considerations overview establishes a common language for innovation across application development, DevOps, IT, and
business teams.
The exercises in the Get started section help make the methodology actionable during the development of innovative
solutions.

This approach builds on existing lean methodologies. It's designed to help you create a cloud-focused conversation about
customer adoption and a scientific model for creating business value. The approach also maps existing Azure services to
manageable decision processes. This alignment can help you find the right technical options to address specific customer needs
or hypotheses.

Suggested skills
Microsoft Learn is a new approach to learning. Readiness for the new skills and responsibilities that come with cloud adoption
doesn't come easily. Microsoft Learn provides a more rewarding approach to hands-on learning that helps you achieve your
goals faster. Earn points and levels, and achieve more!
Here are a couple of examples of role-specific learning paths on Microsoft Learn that align with the Innovate portion of the
Cloud Adoption Framework.
Administer containers in Azure: Azure Container Instances (ACI) are the quickest and easiest way to run containers in Azure. This
learning path will teach you how to create and manage your containers, and how you can use ACI to provide elastic scale for
Kubernetes.
Create serverless applications: Azure Functions enable the creation of event-driven, compute-on-demand systems that can be
triggered by various external events. Learn how to leverage functions to execute server-side logic and build serverless
architectures.
To discover additional learning paths, browse the Learn catalog. Use the Roles filter to align learning paths with your role.

Next steps
The first exercise for cloud innovation is to:
Build consensus for business value of innovation

Azure innovation guide: Before you start


NOTE
This guide provides a starting point for innovation guidance in the Cloud Adoption Framework. It is also available in the
Azure Quickstart Center.

Before you start


Before you start developing innovative solutions by using Azure services, you need to prepare your environment,
which includes preparing to manage customer feedback loops. In this guide, we introduce features that help you
engage customers, build solutions, and drive adoption. For more information, best practices, and considerations
related to preparing your cloud environment, see the Cloud Adoption Framework innovate section.
In this guide, you'll learn how to:
Manage customer feedback: Set up tools and processes to manage the build-measure-learn feedback loop
by using GitHub and Azure DevOps.
Democratize data: Data alone might be enough to drive innovative solutions to your customers. Deploy
common data options in Azure.
Engage through apps: Some innovation requires an engaging experience. Leverage cloud-native application
platforms to create engaging experiences.
Empower adoption: Invention is great, but a plan to reduce friction is needed to empower and scale adoption.
Deploy a foundation for CI/CD, DevOps, and other adoption enablers.
Interact through devices: Create ambient experiences to bring your apps and data closer to the customers'
point of need. IoT, mixed reality, and mobile experiences are easier with Azure.
Predict and influence: Find patterns in data. Put those patterns to work to predict and influence customer
behaviors by using Azure-based predictive analytics tools.

TIP
For an interactive experience, view this guide in the Azure portal. Go to the Azure Quickstart Center in the Azure portal,
select Azure innovation guide, and then follow the step-by-step instructions.

Next steps: Prepare for innovation with a shared repository and ideation management tools
This guide provides interactive steps that let you try features as they're introduced. To come back to where you left
off, use the breadcrumb for navigation.

Azure innovation guide: Prepare for customer feedback


Prepare for customer feedback
User adoption, engagement, and retention are key to successful innovation. Why?
Building an innovative new solution isn't about giving users what they want or think they want. It's about the
formulation of a hypothesis that can be tested and improved upon. That testing comes in two forms:
Quantitative (testing feedback): This feedback measures the actions we hope to see.
Qualitative (customer feedback): This feedback tells us what those metrics mean in the customer's voice.
Before you integrate feedback loops, you need to have a shared repository for your solution. A centralized repo
will provide a way to record and act on all the feedback coming in about your project. GitHub is the home for open
source software. It's also one of the most commonly used platforms for hosting source code repositories for
commercially developed apps. The article on building GitHub repositories can help you get started with your repo.
Each of the following tools in Azure integrates with (or is compatible with) projects hosted in GitHub:
Quantitative feedback for web apps
Quantitative feedback for APIs
Qualitative feedback
Close the loop with pipelines
Application Insights is a monitoring tool that provides near-real-time quantitative feedback on the usage of your
application. This feedback can help you test and validate your current hypothesis to shape the next feature or user
story in your backlog.
Action
To view quantitative data on your applications:
1. Go to Application Insights.
If your application doesn't appear in the list, select Add and follow the prompts to start configuring
Application Insights.
If the desired app is in the list, select the application.
2. The Overview pane includes some statistics on the application. Select Application Dashboard to build a
custom dashboard for data that's more relevant to your hypothesis.
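If the signals you need aren't collected automatically, custom telemetry can be sent from the application itself. The sketch below assumes the classic applicationinsights Python package and a placeholder instrumentation key; the event and metric names are hypothetical examples of signals that could support a hypothesis.

```python
# Send custom events and metrics to Application Insights (sketch).
# Requires the applicationinsights package; the key and names are placeholders.
from applicationinsights import TelemetryClient

tc = TelemetryClient("<instrumentation-key>")

# An event marking a step in the customer journey being tested.
tc.track_event("checkout_started", {"experiment": "new-flow"})

# A metric quantifying the behavior the hypothesis predicts.
tc.track_metric("cart_value", 42.50)

tc.flush()
```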

To view the data about your apps, go to the Azure portal.


Learn more
Set up Azure Monitor
Get started with Azure Monitor Application Insights
Build a telemetry dashboard

Azure innovation guide: Democratize data


Democratize data
One of the first steps in democratizing data is to enhance data discoverability. Cataloging and managing data
sharing can help enterprises get the most value from their existing information assets. A data catalog makes data
sources easy to discover and understand by the users who manage the data. Azure Data Catalog enables
management inside an enterprise, whereas Azure Data Share enables management and sharing outside the
enterprise.
Azure services that provide data processing, like Azure Time Series Insights and Stream Analytics, are other
capabilities that customers and partners are successfully using for their innovation needs.
Catalog
Share
Insights

Azure Data Catalog


Azure Data Catalog addresses the discovery challenges of data consumers and enables data producers who
maintain information assets. It bridges the gap between IT and the business, allowing everyone to contribute their
insights. You can store your data where you want it and connect with the tools you want to use. With Azure Data
Catalog, you can control who can discover registered data assets. You can integrate into existing tools and
processes by using open REST APIs.
Register
Search and annotate
Connect and manage
Go to the Azure Data Catalog documentation
Action
You can use only one Azure data catalog per organization. If a data catalog has already been created for your
organization, you can't add more catalogs.
To create an Azure data catalog for your organization:
1. Go to Azure Data Catalog.
2. Select Create.

Azure innovation guide: Engage customers through apps


Engage customers through apps
Innovation with apps includes both modernizing your existing apps that are hosted on-premises and building
cloud-native apps by using containers or serverless technologies. Azure provides PaaS services like Azure App
Service to help you easily modernize your existing web and API apps written in .NET, .NET Core, Java, Node.js,
Ruby, Python, or PHP for deployment in Azure.
With an open-standard container model, building microservices or containerizing your existing apps and deploying
them on Azure is simple when you use managed services like Azure Kubernetes Service, Azure Container
Instances, and Web App for Containers. Serverless technologies like Azure Functions and Azure Logic Apps use a
consumption model (pay for what you use) and help you focus on building your application rather than deploying
and managing infrastructure.
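As a small illustration of that consumption model, the following sketch shows an HTTP-triggered Azure Function using the Python programming model. The function and parameter names are arbitrary, and the code assumes a standard HTTP trigger binding has been configured for the function app.

```python
# Minimal HTTP-triggered Azure Function (Python). Assumes an HTTP trigger binding;
# the greeting logic is purely illustrative.
import azure.functions as func


def main(req: func.HttpRequest) -> func.HttpResponse:
    name = req.params.get("name", "world")
    return func.HttpResponse(f"Hello, {name}!", status_code=200)
```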
Deliver value faster
Create cloud-native apps
Isolate points of failure
One of the advantages of cloud-based solutions is the ability to gather feedback faster and start delivering value to
your user. Whether that user is an external customer or a user in your own company, the faster you can get
feedback on your applications, the better.

Azure App Service


Azure App Service provides a hosting environment for your applications that removes the burden of infrastructure
management and OS patching. It provides automation of scale to meet the demands of your users while bound by
limits that you define to keep costs in check.
Azure App Service provides first-class support for languages like ASP.NET, ASP.NET Core, Java, Ruby, Node.js,
PHP, and Python. If you need to host another runtime stack, Web App for Containers lets you quickly and easily
host a Docker container within App Service, so you can host your custom code stack in an environment that gets
you out of the server business.
Action
To configure or monitor Azure App Service deployments:
1. Go to App Services.
2. Configure a new service: Select Add and follow the prompts.
3. Manage existing services: Select the desired app from the list of hosted applications.

Azure Cognitive Services


With Azure Cognitive Services, you can infuse advanced intelligence directly into your app through a set of APIs
that let you take advantage of Microsoft-supported AI and machine learning algorithms.
Action
To configure or monitor Azure Cognitive Services deployments:
1. Go to Cognitive Services.
2. Configure a new service: Select Add and follow the prompts.
3. Manage existing services: Select the desired service from the list of hosted services.
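To make that set of APIs concrete, the following hedged sketch calls the Text Analytics sentiment API through the azure-ai-textanalytics Python package; the endpoint, key, and sample text are placeholders, and other Cognitive Services follow a similar client pattern.

```python
# Analyze sentiment with the Cognitive Services Text Analytics API (sketch).
# Requires the azure-ai-textanalytics package; endpoint, key, and text are placeholders.
from azure.ai.textanalytics import TextAnalyticsClient
from azure.core.credentials import AzureKeyCredential

client = TextAnalyticsClient(
    endpoint="https://<your-resource>.cognitiveservices.azure.com/",
    credential=AzureKeyCredential("<api-key>"),
)

documents = ["The migrated app feels noticeably faster."]
for result in client.analyze_sentiment(documents):
    print(result.sentiment, result.confidence_scores.positive)
```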

Azure Bot Service


Azure Bot Service extends your standard application by adding a natural bot interface that uses AI and machine
learning to create a new way to interact with your customers.
Action
To configure or monitor Azure Bot Services deployments:
1. Go to Bot Services.
2. Configure a new service: Select Add and follow the prompts.
3. Manage existing services: Select the desired bot from the list of hosted services.

Azure DevOps
During your innovation journey, you'll eventually find yourself on the path to DevOps. Microsoft has long had an
on-premises product known as Team Foundation Server (TFS). During our own innovation journey, Microsoft
developed Azure DevOps, a cloud-based service that provides build and release tools supporting many languages
and destinations for your releases. For more information, see Azure DevOps.
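As a small, hedged example of the automation Azure DevOps makes possible, this Python sketch lists the projects in an organization through the Azure DevOps REST API; the organization name and personal access token are placeholders.

```python
import base64

import requests

organization = "<your-organization>"       # placeholder
personal_access_token = "<your-pat>"       # placeholder; needs read access to projects

# Azure DevOps accepts a basic-auth header with an empty user name and the PAT as the password.
auth = base64.b64encode(f":{personal_access_token}".encode()).decode()
response = requests.get(
    f"https://dev.azure.com/{organization}/_apis/projects?api-version=6.0",
    headers={"Authorization": f"Basic {auth}"},
)
response.raise_for_status()

for project in response.json()["value"]:
    print(project["name"])
```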

Visual Studio App Center


As mobile apps continue to grow in popularity, the need for a platform that can provide automated testing on real
devices of various configurations grows. Visual Studio App Center doesn't just provide a place where you can test
your applications across iOS, Android, Windows, and macOS. It also provides a monitoring platform that can use
Azure Application Insights to analyze your telemetry quickly and easily. For more information, see Visual Studio
App Center overview.
Visual Studio App Center also provides a notification service that lets you use a single call to send notifications to
your app across platforms without having to contact each notification service individually. For more information,
see Visual Studio App Center Push (ACP).
Learn more
App Service overview
Web App for Containers: Run a custom container
An introduction to Azure Functions
Azure for .NET and .NET Core developers
Azure SDK for Python documentation
Azure for Java cloud developers
Create a PHP web app in Azure
Azure SDK for JavaScript documentation
Azure SDK for Go documentation
DevOps solutions

Azure innovation guide: Empower adoption


Empower adoption
You know that innovation is critical to business success. You don't accomplish innovation solely through the
introduction of new technologies. You need to focus on supporting the people who catalyze change and create the
new value that you seek. Developers are at the center of digital transformation, and to empower them to achieve
more, you need to accelerate developer velocity. To unleash the creative energy of developer teams, you need to
help them build productively, foster global and secure collaboration, and remove barriers so they can scale
innovation.

Generate value
In every industry, every organization is trying to do one thing: drive constant value generation.
The focus on innovation is essentially a process to help your organization find new ways to generate value.
Perhaps the biggest mistake organizations make is trying to create new value by introducing new technologies.
Sometimes the attitude is "if we just use more technology, we'll see things improve." But innovation is first and
foremost a people story.
Innovation is about the combination of people and technology.
Organizations that successfully innovate see vision, strategy, culture, unique potential, and capabilities as the
foundational elements. They then turn to technology with a specific purpose in mind. Every company is becoming a
software company. The hiring of software engineers is growing at a faster rate outside the tech industry than
inside, according to LinkedIn data.
Innovation is accomplished when organizations support their people to create the value they seek. One group of
those people, developers, is a catalyst for innovation. They play an increasingly vital role in value creation and
growth across every industry. They're the builders of our era, writing the world's code and sitting at the heart of
innovation. Innovative organizations build a culture that empowers developers to achieve more.
Developer productivity
Innovate collaboratively
Innovation characteristics
LiveOps innovation

Developer velocity
Empowering developers to invent means accelerating developer velocity, enabling them to create more, innovate
more, and solve more problems. Developer velocity is the underpinning of each organization's tech intensity.
Developer velocity isn't just about speed. It's also about unleashing developer ingenuity, turning your developers'
ideas into software with speed and agility so that innovative solutions can be built. The differentiated Azure
solution is uniquely positioned to unleash innovation in your organization.

Build productively
There are several areas of opportunity where Azure can help you build productively:
Ensure developers become and stay proficient in their domain by helping them advance their knowledge.
Hone the right skills by giving them the right tools.
One of the best ways to improve your developers' skills is by giving them tools they know and love. Azure tools
meet developers where they are today and introduce them to new technologies in the context of the code they're
writing. With the Azure commitment to open-source software and support for all languages and frameworks in
Azure tools, your developers can build how they want and deploy where you want.
Azure DevOps provides best-in-class tools for every developer. Azure developer services infuse modern
development practices and emerging trends into our tools. With the Azure platform, developers have access to the
latest technologies and a cutting-edge toolchain that supports the way they work.
AI-assisted development tools
Integrated tools and cloud
Remote development and pair programming
Go to the Get started documentation for Azure DevOps
Action
To create a DevOps project:
1. Go to Azure DevOps Projects.
2. Select Create DevOps project.
3. Select Runtime, Framework, and Service.

Azure innovation guide: Interact through devices


Interact through devices
Innovate through intermittently connected and perceptive edge devices. Orchestrate millions of such devices,
acquire and process limitless data, and take advantage of a growing number of multisensory, multidevice
experiences. For devices at the edge of your network, Azure provides a framework for building immersive and
effective business solutions. With ubiquitous computing, enabled by Azure combined with artificial intelligence (AI)
technology, you can build every type of intelligent application and system you can envision.
Azure customers employ a continually expanding set of connected systems and devices that gather and analyze
data—close to their users, the data, or both. Users get real-time insights and experiences, delivered by highly
responsive and contextually aware apps. By moving parts of the workload to the edge, these devices can spend less
time sending messages to the cloud and react more quickly to spatial events.
Industrial assets
HoloLens 2
Azure Sphere
Kinect DK
Drones
Azure SQL Database Edge
IoT Plug and Play
Global scale IoT service
Azure Digital Twins
Location intelligence
Spatial experiences
Azure Remote Rendering
Architect solutions that exercise bidirectional communication with IoT devices at a scale of billions of devices. Use out-of-box,
device-to-cloud telemetry data to understand the state of your devices and define message routes to other Azure
services just through configuration. By taking advantage of cloud-to-device messages, you can reliably send
commands and notifications to your connected devices and track message delivery with acknowledgment receipts.
And you'll automatically resend device messages as needed to accommodate intermittent connectivity.
Here are a few features you'll find (a device-side code sketch follows this list):
Security-enhanced communication channel for sending and receiving data from IoT devices.
Built-in device management and provisioning to connect and manage IoT devices at scale.
Full integration with Event Grid and serverless compute, simplifying IoT application development.
Compatibility with Azure IoT Edge for building hybrid IoT applications.
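The following device-side sketch assumes the azure-iot-device Python package and a device identity already registered in IoT Hub; the connection string and telemetry payload are illustrative.

```python
from azure.iot.device import IoTHubDeviceClient, Message

# Placeholder: the connection string of a device registered in your IoT hub.
client = IoTHubDeviceClient.create_from_connection_string("<device-connection-string>")
client.connect()

# Send one device-to-cloud telemetry message; routing to other services is configured in IoT Hub.
client.send_message(Message('{"temperature": 21.5, "humidity": 60}'))

client.disconnect()
```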
Go to IoT Hub
Go to Device Provisioning Services
Action
To create an IoT hub:
1. Go to IoT Hub.
2. Select Create IoT hub.

The IoT Hub Device Provisioning Service is a helper service for IoT Hub that enables zero-touch, just-in-time
provisioning.
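As a sketch of what zero-touch provisioning looks like from the device side, assuming the azure-iot-device package and a symmetric-key enrollment (the ID scope, registration ID, and key are placeholders):

```python
from azure.iot.device import ProvisioningDeviceClient

client = ProvisioningDeviceClient.create_from_symmetric_key(
    provisioning_host="global.azure-devices-provisioning.net",
    registration_id="<registration-id>",   # placeholder
    id_scope="<id-scope>",                 # placeholder
    symmetric_key="<symmetric-key>",       # placeholder
)

# The service assigns the device to an IoT hub based on the enrollment.
result = client.register()
print(result.status, result.registration_state.assigned_hub)
```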
Action
To create IoT Hub Device Provisioning Services:
1. Go to IoT Hub Device Provisioning Services.
2. Select Create Device Provisioning Services.

Azure innovation guide: Predict and influence


Predict and influence
As an innovator, your company has insights into the data, behavior, and needs of its customer base. Studying those
insights can aid in predicting your customers' needs, possibly before your customers themselves are aware of
those needs. This article introduces a few approaches to delivering predictive solutions. In the final sections, the
article introduces approaches to integrating the predictions back into your solution to influence customer
behaviors.
The following table can help you find the best solution, based on your implementation needs.

SERVICE                        | PREBUILT MODELS | BUILD AND EXPERIMENT | TRAIN AND BUILD WITH PYTHON | REQUIRED SKILLS
Azure Cognitive Services       | Yes             | No                   | No                          | API and developer skills
Azure Machine Learning Studio  | Yes             | Yes                  | No                          | General understanding of predictive algorithms
Azure Machine Learning service | Yes             | Yes                  | Yes                         | Data scientist

Azure Cognitive Services


Azure Machine Learning Studio
Azure Machine Learning service
The fastest and easiest path to predicting customer needs is Azure Cognitive Services. Cognitive Services allows
predictions to be made based on existing models, which require no additional training. These services are optimal and effective when you have no data scientist on staff to train the predictive model. For some services, no training is required. Other services require only minimal training.
For a list of available services and the amount of training that might be required, see Cognitive Services and
machine learning.
Action
To use a Cognitive Services API:
1. In the Azure portal, go to Cognitive Services.
2. Select Add to find a Cognitive Services API in Azure Marketplace.
3. Do either of the following:
If you know the name of the service you want to use, enter the name in the Search the Marketplace
box.
For a list of Cognitive Services APIs, select the See More link next to the Cognitive Services heading.

Go directly to Cognitive Services in the Azure portal.


Innovation in the digital economy

The digital economy is an undeniable force in almost every industry. During the Industrial Revolution, gasoline,
conveyor belts, and human ingenuity were key resources for promoting market innovation. Product quality, price,
and logistics drove markets as companies sought to deliver better products to their customers more quickly.
Today's digital economy shifts the way in which customers interact with corporations. The primary forms of
capital and market differentiators have all shifted as a result. In the digital economy, customers are less concerned
with logistics and more concerned with their overall experience of using a product. This shift arises from direct
interaction with technology in our daily lives and from a realization of the value associated with those interactions.
In the Innovate phase of the Cloud Adoption Framework, we'll focus on understanding customer needs and
rapidly building innovations that shape how your customers interact with your products. We'll also illustrate an
approach to delivering on the value of a minimum viable product (MVP ). Finally, we'll map decisions common to
innovation cycles to help you understand how the cloud can unlock innovation and create partnerships with your
customers.

Innovate methodology
The simple methodology for cloud innovation within the Cloud Adoption Framework is illustrated in the
following image. Subsequent articles in this section will show how to establish core processes, approaches, and
mechanisms for finding and driving innovation within your company.

This article series emphasizes the following aspects of this methodology:


First, always start with customer adoption to generate feedback that builds customer partnerships through the
build-measure-learn feedback loop.
Second, examine approaches to developing digital inventions that prioritize adoption.
The following section describes the formula for innovation and the commitments required for success with this
approach.

Formula for innovation


Successful innovation is not a big-bang transformational event or an elusive magical unicorn. Success in
innovation is more of a balancing act, illustrated by a simple equation: Innovation = Invention + Adoption.
Innovation happens at the intersection of invention and adoption. True innovation stems from slowly adjusting
human experiences through new approaches, new processes, and new technologies. In this formula, invention
means creating a new solution that meets a customer need. Conversely, adoption means applying the new
solution to shape human behaviors and interactions. Finding the right balance between invention and adoption
requires iteration, data-driven decision making, constant learning, and a growth mindset. It also requires
technologies that can keep pace with the countless opportunities to learn in today's digital society.
The cloud is often a great platform for invention or the technological aspects of innovation. Unfortunately, most
great ideas fail during the hard work of adoption, rather than during the ideation or invention processes. To
ensure success, development teams should always start with adoption as the test for innovation. That's why this
methodology starts with adoption. To use this methodology, the following three commitments should be agreed
upon by the team:
Commitment to prioritize customers over technology
Commitment to transparency
Commitment to iteration

Cultural commitments
Adopting the Innovate methodology requires some cultural commitments to effectively use the metrics outlined
in this article. Before you change your approach to driving innovation, make sure the adoption and leadership
teams are ready to make these important commitments.

Commitment to prioritize customers over technology


Every development team has a set of tools or technologies that they're most familiar with. It's wise to play to
those strengths and use what you know. However, for innovation to be successful, teams must maintain a focus
on customer needs and the hypothesis being tested. At times, this focus may not align with the capabilities of a
particular tool or architectural approach. To be successful in innovation, the development team must remain
open-minded. During the invention process, focus technical decisions on the needs of the customer over the
preferences of your team.

Commitment to transparency
To understand measurement in an innovation approach, you must first understand the commitment to
transparency. Innovation can only thrive in an environment that adheres to a growth mindset. At the root of a
growth mindset is a cultural imperative to learn from experiences. Successful innovation and continuous learning
start with a commitment to transparency in measurement. This is a brave commitment for the cloud adoption
team. However, that commitment is meaningless if it's not matched by a commitment to preserve transparency
within the leadership and cloud strategy teams.
Transparency is important because measuring customer impact doesn't address the question of right or wrong.
Nor are impact measurements indicative of the quality of work or the performance of the adoption team. Instead,
they represent an opportunity to learn and better meet your customers' needs. Misuse of innovation metrics can
stifle that culture. Eventually, such misuse will lead to manipulation of metrics, which in turn causes long-term
failure of the invention, the supporting staff, and ultimately the management structure that misused the data.
Leaders and contributors alike should avoid using measurements for anything other than an opportunity to learn
and improve the MVP solution.

Commitment to iteration
Only one promise rings true across all innovation cycles—you won't get it right on the first try. Measurement
helps you understand what adjustments you should make to achieve the desired results. Changes that lead to
favorable outcomes stem from iterations of the build-measure-learn process. The cloud adoption team and the
cloud strategy team must commit to an iterative mindset before adopting a growth mindset or a build-measure-
learn approach.

Next steps
Before building the next great invention, get started with customer adoption by understanding the build-
measure-learn feedback loop.
Customer adoption with the build-measure-learn feedback loop
Build consensus on the business value of innovation

The first step to developing any new innovation is to identify how that innovation can drive business value. In this
exercise, you answer a series of questions that highlight the importance of investing ample time when your
organization defines business value.

Qualifying questions
Before you develop any solution (in the cloud or on-premises), validate your business value criteria by answering
the following questions:
1. What is the defined customer need that you seek to address with this solution?
2. What opportunities would this solution create for your business?
3. Which business outcomes would be achieved with this solution?
4. Which of your company's motivations would be served with this solution?
If the answers to all four questions are well documented, you might not need to complete the rest of this exercise.
Fortunately, you can easily test any documentation. Set up two short meetings to test both the documentation and
your organization's internal alignment. Invite committed business stakeholders to one meeting and set up a
separate meeting with the engaged development team. Ask the four questions above to each group, and then
compare the results.

NOTE
The existing documentation should not be shared with either team before the meeting. If true alignment exists, the guiding
hypotheses should be referenced or even recited by members of each group.

WARNING
Don't facilitate the meeting. This test is to determine alignment; it's not an alignment creation exercise. When you start the
meeting, remind the attendees that the objective is to test directional alignment to existing agreements within the team.
Establish a five-minute time limit for each question. Set a timer and close each question after five minutes even if the
attendees haven't agreed upon an answer.

Account for the different languages and interests of each group. If the test results in answers that are directionally
aligned, consider this exercise a victory. You're ready to move on to solution development.
If one or two of the answers are directionally aligned, recognize that your hard work is paying off. You're already
better aligned than most organizations. Future success is likely with minor continuing investment in alignment.
Review each of the following sections for ideas that may help you build further alignment.
If either team fails to answer all four questions in 30 minutes, then alignment and the considerations in the
following sections are likely to have a significant impact on this effort and others. Pay careful attention to each of
the following sections.

Address the big picture first


The Cloud Adoption Framework follows a prescribed path through these phases: strategy, plan, ready, and adopt.
Cloud innovation fits within the adopt phase of this process. The answers to qualifying questions three and four
concern outcomes and motivations. When these answers are misaligned, it indicates that your organization missed
something during the strategy phase of the cloud adoption lifecycle. Several of the following scenarios are likely to
be at play.
Alignment opportunity: When business stakeholders can't agree on motivations and business outcomes
related to a cloud innovation effort, it's a symptom of a larger challenge. The exercises in the cloud strategy
phase can be useful in developing alignment among business stakeholders. Additionally, it's highly
recommended that the same stakeholders form a cloud strategy team that meets regularly.
Communication opportunity: When the development team can't agree on motivations and business
outcomes, it might be a symptom of strategic communication gaps. You can quickly resolve this issue by
reviewing the cloud strategy with the cloud adoption team. Several weeks after the review, the team should
repeat the qualifying questions exercise.
Prioritization opportunity: A cloud strategy is essentially an executive-level hypothesis. The best cloud
strategies are open to iteration and feedback. If both teams understand the strategy, but still can't quite
align answers to these questions, then priorities might be misaligned. Organize a session with the cloud
adoption team and the cloud strategy team. This session can help the efforts of both groups. The cloud
adoption team starts by sharing their aligned answers to the qualifying questions. From there, a
conversation between the cloud adoption team and cloud strategy team can highlight opportunities to
better align priorities.
These big picture opportunities often reveal ways to better align the innovative solution with the cloud strategy.
This exercise has two common outcomes:
These conversations can help your team improve your organization's cloud strategy and better represent
important customer needs. Such a change can result in greater executive support for your team.
Conversely, these conversations might show that your cloud adoption team should invest in a different
solution. In this case, consider migrating this solution before continuing to invest in innovation. Alternately,
these conversations might indicate that you adopt a citizen developer approach to test the business value first.
In either case, they will help your team avoid making a large investment with limited business returns.

Address solution alignment


It's fairly common for the answers to questions one and two to be misaligned. During the early stages of ideation
and development, customer need and business opportunity often get out of alignment. Many development teams
find it challenging to achieve a balance between too much and too little definition. The Cloud Adoption Framework
recommends lean approaches like build-measure-learn feedback loops to answer these questions. The following
list shows opportunities and approaches to create alignment.
Hypothesis opportunity: It's common for various stakeholders and development teams to have too many
expectations for a solution. Unrealistic expectations can be a sign that the hypothesis is too vague. Follow the
guidance on building with customer empathy to construct a clearer hypothesis.
Build opportunity: Teams might be misaligned because they disagree on the way to solve the customer need.
Such disagreement typically indicates that the team is being delayed by a premature technical spike. To keep
the team focused on the customer, start the first iteration and build a small minimum viable product (MVP ) to
address part of the hypothesis. For more guidance to help the team move forward, see Develop digital
inventions.
Training opportunity: Either team can be misaligned because they need deep technical requirements and
extensive functional requirements. This need can lead to an opportunity for training in agile methodologies.
When the team culture isn't ready for agile processes, you might find innovation and keeping pace with the
market to be a challenge. For training resources about DevOps and agile practices, see:
Evolve your DevOps practices
Build applications with Azure DevOps
Deploy applications with Azure DevOps
By following the methodology and the backlog management tools in each section of this article, you can help
create solution alignment.

Next steps
After you've aligned your business value proposition and communicated it, you're ready to start building your
solution.
Return to the innovate exercises for next steps
Create customer partnerships through the build-
measure-learn feedback loop

True innovation comes from the hard work of building solutions that demonstrate customer empathy, from
measuring the impact of those changes on the customer, and from learning with the customer. Most importantly, it
comes from feedback over multiple iterations.
If the past decade has taught us anything about innovation, it's that the old rules of business have changed. Large,
wealthy incumbents no longer have an unbreakable hold on the market. The first or best players to market aren't always the winners. Having the best idea doesn't lead to market dominance. In a rapidly changing business
climate, market leaders are the most agile. Those who can adapt to changing conditions lead.
Large or small, the companies that thrive in the digital economy as innovative leaders are those with the greatest
ability to listen to their customer base. That skill can be cultivated and managed. At the core of all good
partnerships is a clear feedback loop. The process for building customer partnerships within the Cloud Adoption
Framework is the build-measure-learn feedback loop.

The build-measure-learn feedback loop


As described in Innovation in the digital economy, innovation requires a balance of invention and adoption.
Customer feedback and partnership drive adoption. By turning your customers into strong, loyal partners during
innovation cycles, you can realize better products and gain quicker traction in the market.
This process for managing customer partnerships and integrating them into your innovation efforts includes three
phases of development:
Build with customer empathy
Measure for customer impact
Learn with customers
Each phase of the process helps you build better solutions with your customers.

Next steps
Learn how to Build with customer empathy to begin your build-measure-learn cycle.
Build with customer empathy
Build with customer empathy

"Necessity is the mother of invention." This proverb captures the indelibility of the human spirit and our natural
drive to invent. As explained in the Oxford English Dictionary, "When the need for something becomes
imperative, you are forced to find ways of getting or achieving it." Few would deny these universal truths about
invention. However, as described in Innovation in the digital economy, innovation requires a balance of
invention and adoption.
Continuing with the analogy, innovation comes from a more extended family. Customer empathy is the proud
parent of innovation. Creating a solution that drives innovation requires a legitimate customer need—one that
keeps the customer coming back to solve critical challenges. These solutions are based on what a customer
needs rather than on their wants or whims. To find customers' true needs, we start with empathy—a deep
understanding of the customer's experience. Empathy is an underdeveloped skill for many engineers, product
managers, and even business leaders. Fortunately, the diverse interactions and rapid pace of the cloud architect
role have already started fostering this skill.
Why is empathy so important? From the first release of a minimum viable product (MVP ) to the general
availability of a market-grade solution, customer empathy helps us understand and share in the experience of
the customer. Empathy helps us build a better solution. More importantly, it better positions us to invent
solutions that will encourage adoption. In a digital economy, those who can most readily empathize with
customer needs can build a brighter future that redefines and leads the market.

How to build with empathy


Planning is intrinsically an exercise in defining assumptions. The more we plan, the more we see assumptions
creep into the foundation of a great idea. Assumptions tend to be the product of self-empathy—in other words,
"what would I want if I were in this position?" Starting with the build phase minimizes the period in which
assumptions can invade a solution. This approach also accelerates the feedback loop with real customers,
triggering earlier opportunities to learn and sharpen empathy.
Caution

Properly defining what to build can be tricky and requires some practice. If you build something too quickly, it might not reflect customer needs. If you spend too much time trying to understand initial customer needs and
solution requirements, the market may meet them before you have a chance to build anything at all. In either
scenario, the opportunity to learn can be significantly delayed or reduced. Sometimes the data can even be
corrupted.
The most innovative solutions in history began with an intuitive belief. That gut feeling comes from both existing
expertise and firsthand observation. We start with the build phase because it allows for a rapid test of that
intuition. From there, we can cultivate deeper understanding and clearer degrees of empathy. At every iteration
or release of a solution, balance comes from building MVPs that demonstrate customer empathy.
To steady this balancing act, the following two sections discuss the concepts of building with empathy and
defining an MVP.
Define a customer-focused hypothesis
Building with empathy means creating a solution based on defined hypotheses that illustrate a specific customer
need. The following steps aim to formulate a hypothesis that will encourage building with empathy.
1. When you build with empathy, the customer is always the focus. This intention can take many shapes. You
could reference a customer archetype, a specific persona, or even a picture of a customer in the midst of the
problem you want to solve. And keep in mind that customers can be internal (employees or partners) or
external (consumers or business customers). This definition is the first hypothesis to be tested: Can we help
this specific customer?
2. Understand the customer experience. Building with empathy means you can relate to the customer's
experience and understand their challenges. This mindset indicates the next hypothesis to be tested: Can we
help this specific customer with this manageable challenge?
3. Define a simple solution to a single challenge. Relying on expertise across people, processes, and subject
matter experts will lead to a potential solution. This is the full hypothesis to be tested: Can we help this
specific customer with this manageable challenge through the proposed solution?
4. Arrive at a value statement. What long-term value do you hope to provide to these customers? The answer to
this question creates your full hypothesis: How will these customers' lives be improved by using the
proposed solution to address this manageable challenge?
This last step is the culmination of an empathy-driven hypothesis. It defines the audience, the problem, the
solution, and the metric by which improvement is to be made, all of which center on the customer. During the
measure and learn phases, each hypothesis should be tested. Changes in the customer, problem statement, or
solution are anticipated as the team develops greater empathy for the addressable customer base.
Caution

The goal is to build with customer empathy, not to plan with it. It's all too easy to get stuck in endless cycles of
planning and tweaking to hit upon the perfect customer empathy statement. Before you try to develop such a
statement, review the following sections on defining and building an MVP.
After core assumptions are proven, later iterations will focus on growth tests in addition to empathy tests. After
empathy is built, tested, and validated, you can begin to understand the addressable market at scale. This can be
done through an expansion of the standard hypothesis formula described earlier. Based on available data,
estimate the size of the total market—the number of potential customers.
From there, estimate the percentage of that total market that experiences a similar challenge and that might
therefore be interested in this solution. This is your addressable market. The next hypothesis to be tested is: how
will x% of customers' lives be improved by using the proposed solution to address this manageable challenge? A
small sampling of customers will reveal leading indicators that suggest a percentage impact on the pool of
customers engaged.
Define a solution to test the hypothesis
During each iteration of a build-measure-learn feedback loop, your attempt to build with empathy is defined by
an MVP.
An MVP is the smallest unit of effort (invention, engineering, application development, or data architecture)
required to create enough of a solution to learn with the customer. The goal of every MVP is to test some or all
of the prior hypotheses and to receive feedback directly from the customer. The output is not a beautiful
application with all the features required to change your industry. The desired output of each iteration is a
learning opportunity—a chance to more deeply test a hypothesis.
Timeboxing is a standard way to make sure a product remains lean. For example, make sure your development
team thinks the solution can be created in a single iteration to allow for rapid testing. To better understand using
velocity, iterations, and releases to define what minimal means, see Planning velocity, iterations, release, and
iteration paths.
Reduce complexity and delay technical spikes
The disciplines of invention found in the Innovate methodology describe the functionality that's often required
to deliver a mature innovation or scale-ready MVP solution. Use these disciplines as a long-term guide for
feature inclusion. Likewise, use them as a cautionary guide during early testing of customer value and empathy
in your solution.
Feature breadth and the different disciplines of invention can't all be created in a single iteration. It might take
several releases for an MVP solution to include the complexity of multiple disciplines. Depending on the
investment in development, there might be multiple parallel teams working within different disciplines to test
multiple hypotheses. Although it's smart to maintain architectural alignment between those teams, it's unwise to
try to build complex, integrated solutions until value hypotheses can be validated.
Complexity is best detected in the frequency or volume of technical spikes. Technical spikes are efforts to create
technical solutions that can't be easily tested with customers. When customer value and customer empathy are
untested, technical spikes represent a risk to innovation and should be minimized. For the types of mature tested
solutions found in a migration effort, technical spikes can be common throughout adoption. However, they
delay the testing of hypotheses in innovation efforts and should be postponed whenever possible.
A relentless simplification approach is suggested for any MVP definition. This approach means removing
anything that doesn't add to your ability to validate the hypothesis. To minimize complexity, reduce the number
of integrations and features that aren't required to test the hypothesis.
Build an MVP
At each iteration, an MVP solution can take many different shapes. The common requirement is only that the
output allows for measurement and testing of the hypothesis. This simple requirement initiates the scientific
process and allows the team to build with empathy. To deliver this customer-first focus, an initial MVP might rely
on only one of the disciplines of invention.
In some cases, the fastest path to innovation means temporarily avoiding these disciplines entirely, until the
cloud adoption team is confident that the hypothesis has been accurately validated. Coming from a technology
company like Microsoft, this guidance might sound counterintuitive. However, this simply emphasizes that
customer needs, not a specific technology decision, are the highest priority in an MVP solution.
Typically, an MVP solution consists of a simple web app or data solution with minimal features and limited
polish. For organizations that have professional development expertise, this path is often the fastest one to
learning and iteration. The following list includes several other approaches a team might take to build an MVP:
A predictive algorithm that's wrong 99% of the time but that demonstrates specific desired outcomes.
An IoT device that doesn't communicate securely at production scale but that demonstrates the value of
nearly real-time data within a process.
An application built by a citizen developer to test a hypothesis or meet smaller-scale needs.
A manual process that re-creates the benefits of the application to follow.
A wireframe or video that's detailed enough to allow the customer to interact.
Developing an MVP shouldn't require massive amounts of development investment. Preferably, investment
should be as constrained as possible to minimize the number of hypotheses being tested at one time. Then, in
each iteration and with each release, the solution is intentionally improved toward a scale-ready solution that
represents multiple disciplines of invention.
Accelerate MVP development
Time to market is crucial to the success of any innovation. Faster releases lead to faster learning. Faster learning
leads to products that can scale more quickly. At times, traditional application development cycles can slow this
process. More frequently, innovation is constrained by limits on available expertise. Budgets, headcount, and
availability of staff can all create limits to the number of new innovations a team can handle.
Staffing constraints and the desire to build with empathy have spawned a rapidly growing trend toward citizen
developers. These developers reduce risk and provide scale within an organization's professional development
community. Citizen developers are subject matter experts where the customer experience is concerned, but
they're not trained as engineers. These individuals use prototyping tools or lighter-weight development tools
that might be frowned upon by professional developers. These business-aligned developers create MVP
solutions and test theories. When aligned well, this process can create production solutions that provide value
but don't pass a sufficiently effective scale hypothesis. They can also be used to validate a prototype before scale
efforts begin.
Within any innovate plan, cloud adoption teams should diversify their portfolios to include citizen developer
efforts. By scaling development efforts, more hypotheses can be formed and tested at a reduced investment.
When a hypothesis is validated and an addressable market is identified, professional developers can harden and
scale the solution by using modern development tools.
Final build gate: Customer pain
When customer empathy is strong, a clearly existing problem should be easy to identify. The customer's pain
should be obvious. During build, the cloud adoption team is building a solution to test a hypothesis based on a
customer pain point. If the hypothesis is well-defined but the pain point is not, the solution is not truly based on
customer empathy. In this scenario, build is not the right starting point. Instead, invest first in building empathy
and learning from real customers. The best approach for building empathy and validating pain is simple: listen
to your customers. Invest time in meeting with and observing them until you can identify a pain point that
occurs frequently. After the pain point is well-understood, you're ready to test a hypothesized solution for
addressing that pain.

When not to apply this approach


There are many legal, compliance, and industry requirements that might require an alternative approach. If
public releases of a developing solution create risk to patent timing, intellectual property protection, customer
data leaks, or violation of established compliance requirements, this approach may not be suitable. When
perceived risks like these exist, consult legal counsel before adopting any guided approach to release
management.

References
Some of the concepts in this article build on topics discussed in The Lean Startup (Eric Ries, Crown Business,
2011).

Next steps
After you've built an MVP solution, you can measure the empathy value and scale value. Learn how to measure
for customer impact.
Measure for customer impact
Measure for customer impact

There are several ways to measure for customer impact. This article will help you define metrics to validate
hypotheses that arise out of an effort to build with customer empathy.

Strategic metrics
During the strategy phase of the cloud adoption lifecycle, we examine motivations and business outcomes. These
practices provide a set of metrics by which to test customer impact. When innovation is successful, you tend to
see results that are aligned with your strategic objectives.
Before establishing learning metrics, define a small number of strategic metrics that you want this innovation to
affect. Generally those strategic metrics align with one or more of the following outcome areas: business agility,
customer engagement, customer reach, financial impact, or, in the case of operational innovation, solution performance.
Document the agreed-upon metrics and track their impact frequently. But don't expect results in any of these
metrics to emerge for several iterations. For more information about setting and aligning expectations across the
parties involved, see Commitment to iteration.
Aside from motivation and business outcome metrics, the remainder of this article focuses on learning metrics
designed to guide transparent discovery and customer-focused iterations. For more information about these
aspects, see Commitment to transparency.

Learning metrics
When the first version of any minimum viable product (MVP ) is shared with customers, preferably at the end of
the first development iteration, there will be no impact on strategic metrics. Several iterations later, the team may
still be struggling to change behaviors enough to materially affect strategic metrics. During learning processes,
such as build-measure-learn cycles, we advise the team to adopt learning metrics. These metrics create tracking and learning opportunities.
Customer flow and learning metrics
If an MVP solution validates a customer-focused hypothesis, the solution will drive some change in customer
behaviors. Those behavior changes across customer cohorts should improve business outcomes. Keep in mind
that changing customer behavior is typically a multistep process. Because each step provides an opportunity to
measure impact, the adoption team can keep learning along the way and build a better solution.
Learning about changes to customer behavior starts by mapping the flow that you hope to see from an MVP
solution.
In most cases, a customer flow will have an easily defined starting point and no more than two end points.
Between the start and end points are a variety of learning metrics to be used as measures in the feedback loop:
1. Starting point—initial trigger: The starting point is the scenario that triggers the need for this solution.
When the solution is built with customer empathy, that initial trigger should inspire a customer to try the MVP
solution.
2. Customer need met: The hypothesis is validated when a customer need has been met by using the solution.
3. Solution steps: This term refers to the steps that are required to move the customer from the initial trigger to
a successful outcome. Each step produces a learning metric based on a customer decision to move on to the
next step.
4. Individual adoption achieved: The next time the trigger is encountered, if the customer returns to the
solution to get their need met, individual adoption has been achieved.
5. Business outcome indicator: When a customer behaves in a way that contributes to the defined business
outcome, a business outcome indicator is observed.
6. True Innovation: When business outcome indicators and individual adoption both occur at the desired scale,
you've realized true innovation.
Each step of the customer flow generates learning metrics. After each iteration (or release), a new version of the
hypothesis is tested. At the same time, tweaks to the solution are tested to reflect adjustments in the hypothesis.
When customers follow the prescribed path in any given step, a positive metric is recorded. When customers
deviate from the prescribed path, a negative metric is recorded.
These alignment and deviation counters create learning metrics. Each should be recorded and tracked as the
cloud adoption team progresses toward business outcomes and true innovation. In Learn with customers, we'll
discuss ways to apply these metrics to learn and build better solutions.
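Purely as an illustration of these alignment and deviation counters (the step names and data structure are hypothetical, not part of the methodology itself), a minimal sketch might look like this:

```python
from collections import Counter

# Hypothetical customer-flow steps, from initial trigger to individual adoption.
steps = ["initial_trigger", "solution_step_1", "need_met", "individual_adoption"]
metrics = {step: Counter() for step in steps}


def record(step: str, followed_prescribed_path: bool) -> None:
    # Record a positive (aligned) or negative (deviated) learning metric for one customer decision.
    metrics[step]["aligned" if followed_prescribed_path else "deviated"] += 1


record("initial_trigger", True)
record("solution_step_1", False)

for step in steps:
    print(step, dict(metrics[step]))
```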
Grouping and observing customer partners
The first measurement in defining learning metrics is the customer partner definition. Any customer who
participates in innovation cycles qualifies as a customer partner. To accurately measure behavior, you should use a
cohort model to define customer partners. In this model, customers are grouped to sharpen your understanding
of their responses to changes in the MVP. These groups typically resemble the following:
Experiment or focus group: Grouping customers based on their participation in a specific experiment
designed to test changes over time.
Segment: Grouping customers by the size of the company.
Vertical: Grouping customers by the industry vertical they represent.
Individual demographics: Grouping based on personal demographics like age and physical location.
These types of groupings help you validate learning metrics across various cross-sections of those customers
who choose to partner with you during your innovation efforts. All subsequent metrics should be derived from
definable customer grouping.

Next steps
As learning metrics accumulate, the team can begin to learn with customers.
Learn with customers
Some of the concepts in this article build on topics first described in The Lean Startup, written by Eric Ries.
Learn with customers

Our current customers represent our best resource for learning. By partnering with us, they help us build with
customer empathy to find the best solution to their needs. They also help create a minimum viable product (MVP )
solution by generating metrics from which we measure customer impact. In this article, we'll describe how to
learn with and from our customer-partners.

Continuous learning
At the end of every iteration, we have an opportunity to learn from the build and measure cycles. This process of
continuous learning is quite simple. The following image offers an overview of the process flow.

At its most basic, continuous learning is a method for responding to learning metrics and assessing their impact
on customer needs. This process consists of three primary decisions to be made at the end of each iteration:
Did the hypothesis prove true? When the answer is yes, celebrate for a moment and then move on. There
are always more things to learn, more hypotheses to test, and more ways to help the customer in your next
iteration. When a hypothesis proves true, it's often a good time for teams to decide on a new feature that will
enhance the solution's utility for the customer.
Can you get closer to a validated hypothesis by iterating on the current solution? The answer is
usually yes. Learning metrics typically suggest points in the process that lead to customer deviation. Use these
data points to find the root of a failed hypothesis. At times, the metrics may also suggest a solution.
Is a reset of the hypothesis required? The scariest thing to learn in any iteration is that the hypothesis or
underlying need was flawed. When this happens, an iteration alone isn't necessarily the right answer. When a
reset is required, the hypothesis should be rewritten and the solution reviewed in light of the new hypothesis.
The sooner this type of learning occurs, the easier it will be to pivot. Early hypotheses should focus on testing
the riskiest aspects of the solution in service of avoiding pivots later in development.
Unsure? The second most common response after "iterate" is "we're not sure." Embrace this response. It
represents an opportunity to engage the customer and to look beyond the data.
The answers to these questions will shape the iteration to follow. Companies that demonstrate an ability to apply
continuous learning and boldly make the right decisions for their customers are more likely to emerge as leaders
in their markets.
For better or worse, the practice of continuous learning is an art that requires a great deal of trial and error. It also
requires some science and data-driven decision-making. Perhaps the most difficult part of adopting continuous
learning concerns the cultural requirements. To effectively adopt continuous learning, your business culture must
be open to a fail first, customer-focused approach. The following section provides more details about this
approach.

Growth mindset
Few could deny the radical transformation within Microsoft culture that's occurred over the last several years. This
multifaceted transformation, led by Satya Nadella, has been hailed as a surprising business success story. At the
heart of this story is the simple belief we call the growth mindset. An entire section of this framework could be
dedicated to the adoption of a growth mindset. But to simplify this guidance, we'll focus on a few key points that
inform the process of learning with customers:
Customer first: If a hypothesis is designed to improve the experience of real customers, you have to meet real
customers where they are. Don't just rely on metrics. Compare and analyze metrics based on firsthand
observation of customer experiences.
Continuous learning: Customer focus and customer empathy stem from a learn-it-all mindset. The Innovate
method strives to be learn-it-all, not know-it-all.
Beginner's mindset: Demonstrate empathy by approaching every conversation with a beginner's mindset.
Whether you're new to your field or a 30-year veteran, assume you know little, and you'll learn a lot.
Listen more: Customers want to partner with you. Unfortunately, an ego-driven need to be right blocks that
partnership. To learn beyond the metrics, speak less and listen more.
Encourage others: Don't just listen; use the things you do say to encourage others. In every meeting, find
ways to pull in diverse perspectives from those who may not be quick to share.
Share the code: When we feel our obligation is to the ownership of a code base, we lose sight of the true
power of innovation. Focus on owning and driving outcomes for your customers. Share your code (publicly
with the world or privately within your company) to invite diverse perspectives into the solution and the code
base.
Challenge what works: Success doesn't necessarily mean you're demonstrating true customer empathy.
Avoid having a fixed mindset and a bias toward doing what's worked before. Look for learning in positive and
negative metrics by engaging your customers.
Be inclusive: Work hard to invite diverse perspectives into the mix. There are many variables that can divide
humans into segregated groups. Cultural norms, past behaviors, gender, religion, sexual preference, even
physical abilities. True innovation comes when we challenge ourselves to see past our differences and
consciously strive to include all customers, partners, and coworkers.

Next steps
As a next step to understanding this methodology, Common blockers and challenges to innovation can prepare
you for the changes ahead.
Understanding common blockers and challenges
Some of the concepts in this article build on topics first described in The Lean Startup, written by Eric Ries.
Common blockers and challenges to innovation

As described in Innovation in the digital economy, innovation requires a balance of invention and adoption. This
article expands on the common challenges and blockers to innovation, as it aims to help you understand how this
approach can add value during your innovation cycles.
Formula for innovation: Innovation = Invention + Adoption

Adoption challenges
Cloud technology advances have reduced some of the friction related to adoption. However, adoption is more
people-centric than technology-centric. And unfortunately, the cloud can't fix people.
The following list elaborates on some of the most common adoption challenges related to innovation. As you
progress through the Innovate methodology, each of the challenges in the following sections will be identified and
addressed. Before you apply this methodology, evaluate your current innovation cycles to determine which are the
most important challenges or blockers for you. Then, use the methodology to address or remove those blockers.
External challenges
Time to market: In a digital economy, time to market is one of the most crucial indicators of market
domination. Surprisingly, time to market impact has little to do with positioning or early market share. Both of
those factors are fickle and temporary. The time to market advantage comes from the simple truth that the more time your solution has on the market, the more time you have to learn, iterate, and improve. Focus heavily on
quick definition and rapid build of an effective minimum viable product to shorten time to market and
accelerate learning opportunities.
Competitive challenges: Dominant incumbents reduce opportunities to engage and learn from customers.
Competitors also create external pressure to deliver more quickly. Build fast but invest heavily in understanding
the proper measures. Well-defined niches produce more actionable feedback measures and enhance your
ability to partner and learn, resulting in better overall solutions.
Understand your customer: Customer empathy starts with an understanding of the customer and customer
base. One of the biggest challenges for innovators is the ability to rapidly categorize measurements and
learning within the build-measure-learn cycle. It's important to understand your customer through the lenses
of market segmentation, channels, and types of relationships. Throughout the build-measure-learn cycle, these
data points help create empathy and shape the lessons learned.
Internal challenges
Choosing innovation candidates: When investing in innovation, healthy companies spawn an endless
supply of potential inventions. Many of these create compelling business cases that suggest high returns and
generate enticing business justification spreadsheets. As described in the build article, building with customer
empathy should be prioritized over invention that's based only on gain projections. If customer empathy isn't
visible in the proposal, long-term adoption is unlikely.
Balancing the portfolio: Most technology implementations don't focus on changing the market or improving
the lives of customers. In the average IT department, more than 80% of workloads are maintained for basic
process automation. With the ease of innovation, it's tempting to innovate and rearchitect those solutions. Most
of the times, those workloads can experience similar or better returns by migrating or modernizing the solution,
with no change to core business logic or data processes. Balance your portfolio to favor innovation strategies
that can be built with clear empathy for the customer (internal or external). For all other workloads, follow a
migrate path to financial returns.
Maintaining focus and protecting priorities: When you've made a commitment to innovation, it's
important to maintain your team's focus. During the first iteration of a build phase, it's relatively easy to keep a
team excited about the possibilities of changing the future for your customers. However, that first MVP release
is just the beginning. True innovation comes with each build-measure-learn cycle, by learning from the
feedback loops to produce a better solution. As a leader in any innovation process, you should concentrate on
keeping the team focused and on maintaining your innovation priorities through the subsequent, less-
glamorous build iterations.

Invention challenges
Before the widespread adoption of the cloud, invention cycles that depended on information technology were
laborious and time-consuming. Procurement and provisioning cycles frequently delayed the crucial first steps
toward any new solutions. The cost of DevOps solutions and feedback loops delayed teams' abilities to collaborate
on early stage ideation and invention. Costs related to developer environments and data platforms prevented
anyone but highly trained professional developers from participating in the creation of new solutions.
The cloud has overcome many of these invention challenges by providing self-service automated provisioning,
lightweight development and deployment tools, and opportunities for professional developers and citizen
developers to cooperate in creating rapid solutions. Leveraging the cloud for innovation dramatically reduces
customer challenges and blockers to the invention side of the innovation equation.
Invention challenges in a digital economy
The invention challenges of today are different. The endless potential of cloud technologies also produces more
implementation options and deeper considerations about how those implementations might be used.
The Innovate methodology uses the following innovation disciplines to help align your implementation decisions
with your invention and adoption goals:
Data platforms: New sources and variations on data are available. Many of these couldn't be integrated into
legacy or on-premises applications to create cost-effective solutions. Understanding the change you hope to
drive in customers will inform your data platform decisions. Those decisions will be an extension of selected
approaches to ingest, integrate, categorize, and share data. Microsoft refers to this decision-making process as
the democratization of data.
Device interactions: IoT, mobile, and augmented reality blur the lines between digital and physical,
accelerating the digital economy. Understanding the real-world interactions surrounding customer behavior
will drive decisions about device integration.
Applications: Applications are no longer the exclusive domain of professional developers. Nor do they require
traditional server-based approaches. Empowering professional developers, enabling business specialists to
become citizen developers, and expanding compute options for API, microservices, and PaaS solutions all expand
application interface options. Understanding the digital experience required to shape customer behavior will
improve your decision-making about application options.
Source code and deployment: Collaboration among developers of all backgrounds improves both quality and
speed to market. Integration of feedback and a rapid response to learning shape market leaders. Commitments
to the build, measure, and learn processes help accelerate tool adoption decisions.
Predictive solutions: In a digital economy, it's seldom sufficient to simply meet the current needs of your
customers. Customers expect businesses to anticipate their next steps and predict their future needs.
Continuous learning often evolves into prediction tooling. The complexity of customer needs and the
availability of data will help define the best tools and approaches to predict and influence.
In a digital economy, the greatest challenge architects face is to clearly understand their customers' invention and
adoption needs and to then determine the best cloud-based toolchain to deliver on those needs.

Next steps
Based on the knowledge gained regarding the build-measure-learn model and growth mindset, you are now ready
to develop digital inventions within the Innovate methodology.
Develop digital inventions
Develop digital inventions

As described in Innovation in the digital economy, innovation requires a balance of invention and adoption.
Customer feedback and partnership are required to drive adoption. The disciplines described in the next section
define a series of approaches to developing digital inventions while keeping adoption and customer empathy in
mind. Each of the disciplines is briefly described, along with deeper links into each process.

Summary of each discipline of digital invention


Not all of the following disciplines are required to drive innovation in any given case. Following the guidance in
Build with customer empathy, the objective is to test a hypothesis in every iteration. Defining the output of
each iteration as a minimum viable product (MVP) lets you involve the fewest possible disciplines.
Democratize data: By getting data into the hands of customers, partners, and employees, you encourage
innovative observation. Ingest, centralize, govern, and share data.
Engage through apps: People connect with knowledge through apps and experiences. Empower
professional and citizen developers to create apps quickly.
Empower adoption: Encourage innovation by reducing friction to adoption and partnership. Architect for
visibility, collaboration, speed, and feedback loops.
Interact with devices: Digital and physical lines have blurred across multiple channels. Deliver experiences
across devices, IoT, and mixed reality.
Predict and influence: Look to the future to lead innovation. Look past current data to inform experiences
and interactions through predictive tools.

Next steps
Democratization of data is the first discipline of innovation to consider and evaluate.
Democratize data
Democratize data

Coal, oil, and human potential were the three most consequential assets during the Industrial Revolution. These
assets built companies, shifted markets, and ultimately changed nations. In the digital economy, there are three
equally important assets: data, devices, and human potential. Each of these assets holds great innovation
potential. For any innovation effort in the modern era, data is the new oil.
Across every company today, there are pockets of data that could be used to find and meet customer needs more
effectively. Unfortunately, the process of mining that data to drive innovation has long been costly and time-
consuming. Many of the most valuable solutions to customer needs go unmet because the right people can't
access the data they need.
Democratization of data is the process of getting this data into the right hands to drive innovation. This process
can take several forms, but it generally includes solutions for ingesting or integrating raw data, centralizing
data, sharing data, and securing data. When these methods are successful, experts around the company can use
the data to test hypotheses. In many cases, cloud adoption teams can build with customer empathy using only
data, rapidly addressing existing customer needs.

Process of democratizing data


The following phases will guide the decisions and approaches required to adopt a solution that democratizes data.
Not every phase will necessarily be required to build a specific solution. However, you should evaluate each
phase when you're building a solution to a customer hypothesis. Each provides a unique approach to the creation
of innovative solutions.

Share data
When you build with customer empathy, all processes elevate customer need over a technical solution.
Democratizing data is no exception, so we start by sharing data. Any effort to democratize data must include a
solution that shares data with a data consumer. The data consumer could be a direct customer or a proxy who makes decisions
for customers. Approved data consumers can analyze, interrogate, and report on centralized data, with no
support from IT staff.
Many successful innovations have been launched as a minimum viable product (MVP) that delivers manual, data-
driven processes on behalf of the customer. In this concierge model, an employee is the data consumer. That
employee uses data to aid the customer. Each time the customer engages that manual support, a hypothesis can be
tested and validated. This approach is often a cost-effective means of testing a customer-focused hypothesis
before you invest heavily in integrated solutions.
The primary tools for sharing data directly with data consumers include self-service reporting or data embedded
within other experiences, using tools like Power BI.
NOTE
Before you share data, make sure you've read the following sections. Sharing data might require governance to provide
protection for the shared data. Also, that data might be spread across multiple clouds and could require centralization.
Much of the data might even reside within applications, which will require data collection before you can share it.

Govern data
Sharing data can quickly produce an MVP that you can use in customer conversations. However, to turn that
shared data into useful and actionable knowledge, a bit more is generally required. After a hypothesis has been
validated through data sharing, the next phase of development is typically data governance.
Data governance is a broad topic that could require its own dedicated framework. That degree of granularity is
outside the scope of the Cloud Adoption Framework. However, there are several aspects of data governance that
you should consider as soon as the customer hypothesis is validated. For example:
Is the shared data sensitive? Data should be classified before being shared publicly to protect the interests
of customers and the company.
If the data is sensitive, has it been secured? Protection of sensitive data should be a requirement for any
democratized data. The example workload focused on securing data solutions provides a few references for
securing data.
Is the data catalogued? Capturing details about the data being shared will aid in long-term data
management. Tools for documenting data, like Azure Data Catalog, can make this process much easier in the
cloud. Guidance regarding the annotation of data and documentation of data sources can help accelerate the
process.
When democratization of data is important to a customer-focused hypothesis, make sure the governance of
shared data is somewhere in the release plan. This will help protect customers, data consumers, and the company.
Centralize data
When data is dispersed across an IT environment, opportunities to innovate can be extremely constrained,
expensive, and time-consuming. The cloud provides new opportunities to centralize data across data silos. When
centralization of multiple data sources is required to build with customer empathy, the cloud can accelerate the
testing of hypotheses.
Caution

Centralization of data represents a risk point in any innovation process. When data centralization is a technical
spike (as opposed to a source of customer value), we suggest that you delay centralization until the customer
hypotheses have been validated.
If centralization of data is required, you should first define the appropriate data store for the centralized data. It's a
good practice to establish a data warehouse in the cloud. This scalable option provides a central location for all
your data. This type of solution is available in Online Analytical Processing (OLAP) or Big Data options.
The reference architectures for OLAP and Big Data solutions can help you choose the most relevant solution in
Azure. If a hybrid solution is required, the reference architecture for extending on-premises data can also help
accelerate solution development.

IMPORTANT
Depending on the customer need and the aligned solution, a simpler approach may be sufficient. The cloud architect should
challenge the team to consider lower cost solutions that could result in faster validation of the customer hypothesis,
especially during early development. The following section on collecting data covers some scenarios that might suggest a
different solution for your situation.
Collect data
When you need data to be centralized to address a customer need, it's very likely that you'll also have to collect
the data from various sources and move it into the centralized data store. There are two primary forms of data
collection: integration and ingestion.
Integration: Data that resides in an existing data store can be integrated into the centralized data store by using
traditional data movement techniques. This is especially common for scenarios that involve multicloud data
storage. These techniques involve extracting the data from the existing data store and then loading it into the
central data store. At some point in this process, the data is typically transformed to be more usable and relevant
in the central store.
Cloud-based tools have turned these techniques into pay-per-use tools, reducing the barrier to entry for data
collection and centralization. Tools like Azure Database Migration Service and Azure Data Factory are two
examples. The reference architecture for data factory with an OLAP data store is an example of one such solution.
Ingestion: Some data doesn't reside in an existing data store. When this transient data is a primary source of
innovation, you'll want to consider alternative approaches. Transient data can be found in a variety of existing
sources like applications, APIs, data streams, IoT devices, a blockchain, an application cache, in media content, or
even in flat files.
You can integrate these various forms of data into a central data store on an OLAP or Big Data solution. However,
for early iterations of the build-measure-learn cycle, an Online Transactional Processing (OLTP) solution might
be more than sufficient to validate a customer hypothesis. OLTP solutions aren't the highest-quality solution for
any reporting scenario. However, when you're building with customer empathy, it's more important to focus on
customer needs than on technical tooling decisions. After the customer hypothesis is validated at scale, a more
suitable platform might be required. The reference architecture on OLTP data stores can help you determine
which data store is most appropriate for your solution.
Virtualize: Integration and ingestion of data can sometimes slow innovation. When a solution for data
virtualization is already available, it might represent a more reasonable approach. Ingestion and integration can
both duplicate storage and development requirements, add data latency, increase attack surface area, trigger
quality issues, and increase governance efforts. Data virtualization is a more contemporary alternative that leaves
the original data in a single location and creates pass-through or cached queries of the source data.
SQL Server 2017 and Azure SQL Data Warehouse both support PolyBase, which is the approach to data
virtualization most commonly used in Azure.
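The following sketch illustrates this idea, assuming a SQL Server 2017 instance with PolyBase enabled and an external data source and file format that have already been created. The connection details, table definition, and object names are hypothetical placeholders.

```python
# A minimal data virtualization sketch with PolyBase, assuming PolyBase is
# enabled and an external data source and file format already exist. The
# connection string, table definition, and object names are hypothetical.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=myserver.example.com;DATABASE=Sales;UID=demo;PWD=<password>"
)
cursor = conn.cursor()

# Define an external table that passes queries through to files in blob
# storage, rather than ingesting and duplicating the data.
cursor.execute("""
CREATE EXTERNAL TABLE dbo.CustomerActivityExternal (
    CustomerId INT,
    ActivityDate DATE,
    ActivityType NVARCHAR(50)
)
WITH (
    LOCATION = '/customer-activity/',
    DATA_SOURCE = BlobDataSource,   -- existing external data source
    FILE_FORMAT = CsvFileFormat     -- existing external file format
);
""")
conn.commit()

# The external table can now be queried alongside local tables.
for row in cursor.execute(
        "SELECT TOP 10 CustomerId, ActivityType FROM dbo.CustomerActivityExternal"):
    print(row.CustomerId, row.ActivityType)
```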

Next steps
With a strategy for democratizing data in place, you'll next want to evaluate approaches to engaging customers
through apps.
Engaging customers through apps
Engage through applications

As discussed in Democratize data, data is the new oil. It fuels most innovations across the digital economy.
Building on that analogy, applications are the fueling stations and infrastructure required to get that fuel into the
right hands.
In some cases, data alone is enough to drive change and meet customer needs. More commonly, though, solutions
to customer needs require applications to shape the data and create an experience. Applications are the way we
engage the user. They are the home for the processes required to respond to customer triggers. They are
customers' means of providing data and receiving guidance. This article summarizes several principles that can
help align you with the right application solution, based on the hypotheses to be validated.

Shared code
Teams that more quickly and accurately respond to customer feedback, market changes, and opportunities to
innovate typically lead their respective markets in innovation. The first principle of innovative applications is
summed up in the growth mindset overview: "Share the code." Over time, innovation emerges from a cultural
focus. To sustain innovation, diverse perspectives and contributions are required.
To be ready for innovation, all application development should start with a shared code repository. The most
widely adopted tool for managing code repositories is GitHub, which allows you to create a shared code
repository quickly. Alternatively, Azure Repos is a set of version control tools in Azure DevOps Services that you
can use to manage your code. Azure Repos provides two types of version control:
Git: distributed version control
Team Foundation Version Control (TFVC): centralized version control

Citizen developers
Professional developers are a vital component of innovation. When a hypothesis proves accurate at scale,
professional developers are required to stabilize and prepare the solution for scale. Most of the principles
referenced in this article require support from professional developers. Unfortunately, current trends suggest
the demand for professional developers exceeds the available supply. Moreover, the cost and pace of
innovation can be less favorable when professional development is deemed necessary. In response to these
challenges, citizen developers provide a way to scale development efforts and accelerate early hypothesis testing.
The use of citizen developers can be viable and effective when early hypotheses can be validated through tools like
PowerApps for app interfaces, AI Builder for processes and predictions, Microsoft Flow for workflows, and Power
BI for data consumption.

NOTE
When you rely on citizen developers to test hypotheses, it's advisable to have some professional developers on hand to
provide support, review, and guidance. After a hypothesis is validated at scale, a process for transitioning the application into
a more robust programming model will accelerate returns on the innovation. By involving professional developers in process
definitions early on, you can realize cleaner transitions later.

Intelligent experiences
Intelligent experiences combine the speed and scale of modern web applications with the intelligence of cognitive
services and bots. Alone, each of these technologies might be sufficient to meet your customers' needs. When
smartly combined, they broaden the spectrum of needs that can be met through a digital experience, while helping
to contain development costs.
Modern web apps
When an application or experience is required to meet a customer need, modern web applications can be the
fastest way to go. Modern web experiences can engage internal or external customers quickly and allow for rapid
iteration on the solution.
Infusing intelligence
Machine learning and artificial intelligence are increasingly available to developers. The widespread availability of
common APIs with predictive capabilities allows developers to better meet the needs of the customer through
expanded access to data and predictions.
Adding intelligence to a solution can enable speech to text, text translation, computer vision, and even visual
search. With these expanded capabilities, it's easier for developers to build solutions that take advantage of
intelligence to create an interactive and modern experience.
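As a hedged illustration, the following sketch calls a prebuilt language API (Azure Text Analytics, part of Azure Cognitive Services) to score the sentiment of customer feedback. The endpoint, key, and feedback strings are placeholders for your own Cognitive Services resource and data.

```python
# A minimal sketch of infusing intelligence with a prebuilt Cognitive Services
# API. The endpoint, key, and feedback strings are placeholders.
from azure.core.credentials import AzureKeyCredential
from azure.ai.textanalytics import TextAnalyticsClient

client = TextAnalyticsClient(
    endpoint="https://<your-resource>.cognitiveservices.azure.com/",
    credential=AzureKeyCredential("<your-key>"),
)

feedback = [
    "The new checkout flow saved me a ton of time.",
    "I couldn't find the export button and gave up.",
]

# Score each piece of customer feedback so the experience can react to it.
for result in client.analyze_sentiment(documents=feedback):
    if not result.is_error:
        print(result.sentiment, result.confidence_scores.positive)
```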
Bots
Bots provide an experience that feels less like using a computer and more like dealing with a person — at least
with an intelligent robot. They can be used to shift simple, repetitive tasks (such as making a dinner reservation or
gathering profile information) onto automated systems that might no longer require direct human intervention.
Users converse with a bot through text, interactive cards, and speech. A bot interaction can range from a quick
question-and-answer to a sophisticated conversation that intelligently provides access to services.
Bots are a lot like modern web applications: they live on the internet and use APIs to send and receive messages.
What's in a bot varies widely depending on what kind of bot it is. Modern bot software relies on a stack of
technology and tools to deliver increasingly complex experiences on a variety of platforms. However, a simple bot
could just receive a message and echo it back to the user with very little code involved.
Bots can do the same things as other types of software: read and write files, use databases and APIs, and handle
regular computational tasks. What makes bots unique is their use of mechanisms generally reserved for human-
to-human communication.
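To make the echo example above concrete, here is a minimal sketch using the Bot Framework SDK for Python (botbuilder-core). Hosting, authentication, and channel configuration are omitted, and the messages are illustrative.

```python
# A minimal echo bot sketch with the Bot Framework SDK for Python.
# It receives a message and echoes it back; hosting and channel
# configuration are omitted.
from botbuilder.core import ActivityHandler, TurnContext


class EchoBot(ActivityHandler):
    async def on_message_activity(self, turn_context: TurnContext):
        # Echo the user's text back, which is often enough to validate an
        # early conversational hypothesis.
        await turn_context.send_activity(f"You said: {turn_context.activity.text}")

    async def on_members_added_activity(self, members_added, turn_context: TurnContext):
        # Greet new participants when they join the conversation.
        for member in members_added:
            if member.id != turn_context.activity.recipient.id:
                await turn_context.send_activity("Hello! Send me a message and I'll echo it.")
```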

Cloud-native solutions
Cloud-native applications are built from the ground up, and they're optimized for cloud scale and performance.
Cloud-native applications are typically built using microservices, serverless, event-based, or container-based
approaches. Most commonly, cloud-native solutions use a combination of microservices architectures, managed
services, and continuous delivery to achieve reliability and faster time to market.
A cloud-native solution allows centralized development teams to maintain control of the business logic without
the need for monolithic, centralized solutions. This type of solution also creates an anchor to drive consistency
across the input of citizen developers and modern experiences. Finally, cloud-native solutions provide an
innovation accelerator by freeing citizen and professional developers to innovate safely and with a minimum of
blockers.

Innovate through existing solutions


Many customer hypotheses can best be delivered by a modernized version of an existing solution. When the
current business logic meets customer needs (or comes really close), you might be able to accelerate innovation
by building on top of a modernized solution.
Most forms of modernization, including slight refactoring of the application, are included in the Migrate
methodology within the Cloud Adoption Framework. That methodology guides cloud adoption teams through the
process of migrating a digital estate to the cloud. The Azure Migration Guide provides a streamlined approach to
the same methodology, which is suitable for a small number of workloads or even a single application.
After a solution has been migrated and modernized, there are a variety of ways it can be used to create new,
innovative solutions to customer needs. For example, citizen developers could test hypotheses, or professional
developers could create intelligent experiences or cloud-native solutions.
Extend an existing solution
Extending a solution is one common form of modernization. This approach can be the fastest path to innovation
when the following are true of the customer hypothesis:
Existing business logic meets (or comes close to meeting) the existing customer need.
An improved experience would better meet the needs of a specific customer cohort.
The business logic required by the minimum viable product (MVP) solution has been centralized, usually via an
N-tier, web services, API, or microservices design. This approach consists of wrapping the existing solution
within a new experience hosted in the cloud. In Azure, this solution would likely live in Azure App Services.
Rebuild an existing solution
If an application can't be easily extended, it may be necessary to refactor the solution. In this approach, the
workload is migrated to the cloud. After the application is migrated, parts of it are modified or duplicated, as web
services or microservices, which are deployed in parallel with the existing solution. The parallel service-based
solution could be treated like an extended solution. This solution would simply wrap the existing solution with a
new experience hosted in the cloud. In Azure, this solution would likely live in Azure App Services.
Caution

Refactoring or rearchitecting solutions or centralizing business logic can quickly trigger a time-consuming
technical spike instead of a source of customer value. This is a risk to innovation, especially early in hypothesis
validation. With a bit of creativity in the design of a solution, there should be a path to MVP that doesn't require
refactoring of existing solutions. It's wise to delay refactoring until the initial hypothesis can be validated at scale.

Operating model innovations


In addition to modern, innovative approaches to app creation, there have been notable innovations in app
operations. These approaches have spawned many organizational movements. One of the most prominent is the
cloud center of excellence operating model. When that model is fully staffed and mature, business teams have the
option to provide their own operational support for a solution.
The type of self-service operational management model found in a cloud center of excellence allows for tighter
controls and faster iterations within the solution environment. These goals are accomplished by transferring
operational control and accountability to the business team.
If you're trying to scale or meet global demand for an existing solution, this approach might be sufficient to
validate a customer hypothesis. After a solution is migrated and slightly modernized, the business team can scale
it to test a variety of hypotheses. These typically involve customer cohorts who are concerned with performance,
global distribution, and other customer needs hindered by IT operations.

Reduce overhead and management


The more there is to maintain within a solution, the slower that solution will iterate. This means you can accelerate
innovation by reducing the impact of operations on available bandwidth.
To prepare for the many iterations required to deliver an innovative solution, it's important to think ahead. For
example, minimize operational burdens early in the process by favoring serverless options. In Azure, serverless
application options could include Azure App Service or containers.
In parallel, Azure provides serverless transaction data options that also reduce overhead. The database products
list provides options for hosting data without the need for a full data platform.
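As one hedged illustration of how little there can be to operate, the following sketch shows an HTTP-triggered handler for Azure Functions (a serverless option in the App Service family), written with the Python programming model. The function's parameters and response shape are hypothetical, and the surrounding function app configuration is omitted.

```python
# A minimal serverless sketch: an HTTP-triggered Azure Functions handler.
# There is no server or web framework to patch or scale manually; the
# payload shape and default values are hypothetical.
import json

import azure.functions as func


def main(req: func.HttpRequest) -> func.HttpResponse:
    # Read a query parameter and respond with a small JSON payload.
    name = req.params.get("name", "customer")
    body = json.dumps({"message": f"Hello, {name}!"})
    return func.HttpResponse(body, mimetype="application/json", status_code=200)
```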

Next steps
Depending on the hypothesis and solution, the principles in this article can aid in designing apps that meet MVP
definitions and engage users. Up next are the principles for empowering adoption, which offer ways to get the
application and data into the hands of customers more quickly and efficiently.
Empower adoption
Empower adoption

The ultimate test of innovation is customer reaction to your invention. Did the hypothesis prove true? Do
customers use the solution? Does it scale to meet the needs of the desired percentage of users? Most importantly,
do they keep coming back? None of these questions can be asked until the minimum viable product (MVP)
solution has been deployed. In this article, we'll focus on the discipline of empowering adoption.

Reduce friction that affects adoption


There are a few key friction points to adoption that can be minimized through a combination of technology and
processes. For readers with knowledge of continuous integration (CI) and continuous deployment (CD) or
DevOps processes, the following will be familiar. This article establishes a starting point for cloud adoption teams
that fuels innovation and feedback loops. In the future, this starting point might foster more robust CI/CD or
DevOps approaches as the products and teams mature.
As described in Measure for customer impact, positive validation of any hypothesis requires iteration and
determination. You'll experience far more failures than wins during any innovation cycle. This is expected.
However, when a customer need, hypothesis, and solution align at scale, the world changes quickly. This article
aims to minimize technical spikes that slow innovation but still make sure you keep a few solid best practices in
place. Doing so will help the team design for future success while delivering on current customer needs.

Empowering adoption: the maturity model


The primary objective of the Innovate methodology is to build customer partnerships and accelerate feedback
loops, which will lead to market innovations. The following image and sections describe initial implementations
that support this methodology.

Shared solution: Establish a centralized repository for all aspects of the solution.
Feedback loops: Make sure that feedback loops can be managed consistently through iterations.
Continuous integration: Regularly build and consolidate the solution.
Reliable testing: Validate solution quality and expected changes to ensure the reliability of your testing metrics.
Solution deployment: Deploy solutions so that the team can quickly share changes with customers.
Integrated measurement: Add learning metrics to the feedback loop for clear analysis by the full team.
To minimize technical spikes, assume that maturity will initially be low across each of these principles. But
definitely plan ahead by aligning to tools and processes that can scale as hypotheses become more fine-grained.
In Azure, GitHub and Azure DevOps allow small teams to get started with little friction. These teams might
grow to include thousands of developers who collaborate on large-scale solutions and test hundreds of customer
hypotheses. The remainder of this article illustrates the "plan big, start small" approach to empowering adoption
across each of these principles.

Shared solution
As described in Measure for customer impact, positive validation of any hypothesis requires iteration and
determination. You'll experience far more failures than wins during any innovation cycle. This is expected.
However, when a customer need, hypothesis, and solution align at scale, the world changes quickly.
When you're scaling innovation, there's no more valuable tool than a shared codebase for the solution.
Unfortunately, there's no reliable way of predicting which iteration or which MVP will yield the winning
combination. That's why it's never too early to establish a shared codebase or repository. This is the one technical
spike that should never be delayed. As the team iterates through various MVP solutions, a shared repo enables
easy collaboration and accelerated development. When changes to the solution drag down learning metrics,
version control lets you roll back to an earlier, more effective version of the solution.
The most widely adopted tool for managing code repositories is GitHub, which lets you create a shared code
repository with just a few clicks. Additionally, the Azure Repos feature of Azure DevOps can be used to create a
Git or Team Foundation repository.

Feedback loops
Making the customer part of the solution is the key to building customer partnerships during innovation cycles.
That's accomplished, in part, by measuring customer impact. It requires conversations and direct testing with the
customer. Both generate feedback that must be managed effectively.
Every point of feedback is a potential solution to the customer need. More importantly, every bit of direct
customer feedback represents an opportunity to improve the partnership. If feedback makes it into an MVP
solution, celebrate that with the customer. Even if some feedback isn't actionable, simply being transparent with
the decision to deprioritize the feedback demonstrates a growth mindset and a focus on continuous learning.
Azure DevOps includes ways to request, provide, and manage feedback. Each of these tools centralizes feedback
so that the team can take action and provide follow-up in service of a transparent feedback loop.
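As one hedged example of centralizing feedback, a lightweight script could log each piece of direct customer feedback as a work item in Azure Boards through the Azure DevOps REST API. The organization, project, personal access token, and work item type below are placeholders; the work item types available depend on your project's process template.

```python
# A minimal sketch of routing customer feedback into Azure Boards as work
# items via the Azure DevOps REST API. Organization, project, PAT, and the
# work item type are placeholders.
import requests

ORG = "<your-organization>"
PROJECT = "<your-project>"
PAT = "<personal-access-token>"


def log_feedback(title: str, details: str) -> int:
    url = (f"https://dev.azure.com/{ORG}/{PROJECT}/_apis/wit/workitems/$Issue"
           "?api-version=5.1")
    # Work items are created with a JSON Patch document.
    patch = [
        {"op": "add", "path": "/fields/System.Title", "value": title},
        {"op": "add", "path": "/fields/System.Description", "value": details},
    ]
    response = requests.post(
        url,
        json=patch,
        headers={"Content-Type": "application/json-patch+json"},
        auth=("", PAT),  # basic auth: empty username, PAT as the password
    )
    response.raise_for_status()
    return response.json()["id"]


# Capture one piece of direct customer feedback.
work_item_id = log_feedback(
    "Customer feedback: export to CSV",
    "Customer asked for a CSV export option on the usage report.",
)
print(f"Logged feedback as work item {work_item_id}")
```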

Continuous integration
As adoptions scale and a hypothesis gets closer to true innovation at scale, the number of smaller hypotheses to
be tested tends to grow rapidly. For accurate feedback loops and smooth adoption processes, it's important that
each of those hypotheses is integrated and supportive of the primary hypothesis behind the innovation. This
means that you also have to move quickly to innovate and grow, which requires multiple developers for testing
variations of the core hypothesis. For later stage development efforts, you might even need multiple teams of
developers, each building toward a shared solution. Continuous integration is the first step toward management
of all the moving parts.
In continuous integration, code changes are frequently merged into the main branch. Automated build and test
processes make sure that code in the main branch is always production quality. This ensures that developers are
working together to develop shared solutions that provide accurate and reliable feedback loops.
Azure DevOps and Azure Pipelines provide continuous integration capabilities with just a few clicks in GitHub or a
variety of other repositories. Learn more about continuous integration, or check out the
hands-on lab. There are also solution architectures to accelerate creation of your CI/CD pipelines through Azure
DevOps.

Reliable testing
Defects in any solution can create false positives or false negatives. Unexpected errors can easily lead to
misinterpretation of user adoption metrics. They can also generate negative feedback from customers that doesn't
accurately represent the test of your hypothesis.
During early iterations of an MVP solution, defects are expected; early adopters might even find them endearing.
In early releases, acceptance testing is typically nonexistent. However, one aspect of building with empathy
concerns the validation of the need and hypothesis. Both can be completed through unit tests at a code level and
manual acceptance tests before deployment. Together, these provide some means of reliability in testing. You
should strive to automate a well-defined series of build, unit, and acceptance tests. These will ensure reliable
metrics related to more granular tweaks to the hypothesis and the resulting solution.
The Azure Test Plans feature provides tooling to develop and operate test plans during manual or automated test
execution.
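As a small illustration of the unit-testing layer, a suite like the following sketch can run on every build to protect the reliability of your learning metrics. The discount function and expected values are hypothetical stand-ins for your own solution logic, and the example assumes pytest as the test runner.

```python
# A minimal unit-testing sketch with pytest. The discount logic and expected
# values are hypothetical stand-ins for your solution's code.
import pytest


def apply_discount(price: float, percent: float) -> float:
    """Return the price after applying a percentage discount."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(price * (1 - percent / 100), 2)


def test_discount_reduces_price():
    assert apply_discount(100.0, 20) == 80.0


def test_zero_discount_is_unchanged():
    assert apply_discount(59.99, 0) == 59.99


def test_invalid_discount_is_rejected():
    with pytest.raises(ValueError):
        apply_discount(100.0, 150)
```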

Solution deployment
Perhaps the most meaningful aspect of empowering adoption concerns your ability to control the release of a
solution to customers. By providing a self-service or automated pipeline for releasing a solution to customers,
you'll accelerate the feedback loop. By allowing customers to quickly interact with changes in the solution, you
invite them into the process. This approach also triggers quicker testing of hypotheses, thereby reducing
assumptions and potential rework.
There are several methods for solution deployment. The following are the three most common:
Continuous deployment is the most advanced method, as it automatically deploys code changes into
production. For mature teams that are testing mature hypotheses, continuous deployment can be extremely
valuable.
During early stages of development, continuous delivery might be more appropriate. In continuous delivery,
any code changes are automatically deployed to a production-like environment. Developers, business decision-
makers, and others on the team can use this environment to verify that their work is production-ready. You can
also use this method to test a hypothesis with customers without affecting ongoing business activities.
Manual deployment is the least sophisticated approach to release management. As the name suggests,
someone on the team manually deploys the most recent code changes. This approach is error-prone,
unreliable, and considered an antipattern by most seasoned engineers.
During the first iteration of an MVP solution, manual deployment is common, despite the preceding assessment.
When the solution is extremely fluid and customer feedback is unknown, there's a significant risk in resetting the
entire solution (or even the core hypothesis). Here's the general rule for manual deployment: no customer proof,
no deployment automation.
Investing in deployment automation too early can lead to lost time. More importantly, it can create dependencies on the release pipeline that
make the team more resistant to an early pivot. After the first few iterations or when customer feedback suggests
potential success, a more advanced model of deployment should be quickly adopted.
At any stage of hypothesis validation, Azure DevOps and Azure Pipelines provide continuous delivery and
continuous deployment capabilities. Learn more about continuous delivery, or check out the hands-on lab.
Solution architectures can also accelerate creation of your CI/CD pipelines through Azure DevOps.

Integrated measurements
When you measure for customer impact, it's important to understand how customers react to changes in the
solution. This data, known as telemetry, provides insights into the actions a user (or cohort of users) took when
working with the solution. From this data, it's easy to get a quantitative validation of the hypothesis. Those metrics
can then be used to adjust the solution and generate more fine-grained hypotheses. Those subtler changes help
mature the initial solution in subsequent iterations, ultimately driving to repeat adoption at scale.
In Azure, Azure Monitor provides the tools and interface to collect and review data from customer experiences.
You can apply those observations and insights to refine the backlog by using Azure Boards.
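As a hedged sketch of integrated measurement, the following example emits a custom telemetry event with the Application Insights SDK for Python, which feeds Azure Monitor. The instrumentation key, event name, properties, and measurements are placeholders for your own learning metrics.

```python
# A minimal sketch of emitting learning metrics as telemetry with the
# Application Insights SDK for Python. The instrumentation key, event name,
# and values are placeholders.
from applicationinsights import TelemetryClient

tc = TelemetryClient("<instrumentation-key>")

# Record a custom event each time a user completes the behavior your
# hypothesis cares about, with properties you can slice on later.
tc.track_event(
    "checkout_completed",
    properties={"cohort": "beta-users", "channel": "mobile"},
    measurements={"cart_value": 42.50},
)
tc.flush()  # send buffered telemetry before the process exits
```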

Next steps
After you've gained an understanding of the tools and processes needed to empower adoption, it's time to
examine a more advanced innovation discipline: interact with devices. This discipline can help reduce the barriers
between physical and digital experiences, making your solution even easier to adopt.
Interact with devices
Ambient experiences: Interact with devices

In Build with customer empathy, we discussed the three tests of true innovation: Solve a customer need, keep the
customer coming back, and scale across a base of customer cohorts. Each test of your hypothesis requires effort
and iterations on the approach to adoption. This article offers insights on some advanced approaches to reduce
that effort through ambient experiences. By interacting with devices, instead of an application, the customer may
be more likely to turn to your solution first.

Ambient experiences
An ambient experience is a digital experience that relates to the immediate surroundings. A solution that features
ambient experiences strives to meet the customer in their moment of need. When possible, the solution meets the
customer need without leaving the flow of activity that triggered it.
Life in the digital economy is full of distractions. We're all bombarded with social, email, web, visual, and verbal
messaging, each of which poses a risk of distraction. This risk increases with every second that elapses between the
customer's point of need and the moment they encounter a solution. Countless customers are lost in that brief
time gap. To foster an increase in repeat adoption, you have to reduce the number of distractions by reducing time
to solution (TTS).

Interacting with devices


A standard web experience is the most common application development technique used to meet a customer's
needs. This approach assumes that the customer is in front of their computer. If your customer consistently meets
their point of need while in front of their laptop, build a web app. That web app will provide an ambient experience
for that customer in that scenario. However, we know that this scenario is less and less likely in our current era.

Ambient experiences typically require more than a web app these days. Through measurement and learning with
the customer, the behavior that triggers the customer's need can be observed, tracked, and used to build a more
ambient experience. The following list summarizes a few approaches to integration of ambient solutions into your
hypotheses, with more details about each in the following paragraphs.
Mobile experience: As with laptops, mobile apps are ubiquitous in customer environments. In some
situations, this might provide a sufficient level of interactivity to make a solution ambient.
Mixed reality: Sometimes a customer's typical surroundings must be altered to make an interaction ambient.
This factor creates something of a false reality in which the customer interacts with the solution and has a need
met. In this case, the solution is ambient within the false reality.
Integrated reality: Moving closer to true ambience, integrated reality solutions focus on the use of a device
that exists within the customer's reality to integrate the solution into their natural behaviors. A virtual assistant
is a great example of integrating reality into the surrounding environment. A lesser-known option concerns
Internet of Things (IoT) technologies, which integrate devices that already exist in the customer's surroundings.
Adjusted reality: When any of these ambient solutions use predictive analysis in the cloud to define and
provide an interaction with the customer through the natural surroundings, the solution has adjusted reality.
Understanding the customer need and measuring customer impact both help you determine whether a device
interaction or ambient experience is necessary to validate your hypothesis. With each of those data points, the
following sections will help you find the best solution.

Mobile experience
In the first stage of ambient experience, the user moves away from the computer. Today's consumers and business
professionals move fluidly between mobile and PC devices. Each of the platforms or devices used by your
customer creates a new potential experience. Adding a mobile experience that extends the primary solution is the
fastest way to improve integration into the customer's immediate surroundings. While a mobile device is far from
ambient, it might edge closer to the customer's point of need.
When customers are mobile and change locations frequently, that may represent the most relevant form of
ambient experience for a particular solution. Over the past decade, innovation has frequently been triggered by
the integration of existing solutions with a mobile experience.
Azure App Services is a great example of this approach. During early iterations, the web app feature of Azure App
Services can be used to test the hypothesis. As the hypotheses become more complex, the mobile app feature of
Azure App Services can extend the web app to run in a variety of mobile platforms.

Mixed reality
Mixed reality solutions represent the next level of maturity for ambient experiences. This approach augments or
replicates the customer's surroundings; it creates an extension of reality for the customer to operate within.

IMPORTANT
If a virtual reality (VR) device is required and is not already part of a customer's immediate surroundings or natural
behaviors, augmented or virtual reality is more of an alternative experience and less of an ambient experience.

Mixed reality experiences are increasingly common among remote workforces. Their use is growing even faster in
industries that require collaboration or specialty skills that aren't readily available in the local market. Situations
that require centralized implementation support of a complex product for a remote labor force are particularly
fertile ground for augmented reality. In these scenarios, the central support team and remote employees might
use augmented reality to work on, troubleshoot, and install the product.
For example, consider the case of spatial anchors. Spatial anchors allow you to create mixed reality experiences
with objects that persist their respective locations across devices over time. Through spatial anchors, a specific
behavior can be captured, recorded, and persisted, thereby providing an ambient experience the next time the user
operates within that augmented environment. Azure Spatial Anchors is a service that moves this logic to the
cloud, allowing experiences to be shared across devices and even across solutions.
Integrated reality
Beyond mobile reality or even mixed reality lies integrated reality. Integrated reality aims to remove the digital
experience entirely. All around us are devices with compute and connectivity capabilities. These devices can be
used to collect data from the immediate surroundings without the customer having to ever touch a phone, laptop,
or VR device.
This experience is ideal when some form of device is consistently within the same surroundings in which the
customer need occurs. Common scenarios include factory floors, elevators, and even your car. These types of large
devices already contain compute power. You can also use data from the device itself to detect customer behaviors
and send those behaviors to the cloud. This automatic capture of customer behavior data dramatically reduces the
need for a customer to input data. Additionally, the web, mobile, or VR experience can function as a feedback loop
to share what's been learned from the integrated reality solution.
Examples of integrated reality in Azure could include:
Azure Internet of Things (IoT) solutions, a collection of services in Azure that each aid in managing devices and
the flow of data from those devices into the cloud and back out to end users.
Azure Sphere, a combination of hardware and software. Azure Sphere is an innately secure way to enable an
existing device to securely transmit data between the device and Azure IoT solutions.
Azure Kinect Developer Kit, AI sensors with advanced computer vision and speech models. These sensors can
collect visual and audio data from the immediate surroundings and feed those inputs into your solution.
You can use all three of these tools to collect data from the natural surroundings and at the point of customer
need. From there, your solution can respond to those data inputs to solve the need, sometimes before the
customer is even aware that a trigger for that need has occurred.
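The following sketch illustrates the device side of such a solution, assuming a device identity already exists in Azure IoT Hub. The connection string and telemetry payload are placeholders, and the readings are simulated.

```python
# A minimal integrated reality sketch: a device in the customer's
# surroundings sends readings to Azure IoT Hub. The connection string and
# payload shape are placeholders; readings are simulated.
import json
import time

from azure.iot.device import IoTHubDeviceClient, Message

client = IoTHubDeviceClient.create_from_connection_string(
    "<device-connection-string>"
)
client.connect()

for _ in range(3):
    # Capture a reading from the surroundings (simulated here) and send it to
    # the cloud, where the solution can respond to the customer need.
    reading = {"temperature_c": 21.7, "motion_detected": True}
    client.send_message(Message(json.dumps(reading)))
    time.sleep(5)

client.disconnect()
```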

Adjusted reality
The highest form of ambient experience is adjusted reality, often referred to as ambient intelligence. Adjusted
reality is an approach to using information from your solution to change the customer's reality without requiring
them to interact directly with an application. In this approach, the application you initially built to prove your
hypothesis might no longer be relevant at all. Instead, devices in the environment help modulate the inputs and
outputs to meet customer needs.
Virtual assistants and smart speakers offer great examples of adjusted reality. Alone, a smart speaker is an
example of simple integrated reality. But add a smart light and motion sensor to a smart speaker solution and it's
easy to create a basic solution that turns on the lights when you enter a room.
Factory floors around the world provide additional examples of adjusted reality. During early stages of integrated
reality, sensors on devices detected conditions like overheating, and then alerted a human being through an
application. In adjusted reality, the customer might still be involved, but the feedback loop is tighter. On an
adjusted reality factory floor, one device might detect overheating in a vital machine somewhere along the
assembly line. Somewhere else on the floor, a second device then slows production slightly to allow the machine
to cool and then resume full pace when the condition is resolved. In this situation, the customer is a second-hand
participant. The customer uses your application to set the rules and understand how those rules have affected
production, but they're not necessary to the feedback loop.
The Azure services described in Azure Internet of Things (IoT) solutions, Azure Sphere, and Azure Kinect
Developer Kit could each be components of an adjusted reality solution. Your original application and business
logic would then serve as the intermediary between the environmental input and the change that should be made
in the physical environment.
A digital twin is another example of adjusted reality. This term refers to a digital representation of a physical
device, presented through computer, mobile, or mixed-reality formats. Unlike less sophisticated 3D
models, a digital twin reflects data collected from an actual device in the physical environment. This solution
allows the user to interact with the digital representation in ways that could never be done in the real world. In this
approach, physical devices adjust a mixed reality environment. However, the solution still gathers data from an
integrated reality solution and uses that data to shape the reality of the customer's current surroundings.
In Azure, digital twins are created and accessed through a service called Azure Digital Twins.

Next steps
Now that you have a deeper understanding of device interactions and the ambient experience that's right for your
solution, you're ready to explore the final discipline of innovation, Predict and influence.
Predict and influence
Predict and influence

There are two classes of applications in the digital economy: historical and predictive. Many customer needs can
be met solely by using historical data, including nearly real-time data. Most solutions focus primarily on
aggregating data in the moment. They then process and share that data back to the customer in the form of a
digital or ambient experience.
As predictive modeling becomes more cost-effective and readily available, customers demand forward-thinking
experiences that lead to better decisions and actions. However, that demand doesn't always suggest a predictive
solution. In most cases, a historical view can provide enough data to empower the customer to make a decision on
their own.
Unfortunately, customers often take a myopic view that leads to decisions based on their immediate surroundings
and sphere of influence. As options and decisions grow in number and impact, that myopic view may not serve
the customer's needs. At the same time, as a hypothesis is proven at scale, the company providing the solution can
see across thousands or millions of customer decisions. This big-picture approach makes it possible to see broad
patterns and the impacts of those patterns. Predictive capability is a wise investment when an understanding of
those patterns is necessary to make decisions that best serve the customer.

Examples of predictions and influence


A variety of applications and ambient experiences use data to make predictions:
E-commerce: Based on what other similar consumers have purchased, an e-commerce website suggests
products that may be worth adding to your cart.
Adjusted reality: IoT offers more advanced instances of predictive functionality. For example, a device on an
assembly line detects a rise in a machine's temperature. A cloud-based predictive model determines how to
respond. Based on that prediction, another device slows down the assembly line until the machine can cool.
Consumer products: Cell phones, smart homes, and even your car all use predictive capabilities, which they
exploit to suggest user behavior based on factors like location or time of day. When a prediction and the initial
hypothesis are aligned, the prediction leads to action. At a very mature stage, this alignment can make products
like a self-driving car a reality.

Develop predictive capabilities


Solutions that consistently provide accurate predictive capabilities commonly include five core characteristics:
data, insights, patterns, predictions, and interactions. Each aspect is required to develop predictive capabilities. Like
all great innovations, the development of predictive capabilities requires a commitment to iteration. In each
iteration, one or more of the following characteristics is matured to validate increasingly complex customer
hypotheses.
Caution

If the customer hypothesis developed in Build with customer empathy includes predictive capabilities, the
principles described there might well apply. However, predictive capabilities require significant investment of time
and energy. When predictive capabilities are technical spikes, as opposed to a source of real customer value, we
suggest that you delay predictions until the customer hypotheses have been validated at scale.

Data
Data is the most elemental of the characteristics mentioned earlier. Each of the disciplines for developing digital
inventions generates data. That data, of course, contributes to the development of predictions. For more guidance
on ways to get data into a predictive solution, see Democratizing data and Interacting with devices.
A variety of data sources can be used to deliver predictive capabilities.

Insights
Subject matter experts use data about customer needs and behaviors to develop basic business insights from a
study of raw data. Those insights can pinpoint occurrences of the desired customer behaviors (or, alternatively,
undesirable results). During iterations on the predictions, these insights can aid in identifying potential correlations
that could ultimately generate positive outcomes. For guidance on enabling subject matter experts to develop
insights, see Democratizing data.

Patterns
People have always tried to detect patterns in large volumes of data. Computers were designed for that purpose.
Machine learning accelerates that quest by detecting precisely such patterns; the detected patterns form the machine
learning model. Those patterns are then applied through machine learning algorithms to predict outcomes when a
new set of data is entered into the algorithms.
Using insights as a starting point, machine learning develops and applies predictive models to capitalize on the
patterns in data. Through multiple iterations of training, testing, and adoption, those models and algorithms can
accurately predict future outcomes.
Azure Machine Learning is the cloud-native service in Azure for building and training models based on your data.
This tool also includes a workflow for accelerating the development of machine learning algorithms. This
workflow can be used to develop algorithms through a visual interface or Python.
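As a hedged example, the following sketch submits a training run with the azureml-core Python SDK. The workspace configuration, training script, and compute target name are placeholders; train.py would contain your own pattern-detection (model training) code.

```python
# A minimal sketch of submitting a model-training run with the Azure Machine
# Learning SDK for Python (azureml-core). The workspace config, training
# script, and compute target name are placeholders.
from azureml.core import Experiment, ScriptRunConfig, Workspace

ws = Workspace.from_config()  # reads a config.json downloaded from the portal
experiment = Experiment(workspace=ws, name="churn-prediction")

# train.py (not shown) would contain your pattern-detection code.
run_config = ScriptRunConfig(
    source_directory=".",
    script="train.py",
    compute_target="cpu-cluster",
)

run = experiment.submit(run_config)
run.wait_for_completion(show_output=True)
print(run.get_metrics())  # metrics logged by train.py during training
```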
For more robust machine learning models, ML Services in Azure HDInsight provides a machine learning platform
built on Apache Hadoop clusters. This approach enables more granular control of the underlying clusters, storage,
and compute nodes. Azure HDInsight also offers more advanced integration through tools like ScaleR and
SparkR to create predictions based on integrated and ingested data, even working with data from a stream. The
flight delay prediction solution demonstrates each of these advanced capabilities when used to predict flight
delays based on weather conditions. The HDInsight solution also allows for enterprise controls, such as data
security, network access, and performance monitoring to operationalize patterns.

Predictions
After a pattern is built and trained, you can apply it through APIs, which can make predictions during the delivery
of a digital experience. Most of these APIs are built from a well-trained model based on a pattern in your data. As
more customers deploy everyday workloads to the cloud, the prediction APIs used by cloud providers lead to
ever-faster adoption.
Azure Cognitive Services is an example of a predictive API built by a cloud vendor. This service includes predictive
APIs for content moderation, anomaly detection, and suggestions to personalize content. These APIs are ready to
use and are based on well-known content patterns, which Microsoft has used to train models. Each of those APIs
makes predictions based on the data you feed into the API.
Azure Machine Learning lets you deploy custom-built algorithms, which you can create and train based solely on
your own data. Learn more about deploying predictions with Azure Machine Learning.
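After a model is deployed as a web service, calling it from a digital experience can be as simple as the following sketch. The scoring URI, key, and input schema are placeholders determined entirely by your own deployment.

```python
# A minimal sketch of requesting a prediction from a deployed model endpoint.
# The scoring URI, key, and input schema are placeholders defined by your
# own deployment.
import requests

SCORING_URI = "https://<your-endpoint>/score"
API_KEY = "<endpoint-key>"

payload = {"data": [{"tenure_months": 14, "monthly_spend": 72.40, "support_tickets": 3}]}

response = requests.post(
    SCORING_URI,
    json=payload,
    headers={"Authorization": f"Bearer {API_KEY}"},
)
response.raise_for_status()
print("Predicted outcome:", response.json())
```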
Set up HDInsight clusters discusses the processes for exposing predictions developed for ML Services on Azure
HDInsight.

Interactions
After a prediction is made available through an API, you can use it to influence customer behavior. That influence
takes the form of interactions. An interaction with a machine learning algorithm happens within your other digital
or ambient experiences. As data is collected through the application or experience, it's run through the machine
learning algorithms. When the algorithm predicts an outcome, that prediction can be shared back with the
customer through the existing experience.
Learn more about how to create an ambient experience through an adjusted reality solution.

Next steps
Having acquainted yourself with the Disciplines of invention and the Innovate methodology, you're now ready to
learn how to build with customer empathy.
Build with empathy
Develop digital inventions in Azure

Azure can help accelerate the development of each area of digital invention. This section of the Cloud Adoption
Framework builds on the Innovate methodology. This section shows how you can combine Azure services to
create a toolchain for digital invention.

Alignment to the methodology


There are many combinations of cloud-based tools for digital invention and innovation within Azure. The following
article series demonstrates a few of the tools that closely align with the Innovate methodology. The following
image shows an overview of how different tools align to each type of innovation.

Toolchain
Start with the overview page that relates to the type of digital invention you need to test your hypothesis. That
page provides guidance you can act on, so that you can build with customer empathy.
Here are the types of digital invention in this article series:
Democratize data: Tools for sharing data to solve information-related customer needs
Engage via apps: Tools to create apps that engage customers beyond raw data
Empower adoption: Tools to accelerate customer adoption through digital support for your build-measure-
learn cycles
Interact with devices: Tools to create different levels of ambient experiences for your customers
Predict and influence: Tools for predictive analysis and integration of their output into applications
Tools to democratize data in Azure

As described in the conceptual article on democratizing data, you can deliver many innovations with little technical
investment. Many major innovations require little more than raw data. Democratizing data means investing only as
many resources as needed to engage your customers, so they can apply their existing knowledge to that data.
Starting with data is a quick way to test a hypothesis before expanding into broader, more costly digital inventions.
As you refine more of the hypothesis and begin to adopt the inventions at scale, the following processes will help
you prepare for operational support of the innovation.

Alignment to the methodology


This type of digital invention can be accelerated through each phase of the following processes, as shown in the
preceding image. Technical guidance to accelerate digital invention is listed in the table of contents on the left side
of this page. Those articles are grouped by phase to align guidance with the overall methodology.
Share data: The first step of democratizing data is to share openly.
Govern data: Ensure that any sensitive data is secured, tracked, and governed before sharing.
Centralize data: Sometimes you need to provide a centralized platform for data sharing and governance.
Collect data: Migration, integration, ingestion, and virtualization can each collect existing data to be
centralized, governed, and shared.
In every iteration, cloud adoption teams should go only as deep into the stack as they require to put the focus on
customer needs over architecture. Delaying technical spikes in favor of customer needs accelerates validation of
your hypothesis.
All guidance maps to the four preceding processes, ranging from the approaches with the greatest customer
impact to those with the greatest technical impact. Across each process, you'll see guidance on different ways that
Azure can accelerate your ability to build with customer empathy.

Toolchain
In Azure, the following tools are commonly used to accelerate digital invention across the preceding phases:
Power BI
Azure Data Catalog
Azure SQL Data Warehouse
Azure Cosmos DB
Azure Database for PostgreSQL
Azure Database for MySQL
Azure Database for MariaDB
Azure Database for PostgreSQL Hyperscale
Azure Data Lake
Azure Database Migration Service
Azure SQL Database, with or without managed instances
Azure Data Factory
Azure Stream Analytics
SQL Server Integration Services
Azure Stack
SQL Server Stretch Database
Microsoft Azure StorSimple
Azure Files
Azure File Sync
PolyBase
As the invention approaches adoption at scale, the aspects of each solution require refinement and technical
maturity. As that happens, more of these services are likely to be required. Use the table of contents on the left side
of this page for Azure tools guidance relevant to your hypothesis-testing process.

Get started
The table of contents on the left side of this page outlines many articles. These articles help you get started with
each of the tools in this toolchain.

NOTE
Some links might leave the Cloud Adoption Framework to help you go beyond the scope of this framework.
What is data classification?

Data classification allows you to determine and assign value to your organization's data, and is a common starting
point for governance. The data classification process categorizes data by sensitivity and business impact in order to
identify risks. When data is classified, you can manage it in ways that protect sensitive or important data from theft
or loss.

Understand data risks, then manage them


Before any risk can be managed, it must be understood. In the case of data breach liability, that understanding
starts with data classification. Data classification is the process of associating a metadata characteristic to every
asset in a digital estate, which identifies the type of data associated with that asset.
Any asset identified as a potential candidate for migration or deployment to the cloud should have documented
metadata to record the data classification, business criticality, and billing responsibility. These three points of
classification can go a long way to understanding and mitigating risks.

Classifications Microsoft uses


The following is a list of classifications Microsoft uses. Depending on your industry or existing security
requirements, data classification standards might already exist within your organization. If no standard exists, you
might want to use this sample classification to better understand your own digital estate and risk profile.
Non-business: Data from your personal life that doesn't belong to Microsoft.
Public: Business data that is freely available and approved for public consumption.
General: Business data that isn't meant for a public audience.
Confidential: Business data that can cause harm to Microsoft if overshared.
Highly confidential: Business data that would cause extensive harm to Microsoft if overshared.

Tagging data classification in Azure


Resource tags are a good approach for metadata storage, and you can use these tags to apply data classification
information to deployed resources. Although tagging cloud assets by classification isn't a replacement for a formal
data classification process, it provides a valuable tool for managing resources and applying policy. Azure
Information Protection is an excellent solution to help you classify data itself, regardless of where it sits (on-
premises, in Azure, or somewhere else). Consider it as part of an overall classification strategy.
For additional information on resource tagging in Azure, see Using tags to organize your Azure resources.
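As a minimal sketch of applying classification metadata through resource tags, the following Python snippet uses the Azure SDK to create a resource group with data classification, criticality, and billing tags. The subscription ID, resource group name, and tag values are illustrative assumptions, not recommended standards.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.resource import ResourceManagementClient

# Assumed placeholders: subscription ID, resource group name, and tag values are examples only.
credential = DefaultAzureCredential()
client = ResourceManagementClient(credential, "<subscription-id>")

# Create (or update) a resource group and record its classification metadata as tags.
client.resource_groups.create_or_update(
    "rg-payments-prod",
    {
        "location": "eastus",
        "tags": {
            "DataClassification": "Confidential",
            "Criticality": "Mission-critical",
            "BillingUnit": "Finance",
        },
    },
)
```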

Next steps
Apply data classifications during one of the actionable governance guides.
Choose an actionable governance guide
Collect data through the migration and
modernization of existing data sources

Companies often have different kinds of existing data that they can democratize. When a customer hypothesis
requires the use of existing data to build modern solutions, a first step might be the migration and modernization
of data to prepare for inventions and innovations. To align with existing migration efforts within a cloud adoption
plan, you can more easily do the migration and modernization within the Migrate methodology.

Use of this article


This article outlines a series of approaches that align with the Migrate process. You can best align these approaches
to the standard Migrate toolchain.
During the Assess process within the Migrate methodology, a cloud adoption team assesses the current state and
desired future state for the migrated asset. When that process is part of an innovation effort, both cloud adoption
teams can use this article to help make those assessments.

Primary toolset
When you migrate and modernize on-premises data, the most common Azure tool choice is Azure Database
Migration Service. This service is part of the broader Azure Migrate toolchain. For existing SQL Server data
sources, Data Migration Assistant can help you assess and migrate a small number of data structures.
To support Oracle and NoSQL migrations, you can also use Database Migration Service for certain types of
source-to-target databases. Examples include Oracle to PostgreSQL and MongoDB to Cosmos DB. More
commonly, adoption teams use partner tools or custom scripts to migrate to Azure Cosmos DB, Azure HDInsight,
or virtual machine options based on infrastructure as a service (IaaS).
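When a custom script is the right fit for an Azure Cosmos DB target, a minimal sketch might look like the following Python snippet. The account URI, key, database, container, and sample record are assumptions chosen only for illustration.

```python
from azure.cosmos import CosmosClient, PartitionKey

# Assumed placeholders: account URI, key, and names are examples only.
client = CosmosClient("https://<account>.documents.azure.com:443/", credential="<account-key>")
database = client.create_database_if_not_exists("analytics")
container = database.create_container_if_not_exists(
    id="flights", partition_key=PartitionKey(path="/flightId")
)

# Copy records extracted from the legacy source into the new container.
legacy_records = [{"id": "1", "flightId": "AB123", "delayMinutes": 12}]
for record in legacy_records:
    container.upsert_item(record)
```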

Considerations and guidance


When you use Azure Database Migration Service for migration and modernization of data, it's important to
understand:
The current platform for hosting the data source.
The current version.
The future platform and version that best supports the customer hypothesis or target.
The following table shows source and target pairs to review with the migration team. Each pair includes a tool
choice and a link to a related guide.
Migration type
With an offline migration, application downtime starts when the migration starts. With an online migration,
downtime is limited to the time to cut over at the end of migration.
We suggest that you decide on an acceptable window of business downtime and test an offline migration to check
whether the restoration time fits within that window. If it doesn't, do an online migration.
SOURCE | TARGET | TOOL | MIGRATION TYPE | GUIDANCE

SQL Server | Azure SQL Database | Database Migration Service | Offline | Tutorial

SQL Server | Azure SQL Database | Database Migration Service | Online | Tutorial

SQL Server | Azure SQL Database managed instance | Database Migration Service | Offline | Tutorial

SQL Server | Azure SQL Database managed instance | Database Migration Service | Online | Tutorial

RDS SQL Server | Azure SQL Database or Azure SQL Database managed instance | Database Migration Service | Online | Tutorial

MySQL | Azure Database for MySQL | Database Migration Service | Online | Tutorial

PostgreSQL | Azure Database for PostgreSQL | Database Migration Service | Online | Tutorial

MongoDB | Azure Cosmos DB Mongo API | Database Migration Service | Offline | Tutorial

MongoDB | Azure Cosmos DB Mongo API | Database Migration Service | Online | Tutorial

Oracle | Different platform as a service (PaaS) and IaaS options | A partner's tool or Azure Migrate | Offline or online | Decision tree

Different NoSQL DB options | Azure Cosmos DB or IaaS options | Procedural migrations or Azure Migrate | Offline or online | Decision tree
Tools to engage via apps in Azure

As described in Engage via apps, applications can be an important aspect of an MVP solution. Applications are
often required for testing a hypothesis. This article helps you learn the tools Azure provides to accelerate
development of those applications.

Alignment to the methodology


You can accelerate this type of digital invention through each of the following listed approaches. The preceding
image also shows these approaches. Technical guidance for accelerating digital invention is listed in the table of
contents on the left side of this page. Those articles are grouped by their approaches to aligning guidance with the
overall methodology.
For this article, assume all inventions that result in an application stem from a shared solution as described in
empower adoption. Also assume each application results in some type of customer experience for both internal
and external customers.
Based on these assumptions, the following three paths are the most common for cloud adoption teams who are
developing digital inventions:
Citizen developers: Before engaging professional developers, business subject matter experts use citizen
developer tools. These tools rapidly test and validate that a customer hypothesis can meet the needs of that
customer.
Intelligent experiences: Create modern experiences by using cloud platforms to drive rapid deployment and
short feedback loops. Expand on web applications to infuse intelligence or even integrate bots.
Cloud-native: Build a new invention that naturally takes advantage of cloud capabilities.
Each path results in advantages and disadvantages that are both short-term and long-term. When the cloud
governance team, the cloud operations team, and the cloud center of excellence team are ready to support
every approach, you can accelerate adoption with minimal effect on sustainable business operations.

Toolchain
Depending on the path that the cloud adoption team takes, Azure provides tools to accelerate the team's ability to
build with customer empathy. The following list of Azure offerings is grouped based on the preceding decision
paths. These offerings include:
Azure App Service
Azure Kubernetes Service (AKS)
Azure Migrate
Azure Stack
PowerApps
Microsoft Flow
Power BI

Get started
The table of contents on the left side of this page outlines many articles. These articles help you get started with
each of the tools in this toolchain.

NOTE
Some links might leave the Cloud Adoption Framework to help you go beyond the scope of this framework.
Tools to empower adoption in Azure

As described in Empower adoption, building true innovation at scale requires an investment in removing friction
that could slow adoption. In the early stages of testing a hypothesis, a solution is small. The investment in
removing friction is likely small as well. As hypotheses prove true, the solution and the investment in empowering
adoption grows. This article provides key links to help you get started with each stage of maturity.

Alignment to the methodology


You can accelerate this type of digital invention through the following levels of maturity. These levels align with the
maturity model in the preceding image. Technical guidance to accelerate digital invention is listed in the table of
contents on the left side of this page. Those articles are grouped by maturity level.
Shared solution: Establish a centralized repository for all aspects of the solution.
Feedback loops: Ensure feedback loops can be managed consistently throughout iterations.
Continuous integration: Regularly build and consolidate a solution.
Reliable testing: Validate solution quality and expected changes to ensure that your measurements are reliable.
Solution deployment: Deploy a solution to allow a team to quickly share changes with customers.
Integrated measurement: Add learning metrics to the feedback loop for clear analysis by the full team.

Toolchain
For adoption teams that are mature professional development teams with many contributors, the Azure toolchain
starts with GitHub and Azure DevOps.
As your need grows, you can expand this foundation to use other tool features. The expanded foundation might
involve tools like:
Azure Blueprints
Azure Policy
Azure Resource Manager templates
Azure Monitor
The table of contents on the left side of this page lists guidance for each tool and aligns with the previously
described maturity model.

Get started
The table of contents on the left side of this page outlines many articles. These articles help you get started with
each of the tools in this toolchain.

NOTE
Some links might leave the Cloud Adoption Framework to help you go beyond the scope of this framework.
Tools to interact with devices in Azure

As described in the conceptual article on interacting with devices, the devices used to interact with a customer
depend on the amount of ambient experience required to deliver the customer's need and empower adoption.
The speed from the trigger that prompts a customer's need to your solution's response, and your solution's ability
to meet that need, are determining factors in repeat usage. Ambient experiences help accelerate that response time
and create a better experience for your customers by embedding your solution in the customers' immediate surroundings.

Alignment to the methodology


This type of digital invention can be delivered through any of the following levels of ambient experience. These
levels align with the methodology as shown in the preceding image. The table of contents on the left side of this
page lists technical guidance to accelerate digital invention. Those articles are grouped by level of ambient
experience to align with the methodology.
Mobile experience: Mobile apps are commonly part of a customer's surroundings. In some scenarios, a
mobile device might provide enough interactivity to make a solution ambient.
Mixed reality: Sometimes a customer's natural surroundings must be altered through mixed reality. Engaging
a customer within that mixed reality can provide a form of ambient experience.
Integrated reality: Moving closer to true ambience, integrated reality solutions focus on the use of a
customer's physical device to integrate the solution into natural behaviors.
Adjusted reality: When any of the preceding solutions use predictive analysis to provide an interaction with a
customer within that customer's natural surroundings, that solution creates the highest form of ambient
experience.

Toolchain
In Azure, you commonly use the following tools to accelerate digital invention across each of the preceding levels
of ambient solutions. These tools are grouped based on the amount of experience required to reduce complexity in
aligning tools with those experiences.
Mobile Experience: Azure App Service, PowerApps, Microsoft Flow, Intune
Mixed Reality: Unity, Azure Spatial Anchors, HoloLens
Integrated Reality: Azure IoT Hub, Azure Sphere, Azure Kinect DK
Adjusted Reality: IoT cloud to device, Azure Digital Twins + HoloLens
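As a minimal sketch of the integrated reality path, the following Python snippet sends a telemetry message from a device to Azure IoT Hub. The device connection string and the payload are illustrative assumptions.

```python
from azure.iot.device import IoTHubDeviceClient, Message

# Assumed placeholder: the connection string comes from your own IoT Hub device registration.
client = IoTHubDeviceClient.create_from_connection_string("<device-connection-string>")
client.connect()

# Send a simple JSON telemetry reading that a cloud-side solution can react to.
reading = Message('{"temperature": 21.5}')
reading.content_type = "application/json"
client.send_message(reading)

client.disconnect()
```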

Get started
The table of contents on the left side of this page outlines many articles. These articles help you get started with
each of the tools in this toolchain.

NOTE
Some links might leave the Cloud Adoption Framework to help you go beyond the scope of this framework.
Tools to predict and influence data in Azure

As described in the conceptual article on predict and influence, computers and AI are much better than we are at
seeing patterns. By using cloud-based analytics tools, you can easily detect patterns and apply them to your
customers' needs. Use of these tools results in predictions of the best outcomes. When those predictions are
integrated back into customer experiences, they can influence your customers' behavior patterns through
interactions.

Alignment to the methodology


You can accelerate this type of digital invention through each phase of the following process. The phases align with
the methodology shown in the preceding image. Technical guidance to accelerate digital invention is listed in the
table of contents on the left side of this page. Those articles are grouped by phase to align with the methodology.
In the preceding image, data and insights align with the best practices outlined in the democratizing data article. As
subject matter experts discover insights that might be repeatable, they can use the following three steps to mature
those insights:
Patterns: Find and define patterns to create predictive models.
Predictions: Apply patterns to customer data to predict outcomes based on the model and underlying pattern.
Interactions: Consume the predictions from within an application or data source to drive an interaction with
your customer.

Toolchain
In Azure, the following tools are commonly used to accelerate digital invention across each of the preceding
phases:
Azure Machine Learning
Azure HDInsight
Hadoop R ScaleR
Azure SQL Data Warehouse
How each tool helps with each phase of predict and influence is reflected in the guidance in the table of contents
on the left side of this page.

Get started
The table of contents on the left side of this page outlines many articles. These articles help you get started with
each of the tools in this toolchain.

NOTE
Some links might leave the Cloud Adoption Framework to help you go beyond the scope of this framework.
The cloud creates new paradigms for the technologies that support the business. These new paradigms also change how those
technologies are adopted, managed, and governed. When entire datacenters can be virtually torn down and rebuilt with one
line of code executed by an unattended process, we have to rethink traditional approaches. This is especially true for
governance.

Get started with cloud governance


Cloud governance is an iterative process. For organizations with existing policies that govern on-premises IT environments,
cloud governance should complement those policies. However, the level of corporate policy integration between on-premises
and the cloud varies depending on cloud governance maturity and the digital estate in the cloud. As the cloud estate changes
over time, so do cloud governance processes and policies. The following exercises help you start building your initial
governance foundation.

Methodology
Establish a basic understanding of the methodology that drives cloud governance in the Cloud Adoption Framework to
begin thinking through the end state solution.

Benchmark
Assess your current state and future state to establish a vision for applying the framework.

Initial governance foundation


Begin your governance journey with a small, easily implemented set of governance tools. This initial governance
foundation is called a minimum viable product (MVP).

Improve the initial governance foundation


Throughout implementation of the cloud adoption plan, iteratively add governance controls to address tangible risks as
you progress toward the end state.

Objective of this content


The guidance in this section of the Cloud Adoption Framework serves two purposes:
Provide examples of actionable governance guides that represent common experiences often encountered by customers.
Each example encapsulates business risks, corporate policies for risk mitigation, and design guidance for implementing
technical solutions. By necessity, the design guidance is specific to Azure. All other content in these guides could be applied
in a cloud-agnostic or multicloud approach.
Help you create personalized governance solutions that meet a variety of business needs. These needs include the
governance of multiple public clouds through detailed guidance on the development of corporate policies, processes, and
tooling.
This content is intended for use by the cloud governance team. It's also relevant to cloud architects who need to develop a
strong foundation in cloud governance.

Intended audience
The content in the Cloud Adoption Framework affects the business, technology, and culture of enterprises. This section of the
Cloud Adoption Framework interacts heavily with IT security, IT governance, finance, line-of-business leaders, networking,
identity, and cloud adoption teams. Various dependencies on these personnel require a facilitative approach by the cloud
architects using this guidance. Facilitation with these teams might be a one-time effort. In some cases, interactions with these
other personnel will be ongoing.
The cloud architect serves as the thought leader and facilitator to bring these audiences together. The content in this collection
of guides is designed to help the cloud architect facilitate the right conversation, with the right audience, to drive necessary
decisions. Business transformation that's empowered by the cloud depends on the cloud architect to help guide decisions
throughout the business and IT.
Cloud architect specialization in this section: Each section of the Cloud Adoption Framework represents a different
specialization or variant of the cloud architect role. This section of the Cloud Adoption Framework is designed for cloud
architects with a passion for mitigating or reducing technical risks. Some cloud providers refer to these specialists as cloud
custodians, but we prefer cloud guardians or, collectively, the cloud governance team. In each actionable governance guide, the
articles show how the composition and role of the cloud governance team might change over time.

Use this guide


If you want to follow this guide from beginning to end, this content aids in developing a robust cloud governance strategy in
parallel with cloud implementation. The guidance walks you through the theory and implementation of such a strategy.
For a crash course on the theory and quick access to Azure implementation, get started with the governance guides overview.
Using this guidance, you can start small and iteratively improve your governance needs in parallel with cloud adoption efforts.

Next steps
Establish a basic understanding of the methodology that drives cloud governance in the Cloud Adoption Framework.
Understand the methodology
Cloud governance methodology

Adopting the cloud is a journey, not a destination. Along the way, there are clear milestones and tangible business
benefits. However, the final state of cloud adoption is unknown when a company begins the journey. Cloud
governance creates guardrails that keep the company on a safe path throughout the journey.
The Cloud Adoption Framework provides governance guides that describe the experiences of fictional companies,
which are based on the experiences of real customers. Each guide follows the customer through the governance
aspects of their cloud adoption.

Envision an end state


A journey without a target destination is just wandering. It's important to establish a rough vision of the end state
before taking the first step. The following infographic provides a frame of reference for the end state. It's not your
starting point, but it shows your potential destination.

The Cloud Adoption Framework governance model identifies key areas of importance during the journey. Each
area relates to different types of risks the company must address as it adopts more cloud services. Within this
framework, the governance guide identifies required actions for the cloud governance team. Along the way, each
principle of the Cloud Adoption Framework governance model is described further. Broadly, these include:
Corporate policies: Corporate policies drive cloud governance. The governance guide focuses on specific aspects
of corporate policy:
Business risks: Identifying and understanding corporate risks.
Policy and compliance: Converting risks into policy statements that support any compliance requirements.
Processes: Ensuring adherence to the stated policies.
Five Disciplines of Cloud Governance: These disciplines support the corporate policies. Each discipline protects
the company from potential pitfalls:
Cost Management
Security Baseline
Resource Consistency
Identity Baseline
Deployment Acceleration
Essentially, corporate policies serve as the early warning system to detect potential problems. The disciplines help
the company manage risks and create guardrails.

Grow to the end state


Because governance requirements will change throughout the cloud adoption journey, a different approach to
governance is required. Companies can no longer wait for a small team to build guardrails and roadmaps on
every highway before taking the first step. Business results are expected more quickly and smoothly. IT
governance must also move quickly and keep pace with business demands to stay relevant during cloud adoption
and avoid "shadow IT."
An incremental governance approach empowers these traits. Incremental governance relies on a small set of
corporate policies, processes, and tools to establish a foundation for adoption and governance. That foundation is
called a minimum viable product (MVP). An MVP allows the governance team to quickly incorporate
governance into implementations throughout the adoption lifecycle. An MVP can be established at any point
during the cloud adoption process. However, it's a good practice to adopt an MVP as early as possible.
The ability to respond rapidly to changing risks empowers the cloud governance team to engage in new ways. The
cloud governance team can join the cloud strategy team as scouts, moving ahead of the cloud adoption teams,
plotting routes, and quickly establishing guardrails to manage risks associated with the adoption plans. These just-
in-time governance layers are known as governance iterations. With this approach, governance strategy grows
one step ahead of the cloud adoption teams.
The following diagram shows a simple governance MVP and three governance iterations. During the iterations,
additional corporate policies are defined to remediate new risks. The Deployment Acceleration discipline then
applies those changes across each deployment.

NOTE
Governance is not a replacement for key functions such as security, networking, identity, finance, DevOps, or operations.
Along the way, there will be interactions with and dependencies on members from each function. Those members should be
included on the cloud governance team to accelerate decisions and actions.

Next steps
Use the Cloud Adoption Framework governance benchmark tool to assess your transformation journey and help
you identify gaps in your organization across six key domains as defined in the framework.
Assess your transformation journey
The Cloud Adoption Framework provides a governance benchmark tool to help you identify gaps in your organization across six
key domains as defined in the framework.

Governance benchmark tool


The governance benchmark tool provides a personalized report that outlines the difference between your current state and
business priorities, along with tailored resources to help you get started.



Assess your current state and future state to establish a vision for applying the framework.

Next steps
Begin your governance journey with a small, easily implemented set of governance tools. This initial governance foundation is
called a minimum viable product (MVP).
Establish an initial governance foundation
Establishing cloud governance is a broad iterative effort. It is challenging to strike an effective balance between speed and control,
especially during early phases of cloud adoption. The governance guidance in the Cloud Adoption Framework helps provide that
balance via an agile approach to adoption.
This article provides two options for establishing an initial foundation for governance. Either option ensures that governance
constraints can be scaled and expanded as the adoption plan is implemented and requirements become more clearly defined. By
default, the initial foundation assumes an isolate-and-control position. It also focuses more on resource organization than on
resource governance. This lightweight starting point is called a minimum viable product (MVP) for governance. The objective of
the MVP is to reduce barriers to establishing an initial governance position, and then to enable rapid maturation of the solution to
address a variety of tangible risks.

Already using the Cloud Adoption Framework


If you have been following along with the Cloud Adoption Framework, you may already have deployed a governance MVP.
Governance is a core aspect of any operating model. It is present during every phase of the cloud adoption lifecycle. As such, the
Cloud Adoption Framework provides guidance that injects governance into activities related to the implementation of your cloud
adoption plan. One example of this governance integration is using blueprints to deploy one or more landing zones present in the
ready guidance. Another example is guidance for scaling out subscriptions. If you have followed either of those recommendations,
then the following MVP sections are simply a review of your existing deployment decisions. After a quick review, jump ahead to
Mature the initial governance solution and apply best-practice controls.

Establish an initial governance foundation


The following are two different examples of initial governance foundations (also called governance MVPs) to apply a sound
foundation for governance to new or existing deployments. Choose the MVP that best aligns with your business needs to get
started:

Standard governance guide


A guide for most organizations based on the recommended two-subscription model, designed for deployments in multiple
regions but not spanning public and sovereign/government clouds.

Governance guide for complex enterprises


A guide for enterprises that are managed by multiple independent IT business units or span public and
sovereign/government clouds.

Next steps
Once a governance foundation is in place, apply suitable recommendations to improve the solution and protect against tangible
risks.
Improve the initial governance foundation
This article assumes that you have established an initial cloud governance foundation. As your cloud adoption plan is
implemented, tangible risks will emerge from the proposed approaches by which teams want to adopt the cloud. As these risks
surface in release planning conversations, use the following grid to quickly identify a few best practices for getting ahead of the
adoption plan to prevent risks from becoming real threats.

Maturity vectors
At any time, the following best practices can be applied to the initial governance foundation to address the risk or need mentioned
in the table below.
IMPORTANT

Resource organization can affect how these best practices are applied. It is important to start with the recommendations that best
align with the initial cloud governance foundation you implemented in the previous step.

RISK/NEED | STANDARD ENTERPRISE | COMPLEX ENTERPRISE

Sensitive data in the cloud | Discipline improvement | Discipline improvement

Mission-critical apps in the cloud | Discipline improvement | Discipline improvement

Cloud cost management | Discipline improvement | Discipline improvement

Multicloud | Discipline improvement | Discipline improvement

Complex/legacy identity management | N/A | Discipline improvement

Multiple layers of governance | N/A | Discipline improvement

Next steps
In addition to the application of best practices, the governance methodology in the Cloud Adoption Framework can be
customized to fit unique business constraints. After following the applicable recommendations, evaluate corporate policy to
understand additional customization requirements.
Evaluate corporate policy
The actionable governance guides in this section illustrate the incremental approach of the Cloud Adoption Framework
governance model, based on the governance methodology previously described. You can establish an agile approach to cloud
governance that will grow to meet the needs of any cloud governance scenario.

Review and adopt cloud governance best practices


To begin your cloud adoption journey, choose one of the following governance guides. Each guide outlines a set of best
practices, based on a set of fictional customer experiences. For readers who are new to the incremental approach of the Cloud
Adoption Framework governance model, review the high-level introduction to governance theory below before adopting
either set of best practices.

Standard governance guide


A guide for most organizations based on the recommended two-subscription model, designed for deployments in
multiple regions but not spanning public and sovereign/government clouds.

Governance guide for complex enterprises


A guide for enterprises that are managed by multiple independent IT business units or span public and
sovereign/government clouds.

An incremental approach to cloud governance


Choose a governance guide
The guides demonstrate how to implement a governance MVP. From there, each guide shows how the cloud governance
team can work ahead of the cloud adoption teams as a partner to accelerate adoption efforts. The Cloud Adoption Framework
governance model guides the application of governance from foundation through subsequent improvements and evolutions.
To begin a governance journey, choose one of the two options below. The options are based on synthesized customer
experiences. The titles are based on the complexity of the enterprise for ease of navigation. However, the reader's decision may
be more complex. The following tables outline the differences between the two options.
WARNING

A more robust governance starting point may be required. In such cases, consider the Azure Virtual Datacenter approach
briefly described below. This approach is commonly suggested during enterprise-scale adoption efforts, and especially for
efforts which exceed 10,000 assets. It is also the de facto choice for complex governance scenarios when any of the following
are required: extensive third-party compliance requirements, deep domain expertise, or parity with mature IT governance
policies and compliance requirements.
NOTE

It's unlikely that either guide aligns completely to your situation. Choose whichever guide is closest and use it as a starting
point. Throughout the guide, additional information is provided to help you customize decisions to meet specific criteria.

Business characteristics
CHARACTERISTIC | STANDARD ORGANIZATION | COMPLEX ENTERPRISE

Geography (country or geopolitical region) | Customers or staff reside largely in one geography. | Customers or staff reside in multiple geographies or require sovereign clouds.

Business units affected | Business units that share a common IT infrastructure. | Multiple business units that do not share a common IT infrastructure.

IT budget | Single IT budget. | Budget allocated across business units and currencies.

IT investments | Capital expense-driven investments are planned yearly and usually cover only basic maintenance. | Capital expense-driven investments are planned yearly and often include maintenance and a refresh cycle of three to five years.

Current state before adopting cloud governance


STATE | STANDARD ENTERPRISE | COMPLEX ENTERPRISE

Datacenter or third-party hosting providers | Fewer than five datacenters | More than five datacenters

Networking | No WAN, or one to two WAN providers | Complex network or global WAN

Identity | Single forest, single domain. | Complex, with multiple forests and multiple domains.

Desired future state after incremental improvement of cloud governance


STATE | STANDARD ORGANIZATION | COMPLEX ENTERPRISE

Cost Management – cloud accounting | Showback model. Billing is centralized through IT. | Chargeback model. Billing could be distributed through IT procurement.

Security Baseline – protected data | Company financial data and IP. Limited customer data. No third-party compliance requirements. | Multiple collections of customers' financial and personal data. May need to consider third-party compliance.

Azure Virtual Datacenter


Azure Virtual Datacenter is an approach to making the most of the Azure cloud platform's capabilities while respecting an
enterprise's security and governance requirements.
Compared to traditional on-premises environments, Azure allows workload development teams and their business sponsors
to take advantage of the increased deployment agility that cloud platforms offer. However, as your cloud adoption efforts
expand to include mission-critical data and workloads, this agility may conflict with corporate security and policy compliance
requirements established by your IT teams. This is especially true for large enterprises that have existing sophisticated
governance and regulatory requirements.
The Azure Virtual Datacenter approach aims to address these concerns earlier in the adoption lifecycle by providing models,
reference architectures, sample automation artifacts, and guidance to help achieve a balance between developer and IT
governance requirements during enterprise cloud adoption efforts. Central to this approach is the concept of a virtual
datacenter itself: the implementation of isolation boundaries around your cloud infrastructure through the application of
access and security controls, network policies, and compliance monitoring.
A virtual datacenter can be thought of as your own isolated cloud within the Azure platform, integrating management
processes, regulatory requirements, and security processes required by your governance policies. Within this virtual
boundary, Azure Virtual Datacenter offers example models for deploying workloads while ensuring consistent compliance and
provides basic guidance on implementing an organization's separation of roles and responsibilities in the cloud.

Azure Virtual Datacenter assumptions


Although smaller teams may benefit from the models and recommendations the Azure Virtual Datacenter provides, this
approach is designed to guide enterprise IT groups managing large cloud environments. For organizations that meet the
following criteria it's recommended that you consider consulting the Azure Virtual Datacenter guidance when designing your
Azure-based cloud infrastructure:
Your enterprise is subject to regulatory compliance requirements that require centralized monitoring and audit capabilities.
You need to maintain common policy and governance compliance and central IT control over core services.
Your industry depends on a complex platform which requires complex controls and deep domain expertise to govern the
platform. This is most common in large enterprises within finance, oil and gas, or manufacturing.
Your existing IT governance policies require tighter parity with existing features, even during early stage adoption.
For more information, visit the Azure Virtual Datacenter section of the Cloud Adoption Framework.

Next steps
Choose one of these guides:
Standard enterprise governance guide
Governance guide for complex enterprises
Standard enterprise governance guide

Overview of best practices


This governance guide follows the experiences of a fictional company through various stages of governance
maturity. It is based on real customer experiences. The best practices are based on the constraints and needs of
the fictional company.
As a quick starting point, this overview defines a minimum viable product (MVP) for governance based on best
practices. It also provides links to some governance improvements that add further best practices as new
business or technical risks emerge.

WARNING
This MVP is a baseline starting point, based on a set of assumptions. Even this minimal set of best practices is based on
corporate policies driven by unique business risks and risk tolerances. To see if these assumptions apply to you, read the
longer narrative that follows this article.

Governance best practices


These best practices serve as a foundation for an organization to quickly and consistently add governance
guardrails across your subscriptions.
Resource organization
The following diagram shows the governance MVP hierarchy for organizing resources.

Every application should be deployed in the proper area of the management group, subscription, and resource
group hierarchy. During deployment planning, the cloud governance team will create the necessary nodes in the
hierarchy to empower the cloud adoption teams.
1. One management group for each type of environment (such as production, development, and test).
2. Two subscriptions, one for production workloads and another for nonproduction workloads.
3. Consistent nomenclature should be applied at each level of this grouping hierarchy.
4. Resource groups should be deployed in a manner that considers the lifecycle of their contents: everything that
is developed together, managed together, and retired together belongs in the same resource group. For more
information on resource group best practices, see here.
5. Region selection is incredibly important and must be considered so that networking, monitoring, and auditing
can be in place for failover and failback, and to confirm that the needed SKUs are available in the preferred
regions.
Here is an example of this pattern in use:

These patterns provide room for growth without complicating the hierarchy unnecessarily.

NOTE
In the event of changes to your business requirements, Azure management groups allow you to easily reorganize your
management hierarchy and subscription group assignments. However, keep in mind that policy and role assignments
applied to a management group are inherited by all subscriptions underneath that group in the hierarchy. If you plan to
reassign subscriptions between management groups, make sure that you are aware of any policy and role assignment
changes that may result. See the Azure management groups documentation for more information.

Governance of resources
A set of global policies and RBAC roles will provide a baseline level of governance enforcement. To meet the
cloud governance team's policy requirements, implementing the governance MVP requires completing the
following tasks:
1. Identify the Azure Policy definitions needed to enforce business requirements. This can include using built-in
definitions and creating new custom definitions.
2. Create a blueprint definition using these built-in and custom policy and the role assignments required by the
governance MVP.
3. Apply policies and configuration globally by assigning the blueprint definition to all subscriptions.
Identify policy definitions
Azure provides several built-in policies and role definitions that you can assign to any management group,
subscription, or resource group. Many common governance requirements can be handled using built-in
definitions. However, it's likely that you will also need to create custom policy definitions to handle your specific
requirements.
Custom policy definitions are saved to either a management group or a subscription and are inherited through
the management group hierarchy. If a policy definition's save location is a management group, that policy
definition is available to assign to any of that group's child management groups or subscriptions.
Since the policies required to support the governance MVP are meant to apply to all current subscriptions, the
following business requirements will be implemented using a combination of built-in definitions and custom
definitions created in the root management group:
1. Restrict the list of available role assignments to a set of built-in Azure roles authorized by your cloud
governance team. This requires a custom policy definition.
2. Require the following tags on all resources: Department/Billing Unit, Geography, Data Classification,
Criticality, SLA, Environment, Application Archetype, Application, and Application Owner. This can be handled
using the Require specified tag built-in definition.
3. Require that the Application tag for resources should match the name of the relevant resource group. This
can be handled using the "Require tag and its value" built-in definition.
For information on defining custom policies see the Azure Policy documentation. For guidance and examples of
custom policies, consult the Azure Policy samples site and the associated GitHub repository.
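As a minimal sketch of how a custom definition like the ones above might be created and assigned programmatically, the following Python snippet defines a policy that denies resources missing an Environment tag and assigns it at subscription scope. The subscription ID, definition names, and tag name are assumptions chosen for illustration, and the dictionary shapes follow the general Azure Policy definition structure rather than a prescribed template.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.resource import PolicyClient

# Assumed placeholders: subscription ID, definition name, and tag name are examples only.
credential = DefaultAzureCredential()
policy_client = PolicyClient(credential, "<subscription-id>")

# Custom definition: deny any resource that is missing an "Environment" tag.
definition = policy_client.policy_definitions.create_or_update(
    "require-environment-tag",
    {
        "display_name": "Require an Environment tag on all resources",
        "policy_type": "Custom",
        "mode": "Indexed",
        "policy_rule": {
            "if": {"field": "tags['Environment']", "exists": "false"},
            "then": {"effect": "deny"},
        },
    },
)

# Assign the definition at subscription scope so it applies to all resources in the subscription.
policy_client.policy_assignments.create(
    "/subscriptions/<subscription-id>",
    "require-environment-tag-assignment",
    {"policy_definition_id": definition.id},
)
```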
Assign Azure Policy and RBAC roles using Azure Blueprints
Azure policies can be assigned at the resource group, subscription, and management group level, and can be
included in Azure Blueprints definitions. Although the policy requirements defined in this governance MVP apply
to all current subscriptions, it's very likely that future deployments will require exceptions or alternative policies.
As a result, assigning policy using management groups, with all child subscriptions inheriting these assignments,
may not be flexible enough to support these scenarios.
Azure Blueprints allow the consistent assignment of policy and roles, application of Resource Manager templates,
and deployment of resource groups across multiple subscriptions. As with policy definitions, blueprint definitions
are saved to management groups or subscriptions, and are available through inheritance to any children in the
management group hierarchy.
The cloud governance team has decided that enforcement of required Azure Policy and RBAC assignments
across subscriptions will be implemented through Azure Blueprints and associated artifacts:
1. In the root management group, create a blueprint definition named governance-baseline .
2. Add the following blueprint artifacts to the blueprint definition:
a. Policy assignments for the custom Azure Policy definitions defined at the management group root.
b. Resource group definitions for any groups required in subscriptions created or governed by the
Governance MVP.
c. Standard role assignments required in subscriptions created or governed by the Governance MVP.
3. Publish the blueprint definition.
4. Assign the governance-baseline blueprint definition to all subscriptions.

See the Azure Blueprints documentation for more information on creating and using blueprint definitions.
Secure hybrid VNet
Specific subscriptions often require some level of access to on-premises resources. This is common in migration
scenarios or dev scenarios where dependent resources reside in the on-premises datacenter.
Until trust in the cloud environment is fully established it's important to tightly control and monitor any allowed
communication between the on-premises environment and cloud workloads, and that the on-premises network
is secured against potential unauthorized access from cloud-based resources. To support these scenarios, the
governance MVP adds the following best practices:
1. Establish a cloud secure hybrid VNet (a minimal virtual network sketch follows this list).
a. The VPN reference architecture establishes a pattern and deployment model for creating a VPN
Gateway in Azure.
b. Validate that on-premises security and traffic management mechanisms treat connected cloud
networks as untrusted. Resources and services hosted in the cloud should only have access to
authorized on-premises services.
c. Validate that the local edge device in the on-premises datacenter is compatible with Azure VPN
Gateway requirements and is configured to access the public internet.
d. Note that VPN tunnels should not be considered production-ready circuits for anything but the
simplest workloads. Anything beyond a few simple workloads that require on-premises connectivity
should use Azure ExpressRoute.
2. In the root management group, create a second blueprint definition named secure-hybrid-vnet .
a. Add the Resource Manager template for the VPN Gateway as an artifact to the blueprint definition.
b. Add the Resource Manager template for the virtual network as an artifact to the blueprint definition.
c. Publish the blueprint definition.
3. Assign the secure-hybrid-vnet blueprint definition to any subscriptions requiring on-premises connectivity.
This definition should be assigned in addition to the governance-baseline blueprint definition.
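As a minimal sketch of the virtual network portion of this pattern, the following Python snippet creates a virtual network that includes the GatewaySubnet a VPN gateway requires. The resource group, names, region, and address ranges are illustrative assumptions rather than recommended values.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.network import NetworkManagementClient

# Assumed placeholders: subscription ID, resource group, names, and address ranges are examples only.
credential = DefaultAzureCredential()
network_client = NetworkManagementClient(credential, "<subscription-id>")

poller = network_client.virtual_networks.begin_create_or_update(
    "rg-connectivity",
    "vnet-secure-hybrid",
    {
        "location": "eastus",
        "address_space": {"address_prefixes": ["10.10.0.0/16"]},
        "subnets": [
            # GatewaySubnet is the subnet name a VPN gateway expects to find.
            {"name": "GatewaySubnet", "address_prefix": "10.10.255.0/27"},
            {"name": "workloads", "address_prefix": "10.10.1.0/24"},
        ],
    },
)
vnet = poller.result()
print(f"Created virtual network {vnet.name}")
```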

One of the biggest concerns raised by IT security and traditional governance teams is the risk that early stage
cloud adoption will compromise existing assets. The above approach allows cloud adoption teams to build and
migrate hybrid solutions, with reduced risk to on-premises assets. As trust in the cloud environment increases,
later evolutions may remove this temporary solution.

NOTE
The above is a starting point to quickly create a baseline governance MVP. This is only the beginning of the governance
journey. Further evolution will be needed as the company continues to adopt the cloud and takes on more risk in the
following areas:
Mission-critical workloads
Protected data
Cost management
Multicloud scenarios
Moreover, the specific details of this MVP are based on the example journey of a fictional company, described in the articles
that follow. We highly recommend becoming familiar with the other articles in this series before implementing this best
practice.

Iterative governance improvements


Once this MVP has been deployed, additional layers of governance can be incorporated into the environment
quickly. Here are some ways to improve the MVP to meet specific business needs:
Security Baseline for protected data
Resource configurations for mission-critical applications
Controls for Cost Management
Controls for multicloud evolution

What does this guidance provide?


In the MVP, practices and tools from the Deployment Acceleration discipline are established to quickly apply
corporate policy. In particular, the MVP uses Azure Blueprints, Azure Policy, and Azure management groups to
apply a few basic corporate policies, as defined in the narrative for this fictional company. Those corporate
policies are applied using Resource Manager templates and Azure policies to establish a small baseline for
identity and security.
Incremental improvement of governance practices
Over time, this governance MVP will be used to improve governance practices. As adoption advances, business
risk grows. Various disciplines within the Cloud Adoption Framework governance model will change to manage
those risks. Later articles in this series discuss the incremental improvement of corporate policy affecting the
fictional company. These improvements happen across three disciplines:
Cost Management, as adoption scales.
Security Baseline, as protected data is deployed.
Resource Consistency, as IT Operations begins supporting mission-critical workloads.

Next steps
Now that you're familiar with the governance MVP and have an idea of the governance improvements to follow,
read the supporting narrative for additional context.
Read the supporting narrative
Standard enterprise governance guide: The narrative
behind the governance strategy

The following narrative describes the use case for governance during a standard enterprise's cloud adoption
journey. Before implementing the journey, it's important to understand the assumptions and rationale that are
reflected in this narrative. Then you can better align the governance strategy to your own organization's journey.

Back story
The board of directors started the year with plans to energize the business in several ways. They are pushing
leadership to improve customer experiences to gain market share. They are also pushing for new products and
services that will position the company as a thought leader in the industry. They also initiated a parallel effort to
reduce waste and cut unnecessary costs. Though intimidating, the actions of the board and leadership show that
this effort is focusing as much capital as possible on future growth.
In the past, the company's CIO has been excluded from these strategic conversations. However, because the future
vision is intrinsically linked to technical growth, IT has a seat at the table to help guide these big plans. IT is now
expected to deliver in new ways. The team isn't prepared for these changes and is likely to struggle with the
learning curve.

Business characteristics
The company has the following business profile:
All sales and operations reside in a single country, with a low percentage of global customers.
The business operates as a single business unit, with budget aligned to functions, including Sales, Marketing,
Operations, and IT.
The business views most of IT as a capital drain or a cost center.

Current state
Here is the current state of the company's IT and cloud operations:
IT operates two hosted infrastructure environments. One environment contains production assets. The second
environment contains disaster recovery and some dev/test assets. These environments are hosted by two
different providers. IT refers to these two datacenters as Prod and DR respectively.
IT entered the cloud by migrating all end-user email accounts to Office 365. This migration was completed six
months ago. Few other IT assets have been deployed to the cloud.
The application development teams are working in a dev/test capacity to learn about cloud-native capabilities.
The business intelligence (BI) team is experimenting with big data in the cloud and curation of data on new
platforms.
The company has a loosely defined policy stating that personal customer data and financial data cannot be
hosted in the cloud, which limits mission-critical applications in the current deployments.
IT investments are controlled largely by capital expense. Those investments are planned yearly. In the past
several years, investments have included little more than basic maintenance requirements.

Future state
The following changes are anticipated over the next several years:
The CIO is reviewing the policy on personal data and financial data to allow for the future state goals.
The application development and BI teams want to release cloud-based solutions to production over the next
24 months based on the vision for customer engagement and new products.
This year, the IT team will finish retiring the disaster recovery workloads of the DR datacenter by migrating
2,000 VMs to the cloud. This is expected to produce an estimated $25M USD cost savings over the next five
years.

The company plans to change how it makes IT investments by repositioning the committed capital expense as
an operating expense within IT. This change will provide greater cost control and enable IT to accelerate other
planned efforts.

Next steps
The company has developed a corporate policy to shape the governance implementation. The corporate policy
drives many of the technical decisions.
Review the initial corporate policy
Standard enterprise governance guide: Initial
corporate policy behind the governance strategy

The following corporate policy defines an initial governance position, which is the starting point for this guide. This
article defines early-stage risks, initial policy statements, and early processes to enforce policy statements.

NOTE
The corporate policy is not a technical document, but it drives many technical decisions. The governance MVP described in
the overview ultimately derives from this policy. Before implementing a governance MVP, your organization should develop a
corporate policy based on your own objectives and business risks.

Cloud governance team


In this narrative, the cloud governance team consists of two systems administrators who have recognized the
need for governance. Over the next several months, they will inherit the job of cleaning up the governance of the
company's cloud presence, earning them the title of cloud custodians. In subsequent iterations, this title will likely
change.

Objective
The initial objective is to establish a foundation for governance agility. An effective Governance MVP allows the
governance team to stay ahead of cloud adoption and implement guardrails as the adoption plan changes.

Business risks
The company is at an early stage of cloud adoption, experimenting and building proofs of concept. Risks are now
relatively low, but future risks are likely to have a significant impact. There is little definition around the final state
of the technical solutions to be deployed to the cloud. In addition, the cloud readiness of IT employees is low. A
foundation for cloud adoption will help the team safely learn and grow.
Future-proofing: There is a risk of not empowering growth, but also a risk of not providing the right protections
against future risks.
An agile yet robust governance approach is needed to support the board's vision for corporate and technical
growth. Failure to implement such a strategy will slow technical growth, potentially risking current and future
market share growth. The impact of such a business risk is unquestionably high. However, the role IT will play in
those potential future states is unknown, making the risk associated with current IT efforts relatively high. That
said, until more concrete plans are aligned, the business has a high tolerance for risk.
This business risk can be broken down tactically into several technical risks:
Well-intended corporate policies could slow transformation efforts or break critical business processes, if not
considered within a structured approval flow.
The application of governance to deployed assets could be difficult and costly.
Governance may not be properly applied across an application or workload, creating gaps in security.
With so many teams working in the cloud, there is a risk of inconsistency.
Costs may not properly align to business units, teams, or other budgetary management units.
The use of multiple identities to manage various deployments could lead to security issues.
Despite current policies, there is a risk that protected data could be mistakenly deployed to the cloud.

Tolerance indicators
The current tolerance for risk is high and the appetite for investing in cloud governance is low. As such, the
tolerance indicators act as an early warning system to trigger more investment of time and energy. If and when the
following indicators are observed, you should iteratively improve the governance strategy.
Cost Management: The scale of deployment exceeds predetermined limits on number of resources or
monthly cost.
Security Baseline: Inclusion of protected data in defined cloud adoption plans.
Resource Consistency: Inclusion of any mission-critical applications in defined cloud adoption plans.

Policy statements
The following policy statements establish the requirements needed to remediate the defined risks. These policies
define the functional requirements for the governance MVP. Each will be represented in the implementation of the
governance MVP.
Cost Management:
For tracking purposes, all assets must be assigned to an application owner within one of the core business
functions.
When cost concerns arise, additional governance requirements will be established with the finance team.
Security Baseline:
Any asset deployed to the cloud must have an approved data classification.
No assets identified with a protected level of data may be deployed to the cloud, until sufficient requirements
for security and governance can be approved and implemented.
Until minimum network security requirements can be validated and governed, cloud environments are seen as
a demilitarized zone and should meet similar connection requirements to other datacenters or internal
networks.
Resource Consistency:
Because no mission-critical workloads are deployed at this stage, there are no SLA, performance, or BCDR
requirements to be governed.
When mission-critical workloads are deployed, additional governance requirements will be established with IT
operations.
Identity Baseline:
All assets deployed to the cloud should be controlled using identities and roles approved by current governance
policies.
All groups in the on-premises Active Directory infrastructure that have elevated privileges should be mapped to
an approved RBAC role.
Deployment Acceleration:
All assets must be grouped and tagged according to defined grouping and tagging strategies.
All assets must use an approved deployment model.
Once a governance foundation has been established for a cloud provider, any deployment tooling must be
compatible with the tools defined by the governance team.
Processes
No budget has been allocated for ongoing monitoring and enforcement of these governance policies. Because of
that, the cloud governance team has some ad hoc ways to monitor adherence to policy statements.
Education: The cloud governance team is investing time to educate the cloud adoption teams on the
governance guides that support these policies.
Deployment reviews: Before deploying any asset, the cloud governance team will review the governance
guide with the cloud adoption teams.

Next steps
This corporate policy prepares the cloud governance team to implement the governance MVP, which will be the
foundation for adoption. The next step is to implement this MVP.
Best practices explained
Standard enterprise governance guide: Best practices
explained

The governance guide starts with a set of initial corporate policies. These policies are used to establish a
governance MVP that reflects best practices.
In this article, we discuss the high-level strategies that are required to create a governance MVP. The core of the
governance MVP is the Deployment Acceleration discipline. The tools and patterns applied at this stage will enable
the incremental improvements needed to expand governance in the future.

Governance MVP (initial governance foundation)


Rapid adoption of governance and corporate policy is achievable, thanks to a few simple principles and cloud-based governance tooling. The first three disciplines to approach in any governance process are introduced below, and each is described further in this article.
To establish the starting point, this article will discuss the high-level strategies behind Identity Baseline, Security
Baseline, and Deployment Acceleration that are required to create a governance MVP, which will serve as the
foundation for all adoption.

Implementation process
The implementation of the governance MVP has dependencies on Identity, Security, and Networking. Once the
dependencies are resolved, the cloud governance team will decide a few aspects of governance. The decisions from
the cloud governance team and from supporting teams will be implemented through a single package of
enforcement assets.
This implementation can also be described using a simple checklist:
1. Solicit decisions regarding core dependencies: Identity, Networking, Monitoring, and Encryption.
2. Determine the pattern to be used during corporate policy enforcement.
3. Determine the appropriate governance patterns for the Resource Consistency, Resource Tagging, and Logging
and Reporting disciplines.
4. Implement the governance tools aligned to the chosen policy enforcement pattern to apply the dependent
decisions and governance decisions.

Dependent decisions
The following decisions come from teams outside of the cloud governance team. The implementation of each will
come from those same teams. However, the cloud governance team is responsible for implementing a solution to
validate that those implementations are consistently applied.
Identity Baseline
Identity Baseline is the fundamental starting point for all governance. Before attempting to apply governance,
identity must be established. The established identity strategy will then be enforced by the governance solutions. In
this governance guide, the Identity Management team implements the Directory Synchronization pattern:
RBAC will be provided by Azure Active Directory (Azure AD), using the directory synchronization or "Same Sign-On" approach that was implemented during the company's migration to Office 365. For implementation guidance, see Reference Architecture for Azure AD Integration.
The Azure AD tenant will also govern authentication and access for assets deployed to Azure.
In the governance MVP, the governance team will enforce application of the replicated tenant through subscription
governance tooling, discussed later in this article. In future iterations, the governance team could also enforce rich
tooling in Azure AD to extend this capability.
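As a minimal sketch of how this identity decision could be enforced, the following Azure CLI commands grant a synchronized Azure AD group an RBAC role at subscription scope. The group name, role, and subscription ID are hypothetical placeholders, not values defined by this narrative.

```bash
# Look up the object ID of a group synchronized from on-premises Active Directory.
# "CloudOps-Admins" is a hypothetical group name; older CLI versions expose this as objectId.
GROUP_ID=$(az ad group show --group "CloudOps-Admins" --query id --output tsv)

# Grant the group an approved built-in role on a placeholder subscription so that
# access to cloud assets is controlled only by identities and roles the governance team approves.
az role assignment create \
  --assignee-object-id "$GROUP_ID" \
  --assignee-principal-type Group \
  --role "Contributor" \
  --scope "/subscriptions/00000000-0000-0000-0000-000000000000"
```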
Security Baseline: Networking
Software-defined networking is an important initial aspect of the Security Baseline. Establishing the governance
MVP depends on early decisions from the Security Management team to define how networks can be safely
configured.
Given the lack of requirements, IT security is playing it safe and requires a Cloud DMZ Pattern. That means
governance of the Azure deployments themselves will be very light.
Azure subscriptions may connect to an existing datacenter via VPN, but must follow all existing on-premises IT
governance policies regarding connection of a demilitarized zone to protected resources. For implementation
guidance regarding VPN connectivity, see VPN Reference Architecture.
Decisions regarding subnet, firewall, and routing are currently being deferred to each application/workload
lead.
Additional analysis is required before releasing any protected data or mission-critical workloads.
In this pattern, cloud networks can only connect to on-premises resources over an existing VPN that is compatible
with Azure. Traffic over that connection will be treated like any traffic coming from a demilitarized zone. Additional
considerations may be required on the on-premises edge device to securely handle traffic from Azure.
The cloud governance team has proactively invited members of the networking and IT security teams to regular
meetings, in order to stay ahead of networking demands and risks.
Security Baseline: Encryption
Encryption is another fundamental decision within the Security Baseline discipline. Because the company currently
does not yet store any protected data in the cloud, the Security Team has decided on a less aggressive pattern for
encryption. At this point, a cloud-native pattern for encryption is suggested but not required of any
development team.
No governance requirements have been set regarding the use of encryption, because the current corporate
policy does not permit mission-critical or protected data in the cloud.
Additional analysis will be required before releasing any protected data or mission-critical workloads.

Policy enforcement
The first decision to make regarding Deployment Acceleration is the pattern for enforcement. In this narrative, the
governance team decided to implement the Automated Enforcement pattern.
Azure Security Center will be made available to the security and identity teams to monitor security risks. Both
teams are also likely to use Security Center to identify new risks and improve corporate policy.
RBAC is required in all subscriptions to govern authentication enforcement.
Azure Policy will be published to each management group and applied to all subscriptions (a minimal example follows this list). However, the level of policies being enforced will be very limited in this initial governance MVP.
Although Azure management groups are being used, a relatively simple hierarchy is expected.
Azure Blueprints will be used to deploy and update subscriptions by applying RBAC requirements, Resource
Manager Templates, and Azure Policy across management groups.
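The sketch below shows one way the Automated Enforcement pattern could be wired up with the Azure CLI: a built-in Azure Policy definition is assigned at a management group so that every child subscription inherits it. The management group name, policy choice, and parameter values are assumptions for illustration only.

```bash
# Hypothetical management group used by the governance MVP.
MG_SCOPE="/providers/Microsoft.Management/managementGroups/contoso"

# Resolve a built-in policy definition by display name; substitute whichever
# built-in or custom definitions your governance team approves.
POLICY_ID=$(az policy definition list \
  --query "[?displayName=='Allowed locations'].id | [0]" --output tsv)

# Assign the policy at management group scope so all child subscriptions inherit it.
az policy assignment create \
  --name "allowed-locations" \
  --policy "$POLICY_ID" \
  --scope "$MG_SCOPE" \
  --params '{"listOfAllowedLocations": {"value": ["eastus", "westus2"]}}'
```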

Apply the dependent patterns


The following decisions represent the patterns to be enforced through the policy enforcement strategy above:
Identity Baseline. Azure Blueprints will set RBAC requirements at a subscription level to ensure that consistent
identity is configured for all subscriptions.
Security Baseline: Networking. The cloud governance team maintains a Resource Manager template for
establishing a VPN gateway between Azure and the on-premises VPN device. When an application team requires
a VPN connection, the cloud governance team will apply the gateway Resource Manager template via Azure
Blueprints.
Security Baseline: Encryption. At this point, no policy enforcement is required in this area. This will be revisited
during later iterations.

Application of governance-defined patterns


The cloud governance team is responsible for the following decisions and implementations. Many require inputs
from other teams, but the cloud governance team is likely to own both the decision and the implementation. The
following sections outline the decisions made for this use case and details of each decision.
Subscription design
The decision on what subscription design to use determines how Azure subscriptions get structured and how Azure management groups will be used to efficiently manage access, policies, and compliance of these subscriptions. In this narrative, the governance team has chosen the production-and-nonproduction subscription
design pattern.
Departments are not likely to be required given the current focus. Deployments are expected to be constrained within a single billing unit. At this stage of adoption, there may not even be an enterprise agreement to centralize billing. It's likely that this level of adoption is being managed by a single pay-as-you-go Azure subscription.
Regardless of the use of the EA portal or the existence of an enterprise agreement, a subscription model should still be defined and agreed on to minimize administrative overhead beyond just billing.
A common naming convention should be agreed on as part of the subscription design, based on the previous
two points.
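A minimal sketch of the production-and-nonproduction pattern using the Azure CLI follows. The management group names and subscription IDs are hypothetical and should follow whatever naming convention the team agrees on.

```bash
# Create management groups for production and nonproduction environments.
az account management-group create --name "contoso-prod" --display-name "Production"
az account management-group create --name "contoso-nonprod" --display-name "Nonproduction"

# Place existing subscriptions under the matching management group.
# The subscription IDs below are placeholders.
az account management-group subscription add \
  --name "contoso-prod" --subscription "11111111-1111-1111-1111-111111111111"
az account management-group subscription add \
  --name "contoso-nonprod" --subscription "22222222-2222-2222-2222-222222222222"
```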
Resource consistency
Resource consistency decisions determine the tools, processes, and effort required to ensure Azure resources are
deployed, configured, and managed consistently within a subscription. In this narrative, Deployment
Consistency has been chosen as the primary resource consistency pattern.
Resource groups are created for applications using the lifecycle approach. Everything that is created, maintained, and retired together should reside in a single resource group. For more on resource groups, see here.
Azure Policy should be applied to all subscriptions from the associated management group.
As part of the deployment process, Azure Resource Manager templates for the resource group should be stored in source control.
Each resource group is associated with a specific workload or application based on the lifecycle approach
described above.
Azure management groups enable updating governance designs as corporate policy matures.
Extensive implementation of Azure Policy could exceed the team's time commitments and may not provide a
great deal of value at this time. However, a simple default policy should be created and applied to each
management group to enforce the small number of current cloud governance policy statements. This policy will
define the implementation of specific governance requirements. Those implementations can then be applied
across all deployed assets.

IMPORTANT
Any time a resource in a resource group no longer shares the same lifecycle, it should be moved to another resource group.
Examples include common databases and networking components. While they may serve the application being developed,
they may also serve other purposes and should therefore exist in other resource groups.
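The following sketch illustrates the lifecycle approach and the note above: one resource group per workload, with a shared resource moved to its own group when its lifecycle diverges. All names and the resource ID are hypothetical.

```bash
# One resource group per workload lifecycle; the Application tag value is a placeholder convention.
az group create --name "rg-ordering-app" --location "eastus" --tags Application=ordering-app

# A shared database that no longer follows the app's lifecycle moves to its own resource group.
az group create --name "rg-shared-data" --location "eastus" --tags Application=shared-data
az resource move \
  --destination-group "rg-shared-data" \
  --ids "/subscriptions/<subscription-id>/resourceGroups/rg-ordering-app/providers/Microsoft.Sql/servers/shared-sql01"
```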

Resource tagging
Resource tagging decisions determine how metadata is applied to Azure resources within a subscription to
support operations, management, and accounting purposes. In this narrative, the Classification pattern has been
chosen as the default model for resource tagging.
Deployed assets should be tagged with:
Data Classification
Criticality
SLA
Environment
These four values will drive governance, operations, and security decisions.
If this governance guide is being implemented for a business unit or team within a larger corporation, tagging
should also include metadata for the billing unit.
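As a hedged example of the Classification pattern, the Azure CLI can merge the four tags onto an existing asset. The resource ID and tag values are placeholders; the governance team defines the allowed values.

```bash
# Merge the four classification tags onto an existing resource without disturbing other tags.
az tag update \
  --resource-id "/subscriptions/<subscription-id>/resourceGroups/rg-ordering-app/providers/Microsoft.Compute/virtualMachines/vm-orders01" \
  --operation Merge \
  --tags DataClassification=General Criticality=Medium SLA=99.9 Environment=Production
```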
Logging and reporting
Logging and reporting decisions determine how you store log data and how the monitoring and reporting tools that keep IT staff informed on operational health are structured. In this narrative, a cloud-native pattern for logging and reporting is suggested.
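A minimal sketch of the cloud-native pattern, assuming a single central Log Analytics workspace that deployed assets send diagnostics to. Resource names and the location are hypothetical.

```bash
# Create a central Log Analytics workspace for operational logging and reporting.
az group create --name "rg-governance-logging" --location "eastus"
az monitor log-analytics workspace create \
  --resource-group "rg-governance-logging" \
  --workspace-name "law-governance-central"
```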

Incremental improvement of governance processes


As governance changes, some policy statements can't or shouldn't be controlled by automated tooling. Other
policies will result in effort by the IT Security team and the on-premises Identity Management team over time. To
help manage new risks as they arise, the cloud governance team will oversee the following processes.
Adoption acceleration: The cloud governance team has been reviewing deployment scripts across multiple
teams. They maintain a set of scripts that serve as deployment templates. Those templates are used by the cloud
adoption and DevOps teams to define deployments more quickly. Each of those scripts contains the necessary
requirements to enforce a set of governance policies with no additional effort from cloud adoption engineers. As
the curators of these scripts, the cloud governance team can more quickly implement policy changes. As a result of
script curation, the cloud governance team is seen as a source of adoption acceleration. This creates consistency
among deployments, without strictly forcing adherence.
Engineer training: The cloud governance team offers bimonthly training sessions and has created two videos for
engineers. These materials help engineers quickly learn the governance culture and how things are done during
deployments. The team is adding training assets that show the difference between production and nonproduction
deployments, so that engineers will understand how the new policies will affect adoption. This creates consistency
among deployments, without strictly forcing adherence.
Deployment planning: Before deploying any asset containing protected data, the cloud governance team will
review deployment scripts to validate governance alignment. Existing teams with previously approved
deployments will be audited using programmatic tooling.
Monthly audit and reporting: Each month, the cloud governance team runs an audit of all cloud deployments to
validate continued alignment to policy. When deviations are discovered, they are documented and shared with the
cloud adoption teams. When enforcement doesn't risk a business interruption or data leak, the policies are
automatically enforced. At the end of the audit, the cloud governance team compiles a report for the cloud strategy
team and each cloud adoption team to communicate overall adherence to policy. The report is also stored for
auditing and legal purposes.
Quarterly policy review: Each quarter, the cloud governance team and the cloud strategy team will review audit
results and suggest changes to corporate policy. Many of those suggestions are the result of continuous
improvements and the observation of usage patterns. Approved policy changes are integrated into governance
tooling during subsequent audit cycles.

Alternative patterns
If any of the patterns selected in this governance guide don't align with the reader's requirements, alternatives to
each pattern are available:
Encryption patterns
Identity patterns
Logging and Reporting patterns
Policy Enforcement patterns
Resource Consistency patterns
Resource Tagging patterns
Software Defined Networking patterns
Subscription Design patterns
Next steps
Once this guide is implemented, each cloud adoption team can go forth with a sound governance foundation. At
the same time, the cloud governance team will work to continuously update the corporate policies and governance
disciplines.
The two teams will use the tolerance indicators to identify the next set of improvements needed to continue
supporting cloud adoption. For the fictional company in this guide, the next step is improving the Security Baseline
to support moving protected data to the cloud.
Improve the Security Baseline discipline
Standard enterprise governance guide: Improve the
Security Baseline discipline

This article advances the narrative by adding security controls that support moving protected data to the cloud.

Advancing the narrative


IT and business leadership have been happy with results from early stage experimentation by the IT, App
Development, and BI teams. To realize tangible business values from these experiments, those teams must be
allowed to integrate protected data into solutions. This triggers changes to corporate policy, but also requires
incremental improvement of the cloud governance implementations before protected data can land in the cloud.
Changes to the cloud governance team
Given the effect of the changing narrative and support provided so far, the cloud governance team is now viewed
differently. The two system administrators who started the team are now viewed as experienced cloud architects.
As this narrative develops, the perception of them will shift from being Cloud Custodians to more of a Cloud
Guardian role.
While the difference is subtle, it's an important distinction when building a governance-focused IT culture. A Cloud Custodian cleans up the messes made by innovative cloud architects. The two roles have natural friction and opposing objectives. On the other hand, a Cloud Guardian helps keep the cloud safe, so other cloud architects can move more quickly, with fewer messes. Additionally, a Cloud Guardian is involved in creating templates that
accelerate deployment and adoption, making them an innovation accelerator as well as a defender of the Five
Disciplines of Cloud Governance.
Changes in the current state
At the start of this narrative, the application development teams were still working in a dev/test capacity, and the
BI team was still in the experimental phase. IT operated two hosted infrastructure environments, named Prod and
DR.
Since then, some things have changed that will affect governance:
The application development team has implemented a CI/CD pipeline to deploy a cloud-native application with
an improved user experience. That app doesn't yet interact with protected data, so it isn't production ready.
The Business Intelligence team within IT actively curates data in the cloud from logistics, inventory, and third-
party sources. This data is being used to drive new predictions, which could shape business processes.
However, those predictions and insights are not actionable until customer and financial data can be integrated
into the data platform.
The IT team is progressing on the CIO and CFO's plans to retire the DR datacenter. More than 1,000 of the
2,000 assets in the DR datacenter have been retired or migrated.
The loosely defined policies regarding personal data and financial data have been modernized. However, the
new corporate policies are contingent on the implementation of related security and governance policies.
Teams are still stalled.
Incrementally improve the future state
Early experiments by the App Dev and BI teams show potential improvements in customer experiences and data-
driven decisions. Both teams want to expand adoption of the cloud over the next 18 months by deploying those
solutions to production.
During the remaining six months, the cloud governance team will implement security and governance
requirements to allow the cloud adoption teams to migrate the protected data in those datacenters.
The changes to current and future state expose new risks that require new policy statements.

Changes in tangible risks


Data breach: When adopting any new data platform, there is an inherent increase in liabilities related to potential
data breaches. Technicians adopting cloud technologies have increased responsibilities to implement solutions
that can decrease this risk. A robust security and governance strategy must be implemented to ensure those
technicians fulfill those responsibilities.
This business risk can be expanded into a few technical risks:
1. Mission-critical applications or protected data might be deployed unintentionally.
2. Protected data might be exposed during storage due to poor encryption decisions.
3. Unauthorized users might access protected data.
4. External intrusion might result in access to protected data.
5. External intrusion or denial of service attacks might cause a business interruption.
6. Organization or employment changes might allow for unauthorized access to protected data.
7. New exploits could create new intrusion or access opportunities.
8. Inconsistent deployment processes might result in security gaps, which could lead to data leaks or
interruptions.
9. Configuration drift or missed patches might result in unintended security gaps, which could lead to data leaks
or interruptions.

Incremental improvement of the policy statements


The following changes to policy will help remediate the new risks and guide implementation. The list looks long,
but adopting these policies may be easier than it appears.
1. All deployed assets must be categorized by criticality and data classification. Classifications are to be reviewed
by the cloud governance team and the application owner before deployment to the cloud.
2. Applications that store or access protected data are to be managed differently than those that don't. At a
minimum, they should be segmented to avoid unintended access of protected data.
3. All protected data must be encrypted when at rest. While this is the default for all Azure Storage Accounts,
additional encryption strategies may be needed, including encryption of the data within the storage account,
encryption of VMs, and database-level encryption when using SQL in a VM (TDE and column encryption).
4. Elevated permissions in any segment containing protected data should be an exception. Any such exceptions
will be recorded with the cloud governance team and audited regularly.
5. Network subnets containing protected data must be isolated from any other subnets. Network traffic between
protected data subnets will be audited regularly.
6. No subnet containing protected data can be directly accessed over the public internet or across datacenters.
Access to those subnets must be routed through intermediate subnets. All access into those subnets must
come through a firewall solution that can perform packet scanning and blocking functions.
7. Governance tooling must audit and enforce network configuration requirements defined by the security
management team.
8. Governance tooling must limit VM deployment to approved images only.
9. Whenever possible, node configuration management should apply policy requirements to the configuration of
any guest operating system.
10. Governance tooling must enforce that automatic updates are enabled on all deployed assets. Violations must
be reviewed with operational management teams and remediated in accordance with operations policies.
Assets that are not automatically updated must be included in processes owned by IT Operations.
11. Creation of new subscriptions or management groups for any mission-critical applications or protected data
will require a review from the cloud governance team, to ensure that the proper blueprint is assigned.
12. A least-privilege access model will be applied to any management group or subscription that contains mission-
critical apps or protected data.
13. Trends and exploits that could affect cloud deployments should be reviewed regularly by the security team to
provide updates to security management tooling used in the cloud.
14. Deployment tooling must be approved by the cloud governance team to ensure ongoing governance of
deployed assets.
15. Deployment scripts must be maintained in a central repository accessible by the cloud governance team for
periodic review and auditing.
16. Governance processes must include audits at the point of deployment and at regular cycles to ensure
consistency across all assets.
17. Deployment of any applications that require customer authentication must use an approved identity provider
that is compatible with the primary identity provider for internal users.
18. Cloud governance processes must include quarterly reviews with identity management teams. These reviews
can help identify malicious actors or usage patterns that should be prevented by cloud asset configuration.

Incremental improvement of governance practices


The governance MVP design will change to include new Azure policies and an implementation of Azure Security Center. Together, these two design changes will fulfill the new corporate policy statements.
1. The Networking and IT Security teams will define network requirements. The cloud governance team will
support the conversation.
2. The Identity and IT Security teams will define identity requirements and make any necessary changes to local
Active Directory implementation. The cloud governance team will review changes.
3. Create a repository in Azure DevOps to store and version all relevant Azure Resource Manager templates and
scripted configurations.
4. Azure Security Center implementation:
a. Configure Azure Security Center for any management group that contains protected data classifications.
b. Set automatic provisioning to on by default to ensure patching compliance.
c. Establish OS security configurations. The IT Security team will define the configuration.
d. Support the IT Security team in the initial use of Security Center. Transition the use of Security Center to
the IT Security team, but maintain access for the purpose of continually improving governance.
e. Create a Resource Manager template that reflects the changes required for Security Center
configuration within a subscription.
5. Update Azure policies for all subscriptions:
a. Audit and enforce the criticality and data classification across all management groups and subscriptions,
to identify any subscriptions with protected data classifications.
b. Audit and enforce the use of approved images only.
6. Update Azure policies for all subscriptions that contain protected data classifications (a minimal example follows this list):
a. Audit and enforce the use of standard Azure RBAC roles only.
b. Audit and enforce encryption for all storage accounts and files at rest on individual nodes.
c. Audit and enforce the application of an NSG to all NICs and subnets. The Networking and IT Security
teams will define the NSG.
d. Audit and enforce the use of approved network subnets and virtual networks for each network interface.
e. Audit and enforce the limitation of user-defined routing tables.
f. Apply the Built-in Policies for Guest Configuration as follows:
a. Audit that Windows web servers are using secure communication protocols.
b. Audit that password security settings are set correctly inside Linux and Windows machines.
7. Firewall configuration:
a. Identify a configuration of Azure Firewall that meets necessary security requirements. Alternatively, identify a third-party appliance that is compatible with Azure.
b. Create a Resource Manager template to deploy the firewall with required configurations.
8. Azure blueprint:
a. Create a new blueprint named protected-data .
b. Add the firewall and Azure Security Center templates to the blueprint.
c. Add the new policies for protected data subscriptions.
d. Publish the blueprint to any management group that currently plans on hosting protected data.
e. Apply the new blueprint to each affected subscription, in addition to existing blueprints.
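As a hedged illustration of steps 5 and 6, the following Azure CLI sketch assigns two built-in policy definitions at a hypothetical subscription that carries a protected data classification. The display names used in the lookups and the subscription ID are assumptions; confirm the exact built-in definitions available in your environment before relying on them.

```bash
SUBSCRIPTION_SCOPE="/subscriptions/33333333-3333-3333-3333-333333333333"  # placeholder subscription

# Audit that storage accounts holding protected data require secure (HTTPS) transfer.
STORAGE_POLICY_ID=$(az policy definition list \
  --query "[?displayName=='Secure transfer to storage accounts should be enabled'].id | [0]" --output tsv)
az policy assignment create --name "audit-storage-https" \
  --policy "$STORAGE_POLICY_ID" --scope "$SUBSCRIPTION_SCOPE"

# Audit that every subnet is protected by a network security group.
NSG_POLICY_ID=$(az policy definition list \
  --query "[?displayName=='Subnets should be associated with a Network Security Group'].id | [0]" --output tsv)
az policy assignment create --name "audit-subnet-nsg" \
  --policy "$NSG_POLICY_ID" --scope "$SUBSCRIPTION_SCOPE"
```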

Conclusion
Adding the above processes and changes to the governance MVP will help to remediate many of the risks
associated with security governance. Together, they add the network, identity, and security monitoring tools
needed to protect data.

Next steps
As cloud adoption continues and delivers additional business value, risks and cloud governance needs also
change. For the fictional company in this guide, the next step is to support mission-critical workloads. This is the
point when Resource Consistency controls are needed.
Improving Resource Consistency
Standard enterprise governance guide: Improving
Resource Consistency

This article advances the narrative by adding Resource Consistency controls to support mission-critical apps.

Advancing the narrative


New customer experiences, new prediction tools, and migrated infrastructure continue to progress. The business
is now ready to begin using those assets in a production capacity.
Changes in the current state
In the previous phase of this narrative, the application development and BI teams were nearly ready to integrate
customer and financial data into production workloads. The IT team was in the process of retiring the DR
datacenter.
Since then, some things have changed that will affect governance:
IT has retired 100% of the DR datacenter, ahead of schedule. In the process, a set of assets in the Production
datacenter were identified as cloud migration candidates.
The application development teams are now ready for production traffic.
The BI team is ready to feed predictions and insights back into operations systems in the Production datacenter.
Incrementally improve the future state
Before using Azure deployments in production business processes, cloud operations must mature. In conjunction, additional governance changes are required to ensure assets can be operated properly.
The changes to current and future state expose new risks that will require new policy statements.

Changes in tangible risks


Business interruption: There is an inherent risk of any new platform causing interruptions to mission-critical
business processes. The IT Operations team and the teams executing on various cloud adoptions are relatively
inexperienced with cloud operations. This increases the risk of interruption and must be remediated and governed.
This business risk can be expanded into several technical risks:
1. External intrusion or denial of service attacks might cause a business interruption.
2. Mission-critical assets may not be properly discovered, and therefore might not be properly operated.
3. Undiscovered or mislabeled assets might not be supported by existing operational management processes.
4. The configuration of deployed assets may not meet performance expectations.
5. Logging might not be properly recorded and centralized to allow for remediation of performance issues.
6. Recovery policies may fail or take longer than expected.
7. Inconsistent deployment processes might result in security gaps that could lead to data leaks or interruptions.
8. Configuration drift or missed patches might result in unintended security gaps that could lead to data leaks or
interruptions.
9. Configuration might not enforce the requirements of defined SLAs or committed recovery requirements.
10. Deployed operating systems or applications might fail to meet hardening requirements.
11. With so many teams working in the cloud, there is a risk of inconsistency.
Incremental improvement of the policy statements
The following changes to policy will help remediate the new risks and guide implementation. The list looks long,
but adopting these policies may be easier than it appears.
1. All deployed assets must be categorized by criticality and data classification. Classifications are to be reviewed
by the cloud governance team and the application owner before deployment to the cloud.
2. Subnets containing mission-critical applications must be protected by a firewall solution capable of detecting
intrusions and responding to attacks.
3. Governance tooling must audit and enforce network configuration requirements defined by the Security
Management team.
4. Governance tooling must validate that all assets related to mission-critical apps or protected data are included
in monitoring for resource depletion and optimization.
5. Governance tooling must validate that the appropriate level of logging data is being collected for all mission-
critical applications or protected data.
6. Governance process must validate that backup, recovery, and SLA adherence are properly implemented for
mission-critical applications and protected data.
7. Governance tooling must limit virtual machine deployments to approved images only.
8. Governance tooling must enforce that automatic updates are prevented on all deployed assets that support
mission-critical applications. Violations must be reviewed with operational management teams and remediated
in accordance with operations policies. Assets that are not automatically updated must be included in processes
owned by IT Operations.
9. Governance tooling must validate tagging related to cost, criticality, SLA, application, and data classification. All
values must align to predefined values managed by the governance team.
10. Governance processes must include audits at the point of deployment and at regular cycles to ensure
consistency across all assets.
11. Trends and exploits that could affect cloud deployments should be reviewed regularly by the security team to
provide updates to security management tooling used in the cloud.
12. Before release into production, all mission-critical apps and protected data must be added to the designated
operational monitoring solution. Assets that cannot be discovered by the chosen IT operations tooling cannot
be released for production use. Any changes required to make the assets discoverable must be made to the
relevant deployment processes to ensure assets will be discoverable in future deployments.
13. When discovered, operational management teams will size assets, to ensure that assets meet performance
requirements.
14. Deployment tooling must be approved by the cloud governance team to ensure ongoing governance of
deployed assets.
15. Deployment scripts must be maintained in a central repository accessible by the cloud governance team for
periodic review and auditing.
16. Governance review processes must validate that deployed assets are properly configured in alignment with
SLA and recovery requirements.

Incremental improvement of governance practices


This section of the article will change the governance MVP design to include new Azure policies and implementations of Azure Monitor and Azure Recovery Services vaults. Together, these design changes will fulfill the new corporate policy statements.
1. The cloud operations team will define operational monitoring tooling and automated remediation tooling. The
cloud governance team will support those discovery processes. In this use case, the cloud operations team
chose Azure Monitor as the primary tool for monitoring mission-critical applications.
2. Create a repository in Azure DevOps to store and version all relevant Resource Manager templates and
scripted configurations.
3. Azure Recovery Services Vault implementation (a minimal example follows this list):
a. Define and deploy Azure Recovery Services Vault for backup and recovery processes.
b. Create a Resource Manager template for creation of a vault in each subscription.
4. Update Azure Policy for all subscriptions:
a. Audit and enforce criticality and data classification across all subscriptions to identify any subscriptions
with mission-critical assets.
b. Audit and enforce the use of approved images only.
5. Azure Monitor implementation:
a. Once a mission-critical workload is identified, create an Azure Monitor workspace.
b. During deployment testing, the cloud operations team deploys the necessary agents and tests discovery.
6. Update Azure Policy for all subscriptions that contain mission-critical applications.
a. Audit and enforce the application of an NSG to all NICs and subnets. Networking and IT Security define
the NSG.
b. Audit and enforce the use of approved network subnets and VNets for each network interface.
c. Audit and enforce the limitation of user-defined routing tables.
d. Audit and enforce deployment of Azure Monitor agents for all virtual machines.
e. Audit and enforce that Azure Recovery Services Vaults exist in the subscription.
7. Firewall configuration:
a. Identify a configuration of Azure Firewall that meets security requirements. Alternatively, identify a
third-party appliance that is compatible with Azure.
b. Create a Resource Manager template to deploy the firewall with required configurations.
8. Azure blueprint:
a. Create a new Azure blueprint named protected-data .
b. Add the firewall and Azure Recovery Services vault templates to the blueprint.
c. Add the new policies for protected data subscriptions.
d. Publish the blueprint to any management group that will host mission-critical applications.
e. Apply the new blueprint to each affected subscription as well as existing blueprints.
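A minimal sketch of step 3, assuming a single Recovery Services vault per subscription and a hypothetical VM protected with the vault's default policy. All names, locations, and IDs are placeholders.

```bash
# Create a Recovery Services vault for backup and recovery of mission-critical workloads.
az group create --name "rg-recovery" --location "eastus"
az backup vault create \
  --resource-group "rg-recovery" \
  --name "rsv-mission-critical" \
  --location "eastus"

# Optionally enable backup for an existing VM using the vault's default policy.
az backup protection enable-for-vm \
  --resource-group "rg-recovery" \
  --vault-name "rsv-mission-critical" \
  --vm "/subscriptions/<subscription-id>/resourceGroups/rg-app/providers/Microsoft.Compute/virtualMachines/vm-orders01" \
  --policy-name "DefaultPolicy"
```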

Conclusion
These additional processes and changes to the governance MVP help remediate many of the risks associated with
resource governance. Together they add recovery, sizing, and monitoring controls that empower cloud-aware
operations.

Next steps
As cloud adoption continues and delivers additional business value, risks and cloud governance needs will also
change. For the fictional company in this guide, the next trigger is when the scale of deployment exceeds 100
assets to the cloud or monthly spending exceeds $1,000 per month. At this point, the cloud governance team adds
Cost Management controls.
Improving Cost Management
Standard enterprise guide: Improve the Cost
Management discipline

This article advances the narrative by adding cost controls to the governance MVP.

Advancing the narrative


Adoption has grown beyond the cost tolerance indicator defined in the governance MVP. This is a good thing, as it
corresponds with migrations from the "DR" datacenter. The increase in spending now justifies an investment of
time from the cloud governance team.
Changes in the current state
In the previous phase of this narrative, IT had retired 100% of the DR datacenter. The application development and
BI teams were ready for production traffic.
Since then, some things have changed that will affect governance:
The migration team has begun migrating VMs out of the production datacenter.
The application development teams are actively pushing production applications to the cloud through CI/CD pipelines. Those applications can reactively scale with user demands.
The business intelligence team within IT has delivered several predictive analytics tools in the cloud. The volumes of data aggregated in the cloud continue to grow.
All of this growth supports committed business outcomes. However, costs have begun to mushroom.
Projected budgets are growing faster than expected. The CFO needs improved approaches to managing costs.
Incrementally improve the future state
Cost monitoring and reporting is to be added to the cloud solution. IT is still serving as a cost clearing house. This
means that payment for cloud services continues to come from IT procurement. However, reporting should tie
direct operating expenses to the functions that are consuming the cloud costs. This model is referred to as a
"Show Back" cloud accounting model.
The changes to current and future state expose new risks that will require new policy statements.

Changes in tangible risks


Budget control: There is an inherent risk that self-service capabilities will result in excessive and unexpected
costs on the new platform. Governance processes for monitoring costs and mitigating ongoing cost risks must be
in place to ensure continued alignment with the planned budget.
This business risk can be expanded into a few technical risks:
Actual costs might exceed the plan.
Business conditions change. When they do, there will be cases when a business function needs to consume
more cloud services than expected, leading to spending anomalies. There is a risk that this extra spending will
be considered overages, as opposed to a necessary adjustment to the plan.
Systems could be overprovisioned, resulting in excess spending.

Incremental improvement of the policy statements


The following changes to policy will help remediate the new risks and guide implementation.
All cloud costs should be monitored against plan on a weekly basis by the governance team. Reporting on
deviations between cloud costs and plan is to be shared with IT leadership and finance monthly. All cloud costs
and plan updates should be reviewed with IT leadership and finance monthly.
All costs must be allocated to a business function for accountability purposes.
Cloud assets should be continually monitored for optimization opportunities.
Cloud governance tooling must limit asset sizing options to an approved list of configurations. The tooling
must ensure that all assets are discoverable and tracked by the cost monitoring solution.
During deployment planning, any required cloud resources associated with the hosting of production
workloads should be documented. This documentation will help refine budgets and prepare additional
automation to prevent the use of more expensive options. During this process consideration should be given to
different discounting tools offered by the cloud provider, such as reserved instances or license cost reductions.
All application owners are required to attend training on practices for optimizing workloads to better control cloud costs.

Incremental improvement of the best practices


This section of the article will change the governance MVP design to include new Azure policies and an
implementation of Azure Cost Management. Together, these two design changes will fulfill the new corporate
policy statements.
1. Implement Azure Cost Management.
a. Establish the right scope of access to align with the subscription pattern and the Resource Consistency
discipline. Assuming alignment with the governance MVP defined in prior articles, this requires
Enrollment Account Scope access for the cloud governance team executing on high-level reporting.
Additional teams outside of governance may require Resource Group Scope access.
b. Establish a budget in Azure Cost Management (a minimal example follows this list).
c. Review and act on initial recommendations. Have a recurring process to support reporting.
d. Configure and execute Azure Cost Management Reporting, both initial and recurring.
2. Update Azure Policy
a. Audit the tagging, management group, subscription, and resource group values to identify any
deviation.
b. Establish SKU size options to limit deployments to SKUs listed in deployment planning documentation.
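A hedged sketch of step 1b: creating a subscription-level budget with the Azure CLI so spending can be monitored against plan. The budget name, amount, and dates are placeholders, and the az consumption commands may require a recent CLI version.

```bash
# Create a monthly cost budget for the current subscription; values are hypothetical.
az consumption budget create \
  --budget-name "governance-monthly-budget" \
  --amount 1000 \
  --category cost \
  --time-grain monthly \
  --start-date 2021-01-01 \
  --end-date 2021-12-31
```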

Conclusion
Adding these processes and changes to the governance MVP helps remediate many of the risks associated with
cost governance. Together, they create the visibility, accountability, and optimization needed to control costs.

Next steps
As cloud adoption continues and delivers additional business value, risks and cloud governance needs will also
change. For the fictional company in this guide, the next step is using this governance investment to manage
multiple clouds.
Multicloud evolution
Standard enterprise governance guide: Multicloud
improvement

This article advances the narrative by adding controls for multicloud adoption.

Advancing the narrative


Microsoft recognizes that customers may adopt multiple clouds for specific purposes. The fictional customer in
this guide is no exception. In parallel with their Azure adoption journey, business success has led to the acquisition
of a small but complementary business. That business is running all of their IT operations on a different cloud
provider.
This article describes how things change when integrating the new organization. For purposes of the narrative, we
assume this company has completed each of the governance iterations outlined in this governance guide.
Changes in the current state
In the previous phase of this narrative, the company had begun actively pushing production applications to the
cloud through CI/CD pipelines.
Since then, some things have changed that will affect governance:
Identity is controlled by an on-premises instance of Active Directory. Hybrid identity is facilitated through
replication to Azure Active Directory.
IT Operations or Cloud Operations are largely managed by Azure Monitor and related automations.
Disaster recovery and business continuity are controlled by Azure Recovery Services vault instances.
Azure Security Center is used to monitor security violations and attacks.
Azure Security Center and Azure Monitor are both used to monitor governance of the cloud.
Azure Blueprints, Azure Policy, and Azure management groups are used to automate compliance with policy.
Incrementally improve the future state
The goal is to integrate the acquisition company into existing operations wherever possible.

Changes in tangible risks


Business acquisition cost: Acquisition of the new business is estimated to be profitable in approximately five
years. Because of the slow rate of return, the board wants to control acquisition costs, as much as possible. There
is a risk of cost control and technical integration conflicting with one another.
This business risk can be expanded into a few technical risks:
Cloud migration might produce additional acquisition costs.
The new environment might not be properly governed, which could result in policy violations.

Incremental improvement of the policy statements


The following changes to policy will help remediate the new risks and guide implementation:
All assets in a secondary cloud must be monitored through existing operational management and security
monitoring tools.
All organizational units (OUs) must be integrated into the existing identity provider.
The primary identity provider should govern authentication to assets in the secondary cloud.

Incremental improvement of governance practices


This section of the article will change the governance MVP design to integrate the assets hosted in the secondary cloud into the existing governance tooling. Together, these design changes will fulfill the new corporate policy statements.
1. Connect the networks. This step is executed by the Networking and IT Security teams, and supported by the
cloud governance team. Adding a connection from the MPLS/leased-line provider to the new cloud will
integrate networks. Adding routing tables and firewall configurations will control access and traffic between the
environments.
2. Consolidate identity providers. Depending on the workloads being hosted in the secondary cloud, there are a
variety of options to identity provider consolidation. The following are a few examples:
a. For applications that authenticate using OAuth 2, users from Active Directory in the secondary cloud
can simply be replicated to the existing Azure AD tenant. This ensures all users can be authenticated in
the tenant.
b. At the other extreme, federation allows OUs to flow into Active Directory on-premises, then into the
Azure AD instance.
3. Add assets to Azure Site Recovery.
a. Azure Site Recovery was designed from the beginning as a hybrid or multicloud tool.
b. VMs in the secondary cloud might be able to be protected by the same Azure Site Recovery processes
used to protect on-premises assets.
4. Add assets to Azure Cost Management.
a. Azure Cost Management was designed from the beginning as a multicloud tool.
b. Virtual machines in the secondary cloud may be compatible with Azure Cost Management for some
cloud providers. Additional costs may apply.
5. Add assets to Azure Monitor.
a. Azure Monitor was designed as a hybrid cloud tool from inception.
b. Virtual machines in the secondary cloud may be compatible with Azure Monitor agents, allowing them
to be included in Azure Monitor for operational monitoring.
6. Adopt governance enforcement tools.
a. Governance enforcement is cloud-specific.
b. The corporate policies established in the governance guide are not cloud-specific. While the
implementation may vary from cloud to cloud, the policies can be applied to the secondary provider.
Multicloud adoption should be contained to where it is required based on technical needs or specific business
requirements. As multicloud adoption grows, so does complexity and security risks.

Conclusion
This series of articles described the incremental development of governance best practices, aligned with the
experiences of this fictional company. By starting small, but with the right foundation, the company could move
quickly and yet still apply the right amount of governance at the right time. The MVP by itself did not protect the
customer. Instead, it created the foundation to manage risks and add protections. From there, layers of governance
were applied to remediate tangible risks. The exact journey presented here won't align 100% with the experiences
of any reader. Rather, it serves as a pattern for incremental governance. You should mold these best practices to fit
your own unique constraints and governance requirements.
Governance guide for complex enterprises

Overview of best practices


This governance guide follows the experiences of a fictional company through various stages of governance
maturity. It is based on real customer experiences. The suggested best practices are based on the constraints and
needs of the fictional company.
As a quick starting point, this overview defines a minimum viable product (MVP) for governance based on best
practices. It also provides links to some governance improvements that add further best practices as new
business or technical risks emerge.

WARNING
This MVP is a baseline starting point, based on a set of assumptions. Even this minimal set of best practices is based on
corporate policies driven by unique business risks and risk tolerances. To see if these assumptions apply to you, read the
longer narrative that follows this article.

Governance best practices


These best practices serve as a foundation for an organization to quickly and consistently add governance
guardrails across multiple Azure subscriptions.
Resource organization
The following diagram shows the governance MVP hierarchy for organizing resources.

Every application should be deployed in the proper area of the management group, subscription, and resource
group hierarchy. During deployment planning, the cloud governance team will create the necessary nodes in the
hierarchy to empower the cloud adoption teams.
1. Define a management group for each business unit with a detailed hierarchy that reflects geography first,
then environment type (for example, production or nonproduction environments).
2. Create a production subscription and a nonproduction subscription for each unique combination of discrete
business unit or geography. Creating multiple subscriptions requires careful consideration. For more
information, see the Subscription decision guide.
3. Apply consistent nomenclature at each level of this grouping hierarchy.
4. Resource groups should be deployed in a manner that considers their contents' lifecycle. Resources that are developed together, managed together, and retired together belong in the same resource group. For more information on best practices for using resource groups, see here.
5. Region selection is incredibly important and must be considered so that networking, monitoring, and auditing can be in place for failover/failback, and to confirm that needed SKUs are available in the preferred regions.

These patterns provide room for growth without making the hierarchy needlessly complicated.
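A minimal sketch of that hierarchy with the Azure CLI: a business unit management group, a geography beneath it, and environment groups beneath that. All group names are hypothetical.

```bash
# Business unit -> geography -> environment hierarchy (names are placeholders).
az account management-group create --name "bu-retail" --display-name "Retail"
az account management-group create --name "bu-retail-na" --display-name "Retail North America" --parent "bu-retail"
az account management-group create --name "bu-retail-na-prod" --display-name "Retail NA Production" --parent "bu-retail-na"
az account management-group create --name "bu-retail-na-nonprod" --display-name "Retail NA Nonproduction" --parent "bu-retail-na"
```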

NOTE
In the event of changes to your business requirements, Azure management groups allow you to easily reorganize your
management hierarchy and subscription group assignments. However, keep in mind that policy and role assignments
applied to a management group are inherited by all subscriptions underneath that group in the hierarchy. If you plan to
reassign subscriptions between management groups, make sure that you are aware of any policy and role assignment
changes that may result. See the Azure management groups documentation for more information.

Governance of resources
A set of global policies and RBAC roles will provide a baseline level of governance enforcement. To meet the
cloud governance team's policy requirements, implementing the governance MVP requires completing the
following tasks:
1. Identify the Azure Policy definitions needed to enforce business requirements. This can include using built-in
definitions and creating new custom definitions.
2. Create a blueprint definition using these built-in and custom policy definitions and the role assignments required
by the governance MVP.
3. Apply policies and configuration globally by assigning the blueprint definition to all subscriptions.
Identify policy definitions
Azure provides several built-in policies and role definitions that you can assign to any management group,
subscription, or resource group. Many common governance requirements can be handled using built-in
definitions. However, it's likely that you will also need to create custom policy definitions to handle your specific
requirements.
Custom policy definitions are saved to either a management group or a subscription and are inherited through
the management group hierarchy. If a policy definition's save location is a management group, that policy
definition is available to assign to any of that group's child management groups or subscriptions.
Since the policies required to support the governance MVP are meant to apply to all current subscriptions, the
following business requirements will be implemented using a combination of built-in definitions and custom
definitions created in the root management group:
1. Restrict the list of available role assignments to a set of built-in Azure roles authorized by your cloud
governance team. This requires a custom policy definition.
2. Require the following tags on all resources: Department/Billing Unit, Geography, Data Classification,
Criticality, SLA, Environment, Application Archetype, Application, and Application Owner. This can be
handled using the Require specified tag built-in definition.
3. Require that the Application tag for resources matches the name of the relevant resource group. This
can be handled using the "Require tag and its value" built-in definition.
For information on defining custom policies, see the Azure Policy documentation. For guidance and examples of
custom policies, consult the Azure Policy samples site and the associated GitHub repository.
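As a hedged illustration of requirement 2 above, the following sketch creates a custom policy definition at the root management group that denies resources missing an Application tag. The definition name, tag name, and management group ID are placeholders; in practice you would create one definition per required tag or rely on the built-in tag policies referenced above.

```bash
# Author the policy rule locally (file name is illustrative).
cat > require-application-tag.rules.json <<'EOF'
{
  "if": {
    "field": "tags['Application']",
    "exists": "false"
  },
  "then": {
    "effect": "deny"
  }
}
EOF

# Save the definition at the root management group so it can be assigned to any child scope.
az policy definition create \
  --name "require-application-tag" \
  --display-name "Require an Application tag on all resources" \
  --mode Indexed \
  --management-group "<root-management-group-id>" \
  --rules require-application-tag.rules.json
```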
Assign Azure Policy and RBAC roles using Azure Blueprints
Azure policies can be assigned at the resource group, subscription, and management group level, and can be
included in Azure Blueprints definitions. Although the policy requirements defined in this governance MVP apply
to all current subscriptions, it's very likely that future deployments will require exceptions or alternative policies.
As a result, assigning policy using management groups, with all child subscriptions inheriting these assignments,
may not be flexible enough to support these scenarios.
Azure Blueprints allow the consistent assignment of policy and roles, application of Resource Manager
templates, and deployment of resource groups across multiple subscriptions. As with policy definitions, blueprint
definitions are saved to management groups or subscriptions, and are available through inheritance to any
children in the management group hierarchy.
The cloud governance team has decided that enforcement of required Azure Policy and RBAC assignments
across subscriptions will be implemented through Azure Blueprints and associated artifacts:
1. In the root management group, create a blueprint definition named governance-baseline .
2. Add the following blueprint artifacts to the blueprint definition:
a. Policy assignments for the custom Azure Policy definitions defined at the management group root.
b. Resource group definitions for any groups required in subscriptions created or governed by the
Governance MVP.
c. Standard role assignments required in subscriptions created or governed by the Governance MVP.
3. Publish the blueprint definition.
4. Assign the governance-baseline blueprint definition to all subscriptions.

See the Azure Blueprints documentation for more information on creating and using blueprint definitions.
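The governance-baseline blueprint described above could be scripted with the az blueprint CLI extension, roughly as sketched below. Treat this as an assumption-laden outline: the extension's exact flags, the artifact definitions (policy and role assignments), and the assignment identity settings vary by version and by your requirements, so verify each command against the current az blueprint reference before use.

```bash
# The blueprint commands ship as a CLI extension.
az extension add --name blueprint

# Draft the blueprint definition at the root management group (ID is a placeholder).
az blueprint create \
  --name "governance-baseline" \
  --management-group "<root-management-group-id>" \
  --description "Baseline policy and role assignments for all subscriptions"

# Artifacts (policy assignments, role assignments, resource groups) are added with
# 'az blueprint artifact ...' subcommands before publishing; they are omitted here.

# Publish an immutable version of the definition.
az blueprint publish \
  --blueprint-name "governance-baseline" \
  --management-group "<root-management-group-id>" \
  --version "1.0"

# Assign the published version to a subscription (repeat per subscription).
az blueprint assignment create \
  --name "assign-governance-baseline" \
  --subscription "<subscription-id>" \
  --location "eastus" \
  --identity-type "SystemAssigned" \
  --blueprint-version "/providers/Microsoft.Management/managementGroups/<root-management-group-id>/providers/Microsoft.Blueprint/blueprints/governance-baseline/versions/1.0"
```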
Secure hybrid VNet
Specific subscriptions often require some level of access to on-premises resources. This is common in migration
scenarios or dev scenarios where dependent resources reside in the on-premises datacenter.
Until trust in the cloud environment is fully established, it's important to tightly control and monitor any allowed
communication between the on-premises environment and cloud workloads, and to ensure that the on-premises
network is secured against potential unauthorized access from cloud-based resources. To support these scenarios, the
governance MVP adds the following best practices:
1. Establish a cloud secure hybrid VNet.
a. The VPN reference architecture establishes a pattern and deployment model for creating a VPN
Gateway in Azure.
b. Validate that on-premises security and traffic management mechanisms treat connected cloud
networks as untrusted. Resources and services hosted in the cloud should only have access to
authorized on-premises services.
c. Validate that the local edge device in the on-premises datacenter is compatible with Azure VPN
Gateway requirements and is configured to access the public internet.
d. Note that VPN tunnels should not be considered production-ready circuits for anything but the
simplest workloads. Anything beyond a few simple workloads requiring on-premises connectivity should
use Azure ExpressRoute.
2. In the root management group, create a second blueprint definition named secure-hybrid-vnet .
a. Add the Resource Manager template for the VPN Gateway as an artifact to the blueprint definition.
b. Add the Resource Manager template for the virtual network as an artifact to the blueprint definition.
c. Publish the blueprint definition.
3. Assign the secure-hybrid-vnet blueprint definition to any subscriptions requiring on-premises connectivity.
This definition should be assigned in addition to the governance-baseline blueprint definition.
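As a hedged sketch of the VPN gateway deployment referenced in step 1, the commands below create a virtual network, a gateway subnet, a route-based VPN gateway, and a site-to-site connection to the on-premises edge device. All names, address spaces, SKUs, and the pre-shared key are illustrative placeholders; in the governance MVP these resources would normally be deployed from the Resource Manager template packaged in the secure-hybrid-vnet blueprint rather than run by hand.

```bash
# Resource group and virtual network for the hybrid connection (values are illustrative).
az group create --name "rg-hybrid-network" --location "eastus"

az network vnet create \
  --resource-group "rg-hybrid-network" \
  --name "vnet-hybrid" \
  --address-prefix "10.10.0.0/16" \
  --subnet-name "workloads" \
  --subnet-prefix "10.10.1.0/24"

# The gateway subnet must be named GatewaySubnet.
az network vnet subnet create \
  --resource-group "rg-hybrid-network" \
  --vnet-name "vnet-hybrid" \
  --name "GatewaySubnet" \
  --address-prefix "10.10.255.0/27"

az network public-ip create --resource-group "rg-hybrid-network" --name "pip-vpn-gateway"

# Route-based VPN gateway (provisioning can take 30 minutes or more).
az network vnet-gateway create \
  --resource-group "rg-hybrid-network" \
  --name "vgw-hybrid" \
  --vnet "vnet-hybrid" \
  --public-ip-address "pip-vpn-gateway" \
  --gateway-type Vpn \
  --vpn-type RouteBased \
  --sku VpnGw1 \
  --no-wait

# Represent the on-premises edge device, then create the site-to-site connection.
az network local-gateway create \
  --resource-group "rg-hybrid-network" \
  --name "lgw-onprem" \
  --gateway-ip-address "<on-premises-public-ip>" \
  --local-address-prefixes "192.168.0.0/16"

az network vpn-connection create \
  --resource-group "rg-hybrid-network" \
  --name "cn-onprem-to-azure" \
  --vnet-gateway1 "vgw-hybrid" \
  --local-gateway2 "lgw-onprem" \
  --shared-key "<pre-shared-key>"
```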

One of the biggest concerns raised by IT security and traditional governance teams is the risk that early stage
cloud adoption will compromise existing assets. The above approach allows cloud adoption teams to build and
migrate hybrid solutions, with reduced risk to on-premises assets. As trust in the cloud environment increases,
later evolutions may remove this temporary solution.

NOTE
The above is a starting point to quickly create a baseline governance MVP. This is only the beginning of the governance
journey. Further evolution will be needed as the company continues to adopt the cloud and takes on more risk in the
following areas:
Mission-critical workloads
Protected data
Cost management
Multicloud scenarios
Moreover, the specific details of this MVP are based on the example journey of a fictional company, described in the articles
that follow. We highly recommend becoming familiar with the other articles in this series before implementing this best
practice.

Incremental governance improvements


Once this MVP has been deployed, additional layers of governance can be quickly incorporated into the
environment. Here are some ways to improve the MVP to meet specific business needs:
Security Baseline for protected data
Resource configurations for mission-critical applications
Controls for Cost Management
Controls for incremental multicloud improvement

What does this guidance provide?


In the MVP, practices and tools from the Deployment Acceleration discipline are established to quickly apply
corporate policy. In particular, the MVP uses Azure Blueprints, Azure Policy, and Azure management groups to
apply a few basic corporate policies, as defined in the narrative for this fictional company. Those corporate
policies are applied using Azure Resource Manager templates and Azure policies to establish a small baseline for
identity and security.
Incremental improvements to governance practices
Over time, this governance MVP will be used to incrementally improve governance practices. As adoption
advances, business risk grows. Various disciplines within the Cloud Adoption Framework governance model will
adapt to manage those risks. Later articles in this series discuss the changes in corporate policy affecting the
fictional company. These changes happen across four disciplines:
Identity Baseline, as migration dependencies change in the narrative.
Cost Management, as adoption scales.
Security Baseline, as protected data is deployed.
Resource Consistency, as IT Operations begins supporting mission-critical workloads.

Next steps
Now that you're familiar with the governance MVP and the forthcoming governance changes, read the
supporting narrative for additional context.
Read the supporting narrative
Governance guide for complex enterprises: The
supporting narrative

The following narrative establishes a use case for governance during a complex enterprise's cloud adoption journey.
Before acting on the recommendations in the guide, it's important to understand the assumptions and reasoning
that are reflected in this narrative. Then you can better align the governance strategy to your own organization's
cloud adoption journey.

Back story
Customers are demanding a better experience when interacting with this company. The current experience caused
market erosion and led the board to hire a Chief Digital Officer (CDO). The CDO is working with marketing and
sales to drive a digital transformation that will power improved experiences. Additionally, several business units
recently hired data scientists to farm data and improve many of the manual experiences through learning and
prediction. IT is supporting these efforts where it can. However, there are "shadow IT" activities occurring that fall
outside of needed governance and security controls.
The IT organization is also facing its own challenges. Finance is planning continued reductions in the IT budget
over the next five years, leading to some necessary spending cuts starting this year. Conversely, GDPR and other
data sovereignty requirements are forcing IT to invest in assets in additional countries to localize data. Two of the
existing datacenters are overdue for hardware refreshes, causing further problems with employee and customer
satisfaction. Three more datacenters require hardware refreshes during the execution of the five-year plan. The
CFO is pushing the CIO to consider the cloud as an alternative for those datacenters, to free up capital expenses.
The CIO has innovative ideas that could help the company, but she and her teams are limited to fighting fires and
controlling costs. At a luncheon with the CDO and one of the business unit leaders, the cloud migration
conversation generated interest from the CIO's peers. The three leaders aim to support each other using the cloud
to achieve their business objectives, and they have begun the exploration and planning phases of cloud adoption.

Business characteristics
The company has the following business profile:
Sales and operations span multiple geographic areas with global customers in multiple markets.
The business grew through acquisition and operates across three business units based on the target customer
base. Budgeting is a complex matrix across business units and functions.
The business views most of IT as a capital drain or a cost center.

Current state
Here is the current state of the company's IT and cloud operations:
IT operates more than 20 privately owned datacenters around the globe.
Due to organic growth and multiple geographies, there are a few IT teams that have unique data sovereignty
and compliance requirements that impact a single business unit operating within a specific geography.
Each datacenter is connected by a series of regional leased lines, creating a loosely coupled global WAN.
IT entered the cloud by migrating all end-user email accounts to Office 365. This migration was completed
more than six months ago. Since then, only a few IT assets have been deployed to the cloud.
The CDO's primary development team is working in a dev/test capacity to learn about cloud-native capabilities.
One business unit is experimenting with big data in the cloud. The BI team inside of IT is participating in that
effort.
The existing IT governance policy states that personal customer data and financial data must be hosted on
assets owned directly by the company. This policy blocks cloud adoption for any mission-critical apps or
protected data.
IT investments are controlled largely by capital expense. Those investments are planned yearly and often
include plans for ongoing maintenance, as well as established refresh cycles of three to five years depending on
the datacenter.
Most investments in technology that don't align to the annual plan are addressed by shadow IT efforts. Those
efforts are usually managed by business units and funded through the business unit's operating expenses.

Future state
The following changes are anticipated over the next several years:
The CIO is leading an effort to modernize the policy on personal and financial data to support future goals.
Two members of the IT Governance team have visibility into this effort.
The CIO wants to use the cloud migration as a forcing function to improve consistency and stability across
business units and geographies. However, the future state must respect any external compliance requirements
which would require deviation from standard approaches by specific IT teams.
If the early experiments in App Dev and BI show leading indicators of success, they would each like to release
small-scale production solutions to the cloud in the next 24 months.
The CIO and CFO have assigned an architect and the Vice President of Infrastructure to create a cost analysis
and feasibility study. These efforts will determine if the company can and should move 5,000 assets to the
cloud over the next 36 months. A successful migration would allow the CIO to eliminate two datacenters,
reducing costs by over $100M USD during the five-year plan. If three to four datacenters can experience
similar results, the budget will be back in the black, giving the CIO budget to support more innovative
initiatives.

Along with this cost savings, the company plans to change the management of some IT investments by
repositioning the committed capital expense as an operating expense within IT. This change will provide greater
cost control, which IT can use to accelerate other planned efforts.

Next steps
The company has developed a corporate policy to shape the governance implementation. The corporate policy
drives many of the technical decisions.
Review the initial corporate policy
Governance guide for complex enterprises: Initial
corporate policy behind the governance strategy

The following corporate policy defines the initial governance position, which is the starting point for this guide.
This article defines early-stage risks, initial policy statements, and early processes to enforce policy statements.

NOTE
The corporate policy is not a technical document, but it drives many technical decisions. The governance MVP described in
the overview ultimately derives from this policy. Before implementing a governance MVP, your organization should develop a
corporate policy based on your own objectives and business risks.

Cloud governance team


The CIO recently held a meeting with the IT governance team to understand the history of the personal data and
mission-critical policies and review the effect of changing those policies. The CIO also discussed the overall
potential of the cloud for IT and the company.
After the meeting, two members of the IT Governance team requested permission to research and support the
cloud planning efforts. Recognizing the need for governance and an opportunity to limit shadow IT, the Director of
IT Governance supported this idea. With that, the cloud governance team was born. Over the next several months,
they will inherit the cleanup of many mistakes made during exploration in the cloud from a governance
perspective. This will earn them the moniker of cloud custodians. In later iterations, this guide will show how their
roles change over time.

Objective
The initial objective is to establish a foundation for governance agility. An effective Governance MVP allows the
governance team to stay ahead of cloud adoption and implement guardrails as the adoption plan changes.

Business risks
The company is at an early stage of cloud adoption, experimenting and building proofs of concept. Risks are now
relatively low, but future risks are likely to have a significant impact. There is little definition around the final state
of the technical solutions to be deployed to the cloud. In addition, the cloud readiness of IT employees is low. A
foundation for cloud adoption will help the team safely learn and grow.
Future-proofing: There is a risk of not empowering growth, but also a risk of not providing the right protections
against future risks.
An agile yet robust governance approach is needed to support the board's vision for corporate and technical
growth. Failure to implement such a strategy will slow technical growth, potentially risking current and future
market share growth. The impact of such a business risk is unquestionably high. However, the role IT will play in
those potential future states is unknown, making the risk associated with current IT efforts relatively high. That
said, until more concrete plans are aligned, the business has a high tolerance for risk.
This business risk can be broken down tactically into several technical risks:
Well-intended corporate policies could slow transformation efforts or break critical business processes, if not
considered within a structured approval flow.
The application of governance to deployed assets could be difficult and costly.
Governance may not be properly applied across an application or workload, creating gaps in security.
With so many teams working in the cloud, there is a risk of inconsistency.
Costs may not properly align to business units, teams, or other budgetary management units.
The use of multiple identities to manage various deployments could lead to security issues.
Despite current policies, there is a risk that protected data could be mistakenly deployed to the cloud.

Tolerance indicators
The current risk tolerance is high and the appetite for investing in cloud governance is low. As such, the tolerance
indicators act as an early warning system to trigger the investment of time and energy. If the following indicators
are observed, it would be wise to advance the governance strategy.
Cost Management: Scale of deployment exceeds 1,000 assets to the cloud, or monthly spending exceeds
$10,000 USD.
Identity Baseline: Inclusion of applications with legacy or third-party multi-factor authentication
requirements.
Security Baseline: Inclusion of protected data in defined cloud adoption plans.
Resource Consistency: Inclusion of any mission-critical applications in defined cloud adoption plans.

Policy statements
The following policy statements establish the requirements needed to remediate the defined risks. These policies
define the functional requirements for the governance MVP. Each will be represented in the implementation of the
governance MVP.
Cost Management:
For tracking purposes, all assets must be assigned to an application owner within one of the core business
functions.
When cost concerns arise, additional governance requirements will be established with the finance team.
Security Baseline:
Any asset deployed to the cloud must have an approved data classification.
No assets identified with a protected level of data may be deployed to the cloud, until sufficient requirements
for security and governance can be approved and implemented.
Until minimum network security requirements can be validated and governed, cloud environments are seen as
a demilitarized zone and should meet similar connection requirements to other datacenters or internal
networks.
Resource Consistency:
Because no mission-critical workloads are deployed at this stage, there are no SLA, performance, or BCDR
requirements to be governed.
When mission-critical workloads are deployed, additional governance requirements will be established with IT
operations.
Identity Baseline:
All assets deployed to the cloud should be controlled using identities and roles approved by current governance
policies.
All groups in the on-premises Active Directory infrastructure that have elevated privileges should be mapped to
an approved RBAC role.
Deployment Acceleration:
All assets must be grouped and tagged according to defined grouping and tagging strategies.
All assets must use an approved deployment model.
Once a governance foundation has been established for a cloud provider, any deployment tooling must be
compatible with the tools defined by the governance team.

Processes
No budget has been allocated for ongoing monitoring and enforcement of these governance policies. Because of
that, the cloud governance team has some ad hoc ways to monitor adherence to policy statements.
Education: The cloud governance team is investing time to educate the cloud adoption teams on the
governance guides that support these policies.
Deployment reviews: Before deploying any asset, the cloud governance team will review the governance
guide with the cloud adoption teams.

Next steps
This corporate policy prepares the cloud governance team to implement the governance MVP, which will be the
foundation for adoption. The next step is to implement this MVP.
Best practices explained
Governance guide for complex enterprises: Best
practices explained

The governance guide begins with a set of initial corporate policies. These policies are used to establish a minimum
viable product (MVP ) for governance that reflects best practices.
In this article, we discuss the high-level strategies that are required to create a governance MVP. The core of the
governance MVP is the Deployment Acceleration discipline. The tools and patterns applied at this stage will enable
the incremental improvements needed to expand governance in the future.

Governance MVP (initial governance foundation)


Rapid adoption of governance and corporate policy is achievable, thanks to a few simple principles and cloud-
based governance tooling. This guide approaches three governance disciplines first in any governance
process; each is explained further on in this article.
To establish the starting point, this article will discuss the high-level strategies behind Identity Baseline, Security
Baseline, and Deployment Acceleration that are required to create a governance MVP, which will serve as the
foundation for all adoption.

Implementation process
The implementation of the governance MVP has dependencies on Identity, Security, and Networking. Once the
dependencies are resolved, the cloud governance team will decide a few aspects of governance. The decisions from
the cloud governance team and from supporting teams will be implemented through a single package of
enforcement assets.
This implementation can also be described using a simple checklist:
1. Solicit decisions regarding core dependencies: Identity, Network, and Encryption.
2. Determine the pattern to be used during corporate policy enforcement.
3. Determine the appropriate governance patterns for the Resource Consistency, Resource Tagging, and Logging
and Reporting disciplines.
4. Implement the governance tools aligned to the chosen policy enforcement pattern to apply the dependent
decisions and governance decisions.

Dependent decisions
The following decisions come from teams outside of the cloud governance team. The implementation of each will
come from those same teams. However, the cloud governance team is responsible for implementing a solution to
validate that those implementations are consistently applied.
Identity Baseline
Identity Baseline is the fundamental starting point for all governance. Before attempting to apply governance,
identity must be established. The established identity strategy will then be enforced by the governance solutions. In
this governance guide, the Identity Management team implements the Directory Synchronization pattern:
RBAC will be provided by Azure Active Directory (Azure AD), using the directory synchronization or "Same
Sign-On" that was implemented during the company's migration to Office 365. For implementation guidance, see
Reference Architecture for Azure AD Integration.
The Azure AD tenant will also govern authentication and access for assets deployed to Azure.
In the governance MVP, the governance team will enforce application of the replicated tenant through subscription
governance tooling, discussed later in this article. In future iterations, the governance team could also enforce rich
tooling in Azure AD to extend this capability.
Security Baseline: Networking
Software-defined networking is an important initial aspect of the Security Baseline. Establishing the governance
MVP depends on early decisions from the Security Management team to define how networks can be safely
configured.
Given the lack of requirements, IT security is playing it safe and requires a Cloud DMZ Pattern. That means
governance of the Azure deployments themselves will be very light.
Azure subscriptions may connect to an existing datacenter via VPN, but must follow all existing on-premises IT
governance policies regarding connection of a demilitarized zone to protected resources. For implementation
guidance regarding VPN connectivity, see VPN Reference Architecture.
Decisions regarding subnet, firewall, and routing are currently being deferred to each application/workload
lead.
Additional analysis is required before releasing any protected data or mission-critical workloads.
In this pattern, cloud networks can only connect to on-premises resources over an existing VPN that is compatible
with Azure. Traffic over that connection will be treated like any traffic coming from a demilitarized zone. Additional
considerations may be required on the on-premises edge device to securely handle traffic from Azure.
The cloud governance team has proactively invited members of the networking and IT security teams to regular
meetings, in order to stay ahead of networking demands and risks.
Security Baseline: Encryption
Encryption is another fundamental decision within the Security Baseline discipline. Because the company
does not yet store any protected data in the cloud, the Security Team has decided on a less aggressive pattern for
encryption. At this point, a cloud-native pattern for encryption is suggested but not required of any
development team.
No governance requirements have been set regarding the use of encryption, because the current corporate
policy does not permit mission-critical or protected data in the cloud.
Additional analysis will be required before releasing any protected data or mission-critical workloads.

Policy enforcement
The first decision to make regarding Deployment Acceleration is the pattern for enforcement. In this narrative, the
governance team decided to implement the Automated Enforcement pattern.
Azure Security Center will be made available to the security and identity teams to monitor security risks. Both
teams are also likely to use Security Center to identify new risks and improve corporate policy.
RBAC is required in all subscriptions to govern authentication enforcement.
Azure Policy will be published to each management group and applied to all subscriptions. However, the level
of policies being enforced will be very limited in this initial Governance MVP.
Although Azure management groups are being used, a relatively simple hierarchy is expected.
Azure Blueprints will be used to deploy and update subscriptions by applying RBAC requirements, Resource
Manager Templates, and Azure Policy across management groups.
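To make the enforcement pattern concrete, the following sketch assigns a policy definition at a management group so that all child subscriptions inherit it, and grants a built-in role at subscription scope. The definition ID, management group IDs, principal ID, and role shown are placeholders chosen for illustration only.

```bash
# Assign a definition stored at the root management group to a business unit management group.
az policy assignment create \
  --name "enforce-required-tags" \
  --display-name "Enforce required resource tags" \
  --policy "/providers/Microsoft.Management/managementGroups/<root-management-group-id>/providers/Microsoft.Authorization/policyDefinitions/require-application-tag" \
  --scope "/providers/Microsoft.Management/managementGroups/<business-unit-management-group-id>"

# Grant an approved built-in role to an Azure AD group at subscription scope.
az role assignment create \
  --assignee "<azure-ad-group-object-id>" \
  --role "Contributor" \
  --scope "/subscriptions/<subscription-id>"
```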

Apply the dependent patterns


The following decisions represent the patterns to be enforced through the policy enforcement strategy above:
Identity Baseline. Azure Blueprints will set RBAC requirements at a subscription level to ensure that consistent
identity is configured for all subscriptions.
Security Baseline: Networking. The cloud governance team maintains a Resource Manager template for
establishing a VPN gateway between Azure and the on-premises VPN device. When an application team requires
a VPN connection, the cloud governance team will apply the gateway Resource Manager template via Azure
Blueprints.
Security Baseline: Encryption. At this point, no policy enforcement is required in this area. This will be revisited
during later iterations.

Application of governance-defined patterns


The cloud governance team will be responsible for the following decisions and implementations. Many will require
inputs from other teams, but the cloud governance team is likely to own both the decision and implementation.
The following sections outline the decisions made for this use case and details of each decision.
Subscription design
The decision on what subscription design to use determines how Azure subscriptions get structured and how
Azure management groups will be used to efficiently manage access, policies, and compliance of those
subscriptions. In this narrative, the governance team has chosen the Mixed subscription design pattern.
As new requests for Azure resources arise, a "Department" should be established for each major business unit
in each operating geography. Within each of the Departments, "Subscriptions" should be created for each
application archetype.
An application archetype is a means of grouping applications with similar needs. Common examples include:
Applications with protected data, governed applications (such as HIPAA or FedRAMP), low-risk applications,
applications with on-premises dependencies, SAP or other mainframe applications in Azure, or applications
that extend on-premises SAP or mainframe applications. Each organization has unique needs based on data
classifications and the types of applications that support the business. Dependency mapping of the digital estate
can help define the application archetypes in an organization.
A common naming convention should be agreed upon as part of the subscription design, based on the above
two bullets.
Resource consistency
Resource consistency decisions determine the tools, processes, and effort required to ensure Azure resources are
deployed, configured, and managed consistently within a subscription. In this narrative, Deployment
Consistency has been chosen as the primary resource consistency pattern.
Resource groups are created for applications using the lifecycle approach. Everything that is created,
maintained, and retired together should reside in a single resource group. For more on resource groups, see here.
Azure Policy should be applied to all subscriptions from the associated management group.
As part of the deployment process, Azure Resource Manager templates for the resource group should be
stored in source control.
Each resource group is associated with a specific workload or application based on the lifecycle approach
described above.
Azure management groups enable updating governance designs as corporate policy matures.
Extensive implementation of Azure Policy could exceed the team's time commitments and may not provide a
great deal of value at this time. However, a simple default policy should be created and applied to each
management group to enforce the small number of current cloud governance policy statements. This policy will
define the implementation of specific governance requirements. Those implementations can then be applied
across all deployed assets.

IMPORTANT
Any time a resource in a resource group no longer shares the same lifecycle, it should be moved to another resource group.
Examples include common databases and networking components. While they may serve the application being developed,
they may also serve other purposes and should therefore exist in other resource groups.
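As a small illustration of the Deployment Consistency pattern, the sketch below creates a resource group scoped to a single workload's lifecycle and deploys it from a Resource Manager template held in source control. The resource group name, template path, and parameters are hypothetical.

```bash
# One resource group per workload lifecycle (name is illustrative).
az group create --name "rg-inventory-app-prod" --location "eastus2"

# Deploy the workload from a versioned template stored in the team's repository.
az deployment group create \
  --resource-group "rg-inventory-app-prod" \
  --template-file "./templates/inventory-app.json" \
  --parameters environment="prod"
```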

Resource tagging
Resource tagging decisions determine how metadata is applied to Azure resources within a subscription to
support operations, management, and accounting purposes. In this narrative, the Accounting pattern has been
chosen as the default model for resource tagging.
Deployed assets should be tagged with values for:
Department/Billing Unit
Geography
Data Classification
Criticality
SLA
Environment
Application Archetype
Application
Application Owner
These values along with the Azure management group and subscription associated with a deployed asset will
drive governance, operations, and security decisions.
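As an illustration of the Accounting pattern, the sketch below applies the required tag set to the resource group created in the earlier resource consistency example. Every tag name and value shown is a placeholder; your data classification, SLA, and archetype values will come from your own standards.

```bash
# Apply the required tag set to an existing resource group (names and values are placeholders).
az group update \
  --name "rg-inventory-app-prod" \
  --set tags.Department="Retail-EMEA" \
        tags.Geography="EMEA" \
        tags.DataClassification="General" \
        tags.Criticality="Medium" \
        tags.SLA="99.9" \
        tags.Environment="Prod" \
        tags.ApplicationArchetype="LOB" \
        tags.Application="inventory-app" \
        tags.ApplicationOwner="owner@contoso.com"
```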
Logging and reporting
Logging and reporting decisions determine how you store log data and how the monitoring and reporting tools
that keep IT staff informed on operational health are structured. In this narrative, a Hybrid monitoring pattern for
logging and reporting is suggested, but not required of any development team at this point.
No governance requirements are currently set regarding the specific data points to be collected for logging or
reporting purposes. This is specific to this fictional narrative and should be considered an antipattern. Logging
standards should be determined and enforced as soon as possible.
Additional analysis is required before the release of any protected data or mission-critical workloads.
Before supporting protected data or mission-critical workloads, the existing on-premises operational
monitoring solution must be granted access to the workspace used for logging. Applications are required to
meet security and logging requirements associated with the use of that tenant, if the application is to be
supported with a defined SLA.
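For teams that do adopt the suggested hybrid monitoring pattern early, the sketch below creates a central Log Analytics workspace and routes one resource's platform logs and metrics to it. Names are placeholders, and the available log categories (and support for category groups) vary by resource type, so adjust the settings to the asset being monitored.

```bash
# Central workspace for cloud-side logging (names are illustrative).
az monitor log-analytics workspace create \
  --resource-group "rg-governance-logging" \
  --workspace-name "law-central-logging" \
  --location "eastus2"

# Route a resource's diagnostics to the workspace; category support differs per resource type.
az monitor diagnostic-settings create \
  --name "send-to-central-workspace" \
  --resource "<resource-id-of-the-monitored-asset>" \
  --workspace "/subscriptions/<subscription-id>/resourceGroups/rg-governance-logging/providers/Microsoft.OperationalInsights/workspaces/law-central-logging" \
  --metrics '[{"category": "AllMetrics", "enabled": true}]' \
  --logs '[{"categoryGroup": "allLogs", "enabled": true}]'
```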

Incremental improvement of governance processes


Some of the policy statements cannot or should not be controlled by automated tooling. Other policies will require
periodic effort from IT Security and on-premises Identity Baseline teams. The cloud governance team will need to
oversee the following processes to implement the last eight policy statements:
Corporate policy changes: The cloud governance team will make changes to the governance MVP design to
adopt the new policies. The value of the governance MVP is that it will allow for the automatic enforcement of the
new policies.
Adoption acceleration: The cloud governance team has been reviewing deployment scripts across multiple
teams. They've maintained a set of scripts that serve as deployment templates. Those templates can be used by the
cloud adoption teams and DevOps teams to more quickly define deployments. Each script contains the
requirements for enforcing governance policies, and additional effort from cloud adoption engineers is not needed.
As the curators of these scripts, they can implement policy changes more quickly. Additionally, they are viewed as
accelerators of adoption. This ensures consistent deployments without strictly enforcing adherence.
Engineer training: The cloud governance team offers bimonthly training sessions and has created two videos for
engineers. Both resources help engineers get up to speed quickly on the governance culture and how deployments
are performed. The team is adding training assets to demonstrate the difference between production and
nonproduction deployments, which helps engineers understand how the new policies affect adoption. This ensures
consistent deployments without strictly enforcing adherence.
Deployment planning: Before deploying any asset containing protected data, the cloud governance team will be
responsible for reviewing deployment scripts to validate governance alignment. Existing teams with previously
approved deployments will be audited using programmatic tooling.
Monthly audit and reporting: Each month, the cloud governance team runs an audit of all cloud deployments to
validate continued alignment to policy. When deviations are discovered, they are documented and shared with the
cloud adoption teams. When enforcement doesn't risk a business interruption or data leak, the policies are
automatically enforced. At the end of the audit, the cloud governance team compiles a report for the cloud strategy
team and each cloud adoption team to communicate overall adherence to policy. The report is also stored for
auditing and legal purposes.
Quarterly policy review: Each quarter, the cloud governance team and the cloud strategy team review audit
results and suggest changes to corporate policy. Many of those suggestions are the result of continuous
improvements and the observation of usage patterns. Approved policy changes are integrated into governance
tooling during subsequent audit cycles.
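The monthly audit and reporting process described above can be partly automated with Azure Policy compliance data. The sketch below summarizes compliance for a subscription and lists noncompliant resources for follow-up; the subscription ID and the output shaping are illustrative.

```bash
# High-level compliance summary for one subscription.
az policy state summarize --subscription "<subscription-id>"

# Noncompliant resources, grouped for the monthly report to the cloud strategy team.
az policy state list \
  --subscription "<subscription-id>" \
  --filter "complianceState eq 'NonCompliant'" \
  --query "[].{resource:resourceId, policy:policyDefinitionName}" \
  --output table
```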

Alternative patterns
If any of the patterns chosen in this governance guide don't align with the reader's requirements, alternatives to
each pattern are available:
Encryption patterns
Identity patterns
Logging and Reporting patterns
Policy Enforcement patterns
Resource Consistency patterns
Resource Tagging patterns
Software Defined Networking patterns
Subscription Design patterns

Next steps
Once this guidance is implemented, each cloud adoption team can proceed with a solid governance foundation. At
the same time, the cloud governance team will work to continually update the corporate policies and governance
disciplines.
Both teams will use the tolerance indicators to identify the next set of improvements needed to continue
supporting cloud adoption. The next step for this company is incremental improvement of their governance
baseline to support applications with legacy or third-party multi-factor authentication requirements.
Improve the Identity Baseline discipline
Governance guide for complex enterprises: Improve
the Identity Baseline discipline

This article advances the narrative by adding Identity Baseline controls to the governance MVP.

Advancing the narrative


The business justification for the cloud migration of the two datacenters was approved by the CFO. During the
technical feasibility study, several roadblocks were discovered:
Protected data and mission-critical applications represent 25% of the workloads in the two datacenters. Neither
can be eliminated until the current governance policies regarding sensitive personal data and mission-critical
applications have been modernized.
7% of the assets in those datacenters are not cloud-compatible. They will be moved to an alternate datacenter
before termination of the datacenter contract.
15% of the assets in the datacenter (750 virtual machines) have a dependency on legacy authentication or
third-party multi-factor authentication.
The VPN connection that connects existing datacenters and Azure does not offer sufficient bandwidth or low
enough latency to migrate the volume of assets within the two-year timeline to retire the datacenter.
The first two roadblocks are being managed in parallel. This article will address the resolution of the third and
fourth roadblocks.
Expand the cloud governance team
The cloud governance team is expanding. Given the need for additional support regarding identity management, a
systems administrator from the Identity Baseline team now participates in a weekly meeting to keep the existing
team members aware of changes.
Changes in the current state
The IT team has approval to move forward with the CIO and CFO's plans to retire two datacenters. However, IT is
concerned that 750 (15%) of the assets in those datacenters will have to be moved somewhere other than the
cloud.
Incrementally improve the future state
The new future state plans require a more robust Identity Baseline solution to migrate the 750 virtual machines
with legacy authentication requirements. Beyond these two datacenters, this challenge is expected to affect similar
percentages of assets in other datacenters.
The future state now also requires a connection from the cloud provider to the company's MPLS/leased-line
solution.
The changes to current and future state expose new risks that will require new policy statements.

Changes in tangible risks


Business interruption during migration. Migration to the cloud creates a controlled, time-bound risk that can
be managed. Moving aging hardware to another part of the world is a much higher risk. A mitigation strategy is
needed to avoid interruptions to business operations.
Existing identity dependencies. Dependencies on existing authentication and identity services may delay or
prevent the migration of some workloads to the cloud. Failure to return the two datacenters on time will incur
millions of dollars in datacenter lease fees.
This business risk can be expanded into a few technical risks:
Legacy authentication might not be available in the cloud, limiting deployment of some applications.
The current third-party multi-factor authentication solution might not be available in the cloud, limiting
deployment of some applications.
Retooling or moving could create outages or add costs.
The speed and stability of the VPN might impede migration.
Traffic entering the cloud could cause security issues in other parts of the global network.

Incremental improvement of the policy statements


The following changes to policy will help remediate the new risks and guide implementation.
The chosen cloud provider must offer a means of authenticating via legacy methods.
The chosen cloud provider must offer a means of authentication with the current third-party multi-factor
authentication solution.
A high-speed private connection should be established between the cloud provider and the company's telco
provider, connecting the cloud provider to the global network of datacenters.
Until sufficient security requirements are established, no inbound public traffic may access company assets
hosted in the cloud. All ports are blocked from any source outside of the global WAN.

Incremental improvement of the best practices


The governance MVP design changes to include new Azure policies and an implementation of Active Directory on
a virtual machine. Together, these two design changes fulfill the new corporate policy statements.
Here are the new best practices:
Secure hybrid VNet blueprint: The on-premises side of the hybrid network should be configured to allow
communication between the following solution and the on-premises Active Directory servers. This best
practice requires a DMZ to enable Active Directory Domain Services across network boundaries.
Azure Resource Manager templates:
1. Define an NSG to block external traffic and allow internal traffic.
2. Deploy two Active Directory virtual machines in a load-balanced pair based on a golden image. On first
boot, that image runs a PowerShell script to join the domain and register with domain services. For
more information, see Extend Active Directory Domain Services (AD DS) to Azure.
Azure Policy: Apply the NSG to all resources.
Azure blueprint:
1. Create a blueprint named active-directory-virtual-machines .
2. Add each of the Active Directory templates and policies to the blueprint.
3. Publish the blueprint to any applicable management group.
4. Apply the blueprint to any subscription requiring legacy or third-party multi-factor authentication.
5. The instance of Active Directory running in Azure can now be used as an extension of the on-premises
Active Directory solution, allowing it to integrate with the existing multi-factor authentication tool and
provide claims-based authentication, both through existing Active Directory functionality.
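As a hedged sketch of the NSG called for in the Resource Manager templates above, the commands below create a network security group that allows traffic originating inside the virtual network and denies inbound traffic from the internet. The resource group, NSG name, priorities, and prefixes are placeholders; in the blueprint this NSG would be expressed as a template artifact rather than as CLI commands.

```bash
# Network security group for the Active Directory subnet (names are illustrative).
az network nsg create --resource-group "rg-identity-prod" --name "nsg-adds"

# Allow traffic that originates inside the virtual network.
az network nsg rule create \
  --resource-group "rg-identity-prod" \
  --nsg-name "nsg-adds" \
  --name "allow-vnet-inbound" \
  --priority 100 \
  --direction Inbound \
  --access Allow \
  --protocol '*' \
  --source-address-prefixes VirtualNetwork \
  --destination-address-prefixes VirtualNetwork \
  --destination-port-ranges '*'

# Deny inbound traffic from the public internet.
az network nsg rule create \
  --resource-group "rg-identity-prod" \
  --nsg-name "nsg-adds" \
  --name "deny-internet-inbound" \
  --priority 4000 \
  --direction Inbound \
  --access Deny \
  --protocol '*' \
  --source-address-prefixes Internet \
  --destination-address-prefixes '*' \
  --destination-port-ranges '*'
```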

Conclusion
Adding these changes to the governance MVP helps remediate many of the risks in this article, allowing each
cloud adoption team to quickly move past this roadblock.

Next steps
As cloud adoption continues and delivers additional business value, risks and cloud governance needs will also
change. The following are a few changes that may occur. For this fictional company, the next trigger is the inclusion
of protected data in the cloud adoption plan. This change requires additional security controls.
Improve the Security Baseline discipline
Governance guide for complex enterprises: Improve
the Security Baseline discipline

This article advances the narrative by adding security controls that support moving protected data to the cloud.

Advancing the narrative


The CIO has spent months collaborating with colleagues and the company's legal staff. A management consultant
with expertise in cybersecurity was engaged to help the existing IT Security and IT Governance teams draft a new
policy regarding protected data. The group was able to foster board support to replace the existing policy, allowing
sensitive personal and financial data to be hosted by approved cloud providers. This required adopting a set of
security requirements and a governance process to verify and document adherence to those policies.
For the past 12 months, the cloud adoption teams have cleared most of the 5,000 assets from the two datacenters
to be retired. The 350 incompatible assets were moved to an alternate datacenter. Only the 1,250 virtual machines
that contain protected data remain.
Changes in the cloud governance team
The cloud governance team continues to change along with the narrative. The two founding members of the team
are now among the most respected cloud architects in the company. The collection of configuration scripts has
grown as new teams tackle innovative new deployments. The cloud governance team has also grown. Most
recently, members of the IT Operations team have joined cloud governance team activities to prepare for cloud
operations. The cloud architects who helped foster this community are seen both as cloud guardians and cloud
accelerators.
While the difference is subtle, it is an important distinction when building a governance-focused IT culture. A
cloud custodian cleans up the messes made by innovative cloud architects, and the two roles have natural friction
and opposing objectives. A cloud guardian helps keep the cloud safe, so other cloud architects can move more
quickly with fewer messes. A cloud accelerator performs both functions but is also involved in the creation of
templates to accelerate deployment and adoption, becoming an innovation accelerator as well as a defender of the
Five Disciplines of Cloud Governance.
Changes in the current state
In the previous phase of this narrative, the company had begun the process of retiring two datacenters. This
ongoing effort includes migrating some applications with legacy authentication requirements, which required
incremental improvements to the Identity Baseline, described in the previous article.
Since then, some things have changed that will affect governance:
Thousands of IT and business assets have been deployed to the cloud.
The application development team has implemented a continuous integration and continuous deployment
(CI/CD) pipeline to deploy a cloud-native application with an improved user experience. That application
doesn't interact with protected data yet, so it isn't production-ready.
The Business Intelligence team within IT actively curates data in the cloud from logistics, inventory, and third-
party data. This data is being used to drive new predictions, which could shape business processes. However,
those predictions and insights are not actionable until customer and financial data can be integrated into the
data platform.
The IT team is progressing on the CIO and CFO's plans to retire two datacenters. Almost 3,500 of the assets in
the two datacenters have been retired or migrated.
The policies regarding sensitive personal and financial data have been modernized. However, the new
corporate policies are contingent on the implementation of related security and governance policies. Teams are
still stalled.
Incrementally improve the future state
Early experiments from the application development and BI teams have shown potential improvements in
customer experiences and data-driven decisions. Both teams would like to expand adoption of the cloud over
the next 18 months by deploying those solutions to production.
IT has developed a business justification to migrate five more datacenters to Azure, which will further decrease
IT costs and provide greater business agility. While smaller in scale, the retirement of those datacenters is
expected to double the total cost savings.
Capital expense and operating expense budgets have been approved to implement the required security and
governance policies, tools, and processes. The expected cost savings from the datacenter retirement are more
than enough to pay for this new initiative. IT and business leadership are confident this investment will
accelerate the realization of returns in other areas. The grassroots cloud governance team has become a recognized
team with dedicated leadership and staffing.
Collectively, the cloud adoption teams, the cloud governance team, the IT security team, and the IT governance
team will implement security and governance requirements to allow cloud adoption teams to migrate
protected data into the cloud.

Changes in tangible risks


Data breach: There is an inherent increase in liabilities related to data breaches when adopting any new data
platform. Technicians adopting cloud technologies have increased responsibilities to implement solutions that can
decrease this risk. A robust security and governance strategy must be implemented to ensure those technicians
fulfill those responsibilities.
This business risk can be expanded into several technical risks:
1. Mission-critical apps or protected data might be deployed unintentionally.
2. Protected data might be exposed during storage due to poor encryption decisions.
3. Unauthorized users might access protected data.
4. External intrusion could result in access to protected data.
5. External intrusion or denial of service attacks could cause a business interruption.
6. Organization or employment changes could allow for unauthorized access to protected data.
7. New exploits might create opportunities for intrusion or unauthorized access.
8. Inconsistent deployment processes might result in security gaps that could lead to data leaks or interruptions.
9. Configuration drift or missed patches might result in unintended security gaps that could lead to data leaks or
interruptions.
10. Disparate edge devices might increase network operations costs.
11. Disparate device configurations might lead to oversights in configuration and compromises in security.
12. The Cybersecurity team insists there is a risk of vendor lock-in from generating encryption keys on a single
cloud provider's platform. While this claim is unsubstantiated, it was accepted by the team for the time being.

Incremental improvement of the policy statements


The following changes to policy will help remediate the new risks and guide implementation. The list looks long,
but the adoption of these policies may be easier than it would appear.
1. All deployed assets must be categorized by criticality and data classification. Classifications are to be reviewed
by the cloud governance team and the application owner before deployment to the cloud.
2. Applications that store or access protected data are to be managed differently than those that don't. At a
minimum, they should be segmented to avoid unintended access of protected data.
3. All protected data must be encrypted when at rest.
4. Elevated permissions in any segment containing protected data should be an exception. Any such exceptions
will be recorded with the cloud governance team and audited regularly.
5. Network subnets containing protected data must be isolated from any other subnets. Network traffic between
protected data subnets will be audited regularly.
6. No subnet containing protected data can be directly accessed over the public internet or across datacenters.
Access to these subnets must be routed through intermediate subnets. All access into these subnets must come
through a firewall solution that can perform packet scanning and blocking functions.
7. Governance tooling must audit and enforce network configuration requirements defined by the Security
Management team.
8. Governance tooling must limit VM deployment to approved images only.
9. Whenever possible, node configuration management should apply policy requirements to the configuration of
any guest operating system. Node configuration management should respect the existing investment in Group
Policy Object (GPO) for resource configuration.
10. Governance tooling will audit that automatic updates are enabled on all deployed assets. When possible,
automatic updates will be enforced. When not enforced by tooling, node-level violations must be reviewed with
operational management teams and remediated in accordance with operations policies. Assets that are not
automatically updated must be included in processes owned by IT Operations.
11. Creation of new subscriptions or management groups for any mission-critical applications or protected data
requires a review from the cloud governance team to ensure proper blueprint assignment.
12. A least-privilege access model will be applied to any subscription that contains mission-critical applications or
protected data.
13. The cloud vendor must be capable of integrating encryption keys managed by the existing on-premises
solution.
14. The cloud vendor must be capable of supporting the existing edge device solution and any required
configurations to protect any publicly exposed network boundary.
15. The cloud vendor must be capable of supporting a shared connection to the global WAN, with data
transmission routed through the existing edge device solution.
16. Trends and exploits that could affect cloud deployments should be reviewed regularly by the security team to
provide updates to Security Baseline tooling used in the cloud.
17. Deployment tooling must be approved by the cloud governance team to ensure ongoing governance of
deployed assets.
18. Deployment scripts must be maintained in a central repository accessible by the cloud governance team for
periodic review and auditing.
19. Governance processes must include audits at the point of deployment and at regular cycles to ensure
consistency across all assets.
20. Deployment of any applications that require customer authentication must use an approved identity provider
that is compatible with the primary identity provider for internal users.
21. Cloud Governance processes must include quarterly reviews with Identity Baseline teams to identify malicious
actors or usage patterns that should be prevented by cloud asset configuration.

Incremental improvement of the best practices


This section modifies the governance MVP design to include new Azure policies and a corporate IT hub and spoke
implementation. Together, these two design changes will fulfill the new corporate policy statements.
The new best practices fall into two categories: corporate IT (hub) and cloud adoption (spoke).
Establishing a corporate IT hub and spoke subscription to centralize the Security Baseline: In this best
practice, the existing governance capacity is wrapped by a hub and spoke topology with shared services, with a
few key additions from the cloud governance team.
1. Azure DevOps repository. Create a repository in Azure DevOps to store and version all relevant Azure
Resource Manager templates and scripted configurations.
2. Hub and spoke template:
a. The guidance in the hub and spoke topology with shared services reference architecture can be used to
generate Resource Manager templates for the assets required in a corporate IT hub.
b. Using those templates, this structure can be made repeatable, as part of a central governance strategy.
c. In addition to the current reference architecture, a network security group template should be created to
capture any port blocking or whitelisting requirements for the VNet to host the firewall. This network
security group differs from prior groups, because it will be the first network security group to allow
public traffic into a VNet.
3. Create Azure policies. Create a policy named Hub NSG Enforcement to enforce the configuration of the network
security group assigned to any VNet created in this subscription. Apply the built-in Policies for guest
configuration as follows:
a. Audit that Windows web servers are using secure communication protocols.
b. Audit that password security settings are set correctly inside Linux and Windows machines.
4. Corporate IT blueprint
a. Create an Azure blueprint named corporate-it-subscription .
b. Add the hub and spoke templates and Hub NSG Enforcement policy.
5. Expand the initial management group hierarchy.
a. For each management group that has requested support for protected data, the corporate-it-subscription blueprint provides an accelerated hub solution.
b. Because management groups in this fictional example include a regional hierarchy in addition to a
business unit hierarchy, this blueprint will be deployed in each region.
c. For each region in the management group hierarchy, create a subscription named
Corporate IT Subscription.
d. Apply the corporate-it-subscription blueprint to each regional instance.
e. This will establish a hub for each business unit in each region. Note: Further cost savings could be achieved by sharing hubs across business units in each region.
6. Integrate group policy objects (GPO) through Desired State Configuration (DSC):
a. Convert GPO to DSC. The Microsoft Baseline Management project in GitHub can accelerate this effort.
Be sure to store DSC in the repository in parallel with Resource Manager templates.
b. Deploy Azure Automation State Configuration to any instances of the Corporate IT subscription. Azure
Automation can be used to apply DSC to VMs deployed in supported subscriptions within the
management group.
c. The current roadmap plans to enable custom guest configuration policies. When that feature is released,
the use of Azure Automation in this best practice will no longer be required.
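The Hub NSG Enforcement policy described in step 3 can be expressed as an Azure Policy definition. The following Python sketch builds such a definition as a plain dictionary that could be stored in the Azure DevOps repository and submitted through your chosen deployment tooling. The policy name, effect, and subnet alias shown are illustrative assumptions rather than values prescribed by this guide; verify the alias against the current Azure Policy alias list before use.

```python
# Illustrative sketch only: a policy rule that denies creation of VNet subnets
# that are not associated with a network security group. The alias below is an
# assumption; confirm it against the published Azure Policy aliases.
import json

hub_nsg_enforcement = {
    "properties": {
        "displayName": "Hub NSG Enforcement",
        "mode": "All",
        "description": "Deny subnets in the hub subscription that lack an approved network security group.",
        "policyRule": {
            "if": {
                "allOf": [
                    {
                        "field": "type",
                        "equals": "Microsoft.Network/virtualNetworks/subnets",
                    },
                    {
                        "field": "Microsoft.Network/virtualNetworks/subnets/networkSecurityGroup.id",
                        "exists": "false",
                    },
                ]
            },
            "then": {"effect": "deny"},
        },
    }
}

if __name__ == "__main__":
    # Emit the definition as JSON so it can be versioned alongside the
    # Resource Manager templates in the central repository.
    print(json.dumps(hub_nsg_enforcement, indent=2))
```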
Applying additional governance to a Cloud Adoption Subscription (Spoke): Building on the
Corporate IT Subscription, minor changes to the governance MVP applied to each subscription dedicated to the
support of application archetypes can produce rapid improvement.
In prior iterative changes to the best practice, we defined network security groups to block public traffic and
whitelisted internal traffic. Additionally, the Azure blueprint temporarily created DMZ and Active Directory
capabilities. In this iteration, we will adjust those assets, creating a new version of the Azure blueprint.
1. Network peering template. This template will peer the VNet in each subscription with the Hub VNet in the
Corporate IT subscription.
a. The reference architecture from the prior section, hub and spoke topology with shared services,
generated a Resource Manager template for enabling VNet peering.
b. That template can be used as a guide to modify the DMZ template from the prior governance iteration.
c. We are now adding VNet peering to the DMZ VNet that was previously connected to the local edge
device over VPN.
d. The VPN should also be removed from this template to ensure no traffic is routed directly to the on-premises datacenter without passing through the corporate IT subscription and firewall solution. Alternatively, you could retain this VPN as a failover circuit in the event of an ExpressRoute circuit outage.
e. Additional network configuration will be required by Azure Automation to apply DSC to hosted VMs.
2. Modify the network security group. Block all public and direct on-premises traffic in the network security
group. The only inbound traffic should be coming through the VNet peer in the corporate IT subscription.
a. In the prior iteration, a network security group was created that blocked all public traffic and whitelisted all internal traffic. Now that network security group needs a few adjustments, as shown in the sketch after this list.
b. The new network security group configuration should block all public traffic, along with all traffic from
the local datacenter.
c. Traffic entering this VNet should only come from the VNet on the other side of the VNet peer.
3. Azure Security Center implementation:
a. Configure Azure Security Center for any management group that contains protected data classifications.
b. Set Automatic provisioning to on by default to ensure patching compliance.
c. Establish OS security configurations, with IT Security defining the required configuration.
d. Support IT Security in the initial use of Azure Security Center. Transition use of Security Center to IT Security, but maintain access for governance continuous-improvement purposes.
e. Create a Resource Manager template reflecting the changes required for Azure Security Center
configuration within a subscription.
4. Update Azure Policy for all subscriptions.
a. Audit and enforce criticality and data classification across all management groups and subscriptions to
identify any subscriptions with protected data classifications.
b. Audit and enforce use of approved OS images only.
c. Audit and enforce guest configurations based on security requirements for each node.
5. Update Azure Policy for all subscriptions that contain protected data classifications.
a. Audit and enforce use of standard roles only.
b. Audit and enforce application of encryption for all storage accounts and files at rest on individual nodes.
c. Audit and enforce the application of the new version of the DMZ network security group.
d. Audit and enforce use of approved network subnet and VNet per network interface.
e. Audit and enforce the limitation of user-defined routing tables.
6. Azure blueprint:
a. Create an Azure blueprint named protected-data.
b. Add the VNet peer, network security group, and Azure Security Center templates to the blueprint.
c. Ensure the template for Active Directory from the previous iteration is not included in the blueprint. Any
dependencies on Active Directory will be provided by the corporate IT subscription.
d. Terminate any existing Active Directory VMs deployed in the previous iteration.
e. Add the new policies for protected data subscriptions.
f. Publish the blueprint to any management group that will host protected data.
g. Apply the new blueprint to each affected subscription along with existing blueprints.
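To make the spoke changes in step 2 concrete, the following Python sketch shows the shape of the revised network security group as a Resource Manager resource fragment: inbound traffic is allowed only from the peered hub VNet, while internet and direct on-premises traffic are denied. The address ranges, rule names, and priorities are placeholder assumptions; substitute the hub and on-premises ranges used in your own environment.

```python
# Illustrative sketch: security rules for the spoke network security group.
# Address prefixes, names, and priorities are placeholders, not prescribed values.
HUB_VNET_PREFIX = "10.100.0.0/16"      # assumed hub VNet address space
ON_PREMISES_PREFIX = "192.168.0.0/16"  # assumed on-premises address space


def inbound_rule(name, priority, access, source_prefix):
    """Build one NSG security rule in Resource Manager format."""
    return {
        "name": name,
        "properties": {
            "priority": priority,
            "direction": "Inbound",
            "access": access,
            "protocol": "*",
            "sourcePortRange": "*",
            "destinationPortRange": "*",
            "sourceAddressPrefix": source_prefix,
            "destinationAddressPrefix": "*",
        },
    }


spoke_nsg = {
    "type": "Microsoft.Network/networkSecurityGroups",
    "apiVersion": "2019-11-01",
    "name": "nsg-protected-data-spoke",
    "location": "[resourceGroup().location]",
    "properties": {
        "securityRules": [
            inbound_rule("AllowHubPeeringInbound", 100, "Allow", HUB_VNET_PREFIX),
            inbound_rule("DenyOnPremisesInbound", 200, "Deny", ON_PREMISES_PREFIX),
            inbound_rule("DenyInternetInbound", 300, "Deny", "Internet"),
        ]
    },
}
```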

Conclusion
Adding these processes and changes to the governance MVP helps remediate many of the risks associated with
security governance. Together, they add the network, identity, and security monitoring tools needed to protect
data.
Next steps
As cloud adoption continues and delivers additional business value, risks and cloud governance needs also
change. For the fictional company in this guide, the next step is to support mission-critical workloads. This is the
point when Resource Consistency controls are needed.
Improve the Resource Consistency discipline
Governance guide for complex enterprises: Improve
the Resource Consistency discipline

This article advances the narrative by adding Resource Consistency controls to the governance MVP to support
mission-critical applications.

Advancing the narrative


The cloud adoption teams have met all requirements to move protected data. With those applications come SLA
commitments to the business and a need for support from IT Operations. Right behind the team migrating the two
datacenters, multiple application development and BI teams are ready to begin launching new solutions into
production. IT Operations is new to cloud operations and needs to quickly integrate existing operational
processes.
Changes in the current state
IT is actively moving production workloads with protected data into Azure. Some low-priority workloads are
serving production traffic. More can be cut over as soon as IT Operations signs off on readiness to support the
workloads.
The application development teams are ready for production traffic.
The BI team is ready to integrate predictions and insights into the systems that run operations for the three
business units.
Incrementally improve the future state
IT operations is new to cloud operations and needs to quickly integrate existing operational processes.
The changes to current and future state expose new risks that will require new policy statements.

Changes in tangible risks


Business interruption: There is an inherent risk of any new platform causing interruptions to mission-critical
business processes. The IT Operations team and the teams executing on various cloud adoptions are relatively
inexperienced with cloud operations. This increases the risk of interruption and must be remediated and governed.
This business risk can be expanded into several technical risks:
1. Misaligned operational processes might lead to outages that can't be detected or mitigated quickly.
2. External intrusion or denial of service attacks might cause a business interruption.
3. Mission-critical assets might not be properly discovered and therefore not properly operated.
4. Undiscovered or mislabeled assets might not be supported by existing operational management processes.
5. Configuration of deployed assets might not meet performance expectations.
6. Logging might not be properly recorded and centralized to allow for remediation of performance issues.
7. Recovery policies may fail or take longer than expected.
8. Inconsistent deployment processes might result in security gaps that could lead to data leaks or interruptions.
9. Configuration drift or missed patches might result in unintended security gaps that could lead to data leaks or
interruptions.
10. Configuration might not enforce the requirements of defined SLAs or committed recovery requirements.
11. Deployed operating systems or applications might not meet OS and application hardening requirements.
12. There is a risk of inconsistency due to multiple teams working in the cloud.
Incremental improvement of the policy statements
The following changes to policy will help remediate the new risks and guide implementation. The list looks long,
but the adoption of these policies may be easier than it would appear.
1. All deployed assets must be categorized by criticality and data classification. Classifications are to be reviewed
by the cloud governance team and the application owner before deployment to the cloud.
2. Subnets containing mission-critical applications must be protected by a firewall solution capable of detecting
intrusions and responding to attacks.
3. Governance tooling must audit and enforce network configuration requirements defined by the Security
Baseline team.
4. Governance tooling must validate that all assets related to mission-critical applications or protected data are
included in monitoring for resource depletion and optimization.
5. Governance tooling must validate that the appropriate level of logging data is being collected for all mission-
critical applications or protected data.
6. Governance process must validate that backup, recovery, and SLA adherence are properly implemented for
mission-critical applications and protected data.
7. Governance tooling must limit virtual machine deployment to approved images only.
8. Governance tooling must enforce that automatic updates are prevented on all deployed assets that support
mission-critical applications. Violations must be reviewed with operational management teams and remediated
in accordance with operations policies. Assets that are not automatically updated must be included in processes
owned by IT operations to quickly and effectively update those servers.
9. Governance tooling must validate tagging related to cost, criticality, SLA, application, and data classification. All
values must align to predefined values managed by the cloud governance team.
10. Governance processes must include audits at the point of deployment and at regular cycles to ensure
consistency across all assets.
11. Trends and exploits that could affect cloud deployments should be reviewed regularly by the security team to
provide updates to Security Baseline tooling used in the cloud.
12. Before release into production, all mission-critical applications and protected data must be added to the
designated operational monitoring solution. Assets that cannot be discovered by the chosen IT operations
tooling cannot be released for production use. Any changes required to make the assets discoverable must be
made to the relevant deployment processes to ensure assets will be discoverable in future deployments.
13. When discovered, asset sizing is to be validated by operational management teams to validate that the asset
meets performance requirements.
14. Deployment tooling must be approved by the cloud governance team to ensure ongoing governance of
deployed assets.
15. Deployment scripts must be maintained in a central repository accessible by the cloud governance team for
periodic review and auditing.
16. Governance review processes must validate that deployed assets are properly configured in alignment with
SLA and recovery requirements.

Incremental improvement of the best practices


This section of the article will improve the governance MVP design to include new Azure policies and an implementation of operational monitoring and recovery tooling. Together, these design changes will fulfill the new corporate policy statements.
Following the experience of this fictional example, it is assumed that the Protected Data changes have already
occurred. Building on that best practice, the following will add operational monitoring requirements, readying a
subscription for mission-critical applications.
Corporate IT subscription: Add the following to the Corporate IT subscription, which acts as a hub.
1. As an external dependency, the cloud operations team will need to define operational monitoring tooling,
business continuity and disaster recovery (BCDR) tooling, and automated remediation tooling. The cloud
governance team can then support necessary discovery processes.
a. In this use case, the cloud operations team chose Azure Monitor as the primary tool for monitoring
mission-critical applications.
b. The team also chose Azure Site Recovery as the primary BCDR tooling.
2. Azure Site Recovery implementation.
a. Define and deploy Azure Site Recovery Vault for backup and recovery processes.
b. Create an Azure Resource Manager template for creating a vault in each subscription.
3. Azure Monitor implementation.
a. Once a mission-critical subscription is identified, a Log Analytics workspace can be created.
Individual cloud adoption subscription: The following will ensure that each subscription is discoverable by the
monitoring solution and ready to be included in BCDR practices.
1. Azure Policy for mission-critical nodes:
a. Audit and enforce use of standard roles only.
b. Audit and enforce application of encryption for all storage accounts.
c. Audit and enforce use of approved network subnet and VNet per network interface.
d. Audit and enforce the limitation of user-defined routing tables.
e. Audit and enforce the deployment of Log Analytics agents for Windows and Linux virtual machines (see the sketch after this list).
2. Azure blueprint:
a. Create a blueprint named mission-critical-workloads-and-protected-data. This blueprint will apply assets in addition to those in the protected data blueprint.
b. Add the new Azure policies to the blueprint.
c. Apply the blueprint to any subscription that is expected to host a mission-critical application.
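One way to express the Log Analytics agent requirement from step 1e is an audit policy that flags Windows virtual machines missing the monitoring extension. The sketch below is illustrative only; in practice the equivalent built-in policies for Windows and Linux can be assigned instead of authoring a custom definition, and the extension and publisher names shown are assumptions to verify.

```python
# Illustrative sketch: audit Windows VMs that do not report the Log Analytics
# (MicrosoftMonitoringAgent) extension. A matching rule would be needed for
# Linux. Built-in policies covering both scenarios already exist.
log_analytics_agent_audit = {
    "properties": {
        "displayName": "Audit Log Analytics agent on Windows VMs",
        "mode": "Indexed",
        "policyRule": {
            "if": {
                "field": "type",
                "equals": "Microsoft.Compute/virtualMachines",
            },
            "then": {
                "effect": "auditIfNotExists",
                "details": {
                    "type": "Microsoft.Compute/virtualMachines/extensions",
                    "existenceCondition": {
                        "allOf": [
                            {
                                "field": "Microsoft.Compute/virtualMachines/extensions/type",
                                "equals": "MicrosoftMonitoringAgent",
                            },
                            {
                                "field": "Microsoft.Compute/virtualMachines/extensions/publisher",
                                "equals": "Microsoft.EnterpriseCloud.Monitoring",
                            },
                        ]
                    },
                },
            },
        },
    }
}
```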

Conclusion
Adding these processes and changes to the governance MVP helps remediate many of the risks associated with
resource governance. Together, they add the recovery, sizing, and monitoring controls necessary to empower
cloud-aware operations.

Next steps
As cloud adoption grows and delivers additional business value, the risks and cloud governance needs will also
change. For the fictional company in this guide, the next trigger is when the scale of deployment exceeds 1,000 assets in the cloud or monthly spending exceeds $10,000 USD. At this point, the cloud governance
team adds Cost Management controls.
Improve the Cost Management discipline
Governance guide for complex enterprises: Improve
the Cost Management discipline

This article advances the narrative by adding cost controls to the minimum viable product (MVP ) governance.

Advancing the narrative


Adoption has grown beyond the tolerance indicator defined in the governance MVP. The increase in spending now justifies an investment of time from the cloud governance team to monitor and control spending patterns.
As a clear driver of innovation, IT is no longer seen primarily as a cost center. As the IT organization delivers more
value, the CIO and CFO agree that the time is right to shift the role IT plays in the company. Among other
changes, the CFO wants to test a direct pay approach to cloud accounting for the Canadian branch of one of the
business units. One of the two retired datacenters exclusively hosted assets for that business unit's Canadian
operations. In this model, the business unit's Canadian subsidiary will be billed directly for the operating expenses
related to the hosted assets. This model allows IT to focus less on managing someone else's spending and more
on creating value. However, before this transition can begin, Cost Management tooling needs to be in place.
Changes in the current state
In the previous phase of this narrative, the IT team was actively moving production workloads with protected data
into Azure.
Since then, some things have changed that will affect governance:
5,000 assets have been removed from the two datacenters flagged for retirement. Procurement and IT
security are now deprovisioning the remaining physical assets.
The application development teams have implemented CI/CD pipelines to deploy some cloud-native
applications, significantly affecting customer experiences.
The BI team has created aggregation, curation, insight, and prediction processes driving tangible benefits for
business operations. Those predictions are now empowering creative new products and services.
Incrementally improve the future state
Cost monitoring and reporting should be added to the cloud solution. Reporting should tie direct operating
expenses to the functions that are consuming the cloud costs. Additional reporting should allow IT to monitor
spending and provide technical guidance on cost management. For the Canadian branch, the department will be
billed directly.

Changes in risk
Budget control: There is an inherent risk that self-service capabilities will result in excessive and unexpected
costs on the new platform. Governance processes for monitoring costs and mitigating ongoing cost risks must be
in place to ensure continued alignment with the planned budget.
This business risk can be expanded into a few technical risks:
There is a risk of actual costs exceeding the plan.
Business conditions change. When they do, there will be cases when a business function needs to consume
more cloud services than expected, leading to spending anomalies. There is a risk that these additional costs
will be considered overages as opposed to a required adjustment to the plan. If successful, the Canadian
experiment should help remediate this risk.
There is a risk of systems being overprovisioned, resulting in excess spending.

Changes to the policy statements


The following changes to policy will help remediate the new risks and guide implementation.
All cloud costs should be monitored against plan on a weekly basis by the cloud governance team. Reporting
on deviations between cloud costs and plan is to be shared with IT leadership and finance monthly. All cloud
costs and plan updates should be reviewed with IT leadership and finance monthly.
All costs must be allocated to a business function for accountability purposes.
Cloud assets should be continually monitored for optimization opportunities.
Cloud governance tooling must limit asset sizing options to an approved list of configurations. The tooling
must ensure that all assets are discoverable and tracked by the cost monitoring solution.
During deployment planning, any required cloud resources associated with the hosting of production
workloads should be documented. This documentation will help refine budgets and prepare additional
automation tools to prevent the use of more expensive options. During this process, consideration should be given to different discounting tools offered by the cloud provider, such as reserved instances or license cost
reductions.
All application owners are required to attend training on practices for optimizing workloads to better control
cloud costs.

Incremental improvement of the best practices


This section of the article will improve the governance MVP design to include new Azure policies and an
implementation of Azure Cost Management. Together, these two design changes will fulfill the new corporate
policy statements.
1. Make changes in the Azure Enterprise Portal to bill the Department administrator for the Canadian
deployment.
2. Implement Azure Cost Management.
a. Establish the right level of access scope to align with the subscription pattern and resource grouping
pattern. Assuming alignment with the governance MVP defined in prior articles, this would require
Enrollment Account Scope access for the cloud governance team executing on high-level reporting.
Additional teams outside of governance, like the Canadian procurement team, will require Resource
Group Scope access.
b. Establish a budget in Azure Cost Management.
c. Review and act on initial recommendations. Establish a recurring process to support ongoing review and reporting.
d. Configure and execute Azure Cost Management Reporting, both initial and recurring.
3. Update Azure Policy.
a. Audit tagging, management group, subscription, and resource group values to identify any deviation.
b. Establish SKU size options to limit deployments to SKUs listed in deployment planning documentation.
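The SKU restriction in step 3b can be captured with a policy that denies virtual machine sizes outside an approved list. The sketch below is a minimal illustration; the SKU names are placeholders that would come from the deployment planning documentation, and the built-in allowed-SKUs policy can be assigned instead of a custom definition.

```python
# Illustrative sketch: deny VM sizes that are not on the approved list.
# The SKUs below are placeholders; source the real list from the deployment
# planning documentation referenced in the policy statements.
APPROVED_VM_SKUS = ["Standard_D2s_v3", "Standard_D4s_v3", "Standard_E4s_v3"]

allowed_sku_policy = {
    "properties": {
        "displayName": "Limit VM deployments to approved SKUs",
        "mode": "Indexed",
        "policyRule": {
            "if": {
                "allOf": [
                    {"field": "type", "equals": "Microsoft.Compute/virtualMachines"},
                    {
                        "not": {
                            "field": "Microsoft.Compute/virtualMachines/sku.name",
                            "in": APPROVED_VM_SKUS,
                        }
                    },
                ]
            },
            "then": {"effect": "deny"},
        },
    }
}
```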

Conclusion
Adding the above processes and changes to the governance MVP helps remediate many of the risks associated
with cost governance. Together, they create the visibility, accountability, and optimization needed to control costs.

Next steps
As cloud adoption grows and delivers additional business value, risks and cloud governance needs will also
change. For this fictional company, the next step is using this governance investment to manage multiple clouds.
Multicloud improvement
Governance guide for complex enterprises:
Multicloud improvement

Advancing the narrative


Microsoft recognizes that customers may adopt multiple clouds for specific purposes. The fictional company in
this guide is no exception. In parallel with their Azure adoption journey, business success has led to the acquisition
of a small but complementary business. That business runs all of its IT operations on a different cloud provider.
This article describes how things change when integrating the new organization. For purposes of the narrative, we
assume this company has completed each of the governance iterations outlined in this governance guide.
Changes in the current state
In the previous phase of this narrative, the company had begun to implement cost controls and cost monitoring,
as cloud spending becomes part of the company's regular operating expenses.
Since then, some things have changed that will affect governance:
Identity is controlled by an on-premises instance of Active Directory. Hybrid Identity is facilitated through
replication to Azure Active Directory.
IT Operations or Cloud Operations are largely managed by Azure Monitor and related automation capabilities.
Business continuity and disaster recovery (BCDR) is controlled by Azure Site Recovery vault instances.
Azure Security Center is used to monitor security violations and attacks.
Azure Security Center and Azure Monitor are both used to monitor governance of the cloud.
Azure Blueprints, Azure Policy, and management groups are used to automate compliance to policy.
Incrementally improve the future state
The goal is to integrate the acquisition company into existing operations wherever possible.

Changes in tangible risks


Business acquisition cost: Acquisition of the new business is estimated to be profitable in approximately five
years. Because of the slow rate of return, the board wants to control acquisition costs as much as possible. There
is a risk of cost control and technical integration conflicting with one another.
This business risk can be expanded into a few technical risks:
There is a risk of cloud migration producing additional acquisition costs.
There is also a risk of the new environment not being properly governed or resulting in policy violations.

Incremental improvement of the policy statements


The following changes to policy will help remediate the new risks and guide implementation.
All assets in a secondary cloud must be monitored through existing operational management and security
monitoring tools.
All organizational units must be integrated into the existing identity provider.
The primary identity provider should govern authentication to assets in the secondary cloud.
Incremental improvement of the best practices
This section of the article improves the governance MVP design to integrate the acquired company's secondary cloud into the existing governance, operations, and cost tooling. Together, these design changes will fulfill the new corporate policy statements.
1. Connect the networks. Executed by Networking and IT Security, supported by governance.
a. Adding a connection from the MPLS or leased-line provider to the new cloud will integrate networks.
Adding routing tables and firewall configurations will control access and traffic between the
environments.
2. Consolidate identity providers. Depending on the workloads being hosted in the secondary cloud, there are a variety of options for identity provider consolidation. The following are a few examples:
a. For applications that authenticate using OAuth 2, users in the Active Directory in the secondary cloud
could simply be replicated to the existing Azure AD tenant.
b. At the other extreme, federation between the two on-premises identity providers would allow users from the new Active Directory domains to be replicated to Azure.
3. Add assets to Azure Site Recovery.
a. Azure Site Recovery was built as a hybrid and multicloud tool from the beginning.
b. Virtual machines in the secondary cloud might be able to be protected by the same Azure Site Recovery
processes used to protect on-premises assets.
4. Add assets to Azure Cost Management.
a. Azure Cost Management was built as a multicloud tool from the beginning.
b. Virtual machines in the secondary cloud might be compatible with Azure Cost Management for some
cloud providers. Additional costs may apply.
5. Add assets to Azure Monitor.
a. Azure Monitor was built as a hybrid cloud tool from the beginning.
b. Virtual machines in the secondary cloud might be compatible with Azure Monitor agents, allowing them
to be included in Azure Monitor for operational monitoring.
6. Governance enforcement tools.
a. Governance enforcement is cloud-specific.
b. The corporate policies established in the governance guide are not cloud-specific. While the
implementation may vary from cloud to cloud, the policy statements can be applied to the secondary
provider.
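To summarize steps 3 through 6, the following sketch maps each governance area to the tooling expected to cover the secondary cloud in this fictional scenario. The coverage values are assumptions for illustration only; compatibility varies by provider and should be confirmed before relying on it.

```python
# Illustrative sketch: expected coverage of the secondary cloud by existing tools
# in this fictional scenario. Values are assumptions, not statements of support.
secondary_cloud_coverage = {
    "operational monitoring": ("Azure Monitor agents", "possible; verify per provider"),
    "bcdr": ("Azure Site Recovery", "possible; verify per provider"),
    "cost monitoring": ("Azure Cost Management connectors", "possible; additional costs may apply"),
    "identity": ("Azure AD via replication or federation", "planned"),
    "policy enforcement": ("Azure Policy and Azure Blueprints", "cloud-specific; policy statements still apply"),
}

for area, (tool, status) in secondary_cloud_coverage.items():
    print(f"{area}: {tool} - {status}")
```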
Multicloud adoption should be limited to cases where it is required by technical needs or specific business requirements. As multicloud adoption grows, so do complexity and security risks.

Next steps
In many large enterprises, the Five Disciplines of Cloud Governance can be blockers to adoption. The next article
has some additional thoughts on making governance a team sport to help ensure long-term success in the cloud.
Multiple layers of governance
Governance guide for complex enterprises: Multiple
layers of governance

When large enterprises require multiple layers of governance, there are greater levels of complexity that must be
factored into the governance MVP and later governance improvements.
A few common examples of such complexities include:
Distributed governance functions.
Corporate IT supporting Business unit IT organizations.
Corporate IT supporting geographically distributed IT organizations.
This article explores some ways to navigate this type of complexity.

Large enterprise governance is a team sport


Large established enterprises often have teams or employees who focus on the disciplines mentioned throughout
this guide. This guide demonstrates one approach to making governance a team sport.
In many large enterprises, the Five Disciplines of Cloud Governance can be blockers to adoption. Developing
cloud expertise in identity, security, operations, deployments, and configuration across an enterprise takes time.
Holistically implementing IT governance policy and IT security can slow innovation by months or even years.
Balancing the business need to innovate and the governance need to protect existing resources is delicate.
The inherent capabilities of the cloud can remove blockers to innovation but increase risks. In this governance
guide, we showed how the example company created guardrails to manage the risks. Rather than tackling each of
the disciplines required to protect the environment, the cloud governance team leads a risk-based approach to
govern what could be deployed, while the other teams build the necessary cloud maturities. Most importantly, as
each team reaches cloud maturity, governance applies their solutions holistically. As each team matures and adds
to the overall solution, the cloud governance team can open stage gates, allowing additional innovation and
adoption to thrive.
This model illustrates the growth of a partnership between the cloud governance team and existing enterprise
teams (Security, IT Governance, Networking, Identity, and others). The guide starts with the governance MVP and
grows to a holistic end state through governance iterations.

Requirements for supporting such a team sport


The first requirement of a multilayer governance model is to understand the governance hierarchy. Answering the following questions will help you understand the general governance hierarchy:
How is cloud accounting (billing for cloud services) allocated across business units?
How are governance responsibilities allocated across corporate IT and each business unit?
What types of environments do each of those units of IT manage?

Central governance of a distributed governance hierarchy


Tools like management groups allow corporate IT to create a hierarchy structure that matches the governance
hierarchy. Tools like Azure Blueprints can apply assets to different layers of that hierarchy. Azure Blueprints can be
versioned and various versions can be applied to management groups, subscriptions, or resource groups. Each of
these concepts is described in more detail in the governance MVP.
The important aspect of each of these tools is the ability to apply multiple blueprints to a hierarchy. This allows
governance to be a layered process. The following is one example of this hierarchical application of governance:
Corporate IT: Corporate IT creates a set of standards and policies that apply to all cloud adoption. This is
materialized in a "Baseline" blueprint. Corporate IT then owns the management group hierarchy, ensuring that
a version of the baseline is applied to all subscriptions in the hierarchy.
Regional or Business Unit IT: Various IT teams can apply an additional layer of governance by creating their
own blueprint. Those blueprints would create additive policies and standards. Once developed, Corporate IT
could apply those blueprints to the applicable nodes within the management group hierarchy.
Cloud adoption teams: Detailed decisions and implementation about applications or workloads can be made
by each cloud adoption team, within the context of governance requirements. At times the team can also
request additional Azure Resource Consistency templates to accelerate adoption efforts.
The details regarding governance implementation at each level will require coordination between each team. The
governance MVP and governance improvements outlined in this guide can aid in aligning that coordination.
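A minimal sketch of this layered model, using hypothetical management group and blueprint names, can help illustrate how each layer contributes additive blueprint assignments that accumulate down the hierarchy.

```python
# Illustrative sketch: a governance hierarchy in which each layer contributes
# additive blueprint assignments. All names are hypothetical examples.
governance_hierarchy = {
    "name": "contoso-root",                       # corporate IT root management group
    "blueprints": ["baseline-v1"],                # applied to every subscription below
    "children": [
        {
            "name": "business-unit-1",
            "blueprints": ["bu1-standards-v2"],   # business unit IT additions
            "children": [
                {"name": "bu1-north-america", "blueprints": ["protected-data"], "children": []},
                {"name": "bu1-europe", "blueprints": ["protected-data"], "children": []},
            ],
        }
    ],
}


def print_effective_blueprints(node, inherited=()):
    """Walk the hierarchy and show the cumulative blueprints each node inherits."""
    applied = list(inherited) + node["blueprints"]
    print(f"{node['name']}: {applied}")
    for child in node["children"]:
        print_effective_blueprints(child, applied)


if __name__ == "__main__":
    print_effective_blueprints(governance_hierarchy)
```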
Any change to business processes or technology platforms introduces risk to the business. Cloud governance teams, whose
members are sometimes known as cloud custodians, are tasked with mitigating these risks with minimal interruption to
adoption or innovation efforts.

However, cloud governance requires more than technical implementation. Subtle changes in the corporate narrative or
corporate policies can affect adoption efforts significantly. Before implementation, it's important to look beyond IT while
defining corporate policy.

Figure 1 - Visual of corporate policy and the Five Disciplines of Cloud Governance.

Define corporate policy


Defining corporate policy focuses on identifying and mitigating business risks regardless of the cloud platform. A healthy cloud
governance strategy begins with sound corporate policy. The following three-step process guides iterative development of such
policies.

Business risk
Investigate current cloud adoption plans and data classification to identify risks to the business. Work with the business to
balance risk tolerance and mitigation costs.

Policy and compliance


Evaluate risk tolerance to inform minimally invasive policies that govern cloud adoption and manage risks. In some
industries, third-party compliance affects initial policy creation.

Processes
The pace of adoption and innovation activities will naturally create policy violations. Executing relevant processes will aid
in monitoring and enforcing adherence to policies.

Next steps
Learn how to make your corporate policy ready for the cloud.
Prepare corporate policy for the cloud
Prepare corporate IT policy for the cloud

Cloud governance is the product of an ongoing adoption effort over time, as a true lasting transformation doesn't
happen overnight. Attempting to deliver complete cloud governance before addressing key corporate policy
changes using a fast aggressive method seldom produces the desired results. Instead we recommend an
incremental approach.
What is different about our Cloud Adoption Framework is the purchasing cycle and how it can enable authentic
transformation. Since there is not a big capital expenditure acquisition requirement, engineers can begin
experimentation and adoption sooner. In most corporate cultures, elimination of the capital expense barrier to
adoption can lead to tighter feedback loops, organic growth, and incremental execution.
The shift to cloud adoption requires a shift in governance. In many organizations, corporate policy transformation
allows for improved governance and higher rates of adherence through incremental policy changes and
automated enforcement of those changes, powered by newly defined capabilities that you configure with your
cloud service provider.
This article outlines key activities that can help you shape your corporate policies to enable an expanded
governance model.

Define corporate policy to mature cloud governance


In traditional governance and incremental governance, corporate policy creates the working definition of
governance. Most IT governance actions seek to implement technology to monitor, enforce, operate, and automate
those corporate policies. Cloud governance is built on similar concepts.

Figure 1 - Corporate governance and governance disciplines.


The image above demonstrates the interactions between business risk, policy and compliance, and monitoring and enforcement that create a governance strategy. That strategy is then realized through the Five Disciplines of Cloud Governance.

Review existing policies


In the image above, the governance strategy (risk, policy and compliance, monitor and enforce) starts with
recognizing business risks. Understanding how business risk changes in the cloud is the first step to creating a
lasting cloud governance strategy. Working with your business units to gain an accurate gauge of the business's
tolerance for risk helps you understand what level of risk needs to be remediated. Your understanding of new
risks and acceptable tolerance can fuel a review of existing policies, in order to determine the required level of
governance that is appropriate for your organization.

TIP
If your organization is governed by third-party compliance, one of the biggest business risks to consider may be a risk of
adherence to regulatory compliance. This risk often cannot be remediated, and instead may require a strict adherence. Be
sure to understand your third-party compliance requirements before beginning a policy review.

An incremental approach to cloud governance


An incremental approach to cloud governance assumes that it is unacceptable to exceed the business's tolerance
for risk. Instead, it assumes that the role of governance is to accelerate business change, help engineers
understand architecture guidelines, and ensure that business risks are regularly communicated and remediated.
Alternatively, the traditional role of governance can become a barrier to adoption by engineers or by the business
as a whole.
With an incremental approach to cloud governance, there is sometimes a natural friction between teams building
new business solutions and teams protecting the business from risks. However, in this model those two teams can
become peers working in increments or sprints. As peers, the cloud governance team and the cloud adoption
teams begin to work together to expose, evaluate, and remediate business risks. This effort can create a natural
means of reducing friction and building collaboration between teams.

Minimum viable product (MVP) for policy


The first step in an emerging partnership between your cloud governance and adoption teams is an agreement
regarding the policy MVP. Your MVP for cloud governance should acknowledge that business risks are small in
the beginning, but will likely grow as your organization adopts more cloud services over time.
For example, the business risk is small for a business deploying five VMs that don't contain any high business
impact (HBI) data. Later in the cloud adoption process, when the number reaches 1,000 VMs and the business is
starting to move HBI data, the business risk grows.
Policy MVP attempts to define a required foundation for policies needed to deploy the first x VMs or the first x
number of applications, where x is a small yet meaningful quantity of the units being adopted. This policy set
requires few constraints, but would contain the foundational aspects needed to quickly grow from one incremental
cloud adoption effort to the next. Through incremental policy development, this governance strategy would grow
over time. Through slow subtle shifts, the policy MVP would grow into feature parity with the outputs of the policy
review exercise.

Incremental policy growth


Incremental policy growth is the key mechanism for growing policy and cloud governance over time. It is also the key requirement for adopting an incremental model of governance. For this model to work well, the governance
team must be committed to an ongoing allocation of time at each sprint, in order to evaluate and implement
changing governance disciplines.
Sprint time requirements: At the beginning of each iteration, each cloud adoption team creates a list of assets to
be migrated or adopted in the current increment. The cloud governance team is expected to allow sufficient time
to review the list, validate data classifications for assets, evaluate any new risks associated with each asset, update
architecture guidelines, and educate the team on the changes. These commitments commonly require 10-30
hours per sprint. This level of involvement is also expected to require at least one dedicated employee to manage governance in a large cloud adoption effort.
Release time requirements: At the beginning of each release, the cloud adoption teams and the cloud strategy
team should prioritize a list of applications or workloads to be migrated in the current iteration, along with any
business change activities. Those data points allow the cloud governance team to understand new business risks
early. That allows time to align with the business and gauge the business's tolerance for risk.

Next steps
Effective cloud governance strategy begins with understanding business risk.
Understand business risk
Understand business risk during cloud migration

An understanding of business risk is one of the most important elements of any cloud transformation. Risk drives
policy, and it influences monitoring and enforcement requirements. Risk heavily influences how we manage the
digital estate, on-premises or in the cloud.

Relativity of risk
Risk is relative. A small company with a few IT assets in a closed building has little risk. Add users and an internet connection with access to those assets, and the risk is intensified.
status, the risks are exponentially greater. As revenue, business process, employee counts, and IT assets
accumulate, risks increase and coalesce. IT assets that aid in generating revenue are at tangible risk of stopping
that revenue stream in the event of an outage. Every moment of downtime equates to losses. Likewise, as data
accumulates, the risk of harming customers grows.
In the traditional on-premises world, IT governance teams focus on assessing risks, creating processes to manage
those risks, and deploying systems to ensure remediation measures are successfully implemented. These efforts
work to balance risks required to operate in a connected, modern business environment.

Understand business risks in the cloud


During a transformation, the same relative risks exist.
During early experimentation, a few assets are deployed with little to no relevant data. The risk is small.
When the first workload is deployed, risk goes up a little. This risk is easily remediated by choosing an
inherently low risk application with a small user base.
As more workloads come online, risks change at each release. New apps go live and risks change.
When a company brings the first 10-20 applications online, the risk profile is much different than it is when the 1,000th application goes into production in the cloud.
The assets in the traditional, on-premises estate likely accumulated over time. The maturity of the business and IT teams likely grew in a similar fashion. That parallel growth can tend to create some unnecessary policy baggage.
During a cloud transformation, both the business and IT teams have an opportunity to reset those policies and
build new ones with a matured mindset.

What is a business risk MVP?


A minimum viable product is commonly used to define the smallest unit of something that can
produce tangible value. In a business risk MVP, the cloud governance team starts with the assumption that some
assets will be deployed to a cloud environment at some point in time. It's unknown what those assets are at the
time, and the team may be unsure what types of data will be stored on those assets.
When planning for business risk, the cloud governance team could build for the worst case scenario and map
every possible policy to the cloud. However, identifying all potential business risks for all cloud usage scenarios
can take considerable time and effort, potentially delaying the implementation of governance to your cloud
workloads. This is not recommended, but is an option.
Conversely, an MVP approach can allow the team to define an initial starting point and set of assumptions that
would be true for most or all assets. This business risk MVP will support initial small-scale or test cloud
deployments, and then be used as a base for gradually identifying and remediating new risks as business needs
arise or additional workloads are added to your cloud environment. This process allows you to apply governance
throughout the cloud adoption process.
The following are a few basic examples of business risks that can be included as part of an MVP:
All assets are at risk of being deleted (through error, mistake or maintenance).
All assets are at risk of generating too much spending.
All assets could be compromised by weak passwords or insecure settings.
Any asset with open ports exposed to the internet is at risk of compromise.
The above examples are meant to establish MVP business risks as a theory. The actual list will be unique to every environment. Once the business risk MVP is established, the risks it contains can be converted to policies that remediate each risk.
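As one way to record that starting point, the MVP risks above can be tracked alongside the draft policy statements that remediate them. The entries in the sketch below are hypothetical examples used only to show the structure; the actual pairs will come from your own risk conversations.

```python
# Illustrative sketch: a business risk MVP tracked as risk/policy pairs.
# Entries are examples only; the real list is unique to each environment.
business_risk_mvp = [
    {
        "risk": "Assets can be deleted through error, mistake, or maintenance",
        "draft_policy": "Critical assets require resource locks and tested backup processes",
    },
    {
        "risk": "Assets can generate excessive or unexpected spending",
        "draft_policy": "Asset sizes are limited to an approved list and monitored against budget",
    },
    {
        "risk": "Assets can be compromised by weak passwords or insecure settings",
        "draft_policy": "Guest configuration baselines are audited on every deployed machine",
    },
    {
        "risk": "Open ports exposed to the internet invite compromise",
        "draft_policy": "Public inbound traffic is denied except through the approved network boundary",
    },
]

for item in business_risk_mvp:
    print(f"Risk: {item['risk']}\n  Draft policy: {item['draft_policy']}")
```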

Incremental risk mitigation


As your organization deploys more workloads to the cloud, development teams will make use of increasing
amounts of cloud resources. At each iteration, new assets are created and staged. At each release, workloads are
readied for production promotion. Each of these cycles has the potential to introduce previously unidentified
business risks.
Assuming a business risk MVP is the starting point for your initial cloud adoption efforts, governance can mature
in parallel with your increasing use of cloud resources. When the cloud governance team operates in parallel with
cloud adoption teams, the growth of business risks can be addressed as they are identified, providing a stable
ongoing model for developing governance maturity.
Each asset staged can easily be classified according to risk. Data classification documents can be built or updated in parallel with staging cycles. Risk profiles and exposure points can likewise be documented. Over time, an increasingly clear view of business risk will come into focus across the organization.
With each iteration, the cloud governance team can work with the cloud strategy team to quickly communicate
new risks, mitigation strategies, tradeoffs, and potential costs. This empowers business participants and IT leaders
to partner in mature, well-informed decisions. Those decisions then inform policy maturity. When required, the
policy changes produce new work items for the maturity of core infrastructure systems. When changes to staged
systems are required, the cloud adoption teams have ample time to make changes, while the business tests the
staged systems and develops a user adoption plan.
This approach minimizes risks, while empowering the team to move quickly. It also ensures that risks are
promptly addressed and resolved before deployment.

Next steps
Learn how to evaluate risk tolerance during cloud adoption.
Evaluate risk tolerance
Evaluate risk tolerance

Every business decision creates new risks. Making an investment in anything creates risk of losses. New products
or services create risks of market failure. Changes to current products or services could reduce market share.
Cloud transformation does not provide a magical solution to everyday business risk. On the contrary, connected
solutions (cloud or on-premises) introduce new risks. Deploying assets to any network connected facility also
expands the potential threat profile by exposing security weaknesses to a much broader, global community.
Fortunately, cloud providers are aware of these new and changing risks. They invest heavily to reduce and manage those risks on behalf of their customers.
This article is not focused on cloud risks. Instead, it discusses the business risks associated with various forms of cloud transformation. Later in the article, the discussion shifts to ways of understanding the business's tolerance for risk.

What business risks are associated with a cloud transformation?


True business risks are based on the details of specific transformations. Several common risks provide a
conversation starter to understand business-specific risks.

IMPORTANT
Before reading the following, be aware that each of these risks can be managed. The goal of this article is to inform and
prepare readers for more productive risk management discussions.

Data breach: The top risk associated with any transformation is a data breach. Data leaks can cause
significant damage to your company, leading to loss of customers, decrease in business, or even legal
liability. Any changes to the way data is stored, processed, or used creates risk. Cloud transformations
create a high degree of change regarding data management, so the risk should not be taken lightly.
Security Baseline, Data Classification, and Incremental Rationalization can each help manage this risk.
Service disruption: Business operations and customer experiences rely heavily on technical operations.
Cloud transformations will create change in IT operations. In some organizations, that change is small and
easily adjusted. In other organizations, these changes could require retooling, retraining, or new
approaches to support cloud operations. The bigger the change, the bigger the potential impact on
business operations and customer experience. Managing this risk will require the involvement of the
business in transformation planning. Release planning and first workload selection in the incremental
rationalization article discuss ways to choose workloads for transformation projects. The business's role in
that activity is to communicate the business operations risk of changing prioritized workloads. Helping IT
choose workloads that have a lower impact on operations will reduce the overall risk.
Budget control: Cost models change in the cloud. This change can create risks associated with cost
overruns or increases in the cost of goods sold (COGS), especially directly attributed operating expenses.
When the business works closely with IT, it is feasible to create transparency regarding costs and services
consumed by various business units, programs, or projects. Cost Management provides examples of ways
business and IT can partner on this topic.
The above are a few of the most common risks mentioned by customers. The cloud governance team and the
cloud adoption teams can begin to develop a risk profile, as workloads are migrated and readied for production
release. Be prepared for conversations to define, refine, and manage risks based on the desired business
outcomes and transformation effort.

Understand risk tolerance


Identifying risk is a fairly direct process. IT-related risks are generally standard across industries. However,
tolerance for these risks is specific to each organization. This is the point where business and IT conversations
tend to get hung up. Each side of the conversation is essentially speaking a different language. The following
comparisons and questions are designed to start conversations that help each party better understand and
calculate risk tolerance.

Simple use case for comparison


To help understand risk tolerance, let's examine customer data. If a company in any industry posts customer data
on an unsecured server, the technical risk of that data being compromised or stolen is roughly the same.
However, a company's tolerance for that risk will vary wildly based on the nature and potential value of the data.
Companies in healthcare and finance in the United States are governed by rigid third-party compliance requirements. It is assumed that personal data or healthcare-related data is extremely confidential. There are severe consequences for these types of companies if they are involved in the risk scenario above. Their
tolerance will be extremely low. Any customer data published inside or outside of the network will need to be
governed by those third-party compliance policies.
A gaming company whose customer data is limited to a user name, play times, and high scores is not as likely to suffer significant consequences beyond loss of reputation if it engages in the risky behavior above. While
any unsecured data is at risk, the impact of that risk is small. Therefore, the tolerance for risk in this case is
high.
A medium-sized enterprise that provides carpet cleaning services to thousands of customers would fall in
between these two tolerance extremes. There, customer data may be more robust, containing details like
address or phone number. Both could be considered personal data and should be protected. However, there
may not be any specific governance requirement mandating that the data be secured. From an IT perspective,
the answer is simple: secure the data. From a business perspective, it may not be as simple. The business would need more details before it could determine a level of tolerance for this risk.
The next section shares a few sample questions that could help the business determine a level of risk tolerance
for the use case above or others.

Risk tolerance questions


This section lists conversation-provoking questions in three categories: loss impact, probability of loss, and
remediation costs. When business and IT partner to address each of these areas, the decision to expend effort on
managing risks and the overall tolerance to a particular risk can easily be determined.
Loss impact: Questions to determine the impact of a risk. These questions can be difficult to answer.
Quantifying the impact is best, but sometimes the conversation alone is enough to understand tolerance. Ranges
are also acceptable, especially if they include assumptions that determined those ranges.
Could this risk violate third-party compliance requirements?
Could this risk violate internal corporate policies?
Could this risk cause the loss of life, limb or property?
Could this risk cost customers or market share? If so, can this cost be quantified?
Could this risk create negative customer experiences? Are those experiences likely to affect sales or revenue?
Could this risk create new legal liability? If so, is there a precedence for damage awards in these types of
cases?
Could this risk stop business operations? If so, how long would operations be down?
Could this risk slow business operations? If so, how slow and how long?
At this stage in the transformation is this a one-off risk or will it repeat?
Does the risk increase or decrease in frequency as the transformation progresses?
Does the risk increase or decrease in probability over time?
Is the risk time sensitive in nature? Will the risk pass or get worse, if not addressed?
These basic questions will lead to many more. After a healthy dialogue, it is suggested that the relevant risks be recorded and, when possible, quantified.
Risk remediation costs: Questions to determine the cost of removing or otherwise minimizing the risk. These
questions can be fairly direct, especially when represented in a range.
Is there a clear solution and what does it cost?
Are there options for preventing or minimizing this risk? What is the range of costs for those solutions?
What is needed from the business to select the best, clear solution?
What is needed from the business to validate costs?
What other benefits can come from the solution that would remove this risk?
These questions oversimplify the technical solutions needed to manage or remove risks. However, these
questions communicate those solutions in ways the business can quickly integrate into a decision process.
Probability of loss: Questions to determine how likely it is that the risk will become a reality. This is the most
difficult area to quantify. Instead it is suggested that the cloud governance team create categories for
communicating probability, based on the supporting data. The following questions can help create categories
that are meaningful to the team.
Has any research been done regarding the likelihood of this risk being realized?
Can the vendor provide references or statistics on the likelihood of an impact?
Are there other companies in the relevant sector or vertical that have been hit by this risk?
Look further, are there other companies in general that have been hit by this risk?
Is this risk unique to something this company has done poorly?
After answering these questions along with questions as determined by the cloud governance team, groupings
of probability will likely emerge. The following are a few grouping samples to help get started:
No indication: Not enough research has been completed to determine probability.
Low risk: Current research indicates realizing the risk is unlikely.
Future risk: The current probability is low. However, continued adoption would require a fresh analysis.
Medium risk: It's likely that the risk will affect the business.
High risk: Over time, it is increasingly likely that the business will realize this risk.
Declining risk: The risk is medium to high. However, actions in IT or the business are reducing the likelihood
of an impact.
Determining tolerance:
The three question sets above should provide enough data to determine initial tolerances. When risk and probability
are low, and risk remediation costs are high, the business is unlikely to invest in remediation. When risk and
probability are high, the business is likely to consider an investment, as long as the costs don't exceed the
potential risks.
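The decision logic described above can be summarized in a short sketch. The categories and thresholds here are illustrative assumptions rather than part of the guidance; the intent is only to show how loss impact, probability, and remediation cost combine into an initial tolerance decision.

```python
# Illustrative sketch: combine loss impact, probability of loss, and remediation
# cost into a coarse initial decision. Categories and thresholds are assumptions
# for demonstration only.
LEVELS = {"low": 1, "medium": 2, "high": 3}


def tolerance_decision(loss_impact, probability, remediation_cost):
    """Return a rough recommendation based on the three question sets."""
    exposure = LEVELS[loss_impact] * LEVELS[probability]
    cost = LEVELS[remediation_cost]
    if exposure >= 2 * cost:
        return "invest in remediation"
    if exposure <= cost:
        return "accept the risk for now and revisit at the next policy review"
    return "discuss further with the business; tolerance is unclear"


# Example: high impact, medium probability, medium remediation cost.
print(tolerance_decision("high", "medium", "medium"))  # -> invest in remediation
```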

Next steps
This type of conversation can help the business and IT evaluate tolerance more effectively. These conversations
can be used during the creation of MVP policies and during incremental policy reviews.
Define corporate policy
Define corporate policy for cloud governance

Once you've analyzed the known risks and related risk tolerances for your organization's cloud transformation
journey, your next step is to establish policy that will explicitly address those risks and define the steps needed to
remediate them where possible.

How can corporate IT policy become cloud-ready?


In traditional governance and incremental governance, corporate policy creates the working definition of
governance. Most IT governance actions seek to implement technology to monitor, enforce, operate, and automate
those corporate policies. Cloud Governance is built on similar concepts.

Figure 1 - Corporate governance and governance disciplines.


The image above illustrates the relationship between business risk, policy and compliance, and monitoring and
enforcement mechanisms that will need to interact as part of your governance strategy. The Five Disciplines of
Cloud Governance allow you to manage these interactions and realize your strategy.
Cloud governance is the product of an ongoing adoption effort over time, as a true lasting transformation doesn't happen overnight. Attempting to deliver complete cloud governance through a fast, aggressive approach, before addressing key corporate policy changes, seldom produces the desired results. Instead, we recommend an incremental approach.
What is different in cloud adoption is the purchasing cycle, which can enable authentic transformation. Since there is no large capital expenditure acquisition requirement, engineers can begin experimentation and adoption sooner. In most corporate cultures, elimination of the capital expense barrier to adoption can lead to tighter feedback loops, organic growth, and incremental execution.
The shift to cloud adoption requires a shift in governance. In many organizations, corporate policy transformation
allows for improved governance and higher rates of adherence through incremental policy changes and
automated enforcement of those changes, powered by newly defined capabilities that you configure with your
cloud service provider.

Review existing policies


As governance is an ongoing process, policy should be regularly reviewed with IT staff and stakeholders to ensure
resources hosted in the cloud continue to maintain compliance with overall corporate goals and requirements.
Your understanding of new risks and acceptable tolerance can fuel a review of existing policies, in order to
determine the required level of governance that is appropriate for your organization.

TIP
If your organization uses vendors or other trusted business partners, one of the biggest business risks to consider may be a
lack of adherence to regulatory compliance by these external organizations. This risk often cannot be remediated, and
instead may require a strict adherence to requirements by all parties. Make sure you've identified and understand any third-
party compliance requirements before beginning a policy review.

Create cloud policy statements


Cloud-based IT policies establish the requirements, standards, and goals that your IT staff and automated systems
will need to support. Policy decisions are a primary factor in your cloud architecture design and how you will
implement your policy adherence processes.
Individual cloud policy statements are guidelines for addressing specific risks identified during your risk assessment process. While these policies can be integrated into your wider corporate policy documentation, the cloud policy statements discussed throughout the Cloud Adoption Framework guidance tend to be more concise summaries of the risks and the plans to deal with them. Each definition should include these pieces of information:
Business risk: A summary of the risk this policy will address.
Policy statement: A concise explanation of the policy requirements and goals.
Design or technical guidance: Actionable recommendations, specifications, or other guidance to support
and enforce this policy that IT teams and developers can use when designing and building their cloud
deployments.
If you need help getting started with defining policies, consult the governance disciplines introduced in the governance section overview. The articles for each of these disciplines include examples of common business risks encountered when moving to the cloud and sample policies used to remediate those risks (for example, see the Cost Management discipline's sample policy definitions).
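For illustration only, the three components of a policy statement could also be captured in a lightweight, machine-readable record so they can be tracked and reported on. The following Python sketch shows one possible structure; the field names and sample content are assumptions, not a required schema.

```python
# Illustrative sketch: a minimal record for a cloud policy statement.
# Field names and sample content are assumptions, not a prescribed schema.
from dataclasses import dataclass, field

@dataclass
class PolicyStatement:
    name: str
    business_risk: str          # Summary of the risk this policy addresses
    policy_statement: str       # Concise explanation of requirements and goals
    design_guidance: list = field(default_factory=list)  # Actionable recommendations

budget_policy = PolicyStatement(
    name="Budget overruns",
    business_risk="Self-service deployment creates a risk of overspending.",
    policy_statement="Every deployment must be allocated to a billing unit with an approved budget.",
    design_guidance=[
        "Apply a billing-unit tag to every resource at deployment time.",
        "Configure budget alerts in the cost management tooling for each billing unit.",
    ],
)

print(budget_policy.name, "-", budget_policy.policy_statement)
```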

Incremental governance and integrating with existing policy


Planned additions to your cloud environment should always be vetted for compliance with existing policy, and
policy updated to account for any issues not already covered. You should also perform regular cloud policy review
to ensure your cloud policy is up-to-date and in-sync with any new corporate policy.
The need to integrate cloud policy with your legacy IT policies depends largely on the maturity of your cloud
governance processes and the size of your cloud estate. See the article on incremental governance and the policy
MVP for a broader discussion on dealing with policy integration during your cloud transformation.

Next steps
After defining your policies, draft an architecture design guide to provide IT staff and developers with actionable
guidance.
Align your governance design guide with corporate policy
Align your cloud governance design guide with
corporate policy

After you've defined cloud policies based on your identified risks, you'll need to generate actionable guidance that
aligns with these policies for your IT staff and developers to refer to. Drafting a cloud governance design guide
allows you to specify specific structural, technological, and process choices based on the policy statements you
generated for each of the five governance disciplines.
A cloud governance design guide should establish the architecture choices and design patterns for each of the core
infrastructure components of cloud deployments that best meet your policy requirements. Alongside these you
should provide a high-level explanation of the technology, tools, and processes that will support each of these
design decisions.
Although your risk analysis and policy statements may, to some degree, be cloud platform agnostic, your design
guide should provide platform-specific implementation details that your IT and dev teams can use when creating
and deploying cloud-based workloads. Focus on the architecture, tools, and features of your chosen platform when
making design decisions and providing guidance.
While cloud design guides should take into account some of the technical details associated with each
infrastructure component, they are not meant to be extensive technical documents or specifications. Make sure
your guides address your policy statements and clearly state design decisions in a format easy for staff to
understand and reference.

Use the actionable governance guides


If you're planning to use the Azure platform for your cloud adoption, the Cloud Adoption Framework provides
actionable governance guides illustrating the incremental approach of the Cloud Adoption Framework governance
model. These narrative guides cover a range of common adoption scenarios, including the business risks, tolerance
requirements, and policy statements that went into creating a governance minimum viable product (MVP). These
guides represent a synthesis of real-world customer experience of the cloud adoption process in Azure.
While every cloud adoption has unique goals, priorities, and challenges, these samples should provide a good
template for converting your policy into guidance. Pick the closest scenario to your situation as a starting point,
and mold it to fit your specific policy needs.

Next steps
With design guidance in place, establish policy adherence processes to ensure policy compliance.
Establish policy adherence processes
Establish policy adherence processes

After establishing your cloud policy statements and drafting a design guide, you'll need to create a strategy for
ensuring your cloud deployment stays in compliance with your policy requirements. This strategy will need to
encompass your cloud governance team's ongoing review and communication processes, establish criteria for
when policy violations require action, and define the requirements for automated monitoring and compliance systems that will detect violations and trigger remediation actions.
See the corporate policy sections of the actionable governance guides for examples of how policy adherence processes fit into a cloud governance plan.

Prioritize policy adherence processes


How much investment in developing processes is required to support your policy goals? Depending on the size
and maturity of your cloud deployment, the effort required to establish processes that support compliance, and
the costs associated with this effort, can vary widely.
For small deployments consisting of development and test resources, policy requirements may be simple and
require few dedicated resources to address. On the other hand, a mature mission-critical cloud deployment with
high-priority security and performance needs may require a team of staff, extensive internal processes, and
custom monitoring tooling to support your policy goals.
As a first step in defining your policy adherence strategy, evaluate how the processes discussed below can support
your policy requirements. Determine how much effort is worth investing in these processes, and then use this
information to establish realistic budget and staffing plans to meet these needs.

Establish cloud governance team processes


Before defining triggers for policy compliance remediation, you need to establish the overall processes that your team will use and how information will be shared and escalated between IT staff and the cloud governance team.
Assign cloud governance team members
Your cloud governance team will provide ongoing guidance on policy compliance and handle policy-related issues
that emerge when deploying and operating your cloud assets. When building this team, invite staff members that
have expertise in areas covered by your defined policy statements and identified risks.
For initial test deployments, this can be limited to a few system administrators responsible for establishing the
basics of governance. As your governance processes mature, review the cloud governance team's membership
regularly to ensure that you can properly address new potential risks and policy requirements. Identify members
of your IT and business staff with relevant experience or interest in specific areas of governance and include them
in your teams on a permanent or temporary basis as needed.
Reviews and policy iteration
As additional resources and workloads are deployed, the cloud governance team will need to ensure that new
workloads or assets comply with policy requirements. Evaluate new requirements from workload development
teams to ensure their planned deployments will align with your design guides, and update your policies to support
these requirements when appropriate.
Plan to evaluate new potential risks and update policy statements and design guides as needed. Work with IT staff
and workload teams to evaluate new Azure features and services on an ongoing basis. Also schedule regular
review cycles for each of the five governance disciplines to ensure policy stays current and in compliance.
Education
Policy compliance requires IT staff and developers to understand the policy requirements that affect their areas of
responsibility. Plan to devote resources to document decisions and requirements, and educate all relevant teams
on the design guides that support your policy requirements.
As policy changes, regularly update documentation and training materials, and ensure education efforts
communicate updated requirements and guidance to relevant IT staff.
At various stages of your cloud journey, you may find it best to consult with partners and professional training programs to enhance the education of your team, both technically and procedurally. Additionally, many organizations find that formal certifications are a valuable addition to an education portfolio and should be considered.
Establish escalation paths
If a resource goes out of compliance, who gets notified? If IT staff detect a policy compliance issue, who do they
contact? Make sure the escalation process to the cloud governance team is clearly defined. Ensure these
communication channels are kept updated to reflect staff and organization changes.

Violation triggers and actions


After defining your cloud governance team and its processes, you need to explicitly define what qualifies as a compliance violation that will trigger actions, and what those actions should be.
Define triggers
For each of your policy statements, review requirements to determine what constitutes a policy violation. Generate
your triggers using the information you've already established as part of the policy definition process.
Risk tolerance: Create violation triggers based on the metrics and risk indicators you established as part of
your risk tolerance analysis.
Defined policy requirements: Policy statements may provide service level agreement (SLA), business
continuity and disaster recovery (BCDR), or performance requirements that should be used as the basis for
compliance triggers.
Define actions
Each violation trigger should have a corresponding action. Triggered actions should always notify an appropriate IT staff or cloud governance team member when a violation occurs. This notification can lead to a manual review of the compliance issue or kick off a predefined remediation process, depending on the type and severity of the detected violation.
Some examples of violation triggers and actions:

Cost Management: Sample trigger: Monthly cloud spending is more than 20% higher than expected. Sample action: Notify the billing unit leader, who will begin a review of resource usage.
Security Baseline: Sample trigger: Suspicious user activity is detected. Sample action: Notify the IT security team and disable the suspect user account.
Resource Consistency: Sample trigger: CPU utilization for a workload is greater than 90%. Sample action: Notify the IT operations team and scale out additional resources to handle the load.
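The sample triggers above can be evaluated by simple monitoring logic before full automation is in place. The following Python sketch is a hypothetical illustration: the thresholds mirror the examples above, and the notify() function is a stand-in for whatever alerting channel your organization uses.

```python
# Hypothetical sketch of trigger evaluation for the sample violations above.
# notify() is a placeholder for a real alerting channel (email, ticket, chat).

def notify(team, message):
    print(f"[ALERT -> {team}] {message}")

def check_cost_trigger(actual_monthly_spend, forecast_monthly_spend):
    if forecast_monthly_spend > 0 and actual_monthly_spend > 1.20 * forecast_monthly_spend:
        notify("billing unit leader", "Monthly spending exceeds forecast by more than 20%; review resource usage.")

def check_security_trigger(suspicious_activity_detected, user_account):
    if suspicious_activity_detected:
        notify("IT security team", f"Suspicious activity detected; disable account {user_account} pending review.")

def check_resource_trigger(cpu_utilization_percent, workload):
    if cpu_utilization_percent > 90:
        notify("IT operations team", f"{workload} CPU above 90%; scale out additional resources.")

check_cost_trigger(actual_monthly_spend=36_000, forecast_monthly_spend=28_000)
check_security_trigger(suspicious_activity_detected=True, user_account="svc-legacy-app")
check_resource_trigger(cpu_utilization_percent=94, workload="order-processing")
```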

Automation of monitoring and compliance


After you've defined your compliance violation triggers and actions, you can start planning how best to use the
logging and reporting tools and other features of the cloud platform to help automate your monitoring and policy
compliance strategy.
For help choosing the best monitoring pattern for your deployment, see the logging and reporting decision guide.

Next steps
Learn more about regulatory compliance in the cloud.
Regulatory compliance
Introduction to regulatory compliance

This is an introductory article about regulatory compliance; it's not intended to guide implementation of a compliance strategy. More detailed information about Azure compliance offerings is available at the Microsoft
Trust Center. Moreover, all downloadable documentation is available to certain Azure customers from the
Microsoft Service Trust Portal.
Regulatory compliance refers to the discipline and process of ensuring that a company follows the laws enforced
by governing bodies in their geography or rules required by voluntarily adopted industry standards. For IT
regulatory compliance, people and processes monitor corporate systems in an effort to detect and prevent
violations of policies and procedures established by these governing laws, regulations, and standards. This in turn
applies to a wide array of monitoring and enforcement processes. Depending on the industry and geography,
these processes can become lengthy and complex.
Compliance is challenging for multinational organizations, especially in heavily regulated industries like healthcare
and financial services. Standards and regulations abound, and in certain cases may change frequently, making it
difficult for businesses to keep up with changing international electronic data handling laws.
As with security controls, organizations should understand the division of responsibilities regarding regulatory
compliance in the cloud. Cloud providers strive to ensure that their platforms and services are compliant. But
organizations also need to confirm that their applications, the infrastructure those applications depend on, and
services supplied by third parties are also certified as compliant.
The following are descriptions of compliance regulations in various industries and geographies:

HIPAA
A healthcare application that processes protected health information (PHI) is subject to both the Privacy Rule and
the Security Rule encompassed within the Health Insurance Portability and Accountability Act (HIPAA). At a minimum, HIPAA will likely require that a healthcare business receive written assurances from the cloud provider that it will safeguard any PHI received or created.

PCI
Payment Card Industry Data Security Standard (PCI DSS) is a proprietary information security standard for organizations that handle branded credit cards from the major card schemes, including Visa, MasterCard, American Express, Discover, and JCB. The PCI standard is mandated by the card brands and administered by the Payment Card Industry Security Standards Council. The standard was created to increase controls around cardholder data to reduce credit-card fraud. Validation of compliance is performed annually, either by an external Qualified Security Assessor (QSA) or by a firm-specific Internal Security Assessor (ISA) who creates a Report on Compliance (ROC) for organizations handling large volumes of transactions, or by a Self-Assessment Questionnaire (SAQ) for companies handling smaller volumes of transactions.

Personal data
Personal data is information that could be used to identify a consumer, employee, partner, or any other living or
legal entity. Many emerging laws, particularly those dealing with privacy and personal data, require that
businesses themselves comply and report on compliance and any breaches that might occur.
GDPR
One of the most important developments in this area is the General Data Protection Regulation (GDPR), designed
to strengthen data protection for individuals within the European Union. GDPR requires that data about
individuals (such as "a name, a home address, a photo, an email address, bank details, posts on social networking
websites, medical information, or a computer's IP address") be maintained on servers within the EU and not
transferred out of it. It also requires that companies notify individuals of any data breaches, and mandates that
companies have a data protection officer (DPO). Other countries have, or are developing, similar types of
regulations.

Compliant foundation in Azure


To help customers meet their own compliance obligations across regulated industries and markets worldwide,
Azure maintains the largest compliance portfolio in the industry—in breadth (total number of offerings), as well as
depth (number of customer-facing services in assessment scope). Azure compliance offerings are grouped into
four segments: globally applicable, US Government, industry-specific, and region/country-specific.
Azure compliance offerings are based on various types of assurances, including formal certifications, attestations,
validations, authorizations, and assessments produced by independent third-party auditing firms, as well as
contractual amendments, self-assessments, and customer guidance documents produced by Microsoft. Each
offering description in this document provides an up-to-date scope statement indicating which Azure customer-
facing services are in scope for the assessment, as well as links to downloadable resources to assist customers
with their own compliance obligations.
More detailed information about Azure compliance offerings is available from the Microsoft Trust Center.
Moreover, all downloadable documentation is available to certain Azure customers from the Service Trust Portal
in the following sections:
Audit reports: Includes sections for FedRAMP, GRC assessment, ISO, PCI DSS, and SOC reports.
Data protection resources: Includes compliance guides, FAQ and white papers, and pen test and security
assessment sections.

Next steps
Learn more about cloud security readiness.
Cloud security readiness
CISO cloud readiness guide

Microsoft guidance like the Cloud Adoption Framework is not positioned to determine or guide the unique
security constraints of the thousands of enterprises supported by this documentation. When moving to the cloud,
the role of the chief information security officer or chief information security office (CISO) isn't supplanted by cloud technologies. Quite the contrary: the CISO and the office of the CISO become more ingrained and integrated. This guide assumes the reader is familiar with CISO processes and is seeking to modernize those
processes to enable cloud transformation.
Cloud adoption enables services that weren't often considered in traditional IT environments. Self-service or
automated deployments are commonly executed by application development or other IT teams not traditionally
aligned to production deployment. In some organizations, business constituents similarly have self-service
capabilities. This can trigger new security requirements that weren't needed in the on-premises world. Centralized security is more challenging, and security often becomes a shared responsibility across the business and IT culture.
This article can help a CISO prepare for that approach and engage in incremental governance.

How can a CISO prepare for the cloud?


Like most policies, security and governance policies within an organization tend to grow organically. When security
incidents happen, they shape policy to inform users and reduce the likelihood of repeat occurrences. While natural,
this approach creates policy bloat and technical dependencies. Cloud transformation journeys create a unique
opportunity to modernize and reset policies. While preparing for any transformation journey, the CISO can create
immediate and measurable value by serving as the primary stakeholder in a policy review.
In such a review, the role of the CISO is to create a safe balance between the constraints of existing
policy/compliance and the improved security posture of cloud providers. Measuring this progress can take many forms; often it is measured in the number of security policies that can be safely offloaded to the cloud provider.
Transferring security risks: As services are moved into infrastructure as a service (IaaS) hosting models, the business assumes less direct risk regarding hardware provisioning. The risk isn't removed; instead, it is transferred to the cloud vendor. Should a cloud vendor's approach to hardware provisioning provide the same level of risk
mitigation, in a secure repeatable process, the risk of hardware provisioning execution is removed from corporate
IT's area of responsibility and transferred to the cloud provider. This reduces the overall security risk corporate IT is
responsible for managing, although the risk itself should still be tracked and reviewed periodically.
As solutions move further "up the stack" to incorporate platform as a service (PaaS) or software as a service (SaaS) models, additional risks can be avoided or transferred. When risk is safely moved to a cloud provider, the cost of
executing, monitoring, and enforcing security policies or other compliance policies can be safely reduced as well.
Growth mindset: Change can be scary to both the business and technical implementors. When the CISO leads a
growth mindset shift in an organization, we've found that those natural fears are replaced with an increased
interest in safety and policy compliance. Approaching a policy review, a transformation journey, or simple implementation reviews with a growth mindset allows the team to move quickly, but not at the cost of a fair and manageable risk profile.

Resources for the Chief Information Security Officer


Knowledge about the cloud is fundamental to approaching a policy review with a growth mindset. The following
resources can help the CISO better understand the security posture of Microsoft's Azure platform.
Security platform resources:
Security Development Lifecycle, internal audits
Mandatory security training, background checks
Penetration testing, intrusion detection, DDoS, audits, and logging
State-of-the-art datacenter, physical security, secure network
Microsoft Azure Security Response in the Cloud (PDF)
Privacy and controls:
Manage your data all the time
Control on data location
Provide data access on your terms
Responding to law enforcement
Stringent privacy standards
Compliance:
Microsoft Trust Center
Common controls hub
Cloud Services Due Diligence Checklist
Compliance by service, location, and industry
Transparency:
How Microsoft secures customer data in Azure services
How Microsoft manages data location in Azure services
Who in Microsoft can access your data on what terms
How Microsoft secures customer data in Azure services
Review certification for Azure services, transparency hub

Next steps
The first step to taking action in any governance strategy is a policy review. Policy and compliance could be a
useful guide during your policy review.
Prepare for a policy review
Conduct a cloud policy review

A cloud policy review is the first step toward governance maturity in the cloud. The objective of this process is to
modernize existing corporate IT policies. When completed, the updated policies provide an equivalent level of
risk management for cloud-based resources. This article explains the cloud policy review process and its
importance.

Why perform a cloud policy review?


Most businesses manage IT through the execution of processes that align with governing policies. In small businesses, these policies may be anecdotal and processes loosely defined. As businesses grow into large enterprises, policies and processes tend to be more clearly documented and consistently executed.
As companies mature their corporate IT policies, dependencies on past technical decisions have a tendency to seep into governing policies. For instance, it's common to see disaster recovery processes include a policy that mandates offsite tape backups. This inclusion assumes a dependency on one type of technology (tape backups) that may no longer be the most relevant solution.
Cloud transformations create a natural inflection point to reconsider the legacy policy decisions of the past.
Technical capabilities and default processes change considerably in the cloud, as do the inherent risks. Using the prior example, the tape backup policy stemmed from the risk of a single point of failure created by keeping data in one location, and from the business's need to minimize that risk profile through mitigation. In a cloud deployment, there are several options that deliver the same risk mitigation, with much lower recovery time objectives (RTO). For
example:
A cloud-native solution could enable geo-replication of the Azure SQL Database.
A hybrid solution could use Azure Site Recovery to replicate an IaaS workload to Azure.
When executing a cloud transformation, policies often govern many of the tools, services, and processes
available to the cloud adoption teams. If those policies are based on legacy technologies, they may hinder the
team's efforts to drive change. In the worst case, important policies are entirely ignored by the migration team to
enable workarounds. Neither is an acceptable outcome.

The cloud policy review process


Cloud policy reviews align existing IT governance and IT security policies with the Five Disciplines of Cloud
Governance: Cost Management, Security Baseline, Identity Baseline, Resource Consistency, and Deployment
Acceleration.
For each of these disciplines, the review process follows these steps:
1. Review existing on-premises policies related to the specific discipline, looking for two key data points: legacy
dependencies and identified business risks.
2. Evaluate each business risk by asking a simple question: "Does the business risk still exist in a cloud model?"
3. If the risk still exists, rewrite the policy by documenting the necessary business mitigation, not the technical
solution.
4. Review the updated policy with the cloud adoption teams to understand potential technical solutions to the
required mitigation.
Example of a policy review for a legacy policy
To provide an example of the process, let's again use the tape backup policy in the prior section:
A corporate policy mandates offsite tape backups for all production systems. In this policy, you can see two
data points of interest:
Legacy dependency on a tape backup solution
An assumed business risk associated with the storage of backups in the same physical location as the
production equipment.
Does the risk still exist? Yes. Even in the cloud, a dependence on a single facility does create some risk. There
is a lower probability of this risk affecting the business than was present in the on-premises solution, but the
risk still exists.
Rewrite of the policy. In the case of a datacenter-wide disaster, there must exist a means of restoring
production systems within 24 hours of the outage in a different datacenter and different geographic location.
It is also important to consider that the timeline specified in the above requirement may have been set
by technical constraints that are no longer present in the cloud. Make sure to understand the technical
constraints and capabilities of the cloud before simply applying a legacy RTO/RPO.
Review with the cloud adoption teams. Depending on the solution being implemented, there are multiple
means of adhering to this Resource Consistency policy.
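To make the rewritten policy above testable during the review with the cloud adoption teams, each candidate solution's recovery characteristics could be captured in a simple form like the following sketch. The options, RTO figures, and region data are hypothetical examples, not recommendations.

```python
# Hypothetical sketch: check candidate recovery options against the rewritten policy
# (restore production within 24 hours in a different datacenter and geographic location).

RTO_LIMIT_HOURS = 24

candidate_options = [
    {"name": "Geo-replicated database failover", "rto_hours": 1, "different_region": True},
    {"name": "Replicate IaaS workload to a secondary region", "rto_hours": 4, "different_region": True},
    {"name": "Nightly backup restored in the same region", "rto_hours": 12, "different_region": False},
]

for option in candidate_options:
    compliant = option["rto_hours"] <= RTO_LIMIT_HOURS and option["different_region"]
    status = "meets policy" if compliant else "does not meet policy"
    print(f"{option['name']}: {status}")
```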

Next steps
Learn more about including data classification in your cloud governance strategy.
Data classification
What is data classification?

Data classification allows you to determine and assign value to your organization's data, and is a common
starting point for governance. The data classification process categorizes data by sensitivity and business impact
in order to identify risks. When data is classified, you can manage it in ways that protect sensitive or important
data from theft or loss.

Understand data risks, then manage them


Before any risk can be managed, it must be understood. In the case of data breach liability, that understanding
starts with data classification. Data classification is the process of associating a metadata characteristic to every
asset in a digital estate, which identifies the type of data associated with that asset.
Any asset identified as a potential candidate for migration or deployment to the cloud should have documented
metadata to record the data classification, business criticality, and billing responsibility. These three points of
classification can go a long way to understanding and mitigating risks.

Classifications Microsoft uses


The following is a list of classifications Microsoft uses. Depending on your industry or existing security
requirements, data classification standards might already exist within your organization. If no standard exists, you
might want to use this sample classification to better understand your own digital estate and risk profile.
Non-business: Data from your personal life that doesn't belong to Microsoft.
Public: Business data that is freely available and approved for public consumption.
General: Business data that isn't meant for a public audience.
Confidential: Business data that can cause harm to Microsoft if overshared.
Highly confidential: Business data that would cause extensive harm to Microsoft if overshared.

Tagging data classification in Azure


Resource tags are a good approach for metadata storage, and you can use these tags to apply data classification
information to deployed resources. Although tagging cloud assets by classification isn't a replacement for a
formal data classification process, it provides a valuable tool for managing resources and applying policy. Azure
Information Protection is an excellent solution to help you classify data itself, regardless of where it sits (on-
premises, in Azure, or somewhere else). Consider it as part of an overall classification strategy.
For additional information on resource tagging in Azure, see Using tags to organize your Azure resources.
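As a hedged illustration of the tagging approach, the following Python sketch validates that a resource's tags carry the three classification data points discussed earlier (data classification, business criticality, and billing responsibility). The tag names and allowed values are assumptions chosen for this example, not Azure requirements.

```python
# Illustrative sketch: validate classification metadata recorded as resource tags.
# Tag names and allowed values are assumptions for discussion, not Azure requirements.

ALLOWED_CLASSIFICATIONS = {"non-business", "public", "general", "confidential", "highly confidential"}
REQUIRED_TAGS = ("dataClassification", "businessCriticality", "billingUnit")

def validate_classification_tags(resource_name, tags):
    problems = []
    for tag in REQUIRED_TAGS:
        if not tags.get(tag):
            problems.append(f"missing tag '{tag}'")
    classification = tags.get("dataClassification", "").lower()
    if classification and classification not in ALLOWED_CLASSIFICATIONS:
        problems.append(f"unknown classification '{classification}'")
    return problems or [f"{resource_name}: classification metadata looks complete"]

sample_tags = {"dataClassification": "Confidential", "businessCriticality": "High", "billingUnit": "retail-web"}
print(validate_classification_tags("sql-orders-prod", sample_tags))
```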

Next steps
Apply data classifications during one of the actionable governance guides.
Choose an actionable governance guide
Any change to business processes or technology platforms introduces risk. Cloud governance teams, whose members are
sometimes known as cloud custodians, are tasked with mitigating these risks and ensuring minimal interruption to adoption
or innovation efforts.

The Cloud Adoption Framework governance model guides these decisions (regardless of the chosen cloud platform) by
focusing on development of corporate policy and the Five Disciplines of Cloud Governance. Actionable design guides
demonstrate this model using Azure services. Learn about the disciplines of the Cloud Adoption Framework governance
model below.

Figure 1 - Diagram of corporate policy and the Five Disciplines of Cloud Governance.

Disciplines of Cloud Governance


With any cloud platform, there are common governance disciplines that help inform policies and align toolchains. These
disciplines guide decisions about the proper level of automation and enforcement of corporate policy across cloud platforms.

Cost Management
Cost is a primary concern for cloud users. Develop policies for cost control for all cloud platforms.

Security Baseline
Security is a complex topic, unique to each company. Once security requirements are established, cloud governance
policies and enforcement apply those requirements across network, data, and asset configurations.

Identity Baseline
Inconsistencies in the application of identity requirements can increase the risk of breach. The Identity Baseline discipline
focuses on ensuring that identity is consistently applied across cloud adoption efforts.

Resource Consistency
Cloud operations depend on consistent resource configuration. Through governance tooling, resources can be
configured consistently to manage risks related to onboarding, drift, discoverability, and recovery.

Deployment Acceleration
Centralization, standardization, and consistency in approaches to deployment and configuration improve governance
practices. When provided through cloud-based governance tooling, they create a cloud factory that can accelerate
deployment activities.
Cost Management is one of the Five Disciplines of Cloud Governance within the Cloud Adoption Framework governance
model. For many customers, governing cost is a major concern when adopting cloud technologies. Balancing performance
demands, adoption pacing, and cloud services costs can be challenging. This is especially relevant during major business
transformations that implement cloud technologies. This section outlines the approach to developing a Cost Management
discipline as part of a cloud governance strategy.
NOTE

Cost Management governance does not replace the existing business teams, accounting practices, and procedures that are
involved in your organization's financial management of IT-related costs. The primary purpose of this discipline is to identify
potential cloud-related risks related to IT spending, and provide risk-mitigation guidance to the business and IT teams
responsible for deploying and managing cloud resources.
The primary audience for this guidance is your organization's cloud architects and other members of your cloud governance
team. However, the decisions, policies, and processes that emerge from this discipline should involve engagement and
discussions with relevant members of your business and IT teams, especially those leaders responsible for owning, managing,
and paying for cloud-based workloads.

Policy statements
Actionable policy statements and the resulting architecture requirements serve as the foundation of a Cost Management
discipline. To see policy statement samples, see the article on Cost Management Policy Statements. These samples can serve as
a starting point for your organization's governance policies.
CAUTION

The sample policies come from common customer experiences. To better align these policies to specific cloud governance
needs, execute the following steps to create policy statements that meet your unique business needs.

Develop governance policy statements


The following six steps will help you define governance policies to control costs in your environment.

Cost Management Template


Download the template for documenting a Cost Management discipline

Business Risks
Understand the motives and risks commonly associated with the Cost Management discipline.

Indicators and Metrics


Indicators to understand if it is the right time to invest in the Cost Management discipline.

Policy adherence processes


Suggested processes for supporting policy compliance in the Cost Management discipline.

Maturity
Aligning Cost Management maturity with phases of cloud adoption.

Toolchain
Azure services that can be implemented to support the Cost Management discipline.

Next steps
Get started by evaluating business risks in a specific environment.
Understand business risks
Cost Management template

The first step to implementing change is communicating the desired change. The same is true when changing
governance practices. The template below serves as a starting point for documenting and communicating policy
statements that govern Cost Management issues in the cloud.
As your discussions progress, use this template's structure as a model for capturing the business risks, risk
tolerances, compliance processes, and tooling needed to define your organization's Cost Management policy
statements.

IMPORTANT
This template is a limited sample. Before updating this template to reflect your requirements, you should review the
subsequent steps for defining an effective Cost Management discipline within your cloud governance strategy.

Download governance discipline template

Next steps
Solid governance practices start with an understanding of business risk. Review the article on business risks and
begin to document the business risks that align with your current cloud adoption plan.
Understand business risks
Cost Management motivations and business risks

This article discusses the reasons that customers typically adopt a Cost Management discipline within a cloud
governance strategy. It also provides a few examples of business risks that drive policy statements.

Is Cost Management relevant?


In terms of cost governance, cloud adoption creates a paradigm shift. Management of cost in a traditional on-
premises world is based on refresh cycles, datacenter acquisitions, host renewals, and recurring maintenance
issues. You can forecast, plan, and refine each of these costs to align with annual capital expenditure budgets.
For cloud solutions, many businesses tend to take a more reactive approach to Cost Management. In many cases,
businesses will prepurchase, or commit to use, a set amount of cloud services. This model assumes that
maximizing discounts, based on how much the business plans on spending with a specific cloud vendor, creates
the perception of a proactive, planned cost cycle. However, that perception will only become a reality if the
business also implements mature Cost Management disciplines.
The cloud offers self-service capabilities that were previously unheard of in traditional on-premises datacenters.
These new capabilities empower businesses to be more agile, less restrictive, and more open to adopt new
technologies. However, the downside of self-service is that end users can unknowingly exceed allocated budgets.
Conversely, the same users can experience a change in plans and unexpectedly not use the amount of cloud
services forecasted. The potential for a shift in either direction justifies investment in a Cost Management discipline
within the governance team.

Business risk
The Cost Management discipline attempts to address core business risks related to expenses incurred when
hosting cloud-based workloads. Work with your business to identify these risks and monitor each of them for
relevance as you plan for and implement your cloud deployments.
Risks will differ between organizations, but the following serve as common cost-related risks that you can use as a
starting point for discussions within your cloud governance team:
Budget control: Not controlling budget can lead to excessive spending with a cloud vendor.
Utilization loss: Prepurchases or precommitments that go unused can result in lost investments.
Spending anomalies: Unexpected spikes in either direction can be indicators of improper usage.
Overprovisioned assets: When assets are deployed in a configuration that exceeds the needs of an application or virtual machine (VM), they can create waste.

Next steps
Using the Cost Management template, document business risks that are likely to be introduced by the current
cloud adoption plan.
After you've gained an understanding of realistic business risks, the next step is to document the business's
tolerance for risk and the indicators and key metrics to monitor that tolerance.
Understand indicators, metrics, and risk tolerance
Cost Management metrics, indicators, and risk
tolerance

This article will help you quantify business risk tolerance as it relates to Cost Management. Defining metrics and
indicators helps you create a business case for making an investment in the maturity of the Cost Management
discipline.

Metrics
Cost Management generally focuses on metrics related to costs. As part of your risk analysis, you'll want to gather
data related to your current and planned spending on cloud-based workloads to determine how much risk you
face, and how important investment in cost governance is to your cloud adoption strategy.
The following are examples of useful metrics that you should gather to help evaluate risk tolerance within the Cost
Management discipline:
Annual spending: The total annual cost for services provided by a cloud provider.
Monthly spending: The total monthly cost for services provided by a cloud provider.
Forecasted versus actual ratio: The ratio comparing forecasted and actual spending (monthly or annual).
Pace of adoption (MOM) ratio: The percentage of the delta in cloud costs from month to month.
Accumulated cost: Total accrued daily spending, starting from the beginning of the month.
Spending trends: Spending trend against the budget.
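The ratio-style metrics above are straightforward to compute once spending data has been exported. The following Python sketch shows one possible calculation of the forecast-versus-actual ratio and the month-over-month pace of adoption; the sample figures are hypothetical.

```python
# Illustrative calculations for two of the metrics above; sample figures are hypothetical.

def forecast_vs_actual_ratio(forecast_spend, actual_spend):
    """Ratio comparing forecasted and actual spending for a period."""
    return actual_spend / forecast_spend

def pace_of_adoption_mom(previous_month_spend, current_month_spend):
    """Percentage change in cloud costs from month to month."""
    return (current_month_spend - previous_month_spend) / previous_month_spend * 100

print(f"Forecast vs. actual: {forecast_vs_actual_ratio(100_000, 118_000):.2f}")   # 1.18
print(f"Pace of adoption (MOM): {pace_of_adoption_mom(95_000, 118_000):.1f}%")    # 24.2%
```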

Risk tolerance indicators


During early small-scale deployments, such as Dev/Test or experimental first workloads, Cost Management is
likely to be of relatively low risk. As more assets are deployed, the risk grows and the business' tolerance for risk is
likely to decline. Additionally, as more cloud adoption teams are given the ability to configure or deploy assets to
the cloud, the risk grows and tolerance decreases. Conversely, growing a Cost Management discipline will take
people from the cloud adoption phase to deploy more innovative new technologies.
In the early stages of cloud adoption, you will work with your business to determine a risk tolerance baseline. Once
you have a baseline, you will need to determine the criteria that would trigger an investment in the Cost
Management discipline. These criteria will likely be different for every organization.
Once you have identified business risks, you will work with your business to identify benchmarks that you can use
to identify triggers that could potentially increase those risks. The following are a few examples of how metrics,
such as those mentioned above, can be compared against your risk baseline tolerance to indicate your business's
need to further invest in Cost Management.
Commitment-driven (most common): A company that is committed to spending $x,000,000 this year on a
cloud vendor. They need a Cost Management discipline to ensure that the business doesn't exceed its spending
targets by more than 20%, and that they will use at least 90% of their commitment.
Percentage trigger: A company with cloud spending that is stable for their production systems. If that
changes by more than x%, then a Cost Management discipline is a wise investment.
Overprovisioned trigger: A company that believes its deployed solutions are overprovisioned. Cost Management is a priority investment until they demonstrate proper alignment of provisioning and asset utilization.
Monthly spending trigger: A company considers spending over $x,000 per month to be a sizable cost. If spending exceeds that amount in a given month, they will need to invest in Cost Management.
Annual spending trigger: A company with an IT R&D budget that allows for spending $x,000 per year on
cloud experimentation. They may run production workloads in the cloud, but they are still considered
experimental solutions if the budget doesn't exceed that amount. If the budget is exceeded, they will need to
treat the budget like a production investment and manage spending closely.
Operating expense-averse (uncommon): As a company, they are averse to operating expenses and will need cost management controls in place before deploying a dev/test workload.
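For example, the commitment-driven indicator above could be monitored with a check along these lines. This is a sketch with hypothetical figures; the 20% overage and 90% utilization thresholds simply mirror the sample indicator described above.

```python
# Sketch of the commitment-driven indicator above; figures are hypothetical.

def commitment_health(annual_commitment, projected_annual_spend, consumed_to_date, elapsed_fraction):
    """Flag the two commitment risks: overspending the target and underusing the commitment."""
    findings = []
    if projected_annual_spend > annual_commitment * 1.20:
        findings.append("Projected spend exceeds the commitment by more than 20%.")
    projected_utilization = consumed_to_date / elapsed_fraction / annual_commitment
    if projected_utilization < 0.90:
        findings.append("On track to use less than 90% of the commitment.")
    return findings or ["Commitment spending is within tolerance."]

# Hypothetical: $2M commitment, $1.9M projected spend, $700K consumed halfway through the year.
for finding in commitment_health(2_000_000, 1_900_000, 700_000, 0.5):
    print(finding)
```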

Next steps
Using the Cost Management template, document metrics and tolerance indicators that align to the current cloud
adoption plan.
Review sample Cost Management policies as a starting point to develop policies that address specific business
risks that align with your cloud adoption plans.
Review sample policies
Cost Management sample policy statements

Individual cloud policy statements are guidelines for addressing specific risks identified during your risk
assessment process. These statements should provide a concise summary of risks and plans to deal with them.
Each statement definition should include these pieces of information:
Business risk: A summary of the risk this policy will address.
Policy statement: A clear summary explanation of the policy requirements.
Design options: Actionable recommendations, specifications, or other guidance that IT teams and developers
can use when implementing the policy.
The following sample policy statements address common cost-related business risks. These statements are
examples you can reference when drafting policy statements to address your organization's needs. These
examples are not meant to be prescriptive, and there are potentially several policy options for dealing with each
identified risk. Work closely with business and IT teams to identify the best policies for your unique set of risks.

Future-proofing
Business risk: Current criteria don't warrant an investment in a Cost Management discipline from the governance team. However, you anticipate such an investment in the future.
Policy statement: You should associate all assets deployed to the cloud with a billing unit and
application/workload. This policy will ensure that future Cost Management efforts will be effective.
Design options: For information on establishing a future-proof foundation, see the discussions related to
creating a governance MVP in the actionable design guides included as part of the Cloud Adoption Framework
guidance.

Budget overruns
Business risk: Self-service deployment creates a risk of overspending.
Policy statement: Any cloud deployment must be allocated to a billing unit with approved budget and a
mechanism for budgetary limits.
Design options: In Azure, budgets can be controlled with Azure Cost Management.

Underutilization
Business risk: The company has prepaid for cloud services or has made an annual commitment to spend a
specific amount. There is a risk that the agreed-on amount won't be used, resulting in a lost investment.
Policy statement: Each billing unit with an allocated cloud budget will meet annually to set budgets, quarterly to
adjust budgets, and monthly to allocate time for reviewing planned versus actual spending. Discuss any deviations
greater than 20% with the billing unit leader monthly. For tracking purposes, assign all assets to a billing unit.
Design options:
In Azure, planned versus actual spending can be managed via Azure Cost Management.
There are several options for grouping resources by billing unit. In Azure, a resource consistency model should
be chosen in conjunction with the governance team and applied to all assets.
Overprovisioned assets
Business risk: In traditional on-premises datacenters, it is common practice to deploy assets with extra capacity, planning for growth in the distant future. The cloud can scale more quickly than traditional equipment. Assets in
the cloud are also priced based on the technical capacity. There is a risk of the old on-premises practice artificially
inflating cloud spending.
Policy statement: Any asset deployed to the cloud must be enrolled in a program that can monitor utilization
and report any capacity in excess of 50% of utilization. Any asset deployed to the cloud must be grouped or
tagged in a logical manner, so governance team members can engage the workload owner regarding any
optimization of overprovisioned assets.
Design options:
In Azure, Azure Advisor can provide optimization recommendations.
There are several options for grouping resources by billing unit. In Azure, a resource consistency model should
be chosen in conjunction with the governance team and applied to all assets.
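As one possible interpretation of the sample policy above, the following sketch flags assets whose observed utilization falls well below their provisioned capacity so the workload owner can be engaged. The 50% threshold mirrors the sample policy statement, and the asset data is hypothetical.

```python
# Illustrative sketch: flag assets whose observed utilization is below the sample
# policy's threshold so the workload owner can be engaged. Data is hypothetical.

UTILIZATION_THRESHOLD = 0.50  # mirrors the sample policy statement above

assets = [
    {"name": "vm-reporting-01", "workload": "finance-reporting", "avg_utilization": 0.18},
    {"name": "vm-web-03", "workload": "retail-web", "avg_utilization": 0.72},
]

for asset in assets:
    if asset["avg_utilization"] < UTILIZATION_THRESHOLD:
        print(f"{asset['name']} ({asset['workload']}): average utilization "
              f"{asset['avg_utilization']:.0%}; review with the workload owner for rightsizing.")
```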

Overoptimization
Business risk: Effective cost management creates new risks. Optimization of spending tends to run counter to system performance. When reducing costs, there is a risk of overtightening spending and producing poor user experiences.
Policy statement: Any asset that directly affects customer experiences must be identified through grouping or
tagging. Before optimizing any asset that affects customer experience, the cloud governance team must adjust
optimization based on at least 90 days of utilization trends. Document any seasonal or event-driven bursts
considered when optimizing assets.
Design options:
In Azure, Azure Monitor's insights features can help with analysis of system utilization.
There are several options for grouping and tagging resources based on roles. In Azure, you should choose a
resource consistency model in conjunction with the governance team and apply this to all assets.

Next steps
Use the samples mentioned in this article as a starting point to develop policies that address specific business
risks that align with your cloud adoption plans.
To begin developing your own custom policy statements related to Cost Management, download the Cost
Management template.
To accelerate adoption of this discipline, choose the actionable governance guide that most closely aligns with
your environment. Then modify the design to incorporate your specific corporate policy decisions.
Building on risks and tolerance, establish a process for governing and communicating Cost Management policy
adherence.
Establish policy compliance processes
Cost Management policy compliance processes

This article discusses an approach to creating processes that support a Cost Management governance discipline.
Effective governance of cloud costs starts with recurring manual processes designed to support policy compliance.
This requires regular involvement of the cloud governance team and interested business stakeholders to review
and update policy and ensure policy compliance. In addition, many ongoing monitoring and enforcement
processes can be automated or supplemented with tooling to reduce the overhead of governance and allow for
faster response to policy deviation.

Planning, review, and reporting processes


The best Cost Management tools in the cloud are only as good as the processes and policies that they support.
The following is a set of example processes commonly involved in the Cost Management discipline. Use these
examples as a starting point when planning the processes that will allow you to continue to update cost policy
based on business change and feedback from the business teams subject to cost governance guidance.
Initial risk assessment and planning: As part of your initial adoption of the Cost Management discipline,
identify your core business risks and tolerances related to cloud costs. Use this information to discuss budget and
cost-related risks with members of your business teams and develop a baseline set of policies for mitigating these
risks to establish your initial governance strategy.
Deployment planning: Before deploying any asset, establish a forecasted budget based on expected cloud
allocation. Ensure that ownership and accounting information for the deployment is documented.
Annual planning: On an annual basis, perform a roll-up analysis on all deployed and to-be-deployed assets.
Align budgets by business units, departments, teams, and other appropriate divisions to empower self-service
adoption. Ensure that the leader of each billing unit is aware of the budget and how to track spending.
This is the time to make a precommitment or prepurchase to maximize discounting. It is wise to align annual
budgeting with the cloud vendor's fiscal year to further capitalize on year-end discount options.
Quarterly planning: On a quarterly basis, review budgets with each billing unit leader to align forecast and actual
spending. If there are changes to the plan or unexpected spending patterns, align and reallocate the budget.
This quarterly planning process is also a good time to evaluate the current membership of your cloud governance
team for knowledge gaps related to current or future business plans. Invite relevant staff and workload owners to
participate in reviews and planning as either temporary advisors or permanent members of your team.
Education and training: On a bimonthly basis, offer training sessions to make sure business and IT staff are up-
to-date on the latest Cost Management policy requirements. As part of this process, review and update any
documentation, guidance, or other training assets to ensure they are in sync with the latest corporate policy
statements.
Monthly reporting: On a monthly basis, report actual spending against forecast. Notify billing leaders of any
unexpected deviations.
These basic processes will help align spending and establish a foundation for the Cost Management discipline.

Processes for ongoing monitoring


A successful Cost Management governance strategy depends on visibility into the past, current, and planned
future cloud-related spending. Without the ability to analyze the relevant metrics and data of your existing costs,
you cannot identify changes in your risks or detect violations of your risk tolerances. The ongoing governance
processes discussed above require quality data to ensure policy can be modified to better protect your
infrastructure against changing business requirements and cloud usage.
Ensure that your IT teams have implemented automated systems for monitoring your cloud spending and usage
for unplanned deviations from expected costs. Establish reporting and alerting systems to ensure prompt
detection and mitigation of potential policy violations.

Compliance violation triggers and enforcement actions


When violations are detected, you should take enforcement actions to realign with policy. You can automate most
violation triggers using the tools outlined in the Cost Management toolchain for Azure.
The following are examples of triggers:
Monthly budget deviations: Discuss any deviations in monthly spending that exceed 20% forecast-versus-
actual ratio with the billing unit leader. Record resolutions and changes in forecast.
Pace of adoption: Any deviation at a subscription level exceeding 20% will trigger a review with the billing unit leader. Record resolutions and changes in forecast.
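A lightweight sketch of the monthly budget deviation trigger above might look like the following. The billing-unit data is hypothetical, and the 20% threshold comes from the sample trigger.

```python
# Sketch of the monthly budget deviation trigger above; billing-unit data is hypothetical.

billing_units = [
    {"unit": "retail-web", "forecast": 40_000, "actual": 52_000},
    {"unit": "internal-apps", "forecast": 25_000, "actual": 26_500},
]

for unit in billing_units:
    deviation = (unit["actual"] - unit["forecast"]) / unit["forecast"]
    if abs(deviation) > 0.20:
        print(f"{unit['unit']}: {deviation:+.0%} versus forecast; schedule a review with the "
              "billing unit leader and record the resolution and forecast change.")
```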

Next steps
Using the Cost Management template, document the processes and triggers that align to the current cloud
adoption plan.
For guidance on executing cloud management policies in alignment with adoption plans, see the article on Cost
Management discipline improvement.
Cost Management discipline improvement
Cost Management discipline improvement

The Cost Management discipline attempts to address core business risks related to expenses incurred when
hosting cloud-based workloads. Within the Five Disciplines of Cloud Governance, Cost Management is involved in
controlling cost and usage of cloud resources with the goal of creating and maintaining a planned cost cycle.
This article outlines potential tasks your company can perform to develop and mature your Cost Management
discipline. These tasks can be broken down into the planning, building, adopting, and operating phases of
implementing a cloud solution, which are then iterated on, allowing the development of an incremental approach
to cloud governance.

Figure 1 - Adoption phases of the incremental approach to cloud governance.


No single document can account for the requirements of all businesses. As such, this article outlines suggested
minimum and potential example activities for each phase of the governance maturation process. The initial
objective of these activities is to help you build a Policy MVP and establish a framework for incremental policy
improvement. Your cloud governance team will need to decide how much to invest in these activities to improve
your Cost Management governance capabilities.
Caution

Neither the minimum nor the potential activities outlined in this article are aligned to specific corporate policies or
third-party compliance requirements. This guidance is designed to help facilitate the conversations that will lead to
alignment of both requirements with a cloud governance model.

Planning and readiness


This phase of governance maturity bridges the divide between business outcomes and actionable strategies.
During this process, the leadership team defines specific metrics, maps those metrics to the digital estate, and
begins planning the overall migration effort.
Minimum suggested activities:
Evaluate your Cost Management toolchain options.
Develop a draft Architecture Guidelines document and distribute to key stakeholders.
Educate and involve the people and teams affected by the development of Architecture Guidelines.
Potential activities:
Ensure that budgetary decisions support the business justification for your cloud strategy.
Validate learning metrics that you use to report on the successful allocation of funding.
Understand the desired cloud accounting model that affects how cloud costs should be accounted for.
Become familiar with the digital estate plan and validate accurate costing expectations.
Evaluate buying options to determine if it's better to "pay as you go" or to make a precommitment by
purchasing an Enterprise Agreement.
Align business goals with planned budgets and adjust budgetary plans as necessary.
Develop a goals and budget reporting mechanism to notify technical and business stakeholders at the end of
each cost cycle.

Build and predeployment


Several technical and nontechnical prerequisites are required to successfully migrate an environment. This process
focuses on the decisions, readiness, and core infrastructure that precede a migration.
Minimum suggested activities:
Implement your Cost Management toolchain by rolling out in a predeployment phase.
Update the Architecture Guidelines document and distribute to key stakeholders.
Develop educational materials and documentation, awareness communications, incentives, and other programs
to help drive user adoption.
Determine if your purchase requirements align with your budgets and goals.
Potential activities:
Align your budgetary plans with the Subscription Strategy that defines your core ownership model.
Use the Resource Consistency Strategy to enforce architecture and cost guidelines over time.
Determine if any cost anomalies affect your adoption and migration plans.

Adopt and migrate


Migration is an incremental process that focuses on the movement, testing, and adoption of applications or
workloads in an existing digital estate.
Minimum suggested activities:
Migrate your Cost Management toolchain from predeployment to production.
Update the Architecture Guidelines document and distribute to key stakeholders.
Develop educational materials and documentation, awareness communications, incentives, and other programs
to help drive user adoption.
Potential activities:
Implement your cloud accounting model.
Ensure that your budgets reflect your actual spending during each release and adjust as necessary.
Monitor changes in budgetary plans and validate with stakeholders if additional sign-offs are needed.
Update changes to the Architecture Guidelines document to reflect actual costs.

Operate and post-implementation


After the transformation is complete, governance and operations must live on for the natural lifecycle of an
application or workload. This phase of governance maturity focuses on the activities that commonly come after the
solution is implemented and the transformation cycle begins to stabilize.
Minimum suggested activities:
Customize your Cost Management toolchain based on changes in your organization's cost management needs.
Consider automating any notifications and reports to reflect actual spending.
Refine Architecture Guidelines to guide future adoption processes.
Educate affected teams on a periodic basis to ensure ongoing adherence to the Architecture Guidelines.
Potential activities:
Execute a quarterly cloud business review to communicate value delivered to the business and associated costs.
Adjust plans quarterly to reflect changes to actual spending.
Determine financial alignment to P&Ls for business unit subscriptions.
Analyze stakeholder value and cost reporting methods on a monthly basis.
Remediate underused assets and determine if they're worth continuing.
Detect misalignments and anomalies between the plan and actual spending.
Assist the cloud adoption teams and the cloud strategy team with understanding and resolving these
anomalies.

Next steps
Now that you understand the concept of cloud cost governance, examine the Cost Management toolchain to
identify Azure tools and features that you'll need when developing the Cost Management governance discipline on
the Azure platform.
Cost Management toolchain for Azure
Cost Management tools in Azure

Cost Management is one of the Five Disciplines of Cloud Governance. This discipline focuses on ways of
establishing cloud spending plans, allocating cloud budgets, monitoring and enforcement of cloud budgets,
detecting costly anomalies, and adjusting the cloud governance plan when actual spending is misaligned.
The following is a list of Azure native tools that can help mature the policies and processes that support this
governance discipline.

TOOL | AZURE PORTAL | AZURE COST MANAGEMENT | AZURE EA CONTENT PACK | AZURE POLICY
--- | --- | --- | --- | ---
Enterprise Agreement required? | No | No | Yes | No
Budget control | No | Yes | No | Yes
Monitor spending on single resource | Yes | Yes | Yes | No
Monitor spending across multiple resources | No | Yes | Yes | No
Control spending on single resource | Yes - manual sizing | Yes | No | Yes
Enforce spending across multiple resources | No | Yes | No | Yes
Enforce accounting metadata on resources | No | No | No | Yes
Monitor and detect trends | Yes | Yes | Yes | No
Detect spending anomalies | No | Yes | Yes | No
Socialize deviations | No | Yes | Yes | No

Security Baseline discipline overview

Security Baseline is one of the Five Disciplines of Cloud Governance within the Cloud Adoption Framework governance model.
Security is a component of any IT deployment, and the cloud introduces unique security concerns. Many businesses are subject
to regulatory requirements that make protecting sensitive data a major organizational priority when considering a cloud
transformation. Identifying potential security threats to your cloud environment and establishing processes and procedures for
addressing these threats should be a priority for any IT security or cybersecurity team. The Security Baseline discipline ensures
technical requirements and security constraints are consistently applied to cloud environments, as those requirements mature.
NOTE

Security Baseline governance does not replace the existing IT teams, processes, and procedures that your organization uses to
secure cloud-deployed resources. The primary purpose of this discipline is to identify security-related business risks and provide
risk-mitigation guidance to the IT staff responsible for security infrastructure. As you develop governance policies and processes,
make sure to involve relevant IT teams in your planning and review processes.
This article outlines the approach to developing a Security Baseline discipline as part of your cloud governance strategy. The
primary audience for this guidance is your organization's cloud architects and other members of your cloud governance team.
However, the decisions, policies, and processes that emerge from this discipline should involve engagement and discussions with
relevant members of your IT and security teams, especially those technical leaders responsible for implementing networking,
encryption, and identity services.
Making the correct security decisions is critical to the success of your cloud deployments and wider business success. If your
organization lacks in-house expertise in cybersecurity, consider engaging external security consultants as a component of this
discipline. Also consider engaging Microsoft Consulting Services, the Microsoft FastTrack cloud adoption service, or other
external cloud adoption experts to discuss concerns related to this discipline.

Policy statements
Actionable policy statements and the resulting architecture requirements serve as the foundation of a Security Baseline
discipline. To see policy statement samples, see the article on Security Baseline Policy Statements. These samples can serve as a
starting point for your organization's governance policies.
CAUTION

The sample policies come from common customer experiences. To better align these policies to specific cloud governance needs,
execute the following steps to create policy statements that meet your unique business needs.

Develop governance policy statements


The following six steps offer examples and potential options to consider when developing Security Baseline governance. Use
each step as a starting point for discussions within your cloud governance team and with affected business, IT, and security
teams across your organization to establish the policies and processes needed to manage security-related risks.

Security Baseline Template


Download the template for documenting a Security Baseline discipline

Business Risks
Understand the motives and risks commonly associated with the Security Baseline discipline.
Indicators and Metrics
Indicators to understand if it is the right time to invest in the Security Baseline discipline.

Policy adherence processes


Suggested processes for supporting policy compliance in the Security Baseline discipline.

Maturity
Aligning Security Baseline maturity with phases of cloud adoption.

Toolchain
Azure services that can be implemented to support the Security Baseline discipline.

Next steps
Get started by evaluating business risks in a specific environment.
Understand business risks
Security Baseline template

The first step to implementing change is communicating what is desired. The same is true when changing
governance practices. The template below provides a starting point for documenting and communicating policy
statements that govern security-related issues in the cloud.
As your discussions progress, use this template's structure as a model for capturing the business risks, risk
tolerances, compliance processes, and tooling needed to define your organization's Security Baseline policy
statements.

IMPORTANT
This template is a limited sample. Before updating this template to reflect your requirements, you should review the
subsequent steps for defining an effective Security Baseline discipline within your cloud governance strategy.

Download governance discipline template

Next steps
Solid governance practices start with an understanding of business risk. Review the article on business risks and
begin to document the business risks that align with your current cloud adoption plan.
Understand business risks
Security Baseline motivations and business risks

This article discusses the reasons that customers typically adopt a Security Baseline discipline within a cloud
governance strategy. It also provides a few examples of potential business risks that can drive policy statements.

Security Baseline relevancy


Security is a key concern for any IT organization. Cloud deployments face many of the same security risks as
workloads hosted in traditional on-premises datacenters. However, the nature of public cloud platforms, with a
lack of direct ownership of the physical hardware storing and running your workloads, means cloud security
requires its own policy and processes.
One of the primary things that sets cloud security governance apart from traditional security policy is the ease
with which resources can be created, potentially adding vulnerabilities if security isn't considered before
deployment. The flexibility that technologies like software-defined networking (SDN) provide for rapidly changing
your cloud-based network topology can also easily modify your overall network attack surface in unforeseen
ways. Cloud platforms also provide tools and features that can improve your security capabilities in ways not
always possible in on-premises environments.
The amount you invest into security policy and processes will depend a great deal on the nature of your cloud
deployment. Initial test deployments may only need the most basic of security policies in place, while a mission-
critical workload will entail addressing complex and extensive security needs. All deployments will need to engage
with the discipline at some level.
The Security Baseline discipline covers the corporate policies and manual processes that you can put in place to
protect your cloud deployment against security risks.

NOTE
While it is important to understand Identity Baseline in the context of Security Baseline and how that relates to Access
Control, the Five Disciplines of Cloud Governance calls out Identity Baseline as its own discipline, separate from Security
Baseline.

Business risk
The Security Baseline discipline attempts to address core security-related business risks. Work with your business
to identify these risks and monitor each of them for relevance as you plan for and implement your cloud
deployments.
Risks will differ between organizations, but the following serve as common security-related risks that you can use
as a starting point for discussions within your cloud governance team:
Data breach: Inadvertent exposure or loss of sensitive cloud-hosted data can lead to losing customers,
contractual issues, or legal consequences.
Service disruption: Outages and other performance issues due to insecure infrastructure interrupt normal
operations and can result in lost productivity or lost business.

Next steps
Using the Security Baseline template, document business risks that are likely to be introduced by the current
cloud adoption plan.
Once an understanding of realistic business risks is established, the next step is to document the business's
tolerance for risk and the indicators and key metrics to monitor that tolerance.
Understand indicators, metrics, and risk tolerance
Security Baseline metrics, indicators, and risk
tolerance

This article will help you quantify business risk tolerance as it relates to Security Baseline. Defining metrics and
indicators helps you create a business case for making an investment in maturing the Security Baseline discipline.

Metrics
Security Baseline generally focuses on identifying potential vulnerabilities in your cloud deployments. As part of
your risk analysis you'll want to gather data related to your security environment to determine how much risk you
face, and how important investment in Security Baseline governance is to your planned cloud deployments.
Every organization has different security environments and requirements and different potential sources of
security data. The following are examples of useful metrics that you should gather to help evaluate risk tolerance
within the Security Baseline discipline:
Data classification: Number of cloud-stored data assets and services that are unclassified according to your
organization's privacy, compliance, or business impact standards.
Number of sensitive data stores: Number of storage end points or databases that contain sensitive data and
should be protected.
Number of unencrypted data stores: Number of sensitive data stores that are not encrypted.
Attack surface: How many total data sources, services, and applications will be cloud-hosted. What percentage
of these data sources are classified as sensitive? What percentage of these applications and services are
mission-critical?
Covered standards: Number of security standards defined by the security team.
Covered resources: Deployed assets that are covered by security standards.
Overall standards compliance: Ratio of compliance adherence to security standards.
Attacks by severity: How many coordinated attempts to disrupt your cloud-hosted services, such as through
Distributed Denial of Service (DDoS) attacks, does your infrastructure experience? What is the size and severity
of these attacks?
Malware protection: Percentage of deployed virtual machines (VMs) that have all required anti-malware,
firewall, or other security software installed.
Patch latency: How long has it been since VMs have had OS and software patches applied.
Security health recommendations: Number of security software recommendations for resolving health
standards for deployed resources, organized by severity.
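
To make these metrics concrete, the following Python sketch derives three of them, overall standards compliance, malware protection, and patch latency, from a hypothetical asset inventory. The inventory structure and field names are assumptions made for illustration; in practice this data would come from your configuration management database or security monitoring tooling.

# Minimal sketch: derive a few of the metrics above from a hypothetical asset
# inventory. Field names and the inventory itself are illustrative placeholders.
from datetime import date

assets = [
    {"name": "vm-web-01", "compliant": True,  "antimalware": True,  "last_patched": date(2019, 9, 20)},
    {"name": "vm-sql-01", "compliant": False, "antimalware": True,  "last_patched": date(2019, 7, 2)},
    {"name": "vm-app-02", "compliant": True,  "antimalware": False, "last_patched": date(2019, 9, 28)},
]

total = len(assets)

# Overall standards compliance: ratio of assets adhering to security standards.
compliance_ratio = sum(a["compliant"] for a in assets) / total

# Malware protection: percentage of VMs with required security software installed.
malware_protection_pct = 100 * sum(a["antimalware"] for a in assets) / total

# Patch latency: days since each VM last had OS and software patches applied.
patch_latency = {a["name"]: (date.today() - a["last_patched"]).days for a in assets}

print(f"Standards compliance: {compliance_ratio:.0%}")
print(f"Malware protection:   {malware_protection_pct:.0f}% of VMs")
print(f"Patch latency (days): {patch_latency}")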

Risk tolerance indicators


Cloud platforms provide a baseline set of features that enable small deployment teams to configure basic security
settings without extensive additional planning. As a result, small dev/test or experimental first workloads that do
not include sensitive data represent a relatively low level of risk, and will likely not need much in the way of formal
Security Baseline policy. However, as soon as important data or mission-critical functionality is moved to the
cloud, security risks increase, while tolerance for those risks diminishes rapidly. As more of your data and
functionality is deployed to the cloud, the more likely you need an increased investment in the Security Baseline
discipline.
In the early stages of cloud adoption, work with your IT security team and business stakeholders to identify
business risks related to security, then determine an acceptable baseline for security risk tolerance. This section of
the Cloud Adoption Framework provides examples, but the detailed risks and baselines for your company or
deployments may be different.
Once you have a baseline, establish minimum benchmarks representing an unacceptable increase in your
identified risks. These benchmarks act as triggers for when you need to take action to remediate these risks. The
following are a few examples of how security metrics, such as those discussed above, can justify an increased
investment in the Security Baseline discipline.
Mission-critical workloads trigger. A company deploying mission-critical workloads to the cloud should
invest in the Security Baseline discipline to prevent potential disruption of service or sensitive data exposure.
Protected data trigger. A company hosting data on the cloud that can be classified as confidential, private, or
otherwise subject to regulatory concerns. They need a Security Baseline discipline to ensure that this data is not
subject to loss, exposure, or theft.
External attacks trigger. A company that experiences serious attacks against their network infrastructure x
times per month could benefit from the Security Baseline discipline.
Standards compliance trigger. A company with more than x% of resources out of security standards
compliance should invest in the Security Baseline discipline to ensure standards are applied consistently across
your IT infrastructure.
Cloud estate size trigger. A company hosting more than x applications, services, or data sources. Large cloud
deployments can benefit from investment in the Security Baseline discipline to ensure that their overall attack
surface is properly protected against unauthorized access or other external threats.
Security software compliance trigger. A company where less than x% of deployed virtual machines have all
required security software installed. A Security Baseline discipline can be used to ensure security software is
installed consistently on all virtual machines.
Patching trigger. A company whose deployed virtual machines or services have not had OS or software patches
applied in the last x days. A Security Baseline discipline can be used to ensure patching is kept up-to-date within a
required schedule.
Security-focused. Some companies will have strong security and data confidentiality requirements even for
test and experimental workloads. These companies will need to invest in the Security Baseline discipline before
any deployments can begin.
The exact metrics and triggers you use to gauge risk tolerance and the level of investment in the Security Baseline
discipline will be specific to your organization, but the examples above should serve as a useful base for discussion
within your cloud governance team.
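
As a rough illustration, the following Python sketch evaluates several of the example triggers above against a set of current metrics. The metric values and thresholds (the "x" placeholders above) are hypothetical and should be replaced with figures agreed on in your own risk tolerance discussions.

# Minimal sketch: evaluate the example risk tolerance triggers against current
# metrics. Thresholds and the metrics dictionary are hypothetical placeholders.

metrics = {
    "mission_critical_workloads": 2,      # cloud-hosted or planned mission-critical workloads
    "protected_data_stores": 5,           # confidential, private, or regulated data stores
    "attacks_per_month": 14,
    "pct_out_of_compliance": 12.0,
    "cloud_hosted_apps": 60,
    "pct_vms_missing_security_sw": 8.0,
    "max_patch_latency_days": 45,
}

triggers = [
    ("Mission-critical workloads", metrics["mission_critical_workloads"] > 0),
    ("Protected data",             metrics["protected_data_stores"] > 0),
    ("External attacks",           metrics["attacks_per_month"] > 10),
    ("Standards compliance",       metrics["pct_out_of_compliance"] > 10.0),
    ("Cloud estate size",          metrics["cloud_hosted_apps"] > 50),
    ("Security software",          metrics["pct_vms_missing_security_sw"] > 5.0),
    ("Patching",                   metrics["max_patch_latency_days"] > 30),
]

for name, fired in triggers:
    if fired:
        print(f"Trigger met: {name} - consider increased investment in the Security Baseline discipline.")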

Next steps
Using the Security Baseline template, document metrics and tolerance indicators that align to the current cloud
adoption plan.
Review sample Security Baseline policies as a starting point to develop policies that address specific business risks
that align with your cloud adoption plans.
Review sample policies
Security Baseline sample policy statements

Individual cloud policy statements are guidelines for addressing specific risks identified during your risk
assessment process. These statements should provide a concise summary of risks and plans to deal with them.
Each statement definition should include these pieces of information:
Technical risk: A summary of the risk this policy will address.
Policy statement: A clear summary explanation of the policy requirements.
Technical options: Actionable recommendations, specifications, or other guidance that IT teams and
developers can use when implementing the policy.
The following sample policy statements address common security-related business risks. These statements are
examples you can reference when drafting policy statements to address your organization's needs. These examples
are not meant to be prescriptive, and there are potentially several policy options for dealing with each identified
risk. Work closely with business, security, and IT teams to identify the best policies for your unique set of risks.

Asset classification
Technical risk: Assets that are not correctly identified as mission-critical or involving sensitive data may not
receive sufficient protections, leading to potential data leaks or business disruptions.
Policy statement: All deployed assets must be categorized by criticality and data classification. Classifications
must be reviewed by the cloud governance team and the application owner before deployment to the cloud.
Potential design option: Establish resource tagging standards and ensure IT staff apply them consistently to any
deployed resources using Azure resource tags.
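
As an illustration of how such a tagging standard might be checked before deployment, the following Python sketch validates a proposed resource's tags against required criticality and data classification values. The tag names and allowed values are hypothetical examples of an internal standard, not Azure requirements.

# Minimal sketch: validate that a resource's tags satisfy a criticality and data
# classification standard before deployment is approved. Tag names and allowed
# values are hypothetical examples of a tagging standard.

REQUIRED_TAGS = {
    "criticality": {"mission-critical", "high", "medium", "low"},
    "dataClassification": {"public", "internal", "confidential", "restricted"},
}

def classification_issues(resource_tags: dict) -> list:
    """Return a list of problems with the supplied resource tags."""
    issues = []
    for tag, allowed in REQUIRED_TAGS.items():
        value = resource_tags.get(tag)
        if value is None:
            issues.append(f"missing required tag '{tag}'")
        elif value not in allowed:
            issues.append(f"tag '{tag}' has unexpected value '{value}'")
    return issues

# Example review of a proposed deployment's tags.
proposed = {"criticality": "high", "owner": "retail-apps"}
for problem in classification_issues(proposed):
    print(f"Blocked pending review: {problem}")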

Data encryption
Technical risk: There is a risk of protected data being exposed during storage.
Policy statement: All protected data must be encrypted when at rest.
Potential design option: See the Azure encryption overview article for a discussion of how data at rest
encryption is performed on the Azure platform. Additional controls, such as in-account data encryption and control
over how storage account settings can be changed, should also be considered.

Network isolation
Technical risk: Connectivity between networks and subnets within networks introduces potential vulnerabilities
that can result in data leaks or disruption of mission-critical services.
Policy statement: Network subnets containing protected data must be isolated from any other subnets. Network
traffic between protected data subnets is to be audited regularly.
Potential design option: In Azure, network and subnet isolation is managed through Azure Virtual Networks.
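
As an illustration of the kind of audit this policy implies, the following Python sketch reviews a simplified list of network security rules and flags any rule that allows traffic into a protected-data subnet from a source that is not on an approved list. The rule structure and subnet names are hypothetical; actual rules would be exported from your network security group configuration.

# Minimal sketch: flag rules that allow traffic into a protected-data subnet from
# a source that is not approved. Rule structure and names are hypothetical.

PROTECTED_SUBNETS = {"protected-data"}
APPROVED_SOURCES = {"app-tier"}  # only the intermediate subnet may reach protected data

rules = [
    {"name": "allow-app-to-data", "source": "app-tier", "destination": "protected-data", "access": "Allow"},
    {"name": "allow-web-to-data", "source": "web-tier", "destination": "protected-data", "access": "Allow"},
    {"name": "deny-internet",     "source": "Internet", "destination": "protected-data", "access": "Deny"},
]

for rule in rules:
    if (rule["destination"] in PROTECTED_SUBNETS
            and rule["access"] == "Allow"
            and rule["source"] not in APPROVED_SOURCES):
        print(f"Policy violation: rule '{rule['name']}' allows {rule['source']} into {rule['destination']}.")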

Secure external access


Technical risk: Allowing access to workloads from the public internet introduces a risk of intrusion resulting in
unauthorized data exposure or business disruption.
Policy statement: No subnet containing protected data can be directly accessed over public internet or across
datacenters. Access to those subnets must be routed through intermediate subnets. All access into those subnets
must come through a firewall solution capable of performing packet scanning and blocking functions.
Potential design option: In Azure, secure public endpoints by deploying a DMZ between the public internet and
your cloud-based network. Consider deployment, configuration and automation of Azure Firewall.

DDoS protection
Technical risk: Distributed denial of service (DDoS) attacks can result in a business interruption.
Policy statement: Deploy automated DDoS mitigation mechanisms to all publicly accessible network endpoints.
No public-facing website backed by IaaS should be exposed to the internet without DDoS protection.
Potential design option: Use Azure DDoS Protection Standard to minimize disruptions caused by DDoS attacks.

Secure on-premises connectivity


Technical risk: Unencrypted traffic between your cloud network and on-premises over the public internet is
vulnerable to interception, introducing the risk of data exposure.
Policy statement: All connections between the on-premises and cloud networks must take place either through a
secure encrypted VPN connection or a dedicated private WAN link.
Potential design option: In Azure, use ExpressRoute or Azure VPN to establish private connections between
your on-premises and cloud networks.

Network monitoring and enforcement


Technical risk: Changes to network configuration can lead to new vulnerabilities and data exposure risks.
Policy statement: Governance tooling must audit and enforce network configuration requirements defined by
the Security Baseline team.
Potential design option: In Azure, network activity can be monitored using Azure Network Watcher, and Azure
Security Center can help identify security vulnerabilities. Azure Policy allows you to restrict network resources and
resource configuration policy according to limits defined by the security team.

Security review
Technical risk: Over time, new security threats and attack types emerge, increasing the risk of exposure or
disruption of your cloud resources.
Policy statement: Trends and potential exploits that could affect cloud deployments should be reviewed regularly
by the security team to provide updates to Security Baseline tooling used in the cloud.
Potential design option: Establish a regular security review meeting that includes relevant IT and governance
team members. Review existing security data and metrics to establish gaps in current policy and Security Baseline
tooling, and update policy to remediate any new risks. Leverage Azure Advisor and Azure Security Center to gain
actionable insights on emerging threats specific to your deployments.

Next steps
Use the samples mentioned in this article as a starting point to develop policies that address specific security risks
that align with your cloud adoption plans.
To begin developing your own custom policy statements related to Security Baseline, download the Security
Baseline template.
To accelerate adoption of this discipline, choose the actionable governance guide that most closely aligns with your
environment. Then modify the design to incorporate your specific corporate policy decisions.
Building on risks and tolerance, establish a process for governing and communicating Security Baseline policy
adherence.
Establish policy compliance processes
Security Baseline policy compliance processes

This article discusses an approach to policy adherence processes that govern Security Baseline. Effective
governance of cloud security starts with recurring manual processes designed to detect vulnerabilities and impose
policies to remediate those security risks. This requires regular involvement of the cloud governance team and
interested business and IT stakeholders to review and update policy and ensure policy compliance. In addition,
many ongoing monitoring and enforcement processes can be automated or supplemented with tooling to reduce
the overhead of governance and allow for faster response to policy deviation.

Planning, review, and reporting processes


The best Security Baseline tools in the cloud are only as good as the processes and policies that they support. The
following is a set of example processes commonly involved in the Security Baseline discipline. Use these examples
as a starting point when planning the processes that will allow you to continue to update security policy based on
business change and feedback from the security and IT teams tasked with turning governance guidance into
action.
Initial risk assessment and planning: As part of your initial adoption of the Security Baseline discipline, identify
your core business risks and tolerances related to cloud security. Use this information to discuss specific technical
risks with members of your IT and security teams and develop a baseline set of security policies for mitigating
these risks to establish your initial governance strategy.
Deployment planning: Before deploying any workload or asset, perform a security review to identify any new
risks and ensure all access and data security policy requirements are met.
Deployment testing: As part of the deployment process for any workload or asset, the cloud governance team,
in cooperation with your corporate security teams, will be responsible for reviewing the deployment to validate
security policy compliance.
Annual planning: On an annual basis, perform a high-level review of Security Baseline strategy. Explore future
corporate priorities and updated cloud adoption strategies to identify potential risk increase and other emerging
security needs. Also use this time to review the latest Security Baseline best practices and integrate these into your
policies and review processes.
Quarterly review and planning: On a quarterly basis, perform a review of security audit data and incident
reports to identify any changes required in security policy. As part of this process, review the current cybersecurity
landscape to proactively anticipate emerging threats, and update policy as appropriate. After the review is
complete, align design guidance with updated policy.
This planning process is also a good time to evaluate the current membership of your cloud governance team for
knowledge gaps related to new or changing policy and risks related to security. Invite relevant IT staff to
participate in reviews and planning as either temporary technical advisors or permanent members of your team.
Education and training: On a bimonthly basis, offer training sessions to make sure IT staff and developers are
up-to-date on the latest security policy requirements. As part of this process, review and update any
documentation, guidance, or other training assets to ensure they are in sync with the latest corporate policy
statements.
Monthly audit and reporting reviews: On a monthly basis, perform an audit on all cloud deployments to
ensure their continued alignment with security policy. Review security-related activities with IT staff and identify
any compliance issues not already handled as part of the ongoing monitoring and enforcement process. The result
of this review is a report for the cloud strategy team and each cloud adoption team to communicate overall
adherence to policy. The report is also stored for auditing and legal purposes.

Ongoing monitoring processes


Determining if your security governance strategy is successful depends on visibility into the current and past state
of your cloud infrastructure. Without the ability to analyze the relevant metrics and data of your cloud resources'
security health and activity, you cannot identify changes in your risks or detect violations of your risk tolerances.
The ongoing governance processes discussed above require quality data to ensure policy can be modified to
better protect your infrastructure against changing threats and security requirements.
Ensure that your security and IT teams have implemented automated monitoring systems for your cloud
infrastructure that capture the relevant log data you need to evaluate risk. Be proactive in monitoring these
systems to ensure prompt detection and mitigation of potential policy violations, and ensure your monitoring
strategy is in line with security needs.

Violation triggers and enforcement actions


Because security noncompliance can lead to critical data exposure and service disruption risks, the cloud
governance team should have visibility into serious policy violations. Ensure IT staff have clear escalation paths for
reporting security issues to the governance team members best suited to identify and verify that policy issues are
mitigated.
When violations are detected, you should take actions to realign with policy as soon as possible. Your IT team can
automate most violation triggers using the tools outlined in the Security Baseline toolchain for Azure.
The following triggers and enforcement actions provide examples you can reference when planning how to use
monitoring data to resolve policy violations:
Increase in attacks detected. If any resource experiences a 25% increase in brute force or DDoS attacks,
discuss with IT security staff and workload owner to determine remedies. Track issue and update guidance if
policy revision is necessary to prevent future incidents.
Unclassified data detected. Any data source without an appropriate privacy, security, or business impact
classification will have external access denied until the classification is applied by the data owner and the
appropriate level of data protection applied.
Security health issue detected. Disable access to any virtual machines (VMs) that have known access or
malware vulnerabilities identified until appropriate patches or security software can be installed. Update policy
guidance to account for any newly detected threats.
Network vulnerability detected. Access to any resource not explicitly allowed by the network access policies
should trigger an alert to IT security staff and the relevant workload owner. Track issue and update guidance if
policy revision is necessary to mitigate future incidents.
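
As a simple illustration, the following Python sketch applies the first trigger above by comparing each resource's current attack count against its baseline and escalating when the increase exceeds 25%. The counts and resource names are hypothetical stand-ins for data from your security monitoring systems.

# Minimal sketch: apply the "increase in attacks detected" trigger by comparing
# each resource's current attack count against its baseline. Counts are hypothetical.

INCREASE_THRESHOLD = 0.25  # 25% increase, per the example trigger above

attack_counts = {
    # resource: (baseline_period, current_period)
    "web-frontend-vm": (40, 58),
    "api-gateway":     (120, 110),
}

for resource, (baseline, current) in attack_counts.items():
    if baseline and (current - baseline) / baseline > INCREASE_THRESHOLD:
        print(f"Escalate: {resource} shows a {(current - baseline) / baseline:.0%} increase "
              "in detected attacks; engage IT security staff and the workload owner.")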

Next steps
Using the Security Baseline template, document the processes and triggers that align to the current cloud
adoption plan.
For guidance on executing cloud management policies in alignment with adoption plans, see the article on
discipline improvement.
Security Baseline discipline improvement

The Security Baseline discipline focuses on ways of establishing policies that protect the network, assets, and most
importantly the data that will reside on a cloud provider's solution. Within the Five Disciplines of Cloud
Governance, Security Baseline includes classification of the digital estate and data. It also includes documentation
of risks, business tolerance, and mitigation strategies associated with the security of the data, assets, and network.
From a technical perspective, this also includes involvement in decisions regarding encryption, network
requirements, hybrid identity strategies, and the processes used to develop cloud Security Baseline policies.
This article outlines some potential tasks your company can engage in to better develop and mature the Security
Baseline discipline. These tasks can be broken down into planning, building, adopting, and operating phases of
implementing a cloud solution, which are then iterated on, allowing the development of an incremental approach
to cloud governance.

Figure 1 - Adoption phases of the incremental approach to cloud governance.


It's impossible for any one document to account for the requirements of all businesses. As such, this article
outlines suggested minimum and potential example activities for each phase of the governance maturation
process. The initial objective of these activities is to help you build a Policy MVP and establish a framework for
incremental policy improvement. Your cloud governance team will need to decide how much to invest in these
activities to improve your Security Baseline governance capabilities.
Caution

Neither the minimum nor the potential activities outlined in this article are aligned to specific corporate policies or
third-party compliance requirements. This guidance is designed to help facilitate the conversations that will lead to
alignment of both requirements with a cloud governance model.

Planning and readiness


This phase of governance maturity bridges the divide between business outcomes and actionable strategies.
During this process, the leadership team defines specific metrics, maps those metrics to the digital estate, and
begins planning the overall migration effort.
Minimum suggested activities:
Evaluate your Security Baseline toolchain options.
Develop a draft Architecture Guidelines document and distribute to key stakeholders.
Educate and involve the people and teams affected by the development of architecture guidelines.
Add prioritized security tasks to your migration backlog.
Potential activities:
Define a data classification schema.
Conduct a digital estate planning process to inventory the current IT assets powering your business processes
and supporting operations.
Conduct a policy review to begin the process of modernizing existing corporate IT security policies, and define
MVP policies addressing known risks.
Review your cloud platform's security guidelines. For Azure these can be found in the Microsoft Service Trust
Platform.
Determine whether your Security Baseline policy includes a Security Development Lifecycle.
Evaluate network, data, and asset-related business risks based on the next one to three releases, and gauge
your organization's tolerance for those risks.
Review Microsoft's top trends in cybersecurity report to get an overview of the current security landscape.
Consider developing a Security DevOps role in your organization.

Build and predeployment


Several technical and nontechnical prerequisites are required to successfully migrate an environment. This process
focuses on the decisions, readiness, and core infrastructure that precede a migration.
Minimum suggested activities:
Implement your Security Baseline toolchain by rolling out in a predeployment phase.
Update the Architecture Guidelines document and distribute to key stakeholders.
Implement security tasks on your prioritized migration backlog.
Develop educational materials and documentation, awareness communications, incentives, and other programs
to help drive user adoption.
Potential activities:
Determine your organization's encryption strategy for cloud-hosted data.
Evaluate your cloud deployment's identity strategy. Determine how your cloud-based identity solution will
coexist or integrate with on-premises identity providers.
Determine network boundary policies for your Software Defined Networking (SDN) design to ensure secure
virtualized networking capabilities.
Evaluate your organization's least-privilege access policies, and use task-based roles to provide access to
specific resources.
Apply security and monitoring mechanisms to all cloud services and virtual machines.
Automate security policies where possible.
Review your Security Baseline policy and determine if you need to modify your plans according to best
practices guidance such as those outlined in the Security Development Lifecycle.

Adopt and migrate


Migration is an incremental process that focuses on the movement, testing, and adoption of applications or
workloads in an existing digital estate.
Minimum suggested activities:
Migrate your Security Baseline toolchain from predeployment to production.
Update the Architecture Guidelines document and distribute to key stakeholders.
Develop educational materials and documentation, awareness communications, incentives, and other programs
to help drive user adoption.
Potential activities:
Review the latest Security Baseline and threat information to identify any new business risks.
Gauge your organization's tolerance to handle new security risks that may arise.
Identify deviations from policy, and enforce corrections.
Adjust security and access control automation to ensure maximum policy compliance.
Validate that the best practices defined during the build and predeployment phases are properly executed.
Review your least-privilege access policies and adjust access controls to maximize security.
Test your Security Baseline toolchain against your workloads to identify and resolve any vulnerabilities.

Operate and post-implementation


Once the transformation is complete, governance and operations must live on for the natural lifecycle of an
application or workload. This phase of governance maturity focuses on the activities that commonly come after the
solution is implemented and the transformation cycle begins to stabilize.
Minimum suggested activities:
Validate and refine your Security Baseline toolchain.
Customize notifications and reports to alert you of potential security issues.
Refine the Architecture Guidelines to guide future adoption processes.
Communicate and educate the affected teams periodically to ensure ongoing adherence to architecture
guidelines.
Potential activities:
Discover patterns and behavior for your workloads and configure your monitoring and reporting tools to
identify and notify you of any abnormal activity, access, or resource usage.
Continuously update your monitoring and reporting policies to detect the latest vulnerabilities, exploits, and
attacks.
Have procedures in place to quickly stop unauthorized access and disable resources that may have been
compromised by an attacker.
Regularly review the latest security best practices and apply recommendations to your security policy,
automation, and monitoring capabilities where possible.

Next steps
Now that you understand the concept of cloud security governance, move on to learn more about what security
and best practices guidance Microsoft provides for Azure.
Learn about security guidance for Azure:
Introduction to Azure security
Learn about logging, reporting, and monitoring
Cloud-native Security Baseline policy

Security Baseline is one of the Five Disciplines of Cloud Governance. This discipline focuses on general security
topics including protection of the network, digital assets, data, etc. As outlined in the policy review guide, the Cloud
Adoption Framework includes three levels of sample policy: cloud-native, enterprise, and cloud-design-principle-
compliant for each of the disciplines. This article discusses the cloud-native sample policy for the Security Baseline
discipline.

NOTE
Microsoft is in no position to dictate corporate or IT policy. This article will help you prepare for an internal policy review. It is
assumed that this sample policy will be extended, validated, and tested against your corporate policy before attempting to
use it. Any use of this sample policy as-is is discouraged.

Policy alignment
This sample policy synthesizes a cloud-native scenario, meaning that the tools and platforms provided by Azure are
sufficient to manage business risks involved in a deployment. In this scenario, it is assumed that a simple
configuration of the default Azure services provides sufficient asset protection.

Cloud security and compliance


Security is integrated into every aspect of Azure, offering unique security advantages derived from global security
intelligence, sophisticated customer-facing controls, and a secure, hardened infrastructure. This powerful
combination helps protect your applications and data, support your compliance efforts, and provide cost-effective
security for organizations of all sizes. This approach creates a strong starting position for any security policy, but
does not negate the need for equally strong security practices related to the security services being used.
Built-in security controls
It's hard to maintain a strong security infrastructure when security controls are not intuitive and need to be
configured separately. Azure includes built-in security controls across a variety of services that help you protect
data and workloads quickly and manage risk across hybrid environments. Integrated partner solutions also let you
easily transition existing protections to the cloud.
Cloud-native identity policies
Identity is becoming the new boundary control plane for security, taking over that role from the traditional
network-centric perspective. Network perimeters have become increasingly porous, and perimeter defense
cannot be as effective as it was before the advent of bring your own device (BYOD) and cloud applications. Azure
identity management and access control enable seamless, secure access to all your applications.
A sample cloud-native policy for identity across cloud and on-premises directories could include requirements like
the following:
Authorized access to resources with role-based access control (RBAC), multi-factor authentication, and single
sign-on (SSO).
Quick mitigation of user identities suspected of compromise.
Just-in-time (JIT), just-enough access granted on a task-by-task basis to limit exposure of overprivileged admin
credentials.
Extended user identity and access to policies across multiple environments through Azure Active Directory.
While it is important to understand Identity Baseline in the context of Security Baseline, the Five Disciplines of
Cloud Governance calls out Identity Baseline as its own discipline, separate from Security Baseline.
Network access policies
Network control includes the configuration, management, and securing of network elements such as virtual
networking, load balancing, DNS, and gateways. The controls provide a means for services to communicate and
interoperate. Azure includes a robust and secure networking infrastructure to support your application and service
connectivity requirements. Network connectivity is possible between resources located in Azure, between on-
premises and Azure hosted resources, and to and from the internet and Azure.
A cloud-native policy for network controls may include requirements like the following:
Hybrid connections to on-premises resources might not be allowed in a cloud-native policy. Should a hybrid
connection prove necessary, a more robust Enterprise Security Policy sample would be a more relevant
reference.
Users can establish secure connections to and within Azure using virtual networks and network security groups.
Native Windows Azure Firewall protects hosts from malicious network traffic by limiting port access. A good
example of this policy is a requirement to block (or not enable) traffic directly to a VM over SSH/RDP.
Services like the Azure Application Gateway web application firewall (WAF) and Azure DDoS Protection
safeguard applications and ensure availability for virtual machines running in Azure. These features should not
be disabled.
Data protection
One of the keys to data protection in the cloud is accounting for the possible states in which your data may occur,
and what controls are available for each state. For the purpose of Azure data security and encryption best practices,
recommendations focus on the following data states:
Data encryption controls are built into services from virtual machines to storage and SQL Database.
As data moves between clouds and customers, it can be protected using industry-standard encryption
protocols.
Azure Key Vault enables users to safeguard and control cryptographic keys, passwords, connection strings and
certificates used by cloud apps and services.
Azure Information Protection will help classify, label, and protect your sensitive data within apps.
While these features are built into Azure, each of the above requires configuration and could increase costs.
Alignment of each cloud-native feature with a data classification strategy is highly suggested.
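
As a hedged example of the Key Vault pattern described above, the following Python sketch stores a connection string as a secret and retrieves it at run time. It assumes the azure-identity and azure-keyvault-secrets packages and a vault you already own; the vault URL and secret name are placeholders.

# Minimal sketch: store and retrieve an application secret in Azure Key Vault so
# that connection strings never live in code or configuration files. The vault URL
# and secret name below are placeholders.
from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient

credential = DefaultAzureCredential()  # uses managed identity, CLI login, or environment variables
client = SecretClient(vault_url="https://<your-vault-name>.vault.azure.net/",
                      credential=credential)

# Store a secret once (for example, during deployment).
client.set_secret("sql-connection-string", "<connection string value>")

# Applications later retrieve the secret at run time instead of embedding it.
secret = client.get_secret("sql-connection-string")
print(f"Retrieved secret '{secret.name}' (value length: {len(secret.value)})")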
Security monitoring
Security monitoring is a proactive strategy that audits your resources to identify systems that do not meet
organizational standards or best practices. Azure Security Center provides unified Security Baseline and advanced
threat protection across hybrid cloud workloads. With Security Center, you can apply security policies across your
workloads, limit your exposure to threats, and detect and respond to attacks, including:
Unified view of security across all on-premises and cloud workloads with Azure Security Center.
Continuous monitoring and security assessments to ensure compliance and remediate any vulnerabilities.
Interactive tools and contextual threat intelligence for streamlined investigation.
Extensive logging and integration with existing security information.
Reduced need for expensive, nonintegrated, one-off security solutions.
Extending cloud-native policies
Using the cloud can reduce some of the security burden. Microsoft provides physical security for Azure datacenters
and helps protect the cloud platform against infrastructure threats such as a DDoS attack. Given that Microsoft has
thousands of cybersecurity specialists working on security every day, the resources to detect, prevent, or mitigate
cyberattacks are considerable. In fact, while organizations used to worry about whether the cloud was secure, most
now understand that the level of investment in people and specialized infrastructure made by vendors like
Microsoft makes the cloud more secure than most on-premises datacenters.
Even with this investment in a cloud-native Security Baseline, it is suggested that any Security Baseline policy
extend the default cloud-native policies. The following are examples of extended policies that should be considered,
even in a cloud-native environment:
Secure VMs. Security should be every organization's top priority, and doing it effectively requires several
things. You must assess your security state, protect against security threats, and then detect and respond rapidly
to threats that occur.
Protect VM contents. Setting up regular automated backups is essential to protect against user errors. This
isn't enough, though; you must also make sure that your backups are safe from cyberattacks and are available
when you need them.
Monitor applications. This pattern encompasses several tasks, including getting insight into the health of your
VMs, understanding interactions among them, and establishing ways to monitor the applications these VMs
run. All of these tasks are essential in keeping your applications running around the clock.
Secure and audit data access. Organizations should audit all data access and use advanced machine learning
capabilities to call out deviations from regular access patterns.
Failover practice. Cloud operations that have low tolerances for failure must be able to fail over or recover
from a cybersecurity or platform incident. These procedures must not simply be documented, but should be
practiced quarterly.

Next steps
Now that you've reviewed the sample Security Baseline policy for cloud-native solutions, return to the policy
review guide to start building on this sample to create your own policies for cloud adoption.
Build your own policies using the policy review guide
Microsoft Security Guidance

Tools
Microsoft introduced the Service Trust Platform and Compliance Manager to help with the following:
Overcome compliance management challenges.
Fulfill responsibilities of meeting regulatory requirements.
Conduct self-service audits and risk assessments of enterprise cloud service utilization.
These tools are designed to help organizations meet complex compliance obligations and improve data protection
capabilities when choosing and using Microsoft Cloud services.
Service Trust Platform (STP) provides in-depth information and tools to help meet your needs for using
Microsoft Cloud services, including Azure, Office 365, Dynamics 365, and Windows. STP is a one-stop shop for
security, regulatory, compliance, and privacy information related to the Microsoft Cloud. It is where we publish the
information and resources needed to perform self-service risk assessments of cloud services and tools. STP was
created to help track regulatory compliance activities within Azure, including:
Compliance Manager: Compliance Manager, a workflow-based risk assessment tool in the Microsoft Service
Trust Platform, enables you to track, assign, and verify your organization's regulatory compliance activities
related to Microsoft Cloud services, such as Office 365, Dynamics 365 and Azure. You can find more details in
the next section.
Trust documents: Currently there are three categories of guides that provide you with abundant resources to
assess Microsoft Cloud; learn about Microsoft operations in security, compliance, and privacy; and help you act
on improving your data protection capabilities. These include:
Audit reports: Audit reports allow you to stay current on the latest privacy, security, and compliance-related
information for Microsoft Cloud services. This includes ISO, SOC, FedRAMP and other audit reports, bridge
letters, and materials related to independent third-party audits of Microsoft Cloud services such as Azure,
Office 365, Dynamics 365, and others.
Data protection guides: Data protection guides provide information about how Microsoft Cloud services
protect your data, and how you can manage cloud data security and compliance for your organization. This
includes deep-dive white papers that provide details on how Microsoft designs and operates cloud services,
FAQs, reports of end-of-year security assessments, penetration test results, and guidance to help you conduct
risk assessment and improve your data protection capabilities.
Azure security and compliance blueprint: Blueprints provide resources to assist you in building and
launching cloud-powered applications that help you comply with stringent regulations and standards. With
more certifications than any other cloud provider, you can have confidence deploying your critical workloads to
Azure, with blueprints that include:
Industry-specific overview and guidance.
Customer responsibilities matrix.
Reference architectures with threat models.
Control implementation matrices.
Automation to deploy reference architectures.
Privacy resources: Documentation for Data Protection Impact Assessments, Data Subject Requests
(DSRs), and Data Breach Notification is provided to incorporate into your own accountability program in
support of the General Data Protection Regulation (GDPR).
Get started with GDPR: Microsoft products and services help organizations meet GDPR requirements while
collecting or processing personal data. STP is designed to give you information about the capabilities in
Microsoft services that you can use to address specific requirements of the GDPR. The documentation can help
your GDPR accountability and your understanding of technical and organizational measures. Documentation
for Data Protection Impact Assessments, Data Subject Requests (DSRs), and Data Breach Notification is
provided to incorporate into your own accountability program in support of the GDPR.
Data subject requests: The GDPR grants individuals (or data subjects) certain rights in connection with
the processing of their personal data. This includes the right to correct inaccurate data, erase data, or
restrict its processing, as well as receive their data and fulfill a request to transmit their data to another
controller.
Data breach: The GDPR mandates notification requirements for data controllers and processors in the
event of a breach of personal data. STP provides you with information about how Microsoft tries to
prevent breaches in the first place, how Microsoft detects a breach, and how Microsoft will respond in
the event of a breach and notify you as a data controller.
Data protection impact assessment: Microsoft helps controllers complete GDPR Data Protection
Impact Assessments. The GDPR provides a non-exhaustive list of cases in which DPIAs must be carried
out, such as automated processing for the purposes of profiling and similar activities, large-scale
processing of special categories of personal data, and systematic monitoring of a publicly accessible area
on a large scale.
Other resources: In addition to the tools and guidance discussed in the previous sections, STP also provides other
resources including regional compliance, additional resources for the Security and Compliance Center,
and frequently asked questions about the Service Trust Platform, Compliance Manager, and
privacy/GDPR.
Regional compliance: STP provides numerous compliance documents and guidance for Microsoft online
services to meet compliance requirements for different regions, including the Czech Republic, Poland, and Romania.

Unique intelligent insights


As the volume and complexity of security signals grow, determining if those signals are credible threats, and then
acting, takes far too long. Microsoft offers an unparalleled breadth of security intelligence delivered at cloud scale
to help quickly detect and remediate threats. For more information, see the Azure Security Center overview.

Azure threat intelligence


By using the threat intelligence option available in Security Center, IT administrators can identify security threats
against the environment. For example, they can identify whether a particular computer is part of a botnet.
Computers can become nodes in a botnet when attackers illicitly install malware that secretly connects the
computer to a command and control channel. Threat intelligence can also identify potential threats coming from
underground communication channels, such as the dark web.
To build this threat intelligence, Security Center uses data that comes from multiple sources within Microsoft.
Security Center uses this to identify potential threats against your environment. The Threat intelligence pane is
composed of three major options:
Detected threat types
Threat origin
Threat intelligence map

Machine learning in Azure Security Center


Azure Security Center deeply analyzes a wealth of data from a variety of Microsoft and partner solutions to help
you achieve greater security. To take advantage of this data, Microsoft uses data science and machine learning
for threat prevention, detection, and, eventually, investigation.
Broadly, Azure Machine Learning helps achieve two outcomes:

Next-generation detection
Attackers are increasingly automated and sophisticated. They use data science too. They reverse-engineer
protections and build systems that support mutations in behavior. They masquerade their activities as noise, and
learn quickly from mistakes. Machine learning helps us respond to these developments.

Simplified Security Baseline


Making effective security decisions is not easy. It requires security experience and expertise. While some large
organizations have such experts on staff, many companies don't. Azure Machine Learning enables customers to
benefit from the wisdom of other organizations when making security decisions.

Behavioral analytics
Behavioral analytics is a technique that analyzes and compares data to a collection of known patterns. However,
these patterns are not simple signatures. They are determined through complex machine learning algorithms that
are applied to massive data sets. They are also determined through careful analysis of malicious behaviors by
expert analysts. Azure Security Center can use behavioral analytics to identify compromised resources based on
analysis of virtual machine logs, virtual network device logs, fabric logs, crash dumps, and other sources.
Security Baseline tools in Azure

Security Baseline is one of the Five Disciplines of Cloud Governance. This discipline focuses on ways of
establishing policies that protect the network, assets, and most importantly the data that will reside on a cloud
provider's solution. Within the Five Disciplines of Cloud Governance, the Security Baseline discipline involves
classification of the digital estate and data. It also involves documentation of risks, business tolerance, and
mitigation strategies associated with the security of data, assets, and networks. From a technical perspective, this
discipline also includes involvement in decisions regarding encryption, network requirements, hybrid identity
strategies, and tools to automate enforcement of security policies across resource groups.
The following list of Azure tools can help mature the policies and processes that support Security Baseline.

| Tool | Azure portal and Azure Resource Manager | Azure Key Vault | Azure AD | Azure Policy | Azure Security Center | Azure Monitor |
|---|---|---|---|---|---|---|
| Apply access controls to resources and resource creation | Yes | No | Yes | No | No | No |
| Secure virtual networks | Yes | No | No | Yes | No | No |
| Encrypt virtual drives | No | Yes | No | No | No | No |
| Encrypt PaaS storage and databases | No | Yes | No | No | No | No |
| Manage hybrid identity services | No | No | Yes | No | No | No |
| Restrict allowed types of resource | No | No | No | Yes | No | No |
| Enforce geo-regional restrictions | No | No | No | Yes | No | No |
| Monitor security health of networks and resources | No | No | No | No | Yes | Yes |
| Detect malicious activity | No | No | No | No | Yes | Yes |
| Preemptively detect vulnerabilities | No | No | No | No | Yes | No |
| Configure backup and disaster recovery | Yes | No | No | No | No | No |

For a complete list of Azure security tools and services, see Security services and technologies available on Azure.
It is also common for customers to use third-party tools for facilitating Security Baseline activities. For more
information, see the article Integrate security solutions in Azure Security Center.
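For example, the geo-regional restrictions row in the table above can be enforced by assigning the built-in "Allowed locations" Azure Policy definition. The following is a minimal sketch using the Azure SDK for Python; it assumes the azure-identity and azure-mgmt-resource packages are installed and that the caller has permission to create policy assignments. The subscription ID and region list are placeholders, and the built-in policy's definition ID and parameter name should be verified in your own tenant.

```python
# Minimal sketch: assign the built-in "Allowed locations" Azure Policy at
# subscription scope to enforce geo-regional restrictions. Values in angle
# brackets are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.resource import PolicyClient
from azure.mgmt.resource.policy.models import PolicyAssignment

subscription_id = "<subscription-id>"  # placeholder
scope = f"/subscriptions/{subscription_id}"

client = PolicyClient(DefaultAzureCredential(), subscription_id)

assignment = PolicyAssignment(
    display_name="Allowed locations",
    # Well-known definition ID of the built-in "Allowed locations" policy;
    # verify the ID and parameter name in your tenant before relying on them.
    policy_definition_id=(
        "/providers/Microsoft.Authorization/policyDefinitions/"
        "e56962a6-4747-49cd-b67b-bf8b01975c4c"
    ),
    # Depending on the SDK version, this value may need to be wrapped in the
    # ParameterValuesValue model instead of a plain dict.
    parameters={"listOfAllowedLocations": {"value": ["eastus", "westeurope"]}},
)

result = client.policy_assignments.create(scope, "allowed-locations", assignment)
print(f"Created policy assignment: {result.name}")
```

The same pattern can be applied to other built-in or custom definitions, such as policies that restrict allowed resource types.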
In addition to security tools, the Microsoft Trust Center contains extensive guidance, reports, and related
documentation that can help you perform risk assessments as part of your migration planning process.
Identity Baseline is one of the Five Disciplines of Cloud Governance within the Cloud Adoption Framework governance model.
Identity is increasingly considered the primary security perimeter in the cloud, which is a shift from the traditional focus on
network security. Identity services provide the core mechanisms supporting access control and organization within IT
environments, and the Identity Baseline discipline complements the Security Baseline discipline by consistently applying
authentication and authorization requirements across cloud adoption efforts.
NOTE

Identity Baseline governance does not replace the existing IT teams, processes, and procedures that allow your organization to
manage and secure identity services. The primary purpose of this discipline is to identify potential identity-related business risks
and provide risk-mitigation guidance to IT staff that are responsible for implementing, maintaining, and operating your identity
management infrastructure. As you develop governance policies and processes make sure to involve relevant IT teams in your
planning and review processes.
This section of the Cloud Adoption Framework outlines the approach to developing an Identity Baseline discipline as part of
your cloud governance strategy. The primary audience for this guidance is your organization's cloud architects and other
members of your cloud governance team. However, the decisions, policies, and processes that emerge from this discipline
should involve engagement and discussions with relevant members of the IT teams responsible for implementing and managing
your organization's identity management solutions.
If your organization lacks in-house expertise in Identity Baseline and security, consider engaging external consultants as a part of
this discipline. Also consider engaging Microsoft Consulting Services, the Microsoft FastTrack cloud adoption service, or other
external cloud adoption partners to discuss concerns related to this discipline.

Policy statements
Actionable policy statements and the resulting architecture requirements serve as the foundation of an Identity Baseline
discipline. To see policy statement samples, see the article on Identity Baseline Policy Statements. These samples can serve as a
starting point for your organization's governance policies.
C A U T IO N

The sample policies come from common customer experiences. To better align these policies to specific cloud governance needs,
execute the following steps to create policy statements that meet your unique business needs.

Develop governance policy statements


The following six steps offer examples and potential options to consider when developing Identity Baseline governance. Use
each step as a starting point for discussions within your cloud governance team and with affected business, and IT teams across
your organization to establish the policies and processes needed to manage identity-related risks.

Identity Baseline Template


Download the template for documenting an Identity Baseline discipline

Business Risks
Understand the motives and risks commonly associated with the Identity Baseline discipline.
Indicators and Metrics
Indicators to understand if it is the right time to invest in the Identity Baseline discipline.

Policy adherence processes


Suggested processes for supporting policy compliance in the Identity Baseline discipline.

Maturity
Aligning Cloud Management maturity with phases of cloud adoption.

Toolchain
Azure services that can be implemented to support the Identity Baseline discipline.

Next steps
Get started by evaluating business risks in a specific environment.
Understand business risks
Identity Baseline template

The first step to implementing change is communicating the desired change. The same is true when changing
governance practices. The template below serves as a starting point for documenting and communicating policy
statements that govern identity services in the cloud.
As your discussions progress, use this template's structure as a model for capturing the business risks, risk
tolerances, compliance processes, and tooling needed to define your organization's Identity Baseline policy
statements.

IMPORTANT
This template is a limited sample. Before updating this template to reflect your requirements, you should review the
subsequent steps for defining an effective Identity Baseline discipline within your cloud governance strategy.

Download governance discipline template

Next steps
Solid governance practices start with an understanding of business risk. Review the article on business risks and
begin to document the business risks that align with your current cloud adoption plan.
Understand business risks
Identity Baseline motivations and business risks

This article discusses the reasons that customers typically adopt an Identity Baseline discipline within a cloud
governance strategy. It also provides a few examples of business risks that drive policy statements.

Identity Baseline relevancy


Traditional on-premises directories are designed to allow businesses to strictly control permissions and policies
for users, groups, and roles within their internal networks and datacenters. These directories typically support
single-tenant implementations, with services applicable only within the on-premises environment.
Cloud identity services expand an organization's authentication and access control capabilities to the internet.
They support multitenancy and can be used to manage users and access policy across cloud applications and
deployments. Public cloud platforms have cloud-native identity services supporting management and
deployment tasks and are capable of varying levels of integration with your existing on-premises identity
solutions. All of these features can result in cloud identity policy being more complicated than your traditional on-
premises solutions require.
The importance of the Identity Baseline discipline to your cloud deployment will depend on the size of your team
and need to integrate your cloud-based identity solution with an existing on-premises identity service. Initial test
deployments may not require much in the way of user organization or management, but as your cloud estate
matures, you will likely need to support more complicated organizational integration and centralized
management.

Business risk
The Identity Baseline discipline attempts to address core business risks related to identity services and access
control. Work with your business to identify these risks and monitor each of them for relevance as you plan for
and implement your cloud deployments.
Risks will differ between organizations, but the following serve as common identity-related risks that you can use
as a starting point for discussions within your cloud governance team:
Unauthorized access. Sensitive data and resources that can be accessed by unauthorized users can lead to
data leaks or service disruptions, violating your organization's security perimeter and risking business or legal
liabilities.
Inefficiency due to multiple identity solutions. Organizations with multiple identity services tenants can
require multiple accounts for users. This can lead to inefficiency for users who need to remember multiple sets
of credentials and for IT in managing accounts across multiple systems. If user access assignments are not
updated across identity solutions as staff, teams, and business goals change, your cloud resources may be
vulnerable to unauthorized access or users unable to access required resources.
Inability to share resources with external partners. Difficulty adding external business partners to your
existing identity solutions can prevent efficient resource sharing and business communication.
On-premises identity dependencies. Legacy authentication mechanisms or third-party multi-factor
authentication might not be available in the cloud, requiring either migrating workloads to be retooled, or
additional identity services to be deployed to the cloud. Either requirement could delay or prevent migration,
and increase costs.

Next steps
Using the Cloud Management template, document business risks that are likely to be introduced by the current
cloud adoption plan.
Once an understanding of realistic business risks is established, the next step is to document the business's
tolerance for risk and the indicators and key metrics to monitor that tolerance.
Understand indicators, metrics, and risk tolerance
Identity Baseline metrics, indicators, and risk
tolerance

This article will help you quantify business risk tolerance as it relates to Identity Baseline. Defining metrics and
indicators helps you create a business case for making an investment in maturing the Identity Baseline discipline.

Metrics
Identity Baseline focuses on identifying, authenticating, and authorizing individuals, groups of users, or automated
processes, and providing them appropriate access to resources in your cloud deployments. As part of your risk
analysis you'll want to gather data related to your identity services to determine how much risk you face, and how
important investment in Identity Baseline governance is to your planned cloud deployments.
The following are examples of useful metrics that you should gather to help evaluate risk tolerance within the
Identity Baseline discipline:
Identity systems size. Total number of users, groups, or other objects managed through your identity
systems.
Overall size of directory services infrastructure. Number of directory forests, domains, and tenants used
by your organization.
Dependency on legacy or on-premises authentication mechanisms. Number of workloads that depend
on legacy, third-party, or on-premises multi-factor authentication mechanisms.
Extent of cloud-deployed directory services. Number of directory forests, domains, and tenants you've
deployed to the cloud.
Cloud-deployed Active Directory servers. Number of Active Directory servers deployed to the cloud.
Cloud-deployed organizational units. Number of Active Directory organizational units (OUs) deployed to
the cloud.
Extent of federation. Number of identity systems federated with your organization's systems.
Elevated users. Number of user accounts with elevated access to resources or management tools.
Use of role-based access control. Number of subscriptions, resource groups, or individual resources not
managed through role-based access control (RBAC) via groups.
Authentication claims. Number of successful and failed user authentication attempts (a query sketch for gathering these counts appears after this list).
Authorization claims. Number of successful and failed attempts by users to access resources.
Compromised accounts. Number of user accounts that have been compromised.
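The authentication and authorization claim counts above can often be gathered from Azure AD sign-in logs. The sketch below is one hypothetical way to do so with the azure-monitor-query Python package, assuming your tenant routes sign-in logs to a Log Analytics workspace; the workspace ID is a placeholder, and the SigninLogs table is only present when that diagnostic setting is configured.

```python
# Minimal sketch: count successful vs. failed Azure AD sign-ins over the last
# 30 days from a Log Analytics workspace that receives the SigninLogs table.
# The workspace ID is a placeholder; ResultType "0" indicates success.
from datetime import timedelta

from azure.identity import DefaultAzureCredential
from azure.monitor.query import LogsQueryClient

client = LogsQueryClient(DefaultAzureCredential())

query = """
SigninLogs
| summarize attempts = count() by success = (ResultType == "0")
"""

response = client.query_workspace(
    workspace_id="<log-analytics-workspace-id>",  # placeholder
    query=query,
    timespan=timedelta(days=30),
)

for table in response.tables:
    for row in table.rows:
        print(dict(zip(table.columns, row)))
```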

Risk tolerance indicators


Risks related to Identity Baseline are largely related to the complexity of your organization's identity infrastructure.
If all your users and groups are managed using a single directory or cloud-native identity provider using minimal
integration with other services, your risk level will likely be small. However, as your business needs grow your
Identity Baseline systems may need to support more complicated scenarios, such as multiple directories to
support your internal organization or federation with external identity providers. As these systems become more
complex, risk increases.
In the early stages of cloud adoption, work with your IT security team and business stakeholders to identify
business risks related to identity, then determine an acceptable baseline for identity risk tolerance. This section of
the Cloud Adoption Framework provides examples, but the detailed risks and baselines for your company or
deployments may be different.
Once you have a baseline, establish minimum benchmarks representing an unacceptable increase in your
identified risks. These benchmarks act as triggers for when you need to take action to address these risks. The
following are a few examples of how identity related metrics, such as those discussed above, can justify an
increased investment in the Identity Baseline discipline.
User account number trigger. A company with more than x users, groups, or other objects managed in your
identity systems could benefit from investment in the Identity Baseline discipline to ensure efficient governance
over a large number of accounts.
On-premises identity dependency trigger. A company planning to migrate workloads to the cloud that
require legacy authentication capabilities or third-party multi-factor authentication should invest in the Identity
Baseline discipline to reduce risks related to refactoring or additional cloud infrastructure deployment.
Directory services complexity trigger. A company maintaining more than x individual forests,
domains, or directory tenants should invest in the Identity Baseline discipline to reduce risks related to
account management and the efficiency issues caused by multiple user credentials spread across multiple
systems.
Cloud-hosted directory services trigger. A company hosting x Active Directory server virtual machines
(VMs) hosted in the cloud, or having x organizational units (OUs) managed on these cloud-based servers, can
benefit from investment in the Identity Baseline discipline to optimize integration with any on-premises or
other external identity services.
Federation trigger. A company implementing identity federation with x external identity systems can
benefit from investing in the Identity Baseline discipline to ensure consistent organizational policy across
federation members.
Elevated access trigger. A company with more than x% of users with elevated permissions to management
tools and resources should consider investing in the Identity Baseline discipline to minimize the risk of
inadvertent overprovisioning of access to users.
RBAC trigger. A company with under x% of resources using role-based access control methods should
consider investing in the Identity Baseline discipline to identify optimized ways to assign user access to
resources.
Authentication failure trigger. A company where authentication failures represent more than x% of attempts
should invest in the Identity Baseline discipline to ensure that authentication methods are not under external
attack, and that users are able to use the authentication methods correctly.
Authorization failure trigger. A company where access attempts are rejected more than x% of the time
should invest in the Identity Baseline discipline to improve the application and updating of access controls, and
identify potentially malicious access attempts.
Compromised account trigger. A company with more than 1 compromised account should invest in the
Identity Baseline discipline to improve the strength and security of authentication mechanisms and improve
mechanisms to remediate risks related to compromised accounts.
The exact metrics and triggers you use to gauge risk tolerance and the level of investment in the Identity Baseline
discipline will be specific to your organization, but the examples above should serve as a useful base for discussion
within your cloud governance team.

Next steps
Using the Cloud Management template, document metrics and tolerance indicators that align to the current cloud
adoption plan.
Review sample Identity Baseline policies as a starting point to develop policies that address specific business risks
that align with your cloud adoption plans.
Review sample policies
Identity Baseline sample policy statements

Individual cloud policy statements are guidelines for addressing specific risks identified during your risk
assessment process. These statements should provide a concise summary of risks and plans to deal with them.
Each statement definition should include these pieces of information:
Technical risk: A summary of the risk this policy will address.
Policy statement: A clear summary explanation of the policy requirements.
Design options: Actionable recommendations, specifications, or other guidance that IT teams and developers
can use when implementing the policy.
The following sample policy statements address common identity-related business risks. These statements are
examples you can reference when drafting policy statements to address your organization's needs. These examples
are not meant to be prescriptive, and there are potentially several policy options for dealing with each identified
risk. Work closely with business and IT teams to identify the best policies for your unique set of risks.

Lack of access controls


Technical risk: Insufficient or ad hoc access control settings can introduce risk of unauthorized access to sensitive
or mission-critical resources.
Policy statement: All assets deployed to the cloud should be controlled using identities and roles approved by
current governance policies.
Potential design options: Azure Active Directory conditional access is the default access control mechanism in
Azure.

Overprovisioned access
Technical risk: Users and groups with control over resources beyond their area of responsibility can result in
unauthorized modifications leading to outages or security vulnerabilities.
Policy statement: The following policies will be implemented:
A least-privilege access model will be applied to any resources involved in mission-critical applications or
protected data.
Elevated permissions should be an exception, and any such exceptions must be recorded with the cloud
governance team. Exceptions will be audited regularly.
Potential design options: Consult the Azure Identity Management best practices to implement a role-based
access control (RBAC) strategy that restricts access based on the need-to-know and least-privilege security
principles.
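As a concrete illustration of least-privilege assignment, the sketch below grants an Azure AD group the built-in Reader role at resource-group scope using the Azure SDK for Python. It assumes the azure-identity and azure-mgmt-authorization packages, permission to create role assignments, and placeholder IDs; the Reader role GUID shown is the well-known built-in value but should be verified in your tenant.

```python
# Minimal sketch: assign the built-in Reader role to an Azure AD group at
# resource-group scope, expressing a least-privilege RBAC assignment.
# Values in angle brackets are placeholders.
import uuid

from azure.identity import DefaultAzureCredential
from azure.mgmt.authorization import AuthorizationManagementClient
from azure.mgmt.authorization.models import RoleAssignmentCreateParameters

subscription_id = "<subscription-id>"      # placeholder
resource_group = "<resource-group-name>"   # placeholder
group_object_id = "<aad-group-object-id>"  # placeholder

# Well-known ID of the built-in "Reader" role; verify it in your tenant.
reader_role_definition = (
    f"/subscriptions/{subscription_id}/providers/Microsoft.Authorization/"
    "roleDefinitions/acdd72a7-3385-48ef-bd42-f606fba81ae7"
)

client = AuthorizationManagementClient(DefaultAzureCredential(), subscription_id)

assignment = client.role_assignments.create(
    scope=f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}",
    role_assignment_name=str(uuid.uuid4()),  # role assignment names are GUIDs
    parameters=RoleAssignmentCreateParameters(
        role_definition_id=reader_role_definition,
        principal_id=group_object_id,
    ),
)
print(f"Assigned role: {assignment.id}")
```

Assigning roles to groups rather than individual users, as sketched here, keeps exceptions easier to audit when staff or responsibilities change.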

Lack of shared management accounts between on-premises and the cloud
Technical risk: IT management or administrative staff with accounts on your on-premises Active Directory that
lack sufficient access to cloud resources may be unable to efficiently resolve operational or security issues.
Policy statement: All groups in the on-premises Active Directory infrastructure that have elevated privileges
should be mapped to an approved RBAC role.
Potential design options: Implement a hybrid identity solution between your cloud-based Azure Active
Directory and your on-premises Active Directory, and add the required on-premises groups to the RBAC roles
necessary to do their work.

Weak authentication mechanisms


Technical risk: Identity management systems with insufficiently secure user authentication methods, such as
basic user/password combinations, can lead to compromised or hacked passwords, providing a major risk of
unauthorized access to secure cloud systems.
Policy statement: All accounts are required to sign in to secured resources using a multi-factor authentication
method.
Potential design options: For Azure Active Directory, implement Azure Multi-Factor Authentication as part of
your user authorization process.

Isolated identity providers


Technical risk: Incompatible identity providers can result in the inability to share resources or services with
customers or other business partners.
Policy statement: Deployment of any applications that require customer authentication must use an approved
identity provider that is compatible with the primary identity provider for internal users.
Potential design options: Implement federation with Azure Active Directory between your internal and
customer identity providers, or use Azure Active Directory B2B.

Identity reviews
Technical risk: As business changes over time, the addition of new cloud deployments or other security concerns
can increase the risks of unauthorized access to secure resources.
Policy statement: Cloud governance processes must include a quarterly review with identity management teams
to identify malicious actors or usage patterns that should be prevented by cloud asset configuration.
Potential design options: Establish a quarterly security review meeting that includes both governance team
members and IT staff responsible for managing identity services. Review existing security data and metrics to
establish gaps in current identity management policy and tooling, and update policy to remediate any new risks.

Next steps
Use the samples mentioned in this article as a starting point for developing policies to address specific business
risks that align with your cloud adoption plans.
To begin developing your own custom policy statements related to Identity Baseline, download the Identity
Baseline template.
To accelerate adoption of this discipline, choose the actionable governance guide that most closely aligns with your
environment. Then modify the design to incorporate your specific corporate policy decisions.
Building on risks and tolerance, establish a process for governing and communicating Identity Baseline policy
adherence.
Establish policy compliance processes
Identity Baseline policy compliance processes

This article discusses an approach to policy adherence processes that govern Identity Baseline. Effective
governance of identity starts with recurring manual processes that guide identity policy adoption and revisions.
This requires regular involvement of the cloud governance team and interested business and IT stakeholders to
review and update policy and ensure policy compliance. In addition, many ongoing monitoring and enforcement
processes can be automated or supplemented with tooling to reduce the overhead of governance and allow for
faster response to policy deviation.

Planning, review, and reporting processes


Identity management tools offer capabilities and features that greatly assist user management and access control
within a cloud deployment. However, they also require well thought out processes and policies to support your
organization's goals. The following is a set of example processes commonly involved in the Identity Baseline
discipline. Use these examples as a starting point when planning the processes that will allow you to continue to
update identity policy based on business change and feedback from the IT teams tasked with turning governance
guidance into action.
Initial risk assessment and planning: As part of your initial adoption of the Identity Baseline discipline, identify
your core business risks and tolerances related to cloud identity management. Use this information to discuss
specific technical risks with members of your IT teams responsible for managing identity services and develop a
baseline set of security policies for mitigating these risks to establish your initial governance strategy.
Deployment planning: Before any deployment, review the access needs for any workloads and develop an
access control strategy that aligns with established corporate identity policy. Document any gaps between needs
and current policy to determine if policy updates are required, and modify policy as needed.
Deployment testing: As part of the deployment, the cloud governance team, in cooperation with IT teams
responsible for identity services, will be responsible for reviewing the deployment to validate identity policy
compliance.
Annual planning: On an annual basis, perform a high-level review of identity management strategy. Explore
planned changes to the identity services environment and updated cloud adoption strategies to identify potential
risk increase or need to modify current identity infrastructure patterns. Also use this time to review the latest
identity management best practices and integrate these into your policies and review processes.
Quarterly planning: On a quarterly basis perform a general review of identity and access control audit data, and
meet with the cloud adoption teams to identify any potential new risks or operational requirements that would
require updates to identity policy or changes in access control strategy.
This planning process is also a good time to evaluate the current membership of your cloud governance team for
knowledge gaps related to new or changing policy and risks related to identity. Invite relevant IT staff to participate
in reviews and planning as either temporary technical advisors or permanent members of your team.
Education and training: On a bimonthly basis, offer training sessions to make sure IT staff and developers are
up-to-date on the latest identity policy requirements. As part of this process review and update any
documentation, guidance, or other training assets to ensure they are in sync with the latest corporate policy
statements.
Monthly audit and reporting reviews: On a monthly basis, perform an audit on all cloud deployments to assure
their continued alignment with identity policy. Use this review to check user access against business change to
ensure users have correct access to cloud resources, and ensure access strategies such as RBAC are being followed
consistently. Identify any privileged accounts and document their purpose. This review process produces a report
for the cloud strategy team and each cloud adoption team detailing overall adherence to policy. The report is also
stored for auditing and legal purposes.
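To make the monthly RBAC review concrete, the following sketch lists all role assignments in a subscription and flags those bound to the built-in Owner role so that privileged accounts can be documented. It assumes the azure-identity and azure-mgmt-authorization Python packages and read access to role assignments and definitions; list_for_subscription is available in recent SDK versions (older versions expose a list() method instead), and the subscription ID is a placeholder.

```python
# Minimal sketch for a monthly access review: list all role assignments in a
# subscription and flag those bound to the built-in "Owner" role so they can
# be documented.
from azure.identity import DefaultAzureCredential
from azure.mgmt.authorization import AuthorizationManagementClient

subscription_id = "<subscription-id>"  # placeholder
client = AuthorizationManagementClient(DefaultAzureCredential(), subscription_id)

# Build a lookup of role definition IDs to role names for this subscription.
role_names = {
    d.id: d.role_name
    for d in client.role_definitions.list(scope=f"/subscriptions/{subscription_id}")
}

for assignment in client.role_assignments.list_for_subscription():
    role = role_names.get(assignment.role_definition_id, "Unknown role")
    flag = "  <-- privileged, document purpose" if role == "Owner" else ""
    print(f"{assignment.principal_id}  {role}  scope={assignment.scope}{flag}")
```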

Processes for ongoing monitoring


Determining if your identity governance strategy is successful depends on visibility into the current and past state
of your identity systems. Without the ability to analyze your cloud deployment's relevant metrics and related data,
you cannot identify changes in your risks or detect violations of your risk tolerances. The ongoing governance
processes discussed above require quality data to ensure policy can be modified to support the changing needs of
your business.
Ensure that your IT teams have implemented automated monitoring systems for your identity services that
capture the logs and audit information you need to evaluate risk. Be proactive in monitoring these systems to
ensure prompt detection and mitigation of potential policy violation, and ensure any changes to your identity
infrastructure are reflected in your monitoring strategy.

Violation triggers and enforcement actions


Violations of identity policy can result in unauthorized access to sensitive data and lead to serious disruption of
mission-critical application and services. When violations are detected, you should take actions to realign with
policy as soon as possible. Your IT team can automate most violation triggers using the tools outlined in the
Identity Baseline toolchain.
The following triggers and enforcement actions provide examples you can reference when planning how to use
monitoring data to resolve policy violations:
Suspicious activity detected: User logins detected from anonymous proxy IP addresses, unfamiliar locations,
or successive logins from impossibly distant geographical locations may indicate a potential account breach or
malicious access attempt. Login will be blocked until user identity can be verified and password reset.
Leaked user credentials: Accounts that have their username and password leaked to the internet will be
disabled until user identity can be verified and the password reset (a sketch of one way to automate disabling such an account appears after this list).
Insufficient access controls detected: Any protected assets where access restrictions do not meet security
requirements will have access blocked until the resource is brought into compliance.
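As one hypothetical automation of the leaked-credentials trigger above, the sketch below disables a user account through the Microsoft Graph REST API. It assumes an app registration with the User.ReadWrite.All application permission and the azure-identity and requests Python packages; all IDs and secrets are placeholders. In practice, services such as Azure AD Identity Protection can perform this response automatically.

```python
# Minimal sketch: disable an Azure AD user account via Microsoft Graph after a
# leaked-credential detection. Tenant, client, secret, and user values are
# placeholders; assumes the app has User.ReadWrite.All application permission.
import requests
from azure.identity import ClientSecretCredential

credential = ClientSecretCredential(
    tenant_id="<tenant-id>",          # placeholder
    client_id="<client-id>",          # placeholder
    client_secret="<client-secret>",  # placeholder
)
token = credential.get_token("https://graph.microsoft.com/.default").token

user_id = "<user-object-id-or-upn>"  # placeholder
response = requests.patch(
    f"https://graph.microsoft.com/v1.0/users/{user_id}",
    headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
    json={"accountEnabled": False},
)
response.raise_for_status()  # 204 No Content indicates the account was disabled
```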

Next steps
Using the Cloud Management template, document the processes and triggers that align to the current cloud
adoption plan.
For guidance on executing cloud management policies in alignment with adoption plans, see the article on
discipline improvement.
Identity Baseline discipline improvement
Identity Baseline discipline improvement

The Identity Baseline discipline focuses on ways of establishing policies that ensure consistency and continuity of
user identities regardless of the cloud provider that hosts the application or workload. Within the Five Disciplines
of Cloud Governance, Identity Baseline includes decisions regarding the Hybrid Identity Strategy, evaluation and
extension of identity repositories, implementation of single sign-on (same sign-on), auditing and monitoring for
unauthorized use or malicious actors. In some cases, it may also involve decisions to modernize, consolidate, or
integrate multiple identity providers.
This article outlines some potential tasks your company can engage in to better develop and mature the Identity
Baseline discipline. These tasks can be broken down into planning, building, adopting, and operating phases of
implementing a cloud solution, which are then iterated on allowing the development of an incremental approach
to cloud governance.

Figure 1 - Adoption phases of the incremental approach to cloud governance.


It's impossible for any one document to account for the requirements of all businesses. As such, this article
outlines suggested minimum and potential example activities for each phase of the governance maturation
process. The initial objective of these activities is to help you build a Policy MVP and establish a framework for
incremental policy improvement. Your cloud governance team will need to decide how much to invest in these
activities to improve your Identity Baseline governance capabilities.
CAUTION

Neither the minimum nor the potential activities outlined in this article are aligned to specific corporate policies or
third-party compliance requirements. This guidance is designed to help facilitate the conversations that will lead to
alignment of both requirements with a cloud governance model.

Planning and readiness


This phase of governance maturity bridges the divide between business outcomes and actionable strategies.
During this process, the leadership team defines specific metrics, maps those metrics to the digital estate, and
begins planning the overall migration effort.
Minimum suggested activities:
Evaluate your Identity toolchain options and implement a hybrid strategy that is appropriate to your
organization.
Develop a draft Architecture Guidelines document and distribute to key stakeholders.
Educate and involve the people and teams affected by the development of architecture guidelines.
Potential activities:
Define roles and assignments that will govern identity and access management in the cloud.
Define your on-premises groups and map to corresponding cloud-based roles.
Inventory identity providers (including database-driven identities used by custom applications).
Consolidate and integrate identity providers where duplication exists, to simplify the overall identity solution
and reduce risk.
Evaluate hybrid compatibility of existing identity providers.
For identity providers that are not hybrid compatible, evaluate consolidation or replacement options.

Build and predeployment


Several technical and nontechnical prerequisites are required to successfully migrate an environment. This process
focuses on the decisions, readiness, and core infrastructure that precedes a migration.
Minimum suggested activities:
Consider a pilot test before implementing your Identity toolchain, making sure it simplifies the user experience
as much as possible.
Apply feedback from pilot tests into the predeployment. Repeat until results are acceptable.
Update the Architecture Guidelines document to include deployment and user adoption plans, and distribute to
key stakeholders.
Consider establishing an early adopter program and rolling out to a limited number of users.
Continue to educate the people and teams most affected by the architecture guidelines.
Potential activities:
Evaluate your logical and physical architecture and determine a Hybrid Identity Strategy.
Map identity access management policies, such as login ID assignments, and choose the appropriate
authentication method for Azure AD.
If federated, enable tenant restrictions for administrative accounts.
Integrate your on-premises and cloud directories.
Consider using the following access models:
Least-privilege access model.
Privileged Identity Management access model.
Finalize all preintegration details and review Identity Best Practices.
Enable single identity, single sign-on (SSO), or seamless SSO.
Configure multi-factor authentication for administrators.
Consolidate or integrate identity providers, where necessary.
Implement tooling necessary to centralize management of identities.
Enable just-in-time (JIT) access and role change alerting.
Conduct a risk analysis of key admin activities for assigning to built-in roles.
Consider an updated rollout of stronger authentication for all users.
Enable Privileged Identity Management (PIM) for JIT (using time-limited activation) for additional
administrative roles.
Separate user accounts from global admin accounts (to make sure that administrators do not
inadvertently open emails or run programs associated with their global admin accounts).

Adopt and migrate


Migration is an incremental process that focuses on the movement, testing, and adoption of applications or
workloads in an existing digital estate.
Minimum suggested activities:
Migrate your Identity toolchain from development to production.
Update the Architecture Guidelines document and distribute to key stakeholders.
Develop educational materials and documentation, awareness communications, incentives, and other programs
to help drive user adoption.
Potential activities:
Validate that the best practices defined during the build and predeployment phases are properly executed.
Validate and refine your Hybrid Identity Strategy.
Ensure that each application or workload continues to align with the identity strategy before release.
Validate that single sign-on (SSO) and seamless SSO are working as expected for your applications.
Reduce or eliminate the number of alternative identity stores.
Scrutinize the need for any in-app or in-database identity stores. Identities that fall outside of a proper identity
provider (first-party or third-party) can represent risk to the application and the users.
Enable conditional access for on-premises federated applications.
Distribute identity across global regions in multiple hubs with synchronization between regions.
Establish central role-based access control (RBAC) federation.

Operate and post-implementation


Once the transformation is complete, governance and operations must live on for the natural lifecycle of an
application or workload. This phase of governance maturity focuses on the activities that commonly come after the
solution is implemented and the transformation cycle begins to stabilize.
Minimum suggested activities:
Customize your Identity Baseline toolchain based on your organization's changing identity needs.
Automate notifications and reports to alert you of potential malicious threats.
Monitor and report on system usage and user adoption progress.
Report on post-deployment metrics and distribute to stakeholders.
Refine the Architecture Guidelines to guide future adoption processes.
Communicate and continually educate the affected teams on a periodic basis to ensure ongoing adherence to
architecture guidelines.
Potential activities:
Conduct periodic audits of identity policies and adherence practices.
Ensure sensitive user accounts (such as those of the CEO, CFO, and VPs) are always enabled for multi-factor authentication and
anomalous login detection.
Scan for malicious actors and data breaches regularly, particularly those related to identity fraud, such as
potential admin account takeovers.
Configure a monitoring and reporting tool.
Consider integrating more closely with security and fraud-prevention systems.
Regularly review access rights for elevated users or roles.
Identify every user who is eligible to activate admin privilege.
Review onboarding, offboarding, and credential update processes.
Investigate increasing levels of automation and communication between identity access management (IAM)
modules.
Consider implementing a development security operations (DevSecOps) approach.
Carry out an impact analysis to gauge results on costs, security, and user adoption.
Periodically produce an impact report that shows the changes in metrics created by the system and estimate
the business impacts of the Hybrid Identity Strategy.
Establish integrated monitoring recommended by the Azure Security Center.

Next steps
Now that you understand the concept of cloud identity governance, examine the Identity Baseline toolchain to
identify Azure tools and features that you'll need when developing the Identity Baseline governance discipline on
the Azure platform.
Identity Baseline toolchain for Azure
Identity Baseline tools in Azure

Identity Baseline is one of the Five Disciplines of Cloud Governance. This discipline focuses on ways of
establishing policies that ensure consistency and continuity of user identities regardless of the cloud provider
that hosts the application or workload.
The following tools are included in the discovery guide on Hybrid Identity.
Active Directory (on-premises): Active Directory is the identity provider most frequently used in the
enterprise to store and validate user credentials.
Azure Active Directory: A software as a service (SaaS) equivalent to Active Directory, capable of federating
with an on-premises Active Directory.
Active Directory (IaaS): An instance of the Active Directory application running in a virtual machine in Azure.
Identity is the control plane for IT security, so authentication is an organization's access guard to the cloud.
Organizations need an identity control plane that strengthens their security and keeps their cloud apps safe from
intruders.

Cloud authentication
Choosing the correct authentication method is the first concern for organizations wanting to move their apps to
the cloud.
When you choose cloud authentication, Azure AD handles the users' sign-in process. Coupled with seamless single sign-on
(SSO), users can sign in to cloud apps without having to reenter their credentials. With cloud authentication, you
can choose from two options:
Azure AD password hash synchronization: The simplest way to enable authentication for on-premises
directory objects in Azure AD. It can also serve as a backup failover authentication method for other options
in case your on-premises servers go down.
Azure AD Pass-through Authentication: Provides a persistent password validation for Azure AD
authentication services by using a software agent that runs on one or more on-premises servers.

NOTE
Companies with a security requirement to immediately enforce on-premises user account states, password policies, and
sign-in hours should consider the Pass-through Authentication method.

Federated authentication:
When you choose this method, Azure AD passes the authentication process to a separate trusted authentication
system, such as on-premises Active Directory Federation Services (AD FS) or a trusted third-party federation
provider, to validate the user's password.
The article choosing the right authentication method for Azure Active Directory contains a decision tree to help
you choose the best solution for your organization.
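Whichever authentication method you choose, application code typically acquires tokens from Azure AD in the same way. The following is a minimal sketch using the MSAL Python library and the device code flow; the tenant and client IDs are placeholders, and the sketch assumes the app registration permits public client flows.

```python
# Minimal sketch: acquire an Azure AD token with the device code flow using
# MSAL for Python. Regardless of which sign-in method from the table below is
# configured (password hash sync, pass-through authentication, or federation),
# the application code requesting the token stays the same.
import msal

app = msal.PublicClientApplication(
    client_id="<application-client-id>",                     # placeholder
    authority="https://login.microsoftonline.com/<tenant>",  # placeholder
)

flow = app.initiate_device_flow(scopes=["User.Read"])
print(flow["message"])  # instructs the user where to enter the device code

result = app.acquire_token_by_device_flow(flow)  # blocks until sign-in completes
if "access_token" in result:
    print("Signed in; token expires in", result["expires_in"], "seconds")
else:
    print("Sign-in failed:", result.get("error_description"))
```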
The following table lists the native tools that can help mature the policies and processes that support this
governance discipline.
| Consideration | Password hash synchronization + Seamless SSO | Pass-through Authentication + Seamless SSO | Federation with AD FS |
|---|---|---|---|
| Where does authentication happen? | In the cloud | In the cloud after a secure password verification exchange with the on-premises authentication agent | On-premises |
| What are the on-premises server requirements beyond the provisioning system (Azure AD Connect)? | None | One server for each additional authentication agent | Two or more AD FS servers; two or more WAP servers in the perimeter/DMZ network |
| What are the requirements for on-premises internet and networking beyond the provisioning system? | None | Outbound internet access from the servers running the authentication agents | Inbound internet access to WAP servers in the perimeter; inbound network access to AD FS servers from WAP servers in the perimeter; network load balancing |
| Is there an SSL certificate requirement? | No | No | Yes |
| Is there a health monitoring solution? | Not required | Agent status provided by the Azure Active Directory admin center | Azure AD Connect Health |
| Do users get single sign-on to cloud resources from domain-joined devices within the company network? | Yes with Seamless SSO | Yes with Seamless SSO | Yes |
| What sign-in types are supported? | UserPrincipalName + password; Windows Integrated Authentication by using Seamless SSO; Alternate login ID | UserPrincipalName + password; Windows Integrated Authentication by using Seamless SSO; Alternate login ID | UserPrincipalName + password; sAMAccountName + password; Windows Integrated Authentication; certificate and smart card authentication; Alternate login ID |
| Is Windows Hello for Business supported? | Key trust model; certificate trust model with Intune | Key trust model; certificate trust model with Intune | Key trust model; certificate trust model |
| What are the multi-factor authentication options? | Azure Multi-Factor Authentication; Custom Controls with conditional access* | Azure Multi-Factor Authentication; Custom Controls with conditional access* | Azure Multi-Factor Authentication; Azure Multi-Factor Authentication server; third-party multi-factor authentication; Custom Controls with conditional access* |
| What user account states are supported? | Disabled accounts (up to 30-minute delay) | Disabled accounts; account locked out; account expired; password expired; sign-in hours | Disabled accounts; account locked out; account expired; password expired; sign-in hours |
| What are the conditional access options? | Azure AD conditional access | Azure AD conditional access | Azure AD conditional access; AD FS claim rules |
| Is blocking legacy protocols supported? | Yes | Yes | Yes |
| Can you customize the logo, image, and description on the sign-in pages? | Yes, with Azure AD Premium | Yes, with Azure AD Premium | Yes |
| What advanced scenarios are supported? | Smart password lockout; leaked credentials reports | Smart password lockout | Multisite low-latency authentication system; AD FS extranet lockout; integration with third-party identity systems |

NOTE
*Custom controls in Azure AD conditional access do not currently support device registration.

Next steps
The Hybrid Identity Digital Transformation Framework whitepaper outlines combinations and solutions for
choosing and integrating each of these components.
The Azure AD Connect tool helps you to integrate your on-premises directories with Azure AD.
Resource Consistency is one of the Five Disciplines of Cloud Governance within the Cloud Adoption Framework governance
model. This discipline focuses on ways of establishing policies related to the operational management of an environment,
application, or workload. IT Operations teams often provide monitoring of applications, workload, and asset performance. They
also commonly execute the tasks required to meet scale demands, remediate performance Service Level Agreement (SLA)
violations, and proactively avoid performance SLA violations through automated remediation. Within the Five Disciplines of
Cloud Governance, Resource Consistency is a discipline that ensures resources are consistently configured in such a way that
they can be discoverable by IT operations, are included in recovery solutions, and can be onboarded into repeatable operations
processes.
NOTE

Resource Consistency governance does not replace the existing IT teams, processes, and procedures that allow your organization
to effectively manage cloud-based resources. The primary purpose of this discipline is to identify potential business risks and
provide risk-mitigation guidance to the IT staff that are responsible for managing your resources in the cloud. As you develop
governance policies and processes make sure to involve relevant IT teams in your planning and review processes.
This section of the Cloud Adoption Framework outlines how to develop a Resource Consistency discipline as part of your cloud
governance strategy. The primary audience for this guidance is your organization's cloud architects and other members of your
cloud governance team. However, the decisions, policies, and processes that emerge from this discipline should involve
engagement and discussions with relevant members of the IT teams responsible for implementing and managing your
organization's Resource Consistency solutions.
If your organization lacks in-house expertise in Resource Consistency strategies, consider engaging external consultants as a part
of this discipline. Also consider engaging Microsoft Consulting Services, the Microsoft FastTrack cloud adoption service, or other
external cloud adoption experts for discussing how best to organize, track, and optimize your cloud-based assets.

Policy statements
Actionable policy statements and the resulting architecture requirements serve as the foundation of a Resource Consistency
discipline. To see policy statement samples, see the article on Resource Consistency Policy Statements. These samples can serve
as a starting point for your organization's governance policies.
CAUTION

The sample policies come from common customer experiences. To better align these policies to specific cloud governance needs,
execute the following steps to create policy statements that meet your unique business needs.

Develop governance policy statements


The following six steps offer examples and potential options to consider when developing Resource Consistency governance. Use
each step as a starting point for discussions within your cloud governance team and with affected business, and IT teams across
your organization to establish the policies and processes needed to manage Resource Consistency risks.

Resource Consistency Template


Download the template for documenting a Resource Consistency discipline

Business Risks
Understand the motives and risks commonly associated with the Resource Consistency discipline.
Indicators and Metrics
Indicators to understand if it is the right time to invest in the Resource Consistency discipline.

Policy adherence processes


Suggested processes for supporting policy compliance in the Resource Consistency discipline.

Maturity
Aligning Cloud Management maturity with phases of cloud adoption.

Toolchain
Azure services that can be implemented to support the Resource Consistency discipline.

Next steps
Get started by evaluating business risks in a specific environment.
Understand business risks
Resource Consistency template

The first step to implementing change is communicating what is desired. The same is true when changing
governance practices. The template below serves as a starting point for documenting and communicating policy
statements that govern IT operations and management in the cloud.
As your discussions progress, use this template's structure as a model for capturing the business risks, risk
tolerances, compliance processes, and tooling needed to define your organization's Resource Consistency policy
statements.

IMPORTANT
This template is a limited sample. Before updating this template to reflect your requirements, you should review the
subsequent steps for defining an effective Resource Consistency discipline within your cloud governance strategy.

Download governance discipline template

Next steps
Solid governance practices start with an understanding of business risk. Review the article on business risks and
begin to document the business risks that align with your current cloud adoption plan.
Understand business risks
Resource Consistency motivations and business risks

This article discusses the reasons that customers typically adopt a Resource Consistency discipline within a cloud
governance strategy. It also provides a few examples of potential business risks that can drive policy statements.

Resource Consistency relevancy


When it comes to deploying resources and workloads, the cloud offers increased agility and flexibility over most
traditional on-premises datacenters. However, these cloud-based advantages come paired with potential
management drawbacks that can seriously jeopardize the success of your cloud adoption. What assets
have you deployed? What teams own what assets? Do you have enough resources supporting a workload? How
do you know if workloads are healthy?
Resource Consistency is crucial to ensure that resources are deployed, updated, and configured consistently in a
repeatable manner, and that service disruptions are minimized and remedied in as little time as possible.
The Resource Consistency discipline is concerned with identifying and mitigating business risks related to the
operational aspects of your cloud deployment. Resource Consistency includes monitoring of applications,
workloads, and asset performance. It also includes the tasks required to meet scale demands, provide disaster
recovery capabilities, mitigate performance Service Level Agreement (SLA) violations, and proactively avoid those
SLA violations through automated remediation.
Initial test deployments may not require much beyond adopting some cursory naming and tagging standards to
support your Resource Consistency needs. As your cloud adoption matures and you deploy more complicated
and mission-critical assets, the need to invest in the Resource Consistency discipline increases rapidly.
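As a small illustration of the discoverability and tagging concerns above, the following sketch lists every resource in a subscription and flags those missing an owner tag. It assumes the azure-identity and azure-mgmt-resource Python packages are installed; the subscription ID and the tag name are placeholders reflecting a hypothetical tagging convention.

```python
# Minimal sketch: answer "what have we deployed, and who owns it?" by listing
# every resource in a subscription and flagging resources that are missing an
# "owner" tag. Values in angle brackets are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.resource import ResourceManagementClient

subscription_id = "<subscription-id>"  # placeholder
required_tag = "owner"                 # placeholder tagging convention

client = ResourceManagementClient(DefaultAzureCredential(), subscription_id)

untagged = []
for resource in client.resources.list():
    tags = resource.tags or {}
    if required_tag not in tags:
        untagged.append(resource.id)
    else:
        print(f"{resource.name} ({resource.type}) owned by {tags[required_tag]}")

print(f"{len(untagged)} resources are missing the '{required_tag}' tag")
```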

Business risk
The Resource Consistency discipline attempts to address core operational business risks. Work with your business
and IT teams to identify these risks and monitor each of them for relevance as you plan for and implement your
cloud deployments.
Risks will differ between organizations, but the following serve as common risks that you can use as a starting
point for discussions within your cloud governance team:
Unnecessary operational cost. Obsolete or unused resources, or resources that are overprovisioned during
times of low demand, add unnecessary operational costs.
Underprovisioned resources. Resources that experience higher than anticipated demand can result in
business disruption as cloud resources are overwhelmed by demand.
Management inefficiencies. Lack of consistent naming and tagging metadata associated with resources can
lead to IT staff having difficulty finding resources for management tasks or identifying ownership and
accounting information related to assets. This results in management inefficiencies that can increase cost and
slow IT responsiveness to service disruption or other operational issues.
Business interruption. Service disruptions that result in violations of your organization's established Service
Level Agreements (SLAs) can result in loss of business or other financial impacts to your company.

Next steps
Using the Cloud Management template, document business risks that are likely to be introduced by the current
cloud adoption plan.
Once an understanding of realistic business risks is established, the next step is to document the business's
tolerance for risk and the indicators and key metrics to monitor that tolerance.
Understand indicators, metrics, and risk tolerance
Resource Consistency metrics, indicators, and risk
tolerance
5 minutes to read • Edit Online

This article will help you quantify business risk tolerance as it relates to Resource Consistency. Defining metrics
and indicators helps you create a business case for making an investment in maturing the Resource Consistency
discipline.

Metrics
The Resource Consistency discipline focuses on addressing risks related to the operational management of your
cloud deployments. As part of your risk analysis you'll want to gather data related to your IT operations to
determine how much risk you face, and how important investment in Resource Consistency governance is to your
planned cloud deployments.
Every organization has different operational scenarios, but the following items represent useful examples of the
metrics you should gather when evaluating risk tolerance within the Resource Consistency discipline:
Cloud assets. Total number of cloud-deployed resources.
Untagged resources. Number of resources without required accounting, business impact, or organizational
tags.
Underused assets. Number of resources where memory, CPU, or network capabilities are all consistently
underutilized.
Resource depletion. Number of resources where memory, CPU, or network capabilities are exhausted by
load.
Resource age. Time since resource was last deployed or modified.
VMs in critical condition. Number of deployed VMs where one or more critical issues are detected which
need to be addressed in order to restore normal functionality.
Alerts by severity. Total number of alerts on a deployed asset, broken down by severity.
Unhealthy network links. Number of resources with network connectivity issues.
Unhealthy service endpoints. Number of issues with external network endpoints.
Cloud provider service health incidents. Number of disruptions or performance incidents caused by the
cloud provider.
Service level agreements. These can include both Microsoft's commitments for uptime and connectivity of
Azure services and commitments made by the business to its external and internal customers.
Service availability. Percentage of actual uptime of cloud-hosted workloads compared to the expected uptime.
Recovery time objective (RTO ). The maximum acceptable time that an application can be unavailable after
an incident.
Recovery point objective (RPO ). The maximum duration of data loss that is acceptable during a disaster. For
example, if you store data in a single database, with no replication to other databases, and perform hourly
backups, you could lose up to an hour of data.
Mean time to recover (MTTR). The average time required to restore a component after a failure.
Mean time between failures (MTBF). The duration that a component can reasonably be expected to run between
outages. This metric can help you calculate how often a service will become unavailable (a small calculation sketch follows this list).
Backup health. Number of backups actively being synchronized.
Recovery health. Number of recovery operations successfully performed.
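The recovery metrics above lend themselves to simple calculations when building a business case. The following is a minimal Python sketch, using hypothetical numbers, of how MTBF and MTTR translate into expected availability and how backup frequency bounds worst-case data loss (RPO). The function names and figures are illustrative only, not recommendations.

```python
# Minimal sketch: derive availability-related indicators from MTBF, MTTR, and
# backup frequency. All values below are hypothetical examples.

def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: fraction of time a component is expected to be up."""
    return mtbf_hours / (mtbf_hours + mttr_hours)

def worst_case_data_loss(backup_interval_hours: float) -> float:
    """Upper bound on data loss (RPO) when backups run on a fixed interval."""
    return backup_interval_hours

if __name__ == "__main__":
    mtbf, mttr = 2000.0, 4.0   # hypothetical: a failure roughly every 83 days, 4 hours to recover
    print(f"Expected availability: {availability(mtbf, mttr):.4%}")
    print(f"Worst-case data loss:  {worst_case_data_loss(1.0):.1f} hour(s)")
```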
Risk tolerance indicators
Cloud platforms offer a baseline set of features that allow deployment teams to effectively manage small
deployments without extensive additional planning or processes. As a result, small Dev/Test or experimental first
workloads that include a relatively small number of cloud-based assets represent a low level of risk, and will likely
not need much in the way of a formal Resource Consistency policy.
However, as the size of your cloud estate grows, the complexity of managing your assets increases significantly.
With more assets in the cloud, the ability to identify ownership of resources and control resource usage becomes
critical to minimizing risks. As more mission-critical workloads are deployed to the cloud, service uptime becomes
more important, and tolerance for service disruption and potential cost overruns diminishes rapidly.
In the early stages of cloud adoption, work with your IT operations team and business stakeholders to identify
business risks related to Resource Consistency, then determine an acceptable baseline for risk tolerance. This
section of the Cloud Adoption Framework provides examples, but the detailed risks and baselines for your
company or deployments may be different.
Once you have a baseline, establish minimum benchmarks representing an unacceptable increase in your
identified risks. These benchmarks act as triggers for when you need to take action to remediate these risks. The
following are a few examples of how operational metrics, such as those discussed above, can justify an increased
investment in the Resource Consistency discipline.
Tagging and naming trigger. A company with more than x resources lacking required tagging information or
not obeying naming standards should consider investing in the Resource Consistency discipline to help refine
these standards and ensure consistent application of them to cloud-deployed assets.
Overprovisioned resources trigger. If a company has more than x% of assets regularly using small amounts
of their available memory, CPU, or network capabilities, investment in the Resource Consistency discipline is
suggested to help optimize resource usage for these items.
Underprovisioned resources trigger. If a company has more than x% of assets regularly exhausting most of
their available memory, CPU, or network capabilities, investment in the Resource Consistency discipline is
suggested to help ensure these assets have the resources necessary to prevent service interruptions.
Resource age trigger. A company with more than x resources that have not been updated in over y months
could benefit from investment in the Resource Consistency discipline aimed at ensuring active resources are
patched and healthy, while retiring obsolete or otherwise unused assets.
Service-level agreement trigger. A company that cannot meet its service-level agreements to its external
customers or internal partners should invest in the Deployment Acceleration discipline to reduce system
downtime.
Recovery time triggers. If a company exceeds the required thresholds for recovery time following a system
failure, it should invest in improving its Deployment Acceleration discipline and systems design to reduce or
eliminate failures or the effect of individual component downtime.
VM health trigger. A company that has more than x% of VMs experiencing a critical health issue should
invest in the Resource Consistency discipline to identify issues and improve VM stability.
Network health trigger. A company that has more than x% of network subnets or endpoints experiencing
connectivity issues should invest in the Resource Consistency discipline to identify and resolve network issues.
Backup coverage trigger. A company with x% of mission-critical assets without up-to-date backups in place
would benefit from an increased investment in the Resource Consistency discipline to ensure a consistent
backup strategy.
Backup health trigger. A company experiencing more than x% failure of restore operations should invest in
the Resource Consistency discipline to identify problems with backup and ensure important resources are
protected.
The exact metrics and triggers you use to gauge risk tolerance and the level of investment in the Resource
Consistency discipline will be specific to your organization, but the examples above should serve as a useful base
for discussion within your cloud governance team. A minimal sketch of evaluating such triggers against gathered metrics follows.
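To make these triggers actionable, many teams compare gathered metrics against agreed thresholds on a recurring basis. The following is a minimal Python sketch of that comparison; the metric names, values, and thresholds are hypothetical placeholders, not recommended benchmarks.

```python
# Minimal sketch: compare gathered operational metrics against hypothetical
# risk-tolerance triggers. Replace the names and thresholds with the benchmarks
# your cloud governance team agrees on.

metrics = {
    "untagged_resource_pct": 12.0,
    "underused_asset_pct": 35.0,
    "resource_depletion_pct": 6.0,
    "vms_in_critical_condition_pct": 2.5,
    "failed_restore_pct": 1.0,
}

triggers = {
    "untagged_resource_pct": 10.0,         # tagging and naming trigger
    "underused_asset_pct": 30.0,           # overprovisioned resources trigger
    "resource_depletion_pct": 5.0,         # underprovisioned resources trigger
    "vms_in_critical_condition_pct": 2.0,  # VM health trigger
    "failed_restore_pct": 0.5,             # backup health trigger
}

for name, threshold in triggers.items():
    value = metrics.get(name)
    if value is not None and value > threshold:
        print(f"Trigger exceeded: {name} = {value} (threshold {threshold})")
```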

Next steps
Using the Cloud Management template, document metrics and tolerance indicators that align to the current cloud
adoption plan.
Review sample Resource Consistency policies as a starting point to develop policies that address specific business
risks that align with your cloud adoption plans.
Review sample policies
Resource Consistency sample policy statements
4 minutes to read • Edit Online

Individual cloud policy statements are guidelines for addressing specific risks identified during your risk
assessment process. These statements should provide a concise summary of risks and plans to deal with them.
Each statement definition should include these pieces of information:
Technical risk: A summary of the risk this policy will address.
Policy statement: A clear summary explanation of the policy requirements.
Design options: Actionable recommendations, specifications, or other guidance that IT teams and developers
can use when implementing the policy.
The following sample policy statements address common business risks related to resource consistency. These
statements are examples you can reference when drafting policy statements to address your organization's needs.
These examples are not meant to be prescriptive, and there are potentially several policy options for dealing with
each identified risk. Work closely with business and IT teams to identify the best policies for your unique set of
risks.

Tagging
Technical risk: Without proper metadata tagging associated with deployed resources, IT Operations cannot
prioritize support or optimization of resources based on required SLA, importance to business operations, or
operational cost. This can result in misallocation of IT resources and potential delays in incident resolution.
Policy statement: The following policies will be implemented:
Deployed assets should be tagged with the following values:
Cost
Criticality
SLA
Environment
Governance tooling must validate tagging related to cost, criticality, SLA, application, and environment. All
values must align to predefined values managed by the governance team.
Potential design options: In Azure, standard name-value metadata tags are supported on most resource types.
Azure Policy is used to enforce specific tags as part of resource creation.
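As one illustration of how governance tooling might validate tagging, the following is a minimal sketch using the Azure SDK for Python (the azure-identity and azure-mgmt-resource packages) to report resources missing the required tags. The subscription ID is a placeholder and the tag names come from the policy statement above; enforcement at creation time would normally be handled by Azure Policy rather than a script like this.

```python
# Minimal audit sketch: list resources in a subscription and report any that are
# missing the required tags. The subscription ID is a placeholder.
from azure.identity import DefaultAzureCredential
from azure.mgmt.resource import ResourceManagementClient

REQUIRED_TAGS = {"Cost", "Criticality", "SLA", "Environment"}  # from the policy statement above
SUBSCRIPTION_ID = "<your-subscription-id>"  # placeholder

client = ResourceManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

for resource in client.resources.list():
    tags = set((resource.tags or {}).keys())
    missing = REQUIRED_TAGS - tags
    if missing:
        print(f"{resource.id} is missing tags: {', '.join(sorted(missing))}")
```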

Ungoverned subscriptions
Technical risk: Arbitrary creation of subscriptions and management groups can lead to isolated sections of your
cloud estate that are not properly subject to your governance policies.
Policy statement: Creation of new subscriptions or management groups for any mission-critical applications or
protected data will require a review from the cloud governance team. Approved changes will be integrated into a
proper blueprint assignment.
Potential design options: Lock down administrative access to your organization's Azure management groups to
only approved governance team members who will control the subscription creation and access control process.

Manage updates to virtual machines


Technical risk: Virtual machines (VMs) that are not up-to-date with the latest updates and software patches are
vulnerable to security or performance issues, which can result in service disruptions.
Policy statement: Governance tooling must enforce that automatic updates are enabled on all deployed VMs.
Violations must be reviewed with operational management teams and remediated in accordance with operations
policies. Assets that are not automatically updated must be included in processes owned by IT Operations.
Potential design options: For Azure hosted VMs, you can provide consistent update management using the
Update Management solution in Azure Automation.

Deployment compliance
Technical risk: Deployment scripts and automation tooling that is not fully vetted by the cloud governance team
can result in resource deployments that violate policy.
Policy statement: The following policies will be implemented:
Deployment tooling must be approved by the cloud governance team to ensure ongoing governance of
deployed assets.
Deployment scripts must be maintained in a central repository accessible to the cloud governance team for
periodic review and auditing.
Potential design options: Consistent use of Azure Blueprints to manage automated deployments allows
consistent deployments of Azure resources that adhere to your organization's governance standards and policies.

Monitoring
Technical risk: Improperly implemented or inconsistently instrumented monitoring can prevent the detection of
workload health issues or other policy compliance violations.
Policy statement: The following policies will be implemented:
Governance tooling must validate that all assets are included in monitoring for resource depletion, security,
compliance, and optimization.
Governance tooling must validate that the appropriate level of logging data is being collected for all
applications and data.
Potential design options: Azure Monitor is the default monitoring service in Azure, and consistent monitoring
can be enforced via Azure Blueprints when deploying resources.
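As a simple illustration of validating monitoring coverage, the following Python sketch compares a list of deployed assets against a list of assets reporting telemetry. Both lists are hypothetical; in practice they would be sourced from Azure Resource Manager and your monitoring solution.

```python
# Minimal sketch: find deployed assets that fall outside monitoring coverage.
# Asset names are hypothetical placeholders.

deployed_assets = {"vm-app1-prod-01", "vm-app2-prod-01", "sql-app1-prod"}
monitored_assets = {"vm-app1-prod-01", "sql-app1-prod"}

unmonitored = deployed_assets - monitored_assets
for asset in sorted(unmonitored):
    print(f"Not covered by monitoring: {asset}")
```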

Disaster recovery
Technical risk: Resource failure, deletions, or corruption can result in disruption of mission-critical applications or
services and the loss of sensitive data.
Policy statement: All mission-critical applications and protected data must have backup and recovery solutions
implemented to minimize business impact of outages or system failures.
Potential design options: The Azure Site Recovery service provides backup, recovery, and replication
capabilities that minimize outage duration in business continuity and disaster recovery (BCDR ) scenarios.

Next steps
Use the samples mentioned in this article as a starting point to develop policies that address specific business risks
that align with your cloud adoption plans.
To begin developing your own custom policy statements related to Resource Consistency, download the Resource
Consistency template.
To accelerate adoption of this discipline, choose the actionable governance guide that most closely aligns with your
environment. Then modify the design to incorporate your specific corporate policy decisions.
Building on risks and tolerance, establish a process for governing and communicating Resource Consistency policy
adherence.
Establish policy compliance processes
Resource Consistency policy compliance processes
5 minutes to read • Edit Online

This article discusses an approach to policy adherence processes that govern Resource Consistency. Effective
cloud Resource Consistency governance starts with recurring manual processes designed to identify operational
inefficiency, improve management of deployed resources, and ensure mission-critical workloads have minimal
disruptions. These manual processes are supplemented with monitoring, automation, and tooling to help reduce
the overhead of governance and allow for faster response to policy deviation.

Planning, review, and reporting processes


Cloud platforms provide an array of management tools and features that you can use to organize, provision, scale,
and minimize downtime. Using these tools to effectively structure and operate your cloud deployments in ways
that remediate potential risks requires well thought out processes and policies in addition to close cooperation
with IT Operations teams and business stakeholders.
The following is a set of example processes commonly involved in the Resource Consistency discipline. Use these
examples as a starting point when planning the processes that will allow you to continue to update Resource
Consistency policy based on business change and feedback from the development and IT teams tasked with
turning guidance into action.
Initial risk assessment and planning: As part of your initial adoption of the Resource Consistency discipline,
identify your core business risks and tolerances related to operations and IT management. Use this information to
discuss specific technical risks with members of your IT teams and workload owners to develop a baseline set of
Resource Consistency policies designed to remediate these risks, establishing your initial governance strategy.
Deployment planning: Before deploying any asset, perform a review to identify any new operational risks.
Establish resource requirements and expected demand patterns, and identify scalability needs and potential usage
optimization opportunities. Also ensure backup and recovery plans are in place.
Deployment testing: As part of deployment, the cloud governance team, in cooperation with your cloud
operations teams, will be responsible for reviewing the deployment to validate Resource Consistency policy
compliance.
Annual planning: On an annual basis, perform a high-level review of Resource Consistency strategy. Explore
future corporate expansion plans or priorities and update cloud adoption strategies to identify potential risk
increase or other emerging Resource Consistency needs. Also use this time to review the latest best practices for
cloud Resource Consistency and integrate these into your policies and review processes.
Quarterly review and planning: On a quarterly basis perform a review of operational data and incident reports
to identify any changes required in Resource Consistency policy. As part of this process, review changes in
resource usage and performance to identify assets that require increases or decreases in resource allocation, and
identify any workloads or assets that are candidates for retirement.
This planning process is also a good time to evaluate the current membership of your cloud governance team for
knowledge gaps related to new or changing policy and risks related to Resource Consistency as a discipline. Invite
relevant IT staff to participate in reviews and planning as either temporary technical advisors or permanent
members of your team.
Education and training: On a bimonthly basis, offer training sessions to make sure IT staff and developers are
up-to-date on the latest Resource Consistency policy requirements and guidance. As part of this process review
and update any documentation or other training assets to ensure they are in sync with the latest corporate policy
statements.
Monthly audit and reporting reviews: On a monthly basis, perform an audit on all cloud deployments to
ensure their continued alignment with Resource Consistency policy. Review related activities with IT staff and
identify any compliance issues not already handled as part of the ongoing monitoring and enforcement process.
The result of this review is a report for the cloud strategy team and each cloud adoption team to communicate
overall performance and adherence to policy. The report is also stored for auditing and legal purposes.

Ongoing monitoring processes


Determining if your Resource Consistency governance strategy is successful depends on visibility into the current
and past state of your cloud infrastructure. Without the ability to analyze the relevant metrics and data of your
cloud environment's health and activity, you cannot identify changes in your risks or detect violations of your risk
tolerances. The ongoing governance processes discussed above require quality data to ensure policy can be
modified to optimize your cloud resource usage and improve overall performance of cloud-hosted workloads.
Ensure that your IT teams have implemented automated monitoring systems for your cloud infrastructure that
capture the relevant log data you need to evaluate risks. Be proactive in monitoring these systems to ensure
prompt detection and mitigation of potential policy violation, and ensure your monitoring strategy is in line with
your operational needs.

Violation triggers and enforcement actions


Because violations of Resource Consistency policy can lead to risks of critical service disruption or significant cost
overruns, the cloud governance team should have visibility into noncompliance incidents. Ensure IT staff have clear
escalation paths for reporting these issues to the governance team members best suited to identify and verify that
policy issues are mitigated when detected.
When violations are detected, you should take actions to realign with policy as soon as possible. Your IT team can
automate most violation triggers using the tools outlined in the Resource Consistency toolchain for Azure.
The following triggers and enforcement actions provide examples you can reference when planning how to use
monitoring data to resolve policy violations:
Overprovisioned resource detected. Resources detected using less than 60% of CPU or memory capacity
should automatically scale down or be deprovisioned to reduce costs.
Underprovisioned resource detected. Resources detected using more than 80% of CPU or memory
capacity should automatically scale up or have additional resources provisioned to provide additional capacity (see the sketch after this list).
Untagged resource creation. Any request to create a resource without required meta tags will be rejected
automatically.
Critical resource outage detected. IT staff are notified of all detected outages of mission-critical resources. If the
outage is not immediately resolvable, staff will escalate the issue and notify workload owners and the cloud
governance team. The cloud governance team will track the issue until resolution and update guidance if policy
revision is necessary to prevent future incidents.
Configuration drift. Resources detected that do not conform to established baselines should trigger alerts
and be automatically remediated using configuration management tools like Azure Automation, Chef, Puppet,
Ansible, etc.
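The following is a minimal Python sketch of how the overprovisioned and underprovisioned triggers above might be evaluated from utilization data. The asset names and utilization figures are hypothetical; in practice the data would come from your monitoring tooling (for example, Azure Monitor metrics), and the remediation itself would be carried out by your automation platform.

```python
# Minimal sketch: classify assets by observed CPU utilization against the example
# thresholds above (60% scale-down, 80% scale-up). Values are hypothetical.

SCALE_DOWN_BELOW = 60.0
SCALE_UP_ABOVE = 80.0

observed_cpu = {
    "vm-app1-prod-01": 22.0,
    "vm-app2-prod-01": 91.0,
    "vm-shared-infra-01": 68.0,
}

for asset, cpu_pct in observed_cpu.items():
    if cpu_pct < SCALE_DOWN_BELOW:
        print(f"{asset}: {cpu_pct}% CPU -> candidate to scale down or deprovision")
    elif cpu_pct > SCALE_UP_ABOVE:
        print(f"{asset}: {cpu_pct}% CPU -> candidate to scale up or add capacity")
    else:
        print(f"{asset}: {cpu_pct}% CPU -> within tolerance")
```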

Next steps
Using the Cloud Management template, document the processes and triggers that align to the current cloud
adoption plan.
For guidance on executing cloud management policies in alignment with adoption plans, see the article on
discipline improvement.
Resource Consistency discipline improvement
Resource Consistency discipline improvement
6 minutes to read • Edit Online

The Resource Consistency discipline focuses on ways of establishing policies related to the operational
management of an environment, application, or workload. Within the Five Disciplines of Cloud Governance,
Resource Consistency includes the monitoring of application, workload, and asset performance. It also includes the
tasks required to meet scale demands, remediate performance Service Level Agreement (SLA) violations, and
proactively avoid SLA violations through automated remediation.
This article outlines some potential tasks your company can engage in to better develop and mature the Resource
Consistency discipline. These tasks can be broken down into planning, building, adopting, and operating phases of
implementing a cloud solution, which are then iterated on, allowing the development of an incremental approach
to cloud governance.

Figure 1 - Adoption phases of the incremental approach to cloud governance.


It's impossible for any one document to account for the requirements of all businesses. As such, this article
outlines suggested minimum and potential example activities for each phase of the governance maturation
process. The initial objective of these activities is to help you build a Policy MVP and establish a framework for
incremental policy improvement. Your cloud governance team will need to decide how much to invest in these
activities to improve your Resource Consistency governance capabilities.
Caution

Neither the minimum nor the potential activities outlined in this article are aligned to specific corporate policies or
third-party compliance requirements. This guidance is designed to help facilitate the conversations that will lead to
alignment of both requirements with a cloud governance model.

Planning and readiness


This phase of governance maturity bridges the divide between business outcomes and actionable strategies.
During this process, the leadership team defines specific metrics, maps those metrics to the digital estate, and
begins planning the overall migration effort.
Minimum suggested activities:
Evaluate your Resource Consistency toolchain options.
Understand the licensing requirements for your cloud strategy.
Develop a draft Architecture Guidelines document and distribute to key stakeholders.
Become familiar with the resource manager you use to deploy, manage, and monitor all the resources for your
solution as a group.
Educate and involve the people and teams affected by the development of architecture guidelines.
Add prioritized resource deployment tasks to your migration backlog.
Potential activities:
Work with the business stakeholders and your cloud strategy team to understand the desired cloud accounting
approach and cost accounting practices within your business units and organization as a whole.
Define your monitoring and policy enforcement requirements.
Examine the business value and cost of outage to define remediation policy and SLA requirements.
Determine whether you'll deploy a simple workload or multiple team governance strategy for your resources.
Determine scalability requirements for your planned workloads.

Build and predeployment


Several technical and nontechnical prerequisites are required to successfully migrate an environment. This process
focuses on the decisions, readiness, and core infrastructure that precede a migration.
Minimum suggested activities:
Implement your Resource Consistency toolchain by rolling out in a predeployment phase.
Update the Architecture Guidelines document and distribute to key stakeholders.
Implement resource deployment tasks on your prioritized migration backlog.
Develop educational materials and documentation, awareness communications, incentives, and other programs
to help drive user adoption.
Potential activities:
Decide on a subscription design strategy, choosing the subscription patterns that best fit your organization and
workload needs.
Use a resource consistency strategy to enforce architecture guidelines over time.
Implement resource naming and tagging standards for your resources to match your organizational and
accounting requirements.
To create proactive point-in-time governance, use deployment templates and automation to enforce common
configurations and a consistent grouping structure when deploying resources and resource groups.
Establish a least-privilege permissions model, where users have no permissions by default.
Determine who in your organization owns each workload and account, and who will need access to maintain
or modify these resources. Define cloud roles and responsibilities that match these needs and use these roles as
the basis for access control.
Define dependencies between resources.
Implement automated resource scaling to match requirements defined in the Plan stage.
Conduct access and performance reviews to measure the quality of services received.
Consider deploying policy to manage SLA enforcement using configuration settings and resource creation
rules.

Adopt and migrate


Migration is an incremental process that focuses on the movement, testing, and adoption of applications or
workloads in an existing digital estate.
Minimum suggested activities:
Migrate your Resource Consistency toolchain from predeployment to production.
Update the Architecture Guidelines document and distribute to key stakeholders.
Develop educational materials and documentation, awareness communications, incentives, and other programs
to help drive user adoption.
Migrate any existing automated remediation scripts or tools to support defined SLA requirements.
Potential activities:
Complete and test monitoring and reporting data with your chosen on-premises, cloud gateway, or hybrid
solution.
Determine if changes need to be made to SLA or management policy for resources.
Improve operations tasks by implementing query capabilities to efficiently find resources across your cloud
estate.
Align resources to changing business needs and governance requirements.
Ensure that your virtual machines, virtual networks, and storage accounts reflect actual resource access needs
during each release, and adjust as necessary.
Verify automated scaling of resources meets access requirements.
Review user access to resources, resource groups, and Azure subscriptions, and adjust access controls as
necessary.
Monitor changes in resource access plans and validate with stakeholders if additional sign-offs are needed.
Update changes to the Architecture Guidelines document to reflect actual costs.
Determine whether your organization requires clearer financial alignment to P&Ls for business units.
For global organizations, implement your SLA compliance or sovereignty requirements.
For cloud aggregation, deploy a gateway solution to a cloud provider.
For tools that don't allow for hybrid or gateway options, tightly couple monitoring with an operational
monitoring tool that spans all datacenters and clouds.

Operate and post-implementation


Once the transformation is complete, governance and operations must live on for the natural lifecycle of an
application or workload. This phase of governance maturity focuses on the activities that commonly come after the
solution is implemented and the transformation cycle begins to stabilize.
Minimum suggested activities:
Customize your Resource Consistency toolchain based on updates to your organization's changing Cost
Management needs.
Consider automating any notifications and reports to reflect actual resource usage.
Refine Architecture Guidelines to guide future adoption processes.
Educate affected teams periodically to ensure ongoing adherence to the architecture guidelines.
Potential activities:
Adjust plans quarterly to reflect changes to actual resources.
Automatically apply and enforce governance requirements during future deployments.
Evaluate underused resources and determine if they're worth continuing.
Detect misalignments and anomalies between planned and actual resource usage.
Assist the cloud adoption teams and the cloud strategy team in understanding and resolving these anomalies.
Determine if changes need to be made to Resource Consistency for billing and SLAs.
Evaluate logging and monitoring tools to determine whether your on-premises, cloud gateway, or hybrid
solution needs adjusting.
For business units and geographically distributed groups, determine if your organization should consider using
additional cloud management features such as Azure management groups to better apply centralized policy
and meet SLA requirements.

Next steps
Now that you understand the concept of cloud resource governance, move on to learn more about how resource
access is managed in Azure in preparation for learning how to design a governance model for a simple workload
or for multiple teams.
Learn about resource access management in Azure
Learn about service-level agreements for Azure
Learn about logging, reporting, and monitoring
Resource Consistency tools in Azure
2 minutes to read • Edit Online

Resource Consistency is one of the Five Disciplines of Cloud Governance. This discipline focuses on ways of
establishing policies related to the operational management of an environment, application, or workload. Within
the Five Disciplines of Cloud Governance, the Resource Consistency discipline involves monitoring of application,
workload, and asset performance. It also involves the tasks required to meet scale demands, remediate
performance SLA violations, and proactively avoid performance SLA violations through automated remediation.
The following is a list of Azure tools that can help mature the policies and processes that support this governance
discipline.

| Tool | Azure portal | Azure Resource Manager | Azure Blueprints | Azure Automation | Azure AD | Azure Backup | Azure Site Recovery |
|---|---|---|---|---|---|---|---|
| Deploy resources | Yes | Yes | Yes | Yes | No | No | No |
| Manage resources | Yes | Yes | Yes | Yes | No | No | No |
| Deploy resources using templates | No | Yes | No | Yes | No | No | No |
| Orchestrated environment deployment | No | No | Yes | No | No | No | No |
| Define resource groups | Yes | Yes | Yes | No | No | No | No |
| Manage workload and account owners | Yes | Yes | Yes | No | No | No | No |
| Manage conditional access to resources | Yes | Yes | Yes | No | No | No | No |
| Configure RBAC users | Yes | No | No | No | Yes | No | No |
| Assign roles and permissions to resources | Yes | Yes | Yes | No | Yes | No | No |
| Define dependencies between resources | No | Yes | Yes | No | No | No | No |
| Apply access control | Yes | Yes | Yes | No | Yes | No | No |
| Assess availability and scalability | No | No | No | Yes | No | No | No |
| Apply tags to resources | Yes | Yes | Yes | No | No | No | No |
| Assign Azure Policy rules | Yes | Yes | Yes | No | No | No | No |
| Apply automated remediation | No | No | No | Yes | No | No | No |
| Manage billing | Yes | No | No | No | No | No | No |
| Plan resources for disaster recovery | Yes | Yes | Yes | No | No | Yes | Yes |
| Recover data during an outage or SLA violation | No | No | No | No | No | Yes | Yes |
| Recover applications and data during an outage or SLA violation | No | No | No | No | No | Yes | Yes |

Along with these Resource Consistency tools and features, you will need to monitor your deployed resources for
performance and health issues. Azure Monitor is the default monitoring and reporting solution in Azure. Azure
Monitor provides features for monitoring your cloud resources. This list shows which feature addresses common
monitoring requirements.

| Tool | Azure portal | Application Insights | Log Analytics | Azure Monitor REST API |
|---|---|---|---|---|
| Log virtual machine telemetry data | No | No | Yes | No |
| Log virtual networking telemetry data | No | No | Yes | No |
| Log PaaS services telemetry data | No | No | Yes | No |
| Log application telemetry data | No | Yes | No | No |
| Configure reports and alerts | Yes | No | No | Yes |
| Schedule regular reports or custom analysis | No | No | No | No |
| Visualize and analyze log and performance data | Yes | No | No | No |
| Integrate with on-premises or third-party monitoring solution | No | No | No | Yes |

When planning your deployment, you will need to consider where logging data is stored and how you integrate
cloud-based reporting and monitoring services with your existing processes and tools.

NOTE
Organizations also use third-party DevOps tools to monitor workloads and resources. For more information, see DevOps
tool integrations.

Next steps
Learn how to create, assign, and manage policy definitions in Azure.
Resource access management in Azure
4 minutes to read • Edit Online

Cloud Governance outlines the Five Disciplines of Cloud Governance, which include Resource Management.
What is resource access governance? further explains how resource access management fits into the resource
management discipline. Before you move on to learn how to design a governance model, it's important to
understand the resource access management controls in Azure. The configuration of these resource access
management controls forms the basis of your governance model.
Begin by taking a closer look at how resources are deployed in Azure.

What is an Azure resource?


In Azure, the term resource refers to an entity managed by Azure. For example, virtual machines, virtual networks,
and storage accounts are all referred to as Azure resources.

Figure 1 - A resource.

What is an Azure resource group?


Each resource in Azure must belong to a resource group. A resource group is simply a logical construct that
groups multiple resources together so they can be managed as a single entity based on lifecycle and security. For
example, resources that share a similar lifecycle, such as the resources for an n-tier application, may be created or
deleted as a group. Put another way: everything that is born together, gets managed together, and deprecates
together, goes together in a resource group.
Figure 2 - A resource group contains a resource.
Resource groups and the resources they contain are associated with an Azure subscription.
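As a hedged illustration, the following sketch uses the Azure SDK for Python (azure-identity and azure-mgmt-resource) to create a resource group that will hold everything sharing a lifecycle. The subscription ID, group name, location, and tags are placeholders.

```python
# Minimal sketch: create a resource group that groups resources with the same
# lifecycle so they can be managed (and later deleted) as a single unit.
from azure.identity import DefaultAzureCredential
from azure.mgmt.resource import ResourceManagementClient

SUBSCRIPTION_ID = "<your-subscription-id>"  # placeholder

client = ResourceManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

group = client.resource_groups.create_or_update(
    "rg-app1-prod",  # hypothetical name
    {"location": "eastus", "tags": {"Environment": "Production", "Criticality": "High"}},
)
print(group.id)
```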

What is an Azure subscription?


An Azure subscription is similar to a resource group in that it's a logical construct that groups together resource
groups and their resources. However, an Azure subscription is also associated with the controls used by Azure
Resource Manager. Take a closer look at Azure Resource Manager to learn about the relationship between it and
an Azure subscription.
Figure 3 - An Azure subscription.

What is Azure Resource Manager?


In how does Azure work? you learned that Azure includes a "front end" with many services that orchestrate all the
functions of Azure. One of these services is Azure Resource Manager, and this service hosts the RESTful API used
by clients to manage resources.
Figure 4 - Azure Resource Manager.
The following figure shows three clients: PowerShell, the Azure portal, and the Azure CLI:
Figure 5 - Azure clients connect to the Azure Resource Manager RESTful API.
While these clients connect to Azure Resource Manager using the RESTful API, Azure Resource Manager does
not include functionality to manage resources directly. Rather, most resource types in Azure have their own
resource provider.
Figure 6 - Azure resource providers.
When a client makes a request to manage a specific resource, Azure Resource Manager connects to the resource
provider for that resource type to complete the request. For example, if a client makes a request to manage a
virtual machine resource, Azure Resource Manager connects to the Microsoft.Compute resource provider.
Figure 7 - Azure Resource Manager connects to the Microsoft.Compute resource provider to manage the
resource specified in the client request.
Azure Resource Manager requires the client to specify an identifier for both the subscription and the resource
group in order to manage the virtual machine resource.
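The following Python sketch illustrates how a request to Azure Resource Manager identifies a resource: the subscription, resource group, resource provider, resource type, and resource name are all part of the URI path. The identifiers shown are placeholders.

```python
# Minimal sketch: how an Azure Resource Manager request identifies a resource.
# IDs and names below are placeholders.

def resource_id(subscription: str, resource_group: str,
                provider: str, resource_type: str, name: str) -> str:
    return (f"/subscriptions/{subscription}"
            f"/resourceGroups/{resource_group}"
            f"/providers/{provider}/{resource_type}/{name}")

vm_id = resource_id("00000000-0000-0000-0000-000000000000", "rg-app1-prod",
                    "Microsoft.Compute", "virtualMachines", "vm-app1-prod-01")

# Clients such as the Azure portal, PowerShell, or the Azure CLI send requests to
# https://management.azure.com plus this path and an api-version query parameter.
print("https://management.azure.com" + vm_id + "?api-version=<api-version>")
```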
Now that you have an understanding of how Azure Resource Manager works, return to the discussion of how an
Azure subscription is associated with the controls used by Azure Resource Manager. Before any resource
management request can be executed by Azure Resource Manager, a set of controls is checked.
The first control is that a request must be made by a validated user, and Azure Resource Manager has a trusted
relationship with Azure Active Directory (Azure AD ) to provide user identity functionality.

Figure 8 - Azure Active Directory.


In Azure AD, users are segmented into tenants. A tenant is a logical construct that represents a secure, dedicated
instance of Azure AD typically associated with an organization. Each subscription is associated with an Azure AD
tenant.
Figure 9 - An Azure AD tenant associated with a subscription.
Each client request to manage a resource in a particular subscription requires that the user has an account in the
associated Azure AD tenant.
The next control is a check that the user has sufficient permission to make the request. Permissions are assigned
to users using role-based access control (RBAC ).
Figure 10. Each user in the tenant is assigned one or more RBAC roles.
An RBAC role specifies a set of permissions a user may take on a specific resource. When the role is assigned to
the user, those permissions are applied. For example, the built-in owner role allows a user to perform any action
on a resource.
The next control is a check that the request is allowed under the settings specified for Azure resource policy. Azure
resource policies specify the operations allowed for a specific resource. For example, an Azure resource policy can
specify that users are only allowed to deploy a specific type of virtual machine.
Figure 11. Azure resource policy.
The next control is a check that the request does not exceed an Azure subscription limit. For example, each
subscription has a limit of 980 resource groups. If a request is received to deploy an additional
resource group when the limit has been reached, it is denied.
Figure 12. Azure resource limits.
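As a small illustration of checking this limit, the following sketch (assuming the azure-identity and azure-mgmt-resource packages and a placeholder subscription ID) counts the resource groups in a subscription.

```python
# Minimal sketch: check how close a subscription is to the resource group limit
# mentioned above. The subscription ID is a placeholder.
from azure.identity import DefaultAzureCredential
from azure.mgmt.resource import ResourceManagementClient

RESOURCE_GROUP_LIMIT = 980
SUBSCRIPTION_ID = "<your-subscription-id>"

client = ResourceManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)
count = sum(1 for _ in client.resource_groups.list())
print(f"{count} of {RESOURCE_GROUP_LIMIT} resource groups in use")
```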
The final control is a check that the request is within the financial commitment associated with the subscription.
For example, if the request is to deploy a virtual machine, Azure Resource Manager verifies that the subscription
has sufficient payment information.
Figure 13. A financial commitment is associated with a subscription.

Summary
In this article, you learned about how resource access is managed in Azure using Azure Resource Manager.

Next steps
Now that you understand how to manage resource access in Azure, move on to learn how to design a governance
model for a simple workload or for multiple teams using these services.
An overview of governance
Governance design for a simple workload
6 minutes to read • Edit Online

The goal of this guidance is to help you learn the process for designing a resource governance model in Azure to
support a single team and a simple workload. You'll look at a set of hypothetical governance requirements, then
go through several example implementations that satisfy those requirements.
In the foundational adoption stage, our goal is to deploy a simple workload to Azure. This results in the following
requirements:
Identity management for a single workload owner who is responsible for deploying and maintaining the
simple workload. The workload owner requires permission to create, read, update, and delete resources as well
as permission to delegate these rights to other users in the identity management system.
Manage all resources for the simple workload as a single management unit.

Azure licensing
Before you begin designing your governance model, it's important to understand how Azure is licensed. This is
because the administrative accounts associated with your Azure license have the highest level of access to your
Azure resources. These administrative accounts form the basis of your governance model.

NOTE
If your organization has an existing Microsoft Enterprise Agreement that does not include Azure, Azure can be added by
making an upfront monetary commitment. For more information, see licensing Azure for the enterprise.

When Azure was added to your organization's Enterprise Agreement, your organization was prompted to create
an Azure account. During the account creation process, an Azure account owner was created, as well as an
Azure Active Directory (Azure AD ) tenant with a global administrator account. An Azure AD tenant is a logical
construct that represents a secure, dedicated instance of Azure AD.

Figure 1 - An Azure account with an Account Manager and Azure AD Global Administrator.

Identity management
Azure only trusts Azure AD to authenticate users and authorize user access to resources, so Azure AD is our
identity management system. The Azure AD global administrator has the highest level of permissions and can
perform all actions related to identity, including creating users and assigning permissions.
Our requirement is identity management for a single workload owner who is responsible for deploying and
maintaining the simple workload. The workload owner requires permission to create, read, update, and delete
resources as well as permission to delegate these rights to other users in the identity management system.
Our Azure AD global administrator will create the workload owner account for the workload owner:

Figure 2 - The Azure AD global administrator creates the workload owner user account.
You aren't able to assign resource access permission until this user is added to a subscription, so you'll do that in
the next two sections.

Resource management scope


As the number of resources deployed by your organization grows, the complexity of governing those resources
grows as well. Azure implements a logical container hierarchy to enable your organization to manage your
resources in groups at various levels of granularity, also known as scope.
The top level of resource management scope is the subscription level. A subscription is created by the Azure
account owner, who establishes the financial commitment and is responsible for paying for all Azure resources
associated with the subscription:
Figure 3 - The Azure account owner creates a subscription.
When the subscription is created, the Azure account owner associates an Azure AD tenant with the subscription,
and this Azure AD tenant is used for authenticating and authorizing users:

Figure 4 - The Azure account owner associates the Azure AD tenant with the subscription.
You may have noticed that there is currently no user associated with the subscription, which means that no one
has permission to manage resources. In reality, the account owner is the owner of the subscription and has
permission to take any action on a resource in the subscription. However, in practical terms the account owner is
more than likely a finance person in your organization and is not responsible for creating, reading, updating, and
deleting resources - those tasks will be performed by the workload owner. Therefore, you need to add the
workload owner to the subscription and assign permissions.
Since the account owner is currently the only user with permission to add the workload owner to the
subscription, they add the workload owner to the subscription:

Figure 5 - The Azure account owner adds the workload owner to the subscription.
The Azure account owner grants permissions to the workload owner by assigning a role-based access control
(RBAC ) role. The RBAC role specifies a set of permissions that the workload owner has for an individual
resource type or a set of resource types.
Notice that in this example, the account owner has assigned the built-in owner role:

Figure 6 - The workload owner was assigned the built-in owner role.
The built-in owner role grants all permissions to the workload owner at the subscription scope.

IMPORTANT
The Azure account owner is responsible for the financial commitment associated with the subscription, but the workload
owner has the same permissions. The account owner must trust the workload owner to deploy resources that are within
the subscription budget.

The next level of management scope is the resource group level. A resource group is a logical container for
resources. Operations applied at the resource group level apply to all resources in a group. Also, it's important to
note that permissions for each user are inherited from the next level up unless they are explicitly changed at that
scope.
To illustrate this, let's look at what happens when the workload owner creates a resource group:

Figure 7 - The workload owner creates a resource group and inherits the built-in owner role at the resource group
scope.
Again, the built-in owner role grants all permissions to the workload owner at the resource group scope. As
discussed earlier, this role is inherited from the subscription level. If a different role is assigned to this user at this
scope, it applies to this scope only.
The lowest level of management scope is at the resource level. Operations applied at the resource level apply only
to the resource itself. Again, permissions at the resource level are inherited from resource group scope. For
example, let's look at what happens if the workload owner deploys a virtual network into the resource group:
Figure 8 - The workload owner creates a resource and inherits the built-in owner role at the resource scope.
The workload owner inherits the owner role at the resource scope, which means the workload owner has all
permissions for the virtual network.
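The following is a minimal Python sketch of this inheritance behavior: a role assigned at a higher scope applies at lower scopes unless a more specific assignment exists. The scopes, user, and role are hypothetical and simplified compared to real Azure scope strings.

```python
# Minimal sketch of how role assignments flow down scopes
# (subscription -> resource group -> resource) unless overridden at a lower scope.

assignments = {
    "/sub": {"workload-owner": "Owner"},   # assigned at the subscription scope
    "/sub/rg-app1": {},                     # inherits from /sub
    "/sub/rg-app1/vnet-app1": {},           # inherits from /sub/rg-app1
}

def effective_role(scope, user):
    """Walk up the scope hierarchy until an explicit assignment is found."""
    while scope:
        role = assignments.get(scope, {}).get(user)
        if role:
            return role
        scope = scope.rsplit("/", 1)[0]  # move to the parent scope
    return None

print(effective_role("/sub/rg-app1/vnet-app1", "workload-owner"))  # -> "Owner"
```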

Implement the basic resource access management model


Let's move on to learn how to implement the governance model designed earlier.
To begin, your organization requires an Azure account. If your organization has an existing Microsoft Enterprise
Agreement that does not include Azure, Azure can be added by making an upfront monetary commitment. For
more information, see Licensing Azure for the enterprise.
When your Azure account is created, you specify a person in your organization to be the Azure account owner.
An Azure Active Directory (Azure AD ) tenant is then created by default. Your Azure account owner must create
the user account for the person in your organization who is the workload owner.
Next, your Azure account owner must create a subscription and associate the Azure AD tenant with it.
Finally, now that the subscription is created and your Azure AD tenant is associated with it, you can add the
workload owner to the subscription with the built-in owner role.
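To summarize the pieces involved, the following Python sketch shows the data that makes up that final role assignment: a subscription-level scope, the built-in owner role definition, and the workload owner's Azure AD object ID. All identifiers are placeholders; the assignment itself would be created through the Azure portal, the Azure CLI, or an SDK/REST call against Azure Resource Manager.

```python
# Minimal sketch of the data behind a role assignment. All identifiers are
# placeholders to be replaced with values from your subscription and tenant.

subscription_id = "<your-subscription-id>"
workload_owner_object_id = "<workload-owner-object-id>"   # from Azure AD
owner_role_definition_id = (
    f"/subscriptions/{subscription_id}"
    "/providers/Microsoft.Authorization/roleDefinitions/<owner-role-guid>"
)

role_assignment = {
    "scope": f"/subscriptions/{subscription_id}",   # subscription-level assignment
    "roleDefinitionId": owner_role_definition_id,
    "principalId": workload_owner_object_id,
}
print(role_assignment)
```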

Next steps
Deploy a basic workload to Azure
Learn about resource access for multiple teams
Governance design for multiple teams
24 minutes to read • Edit Online

The goal of this guidance is to help you learn the process for designing a resource governance model in Azure to
support multiple teams, multiple workloads, and multiple environments. First you'll look at a set of hypothetical
governance requirements, then go through several example implementations that satisfy those requirements.
The requirements are:
The enterprise plans to transition new cloud roles and responsibilities to a set of users and therefore requires
identity management for multiple teams with different resource access needs in Azure. This identity
management system is required to store the identity of the following users:
The individual in your organization responsible for ownership of subscriptions.
The individual in your organization responsible for the shared infrastructure resources used to
connect your on-premises network to an Azure virtual network.
Two individuals in your organization responsible for managing a workload.
Support for multiple environments. An environment is a logical grouping of resources, such as virtual
machines, virtual networking, and network traffic routing services. These groups of resources have similar
management and security requirements and are typically used for a specific purpose such as testing or
production. In this example, the requirement is for four environments:
A shared infrastructure environment that includes resources shared by workloads in other
environments. For example, a virtual network with a gateway subnet that provides connectivity to on-
premises.
A production environment with the most restrictive security policies. May include internal or external
facing workloads.
A preproduction environment for development and testing work. This environment has security,
compliance, and cost policies that differ from those in the production environment. In Azure, this takes
the form of an Enterprise Dev/Test subscription.
A sandbox environment for proof of concept and education purposes. This environment is typically
assigned per employee participating in development activities and has strict procedural and operational
security controls in place to prevent corporate data from landing here. In Azure, these take the form of
Visual Studio subscriptions. These subscriptions should also not be tied to the enterprise Azure Active
Directory.
A permissions model of least privilege in which users have no permissions by default. The model must
support the following:
A single trusted user (treated like a service account) at the subscription scope with permission to assign
resource access rights.
Each workload owner is denied access to resources by default. Resource access rights are granted
explicitly by the single trusted user at the resource group scope.
Management access for the shared infrastructure resources limited to the shared infrastructure owners.
Management access for each workload restricted to the workload owner (in production) and increasing
levels of control as development increases from Dev to Test to Stage to Prod.
The enterprise does not want to have to manage roles independently in each of the three main
environments, and therefore requires the use of only built-in roles available in Azure's role-based access
control (RBAC ). If the enterprise absolutely requires custom RBAC roles, additional processes would be
needed to synchronize custom roles across the three environments.
Cost tracking by workload owner name, environment, or both.
Identity management
Before you can design identity management for your governance model, it's important to understand the four
major areas it encompasses:
Administration: The processes and tools for creating, editing, and deleting user identity.
Authentication: Verifying user identity by validating credentials, such as a user name and password.
Authorization: Determining which resources an authenticated user is allowed to access or what operations
they have permission to perform.
Auditing: Periodically reviewing logs and other information to discover security issues related to user identity.
This includes reviewing suspicious usage patterns, periodically reviewing user permissions to verify they are
accurate, and other functions.
There is only one service trusted by Azure for identity, and that is Azure Active Directory (Azure AD ). You'll be
adding users to Azure AD and using it for all of the functions listed above. But before looking at how to configure
Azure AD, it's important to understand the privileged accounts that are used to manage access to these services.
When your organization signed up for an Azure account, at least one Azure account owner was assigned. Also,
an Azure AD tenant was created, unless an existing tenant was already associated with your organization's use of
other Microsoft services such as Office 365. A global administrator with full permissions on the Azure AD
tenant was associated when it was created.
The user identities for both the Azure Account Owner and the Azure AD global administrator are stored in a
highly secure identity system that is managed by Microsoft. The Azure Account Owner is authorized to create,
update, and delete subscriptions. The Azure AD global administrator is authorized to perform many actions in
Azure AD, but for this design guide you'll focus on the creation and deletion of user identity.

NOTE
Your organization may already have an existing Azure AD tenant if there's an existing Office 365, Intune, or Dynamics license
associated with your account.

The Azure Account Owner has permission to create, update, and delete subscriptions:

Figure 1 - An Azure account with an Account Manager and Azure AD Global Administrator.
The Azure AD global administrator has permission to create user accounts:
Figure 2 - The Azure AD Global Administrator creates the required user accounts in the tenant.
The first two accounts, App1 Workload Owner and App2 Workload Owner are each associated with an
individual in your organization responsible for managing a workload. The network operations account is owned
by the individual that is responsible for the shared infrastructure resources. Finally, the subscription owner
account is associated with the individual responsible for ownership of subscriptions.

Resource access permissions model of least privilege


Now that your identity management system and user accounts have been created, you have to decide how to
apply role-based access control (RBAC ) roles to each account to support a permissions model of least privilege.
There's another requirement stating the resources associated with each workload be isolated from one another
such that no one workload owner has management access to any other workload they do not own. There's also a
requirement to implement this model using only built-in roles for Azure role-based access control.
Each RBAC role is applied at one of three scopes in Azure: subscription, resource group, or individual
resource. Roles are inherited at lower scopes. For example, if a user is assigned the built-in owner role at the
subscription level, that role is also assigned to that user at the resource group and individual resource level unless
overridden.
Therefore, to create a model of least-privilege access you have to decide the actions a particular type of user is
allowed to take at each of these three scopes. For example, the requirement is for a workload owner to have
permission to manage access to only the resources associated with their workload and no others. If you were to
assign the built-in owner role at the subscription scope, each workload owner would have management access to
all workloads.
Let's take a look at two example permission models to understand this concept a little better. In the first example,
the model trusts only the service administrator to create resource groups. In the second example, the model
assigns the built-in owner role to each workload owner at the subscription scope.
In both examples, there is a subscription service administrator that is assigned the built-in owner role at the
subscription scope. Recall that the built-in owner role grants all permissions including the management of access
to resources.
Figure 3 - A subscription with a service administrator assigned the built-in owner role.
1. In the first example, there is workload owner A with no permissions at the subscription scope - they have no
resource access management rights by default. This user wants to deploy and manage the resources for their
workload. They must contact the service administrator to request creation of a resource group.

2. The service administrator reviews their request and creates resource group A. At this point, workload
owner A still doesn't have permission to do anything.

3. The service administrator adds workload owner A to resource group A and assigns the built-in
contributor role. The contributor role grants all permissions on resource group A except managing access
permission.
4. Let's assume that workload owner A has a requirement for a pair of team members to view the CPU and
network traffic monitoring data as part of capacity planning for the workload. Because workload owner A is
assigned the contributor role, they do not have permission to add a user to resource group A. They must send
this request to the service administrator.

5. The service administrator reviews the request, and adds the two workload contributor users to resource
group A. Neither of these two users require permission to manage resources, so they are assigned the built-in
reader role.
6. Next, workload owner B also requires a resource group to contain the resources for their workload. As with
workload owner A, workload owner B initially does not have permission to take any action at the
subscription scope so they must send a request to the service administrator.

7. The service administrator reviews the request and creates resource group B.
8. The service administrator then adds workload owner B to resource group B and assigns the built-in
contributor role.

At this point, each of the workload owners is isolated in their own resource group. None of the workload owners
or their team members have management access to the resources in any other resource group.
Figure 4 - A subscription with two workload owners isolated with their own resource group.
This model is a least-privilege model—each user is assigned the correct permission at the correct resource
management scope.
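
The steps in this first example can be sketched with the Azure CLI. This is a hedged illustration only; the resource group name, location, and user principal names below are hypothetical.

    # The service administrator creates resource group A.
    az group create --name "resource-group-a" --location "eastus"

    # Workload owner A receives the built-in Contributor role on resource group A only.
    az role assignment create \
        --assignee "workload-owner-a@contoso.com" \
        --role "Contributor" \
        --resource-group "resource-group-a"

    # The two capacity-planning team members receive the built-in Reader role.
    for user in "team-member-1@contoso.com" "team-member-2@contoso.com"; do
        az role assignment create \
            --assignee "$user" \
            --role "Reader" \
            --resource-group "resource-group-a"
    done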
However, consider that every task in this example was performed by the service administrator. While this is a
simple example and may not appear to be an issue because there were only two workload owners, it's easy to
imagine the types of issues that would result for a large organization. For example, the service administrator can
become a bottleneck with a large backlog of requests that result in delays.
Let's take a look at a second example that reduces the number of tasks performed by the service administrator.
1. In this model, workload owner A is assigned the built-in owner role at the subscription scope, enabling them
to create their own resource group: resource group A.

2. When resource group A is created, workload owner A is added by default and inherits the built-in owner
role from the subscription scope.

3. The built-in owner role grants workload owner A permission to manage access to the resource group.
Workload owner A adds two workload contributors and assigns the built-in reader role to each of them.

4. The service administrator now adds workload owner B to the subscription with the built-in owner role.
5. Workload owner B creates resource group B and is added by default. Again, workload owner B inherits
the built-in owner role from the subscription scope.

Note that in this model, the service administrator performed fewer actions than they did in the first example due
to the delegation of management access to each of the individual workload owners.
Figure 5 - A subscription with a service administrator and two workload owners, all assigned the built-in owner
role.
However, because both workload owner A and workload owner B are assigned the built-in owner role at the
subscription scope, they have each inherited the built-in owner role for each other's resource group. This means
that not only do they have full access to one another's resources, they are also able to delegate management
access to each other's resource groups. For example, workload owner B has rights to add any other user to
resource group A and can assign any role to them, including the built-in owner role.
If you compare each example to the requirements, you'll see that both examples support a single trusted user at
the subscription scope with permission to grant resource access rights to the two workload owners. Neither of the two workload owners had access to resource management by default; each required the service administrator to explicitly assign permissions to them. However, only the first example supports the requirement
that the resources associated with each workload are isolated from one another such that no workload owner has
access to the resources of any other workload.

Resource management model


Now that you've designed a permissions model of least privilege, let's move on to take a look at some practical
applications of these governance models. Recall from the requirements that you must support the following three
environments:
1. Shared infrastructure environment: A group of resources shared by all workloads. These are resources such
as network gateways, firewalls, and security services.
2. Production environment: Multiple groups of resources representing multiple production workloads. These
resources are used to host the private and public facing application artifacts. These resources typically have the
tightest governance and security models to protect the resources, application code, and data from
unauthorized access.
3. Preproduction environment: Multiple groups of resources representing multiple workloads that aren't yet production-ready. These resources are used for development and testing. These resources may have a more relaxed governance model to enable increased developer agility. Security within these groups should increase the closer to "production" an application development process gets.
For each of these three environments, there is a requirement to track cost data by workload owner,
environment, or both. That is, you'll want to know the ongoing cost of the shared infrastructure, the costs
incurred by individuals in both the preproduction and production environments, and finally the overall cost of
preproduction and production environments.
You have already learned that resources are grouped at two levels: subscription and resource group. Therefore, the first decision is how to organize environments by subscription. There are only two possibilities: a single subscription, or multiple subscriptions.
Before you look at examples of each of these models, let's review the management structure for subscriptions in
Azure.
Recall from the requirements that you have an individual in the organization who is responsible for subscriptions,
and this user owns the subscription owner account in the Azure AD tenant. However, this account does not have
permission to create subscriptions. Only the Azure Account Owner has permission to do this:

Figure 6 - An Azure Account Owner creates a subscription.


Once the subscription has been created, the Azure Account Owner can add the subscription owner account to
the subscription with the owner role:
Figure 7 - The Azure Account Owner adds the subscription owner user account to the subscription with the
owner role.
The subscription owner can now create resource groups and delegate resource access management.
First let's look at an example resource management model using a single subscription. The first decision is how to
align resource groups to the three environments. You have two options:
1. Align each environment to a single resource group. All shared infrastructure resources are deployed to a single
shared infrastructure resource group. All resources associated with development workloads are deployed to
a single development resource group. All resources associated with production workloads are deployed into
a single production resource group for the production environment.
2. Create separate resource groups for each workload, using a naming convention and tags to align resource
groups with each of the three environments.
Let's begin by evaluating the first option. You'll be using the permissions model that was discussed in the previous
section, with a single subscription service administrator who creates resource groups and adds users to them with
either the built-in contributor or reader role.
1. The first resource group deployed represents the shared infrastructure environment. The subscription owner creates a resource group for the shared infrastructure resources named netops-shared-rg.
2. The subscription owner adds the network operations user account to the resource group and assigns the
contributor role.

3. The network operations user creates a VPN gateway and configures it to connect to the on-premises VPN
appliance. The network operations user also applies a pair of tags to each of the resources:
environment:shared and managedBy:netOps. When the subscription service administrator exports a cost
report, costs will be aligned with each of these tags. This allows the subscription service administrator to
pivot costs using the environment tag and the managedBy tag. Notice the resource limits counter at the top
right-hand side of the figure. Each Azure subscription has service limits, and to help you understand the effect
of these limits you'll follow the virtual network limit for each subscription. There is a limit of 1000 virtual
networks per subscription, and after the first virtual network is deployed there are now 999 available.
4. Two more resource groups are deployed. The first is named prod-rg. This resource group is aligned with the production environment. The second is named dev-rg and is aligned with the development environment. All
resources associated with production workloads are deployed to the production environment and all resources
associated with development workloads are deployed to the development environment. In this example, you'll
only deploy two workloads to each of these two environments, so you won't encounter any Azure subscription
service limits. However, consider that there is a limit of 800 resources per resource group. If you continue to add workloads to each resource group, eventually this limit will be reached.

5. The first workload owner sends a request to the subscription service administrator and is added to each
of the development and production environment resource groups with the contributor role. As you learned
earlier, the contributor role allows the user to perform any operation other than assigning a role to another
user. The first workload owner can now create the resources associated with their workload.
6. The first workload owner creates a virtual network in each of the two resource groups with a pair of virtual
machines in each. The first workload owner applies the environment and managedBy tags to all resources.
Note that the Azure service limit counter is now at 997 virtual networks remaining.
7. None of the virtual networks has connectivity to on-premises when it is created. In this type of architecture, each virtual network must be peered to the hub-vnet in the shared infrastructure environment. Virtual network peering creates a connection between two separate virtual networks and allows network traffic to travel between them. Note that virtual network peering is not inherently transitive. A peering must be specified in each of the two virtual networks that are connected, and if only one of the virtual networks specifies a peering the connection is incomplete. To illustrate the effect of this, the first workload owner specifies a peering between prod-vnet and hub-vnet. The first peering is created, but no traffic flows because the complementary peering from hub-vnet to prod-vnet has not yet been specified. The first workload owner contacts the network operations user and requests this complementary peering connection (a CLI sketch of this peering appears after these steps).
8. The network operations user reviews the request, approves it, then specifies the peering in the settings for
the hub-vnet. The peering connection is now complete and network traffic flows between the two virtual
networks.
9. Now, a second workload owner sends a request to the subscription service administrator and is added to
the existing production and development environment resource groups with the contributor role. The
second workload owner has the same permissions on all resources as the first workload owner in each
resource group.
10. The second workload owner creates a subnet in the prod-vnet virtual network, then adds two virtual
machines. The second workload owner applies the environment and managedBy tags to each resource.
This example resource management model enables us to manage resources in the three required environments.
The shared infrastructure resources are protected because there's only a single user in the subscription with
permission to access those resources. Each of the workload owners is able to use the shared infrastructure
resources without having any permissions on the actual shared resources themselves. However, this management model fails the requirement for workload isolation: each of the two workload owners is able to access the resources of the other's workload.
There's another important consideration with this model that may not be immediately obvious. In the example, it
was app1 workload owner that requested the network peering connection with the hub-vnet to provide
connectivity to on-premises. The network operations user evaluated that request based on the resources
deployed with that workload. When the subscription owner added app2 workload owner with the
contributor role, that user had management access rights to all resources in the prod-rg resource group.

This means app2 workload owner had permission to deploy their own subnet with virtual machines in the
prod-vnet virtual network. By default, those virtual machines now have access to the on-premises network. The
network operations user is not aware of those machines and did not approve their connectivity to on-premises.
Next, let's look at a single subscription with multiple resource groups for different environments and workloads.
Note that in the previous example, the resources for each environment were easily identifiable because they were
in the same resource group. Now that you no longer have that grouping, you will have to rely on a resource group
naming convention to provide that functionality.
1. The shared infrastructure resources will still have a separate resource group in this model, so that remains
the same. Each workload requires two resource groups - one for each of the development and production
environments. For the first workload, the subscription owner creates two resource groups. The first is named
app1-prod-rg and the second is named app1-dev-rg. As discussed earlier, this naming convention identifies
the resources as being associated with the first workload, app1, and either the dev or prod environment.
Again, the subscription owner adds the app1 workload owner to each resource group with the contributor role.

2. Similar to the first example, app1 workload owner deploys a virtual network named app1-prod-vnet to the
production environment, and another named app1-dev-vnet to the development environment. Again,
app1 workload owner sends a request to the network operations user to create a peering connection. Note
that app1 workload owner adds the same tags as in the first example, and the limit counter has been
decremented to 997 virtual networks remaining in the subscription.
3. The subscription owner now creates two resource groups for app2 workload owner. Following the same
conventions as for app1 workload owner, the resource groups are named app2-prod-rg and app2-dev-rg.
The subscription owner adds app2 workload owner to each of the resource groups with the contributor
role.

4. App2 workload owner deploys virtual networks and virtual machines to the resource groups with the same
naming conventions. Tags are added and the limit counter has been decremented to 995 virtual networks
remaining in the subscription.
5. App2 workload owner sends a request to the network operations user to peer the app2-prod-vnet with the hub-vnet. The network operations user creates the peering connection.

The resulting management model is similar to the first example, with several key differences:
Each of the two workloads is isolated by workload and by environment.
This model required two more virtual networks than the first example model. While this is not an important
distinction with only two workloads, the theoretical limit on the number of workloads for this model is 24.
Resources are no longer grouped in a single resource group for each environment. Grouping resources requires an understanding of the naming conventions used for each environment; a brief sketch of this convention follows this list.
Each of the peered virtual network connections was reviewed and approved by the network operations user.
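
A hedged Azure CLI sketch of the naming-and-tagging convention described above (names, location, and tag values are illustrative):

    # The subscription owner creates per-workload, per-environment resource groups.
    az group create --name "app1-prod-rg" --location "eastus" --tags environment=prod managedBy=app1
    az group create --name "app1-dev-rg"  --location "eastus" --tags environment=dev  managedBy=app1

    # Because resources are no longer grouped by environment, tags (or the naming convention)
    # are used to regroup them, for example when reviewing inventory or cost.
    az group list --tag environment=dev --output table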
Now let's look at a resource management model using multiple subscriptions. In this model, you'll align each of
the three environments to a separate subscription: a shared services subscription, production subscription, and
finally a development subscription. The considerations for this model are similar to a model using a single
subscription in that you have to decide how to align resource groups to workloads. You've already determined that creating a resource group for each workload satisfies the workload isolation requirement, so you'll stick with that model in this example.
1. In this model, there are three subscriptions: shared infrastructure, production, and development. Each of these
three subscriptions requires a subscription owner, and in the simple example you'll use the same user account
for all three. The shared infrastructure resources are managed similarly to the first two examples above, and
the first workload is associated with app1-rg in the production environment and the same-named resource group in the development environment. The app1 workload owner is added to each of the resource groups with the contributor role.

2. As with the earlier examples, app1 workload owner creates the resources and requests the peering connection
with the shared infrastructure virtual network. App1 workload owner adds only the managedBy tag because
there is no longer a need for the environment tag. That is, resources for each environment are now grouped in the same subscription, and the environment tag is redundant. The limit counter is decremented to
999 virtual networks remaining.
3. Finally, the subscription owner repeats the process for the second workload, creating the resource groups and adding the app2 workload owner with the contributor role. The limit counter for each of the environment subscriptions is decremented to 998 virtual networks remaining.
This management model has the benefits of the second example above. However, the key difference is that limits are less of an issue because they are spread over two subscriptions. The drawback is that the cost data tracked by tags must be aggregated across all three subscriptions.
Therefore, you can select either of these two example resource management models, depending on the priority of your requirements. If you anticipate that your organization will not reach the service limits for a single subscription, you can use a single subscription with multiple resource groups. Conversely, if your organization anticipates many workloads, multiple subscriptions for each environment may be better.
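
In the multiple-subscription model, the same resource group and role assignment commands apply; the main addition is selecting the target subscription first. A hedged sketch, with hypothetical subscription names:

    # Work in the production subscription.
    az account set --subscription "contoso-production"
    az group create --name "app1-rg" --location "eastus" --tags managedBy=app1

    # Repeat in the development subscription. The environment tag is omitted because
    # the subscription itself now identifies the environment.
    az account set --subscription "contoso-development"
    az group create --name "app1-rg" --location "eastus" --tags managedBy=app1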

Implement the resource management model


You've learned about several different models for governing access to Azure resources. Now you'll walk through
the steps necessary to implement the resource management model with one subscription for each of the shared
infrastructure, production, and development environments from the design guide. You'll have one
subscription owner for all three environments. Each workload will be isolated in a resource group with a
workload owner added with the contributor role.
NOTE
Read understanding resource access in Azure to learn more about the relationship between Azure Accounts and
subscriptions.

Follow these steps:


1. Create an Azure account if your organization doesn't already have one. The person who signs up for the Azure
account becomes the Azure account administrator, and your organization's leadership must select an individual
to assume this role. This individual will be responsible for:
Creating subscriptions.
Creating and administering Azure Active Directory (Azure AD) tenants that store user identity for those
subscriptions.
2. Your organization's leadership team decides which people are responsible for:
Management of user identity; an Azure AD tenant is created by default when your organization's Azure
Account is created, and the account administrator is added as the Azure AD global administrator by
default. Your organization can choose another user to manage user identity by assigning the Azure AD
global administrator role to that user.
Subscriptions, which means these users:
Manage costs associated with resource usage in that subscription.
Implement and maintain a least-privilege permission model for resource access.
Keep track of service limits.
Shared infrastructure services (if your organization decides to use this model), which means this user is
responsible for:
On-premises to Azure network connectivity.
Ownership of network connectivity within Azure through virtual network peering.
Workload owners.
3. The Azure AD global administrator creates the new user accounts for:
The person who will be the subscription owner for each subscription associated with each
environment. Note that this is necessary only if the subscription service administrator will not be
tasked with managing resource access for each subscription/environment.
The person who will be the network operations user.
The people who are workload owners.
4. The Azure account administrator creates the following three subscriptions using the Azure account portal:
A subscription for the shared infrastructure environment.
A subscription for the production environment.
A subscription for the development environment.
5. The Azure account administrator adds the subscription owner to each subscription (see the sketch after these steps).
6. Create an approval process for workload owners to request the creation of resource groups. The approval process can be implemented in many ways, such as over email, or by using a process management tool such as SharePoint workflows. The approval process can follow these steps:
The workload owner prepares a bill of materials for required Azure resources in either the
development environment, production environment, or both, and submits it to the subscription
owner.
The subscription owner reviews the bill of materials and validates that the requested resources are appropriate for their planned use - for example, checking that the requested virtual machine sizes are correct.
If the request is not approved, the workload owner is notified. If the request is approved, the
subscription owner creates the requested resource group following your organization's naming
conventions, adds the workload owner with the contributor role and sends notification to the
workload owner that the resource group has been created.
7. Create an approval process for workload owners to request a virtual network peering connection from the
shared infrastructure owner. As with the previous step, this approval process can be implemented using email
or a process management tool.
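
As a hedged sketch of step 5, the Azure account administrator (or a delegate with sufficient rights) could assign the subscription owner the built-in owner role on each environment subscription; the subscription IDs and user principal name below are hypothetical.

    # Hypothetical subscription IDs for the shared infrastructure, production, and development environments.
    for sub in "11111111-1111-1111-1111-111111111111" \
               "22222222-2222-2222-2222-222222222222" \
               "33333333-3333-3333-3333-333333333333"; do
        az role assignment create \
            --assignee "subscription-owner@contoso.com" \
            --role "Owner" \
            --scope "/subscriptions/$sub"
    done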
Now that you've implemented your governance model, you can deploy your shared infrastructure services.

Related resources
Built-in roles for Azure resources

Next steps
Learn about deploying a basic infrastructure
Deployment Acceleration is one of the Five Disciplines of Cloud Governance within the Cloud Adoption Framework governance
model. This discipline focuses on ways of establishing policies to govern asset configuration or deployment. Within the Five
Disciplines of Cloud Governance, Deployment Acceleration includes deployment, configuration alignment, and script reusability.
This could be through manual activities or fully automated DevOps activities. In either case, the policies would remain largely
the same. As this discipline matures, the cloud governance team can serve as a partner in DevOps and deployment strategies by
accelerating deployments and removing barriers to cloud adoption, through the application of reusable assets.
This article outlines the Deployment Acceleration process that a company experiences during the planning, building, adopting,
and operating phases of implementing a cloud solution. It's impossible for any one document to account for all of the
requirements of any business. As such, each section of this article outlines suggested minimum and potential activities. The
objective of these activities is to help you build a policy MVP and establish a framework for incremental policy improvement.
The cloud governance team should decide how much to invest in these activities to improve the Deployment Acceleration
position.
NOTE

The Deployment Acceleration discipline does not replace the existing IT teams, processes, and procedures that allow your
organization to effectively deploy and configure cloud-based resources. The primary purpose of this discipline is to identify
potential business risks and provide risk-mitigation guidance to the IT staff that are responsible for managing your resources in
the cloud. As you develop governance policies and processes, make sure to involve relevant IT teams in your planning and review processes.
The primary audience for this guidance is your organization's cloud architects and other members of your cloud governance
team. However, the decisions, policies, and processes that emerge from this discipline should involve engagement and
discussions with relevant members of your business and IT teams, especially those leaders responsible for deploying and
configuring cloud-based workloads.

Policy statements
Actionable policy statements and the resulting architecture requirements serve as the foundation of a Deployment Acceleration
discipline. To see policy statement samples, see the article on Deployment Acceleration Policy Statements. These samples can
serve as a starting point for your organization's governance policies.
CAUTION

The sample policies come from common customer experiences. To better align these policies to specific cloud governance needs,
execute the following steps to create policy statements that meet your unique business needs.

Develop governance policy statements


The following six steps will help you define governance policies to control deployment and configuration of resources in your
cloud environment.

Deployment Acceleration Template


Download the template for documenting a Deployment Acceleration discipline

Business Risks
Understand the motives and risks commonly associated with the Deployment Acceleration discipline.
Indicators and Metrics
Indicators to understand if it is the right time to invest in the Deployment Acceleration discipline.

Policy adherence processes


Suggested processes for supporting policy compliance in the Deployment Acceleration discipline.

Maturity
Aligning Deployment Acceleration maturity with phases of cloud adoption.

Toolchain
Azure services that can be implemented to support the Deployment Acceleration discipline.

Next steps
Get started by evaluating business risks in a specific environment.
Understand business risks
Deployment Acceleration template

The first step to implementing change is communicating the desired change. The same is true when changing
governance practices. The template below serves as a starting point for documenting and communicating policy
statements that govern configuration and deployment issues in the cloud. The template also outlines the business
criteria that may have led you to create the documented policy statements.
As your discussions progress, use this template's structure as a model for capturing the business risks, risk
tolerances, compliance processes, and tooling needed to define your organization's Deployment Acceleration
policy statements.

IMPORTANT
This template is a limited sample. Before updating this template to reflect your requirements, you should review the
subsequent steps for defining an effective Deployment Acceleration discipline within your cloud governance strategy.

Download governance discipline template

Next steps
Solid governance practices start with an understanding of business risk. Review the article on business risks and
begin to document the business risks that align with your current cloud adoption plan.
Understand business risks
Deployment Acceleration motivations and business
risks

This article discusses the reasons that customers typically adopt a Deployment Acceleration discipline within a
cloud governance strategy. It also provides a few examples of business risks that drive policy statements.

Deployment Acceleration relevancy


On-premises systems are often deployed using baseline images or installation scripts. Additional configuration is
usually necessary, which may involve multiple steps or human intervention. These manual processes are error-
prone and often result in "configuration drift", requiring time-consuming troubleshooting and remediation tasks.
Most Azure resources can be deployed and configured manually via the Azure portal. This approach may be sufficient for your needs when you have only a few resources to manage. However, as your cloud estate grows, your organization should begin to integrate automation into your deployment processes to ensure your cloud resources avoid configuration drift or other problems introduced by manual processes. Adopting a DevOps or DevSecOps approach is often the best way to manage your deployments as your cloud adoption efforts mature.
A robust Deployment Acceleration plan ensures that your cloud resources are deployed, updated, and configured
correctly and consistently, and remain that way. The maturity of your Deployment Acceleration strategy can also
be a significant factor in your Cost Management strategy. Automated provisioning and configuration of your
cloud resources allows you to scale down or deallocate resources when demand is low or time-bound, so you only
pay for resources as you need them.

Business risk
The Deployment Acceleration discipline attempts to address the following business risks. During cloud adoption,
monitor each of the following for relevance:
Service disruption: Lack of predictable, repeatable deployment processes or unmanaged changes to system configurations can disrupt normal operations and result in lost productivity or lost business.
Cost overruns: Unexpected changes in configuration of system resources can make identifying root cause of
issues more difficult, raising the costs of development, operations, and maintenance.
Organizational inefficiencies: Barriers between development, operations, and security teams can cause
numerous challenges to effective adoption of cloud technologies and the development of a unified cloud
governance model.

Next steps
Using the Deployment Acceleration template, document business risks that are likely to be introduced by the current
cloud adoption plan.
Once an understanding of realistic business risks is established, the next step is to document the business's
tolerance for risk and the indicators and key metrics to monitor that tolerance.
Metrics, indicators, and risk tolerance
Deployment Acceleration metrics, indicators, and risk
tolerance

This article will help you quantify business risk tolerance as it relates to Deployment Acceleration. Defining metrics
and indicators helps you create a business case for making an investment in the maturity of the Deployment
Acceleration discipline.

Metrics
The Deployment Acceleration discipline focuses on risks related to how cloud resources are configured, deployed,
updated, and maintained. The following information is useful when adopting this discipline of cloud governance:
Deployment failures: Percentage of deployments that fail or result in misconfigured resources.
Time to deployment: The amount of time needed to deploy updates to an existing system.
Assets out-of-compliance: The number or percentage of resources that are out of compliance with defined policies (see the compliance query sketch after this list).
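
As a hedged example of collecting the out-of-compliance metric, Azure Policy compliance state can be summarized with the Azure CLI; the resource group name is hypothetical, and your own tooling may differ.

    # Summarize Azure Policy compliance for the current subscription.
    az policy state summarize

    # Narrow the summary to a single resource group to track a specific workload.
    az policy state summarize --resource-group "app1-prod-rg"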

Risk tolerance indicators


Risks related to Deployment Acceleration are largely related to the number and complexity of cloud-based
systems deployed for your enterprise. As your cloud estate grows, the number of systems deployed and the
frequency of updating your cloud resources will increase. Dependencies between resources magnify the
importance of ensuring proper configuration of resources and designing systems for resiliency if one or more
resources experiences unexpected downtime.
Traditional corporate IT organizations often have siloed operations, security, and development teams that do not collaborate well, or that are even adversarial toward one another. Recognizing these challenges early and integrating key stakeholders from each of the teams can help ensure agility in your cloud adoption while remaining secure and well-governed. Therefore, consider adopting a DevOps or DevSecOps organizational culture early in your cloud adoption journey.
Work with your DevSecOps team and business stakeholders to identify business risks related to configuration,
then determine an acceptable baseline for configuration risk tolerance. This section of the Cloud Adoption
Framework guidance provides examples, but the detailed risks and baselines for your company or deployments
will likely differ.
Once you have a baseline, establish minimum benchmarks representing an unacceptable increase in your
identified risks. These benchmarks act as triggers for when you need to take action to remediate these risks. The
following are a few examples of how configuration-related metrics, such as those discussed above, can justify an
increased investment in the Deployment Acceleration discipline.
Configuration drift triggers: A company that is experiencing unexpected changes in the configuration of key
system components, or failures in the deployment of or updates to its systems, should invest in the
Deployment Acceleration discipline to identify root causes and steps for remediation.
Out of compliance triggers: If the number of out-of-compliance resources exceeds a defined threshold
(either as a total number of resources or a percentage of total resources), a company should invest in
Deployment Acceleration discipline improvements to ensure each resource's configuration remains in
compliance throughout that resource's lifecycle.
Project schedule triggers: If the time to deploy a company's resources and applications often exceeds a defined threshold, a company should invest in its Deployment Acceleration processes to introduce or improve
automated deployments for consistency and predictability. Deployment times measured in days or even weeks
usually indicate a suboptimal Deployment Acceleration strategy.

Next steps
Using the Deployment Acceleration template, document metrics and tolerance indicators that align to the current cloud
adoption plan.
Review sample Deployment Acceleration policies as a starting point to develop policies that address specific
business risks that align with your cloud adoption plans.
Review sample policies
Deployment Acceleration sample policy statements

Individual cloud policy statements are guidelines for addressing specific risks identified during your risk
assessment process. These statements should provide a concise summary of risks and plans to deal with them.
Each statement definition should include these pieces of information:
Technical risk: A summary of the risk this policy will address.
Policy statement: A clear summary explanation of the policy requirements.
Design options: Actionable recommendations, specifications, or other guidance that IT teams and developers
can use when implementing the policy.
The following sample policy statements address common configuration-related business risks. These statements
are examples you can reference when drafting policy statements to address your organization's needs. These
examples are not meant to be prescriptive, and there are potentially several policy options for dealing with each
identified risk. Work closely with business and IT teams to identify the best policies for your unique set of risks.

Reliance on manual deployment or configuration of systems


Technical risk: Relying on human intervention during deployment or configuration increases the likelihood of
human error and reduces the repeatability and predictability of system deployments and configuration. It also
typically leads to slower deployment of system resources.
Policy statement: All assets deployed to the cloud should be deployed using templates or automation scripts
whenever possible.
Potential design options: Azure Resource Manager templates provide an infrastructure-as-code approach to deploying your resources to Azure. You could also use Terraform as a consistent on-premises and cloud-based deployment tool.
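
For example, a template kept in source control might be deployed with the Azure CLI; this is a hedged sketch that assumes a hypothetical template file and parameter.

    # Deploy (or redeploy) the workload from a version-controlled template.
    # Redeploying the same template is idempotent, which helps correct configuration drift.
    az deployment group create \
        --resource-group "app1-prod-rg" \
        --template-file "azuredeploy.json" \
        --parameters environment=prod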

Lack of visibility into system issues


Technical risk: Insufficient monitoring and diagnostics for business systems prevent operations personnel from
identifying and remediating issues before a system outage occurs, and can significantly increase the time needed
to properly resolve an outage.
Policy statement: The following policies will be implemented:
Key metrics and diagnostics measures will be identified for all production systems and components, and
monitoring and diagnostic tools will be applied to these systems and monitored regularly by operations
personnel.
Operations will consider using monitoring and diagnostic tools in nonproduction environments such as
Staging and QA to identify system issues before they occur in the production environment.
Potential design options: Azure Monitor, which includes Log Analytics and Application Insights, provides tools
for collecting and analyzing telemetry to help you understand how your applications are performing and
proactively identify issues affecting them and the resources they depend on. Additionally, Azure Activity Log
reports all changes that are being made at the platform level and should be monitored and audited for
noncompliant changes.
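
As a hedged illustration, the Activity Log and resource metrics mentioned above can be queried with the Azure CLI; the resource group and resource ID below are hypothetical.

    # Review platform-level changes made in a resource group over the last seven days.
    az monitor activity-log list --resource-group "app1-prod-rg" --offset 7d --output table

    # Pull a basic performance metric for a specific resource (the ID is illustrative).
    az monitor metrics list \
        --resource "/subscriptions/<subscription-id>/resourceGroups/app1-prod-rg/providers/Microsoft.Compute/virtualMachines/app1-vm1" \
        --metric "Percentage CPU"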

Configuration security reviews


Technical risk: Over time, new security threats or concerns can increase the risks of unauthorized access to secure
resources.
Policy statement: Cloud governance processes must include monthly review with configuration management
teams to identify malicious actors or usage patterns that should be prevented by cloud asset configuration.
Potential design options: Establish a monthly security review meeting that includes both governance team
members and IT staff responsible for configuring cloud applications and resources. Review existing security data
and metrics to establish gaps in current Deployment Acceleration policy and tooling, and update policy to
remediate any new risks.

Next steps
Use the samples mentioned in this article as a starting point to develop policies that address specific business risks
that align with your cloud adoption plans.
To begin developing your own custom policy statements related to Deployment Acceleration, download the Deployment Acceleration discipline template.
To accelerate adoption of this discipline, choose the actionable governance guide that most closely aligns with your
environment. Then modify the design to incorporate your specific corporate policy decisions.
Building on risks and tolerance, establish a process for governing and communicating Deployment Acceleration
policy adherence.
Establish policy compliance processes
Deployment Acceleration policy compliance
processes

This article discusses an approach to policy adherence processes that govern Deployment Acceleration. Effective
governance of cloud configuration starts with recurring manual processes designed to detect issues and impose
policies to remediate those risks. However, you can automate these processes and supplement them with tooling to reduce the overhead of governance and allow for faster response to deviation.

Planning, review, and reporting processes


The best Deployment Acceleration tools in the cloud are only as good as the processes and policies that they
support. The following is a set of example processes commonly used as part of a Deployment Acceleration
discipline. Use these examples as a starting point when planning the processes that will allow you to continue to
update deployment and configuration policy based on business change and feedback from the development and
IT teams responsible for turning governance guidance into action.
Initial risk assessment and planning: As part of your initial adoption of the Deployment Acceleration discipline,
identify your core business risks and tolerances related to deployment of your business applications. Use this
information to discuss specific technical risks with members of the IT operations team, and develop a baseline set
of deployment and configuration policies for remediating these risks to establish your initial governance strategy.
Deployment planning: Before deploying any asset, perform a security and operations review to identify any new risks and ensure all deployment-related policy requirements are met.
Deployment testing: As part of the deployment process for any asset, the cloud governance team, in
cooperation with your IT operations teams, is responsible for reviewing the deployment policy compliance.
Annual planning: Conduct an annual high-level review of Deployment Acceleration strategy. Explore future
corporate priorities and updated cloud adoption strategies to identify potential risk increase and other emerging
configuration needs and opportunities. Also use this time to review the latest DevOps best practices and integrate
these into your policies and review processes.
Quarterly review and planning: Conduct a quarterly review of operational audit data and incident reports to
identify any changes required in Deployment Acceleration policy. As part of this process, review current DevOps and DevSecOps best practices, and update policy as appropriate. After the review is complete, align application
and systems design guidance with updated policy.
This planning process is also a good time to evaluate the current membership of your cloud governance team for
knowledge gaps related to new or changing policy and risks related to DevOps and Deployment Acceleration.
Invite relevant IT staff to participate in reviews and planning as either temporary technical advisors or permanent
members of your team.
Education and training: On a bimonthly basis, offer training sessions to make sure IT staff and developers are
up-to-date on the latest Deployment Acceleration strategy and requirements. As part of this process, review and
update any documentation, guidance, or other training assets to ensure they are in sync with the latest corporate
policy statements.
Monthly audit and reporting reviews: Perform a monthly audit on all cloud deployments to assure their
continued alignment with configuration policy. Review deployment-related activities with IT staff and identify any
compliance issues not already handled as part of the ongoing monitoring and enforcement process. The result of
this review is a report for the cloud strategy team and each cloud adoption team to communicate overall
adherence to policy. The report is also stored for auditing and legal purposes.

Ongoing monitoring processes


Determining if your Deployment Acceleration governance strategy is successful depends on visibility into the
current and past state of your cloud infrastructure. Without the ability to analyze the relevant metrics and data of your cloud resources' operational health and activity, you cannot identify changes in your risks or detect violations of your risk tolerances. The ongoing governance processes discussed above require quality data to ensure policy can be modified to protect your infrastructure against changing threats and risks from misconfigured resources.
Ensure that your IT operations teams have implemented automated monitoring systems for your cloud infrastructure that capture the relevant log data you need to evaluate risk. Be proactive in monitoring these systems to ensure prompt detection and mitigation of potential policy violations, and ensure your monitoring strategy is in line with deployment and configuration needs.

Violation triggers and enforcement actions


Because noncompliance with configuration policies can lead to critical service disruption risks, the cloud
governance team should have visibility into serious policy violations. Ensure IT staff have clear escalation paths for
reporting configuration compliance issues to the governance team members best suited to identify and verify that
policy issues are mitigated when detected.
When violations are detected, you should take actions to realign with policy as soon as possible. Your IT team can
automate most violation triggers using the tools outlined in the Deployment Acceleration toolchain for Azure.
The following triggers and enforcement actions provide examples you can use when discussing how to use
monitoring data to resolve policy violations:
Unexpected changes in configuration detected. If the configuration of a resource changes unexpectedly,
work with IT staff and workload owners to identify root cause and develop a remediation plan.
Configuration of new resources does not adhere to policy. Work with DevOps teams and workload
owners to review Deployment Acceleration policies during project startup so everyone involved understands
the relevant policy requirements.
Deployment failures or configuration issues cause delays in project schedules. Work with development
teams and workload owners to ensure the team understands how to automate the deployment of cloud-based
resources for consistency and repeatability. Fully automated deployments should be required early in the
development cycle—trying to accomplish this late in the development cycle usually leads to unexpected issues
and delays.

Next steps
Using the Deployment Acceleration template, document the processes and triggers that align to the current cloud
adoption plan.
For guidance on executing Deployment Acceleration policies in alignment with adoption plans, see the article on discipline improvement.
Deployment Acceleration discipline improvement
Deployment Acceleration discipline improvement

The Deployment Acceleration discipline focuses on establishing policies that ensure that resources are deployed
and configured consistently and repeatably, and remain in compliance throughout their lifecycle. Within the Five
Disciplines of Cloud Governance, Deployment Acceleration includes decisions regarding automating deployments,
source-controlling deployment artifacts, monitoring deployed resources to maintain desired state, and auditing
any compliance issues.
This article outlines some potential tasks your company can engage in to better develop and mature the
Deployment Acceleration discipline. These tasks can be broken down into planning, building, adopting, and
operating phases of implementing a cloud solution, which are then iterated on, allowing the development of an incremental approach to cloud governance.

Figure 1 - Adoption phases of the incremental approach to cloud governance.


It's impossible for any one document to account for the requirements of all businesses. As such, this article
outlines suggested minimum and potential example activities for each phase of the governance maturation
process. The initial objective of these activities is to help you build a Policy MVP and establish a framework for
incremental policy improvement. Your cloud governance team will need to decide how much to invest in these
activities to improve your Deployment Acceleration governance capabilities.
CAUTION

Neither the minimum nor the potential activities outlined in this article are aligned to specific corporate policies or third-party compliance requirements. This guidance is designed to help facilitate the conversations that will lead to alignment of both requirements with a cloud governance model.

Planning and readiness


This phase of governance maturity bridges the divide between business outcomes and actionable strategies.
During this process, the leadership team defines specific metrics, maps those metrics to the digital estate, and
begins planning the overall migration effort.
Minimum suggested activities:
Evaluate your Deployment Acceleration toolchain options and implement a hybrid strategy that is appropriate
to your organization.
Develop a draft Architecture Guidelines document and distribute to key stakeholders.
Educate and involve the people and teams affected by the development of Architecture Guidelines.
Train development teams and IT staff to understand DevSecOps principles and strategies and the importance
of fully automated deployments in the Deployment Acceleration Discipline.
Potential activities:
Define roles and assignments that will govern Deployment Acceleration in the cloud.

Build and predeployment


Minimum suggested activities:
For new cloud-based applications, introduce fully automated deployments early in the development process.
This investment will improve the reliability of your testing processes and ensure consistency across your
development, QA, and production environments.
Store all deployment artifacts such as deployment templates or configuration scripts using a source-control
platform such as GitHub or Azure DevOps.
Store all secrets, passwords, certificates, and connection strings in Azure Key Vault (see the Key Vault sketch after these lists).
Consider a pilot test before implementing your Deployment Acceleration toolchain, making sure it streamlines
your deployments as much as possible. Apply feedback from pilot tests during the predeployment phase,
repeating as needed.
Evaluate the logical and physical architecture of your applications, and identify opportunities to automate the
deployment of application resources or improve portions of the architecture using other cloud-based
resources.
Update the Architecture Guidelines document to include deployment and user adoption plans, and distribute to
key stakeholders.
Continue to educate the people and teams most affected by the architecture guidelines.
Potential activities:
Define a continuous integration and continuous deployment (CI/CD) pipeline to fully manage releasing updates to your application through your development, QA, and production environments.
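
A hedged Azure CLI sketch of keeping secrets in Azure Key Vault instead of in deployment artifacts (the vault and secret names are hypothetical):

    # Create a Key Vault and store a connection string as a secret.
    az keyvault create --name "contoso-app1-kv" --resource-group "app1-prod-rg" --location "eastus"
    az keyvault secret set --vault-name "contoso-app1-kv" --name "App1DbConnection" --value "<connection-string>"

    # Deployment scripts and pipelines retrieve the secret at run time rather than
    # embedding it in source control.
    az keyvault secret show --vault-name "contoso-app1-kv" --name "App1DbConnection" --query value --output tsv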

Adopt and migrate


Migration is an incremental process that focuses on the movement, testing, and adoption of applications or
workloads in an existing digital estate.
Minimum suggested activities:
Migrate your Deployment Acceleration toolchain from development to production.
Update the Architecture Guidelines document and distribute to key stakeholders.
Develop educational materials and documentation, awareness communications, incentives, and other programs
to help drive developer and IT adoption.
Potential activities:
Validate that the best practices defined during the build and predeployment phases are properly executed.
Ensure that each application or workload aligns with the Deployment Acceleration strategy before release.

Operate and post-implementation


Once the transformation is complete, governance and operations must live on for the natural lifecycle of an
application or workload. This phase of governance maturity focuses on the activities that commonly come after the
solution is implemented and the transformation cycle begins to stabilize.
Minimum suggested activities:
Customize your Deployment Acceleration toolchain based on your organization's changing deployment and configuration needs.
Automate notifications and reports to alert you of potential configuration issues or malicious threats.
Monitor and report on application and resource usage.
Report on post-deployment metrics and distribute to stakeholders.
Revise the Architecture Guidelines to guide future adoption processes.
Continue to communicate with and train the affected people and teams on a regular basis to ensure ongoing
adherence to Architecture Guidelines.
Potential activities:
Configure a desired state configuration monitoring and reporting tool.
Regularly review configuration tools and scripts to improve processes and identify common issues.
Work with development, operations, and security teams to help mature DevSecOps practices and break down
organizational silos that lead to inefficiencies.

Next steps
Now that you understand the concept of Deployment Acceleration governance, examine the Deployment Acceleration toolchain to identify Azure tools and features that you'll need when developing the Deployment Acceleration governance discipline on the Azure platform.
Deployment Acceleration toolchain for Azure
Deployment Acceleration tools in Azure

Deployment Acceleration is one of the Five Disciplines of Cloud Governance. This discipline focuses on ways of
establishing policies to govern asset configuration or deployment. Within the Five Disciplines of Cloud
Governance, the Deployment Acceleration discipline involves deployment and configuration alignment. This
could be through manual activities or fully automated DevOps activities. In either case, the policies involved
would remain largely the same.
Cloud custodians, cloud guardians, and cloud architects with an interest in governance are each likely to invest a
lot of time in the Deployment Acceleration discipline, which codifies policies and requirements across multiple
cloud adoption efforts. The tools in this toolchain are important to the cloud governance team and should be a
high priority on the learning path for the team.
The following is a list of Azure tools that can help mature the policies and processes that support this governance
discipline.

| Activity | Azure Policy | Azure Management Groups | Azure Resource Manager | Azure Blueprints | Azure Resource Graph | Azure Cost Management |
|---|---|---|---|---|---|---|
| Implement corporate policies | Yes | No | No | No | No | No |
| Apply policies across subscriptions | Required | Yes | No | No | No | No |
| Deploy defined resources | No | No | Yes | No | No | No |
| Create fully compliant environments | Required | Required | Required | Yes | No | No |
| Audit policies | Yes | No | No | No | No | No |
| Query Azure resources | No | No | No | No | Yes | No |
| Report on cost of resources | No | No | No | No | No | Yes |

The following are additional tools that may be required to accomplish specific Deployment Acceleration
objectives. Often these tools are used outside of the governance team, but are still considered an aspect of
Deployment Acceleration as a discipline.
| Activity | Azure Portal | Azure Resource Manager | Azure Policy | Azure DevOps | Azure Backup | Azure Site Recovery |
|---|---|---|---|---|---|---|
| Manual deployment (single asset) | Yes | Yes | No | Not efficiently | No | Yes |
| Manual deployment (full environment) | Not efficiently | Yes | No | Not efficiently | No | Yes |
| Automated deployment (full environment) | No | Yes | No | Yes | No | Yes |
| Update configuration of a single asset | Yes | Yes | Not efficiently | Not efficiently | No | Yes - during replication |
| Update configuration of a full environment | Not efficiently | Yes | Yes | Yes | No | Yes - during replication |
| Manage configuration drift | Not efficiently | Not efficiently | Yes | Yes | No | Yes - during replication |
| Create an automated pipeline to deploy code and configure assets (DevOps) | No | No | No | Yes | No | No |

Aside from the Azure native tools mentioned above, it is common for customers to use third-party tools to
facilitate Deployment Acceleration and DevOps deployments.
Delivering on a cloud strategy requires solid planning, readiness, and adoption. But it's the ongoing operation of the digital
assets that delivers tangible business outcomes. Without a plan for reliable, well-managed operations of the cloud solutions,
those efforts will yield little value. The following exercises help develop the business and technical approaches needed to provide
cloud management that powers ongoing operations.

Getting started
To prepare you for this phase of the cloud adoption lifecycle, the framework suggests the following exercises:

Establish a management baseline


Define the criticality classifications, cloud management tools, and processes required to deliver your minimum commitment
to operations management.

Define business commitments


Document supported workloads to establish operational commitments with the business and agree on cloud management
investments for each workload.

Expand the management baseline


Based on business commitments and operations decisions, make use of the included best practices to implement the required
cloud management tooling.

Advanced operations and design principles


Platforms or workloads that require a higher level of business commitment might require a deeper architecture review to deliver on resiliency and reliability commitments.

Scalable cloud management methodology


The preceding steps create actionable approaches to deliver on the Cloud Adoption Framework's Manage methodology.
Create a balanced cloud portfolio
As outlined in the business alignment article, not all workloads are mission critical. Within any portfolio are various degrees of
operational management needs. Business alignment efforts aid in capturing the business impact and negotiating management
costs with the business, to ensure the most appropriate operational management processes and tools.

Objective of this content


The guidance in this section of the Cloud Adoption Framework serves two purposes:
Provides examples of actionable operations management approaches that represent common experiences often encountered
by customers.
Helps you create personalized management solutions based on business commitments.
This content is intended for use by the cloud operations team. It's also relevant to cloud architects who need to develop a strong
foundation in cloud operations or cloud design principles.

Intended audience
The content in the Cloud Adoption Framework affects the business, technology, and culture of enterprises. This section of the
Cloud Adoption Framework interacts heavily with IT operations, IT governance, finance, line-of-business leaders, networking,
identity, and cloud adoption teams. Various dependencies on these personnel require a facilitative approach by the cloud
architects who are using this guidance. Facilitation with these teams is seldom a one-time effort.
The cloud architect serves as the thought leader and facilitator to bring these audiences together. The content in this collection of
guides is designed to help the cloud architect facilitate the right conversation, with the right audience, to drive necessary
decisions. Business transformation that's empowered by the cloud depends on the cloud architect to help guide decisions
throughout the business and IT.
Cloud architect specialization in this section: Each section of the Cloud Adoption Framework represents a different
specialization or variant of the cloud architect role. This section of the Cloud Adoption Framework is designed for cloud
architects with a passion for operations and management of deployment solutions. Within this framework, these specialists are
referred to frequently as cloud operations, or collectively as the cloud operations team.

Use this guide


If you want to follow this guide from beginning to end, this content aids in developing a robust cloud operations strategy. The
guidance walks you through the theory and implementation of such a strategy.
Next steps
Apply the methodology to establish clear business commitments.
Establish clear business commitments

Azure management guide: Before you start


NOTE
This guide is a starting point for management guidance in the Cloud Adoption Framework. It is also available in the Azure Quickstart Center. See the tip later in this article for a link to the Azure Quickstart Center.

Before you start


The Azure Management Guide helps Azure customers create a management baseline to establish resource
consistency across Azure. This guide outlines the basic tools needed for any Azure production environments,
especially environments that host sensitive data. For more information, best practices, and considerations related
to preparing your cloud environment, see the Cloud Adoption Framework's readiness section.

Scope of this guide


This guide teaches you how to establish tooling for a management baseline. It also outlines ways to extend the
baseline or build resiliency beyond the baseline.
Inventory and visibility: Create an inventory of assets across multiple clouds. Develop visibility into the run
state of each asset.
Operational compliance: Establish controls and processes to ensure each state is properly configured and
running in a well-governed environment.
Protect and recover: Ensure all managed assets are protected and can be recovered using baseline
management tooling.
Enhanced baseline options: Evaluate common additions to the baseline that might meet business needs.
Platform operations: Extend the management baseline with a well-defined service catalog and centrally
managed platforms.
Workload operations: Extend the management baseline to include a focus on mission-critical workloads.

Management baseline
A management baseline is the minimum set of tools and processes that should be applied to every asset in an
environment. Several additional options can be included in the management baseline. The next few articles
accelerate cloud management capabilities by focusing on the minimum options necessary instead of on all of the
available options.

TIP
For an interactive experience, view this guide in the Azure portal. Go to Azure Quickstart Center in the Azure portal and
select Azure Management Guide. Then follow the step-by-step instructions.

The next step is Inventory and visibility.


This guide provides interactive steps that let you try features as they're introduced. To come back to where you left
off, use the breadcrumb for navigation.
Inventory and visibility in Azure

Inventory and visibility is the first of three disciplines in a cloud management baseline.

This discipline comes first because collecting proper operational data is vital when you make decisions about
operations. Cloud management teams must understand what is managed and how well those assets are operated.
This article describes the different tools that provide both an inventory and visibility into the inventory's run state.
For any enterprise-grade environment, the following table outlines the suggested minimum for a management
baseline.

Process | Tool | Purpose
Monitor health of Azure services | Azure Service Health | Health, performance, and diagnostics for services running in Azure
Log centralization | Log Analytics | Central logging for all visibility purposes
Monitoring centralization | Azure Monitor | Central monitoring of operational data and trends
Virtual machine inventory and change tracking | Azure Change Tracking and Inventory | Inventory of VMs and monitoring of changes at the guest OS level
Subscription monitoring | Azure Activity Log | Monitoring of changes at the subscription level
Guest OS monitoring | Azure Monitor for VMs | Monitoring of changes and performance of VMs
Network monitoring | Azure Network Watcher | Monitoring of network changes and performance
DNS monitoring | DNS Analytics | Security, performance, and operations of DNS

Azure Service Health
Azure Service Health provides a personalized view of the health of your Azure services and regions. Information
about active issues is posted to Service Health to help you understand the effect on your resources. Regular
updates keep you informed as issues are resolved.
We also publish planned maintenance events to Service Health so you'll know about changes that can affect
resource availability. Set up Service Health alerts to notify you when service issues, planned maintenance, or other
changes might affect your Azure services and regions.
Azure Service Health includes:
Azure status: A global view of the health of Azure services.
Service health: A personalized view of the health of your Azure services.
Resource health: A deeper view of the health of your individual resources.
Action
To set up a Service Health alert:
1. Go to Service Health.
2. Select Health alerts.
3. Create a service health alert.

To set up a Service Health alert, go to the Azure portal.
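The same alert can also be scripted. The following Azure CLI sketch creates an activity log alert scoped to service health events and attaches an existing action group; all names are hypothetical placeholders, and flag names can vary by CLI version.

# Sketch only: alert on service health events for a subscription.
az monitor activity-log alert create \
  --resource-group rg-management \
  --name service-health-alert \
  --scope "/subscriptions/<subscription-id>" \
  --condition category=ServiceHealth \
  --description "Notify on Azure service issues and planned maintenance"

# Attach an existing action group so the alert sends notifications.
az monitor activity-log alert action-group add \
  --resource-group rg-management \
  --name service-health-alert \
  --action-group "/subscriptions/<subscription-id>/resourceGroups/rg-management/providers/microsoft.insights/actionGroups/ops-team"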


Learn more
To learn more, see the Azure Service Health documentation.

Log Analytics
A Log Analytics workspace is a unique environment for storing Azure Monitor log data. Each workspace has its
own data repository and configuration. Data sources and solutions are configured to store their data in particular
workspaces. Azure monitoring solutions require all servers to be connected to a workspace, so that their log data
can be stored and accessed.
Action
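Create a Log Analytics workspace for the management baseline. The following Azure CLI commands are a minimal sketch; the resource group and workspace names are hypothetical placeholders.

# Sketch only: create a resource group and a Log Analytics workspace.
az group create --name rg-management --location eastus

az monitor log-analytics workspace create \
  --resource-group rg-management \
  --workspace-name law-management-baseline \
  --location eastus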

Learn more
To learn more, see the Log Analytics workspace creation documentation.

Azure Monitor
Azure Monitor provides a single unified hub for all monitoring and diagnostics data in Azure and gives you
visibility across your resources. With Azure Monitor, you can find and fix problems and optimize performance. You
can also understand customer behavior.
Monitor and visualize metrics. Metrics are numerical values available from Azure resources. They help
you understand the health of your systems. Customize charts for your dashboards, and use workbooks for
reporting.
Query and analyze logs. Logs include activity logs and diagnostic logs from Azure. Collect additional logs
from other monitoring and management solutions for your cloud or on-premises resources. Log Analytics
provides a central repository to aggregate all of this data. From there, you can run queries to help
troubleshoot issues or to visualize data.
Set up alerts and actions. Alerts notify you of critical conditions. Corrective actions can be taken based on
triggers from metrics, logs, or service-health issues. You can set up different notifications and actions and
can also send data to your IT service management tools.
Action

Start monitoring your:


Applications
Containers
Virtual machines
Networks
To monitor other resources, find additional solutions in Azure Marketplace.
To explore Azure Monitor, go to the Azure portal.
Learn more
To learn more, see Azure Monitor documentation.
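Metrics and logs can also be queried from the command line. The following Azure CLI sketch reads a platform metric from a VM and runs a log query against a workspace; the resource ID, workspace GUID, and query text are hypothetical examples.

# Sketch only: retrieve CPU metrics for a VM at five-minute granularity.
az monitor metrics list \
  --resource "/subscriptions/<subscription-id>/resourceGroups/rg-app/providers/Microsoft.Compute/virtualMachines/vm-app01" \
  --metric "Percentage CPU" \
  --interval PT5M

# Sketch only: run a Log Analytics query against a workspace (by workspace GUID).
az monitor log-analytics query \
  --workspace "<workspace-guid>" \
  --analytics-query "Heartbeat | summarize heartbeats = count() by Computer | top 10 by heartbeats"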

Onboard solutions
To enable solutions, you need to configure the Log Analytics workspace. Onboarded Azure VMs and on-premises
servers get the solutions from the Log Analytics workspaces they're connected to.
There are two approaches to onboarding:
Single VM
Entire subscription
Each article guides you through a series of steps to onboard these solutions:
Update Management
Change Tracking and Inventory
Azure Activity Log
Azure Log Analytics Agent Health
Antimalware Assessment
Azure Monitor for VMs
Azure Security Center
Each of the previous steps helps establish inventory and visibility.
Operational compliance in Azure

Operational compliance is the second discipline in any cloud management baseline.

Improving operational compliance reduces the likelihood of an outage related to configuration drift or
vulnerabilities related to systems being improperly patched.
For any enterprise-grade environment, this table outlines the suggested minimum for a management baseline.

Process | Tool | Purpose
Patch management | Update Management | Management and scheduling of updates
Policy enforcement | Azure Policy | Policy enforcement to ensure environment and guest compliance
Environment configuration | Azure Blueprints | Automated compliance for core services
Resource configuration | Desired State Configuration | Automated configuration of the guest OS and some aspects of the environment

Update Management
Computers that are managed by Update Management use the following configurations to do assessment and
update deployments:
Microsoft Monitoring Agent (MMA) for Windows or Linux
PowerShell Desired State Configuration (DSC) for Linux
Azure Automation Hybrid Runbook Worker
Microsoft Update or Windows Server Update Services (WSUS) for Windows computers
For more information, see Update Management solution.

WARNING
Before using Update Management, you must onboard virtual machines or an entire subscription into Log Analytics and
Azure Automation.
There are two approaches to onboarding:
Single VM
Entire subscription
You should follow one before proceeding with Update Management.

Manage updates
To apply a policy to a resource group:
1. Go to Azure Automation.
2. Select Automation accounts, and choose one of the listed accounts.
3. Go to Configuration Management.
4. Inventory, Change Management, and State Configuration can be used to control the state and operational
compliance of the managed VMs.

Azure Policy
Azure Policy is used throughout governance processes. It's also highly valuable within cloud management
processes. Azure Policy can audit and remediate Azure resources and can also audit settings inside a machine. The
validation is performed by the Guest Configuration extension and client. The extension, through the client, validates
settings like:
Operating system configuration.
Application configuration or presence.
Environment settings.
Azure Policy Guest Configuration currently only audits settings inside the machine. It doesn't apply configurations.
Action
Assign a built-in policy to a management group, subscription, or resource group.
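As a minimal example, the following Azure CLI sketch assigns a built-in audit policy at subscription scope. The policy display name shown is a commonly used built-in definition; the assignment name and subscription ID are placeholders.

# Sketch only: look up a built-in policy definition and assign it to a subscription.
definition=$(az policy definition list \
  --query "[?displayName=='Audit VMs that do not use managed disks'].name" -o tsv)

az policy assignment create \
  --name audit-vm-managed-disks \
  --policy "$definition" \
  --scope "/subscriptions/<subscription-id>"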

Apply a policy
To apply a policy to a resource group:
1. Go to Azure Policy.
2. Select Assign a policy.
Learn more
To learn more, see:
Azure Policy
Azure Policy - Guest configuration
Cloud Adoption Framework: Policy enforcement decision guide

Azure Blueprints
With Azure Blueprints, cloud architects and central information-technology groups can define a repeatable set of
Azure resources. These resources implement and adhere to an organization's standards, patterns, and
requirements.
With Azure Blueprints, development teams can rapidly build and stand up new environments. Teams can also trust
they're building within organizational compliance. They do so by using a set of built-in components like networking
to speed up development and delivery.
Blueprints are a declarative way to orchestrate the deployment of different resource templates and other artifacts
like:
Role assignments.
Policy assignments.
Azure Resource Manager templates.
Resource groups.
Applying a blueprint can enforce operational compliance in an environment if this enforcement isn't done by the
cloud governance team.
Create a blueprint
To create a blueprint:
1. Go to Blueprints - Getting started.
2. On the Create a Blueprint pane, select Create.
3. Filter the list of blueprints to select the appropriate blueprint.
4. In the Blueprint name box, enter the blueprint name.
5. Select Definition location, and choose the appropriate location.
6. Select Next : Artifacts >>, and review the artifacts included in the blueprint.
7. Select Save Draft.
Publish a blueprint
To publish blueprint artifacts to your subscription:
1. Go to Blueprints - Blueprint definitions.
2. Select the blueprint you created in the previous steps.
3. Review the blueprint definition and select Publish blueprint.
4. In the Version box, enter a version like "1.0".
5. In the Change notes box, enter your notes.
6. Select Publish.
Learn more
To learn more, see:
Azure Blueprints
Cloud Adoption Framework: Resource consistency decision guide
Standards-based blueprints samples
Protect and recover in Azure

Protect and recover is the third and final discipline in any cloud-management baseline.

In Operational compliance in Azure, the objective is to reduce the likelihood of a business interruption. The current article aims to reduce the duration and impact of outages that can't be prevented.
For any enterprise-grade environment, this table outlines the suggested minimum for any management baseline:

Process | Tool | Purpose
Protect data | Azure Backup | Back up data and virtual machines in the cloud
Protect the environment | Azure Security Center | Strengthen security and provide advanced threat protection across your hybrid workloads

Azure Backup
With Azure Backup, you can back up, protect, and recover your data in the Microsoft cloud. Azure Backup replaces
your existing on-premises or offsite backup solution with a cloud-based solution. This new solution is reliable,
secure, and cost competitive. Azure Backup can also help protect and recover on-premises assets through one
consistent solution.
Enable backup for an Azure VM
1. In the Azure portal, select Virtual machines, and select the VM you want to replicate.
2. On the Operations pane, select Backup.
3. Create or select an existing Azure Recovery Services vault.
4. Select Create (or edit) a new policy.
5. Configure the schedule and retention period.
6. Select OK.
7. Select Enable Backup.
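The same protection can be scripted. The following Azure CLI sketch creates a Recovery Services vault and enables backup for a single VM using the vault's default policy; the vault, resource group, and VM names are hypothetical placeholders.

# Sketch only: create a Recovery Services vault and protect one VM with it.
az backup vault create \
  --resource-group rg-management \
  --name rsv-baseline-backup \
  --location eastus

az backup protection enable-for-vm \
  --resource-group rg-management \
  --vault-name rsv-baseline-backup \
  --vm vm-app01 \
  --policy-name DefaultPolicy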
Azure Site Recovery
Azure Site Recovery is a critical component in your disaster recovery strategy.
Site Recovery replicates VMs and workloads that are hosted in a primary Azure region. It replicates them to a copy
that is hosted in a secondary region. When an outage occurs in your primary region, you fail over to the copy
running in the secondary region. You then continue to access your applications and services from there. This
proactive approach to recovery can significantly reduce recovery times. When the recovery environment is no
longer needed, production traffic can fall back to the original environment.
Replicate an Azure VM to another region with Site Recovery
The following steps outline the process to use Site Recovery for Azure-to-Azure replication, which is replication of
an Azure VM to another region.

TIP
Depending on your scenario, the exact steps might differ slightly.

Enable replication for the Azure VM


1. In the Azure portal, select Virtual machines, and select the VM you want to replicate.
2. On the Operations pane, select Disaster recovery.
3. Select Configure disaster recovery > Target region, and choose the target region to which you'll replicate.
4. For this quickstart, accept the default values for all other options.
5. Select Enable replication, which starts a job to enable replication for the VM.

Verify settings
After the replication job has finished, you can check the replication status, verify replication health, and test the
deployment.
1. In the VM menu, select Disaster recovery.
2. Verify replication health, the recovery points that have been created, and source and target regions on the map.

Learn more
Azure Site Recovery overview
Replicate an Azure VM to another region
Enhanced management baseline in Azure

The first three cloud management disciplines describe a management baseline. The preceding articles in this guide
outline a minimum viable product (MVP) for cloud management services, which is referred to as a management
baseline. This article outlines a few common improvements to the baseline.
The purpose of a management baseline is to create a consistent offering that provides a minimum level of business
commitment for all supported workloads. With this baseline of common, repeatable management offerings, the
team can deliver highly optimized operational management with minimal deviation.
However, you might need a greater commitment to the business beyond the standard offering. The following
image and list show three ways to go beyond the management baseline.

Workload operations:
Largest per-workload operations investment.
Highest degree of resiliency.
Suggested for the approximately 20% of workloads that drive business value.
Typically reserved for high-criticality or mission-critical workloads.
Platform operations:
Operations investment is spread across many workloads.
Resiliency improvements affect all workloads that use the defined platform.
Suggested for the approximately 20% of platforms that have highest criticality.
Typically reserved for medium-criticality to high-criticality workloads.
Enhanced management baseline:
Lowest relative operations investment.
Slightly improved business commitments using additional cloud-native operations tools and processes.
Both workload operations and platform operations require changes to design and architecture principles. Those
changes can take time and might result in increased operating expenses. To reduce the number of workloads that
require such investments, an enhanced management baseline can provide enough of an improvement to the
business commitment.
This table outlines a few processes, tools, and potential effects common in customers' enhanced management
baselines:

Discipline | Process | Tool | Potential impact | Learn more
Inventory and visibility | Service change tracking | Azure Resource Graph | Greater visibility into changes to Azure services might help detect negative effects sooner or remediate faster | Overview of Azure Resource Graph
Inventory and visibility | IT service management (ITSM) integration | IT Service Management Connector | Automated ITSM connection creates awareness sooner | IT Service Management Connector (ITSMC)
Operational compliance | Operations automation | Azure Automation | Automate operational compliance for a faster and more accurate response to change | See the following sections
Operational compliance | Multicloud operations | Azure Automation Hybrid Runbook Worker | Automate operations across multiple clouds | Hybrid Runbook Worker overview
Operational compliance | Guest automation | Desired State Configuration (DSC) | Code-based configuration of guest operating systems to reduce errors and configuration drift | DSC overview
Protect and recover | Breach notification | Azure Security Center | Extend protection to include security-breach recovery triggers | See the following sections
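For example, Azure Resource Graph queries can be run from the Azure CLI to improve inventory visibility. The following sketch requires the resource-graph extension; the query itself is illustrative only.

# Sketch only: summarize virtual machines by region with Azure Resource Graph.
az extension add --name resource-graph

az graph query -q "Resources
| where type =~ 'microsoft.compute/virtualmachines'
| summarize vmCount = count() by location"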

Azure Automation
Azure Automation provides a centralized system for the management of automated controls. In Azure Automation,
you can run simple remediation, scale, and optimization processes in response to environmental metrics. These
processes reduce the overhead associated with manual incident processing.
Most importantly, automated remediation can be delivered in near real time, significantly reducing interruptions to business processes. Studying your most common business interruptions helps identify the activities within your environment that could be automated.
Runbooks
The basic unit of code for delivering automated remediation is a runbook. Runbooks contain the instructions for
remediating or recovering from an incident.
To create or manage runbooks:
1. Go to Azure Automation.
2. Select Automation accounts and choose one of the listed accounts.
3. Go to Process automation.
4. With the options presented, you can create or manage runbooks, schedules, and other automated remediation
functionality.
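Runbooks can also be created and published from the command line. The following Azure CLI sketch uses the automation extension; the account and runbook names are hypothetical placeholders, and flag names can vary by extension version.

# Sketch only: create and publish an empty PowerShell runbook for remediation logic.
az extension add --name automation

az automation runbook create \
  --resource-group rg-management \
  --automation-account-name aa-management \
  --name Restart-UnhealthyAppService \
  --type PowerShell

az automation runbook publish \
  --resource-group rg-management \
  --automation-account-name aa-management \
  --name Restart-UnhealthyAppService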

Azure Security Center
Azure Security Center also plays an important part in your protect-and-recover strategy. It can help you monitor
the security of your machines, networks, storage, data services, and applications.
Azure Security Center provides advanced threat detection by using machine learning and behavioral analytics to
help identify active threats targeting your Azure resources. It also provides threat protection that blocks malware
and other unwanted code, and it reduces the surface area exposed to brute force and other network attacks.
When Azure Security Center identifies a threat, it triggers a security alert with steps you need for responding to an
attack. It also provides a report with information about the detected threat.
Azure Security Center is offered in two tiers: Free and Standard. Features like security recommendations are
available in the Free tier. The Standard tier provides additional protection like advanced threat detection and
protection across hybrid cloud workloads.
Action
Try Standard tier for free for your first 30 days
After you enable and configure security policies for a subscription's resources, you can view the security state of
your resources and any issues on the Prevention pane. You can also view a list of those issues on the
Recommendations tile.
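The Standard tier can also be enabled per resource type from the command line. The following Azure CLI sketch enables it for virtual machines; available resource type names can be listed with az security pricing list.

# Sketch only: enable the Standard pricing tier for virtual machines.
az security pricing create --name VirtualMachines --tier Standard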

To explore Azure Security Center, go to the Azure portal.


Learn more
To learn more, see Azure Security Center documentation.
Platform specialization for cloud management

Much like the enhanced management baseline, platform specialization is an extension beyond the standard management baseline. The following list shows the ways to expand the management baseline. This article addresses the platform specialization options.

Workload operations: The largest per-workload operations investment and the highest degree of resiliency.
We suggest workload operations for the approximately 20% of workloads that drive business value. This
specialization is usually reserved for high criticality or mission-critical workloads.
Platform operations: Operations investment is spread across many workloads. Resiliency improvements
affect all workloads that use the defined platform. We suggest platform operations for the approximately 20%
of platforms that have the highest criticality. This specialization is usually reserved for medium to high criticality
workloads.
Enhanced management baseline: The lowest relative operations investment. This specialization slightly
improves business commitments by using additional cloud-native operations tools and processes.
Both workload and platform operations require changes to design and architecture principles. Those changes can
take time and might result in increased operating expenses. To reduce the number of workloads requiring such
investments, an enhanced management baseline might provide enough of an improvement to the business
commitment.
This table outlines a few processes and tools commonly used to support platform specialization and platform operations:

Process | Tool | Purpose | Suggested management level
Improve system design | Azure Architecture Framework | Improving the architectural design of the platform to improve operations | N/A
Automate remediation | Azure Automation | Responding to advanced platform data with platform-specific automation | Platform operations
Service catalog | Managed applications center | Providing a self-service catalog of approved solutions that meet organizational standards | Platform operations
Container performance | Azure Monitor for containers | Monitoring and diagnostics of containers | Platform operations
Platform as a service (PaaS) data performance | Azure SQL Analytics | Monitoring and diagnostics for PaaS databases | Platform operations
Infrastructure as a service (IaaS) data performance | SQL Server Health Check | Monitoring and diagnostics for IaaS databases | Platform operations

High-level process
Platform specialization consists of a disciplined execution of the following four processes in an iterative approach.
Each process is explained in more detail in later sections of this article.
Improve system design: Improve the design of common systems or platforms to effectively minimize
interruptions.
Automate remediation: Some improvements aren't cost effective. In such cases, it might make more sense to
automate remediation and reduce the effect of interruptions.
Scale the solution: As systems design and automated remediation are improved, those changes can be scaled
across the environment through the service catalog.
Continuous improvement: Different monitoring tools can be used to discover incremental improvements.
These improvements can be addressed in the next pass of system design, automation, and scale.

Improve system design
Improving system design is the most effective approach to improving operations of any common platform.
Through system-design improvements, stability can increase and business interruptions can decrease. Design of
individual systems is out of scope for the environment view that is taken throughout Cloud Adoption Framework
for Azure.
As a complement to Cloud Adoption Framework, Azure Architecture Framework provides best practices for
improving the resiliency and design of a specific system. Those design improvements can be applied to the
systems design of either a platform or a specific workload.
Azure Architecture Framework focuses on improvement across five pillars of system design:
Scalability: Scaling the common platform assets to handle increased load
Availability: Reducing business interruptions by improving uptime potential
Resiliency: Improving recovery times to reduce the duration of interruptions
Security: Protecting applications and data from external threats
Management: Operational processes specific to those common platform assets
Technical debt and architectural flaws cause most business interruptions. For existing deployments, you can view
system-design improvements as payments against existing technical debt. For new deployments, you can view
those improvements as avoidance of technical debt.
The next section, Automated remediation, shows ways to remediate technical debt that can't or shouldn't be addressed through design changes.
Learn more about Azure Architecture Framework to improve system design.
As system design improves, return to this article to find new opportunities to improve and scale those
improvements across your environment.

Automated remediation
Some technical debt can't be addressed. Resolution might be too expensive to correct or might be planned but
have a long project duration. The business interruption might not have a significant business effect. Or the
business priority might be to recover quickly instead of investing in resiliency.
When resolution of technical debt isn't the desired approach, automated remediation is commonly the next step.
Using Azure Automation and Azure Monitor to detect trends and provide automated remediation is the most
common approach to automated remediation.
For guidance on automated remediation, see Azure Automation and alerts.

Scale the solution with a service catalog
A well-managed service catalog is the cornerstone of platform specialization and platform operations. Use of a
catalog is how improvements to systems design and remediation are scaled across an environment.
The cloud platform team and cloud automation team align to create repeatable solutions to the most common
platforms in any environment. But if those solutions aren't consistently used, cloud management can provide little
more than a baseline offering.
To maximize adoption and minimize maintenance overhead of any optimized platform, you should add the
platform to an Azure service catalog. You can deploy each application in the catalog for internal consumption via
the service catalog or as a marketplace offering for external consumers.
For instructions on publishing to a service catalog, see the article series on publishing to a service catalog.
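As an illustration, the following Azure CLI sketch publishes a packaged solution to the service catalog as a managed application definition. The names, the authorization principal and role IDs, and the package URI are hypothetical placeholders.

# Sketch only: publish a managed application definition to the service catalog.
az managedapp definition create \
  --resource-group rg-service-catalog \
  --name optimized-sql-platform \
  --display-name "Optimized SQL platform" \
  --description "Preapproved SQL Server deployment with baseline operations" \
  --lock-level ReadOnly \
  --authorizations "<principal-id>:<role-definition-id>" \
  --package-file-uri "https://<storage-account>.blob.core.windows.net/packages/sqlplatform.zip"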
Deploy applications from the service catalog
1. In the Azure portal, go to Managed applications center (preview).
2. On the Browse pane, select Service Catalog applications.
3. Click + Add to choose an application definition from your company's service catalog.
Any managed applications you're servicing are displayed.

Manage service catalog applications


1. In the Azure portal, go to Managed applications center (preview).
2. On the Service pane, select Service Catalog applications.
Any managed applications you're servicing are displayed.

Continuous improvement
Platform specialization and platform operations both depend on strong feedback loops among adoption, platform,
automation, and management teams. Grounding those feedback loops in data helps each team make wise
decisions. For platform operations to achieve long-term business commitments, it's important to use insights
specific to the centralized platform.
Containers and SQL Server are the two most common centrally managed platforms. These articles can help you
get started with continuous-improvement data collection on those platforms:
Container performance
PaaS database performance
IaaS database performance
Workload specialization for cloud management

Workload specialization builds on the concepts outlined in Platform Specialization.

Workload operations: The largest per-workload operations investment and highest degree of resiliency. We
suggest workload operations for the approximately 20% of workloads that drive business value. This
specialization is usually reserved for high criticality or mission-critical workloads.
Platform operations: Operations investment is spread across many workloads. Resiliency improvements
affect all workloads that use the defined platform. We suggest platform operations for the approximately 20% of
platforms that have the highest criticality. This specialization is usually reserved for medium to high criticality
workloads.
Enhanced management baseline: The lowest relative operations investment. This specialization slightly
improves business commitments by using additional cloud-native operations tools and processes.

High-level process
Workload specialization consists of a disciplined execution of the following four processes in an iterative approach.
Each process is explained in more detail in Platform Specialization.
Improve system design: Improve the design of a specific workload to effectively minimize interruptions.
Automate remediation: Some improvements aren't cost effective. In such cases, it might make more sense to
automate remediation and reduce the effect of interruptions.
Scale the solution: As you improve systems design and automated remediation, you can scale those changes
across the environment through the service catalog.
Continuous improvement: You can use different monitoring tools to discover incremental improvements.
These improvements can be addressed in the next pass of system design, automation, and scale.

Cultural change
Workload specialization often triggers a cultural change in traditional IT build processes that focus on delivering a
management baseline, enhanced baselines, and platform operations. Those types of offerings can be scaled across
the environment. Workload specialization is similar in execution to platform specialization. But unlike common
platforms, the specialization required by individual workloads often doesn't scale.
When workload specialization is required, operational management commonly evolves beyond a central IT
perspective. The approach suggested in Cloud Adoption Framework is a distribution of cloud management
functionality.
In this model, operational tasks like monitoring, deployment, DevOps, and other innovation-focused functions shift
to an application-development or business-unit organization. The cloud platform team and the core cloud monitoring team still deliver the management baseline across the environment.
Those centralized teams also guide and instruct workload-specialized teams on operations of their workloads. But
the day-to-day operational responsibility falls on a cloud management team that is managed outside of IT. This
type of distributed control is one of the primary indicators of maturity in a cloud center of excellence.

Beyond platform specialization: Application Insights


Greater detail on the specific workload is required to provide clear workload operations. During the continuous
improvement phase, Application Insights will be a necessary addition to the cloud management toolchain.

Requirement | Tool | Purpose
Application monitoring | Application Insights | Monitoring and diagnostics for apps
Performance, availability, and usage | Application Insights | Advanced application monitoring with the application dashboard, composite maps, usage, and tracing

Deploy Application Insights


1. In the Azure portal, go to Application Insights.
2. Select + Add to create an Application Insights resource to monitor your live web application.
3. Follow the on-screen prompts.
See the Azure Monitor Application Insights hub for guidance on configuring your application for monitoring.
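Application Insights resources can also be created from the command line. The following Azure CLI sketch uses the application-insights extension; the resource names are hypothetical placeholders.

# Sketch only: create an Application Insights resource for a web workload.
az extension add --name application-insights

az monitor app-insights component create \
  --app appi-workload-prod \
  --location eastus \
  --resource-group rg-workload \
  --application-type web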

Monitor performance, availability, and usage


1. In the Azure portal, search for Application Insights.
2. Choose one of the Application Insights resources from the list.
Application Insights contains different kinds of options for monitoring performance, availability, usage, and
dependencies. Each of these views of the application data provides clarity into the continuous-improvement
feedback loop.
Overview of Azure server management services

Azure server management services provide a consistent experience for managing servers at scale. These services
cover both Linux and Windows operating systems. They can be used in production, development, and test
environments. The server management services can support Azure IaaS virtual machines, physical servers, and
virtual machines that are hosted on-premises or in other hosting environments.
The Azure server management services suite includes the services in the following diagram:

This section of the Microsoft Cloud Adoption Framework provides an actionable and prescriptive plan for
deploying server management services in your environment. This plan helps orient you quickly to these services,
guiding you through an incremental set of management stages for all environment sizes.
For simplicity, we've categorized this guidance into three stages: prerequisite planning, onboarding, and ongoing management operations.

Why use Azure server management services?


Azure server management services offer the following benefits:
Native to Azure: Server management services are built into and natively integrated with Azure Resource
Manager. These services are continuously improved to provide new features and capabilities.
Windows and Linux: Windows and Linux machines get the same consistent management experience.
Hybrid: The server management services cover Azure IaaS virtual machines as well as physical and virtual
servers that are hosted on-premises or in other hosting environments.
Security: Microsoft devotes substantial resources to all forms of security. This investment not only protects the
Azure infrastructure but also extends the resulting technologies and expertise to protect customers' resources
wherever they reside.

Next steps
Familiarize yourself with the tools, services, and planning involved with adopting the Azure server management
suite.
Prerequisite tools and planning
Phase 1: Prerequisite planning for Azure server
management services

In this phase, you'll become familiar with the Azure server management suite of services, and plan how to deploy
the resources needed to implement these management solutions.

Understand the tools and services


Review Azure server management tools and services for a detailed overview of:
The management areas that are involved in ongoing Azure operations.
The Azure services and tools that help support you in these areas.
You'll use several of these services together to meet your management requirements. These tools are referenced
often throughout this guidance.
The following sections discuss the planning and preparation required to use these tools and services.

Log Analytics workspace and Automation account planning


Many of the services you'll use to onboard Azure management services require a Log Analytics workspace and a
linked Azure Automation account.
A Log Analytics workspace is a unique environment for storing Azure Monitor log data. Each workspace has its
own data repository and configuration. Data sources and solutions are configured to store their data in particular
workspaces. Azure monitoring solutions require all servers to be connected to a workspace, so that their log data
can be stored and accessed.
Some of the management services require an Azure Automation account. You use this account, and the
capabilities of Azure Automation, to integrate Azure services and other public systems to deploy, configure, and
manage your server management processes.
The following Azure server management services require a linked Log Analytics workspace and Automation
account:
Azure Update Management
Change Tracking and Inventory
Hybrid Runbook Worker
Desired State Configuration
The second phase of this guidance focuses on deploying services and automation scripts. It shows you how to
create a Log Analytics workspace and an Automation account. This guidance also shows you how to use Azure
Policy to ensure that new virtual machines are connected to the correct workspace.
The examples in this guidance assume a deployment that doesn't already have servers deployed to the cloud. To
learn more about the principles and considerations involved in planning your workspaces, see Manage log data
and workspaces in Azure Monitor.

Planning considerations
When preparing the workspaces and accounts that you need for onboarding management services, consider the
following issues:
Azure geographies and regulatory compliance: Azure regions are organized into geographies. An Azure
geography ensures that data residency, sovereignty, compliance, and resiliency requirements are honored
within geographical boundaries. If your workloads are subject to data-sovereignty or other compliance
requirements, workspace and Automation accounts must be deployed to regions within the same Azure
geography as the workload resources they support.
Number of workspaces: As a guiding principle, create the minimum number of workspaces required per
Azure geography. We recommend at least one workspace for each Azure geography where your compute or
storage resources are located. This initial alignment helps avoid future regulatory issues when you migrate
data to different geographies.
Data retention and capping: You might also need to take data retention policies or data capping requirements
into consideration when creating workspaces or Automation accounts. For more information about these
principles, and for additional considerations when planning your workspaces, see Manage log data and
workspaces in Azure Monitor.
Region mapping: Linking a Log Analytics workspace and an Azure Automation account is supported only
between certain Azure regions. For example, if the Log Analytics workspace is hosted in the EastUS region, the
linked Automation account must be created in the EastUS2 region to be used with management services. If you
have an Automation account that was created in another region, it can't link to a workspace in EastUS. The
choice of deployment region can significantly affect Azure geography requirements. Consult the region
mapping table to decide which region should host your workspaces and Automation accounts.
Workspace multihoming: The Azure Log Analytics agent supports multihoming in some scenarios, but the
agent faces several limitations and challenges when running in this configuration. Unless Microsoft has
recommended it for your specific scenario, we don't recommend that you configure multihoming on the Log
Analytics agent.

Resource placement examples


There are several different models for choosing the subscription in which you place the Log Analytics workspace
and Automation account. In short, place the workspace and Automation accounts in a subscription owned by the
team that's responsible for implementing the Update Management service and the Change Tracking and
Inventory service.
The following are examples of some ways to deploy workspaces and Automation accounts.
Placement by geography
Small and medium environments have a single subscription and several hundred resources that span multiple
Azure geographies. For these environments, create one Log Analytics workspace and one Azure Automation
account in each geography.
You can create a workspace and an Azure Automation account, as one pair, in each resource group. Then, deploy
the pair in the corresponding geography to the virtual machines.
Alternatively, if your data-compliance policies don't dictate that resources reside in specific regions, you can create
one pair to manage all the virtual machines. We also recommend that you place the workspace and Automation
account pairs in separate resource groups to provide more granular role-based access control (RBAC).
The example in the following diagram has one subscription with two resource groups, each located in a different
geography:
Placement in a management subscription
Larger environments span multiple subscriptions and have a central IT department that owns monitoring and
compliance. For these environments, create pairs of workspaces and Automation accounts in an IT management
subscription. In this model, virtual-machine resources in a geography store their data in the corresponding
geography workspace in the IT management subscription. If application teams need to run automation tasks but
don't require linked workspace and Automation accounts, they can create separate Automation accounts in their
own application subscriptions.
Decentralized placement
In an alternative model for large environments, the application development team can be responsible for patching
and management. In this case, place the workspace and Automation account pairs in the application team
subscriptions alongside their other resources.
Create a workspace and Automation account
After you've chosen the best way to place and organize workspace and account pairs, make sure that you've
created these resources before starting the onboarding process. The automation examples later in this guidance
create a workspace and Automation account pair for you. However, if you want to onboard by using the Azure
portal and you don't have an existing workspace and Automation account pair, you'll need to create one.
To create a Log Analytics workspace by using the Azure portal, see Create a workspace. Next, create a matching
Automation account for each workspace by following the steps in Create an Azure Automation account.
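The following Azure CLI commands are a minimal sketch of creating one workspace and Automation account pair for a single geography. The names are hypothetical placeholders, and the Automation account region follows the workspace-to-Automation region mapping described earlier (for example, an East US workspace pairs with an East US 2 Automation account).

# Sketch only: create a paired Log Analytics workspace and Automation account.
az group create --name rg-mgmt-useast --location eastus

az monitor log-analytics workspace create \
  --resource-group rg-mgmt-useast \
  --workspace-name law-mgmt-useast \
  --location eastus

# The automation extension provides the 'az automation' commands.
az extension add --name automation
az automation account create \
  --resource-group rg-mgmt-useast \
  --name aa-mgmt-useast \
  --location eastus2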

NOTE
When you create an Automation account by using the Azure portal, the portal attempts by default to create Run As
accounts for both Azure Resource Manager and the classic deployment model resources. If you don't have classic virtual
machines in your environment and you're not the co-administrator on the subscription, the portal creates a Run As account
for Resource Manager, but it generates an error when deploying the classic Run As account. If you don't intend to support
classic resources, you can ignore this error.
You can also create Run As accounts by using PowerShell.

Next steps
Learn how to onboard your servers to Azure server management services.
Onboard to Azure server management services
Phase 2: Onboarding Azure server management
services

After you're familiar with the tools and planning involved in Azure management services, you're ready for the
second phase. Phase 2 provides step-by-step guidance for onboarding these services for use with your Azure
resources. Start by evaluating this onboarding process before adopting it broadly in your environment.

NOTE
The automation approaches discussed in later sections of this guidance are meant for deployments that don't already have
servers deployed to the cloud. They require that you have the Owner role on a subscription to create all the required
resources and policies. If you've already created Log Analytics workspaces and Automation accounts, we recommend that
you pass these resources in the appropriate parameters when you start the example automation scripts.

Onboarding processes
This section of the guidance covers the following onboarding processes for both Azure virtual machines and on-
premises servers:
Enable management services on a single VM for evaluation by using the portal. Use this process to
familiarize yourself with the Azure server management services.
Configure management services for a subscription by using the portal. This process helps you configure
the Azure environment so that any new VMs that are provisioned will automatically use management services.
Use this approach if you prefer the Azure portal experience to scripts and command lines.
Configure management services for a subscription by using Azure Automation. This process is fully
automated. Just create a subscription, and the scripts will configure the environment to use management
services for any newly provisioned VM. Use this approach if you're familiar with PowerShell scripts and Azure
Resource Manager templates, or if you want to learn to use them.
The procedures for each of these approaches are different.

NOTE
When you use the Azure portal, the sequence of onboarding steps differs from the automated onboarding steps. The portal
offers a simpler onboarding experience.

The following diagram shows the recommended deployment model for management services:
As shown in the preceding diagram, the Log Analytics agent has both an auto-enroll and an opt-in configuration for
on-premises servers:
Auto-enroll: When the Log Analytics agent is installed on a server and configured to connect to a workspace,
the solutions that are enabled on that workspace are applied to the server automatically.
Opt-in: Even if the agent is installed and connected to the workspace, the solution isn't applied unless it's added
to the server's scope configuration in the workspace.

Next steps
Learn how to onboard a single VM by using the portal to evaluate the onboarding process.
Onboard a single Azure VM for evaluation
Enable server management services on a single VM
for evaluation

Learn how to enable server management services on a single VM for evaluation.

NOTE
Create the required Log Analytics workspace and Azure Automation account before you implement Azure management
services on a VM.

It's simple to onboard Azure server management services to individual virtual machines in the Azure portal. You
can familiarize yourself with these services before you onboard them. When you select a VM instance, all the
solutions on the list of management tools and services appear on the Operations or Monitoring menu. You
select a solution and follow the wizard to onboard it.

Related resources
For more information about how to onboard these solutions to individual VMs, see:
Onboard Update Management, Change Tracking, and Inventory solutions from Azure virtual machine
Onboard Azure Monitoring for VMs

Next steps
Learn how to use Azure Policy to onboard Azure VMs at scale.
Configure Azure management services for a subscription
Configure Azure server management services at scale

You must complete these two tasks to onboard Azure server management services to your servers:
Deploy service agents to your servers
Enable the management solutions
This article covers the three processes that are necessary to complete these tasks:
1. Deploy the required agents to Azure VMs by using Azure Policy
2. Deploy the required agents to on-premises servers
3. Enable and configure the solutions

NOTE
Create the required Log Analytics workspace and Azure Automation account before you onboard virtual machines to Azure
server management services.

Use Azure Policy to deploy extensions to Azure VMs


All the management solutions that are discussed in Azure management tools and services require that the Log
Analytics agent is installed on Azure virtual machines and on-premises servers. You can onboard your Azure VMs
at scale by using Azure Policy. Assign policy to ensure that the agent is installed on your Azure VMs and connected
to the correct Log Analytics workspace.
Azure Policy has a built-in policy initiative that includes the Log Analytics agent and the Microsoft Dependency
agent, which is required by Azure Monitor for VMs.

NOTE
For more information about various agents for Azure monitoring, see Overview of the Azure monitoring agents.

Assign policies
To assign the policies described in the previous section:
1. In the Azure portal, go to Azure Policy > Assignments > Assign initiative.

2. On the Assign Policy page, set the Scope by selecting the ellipsis (…) and then selecting either a
management group or subscription. Optionally, select a resource group. Then choose Select at the bottom
of the Scope page. The scope determines which resources or group of resources the policy is assigned to.
3. Select the ellipsis (…) next to Policy definition to open the list of available definitions. To filter the initiative
definitions, enter Azure Monitor in the Search box:

4. The Assignment name is automatically populated with the policy name that you selected, but you can
change it. You can also add an optional description to provide more information about this policy
assignment. The Assigned by field is automatically filled based on who is signed in. This field is optional,
and it supports custom values.
5. For this policy, select Log Analytics workspace for the Log analytics agent to associate.

6. Select the Managed Identity location check box. If this policy is of the type DeployIfNotExists, a managed
identity will be required to deploy the policy. In the portal, the account will be created as indicated by the
check box selection.
7. Select Assign.
After you complete the wizard, the policy assignment will be deployed to the environment. It can take up to 30
minutes for the policy to take effect. To test it, create new VMs after 30 minutes, and check if the Microsoft
Monitoring Agent is enabled on the VM by default.
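The equivalent assignment can be scripted. The following Azure CLI sketch assigns the built-in Azure Monitor for VMs initiative at subscription scope with a system-assigned managed identity. The initiative display name, the logAnalytics_1 parameter name, and the identity flags are assumptions that can differ by initiative and CLI version, so verify them before use.

# Sketch only: assign the Azure Monitor for VMs initiative to a subscription.
# The parameter name and identity flags are assumptions; confirm them first.
subscription_id=$(az account show --query id -o tsv)
workspace_id=$(az monitor log-analytics workspace show \
  --resource-group rg-mgmt-useast --workspace-name law-mgmt-useast --query id -o tsv)

initiative=$(az policy set-definition list \
  --query "[?displayName=='Enable Azure Monitor for VMs'].name" -o tsv)

az policy assignment create \
  --name enable-azure-monitor-for-vms \
  --policy-set-definition "$initiative" \
  --scope "/subscriptions/$subscription_id" \
  --params "{\"logAnalytics_1\": {\"value\": \"$workspace_id\"}}" \
  --mi-system-assigned \
  --location eastus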

Install agents on on-premises servers


NOTE
Create the required Log Analytics workspace and Azure Automation account before you onboard Azure server management
services to servers.

For on-premises servers, you need to download and install the Log Analytics agent and the Microsoft Dependency
agent manually and configure them to connect to the correct workspace. You must specify the workspace ID and
key information. To get that information, go to your Log Analytics workspace in the Azure portal and select
Settings > Advanced settings.
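For Linux servers, the Log Analytics agent can be installed with the onboarding script published in the OMS Agent for Linux repository. The following shell sketch assumes that script location is still current; replace the workspace ID and key with the values from the Advanced settings page. Windows servers use the downloadable agent installer instead.

# Sketch only: install the Log Analytics agent on an on-premises Linux server.
# Replace <workspace-id> and <workspace-key> with values from Advanced settings.
wget https://raw.githubusercontent.com/Microsoft/OMS-Agent-for-Linux/master/installer/scripts/onboard_agent.sh
sudo sh onboard_agent.sh -w <workspace-id> -s <workspace-key>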
Enable and configure solutions
To enable solutions, you need to configure the Log Analytics workspace. Onboarded Azure VMs and on-premises
servers will get the solutions from the Log Analytics workspaces that they're connected to.
Update Management
The Update Management, Change Tracking, and Inventory solutions require both a Log Analytics workspace and
an Automation account. To ensure that these resources are properly configured, we recommend that you onboard
through your Automation account. For more information, see Onboard Update Management, Change Tracking,
and Inventory solutions.
We recommend that you enable the Update Management solution for all servers. Update Management is free for
Azure VMs and on-premises servers. If you enable Update Management through your Automation account, a
scope configuration is created in the workspace. Manually update the scope to include machines that are covered
by the Update Management service.
To cover your existing servers as well as future servers, you need to remove the scope configuration. To do this,
view your Automation account in the Azure portal. Select Update Management > Manage machine > Enable
on all available and future machines. This setting allows all Azure VMs that are connected to the workspace to
use Update Management.

Change Tracking and Inventory solutions


To onboard the Change Tracking and Inventory solutions, follow the same steps as for Update Management. For
more information about how to onboard these solutions from your Automation account, see Onboard Update
Management, Change Tracking, and Inventory solutions.
The Change Tracking solution is free for Azure VMs and costs $6 per node per month for on-premises servers.
This cost covers Change Tracking, Inventory, and Desired State Configuration. If you want to enroll only specific
on-premises servers, you can opt in those servers. We recommend that you onboard all your production servers.
Opt in via the Azure portal
1. Go to the Automation account that has Change Tracking and Inventory enabled.
2. Select Change tracking.
3. Select Manage machines in the upper-right pane.
4. Select Enable on selected machines. Then select Add next to the machine name.
5. Select Enable to enable the solution for those machines.

Opt in by using saved searches


Alternatively, you can configure the scope configuration to opt in on-premises servers. Scope configuration uses
saved searches.
To create or modify the saved search, follow these steps:
1. Go to the Log Analytics workspace that is linked to the Automation account that you configured in the
preceding steps.
2. Under General, select Saved searches.
3. In the Filter box, enter Change Tracking to filter the list of saved searches. In the results, select
MicrosoftDefaultComputerGroup.
4. Enter the computer name or the VMUUID to include the computers that you want to opt in for Change
Tracking.

Heartbeat
| where AzureEnvironment=~"Azure" or Computer in~ ("list of the on-premises server names", "server1")
| distinct Computer
NOTE
The server name must exactly match the value in the expression, and it shouldn't contain a domain name suffix.

5. Select Save. By default, the scope configuration is linked to the MicrosoftDefaultComputerGroup saved
search. It will be automatically updated.
Azure Activity Log
Azure Activity Log is also part of Azure Monitor. It provides insight into subscription-level events that occur in
Azure.
To implement this solution:
1. In the Azure portal, open All services and select Management + Governance > Solutions.
2. In the Solutions view, select Add.
3. Search for Activity Log Analytics and select it.
4. Select Create.
You need to specify the Workspace name of the workspace that you created in the previous section where the
solution is enabled.
Azure Log Analytics Agent Health
The Azure Log Analytics Agent Health solution reports on the health, performance, and availability of your
Windows and Linux servers.
To implement this solution:
1. In the Azure portal, open All services and select Management + Governance > Solutions.
2. In the Solutions view, select Add.
3. Search for Azure Log Analytics agent health and select it.
4. Select Create.
You need to specify the Workspace name of the workspace that you created in the previous section where the
solution is enabled.
After creation is complete, the workspace resource instance displays AgentHealthAssessment when you select
View > Solutions.
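If you'd rather enable solutions from a script than from the portal, one option is the Set-AzOperationalInsightsIntelligencePack cmdlet, sketched below. The resource group and workspace names are placeholders, and the intelligence pack name is assumed to match the AgentHealthAssessment name shown above; verify both before you rely on this.

#Enable the Agent Health solution on an existing workspace.
Set-AzOperationalInsightsIntelligencePack -ResourceGroupName "my-rg" -WorkspaceName "my-workspace" -IntelligencePackName "AgentHealthAssessment" -Enabled $true

#List the solutions that are currently enabled, to confirm.
Get-AzOperationalInsightsIntelligencePack -ResourceGroupName "my-rg" -WorkspaceName "my-workspace"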
Antimalware Assessment
The Antimalware Assessment solution helps you identify servers that are infected or at increased risk of infection
by malware.
To implement this solution:
1. In the Azure portal, open All services and select Management + Governance > Solutions.
2. In the Solutions view, select Add.
3. Search for and select Antimalware Assessment.
4. Select Create.
You need to specify the Workspace name of the workspace that you created in the previous section where the
solution is enabled.
After creation is complete, the workspace resource instance displays AntiMalware when you select View >
Solutions.
Azure Monitor for VMs
You can enable Azure Monitor for VMs through the view page for the VM instance, as described in Enable
management services on a single VM for evaluation. You shouldn't enable solutions directly from the Solutions
page as you do for the other solutions that are described in this article. For large-scale deployments, it may be
easier to use automation to enable the correct solutions in the workspace.
Azure Security Center
We recommend that you onboard all your servers at least to the Azure Security Center Free tier. This option
provides a basic level of security assessments and actionable security recommendations for your environment. If
you upgrade to the Standard tier, you get additional benefits, which are discussed in detail on the Security Center
pricing page.
To enable the Azure Security Center Free tier, follow these steps:
1. Go to the Security Center portal page.
2. Under POLICY & COMPLIANCE, select Security policy.
3. Find the Log Analytics workspace resource that you created in the pane on the right side.
4. Select Edit settings for that workspace.
5. Select Pricing tier.
6. Choose the Free option.
7. Select Save.
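If you manage many subscriptions, you can also set the pricing tier from PowerShell with the Az.Security module. The following is a minimal sketch; the resource-type names that -Name accepts vary by Security Center version, so list them with Get-AzSecurityPricing first.

#Review the current pricing tiers for the selected subscription.
Get-AzSecurityPricing | Select-Object Name, PricingTier

#Set the VirtualMachines coverage to the Free tier. Repeat for other resource types as needed.
Set-AzSecurityPricing -Name "VirtualMachines" -PricingTier "Free"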

Next steps
Learn how to use automation to onboard servers and create alerts.
Automate onboarding and alert configuration
Automate onboarding

To improve the efficiency of deploying Azure server management services, consider automating deployment as
discussed in previous sections of this guidance. The script and the example templates provided in the following
sections are starting points for developing your own automation of onboarding processes.
This guidance has a supporting GitHub repository of sample code, CloudAdoptionFramework. The repository
provides example scripts and Azure Resource Manager templates to help you automate the deployment of Azure
server management services.
The sample files illustrate how to use Azure PowerShell cmdlets to automate the following tasks:
Create a Log Analytics workspace. (Or, use an existing workspace if it meets the requirements. For details,
see Workspace planning.
Create an Automation account. (Or, use an existing account if it meets the requirements. For details, see
Workspace planning).
Link the Automation account and the Log Analytics workspace. This step isn't required if you're onboarding
by using the Azure portal.
Enable Update Management, and Change Tracking and Inventory, for the workspace.
Onboard Azure VMs by using Azure Policy. A policy installs the Log Analytics agent and the Microsoft
Dependency Agent on the Azure VMs.
Onboard on-premises servers by installing the Log Analytics agent on them.
The files described in the following list are used in this sample. You can customize them to support your own
deployment scenarios.

New-AMSDeployment.ps1: The main, orchestrating script that automates onboarding. It creates resource groups, and the location, workspace, and Automation accounts, if they don't already exist. This PowerShell script requires an existing subscription.

Workspace-AutomationAccount.json: A Resource Manager template that deploys the workspace and Automation account resources.

WorkspaceSolutions.json: A Resource Manager template that enables the solutions you want in the Log Analytics workspace.

ScopeConfig.json: A Resource Manager template that uses the opt-in model for on-premises servers with the Change Tracking solution. Using the opt-in model is optional.

Enable-VMInsightsPerfCounters.ps1: A PowerShell script that enables VM Insights for servers and configures performance counters.

ChangeTracking-Filelist.json: A Resource Manager template that defines the list of files that will be monitored by Change Tracking.
Use the following command to run New-AMSDeployment.ps1:

.\New-AMSDeployment.ps1 -SubscriptionName '{Subscription Name}' -WorkspaceName '{Workspace Name}' -WorkspaceLocation '{Azure Location}' -AutomationAccountName '{Account Name}' -AutomationAccountLocation '{Account Location}'

Next steps
Learn how to set up basic alerts to notify your team of key management events and issues.
Set up basic alerts
Set up basic alerts

A key part of managing resources is getting notified when problems occur. Alerts proactively notify you of critical
conditions, based on triggers from metrics, logs, or service-health issues. As part of onboarding the Azure server
management services, you can set up alerts and notifications that help keep your IT teams aware of any problems.

Azure Monitor alerts


Azure Monitor offers alerting capabilities to notify you, via email or messaging, when things go wrong. These
capabilities are based on a common data-monitoring platform that includes logs and metrics generated by your
servers and other resources. By using a common set of tools in Azure Monitor, you can analyze data that's
combined from multiple resources and use it to trigger alerts. These triggers can include:
Metric values.
Log search queries.
Activity log events.
The health of the underlying Azure platform.
Tests for website availability.
See the list of Azure Monitor data sources for a more detailed description of the sources of monitoring data that
this service collects.
For details about manually creating and managing alerts by using the Azure portal, see the Azure Monitor
documentation.
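To make the metric trigger concrete, here's a minimal PowerShell sketch that raises an alert when average CPU on a single VM exceeds 90 percent. The resource IDs are placeholders, it assumes an action group already exists, and the Az.Monitor cmdlet and parameter names should be verified against your installed module version.

#Sketch: alert when average CPU on a VM exceeds 90 percent over a 15-minute window.
$vmId = "/subscriptions/<subscription ID>/resourceGroups/my-rg/providers/Microsoft.Compute/virtualMachines/my-vm"
$actionGroupId = "/subscriptions/<subscription ID>/resourceGroups/my-rg/providers/microsoft.insights/actionGroups/my-action-group"

$condition = New-AzMetricAlertRuleV2Criteria -MetricName "Percentage CPU" -TimeAggregation Average -Operator GreaterThan -Threshold 90

Add-AzMetricAlertRuleV2 -Name "HighCpu-my-vm" -ResourceGroupName "my-rg" -TargetResourceId $vmId -WindowSize (New-TimeSpan -Minutes 15) -Frequency (New-TimeSpan -Minutes 5) -Condition $condition -Severity 2 -ActionGroupId $actionGroupId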

Automated deployment of recommended alerts


In this guide, we recommend that you create a set of 15 alerts for basic infrastructure monitoring. Find the
deployment scripts in the Azure Alert Toolkit GitHub repository.
This package creates alerts for:
Low disk space
Low available memory
High CPU use
Unexpected shutdowns
Corrupted file systems
Common hardware failures
The package uses HP server hardware as an example. Change the settings in the associated configuration file to
reflect your OEM hardware. You can also add more performance counters to the configuration file. To deploy the
package, run the New-CoreAlerts.ps1 file.

Next steps
Learn about operations and security mechanisms that support your ongoing operations.
Ongoing management and security
Phase 3: Ongoing management and security

After you've onboarded Azure server management services, you'll need to focus on the operations and security
configurations that will support your ongoing operations. We'll start with securing your environment by reviewing
the Azure Security Center. We'll then configure policies to keep your servers in compliance and automate common
tasks. This section covers the following topics:
Address security recommendations. Azure Security Center provides suggestions to improve the security of
your environment. When you implement these recommendations, you see the impact reflected in a security
score.
Enable the Guest Configuration policy. Use the Azure Policy Guest Configuration feature to audit the
settings in a virtual machine. For example, you can check whether any certificates are about to expire.
Track and alert on critical changes. When you're troubleshooting, the first question to consider is, "What's
changed?" In this article, you'll learn how to track changes and create alerts to proactively monitor critical
components.
Create update schedules. Schedule the installation of updates to ensure that all your servers have the latest
ones.
Common Azure Policy examples. This article provides examples of common management policies.

Address security recommendations


Azure Security Center is the central place to manage security for your environment. You'll see an overall
assessment and targeted recommendations.
We recommend that you review and implement the recommendations provided by this service. For information
about additional benefits of Azure Security Center, see Follow Azure Security Center recommendations.

Next steps
Learn how to enable the Azure Policy Guest Configuration feature.
Guest Configuration policy
Guest Configuration policy

You can use the Azure Policy Guest Configuration extension to audit the configuration settings in a virtual
machine. Guest Configuration is currently supported only on Azure VMs.
To find the list of Guest Configuration policies, search for "Guest Configuration" on the Azure Policy portal page.
Or run this cmdlet in a PowerShell window to find the list:

Get-AzPolicySetDefinition | where-object {$_.Properties.metadata.category -eq "Guest Configuration"}

NOTE
Guest Configuration functionality is regularly updated to support additional policy sets. Check for new supported policies
periodically and evaluate whether they'll be useful.

Deployment
Use the following example PowerShell script to deploy these policies to:
Verify that password security settings in Windows and Linux computers are set correctly.
Verify that certificates aren't close to expiration on Windows VMs.
Before you run this script, use the Connect-AzAccount cmdlet to sign in. When you run the script, you must
provide the name of the subscription that you want to apply the policies to.

#Assign Guest Configuration policy.

param (
    [Parameter(Mandatory=$true)]
    [string]$SubscriptionName
)

$Subscription = Get-AzSubscription -SubscriptionName $SubscriptionName
$scope = "/subscriptions/" + $Subscription.Id

$PasswordPolicy = Get-AzPolicySetDefinition -Name "3fa7cbf5-c0a4-4a59-85a5-cca4d996d5a6"
$CertExpirePolicy = Get-AzPolicySetDefinition -Name "b6f5e05c-0aaa-4337-8dd4-357c399d12ae"

New-AzPolicyAssignment -Name "PasswordPolicy" -DisplayName "[Preview]: Audit that password security settings are set correctly inside Linux and Windows machines" -Scope $scope -PolicySetDefinition $PasswordPolicy -AssignIdentity -Location eastus

New-AzPolicyAssignment -Name "CertExpirePolicy" -DisplayName "[Preview]: Audit that certificates are not expiring on Windows VMs" -Scope $scope -PolicySetDefinition $CertExpirePolicy -AssignIdentity -Location eastus

Next steps
Learn how to enable change tracking and alerting for critical file, service, software, and registry changes.
Enable tracking and alerting for critical changes
Enable tracking and alerting for critical changes

Azure Change Tracking and Inventory provide alerts on the configuration state of your hybrid environment and
changes to that environment. It can report critical file, service, software, and registry changes that might affect
your deployed servers.
By default, the Azure Automation inventory service doesn't monitor files or registry settings. The solution does
provide a list of registry keys that we recommend for monitoring. To see this list, go to your Automation account
in the Azure portal and select Inventory > Edit Settings.

For more information about each registry key, see Registry key change tracking. Select any key to evaluate and
then enable it. The setting is applied to all VMs that are enabled in the current workspace.
You can also use the service to track critical file changes. For example, you might want to track the
C:\windows\system32\drivers\etc\hosts file because the OS uses it to map host names to IP addresses. Changes
to this file could cause connectivity problems or redirect traffic to dangerous websites.
To enable file-content tracking for the hosts file, follow the steps in Enable file content tracking.
You can also add an alert for changes to files that you're tracking. For example, say you want to set an alert for
changes to the hosts file. Select Log Analytics on the command bar or Log Search for the linked Log Analytics
workspace. In Log Analytics, use the following query to search for changes to the hosts file:

ConfigurationChange | where FieldsChanged contains "FileContentChecksum" and FileSystemPath contains "hosts"

This query searches for changes to the contents of files that have a path that contains the word “hosts.” You can
also search for a specific file by changing the path parameter. (For example,
FileSystemPath == "c:\\windows\\system32\\drivers\\etc\\hosts" .)

After the query returns the results, select New alert rule to open the alert-rule editor. You can also get to this
editor via Azure Monitor in the Azure portal.
In the alert-rule editor, review the query and change the alert logic if you need to. In this case, we want the alert to
be raised if any changes are detected on any machine in the environment.
After you set the condition logic, you can assign action groups to perform actions in response to the alert. In this
example, when the alert is raised, emails are sent and an ITSM ticket is created. You can take many other useful
actions, like triggering an Azure function, an Azure Automation runbook, a webhook, or a logic app.

After you've set all the parameters and logic, apply the alert to the environment.

Tracking and alerting examples


This section shows other common scenarios for tracking and alerting that you might want to use.
Driver file changed
Use the following query to detect if driver files are changed, added, or removed. It's useful for tracking changes to
critical system files.
ConfigurationChange | where ConfigChangeType == "Files" and FileSystemPath contains "c:\\windows\\system32\\drivers\\"

Specific service stopped


Use the following query to track changes to system-critical services.

ConfigurationChange | where SvcState == "Stopped" and SvcName contains "w3svc"

New software installed


Use the following query for environments that need to lock down software configurations.

ConfigurationChange | where ConfigChangeType == "Software" and ChangeCategory == "Added"

Specific software version is or isn't installed on a machine


Use the following query to assess security. This query references ConfigurationData , which contains the logs for
inventory and provides the last-reported configuration state, not changes.

ConfigurationData | where SoftwareName contains "Monitoring Agent" and CurrentVersion != "8.0.11081.0"

Known DLL changed through the registry


Use the following query to detect changes to well-known registry keys.

ConfigurationChange | where RegistryKey == "HKEY_LOCAL_MACHINE\\System\\CurrentControlSet\\Control\\Session Manager\\KnownDlls"

Next steps
Learn how to use Azure Automation to create update schedules to manage updates to your servers.
Create update schedules
Create update schedules

You can manage update schedules by using the Azure portal or the new PowerShell cmdlet modules.
To create an update schedule via the Azure portal, see Schedule an update deployment.
The Az.Automation module now supports configuring update management by using Azure PowerShell. Version
1.7.0 of the module adds support for the New-AzAutomationUpdateManagementAzureQuery cmdlet. This cmdlet
lets you use tags, location, and saved searches to configure update schedules for a flexible group of machines.

Example script
The example script in this section illustrates the use of tagging and querying to create dynamic groups of
machines that you can apply update schedules to. It performs the following actions. You can refer to the
implementations of the specific actions when you create your own scripts.
Creates an Azure Automation update schedule that runs every Saturday at 8:00 AM.
Creates a query for machines that match these criteria:
Deployed in the westus , eastus , or eastus2 Azure location
Have an Owner tag applied to them with a value set to JaneSmith
Have a Production tag applied to them with a value set to true
Applies the update schedule to the queried machines and sets a two-hour update window.
Before you run the example script, you'll need to sign in by using the Connect-AzAccount cmdlet. When you start
the script, provide the following information:
The target subscription ID
The target resource group
Your Log Analytics workspace name
Your Azure Automation account name
<#
.SYNOPSIS
    This script orchestrates the deployment of the solutions and the agents.
.PARAMETER SubscriptionId
.PARAMETER ResourceGroupName
.PARAMETER WorkspaceName
.PARAMETER AutomationAccountName
#>

param (
    [Parameter(Mandatory=$true)]
    [string]$SubscriptionId,

    [Parameter(Mandatory=$true)]
    [string]$ResourceGroupName,

    [Parameter(Mandatory=$true)]
    [string]$WorkspaceName,

    [Parameter(Mandatory=$true)]
    [string]$AutomationAccountName,

    [Parameter(Mandatory=$false)]
    [string]$scheduleName = "SaturdayCriticalSecurity"
)

Import-Module Az.Automation

# Create a weekly update schedule that starts in 10 minutes and runs every Saturday.
$startTime = ([DateTime]::Now).AddMinutes(10)
$schedule = New-AzAutomationSchedule -ResourceGroupName $ResourceGroupName `
    -AutomationAccountName $AutomationAccountName `
    -StartTime $startTime `
    -Name $scheduleName `
    -Description "Saturday patches" `
    -DaysOfWeek Saturday `
    -WeekInterval 1 `
    -ForUpdateConfiguration

# Use New-AzAutomationUpdateManagementAzureQuery to create a dynamic group of machines,
# scoped by subscription, location, and tags.
$queryScope = @("/subscriptions/$SubscriptionId/resourceGroups/")

$query1Location = @("westus", "eastus", "eastus2")
$query1FilterOperator = "Any"
$ownerTag = @{"Owner" = @("JaneSmith")}
$ownerTag.Add("Production", "true")

$DGQuery = New-AzAutomationUpdateManagementAzureQuery -ResourceGroupName $ResourceGroupName `
    -AutomationAccountName $AutomationAccountName `
    -Scope $queryScope `
    -Location $query1Location `
    -Tag $ownerTag

$AzureQueries = @($DGQuery)

# Apply the schedule to the queried machines, with a two-hour update window for
# security and critical updates on Windows machines.
$UpdateConfig = New-AzAutomationSoftwareUpdateConfiguration -ResourceGroupName $ResourceGroupName `
    -AutomationAccountName $AutomationAccountName `
    -Schedule $schedule `
    -Windows `
    -Duration (New-TimeSpan -Hours 2) `
    -AzureQuery $AzureQueries `
    -IncludedUpdateClassification Security,Critical

Next steps
See examples of how to implement common policies in Azure that can help manage your servers.
Common policies in Azure
Common Azure Policy examples

Azure Policy can help you apply governance to your cloud resources. This service can help you create guardrails
that ensure company-wide compliance to governance policy requirements. To create policies, use either the Azure
portal or PowerShell cmdlets. This article provides PowerShell cmdlet examples.

NOTE
With Azure Policy, enforcement policies (deployIfNotExists) aren't automatically deployed to existing VMs. Remediation is
required to keep VMs in compliance. For more information, see Remediate noncompliant resources with Azure Policy.

Common policy examples


The following sections describe some commonly used policies.
Restrict resource regions
Regulatory and policy compliance often depends on control of the physical location where resources are deployed.
You can use a built-in policy to allow users to create resources only in certain allowed Azure regions.
To find this policy in the portal, search for "location" on the policy definition page. Or run this cmdlet to find the
policy:

Get-AzPolicyDefinition | Where-Object { ($_.Properties.policyType -eq "BuiltIn") -and ($_.Properties.displayName -like "*location*") }

The following script shows how to assign the policy. Change the $SubscriptionID value to point to the
subscription that you want to assign the policy to. Before you run the script, use the Connect-AzAccount cmdlet to
sign in.

#Specify the value for $SubscriptionID.
$SubscriptionID = <subscription ID>
$scope = "/subscriptions/$SubscriptionID"

#Replace the -Name GUID with the policy GUID you want to assign.
$AllowedLocationPolicy = Get-AzPolicyDefinition -Name "e56962a6-4747-49cd-b67b-bf8b01975c4c"

#Replace the locations with the ones you want to specify.
$policyParam = '{"listOfAllowedLocations":{"value":["eastus","westus"]}}'
New-AzPolicyAssignment -Name "Allowed Location" -DisplayName "Allowed locations for resource creation" -Scope $scope -PolicyDefinition $AllowedLocationPolicy -Location eastus -PolicyParameter $policyParam

You can also use this script to apply the other policies that are discussed in this article. Just replace the GUID in
the line that sets $AllowedLocationPolicy with the GUID of the policy that you want to apply.
Block certain resource types
Another common built-in policy that's used to control costs can also be used to block certain resource types.
To find this policy in the portal, search for "allowed resource types" on the policy definition page. Or run this
cmdlet to find the policy:
Get-AzPolicyDefinition | Where-Object { ($_.Properties.policyType -eq "BuiltIn") -and
($_.Properties.displayName -like "*allowed resource types") }

After you identify the policy that you want to use, you can modify the PowerShell sample in the Restrict resource
regions section to assign the policy.
Restrict VM size
Azure offers a wide range of VM sizes to support various workloads. To control your budget, you could create a
policy that allows only a subset of VM sizes in your subscriptions.
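This article doesn't include a script for the VM size policy, so the following sketch is modeled on the region example above. The display name filter and the listOfAllowedSKUs parameter name are assumptions about the built-in "Allowed virtual machine size SKUs" definition; confirm them with Get-AzPolicyDefinition before you assign the policy. The script reuses the $scope variable from the earlier example.

#Find the built-in policy that restricts VM sizes. Confirm the display name in your environment.
$AllowedVmSizePolicy = Get-AzPolicyDefinition | Where-Object { ($_.Properties.policyType -eq "BuiltIn") -and ($_.Properties.displayName -like "*virtual machine size SKUs*") }

#Replace the SKUs with the sizes that you want to allow.
$policyParam = '{"listOfAllowedSKUs":{"value":["Standard_D2s_v3","Standard_D4s_v3"]}}'
New-AzPolicyAssignment -Name "Allowed VM sizes" -DisplayName "Allowed virtual machine sizes" -Scope $scope -PolicyDefinition $AllowedVmSizePolicy -PolicyParameter $policyParam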
Deploy antimalware
You can use this policy to deploy a Microsoft IaaSAntimalware extension with a default configuration to VMs that
aren't protected by antimalware.
The policy GUID is 2835b622-407b-4114-9198-6f7064cbe0dc .
The following script shows how to assign the policy. To use the script, change the $SubscriptionID value to point
to the subscription that you want to assign the policy to. Before you run the script, use the Connect-AzAccount
cmdlet to sign in.

#Specify the value for $SubscriptionID.
$SubscriptionID = <subscription ID>
$scope = "/subscriptions/$SubscriptionID"

$AntimalwarePolicy = Get-AzPolicyDefinition -Name "2835b622-407b-4114-9198-6f7064cbe0dc"

#Replace location "eastus" with the value that you want to use.
New-AzPolicyAssignment -Name "Deploy Antimalware" -DisplayName "Deploy default Microsoft IaaSAntimalware extension for Windows Server" -Scope $scope -PolicyDefinition $AntimalwarePolicy -Location eastus -AssignIdentity

Next steps
Learn about other server-management tools and services that are available.
Azure server management tools and services
Azure server management tools and services

As is discussed in the overview of this guidance, the suite of Azure server management services covers these
areas:
Migrate
Secure
Protect
Monitor
Configure
Govern
The following sections briefly describe these management areas and provide links to detailed content about the
main Azure services that support them.

Migrate
Migration services can help you migrate your workloads into Azure. To provide the best guidance, the Azure
Migrate service starts by measuring on-premises server performance and assessing suitability for migration.
After Azure Migrate completes the assessment, you can use Azure Site Recovery and Azure Database Migration
Service to migrate your on-premises machines to Azure.

Secure
Azure Security Center is a comprehensive security management application. By onboarding to Security Center,
you can quickly get an assessment of the security and regulatory compliance status of your environment. For
instructions on onboarding your servers to Azure Security Center, see Configure Azure management services for
a subscription.

Protect
To protect your data, you need to plan for backup, high availability, encryption, authorization, and related
operational issues. These topics are covered extensively online, so here we'll focus on building a Business
Continuity Disaster Recovery (BCDR) plan. We'll include references to documentation that describes in detail how
to implement and deploy this type of plan.
When you build data-protection strategies, first consider breaking down your workload applications into their
different tiers. This approach helps because each tier typically requires its own unique protection plan. To learn
more about designing applications to be resilient, see Designing resilient applications for Azure.
The most basic data protection is backup. To speed up the recovery process if servers are lost, back up not just
data but also server configurations. Backup is an effective mechanism to handle accidental data deletion and
ransomware attacks. Azure Backup can help you protect your data on Azure and on-premises servers running
Windows or Linux. For details about what Backup can do and for how-to guides, see the Azure Backup
documentation.
Recovery via backup can take a long time. The industry standard is usually one day. If a workload requires
business continuity for hardware failures or datacenter outage, consider using data replication. Azure Site
Recovery provides continuous replication of your VMs, a solution that minimizes data loss. Site
Recovery also supports several replication scenarios, such as replication:
Of Azure VMs between two Azure regions.
Between servers on-premises.
Between on-premises servers and Azure.
For more information, see the complete Azure Site Recovery replication matrix.
For your file-server data, another service to consider is Azure File Sync. This service helps you centralize your
organization's file shares in Azure Files, while preserving the flexibility, performance, and compatibility of an on-
premises file server. To use this service, follow the instructions for deploying Azure File Sync.

Monitor
Azure Monitor provides a view into various resources, like applications, containers, and virtual machines. It also
collects data from several sources:
Azure Monitor for VMs (insights) provides an in-depth view of virtual-machine health, performance trends,
and dependencies. The service monitors the health of the operating systems of your Azure virtual machines,
virtual-machine scale sets, and machines in your on-premises environment.
Log Analytics (logs) is a feature of Azure Monitor. Its role is central to the overall Azure management story. It
serves as the data store for log analysis and for many other Azure services. It offers a rich query language and
an analytics engine that provides insights into the operation of your applications and resources.
Azure Activity Log is also a feature of Azure Monitor. It provides insight into subscription-level events that
occur in Azure.

Configure
Several services fit into this category. They can help you to:
Automate operational tasks.
Manage server configurations.
Measure update compliance.
Schedule updates.
Detect changes to your servers.
These services are essential to supporting ongoing operations:
Update Management automates the deployment of patches across your environment, including deployment to
operating-system instances running outside of Azure. It supports both Windows and Linux operating systems,
and tracks key OS vulnerabilities and nonconformance caused by missing patches.
Change Tracking and Inventory provides insight into the software that's running in your environment, and
highlights any changes that have occurred.
Azure Automation lets you run Python and PowerShell scripts or runbooks to automate tasks across your
environment. When you use Automation with the Hybrid Runbook Worker, you can extend your runbooks to
your on-premises resources as well.
Azure Automation State Configuration enables you to push PowerShell Desired State Configuration (DSC)
configurations directly from Azure. DSC also lets you monitor and preserve configurations for guest operating
systems and workloads.
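To make the State Configuration item concrete, here's a minimal DSC configuration and the kind of import call you might use to publish it to an Automation account. The file path, resource group, and account names are placeholders, and you should verify the Import-AzAutomationDscConfiguration parameters against your Az.Automation version.

#EnsureIis.ps1 - a minimal DSC configuration that keeps the IIS role installed.
Configuration EnsureIis {
    Node "localhost" {
        WindowsFeature WebServer {
            Name   = "Web-Server"
            Ensure = "Present"
        }
    }
}

#Import and publish the configuration to an Automation account.
Import-AzAutomationDscConfiguration -SourcePath "C:\dsc\EnsureIis.ps1" -ResourceGroupName "my-rg" -AutomationAccountName "my-automation-account" -Published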

Govern
Adopting and moving to the cloud creates new management challenges. It requires a different mindset as you
shift from an operational management burden to monitoring and governance. The Cloud Adoption Framework
for Azure starts with governance. The framework explains how to migrate to the cloud, what the journey will look
like, and who should be involved.
The governance design for standard organizations often differs from governance design for complex enterprises.
To learn more about governance best practices for a standard organization, see the standard enterprise
governance guide. To learn more about governance best practices for a complex enterprise, see the governance
guide for complex enterprises.

Billing information
To learn about pricing for Azure management services, go to these pages:
Azure Site Recovery
Azure Backup
Azure Monitor
Azure Security Center
Azure Automation, including:
Desired State Configuration
Azure Update Management service
Azure Change Tracking and Inventory services
Azure Policy
Azure File Sync service

NOTE
The Azure Update Management solution is free, but there's a small cost related to data ingestion. As a rule of thumb, the
first 5 gigabytes (GB) per month of data ingestion are free. We generally observe that each machine uses about 25 MB per
month. So, about 200 machines per month are covered for free. For more servers, multiply the number of additional servers
by 25 MB per month. Then, multiply the result by the storage price for the additional storage that you need. For
information about costs, see Azure Storage Overview pricing. Each additional server typically has a nominal impact on cost.
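If it helps to turn that rule of thumb into numbers, the following sketch estimates billable ingestion for a given server count. The per-GB price is a placeholder; substitute the current Azure Monitor data ingestion price for your region.

$servers = 500              #Total servers reporting to the workspace.
$mbPerServerMonth = 25      #Observed average per machine, per the note above.
$freeGbPerMonth = 5         #Free data ingestion allowance.
$pricePerGb = 2.30          #Placeholder; use the current per-GB price for your region.

$totalGb = ($servers * $mbPerServerMonth) / 1024
$billableGb = [Math]::Max(0, $totalGb - $freeGbPerMonth)
"Estimated ingestion: {0:N1} GB/month; billable: {1:N1} GB/month; approximate cost: {2:N2} per month (in your billing currency)" -f $totalGb, $billableGb, ($billableGb * $pricePerGb)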
Cloud monitoring guide: Introduction

The cloud fundamentally changes how enterprises procure and use technology resources. In the past, enterprises
assumed ownership of and responsibility for all levels of technology, from infrastructure to software. Now, the
cloud offers the potential for enterprises to provision and consume resources as needed.
Although the cloud offers nearly unlimited flexibility in terms of design choices, enterprises seek proven and
consistent methodology for the adoption of cloud technologies. Each enterprise has different goals and timelines
for cloud adoption, making a one-size-fits-all approach to adoption nearly impossible.

This digital transformation also enables an opportunity to modernize your infrastructure, workloads, and
applications. Depending on business strategy and objectives, adopting a hybrid cloud model is likely part of the
migration journey from on-premises to operating fully in the cloud. During this journey, IT teams are challenged to
adopt and realize rapid value from the cloud. IT must also understand how to effectively monitor the application or
service that's migrating to Azure, and continue to deliver effective IT operations and DevOps.
Stakeholders want to use cloud-based, software as a service (SaaS) monitoring and management tools. They need
to understand what services and solutions deliver to achieve end-to-end visibility, reduce costs, and focus less on
infrastructure and maintenance of traditional software-based IT operations tools.
However, IT often prefers to use the tools they've already made a significant investment in. This approach supports
their service operations processes to monitor both cloud models, with the eventual goal of transitioning to a
SaaS-based offering. IT prefers this approach not only because switching takes time, planning, resources, and
funding, but also because of confusion about which products or Azure services are appropriate or applicable to
achieve the transition.
The goal of this guide is to provide a detailed reference to help enterprise IT managers, business decision makers,
application architects, and application developers understand:
Azure monitoring platforms, with an overview and comparison of their capabilities.
The best-fit solution for monitoring hybrid, private, and Azure native workloads.
The recommended end-to-end monitoring approach for both infrastructure and applications. This approach
includes deployable solutions for migrating these common workloads to Azure.
This guide isn't a how-to article for using or configuring individual Azure services and solutions, but it does
reference those sources when they're applicable or available. After you've read it, you'll understand how to
successfully operate a workload by following best practices and patterns.
If you're unfamiliar with Azure Monitor and System Center Operations Manager, and you want to get a better
understanding of what makes them unique and how they compare to each other, review the Overview of our
monitoring platforms.

Audience
This guide is useful primarily for enterprise administrators, IT operations, IT security and compliance, application
architects, workload development owners, and workload operations owners.

How this guide is structured


This article is part of a series. The following articles are meant to be read together, in order:
Introduction (this article)
Monitoring strategy for cloud deployment models
Collect the right data
Alerting

Products and services


Several software products and services are available to help you monitor and manage a variety of resources that
are hosted in Azure, your corporate network, or other cloud providers. They include:
System Center Operations Manager
Azure Monitor, which now includes Log Analytics and Application Insights
Azure Policy and Azure Blueprints
Azure Automation
Azure Logic Apps
Azure Event Hubs
This first version of the guide covers our current monitoring platforms: Azure Monitor and System Center
Operations Manager. It also outlines our recommended strategy for monitoring each of the cloud deployment
models. Also included is the first set of monitoring recommendations, starting with data collection and alerting.

Next steps
Monitoring strategy for cloud deployment models
Cloud monitoring guide: Monitoring strategy for
cloud deployment models

This article includes our recommended monitoring strategy for each of the cloud deployment models, based on
the following criteria:
You must maintain your commitment to Operations Manager or another enterprise monitoring platform,
because it's integrated with your IT operations processes, knowledge, and expertise, or certain functionality
isn't available yet in Azure Monitor.
You must monitor workloads both on-premises and in the public cloud, or just in the cloud.
Your cloud migration strategy includes modernizing IT operations and moving to our cloud monitoring
services and solutions.
You might have critical systems that are air-gapped or physically isolated, or are hosted in a private cloud or on
physical hardware, and these systems need to be monitored.
Our strategy includes support for monitoring infrastructure (compute, storage, and server workloads), application
(end-user, exceptions, and client), and network resources. It delivers a complete, service-oriented monitoring
perspective.

Azure cloud monitoring


Azure Monitor is the Azure native platform service that provides a single source for monitoring Azure resources.
It's designed for cloud solutions that:
Are built on Azure.
Support a business capability that's based on virtual machine (VM) workloads or complex architectures that
use microservices and other platform resources.
It monitors all layers of the stack, starting with tenant services, such as Azure Active Directory Domain Services,
and subscription-level events and Azure service health.
It also monitors infrastructure resources, such as VMs, storage, and network resources. At the top layer, it monitors
your application.
By monitoring each of these dependencies, and collecting the right signals that each can emit, you get the
observability of applications and the key infrastructure you need.
Our recommended approach to monitoring each layer of the stack is summarized below.

Layer: Application
Resource: A web-based application that runs on .NET, .NET Core, Java, JavaScript, and Node.js platform on an Azure VM, Azure App Services, Azure Service Fabric, Azure Functions, and Azure Cloud Services.
Scope: Monitor a live web application to automatically detect performance anomalies, identify code exceptions and issues, and collect user behavior analytics.
Method: Azure Monitor (Application Insights).

Layer: Azure resources - platform as a service (PaaS)
Resource: Azure Database services (for example, SQL or MySQL).
Scope: Azure Database for SQL performance metrics.
Method: Enable diagnostics logging to stream SQL data to Azure Monitor logs.

Layer: Azure resources - infrastructure as a service (IaaS)
Resource: 1. Azure Storage. 2. Azure Application Gateway. 3. Network security groups. 4. Azure Traffic Manager. 5. Azure Virtual Machines. 6. Azure Kubernetes Service/Azure Container Instances.
Scope: 1. Capacity, availability, and performance. 2. Performance and diagnostics logs (activity, access, performance, and firewall). 3. Monitor events when rules are applied, and the rule counter for how many times a rule is applied to deny or allow. 4. Monitor endpoint status availability. 5. Monitor capacity, availability, and performance in a guest VM operating system (OS); map app dependencies hosted on each VM, including the visibility of active network connections between servers, inbound and outbound connection latency, and ports across any TCP-connected architecture. 6. Monitor capacity, availability, and performance of workloads running on containers and container instances.
Method: 1. Storage metrics for Blob storage. 2. Enable diagnostics logging and configure streaming to Azure Monitor logs. 3. Enable diagnostics logging of network security groups, and configure streaming to Azure Monitor logs. 4. Enable diagnostics logging of Traffic Manager endpoints, and configure streaming to Azure Monitor logs. 5. Enable Azure Monitor for VMs. 6. Enable Azure Monitor for containers.

Layer: Network
Resource: Communication between your virtual machine and one or more endpoints (another VM, a fully qualified domain name, a uniform resource identifier, or an IPv4 address).
Scope: Monitor reachability, latency, and network topology changes that occur between the VM and the endpoint.
Method: Azure Network Watcher.

Layer: Azure subscription
Resource: Azure service health and basic resource health.
Scope: Administrative actions performed on a service or resource; service health when an Azure service is in a degraded or unavailable state; health issues detected with an Azure resource from the Azure service perspective; operations performed with Azure Autoscale indicating a failure or exception; operations performed with Azure Policy indicating that an allowed or denied action occurred; records of alerts generated by Azure Security Center.
Method: Delivered in the Activity Log for monitoring and alerting by using Azure Resource Manager.

Layer: Azure tenant
Resource: Azure Active Directory.
Method: Enable diagnostics logging, and configure streaming to Azure Monitor logs.

Hybrid cloud monitoring


For many organizations, transition to the cloud must be approached gradually, where the hybrid cloud model is
the most common first step in the journey. You carefully select the appropriate subset of applications and
infrastructure to begin your migration, while you avoid disruption to your business. However, because we offer
two monitoring platforms that support this cloud model, IT decision makers might be uncertain as to which is the
best choice to support their business and IT operational goals.
In this section, we address the uncertainty by reviewing several factors and offering an understanding of which
platform to consider.
Keep in mind the following key technical aspects:
You need to collect data from Azure resources that support the workload, and forward them to your
existing on-premises or managed service provider tools.
You need to maintain your current investment in System Center Operations Manager, and configure it to
monitor IaaS and PaaS resources that are running in Azure. Optionally, because you're monitoring two
environments with different characteristics, based on your requirements, you need to determine how
integrating with Azure Monitor supports your strategy.
As part of your modernization strategy to standardize on a single tool to reduce cost and complexity, you
need to commit to Azure Monitor for monitoring the resources in Azure and on your corporate network.
The following comparison summarizes the requirements that Azure Monitor and System Center Operations Manager support for monitoring the hybrid cloud model, based on a common set of criteria.

Infrastructure requirements
Azure Monitor: No.
Operations Manager: Yes. Requires, at a minimum, a management server and a SQL server to host the operational database and the reporting data warehouse database. The complexity increases when high availability and disaster recovery are required, and there are machines in multiple sites, untrusted systems, and other complex design considerations.

Limited connectivity - no internet or isolated network
Azure Monitor: No.
Operations Manager: Yes.

Limited connectivity - controlled internet access
Azure Monitor: Yes.
Operations Manager: Yes.

Limited connectivity - frequently disconnected
Azure Monitor: Yes.
Operations Manager: Yes.

Configurable health monitoring
Azure Monitor: No.
Operations Manager: Yes.

Web app availability test (isolated network)
Azure Monitor: Yes, limited. Azure Monitor has limited support in this area and requires custom firewall exceptions.
Operations Manager: Yes.

Web app availability test (globally distributed)
Azure Monitor: No.
Operations Manager: Yes.

Monitor VM workloads
Azure Monitor: Yes, limited. Can collect IIS and SQL Server error logs, Windows events, and performance counters. Requires creating custom queries, alerts, and visualizations.
Operations Manager: Yes. Supports monitoring most of the server workloads with available management packs. Requires either the Log Analytics Windows agent or the Operations Manager agent on the VM, reporting back to the management group on the corporate network.

Monitor Azure IaaS
Azure Monitor: Yes.
Operations Manager: Yes. Supports monitoring most of the infrastructure from the corporate network. Tracks availability state, metrics, and alerts for Azure VMs, SQL, and storage via the Azure management pack.

Monitor Azure PaaS
Azure Monitor: Yes.
Operations Manager: Yes, limited. Based on what's supported in the Azure management pack.

Azure service monitoring
Azure Monitor: Yes.
Operations Manager: Yes. Although there's no native monitoring of Azure service health provided today through a management pack, you can create custom workflows to query Azure service health alerts. Use the Azure REST API to get alerts through your existing notifications.

Modern web application monitoring
Azure Monitor: Yes.
Operations Manager: No.

Legacy web application monitoring
Azure Monitor: Yes, limited; varies by SDK. Supports monitoring older versions of .NET and Java web applications.
Operations Manager: Yes, limited.

Monitor Azure Kubernetes Service containers
Azure Monitor: Yes.
Operations Manager: No.

Monitor Docker or Windows containers
Azure Monitor: Yes.
Operations Manager: No.

Network performance monitoring
Azure Monitor: Yes.
Operations Manager: Yes, limited. Supports availability checks, and collects basic statistics from network devices by using the Simple Network Management Protocol (SNMP) from the corporate network.

Interactive data analysis
Azure Monitor: Yes.
Operations Manager: No. Relies on SQL Server Reporting Services canned or custom reports, third-party visualization solutions, or a custom Power BI implementation. There are scale and performance limitations with the Operations Manager data warehouse. Integrate with Azure Monitor logs as an alternative for data aggregation requirements. You achieve integration by configuring the Log Analytics connector.

End-to-end diagnostics, root-cause analysis, and timely troubleshooting
Azure Monitor: Yes.
Operations Manager: Yes, limited. Supports end-to-end diagnostics and troubleshooting only for on-premises infrastructure and applications. Uses other System Center components or partner solutions.

Interactive visualizations (dashboards)
Azure Monitor: Yes.
Operations Manager: Yes, limited. Delivers essential dashboards with its HTML5 web console or an advanced experience from partner solutions, such as Squared Up and Savision.

Integration with IT or DevOps tools
Azure Monitor: Yes.
Operations Manager: Yes, limited.
Collect and stream monitoring data to third-party or on-premises tools


To collect metrics and logs from Azure infrastructure and platform resources, you need to enable Azure
Diagnostics logs for those resources. Additionally, with Azure VMs, you can collect metrics and logs from the
guest OS by enabling the Azure Diagnostics extension. To forward the diagnostics data that's emitted from your
Azure resources to your on-premises tools or managed service provider, configure Event Hubs to stream the data
to them.
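As a sketch of that configuration, the following PowerShell enables a diagnostic setting that streams one resource's logs to an event hub. It assumes the Az.Monitor module and an existing Event Hubs namespace authorization rule; the resource IDs are placeholders, and the parameter names should be checked against your module version.

#Stream a resource's diagnostics to an event hub so an on-premises or third-party tool can consume them.
$resourceId = "/subscriptions/<subscription ID>/resourceGroups/my-rg/providers/Microsoft.Network/networkSecurityGroups/my-nsg"
$eventHubRuleId = "/subscriptions/<subscription ID>/resourceGroups/my-rg/providers/Microsoft.EventHub/namespaces/my-namespace/authorizationRules/RootManageSharedAccessKey"

Set-AzDiagnosticSetting -ResourceId $resourceId -EventHubAuthorizationRuleId $eventHubRuleId -Enabled $true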
Monitor with System Center Operations Manager
Although System Center Operations Manager was originally designed as an on-premises solution to monitor
across applications, workloads, and infrastructure components that are running in your IT environment, it evolved
to include cloud-monitoring capabilities. It integrates with Azure, Office 365, and Amazon Web Services (AWS). It
can monitor across these diverse environments with management packs that are designed and updated to support
them.
For customers who have made significant investments in Operations Manager to achieve comprehensive
monitoring that's tightly integrated with their IT service management processes and tools, or for customers new to
Azure, it's understandable to ask the following questions:
Can Operations Manager continue to deliver value, and does it make business sense?
Do the features in Operations Manager make it the right fit for our IT organization?
Does integrating Operations Manager with Azure Monitor provide the cost-effective and comprehensive
monitoring solution that we require?
If you've already invested in Operations Manager, you don't need to focus on planning a migration to replace it
immediately. With Azure or other cloud providers that exist as an extension of your own on-premises network,
Operations Manager can monitor the guest VMs and Azure resources as if they were on your corporate network.
This approach requires a reliable network connection between your network and the Azure virtual network that
has sufficient bandwidth.
To monitor the workloads that are running in Azure, you need:
The Management Pack for Azure. It collects performance metrics emitted by Azure services such as web
and worker roles, Application Insights availability tests (web tests), Azure Service Bus, and so on. The
management pack uses the Azure REST API to monitor the availability and performance of these resources.
Some Azure service types have no metrics or predefined monitors in the Management Pack, but you can
still monitor them through the relationships defined in the Azure Management Pack for discovered
services.
The Management Pack for Azure SQL Database to monitor the availability and performance of Azure SQL
databases and Azure SQL database servers using the Azure REST API and T-SQL queries to SQL Server
system views.
To monitor the guest OS and workloads that are running on the VM, such as SQL Server, IIS, or Apache
Tomcat, you need to download and import the management pack that supports the application, service, and
OS.
Knowledge is defined in the management pack, which describes how to monitor the individual dependencies and
components. Both Azure management packs require performing a set of configuration steps in Azure and
Operations Manager before you can begin monitoring these resources.
At the application tier, Operations Manager offers basic application performance monitoring capabilities for some
legacy versions of .NET and Java. If certain applications within your hybrid cloud environment operate in an
offline or network-isolated mode, such that they can't communicate with a public cloud service, Operations
Manager Application Performance Monitoring (APM) might be a viable option for certain limited scenarios. For
applications that are not running on legacy platforms but are hosted both on-premises and in any public cloud
that allows communication through a firewall (either direct or via a proxy) to Azure, use Azure Monitor Application
Insights. This service offers deep, code-level monitoring, with first-class support for ASP.NET, ASP.NET Core,
Java, JavaScript, and Node.js.
For any web application that can be reached externally, you should enable a type of synthetic transaction known as
availability monitoring. It's important to know whether your application or a critical HTTP/HTTPS endpoint that
your app relies on, is available and responsive. With Application Insights availability monitoring, you can run tests
from multiple Azure datacenters and provide insight into the health of your application from a global perspective.
Although Operations Manager is capable of monitoring resources that are hosted in Azure, there are several
advantages to including Azure Monitor, because its strengths overcome the limitations in Operations Manager
and can establish a strong foundation to support eventual migration from it. Here we review each of those
strengths and weaknesses, with our recommendation to include Azure Monitor in your hybrid monitoring
strategy.
Disadvantages of using Operations Manager by itself
Analyzing monitoring data in Operations Manager is commonly performed by using predefined views that
are provided by management packs accessed from the console, from SQL Server Reporting Services
(SSRS) reports, or from custom views that end users have created. Ad hoc data analysis isn't possible out of
the box. Operations Manager reporting is inflexible. The data warehouse that provides long-term retention
of the monitoring data doesn't scale or perform well. And expertise in writing T-SQL statements,
developing a Power BI solution, or using third-party solutions is required to support the requirements for
the various personas in the IT organization.
Alerting in Operations Manager doesn't support complex expressions or include correlation logic. To help
reduce noise, alerts are grouped to show the relationships between them and to identify their causes.
Advantages of using Operations Manager with Azure Monitor
Azure Monitor is the way to work around the limitations of Operations Manager. It complements the
Operations Manager data warehouse database by collecting important performance and log data. Azure
Monitor delivers better analytics, performance (when querying large data volume), and retention than the
Operations Manager data warehouse.
With the Azure Monitor query language, you can create much more complex and sophisticated queries. You
can run queries across terabytes of data in seconds. You can quickly transform your data into pie charts,
time charts, and many other visualizations. To analyze this data, you're no longer constrained by working
with Operations Manager reports that are based on SQL Server Reporting Services, custom SQL queries,
or other workarounds.
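For example, a query like the following (a sketch that assumes the Perf table is being populated by the Log Analytics agent or Azure Monitor for VMs) summarizes a week of processor data and renders it as a time chart:

Perf
| where ObjectName == "Processor" and CounterName == "% Processor Time"
| where TimeGenerated > ago(7d)
| summarize avg(CounterValue) by bin(TimeGenerated, 1h), Computer
| render timechart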
You can deliver an improved alerting experience by implementing the Azure Monitor Alerts Management
solution. Alerts that are generated in the Operations Manager management group can be forwarded to the
Log Analytics workspace in Azure Monitor. You can configure the subscription that's responsible for
forwarding alerts from Operations Manager to Azure Monitor logs to forward only certain alerts. For
example, you can forward only alerts that meet your criteria for querying in support of problem
management for trends, and investigation of the root cause of failures or problems, through a single pane
of glass. Additionally, you can correlate other log data from Application Insights or other sources, to gain
insight that help improve user experience, increase uptime, and reduce time to resolve incidents.
You can monitor cloud-native infrastructure and applications, from a simple or multitier architecture in
Azure, and you can use Operations Manager to monitor on-premises infrastructure. This monitoring
includes one or more VMs, multiple VMs placed in an availability set or virtual machine scale set, or a
containerized application deployed to Azure Kubernetes Service (AKS) that's running on Windows Server
or Linux containers.
You can use the System Center Operations Manager Health Check solution to proactively assess the risk
and health of your System Center Operations Manager management group at regular intervals. This
solution can replace or complement any custom functionality you have added to your management group.
By using the Map feature of Azure Monitor for VMs, you can monitor standard connectivity metrics from
network connections between your Azure VMs and on-premises VMs. These metrics include response time,
requests per minute, traffic throughput, and links. You can identify failed connections, troubleshoot, perform
migration validation, perform security analysis, and verify the overall architecture of the service. Map can
automatically discover application components on Windows and Linux systems, and map the
communication between services. This automation helps you identify connections and dependencies you
were unaware of, plan and validate migration to Azure, and minimize speculation during incident resolution.
By using Network Performance Monitor, you can monitor the network connectivity between:
Your corporate network and Azure.
Mission-critical multitier applications and microservices.
User locations and web-based applications (HTTP/HTTPS ).
This strategy delivers visibility of the network layer, without the need for SNMP. It can also present, in an
interactive topology map, the hop-by-hop topology of routes between the source and destination endpoint. It's a
better choice than attempting to accomplish the same result with network monitoring in Operations Manager or
with other network monitoring tools currently used in your environment.
Monitor with Azure Monitor
Although a migration to the cloud presents numerous challenges, it also includes a number of opportunities. It
enables your organization to migrate from one or more on-premises enterprise monitoring tools to not only
potentially reduce capital expenditures and operating costs, but also to benefit from the advantages that a cloud
monitoring platform such as Azure Monitor can deliver at cloud scale. Examine your monitoring and alerting
requirements, configuration of existing monitoring tools, and workloads transitioning to the cloud. After your plan
is finalized, configure Azure Monitor.
Monitor the hybrid infrastructure and applications, from a simple or multitier architecture where
components are hosted between Azure, other cloud providers, and your corporate network. The
components might include one or more VMs, multiple VMs placed in an availability set or virtual machine
scale set, or a containerized application that's deployed to Azure Kubernetes Service (AKS) running on
Windows Server or Linux containers.
Enable Azure Monitor for VMs, Azure Monitor for containers, and Application Insights to detect and
diagnose issues between infrastructure and applications. For a more thorough analysis and correlation of
data collected from the multiple components or dependencies supporting the application, you need to use
Azure Monitor logs.
Create intelligent alerts that apply to a core set of applications and service components, help reduce alert
noise with dynamic thresholds for complex signals, and use alert aggregation based on machine learning
algorithms to help identify the issue quickly.
Define a library of queries and dashboards to support the requirements of the various personas in the IT
organization. (A minimal example of running such a query library as code appears after this list.)
Define standards and methods for enabling monitoring across the hybrid and cloud resources, a
monitoring baseline for each resource, alert thresholds, and so on.
Configure role-based access control (RBAC) so you grant users and groups only the amount of access they
need to monitor data from resources they are responsible for managing.
Include automation and self-service to enable each team to create, enable, and tune their monitoring and
alerting configurations as needed.
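For example, a shared library of queries can be kept as code and run against a Log Analytics workspace. The following Python sketch assumes the azure-monitor-query and azure-identity packages; the workspace ID is a placeholder, the query names are invented for illustration, and the table names (Heartbeat, Event) depend on what data your workspace actually collects, so verify them and the SDK signatures before use.

# A minimal sketch: keep a small, named library of Kusto queries and run one against
# a Log Analytics workspace. Assumes azure-monitor-query and azure-identity; the
# workspace ID is a placeholder and the query names are illustrative only.
from datetime import timedelta

from azure.identity import DefaultAzureCredential
from azure.monitor.query import LogsQueryClient

WORKSPACE_ID = "<log-analytics-workspace-id>"  # placeholder

# A shared "query library" that different personas (service owner, support team) can reuse.
QUERY_LIBRARY = {
    "vm_heartbeat_gaps": """
        Heartbeat
        | summarize LastHeartbeat = max(TimeGenerated) by Computer
        | where LastHeartbeat < ago(10m)
    """,
    "error_events_by_computer": """
        Event
        | where EventLevelName == "Error"
        | summarize ErrorCount = count() by Computer
        | order by ErrorCount desc
    """,
}

def run_query(name: str, timespan: timedelta = timedelta(hours=1)) -> None:
    client = LogsQueryClient(DefaultAzureCredential())
    response = client.query_workspace(WORKSPACE_ID, QUERY_LIBRARY[name], timespan=timespan)
    # On a successful (non-partial) query the result exposes the returned tables.
    for table in response.tables:
        for row in table.rows:
            print(row)

if __name__ == "__main__":
    run_query("vm_heartbeat_gaps")

Keeping queries in a versioned library like this also supports the self-service goal: each team can add or tune its own queries without changing the shared monitoring baseline.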

Private cloud monitoring


You can achieve holistic monitoring of Azure Stack with System Center Operations Manager. Specifically, you can
monitor the workloads running in the tenant, at the resource level on the virtual machines, and the
infrastructure hosting Azure Stack (physical servers and network switches).
You can also achieve holistic monitoring with a combination of infrastructure monitoring capabilities that are
included in Azure Stack. These capabilities help you view health and alerts for an Azure Stack region and the Azure
Monitor service in Azure Stack, which provides base-level infrastructure metrics and logs for most services.
If you've already invested in Operations Manager, use the Azure Stack management pack to monitor the
availability and health state of Azure Stack deployments, including regions, resource providers, updates, update
runs, scale units, unit nodes, infrastructure roles, and their instances (logical entities comprised of the hardware
resources). This management pack uses the Health and Update resource provider REST APIs to communicate
with Azure Stack. To monitor physical servers and storage devices, use the OEM vendors' management pack (for
example, provided by Lenovo, Hewlett Packard, or Dell). Operations Manager can natively monitor the network
switches to collect basic statistics by using SNMP. Monitoring the tenant workloads is possible with the Azure
management pack by following two basic steps. Configure the subscription that you want to monitor, and then add
the monitors for that subscription.

Next steps
Collect the right data
Cloud monitoring guide: Collect the right data

This article describes some considerations for collecting monitoring data in a cloud application.
To observe the health and availability of your cloud solution, you must configure the monitoring tools to collect a
level of signals that are based on predictable failure states. These signals are the symptoms of the failure, not the
cause. The monitoring tools use metrics and, for advanced diagnostics and root cause analysis, logs.
Plan for monitoring and migration carefully. Start by including the monitoring service owner, the manager of
operations, and other related personnel during the planning phase, and continue engaging them throughout the
development and release cycle. Their focus will be to develop a monitoring configuration that's based on the
following criteria:
What's the composition of the service, and are those dependencies monitored today? If so, are there multiple
tools involved? Is there an opportunity to consolidate, without introducing risks?
What is the SLA of the service, and how will I measure and report it?
What should the service dashboard look like when an incident is raised? What should the dashboard look like
for the service owner, and for the team that supports the service?
What metrics does the resource produce that I need to monitor?
How will the service owner, support teams, and other personnel be searching the logs?
How you answer those questions, and the criteria for alerting, determines how you'll use the monitoring platform.
If you're migrating from an existing monitoring platform or set of monitoring tools, use the migration as an
opportunity to reevaluate the signals you collect. This is especially true now that there are several cost factors to
consider when you migrate or integrate with a cloud-based monitoring platform like Azure Monitor. Remember,
monitoring data needs to be actionable. The data you collect should be optimized to give you a "10,000-foot
view" of the overall health of the service, and the instrumentation that's defined to identify real incidents should
be as simple, predictable, and reliable as possible.
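One practical way to make those answers concrete is to capture them as a configuration artifact before wiring anything into a tool. The sketch below is purely conceptual: the component names, failure states, signal names, and thresholds are invented for illustration, not drawn from any Azure Monitor schema.

# Conceptual only: one way to record "predictable failure states" and the signals that
# reveal them, before implementing the configuration in a monitoring tool. All names
# and thresholds below are illustrative.
MONITORING_CONFIG = {
    "web-frontend": [
        {
            "failure_state": "Requests time out under load",      # the symptom to alert on
            "signal": "metric: request duration (p95)",
            "threshold": "> 2.5 seconds for 10 minutes",
            "diagnostic_logs": ["application traces", "web server access logs"],
        },
    ],
    "order-database": [
        {
            "failure_state": "Sustained CPU pressure from a query",
            "signal": "metric: cpu_percent",
            "threshold": "> 90% for 15 minutes",
            "diagnostic_logs": ["query runtime statistics"],
        },
    ],
}

def components_missing_diagnostics(config: dict) -> list[str]:
    """Return components that define a failure state but map no diagnostic logs to it."""
    return [
        name
        for name, states in config.items()
        if any(not state["diagnostic_logs"] for state in states)
    ]

print(components_missing_diagnostics(MONITORING_CONFIG))  # [] when every symptom has supporting logs

A simple check like this keeps the focus on symptoms (metrics for alerting) backed by the logs needed for root cause analysis, which is the balance this article recommends.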

Develop a monitoring configuration


The monitoring service owner and team typically follow a common set of activities to develop a monitoring
configuration. These activities start at the initial planning stages, continue through testing and validating in a
nonproduction environment, and extend to deploying into production. Monitoring configurations are derived from
known failure modes, test results of simulated failures, and the experience of several people in the organization
(the service desk, operations, engineers, and developers). Such configurations assume that the service already
exists, it's being migrated to the cloud, and it hasn't been rearchitected.
For service-level quality results, monitor the health and availability of these services early in the development
process. If monitoring the design of a service or application is treated as an afterthought, your results won't be as
successful.
To drive quicker resolution of the incident, consider the following recommendations:
Define a dashboard for each service component.
Use metrics to help guide further diagnosis and to identify a resolution or workaround of the issue if a root
cause can't be uncovered.
Use dashboard drill-down capabilities, or support customizing the view to refine it.
If you need verbose logs, metrics should have helped target the search criteria. If the metrics didn't help,
improve them for the next incident. (A short sketch of metric-guided log searching follows below.)
Embracing this guiding set of principles can help give you near-real-time insights, as well as better management of
your service.
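To illustrate the "metrics target the log search" recommendation, the following Python sketch first uses a platform metric to find when a VM's CPU spiked, then runs a log query scoped to that window. It assumes the azure-monitor-query and azure-identity packages; the resource URI, workspace ID, metric name, and thresholds are placeholders, and the SDK signatures should be verified against the current package version.

# A minimal sketch of metric-guided log searching. Assumes azure-monitor-query and
# azure-identity; the resource URI and workspace ID are placeholders.
from datetime import timedelta

from azure.identity import DefaultAzureCredential
from azure.monitor.query import LogsQueryClient, MetricsQueryClient, MetricAggregationType

credential = DefaultAzureCredential()
VM_RESOURCE_URI = "<full resource ID of the VM>"    # placeholder
WORKSPACE_ID = "<log-analytics-workspace-id>"       # placeholder

# 1. Use a metric to locate the symptom (cheap, near real time).
metrics_client = MetricsQueryClient(credential)
result = metrics_client.query_resource(
    VM_RESOURCE_URI,
    metric_names=["Percentage CPU"],
    timespan=timedelta(hours=4),
    granularity=timedelta(minutes=5),
    aggregations=[MetricAggregationType.AVERAGE],
)
spikes = [
    point.timestamp
    for metric in result.metrics
    for series in metric.timeseries
    for point in series.data
    if (point.average or 0) > 90
]

# 2. Use the spike window to scope the more expensive log query.
if spikes:
    start, end = min(spikes), max(spikes) + timedelta(minutes=5)
    logs_client = LogsQueryClient(credential)
    response = logs_client.query_workspace(
        WORKSPACE_ID,
        "Event | where EventLevelName == 'Error'",
        timespan=(start, end),
    )
    for table in response.tables:
        for row in table.rows:
            print(row)

The point of the pattern is the order of operations: metrics narrow the time window, so the verbose log search stays small and fast.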

Next steps
Alerting strategy
Cloud monitoring guide: Alerting

For years, IT organizations have struggled to combat the alert fatigue that's created by the monitoring tools
deployed in the enterprise. Many systems generate a high volume of alerts often considered meaningless, while
other alerts are relevant but are either overlooked or ignored. As a result, IT and developer operations have
struggled to meet the service-level quality promised to internal or external customers. To ensure reliability, it's
essential to understand the state of your infrastructure and applications. To minimize service degradation and
disruption, or to decrease the effect of or reduce the number of incidents, you need to identify causes quickly.

Successful alerting strategy


You can't fix what you don't know is broken.
Alerting on what matters is critical. It's underpinned by collecting and measuring the right metrics and logs. You
also need a monitoring tool capable of storing, aggregating, visualizing, analyzing, and initiating an automated
response when conditions are met. You can improve the observability of your services and applications only if you
fully understand their composition. You map that composition into a detailed monitoring configuration to be applied
by the monitoring platform. This configuration includes the predictable failure states (the symptoms, not the cause
of the failure) that make sense to alert for.
Consider the following principles for determining whether a symptom is an appropriate candidate for alerting:
Does it matter? Is the issue symptomatic of a real problem or issue influencing the overall health of the
application? For example, do you care whether the CPU utilization is high on the resource? Or that a particular
SQL query running on a SQL database instance on that resource is consuming high CPU utilization over a
sustained period? Because the CPU utilization condition is a real issue, you should alert on it. But you don't
need to notify the team, because it doesn't help point to what is causing the condition in the first place. Alerting
and notifying on the SQL query process utilization issue is both relevant and actionable.
Is it urgent? Is the issue real, and does it need urgent attention? If so, the responsible team should be
immediately notified.
Are your customers affected? Are users of the service or application affected as a result of the issue?
Are other dependent systems affected? Are there alerts from dependencies that are interrelated, and that
can possibly be correlated to avoid notifying different teams all working on the same problem?
Ask these questions when you're initially developing a monitoring configuration. Test and validate the assumptions
in a nonproduction environment, and then deploy into production. Monitoring configurations are derived from
known failure modes, test results of simulated failures, and experience from different members of the team.
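One lightweight way to make these questions repeatable is to capture them as a checklist that every proposed alert must pass before it's added to the configuration. The sketch below is purely illustrative; the field names are assumptions for this example, not part of any Azure Monitor feature or API.

# Conceptual checklist, not an Azure Monitor feature: score a proposed alert against
# the questions above before adding it to the monitoring configuration.
def is_alert_candidate(symptom: dict) -> bool:
    """Return True only if the symptom is worth alerting (and notifying) on."""
    return (
        symptom.get("matters", False)              # symptomatic of a real problem
        and symptom.get("actionable", False)       # points toward a cause or fix
        and (
            symptom.get("urgent", False)
            or symptom.get("customers_affected", False)
            or symptom.get("dependencies_affected", False)
        )
    )

proposed = {
    "name": "SQL query consuming high CPU for a sustained period",
    "matters": True,
    "actionable": True,
    "urgent": True,
    "customers_affected": True,
    "dependencies_affected": False,
}
print(is_alert_candidate(proposed))  # True: relevant and actionable, so alert and notify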
After the release of your monitoring configuration, you can learn a lot about what's working and what's not.
Consider high alert volume, issues unnoticed by monitoring but noticed by end users, and what were the best
actions to have taken as part of this evaluation. Identify changes to implement to improve service delivery, as part
of an ongoing, continuous monitoring improvement process. It's not just about evaluating alert noise or missed
alerts, but also the effectiveness of how you're monitoring the workload. It's about the effectiveness of your alert
policies, process, and overall culture to determine whether you're improving.
Both System Center Operations Manager and Azure Monitor support alerts based on static or even dynamic
thresholds, with actions set up on top of them. Examples include email, SMS, and voice-call notifications for
simple scenarios. Both of these services also support IT service management (ITSM) integration, to automate the
creation of incident records and escalate to the correct support team, or any other alert management system that
uses a webhook.
When possible, you can use any of several services to automate recovery actions. These include System Center
Orchestrator, Azure Automation, Azure Logic Apps, or autoscaling in the case of elastic workloads. While notifying
the responsible teams is the most common action for alerting, automating corrective actions might also be
appropriate. This automation can help streamline the entire incident management process. Automating these
recovery tasks can also reduce the risk of human error.
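For example, an alert's action group can call a webhook, and a small piece of automation behind that webhook can attempt a first-line recovery before engaging people. The following Python sketch is illustrative only: the payload fields loosely follow the shape of the Azure Monitor common alert schema, the restart uses the azure-mgmt-compute package, and both should be validated against real alert payloads and the current SDK before use.

# A hedged sketch of a webhook-driven recovery action. Assumes azure-identity and
# azure-mgmt-compute; the payload shape approximates the common alert schema.
from azure.identity import DefaultAzureCredential
from azure.mgmt.compute import ComputeManagementClient

SUBSCRIPTION_ID = "<subscription-id>"  # placeholder

def handle_alert(payload: dict) -> None:
    """Attempt an automated restart for a fired VM alert; notification stays with the action group."""
    essentials = payload.get("data", {}).get("essentials", {})
    if essentials.get("monitorCondition") != "Fired":
        return  # ignore "Resolved" notifications

    client = ComputeManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)
    for resource_id in essentials.get("alertTargetIDs", []):
        parts = resource_id.split("/")
        # .../resourceGroups/<rg>/providers/Microsoft.Compute/virtualMachines/<name>
        resource_group, vm_name = parts[4], parts[-1]
        poller = client.virtual_machines.begin_restart(resource_group, vm_name)
        poller.wait()  # block until the restart completes, or raise on failure

The same handler could just as easily run in Azure Functions or be replaced by an Azure Automation runbook or a Logic App; the important design choice is that the recovery step is codified, repeatable, and auditable rather than performed by hand.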

Azure Monitor alerting


If you're using Azure Monitor exclusively, follow these guidelines as you consider speed, cost, and storage volume.
Depending on the feature and configuration you're using, you can store monitoring data in any of six repositories:
Azure Monitor metrics database: A time-series database used primarily for Azure Monitor platform
metrics, but also has Application Insights metric data mirrored into it. Information entering this database
has the fastest alert times.
Application Insights logs store: A database that stores most Application Insights telemetry in log form.
Azure Monitor logs store: The primary store for Azure log data. Other tools can route data to it, and that
data can be analyzed in Azure Monitor logs. Because of ingestion and indexing, log alert queries have higher
latency. This latency is generally 5-10 minutes, but can be higher under certain circumstances.
Activity log store: Used for all activity log and Service Health events. Dedicated alerting is possible. Holds
subscription-level events that occur on objects in your subscription, as seen from outside those objects. An
example might be when a policy is set or a resource is accessed or deleted.
Azure Storage: General-purpose storage that's supported by Azure Diagnostics and other monitoring
tools. It's a low-cost option for long-term retention of monitoring telemetry. Alerting isn't supported from
data that's stored in this service.
Event Hubs: Generally used to stream data into on-premises or other partners' monitoring or ITSM tools.
Azure Monitor has four types of alerts, each somewhat tied to the repository that the data is stored in:
Metric alert: Alerts on data in the Azure Monitor metrics database. Alerts occur when a monitored value
crosses a user-defined threshold, and then again when it returns to a "normal" state.
Log query alert: Alerts on content in the Application Insights or Azure Monitor logs stores. It can also
alert based on cross-workspace queries.
Activity log alert: Alerts on items in the activity log store, with the exception of Service Health data.
Service Health alert: A special type of alert that's used only for Service Health issues that come from the
activity log store, such as outages and upcoming planned maintenance. Note that this type of alert is
configured through Azure Service Health, a companion service to Azure Monitor.
Enable alerting through partner tools
If you're using an external alerting solution, route as much as you can through Azure Event Hubs, which is the
fastest path out of Azure Monitor. You'll have to pay for ingestion into Event Hubs. If cost is an issue and speed
isn't, you can use Azure Storage as a less expensive alternative. Just make sure that your monitoring or ITSM tools
can read Azure Storage to extract the data.
Azure Monitor includes support for integrating with other monitoring platforms, and ITSM software such as
ServiceNow. You can use Azure alerting and still trigger actions outside of Azure, as required by your incident
management or DevOps process. If you want to alert in Azure Monitor and automate the response, you can
initiate automated actions by using Azure Functions, Azure Logic Apps, or Azure Automation, based on your
scenario and requirements.
Specialized Azure monitoring offerings
Management solutions generally store their data in the Azure logs store. The two exceptions are Azure Monitor for
VMs and Azure Monitor for containers. The following table describes the alerting experience based on the
particular data type and where it is stored.

SOLUTION | DATA TYPE | ALERT BEHAVIOR
Azure Monitor for containers | Calculated average performance data from nodes and pods are written to the metrics store. | Create metric alerts if you want to be alerted based on variation of measured utilization performance, aggregated over time.
Azure Monitor for containers | Calculated performance data that uses percentiles from nodes, controllers, containers, and pods are written to the logs store. Container logs and inventory information are also written to the logs store. | Create log query alerts if you want to be alerted based on variation of measured utilization from clusters and containers. Log query alerts can also be configured based on pod-phase counts and status node counts.
Azure Monitor for VMs | Health criteria are metrics written to the metrics store. | Alerts are generated when the health state changes from healthy to unhealthy. This alert supports only Action Groups that are configured to send SMS or email notifications.
Azure Monitor for VMs | Map and guest operating system performance log data is written to the logs store. | Create log query alerts.

Fastest speed driven by cost


Latency is one of the most critical factors in alerting and in quickly resolving issues that affect your service.
If you require near-real-time alerting (under five minutes), first evaluate whether you can alert on your
telemetry where it's stored by default. In general, this strategy is also the cheapest option, because the tool you're
using is already sending its data to that location.
That said, there are some important footnotes to this rule.
Guest OS telemetry has a number of paths to get into the system.
The fastest way to alert on this data is to import it as custom metrics. Do this by using the Azure
Diagnostics extension and then using a metric alert. However, custom metrics are currently in preview and
are more expensive than other options.
The least expensive but slowest method is to send it to the Azure logs Kusto store. Running the Log
Analytics agent on the VM is the best way to get all guest operating system metric and log data into this
store.
You can send it to both stores by running both the extension and the agent on the same VM. You can then
alert quickly but also use the guest operating system data as part of more complex searches when you
combine it with other telemetry.
Importing data from on-premises: If you're trying to query and monitor across machines running in Azure and
on-premises, you can use the Log Analytics agent to collect guest operating system data. You can then use a
feature called logs to metrics to streamline the metrics into the metrics store. This method bypasses part of the
ingestion process into the Azure logs store, and the data is thus available sooner in the metrics database.
Minimize alerts
If you use a solution such as Azure Monitor for VMs and find the default health criteria that monitors performance
utilization acceptable, don't create overlapping metric or log query alerts based on the same performance
counters.
If you aren't using Azure Monitor for VMs, make the job of creating alerts and managing notifications easier by
exploring the following features:

NOTE
These features apply only to metric alerts, alerts based on data that's being sent to the Azure Monitor metric database. The
features don't apply to the other types of alerts. As mentioned previously, the primary objective of metric alerts is speed. If
getting an alert in less than five minutes isn't of primary concern, you can use a log query alert instead.

Dynamic thresholds: Dynamic thresholds look at the activity of the resource over a time period, and create
upper and lower "normal behavior" thresholds. When the metric being monitored falls outside of these
thresholds, you get an alert. (A conceptual sketch of this idea appears after this list.)
Multisignal alerts: You can create a metric alert that uses the combination of two different inputs from two
different resource types. For example, if you want to fire an alert when the CPU utilization of a VM is over
90 percent, and the number of messages in a certain Azure Service Bus queue feeding that VM exceeds a
certain amount, you can do so without creating a log query. This feature works for only two signals. If you
have a more complex query, feed your metric data into the Azure Monitor log store, and use a log query.
Multiresource alerts: Azure Monitor allows a single metric alert rule that applies to all VM resources. This
feature can save you time because you don't need to create individual alerts for each VM. Pricing for this
type of alert is the same. Whether you create 50 alerts for monitoring CPU utilization for 50 VMs, or one
alert that monitors CPU utilization for all 50 VMs, it costs you the same amount. You can use these types of
alerts in combination with dynamic thresholds as well.
Used together, these features can save time by minimizing alert notifications and the management of the
underlying alerts.
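To build intuition for how dynamic thresholds behave, the following sketch computes simple upper and lower bands from recent history and flags values that fall outside them. This is a conceptual illustration only; Azure Monitor's actual dynamic thresholds use the service's own machine learning models, not this logic, and the sample values are invented.

# Conceptual only: derive "normal behavior" bands from recent history and flag
# breaches. This illustrates alerting on deviation rather than a fixed number.
from statistics import mean, stdev

def dynamic_bands(history: list[float], sensitivity: float = 3.0) -> tuple[float, float]:
    mu, sigma = mean(history), stdev(history)
    return mu - sensitivity * sigma, mu + sensitivity * sigma

history = [41, 44, 39, 42, 40, 43, 45, 38, 41, 42]   # e.g., recent CPU % samples
lower, upper = dynamic_bands(history)

latest = 78.0
if not (lower <= latest <= upper):
    print(f"Value {latest} is outside the learned band ({lower:.1f}-{upper:.1f}); raise an alert.")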
Alerts limitations
Be sure to note the limitations on the number of alerts you can create. Some limits (but not all of them) can be
increased by calling support.
Best query experience
If you're looking for trends across all your data, it makes sense to import all your data into Azure Logs, unless it's
already in Application Insights. You can create queries across both workspaces, so there's no need to move data
between them. You can also import activity log and Service Health data into your Log Analytics workspace. You
pay for this ingestion and storage, but you get all your data in one place for analysis and querying. This approach
also gives you the ability to create complex query conditions and alert on them.
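As an example of querying across workspaces without moving data, the following sketch runs a single query against a primary workspace plus an additional workspace. It assumes the azure-monitor-query and azure-identity packages; the workspace IDs are placeholders, and the additional_workspaces parameter should be confirmed against the current SDK version.

# A minimal cross-workspace query sketch. Assumes azure-monitor-query and
# azure-identity; workspace IDs are placeholders.
from datetime import timedelta

from azure.identity import DefaultAzureCredential
from azure.monitor.query import LogsQueryClient

PRIMARY_WORKSPACE = "<primary-workspace-id>"        # placeholder
OTHER_WORKSPACES = ["<second-workspace-id>"]        # placeholder

client = LogsQueryClient(DefaultAzureCredential())
response = client.query_workspace(
    PRIMARY_WORKSPACE,
    "Heartbeat | summarize count() by Computer",
    timespan=timedelta(days=1),
    additional_workspaces=OTHER_WORKSPACES,
)
for table in response.tables:
    for row in table.rows:
        print(row)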
Cloud monitoring guide: Monitoring platforms
overview

Microsoft provides a range of monitoring capabilities from two products: System Center Operations Manager,
which was designed for on-premises and then extended to the cloud, and Azure Monitor, which was designed for
the cloud but can also monitor on-premises systems. These two offerings deliver core monitoring services, such as
alerting, service uptime tracking, application and infrastructure health monitoring, diagnostics, and analytics.
Many organizations are embracing the latest practices for DevOps agility and cloud innovations to manage their
heterogenous environments. Yet they are also concerned about their ability to make appropriate and responsible
decisions about how to monitor those workloads.
This article provides a high-level overview of our monitoring platforms to help you understand how each delivers
core monitoring functionality.

The story of System Center Operations Manager


In 2000, we entered the operations management field with Microsoft Operations Manager (MOM) 2000. In 2007,
we introduced a reengineered version of the product, named System Center Operations Manager. It moved
beyond simple monitoring of a Windows server and concentrated on robust, end-to-end service and application
monitoring, including heterogenous platforms, network devices, and other application or service dependencies. It's
an established, enterprise-grade monitoring platform for on-premises environments, in the same class as IBM
Tivoli or HP Operations Manager in the industry. It has grown to support monitoring compute and platform
resources running in Azure, Amazon Web Services (AWS), and other cloud providers.

The story of Azure Monitor


When Azure was released in 2010, monitoring of cloud services was provided with the Azure Diagnostics agent,
which provided a way to collect diagnostics data from Azure resources. This capability was considered a general
monitoring tool rather than an enterprise-class monitoring platform.
Application Insights was introduced in response to changes in the industry, where the proliferation of cloud,
mobile, and IoT devices was growing and DevOps practices were being introduced. It grew from Application
Performance Monitoring in Operations Manager into a service in Azure, where it delivers rich monitoring of web
applications written in a variety of languages. In 2015, the preview of Application Insights for Visual Studio was
announced; later, it became known simply as Application Insights. It collects details about application
performance, requests and exceptions, and traces.
In 2015, Azure Operational Insights was made generally available. It delivered the Log Analytics analysis service
that collected and searched data from machines in Azure, on-premises, or other cloud environments, and
connected to System Center Operations Manager. Intelligence packs were offered that delivered a variety of
prepackaged management and monitoring configurations that contained a collection of query and analytic logic,
visualizations, and data collection rules for such scenarios as security auditing, health assessments, and alert
management. Later, Azure Operational Insights became known as Log Analytics.
In 2016, the preview of Azure Monitor was announced at the Microsoft Ignite conference. It provided a common
framework to collect platform metrics, resource diagnostics logs, and subscription-level activity log events from
any Azure service that started using the framework. Previously, each Azure service had its own monitoring
method.
At the 2018 Ignite conference, we announced that the Azure Monitor brand expanded to include several different
services originally developed with independent functionality:
The original Azure Monitor, for collecting platform metrics, resource diagnostics logs, and activity logs for
Azure platform resources only.
Application Insights, for application monitoring.
Log Analytics, the primary location for collecting and analyzing log data.
A new unified alerting service, which brought together alert mechanisms from each of the other services
mentioned earlier.
Azure Network Watcher, for monitoring, diagnosing, and viewing metrics for resources in an Azure virtual
network.

The story of Operations Management Suite (OMS)


From 2015 until April 2018, Operations Management Suite (OMS) was a bundling of the following Azure
management services for licensing purposes:
Application Insights
Azure Automation
Azure Backup
Operational Insights (later rebranded as Log Analytics)
Site Recovery
The functionality of the services that were part of OMS did not change when OMS was discontinued. They were
realigned under Azure Monitor.

Infrastructure requirements
Operations Manager
Operations Manager requires significant infrastructure and maintenance to support a management group, which
is a basic unit of functionality. At a minimum, a management group consists of one or more management servers,
a SQL Server instance hosting the operational and reporting data warehouse databases, and agents. The
complexity of a management group design depends on a number of factors, such as the scope of workloads to
monitor, and the number of devices or computers supporting the workloads. If you require high availability and
site resiliency, as is commonly the case with enterprise monitoring platforms, the infrastructure requirements and
associated maintenance can increase dramatically.
[Diagram: An Operations Manager management group. The Operations console and web console connect to management servers, which use the operational, data warehouse, and reporting databases and monitor agent-managed systems, agentless-managed systems, UNIX/Linux systems, and network devices.]
Azure Monitor
Azure Monitor is a software as a service (SaaS) service, where all the infrastructure supporting it is running in
Azure and managed by Microsoft. It's designed to perform monitoring, analytics, and diagnostics at scale, and is
available in all national clouds. Core parts of the infrastructure (collectors, metrics and logs store, and analytics)
that are necessary to support Azure Monitor are maintained by Microsoft.
[Diagram: Azure Monitor data flow. Sources (applications, operating systems, Azure resources, Azure subscriptions, Azure tenants, and custom sources) send data through collectors (the Application Insights SDK, the Azure Diagnostics extension, the Log Analytics agent, and other methods) into the metrics and logs stores. From there the data supports insights (application, container, and VM monitoring, and monitoring solutions), visualization (workbooks, views, dashboards, Power BI), analysis (Metrics Explorer, Log Analytics), response (autoscale, alerts), and integration (export APIs, Logic Apps). Grey items are not part of Azure Monitor, but are part of the Azure Monitor story.]

Data collection
Operations Manager
Agents
Operations Manager collects data directly only from agents that are installed on Windows computers. It can accept
data from the Operations Manager SDK, but this approach is typically used for partners that extend the product
with custom applications, not for collecting monitoring data. It can collect data from other sources, such as Linux
computers and network devices, by using special modules that run on the Windows agent and remotely access
those devices.
[Diagram: An Operations Manager management server collecting data from an agent-managed Windows computer, which remotely monitors a Linux server and a network device.]
The Operations Manager agent can collect from multiple data sources on the local computer, such as the event log,
custom logs, and performance counters. It can also run scripts, which can collect data from the local computer or
from external sources. You can write custom scripts to collect data that can't be collected by other means, or to
collect data from a variety of remote devices that can't otherwise be monitored.
Management packs
Operations Manager performs all monitoring with workflows (rules, monitors, and object discoveries). These
workflows are packaged together in a management pack and deployed to agents. Management packs are available
for a variety of products and services, which include predefined rules and monitors. You can also author your own
management pack for your own applications and custom scenarios.
Monitoring configuration
Management packs can contain hundreds of rules, monitors, and object discovery rules. An agent runs all these
monitoring settings from all the management packs that apply, which are determined by discovery rules. Each
instance of each monitoring setting runs independently and acts immediately on the data that it collects. This is
how Operations Manager can achieve near-real-time alerting and the current health state of monitored resources.
For example, a monitor might sample a performance counter every few minutes. If that counter exceeds a
threshold, it immediately sets the health state of its target object, which immediately triggers an alert in the
management group. A scheduled rule might watch for a particular event to be created and immediately fire an
alert when that event is created in the local event log.
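As a conceptual illustration of that monitor behavior (this is not how management packs are authored, which use rules, monitors, and discoveries rather than Python), the pattern is roughly: each monitor samples its own source, compares the value to a threshold, and updates health state immediately. All names and values below are invented.

# Conceptual illustration of an Operations Manager-style monitor: each monitor
# evaluates its own data source independently and acts on the data immediately.
import random
import time

class ThresholdMonitor:
    def __init__(self, name: str, threshold: float, sample_fn):
        self.name, self.threshold, self.sample_fn = name, threshold, sample_fn
        self.health = "Healthy"

    def run_once(self) -> None:
        value = self.sample_fn()                       # e.g., read a performance counter
        new_health = "Critical" if value > self.threshold else "Healthy"
        if new_health != self.health:
            self.health = new_health
            print(f"[{self.name}] health changed to {new_health} (value={value:.1f}); alert raised")

monitor = ThresholdMonitor("Processor % Time", threshold=90, sample_fn=lambda: random.uniform(0, 100))
for _ in range(5):
    monitor.run_once()
    time.sleep(1)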
Because these monitoring settings are isolated from each other and work from the individual sources of data,
Operations Manager has challenges correlating data between multiple sources. It's also difficult to react to data
after it's been collected. You can run workflows that access the Operations Manager database, but this scenario
isn't common, and it's typically used for a limited number of special purpose workflows.
[Diagram: The Operations Manager management group shown earlier, with consoles, management servers, databases, and agent-managed, agentless-managed, UNIX/Linux, and network device targets.]

Azure Monitor
Data sources
Azure Monitor collects data from a variety of sources, including Azure infrastructure and platform resources,
agents on Windows and Linux computers, and monitoring data collected in Azure storage. Any REST client can
write log data to Azure Monitor by using an API, and you can define custom metrics for your web applications.
Some metric data can be routed to different locations, depending on its usage. For example, you might use the
data for "fast-as-possible" alerting or for long-term trend analysis searches in conjunction with other log data.
Monitoring solutions and insights
Monitoring solutions use the logs platform in Azure Monitor to provide monitoring for a particular application or
service. They typically define data collection from agents or from Azure services, and provide log queries and
views to analyze that data. They typically don't provide alert rules, which means that you must define your own
alert criteria based on collected data.
Insights, such as Azure Monitor for containers and Azure Monitor for VMs, use the logs and metrics platform of
Azure Monitor to provide a customized monitoring experience for an application or service in the Azure portal.
They might provide health monitoring and alerting conditions, in addition to customized analysis of collected data.
Monitoring configuration
Azure Monitor separates data collection from actions taken against that data, which supports distributed
microservices in a cloud environment. It consolidates data from multiple sources into a common data platform,
and provides analysis, visualization, and alerting capabilities based on the collected data.
All data that's collected by Azure Monitor is stored as either logs or metrics, and different features of Monitor rely
on one or the other. Metrics contain numerical values in time series that are well suited for near-real-time alerting
and fast detection of issues. Logs contain text or numerical data, and are supported by a powerful query language
that makes them especially useful for performing complex analysis.
Because Monitor separates data collection from actions against that data, it might not be able to provide near-real-
time alerting in many cases. To alert on log data, queries are run on a recurring schedule defined in the alert. This
behavior allows Azure Monitor to easily correlate data from all monitored sources, and you can interactively
analyze data in a variety of ways. This is especially helpful for doing root cause analysis and identifying where else
an issue might occur.

Health monitoring
Operations Manager
Management packs in Operations Manager include a service model that describes the components of the
application being monitored and their relationship. Monitors identify the current health state of each component
based on data and scripts on the agent. Health states roll up so that you can quickly view the summarized health
state of monitored computers and applications.
Azure Monitor
Azure Monitor doesn't provide a user-definable method of implementing a service model or monitors that indicate
the current health state of any service components. Because monitoring solutions are based on standard features
of Azure Monitor, they don't provide state-level monitoring. The following features of Azure Monitor can be
helpful:
Application Insights: Builds a composite map of your web application, and provides a health state for
each application component or dependency. This includes alert status and drill-down to more detailed
diagnostics of your application.
Azure Monitor for VMs: Delivers a health-monitoring experience for the guest Azure VMs, similar to that
of Operations Manager, when it monitors Windows and Linux virtual machines. It evaluates the health of
key operating system components from the perspective of availability and performance to determine the
current health state. When it determines that the guest VM is experiencing sustained high resource utilization,
is low on disk space, or has an issue related to core operating system functionality, it generates an alert to bring
this state to your attention.
Azure Monitor for containers: Monitors the performance and health of Azure Kubernetes Service or
Azure Container Instances. It collects memory and processor metrics from controllers, nodes, and
containers that are available in Kubernetes through the Metrics API. It also collects container logs and
inventory data about containers and their images. Predefined health criteria that are based on the collected
performance data help you identify whether a resource bottleneck or capacity issue exists. You can also
understand the overall performance, or the performance from a specific Kubernetes object type (pod, node,
controller, or container).

Analyze data
Operations Manager
Operations Manager provides four basic ways to analyze data after it has been collected:
Health Explorer: Helps you discover which monitors are identifying a health state issue and review
knowledge about the monitor and possible causes for actions related to it.
Views: Offers predefined visualizations of collected data, such as a graph of performance data or a list of
monitored components and their current health state. Diagram views visually present the service model of
an application.
Reports: Allow you to summarize historical data that's stored in the Operations Manager data warehouse.
You can customize the data that views and reports are based on. However, there is no feature to allow for
complex or interactive analysis of collected data.
Operations Manager Command Shell: Extends Windows PowerShell with an additional set of cmdlets,
and can query and visualize collected data. This includes graphs and other visualizations, natively with
PowerShell, or with the Operations Manager HTML-based web console.
Azure Monitor
With the powerful Azure Monitor analytics engine, you can interactively work with log data and combine it
with other monitoring data for trending and other data analysis. Views and dashboards allow you to visualize
query data in a variety of ways from the Azure portal, and import it into Power BI. Monitoring solutions include
queries and views to present the data they collect. Insights such as Application Insights, Azure Monitor for VMs,
and Azure Monitor for containers include customized visualizations to support interactive monitoring scenarios.

Alerting
Operations Manager
Operations Manager creates alerts in response to predefined events, when a performance threshold is met, and
when the health state of a monitored component changes. It includes the complete management of alerts, allowing
you to set their resolution and assign them to various operators or system engineers. You can set notification rules
that specify which alerts will send proactive notifications.
Management packs include various predefined alerting rules for different critical conditions in the application
being monitored. You can tune these rules or create custom rules to meet the particular requirements of your
environment.
Azure Monitor
With Azure Monitor, you can create alerts based on a metric crossing a threshold, or based on a scheduled query
result. Although alerts based on metrics can achieve near-real-time results, scheduled queries have a longer
response time, depending on the speed of data ingestion and indexing. Instead of being limited to a specific agent,
log query alerts in Azure Monitor let you analyze data across all data stored in multiple workspaces. These alerts
also include data from a specific Application Insights app by using a cross-workspace query.
Although monitoring solutions can include alert rules, you ordinarily create them based on your own
requirements.

Workflows
Operations Manager
Management packs in Operations Manager contain hundreds of individual workflows, and they determine both
what data to collect and what action to perform with that data. For example, a rule might sample a performance
counter every few minutes, storing its results for analysis. A monitor might sample the same performance counter
and compare its value to a threshold to determine the health state of a monitored object. Another rule might run a
script to collect and analyze some data on an agent computer, and then fire an alert if it returns a particular value.
Workflows in Operations Manager are independent of each other, which makes analysis across multiple monitored
objects difficult. These monitoring scenarios must be based on data after it's collected, which is possible but can be
difficult, and it isn't common.
Azure Monitor
Azure Monitor separates data collection from actions and analysis taken from that data. Agents and other data
sources write log data to a Log Analytics workspace and write metric data to the metric database, without any
analysis of that data or knowledge of how it might be used. Monitor performs alerting and other actions from the
stored data, which allows you to perform analysis across data from all sources.

Extend the base platform


Operations Manager
Operations Manager implements all monitoring logic in a management pack, which you either create yourself or
obtain from us or a partner. When you install a management pack, it automatically discovers components of the
application or service on different agents, and deploys appropriate rules and monitors. The management pack
contains health definitions, alert rules, performance and event collection rules, and views, to provide complete
monitoring that supports the infrastructure service or application.
The Operations Manager SDK enables Operations Manager to integrate with third-party monitoring platforms or
IT service management (ITSM) software. The SDK is also used by some partner management packs to support
monitoring network devices and deliver custom presentation experiences, such as the Squared Up HTML5
dashboard or integration with Microsoft Office Visio.
Azure Monitor
Azure Monitor collects metrics and logs from Azure resources, with little to no configuration. Monitoring solutions
add logic for monitoring an application or service, but they still work within the standard log queries and views in
Monitor. Insights, such as Application Insights and Azure Monitor for VMs, use the Monitor platform for data
collecting and processing. They also provide additional tools to visualize and analyze the data. You can combine
data collected by insights with other data, by using core Monitor features such as log queries and alerts.
Monitor supports several methods to collect monitoring or management data from Azure or external resources.
You can then extract and forward data from the metric or log stores to your ITSM or monitoring tools. Or you can
perform administrative tasks by using the Azure Monitor REST API.

Next steps
Monitoring the cloud deployment models
Centralize management operations

For most organizations, using a single Azure Active Directory (Azure AD) tenant for all users simplifies
management operations and reduces maintenance costs, because all management tasks can be performed by
designated users, user groups, or service principals within that tenant.
We recommend that you use only one Azure AD tenant for your organization, if possible. However, some
situations might require an organization to maintain multiple Azure AD tenants for the following reasons:
They are wholly independent subsidiaries.
They're operating independently in multiple geographies.
Certain legal or compliance requirements apply.
There are acquisitions of other organizations (sometimes temporary until a long-term tenant consolidation
strategy is defined).
When a multiple-tenant architecture is required, Azure Lighthouse provides a way to centralize and streamline
management operations. Subscriptions from multiple tenants can be onboarded for Azure delegated resource
management. This option allows specified users in the managing tenant to perform cross-tenant management
functions in a centralized and scalable manner.
For example, let's say your organization has a single tenant, Tenant A. The organization then acquires two
additional tenants, Tenant B and Tenant C, and you have business reasons that require you to maintain them as
separate tenants.
Your organization wants to use the same policy definitions, backup practices, and security processes across all
tenants. Because you already have users (including user groups and service principals) that are responsible for
performing these tasks within Tenant A, you can onboard all of the subscriptions within Tenant B and Tenant C so
that those same users in Tenant A can perform those tasks. Tenant A then becomes the managing tenant for Tenant
B and Tenant C.
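After the delegations are in place, users in Tenant A see the delegated subscriptions alongside their own. As a hedged sketch of what that looks like from code (using the azure-identity and azure-mgmt-resource packages; this only enumerates whatever subscriptions the signed-in identity can access, and whether delegated subscriptions appear depends on the roles granted during onboarding):

# A minimal sketch: enumerate the subscriptions visible to a user in the managing
# tenant. With Azure delegated resource management in place, subscriptions from
# Tenant B and Tenant C can appear here alongside Tenant A's own subscriptions.
from azure.identity import DefaultAzureCredential
from azure.mgmt.resource import SubscriptionClient

client = SubscriptionClient(DefaultAzureCredential())
for subscription in client.subscriptions.list():
    print(subscription.subscription_id, subscription.display_name)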

For more information, see Azure Lighthouse in enterprise scenarios.


Establish an operational fitness review

As your enterprise begins to operate workloads in Azure, the next step is to establish a process for operational
fitness review. This process enumerates, implements, and iteratively reviews the nonfunctional requirements for
these workloads. Nonfunctional requirements are related to the expected operational behavior of the service.
There are five essential categories of nonfunctional requirements, which are called the pillars of software quality:
Scalability
Availability
Resiliency, including business continuity and disaster recovery
Management
Security
A process for operational fitness review ensures that your mission-critical workloads meet the expectations of your
business with respect to the quality pillars.
Create a process for operational fitness review to fully understand the problems that result from running
workloads in a production environment, and how to remediate and resolve those problems. This article outlines a
high-level process for operational fitness review that your enterprise can use to achieve this goal.

Operational fitness at Microsoft


From the outset, many teams across Microsoft have been involved in the development of the Azure platform. It's
difficult to ensure quality and consistency for a project of such size and complexity. You need a robust process to
enumerate and implement fundamental nonfunctional requirements on a regular basis.
The processes that Microsoft follows form the basis for the processes outlined in this article.

Understand the problem


As you learned in Getting started, the first step in an enterprise's digital transformation is to identify the business
problems to be solved by adopting Azure. The next step is to determine a high-level solution to the problem, such
as migrating a workload to the cloud or adapting an existing, on-premises service to include cloud functionality.
Finally, you design and implement the solution.
During this process, the focus is often on the features of the service: the set of functional requirements that you
want the service to perform. For example, a product-delivery service requires features for determining the source
and destination locations of the product, tracking the product during delivery, and sending notifications to the
customer.
The nonfunctional requirements, in contrast, relate to properties such as the service's availability, resiliency, and
scalability. These properties differ from the functional requirements because they don't directly affect the final
function of any particular feature in the service. However, nonfunctional requirements do relate to the performance
and continuity of the service.
You can specify some nonfunctional requirements in terms of a service-level agreement (SLA). For example, you
can express service continuity as a percentage of availability: "Available 99.99% of the time". Other nonfunctional
requirements might be more difficult to define and might change as production needs change. For example, a
consumer-oriented service might face unanticipated throughput requirements after a surge of popularity.
NOTE
For more details about resiliency requirements, see Designing reliable Azure applications. That article includes explanations of
concepts like recovery-point objective (RPO), recovery-time objective (RTO), and SLA.

Process for operational fitness review


The key to maintaining the performance and continuity of an enterprise's services is to implement a process for
operational fitness review.

At a high level, the process has two phases. In the prerequisites phase, the requirements are established and
mapped to supporting services. This phase occurs infrequently: perhaps annually or when new operations are
introduced. The output of the prerequisites phase is used in the flow phase. The flow phase occurs more frequently,
such as monthly.
Prerequisites phase
The steps in this phase capture the requirements for conducting a regular review of the important services.
1. Identify critical business operations. Identify the enterprise's mission-critical business operations.
Business operations are independent from any supporting service functionality. In other words, business
operations represent the actual activities that the business needs to perform and that are supported by a set
of IT services.
The term mission-critical (or business critical) reflects a severe impact on the business if the operation is
impeded. For example, an online retailer might have a business operation, such as "enable a customer to add
an item to a shopping cart" or "process a credit card payment." If either of these operations fails, a customer
can't complete the transaction and the enterprise fails to realize sales.
2. Map operations to services. Map the critical business operations to the services that support them. In the
shopping-cart example, several services might be involved, including an inventory stock-management
service and a shopping-cart service. To process a credit-card payment, an on-premises payment service
might interact with a third-party payment-processing service.
3. Analyze service dependencies. Most business operations require orchestration among multiple
supporting services. It's important to understand the dependencies between the services, and the flow of
mission-critical transactions through these services.
Also consider the dependencies between on-premises services and Azure services. In the shopping-cart
example, the inventory stock-management service might be hosted on-premises and ingest data entered by
employees from a physical warehouse. However, it might store data off-premises in an Azure service, such
as Azure Storage, or a database, such as Azure Cosmos DB.
An output from these activities is a set of scorecard metrics for service operations. The scorecard measures criteria
such as availability, scalability, and disaster recovery. Scorecard metrics express the operational criteria that you
expect the service to meet. These metrics can be expressed at any level of granularity that's appropriate for the
service operation.
The scorecard should be expressed in simple terms to facilitate meaningful discussion between the business
owners and engineering. For example, a scorecard metric for scalability might be color-coded in a simple way.
Green means meeting the defined criteria, yellow means failing to meet the defined criteria but actively
implementing a planned remediation, and red means failing to meet the defined criteria with no plan or action.
It's important to emphasize that these metrics should directly reflect business needs.
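For example, a scorecard evaluation can be as simple as comparing a measured nonfunctional requirement to its agreed target and assigning a color, which keeps the conversation in business terms. The thresholds, field names, and figures below are illustrative only:

# Illustrative only: translate a measured nonfunctional requirement into a
# green/yellow/red scorecard entry that business owners and engineering can discuss.
def scorecard_status(measured: float, target: float, remediation_planned: bool) -> str:
    if measured >= target:
        return "Green"       # meets the defined criteria
    if remediation_planned:
        return "Yellow"      # failing, but a remediation is actively planned
    return "Red"             # failing, with no plan or action

# e.g., measured availability of the shopping-cart service over the last month
print(scorecard_status(measured=99.93, target=99.99, remediation_planned=True))  # Yellow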
Service-review phase
The service-review phase is the core of the operational fitness review. It involves these steps:
1. Measure service metrics. Use the scorecard metrics to monitor the services, to ensure that the services
meet the business expectations. Service monitoring is essential. If you can't monitor a set of services with
respect to the nonfunctional requirements, consider the corresponding scorecard metrics to be red. In this
case, the first step for remediation is to implement the appropriate service monitoring. For example, if the
business expects a service to operate with 99.99% availability, but there is no production telemetry in place
to measure availability, assume that you're not meeting the requirement.
2. Plan remediation. For each service operation for which metrics fall below an acceptable threshold,
determine the cost of remediating the service to bring the operation to an acceptable level. If the cost of
remediating the service is greater than the expected revenue generation of the service, move on to consider
the intangible costs, such as customer experience. For example, if customers have difficulty placing a
successful order by using the service, they might choose a competitor instead.
3. Implement remediation. After the business owners and engineering team agree on a plan, implement it.
Report the status of the implementation whenever you review scorecard metrics.
This process is iterative, and ideally your enterprise has a team dedicated to it. This team should meet regularly to
review existing remediation projects, kick off the fundamental review of new workloads, and track the enterprise's
overall scorecard. The team should also have the authority to hold remediation teams accountable if they're behind
schedule or fail to meet metrics.

Structure of the review team


The team responsible for operational fitness review is composed of the following roles:
Business owner: Provides knowledge of the business to identify and prioritize each mission-critical
business operation. This role also compares the mitigation cost to the business impact, and drives the final
decision on remediation.
Business advocate: Breaks down business operations into discrete parts, and maps those parts to services
and infrastructure, whether on-premises or in the cloud. The role requires deep knowledge of the
technology associated with each business operation.
Engineering owner: Implements the services associated with the business operation. These individuals
might participate in the design, implementation, and deployment of any solutions for nonfunctional
requirement problems that are uncovered by the review team.
Service owner: Operates the business's applications and services. These individuals collect logging and
usage data for these applications and services. This data is used both to identify problems and to verify fixes
after they're deployed.
Review meeting
We recommend that your review team meet on a regular basis. For example, the team might meet monthly, and
then report status and metrics to senior leadership on a quarterly basis.
Adapt the details of the process and meeting to fit your specific needs. We recommend the following tasks as a
starting point:
1. The business owner and business advocate enumerate and determine the nonfunctional requirements for
each business operation, with input from the engineering and service owners. For business operations that
have been identified previously, review and verify the priority. For new business operations, assign a priority
in the existing list.
2. The engineering and service owners map the current state of business operations to the corresponding on-
premises and cloud services. The mapping is a list of the components in each service, oriented as a
dependency tree. The engineering and service owners then determine the critical paths through the tree.
3. The engineering and service owners review the current state of operational logging and monitoring for the
services listed in the previous step. Robust logging and monitoring are critical: they identify service
components that contribute to a failure to meet nonfunctional requirements. If sufficient logging and
monitoring aren't in place, the team must put them in place by creating and implementing a plan.
4. The team creates scorecard metrics for new business operations. The scorecard consists of the list of
constituent components for each service identified in step 2. It's aligned with the nonfunctional
requirements, and includes a measure of how well each component meets the requirements.
5. For constituent components that fail to meet nonfunctional requirements, the team designs a high-level
solution, and assigns an engineering owner. At this point, the business owner and business advocate
establish a budget for the remediation work, based on the expected revenue of the business operation.
6. Finally, the team conducts a review of the ongoing remediation work. Each of the scorecard metrics for work
in progress is reviewed against the expected criteria. For constituent components that meet metric criteria,
the service owner presents logging and monitoring data to confirm that the criteria are met. For those
constituent components that don't meet metric criteria, each engineering owner explains the problems that
are preventing criteria from being met, and presents any new designs for remediation.

Recommended resources
Pillars of software quality. This section of the Azure Application Architecture Guide describes the five pillars of
software quality: scalability, availability, resiliency, management, and security.
Ten design principles for Azure applications. This section of the Azure Application Architecture Guide discusses
a set of design principles to make your application more scalable, resilient, and manageable.
Designing resilient applications for Azure. This guide starts with a definition of the term resiliency and related
concepts. Then, it describes a process for achieving resiliency by using a structured approach over the lifetime of
an application, from design and implementation to deployment and operations.
Cloud design patterns. These design patterns are useful for engineering teams when building applications on
the pillars of software quality.
Azure Advisor. Advisor provides recommendations that are personalized based on your usage and
configurations to help you optimize your resources for high availability, security, performance, and cost.
IT management and operations in the cloud

As a business moves to a cloud-based model, the importance of proper management and operations can't be
overstated. Unfortunately, few organizations are prepared for the IT management shift that's required for success
in building a cloud-first operating model. This section of the Cloud Adoption Framework outlines the operating
model, processes, and tooling that have proven successful in the cloud. Each of these areas represents a minor but
fundamental change in the way the business should view IT operations and management as it begins to adopt the
cloud.

Brief history of IT management


Before the cloud, IT management grew from a simple acquisition function. Acquisition of technical equipment to
support business processes required technical expertise and deep experience with a specific group of equipment
vendors. IT management consolidated the selection, acquisition, and configuration of IT assets. Generally, the
acquired assets included storage, computing power, networking, and other similar assets that are required to power
the desired business function. As the primary subject matter experts on the equipment, IT was also tasked with
operating the equipment to ensure maximum performance and minimal business disruptions.
When the business builds out new technology solutions, it has a clear need that can justify the significant expenses
associated with acquiring assets, or even building out full datacenters. When it builds solutions, the business sees
the acquisition costs as an investment in the future. After the business need is met, the perception of the same costs
shifts. Costs that are associated with existing solutions are seen as operational drag that's created by past needs.
That perception is why many businesses view IT as a cost center. It's also why many IT organizations experience
regular cost-control exercises or reductions in IT staff.

Cloud management
The historical IT operating model was sufficient for over 20 years. But that model is now outdated and is less
desirable than cloud-first alternatives. When IT management teams move to the cloud, they have an opportunity to
rethink this model and drive greater value for the business. This article series outlines a modernized model of IT
management.

Next steps
For a deeper understanding of the new cloud management model, start with Understand business alignment.
Understand business alignment
Create business alignment in cloud management

In on-premises environments, IT assets (applications, virtual machines, VM hosts, disk, servers, devices, and data
sources) are managed by IT to support workload operations. In IT terms, a workload is a collection of IT assets
that support a specific business operation. To help support business operations, IT management delivers
processes that are designed to minimize disruptions to those assets. When an organization moves to the cloud,
management and operations shift a bit, creating an opportunity to develop tighter business alignment.

Business vernacular
The first step in creating business alignment is to ensure term alignment. IT management, like most engineering
professions, has amassed a collection of jargon, or highly technical terms. Such terms can lead to confusion for
business stakeholders and make it difficult to map management services to business value.
Fortunately, the process of developing a cloud adoption strategy and cloud adoption plan creates an ideal
opportunity to remap these terms. The process also creates opportunities to rethink commitments to operational
management, in partnership with the business. The following article series walks you through this new approach
across three specific terms that can help improve conversations among business stakeholders:
Criticality: Mapping workloads to business processes. Ranking criticality to focus investments.
Impact: Understanding the impact of potential outages to aid in evaluating return on investment for cloud
management.
Commitment: Developing true partnerships, by creating and documenting agreements with the business.

NOTE
Underlying these terms are classic IT terms such as SLA, RTO, and RPO. Mapping specific business and IT terms is covered in
more detail in the Commitment article.

Ops Management planning workbook


To help capture decisions that result from this conversation about term alignment, an Ops Management
workbook is available on our GitHub site. This workbook does not perform SLA or cost calculations. It serves only
to help capture such measures and forecast return on loss-avoidance efforts.
Alternatively, these same workloads and associated assets could be tagged directly in Azure, if the solutions are
already deployed to the cloud.

Next steps
Start creating business alignment by defining workload criticality.
Define workload criticality
Business criticality in cloud management

Across every business, there exist a small number of workloads that are too important to fail. These workloads
are considered mission critical. When those workloads experience outages or performance degradation, the
adverse impact on revenue and profitability can be felt across the entire company.
At the other end of the spectrum, some workloads can go months at a time without being used. Poor performance or outages for those workloads are not desirable, but the impact is isolated and limited.
Understanding the criticality of each workload in the IT portfolio is the first step toward establishing mutual
commitments to cloud management. The following diagram illustrates a common alignment between the
criticality scale to follow and the standard commitments made by the business.

Criticality scale
The first step in any business criticality alignment effort is to create a criticality scale. The following table presents
a sample scale to be used as a reference, or template, for creating your own scale.

CRITICALITY         BUSINESS VIEW
Mission-critical    Affects the company's mission and might noticeably affect corporate profit-and-loss statements.
Unit-critical       Affects the mission of a specific business unit and its profit-and-loss statements.
High                Might not hinder the mission, but affects high-importance processes. Measurable losses can be quantified in the case of outages.
Medium              Impact on processes is likely. Losses are low or immeasurable, but brand damage or upstream losses are likely.
Low                 Impact on business processes isn't measurable. Neither brand damage nor upstream losses are likely. Localized impact on a single team is likely.
Unsupported         No business owner, team, or process that's associated with this workload can justify any investment in the ongoing management of the workload.

It's common for businesses to include additional criticality classifications that are specific to their industry, vertical,
or specific business processes. Examples of additional classifications include:
Compliance-critical: In heavily regulated industries, some workloads might be critical as part of an effort to
maintain compliance requirements.
Security-critical: Some workloads might not be mission critical, but outages could result in loss of data or
unintended access to protected information.
Safety-critical: When lives or the physical safety of employees and customers is at risk during an outage, it
can be wise to classify workloads as safety-critical.

Importance of accurate criticality


Later in the cloud-adoption process, the cloud management team will use this classification to determine the
amount of effort required to meet aligned levels of criticality. In on-premises environments, operations
management is often purchased centrally and treated as a necessary business burden, with little or no additional
operating costs. In the cloud, operations management (like all of the cloud) is purchased on a per-asset basis as
monthly operating costs.
Because there's a clear and direct cost to operations management in the cloud, it's important to properly align
costs and desired criticality scales.

Select a default criticality


An initial review of every workload in the portfolio can be time consuming. To ensure that this effort doesn't block
your broader cloud strategy, we recommend that your teams agree on a default criticality to apply to all
workloads.
Based on the preceding criticality-scale table, we recommend that you adopt Medium criticality as the default.
Doing so will allow your cloud strategy team to quickly identify workloads that require a higher level of criticality.
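For teams that track their inventory outside the workbook, the following Python sketch illustrates the same idea: apply the Medium default everywhere, and record only the deviations that the business calls out. The workload names and the inventory structure are hypothetical placeholders, not workbook fields.

# Hypothetical workload inventory; an override is recorded only where the business deviates from the default
workload_overrides = {
    "trading-platform": "Mission-critical",
    "payroll": None,
    "team-wiki": "Low",
}

DEFAULT_CRITICALITY = "Medium"

criticality = {
    workload: override or DEFAULT_CRITICALITY
    for workload, override in workload_overrides.items()
}

for workload, level in sorted(criticality.items()):
    print(f"{workload}: {level}")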

Use the template


The following steps apply if you're using the Ops Management workbook to plan for cloud management.
1. Record the criticality scale on the scale tab of the workbook.
2. Update each workload in the Example or Clean Template to reflect the default criticality in the Criticality
column.
3. The business should enter the correct values to reflect any deviations from the default criticality.

Next steps
After your team has defined business criticality, you can calculate and record business impact.
Calculate and record business impact
Business impact in cloud management

Assume the best, prepare for the worst. In IT management, it's safe to assume that the workloads required to
support business operations will be available and will perform within agreed-upon constraints, based on the
selected criticality. However, to manage investments wisely, it's important to understand the impact on the
business when an outage or performance degradation occurs. This importance is illustrated in the following
graph, which maps potential business interruptions of specific workloads to the business impact of outages across
a relative value scale.

To create a fair basis of comparison for the impact on various workloads across a portfolio, a time/value metric is
suggested. The time/value metric captures the adverse impact of a workload outage. Generally, this impact is
recorded as a direct loss of revenue or operating revenue during a typical outage period. More specifically, it
calculates the amount of lost revenue for a unit of time. The most common time/value metric is Impact per hour,
which measures operating revenue losses per hour of outage.
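For example (the figures are illustrative only), a business operation that generates roughly $8.7 million in operating revenue per year and runs around the clock would have an Impact per hour of approximately:

Impact per hour ≈ $8,700,000 ÷ 8,760 hours ≈ $1,000 per hour of outage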
A few approaches can be used to calculate impact. You can apply any of the options in the following sections to achieve similar outcomes. It's important to use the same approach for each workload when you calculate projected losses across a portfolio.

Start with estimates


Current operating models might make it difficult to determine an accurate impact. Fortunately, few systems need
a highly accurate loss calculation. In the previous step, Classify Criticality, we suggested that you start all
workloads with a default of medium criticality. Medium criticality workloads generally receive a standard level of
management support with a relatively low impact on operating cost. Only when a workload requires additional
operational management resources might you require an accurate financial impact.
For all standardized workloads, business impact serves as a prioritization variable when you're recovering systems
during an outage. Outside of those limited situations, the business impact creates little to no change in the
operations management experience.

Calculate time
Depending on the nature of the workload, you could calculate losses differently. For high-paced transactional
systems such as a real-time trading platform, losses per millisecond might be significant. Less frequently used
systems, such as payroll, might not be used every hour. Whether the frequency of usage is high or low, it's
important to normalize the time variable when you calculate financial impact.

Calculate total impact


When you're considering additional management investments, it becomes more important for the business impact to be accurate. The following three approaches to calculating losses are ordered from most accurate to least accurate:
Adjusted losses: If your business has experienced a major loss event in the past, such as a hurricane or
other natural disaster, a claims adjuster might have calculated actual losses during the outage. These
calculations are based on insurance industry standards for loss calculation and risk management. Using
adjusted losses as the total amount of losses in a specific time frame can lead to highly accurate projections.
Historical losses: If your on-premises environment has suffered historically from outages resulting from
infrastructure instability, it can be a bit harder to calculate losses. But you can still apply the adjuster
formulas used internally. To calculate historical losses, compare the deltas in sales, gross revenue, and
operating costs across three time frames: before, during, and after outage. By examining these deltas, you
can identify accurate losses when no other data is available.
Complete loss calculation: If no historical data is available, you can derive a comparative loss value. In
this model, you determine the average gross revenue per hour for the business unit. When you're
projecting loss avoidance investments, it's not fair to assume that a complete system outage equates to a
100 percent loss of revenue. But you can use this assumption as a rough basis for comparing loss impacts
and prioritizing investments.
Before you make certain assumptions about potential losses associated with workload outages, it's a good idea to
work with your finance department to determine the best approach to such calculations.
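The following Python sketch illustrates the last two approaches above: historical losses derived from before/during/after deltas, and a complete loss calculation derived from average gross revenue per hour. All figures and names are hypothetical, and your finance department might prefer different formulas.

# Historical losses: compare deltas across the three time frames (before, during, after an outage)
def historical_loss(revenue_before, revenue_during, revenue_after):
    expected_revenue = (revenue_before + revenue_after) / 2   # rough run rate without the outage
    return max(expected_revenue - revenue_during, 0)

# Complete loss calculation: derive a comparative value when no historical data exists
def complete_loss(annual_gross_revenue, outage_hours, hours_per_year=8760):
    revenue_per_hour = annual_gross_revenue / hours_per_year
    return revenue_per_hour * outage_hours                    # rough comparison basis, not a true 100% revenue loss

# Illustrative numbers only
print(historical_loss(revenue_before=120_000, revenue_during=45_000, revenue_after=118_000))  # 74000.0
print(round(complete_loss(annual_gross_revenue=50_000_000, outage_hours=4)))                  # 22831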

Calculate workload impact


When you're calculating losses by applying historical data, you might have enough information to clearly
determine the contribution of each workload to those losses. Performing this evaluation is where partnerships
within the business are absolutely critical. After the total impact has been calculated, that impact must be
attributed across each of the workloads. That distribution of impact should come from the business stakeholders,
who should agree on the relative and cumulative impact of each workload. To that end, your team should solicit
feedback from business executives to validate alignment. Such feedback is often equal parts emotion and subject
matter expertise. It's important that this exercise represent the logic and beliefs of the business stakeholders who
should have a say in budget allocation.

Use the template


If you're using the Operations Management workbook to plan for cloud management, consider doing the
following:
Each business should update each workload in the Example or Clean Template with the Time/Value Impact of
each workload. By default, Time/Value Impact represents the projected losses per hour associated with an
outage of the workload.

Next steps
After the business has defined impact, you can align commitments.
Align management commitments with the business
Business commitment in cloud management

Defining business commitment is an exercise in balancing priorities. The objective is to align the proper level of
operational management at an acceptable operating cost. Finding that balance requires a few data points and
calculations, which we've outlined in this article.

Commitments to business stability, via technical resiliency or other service-level agreement (SLA) impacts, are a
business justification decision. For most workloads in an environment, a baseline level of cloud management is
sufficient. For others, a 2x to 4x cost increase is easily justified because of the potential impact of any business
interruptions.
The previous articles in this series can help you understand the classification and impact of interruptions to
various workloads. This article helps you calculate the returns. As illustrated in the preceding image, each level of
cloud management has inflection points where cost can rise faster than increases in resiliency. Those inflection
points will prompt detailed business decisions and business commitments.

Determine a proper commitment with the business


For each workload in the portfolio, the cloud operations team and cloud strategy team should align on the level of
management that's provided directly by the cloud operations team.
As you're establishing a commitment with the business, there are a few key aspects to align:
IT operations prerequisites
Management responsibility
Cloud tenancy
Soft-cost factors
Loss avoidance ROI
Validation of management level
To aid in your decision process, the remainder of this article describes each of these aspects in greater detail.

IT operations prerequisites
The Azure Management Guide outlines the management tools that are available in Azure. Before reaching a
commitment with the business, IT should determine an acceptable standard-level management baseline to be
applied to all managed workloads. IT would then calculate a standard management cost for each of the managed
workloads in the IT portfolio, based on counts of CPU cores, disk space, and other asset-related variables. IT
would also estimate a composite SLA for each workload, based on the architecture.
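One common way to estimate a composite SLA for serially dependent assets is to multiply their individual SLAs together. The per-asset figures in the following sketch are hypothetical; actual values come from each service's published SLA and from the workload's architecture.

# Hypothetical uptime SLAs for the assets one workload depends on, as fractions
asset_slas = [0.9995, 0.9999, 0.999]   # for example: app tier, database, gateway

composite_sla = 1.0
for sla in asset_slas:
    composite_sla *= sla               # serial dependencies multiply

print(f"Estimated composite SLA: {composite_sla:.4%}")   # about 99.84%

Redundant (parallel) deployments of a component improve the composite figure; the simple product above assumes every asset is a single point of failure.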

TIP
IT operations teams often use a default minimum of 99.9 percent uptime for the initial composite SLA. They might also
choose to normalize management costs based on the average workload, especially for solutions with minimal logging and
storage needs. Averaging the costs of a few medium criticality workloads can provide a starting point for initial
conversations.

TIP
If you're using the Ops Management workbook to plan for cloud management, the Ops management fields should be
updated to reflect these prerequisites. Those fields include Commitment level, Composite SLA, and Monthly cost. Monthly
cost should represent the cost of the added operational management tools on a monthly basis.

The operations management baseline serves as an initial starting point to be validated in each of the following
sections.

Management responsibility
In a traditional on-premises environment, the cost of managing the environment is commonly assumed to be a
sunk cost that's owned by IT operations. In the cloud, management is a purposeful decision with direct budgetary
impact. The costs of each management function can be more directly attributed to each workload that's deployed
to the cloud. This approach allows for greater control, but it does create a requirement for cloud operations teams
and cloud strategy teams to first commit to an agreement about responsibilities.
Organizations might also choose to outsource some of their ongoing management functions to a service provider.
These service providers can use Azure Lighthouse to give organizations more precise control in granting access to
their resources, along with greater visibility into the actions performed by the service providers.
Delegated responsibility: Because there's no need to centralize and assume operational management
overhead, IT operations for many organizations are considering new approaches. One common approach
is referred to as delegated responsibility. In a cloud center of excellence model, platform operations and
platform automation provide self-service management tools that can be used by business-led operations
teams, independent of a central IT operations team. This approach gives business stakeholders complete
control over management-related budgets. It also allows the cloud center of excellence (CCoE) team to ensure that a minimum set of guardrails has been properly implemented. In this model, IT acts as a broker and a guide to help the business make wise decisions. Business operations oversee the day-to-day operations of dependent workloads.
Centralized responsibility: Compliance requirements, technical complexity, and some shared service
models might require a central IT model. In this model, IT continues to exercise its operations management
responsibilities. Environmental design, management controls, and governance tooling might be centrally
managed and controlled, which restricts the role of business stakeholders in making management
commitments. But the visibility into the cost and architecture of cloud approaches makes it much easier for
centralized IT to communicate the cost and level of management for each workload.
Mixed model: Classification is at the heart of a mixed model of management responsibilities. Companies
that are in the midst of a transformation from on-premises to cloud might require an on-premises-first
operating model for a while. Companies with strict compliance requirements, or that depend on long-term
contracts with IT outsourcing vendors, might require a centralized operating model.
Regardless of their constraints, today's businesses must innovate. When rapid innovation must flourish, in
the midst of a central-IT, centralized-responsibility model, a mixed-model approach might provide balance.
In this approach, central IT provides a centralized operating model for all workloads that are mission-
critical or contain sensitive information. At the same time, all other workload classifications might be placed
in a cloud environment that's designed for delegated responsibilities. The centralized responsibility
approach serves as the general operating model. The business then has flexibility to adopt a specialized
operating model, based on its required level of support and sensitivity.
The first step is committing to a responsibility approach, which then shapes the following commitments.
Which organization will be responsible for day-to-day operations management for this workload?

Cloud tenancy
For most businesses, management is easier when all assets reside in a single tenant. However, some organizations
might need to maintain multiple tenants. To learn why a business might require a multitenant Azure environment,
see Centralize management operations with Azure Lighthouse.
Will this workload reside in a single Azure tenant, alongside all other workloads?

Soft-cost factors
The next section outlines an approach to comparing the returns that are associated with various levels of management processes and tooling. At the end of that section, each analyzed workload will have a measure of the cost of management relative to the forecast impact of business disruptions. That approach provides a relatively easy way to understand whether an investment in richer management approaches is warranted.
Before you run the numbers, it's important to look at the soft-cost factors. Soft-cost factors produce a return, but
that return is difficult to measure through direct hard-cost savings that would be visible in a profit-and-loss
statement. Soft-cost factors are important because they can indicate a need to invest in a higher level of
management than is fiscally prudent.
A few examples of soft-cost factors would include:
Daily workload usage by the board or CEO.
Workload usage by a top x percent of customers that leads to a greater revenue impact elsewhere.
Impact on employee satisfaction.
The next data point that's required to make a commitment is a list of soft-cost factors. These factors don't need to
be documented at this stage, but business stakeholders should be aware of the importance of these factors and
their exclusion from the following calculations.

Calculate loss avoidance ROI


To calculate the relative return on operations management costs, the IT team that's responsible for cloud operations should complete the previously mentioned prerequisites and assume a minimum level of management for all workloads.
The next commitment to be made is an acceptance by the business of the costs associated with the baseline-
managed offering.
Does the business agree to invest in the baseline offering to meet minimum standards of cloud
operations?
If the business does not agree to that level of management, a solution must be devised that allows the business to
proceed, without materially affecting the cloud operations of other workloads.
If the business wants more than the standard management level, the remainder of this section will help validate
that investment and the associated returns (in the form of loss avoidance).
Increased levels of management: Design principles and service catalog
For managed solutions, several design principles and template solutions can be applied in addition to the
management baseline. Each of the design principles for reliability and resiliency adds operating cost to the
workload. For IT and the business to agree on these additional commitments, it's important to understand
potential losses that can be avoided through that increased investment.
The following calculations will walk through formulas to help you better understand the differences between
losses and increased management investments. For guidance on calculating the cost of increased management,
see Workload automation and Platform automation.

TIP
If you're using the Ops Management workbook to plan for cloud management, update the Ops management fields to reflect each conversation. Those fields include Commitment level, Composite SLA, and Monthly cost. Monthly cost
should represent the monthly cost of the added operational management tools. After they're updated, the fields will update
the ROI formulas and each of the following fields.

Estimate outage (hours per year)


Composite SLA is the service-level agreement that's based on the deployment of each asset in the workload. That
field drives Estimated Outage (labeled Est. Outage in the workbook). To calculate estimated outage in hours per
year without using the workbook, apply the following formula:

Estimated outage = (1 - Composite SLA percentage) × Number of hours in a year

The workbook uses the default value of 8,760 hours per year.
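For example, at the default composite SLA of 99.9 percent:

Estimated outage = (1 − 0.999) × 8,760 hours = 8.76 hours per year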
Standard loss impact
Standard loss impact (labeled Standard impact in the workbook) forecasts the financial impact of any outage,
assuming that the Estimated outage prediction proves accurate. To calculate this forecast without using the
workbook, apply the following formula:

Standard impact = Estimated outage at three 9s of uptime × Time-value impact

This serves as a baseline for cost, should the business stakeholders choose to invest in a higher level of
management.
Composite SLA impact
Composite SLA impact (labeled Commitment level impact in the workbook) provides updated fiscal impact, based
on the changes to the uptime SLA. This calculation allows you to compare the projected financial impact of both
options. To calculate this forecast impact without the spreadsheet, apply the following formula:

Composite SLA impact = Estimated outage × Time-value impact

The value represents the potential losses to be avoided by the changed commitment level and new composite
SLA.
Comparison basis
Comparison basis evaluates standard impact and composite SLA impact to determine which is most appropriate
in the return column.
Return on loss avoidance
If the cost of managing a workload exceeds the potential losses, the proposed investment in cloud management
might not be fruitful. To compare the Return on Loss Avoidance, see the column labeled Annual ROI. To
calculate this column on your own, use the following formula:

Return on loss avoidance = (Comparison basis − (Monthly cost × 12)) ÷ (Monthly cost × 12)

Unless there are other soft-cost factors to consider, this comparison can quickly suggest whether there should be a
deeper investment in cloud operations, resiliency, reliability, or other areas.
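If you want to sanity-check the workbook, or you aren't using it, the following Python sketch strings the preceding formulas together. All inputs are illustrative, and treating Comparison basis as the losses avoided by the higher commitment level is one reasonable reading of the guidance above rather than a workbook definition.

HOURS_PER_YEAR = 8760

def estimated_outage(composite_sla):
    return (1 - composite_sla) * HOURS_PER_YEAR          # hours of outage per year

def loss_impact(composite_sla, time_value_impact):
    return estimated_outage(composite_sla) * time_value_impact

# Illustrative inputs only
time_value_impact = 1_000    # projected losses per hour of outage
baseline_sla = 0.999         # management baseline (three 9s)
committed_sla = 0.9995       # proposed higher commitment level
monthly_cost = 500           # added operational management tooling per month

standard_impact = loss_impact(baseline_sla, time_value_impact)       # 8,760
commitment_impact = loss_impact(committed_sla, time_value_impact)    # 4,380

# One reading of "Comparison basis": the losses avoided by the higher commitment
comparison_basis = standard_impact - commitment_impact               # 4,380
annual_cost = monthly_cost * 12                                      # 6,000

return_on_loss_avoidance = (comparison_basis - annual_cost) / annual_cost
print(f"Return on loss avoidance: {return_on_loss_avoidance:.0%}")   # -27%

In this example, the added management cost exceeds the avoided losses, so the investment would need soft-cost factors to justify it.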

Validate the commitment


By this point in the process, commitments have been made: centralized or delegated responsibility, Azure tenancy,
and level of commitment. Each commitment should be validated and documented to ensure that the cloud
operations team, the cloud strategy team, and the business stakeholders are aligned on this commitment to
manage the workload.

Next steps
After the commitments are made, the responsible operations teams can begin configuring the workload in
question. To get started, evaluate various approaches to inventory and visibility.
Inventory and visibility options
Management leveling across cloud management
disciplines

The keys to proper management in any environment are consistency and repeatable processes. There are endless options for the things that can be done in Azure. Likewise, there are countless approaches to cloud management.
To provide consistency and repeatability, it's important to narrow those options to a consistent set of management
processes and tools that will be offered for workloads hosted in the cloud.

Suggested management levels


Because the workloads in your IT portfolio vary, it's unlikely that a single level of management will suffice for each
workload. To help you support a variety of workloads and business commitments, we suggest that your cloud
operations team or platform operations team establish a few levels of operations management.

As a starting point, consider establishing the management levels that are shown in the preceding diagram and
suggested in the following list:
Management baseline: A cloud management baseline (or management baseline) is a defined set of tools,
processes, and consistent pricing that serve as the foundation for all cloud management in Azure. To establish a
cloud management baseline and determine which tools to include in the baseline offering to your business,
review the list in the "Cloud management disciplines" section.
Enhanced baseline: A number of workloads might require enhancements to the baseline that aren't
necessarily specific to a single platform or workload. Although these enhancements aren't cost effective for
every workload, there should be common processes, tools, and solutions for any workload that can justify the
cost of the extra management support.
Platform specialization: In any given environment, some common platforms are used by a variety of
workloads. This general architectural commonality doesn't change when businesses adopt the cloud. Platform
specialization is an elevated level of management that applies data and architectural subject matter expertise to
provide a higher level of operational management. Examples of platform specialization would include
management functions specific to SQL Server, Containers, Active Directory, or other services that can be better
managed through consistent, repeatable processes, tools, and architectures.
Workload specialization: For workloads that are truly mission critical, there might be a cost justification to go
much deeper into the management of that workload. Workload specialization applies workload telemetry to
determine more advanced approaches to daily management. That same data often identifies automation,
deployment, and design improvements that would lead to greater stability, reliability, and resiliency beyond
what's possible with operational management alone.
Unsupported: It's equally important to communicate common management processes that won't be delivered
through cloud management disciplines for workloads that are classified as not supported or not critical.
Organizations might also choose to outsource functions related to one or more of these management levels to a
service provider. These service providers can use Azure Lighthouse to provide greater precision and transparency.
The remaining articles in this series outline a number of processes that are commonly found within each of these
disciplines. In parallel, the Azure Management Guide demonstrates the tools that can support each of those
processes. For assistance with building your management baseline, start with the Azure Management Guide. After
you've established the baseline, this article series and the accompanying best practices can help expand that
baseline to define other levels of management support.

Cloud management disciplines


Each suggested management level can call on a variety of cloud management disciplines. However, the mapping is
designed to make it easier to find the suggested processes and tools to deliver on the appropriate level of cloud
management.
In most cases, the previously discussed management baseline level consists of processes and tools from the
following disciplines. In each case, a few processes and tools are highlighted to demonstrate enhanced baseline
functions.
Inventory and visibility: At a minimum, a management baseline should include a means of inventorying
assets and creating visibility into the run state of each asset.
Operational compliance: Regular management of configuration, sizing, cost, and performance of assets is key
to maintaining performance expectations and a management baseline.
Protect and recover: Minimizing operational interruptions and expediting recovery can help you avoid
performance losses and revenue impacts. Detection and recovery are essential aspects of this discipline within
any management baseline.
The platform specialization level of management pulls from the processes and tools that are aligned with the
platform operations disciplines. Likewise, the workload specialization level of management pulls from the
processes and tools that are aligned with the workload operations disciplines.
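To keep this mapping visible to operations teams, it can be captured as a simple data structure. The sketch below only restates the mapping described in this article; the names mirror the headings used here and carry no special meaning in Azure.

# Management levels mapped to the cloud management disciplines they draw on
management_levels = {
    "Management baseline": [
        "Inventory and visibility",
        "Operational compliance",
        "Protect and recover",
    ],
    # The enhanced baseline uses the same disciplines, with the highlighted
    # enhanced-baseline processes and tools enabled in each.
    "Enhanced baseline": [
        "Inventory and visibility",
        "Operational compliance",
        "Protect and recover",
    ],
    "Platform specialization": ["Platform operations"],
    "Workload specialization": ["Workload operations"],
}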

Next steps
The next step toward defining each level of cloud management is an understanding of inventory and visibility.
Inventory and visibility options
Inventory and visibility in cloud management

Operational management has a clear dependency on data. Consistent management requires an understanding
about what is managed (inventory) and how those managed workloads and assets change over time (visibility).
Clear insights about inventory and visibility help empower the team to manage the environment effectively. All
other operational management activities and processes build on these two areas.
A few classic phrases about the importance of measurements set the tone for this article:
Manage what matters.
You can only manage what you can measure.
If you can't measure it, it might not matter.
The inventory and visibility discipline builds on these timeless phrases. Before you can effectively establish
operational management processes, it's important to gather data and create the right level of visibility for the
right teams.

Common customer challenges


Unless inventory and visibility processes are consistently applied, operational management teams can suffer from
a higher volume of business interruptions, longer time to recovery, and greater amounts of effort required to
troubleshoot and triage issues. As changes adversely affect higher priority applications and larger numbers of
assets, each of these metrics grows even faster.
These challenges stem from a small number of questions that can be answered only through consistent
data/telemetry:
How does the current-state performance deviate from standard operational performance telemetry?
What assets are causing the business interruptions at the workload level?
Which assets must be remediated to return to acceptable performance of this workload or business process?
When did the deviation start? What was the trigger?
Which changes have been made to the underlying assets? By whom?
Were the changes intentional? Malicious?
How did changes affect performance telemetry?
It is difficult, if not impossible, to answer these questions without a rich, centralized source for logs and telemetry
data. To enable cloud management, the baseline service must start by defining the processes that ensure the consistent configuration required to centralize the data. The processes should capture how such a configuration enforces data collection to support the components of inventory and visibility described in the next section.

Components of inventory and visibility


Creating visibility on any cloud platform requires a few key components:
Responsibility and visibility
Inventory
Central logging
Change tracking
Performance telemetry
Responsibility and visibility
When you establish commitments for each workload, management responsibility is a key factor. Delegated
responsibility creates a need for delegated visibility. The first step toward inventory and visibility is to ensure that
the responsible parties have access to the right data. Before you implement any cloud-native tools for visibility,
ensure that each monitoring tool has been configured with proper access and scope for each operations team.
Inventory
If no one knows that an asset exists, it's difficult to manage the asset. Before an asset or workload can be
managed, it must be inventoried and classified. The first technical step toward stable operations is a validation of
inventory and classification of that inventory.
Central logging
Centralized logging is critical to the visibility that operations management teams require day to day. All assets deployed to the cloud should record logs to a central location. In Azure, that central location is a Log Analytics workspace.
The centralization of logging drives reports about change management, service health, configuration, and most
other aspects of IT operations.
Enforcing the consistent use of central logging is the first step toward establishing repeatable operations.
Enforcement can be accomplished through corporate policy. When possible, however, enforcement should be
automated to ensure consistency.
Change tracking
Change is the one constant in a technology environment. Awareness and understanding of changes across
multiple workloads is essential to reliable operations. Any cloud management solution should include a means of
understanding the when, how, and why of technical change. Without those data points, remediation efforts are
significantly hindered.
Performance telemetry
Business commitments about cloud management are driven by data. To properly maintain commitments, the
cloud operations team must first understand the telemetry about the stability, performance, and operations of the
workload, and the assets which support the workload.
The ongoing health and operations of the network, DNS, operating systems, and other foundational aspects of
the environment are critical data points that factor into the overall health of any workload.

Processes
Perhaps more important than the features of the cloud management platform, the cloud management processes
will realize operations commitments with the business. Any cloud management methodology should include, at a
minimum, the following processes:
Reactive monitoring: When deviations adversely affect business operations, who addresses those
deviations? What actions do they take to remediate the deviations?
Proactive monitoring: When deviations are detected but business operations are not affected, how are those
deviations addressed, and by whom?
Commitment reporting: How is adherence to the business commitment communicated to business
stakeholders?
Budgetary reviews: What is the process for reviewing those commitments against budgeted costs? What is
the process for adjusting the deployed solution or the commitments to create alignment?
Escalation paths: What escalation paths are available when any of the preceding processes fail to meet the
needs of the business?
There are several more processes related to inventory and visibility. The preceding list is designed to provoke
thought within the operations team. Answering these questions will help develop some of the necessary
processes, as well as likely trigger new, deeper questions.

Responsibilities
When you're developing processes for operational monitoring, it's equally important to determine responsibilities
for daily operation and regular support of each process.
In a central IT organization, IT would provide the operational expertise. The business would be consultative in
nature, when issues require remediation.
In a cloud center of excellence organization, business operations would provide the expertise and hold
responsibility for management of these processes. IT would focus on the automation and support of teams, as
they operate the environment.
But these are the common responsibilities. Organizations often require a mixture of responsibilities to meet
business commitments.

Act on inventory and visibility


Regardless of the cloud platform, the five components of inventory and visibility are used to drive most
operational processes. All subsequent disciplines will build on the data that's being captured. The next articles in
this series outline ways to act on that data and integrate other data sources.
Share visibility
Data without action produces little return. Cloud management might expand beyond cloud-native tools and
processes. To accommodate broader processes, a cloud management baseline might need to be enhanced to
include reporting, IT Service Management integration, or data centralization. Cloud management might need to
include one or more of the following during various phases of operational maturity.
Report
Offline processes and communication about commitments to business stakeholders often require reporting. Self-
service reporting or periodic reporting might be a necessary component of an enhanced management baseline.
IT Service Management (ITSM ) integration
ITSM integration is often the first example of acting on inventory and visibility. When deviations from expected performance patterns arise, ITSM integration uses alerts from the cloud platform to create tickets in a separate service management tool, which in turn drive remediation activities. Some operating models might require ITSM integration as an aspect of the enhanced management baseline.
Data centralization
There's a variety of reasons why a business might require multiple tenants within a single cloud provider. In those
scenarios, data centralization is a required component of the enhanced management baseline, because it can
provide visibility across each of those tenants or environments.

Next steps
Operational compliance builds on inventory capabilities by applying management automation and controls. See
how operational compliance maps to your processes.
Plan for operational compliance
Operational compliance in cloud management

Operational compliance builds on the discipline of inventory and visibility. As the first actionable step of cloud
management, this discipline focuses on regular telemetry reviews and remediation efforts (both proactive and
reactive remediation). This discipline is the cornerstone for maintaining balance between security, governance,
performance, and cost.

Components of operations compliance


Maintaining compliance with operational commitments requires analysis, automation, and human remediation.
Effective operational compliance requires consistency in a few critical processes:
Resource consistency
Environment consistency
Resource configuration consistency
Update consistency
Remediation automation
Resource consistency
The most effective step that a cloud management team can take toward operational compliance is to establish
consistency in resource organization and tagging. When resources are consistently organized and tagged, all other
operational tasks become easier. For deeper guidance on resource consistency, see the Governance phase of the
cloud adoption lifecycle. Specifically, the initial governance foundation articles demonstrate how to start
developing resource consistency.
Environment consistency
Establishing consistent environments, or landing zones, is the next most important step toward operational
compliance. When landing zones are consistent and enforced through automated tools, it is significantly less
complex to diagnose and resolve operational issues. For deeper guidance on environment consistency, see the
Ready phase of the cloud adoption lifecycle. The exercises in that phase help build a repeatable process for
defining and maturing a consistent, code-first approach to the development of cloud-based environments.
Resource configuration consistency
As it builds on governance and readiness approaches, cloud management should include processes for the
ongoing monitoring and evaluation of its adherence to resource consistency requirements. As workloads change
or new versions are adopted, it is vital that cloud management processes evaluate any configuration changes,
which are not easily regulated through automation.
When inconsistencies are discovered, some are addressed by consistency in updates and others can be
automatically remediated.
Update consistency
Stability in approach can lead to more stable operations. But some changes are required within cloud
management processes. In particular, regular patching and performance changes are essential to reducing
interruptions and controlling costs.
One of the many values of a mature cloud management methodology is a focus on stabilizing and controlling
necessary change.
Any cloud management baseline should include a means of scheduling, controlling, and possibly automating
necessary updates. Those updates should include patches at a minimum, but could also include performance,
sizing, and other aspects of updating assets.
Remediation automation
As an enhanced baseline for cloud management, some workloads may benefit from automated remediation.
When a workload commonly encounters issues that can't be resolved through code or architectural changes,
automating remediation can help reduce the burden of cloud management and increase user satisfaction.
Many would argue that any issue that's common enough to automate should instead be resolved by paying down the underlying technical debt. When a long-term resolution is prudent, it should be the default option. However, a number of
business scenarios make it difficult to justify large investments in the resolution of technical debt. When such a
resolution can't be justified, but remediation is a common and costly burden, automated remediation is the next
best solution.

Next steps
Protection and recovery are the next areas to consider in a cloud management baseline.
Protect and recover
Protect and recover in cloud management

After they've met the requirements for inventory and visibility and operational compliance, cloud management
teams can anticipate and prepare for a potential workload outage. As they're planning for cloud management, the
teams must start with an assumption that something will fail.
No technical solution can consistently offer a 100 percent uptime SLA. Solutions with the most redundant
architectures claim to deliver on "six 9s" or 99.9999 percent uptime. But even a "six 9s" solution goes down for
31.6 seconds in any given year. Sadly, it's rare for a solution to warrant a large, ongoing operational investment
that's required to reach "six 9s" of uptime.
Preparation for an outage allows the team to detect failures sooner and recover more quickly. The focus of this
discipline is on the steps that come immediately after a system fails. How do you protect workloads, so that they
can be recovered quickly when an outage occurs?

Translate protection and recovery conversations


The workloads that power business operations consist of applications, data, virtual machines (VMs), and other
assets. Each of those assets might require a different approach to protection and recovery. The important aspect of
this discipline is to establish a consistent commitment within the management baseline, which can provide a
starting point during business discussions.
At a minimum, each asset that supports any given workload should have a baseline approach with a clear commitment to speed of recovery (recovery time objectives, or RTO) and risk of data loss (recovery point objectives, or RPO).
Recovery time objectives (RTO)
When disaster strikes, a recovery time objective is the amount of time it should take to recover any system to its state prior to the disaster. For each workload, that would include the time required to restore minimum necessary
functionality for the VMs and apps. It also includes the amount of time required to restore the data that's required
by the applications.
In business terms, RTO represents the amount of time that the business process will be out of service. For
mission-critical workloads, this variable should be relatively low, allowing the business processes to resume
quickly. For lower-priority workloads, a standard level of RTO might not have a noticeable impact on company
performance.
The management baseline should establish a standard RTO for non-mission-critical workloads. The business can
then use that baseline as a way to justify additional investments in recovery times.
Recovery point objectives (RPO)
In most cloud management systems, data is periodically captured and stored through some form of data
protection. The last time data was captured is referred to as a recovery point. When a system fails, it can be
restored only to the most recent recovery point.
If a system has a recovery point objective that's measured in hours or days, a system failure would result in the
loss of data for those hours or days between the last recovery point and the outage. A one-day RPO would
theoretically result in the loss of all transactions in the day leading up to the failure.
For mission-critical systems, an RPO that's measured in minutes or seconds might be more appropriate to use to
avoid a loss in revenue. But a shorter RPO generally results in an increase in overall management costs.
To help minimize costs, a management baseline should focus on the longest acceptable RPO. The cloud
management team can then increase the RPO of specific platforms or workloads, which would warrant more
investment.
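To make these objectives concrete during business conversations, the same time/value impact introduced earlier can frame both commitments. The figures below are illustrative only:

An RTO of 4 hours × a time/value impact of $1,000 per hour ≈ $4,000 of losses per outage.
An RPO of 24 hours ≈ up to one full day of transactions that could be lost and would need to be recovered through other means.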

Protect and recover workloads


Most of the workloads in an IT environment support a specific business or technical process. Systems that don't
have a systemic impact on business operations often don't warrant the increased investments required to recover
quickly or minimize data loss. By establishing a baseline, the business can clearly understand what level of
recovery support can be offered at a consistent, manageable price point. This understanding helps the business
stakeholders evaluate the value of an increased investment in recovery.
For most cloud management teams, an enhanced baseline with specific RPO/RTO commitments for various assets
yields the most favorable path to mutual business commitments. The following sections outline a few common
enhanced baselines that empower the business to easily add protection and recovery functionality through a
repeatable process.
Protect and recover data
Data is arguably the most valuable asset in the digital economy. The ability to protect and recover data more
effectively is the most common enhanced baseline. For the data that powers a production workload, loss of data
can be directly equated to loss in revenue or loss of profitability. We generally encourage cloud management
teams to offer a level of enhanced management baseline that supports common data platforms.
Before cloud management teams implement platform operations, it's common for them to support improved
operations for a platform as a service (PaaS ) data platform. For instance, it's easy for a cloud management team to
enforce a higher frequency of backup or multiregion replication for Azure SQL Database or Azure Cosmos DB
solutions. Doing so allows the development team to easily improve RPO by modernizing their data platforms.
To learn more about this thought process, see platform operations discipline.
Protect and recover VMs
Most workloads have some dependency on virtual machines, which host various aspects of the solution. For the
workload to support a business process after a system failure, a number of virtual machines must be recovered
quickly.
Every minute of downtime on those virtual machines could cause lost revenue or reduced profitability. When VM
downtime has a direct impact on the fiscal performance of the business, RTO is very important. Virtual machines
can be recovered more quickly by using replication to a secondary site and automated recovery, a model that's
referred to as a hot-warm recovery model. At the highest state of recovery, virtual machines can be replicated to a
fully functional, secondary site. This more expensive approach is referred to as a high-availability, or hot-hot,
recovery model.
Each of the preceding models reduces the RTO, resulting in a faster restoration of business process capabilities.
However, each model also results in significantly increased cloud management costs.
For more about this thought process, see workload operations discipline.

Next steps
After this management baseline component is met, the team can look ahead to avoid outages in platform
operations and workload operations.
Platform operations Workload operations
Platform operations in cloud management

A cloud management baseline that spans inventory and visibility, operational compliance, and protection and
recovery might provide a sufficient level of cloud management for most workloads in the IT portfolio. However,
that baseline is seldom enough to support the full portfolio. This article builds on the most common next step in
cloud management, portfolio operations.
A quick study of the assets in the IT portfolio highlights patterns across the workloads that are being supported.
Within those workloads, there will be a number of common platforms. Depending on the past technical decisions
within the company, those platforms could vary widely.
For some organizations, there will be a heavy dependence on SQL Server, Oracle, or open-source data platforms. In other organizations, the commonalities might be rooted in the hosting platforms for virtual machines (VMs) or containers. Still others might have a common dependency on applications or enterprise resource planning (ERP) systems, such as SAP, Oracle, or others.
By understanding these commonalities, the cloud management team can specialize in higher levels of support for
those prioritized platforms.

Establish a service catalog


The objective of platform operations is to create reliable and repeatable solutions, which the cloud adoption team
can use to deliver a platform that provides a higher level of business commitment. That commitment could
decrease the likelihood or frequency of downtime, which improves reliability. In the event of a system failure, the
commitment could also help decrease the amount of data loss or time to recovery. Such a commitment often
includes ongoing, centralized operations to support the platform.
As the cloud management team establishes higher degrees of operational management and specialization related
to specific platforms, those platforms are added to a growing service catalog. The service catalog provides self-
service deployment of platforms in a specific configuration, which adheres to ongoing platform operations.
During the business-alignment conversation, cloud management and cloud strategy teams can propose service
catalog solutions as a way for the business to improve reliability, uptime, and recovery commitments in a
controlled, repeatable process.
For reference, some organizations refer to an early-stage service catalog as an approved list. The primary
difference is that a service catalog comes with ongoing operational commitments from the cloud center of
excellence (CCoE ). An approved list is similar, in that it provides a preapproved list of solutions that a team can use
in the cloud. However, typically there isn't an operational benefit associated with applications on an approved list.
Much like the debate between Central IT and CCoE, the difference is one of priorities. A service catalog assumes
good intent but provides operational, governance, and security guardrails that accelerate innovation. An approved
list hinders innovation until operations, compliance, and security gates can be passed for a solution. Both solutions
are viable, but they require the company to make subtle prioritization decisions to invest more in innovation or
compliance.
Build the service catalog
Cloud management is seldom successful at delivering a service catalog in a silo. Proper development of the
catalog requires a partnership across Central IT or the CCoE. This approach tends to be most successful when an
IT organization reaches a CCoE level of maturity, but could be implemented sooner.
When it's building the service catalog within a CCoE model, the cloud platform team builds out the desired-state
platform. The cloud governance and cloud security teams validate governance and compliance within the
deployment. The cloud management team establishes ongoing operations for that platform. And the cloud
automation team packages the platform for scalable, repeatable deployment.
After the platform is packaged, the cloud management team can add it to the growing service catalog. From there,
the cloud adoption team can use the package or others in the catalog during deployment. After the solution goes
to production, the business realizes the extra benefits of improved operational management and potentially
reduced business disruptions.

NOTE
Building a service catalog requires a great deal of effort and time from multiple teams. Using the service catalog or
approved list as a gating mechanism will slow innovation. When innovation is a priority, service catalogs should be developed in parallel with other adoption efforts.

Define your own platform operations


Although management tools and processes can help improve platform operations, that is often not enough to
achieve the desired states of stability and reliability. True platform operations requires a focus on architecture-
quality pillars. When a platform justifies a deeper investment in operations, the following five pillars should be
considered before the platform becomes a part of any service catalog:
Scalability: The ability of a system to handle increased load.
Availability: The percentage of time that a system is functional and working.
Resiliency: The ability of a system to recover from failures and continue to function.
Management: The operations processes that keep a system running in production.
Security: Protecting applications and data from threats.
The Azure Architecture Framework provides an approach to evaluating specific workloads for adherence to these
pillars, in an effort to improve overall operations. These pillars can be applied to both platform operations and
workload operations.
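For the availability pillar in particular, a percentage commitment translates directly into a monthly downtime budget, which is often the most concrete number in the business-alignment conversation. A small worked sketch in Python, using an approximate average month length:

```python
HOURS_PER_MONTH = 730  # approximate average month length in hours

def allowed_downtime_hours(availability_percent: float) -> float:
    """Convert an availability commitment into a monthly downtime budget."""
    return HOURS_PER_MONTH * (1 - availability_percent / 100)

for target in (99.0, 99.9, 99.95):
    print(f"{target}% availability allows {allowed_downtime_hours(target):.2f} hours of downtime per month")
```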

Get started with specific platforms


The platforms discussed in the next sections are common to typical Azure customers, and they can easily justify an
investment in platform operations. Cloud management teams tend to start with them when they're building out
platform operations requirements or a full service catalog.
PaaS data operations
Data is often the first platform to warrant platform operations investments. When data is hosted in a platform as a
service (PaaS ) environment, business stakeholders tend to request a reduced recovery point objective (RPO ) to
minimize data loss. Depending on the nature of the application, they might also request a reduction in recovery
time objective (RTO ). In either case, the architecture that supports PaaS -based data solutions can easily
accommodate some increased level of management support.
In most scenarios, the cost of improving management commitments is easily justified, even for applications that
are not mission critical. This platform operations improvement is so common that many cloud management
teams see it more as an enhanced baseline, rather than as a true platform operations improvement.
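The RPO conversation itself is simple arithmetic: the worst-case data loss is bounded by the interval between recovery points. The following sketch uses hypothetical numbers (it isn't tied to any specific Azure service) to test whether a backup cadence meets a requested RPO:

```python
def worst_case_data_loss_hours(backup_interval_hours: float) -> float:
    """With recovery points every N hours, a failure just before the next backup
    can lose up to N hours of data."""
    return backup_interval_hours

def meets_rpo(backup_interval_hours: float, requested_rpo_hours: float) -> bool:
    return worst_case_data_loss_hours(backup_interval_hours) <= requested_rpo_hours

# Hypothetical example: nightly backups versus a requested 1-hour RPO.
print(meets_rpo(backup_interval_hours=24, requested_rpo_hours=1))    # False: cadence must improve
print(meets_rpo(backup_interval_hours=0.5, requested_rpo_hours=1))   # True: 30-minute recovery points suffice
```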
IaaS data operations
When data is hosted in a traditional infrastructure as a service (IaaS) solution, the effort to improve RPO and RTO
can be significantly higher. Yet the business stakeholders' desire to achieve better management commitments is
seldom affected by a PaaS versus IaaS decision. If anything, an understanding of the fundamental differences in
architecture might prompt the business to ask for PaaS solutions, or for commitments that match what PaaS
solutions can offer. Modernization of any IaaS data platform should be considered a first step into platform
operations.
When modernization isn't an option, cloud management teams commonly prioritize IaaS-based data platforms as
a first required service in the service catalog. Providing the business with a choice between standalone data
servers and clustered, high-availability, data solutions makes the business commitment conversation much easier
to facilitate. A basic understanding of the operational improvements and the increased costs will arm the business
to make the best decision for the business processes and supporting workloads.
Other common platform operations
In addition to data platforms, virtual machine hosts tend to be a common platform for operations improvements.
Most commonly, cloud platform and cloud management teams invest in improvements to VMware hosts or
container solutions. Such investments can improve the stability and reliability of the hosts, which support the
VMs, which in turn power the workloads. Proper operations on one host or container can improve the RPO or
RTO of several workloads. This approach creates improved business commitments, but distributes the investment.
Improved commitments and reduced costs combine to make it much easier to justify improvements to cloud
management and platform operations.

Next steps
In parallel with improvements to platform operations, cloud management teams also focus on improving
workload operations for the top 20 percent or less of production workloads.
Improve workload operations
Workload operations in cloud management

Some workloads are critical to the success of the business. For those workloads, a management baseline is
insufficient to meet the required business commitments to cloud management. Platform operations might not
even be sufficient to meet business commitments. This highly important subset of workloads requires a
specialized focus on the way the workload functions and how it is supported.
In return, the investment in workload operations can lead to improved performance, decreased risk of business
interruption, and faster recovery when system failures occur. This article discusses an approach to investing in the
continued operations of these high priority workloads to drive improved business commitments.

When to invest in workload operations


The Pareto Principle (also known as the 80/20 Rule) states that 80 percent of effects come from 20 percent of the
causes. When IT portfolios are allowed to grow organically over time, this rule is often illustrated in a review of
the IT portfolio. Depending on the effect that requires investment, the cause can vary but the general principle
holds true:
80 percent of system failures tend to be the result of 20 percent of the common errors or bugs.
80 percent of business value tends to come from 20 percent of the workloads in a portfolio.
80 percent of the effort to migrate to the cloud comes from 20 percent of the workloads being moved.
80 percent of cloud management efforts will support 20 percent of the service incidents or trouble tickets.
80 percent of business impact from an outage will come from 20 percent of the systems affected by the
outage.
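One quick way to test whether the 80/20 rule holds in your own portfolio is to rank workloads by incident volume and find the smallest set that accounts for 80 percent of tickets. The following sketch uses made-up ticket counts purely for illustration:

```python
def pareto_set(incidents_by_workload: dict, threshold: float = 0.80) -> list:
    """Return the smallest set of workloads that together account for `threshold`
    of all recorded incidents, ordered from most to fewest incidents."""
    total = sum(incidents_by_workload.values())
    selected, running = [], 0
    for workload, count in sorted(incidents_by_workload.items(), key=lambda kv: kv[1], reverse=True):
        selected.append(workload)
        running += count
        if running / total >= threshold:
            break
    return selected

# Hypothetical ticket counts pulled from an ITSM export.
tickets = {"payroll": 120, "crm": 15, "intranet": 9, "reporting": 30, "legacy-erp": 210, "web-store": 16}
top = pareto_set(tickets)
print(top, f"= {len(top)}/{len(tickets)} workloads driving 80% of incidents")
```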
Workload operations should be applied only when the cloud adoption strategy, business outcomes, and
operational metrics are each well understood. This is a paradigm shift from the classic view of IT. Traditionally, IT
assumed that all workloads experienced the same degree of support and required similar levels of priority.
Before they invest in deep workload operations, both IT and the business should understand the business
justifications and the expectations of increased investment in cloud management.

Start with the data


Workload operations begin with a deep understanding of workload performance and support requirements.
Before the team invests in workload operations, it must have rich data about workload dependencies, application
performance, database diagnostics, virtual machine telemetry, and incident history.
This data seeds the insights that drive workload operations decisions.
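If that telemetry lands in a Log Analytics workspace, it can be pulled programmatically to seed this analysis. The following sketch assumes the azure-identity and azure-monitor-query Python packages, a hypothetical workspace ID, and the classic Perf table; the table and counter names in your environment may differ depending on which agents are deployed:

```python
from datetime import timedelta
from azure.identity import DefaultAzureCredential
from azure.monitor.query import LogsQueryClient

WORKSPACE_ID = "<log-analytics-workspace-id>"  # hypothetical placeholder

# Average CPU per computer over the last 7 days, from the classic Perf table.
QUERY = """
Perf
| where ObjectName == 'Processor' and CounterName == '% Processor Time'
| summarize avg_cpu = avg(CounterValue) by Computer
| order by avg_cpu desc
"""

client = LogsQueryClient(DefaultAzureCredential())
response = client.query_workspace(WORKSPACE_ID, QUERY, timespan=timedelta(days=7))

for table in response.tables:
    for row in table.rows:
        print(row)  # one (Computer, avg_cpu) pair per row
```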

Continued observation
Initial data and ongoing telemetry can help formulate and test theories about the performance of a workload. But
ongoing workload operations are rooted in a continued and expanded observation of workload performance,
with a heavy focus on application and data performance.
Test the automation
At the application level, the first requirement of workload operations is an investment in deep testing. For any
application that's supported through workload operations, a test plan should be established and regularly
executed to deliver functional and scale testing across the application.
Regular test telemetry can provide immediate validation of various hypotheses about the operation of the
workload. Improvements to operational and architectural patterns can be implemented and tested. The resulting
deltas provide a clear impact analysis to guide continued investments.
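A recurring functional and scale test can be as simple as exercising a key endpoint and asserting on the latency hypothesis agreed with the business. The following minimal sketch assumes a hypothetical health endpoint and threshold; it's a starting point rather than a full load-testing harness:

```python
import statistics
import time
import urllib.request

# Hypothetical endpoint and threshold; adjust to the workload's own commitments.
ENDPOINT = "https://workload.example.invalid/health"
MAX_P95_SECONDS = 0.5
SAMPLES = 50

def measure_once(url: str) -> float:
    """Time one request and confirm it returns a healthy status."""
    start = time.perf_counter()
    with urllib.request.urlopen(url, timeout=5) as response:
        assert response.status == 200, f"unexpected status {response.status}"
    return time.perf_counter() - start

latencies = sorted(measure_once(ENDPOINT) for _ in range(SAMPLES))
p95 = latencies[int(0.95 * len(latencies)) - 1]  # approximate 95th percentile
print(f"p95 latency: {p95:.3f}s (median {statistics.median(latencies):.3f}s)")
assert p95 <= MAX_P95_SECONDS, "scale-test hypothesis failed; investigate before the next release"
```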
Understand releases
A clear understanding of release cycles and release pipelines is an important element of workload operations.
An understanding of cycles can prepare for potential interruptions and allow the team to proactively address any
releases that might produce an adverse effect on operations. This understanding also allows the cloud
management team to partner with adoption teams to continuously improve the quality of the product and
address any bugs that might affect stability.
More importantly, an understanding of release pipelines can significantly improve the recovery point objective
(RPO) of a workload. In many scenarios, the fastest and most accurate path to the recovery of an application is a
release pipeline. For application layers that change only when a new release happens, it might be wise to invest
more heavily in pipeline optimization than on the recovery of the application from traditional back-up processes.
Although a deployment pipeline can be the fastest path to recovery, it can also be the fastest path to remediation.
When an application has a fast, efficient, and reliable release pipeline, the cloud management team has an option
to automate deployment to a new host as a form of automated remediation.
There might be many other faster, more effective mechanisms for remediation and recovery. However, when the
use of an existing pipeline can meet business commitments and capitalize on existing DevOps investments, the
existing pipeline might be a viable alternative.
Clearly communicate changes to the workload
Change to any workload is among the biggest risks to workload operations. For any workload in the workload
operations level of cloud management, the cloud management team should closely align with the cloud adoption
teams to understand the changes coming from each release. This investment in proactive understanding will have
a direct, positive impact on operational stability.

Improve outcomes
The data and communication investments in a workload will yield suggestions for improvements to ongoing
operations in one of three areas:
Technical debt resolution
Automated remediation
Improved system design
Technical debt resolution
The best workload operations plans still require remediation. As your cloud management team seeks to stay
connected to understand adoption efforts and releases, the team likewise should regularly share remediation
requirements to ensure that technical debt and bugs are a continued priority for your development teams.
Automated remediation
By applying the Pareto Principle, we can say that 80 percent of negative business impact likely comes from 20
percent of the service incidents. When those incidents can't be addressed in normal development cycles,
investments in remediation automation can significantly reduce business interruptions.
Improved system design
Whether addressed through technical debt resolution or automated remediation, system design flaws are the
common cause of most system outages. You can have the greatest impact on overall workload operations by
adhering to a few design principles:
Scalability: The ability of a system to handle increased load.
Availability: The percentage of time that a system is functional and working.
Resiliency: The ability of a system to recover from failures and continue to function.
Management: Operations processes that keep a system running in production.
Security: Protecting applications and data from threats.
To help improve overall operations, the Azure Architecture Framework provides an approach to evaluating
specific workloads for adherence to these pillars. You can apply the pillars to both platform operations and
workload operations.

Next steps
With a full understanding of the manage methodology within the Cloud Adoption Framework, you are now
armed to implement cloud management principles. For guidance on making this methodology actionable within
your operations environment, see Cloud management in the Cloud Adoption Framework of the adoption
lifecycle.
Apply this methodology
Apply design principles and advanced operations

The first three cloud management disciplines describe a management baseline. At a minimum, a management
baseline should include a standard business commitment to minimize business interruptions and accelerate
recovery if service is interrupted. Most management baselines include a disciplined focus on maintaining
"inventory and visibility," "operational compliance," and "protection and recovery."
The purpose of a management baseline is to create a consistent offering that provides a minimum level of business
commitment for all supported workloads. This baseline of common, repeatable management offerings allows the
team to deliver a highly optimized degree of operational management, with minimal deviation. But that standard
offering might not provide a rich enough commitment to the business.
The diagram in the next section illustrates three ways to go beyond the management baseline.
The management baseline should meet the minimum commitment required by 80 percent of the lowest criticality
workloads in the portfolio. The baseline should not be applied to mission-critical workloads. Nor should it be
applied to common platforms that are shared across workloads. Those workloads require a focus on design
principles and advanced operations.
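One lightweight way to apply this split is to derive the management tier from a criticality tag on each workload, so that only mission-critical workloads and shared platforms escalate beyond the baseline. A minimal sketch, with hypothetical tag values:

```python
# Hypothetical criticality tags mapped to a management approach.
TIER_BY_CRITICALITY = {
    "low": "management baseline",
    "medium": "enhanced management baseline",
    "shared-platform": "platform specialization",
    "mission-critical": "workload specialization",
}

def management_tier(criticality_tag: str) -> str:
    # Anything untagged falls back to the baseline rather than an expensive tier.
    return TIER_BY_CRITICALITY.get(criticality_tag, "management baseline")

portfolio = {"intranet": "low", "erp": "mission-critical", "sql-platform": "shared-platform"}
for workload, tag in portfolio.items():
    print(f"{workload}: {management_tier(tag)}")
```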

Advanced operations options


There are three suggested paths for improving business commitments beyond the management baseline, as
shown in the following diagram:
Enhanced management baseline
As outlined in the Azure Management Guide, an enhanced management baseline uses cloud-native tools to
improve uptime and decrease recovery times. The improvements are significant, but less so than with workload or
platform specialization. The advantage of an enhanced management baseline is the equally significant reduction in
cost and implementation time.

Management specialization
Aspects of workload and platform operations might require changes to design and architecture principles. Those
changes could take time and might result in increased operating expenses. To reduce the number of workloads
requiring such investments, an enhanced management baseline could provide enough of an improvement to the
business commitment.
For workloads that warrant a higher investment to meet a business commitment, specialization of operations is
key.

Areas of management specialization


There are two areas of specialization:
Platform specialization: Invest in ongoing operations of a shared platform, distributing the investment across
multiple workloads.
Workload specialization: Invest in ongoing operations of a specific workload, generally reserved for mission-
critical workloads.
Central IT or cloud center of excellence (CCoE)
Decisions between platform specialization and workload specialization are based on the criticality and impact of
each workload. However, these decisions are also indicative of larger cultural decisions between Central IT and
CCoE organizational models.
Workload specialization often triggers a cultural change. Traditional IT and Central IT both build processes that can
provide support at scale. Scale support is more achievable for repeatable services found in a management
baseline, enhanced baseline, or even platform operations. Workload specialization doesn't often scale. This lack of
scale makes it difficult for a centralized IT organization to provide necessary support without reaching
organizational scale limitations.
Alternatively, a cloud center of excellence approach scales through purposeful delegation of responsibility and
selective centralization. Workload specialization tends to better align with the delegated responsibility approach of
a CCoE.
The natural alignment of roles in a CCoE is outlined as follows:
The cloud platform team helps build common platforms that support multiple cloud adoption teams.
The cloud automation team extends those platforms into deployable assets in a service catalog.
Cloud management delivers the management baseline centrally and helps support the use of the service
catalog.
But the business unit (in the form of a business DevOps team or cloud adoption team) holds responsibility for
day-to-day operations of the workload, pipeline, or performance.
As for the alignment of areas of management, Central IT and CCoE models can generally deliver on platform
specialization with minimal cultural change. Delivering on workload specialization might be a little more complex
for Central IT teams.

Management specialization processes


Within each specialization, the following four-step process is delivered in a disciplined, iterative approach. This
approach requires partnership among cloud adoption, cloud platform, cloud automation, and cloud management
experts to create a viable and informed feedback loop.
Improve system design: Improve the design of common systems (platforms) or specific workloads to
effectively minimize interruptions.
Automate remediation: Some improvements are not cost-effective. In such cases, it might make more sense
to automate remediation and reduce the impact of interruptions.
Scale the solution: As systems design and automated remediation are improved, you can scale those changes
across the environment through the service catalog.
Continuous improvement: You can use various monitoring tools to discover incremental improvements to
address in the next pass of system design, automation, and scale.
Improve system design
Improving system design is the most effective approach to improving operations of any common platform. System
design improvements can help increase stability and decrease business interruptions. Design of individual systems
is out of scope for the environment view taken throughout the Cloud Adoption Framework. As a complement to
this framework, the Azure Architecture Framework provides best practices for improving the resiliency and design
of a specific system. You can apply those design improvements to the systems design of a platform or a specific
workload.
The Azure Architecture Framework focuses on improvement across five pillars of system design:
Scalability: Scaling the common platform assets to handle increased load.
Availability: Decreasing business interruptions by improving uptime potential.
Resiliency: Improving recovery times to reduce duration of interruptions.
Security: Protecting applications and data from external threats.
Management: Operations processes specific to those common platform assets.
Most business interruptions equate to some form of technical debt, or deficiency in the architecture. For existing
deployments, systems design improvements can be viewed as payments against existing technical debt. For new
deployments, systems design improvements can be viewed as avoidance of technical debt. The next section,
"Automated remediation," looks at ways to handle technical debt that can't or shouldn't be resolved.
To help improve system design, learn more about the Azure Architecture Framework. As your system design
improves, return to this article to find new opportunities to improve and scale the improvements across your
environment.
Automated remediation
Some technical debt can't or shouldn't be resolved. Resolution could be too expensive. It could be
planned but have a long project duration. The business interruption might not have a significant business
impact, or the business priority might be to recover quickly instead of investing in resiliency.
When resolution of technical debt isn't the desired path, automated remediation is commonly the next step.
Using Azure Automation and Azure Monitor to detect trends and trigger remediation is the most common
approach.
For guidance on automated remediation, see Azure Automation and alerts.
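To make the pattern concrete, the following sketch shows only the remediation half: a small script that could run as an Azure Automation runbook or be triggered by an Azure Monitor alert, restarting the virtual machine named in the alert. It assumes the azure-identity and azure-mgmt-compute Python packages and hypothetical resource names; a production runbook would also parse the alert payload and log the action.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.compute import ComputeManagementClient

# Hypothetical identifiers; in practice these arrive in the alert webhook payload.
SUBSCRIPTION_ID = "<subscription-id>"
RESOURCE_GROUP = "rg-workloads"
VM_NAME = "vm-app-01"

def remediate_unresponsive_vm() -> None:
    """Restart the VM flagged by the alert; a common, low-risk first remediation step."""
    client = ComputeManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)
    poller = client.virtual_machines.begin_restart(RESOURCE_GROUP, VM_NAME)
    poller.result()  # block until the restart operation completes
    print(f"Restarted {VM_NAME} in {RESOURCE_GROUP}")

if __name__ == "__main__":
    remediate_unresponsive_vm()
```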
Scale the solution with a service catalog
The cornerstone of platform specialization and platform operations is a well-managed service catalog. This is how
improvements to systems design and remediation are scaled across an environment. The cloud platform team and
cloud automation team align to create repeatable solutions to the most common platforms in any environment.
But, if those solutions aren't consistently applied, cloud management can provide little more than a baseline
offering.
To maximize adoption and minimize maintenance overhead of any optimized platform, the platform should be
added to a service catalog. Each application in the catalog can be deployed for internal consumption via the service
catalog, or as a marketplace offering for external consumers.
For information about publishing to a service catalog, see the series on publishing to a service catalog.
Continuous improvement
Platform specialization and platform operations both depend on strong feedback loops between adoption,
platform, automation, and management teams. Grounding those feedback loops in data empowers each team to
make wise decisions. For platform operations to achieve long-term business commitments, it's important to take
advantage of insights that are specific to the centralized platform. Because containers and SQL Server are the two
most common centrally managed platforms, consider getting started with continuous improvement data collection
by reviewing the following articles:
Container performance
PaaS database performance
IaaS database performance
Cloud adoption can't happen without well-organized people. Successful cloud adoption is the result of properly skilled people
doing the appropriate types of work, in alignment with clearly defined business goals, and in a well-managed environment. To
deliver an effective cloud operating model, it's important to establish appropriately staffed organizational structures. This
article outlines an approach to establishing and maintaining the proper organizational structures in four steps.

Organization alignment exercises


The following exercises will help guide the process of establishing the organizational structures needed to support cloud adoption.

Structure type
Define the type of organizational structure that best fits your operating model.

Cloud capabilities
Understand the cloud capabilities required to adopt and operate the cloud.

Establish teams
Define the teams that will be providing various cloud capabilities. A number of best practice options are listed for reference.

RACI matrix
Clearly defined roles are an important aspect of any operating model. Leverage the provided RACI matrix to map
responsible, accountable, consulted, and informed roles to each of the teams for various functions of the cloud operating
model.

Structure type
The following organizational structures do not necessarily have to map to an organizational chart (org chart). Org charts
generally reflect command and control management structures. Conversely, the following organizational structures are
designed to capture alignment of roles and responsibilities. In an agile, matrix organization, these structures may be best
represented as virtual teams (or v-teams). There is no limitation suggesting that these organizational structures couldn't be
represented in an org chart, but it is not necessary in order to produce an effective operating model.
The first step of managing organizational alignment is to determine how the following organizational structures will be
fulfilled:
Org chart alignment: Management hierarchies, manager responsibilities, and staff assignments will align to organizational
structures.
Virtual teams (v-teams): Management structures and org charts remain unchanged. Instead, virtual teams will be created
and tasked with the required capabilities.
Mixed model: More commonly, a mixture of org chart and v-team alignment will be required to deliver on transformation
goals.

Understand required cloud capabilities


The following is a list of cloud capabilities that are required to succeed at cloud adoption and longer-term operating models.
After you become familiar with the various cloud capabilities, these can be aligned to organizational structures based on
staffing and maturity:
Cloud adoption: Deliver technical solutions.
Cloud strategy: Align technical change to business needs.
Cloud operations: Support and operate adopted solutions.
Cloud center of excellence: Improve quality, speed, and resiliency of adoption.
Cloud governance: Manage risk.
Cloud platform: Operate and mature the platform.
Cloud automation: Accelerate adoption and innovation.

Mature organizational structures


To some degree, each of the above capabilities is delivered in every cloud adoption effort, either explicitly or in accordance with
a defined team structure. As adoption needs grow, so does the need to create balance and structure. To meet those needs,
companies often follow a process of maturing organizational structures.

The article on determining organizational structure maturity provides additional detail regarding each level of maturity.

Align RACI charts


At each level of maturity, accountability for various cloud capabilities shifts to new teams. This shifting of accountability enables
faster migration and innovation cycles by removing and automating barriers to change. To align assignments properly, the
article on RACI alignment shares a RACI chart for each of the referenced organizational structures.
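For teams that prefer to keep this mapping in source control alongside other operating-model artifacts, a RACI chart is just a function-by-team matrix. The following sketch uses hypothetical assignments purely for illustration; use the referenced RACI article for the recommended mappings:

```python
# R = responsible, A = accountable, C = consulted, I = informed (hypothetical assignments).
RACI = {
    "Solution delivery":       {"cloud adoption": "A", "cloud strategy": "C", "cloud governance": "I"},
    "Governance of the cloud": {"cloud adoption": "I", "cloud strategy": "C", "cloud governance": "A"},
    "Change management":       {"cloud adoption": "R", "cloud strategy": "A", "cloud governance": "C"},
}

def accountable_team(function: str) -> str:
    """Exactly one team should be accountable ('A') for each function."""
    owners = [team for team, role in RACI[function].items() if role == "A"]
    assert len(owners) == 1, f"{function} must have exactly one accountable team"
    return owners[0]

for function in RACI:
    print(f"{function}: accountable -> {accountable_team(function)}")
```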

Next steps
To track organization structure decisions over time, download and modify the RACI spreadsheet template.
Download the RACI spreadsheet template
Cloud strategy capabilities

Successful cloud adoption should align to defined motivations and business outcomes. When those outcomes
impact business functions, it's a good idea to establish a team made up of business leaders from across the
organization. To unite various leaders, we recommend that the executive leadership create a cloud strategy
team. The goal of the cloud strategy team is to produce tangible business results that are enabled by cloud
technologies. This team ensures that cloud adoption efforts progress in alignment with business outcomes.
In the absence of a defined cloud strategy team, someone must still provide the capability to align technical
activities to business outcomes. That same person or group should also manage change across the project. This
section defines this capability in more detail.

Possible sources for this capability


This capability is commonly provided by the following types of roles. When a cloud strategy team is defined, it
should include many of the following roles:
Finance
Line of business
Human resources
Operations
Enterprise architecture
IT infrastructure
Application groups
Project managers (often with agile project management experience)
This capability helps guide critical prioritization and discovery efforts during cloud adoption. It may also trigger
changes in business processes, the execution of operations, customer interactions, or even product
development. If this capability is confined to IT, the success of cloud adoption efforts will be constrained. To
drive true business change, business leaders should be the primary source of this capability. A defined cloud
strategy team provides a means for involving key participants in a structured way.

NOTE
The organization's CEO and CIO often assign the team. Assignments are typically based on empowering this team to drive
change that cuts across various organizations within the enterprise. The cloud strategy team members should be
assigned based on the motivations for cloud adoption, business outcomes, and relevant financial models.

Key responsibilities
The primary focus of the cloud strategy team is to validate and maintain alignment between business priorities and
cloud adoption efforts. Secondarily, the team should focus on change management across the adoption efforts. The
following tasks assist in achieving this capability.
Early planning tasks
Review and provide feedback on business outcomes and financial models.
Aid in establishing clear motivations for cloud adoption that align with corporate objectives.
Define relevant learning metrics that clearly communicate progress toward business outcomes.
Understand business risks introduced by the plan and represent the business's tolerance for risk.
Review and approve the rationalization of the digital estate.
Ongoing monthly tasks
Support the cloud governance capability during risk/tolerance conversations.
Review release plans to understand timelines and business impact of technical change.
Define business change plans associated with planned releases.
Ensure business teams are ready to execute business testing and the business change plan.

Meeting cadence
The tasks listed in the preceding section can be time-consuming in the early planning phases. Here are some
recommendations for the allocation of time for cloud strategy team members:
During early planning efforts, allocate an hour each week to meet with the team. After the adoption plan is
solidified (usually within 4–6 weeks), the time requirements can be reduced.
Throughout the adoption efforts, allocate 1–2 hours each month to review progress and validate continued
priorities.
Additional time is likely required from delegated members of the executive's team on an as-needed basis.
Each member of the cloud strategy team should appoint a delegate who can allocate 5–10 hours per week to
support ongoing prioritization questions and report on any urgent needs.

Next steps
Strategy and planning are important. However, nothing is actionable without cloud adoption capabilities.
Understand the role of this important capability before beginning adoption efforts.
Align cloud adoption capabilities
Cloud adoption capabilities

Cloud adoption capabilities allow for the implementation of technical solutions in the cloud. Like any IT project,
the people delivering the actual work will determine success. The teams providing the necessary cloud adoption
capabilities can be staffed from multiple subject matter experts or implementation partners.

Possible sources for this capability


Cloud adoption teams are the modern-day equivalent of technical implementation teams or project teams.
However, the nature of the cloud may require a more fluid team structure. Some teams focus exclusively on
cloud migration, while other teams focus on innovations that take advantage of cloud technologies. Some teams
include the broad technical expertise required to complete large adoption efforts, like a full datacenter migration.
Other teams have a tighter technical focus and may move between projects to accomplish specific goals. One
example would be a team of data platform specialists who help convert SQL VMs to SQL PaaS instances.
Regardless of the type or number of cloud adoption teams, the cloud adoption capability is provided by subject
matter experts found in IT, business analysis, or implementation partners.
Depending on the desired business outcomes, the skills needed to provide full cloud adoption capabilities could
include:
Infrastructure implementers
DevOps engineers
Application developers
Data scientists
Data or application platform specialists
For optimal collaboration and efficiency, we recommend that cloud adoption teams have an average team size of
six people. These teams should be self-organizing from a technical execution perspective. We highly recommend
that these teams also include project management expertise, with deep experience in agile, scrum, or other
iterative models. This team is most effective when managed using a flat structure.

Key responsibilities
The primary need from any cloud adoption capability is the timely, high-quality implementation of the technical
solutions outlined in the adoption plan, in alignment with governance requirements and business outcomes,
taking advantage of technology, tools, and automation solutions made available to the team.
Early planning tasks:
Execute the rationalization of the digital estate
Review, validate, and advance the prioritized migration backlog
Begin execution of the first workload as a learning opportunity
Ongoing monthly tasks:
Oversee change management processes
Manage the release and sprint backlogs
Build and maintain the adoption landing zone in conjunction with governance requirements
Execute the technical tasks outlined in the sprint backlogs
Team cadence
We recommend that teams providing cloud adoption capability be dedicated to the effort full-time.
It's best if these teams meet daily in a self-organizing way. The goal of daily meetings is to quickly update the
backlog, and to communicate what has been completed, what is to be done today, and what things are blocked,
requiring additional external support.
Release schedules and iteration durations are unique to each company. However, a range of one to four weeks
per iteration seems to be the average duration. Regardless of iteration or release cadence, we recommend that
the team meet with all supporting teams at the end of each release to communicate the outcome of the release and
to reprioritize upcoming efforts. Likewise, it's valuable to meet as a team with the cloud center of excellence or
cloud governance team at the end of each sprint, to stay aligned on common efforts and any needs for support.
Some of the technical tasks associated with cloud adoption can become repetitive. Team members should rotate
every 3–6 months to avoid employee satisfaction issues and maintain relevant skills. A rotating seat on cloud
center of excellence or cloud governance team can provide an excellent opportunity to keep employees fresh and
harness new innovations.

Next steps
Adoption is great, but ungoverned adoption can produce unexpected results. Aligning cloud governance
capabilities accelerates adoption and best practices, while reducing business and technical risks.
Align cloud governance capabilities
Cloud governance capabilities

Any kind of change generates new risks. Cloud governance capabilities ensure that risks and risk tolerance are
properly evaluated and managed. This capability ensures the proper identification of risks that can't be tolerated
by the business. The people providing this capability can then convert risks into governing corporate policies.
Governing policies are then enforced through defined disciplines, executed by the staff members who provide
cloud governance capabilities.

Possible sources for this capability


Depending on the desired business outcomes, the skills needed to provide full cloud governance capabilities
could be provided by:
IT governance
Enterprise architecture
Security
IT operations
IT infrastructure
Networking
Identity
Virtualization
Business continuity and disaster recovery
Application owners within IT
Finance owners
The cloud governance capability identifies risks related to current and future releases. This capability is seen in
the efforts to evaluate risk, understand the potential impacts, and make decisions regarding risk tolerance. In
doing so, plans can quickly be updated to reflect the changing needs of the cloud adoption capability.

Key responsibilities
The primary duty of any cloud governance capability is to balance competing forces of transformation and risk
mitigation. Additionally, cloud governance ensures that cloud adoption is aware of data and asset classification
and architecture guidelines that govern all adoption approaches. The team will also work with the cloud center
of excellence to apply automated approaches to governing cloud environments.
These tasks are usually executed by the cloud governance capability on a monthly basis.
Early planning tasks:
Understand business risks introduced by the plan
Represent the business' tolerance for risk
Aid in the creation of a Governance MVP
Ongoing monthly tasks:
Understand business risks introduced during each release
Represent the business' tolerance for risk
Aid in the incremental improvement of Policy and Compliance requirements
Meeting cadence
Cloud governance capability is usually delivered by a working team. The time commitment from each team
member will represent a large percentage of their daily schedules. Contributions will not be limited to meetings
and feedback cycles.

Additional participants
The following participants will frequently contribute to cloud governance activities:
Leaders from middle management and direct contributors in key roles who have been appointed to
represent the business will help evaluate risk tolerances.
The cloud governance capabilities are delivered by an extension of the cloud strategy capability. Just as the
CIO and business leaders are expected to participate in cloud strategy capabilities, their direct reports are
expected to participate in cloud governance activities.
Business employees from the business unit who work closely with the leadership of the line of business
should be empowered to make decisions regarding corporate and technical risk.
Information Technology (IT) and Information Security (IS) employees who understand the technical aspects
of the cloud transformation may serve in a rotating capacity instead of being a consistent provider of cloud
governance capabilities.

Maturation of cloud governance capability


Some large organizations have existing, dedicated teams that focus on IT governance. These teams specialize in
risk management across the IT portfolio. When those teams exist, the following maturity models can be
accelerated quickly. However, the IT governance team is encouraged to review the cloud governance model to
understand how governance shifts slightly in the cloud. Key articles include Extending corporate policy to the
cloud and the Five Disciplines of Cloud Governance.
No governance: It is common for organizations to move into the cloud with no clear plans for governance.
Before long, concerns around security, cost, scale, and operations begin to trigger conversations about the need
for a governance model and people to staff the processes associated with that model. Starting those
conversations before they become concerns is always a good first step to overcome the antipattern of "no
governance." The section on defining corporate policy can help facilitate those conversations.
Governance blocked: When concerns around security, cost, scale, and operations go unanswered, projects
and business goals tend to get blocked. Lack of proper governance generates fear, uncertainty, and doubt
among stakeholders and engineers. Stop this in its tracks by taking action early. The two governance guides
defined in the Cloud Adoption Framework can help you start small, set limiting policies initially to minimize
uncertainty, and mature governance over time. Choose from the complex enterprise guide or the standard
enterprise guide.
Voluntary governance: There tend to be brave souls in every enterprise, those gallant few who are willing to
jump in and help the team learn from their mistakes. Often this is how governance starts, especially in smaller
companies. These brave souls volunteer time to fix some issues and push cloud adoption teams toward a
consistent, well-managed set of best practices.
The efforts of these individuals are much better than "no governance" or "governance blocked" scenarios. While
their efforts should be commended, this approach should not be confused with governance. Proper governance
requires more than sporadic support to drive consistency, which is the goal of any good governance approach.
The guidance in the Five Disciplines of Cloud Governance can help develop this discipline.
Cloud custodian: This moniker has become a badge of honor for many cloud architects who specialize in early
stage governance. When governance practices first start out, the results appear similar to those of governance
volunteers. However, there is one fundamental difference. A cloud custodian has a plan in mind. At this stage of
maturity, the team is spending time cleaning up the messes made by the cloud architects who came before
them. However, the cloud custodian aligns that effort to well-structured corporate policy. They also use
governance automation tools, like those outlined in the governance MVP.
Another fundamental difference between a cloud custodian and a governance volunteer is leadership support.
The volunteer puts in extra hours above regular expectations because of their quest to learn and do. The cloud
custodian gets support from leadership to reduce their daily duties to ensure regular allocations of time can be
invested in improving cloud governance.
Cloud guardian: As governance practices solidify and become accepted by cloud adoption teams, the role of
cloud architects who specialize in governance changes a bit, as does the role of the cloud governance team.
Generally, the more mature practices gain the attention of other subject matter experts who can help
strengthen the protections provided by governance implementations.
While the difference is subtle, it is an important distinction when building a governance-focused IT culture. A
cloud custodian cleans up the messes made by innovative cloud architects, and the two roles have natural
friction and opposing objectives. A cloud guardian helps keep the cloud safe, so other cloud architects can move
more quickly with fewer messes.
Cloud guardians begin using more advanced governance approaches to accelerate platform deployment and
help teams self-service their environmental needs, so they can move faster. Examples of these more advanced
functions are seen in the incremental improvements to the governance MVP, such as improvement of the
security baseline.
Cloud accelerators: Cloud guardians and cloud custodians naturally harvest scripts and automations that
accelerate the deployment of environments, platforms, or even components of various applications. Curating
and sharing these scripts in addition to centralized governance responsibilities develops a high degree of
respect for these architects throughout IT.
Those governance practitioners who openly share their curated scripts help deliver technology projects faster
and embed governance into the architecture of the workloads. This workload influence and support of good
design patterns elevate cloud accelerators to a higher rank of governance specialist.
Global governance: When organizations depend on globally dispersed IT needs, there can be significant
deviations in operations and governance in various geographies. Business unit demands and even local data
sovereignty requirements can cause governance best practices to interfere with required operations. In these
scenarios, a tiered governance model allows for minimally viable consistency and localized governance. The
article on multiple layers of governance provides more insights on reaching this level of maturity.
Every company is unique, and so are their governance needs. Choose the level of maturity that fits your
organization and use the Cloud Adoption Framework to guide the practices, processes, and tooling to help you
get there.

Next steps
As cloud governance matures, teams will be empowered to adopt the cloud at ever faster paces. Continued
cloud adoption efforts tend to trigger maturity in IT operations. This maturation may also necessitate the
development of cloud operations capabilities.
Develop cloud operations capabilities
Central IT capabilities

As cloud adoption scales, cloud governance capabilities alone may not be sufficient to govern adoption efforts.
When adoption is gradual, teams tend to organically develop the skills and processes needed to be ready for the
cloud over time.
However, when one cloud adoption team uses the cloud to achieve a high-profile business outcome, gradual
adoption is seldom the case. Success follows success. This is also true for cloud adoption, but it happens at cloud
scale. When cloud adoption expands from one team to multiple teams relatively quickly, additional support from
existing IT staff is needed. However, those staff members may lack the training and experience required to support
the cloud using cloud-native IT tools. This often drives the formation of a central IT team governing the cloud.
Caution

While this is a common maturity step, it can present a high risk to adoption, potentially blocking innovation and
migration efforts if not managed effectively. See the risk section below to learn how to mitigate the risk of
centralization becoming a cultural antipattern.

Possible sources for central IT expertise


The skills needed to provide centralized IT capabilities could be provided by:
An existing Central IT team
Enterprise architects
IT operations
IT governance
IT infrastructure
Networking
Identity
Virtualization
Business continuity and disaster recovery
Application owners within IT

WARNING
Central IT should only be applied in the cloud when existing delivery on-premises is based on a Central IT model. If the
current on-premises model is based on delegated control, consider a cloud center of excellence (CCoE) approach for a more
compatible alternative.

Key responsibilities
Adapt existing IT practices to ensure adoption efforts result in well-governed, well-managed environments in the
cloud.
The following tasks are typically executed regularly:
Strategic tasks
Review:
business outcomes
financial models
motivations for cloud adoption
business risks
rationalization of the digital estate
Monitor adoption plans and progress against the prioritized migration backlog.
Identify and prioritize platform changes that are required to support the migration backlog.
Act as an intermediary or translation layer between cloud adoption needs and existing IT teams.
Leverage existing IT teams to accelerate platform capabilities and enable adoption.
Technical tasks
Build and maintain the cloud platform to support solutions.
Define and implement the platform architecture.
Operate and manage the cloud platform.
Continuously improve the platform.
Keep up with new innovations in the cloud platform.
Deliver new cloud capabilities to support business value creation.
Suggest self-service solutions.
Ensure that solutions meet existing governance and compliance requirements.
Create and validate deployment of platform architecture.
Review release plans for sources of new platform requirements.

Meeting cadence
Central IT expertise usually comes from a working team. Expect participants to commit much of their daily
schedules to alignment efforts. Contributions aren't limited to meetings and feedback cycles.

Central IT risks
Each of the cloud capabilities and phases of organizational maturity are prefixed with the word "cloud". Central IT
is the only exception. Central IT became prevalent when all IT assets could be housed in a few locations, managed by
a small number of teams, and controlled through a single operations management platform. Global business
practices and the digital economy have largely reduced the instances of those centrally managed environments.
In the modern view of IT, assets are globally distributed. Responsibilities are delegated. Operations management is
delivered by a mixture of internal staff, managed service providers, and cloud providers. In the digital economy, IT
management practices are transitioning to a model of self-service and delegated control with clear guardrails to
enforce governance. Central IT can be a valuable contributor to cloud adoption by becoming a cloud broker and a
partner for innovation and business agility.
Central IT as a function is well positioned to take valuable knowledge and practices from existing on-premises
models and apply those practices to cloud delivery. However, this process will require change. New processes, new
skills, and new tools are required to support cloud adoption at scale. When Central IT adapts, it becomes an
important partner in cloud adoption efforts. However, if Central IT doesn't adapt to the cloud, or attempts to use
the cloud as a catalyst for tight-grained controls, Central IT quickly becomes a blocker to adoption, innovation, and
migration.
The measures of this risk are speed and flexibility. The cloud simplifies adopting new technologies quickly. When
new cloud capabilities can be deployed within minutes, but Central IT reviews add weeks or months to the
deployment process, then these centralized processes become a major impediment to business success. When this
indicator is encountered, consider alternative strategies to IT delivery.
Exceptions
Many industries require rigid adherence to third-party compliance. Some compliance requirements still demand
centralized IT control. Delivering on these compliance measures can add time to deployment processes, especially
for new technologies that haven't been used broadly. In these scenarios, expect delays in deployment during the
early stages of adoption. Similar situations may exist for companies that deal with sensitive customer data, but may
not be governed by a third-party compliance requirement.
Operate within the exceptions
When centralized IT processes are required and those processes create appropriate checkpoints in adoption of
new technologies, these innovation checkpoints can still be addressed quickly. Governance and compliance
requirements are designed to protect those things that are sensitive, not to protect everything. The cloud provides
simple mechanisms for acquiring and deploying isolated resources while maintaining proper guardrails.
A mature Central IT team maintains necessary protections but negotiates practices that still enable innovation.
Demonstrating this level of maturity depends on proper classification and isolation of resources.
Example narrative of operating within exceptions to empower adoption
This example narrative illustrates the approach taken by a mature Central IT team to empower adoption.
Contoso, LLC has adopted a Central IT model for the support of the business's cloud resources. To deliver this
model, they have implemented tight controls for various shared services such as ingress network connections. This
wise move reduced the exposure of their cloud environment and provided a single "break-glass" device to block all
traffic in case of a breach. Their security baseline policies state that all ingress traffic must come through a shared
device managed by the Central IT team.
However, one of their cloud adoption teams now requires an environment with a dedicated and specially
configured ingress network connection to use a specific cloud technology. An immature Central IT team would
simply refuse the request and prioritize its existing processes over adoption needs. Contoso's Central IT team is
different. They quickly identified a simple four-part solution to this dilemma: Classification, Negotiation, Isolation,
and Automation.
Classification: Since the cloud adoption team was in the early stages of building a new solution and didn't have
any sensitive data or mission-critical support needs, the assets in the environment were classified as low risk and
noncritical. Effective classification is a sign of maturity in Central IT. Classifying all assets and environments allows
for clearer policies.
Negotiation: Classification alone isn't sufficient. Shared services were implemented to consistently operate
sensitive and mission-critical assets. Changing the rules would compromise governance and compliance policies
designed for the assets that need more protection. Empowering adoption can't happen at the cost of stability,
security, or governance. This led to a negotiation with the adoption team to answer specific questions. Would a
business-led DevOps team be able to provide operations management for this environment? Would this solution
require direct access to other internal resources? If the cloud adoption team is comfortable with those tradeoffs,
then the ingress traffic might be possible.
Isolation: Since the business can provide its own ongoing operations management, and since the solution doesn't
rely on direct traffic to other internal assets, it can be cordoned off in a new subscription. That subscription is also
added to a separate node of the new management group hierarchy.
Automation: Another sign of maturity in this team is their automation principles. The team uses Azure Policy to
automate policy enforcement. They also use Azure Blueprints to automate deployment of common platform
components and enforce adherence to the defined identity baseline. For this subscription and any others in the
new management group, the policies and templates are slightly different. Policies blocking ingress bandwidth have
been lifted. They have been replaced by requirements to route traffic through the shared services subscription, like
any ingress traffic, to enforce traffic isolation. Since the on-premises operations management tooling can't access
this subscription, agents for that tool are no longer required either. All other governance guardrails required by
other subscriptions in the management group hierarchy are still enforced, ensuring sufficient guardrails.
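As an illustration of the kind of guardrail described above (not Contoso's actual policy), the following sketch registers an Azure Policy definition that denies standalone public IP addresses, forcing ingress traffic onto the shared-services path. It assumes the azure-identity and azure-mgmt-resource Python packages; the rule, names, and subscription scope are hypothetical, import paths and model names can vary between SDK versions, and in practice the definition would be assigned at the new management group.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.resource import PolicyClient
from azure.mgmt.resource.policy.models import PolicyDefinition

SUBSCRIPTION_ID = "<subscription-id>"  # hypothetical placeholder

# Deny creation of standalone public IP addresses so ingress must flow
# through the shared-services path (illustrative rule only).
deny_public_ips = PolicyDefinition(
    display_name="Deny public IP addresses outside shared services",
    mode="All",
    policy_rule={
        "if": {"field": "type", "equals": "Microsoft.Network/publicIPAddresses"},
        "then": {"effect": "deny"},
    },
)

client = PolicyClient(DefaultAzureCredential(), SUBSCRIPTION_ID)
client.policy_definitions.create_or_update("deny-public-ip", deny_public_ips)
print("Policy definition registered; assign it at the isolated subscription's management group.")
```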
The mature, creative approach of Contoso's Central IT team provided a solution that didn't compromise
governance or compliance, but still encouraged adoption. This approach of brokering rather than owning
cloud-native approaches to centralized IT is the first step toward building a true cloud center of excellence (CCoE).
Adopting this approach to quickly evolve existing policies will allow for centralized control when required and
governance guardrails when more flexibility is acceptable. Balancing these two considerations mitigates the risks
associated with Central IT in the cloud.

Next steps
As Central IT matures in the cloud, the next maturity step is typically looser coupling of cloud operations. The
availability of cloud-native operations management tooling and lower operating costs for PaaS-first solutions
often lead to business teams (or more specifically, DevOps teams within the business) assuming responsibility for
cloud operations.
Cloud operations capability
Cloud operation capabilities

Business transformation may be enabled by cloud adoption. However, returns are only realized when the
workloads deployed to the cloud are operating in alignment with performance expectations. As additional
workloads adopt cloud technologies, additional operations capacity will be required.
Traditional IT operations were required to focus on maintaining current-state operations for a wide variety of
low-level technical assets. Things like storage, CPU, memory, network equipment, servers, and virtual machine
hosts require continuous maintenance to sustain peak operations. Capital budgets often include large expenses
related to annual or periodic updates to these low-level assets.
Human capital within operations would also focus heavily on the monitoring, repair, and remediation of issues
related to these assets. In the cloud, many of these capital costs and operations activities are transferred to the
cloud provider. This provides an opportunity for IT operations to improve and provide significant additional
value.

Possible sources for this capability


The skills needed to provide cloud operations capabilities could be provided by:
IT operations
Outsourced IT operations vendors
Cloud service providers
Cloud-managed service providers
Application-specific operations teams
Business application operations teams
DevOps teams

Key responsibilities
The duty of the people providing cloud operations capability is to deliver maximum workload performance and
minimum business interruptions within agreed-upon operations budgets.
Strategic tasks
Review business outcomes, financial models, motivations for cloud adoption, business risks, and
rationalization of the digital estate.
Determine workload criticality and the impact of disruptions or performance degradation.
Establish business-approved cost and performance commitments.
Monitor and operate cloud workloads.
Technical tasks
Maintain asset and workload inventory.
Monitor performance of workloads.
Maintain operational compliance.
Protect workloads and associated assets.
Recover assets in the case of performance degradation or business interruption.
Mature capabilities of core platforms.
Continuously improve workload performance.
Improve budgetary and design requirements of workloads to fit commitments to the business.

Meeting cadence
Those performing cloud operations capabilities should be involved in release planning and cloud center of
excellence planning to provide feedback and prepare for operational requirements.

Next steps
As adoption and operations scale, it's important to define and automate governance best practices that extend
existing IT requirements. Forming a cloud center of excellence is an important step to scaling cloud adoption,
cloud operations, and cloud governance efforts.
Establish a cloud center of excellence
Cloud center of excellence

Business and technical agility are core objectives of most IT organizations. A cloud center of excellence (CCoE)
is a function that creates a balance between speed and stability.

Function structure
A CCoE model requires collaboration between each of the following capabilities:
Cloud adoption (specifically solution architects)
Cloud strategy (specifically the program and project managers)
Cloud governance
Cloud platform
Cloud automation

Impact and cultural change


When this function is properly structured and supported, the participants can accelerate innovation and
migration efforts while reducing the overall cost of change and increasing business agility. When successfully
implemented, this function can produce noticeable reductions in time-to-market. As team practices mature,
quality indicators will improve, including reliability, performance efficiency, security, maintainability, and
customer satisfaction. These gains in efficiency, agility, and quality are especially vital if the company plans on
implementing large-scale cloud migration efforts or has a desire to use the cloud to drive innovations
associated with market differentiation.
When successful, a CCoE model will create a significant cultural shift in IT. The fundamental premise of a
CCoE approach is that IT serves as a broker, partner, or representative to the business. This model is a
paradigm shift away from the traditional view of IT as an operations unit or abstraction layer between the
business and IT assets.
The following image provides an analogy for this cultural change. Without a CCoE approach, IT tends to focus
on providing control and central responsibility, acting like the stoplights at an intersection. When the CCoE is
successful, the focus is on freedom and delegated responsibility, which is more like a roundabout at an
intersection.
Neither of the approaches illustrated in the analogy image above is right or wrong; they're just alternative
views of responsibility and management. If the desire is to establish a self-service model that allows business
units to make their own decisions while adhering to a set of guidelines and established, repeatable controls,
then a CCoE model could fit within the technology strategy.

Key responsibilities
The primary duty of the CCoE team is to accelerate cloud adoption through cloud-native or hybrid solutions.
The objective of the CCoE is to:
Help build a modern IT organization through agile approaches to capture and implement business
requirements.
Use reusable deployment packages that align with security, compliance, and service management policies.
Maintain a functional Azure platform in alignment with operational procedures.
Review and approve the use of cloud-native tools.
Over time, standardize and automate commonly needed platform components and solutions.

Meeting cadence
The CCoE is a function staffed by four high-demand teams. It is important to allow for organic collaboration
and to track growth through a common repository/solution catalog. Maximize natural interactions, but minimize
meetings. When this function matures, the teams should try to limit dedicated meetings. Attendance at
recurring meetings, like release meetings hosted by the cloud adoption team, will provide data inputs. In
parallel, a meeting after each release plan is shared can provide a minimum touch point for this team.

Solutions and controls


Each member of the CCoE is tasked with understanding the necessary constraints, risks, and protections that
led to the current set of IT controls. The collective efforts of the CCoE should turn that understanding into
cloud-native (or hybrid) solutions or controls, which enable the desired self-service business outcomes. As
solutions are created, they are shared with various teams in the form of controls or automations that serve as
guardrails for various efforts. Those guardrails help to route the free-flowing activities of various teams, while
delegating responsibilities to the participants in various migration or innovation efforts.
Examples of this transition:
| Scenario | Pre-CCoE solution | Post-CCoE solution |
| --- | --- | --- |
| Provision a production SQL Server | Network, IT, and data platform teams provision various components over the course of days or even weeks. | The team requiring the server deploys a PaaS instance of Azure SQL Database. Alternatively, a preapproved template could be used to deploy all of the IaaS assets to the cloud in hours. |
| Provision a development environment | Network, IT, Development, and DevOps teams agree to specs and deploy an environment. | The development team defines their own specs and deploys an environment based on allocated budget. |
| Update security requirements to improve data protection | Networking, IT, and security teams update various networking devices and VMs across multiple environments to add protections. | Cloud governance tools are used to update policies that can be applied immediately to all assets in all cloud environments. |
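
Guardrails such as those in the post-CCoE column are usually expressed through governance tooling (for example, policy definitions) rather than custom code. Purely as an illustration of the concept, the following Python sketch shows a self-service deployment request being checked against two hypothetical guardrail rules; the rule values and request fields are invented for this example.

```python
"""Illustrative sketch of a self-service guardrail check.
The rules and the deployment-request fields are hypothetical examples;
real implementations typically use Azure Policy or similar governance tooling."""

ALLOWED_REGIONS = {"westeurope", "northeurope"}
REQUIRED_TAGS = {"owner", "cost-center", "data-classification"}

def check_deployment(request: dict) -> list:
    """Return the guardrail violations for a proposed deployment request."""
    violations = []
    if request.get("region") not in ALLOWED_REGIONS:
        violations.append(f"region '{request.get('region')}' is not approved")
    missing = REQUIRED_TAGS - set(request.get("tags", {}))
    if missing:
        violations.append(f"missing required tags: {sorted(missing)}")
    return violations

if __name__ == "__main__":
    request = {"region": "eastus", "tags": {"owner": "team-a"}}
    for violation in check_deployment(request):
        print("Blocked:", violation)
```

Checks like this let business units deploy on their own schedule while the CCoE keeps a consistent, automated line of defense.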

Negotiations
At the root of any CCoE effort is an ongoing negotiation process. The CCoE team negotiates with existing IT
functions to reduce central control. The trade-offs for the business in this negotiation are freedom, agility, and
speed. The value of the trade-off for existing IT teams is delivered as new solutions. The new solutions provide
the existing IT team with one or more of the following benefits:
Ability to automate common issues.
Improvements in consistency (reduction in day-to-day frustrations).
Opportunity to learn and deploy new technical solutions.
Reductions in high severity incidents (fewer quick fixes or late-night pager-duty responses).
Ability to broaden their technical scope, addressing broader topics.
Participation in higher-level business solutions, addressing the impact of technology.
Reduction in menial maintenance tasks.
Increase in technology strategy and automation.
In exchange for these benefits, the existing IT function may be trading the following values, whether real or
perceived:
Sense of control from manual approval processes.
Sense of stability from change control.
Sense of job security from completion of necessary yet repetitive tasks.
Sense of consistency that comes from adherence to existing IT solution vendors.
In healthy cloud-forward companies, this negotiation process is a dynamic conversation between peers and
partnering IT teams. The technical details may be complex, but are manageable when IT understands the
objective and is supportive of the CCoE efforts. When IT is less than supportive, the following section on
enabling CCoE success can help overcome cultural blockers.

Enable CCoE success


Before proceeding with this model, it is important to validate the company's tolerance for a growth mindset
and IT's comfort with releasing central responsibilities. As mentioned above, the purpose of a CCoE is to
exchange control for agility and speed.
This type of change takes time, experimentation, and negotiation. There will be bumps and setbacks during
this maturation process. However, if the team stays diligent and isn't discouraged from experimentation, there
is a high probability of success in improving agility, speed, and reliability. One of the biggest factors in success
or failure of a CCoE is support from leadership and key stakeholders.
Key stakeholders
IT Leadership is the first and most obvious stakeholder. IT managers will play an important part. However, the
support of the CIO and other executive-level IT leaders is needed during this process.
Less obvious is the need for business stakeholders. Business agility and time-to-market are key motivations
for CCoE formation. As such, the key stakeholders should have a vested interest in these areas. Examples of
business stakeholders include line-of-business leaders, finance executives, operations executives, and business
product owners.
Business stakeholder support
CCoE efforts can be accelerated with support from the business stakeholders. Much of the focus of CCoE
efforts is centered around making long-term improvements to business agility and speed. Defining the impact
of current operating models and the value of improvements is valuable as a guide and negotiation tool for the
CCoE. Documenting the following items is suggested for CCoE support:
Establish a set of business outcomes and goals that are expected as a result of business agility and speed.
Clearly define pain points created by current IT processes (such as speed, agility, stability, and cost
challenges).
Clearly define the historical impact of those pain points (such as lost market share, competitor gains in
features and functions, poor customer experiences, and budget increases).
Define business improvement opportunities that are blocked by the current pain points and operating
models.
Establish timelines and metrics related to those opportunities.
These data points are not an attack on IT. Instead, they help CCoE learn from the past and establish a realistic
backlog and plan for improvement.
Ongoing support and engagement: CCoE teams can demonstrate quick returns in some areas. However,
the higher-level goals, like business agility and time-to-market, can take much longer. During maturation,
there is a high risk of the CCoE becoming discouraged or being pulled off to focus on other IT efforts.
During the first six to nine months of CCoE efforts, we recommend that business stakeholders allocate time to
meet monthly with the IT leadership and the CCoE. There is little need for formal ceremony to these meetings.
Simply reminding the CCoE members and their leadership of the importance of this program can go a long
way toward driving CCoE success.
Additionally, we recommend that the business stakeholders stay informed of the progress and blockers
experienced by the CCoE team. Many of their efforts will seem like technical minutiae. However, it is
important for business stakeholders to understand the progress of the plan, so they can engage when the
team loses steam or becomes distracted by other priorities.
IT stakeholder support
Support the vision: A successful CCoE effort requires a great deal of negotiation with existing IT team
members. When done well, all of IT contributes to the solution and feels comfortable with the change. When
this is not the case, some members of the existing IT team may want to hold on to control mechanisms for
various reasons. Support of IT stakeholders will be vital to the success of the CCoE when those situations
occur. Encouragement and reinforcement of the overall goals of the CCoE are important to resolve blockers to
proper negotiation. On rare occasions, IT stakeholders may even need to step in and break up a deadlock or
tied vote to keep the CCoE progressing.
Maintain focus: A CCoE can be a significant commitment for any resource-constrained IT team. Removing
strong architects from short-term projects to focus on long-term gains can create difficulty for team members
who aren't part of the CCoE. It is important that IT leadership and IT stakeholders stay focused on the goal of
the CCoE. The support of IT leaders and IT stakeholders is required to deprioritize the disruptions of day-to-
day operations in favor of CCoE duties.
Create a buffer: The CCoE team will experiment with new approaches. Some of those approaches won't
align well with existing operations or technical constraints. There is a real risk of the CCoE experiencing
pressure or recourse from other teams when experiments fail. Encouraging the team and buffering it from
the consequences of "Fast Fail" learning opportunities is important. It's equally important to hold the team
accountable to a growth mindset, ensuring that they are learning from those experiments and finding better
solutions.

Next steps
A CCoE model requires both cloud platform capabilities and cloud automation capabilities. The next step is
to align cloud platform capabilities.
Align cloud platform capabilities
Cloud platform capabilities

The cloud introduces many technical changes as well as opportunities to streamline technical solutions. However,
general IT principles and business needs stay the same. You still need to protect sensitive business data. If your IT
platform depends on a local area network, there's a good chance that you'll need network definitions in the cloud.
Users who need to access applications and data will want their current identities to access relevant cloud
resources.
While the cloud presents the opportunity to learn new skills, your current architects should be able to directly
apply their experiences and subject matter expertise. Cloud platform capabilities are usually provided by a select
group of architects who focus on learning about the cloud platform. These architects then aid others in decision
making and the proper application of controls to cloud environments.

Possible sources for cloud platform expertise


The skills needed to provide full platform capabilities could be provided by:
Enterprise architecture
IT operations
IT governance
IT infrastructure
Networking
Identity
Virtualization
Business continuity and disaster recovery
Application owners within IT

Key responsibilities
Cloud platform duties center around the creation and support of your cloud platform or landing zones.
The following tasks are typically executed on a regular basis:
Strategic tasks
Review:
business outcomes
financial models
motivations for cloud adoption
business risks
rationalization of the digital estate
Monitor adoption plans and progress against the prioritized migration backlog.
Identify and prioritize platform changes that are required to support the migration backlog.
Technical tasks
Build and maintain the cloud platform to support solutions.
Define and implement the platform architecture.
Operate and manage the cloud platform.
Continuously improve the platform.
Keep up with new innovations in the cloud platform.
Bring new cloud capabilities to support business value creation.
Suggest self-service solutions.
Ensure solutions meet existing governance/compliance requirements.
Create and validate deployment of platform architecture.
Review release plans for sources of new platform requirements.

Meeting cadence
Cloud platform expertise usually comes from a working team. Expect participants to commit a large portion of
their daily schedules to cloud platform work. Contributions aren't limited to meetings and feedback cycles.

Next steps
As your cloud platform becomes better defined, aligning cloud automation capabilities can accelerate adoption. It
can also help establish best practices while reducing business and technical risks.
Align cloud automation capabilities
Cloud automation capabilities

During cloud adoption efforts, cloud automation capabilities will unlock the potential of DevOps and a cloud-
native approach. Expertise in each of these areas can accelerate adoption and innovation.

Possible sources for cloud automation expertise


The skills needed to provide cloud automation capabilities could be provided by:
DevOps engineers
Developers with DevOps and infrastructure expertise
IT engineers with DevOps and automation expertise
These subject matter experts might be providing capabilities in other areas such as cloud adoption, cloud
governance, or cloud platform. After they demonstrate proficiency at automating complex workloads, you can
recruit these experts to deliver automation value.

Mindset
Before you admit a team member to this group, they should demonstrate three key characteristics:
Expertise in any cloud platform with a special emphasis on DevOps and automation.
A growth mindset or openness to changing the way IT operates today.
A desire to accelerate business change and remove traditional IT roadblocks.

Key responsibilities
The primary duty of cloud automation is to own and advance the solution catalog. The solution catalog is a
collection of prebuilt solutions or automation templates. These solutions can rapidly deploy various platforms as
required to support needed workloads. These solutions are building blocks that accelerate cloud adoption and
reduce the time to market during migration or innovation efforts.
Examples of solutions in the catalog include:
A script to deploy a containerized application
A Resource Manager template to deploy a SQL Server high-availability Always On cluster
Sample code to build a deployment pipeline using Azure DevOps
An Azure DevTest Labs instance of the corporate ERP for development purposes
Automated deployment of a self-service environment commonly requested by business users
The solutions in the solution catalog aren't deployment pipelines for a workload. Instead, you might use
automation scripts in the catalog to quickly create a deployment pipeline. You might also use a solution in the
catalog to quickly provision platform components to support workload tasks like automated deployment, manual
deployment, or migration.
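
One lightweight way to reason about the solution catalog described above is as a versioned index of approved templates. The following Python sketch is a hypothetical illustration, not a Microsoft tool or API; entry fields such as template_path and approved_for are invented for this example.

```python
"""Minimal sketch of a solution catalog: prebuilt, versioned deployment templates
that adoption teams can discover and reuse. Entry fields are hypothetical."""

from dataclasses import dataclass, field

@dataclass
class CatalogEntry:
    name: str
    description: str
    version: str
    template_path: str                      # e.g. a Resource Manager template or script in source control
    approved_for: list = field(default_factory=list)   # environments the entry is cleared for

class SolutionCatalog:
    def __init__(self):
        self._entries = {}

    def publish(self, entry: CatalogEntry):
        """Add or replace an entry; in practice this runs only after review and validation."""
        self._entries[entry.name] = entry

    def find(self, keyword: str):
        """Simple keyword search over names and descriptions."""
        keyword = keyword.lower()
        return [e for e in self._entries.values()
                if keyword in e.name.lower() or keyword in e.description.lower()]

if __name__ == "__main__":
    catalog = SolutionCatalog()
    catalog.publish(CatalogEntry(
        name="sql-ha-cluster",
        description="Template to deploy a SQL Server Always On availability group",
        version="1.2.0",
        template_path="templates/sql-ha/azuredeploy.json",
        approved_for=["dev", "prod"],
    ))
    for entry in catalog.find("sql"):
        print(entry.name, entry.version, entry.template_path)
```

Treating the catalog as structured, versioned data makes it easier to prioritize the backlog of new solutions and to retire entries that no longer meet governance requirements.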
The following tasks are typically executed by cloud automation on a regular basis:
Strategic tasks
Review:
business outcomes
financial models
motivations for cloud adoption
business risks
rationalization of the digital estate
Monitor adoption plans and progress against the prioritized migration backlog.
Identify opportunities to accelerate cloud adoption, reduce effort through automation, and improve security,
stability, and consistency.
Prioritize a backlog of solutions for the solution catalog that delivers the most value given other strategic
inputs.
Technical tasks
Curate or develop solutions based on the prioritized backlog.
Ensure solutions align to platform requirements.
Ensure solutions are consistently applied and meet existing governance/compliance requirements.
Create and validate solutions in the catalog.
Review release plans for sources of new automation opportunities.

Meeting cadence
Cloud automation is a working team. Expect participants to commit a large portion of their daily schedules to
cloud automation work. Contributions aren't limited to meetings and feedback cycles.
The cloud automation team should align activities with other areas of capability. This alignment might result in
meeting fatigue. To ensure cloud automation has sufficient time to manage the solution catalog, you should
review meeting cadences to maximize collaboration and minimize disruptions to development activities.

Next steps
As the essential cloud capabilities align, the collective teams can help develop needed technical skills.
Building technical skills
Establish team structures

Every cloud capability is provided by someone during every cloud adoption effort. These assignments and team
structures can develop organically, or they can be intentionally designed to match a defined team structure.
As adoption needs grow, so does the need for balance and structure. This article provides examples of common
team structures at various stages of organizational maturity. The following graphic and list outline those structures
based on typical maturation stages. Use these examples to find the organizational structure that best aligns with
your operational needs.

Organizational structures tend to move through the common maturity model that's outlined here:
1. Cloud adoption team only
2. MVP best practice
3. Central IT
4. Strategic alignment
5. Operational alignment
6. Cloud center of excellence (CCoE)
Most companies start with little more than a cloud adoption team. However, we recommend that you establish an
organizational structure that more closely resembles the MVP best practice structure.

Cloud adoption team only


The nucleus of all cloud adoption efforts is the cloud adoption team. This team drives the technical changes that
enable adoption. Depending on the objectives of the adoption effort, this team may include a diverse range of
team members who handle a broad set of technical and business tasks.
For small-scale or early-stage adoption efforts, this team might be as small as one person. In larger-scale or late-
stage efforts, it's common to have several cloud adoption teams, each with around six engineers. Regardless of
size or tasks, the consistent aspect of any cloud adoption team is that it provides the means to onboard
solutions into the cloud. For some organizations, this may be a sufficient organizational structure. The cloud
adoption team article provides more insight into the structure, composition, and function of the cloud adoption
team.

WARNING
Operating with only a cloud adoption team (or multiple cloud adoption teams) is considered an antipattern and should be
avoided. At a minimum, consider the MVP best practice.

Best practice: minimum viable product (MVP)


We recommend that you have two teams to create balance across cloud adoption efforts. These two teams are
responsible for various capabilities throughout the adoption effort.
Cloud adoption team: This team is accountable for technical solutions, business alignment, project
management, and operations for solutions that are adopted.
Cloud governance team: To balance the cloud adoption team, a cloud governance team is dedicated to
ensuring excellence in the solutions that are adopted. The cloud governance team is accountable for platform
maturity, platform operations, governance, and automation.

This proven approach is considered an MVP because it may not be sustainable. Each team is wearing many hats,
as outlined in the responsible, accountable, consulted, informed (RACI) charts.
The following sections describe a fully staffed, proven organizational structure along with approaches to aligning
the appropriate structure to your organization.

Central IT
As adoption scales, the cloud governance team may struggle to keep pace with the flow of innovation from
multiple cloud adoption teams. This is especially true in environments which have heavy compliance, operations,
or security requirements. At this stage, it is common for companies to shift cloud responsibilities to an existing
central IT team. If that team is able to reassess tools, processes, and people to better support cloud adoption at
scale, then including the central IT team can add significant value. Bringing in subject matter experts from
operations, automation, security, and administration to modernize Central IT can drive effective operational
innovations.
Unfortunately, the central IT phase can be one of the riskiest phases of organizational maturity. The central IT team
must come to the table with a strong growth mindset. If the team views the cloud as an opportunity to grow and
adapt their capabilities, then it can provide great value throughout the process. However, if the central IT team
views cloud adoption primarily as a threat to their existing model, then the central IT team becomes an obstacle to
the cloud adoption teams and the business objectives they support. Some central IT teams have spent months or
even years attempting to force the cloud into alignment with on-premises approaches, with only negative results.
The cloud doesn't require that everything change within Central IT, but it does require change. If resistance to
change is prevalent within the central IT team, this phase of maturity can quickly become a cultural antipattern.
Cloud adoption plans heavily focused on platform as a service (PaaS ), DevOps, or other solutions that require less
operations support are less likely to see value during this phase of maturity. On the contrary, these types of
solutions are the most likely to be hindered or blocked by attempts to centralize IT. A higher level of maturity, like
a cloud center of excellence (CCoE ), is more likely to yield positive results for those types of transformational
efforts. To understand the differences between Central IT in the cloud and a CCoE, see Cloud center of excellence.

Strategic alignment
As the investment in cloud adoption grows and business values are realized, business stakeholders often become
more engaged. A defined cloud strategy team, as the following image illustrates, aligns those business
stakeholders to maximize the value realized by cloud adoption investments.

When maturity happens organically, as a result of IT-led cloud adoption efforts, strategic alignment is usually
preceded by a governance or central IT team. When cloud adoption efforts are led by the business, the focus on
operating model and organization tends to happen earlier. Whenever possible, business outcomes and the cloud
strategy team should both be defined early in the process.

Operational alignment
Realizing business value from cloud adoption efforts requires stable operations. Operations in the cloud may
require new tools, processes, or skills. When stable IT operations are required to achieve business outcomes, it's
important to add a defined cloud operations team, as shown here.

Cloud operations can be delivered by the existing IT operations roles. But it's not uncommon for cloud operations
to be delegated to other parties outside of IT operations. Managed service providers, DevOps teams, and business
unit IT often assume the responsibilities associated with cloud operations, with support and guardrails provided
by IT operations. This is increasingly common for cloud adoption efforts that focus heavily on DevOps or PaaS
deployments.

Cloud center of excellence


At the highest state of maturity, a cloud center of excellence aligns teams around a cloud-first modern operating
model. This approach provides central IT functions like governance, security, platform, and automation.

The primary difference between this structure and the Central IT structure above is a focus on self-service. The
teams in this structure organize with the intent of delegating control as much as possible. Aligning governance
and compliance practices to cloud-native solutions creates guardrails and protection mechanisms. Unlike the
Central IT model, the cloud-native approach maximizes innovation and minimizes operational overhead. For this
model to be adopted, mutual agreement to modernize IT processes will be required from business and IT
leadership. This model is unlikely to occur organically and often requires executive support.

Next steps
After aligning to a certain stage of organizational structure maturity, you can use RACI charts to align
accountability and responsibility across each team.
Align the appropriate RACI chart
Align responsibilities across teams

Learn to align responsibilities across teams by developing a cross-team matrix that identifies responsible,
accountable, consulted, and informed (RACI) parties. This article provides an example RACI matrix for the
organizational structures described in Establish team structures:
Cloud adoption team only
MVP best practice
Central IT
Strategic alignment
Operational alignment
Cloud center of excellence (CCoE)
To track organizational structure decisions over time, download and modify the RACI spreadsheet template.
The examples in this article specify these RACI constructs:
The one team that is accountable for a function.
The teams that are responsible for the outcomes.
The teams that should be consulted during planning.
The teams that should be informed when work is completed.
The last row of each table (except the first) contains a link to the most-aligned cloud capability for additional
information.
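
These constructs are easy to verify mechanically. The following Python sketch is illustrative only; it encodes an abbreviated RACI matrix (mirroring two columns of the MVP table later in this article) and checks that each function has exactly one accountable team.

```python
"""Sketch: represent a RACI matrix in code and verify that exactly one team is
accountable for each function. The abbreviated matrix mirrors the MVP table below."""

RACI = {
    "Solution delivery": {"Cloud adoption team": "A", "Cloud governance team": "C"},
    "Governance":        {"Cloud adoption team": "C", "Cloud governance team": "A"},
    # ...remaining functions omitted for brevity
}

def validate(raci: dict) -> list:
    """Flag functions that do not have exactly one accountable (A) team."""
    problems = []
    for function, assignments in raci.items():
        accountable = [team for team, role in assignments.items() if role == "A"]
        if len(accountable) != 1:
            problems.append(f"{function}: expected exactly one accountable team, found {accountable}")
    return problems

if __name__ == "__main__":
    print(validate(RACI) or "RACI matrix is consistent")
```

Keeping the matrix in a structured form like this also makes it easy to diff organizational-structure decisions over time, alongside the downloadable spreadsheet template.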

Cloud adoption team only


| Team | Solution delivery | Business alignment | Change management | Solution operations | Governance | Platform maturity | Platform operations | Platform automation |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Cloud adoption team | Accountable | Accountable | Accountable | Accountable | Accountable | Accountable | Accountable | Accountable |

Best practice: minimum viable product (MVP)


| Team | Solution delivery | Business alignment | Change management | Solution operations | Governance | Platform maturity | Platform operations | Platform automation |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Cloud adoption team | Accountable | Accountable | Accountable | Accountable | Consulted | Consulted | Consulted | Informed |
| Cloud governance team | Consulted | Informed | Informed | Informed | Accountable | Accountable | Accountable | Accountable |
| Aligned cloud capability | Cloud adoption | Cloud strategy | Cloud strategy | Cloud operations | CCoE-Cloud governance | CCoE-Cloud platform | CCoE-Cloud platform | CCoE-Cloud automation |

Central IT
| Team | Solution delivery | Business alignment | Change management | Solution operations | Governance | Platform maturity | Platform operations | Platform automation |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Cloud adoption team | Accountable | Accountable | Responsible | Responsible | Informed | Informed | Informed | Informed |
| Cloud governance team | Consulted | Informed | Informed | Informed | Accountable | Consulted | Responsible | Informed |
| Central IT | Consulted | Informed | Accountable | Accountable | Responsible | Accountable | Accountable | Accountable |
| Aligned cloud capability | Cloud adoption | Cloud strategy | Cloud strategy | Cloud operations | Cloud governance | Central IT | Central IT | Central IT |

Strategic alignment
| Team | Solution delivery | Business alignment | Change management | Solution operations | Governance | Platform maturity | Platform operations | Platform automation |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Cloud strategy team | Consulted | Accountable | Accountable | Consulted | Consulted | Informed | Informed | Informed |
| Cloud adoption team | Accountable | Consulted | Responsible | Accountable | Informed | Informed | Informed | Informed |
| CCoE Model RACI | Consulted | Informed | Informed | Informed | Accountable | Accountable | Accountable | Accountable |
| Aligned cloud capability | Cloud adoption | Cloud strategy | Cloud strategy | Cloud operations | CCoE-Cloud governance | CCoE-Cloud platform | CCoE-Cloud platform | CCoE-Cloud automation |

Operational alignment
| Team | Solution delivery | Business alignment | Change management | Solution operations | Governance | Platform maturity | Platform operations | Platform automation |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Cloud strategy team | Consulted | Accountable | Accountable | Consulted | Consulted | Informed | Informed | Informed |
| Cloud adoption team | Accountable | Consulted | Responsible | Consulted | Informed | Informed | Informed | Informed |
| Cloud operations team | Consulted | Consulted | Responsible | Accountable | Consulted | Informed | Accountable | Consulted |
| CCoE Model RACI | Consulted | Informed | Informed | Informed | Accountable | Accountable | Responsible | Accountable |
| Aligned cloud capability | Cloud adoption | Cloud strategy | Cloud strategy | Cloud operations | CCoE-Cloud governance | CCoE-Cloud platform | CCoE-Cloud platform | CCoE-Cloud automation |

Cloud center of excellence (CCoE)


| Team | Solution delivery | Business alignment | Change management | Solution operations | Governance | Platform maturity | Platform operations | Platform automation |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Cloud strategy team | Consulted | Accountable | Accountable | Consulted | Consulted | Informed | Informed | Informed |
| Cloud adoption team | Accountable | Consulted | Responsible | Consulted | Informed | Informed | Informed | Informed |
| Cloud operations team | Consulted | Consulted | Responsible | Accountable | Consulted | Informed | Accountable | Consulted |
| Cloud governance team | Consulted | Informed | Informed | Consulted | Accountable | Consulted | Responsible | Informed |
| Cloud platform team | Consulted | Informed | Informed | Consulted | Consulted | Accountable | Responsible | Responsible |
| Cloud automation team | Consulted | Informed | Informed | Informed | Consulted | Responsible | Responsible | Accountable |
| Aligned cloud capability | Cloud adoption | Cloud strategy | Cloud strategy | Cloud operations | CCoE-Cloud governance | CCoE-Cloud platform | CCoE-Cloud platform | CCoE-Cloud automation |

Next steps
To track decisions about organization structure over time, download and modify the RACI spreadsheet template.
Copy and modify the most closely aligned sample from the RACI matrices in this article.
Download the RACI spreadsheet template
Skills readiness path during the Ready phase of a
migration

During the ready phase of a migration, the objective is to prepare for the journey ahead. This phase is
accomplished in two primary areas: organizational and environmental (technical) readiness. Both may require new
skills for technical and nontechnical contributors. The following information can help your organization build the
necessary skills.

Organizational readiness learning paths


Depending on the motivations and business outcomes that are associated with a cloud-adoption effort, leaders
may need to establish new organizational structures or virtual teams (v-teams) to facilitate various functions. The
following articles can help your organization develop the necessary skills to structure those teams to meet the
desired outcomes:
Initial organization alignment: An overview of alignment and team structures to help meet specific goals.
Break down silos and fiefdoms: Learn about two common organizational antipatterns and ways to guide the
teams to productive collaboration.

Environmental (technical) readiness learning paths


During the ready phase, technical staff have to create a migration landing zone to host, operate, and govern
workloads that they migrate to the cloud. Use the following paths to accelerate development of the necessary
skills:
Create an Azure account: The first step to using Azure is to create an account. Your account holds the Azure
services that you provision and handles your personal settings, like identity, billing, and preferences.
Azure portal: Tour the Azure portal features and services, and customize the portal.
Introduction to Azure: Get started with Azure. Create and configure your first virtual machine in the cloud.
Introduction to security in Azure: Learn the basic concepts to protect your infrastructure and data in the cloud.
Understand what responsibilities are yours and what Azure handles.
Manage resources in Azure: Learn how to work through the Azure CLI and web portal to create, manage, and
control cloud-based resources.
Create a VM: Use the Azure portal to create a virtual machine.
Azure network services: Learn Azure networking basics and how to improve resiliency and reduce latency.
Azure compute options: Review the Azure compute services.
Secure resources with RBAC: Use role-based access control (RBAC) to secure resources.
Azure Storage options: Learn about the benefits of Azure data storage.
During the ready phase, architects have to design solutions that span all Azure environments. The following
resources can prepare them for these tasks:
Foundations for cloud architecture: A PluralSight course to help architect the right foundational solutions.
Microsoft Azure architecture: A PluralSight course to ground architects in Azure architecture.
Designing migrations for Microsoft Azure: A PluralSight course to help architects design a migration solution.

Deeper skills exploration


The following information describes resources for additional learning.
Typical mappings of cloud IT roles
Microsoft and its partners offer options to help all audiences develop their skills for using Microsoft Azure
services.
Microsoft Virtual Academy: Offers training from the people who helped build Azure. From a basic overview to
deep technical training, IT staff learn how to apply Azure to their business.
Microsoft IT Pro Career Center: A free online resource to help map your cloud career path. Learn from
industry experts about your cloud role and the skills you need. Follow a learning curriculum at your own pace
to build the skills that you need to stay relevant.
We recommend that you turn your knowledge of Azure into official recognition with Microsoft Azure certification
training and exams.
Microsoft Learn
Microsoft Learn is a new approach to learning. Readiness for the new responsibilities that come with cloud
adoption doesn't come easily. Microsoft Learn provides a rewarding approach to hands-on learning that helps you
achieve your goals faster. Earn points, reach new levels, and achieve more.
The following are a few examples of role-specific learning paths on Microsoft Learn:
Business users may experience a steep learning curve when they help plan, test, and adopt cloud-based
technology. Microsoft Learn modules focus on adopting cloud models and tools for better managing
business through cloud-based services.
Solution architects can access hundreds of modules and learning paths. The available topics range from
core infrastructure services to advanced data transformation.
Administrators have access to modules that focus on Azure fundamentals, configuring containers, and even
advanced administration in the cloud.
Developers can use Learn resources to help during architecture, governance, and modernization activities.

Learn more
For additional learning paths, browse the Microsoft Learn catalog. Use the Roles filter to align learning paths with
your role.
Build a cost-conscious organization

As outlined in Motivations: Why are we moving to the cloud?, there are many sound reasons for a company to
adopt the cloud. When cost reduction is a primary driver, it's important to create a cost-conscious organization.
Ensuring cost consciousness is not a one-time activity. Like other cloud-adoption topics, it's iterative. The following
diagram outlines this process to focus on three interdependent activities: visibility, accountability, and optimization.
These processes play out at macro and micro levels, which we describe in detail in this article.

Figure 1 - Outline of the cost-conscious organization.

General cost-conscious processes


Visibility: For an organization to be conscious of costs, it needs visibility into those costs. Visibility in a
cost-conscious organization requires consistent reporting for the teams adopting the cloud, finance teams
who manage budgets, and management teams who are responsible for the costs. This visibility is
accomplished by establishing:
The right reporting scope.
Proper resource organization (management groups, resource groups, subscriptions).
Clear tagging strategies (a minimal tag-compliance sketch appears after this list).
Proper access controls (RBAC).
Accountability: Accountability is as important as visibility. Accountability starts with clear budgets for
adoption efforts. Budgets should be well established, clearly communicated, and based on realistic
expectations. Accountability requires an iterative process and a growth mindset to drive the right level of
accountability.
Optimization: Optimization is the action that creates cost reductions. During optimization, resource
allocations are modified to reduce the cost of supporting various workloads. This process requires iteration
and experimentation. Each reduction in cost can also reduce performance. Finding the right balance between cost
control and end-user performance expectations demands input from multiple parties.
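
As a purely illustrative example of the visibility activities above, the following Python sketch reports resources that are missing the tags a cost-conscious organization might require. The inventory, tag names, and resource names are hypothetical; a real report would draw on exported inventory or cost data.

```python
"""Sketch: a simple tag-compliance report that supports cost visibility.
The resource inventory is a hypothetical in-memory list; in practice it would
come from an inventory export or the cloud provider's management APIs."""

REQUIRED_TAGS = {"cost-center", "owner", "environment"}

resources = [
    {"name": "vm-payroll-01", "tags": {"cost-center": "HR-123", "owner": "team-a", "environment": "prod"}},
    {"name": "sql-reporting", "tags": {"owner": "team-b"}},
]

def tag_compliance(inventory):
    """Split inventory into compliant resources and resources missing required tags."""
    compliant, noncompliant = [], {}
    for res in inventory:
        missing = REQUIRED_TAGS - set(res.get("tags", {}))
        if missing:
            noncompliant[res["name"]] = sorted(missing)
        else:
            compliant.append(res["name"])
    return compliant, noncompliant

if __name__ == "__main__":
    ok, bad = tag_compliance(resources)
    print("Compliant:", ok)
    print("Missing tags:", bad)
```

Without this kind of tagging discipline, per-team and per-workload cost reporting quickly degrades, which undermines both accountability and optimization.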
The following sections describe the roles that the cloud strategy team, cloud adoption team, cloud governance
team, and cloud center of excellence (CCoE) play in developing a cost-conscious organization.

Cloud strategy team


Building cost consciousness into cloud-adoption efforts starts at the leadership level. To be effective long term, the
cloud strategy team should include a member of the finance team. If your financial structure holds business
managers accountable for solution costs, they should be invited to join the team as well. In addition to the core
activities that are typically assigned to the cloud strategy team, all members of the cloud strategy team should also
be responsible for:
Visibility: The cloud strategy team and cloud governance team need to know the actual costs of the cloud-
adoption efforts. Given the executive-level view of this team, they should have access to multiple cost
scopes to analyze spending decisions. Typically, an executive needs visibility into the total costs across all
cloud "spend." But as active members of the cloud strategy team, they should also be able to view costs per
business unit or per billing unit to validate showback, chargeback, or other cloud accounting models (a minimal allocation sketch appears after this list).
Accountability: Budgets should be established between the cloud strategy, cloud governance, and cloud
adoption teams based on expected adoption activities. When deviations from budget occur, the cloud
strategy team and the cloud governance team must partner to quickly determine the best course of action
to remediate the deviations.
Optimization: During optimization efforts, the cloud strategy team can represent the investment and
return value of specific workloads. If a workload has strategic value or financial impact on the business, cost-
optimization efforts should be monitored closely. If there's no strategic impact on the organization and no
inherent cost for poor performance of a workload, the cloud strategy team may approve overoptimization.
To drive these decisions, the team must be able to view costs on a per-project scope.
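
To illustrate the showback view described above, the following Python sketch aggregates hypothetical tagged cost records per business unit. The data, tag values, and amounts are invented for this example; actual reporting would come from the organization's cost-management tooling.

```python
"""Sketch: aggregate tagged resource costs per business unit to support a
showback or chargeback view. Cost records are hypothetical example data."""

from collections import defaultdict

cost_records = [
    {"resource": "vm-payroll-01", "business_unit": "HR",      "monthly_cost": 1420.0},
    {"resource": "sql-reporting", "business_unit": "Finance", "monthly_cost": 980.0},
    {"resource": "web-frontend",  "business_unit": "HR",      "monthly_cost": 310.0},
]

def showback(records):
    """Return total monthly cost per business unit."""
    totals = defaultdict(float)
    for record in records:
        totals[record["business_unit"]] += record["monthly_cost"]
    return dict(totals)

if __name__ == "__main__":
    for unit, total in sorted(showback(cost_records).items()):
        print(f"{unit}: {total:,.2f}")
```

A per-business-unit rollup like this is what lets the cloud strategy team validate showback or chargeback models against the scopes they are accountable for.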

Cloud adoption team


The cloud adoption team is at the center of all adoption activities. So, they're the first line of defense against
overspending. This team has an active role in all three phases of cost-consciousness.
Visibility:
Awareness: It's important for the cloud adoption team to have visibility into the cost-saving goals of the
effort. Simply stating that the cloud-adoption effort will help reduce costs is a recipe for failure. Specific
visibility is important. For example, if the goal is to reduce datacenter TCO by 3 percent or annual
operating expenses by 7 percent, disclose those targets early and clearly.
Telemetry: This team needs visibility into the impact of their decisions. During migration or innovation
activities, their decisions have a direct effect on costs and performance. The team needs to balance these
two competing factors. Performance monitoring and cost monitoring that's scoped to the team's active
projects are important to provide the necessary visibility.
Accountability: The cloud adoption team needs to be aware of any preset budgets that are associated with
their adoption efforts. When real costs don't align with the budget, there's an opportunity to create
accountability. Accountability doesn't equate to penalizing the adoption team for exceeding budget, because
budget excess can result from necessary performance decisions. Instead, accountability means educating
the team about the goals and how their decisions affect those goals. Additionally, accountability includes
providing a dialog in which the team can communicate about decisions that led to overspending. If those
decisions are misaligned with the goals of the project, this effort provides a good opportunity to partner
with the cloud strategy team to make better decisions.
Optimization: This effort is a balancing act, as optimization of resources can reduce the performance of the
workloads that they support. Sometimes anticipated or budgeted savings can't be realized for a workload
because the workload doesn't perform adequately with the budgeted resources. In those cases, the cloud
adoption team has to make wise decisions and report changes to the cloud strategy team and the cloud
governance team so that budgets or optimization decisions can be corrected.

Cloud governance team


Generally, the cloud governance team is responsible for cost management across the entire cloud-adoption effort.
As outlined in the cost management discipline topic of the Cloud Adoption Framework's governance methodology,
cost management is the first of the Five Disciplines of Cloud Governance. Those articles outline a series of deeper
responsibilities for the cloud governance team.
This effort focuses on the following activities that are related to the development of a cost-conscious organization:
Visibility: The cloud governance team works as a peer of the cloud strategy team to plan cloud-adoption
budgets. These two teams also work together to regularly review actual expenses. The cloud governance
team is responsible for ensuring consistent, reliable cost reporting and performance telemetry.
Accountability: When budget deviations occur, the cloud strategy team and the cloud governance team
must partner to quickly determine the best course of action to remediate the deviations. Generally, the cloud
governance team will act on those decisions. Sometimes the action may be simple retraining for the affected
cloud adoption team. The cloud governance team can also help optimize deployed assets, change
discounting options, or even implement automated cost-control options like blocking deployment of
unplanned assets (a minimal budget-deviation sketch appears after this list).
Optimization: After assets are migrated to or created in the cloud, you can employ monitoring tools to
assess performance and utilization of those assets. Proper monitoring and performance data can identify
assets that should be optimized. The cloud governance team is responsible for ensuring that the monitoring
and cost-reporting tools are consistently deployed. They can also help the adoption teams identify
opportunities to optimize based on performance and cost telemetry.
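
As a simple illustration of the accountability loop described above, the following Python sketch flags adoption efforts whose actual spend exceeds budget by more than an agreed threshold. The budgets, actuals, and threshold are hypothetical examples; real alerting would typically use native budget and cost-management features.

```python
"""Sketch: flag budget deviations that need a cloud strategy / cloud governance
follow-up. Budgets, actuals, and the threshold are hypothetical examples."""

budgets = {"migration-wave-1": 20000.0, "innovation-app": 8000.0}
actuals = {"migration-wave-1": 23500.0, "innovation-app": 7200.0}
DEVIATION_THRESHOLD = 0.10   # escalate when spend exceeds budget by more than 10%

def deviations(budgets, actuals, threshold):
    """Return the efforts whose actual spend exceeds budget by more than the threshold."""
    flagged = {}
    for effort, budget in budgets.items():
        actual = actuals.get(effort, 0.0)
        overrun = (actual - budget) / budget
        if overrun > threshold:
            flagged[effort] = round(overrun, 3)
    return flagged

if __name__ == "__main__":
    print(deviations(budgets, actuals, DEVIATION_THRESHOLD))
```

Flagging deviations early gives the cloud strategy and cloud governance teams the data they need to decide between retraining, optimization, or a budget correction.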

Cloud center of excellence


While not typically responsible for cost management, the CCoE can have a significant impact on cost-conscious
organizations. Many foundational IT decisions affect costs at scale. When the CCoE does their part, costs can be
reduced for multiple cloud-adoption efforts.
Visibility: Any management group or resource group that houses core IT assets should be visible to the
CCoE team. The team can use this data to identify opportunities to optimize.
Accountability: While not typically accountable for cost, the CCoE can hold itself accountable for creating
repeatable solutions that minimize cost and maximize performance.
Optimization: Given the CCoE's visibility to multiple deployments, the team is in an ideal position to
suggest optimization tips and to help adoption teams better tune assets.

Next steps
Practicing these responsibilities at each level of the business helps drive a cost-conscious organization. To begin
acting on this guidance, review the organizational readiness introduction to help identify the right team structures.
Identify the right team structures
Organizational antipatterns: Silos and fiefdoms

Success in any major change to business practices, culture, or technology operations requires a growth mindset.
At the heart of the growth mindset is an acceptance of change and the ability to lead in spite of ambiguity.
Some antipatterns can block a growth mindset in organizations that want to grow and transform, including
micromanagement, biased thinking, and exclusionary practices. Many of these blockers are personal challenges
that create personal growth opportunities for everyone. But two common antipatterns in IT require more than
individual growth or maturity: silos and fiefdoms.

These antipatterns are a result of organic changes within various teams, which result in unhealthy organizational
behaviors. To address the resistance caused by each antipattern, it's important to understand the root cause of this
formation.

Healthy, organic IT teams


It's natural to create a division of labor across IT. It's healthy to establish teams that have similar expertise, shared
processes, a common objective, and an aligned vision. It's also natural for those teams to have their own
microculture, shared norms, and perspectives.
Healthy IT teams focus on partnering with other teams to promote the successful completion of their duties.
Healthy IT teams seek to understand the business goals that their technology contribution is designed to support.
The details and fiscal impact might be fuzzy, but the team's value contribution is often understood within the team.
Although healthy IT teams have a passion for the technology that they support, they're open to change and willing
to try new things. Such teams are often the earliest and strongest contributors to cloud center of excellence
(CCoE) efforts. Their contribution should be heavily encouraged.
Natural resistance to change
At times, the microcultures within healthy IT teams might react poorly to executive or top-down decisions to drive
change. This reaction is natural, as collectives of human beings with shared norms often cooperate to overcome
external threats.
Changes that affect the team's day-to-day jobs, sense of security, or autonomy can be viewed as a risk to the
collective. Signs of resistance are often an early indicator that the team members don't feel like they are part of the
decision-making process.
When cloud architects and other leaders invest in abolishing personal biases and driving for inclusion of the
existing IT teams, this resistance is likely to lessen quickly and dissolve over time. One tool available to cloud
architects and leaders to create inclusive decision making is the formation of a CCoE.
Healthy friction
It's easy to confuse resistance with friction. Existing IT teams can be knowledgeable regarding past mistakes,
tangible risks, tribal knowledge about solutions, and undocumented technical debt. Unfortunately, even the
healthiest IT teams can fall into the trap of describing these important data points as part of a specific technical
solution, which shouldn't be changed. This approach to communication masks the teams' knowledge, creating a
perception of resistance.
Providing these teams with a mechanism for communicating in future-looking terminology will add data points,
identify gaps, and create healthy friction around the proposed solutions. That extra friction will sand down rough
edges on the solutions and drive longer-term values. Simply changing the conversation can create clarity around
complex topics and generate energy to deliver more successful solutions.
The guidance on defining corporate policy is aimed at facilitating risk-based conversations with business
stakeholders. However, this same model can be used to facilitate conversations with teams that are perceived as
cloud resistant. When the perception of resistance is widespread, it might be wise to include resistance resolution
practices in the charter for a cloud governance team.

Antipatterns
The organic and responsive growth within IT that creates healthy IT teams can also result in antipatterns that
block transformation and cloud adoption. IT silos and fiefdoms are different from the natural microcultures within
healthy IT teams. In either pattern, the team focus tends to be directed toward protecting their "turf". When team
members are confronted with an opportunity to drive change and improve operations, they will invest more time
and energy into blocking the change than finding a positive solution.
As mentioned earlier, healthy IT teams can create natural resistance and positive friction. Silos and fiefdoms are a
different challenge. There is no documented leading indicator for either antipattern. These antipatterns tend to be
identified after months of cloud center of excellence and cloud governance team efforts. They're discovered as the
result of ongoing resistance.
Even in toxic cultures, the efforts of the CCoE and the cloud governance team should help drive cultural growth
and technical progress. After months of effort, a few teams might still show no signs of inclusive behaviors and
stand firm in their resistance to change. These teams are likely operating in one of the following antipattern
models: silos and fiefdoms. Although these models have similar symptoms, the root cause and the approaches to
addressing resistance are radically different between them.

IT silos
Team members in an IT silo are likely to define themselves through their alignment to a small number of IT
vendors or an area of technical specialization. However, don't confuse IT silos with IT fiefdoms. IT silos tend to be
driven by comfort and passion, and are generally easier to overcome than the fear-driven motives behind
fiefdoms.
This antipattern often emerges from a common passion for a specific solution. IT silos are then reinforced by
the team's advanced skills as a result of the investment in that specific solution. This superior skill can be an
accelerator to cloud adoption efforts if the resistance to change can be overcome. It can also become a major
blocker if the silos are broken down or if the team members can't accurately evaluate options. Fortunately, IT silos
can often be overcome without any significant changes to the organizational chart.
Address resistance from IT silos
IT silos can be addressed through the following approaches. The best approach will depend on the root cause of
the resistance.
Create virtual teams: The organizational readiness section of the Cloud Adoption Framework describes a
multilayered structure for integrating and defining four virtual teams (v-teams). One benefit of this structure is
cross-organization visibility and inclusion. Introducing a cloud center of excellence creates a high-profile
aspirational team that top engineers will want to participate in. This helps create new cross-solution alignments
that aren't bound by organizational-chart constraints, and will drive inclusion of top engineers who have been
sheltered by IT silos.
Introduction of a cloud strategy team will create immediate visibility to IT contributions regarding cloud adoption
efforts. When IT silos fight for separation, this visibility can help motivate IT and business leaders to properly
support those resistant team members. This process is a quick path to stakeholder engagement and support.
Consider experimentation and exposure: Team members in an IT silo have likely been constrained to think a
certain way for some time. Breaking the one-track mind is a first step to addressing resistance.
Experimentation and exposure are powerful tools for breaking down barriers in silos. The team members might be
resistant to competing solutions, so it's not wise to put them in charge of an experiment that competes with their
existing solution. However, as part of a first workload test of the cloud, the organization should implement
competing solutions. The siloed team should be invited to participate as an input and review source, but not as a
decision maker. This should be clearly communicated to the team, along with a commitment to engage the team
more deeply as a decision maker before moving into production solutions.
During review of the competing solution, use the practices outlined in Define corporate policy to document
tangible risks of the experiment and establish policies that help the siloed team become more comfortable with
the future state. This will expose the team to new solutions and harden the future solution.
Be "boundaryless": The teams that drive cloud adoption find it easy to push boundaries by exploring exciting,
new cloud-native solutions. This is one half of the approach to removing boundaries. However, that thinking can
further reinforce IT silos. Pushing for change too quickly and without respect to existing cultures can create
unhealthy friction and lead to natural resistance.
When IT silos start to resist, it's important to be boundaryless in your own solutions. Be mindful of one simple
truth: cloud-native isn't always the best solution. Consider hybrid solutions that might provide an opportunity to
extend the existing investments of the IT silo into the future.
Also consider cloud-based versions of the solution that the IT silo team uses now. Experiment with those solutions
and expose yourself to the viewpoint of those living in the IT silo. At a minimum, you will gain a fresh perspective.
In many situations, you might earn enough of the IT silo's respect to lessen resistance.
Invest in education: Many people living in an IT silo became passionate about the current solution as a result of
expanding their own education. Investing in the education of these teams is seldom misplaced. Allocate time for
these individuals to engage in self-learning, classes, or even conferences to break the day-to-day focus on the
current solution.
For education to be an investment, some return must come as a result of the expense. In exchange for the
investment, the team might demonstrate the proposed solution to the rest of the teams involved in cloud
adoption. They might also provide documentation of the tangible risks, risk management approaches, and desired
policies in adopting the proposed solution. Each will engage these teams in the solution and help take advantage
of their tribal knowledge.
Turn roadblocks into speed bumps: IT silos can slow or stop any transformation. Experimentation and iteration
will find a way, but only if the project keeps moving. Focus on turning roadblocks into merely speed bumps.
Define policies that everyone can be temporarily comfortable with in exchange for continued progression.
For instance, if IT security is the roadblock because its security solution can't monitor compromises of protected
data in the cloud, establish data classification policies. Prevent deployment of classified data into the cloud until an
agreeable solution can be found. Invite IT security into experimentation with hybrid or cloud-native solutions to
monitor protected data.
If the network team operates as a silo, identify workloads that are self-contained and don't have network
dependencies. In parallel, experiment, expose, and educate the network team while working on hybrid or
alternative solutions.
Be patient and be inclusive: It's tempting to move on without support of an IT silo. But this decision will cause
disruptions and roadblocks down the road. Changing the minds of IT silo members can take time. Be patient with
their natural resistance and convert it to value. Be inclusive and invite healthy friction to improve the future solution.
Never compete: The IT silo exists for a reason. It persists for a reason. There is an investment in maintaining the
solution that the team members are passionate about. Directly competing with the solution or the IT silo will
distract from the real goal of achieving business outcomes. This trap has blocked many transformation projects.
Stay focused on the goal, as opposed to a single component of the goal. Help accentuate the positive aspects of
the IT silo's solution and help the team members make wise decisions about the best solutions for the future.
Don't insult or degrade the current solution, because that would be counterproductive.
Partner with the business: If the IT silo isn't blocking business outcomes, why do you care? There is no perfect
solution or perfect IT vendor. Competition exists for a reason; each has its own benefits.
Embrace diversity and include the business by supporting and aligning to a strong cloud strategy team. When an
IT silo supports a solution that blocks business outcomes, it will be easier to communicate that roadblock without
the noise of technical squabbles. Supporting nonblocking IT silos will show an ability to partner for the desired
business outcomes. These efforts will earn more respect and greater support from the business when an IT silo
presents a legitimate blocker.

IT fiefdoms
Team members in an IT fiefdom are likely to define themselves through their alignment to a specific process or
area of responsibility. The team operates under an assumption that external influence on its area of responsibility
will lead to problems. Fiefdoms tend to be a fear-driven antipattern, which will require significant leadership
support to overcome.
Fiefdoms are especially common in organizations that have experienced IT downsizing, frequent turbulence in IT
staff, or poor IT leadership. When the business sees IT purely as a cost center, fiefdoms are much more likely to
arise.
Generally, fiefdoms are the result of a line manager who fears loss of the team and the associated power base.
These leaders often have a sense of duty to their team and feel a need to protect their subordinates from negative
consequences. Phrases like "shelter the team from change" and "protect the team from process disruption" can be
indicators of an overly guarded manager who might need more support from leadership.
Address resistance from IT fiefdoms
IT fiefdoms can demonstrate some growth by following the approaches to addressing IT silo resistance. Before
you try to address resistance from an IT fiefdom, we recommend that you treat the team like an IT silo first. If
those types of approaches fail to yield any significant change, the resistant team might be suffering from an IT
fiefdom antipattern. The root cause of IT fiefdoms is a little more complex to address, because that resistance
tends to come from the direct line manager (or a leader higher up the organizational chart). Challenges that are IT
silo-driven are typically simpler to overcome.
When continued resistance from IT fiefdoms blocks cloud adoption efforts, it might be wise to make a combined
effort with existing IT leaders to evaluate the situation. IT leaders should carefully consider insights from the cloud
strategy team, cloud center of excellence, and cloud governance team before making decisions.
NOTE
IT leaders should never take changes to the organizational chart lightly. They should also validate and analyze feedback from
each of the supporting teams. However, transformational efforts like cloud adoption tend to magnify underlying issues that
have gone unnoticed or unaddressed long before this effort. When fiefdoms are preventing the company's success,
leadership changes are a likely necessity.
Fortunately, removing the leader of a fiefdom doesn't often end in termination. These strong, passionate leaders can often
move into a management role after a brief period of reflection. With the right support, this change can be healthy for the
leader of the fiefdom and the current team.

Caution

For managers of IT fiefdoms, protecting the team from risk is a clear leadership value. However, there's a fine line
between protection and isolation. When the team is blocked from participating in driving changes, it can have
psychological and professional consequences on the team. The urge to resist change might be strong, especially
during times of visible change.
The manager of any isolated team can best demonstrate a growth mindset by experimenting with the guidance
associated with healthy IT teams in the preceding sections. Active and optimistic participation in governance and
CCoE activities can lead to personal growth. Managers of IT fiefdoms are best positioned to change stifling
mindsets and help the team develop new ideas.
IT fiefdoms can be a sign of systemic leadership issues. To overcome an IT fiefdom, IT leaders need the ability to
make changes to operations, responsibilities, and occasionally even the people who provide line management of
specific teams. When those changes are required, it's wise to approach those changes with clear and defensible
data points.
Alignment with business stakeholders, business motivations, and business outcomes might be required to drive
the necessary change. Partnership with the cloud strategy team, cloud center of excellence, and cloud governance
team can provide the data points needed for a defensible position. When necessary, these teams should be
involved in a group escalation to address challenges that can't be addressed with IT leadership alone.

Next steps
Disrupting organizational antipatterns is a team effort. To act on this guidance, review the organizational readiness
introduction to identify the right team structures and participants:
Identify the right team structures and participants
The cloud fundamentally changes how enterprises procure and use technology resources. Traditionally, enterprises assumed
ownership and responsibility of all aspects of technology, from infrastructure to software. The cloud allows enterprises to provision
and to consume resources only as needed. However, cloud adoption is a means to an end. Businesses adopt the cloud when they
realize it can address any of these business opportunities:
Businesses are motivated to migrate to the cloud to:
Optimize operations
Simplify technology
Increase business agility
Reduce costs
Prepare for new technical capabilities
Scale to market demands or new geographical regions
Businesses are motivated to innovate using the cloud to:
Improve customer experiences
Increase customer engagements
Transform products
Prepare for and build new technical capabilities
Scale to market demands or new geographical regions

Vision and objectives


Removing key obstacles and enabling change requires more than implementation guidance. The Cloud Adoption Framework is a
set of documentation, implementation guidance, best practices, and tools that help align strategies for business, culture, and
technology to enable the desired business outcomes. Its modular structure guides the customer through their cloud journey. The
Cloud Adoption Framework can stand by itself and provide self-service structured guidance for customers. The Cloud Adoption
Framework builds on existing guidance whenever possible to meet the following objectives:
Technical strategy objective: Establish scalable technical strategies, beyond a minimum viable product, so customers can
easily customize and adapt to meet their needs and address common constraints.
Business strategy objective: While this guidance doesn't define the business strategy itself, we will help architects
understand, document, and communicate the business strategy so the right decisions can be made.
Culture strategy objective: While not providing deep guidance on facilitating culture or organizational change, we will
provide methodologies, scenarios, and questions that will help identify and remove cultural roadblocks to the technical strategy.

Fulfilling the vision


The Cloud Adoption Framework is an overarching framework that covers Plan, Ready, and Adopt phases across the Migration and
Innovation motivations for cloud adoption, supported by Governance and Operations guidance.
The framework has reached general availability (GA). However, we are still actively building this framework in collaboration with
customers, partners, and internal teams. To encourage partnership, content is released as it becomes available. These public
releases enable testing, validating, and incrementally refining the guidance.
To adopt the cloud successfully, a customer must prepare its people, technologies, and processes for this digital transformation.
The Cloud Adoption Framework includes a section outlining the overall adoption journeys, both Migration and Innovation, as an
overview for the customers. This section is composed of the following adoption journey phases:
Plan: Align business outcomes to actionable technology backlogs. This phase consists of three areas of early stage planning
activities:
Define the business justification and business outcomes.
Prioritize workloads based on impacts to the business outcomes.
Create a cloud adoption plan based on the current digital estate and prioritized workloads.
Ready: Prepare the people, culture, and environment for change. This phase has three key components:
Create a cloud strategy team and other organizational alignment.
Create a skills readiness plan across roles and functions.
Establish an Azure foundation by preparing the cloud environment.
Adopt: Implement the desired changes across IT and business processes to help customers realize their business, technology,
and people strategies. This phase includes several areas that will vary depending on what the organization is implementing:
Migration of workloads and assets.
Apps and data modernization.
Cloud governance.
Management and operation of assets and workloads in the cloud.
These phases are not linear and will likely become a cycle, with customers revisiting and expanding adoption plans as their people
become more cloud proficient and their apps and workloads are properly managed and operated, aligned with corporate policies,
and deliver on their business outcomes.
Establish an operating model for the cloud

Cloud adoption is an iterative effort focusing on what you do in the cloud. The cloud strategy outlines the digital
transformation that guides business programs as various teams execute adoption projects. Planning and Readiness
help ensure the success of each of those important elements. All steps of cloud adoption equate to tangible
projects with manageable objectives, timelines, and budgets.
These adoption efforts are relatively easy to track and measure, even when they involve multiple projected
iterations and releases. Each phase of the adoption lifecycle is important. Each phase is prone to potential
roadblocks across business, culture, and technology constraints. But, each phase depends heavily on the underlying
operating model.
If adoption describes what you are doing, the operating model defines the underlying who and how
that enable adoption.
Satya Nadella has echoed the saying, "Culture eats strategy for breakfast." The operating model is the embodiment of the IT
culture, captured in a number of measurable processes. When the cloud is powered by a strong operating model,
the culture drives the strategy, accelerating adoption and the realization of business value. Conversely, when
adoption succeeds but there is no operating model, the returns can be impressive but short-lived. For long-term
success, it's vital that adoption and operating models advance in parallel.

Establish your operating model


Current operating models can scale to support adoption of the cloud. A modern operating model will help you
remove nontechnical blockers to cloud adoption.
This section of the Cloud Adoption Framework provides an actionable operating model to guide nontechnical
decisions. This operating model consists of three methodologies to aid in creating your own cloud operating
model:
Govern: Ensure consistency across adoption efforts. Align to governance or compliance requirements to
maintain a well-managed cross-cloud environment.
Manage: Align ongoing processes for operational management of the technology to maximize value attainment
and minimize disruptions.
Organize: As the operating model matures, so will the organization of various teams and capabilities
supporting the operating model.

Align operating models


The cloud and the digital economy have exposed the need for multiple operating models. Sometimes this need is
driven by a requirement to support multiple public clouds. More commonly, the need is highlighted by the
transition from on-premises to the cloud. In either scenario, it's important to align operating models for maximum
performance and minimum redundancy.
Analysts predict high volumes of multicloud adoption, and many customers are moving in that direction.
Unfortunately, customers also report significant challenges in operating multiple clouds. Duplicated resources,
processes, skills, and technologies result in increased costs, not the savings the cloud predictions promised.
To avoid this trend, align your operating models deliberately: there should always be one general operating model,
with additional specialized operating models used for specific scenarios to support deviations from the standard
model.
General operating model: The general operating model aligns to a single public or private cloud platform.
The operation of that platform defines operational standards, policies, and processes. This operating model
should be the primary means of powering the go-forward cloud strategy. In this model, the goal is to use
the primary cloud provider for the bulk of cloud adoption.
Specialized operating model: Specific business outcomes may be a better fit for an alternative cloud
provider. When a compelling business case is present, the standards, policies, and processes from the
general operating model are applied to the new cloud provider but are then modified to fit the specialized
use case.
If Azure is the primary platform of choice, the guides and best practices in each of the operating model sections
listed above will prove valuable in the creation of your operating model. But this framework recognizes that not all
of our readers have committed to Azure as the primary platform. To accommodate this broader audience, the
theory content in each section can be applied to public or private cloud operating models with similar outcomes.

Next steps
Governance is a common first step toward establishing an operating model for the cloud.
Learn about cloud governance
Operating model terminology

The term operating model has many definitions. This intro article establishes terminology associated with
operating models. To understand an operating model as it relates to the cloud, we first have to understand how an
operating model fits into the bigger theme of corporate planning.

Terms
Business model: Business models tend to define corporate value ("what" the business does to provide value) and
mission/vision statements ("why" the business has chosen to add value in that way). At a minimum, business
models should be able to represent the "what" and "why" in the form of financial projections. There are many
different schools of thought regarding how far a business model goes beyond these basic leadership principles.
However, to create a sound operating model, the business models should include high-level statements to establish
directional goals. It's even more effective if those goals can be represented in metrics or KPIs to track progress.
Customer experience: All good business models ground the "why" side of a business's strategy in the experience
of their customers. This process could involve a customer acquiring a product or service. It could include
interactions between a company and its business customers. Another example could center around the long-term
management of a customer's financial or health needs, as opposed to a single transaction or process. Regardless of
the type of experience, the majority of successful companies realize that they exist to operate and improve the
experiences that drive their "why" statements.
Digital transformation: Digital transformation has become an industry buzzword. However, it is a vital
component in the fulfillment of modern business models. Since the advent of the smartphone and other portable
computing form factors, customer experiences have become increasingly digital. This shift is painfully obvious in
some industries like DVD rentals, print media, automotive, or retail. In each case, digitized experiences have had a
significant impact on the customer experience. In some cases, physical media have been entirely replaced with
digital media, upsetting the entire industry vertical. In others, digital experiences are seen as a standard
augmentation of the experience. To deliver business value ("what" statements), the customer experience ("why"
statements) must factor in the impact of digital experiences on the customers' experiences. This process is digital
transformation. Digital transformation is seldom the entire "why" statement in a business strategy, but it is an
important aspect.
Operating model: If the business model represents the "what" and "why", then an operating model represents the
"how" and "who" for operationalizing the business strategy. The operating model defines the ways in which people
work together to accomplish the large goals outlined in the business strategy. Operating models are often
described as the people, process, and technology behind the business strategy. In the article on the Cloud Adoption
Framework operating model, this concept is explained in detail.
Cloud adoption: As stated above, digital transformation is an important aspect of the customer experience and
the business model. Likewise, cloud adoption is an important aspect of any operating model. Cloud adoption is a
strong enabler to deliver the right technologies and processes required to successfully deliver on the modern
operating model.
Cloud adoption is "what we do" to realize the business value. The operating model represents "who we are and
how we function on a daily basis" while cloud adoption is being delivered.

Next steps
Leverage the operating model provided by the Cloud Adoption Framework to develop operational maturity.
Leverage the operating model
Architectural decision guides

The architectural decision guides in the Cloud Adoption Framework describe patterns and models that help
when creating cloud governance design guidance. Each decision guide focuses on one core infrastructure
component of cloud deployments and lists patterns and models that can support specific cloud deployment
scenarios.
When you begin to establish cloud governance for your organization, actionable governance journeys provide a
baseline roadmap. However, these journeys make assumptions about requirements and priorities that might not
reflect those of your organization.
These decision guides supplement the sample governance journeys by providing alternative patterns and
models that help you align the architectural design choices made in the example design guidance with your own
requirements.

Decision guidance categories


Each of the following categories represents a foundational technology of all cloud deployments. The sample
governance journeys make design decisions related to these technologies based on the needs of example
businesses, and some of these decisions might not match your own organization's needs. The sections below
discuss alternative options for each of these categories, allowing you to choose a pattern or model better suited
to your requirements.
Subscriptions: Plan your cloud deployment's subscription design and account structure to match your
organization's ownership, billing, and management capabilities.
Identity: Integrate cloud-based identity services with your existing identity resources to support authorization
and access control within your IT environment.
Policy Enforcement: Define and enforce organizational policy rules for cloud-deployed resources and workloads
that align with your governance requirements.
Resource Consistency: Ensure that deployment and organization of your cloud-based resources align to enforce
resource management and policy requirements.
Resource Tagging: Organize your cloud-based resources to support billing models, cloud accounting
approaches, management, and to optimize resource utilization and cost. Resource tagging requires a consistent
and well-organized naming and metadata scheme.
Software Defined Networking: Deploy secure workloads to the cloud using rapid deployment and modification
of virtualized networking capabilities. Software-defined networks (SDNs) can support agile workflows, isolate
resources, and integrate cloud-based systems with your existing IT infrastructure.
Encryption: Secure your sensitive data using encryption to align with your organization's compliance and
security policy requirements.
Logging and Reporting: Monitor log data generated by cloud-based resources. Analyzing data provides health-
related insights into the operations, maintenance, and compliance status of workloads.
Regional Guidance: A discussion on the appropriate decision criteria for regional placement of resources within
the Azure platform.
Next steps
Learn how subscriptions and accounts serve as the base of a cloud deployment.
Subscriptions design
Subscription decision guide

Effective subscription design helps organizations establish a structure to organize assets in Azure during a cloud
adoption.
Each resource in Azure, such as a virtual machine or a database, is associated with a subscription. Adopting
Azure begins by creating an Azure subscription, associating it with an account, and deploying resources to the
subscription. For an overview of these concepts, see Azure fundamental concepts.
As your digital estate in Azure grows, you will likely need to create additional subscriptions to meet your
requirements. Azure allows you to define a hierarchy of management groups to organize your subscriptions and
easily apply the right policy to the right resources. For more information, see Scaling with multiple Azure
subscriptions.
Some basic examples of using management groups to separate different workloads include:
Production vs. nonproduction workloads: Some enterprises create management groups to separate their
production and nonproduction subscriptions. Management groups allow these customers to more easily
manage roles and policies. For example, a nonproduction subscription may allow developers contributor
access, but in production, they have only reader access.
Internal services vs. external services: Much like production versus nonproduction workloads, enterprises
often have different requirements, policies, and roles for internal services versus external customer-facing
services.
This decision guide helps you consider different approaches to organizing your management group hierarchy.
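To make the inheritance behavior concrete, the following is a minimal, illustrative Python sketch (not an Azure SDK call) that models a management group hierarchy like the production/nonproduction example above and shows how an assignment made at a parent group flows down to the subscriptions beneath it. Group names and policy names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ManagementGroup:
    """A node in a management group hierarchy; children are either
    child management groups or subscriptions (leaf nodes)."""
    name: str
    policies: List[str] = field(default_factory=list)
    children: List["ManagementGroup"] = field(default_factory=list)

def effective_policies(node: ManagementGroup, inherited=()):
    """Walk the tree and print the assignments each node receives,
    combining its own assignments with everything inherited."""
    combined = list(inherited) + node.policies
    print(f"{node.name}: {combined}")
    for child in node.children:
        effective_policies(child, combined)

# Hypothetical hierarchy: a root group, production and nonproduction
# groups, and one subscription under each.
root = ManagementGroup(
    name="contoso-root",
    policies=["require-cost-center-tag"],
    children=[
        ManagementGroup(
            name="production",
            policies=["reader-only-for-developers"],
            children=[ManagementGroup(name="sub-prod-workloads")],
        ),
        ManagementGroup(
            name="nonproduction",
            policies=["contributor-for-developers"],
            children=[ManagementGroup(name="sub-dev-test")],
        ),
    ],
)

effective_policies(root)
# sub-prod-workloads ends up with the root assignment plus the
# production-only assignment, without assigning either to it directly.
```

The same idea carries over to Azure Policy and access control assignments: assign at the highest reasonable scope and let inheritance apply the rules to the subscriptions below.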

Subscription design patterns


Because every organization is different, Azure management groups are designed to be flexible. Modeling your
cloud estate to reflect your organization's hierarchy helps you define and apply policies at higher levels of the
hierarchy, and rely on inheritance to ensure that those policies are automatically applied to management groups
lower in the hierarchy. Although subscriptions can be moved between different management groups, it is helpful
to design an initial management group hierarchy that reflects your anticipated organizational needs.
Before finalizing your subscription design, also consider how resource consistency considerations might
influence your design choices.

NOTE
Azure Enterprise Agreements (EAs) allow you to define another organizational hierarchy for billing purposes. This
hierarchy is distinct from your management group hierarchy, which focuses on providing an inheritance model for easily
applying suitable policies and access control to your resources.

The following subscription patterns reflect an initial increase in subscription design sophistication, followed by
several more advanced hierarchies that may align well to your organization:
Single subscription
A single subscription per account may suffice for organizations that need to deploy a small number of cloud-
hosted assets. This is the first subscription pattern you'll implement when beginning your cloud adoption
process, allowing small-scale experimental or proof-of-concept deployments to explore the capabilities of the
cloud.
Production-and-nonproduction pattern
When you're ready to deploy a workload to a production environment, you should add an additional
subscription. This helps you keep your production data and other assets out of your dev/test environments. You
can also easily apply two different sets of policies across the resources in the two subscriptions.

Workload separation pattern


As an organization adds new workloads to the cloud, different ownership of subscriptions or basic separation of
responsibility may result in multiple subscriptions in both the production and nonproduction management
groups. While this approach does provide basic workload separation, it doesn't take significant advantage of the
inheritance model to automatically apply policies across a subset of your subscriptions.

Application category pattern


As an organization's cloud footprint grows, additional subscriptions are typically created to support applications
with fundamental differences in business criticality, compliance requirements, access controls, or data protection
needs. Building from the production-and-nonproduction subscription pattern, the subscriptions supporting
these application categories are organized under either the production or nonproduction management group as
applicable. These subscriptions are typically owned and administered by central IT operations staff.
Each organization categorizes its applications differently, often separating subscriptions based on specific
applications or services or along the lines of application archetypes. This categorization is often designed to
support workloads that are likely to consume most of the resource limits of a subscription, or to separate mission-
critical workloads and ensure they aren't competing with other workloads under those limits. Some workloads that
might justify a separate subscription under this pattern include:
Mission-critical workloads.
Applications that are part of "Cost of Goods Sold" (COGS) within your company. Example: every instance
of Company X's widget contains an Azure IoT module that sends telemetry. This may necessitate a dedicated
subscription for accounting/governance purposes as part of COGS.
Applications subject to regulatory requirements such as HIPAA or FedRAMP.
Functional pattern
The functional pattern organizes subscriptions and accounts along functional lines, such as finance, sales, or IT
support, using a management group hierarchy.
Business unit pattern
The business unit pattern groups subscriptions and accounts based on profit and loss category, business unit,
division, profit center, or similar business structure using a management group hierarchy.
Geographic pattern
For organizations with global operations, the geographic pattern groups subscriptions and accounts based on
geographic regions using a management group hierarchy.

Mixed patterns
Management group hierarchies can be up to six levels deep. This provides you with the flexibility to create a
hierarchy that combines several of these patterns to meet your organizational needs. For example, the diagram
below shows an organizational hierarchy that combines a business unit pattern with a geographic pattern.
Related resources
Resource access management in Azure
Multiple layers of governance in large enterprises
Multiple geographic regions

Next steps
Subscription design is just one of the core infrastructure components requiring architectural decisions during a
cloud adoption process. Visit the decision guides overview to learn about alternative patterns or models used
when making design decisions for other types of infrastructure.
Architectural decision guides
Identity decision guide

In any environment, whether on-premises, hybrid, or cloud-only, IT needs to control which administrators,
users, and groups have access to resources. Identity and access management (IAM) services enable you to
manage access control in the cloud.

Jump to: Determine Identity Integration Requirements | Cloud baseline | Directory Synchronization | Cloud
hosted domain services | Active Directory Federation Services | Learn more
Several options are available for managing identity in a cloud environment. These options vary in cost and
complexity. A key factor in structuring your cloud-based identity services is the level of integration required with
your existing on-premises identity infrastructure.
In Azure, Azure Active Directory (Azure AD) provides a base level of access control and identity management
for cloud resources. However, if your organization's on-premises Active Directory infrastructure has a complex
forest structure or customized organizational units (OUs), your cloud-based workloads might require directory
synchronization with Azure AD for a consistent set of identities, groups, and roles between your on-premises
and cloud environments. Additionally, support for applications that depend on legacy authentication
mechanisms might require the deployment of Active Directory Domain Services (AD DS) in the cloud.
Cloud-based identity management is an iterative process. You could start with a cloud-native solution with a
small set of users and corresponding roles for an initial deployment. As your migration matures, you might
need to integrate your identity solution using directory synchronization or add domains services as part of your
cloud deployments. Revisit your identity strategy in every iteration of your migration process.

Determine identity integration requirements


| QUESTION | CLOUD BASELINE | DIRECTORY SYNCHRONIZATION | CLOUD-HOSTED DOMAIN SERVICES | ACTIVE DIRECTORY FEDERATION SERVICES |
|---|---|---|---|---|
| Do you currently lack an on-premises directory service? | Yes | No | No | No |
| Do your workloads need to use a common set of users and groups between the cloud and on-premises environment? | No | Yes | No | No |
| Do your workloads depend on legacy authentication mechanisms, such as Kerberos or NTLM? | No | No | Yes | Yes |
| Do you require single sign-on across multiple identity providers? | No | No | No | Yes |

As part of planning your migration to Azure, you will need to determine how best to integrate your existing
identity management and cloud identity services. The following are common integration scenarios.
Cloud baseline
Azure AD is the native identity and access management (IAM) system for granting users and groups access to
management features on the Azure platform. If your organization lacks a significant on-premises identity
solution, and you plan on migrating workloads to be compatible with cloud-based authentication mechanisms,
you should begin developing your identity infrastructure using Azure AD as a base.
Cloud baseline assumptions: Using a purely cloud-native identity infrastructure assumes the following:
Your cloud-based resources will not have dependencies on on-premises directory services or Active
Directory servers, or workloads can be modified to remove those dependencies.
The application or service workloads being migrated either support authentication mechanisms compatible
with Azure AD or can be modified easily to support them. Azure AD relies on internet-ready authentication
mechanisms such as SAML, OAuth, and OpenID Connect. Existing workloads that depend on legacy
authentication methods using protocols such as Kerberos or NTLM might need to be refactored before
migrating to the cloud using the cloud baseline pattern.
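As a small, hedged illustration of what "compatible with cloud-based authentication mechanisms" can look like in practice, the sketch below acquires an Azure AD token for the Azure Resource Manager endpoint using the azure-identity Python package. It assumes that package is installed and that a credential source (environment variables, a managed identity, or a developer sign-in) is available to DefaultAzureCredential; the scope shown is the standard ARM scope.

```python
# Minimal sketch: token-based (OAuth 2.0) authentication against Azure AD,
# assuming the azure-identity package and an available credential source.
from azure.identity import DefaultAzureCredential

credential = DefaultAzureCredential()

# Request a token for the Azure Resource Manager scope. Workloads built on
# modern protocols (SAML, OAuth, OpenID Connect) follow this same token model,
# whereas Kerberos/NTLM-dependent workloads cannot use it without refactoring.
token = credential.get_token("https://management.azure.com/.default")
print(f"Token acquired; expires at (epoch seconds): {token.expires_on}")
```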

TIP
Completely migrating your identity services to Azure AD eliminates the need to maintain your own identity infrastructure,
significantly simplifying your IT management.
However, Azure AD is not a full replacement for a traditional on-premises Active Directory infrastructure. Directory
features such as legacy authentication methods, computer management, or group policy might not be available without
deploying additional tools or services to the cloud.
For scenarios where you need to integrate your on-premises identities or domain services with your cloud deployments,
see the directory synchronization and cloud-hosted domain services patterns discussed below.

Directory synchronization
For organizations with existing on-premises Active Directory infrastructure, directory synchronization is often
the best solution for preserving existing user and access management while providing the required IAM
capabilities for managing cloud resources. This process continuously replicates directory information between
Azure AD and on-premises directory services, allowing common credentials for users and a consistent identity,
role, and permission system across your entire organization.
Note: Organizations that have adopted Office 365 might have already implemented directory synchronization
between their on-premises Active Directory infrastructure and Azure Active Directory.
Directory synchronization assumptions: Using a synchronized identity solution assumes the following:
You need to maintain a common set of user accounts and groups across your cloud and on-premises IT
infrastructure.
Your on-premises identity services support replication with Azure AD.

TIP
Any cloud-based workloads that depend on legacy authentication mechanisms provided by on-premises Active Directory
servers and that are not supported by Azure AD will still require either connectivity to on-premises domain services or
virtual servers in the cloud environment providing these services. Using on-premises identity services also introduces
dependencies on connectivity between the cloud and on-premises networks.

Cloud-hosted domain services


If you have workloads that depend on claims-based authentication using legacy protocols such as Kerberos or
NTLM, and those workloads cannot be refactored to accept modern authentication protocols such as SAML or
OAuth and OpenID Connect, you might need to migrate some of your domain services to the cloud as part of
your cloud deployment.
This pattern involves deploying virtual machines running Active Directory to your cloud-based virtual networks
to provide Active Directory Domain Services (AD DS) for resources in the cloud. Any existing applications and
services migrating to your cloud network should be able to use these cloud-hosted directory servers with minor
modifications.
It's likely that your existing directories and domain services will continue to be used in your on-premises
environment. In this scenario, it's recommended that you also use directory synchronization to provide a
common set of users and roles in both the cloud and on-premises environments.
Cloud-hosted domain services assumptions: Performing a directory migration assumes the following:
Your workloads depend on claims-based authentication using protocols like Kerberos or NTLM.
Your workload virtual machines need to be domain-joined for management or application of Active
Directory group policy purposes.

TIP
While a directory migration coupled with cloud-hosted domain services provides great flexibility when migrating existing
workloads, hosting virtual machines within your cloud virtual network to provide these services does increase the
complexity of your IT management tasks. As your cloud migration experience matures, examine the long-term
maintenance requirements of hosting these servers. Consider whether refactoring existing workloads for compatibility
with cloud identity providers such as Azure Active Directory can reduce the need for these cloud-hosted servers.

Active Directory Federation Services


Identity federation establishes trust relationships across multiple identity management systems to allow
common authentication and authorization capabilities. You can then support single sign-on capabilities across
multiple domains within your organization or identity systems managed by your customers or business
partners.
Azure AD supports federation of on-premises Active Directory domains using Active Directory Federation
Services (AD FS). See the reference architecture Extend AD FS to Azure to see how this can be implemented in
Azure.

Learn more
For more information about identity services in Azure, see:
Azure AD. Azure AD provides cloud-based identity services. It allows you to manage access to your Azure
resources and control identity management, device registration, user provisioning, application access control,
and data protection.
Azure AD Connect. The Azure AD Connect tool allows you to connect Azure AD instances with your existing
identity management solutions, allowing synchronization of your existing directory in the cloud.
Role-based access control (RBAC). Azure AD provides RBAC to efficiently and securely manage access to
resources in the management plane. Jobs and responsibilities are organized into roles, and users are
assigned to these roles. RBAC allows you to control who has access to a resource along with which actions a
user can perform on that resource.
Azure AD Privileged Identity Management (PIM). PIM lowers the exposure time of resource access
privileges and increases your visibility into their use through reports and alerts. It limits users to taking on
their privileges "just in time" (JIT), or by assigning privileges for a shorter duration, after which privileges are
revoked automatically.
Integrate on-premises Active Directory domains with Azure Active Directory. This reference architecture
provides an example of directory synchronization between on-premises Active Directory domains and Azure
AD.
Extend Active Directory Domain Services (AD DS) to Azure. This reference architecture provides an example
of deploying AD DS servers to extend domain services to cloud-based resources.
Extend Active Directory Federation Services (AD FS) to Azure. This reference architecture configures Active
Directory Federation Services (AD FS) to perform federated authentication and authorization with your
Azure AD directory.

Next steps
Identity is just one of the core infrastructure components requiring architectural decisions during a cloud
adoption process. Visit the decision guides overview to learn about alternative patterns or models used when
making design decisions for other types of infrastructure.
Architectural decision guides
Policy enforcement decision guide

Defining organizational policy is not effective unless it can be enforced across your organization. A key aspect of
planning any cloud migration is determining how best to combine tools provided by the cloud platform with
your existing IT processes to maximize policy compliance across your entire cloud estate.

Jump to: Baseline best practices | Policy compliance monitoring | Policy enforcement | Cross-organization policy |
Automated enforcement
As your cloud estate grows, you will be faced with a corresponding need to maintain and enforce policy across a
larger array of resources and subscriptions. As your estate gets larger and your organization's policy
requirements increase, the scope of your policy enforcement processes needs to expand to ensure consistent
policy adherence and fast violation detection.
Platform-provided policy enforcement mechanisms at the resource or subscription level are usually sufficient for
smaller cloud estates. Larger deployments justify a larger enforcement scope and may need to take advantage of
more sophisticated enforcement mechanisms involving deployment standards, resource grouping and
organization, and integration of policy enforcement with your logging and reporting systems.
The primary factors in determining the scope of your policy enforcement processes are your organization's cloud
governance requirements, the size and nature of your cloud estate, and how your organization is reflected in your
subscription design. An increase in the size of your estate or a greater need to centrally manage policy enforcement
can both justify an increase in enforcement scope.

Baseline best practices


For single subscription and simple cloud deployments, many corporate policies can be enforced using features
that are native to resources and subscriptions in Azure. The consistent use of the patterns discussed throughout
the Cloud Adoption Framework decision guides can help establish a baseline level of policy compliance without
specific investment in policy enforcement. These features include:
Deployment templates can provision resources with standardized structure and configuration.
Tagging and naming standards can help organize operations and support accounting and business
requirements.
Traffic management and networking restrictions can be implemented through Software Defined Networking.
Role-based access control can secure and isolate your cloud resources.
Start your cloud policy enforcement planning by examining how the application of the standard patterns
discussed throughout these guides can help meet your organizational requirements.

Policy compliance monitoring


A first step beyond simply relying on the policy enforcement mechanisms provided by the Azure platform is
ensuring the ability to verify that cloud-based applications and services comply with organizational policy. This
includes implementing notification capabilities for alerting responsible parties if a resource becomes noncompliant.
Effective logging and reporting of the compliance status of your cloud workloads is a critical part of a corporate
policy enforcement strategy.
As your cloud estate grows, additional tools such as Azure Security Center can provide integrated security and
threat detection, and help apply centralized policy management and alerting for both your on-premises and
cloud assets.

Policy enforcement
In Azure, you can apply configuration settings and resource creation rules at the management group,
subscription, or resource group level to help ensure policy alignment.
Azure Policy is an Azure service for creating, assigning, and managing policies. These policies enforce different
rules and effects over your resources, so those resources stay compliant with your corporate standards and
service level agreements. Azure Policy evaluates your resources for noncompliance with assigned policies. For
example, you might want to limit the SKU size of virtual machines in your environment. After implementing a
corresponding policy, new and existing resources are evaluated for compliance. With the right policy, existing
resources can be brought into compliance.
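As a sketch of what such a rule looks like, the following Python dictionary mirrors the JSON structure of an Azure Policy definition that denies virtual machines outside an approved SKU list. The alias and SKU values shown are illustrative; check the current built-in "Allowed virtual machine size SKUs" policy definition before relying on specific field names.

```python
# Illustrative policy definition, expressed as a Python dict that mirrors
# Azure Policy's JSON rule structure ("if" condition plus "then" effect).
# The alias and SKU values are examples, not an authoritative definition.
allowed_vm_sizes = ["Standard_B2s", "Standard_D2s_v3"]

policy_definition = {
    "mode": "Indexed",
    "policyRule": {
        "if": {
            "allOf": [
                {"field": "type", "equals": "Microsoft.Compute/virtualMachines"},
                {
                    "not": {
                        "field": "Microsoft.Compute/virtualMachines/sku.name",
                        "in": allowed_vm_sizes,
                    }
                },
            ]
        },
        "then": {"effect": "deny"},
    },
}

# Assigned at a management group, subscription, or resource group scope,
# a rule like this causes new VM deployments with other sizes to fail and
# flags existing VMs as noncompliant during evaluation.
```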

Cross-organization policy
As your cloud estate grows to span many subscriptions that require enforcement, you will need to focus on a
cloud-estate-wide enforcement strategy to ensure policy consistency.
Your subscription design must account for policy in relation to your organizational structure. In addition to
helping support complex organization within your subscription design, Azure management groups can be used
to assign Azure Policy rules across multiple subscriptions.

Automated enforcement
While standardized deployment templates are effective at a smaller scale, Azure Blueprints allows large-scale
standardized provisioning and deployment orchestration of Azure solutions. Workloads across multiple
subscriptions can be deployed with consistent policy settings for any resources created.
For IT environments integrating cloud and on-premises resources, you may need to use logging and reporting
systems to provide hybrid monitoring capabilities. Your third-party or custom operational monitoring systems
may offer additional policy enforcement capabilities. For larger or more mature cloud estates, consider how best
to integrate these systems with your cloud assets.

Next steps
Policy enforcement is just one of the core infrastructure components requiring architectural decisions during a
cloud adoption process. Visit the decision guides overview to learn about alternative patterns or models used
when making design decisions for other types of infrastructure.
Architectural decision guides
Resource consistency decision guide

Azure subscription design defines how you organize your cloud assets in relation to your organization's
structure, accounting practices, and workload requirements. In addition to this level of structure, addressing
your organizational governance policy requirements across your cloud estate requires the ability to
consistently organize, deploy, and manage resources within a subscription.

Jump to: Basic grouping | Deployment consistency | Policy consistency | Hierarchical consistency | Automated
consistency
Decisions regarding the level of your cloud estate's resource consistency requirements are primarily driven by
these factors: post-migration digital estate size, business or environmental requirements that don't fit neatly
within your existing subscription design approaches, or the need to enforce governance over time after
resources have been deployed.
As these factors increase in importance, the benefits of consistent deployment, grouping, and management of
cloud-based resources grow as well. Achieving more advanced levels of resource consistency to meet increasing
requirements demands more effort in automation, tooling, and consistency enforcement, which in turn adds time
spent on change management and tracking.

Basic grouping
In Azure, resource groups are a core resource organization mechanism to logically group resources within a
subscription.
Resource groups act as containers for resources with a common lifecycle as well as shared management
constraints such as policy or role-based access control (RBAC) requirements. Resource groups can't be nested,
and resources can only belong to one resource group. All control plane actions act on all resources in a
resource group. For example, deleting a resource group also deletes all resources within that group. The
preferred pattern for resource group management is to consider:
1. Are the contents of the resource group developed together?
2. Are the contents of the resource group managed, updated, and monitored together, and by the same people or
teams?
3. Are the contents of the resource group retired together?
If the answer to any of these questions is no, the resource in question should be placed elsewhere, in another
resource group.
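As a brief sketch of putting this into practice, the following uses the Azure SDK for Python (the azure-identity and azure-mgmt-resource packages, which are assumed to be installed) to create a resource group that holds one workload's resources, tagged so it can be managed and retired as a unit. The subscription ID, names, and tag values are placeholders.

```python
# Sketch: create a resource group for a single workload, assuming the
# azure-identity and azure-mgmt-resource packages and valid credentials.
from azure.identity import DefaultAzureCredential
from azure.mgmt.resource import ResourceManagementClient

subscription_id = "<subscription-id>"  # placeholder
client = ResourceManagementClient(DefaultAzureCredential(), subscription_id)

# One resource group per workload lifecycle: everything in it is developed,
# managed, and eventually retired together.
resource_group = client.resource_groups.create_or_update(
    "rg-catalogsearch-prod",
    {
        "location": "eastus",
        "tags": {"app": "catalogsearch", "env": "prod"},
    },
)
print(resource_group.name, resource_group.location)
```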

IMPORTANT
Resource groups are also region specific; however, it is common for resources to be in different regions within the same
resource group because they are managed together as described above. For more information on region selection, see
the Regions decision guide.

Deployment consistency
Building on top of the base resource grouping mechanism, the Azure platform provides a system for using
templates to deploy your resources to the cloud environment. You can use templates to create consistent
organization and naming conventions when deploying workloads, enforcing those aspects of your resource
deployment and management design.
Azure Resource Manager templates allow you to repeatedly deploy your resources in a consistent state using a
predetermined configuration and resource group structure. Resource Manager templates help you define a set
of standards as a basis for your deployments.
For example, you can have a standard template for deploying a web server workload that contains two virtual
machines as web servers combined with a load balancer to distribute traffic between the servers. You can then
reuse this template to create a structurally identical set of virtual machines and load balancer whenever this type
of workload is needed, only changing the deployment name and IP addresses involved.
You can also programmatically deploy these templates and integrate them with your CI/CD systems.
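To illustrate the shape of such a template, here is a heavily abbreviated sketch of the web server example, represented as a Python dictionary that follows the Resource Manager template schema. The parameter names are hypothetical, and the actual virtual machine, load balancer, and network resource definitions are elided; a real template would declare them in full.

```python
# Abbreviated sketch of an Azure Resource Manager template, represented as a
# Python dict. Parameter names are hypothetical; resource bodies are elided.
import json

web_workload_template = {
    "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
    "contentVersion": "1.0.0.0",
    "parameters": {
        "deploymentName": {"type": "string"},
        "webServerAddressPrefix": {"type": "string"},
    },
    "resources": [
        # Two web server VMs, a load balancer, and supporting network
        # resources would be declared here, each referencing the parameters
        # above so the same template can be reused for every instance of
        # this workload type.
    ],
}

# Keep the template in source control and submit it through a CI/CD pipeline,
# the Azure CLI, or the SDK; only the parameter values change per deployment.
print(json.dumps(web_workload_template, indent=2))
```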

Policy consistency
To ensure that governance policies are applied when resources are created, part of resource grouping design
involves using a common configuration when deploying resources.
By combining resource groups and standardized Resource Manager templates, you can enforce standards for
what settings are required in a deployment and what Azure Policy rules are applied to each resource group or
resource.
For example, you may have a requirement that all virtual machines deployed within your subscription connect
to a common subnet managed by your central IT team. You can create a standard template for deploying
workload VMs to create a separate resource group for the workload and deploy the required VMs there. This
resource group would have a policy rule to only allow network interfaces within the resource group to be
joined to the shared subnet.
For a more in-depth discussion of enforcing your policy decisions within a cloud deployment, see Policy
enforcement.

Hierarchical consistency
Resource groups allow you to support additional levels of hierarchy within a subscription, applying Azure Policy
rules and access controls at the resource group level. However, as the size of your cloud estate grows, you may
need to support more complicated cross-subscription governance requirements than can be supported using the
Azure Enterprise Agreement's Enterprise/Department/Account/Subscription hierarchy.
Azure management groups allow you to organize subscriptions into more sophisticated organizational
structures by grouping subscriptions in a hierarchy distinct from your enterprise agreement's hierarchy. This
alternate hierarchy allows you to apply access control and policy enforcement mechanisms across multiple
subscriptions and the resources they contain. Management group hierarchies can be used to match your cloud
estate's subscriptions with operations or business governance requirements. For more information, see the
subscription decision guide.

Automated consistency
For large cloud deployments, global governance becomes both more important and more complex. It is crucial
to automatically apply and enforce governance requirements when deploying resources, as well as meet
updated requirements for existing deployments.
Azure Blueprints enable organizations to support global governance of large cloud estates in Azure. Blueprints
move beyond the capabilities provided by standard Azure Resource Manager templates to create complete
deployment orchestrations capable of deploying resources and applying policy rules. Blueprints support
versioning, the ability to update all subscriptions where the blueprint was used, and the ability to lock down
deployed subscriptions to avoid the unauthorized creation and modification of resources.
These deployment packages allow IT and development teams to rapidly deploy new workloads and
networking assets that comply with changing organizational policy requirements. Blueprints can also be
integrated into CI/CD pipelines to apply revised governance standards to deployments as they are updated.

Next steps
Resource consistency is just one of the core infrastructure components requiring architectural decisions during
a cloud adoption process. Visit the decision guides overview to learn about alternative patterns or models used
when making design decisions for other types of infrastructure.
Architectural decision guides
Resource naming and tagging decision guide

Unless you have only the simplest deployments, organizing cloud-based resources is one of IT's most important
tasks. Organizing your resources serves three primary purposes:
Resource Management: Your IT teams will need to quickly find resources associated with specific workloads,
environments, ownership groups, or other important information. Organizing resources is critical to assigning
organizational roles and access permissions for resource management.
Automation: In addition to making resources easier for IT to manage, a proper organizational scheme allows
you to take advantage of automation as part of resource creation, operational monitoring, and the creation of
DevOps processes.
Accounting: Making business groups aware of cloud resource consumption requires IT to understand what
workloads and teams are using which resources. To support approaches such as chargeback and showback
accounting, cloud resources need to be organized to reflect ownership and usage.

Tagging decision guide

Jump to: Baseline naming conventions | Resource tagging patterns | Learn more
Your tagging approach can be simple or complex, with the emphasis ranging from supporting IT teams managing
cloud workloads to integrating information relating to all aspects of the business.
An IT-aligned tagging focus, such as tagging based on workload, function, or environment, will reduce the
complexity of monitoring assets and make management decisions based on operational requirements much
easier.
Tagging schemes that include a business-aligned focus, such as accounting, business ownership, or business
criticality, may require a larger time investment to create tagging standards that reflect business interests and to
maintain those standards over time. However, the result of this process is a tagging system that gives you an
improved ability to account for the costs and value of IT assets to the overall business. This association of an asset's
business value with its operational cost is one of the first steps in changing the cost-center perception of IT within
your wider organization.

Baseline naming conventions


A standardized naming convention is the starting point for organizing your cloud-hosted resources. A properly
structured naming system allows you to quickly identify resources for both management and accounting
purposes. If you have existing IT naming conventions in other parts of your organization, consider whether your
cloud naming conventions should align with them or if you should establish separate cloud-based standards.
Note also that different Azure resource types have different naming requirements. Your naming conventions
must be compatible with these naming requirements.
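A naming convention is easiest to enforce when names are generated rather than typed by hand. The sketch below is a simple, illustrative Python helper (the convention itself is hypothetical) that composes names from resource-type, workload, environment, and region fields and applies a basic length check, since different Azure resource types impose different length and character limits.

```python
# Illustrative helper that generates names from a hypothetical convention:
# <resource-type>-<workload>-<environment>-<region>, lowercase, hyphen-separated.
def build_resource_name(resource_type: str, workload: str,
                        environment: str, region: str,
                        max_length: int = 64) -> str:
    name = "-".join(
        part.lower() for part in (resource_type, workload, environment, region)
    )
    # Real limits vary by resource type (some are much shorter than 64 and
    # disallow hyphens), so validate against the target type's actual rules.
    if len(name) > max_length:
        raise ValueError(f"Name '{name}' exceeds {max_length} characters")
    return name

print(build_resource_name("rg", "catalogsearch", "prod", "eastus2"))
# -> rg-catalogsearch-prod-eastus2
```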

Resource tagging patterns


For more sophisticated organization than a consistent naming convention alone can provide, cloud platforms
support the ability to tag resources.
Tags are metadata elements attached to resources. Tags consist of pairs of key/value strings. The values you
include in these pairs are up to you, but the application of a consistent set of global tags, as part of a
comprehensive naming and tagging policy, is a critical part of an overall governance policy.
As part of your planning process, use the following questions to help determine the kind of information your
resource tags need to support:
Do your naming and tagging policies need to integrate with existing naming and organizational policies
within your company?
Will you implement a chargeback or showback accounting system? Will you need to associate resources with
accounting information for departments, business groups, and teams in more detail than a simple
subscription-level breakdown allows?
Does tagging need to represent details such as regulatory compliance requirements for a resource? What about
operational details such as uptime requirements, patching schedules, or security requirements?
What tags will be required for all resources based on central IT policy? What tags will be optional? Are
individual teams allowed to implement their own custom tagging schemes?
The common tagging patterns listed below provide examples of how tagging can be used to organize cloud
assets. These patterns are not meant to be exclusive and can be used in parallel, providing multiple ways of
organizing assets based on your company's needs.

| TAG TYPE | EXAMPLES | DESCRIPTION |
| --- | --- | --- |
| Functional | app = catalogsearch1, tier = web, webserver = apache, env = prod, env = staging, env = dev | Categorize resources in relation to their purpose within a workload, what environment they've been deployed to, or other functionality and operational details. |
| Classification | confidentiality=private, sla = 24hours | Classifies a resource by how it is used and what policies apply to it. |
| Accounting | department = finance, project = catalogsearch, region = northamerica | Allows a resource to be associated with specific groups within an organization for billing purposes. |
| Partnership | owner = jsmith, contactalias = catsearchowners, stakeholders = user1;user2;user3 | Provides information about what people (outside of IT) are related to or otherwise affected by the resource. |
| Purpose | businessprocess=support, businessimpact=moderate, revenueimpact=high | Aligns resources to business functions to better support investment decisions. |
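As a simple illustration of the functional, accounting, and partnership patterns above, the following sketch applies several tags to a resource group with the Azure SDK for Python. It is a minimal example rather than prescriptive guidance; the subscription ID, resource group name, and tag values are placeholders you would replace with your own.

```python
# pip install azure-identity azure-mgmt-resource
from azure.identity import DefaultAzureCredential
from azure.mgmt.resource import ResourceManagementClient

credential = DefaultAzureCredential()
client = ResourceManagementClient(credential, "<subscription-id>")

# Create (or update) a resource group with functional, accounting, and partnership tags.
client.resource_groups.create_or_update(
    "catalogsearch-prod-rg",
    {
        "location": "eastus",
        "tags": {
            "app": "catalogsearch1",
            "env": "prod",
            "department": "finance",
            "owner": "jsmith",
        },
    },
)
```

Tags applied at the resource group level are not inherited by the resources inside it by default, so required tags are often also enforced through policy and applied in deployment templates.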

Learn more
For more information about naming and tagging in Azure, see:
Naming conventions for Azure resources. Refer to this guidance for recommended naming conventions for
Azure resources.
Use tags to organize your Azure resources. You can apply tags in Azure at both the resource group and
individual resource level, giving you flexibility in the granularity of any accounting reports based on applied
tags.

Next steps
Resource tagging is just one of the core infrastructure components requiring architectural decisions during a
cloud adoption process. Visit the decision guides overview to learn about alternative patterns or models used
when making design decisions for other types of infrastructure.
Architectural decision guides
Encryption decision guide

Encrypting data protects it against unauthorized access. Properly implemented encryption policy provides
additional layers of security for your cloud-based workloads and guards against attackers and other unauthorized
users from both inside and outside your organization and networks.
Jump to: Key management | Data encryption | Learn more
Cloud encryption strategy focuses on corporate policy and compliance mandates. Encrypting resources is
desirable, and many Azure services such as Azure Storage and Azure SQL Database enable encryption by
default. However, encryption has costs that can increase latency and overall resource usage.
For demanding workloads, striking the correct balance between encryption and performance, and determining
how data and traffic is encrypted can be essential. Encryption mechanisms can vary in cost and complexity, and
both technical and policy requirements can influence your decisions on how encryption is applied and how you
store and manage critical secrets and keys.
Corporate policy and third-party compliance are the biggest drivers when planning an encryption strategy. Azure
provides multiple standard mechanisms that can meet common requirements for encrypting data, whether at rest
or in transit. However, for policies and compliance requirements that demand tighter controls, such as
standardized secrets and key management, encryption in-use, or data-specific encryption, you will need to
develop a more sophisticated encryption strategy to support these requirements.

Key management
Encryption of data in the cloud depends on the secure storage, management, and operational use of encryption
keys. A key management system is critical to your organization's ability to create, store, and manage
cryptographic keys, as well as important passwords, connection strings, and other confidential IT information.
Modern key management systems such as Azure Key Vault support storage and management of software-protected
keys for dev and test usage and hardware security module (HSM)-protected keys for maximum
protection of production workloads or sensitive data.
When planning a cloud migration, the following table can help you decide how to store and manage encryption
keys, certificates, and secrets, which are critical for creating secure and manageable cloud deployments:

| QUESTION | CLOUD-NATIVE | BRING YOUR OWN KEY | HOLD YOUR OWN KEY |
| --- | --- | --- | --- |
| Does your organization lack centralized key and secret management? | Yes | No | No |
| Will you need to limit the creation of keys and secrets to your on-premises hardware, while using these keys in the cloud? | No | Yes | No |
| Does your organization have rules or policies in place that would prevent keys from being stored offsite? | No | No | Yes |

Cloud-native
With cloud-native key management, all keys and secrets are generated, managed, and stored in a cloud-based
vault such as Azure Key Vault. This approach simplifies many IT tasks related to key management, such as key
backup, storage, and renewal.
Using a cloud-native key management system includes these assumptions:
You trust the cloud key management solution with creating, managing, and hosting your organization's secrets
and keys.
You enable all on-premises applications and services that rely on accessing encryption services or secrets to
access the cloud key management system.
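To make the cloud-native option concrete, the following sketch stores and retrieves a secret in Azure Key Vault using the Python SDK. It assumes a vault already exists and the caller has been granted access; the vault URL and secret values are placeholders.

```python
# pip install azure-identity azure-keyvault-secrets
from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient

credential = DefaultAzureCredential()
client = SecretClient(
    vault_url="https://contoso-vault.vault.azure.net",  # placeholder vault URL
    credential=credential,
)

# Store a connection string centrally instead of embedding it in app configuration.
client.set_secret("sql-connection-string", "<connection-string>")

# Applications retrieve the secret at run time through the same vault.
retrieved = client.get_secret("sql-connection-string")
print(retrieved.name, retrieved.properties.version)
```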
Bring your own key
With a bring-your-own-key approach, you generate keys on dedicated HSM hardware within your on-premises
environment, then securely transfer these keys to a cloud-based management system such as Azure Key Vault
for use with your cloud-hosted resources.
Bring your own key assumptions: Generating keys on-premises and using them with a cloud-based key
management system includes these assumptions:
You trust the underlying security and access control infrastructure of the cloud platform for hosting and using
your keys and secrets.
Your cloud-hosted applications or services are able to access and use keys and secrets in a robust and secure
way.
You are required by regulatory or organizational policy to keep the creation and management of your
organization's secrets and keys on-premises.
On-premises (hold your own key)
Certain scenarios might have regulatory, policy, or technical reasons prohibiting the storage of keys on a cloud-
based key management system. If so, you must generate keys using on-premises hardware, store and manage
them using an on-premises key management system, and establish a way for cloud-based resources to access
these keys for encryption purposes. Note that holding your own key might not be compatible with all Azure-
based services.
On-premises key management assumptions: Using an on-premises key management system includes these
assumptions:
You are required by regulatory or organizational policy to keep the creation, management, and hosting of your
organization's secrets and keys on-premises.
Any cloud-based applications or services that rely on accessing encryption services or secrets can access the
on-premises key management system.

Data encryption
Consider several different states of data with different encryption needs when planning your encryption policy:
| DATA STATE | DATA |
| --- | --- |
| Data in transit | Internal network traffic, internet connections, connections between datacenters or virtual networks |
| Data at rest | Databases, files, virtual drives, PaaS storage |
| Data in use | Data loaded in RAM or in CPU caches |

Data in transit
Data in transit is data moving between resources on internal networks, between datacenters or external networks, or
over the internet.
Data in transit is usually encrypted by requiring SSL/TLS protocols for network traffic. Always encrypt traffic
between your cloud-hosted resources and external networks or the public internet. PaaS resources typically
enforce SSL/TLS encryption by default. Your cloud adoption teams and workload owners should consider
enforcing encryption for traffic between IaaS resources hosted inside your virtual networks.
Assumptions about encrypting data in transit: Implementing proper encryption policy for data in transit
assumes the following:
All publicly accessible endpoints in your cloud environment will communicate with the public internet using
SSL/TLS protocols.
When connecting cloud networks to on-premises or other external networks over the public internet, you will use
encrypted VPN protocols.
When connecting cloud networks to on-premises or other external networks using a dedicated WAN
connection such as ExpressRoute, you will use a VPN or other encryption appliance on-premises paired with a
corresponding virtual VPN or encryption appliance deployed to your cloud network.
If you have sensitive data that shouldn't be included in traffic logs or other diagnostics reports visible to IT
staff, you will encrypt all traffic between resources in your virtual network.
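As one concrete way of enforcing encryption in transit, the sketch below requires HTTPS-only traffic and a minimum TLS version on an existing storage account through the Azure management SDK for Python. The subscription, resource group, and account names are placeholders, and your own policy may call for additional controls such as VPN or ExpressRoute encryption.

```python
# pip install azure-identity azure-mgmt-storage
from azure.identity import DefaultAzureCredential
from azure.mgmt.storage import StorageManagementClient

credential = DefaultAzureCredential()
client = StorageManagementClient(credential, "<subscription-id>")

# Reject unencrypted (HTTP) requests and require a modern TLS version.
client.storage_accounts.update(
    "workload-rg",
    "workloadstorage01",
    {
        "enable_https_traffic_only": True,
        "minimum_tls_version": "TLS1_2",
    },
)
```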
Data at rest
Data at rest represents any data not being actively moved or processed, including files, databases, virtual machine
drives, PaaS storage accounts, or similar assets. Encrypting stored data protects virtual devices or files against
unauthorized access either from external network penetration, rogue internal users, or accidental releases.
PaaS storage and database resources generally enforce encryption by default. IaaS resources can be secured by
encrypting data at the virtual disk level or by encrypting the entire storage account hosting your virtual drives. All
of these assets can make use of either Microsoft-managed or customer-managed keys stored in Azure Key Vault.
Encryption for data at rest also encompasses more advanced database encryption techniques, such as column-level
and row-level encryption, providing much more control over exactly what data is being secured.
Your overall policy and compliance requirements, the sensitivity of the data being stored, and the performance
requirements of your workloads should determine which assets require encryption.
Assumptions about encrypting data at rest
Encrypting data at rest assumes the following:
You are storing data that is not meant for public consumption.
Your workloads can accept the added latency cost of disk encryption.
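Where policy calls for customer-managed rather than Microsoft-managed keys, the key itself is typically created in Key Vault and then referenced from the storage or disk encryption settings. The following sketch covers only that first step, creating an RSA key with the Python SDK; the vault URL and key name are placeholders.

```python
# pip install azure-identity azure-keyvault-keys
from azure.identity import DefaultAzureCredential
from azure.keyvault.keys import KeyClient

credential = DefaultAzureCredential()
client = KeyClient(
    vault_url="https://contoso-vault.vault.azure.net",  # placeholder vault URL
    credential=credential,
)

# Create an RSA key to reference as a customer-managed key for encryption at rest.
key = client.create_rsa_key("storage-cmk", size=2048)
print(key.name, key.key_type, key.properties.version)
```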
Data in use
Encryption for data in use involves securing data in nonpersistent storage, such as RAM or CPU caches. It can be
accomplished through technologies such as full memory encryption and enclave technologies such as Intel's Software
Guard Extensions (SGX). It also includes cryptographic techniques, such as homomorphic encryption, that can be used
to create secure, trusted execution environments.
Assumptions about encrypting data in use: Encrypting data in use assumes the following:
You are required to maintain data ownership separate from the underlying cloud platform at all times, even at
the RAM and CPU level.

Learn more
For more information about encryption and key management in Azure, see:
Azure encryption overview . A detailed description of how Azure uses encryption to secure both data at rest
and data in transit.
Azure Key Vault. Key Vault is the primary key management system for storing and managing cryptographic
keys, secrets, and certificates within Azure.
Azure Data Security and Encryption Best Practices. A discussion of Azure data security and encryption best
practices.
Confidential computing in Azure. Azure's confidential computing initiative provides tools and technology to
create trusted execution environments or other encryption mechanisms to secure data in use.

Next steps
Encryption is just one of the core infrastructure components requiring architectural decisions during a cloud
adoption process. Visit the decision guides overview to learn about alternative patterns or models used when
making design decisions for other types of infrastructure.
Architectural decision guides
Software Defined Networking decision guide

Software Defined Networking (SDN) is a network architecture designed to allow virtualized networking
functionality that can be centrally managed, configured, and modified through software. SDN enables the
creation of cloud-based networks using the virtualized equivalents to physical routers, firewalls, and other
networking devices used in on-premises networks. SDN is critical to creating secure virtual networks on public
cloud platforms such as Azure.

Networking decision guide

Jump to: PaaS-only | Cloud-native | Cloud DMZ | Hybrid | Hub and spoke | Learn more
SDN provides several options with varying degrees of pricing and complexity. This decision guide
provides a reference to quickly personalize these options to best align with specific business and technology
strategies.
The inflection point in this guide depends on several key decisions that your cloud strategy team has made
before making decisions about networking architecture. Most important among these are decisions involving
your digital estate definition and subscription design (which may also require inputs from decisions made
related to your cloud accounting and global markets strategies).
Small single-region deployments of fewer than 1,000 VMs are less likely to be significantly affected by this
inflection point. Conversely, large adoption efforts with more than 1,000 VMs, multiple business units, or
multiple geopolitical markets could be substantially affected by your SDN decision and this key inflection point.

Choose the right virtual networking architectures


This section expands on the decision guide to help you choose the right virtual networking architectures.
There are many ways to implement SDN technologies to create cloud-based virtual networks. How you
structure the virtual networks used in your migration and how those networks interact with your existing IT
infrastructure will depend on a combination of the workload requirements and your governance requirements.
When deciding which virtual networking architecture or combination of architectures to use for your cloud
migration, consider the following questions to help determine what's right for your organization:
| QUESTION | PAAS-ONLY | CLOUD-NATIVE | CLOUD DMZ | HYBRID | HUB AND SPOKE |
| --- | --- | --- | --- | --- | --- |
| Will your workload only use PaaS services and not require networking capabilities beyond those provided by the services themselves? | Yes | No | No | No | No |
| Does your workload require integration with on-premises applications? | No | No | Yes | Yes | Yes |
| Have you established mature security policies and secure connectivity between your on-premises and cloud networks? | No | No | No | Yes | Yes |
| Does your workload require authentication services not supported through cloud identity services, or do you need direct access to on-premises domain controllers? | No | No | No | Yes | Yes |
| Will you need to deploy and manage a large number of VMs and workloads? | No | No | No | No | Yes |
| Will you need to provide centralized management and on-premises connectivity while delegating control over resources to individual workload teams? | No | No | No | No | Yes |
Virtual networking architectures
Learn more about the primary software defined networking architectures:
PaaS-only: Most platform as a service (PaaS) products support a limited set of built-in networking features
and may not require an explicitly defined software defined network to support workload requirements.
Cloud-native: A cloud-native architecture supports cloud-based workloads using virtual networks built on
the cloud platform's default software defined networking capabilities, without reliance on on-premises or
other external resources.
Cloud DMZ: Supports limited connectivity between your on-premises and cloud networks, secured through
the implementation of a demilitarized zone tightly controlling traffic between the two environments.
Hybrid: The hybrid cloud network architecture allows virtual networks in trusted cloud environments to
access your on-premises resources and vice versa.
Hub and spoke: The hub and spoke architecture allows you to centrally manage external connectivity and
shared services, isolate individual workloads, and overcome potential subscription limits.

Learn more
For more information about Software Defined Networking in Azure, see:
Azure Virtual Network. On Azure, the core SDN capability is provided by Azure Virtual Network, which acts
as a cloud analog to physical on-premises networks. Virtual networks also act as a default isolation boundary
between resources on the platform.
Azure best practices for network security. Recommendations from the Azure Security team on how to
configure your virtual networks to minimize security vulnerabilities.

Next steps
Software defined networking is just one of the core infrastructure components requiring architectural decisions
during a cloud adoption process. Visit the decision guides overview to learn about alternative patterns or models
used when making design decisions for other types of infrastructure.
Architectural decision guides
Software Defined Networking: PaaS-only

When you implement a platform as a service (PaaS) resource, the deployment process automatically creates an
assumed underlying network with a limited number of controls over that network, including load balancing, port
blocking, and connections to other PaaS services.
In Azure, several PaaS resource types can be deployed into or connected to a virtual network, allowing these
resources to integrate with your existing virtual networking infrastructure. Other services, such as App Service
Environments, Azure Kubernetes Service (AKS), and Service Fabric, must be deployed within a virtual network.
However, in many cases a PaaS-only networking architecture, relying only on the default native networking
capabilities provided by PaaS resources, is sufficient to meet a workload's connectivity and traffic management
requirements.
If you are considering a PaaS-only networking architecture, be sure you validate that the required assumptions
align with your requirements.

PaaS-only assumptions
Deploying a PaaS-only networking architecture assumes the following:
The application being deployed is a standalone application or depends only on other PaaS resources that do
not require a virtual network.
Your IT operations teams can update their tools, training, and processes to support management,
configuration, and deployment of standalone PaaS applications.
The PaaS application is not part of a broader cloud migration effort that will include IaaS resources.
These assumptions are minimum qualifiers aligned to deploying a PaaS-only network. While this approach may
align with the requirements of a single application deployment, each cloud adoption team should consider these
long-term questions:
Will this deployment expand in scope or scale to require access to other non-PaaS resources?
Are other PaaS deployments planned beyond the current solution?
Does the organization have plans for other future cloud migrations?
The answers to these questions would not preclude a team from choosing a PaaS-only option but should be
considered before making a final decision.
Software Defined Networking: Cloud-native

A cloud-native virtual network is required when deploying IaaS resources such as virtual machines to a cloud
platform. Access to virtual networks from external sources, such as the web, needs to be explicitly provisioned.
These types of virtual networks support the creation of subnets, routing rules, and virtual firewall and traffic
management devices.
A cloud-native virtual network has no dependencies on your organization's on-premises or other noncloud
resources to support the cloud-hosted workloads. All required resources are provisioned either in the virtual
network itself or by using managed PaaS offerings.

Cloud-native assumptions
Deploying a cloud-native virtual network assumes the following:
The workloads you deploy to the virtual network have no dependencies on applications or services that are
accessible only from inside your on-premises network. Unless they provide endpoints accessible over the
public internet, applications and services hosted internally on-premises are not usable by resources hosted on
a cloud platform.
Your workload's identity management and access control depends on the cloud platform's identity services or
IaaS servers hosted in your cloud environment. You will not need to directly connect to identity services hosted
on-premises or other external locations.
Your identity services do not need to support single sign-on (SSO ) with on-premises directories.
Cloud-native virtual networks have no external dependencies. This makes them simple to deploy and configure,
and as a result this architecture is often the best choice for experiments or other smaller self-contained or rapidly
iterating deployments.
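For illustration, a minimal cloud-native virtual network with a single subnet can be created with the Azure management SDK for Python, as sketched below. The subscription, resource group, region, and address ranges are placeholders, and production designs usually add network security groups and route tables.

```python
# pip install azure-identity azure-mgmt-network
from azure.identity import DefaultAzureCredential
from azure.mgmt.network import NetworkManagementClient

credential = DefaultAzureCredential()
client = NetworkManagementClient(credential, "<subscription-id>")

# Create a self-contained virtual network with one workload subnet.
poller = client.virtual_networks.begin_create_or_update(
    "workload-rg",
    "workload-vnet",
    {
        "location": "eastus",
        "address_space": {"address_prefixes": ["10.1.0.0/16"]},
        "subnets": [{"name": "web", "address_prefix": "10.1.1.0/24"}],
    },
)
vnet = poller.result()
print(vnet.name, vnet.provisioning_state)
```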
Additional issues your cloud adoption teams should consider when discussing a cloud-native virtual networking
architecture include:
Existing workloads designed to run in an on-premises datacenter may need extensive modification to take
advantage of cloud-based functionality, such as storage or authentication services.
Cloud-native networks are managed solely through the cloud platform management tools, and therefore may
lead to management and policy divergence from your existing IT standards as time goes on.

Next steps
For more information about cloud-native virtual networking in Azure, see:
Azure Virtual Network: How-to guides. Newly created Azure Virtual Networks are cloud-native by default. Use
these guides to help plan the design and deployment of your virtual networks.
Subscription limits: Networking. Any single virtual network and connected resources can only exist within a
single subscription, and are bound by subscription limits.
Software Defined Networking: Cloud DMZ

The Cloud DMZ network architecture allows limited access between your on-premises and cloud-based networks,
using a virtual private network (VPN) to connect the networks. Although a DMZ model is commonly used when
you want to secure external access to a network, the Cloud DMZ architecture discussed here is intended
specifically to secure access to the on-premises network from cloud-based resources and vice versa.

This architecture is designed to support scenarios where your organization wants to start integrating cloud-based
workloads with on-premises workloads but may not have fully matured cloud security policies or acquired a
secure dedicated WAN connection between the two environments. As a result, cloud networks should be treated
like a demilitarized zone to ensure on-premises services are secure.
The DMZ deploys network virtual appliances (NVAs) to implement security functionality such as firewalls and
packet inspection. Traffic passing between on-premises and cloud-based applications or services must pass
through the DMZ where it can be audited. VPN connections and the rules determining what traffic is allowed
through the DMZ network are strictly controlled by IT security teams.

Cloud DMZ assumptions


Deploying a cloud DMZ includes the following assumptions:
Your security teams have not fully aligned on-premises and cloud-based security requirements and policies.
Your cloud-based workloads require access to a limited subset of services hosted on your on-premises or third-party
networks, or users or applications in your on-premises environment need limited access to cloud-hosted
resources.
Implementing a VPN connection between your on-premises networks and cloud provider is not prevented by
corporate policy, regulatory requirements, or technical compatibility issues.
Your workloads either do not require multiple subscriptions to bypass subscription resource limits, or they
involve multiple subscriptions but don't require central management of connectivity or shared services used
by resources spread across multiple subscriptions.
Your cloud adoption teams should consider the following issues when looking at implementing a Cloud DMZ
virtual networking architecture:
Connecting on-premises networks with cloud networks increases the complexity of your security
requirements. Even though connections between cloud networks and the on-premises environment are
secured, you still need to ensure cloud resources are secured. Any public IPs created to access cloud-based
workloads need to be properly secured using a public facing DMZ or Azure Firewall.
The Cloud DMZ architecture is commonly used as a stepping stone while connectivity is further secured and
security policy aligned between on-premises and cloud networks, allowing a broader adoption of a full-scale
hybrid networking architecture. However, it may also apply to isolated deployments with specific security,
identity, and connectivity needs that the Cloud DMZ approach satisfies.

Learn more
For more information about implementing a Cloud DMZ in Azure, see:
Implement a DMZ between Azure and your on-premises datacenter. This article discusses how to implement a
secure hybrid network architecture in Azure.
Software Defined Networking: Hybrid network

The hybrid cloud network architecture allows virtual networks to access your on-premises resources and services
and vice versa, using a dedicated WAN connection such as ExpressRoute or another connection method to directly
connect the networks.

Building on the cloud-native virtual network architecture, a hybrid virtual network is isolated when initially
created. Adding connectivity to the on-premises environment grants access to and from the on-premises network,
although all other inbound traffic targeting resources in the virtual network needs to be explicitly allowed. You can
secure the connection using virtual firewall devices and routing rules to limit access, or you can specify exactly
what services can be accessed between the two networks using cloud-native routing features or by deploying
network virtual appliances (NVAs) to manage traffic.
Although the hybrid networking architecture supports VPN connections, dedicated WAN connections like
ExpressRoute are preferred due to higher performance and increased security.

Hybrid assumptions
Deploying a hybrid virtual network includes the following assumptions:
Your IT security teams have aligned on-premises and cloud-based network security policy to ensure cloud-
based virtual networks can be trusted to communicate directly with on-premises systems.
Your cloud-based workloads require access to storage, applications, and services hosted on your on-premises
or third-party networks, or your users or applications in your on-premises environment need access to cloud-hosted
resources.
You need to migrate existing applications and services that depend on on-premises resources, but don't want
to expend the resources on redevelopment to remove those dependencies.
Connecting your on-premises networks to cloud resources over VPN or dedicated WAN is not prevented by
corporate policy, data sovereignty requirements, or other regulatory compliance issues.
Your workloads either do not require multiple subscriptions to bypass subscription resource limits, or your
workloads involve multiple subscriptions but do not require central management of connectivity or shared
services used by resources spread across multiple subscriptions.
Your cloud adoption teams should consider the following issues when looking at implementing a hybrid virtual
networking architecture:
Connecting on-premises networks with cloud networks increases the complexity of your security requirements.
Both networks must be secured against external vulnerabilities and unauthorized access from both sides of the
hybrid environment.
Scaling the number and size of workloads within a hybrid cloud environment can add significant complexity to
routing and traffic management.
You will need to develop compatible management and access control policies to maintain consistent
governance throughout your organization.

Learn more
For more information about hybrid networking in Azure, see:
Hybrid network reference architecture. Azure hybrid virtual networks use either an ExpressRoute circuit or
Azure VPN to connect your virtual network with your organization's existing IT assets not hosted in Azure. This
article discusses the options for creating a hybrid network in Azure.
Software Defined Networking: Hub and spoke

The hub and spoke networking model organizes your Azure-based cloud network infrastructure into multiple
connected virtual networks. This model allows you to more efficiently manage common communication or
security requirements and deal with potential subscription limitations.
In the hub and spoke model, the hub is a virtual network that acts as a central location for managing external
connectivity and hosting services used by multiple workloads. The spokes are virtual networks that host
workloads and connect to the central hub through virtual network peering.
All traffic passing in or out of the workload spoke networks is routed through the hub network where it can be
routed, inspected, or otherwise managed by centrally managed IT rules or processes.
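To show what the spoke-to-hub relationship looks like in practice, the sketch below creates one side of a virtual network peering with the Azure management SDK for Python. Peering must be configured from both virtual networks; the subscription IDs, resource names, and settings shown here are placeholders, and use_remote_gateways is enabled only when the hub hosts a VPN or ExpressRoute gateway.

```python
# pip install azure-identity azure-mgmt-network
from azure.identity import DefaultAzureCredential
from azure.mgmt.network import NetworkManagementClient

credential = DefaultAzureCredential()
client = NetworkManagementClient(credential, "<spoke-subscription-id>")

hub_vnet_id = (
    "/subscriptions/<hub-subscription-id>/resourceGroups/hub-rg"
    "/providers/Microsoft.Network/virtualNetworks/hub-vnet"
)

# Peer the spoke virtual network to the central hub (one direction of the pair).
client.virtual_network_peerings.begin_create_or_update(
    "spoke-rg",
    "spoke-vnet",
    "spoke-to-hub",
    {
        "remote_virtual_network": {"id": hub_vnet_id},
        "allow_virtual_network_access": True,
        "allow_forwarded_traffic": True,
        "use_remote_gateways": False,  # set True only if the hub provides a gateway
    },
).result()
```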
This model aims to address each of the following concerns:
Cost savings and management efficiency. Centralizing services that can be shared by multiple workloads,
such as network virtual appliances (NVAs) and DNS servers, in a single location allows IT to minimize
redundant resources and management effort across multiple workloads.
Overcoming subscription limits. Large cloud-based workloads may require the use of more resources than
are allowed within a single Azure subscription (see subscription limits). Peering workload virtual networks
from different subscriptions to a central hub can overcome these limits.
Separation of concerns. The ability to divide responsibility for individual workloads between central IT teams and
workload teams.
The following diagram shows an example hub and spoke architecture including centrally managed hybrid
connectivity.

The hub and spoke architecture is often used alongside the hybrid networking architecture, providing a centrally
managed connection to your on-premises environment shared between multiple workloads. In this scenario, all
traffic traveling between the workloads and on-premises passes through the hub where it can be managed and
secured.

Hub and spoke assumptions


Implementing a hub and spoke virtual networking architecture assumes the following:
Your cloud deployments will involve workloads hosted in separate working environments, such as
development, test, and production, that all rely on a set of common services such as DNS or directory services.
Your workloads do not need to communicate with each other but have common external communications and
shared services requirements.
Your workloads require more resources than are available within a single Azure subscription.
You need to provide workload teams with delegated management rights over their own resources while
maintaining central security control over external connectivity.

Global hub and spoke


Hub and spoke architectures are commonly implemented with virtual networks deployed to the same Azure
Region to minimize latency between networks. However, large organizations with global reach may need to
deploy workloads across multiple regions for availability, disaster recovery, or regulatory requirements. The hub
and spoke model can make use of Azure global virtual network peering to extend centralized management and shared
services across regions and support workloads distributed across the world.

Learn more
For examples of how to implement hub and spoke networks on Azure, see the following examples on the Azure
Reference Architectures site:
Implement a hub and spoke network topology in Azure
Implement a hub and spoke network topology with shared services in Azure
Logging and reporting decision guide

All organizations need mechanisms for notifying IT teams of performance, uptime, and security issues before
they become serious problems. A successful monitoring strategy allows you to understand how the individual
components that make up your workloads and networking infrastructure are performing. Within the context of a
public cloud migration, integrating logging and reporting with any of your existing monitoring systems, while
surfacing important events and metrics to the appropriate IT staff, is critical in ensuring your organization is
meeting uptime, security, and policy compliance goals.

Jump to: Planning your monitoring infrastructure | Cloud-native | On-premises extension | Gateway aggregation
| Hybrid monitoring (on-premises) | Hybrid monitoring (cloud-based) | Multicloud | Learn more
The inflection point when determining a cloud logging and reporting strategy is based primarily on existing
investments your organization has made in operational processes, and to some degree any requirements you
have to support a multicloud strategy.
Activities in the cloud can be logged and reported in multiple ways. Cloud-native and centralized logging are two
common managed service options that are driven by the subscription design and the number of subscriptions.

Plan your monitoring infrastructure


When planning your deployment, you need to consider where logging data is stored and how you will integrate
cloud-based reporting and monitoring services with your existing processes and tools.

| QUESTION | CLOUD-NATIVE | ON-PREMISES EXTENSION | HYBRID MONITORING | GATEWAY AGGREGATION |
| --- | --- | --- | --- | --- |
| Do you have an existing on-premises monitoring infrastructure? | No | Yes | Yes | No |
| Do you have requirements preventing storage of log data on external storage locations? | No | Yes | No | No |
| Do you need to integrate cloud monitoring with on-premises systems? | No | No | Yes | No |
| Do you need to process or filter telemetry data before submitting it to your monitoring systems? | No | No | No | Yes |

Cloud-native
If your organization currently lacks established logging and reporting systems, or if your planned deployment
does not need to be integrated with existing on-premises or other external monitoring systems, a cloud-native
SaaS solution such as Azure Monitor, is the simplest choice.
In this scenario, all log data is recorded and stored in the cloud, while the logging and reporting tools that
process and surface information to IT staff are provided by the Azure platform and Azure Monitor.
Custom Azure Monitor-based logging solutions can be implemented ad hoc for each subscription or workload
in smaller or experimental deployments, or organized in a centralized manner to monitor log data across
your entire cloud estate.
Cloud-native assumptions: Using a cloud-native logging and reporting system assumes the following:
You do not need to integrate the log data from your cloud workloads into existing on-premises systems.
You will not be using your cloud-based reporting systems to monitor on-premises systems.
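As a small illustration of the cloud-native option, the following sketch runs a Kusto query against a Log Analytics workspace with the azure-monitor-query library for Python. The workspace ID is a placeholder and the query is only an example.

```python
# pip install azure-identity azure-monitor-query
from datetime import timedelta

from azure.identity import DefaultAzureCredential
from azure.monitor.query import LogsQueryClient

credential = DefaultAzureCredential()
client = LogsQueryClient(credential)

# Summarize heartbeat records from the last hour to confirm agents are reporting.
response = client.query_workspace(
    workspace_id="<workspace-id>",
    query="Heartbeat | summarize heartbeats = count() by Computer",
    timespan=timedelta(hours=1),
)

for table in response.tables:
    for row in table.rows:
        print(row)
```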
On-premises extension
It might require substantial redevelopment effort for applications and services migrating to the cloud to use
cloud-based logging and reporting solutions such as Azure Monitor. In these cases, consider allowing these
workloads to continue sending telemetry data to existing on-premises systems.
To support this approach, your cloud resources will need to be able to communicate directly with your on-
premises systems through a combination of hybrid networking and cloud hosted domain services. With this in
place, the cloud virtual network functions as a network extension of the on-premises environment. Therefore,
cloud hosted workloads can communicate directly with your on-premises logging and reporting system.
This approach capitalizes on your existing investment in monitoring tooling with limited modification to any
cloud-deployed applications or services. This is often the fastest approach to support monitoring during a lift
and shift migration. However, it won't capture log data produced by cloud-based PaaS and SaaS resources, and
it will omit any VM-related logs generated by the cloud platform itself, such as VM status. As a result, this pattern
should be a temporary solution until a more comprehensive hybrid monitoring solution is implemented.
On-premises–only assumptions:
You need to maintain log data only in your on-premises environment, either in support of technical
requirements or due to regulatory or policy requirements.
Your on-premises systems do not support hybrid logging and reporting or gateway aggregation solutions.
Your cloud-based applications can submit telemetry directly to your on-premises logging systems, or
monitoring agents that submit data to on-premises systems can be deployed to workload VMs.
Your workloads don't depend on PaaS or SaaS services that require cloud-based logging and reporting.
Gateway aggregation
For scenarios where the amount of cloud-based telemetry data is large or existing on-premises monitoring
systems need log data modified before it can be processed, a log data gateway aggregation service might be
required.
A gateway service is deployed to your cloud provider. Then, relevant applications and services are configured to
submit telemetry data to the gateway instead of a default logging system. The gateway can then process the
data: aggregating, combining, or otherwise formatting it before submitting it to your monitoring service for
ingestion and analysis.
Also, a gateway can be used to aggregate and preprocess telemetry data bound for cloud-native or hybrid
systems.
Gateway aggregation assumptions:
You expect large volumes of telemetry data from your cloud-based applications or services.
You need to format or otherwise optimize telemetry data before submitting it to your monitoring systems.
Your monitoring systems have APIs or other mechanisms available to ingest log data after processing by the
gateway.
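The gateway itself is usually an existing product (for example, an event-streaming or log-shipping service), but the core idea can be sketched in a few lines of Python: buffer incoming events, normalize them to a common schema, and forward batches to whatever monitoring endpoint you use. The endpoint, schema, and thresholds below are illustrative only.

```python
import json
import time
import urllib.request


class TelemetryGateway:
    """Buffer raw telemetry, normalize it, and forward batches to a monitoring endpoint."""

    def __init__(self, endpoint, batch_size=100, flush_interval=30):
        self.endpoint = endpoint              # illustrative ingestion URL
        self.batch_size = batch_size          # flush after this many events
        self.flush_interval = flush_interval  # or after this many seconds
        self._buffer = []
        self._last_flush = time.monotonic()

    def ingest(self, event):
        # Normalize events into a common schema before they leave the gateway.
        self._buffer.append({
            "timestamp": event.get("time") or event.get("timestamp"),
            "source": event.get("source", "unknown"),
            "severity": event.get("severity", "info"),
            "message": event.get("message", ""),
        })
        due = time.monotonic() - self._last_flush >= self.flush_interval
        if len(self._buffer) >= self.batch_size or due:
            self.flush()

    def flush(self):
        if not self._buffer:
            return
        payload = json.dumps(self._buffer).encode("utf-8")
        request = urllib.request.Request(
            self.endpoint,
            data=payload,
            headers={"Content-Type": "application/json"},
        )
        urllib.request.urlopen(request)  # forward the batch for ingestion and analysis
        self._buffer.clear()
        self._last_flush = time.monotonic()
```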
Hybrid monitoring (on-premises)
A hybrid monitoring solution combines log data from both your on-premises and cloud resources to provide an
integrated view into your IT estate's operational status.
If you have an existing investment in on-premises monitoring systems that would be difficult or costly to replace,
you might need to integrate the telemetry from your cloud workloads into preexisting on-premises monitoring
solutions. In a hybrid on-premises monitoring system, on-premises telemetry data continues to use the existing
on-premises monitoring system. Cloud-based telemetry data is either sent to the on-premises monitoring
system directly, or the data is sent to Azure Monitor then compiled and ingested into the on-premises system at
regular intervals.
On-premises hybrid monitoring assumptions: Using an on-premises logging and reporting system for
hybrid monitoring assumes the following:
You need to use existing on-premises reporting systems to monitor cloud workloads.
You need to maintain ownership of log data on-premises.
Your on-premises management systems have APIs or other mechanisms available to ingest log data from
cloud-based systems.

TIP
As part of the iterative nature of cloud migration, transitioning from distinct cloud-native and on-premises monitoring to a
partial hybrid approach is likely as the integration of cloud-based resources and services into your overall IT estate
matures.

Hybrid monitoring (cloud-based)


If you do not have a compelling need to maintain an on-premises monitoring system, or you want to replace on-
premises monitoring systems with a centralized cloud-based solution, you can also choose to integrate on-
premises log data with Azure Monitor to provide a centralized cloud-based monitoring system.
Mirroring the on-premises-centered approach, in this scenario cloud-based workloads would submit telemetry
directly to Azure Monitor, and on-premises applications and services would either submit telemetry directly to
Azure Monitor or aggregate that data on-premises for ingestion into Azure Monitor at regular intervals. Azure
Monitor would then serve as your primary monitoring and reporting system for your entire IT estate.
Cloud-based hybrid monitoring assumptions: Using cloud-based logging and reporting systems for hybrid
monitoring assumes the following:
You don't depend on existing on-premises monitoring systems.
Your workloads do not have regulatory or policy requirements to store log data on-premises.
Your cloud-based monitoring systems have APIs or other mechanisms available to ingest log data from on-
premises applications and services.
Multicloud
Integrating logging and reporting capabilities across multiple cloud platforms can be complicated. Services
offered between platforms are often not directly comparable, and logging and telemetry capabilities provided by
these services differ as well. Multicloud logging support often requires the use of gateway services to process
log data into a common format before submitting data to a hybrid logging solution.

Learn more
Azure Monitor is the default reporting and monitoring service for Azure. It provides:
A unified platform for collecting app telemetry, host telemetry (such as VMs), container metrics, Azure
platform metrics, and event logs.
Visualization, queries, alerts, and analytical tools. It can provide insights into virtual machines, guest
operating systems, virtual networks, and workload application events.
REST APIs for integration with external services and automation of monitoring and alerting services.
Integration with many popular third-party vendors.

Next steps
Logging and reporting is just one of the core infrastructure components requiring architectural decisions during
a cloud adoption process. Visit the decision guides overview to learn about alternative patterns or models used
when making design decisions for other types of infrastructure.
Architectural decision guides
Migration tools decision guide

The strategy and tools you use to migrate an application to Azure will largely depend on your business
motivations, technology strategies, and timelines, as well as a deep understanding of the actual workload and
assets (infrastructure, apps, and data) being migrated. The following decision tree serves as high-level guidance
for selecting the best tools to use based on migration decisions. Treat this decision tree as a starting point.
The choice to migrate using platform as a service (PaaS ) or infrastructure as a service (IaaS ) technologies is
driven by the balance between cost, time, existing technical debt, and long-term returns. IaaS is often the fastest
path to the cloud with the least amount of required change to the workload. PaaS could require modifications to
data structures or source code, but produces substantial long-term returns in the form of reduced operating costs
and greater technical flexibility. In the following diagram, the term modernize is used to reflect a decision to
modernize an asset during migration and migrate the modernized asset to a PaaS platform.

Key questions
Answering the following questions will allow you to make decisions based on the above tree.
Would modernization of the application platform during migration prove to be a wise investment
of time, energy, and budget? PaaS technologies such as Azure App Service or Azure Functions can increase
deployment flexibility and reduce the complexity of managing virtual machines to host applications. However,
applications may require refactoring before they can take advantage of these cloud-native capabilities,
potentially adding significant time and cost to a migration effort. If your application can migrate to PaaS
technologies with a minimum of modifications, it is likely a good candidate for modernization. If extensive
refactoring would be required, a migration using IaaS -based virtual machines may be a better choice.
Would modernization of the data platform during migration prove to be a wise investment of time,
energy, and budget? As with application migration, Azure PaaS managed storage options, such as Azure
SQL Database, Cosmos DB, and Azure Storage, offer significant management and flexibility benefits, but
migrating to these services may require refactoring of existing data and the applications that use that data.
Data platforms often require significantly less refactoring than the application platform would. As such, it is
very common for the data platform to be modernized, even though the application platform remains the
same. If your data can be migrated to a managed data service with minimal changes, it is a good candidate for
modernization. Data that would require extensive time or cost to be refactored to use these PaaS services may
be better migrated using IaaS -based virtual machines to better match existing hosting capabilities.
Is your application currently running on dedicated virtual machines or sharing hosting with other
applications? Applications running on dedicated virtual machines may be more easily migrated to PaaS
hosting options than applications running on shared servers.
Will your data migration exceed your network bandwidth? Network capacity between your on-premises
data sources and Azure can be a bottleneck on data migration. If the data you need to transfer faces
bandwidth limitations that prevent efficient or timely migration, you may need to look into alternative or
offline transfer mechanisms. The Cloud Adoption Framework's article on migration replication discusses how
replication limits can affect migration efforts. As part of your migration assessment, consult your IT teams to
verify your local and WAN bandwidth is capable of handling your migration requirements. Also see the
expanded scope migration scenario for when storage requirements exceed network capacity during a
migration.
Does your application make use of an existing DevOps pipeline? In many cases Azure Pipelines can be
easily refactored to deploy applications to cloud-based hosting environments.
Does your data have complex data storage requirements? Production applications usually require data
storage that is highly available and offers always-on functionality and similar service uptime and continuity
features. Azure PaaS-based managed database options, such as Azure SQL Database, Azure Database for
MySQL, and Azure Cosmos DB, all offer 99.99% uptime service-level agreements. Conversely, IaaS-based
SQL Server on Azure VMs offers single-instance service-level agreements of 99.95%. If your data cannot be
modernized to use PaaS storage options, guaranteeing higher IaaS uptime will involve more complex data
storage scenarios such as running SQL Server Always-on clusters and continuously syncing data between
instances. This can involve significant hosting and maintenance costs, so balancing uptime requirements,
modernization effort, and overall budgetary impact is important when considering your data migration
options.

Innovation and migration


In line with the Cloud Adoption Framework's emphasis on incremental migration efforts, an initial decision on
migration strategy and tooling does not rule out future innovation efforts to update an application to take
advantage of opportunities presented by the Azure platform. While an initial migration effort may primarily
focus on rehosting using an IaaS approach, you should plan to revisit your cloud-hosted application portfolio
regularly to investigate optimization opportunities.

Learn more
Cloud fundamentals: Overview of Azure compute options: Provides information on the capabilities of
Azure IaaS and PaaS compute options.
Cloud fundamentals: Choose the right data store: Discusses PaaS storage options available on the Azure
platform.
Expanded scope migration: Data requirements exceed network capacity during a migration effort:
Discusses alternative data migration mechanisms for scenarios where data migration is hindered by available
network bandwidth.
SQL Database: Choose the right SQL Server option in Azure: Discussion of the options and business
justifications for choosing to host your SQL Server workloads in a hosted infrastructure (IaaS ) or a hosted
service (PaaS ) environment.
Deploy a basic workload in Azure

The term workload is typically defined as an arbitrary unit of functionality, such as an application or service. It
helps to think about a workload in terms of the code artifacts that are deployed to a server, and also other services
specific to an application. This may be a useful definition for an on-premises application or service, but for cloud
applications it needs to be expanded.
In the cloud, a workload encompasses not only the code artifacts but also the cloud resources that support them.
Cloud resources are included in the definition because of the concept known as "infrastructure as code". As
you learned in how does Azure work?, resources in Azure are deployed by an orchestrator service. This
orchestrator service exposes functionality through a web API, and you can call the web API using several tools
such as PowerShell, the Azure CLI, and the Azure portal. This means that you can specify Azure resources in a
machine-readable file that can be stored along with the code artifacts associated with the application.
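As a hedged illustration of that idea, the sketch below embeds a minimal Resource Manager template in Python and deploys it with the management SDK. The subscription, resource group, storage account name, and API version are placeholders, and in practice the template would live in source control alongside the application code.

```python
# pip install azure-identity azure-mgmt-resource
from azure.identity import DefaultAzureCredential
from azure.mgmt.resource import ResourceManagementClient

credential = DefaultAzureCredential()
client = ResourceManagementClient(credential, "<subscription-id>")

# A machine-readable description of one cloud resource (a storage account).
template = {
    "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
    "contentVersion": "1.0.0.0",
    "resources": [
        {
            "type": "Microsoft.Storage/storageAccounts",
            "apiVersion": "2021-09-01",
            "name": "workloadstorage01",
            "location": "eastus",
            "sku": {"name": "Standard_LRS"},
            "kind": "StorageV2",
        }
    ],
}

# Deploy the template into an existing resource group.
client.deployments.begin_create_or_update(
    "workload-rg",
    "workload-deployment",
    {"properties": {"mode": "Incremental", "template": template}},
).result()
```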
This enables you to define a workload in terms of code artifacts and the necessary cloud resources, thus further
enabling you to isolate workloads. You can isolate workloads by the way resources are organized, by network
topology, or by other attributes. The goal of workload isolation is to associate a workload's specific resources to a
team, so that the team can independently manage all aspects of those resources. This enables multiple teams to
share resource management services in Azure while preventing the unintentional deletion or modification of each
other's resources.
This isolation also enables another concept, known as DevOps. DevOps combines the software development and
IT operations practices described above and adds the use of automation wherever possible. One of the principles
of DevOps is known as continuous integration and continuous delivery
(CI/CD). Continuous integration refers to the automated build processes that are run every time a developer
commits a code change. Continuous delivery refers to the automated processes that deploy this code to various
environments such as a development environment for testing or a production environment for final deployment.

Basic workload
A basic workload is typically defined as a single web application or a virtual network (VNet) with a virtual machine
(VM).

NOTE
This guide does not cover application development. For more information about developing applications on Azure, see the
Azure Application Architecture Guide.

Regardless of whether the workload is a web application or a VM, each of these deployments requires a resource
group. A user with permission to create a resource group must do this before following the steps below.

Basic web application (PaaS)


For a basic web application, select one of the 5-minute quickstarts from the web apps documentation and follow
the steps.
NOTE
Some of the Quickstart guides will deploy a resource group by default. In this case, it's not necessary to create a resource
group explicitly. Otherwise, deploy the web application to the resource group created above.

Once you deploy a simple workload, you can learn more about the best practices for deploying a basic web
application to Azure.

Single Windows or Linux VM (IaaS)


For a simple workload that runs on a VM, the first step is to deploy a virtual network. All infrastructure as a service
(IaaS) resources in Azure, such as virtual machines, load balancers, and gateways, require a virtual network. Learn
about Azure virtual networks, and then follow the steps to deploy a virtual network to Azure using the portal.
When you specify the settings for the virtual network in the Azure portal, be sure to specify the name of the
resource group created above.
The next step is to decide whether to deploy a single Windows or Linux VM. For a Windows VM, follow the steps to
deploy a Windows VM to Azure with the portal. Again, when you specify the settings for the virtual machine in the
Azure portal, specify the name of the resource group created above.
Once you've followed the steps and deployed the VM, you can learn about best practices for running a Windows
VM on Azure. For a Linux VM, follow the steps to deploy a Linux VM to Azure with the portal. You can also learn
more about best practices for running a Linux VM on Azure.

Next steps
See Architectural decision guides for how to use core infrastructure components in the Azure cloud.
In early 2018, Microsoft released the Cloud Operating Model (COM ). The COM was a guide that helped customers understand the
what and the why of digital transformation. This helped customers get a sense of all the areas that needed to be addressed:
business strategy, culture strategy, and technology strategy. What was not included in the COM were the specific how-to's, which
left customers wondering, "Where do we go from here?"
In October 2018, we began a review of all the models that had proliferated across the Microsoft community and found roughly 60
different cloud adoption models. A cross-Microsoft team was established to bring everything together as a dedicated engineering
"product" with defined implementations across services, sales, and marketing. This effort culminated in the creation of a single
model, the Microsoft Cloud Adoption Framework for Azure, designed to help customers understand the what and why and
provide unified guidance on the how to help them accelerate their cloud adoption. The goal of this project is to create a One
Microsoft approach to cloud adoption.

Using Cloud Operating Model practices within the Cloud Adoption Framework
For a similar approach to COM, readers should begin with one of the following:
Begin a cloud migration journey
Innovate through cloud adoption
Enable successful cloud adoption
The guidance previously provided in COM is still relevant to the Cloud Adoption Framework. The experience is different, but the
structure of the Cloud Adoption Framework is simply an expansion of that guidance. To transition from COM to the Cloud
Adoption Framework, an understanding of scope and structure is important. The following two sections describe that transition.

Scope
COM established a scope comprising the following components:

Business strategy: Establish clear business objectives and outcomes that are to be supported by cloud adoption.
Technology strategy: Align the overarching strategy to guide adoption of the cloud in alignment with the business strategy.
People strategy: Develop a strategy for training the people and changing the culture to enable business success.
The high-level scopes of the Cloud Operating Model and the Cloud Adoption Framework are similar. Business, culture, and
technology are reflected throughout the guidance and each methodology within the Cloud Adoption Framework.
NOTE

The Cloud Adoption Framework's scope has two significant points of clarity. In the Cloud Adoption Framework, business strategy
goes beyond the documentation of cloud costs—it is about understanding motivations, desired outcomes, returns, and cloud costs
to create actionable plans and clear business justifications. In the Cloud Adoption Framework, people strategy goes beyond
training to include approaches that create demonstrable cultural maturity. A few areas on the roadmap include demonstrations of
the impact of Agile management, DevOps integration, customer empathy and obsession, and lean product development
approaches.

Structure
COM included an infographic that outlined the various decisions and actions needed during a cloud adoption effort. That graphic
provided a clear means of communicating next steps and dependent decisions.
The Cloud Adoption Framework follows a similar model. However, as the actions and decisions expanded into multiple decision
trees, complexity quickly made a single graphical view appear overwhelming. To simplify the guidance and make it more
immediately actionable, the single graphic has been decomposed into the following structures.
At the executive level, the Cloud Adoption Framework has been simplified into the following three phases of adoption and two
primary governance guides.

The three phases of adoption are:


Plan: Develop the business plan to guide cloud adoption in alignment with desired business outcomes.
Ready: Prepare the people, organization, and technical environment for execution of the adoption plan.
Adopt: Technical strategy required to execute a specific adoption plan, in alignment with a specific adoption journey, to realize
business outcomes.
The three phases of cloud adoption have been mapped to two specific journeys:
Migrate: Move existing workloads to the cloud.
Innovate: Modernize existing workloads and create new products and services.
Additional resources required for successful cloud adoption can be found in Enabling adoption success.
Next steps
To resume your journey where COM left off, choose one of the following cloud adoption journeys:
Getting started with cloud migration
Getting started with cloud-enabled innovation
Enabling adoption success
Azure enterprise scaffold is now the Microsoft Cloud Adoption Framework for Azure

The Azure enterprise scaffold has been integrated into the Microsoft Cloud Adoption Framework for Azure. The
goals of the enterprise scaffold are now addressed in the Ready section of the Cloud Adoption Framework. The
enterprise scaffold content has been deprecated.
To begin using the Cloud Adoption Framework, see:
Ready overview
Creating your first landing zone
Landing zone considerations
If you need to review the deprecated content, see the Azure enterprise scaffold.
WARNING

Azure Virtual Datacenter has been integrated into the Microsoft Cloud Adoption Framework for Azure. This guidance serves as a
significant part of the foundation for the Ready and Governance methodologies within the Cloud Adoption Framework. To
support customers making this transition, the following resources have been archived and will be maintained in a separate
GitHub repository.

Archived resources
Azure Virtual Datacenter: Concepts
This e-book shows you how to deploy enterprise workloads to the Azure cloud platform, while respecting your existing security and
networking policies.

Azure Virtual Datacenter: A Network Perspective


This online article provides an overview of networking patterns and designs that can be used to solve the architectural scale,
performance, and security concerns that many customers face when thinking about moving en masse to the cloud.

Azure Virtual Datacenter: Lift and Shift Guide


This white paper discusses the process that enterprise IT staff and decision makers can use to identify and plan the migration of
applications and servers to Azure using a lift and shift approach, minimizing any additional development costs while optimizing
cloud hosting options.
How does Azure work?

Azure is Microsoft's public cloud platform. Azure offers a large collection of services including platform as a
service (PaaS), infrastructure as a service (IaaS), and managed database service capabilities. But what exactly is
Azure, and how does it work?

Azure, like other cloud platforms, relies on a technology known as virtualization. Most computer hardware can be
emulated in software, because most computer hardware is simply a set of instructions permanently or semi-
permanently encoded in silicon. Using an emulation layer that maps software instructions to hardware
instructions, virtualized hardware can execute in software as if it were the actual hardware itself.
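To make the idea of an emulation layer concrete, here is a toy Python sketch with an invented three-instruction "virtual" instruction set; it only illustrates the concept of mapping virtual instructions to host operations and bears no relation to how Azure's hypervisor is actually implemented.

```python
# Toy illustration of an emulation layer: each "virtual" instruction is
# mapped to a host-level operation. The instruction set is invented for
# this example and is not how Azure's virtualization actually works.
def make_emulator():
    registers = {"r0": 0, "r1": 0}  # emulated hardware state held in software

    def execute(instruction, *operands):
        if instruction == "LOAD":      # LOAD <register>, <value>
            registers[operands[0]] = operands[1]
        elif instruction == "ADD":     # ADD <dest>, <source>
            registers[operands[0]] += registers[operands[1]]
        elif instruction == "PRINT":   # PRINT <register>
            print(registers[operands[0]])
        else:
            raise ValueError(f"Unknown instruction: {instruction}")

    return execute

run = make_emulator()
run("LOAD", "r0", 2)
run("LOAD", "r1", 40)
run("ADD", "r0", "r1")
run("PRINT", "r0")  # prints 42
```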
Essentially, the cloud is a set of physical servers in one or more datacenters that execute virtualized hardware on
behalf of customers. So how does the cloud create, start, stop, and delete millions of instances of virtualized
hardware for millions of customers simultaneously?
To understand this, let's look at the architecture of the hardware in the datacenter. Inside each datacenter is a
collection of servers sitting in server racks. Each server rack contains many server blades as well as a network
switch providing network connectivity and a power distribution unit (PDU) providing power. Racks are sometimes
grouped together in larger units known as clusters.
Within each rack or cluster, most of the servers are designated to run these virtualized hardware instances on
behalf of the user. However, some of the servers run cloud management software known as a fabric controller. The
fabric controller is a distributed application with many responsibilities. It allocates services, monitors the health of
the server and the services running on it, and heals servers when they fail.
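As a rough illustration of those responsibilities, the following Python sketch models a fabric-controller-like pattern that places instances on healthy servers and re-places them when a health probe fails. Every class and name here is hypothetical and greatly simplified compared to the real distributed application.

```python
# Hypothetical, greatly simplified model of fabric-controller behavior:
# allocate instances to healthy servers and heal by moving instances off
# servers that fail. Not Azure's actual implementation.
class Server:
    def __init__(self, name):
        self.name = name
        self.healthy = True
        self.instances = []


class FabricController:
    def __init__(self, servers):
        self.servers = servers

    def allocate(self, instance_id):
        """Place an instance on the first healthy server, if any."""
        target = next((s for s in self.servers if s.healthy), None)
        if target is None:
            raise RuntimeError("no healthy servers available")
        target.instances.append(instance_id)
        return target.name

    def heal(self):
        """Move instances off any server whose health probe has failed."""
        for server in self.servers:
            if not server.healthy and server.instances:
                displaced, server.instances = server.instances, []
                for instance_id in displaced:
                    self.allocate(instance_id)


blades = [Server("blade-1"), Server("blade-2")]
controller = FabricController(blades)
controller.allocate("vm-001")   # lands on blade-1
blades[0].healthy = False       # simulate a failed health probe
controller.heal()               # vm-001 is re-placed on blade-2
print(blades[1].instances)      # ['vm-001']
```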
Each instance of the fabric controller is connected to another set of servers running cloud orchestration software,
typically known as a front end. The front end hosts the web services, RESTful APIs, and internal Azure databases
used for all functions the cloud performs.
For example, the front end hosts the services that handle customer requests to allocate Azure resources such as
virtual machines, and services like Cosmos DB. First, the front end validates the user and verifies the user is
authorized to allocate the requested resources. If so, the front end checks a database to locate a server rack with
sufficient capacity and then instructs the fabric controller on that rack to allocate the resource.
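For instance, a client can exercise those front-end REST APIs directly through the public Azure Resource Manager endpoint. The sketch below uses the azure-identity package (assumed to be installed, with a signed-in credential available locally, for example via the Azure CLI) to acquire a token and list resource groups; the subscription ID is a placeholder, and the API version shown is one documented value rather than the only one.

```python
# Minimal sketch of calling the front end's REST APIs through Azure Resource
# Manager. Assumes the azure-identity and requests packages are installed and
# that a signed-in credential (e.g., the Azure CLI) is available locally.
import requests
from azure.identity import DefaultAzureCredential

subscription_id = "<your-subscription-id>"  # placeholder

# The front end first authenticates the caller; the bearer token below
# carries that identity and is checked against Azure RBAC.
token = DefaultAzureCredential().get_token("https://management.azure.com/.default")

# List the subscription's resource groups via the public ARM endpoint.
response = requests.get(
    f"https://management.azure.com/subscriptions/{subscription_id}"
    "/resourcegroups?api-version=2021-04-01",
    headers={"Authorization": f"Bearer {token.token}"},
)
response.raise_for_status()
for group in response.json().get("value", []):
    print(group["name"], group["location"])
```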
So fundamentally, Azure is a huge collection of servers and networking hardware running a complex set of
distributed applications to orchestrate the configuration and operation of the virtualized hardware and software
on those servers. It is this orchestration that makes Azure so powerful—users are no longer responsible for
maintaining and upgrading hardware because Azure does all this behind the scenes.

Next steps
Now that you understand Azure internals, learn about cloud resource governance.
Learn about resource governance
