CLOUD COMPUTING
Andrew Weiss
Michael Bluvshteyn
Michael Bluvshteyn
Michael is currently an undergraduate student at Purdue University working on finishing his B.S. in Computer
and Information Technology with a concentration in Network Engineering. He has always been interested in
virtualization and cloud computing. He believes that cloud computing is a way to help organizations of all
sizes achieve more with less, while at the same time, reducing the cost and environmental footprint of IT.
RESEARCH METHODOLOGY
To examine the issues discussed in more detail, it is important to follow a methodology that resembles common cloud computing deployment scenarios. To gather the appropriate data, a fully functional cloud environment is required.
Overview
Typical enterprise IT architectures are composed of multi-site data centers that contain core operating infrastructure components. Large corporations utilize multiple, geographically dispersed data centers for redundancy and disaster recovery. This research will attempt to emulate these data centers by creating multiple server nodes in a laboratory environment, with nodes interconnected via simulated wide area network (WAN) links.
The Scenario
The study will recreate the IT environment of a medium-sized business. The business is large enough to have IT professionals dedicated to specific fields and an IT department with both developers and systems engineers. The business's primary systems are Microsoft-based (e.g., Active Directory, Exchange), while some core systems run Linux/Unix (e.g., Oracle, web services); in-house expertise is used to run both platforms. The company has three primary sites: its corporate office in Chicago, IL, and engineering/sales offices in Los Angeles and New York. The IT department resides in Chicago; however, both sales offices have virtualization nodes for redundancy and for handling local requests. All remote users and smaller remote sites connect directly to the corporate office for all IT services.
The Objective
The study aims to answer five main questions.
- Is the cloud a viable solution?
- Which virtualization solution is better for the given scenario?
- What are the scalability concerns?
- What are the interoperability concerns?
- What are the manageability requirements?
Is the cloud a viable solution?
With all of the hype surrounding cloud computing, it is easy to assume that it is a solution for every
environment. The main goal of this section is to decide whether or not cloud computing is worthwhile for the
business depicted in the scenario.
What are the scalability concerns?
Organizations need the ability to scale their IT resources to support future growth. This research intends to learn more about how each platform can accomplish this. While the research environment is limited to a small-scale deployment, the concepts explored can be applied to large cloud applications.
Andrew Weiss
WHAT IS OPENSTACK
Overview
OpenStack is an open source cloud computing platform originally developed jointly by the hosting company Rackspace and NASA. The project allows organizations to deploy cloud computing services within their existing infrastructure. Scalability and elasticity are the two primary goals of the OpenStack project. Since its first release in the fall of 2010, OpenStack has grown considerably, with over 100 corporations participating at the time of this writing.
History of OpenStack
In October 2010, the initial release of OpenStack (code-name Austin) was made available to the public. Since then, OpenStack has come a long way in both its capabilities and community support. The two original projects were OpenStack Compute and OpenStack Object Storage. The Bexar release in February 2011 introduced the OpenStack Image Service project. In April 2011, the OpenStack Cactus build was introduced with improvements to all three projects. The latest stable release of OpenStack is Diablo, which was released in September 2011.
Two new projects no longer in incubation are OpenStack Identity and OpenStack Dashboard. These projects will be included with the next official release of OpenStack, code-name Essex, which is due for an April 2012 release. Both of these components were in release candidate status at the time of this writing and were included in the research.
OpenStack Projects
Currently, OpenStack is made up of five foundational projects: OpenStack Compute (code-name Nova),
OpenStack Object Storage (code-name Swift), OpenStack Image Service (code-name Glance),
OpenStack Identity (code-name Keystone), and OpenStack Dashboard (code-name Horizon).
Each component of OpenStack is designed to provide specific functionality that revolves around the following
key elements:
- Virtual machine provisioning (Compute)
- Hypervisor management (Compute)
- Network management (Compute)
- Virtual machine image management (Image Service)
- Storage (Object Storage)
- Authentication (Identity)
- User interface (Dashboard)
Development Framework
OpenStack has been built primarily using Python, which has allowed for the design and development of a highly scalable cloud technology with minimal resource consumption. Nearly anyone interested in the platform can contribute to the project, and with freely available APIs, companies have been developing their own OpenStack-based solutions.
Compute Node(s)
Each virtual machine host is also known as a compute node. Compute nodes run the Nova compute service and have identical configurations. OpenStack has a built-in compute scheduler that allows the administrator to control how virtual machine instances are allocated across hosts. For the purposes of this research, simple scheduling was used, which load-balances instances based on resource consumption. The cloud controller also had the compute service installed, giving instances the ability to be provisioned across multiple hosts.
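The load-balancing behavior described above can be sketched as follows. This is a conceptual illustration in Python, not Nova's actual scheduler implementation; the host names and the instance-count metric are simplifications.

```python
# A minimal sketch of least-loaded instance placement, as performed
# conceptually by a simple compute scheduler. Host names are hypothetical.

def pick_host(hosts):
    """Return the name of the host currently running the fewest instances."""
    return min(hosts, key=lambda name: hosts[name])

def schedule(hosts, new_instances):
    """Place each new instance on the least-loaded host, updating counts."""
    placements = []
    for instance in new_instances:
        host = pick_host(hosts)
        hosts[host] += 1
        placements.append((instance, host))
    return placements

hosts = {"cloud-controller": 0, "compute-node-1": 0}
for instance, host in schedule(hosts, ["vm1", "vm2", "vm3", "vm4"]):
    print(f"{instance} -> {host}")
```

In practice the scheduler weighs memory and CPU consumption rather than a bare instance count, but the round-robin effect across equally loaded hosts is the same.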
Storage Node(s)
Storage nodes run the required Swift storage services and are synchronized via underlying replication technologies. When multiple storage nodes are used, a reverse proxy server directs storage requests to the appropriate nodes. A multi-node environment provides for highly available data deployments.
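Conceptually, the proxy maps each object to a storage node deterministically. The sketch below illustrates the idea with a simple hash; Swift's real ring also handles partitions, zones, and replicas, and the node names here are hypothetical.

```python
import hashlib

# Conceptual sketch of how a Swift-style proxy maps an object name to a
# storage node: hash the name and select a node deterministically, so every
# proxy agrees on the placement without coordination.

def node_for(obj_name, nodes):
    """Deterministically map an object name to one of the storage nodes."""
    digest = hashlib.md5(obj_name.encode("utf-8")).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]

nodes = ["storage-node-1", "storage-node-2"]
print(node_for("images/ubuntu-11.10.img", nodes))
```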
Below is a set of required technical knowledge items that administrators should possess prior to building
OpenStack in their own environments.
- Linux operating system fundamentals
- SQL
- Python fundamentals
- Shell scripting
- Networking fundamentals
- KVM virtualization
Hardware requirements
As outlined above in the Methodology section, it was important that a production enterprise environment
was emulated as closely as possible. OpenStack requires a limited number of resources, and as a result,
there was no need for expensive hardware. Many of the OpenStack components could be run simultaneously
on the same server, which allowed for additional consolidation of equipment.
The environment was separated into the three logical components previously described: the cloud controller, compute node, and storage node. The system specifications for the hardware used are as follows:
Cloud controller specifications
- Dell OptiPlex 745
- Intel Core 2 Duo Processor with VT support
- 2 Gigabit Ethernet network interface cards (onboard and PCI-E add-in card)
- 4GB of memory
- 80GB internal hard drive
Compute node specifications
- Dell OptiPlex 745
- Intel Core 2 Duo Processor with VT support
- 2 Gigabit Ethernet network interface cards (onboard and PCI-E add-in card)
- 6GB of memory
- 80GB internal hard drive
Storage node specifications
- Dell OptiPlex 745
- Intel Core 2 Duo Processor
- 2 Gigabit Ethernet network interface cards (onboard and PCI-E add-in card)
- 2GB of memory
- Two 80GB internal hard drives
Networking requirements
A critical aspect of OpenStack is a proper network implementation. For the purposes of this research, three distinct networks were created: a public network, a private network, and a virtual machine network. Refer to Table 1 for the specifications of each network.
Table 1: Network addressing

Network Type       Addressing
Public             10.129.1.0/24
Private            192.168.10.0/24
Virtual Machine    172.16.0.0/24
Each physical machine had two network interface cards installed, one for the public network and the other for
the private network. The public network was located on the Purdue Computer and Information Technology
(CIT) network and allowed for external communications. The private network was used for communication
between the hosts themselves. A bridged interface was created on the cloud controller to allow for
communication between the virtual machine network and the public network.
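As a sanity check on the addressing plan in Table 1, the three subnets can be verified as non-overlapping, for example with Python's standard `ipaddress` module; overlapping subnets would make routing between the segments ambiguous.

```python
import ipaddress

# Verify that the public, private, and virtual machine subnets from
# Table 1 do not overlap one another.
networks = {
    "public": ipaddress.ip_network("10.129.1.0/24"),
    "private": ipaddress.ip_network("192.168.10.0/24"),
    "virtual machine": ipaddress.ip_network("172.16.0.0/24"),
}

names = sorted(networks)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        assert not networks[a].overlaps(networks[b]), f"{a} overlaps {b}"
print("no overlapping subnets")
```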
Component        Package
Identity         keystone 2012.1~rc1
Compute          nova 2012.1~rc1
Image Service    glance 2012.1~rc1
Dashboard        horizon 2012.1~rc1
The recommended order of installation and configuration of the cloud controller was followed, as indicated
below:
1. MySQL database server
2. Keystone
3. Nova
4. Glance
5. Horizon
A database server is required to store all of the metadata and tables generated by the OpenStack components. Keystone is typically the first service installed because each of the components requires authentication in order for intercommunication to be established. The services were configured according to documentation obtained from various sources (refer to the References section).
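To illustrate how the services above tie together, a hypothetical excerpt of an Essex-era Nova configuration might look like the following; the addresses and credentials are placeholders, not the actual lab configuration.

```ini
# Hypothetical excerpt of /etc/nova/nova.conf (Essex era); addresses and
# credentials are placeholders for illustration only.
[DEFAULT]
sql_connection = mysql://nova:secret@192.168.10.1/nova
auth_strategy = keystone
glance_api_servers = 192.168.10.1:9292
```

The three lines reflect the installation order: Nova depends on the MySQL database, authenticates through Keystone, and retrieves images from Glance.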
TROUBLESHOOTING OPENSTACK
Troubleshooting became an important skill throughout this research on OpenStack. Fortunately, each package
creates its own log files from which relevant debugging information can be parsed. Each of the components
was configured for all levels of debugging so that the commands and their outputs could be analyzed. Since
there was no formal troubleshooting approach defined prior to this research, a procedure for doing so was
developed.
- Analyze the appropriate log files
- Ensure the configuration files are correct and error-free
- Make sure the required services are running
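The first step, analyzing log files, can be as simple as filtering for ERROR-level entries. The sketch below illustrates the idea; the sample log lines are fabricated, and real log locations vary by distribution (e.g. /var/log/nova/ on many systems).

```python
# Sketch of the first troubleshooting step: filter an OpenStack service
# log for ERROR-level entries so they can be analyzed.

def find_errors(log_lines):
    """Return only the lines carrying an ERROR-level message."""
    return [line for line in log_lines if " ERROR " in line]

sample = [
    "2012-03-25 10:01:02 INFO nova.compute Starting instance i-00000001",
    "2012-03-25 10:01:03 ERROR nova.compute Unable to reach the image service",
]
for line in find_errors(sample):
    print(line)
```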
Michael Bluvshteyn
WHAT IS VMWARE
Overview
VMware is a global company providing virtualization software for home computing as well as the enterprise.
VMware has quickly become the global leader in virtualization and cloud infrastructure with more than
300,000 customers and 25,000 partners. VMware also has a large software portfolio, allowing it to provide
solutions for individuals and companies of any size.
History of VMware
VMware was founded in 1998 by Diane Greene, Mendel Rosenblum, Scott Devine, Edward Wang, and Edouard Bugnion. VMware's most notable product, and the one that helped shape the company into what it is today, was the first x86 Type 2 hypervisor. This hypervisor was eventually used in VMware's first product, VMware Workstation. That, along with their Type 1 hypervisor, forms the basis of all VMware-based cloud products today. VMware was acquired by EMC in 2004 and is run as a partial subsidiary of EMC to this day. EMC released 10% of VMware's shares in an IPO in August 2007.
VMware Products
VMware has a large portfolio of products; however, the three that were focused on for this paper were
VMware vSphere, vCenter, and vCloud Director.
vSphere
vSphere, formerly known as VMware Infrastructure, is VMware's cloud computing and virtualization operating system. The key component of vSphere is VMware ESXi, which is the platform on which all enterprise virtualization and cloud solutions from VMware are built.
vCenter
vCenter, formerly known as Virtual Center, is a centralized management tool for vSphere.
vCloud Director
vCloud Director is a cloud computing management platform. vCloud Director interfaces with virtualized
resources allowing users to gain self-service access to them through a services catalogue. vCloud Director
allows tasks that previously would have required significant IT involvement to be completed automatically.
Each of the products used in our VMware cloud provides the following components:
VMware ESXi (vSphere)
- Hypervisor management
- Network management
- Storage (to utilize all features of VMware ESX and vCenter, a SAN is required for storage)
vCenter
- Virtual machine provisioning
- Virtual machine image management
- Authentication
vCloud Director
vCloud Director provides additional functionality, including the auto provisioning of virtual machines as well as
entire IaaS clouds. vCloud Director allows for the creation of complex business rules for the distribution of
virtual machines and networks.
- Virtual machine provisioning
- Virtual machine image management
- Authentication
vSphere Client
vSphere Client is used to interface with both ESXi and vCenter.
- User interface
vCloud Director GUI
The vCloud Director GUI can be used to provision and create new virtual machines and networks, as well as create logical pools of resources (VDCs) that can then be consumed by internal and external clients. vCloud Director also allows resources to be over-committed between pools and virtual machines. Internal and external clients can be assigned to VDCs using a subscription (dedicated) or on-demand (pay-as-you-go) model. Users can access assigned VDCs through a self-service web portal. The vCloud portal can include custom application libraries, which contain approved operating system templates and complete application stacks created by the administrator. Users can then instantly and effortlessly deploy from any library to which they have access.
- User interface
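The allocation and over-commitment behavior described above can be sketched conceptually. The class, units, and numbers below are illustrative and are not part of any vCloud Director API.

```python
# Conceptual sketch of VDC resource allocation with over-commitment:
# the administrator bets that clients will not all peak at once.

class VDC:
    """A virtual data center backed by a pool of physical resources."""

    def __init__(self, cpu_ghz, overcommit=1.0):
        # Over-commitment inflates the capacity clients may reserve
        # beyond the physical pool.
        self.capacity = cpu_ghz * overcommit
        self.allocated = 0.0

    def allocate(self, cpu_ghz):
        """Reserve capacity for a client; fail once the pool is exhausted."""
        if self.allocated + cpu_ghz > self.capacity:
            raise RuntimeError("VDC capacity exhausted")
        self.allocated += cpu_ghz

# A 10 GHz pool over-committed 2x can back 20 GHz of reservations.
vdc = VDC(cpu_ghz=10, overcommit=2.0)
vdc.allocate(15)
print(vdc.capacity - vdc.allocated)  # → 5.0
```

A subscription (dedicated) model corresponds to reserving the full capacity up front, while pay-as-you-go corresponds to allocating incrementally as instances are deployed.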
- Networking fundamentals
- Storage and SAN concepts
Hardware requirements
It was important that the lab environment was capable of emulating a production environment as closely as possible. There were also hardware constraints that needed to be met. VMware requires relatively few resources to run properly, and the hardware limitations only affected performance and the number of virtual machines that could be instantiated. In the end, four Dell OptiPlex 745 machines were utilized in the following configurations (note: in this scenario, the domain controller required for vSphere was virtualized, which is NOT a recommended practice).
Machine 1: Server 2008 R2 64-bit (vSphere)
- Dell OptiPlex 745
- Intel Core 2 Duo Processor with VT support
- 1 Gigabit Ethernet network interface card
- 4GB of memory
- 80GB internal hard drive
Machines 2 and 3: ESXi
- Dell OptiPlex 745
- Intel Core 2 Duo Processor with VT support
- 1 Gigabit Ethernet network interface card
- 6GB of memory
- 150GB internal hard drive
Machine 4: CentOS 5 (vCloud Director)
- Dell OptiPlex 745
- Intel Core 2 Duo Processor with VT support
- 1 Gigabit Ethernet network interface card
- 4GB of memory
- 80GB internal hard drive
- 150GB internal hard drive
Software Requirements
It was important to use enterprise-level hardware that met the processing and memory needs of the machines being virtualized. This section provides the minimum requirements necessary to get a VMware cloud solution running in a lab environment.
VMware ESXi
- 64-bit VT-enabled processor
- 2GB RAM minimum
- 1 or more Gigabit or 10Gb Ethernet controllers
- SCSI, SAS, or SATA internal storage
VMware vCenter 5
- Microsoft Windows Server 2008/R2 64-bit
- 4GB RAM minimum
- Microsoft SQL Server 2008 R2 Express (minimum)
- 1 or more Gigabit or 10Gb Ethernet controllers
- Active Directory
vCloud Director
- vCenter 5/4.1/4
- ESX/i 4/4.1/5
- vCenter networks used for network pools must be available to all hosts in the cluster
- vCenter must trust ESX/i hosts
- RHEL 5
Networking Requirements
The VMware cloud was set up directly on network space allocated on the Purdue Computer and Information Technology (CIT) network.
Installing ESXi
The installation of ESXi was relatively straightforward. The ESXi image was installed directly from a USB thumb drive onto local storage on each of the ESXi nodes. Due to hardware and performance constraints, local storage on all ESXi nodes, along with shared storage on the vCloud Director host, was used. For production purposes, a high-performance SAN is recommended.
Installing vCenter
The installation of vCenter was performed by a VMware installation wizard that ensured all hardware and software prerequisites were met before allowing the installation to start. One important requirement was to have an Active Directory domain set up. vCenter cannot be installed on a domain controller, so in the lab environment, a DC was virtualized on one of the ESXi nodes before installing vCenter. Once all requirements were met, the wizard installed SQL Server on the local machine and prompted for the required service ports. In the lab, the default ports were used.
CLOUD SCALABILITY
The general consensus among the IT community is that scalability is a critical component of cloud computing. With tremendous growth in data storage requirements, the ability of companies to scale their application and data delivery platforms is essential. As a result, scalability has been a primary focus area of this research.
CLOUD MANAGEABILITY
When it comes to cloud computing, organizations want to spend less time maintaining and troubleshooting and more time innovating. Cloud technologies bring a new aspect of IT systems administration that makes this shift possible.
CLOUD INTEROPERABILITY
An important aspect of cloud computing is the ability for different technologies to integrate with one another. OpenStack supports multiple hypervisors on its Compute platform, including Linux KVM, VMware's ESX/ESXi, and Citrix's XenServer/Xen Cloud Platform. This provides organizations that may already have VMware and/or Citrix virtualization solutions in place with the ability to expand to OpenStack.
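For example, the hypervisor driver on a Nova compute node is selected through configuration flags. The Essex-era flags below are shown only as a hypothetical illustration of how this choice is expressed.

```ini
# Hypothetical Essex-era flags selecting the hypervisor on a Nova compute
# node; values are illustrative, and other drivers use different flags.
[DEFAULT]
connection_type = libvirt
libvirt_type = kvm
```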
VMware solutions are geared towards managing VMware ESX servers. While vCenter supports a variety of
different plugins, most features of both vCenter and vCloud Director are meant for VMware ESX.
GLOSSARY
API: An API, or application programming interface, is a set of specifications that developers use to enable communication between applications and services.
Bridge interface: A Layer 2 networking interface that forwards Ethernet frames between different network
segments.
Domain controller (DC): A host that is responsible for managing clients, devices, and resources in a domain
environment.
Hypervisor: A hypervisor is a hardware virtualization method that allows guest operating systems to run on a
single host machine.
KVM: KVM, or Kernel-based Virtual Machine, is a full virtualization solution built into the Linux kernel.
Reverse proxy: A reverse proxy takes requests from external hosts and forwards them to the appropriate internal servers.
Storage area network (SAN): A dedicated network that provides block-level access to a pool of storage devices.
Virtual machine: An emulated guest operating system that runs on top of a physical host machine.
VDC: A VDC, or virtual data center, is VMware's construct for a logical pool of virtualized resources.
VT: VT is Intel's virtualization technology, built into its processors to support true hardware virtualization.
REFERENCES
Chen, G. (2010, January). Virtualizing Tier I Applications: A Critical Step on the Journey Towards the Private
Cloud. IDC.
Gallagher, S. (2012, January). AT&T joins OpenStack as it launches cloud for developers. Retrieved March 25, 2012, from Ars Technica: http://arstechnica.com/business/news/2012/01/att-joins-openstack-as-it-launches-cloud-for-developers.ars
Hartshorne, B. (2012, February 9). Scaling media storage at Wikimedia with Swift. Retrieved March 25, 2012, from Wikimedia Foundation: http://blog.wikimedia.org/2012/02/09/scaling-media-storage-at-wikimedia-with-swift/
Henderson, T. (2012, January 18). Cloud activity to explode in 2012. Retrieved January 30, 2012, from
Network World: http://www.networkworld.com/news/2012/122011-outlook-test-254278.html
VMware. (2010). Indiana University Virtualizes Mission-Critical Oracle Database. Palo Alto, CA: VMware.
VMware. (2010, January). Major Telecom Provider Uses Virtualization to Give Customers Greater Flexibility
and Lower Costs in Outsourced Datacenters. Palo Alto, CA: VMware.