
contributed articles

Cloud Security: A Gathering Storm

Users' trust in cloud systems is undermined by the lack of transparency in existing security policies.

BY MIHIR NANAVATI, PATRICK COLP, BILL AIELLO, AND ANDREW WARFIELD

DOI:10.1145/2593686

key insights
• Utilification of computing delivers benefits in terms of cost, availability, and management overhead.
• Shared infrastructure opens questions as to the best defenses to use against new and poorly understood attack vectors.
• Lack of transparency concerning cloud providers' security efforts and governmental surveillance programs complicates reasoning about security.

FRIDAY, 15:21. Transmogrifica headquarters, Palo Alto

News has just come in that Transmogrifica has won a major contract from Petrolica to model oil and gas reserves in the Gulf of Mexico. Hefty bonuses are on offer if the work is completed ahead of schedule.

Two people are seated in the boardroom.

"Let's order 150 machines right away to speed things along," says Andrea.

"Too expensive," says Robin. "And what will we do with all the machines when we're done? We'll be over-provisioned."

"Doesn't matter," says Andrea. "The bonus more than covers it, and we'll still come out ahead. There's a lot of money on the table."

"What about power? Cooling? And how soon can we get our hands on the machines? It's Friday. Let's not lose the weekend."

"Actually," says Andrea, "why don't we just rent the machines through the cloud? We'd have things up and running in a couple of hours."

"Good idea! Get Sam on it, and I'll run it by security."

An hour and hundreds of clicks later, Transmogrifica has more than 100 nodes across North America, each costing less than a dollar per hour. Teams are already busy setting up their software stack, awaiting a green light to start work with live data.

Cloud computing has fundamentally changed the way people view computing resources; rather than being an important capital consideration, they can be treated as a utility, like power and water, to be tapped as needed. Offloading computation to large, centralized providers gives users the flexibility to scale available resources with changing demands, while economies of scale allow operators to provide required infrastructure at lower cost than most individual users hosting their own servers.

The benefits of such utilification[36] extend well beyond the cost of underlying infrastructure; cloud providers can afford dedicated security and reliability teams with expertise far beyond the reach of an average enterprise. From a security perspective, providers can also

70 COMMUNICATIONS OF THE ACM | MAY 2014 | VOL. 57 | NO. 5


benefit from scale, as they can collect large quantities of data and perform analytics to detect intrusions and other abnormalities not easily spotted at the level of individual systems.

The value of such centralized deployment is evident from its rapid uptake in industry; for example, Netflix migrated significant parts of its management and encoding infrastructure to Amazon Web Services,[12] and Dropbox relies on Amazon's Simple Storage Service to store users' data.[11] Cloud desktop services (such as OnLive Desktop) have also helped users augment thin clients like iPads and Chromebooks with access to remote workstations in data centers.

Virtualization, a technique for machine consolidation that helps co-locate multiple application servers on the same physical machine, is at the forefront of this shift to cloud-hosted services. Developed in the 1960s and rediscovered in earnest over the past decade, virtualization has struck a balance between the organizational need to provision and administer software at the granularity of a whole machine and the operational desire to use expensive datacenter resources as efficiently as possible. Virtualization cleanly decouples the administration of hosted software from that of the underlying physical hardware, allowing customers to provision servers quickly and accountably and providers to service and scale their datacenter hardware without affecting hosted applications.

Achieving a full range of features in a virtualization platform requires many software components. A key one is the hypervisor, a special class of operating system that hosts virtual machines. While conventional OSes present system- and library-level interfaces to run multiple simultaneous applications, a hypervisor presents a hardware-like interface that allows simultaneous execution of many entire OS instances at the same time. The coarser-granularity sandboxes provided by hypervisors involve a significant benefit; since virtual machines are analogous to physical machines, administrators can move existing in-house server workloads in their entirety, or the full software stack, dependencies and all, to the cloud, with little or no modification.

Virtualization has also proved itself an excellent match for various trends in computer hardware over the past decade; for example, increasingly parallel, multicore systems can be partitioned into a number of single- or dual-core virtual machines, so hardware can be shared across multiple users while maintaining isolation boundaries by allowing each virtual machine access to only a dedicated set of processors.

Multiplexing several virtual machines onto a single physical host allows cloud operators to provide low-cost leased computing to users. However, such convenience comes at a price, as users must now trust the provider to get it right and are largely helpless in the face of provider failures.
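
The idea of giving each virtual machine access to only a dedicated set of processors can be illustrated with a minimal sketch. Everything below is hypothetical and for illustration only: the function names, the VM names, and the simple first-fit allocation are assumptions, not how any particular hypervisor schedules cores.

```python
def partition_cores(total_cores, requests):
    """Assign each VM a dedicated, non-overlapping set of physical cores.

    `requests` maps a (hypothetical) VM name to the number of cores it wants.
    Cores are handed out first-fit in increasing order.
    """
    assignment = {}
    next_core = 0
    for vm, count in requests.items():
        if next_core + count > total_cores:
            raise ValueError(f"not enough physical cores left for {vm}")
        assignment[vm] = set(range(next_core, next_core + count))
        next_core += count
    return assignment


def is_isolated(assignment):
    """Return True if no physical core is shared between two VMs."""
    seen = set()
    for cores in assignment.values():
        if seen & cores:  # a core already belongs to another VM
            return False
        seen |= cores
    return True


# An 8-core host partitioned into single- and dual-core VMs.
plan = partition_cores(8, {"vm_a": 2, "vm_b": 1, "vm_c": 2})
print(plan, is_isolated(plan))
```

The sketch only models which cores a VM may run on; it says nothing about caches and buses, which can remain shared even when the core sets are disjoint.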


While it is arguable that such reliance is like relying on third parties for other infrastructure (such as power and network connectivity), there is one crucial difference: Users rely on the provider for both availability of resources and the confidentiality of their data, making critical both the security and the availability of the systems.

Despite the best efforts of cloud providers, unexpected power outages, hardware failures, and software misconfigurations have caused several high-profile incidents[24] affecting the availability of large-scale Internet services, including Foursquare, Heroku, Netflix, and Reddit. Unlike outages, however, security exploits are not obvious from the outside and could go undetected for a long time. While the broad reporting of failures and outages is a strong incentive for providers to give clear explanations, there is little incentive to disclose compromises of their systems to their users. Moreover, cloud providers are legally bound to cooperate with law-enforcement agencies in some jurisdictions and may be compelled, often in secrecy, to reveal more about their users' activities than is commonly acknowledged.

In virtualized environments, misbehaving tenants on a given machine can try to compromise one another or the virtualization platform itself. As the lowest software layer, responsible for isolating hosted virtual machines and protecting against such attacks, the virtualization platform is the underlying trusted layer in virtualized deployments. The trust customers place in the security and stability of hosting platforms is, to a large degree, trust in the correctness of the virtualization platform itself.

[Figure 1. Example TCB of a virtualization platform. Diagram labels: Control VM (aka Domain 0) containing administrative tools, device drivers, and device emulation; VM A; VM B; hypervisor; virtualization platform.]

[Figure 2. TCB size for different virtualization platforms, from Nova.[29] The Linux kernel size is a minimal system install, calculated by removing all unused device drivers, file systems, and network support. Bar chart; vertical axis: KLOC; platforms: ESXi, Linux, Xen, KVM, Hyper-V; legend: Hypervisor, Linux, Qemu, KVM, Windows.]

In traditional nonvirtualized environments, securing systems involves patching and securing the OS kernel, often on a weekly basis. However, virtualized environments expose a larger attack surface than conventional nonvirtualized environments; even fully patched and secured systems may be compromised due to vulnerabilities in the virtualization platform while simultaneously remaining vulnerable to all attacks possible on nonvirtualized systems. OS bugs have been exploited to allow attackers to break process isolation and compromise entire systems, while virtualization-platform bugs risk exposing an opportunity for attackers within one virtual machine to gain access to virtual machines belonging to other customers. Exploits at either of these layers endanger both private data and application execution for users of virtual machines.

Worse, virtualization exposes users to the types of attacks typically absent in nonvirtualized environments. Even without a compromise of the virtualization platform, shared hardware could store sensitive state that is inadvertently revealed during side-channel attacks. Despite attempts by virtualization platforms to isolate hardware resources, isolation is far from complete; while each virtual machine may have access to only a subset of the processors and physical memory of the system, caches and buses are often still shared. Attacks that leak encryption keys and other sensitive data across independent virtual machines via shared caches are being explored.[26,40]

Modern cloud deployments require an unprecedented degree of trust on the part of users, in terms of both the intention and competence of service providers. Cloud providers, for their part, offer little transparency or reason for users to believe their trust is well placed; for example, Amazon's security whitepapers say simply that EC2 relies on a highly customized version of the Xen virtualization platform to provide instance isolation.[6] While several techniques to harden virtualized deployments are available, it is unclear which, if any, are being used by large cloud service providers.

Critical systems and sensitive data are not exclusive to cloud computing. Financial, medical, and legal systems


have long required practitioners comply with licensing and regulatory requirements, as well as strict auditing to help assess damage in case of failure. Similarly, aircraft (and, more recently, car) manufacturers have been required to include black boxes to collect data for later investigation in case of malfunctions. Cloud providers have begun wooing customers with enhanced security and compliance certifications, underlining the increasing need for solutions for secure cloud computation.[5] The rest of this article focuses on a key underpinning of the cloud, the virtualization platform, discussing some of the technical challenges and recent progress in achieving trustworthy hosting environments.

Meanwhile in Palo Alto...

Friday, 15:47. Transmogrifica headquarters, Palo Alto

An executive enters the boardroom where Robin is already seated.

"Hello, Robin. I hear celebrations are in order. How much time do we have?"

"Hey Sasha, just who I was looking for," Robin says. "It's going to be tight. Andrea was just here, and we thought we'd buy virtual machines in the cloud to speed things up. Anything security would be unhappy about?"

"Well...," says Sasha, "it isn't as secure as in-house. We could be sharing the system with anyone. Literally anyone, who might love for us to fail. Xanadu, for instance, which is sore about not getting the contract? It's unlikely, but it could have nodes on the same hosts we do and start attacking us."

"What would it be able to do?" says Robin.

"In theory, nothing. The hypervisor is supposed to protect against all such attacks. And these guys take their security seriously; they also have a good record. Can't think of anything offhand, but it's frustrating how opaque everything is. We barely know what system it's running or if it's hardened in any way. Also, we're completely in the dark about the rest of the provider's security process. Makes it really difficult to recommend anything one way or the other."

"That's annoying. Anything else I need to know?"

"Nothing I can think of, though let me think it through a bit more."

Trusted Computing Base

The set of hardware and software components a system's security depends on is called the system's trusted computing base, or TCB. Proponents of virtualization have argued for the security of hypervisors through the "small is secure" argument; hypervisors present a tiny attack surface so must have few bugs and be secure.[23,32,33] Unfortunately, this argument ignores the reality that the TCB actually contains not just the hypervisor but the entire virtualization platform.

Note the subtle but crucial distinction between "hypervisor" and "virtualization platform." Architecturally, hypervisors form the base of the virtualization platform, responsible for at least providing CPU multiplexing and memory isolation and management. Virtualization platforms as a whole also provide the other functionality needed to host virtual machines, including device drivers to interface with physical hardware, device emulation to expose virtual devices to VMs, and a control toolstack to actuate and manage VMs. Some enterprise virtualization platforms (such as Hyper-V and Xen) rely on a full-fledged commodity OS running with special privileges for this functionality, making both the hypervisor and the commodity OS part of the TCB (see Figure 1). Other virtualization platforms, most notably KVM and VMware ESXi, include all required functionality within the hypervisor itself. KVM is an extension to a full-fledged Linux installation, and ESXi is a dedicated virtualization kernel that includes device drivers. In each case, this additional functionality means the hypervisor is significantly larger than the hypervisor component of either Hyper-V or Xen. Regardless of the exact architecture of the virtualization platform, it must be trusted in its entirety.

Figure 2 makes it clear that even the smallest of the virtualization platforms, ESXi,[a] is comparable in size to a stock Linux kernel (200K LOC vs. 300K LOC). Given that Linux has seen several privilege-escalation exploits over the years, justifying the security of the virtualization platform strictly as a function of the size of the TCB fails to hold up.

[a] In 2009, Microsoft released a stripped-down version of Windows Server 2008 called Server Core[23] for virtualized deployments; while figures concerning its size are still not available to us, we do not anticipate the virtualization platform being significantly smaller than ESXi.

A survey of existing attacks on virtualization platforms[20,27,37,38] reveals they, like other large software systems, are susceptible to exploits due to security vulnerabilities; the sidebar "Anatomy of an Attack" describes how an attacker can chain several existing vulnerabilities together into a privilege escalation exploit and bypass the isolation between virtual machines provided by the hypervisor.

Sidebar: Anatomy of an Attack

Not all discovered vulnerabilities are exploitable; in fact, most exploits rely on chaining together multiple vulnerabilities. In 2009, Kostya Korchinsky of Immunity Inc. presented an attack that gave an administrator within a virtual machine running on a VMware hypervisor access to a physical host.[20]

This is notable for two reasons: It affected the entire family of VMware products, so both Workstation and ESX server were vulnerable, and it was reliable enough that Canvas, Immunity's commercially available penetration testing tool, included a "cloudburst" mode to exploit systems and deploy different payloads. Rather than remain an esoteric proof of concept, it was indeed a commercial exploit available to anyone.

The virtualization platform exposes virtual devices to guest machines through device emulation. The device emulation layer runs as a user-mode process within the host, acting as a translation and multiplexing layer between virtual and physical devices. Cloudburst exploited multiple vulnerabilities in the emulated video card interface to allow the guest arbitrary read-and-write access to host memory, giving it the ability to corrupt random regions of memory.

The emulated video card accepts requests from the guest virtual machine through a FIFO command queue and responds to these requests by updating a virtual frame buffer. Both the queue and the frame buffer reside in the address space of the emulation process on the host (vmware-vmx) but are shared with the video driver in the guest. The rest of the process's address space is private and should remain inaccessible to the guest at all times.

SVGA_CMD_RECT_COPY is an example of a request issued by the driver to the emulator, specifying the (X, Y) coordinates and dimensions of a rectangle to be copied along with the (X, Y) coordinates of the destination. The emulated device responds by copying the appropriate regions, indexed relative to the start of the frame buffer. However, due to incorrect boundary checking, the device is able to supply an extremely large or even negative X or Y coordinate and read data from arbitrary regions of the process's address space. Unfortunately, due to stricter bounds checking around the destination coordinates, arbitrary regions of process memory cannot be written to.

Emulating 3D operations requires the emulated device maintain some device state or contexts. The contexts are stored as an array within the process but are not shared with the guest, which requests updates to the contexts through the command queue. The SVGA_CMD_SETRENDERSTATE command takes an index into the context array and a value to be written at that location but does not perform bounds checking on the value of the index, effectively allowing the guest to write to any region of process memory, relative to the context array. This relative write can be further extended by exploiting the SVGA_CMD_SETLIGHTENABLED command that reads a pointer from a fixed location within the context and writes the requested value to the memory the pointer references. These two vulnerabilities can be chained to achieve arbitrary memory writes; as the referenced pointer lies within the context array, it is easily modified by exploiting the SETRENDERSTATE vulnerability.

When arbitrary reads and writes are possible, shell-code can be written into process memory, then triggered by modifying a function pointer to reference the shell-code. As no-execute protection prevents injected shell-code from being executed, the function pointer must first call the appropriate memory protection functions to mark these regions of memory as executable code pages; when this is done, however, the exploit proceeds normally.

Reduce Trusted Code?

One major concern with existing virtualization platforms is the size of the TCB. Some systems reduce TCB size by de-privileging the commodity OS component; for example, driver-specific domains[14] host device drivers in isolated virtual machines, removing them from the TCB. Similarly, stub domains[30] remove the device emulation stack from the TCB. Other approaches completely remove the commodity OS from the system's TCB,[10,24] effectively making the hypervisor the only persistently executing component of the provider's software stack a user needs to trust. The system's TCB becomes a single, well-vetted component with significantly fewer moving parts.

Boot code is one of the most complex and privileged pieces of software. Not only is it error prone, it is also not used for much processing once the system has booted. Many legacy devices commodity OSes support (such as the ISA bus and serial ports) are not relevant in multi-tenant deployments like cloud computing. Modifying the device-emulation stack to eliminate this complex, privileged boot-time code[25] once it has executed significantly reduces the size of the TCB, resulting in a more trustworthy platform.

Prior to the 2006 introduction of hardware support for virtualization, all subsystems had to be virtualized entirely through software. Virtualizing the processor requires modification of any hosted OS, either statically before booting or dynamically through an on-the-fly process called binary translation. When a virtual machine is created, binary translation modifies the instruction stream of the entire OS to be virtualized, then executes this modified code rather than the original OS code.

Virtualizing memory requires a complex page-table scheme called shadow page tables,[7] an expansive, extremely complicated process requiring the hypervisor maintain page tables for each process in a hosted virtual machine. It also must monitor any modifications to these page tables to ensure isolation between different virtual machines. Advances in processor technology render this functionality moot by virtualizing both processor and memory directly in hardware.

Some systems further reduce the size of the TCB by splitting the functionality of the virtualization platform between a simple, low-level, system-wide hypervisor, responsible for isolation and security, and more complex, per-tenant hypervisors responsible for the remaining functionality of conventional virtualization platforms.[29,35] By reducing the shared surface between multiple VMs, such architectures help protect against cross-tenant attacks. In such systems, removing a large commodity OS from the TCB presents an unenviable trade-off; systems can either retain the entire OS and benefit from additional device compatibility or remove the entire OS and sacrifice device compatibility by requiring hypervisor-specific drivers for every device.[29]

No matter how small the TCB is made, sharing hardware requires a software component to mediate access to the shared hardware. Being both complex and highly privileged, this software is a real concern for the security of the system, an observation that raises the question of whether it is really necessary to share hardware resources at all.

Researchers have argued that a static partitioning of system resources can eliminate the virtualization platform from the TCB altogether.[18] The virtualization platform traditionally orchestrates the booting of the system, multiplexing the virtual resources exposed to virtual machines onto the available physical resources. However, static partitioning obviates the need for such multiplexing in exchange for a loss of flexibility in resource allocation. Partitioning physical CPUs and memory is relatively straightforward; each virtual machine is assigned a fixed number of CPU cores and a dedicated region of memory that is isolated using the hardware support for virtualizing page tables. Devices (such as network cards and hard disks) pose an even greater challenge since it is not reasonable to dedicate an entire device for each virtual machine.

Fortunately, hardware virtualization support is not limited to processors, recently making inroads into devices themselves. Single-root I/O virtualization (SR-IOV)[21] enables a single physical device to expose multiple virtual devices, each indistinguishable from the original physical device. Each such virtual device can be allocated to a virtual machine, with direct access to the device. All the multiplexing between the virtual devices is performed entirely in hardware. Network interfaces that support SR-IOV are increasingly popular, with storage controllers likely to follow suit. However, while moving functionality to hardware does reduce the amount of code to be trusted, there is no guarantee the hardware is immune to vulnerability or compromise.

Eliminating the hypervisor, while attractive in terms of security, sacrifices several benefits that make virtualization attractive to cloud computing. Statically partitioning resources affects the efficiency and utilization of the system, as cloud providers are no longer able to multiplex several virtual machines onto a single set of physical resources. As trusted platforms beneath OSes, hypervisors are conveniently placed to interpose on memory and device requests, a facility often leveraged to achieve promised levels of security and availability.

Live migration[9] involves moving a running virtual machine from one physical host to another without interrupting its execution. Primarily used for maintenance and load balancing, it allows providers to seamlessly change virtual to physical placements to better balance workloads or simply free up a physical host for hardware or software upgrades. Both live-migration and fault-tolerant solutions rely on the ability of the hypervisor to continually monitor a virtual machine's memory accesses and mirror them to another host. Interposing on memory accesses also allows hypervisors to "deduplicate," or remove redundant copies, and compress memory pages across virtual machines. Supporting several key features of cloud computing, virtualization will likely be seen in cloud deployments for the foreseeable future.

[Pull quote: This single administrative toolstack is an artifact of the way hypervisors have been designed rather than a fundamental limitation of hypervisors themselves.]

Small Enough?

Arguments for trusting the virtualization platform often focus on TCB size; as a result, TCB reduction continues to be an active area of research. While significant progress (from shrinking the hypervisor to isolating and removing other core services of the platform) has been made, in the absence of full hardware virtualization support for every device, the TCB will never be completely empty.

At what point is the TCB "small enough" to be considered secure? Formal verification is a technique to mathematically prove the correctness of a piece of code by comparing implementation with a corresponding specification of expected behavior. Although capable of guaranteeing an absence of programming errors, it does only that; while proving the realization of a system conforms to a given specification, it does not prove the security of the specification or the system in any way. Or to borrow one practitioner's only somewhat tongue-in-cheek observation: It "only shows that every fault in the specification has been precisely implemented in the system."[31] Moreover, formal verification quickly becomes intractable for large pieces of code. While it has proved applicable to some microkernels,[19] and despite ongoing efforts to formally verify Hyper-V,[22] no virtualization platform has been shrunk enough to be formally verified.

Software exploits usually leverage existing bugs to modify the flow of execution and cause the program to perform an unauthorized action. In code-injection exploits, attackers typically add code to be executed via vulnerable buffers. Hardware security features help mitigate such attacks by preventing execution of injected code; for example, the no-execute (NX) bit helps segregate regions of memory into code and data sections, disallowing execution of instructions resident in data regions, while supervisor mode execution protection (SMEP) prevents transferring execution to regions of memory controlled by unprivileged, user-mode processes while executing in a privileged context. Another class of attacks called "return-oriented programming"[28] leverages code already present in the system rather than adding any new code and is not affected by these security enhancements. Such attacks rely on small snippets of existing code, or "gadgets," that immediately precede a return instruction. By controlling the call stack, the attacker can cause execution to jump between the gadgets as desired. Since all executed code is original read-only system code, neither NX nor SMEP is able to prevent the attack. While such exploits seem cumbersome and impractical, techniques are available to automate the process.[17]

Regardless of methodology, most exploits rely on redirecting execution flow in an unexpected and undesirable way. Control-flow integrity (CFI) prevents such an attack by ensuring the program jumps only to predefined, well-known locations (such as functions, loops, and conditionals). Similarly, returns are able to return execution only to valid function-call sites. This protection is typically achieved by inserting guard


conditions in the code to validate any lines are evicted; the attacker deduces
control-flow transfer instructions.1,13 the execution pattern code based on
Software-based CFI implementa- the evicted cache lines and is able to
tions typically rely on more privileged extract the victims cryptographic key.
components to ensure the enforce-
ment mechanism itself is not disabled While providers Moreover, combining such attacks with
techniques to exploit cloud placement
or tampered with; for example, the
kernel can prevent user-space applica-
have no incentive algorithms26 could allow attackers to
identify victims precisely, arrange to
tions from accessing and bypassing the to undermine their forcibly co-locate virtual machines, and
inserted guard conditions. However,
shepherding the execution of hypervi-
users operations extract sensitive data from them.
Modern hypervisors are helpless to
sors through CFI is more of a challenge; (their business prevent them, as they have no way to
as the hypervisor is the most privileged
software component, there is noth-
indeed depends partition or isolate the caches, which
are often shared between cores on the
ing to prevent it from modifying the on maintaining same processor on modern architec-
enforcement engine. A possible work-
around34 is to mark all memory as read-only, even to the hypervisor, and fault on any attempted modification. Such modification is verified while handling the fault, and, though benign updates to nonsensitive pages are allowed, any attempt to modify the enforcement engine is blocked. Despite the difficulties, monitoring control flow is one of the most comprehensive techniques to counter code-injection exploits.

Shared Hardware Resources
A hypervisor provides strong isolation guarantees between virtual machines, preventing information leakage between them. Such guarantees are critical for cloud computing; their absence would spell the end for public-cloud deployments. The need for strong isolation is typically balanced against another operational requirement, that providers share hardware resources between virtual machines to provide services at the scale and cost users demand.
Side-channel attacks bypass isolation boundaries by ignoring the software stack and deriving information from shared hardware resources; for example, timing attacks infer certain system properties by measuring the variance in time taken for the same operation across several executions under varying circumstances. Timing attacks on shared instruction caches have allowed attackers to extract cryptographic keys from a co-located victim's virtual machine.40
These attacks are conceptually simple: The attacker fills up the i-cache, then waits for the victim to run. The exact execution within the victim's virtual machine determines which cache

tures. While researchers have proposed techniques to mitigate timing attacks (such as randomly delaying requests, adjusting the virtual machine's perception of time and adding enough noise to the computation to prevent information leakage), no low-overhead practically deployable solutions are available. Such mitigation techniques remain an active area of research.

What to Do?
The resurgence of hypervisors is a relatively recent phenomenon, with significant security advances in only a few years. However, they are extremely complex pieces of software, and writing a completely bug-free hypervisor is daunting, if not impossible; vulnerabilities will therefore continue to exist and be exploited.
Assuming any given system will eventually be exploited, what can we do? Recovering from an exploit is so fraught with risk (overlooking even a single backdoor can lead to re-compromise) it usually entails restoring the system from a known good backup. Any changes since this last backup are lost. However, before recovery can begin, the exploit must first be detected. Any delay toward such detection represents a window of opportunity for an attacker to monitor or manipulate the entire system.
Comprehensive logging and auditing techniques are required in several application domains, especially for complying with many of the standards cloud providers aim to guarantee.5 Broadly speaking, such audit trails have helped uncover corporate impropriety, financial fraud, and even piece together causes of tragic accidents.
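To make the idea of an audit trail concrete, here is a minimal sketch in Python. It assumes a hypothetical record layout: `AccessRecord`, `AuditLog`, and every field name and address are illustrative and are not taken from any real hypervisor or cloud API. Each entry notes who read which byte range of which virtual machine's memory, and a tenant can then ask which logged reads overlap a region it considers sensitive, such as one holding a key.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class AccessRecord:
    """One logged read of guest memory by an administrative tool (hypothetical layout)."""
    timestamp: float  # seconds since some agreed epoch
    actor: str        # administrator or tool that performed the read
    vm: str           # identifier of the virtual machine that was read
    start: int        # first byte address read
    length: int       # number of bytes read


class AuditLog:
    """Append-only, in-memory audit trail (illustrative only)."""

    def __init__(self) -> None:
        self._records: List[AccessRecord] = []

    def record(self, rec: AccessRecord) -> None:
        self._records.append(rec)

    def since(self, t: float) -> List[AccessRecord]:
        """Entries logged at or after time t, e.g. a suspected time of compromise."""
        return [r for r in self._records if r.timestamp >= t]

    def overlapping(self, vm: str, start: int, length: int) -> List[AccessRecord]:
        """Entries whose byte range intersects [start, start + length) in the given VM."""
        end = start + length
        return [
            r for r in self._records
            if r.vm == vm and r.start < end and start < r.start + r.length
        ]


log = AuditLog()
log.record(AccessRecord(100.0, "admin-1", "vm-a", 0x1000, 0x100))
log.record(AccessRecord(200.0, "admin-2", "vm-a", 0x8000, 0x200))
log.record(AccessRecord(300.0, "admin-1", "vm-b", 0x8000, 0x200))

# Assumed for illustration: the tenant keeps a key at 0x8100..0x811F in vm-a.
hits = log.overlapping("vm-a", 0x8100, 0x20)
for r in hits:
    print(f"{r.actor} read vm-a memory overlapping the key region at t={r.timestamp}")
```

A deployed system would also have to protect the log itself from the administrators it monitors; this sketch does not attempt that.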


For cloud computing, such logs can help identify exactly how and when the system was compromised and what resources were affected.
Tracking information flows between virtual machines and the management tool stack allows logging unauthorized use of highly privileged administrative tools.15 Not only is such use tracked, the specifics of the interaction are recorded for future audit. If a virtual machine's memory is read, the log stores the exact regions of accessed memory, along with their contents. Users can then assess the effects of the accesses and resolve them appropriately; for instance, if regions corresponding to a password or encryption keys are read, users can change the password or encryption keys before incurring any further damage.
Beyond this, advanced recovery solutions can help recover quickly from security breaches and minimize data loss. Built on top of custom logging engines,16 they provide analytics to classify actions as either tainted or non-tainted. Recovery is now much more fine grain; by undoing all effects of only the tainted actions, an attack can be reversed without losing all useful changes since the last backup. Alternatively, during recovery, all actions, including the attack, are performed against a patched version of the system. The attack will now fail, while useful changes are restored.

Back at Transmogrifica...
Friday, 16:35. Transmogrifica headquarters, Palo Alto
Sasha enters the boardroom where Robin and Andrea are already seated.
"Robin, Andrea, how confidential is this Petrolica data we'll be processing?"
"Well," says Robin, glancing toward Andrea, "Obviously, it's private data, and we don't want anyone to have access to it. But it isn't medical- or legal-records-level sensitive, if that's what you're getting at. Why do you ask?"
"I hope you realize anyone with sufficient privileges in the cloud provider could read or modify all the data. The provider controls the entire stack. There's absolutely nothing we can do about it. Worse, we wouldn't even know it happened. Obviously I'm not suggesting it would happen. I just don't know the extent of our liability."
"Shouldn't the provider's SOC compliance ensure it's got steps in place to prevent that from happening?" says Andrea before Robin could respond. "Anyhow, I'll run it by legal and see how unhappy they are. We should probably be fine for now, but it's worth keeping in mind for any other projects."

Watching the Watchers
Isolating virtual machines in co-tenant deployments relies on the underlying hypervisor. While securing the hypervisor against external attacks is indeed vital to security, it is not the only vector for a determined attacker. Today's hypervisors run a single management stack, controlled by a cloud provider. Capable of provisioning and destroying virtual machines, the management toolstack can also read the memory and disk content of every virtual machine, making it an attractive target for compromising the entire system.
This single administrative toolstack is an artifact of the way hypervisors have been designed rather than a fundamental limitation of hypervisors themselves. While providers have no incentive to undermine their users' operations (their business indeed depends on maintaining user satisfaction), the carelessness or maliciousness of a single, well-placed administrator could compromise the security of an entire system.
Revelations over the past year indicate several providers have been required to participate in large-scale surveillance operations to aid law-enforcement and counterintelligence efforts. While such efforts concentrate largely on email and social-network activity, the full extent of surveillance remains largely unknown to the public. It would be naive to believe providers with the ability to monitor users' virtual machines for sensitive data (such as encryption keys) are not required to do so; furthermore, they are also unable to reveal such disclosures to their customers.
Compliance standards also require restricting internal access to customer data while limiting the ability of a single administrator to make significant changes without appropriate checks and balances.5 As the single toolstack architecture bestows unfettered access to all virtual machines on the administrators, it effectively hampers the ability of operators to provide the guarantees required by their customers, who, in turn, could opt for more private hosting solutions despite the obvious advantages of cloud hosting in terms of scale and security.
Recognizing this danger, some systems advocate splitting the monolithic administrative toolstack into several mini toolstacks8,10 each capable of administrating only a subset of the entire system. By separating the provisioning of resources from their administration, users would have a private toolstack to manage their virtual machines to a much greater degree than with pre-provisioned machines (see Figure 3). As a user's toolstack can interpose on memory accesses from only the guests assigned to it, users can encrypt the content of their virtual machines if desired. Correspondingly, platform administrators no longer need rights to access the memory of any guest on the system, limiting their ability to snoop sensitive data.

Figure 3. Multiple independent toolstacks.

Nested virtualization, which allows a hypervisor to host other hypervisors in addition to regular OSes, provides another way to enforce privacy for tenants; Figure 4 outlines a small, lightweight, security-centric hypervisor hosting several private, per-tenant, commodity hypervisors.39 Isolation, security, and resource allocation are separated from resource management. Administrators at the cloud provider manage the outer hypervisor, allocating resources managed by the inner hypervisors. The inner hypervisors are administered by the clients themselves, allowing them to encrypt the memory and disks of their systems without sacrificing functionality. Since device management and emulation are performed by the inner hypervisor, the outer, provider-controlled, hypervisor never needs access to the memory of a tenant, thereby maintaining the tenant's confidentiality.

Figure 4. Nested virtualization.

While both split toolstacks and nested virtualization help preserve confidentiality from rogue administrators, the cloud provider itself remains a trusted entity in all cases. After all, an operator with physical access to the system could simply extract confidential data and encryption keys from the DRAM of the host or run a malicious hypervisor to do the same. While the former is a difficult, if not impossible, problem to solve, recent advances help allow users to gain some assurance about the underlying system.

Trust and Attestation
Consider the following scenario commonly seen in fiction: Two characters meet for the first time, with one, much to the surprise of the other, seeming to act out of character. Later, it becomes apparent that a substitution has occurred and the character is an imposter. This illustrates a common problem in security, where a user is forced to take the underlying system at its word, with no way of guaranteeing it is what it claims to be. This is particularly important in cloud environments, as the best security and auditing techniques are worthless if the platform disables them.
Trusted boot is a technology that allows users to verify the identity of the underlying platform that was booted. While ensuring the loaded virtualization platform is trusted, it makes no further guarantees about security; the system could have been compromised after the boot and still be verified successfully. Two techniques that provide a trusted boot are unified extensible firmware interface (UEFI) and trusted platform module (TPM). While differing in implementation, both rely on cryptographic primitives to establish a root of trust for the virtualization platform.
UEFI is a replacement for the BIOS and is the first software to be executed during the boot process. In a secure boot, each component of the boot process verifies the identity of the next one by calculating its digest or hash and comparing it against an expected value. The initial component requires a platform key in the firmware to attest its identity. TPM differs slightly in its execution; rather than verify the identity of each component while booting, a chain with the digest of each component is maintained in the TPM. Verification is deferred for a later time. Clients can verify the entire boot chain of a virtualization platform, comparing it against a known-good value, a process called remote attestation.
Trusted-boot techniques give users a way to gain more concrete guarantees about the virtualization platform to which they are about to commit sensitive data. By providing a way to verify themselves, then allowing users to proceed only if they are satisfied that certain criteria are met, they help overcome one of the key concerns in cloud computing. However, this increased security comes at a cost; to realize the full benefit of attestation, the source code, or at least the binaries, must be made available to the attestor. While cloud providers have a strong incentive to bolster the confidence of their customers, such transparency conflicts with an operational desire to maintain some degree of secrecy regarding the software stack for competitive reasons.

At the End of a Long Day...
Friday, 21:17. Transmogrifica headquarters, Palo Alto
Robin and Andrea are reflecting on their long day in the boardroom.
"Well, we got green lights from security and legal. Sam's team just called to say they've got it all tested and set up. We should be starting any time now," says Andrea.
"The cloud scares me, Andrea. It's powerful, convenient, and deadly. Before we realize it, we'll be relying on it, without understanding all the risks. And if that happens, it'll be our necks. But it's been right for us this time around. Good call."
Epilogue. Transmogrifica completed the contract ahead of schedule with generous bonuses for all involved. Centralized computation was a successful experiment, and Transmogrifica today specializes in tailoring customer workloads for faster cloud processing. The 150-node cluster server remains unpurchased.

Conclusion
Cloud computing, built with powerful servers at the center and thin clients at the edge, is a throwback to the mainframe model of computing. Seduced by the convenience and efficiency of such a model, users are turning to the cloud in increasing numbers. However, this popularity belies a real security risk, one often poorly understood or even dismissed by users.
Cloud providers have a strong incentive to engineer their systems to
provide strong isolation and security guarantees while maintaining performance and features. Their business case relies on getting this combination of trade-offs right. However, they have an equally strong incentive to keep their software and management techniques proprietary (to safeguard their competitive advantages) and not report bugs or security incidents (to maintain their reputations). While hypervisors have been studied extensively, and many security enhancements have been proposed in the literature, the actual techniques used or security incidents detected in commercial deployments are generally not shared publicly.
All this makes it difficult for users to evaluate the suitability of a commercial virtualization platform for an application. Given the overall profitability and growth of the business, many clients have sufficient trust to run certain applications in the cloud. Equally clear is that concern over security and liability is holding back other clients and applications; for example, Canadian federal law restricts health care and other sensitive data to machines physically located in Canada, while recent news of U.S.-government surveillance programs has prompted increased caution in adopting cloud services, particularly in Europe.
As with most types of trust, trust in a cloud provider by a client is based on history, reputation, the back and forth of an ongoing commercial relationship, and the legal and regulatory setting, as much as it is on technical details. In an effort to entice customers to move to the cloud, providers already provide greater transparency about their operations and proactively attempt to meet the compliance standards required by the financial and the health-care industries. Given the competing demands of the cloud-infrastructure business, this trend toward transparency is likely to continue.

References
1. Abadi, M., Budiu, M., Erlingsson, U., and Ligatti, J. Control-flow integrity. In Proceedings of the 12th ACM Conference on Computers and Communications Security. ACM Press, New York, 340-353.
2. Amazon. Summary of the Amazon EC2 and Amazon RDS Service Disruption in the U.S. East Region; http://aws.amazon.com/message/65648/
3. Amazon. Summary of the December 24, 2012 Amazon ELB Service Event in the U.S. East Region; http://aws.amazon.com/message/680587/
4. Amazon. Summary of the October 22, 2012 AWS Service Event in the U.S. East Region; https://aws.amazon.com/message/680342/
5. Amazon. AWS Risk and Compliance, 2014; http://media.amazonwebservices.com/AWS_Risk_and_Compliance_Whitepaper.pdf
6. Amazon. Overview of Security Processes, 2014; http://media.amazonwebservices.com/pdf/AWS_Security_Whitepaper.pdf
7. Bugnion, E., Devine, S., Govil, K., and Rosenblum, M. Disco: Running commodity operating systems on scalable multiprocessors. ACM Transactions on Computer Systems 15, 4 (1997), 412-447.
8. Butt, S., Lagar-Cavilla, H.A., Srivastava, A., and Ganapathy, V. Self-service cloud computing. In Proceedings of the 2012 ACM Conference on Computer and Communications Security. ACM Press, New York, 2012, 253-264.
9. Clark, C., Fraser, K., Hand, S., Hansen, J.G., Jul, E., Limpach, C., Pratt, I., and Warfield, A. Live migration of virtual machines. In Proceedings of the Second USENIX Symposium on Networked Systems Design and Implementation. USENIX Association, Berkeley, CA, 2005, 273-286.
10. Colp, P., Nanavati, M., Zhu, J., Aiello, W., Coker, G., Deegan, T., Loscocco, P., and Warfield, A. Breaking up is hard to do: Security and functionality in a commodity hypervisor. In Proceedings of the 23rd ACM Symposium on Operating Systems Principles. ACM Press, New York, 189-202.
11. Dropbox. Where Does Dropbox Store Everyone's Data?; https://www.dropbox.com/help/7/en
12. Edbert, J. How Netflix operates clouds for maximum freedom and agility. AWS Re:Invent, 2012; http://www.youtube.com/watch?v=s0rCGFetdtM
13. Erlingsson, U., Abadi, M., Vrable, M., Budiu, M., and Necula, G.C. XFI: Software guards for system address spaces. In Proceedings of the Seventh Symposium on Operating System Design and Implementation. USENIX Association, Berkeley, CA, 2006, 75-88.
14. Fraser, K., Hand, S., Neugebauer, R., Pratt, I., Warfield, A., and Williamson, M. Safe hardware access with the Xen virtual machine monitor. In Proceedings of the First Workshop on Operating System and Architectural Support for the On-Demand IT Infrastructure, 2004.
15. Ganjali, A. and Lie, D. Auditing cloud management using information flow tracking. In Proceedings of the Seventh ACM Workshop on Scalable Trusted Computing. ACM Press, New York, 2012, 79-84.
16. Goel, A., Po, K., Farhadi, K., Li, Z., and De Lara, E. The Taser intrusion recovery system. In Proceedings of the 20th ACM Symposium on Operating Systems Principles. ACM Press, New York, 2005, 163-176.
17. Hund, R., Holz, T., and Freiling, F.C. Return-oriented rootkits: Bypassing kernel code integrity protection mechanisms. In Proceedings of the 18th Conference on USENIX Security. USENIX Association, Berkeley, CA, 2009, 383-398.
18. Keller, E., Szefer, J., Rexford, J., and Lee, R.B. NoHype: Virtualized cloud infrastructure without the virtualization. In Proceedings of the 37th Annual International Symposium on Computer Architecture. ACM Press, New York, 2010, 350-361.
19. Klein, G., Elphinstone, K., Heiser, G., Andronick, J., Cock, D., Derrin, P., Elkaduwe, D., Engelhardt, K., Kolanski, R., Norrish, M., Sewell, T., Tuch, H., and Winwood, S. seL4: Formal verification of an OS kernel. In Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles. ACM Press, New York, 2009, 207-220.
20. Kortchinsky, K. Cloudburst: A VMware guest to host escape story. Presented at Black Hat USA 2009; http://www.blackhat.com/presentations/bh-usa-09/KORTCHINSKY/BHUSA09-Kortchinsky-Cloudburst-SLIDES.pdf
21. Kutch, P. PCI-SIG SR-IOV Primer: An Introduction to SR-IOV Technology. Application Note 321211-002, Intel Corp., Jan. 2011; http://www.intel.com/content/dam/doc/application-note/pci-sig-sr-iov-primer-sr-iov-technology-paper.pdf
22. Leinenbach, D. and Santen, T. Verifying the Microsoft Hyper-V hypervisor with VCC. In Proceedings of the Second World Congress on Formal Methods. Springer-Verlag, Berlin, Heidelberg, 2009, 806-809.
23. Microsoft. Windows Server 2008 R2 Core: Introducing SCONFIG; http://blogs.technet.com/b/virtualization/archive/2009/07/07/windows-server-2008-r2-core-introducing-sconfig.aspx
24. Murray, D.G., Milos, G., and Hand, S. Improving Xen security through disaggregation. In Proceedings of the Fourth ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments. ACM Press, New York, 2008, 151-160.
25. Nguyen, A., Raj, H., Rayanchu, S., Saroiu, S., and Wolman, A. Delusional boot: Securing hypervisors without massive reengineering. In Proceedings of the Seventh ACM European Conference on Computer Systems. ACM Press, New York, 2012, 141-154.
26. Ristenpart, T., Tromer, E., Shacham, H., and Savage, S. Hey, you, get off of my cloud: Exploring information leakage in third-party compute clouds. In Proceedings of the 16th ACM Conference on Computer and Communications Security. ACM Press, New York, 2009, 199-212.
27. Rutkowska, J. and Wojtczuk, R. Preventing and detecting Xen hypervisor subversions. Presented at Black Hat USA 2008; http://www.invisiblethingslab.com/resources/bh08/part2-full.pdf
28. Shacham, H. The geometry of innocent flesh on the bone: Return into libc without function calls (on the x86). In Proceedings of the 14th ACM Conference on Computer and Communications Security. ACM Press, New York, 552-561.
29. Steinberg, U. and Kauer, B. NOVA: A microhypervisor-based secure virtualization architecture. In Proceedings of the Fifth European Conference on Computer Systems. ACM Press, New York, 2010, 209-222.
30. Thibault, S. and Deegan, T. Improving performance by embedding HPC applications in lightweight Xen domains. In Proceedings of the Second Workshop on System-Level Virtualization for High-Performance Computing. ACM Press, New York, 2008, 9-15.
31. University of New South Wales and NICTA. seL4. http://www.ertos.nicta.com.au/research/sel4/
32. VMware. Benefits of Virtualization with VMware; http://www.vmware.com/virtualization/virtualization-basics/virtualization-benefits.html
33. VMware. VMware hypervisor: Smaller footprint for better virtualization solutions; http://www.vmware.com/virtualization/advantages/robust/architectures.html
34. Wang, Z. and Jiang, X. Hypersafe: A lightweight approach to provide lifetime hypervisor control-flow integrity. In Proceedings of the 2010 IEEE Symposium on Security and Privacy. IEEE Computer Society, Washington, D.C., 2010, 380-395.
35. Wang, Z., Wu, C., Grace, M., and Jiang, X. Isolating commodity hosted hypervisors with HyperLock. In Proceedings of the Seventh ACM European Conference on Computer Systems. ACM Press, New York, 2012, 127-140.
36. Wilkes, J., Mogul, J., and Suermondt, J. Utilification. In Proceedings of the 11th ACM SIGOPS European Workshop. ACM Press, New York, 2004.
37. Wojtczuk, R. A stitch in time saves nine: A case of multiple OS vulnerability. Presented at Black Hat USA 2012; http://media.blackhat.com/bh-us-12/Briefings/Wojtczuk/BH_US_12_Wojtczuk_A_Stitch_In_Time_WP.pdf
38. Wojtczuk, R. and Rutkowska, J. Following the White Rabbit: Software Attacks against Intel VT-d Technology, 2011; http://www.invisiblethingslab.com/resources/2011/SoftwareAttacksonIntelVT-d.pdf
39. Zhang, F., Chen, J., Chen, H., and Zang, B. Cloudvisor: Retrofitting protection of virtual machines in multi-tenant cloud with nested virtualization. In Proceedings of the 23rd ACM Symposium on Operating Systems Principles. ACM Press, New York, 2011, 203-216.
40. Zhang, Y., Juels, A., Reiter, M.K., and Ristenpart, T. Cross-VM side channels and their use to extract private keys. In Proceedings of the 2012 ACM Conference on Computer and Communications Security. ACM Press, New York, 2012, 305-316.

Mihir Nanavati (mihirn@cs.ubc.ca) is a Ph.D. student in the Department of Computer Science at the University of British Columbia, Vancouver.

Patrick Colp (pjcolp@cs.ubc.ca) is a Ph.D. student in the Department of Computer Science at the University of British Columbia, Vancouver.

William Aiello (aiello@cs.ubc.ca) is a professor in the Department of Computer Science at the University of British Columbia, Vancouver.

Andrew Warfield (andy@cs.ubc.ca) is an assistant professor in the Department of Computer Science at the University of British Columbia, Vancouver.

Copyright held by Author/Owner(s). Publication rights licensed to ACM. $15.00
