
Bash on Ubuntu on macOS

Takaya Saeki
Department of Computer Science, The University of Tokyo
takaya.saeki@is.s.u-tokyo.ac.jp

Yuichi Nishiwaki
Department of Computer Science, The University of Tokyo
nyuichi@is.s.u-tokyo.ac.jp

Takahiro Shinagawa
Information Technology Center, The University of Tokyo
shina@ecc.u-tokyo.ac.jp

Shinichi Honiden
Department of Computer Science, The University of Tokyo
National Institute of Informatics
honiden@nii.ac.jp

ABSTRACT
Linux is a popular operating system (OS) for production environments, while many developers prefer to use macOS for their daily development. One way to deal with this situation is to run Linux in a virtual machine; another is to port development environments from Linux to macOS. However, using a virtual machine has a resource sharing problem, and porting environments is costly and often incomplete. A promising approach to low-cost and seamless resource sharing is to develop a Linux compatibility layer for macOS. Unfortunately, existing methods of implementing OS compatibility layers lack robustness or flexibility. In this paper, we propose a new architecture of OS compatibility layers. It allows user-space implementation of the core emulation layer in the host OS to improve robustness while maintaining the flexible and powerful emulation ability without heavily depending on the host OS kernel by exploiting virtualization technology. We implemented our approach and confirmed that Ubuntu’s userland runs on macOS. Our experimental results show that our approach has reasonable performance for real-world applications.

KEYWORDS
Operating System Compatibility, Virtualization

ACM Reference format:
Takaya Saeki, Yuichi Nishiwaki, Takahiro Shinagawa, and Shinichi Honiden. 2017. Bash on Ubuntu on macOS. In Proceedings of APSys ’17, Mumbai, India, September 2, 2017, 8 pages.
https://doi.org/10.1145/3124680.3124733

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
APSys ’17, September 2, 2017, Mumbai, India
© 2017 Copyright held by the owner/author(s). Publication rights licensed to Association for Computing Machinery.
ACM ISBN 978-1-4503-5197-3/17/09...$15.00

1 INTRODUCTION
Linux is one of the most popular operating systems (OSs). It is widely used not only as a desktop environment but also as a production environment. For example, 37% of the top 10 million websites were hosted by Linux [21], and more than 90% of 371,132 Amazon EC2 instances were Linux (56.4% were Ubuntu) [15]. Therefore, many real-world applications are developed for Linux and there exist a large number of Linux binaries and distributions. On the other hand, a certain number of software developers prefer to use macOS instead of Linux as a development environment [17]. For these software developers, there is a huge gap between the production environment and the development environment.

To fill the gap, two different approaches are taken. One is to install Linux in a virtual machine (VM) on macOS. However, resource sharing between the guest and host OSs is difficult because resources are managed by the guest and host OS separately. For example, the guest and host file system trees are different. Inter-process communication (IPC) between guest and host processes is not supported; therefore, pipe-based communication, for example, is impossible. Memory management is also performed by the guest OS and the host OS independently, and therefore users have to decide how much memory they “give” to a VM. The other approach is to port applications and development environments from Linux to macOS. Although various kinds of tools initially developed for Linux have been ported to macOS, porting software is very costly and often incomplete. For example, Valgrind, a popular dynamic analysis tool, required almost a year to be ported to macOS Sierra.

Apart from Linux on macOS, there exists a third approach: using an OS compatibility layer. It allows applications implemented for one OS to run natively on another OS with less effort. This approach solves the problems of the above approaches. Since the OS compatibility layer absorbs the environment differences, developers are not burdened with porting efforts. In addition, since a guest application’s resources are managed by the host OS, resource sharing between the guest and host OS is achieved smoothly. For example, a guest application can use the current free memory of the host OS as much as possible, rather than a fixed
amount of memory pre-allocated to the VM. After the application terminates, the used memory is freed naturally, rather than being kept in the VM by the guest kernel.

Windows Subsystem for Linux (WSL, also known as Bash on Ubuntu on Windows) [8] and Linuxulator [9] are examples of this approach. They enable unmodified Linux applications to run on Windows and FreeBSD, respectively. They are both in-kernel subsystems that handle Linux system calls at the privileged level. Since privileged software has full control of software interrupts and page tables, they have enough flexibility to increase compatibility as much as the original kernel. However, they still have a problem. They lack robustness since in-kernel subsystems are often unstable and they are not isolated from the host kernel. In fact, WSL sometimes causes the blue screen of death of Windows [7].

The easiest way to realize such robustness is to implement OS compatibility layers in user space. Cygwin [2] and MinGW [16] are such products. These compatibility layers are implemented purely in user space; therefore, bugs or crashes are safely isolated from the host kernel. However, the fact that they live in user space causes another big problem: inflexibility due to the lack of kernel privilege. Since they do not have privileged abilities such as page table management or interrupt handling, they give up binary compatibility and adopt API compatibility. Therefore, they require the guest OS’s applications to be recompiled with their tool-chains. They also have performance issues. Taking Cygwin for example, its fork implementation performs poorly because it cannot use the copy-on-write technique. MinGW gives up full compliance with the Linux kernel to gain performance. This observation indicates that robustness and flexibility are in a trade-off relationship in traditional approaches.

Library OSs, such as OSv [12], implement a kind of OS compatibility layer that allows a guest binary to run on a VM interface. However, they do not allow seamless communication between the guest and host processes. NOVA [18] exploits a microkernel-like approach to run a part of a virtual machine monitor (VMM) in user space to improve robustness. However, it offers a low-level machine interface rather than an abstract system-call-level interface; therefore, its implementation becomes more complicated than OS compatibility layers. Barrelfish [3] uses user-level monitor processes to implement system calls. However, its architecture does not use a host OS; therefore, the OS functionalities must be implemented from scratch.

In this paper, we present a new architectural design of OS compatibility layers. This design realizes both robustness and flexibility by utilizing virtualization technology. In our execution model, an individual VM is launched per guest process, and a guest binary runs in a VM without the OS kernel. System calls issued by the guest process are trapped and emulated in a host process, called a monitor process, created for each guest process. The monitor process issues host system calls to emulate guest system calls, and leverages virtualization technology to trap software interrupts and manipulate page tables for the guest process. This design allows most of the emulation layer to be implemented in a user-space host process, while having the flexible and powerful emulation ability to achieve full binary compatibility. It also achieves seamless communication between guest and host processes and high portability of the emulation layer.

We implemented a Linux compatibility layer for macOS, called Noah, based on our proposed design. Noah can run unmodified ELF binaries for x86-64 Linux 4.6 on macOS 10.12 Sierra. We confirmed that the userlands of Ubuntu 16.04 and Arch Linux run on Noah. We implemented emulation for many Linux subsystems such as process management, memory management, virtual file systems, networks and signals. Noah currently supports 172 out of 329 Linux system calls. Although the implementation is still in progress, Noah can build Linux kernels and run several X11 applications. Noah uses Hypervisor.framework [1] for its virtualization component, so we do not need to modify the macOS kernel.

Our experimental results showed that the overhead of Linux kernel build time on Noah was around 7.2% and the exec system call was 2.4 times faster than that of macOS.

This paper is organized as follows. Section 2 shows related work. Section 3 explains the architectural design of our approach and Section 4 describes the implementation of our Linux compatibility layer for macOS. Section 5 presents experimental results. Section 6 summarizes this paper.

2 RELATED WORK
Xax [6] abstracts an execution environment of native code as a lightweight process, called a picoprocess. A picoprocess is created and mediated by an OS-specific monitor program, and communicates with it via highly restricted system calls. To set up a restricted execution environment, a picoprocess has a boot loader and trampoline code inside it to communicate with the monitor. A picoprocess is similar to our guest process in that system calls are mediated by a host process. However, the boot loader and trampoline code inside the picoprocess require an ahead-of-time or just-in-time patching procedure to restrict system calls. The implementation of picoprocesses is also different from ours. Their Linux implementation uses ptrace to restrict system calls, suffering from a performance hit and complicated memory management because ptrace does not allow direct manipulation of the memory map of the target process. Their Windows implementation uses a kernel driver to mediate system calls, suffering from kernel dependency and reduced robustness. Our monitor processes can directly trap system calls and other privileged events without depending on the host kernel by exploiting a hardware-based virtualization technology.
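
To make the trap-and-emulate idea concrete, the following is a minimal sketch in Python, not Noah's implementation. It assumes a hypothetical `GuestTrap` record standing in for one trapped system call, and a list of such records standing in for the sequence of VM exits that a real monitor process would receive from the VMM module. The names `GuestTrap`, `emulate_syscall`, and `run_monitor` are illustrative only; the system call numbers are the Linux x86-64 ones.

```python
import os
from dataclasses import dataclass

# Linux x86-64 system call numbers used by this sketch.
SYS_WRITE, SYS_GETPID, SYS_EXIT = 1, 39, 60
ENOSYS = 38  # Linux errno value for an unimplemented system call


@dataclass
class GuestTrap:
    """One trapped guest system call (hypothetical type for illustration)."""
    nr: int
    args: tuple = ()


def emulate_syscall(trap):
    """Emulate a trapped Linux system call by issuing host system calls."""
    if trap.nr == SYS_GETPID:
        return os.getpid()  # simple pass-through to the host call
    if trap.nr == SYS_WRITE:
        fd, data = trap.args
        try:
            return os.write(fd, data)
        except OSError as err:
            # A real layer would also translate host errno values to Linux ones.
            return -err.errno
    return -ENOSYS  # unsupported call: report "not implemented" to the guest


def run_monitor(traps):
    """Stand-in for the monitor loop: each item plays the role of one VM exit."""
    results = []
    for trap in traps:
        if trap.nr == SYS_EXIT:
            return results, trap.args[0]  # guest asked to exit with this status
        results.append(emulate_syscall(trap))
    return results, None
```

In this toy model the value returned by `emulate_syscall` corresponds to what a monitor process would place in the guest's return register before resuming the VM; everything about VM creation and resumption is deliberately left out.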
Embassies [10] extends picoprocesses by adding rich functions such as the IP protocol and user interface APIs. A successor work [11] proposed an architecture to run POSIX applications in it. In this architecture, a POSIX emulator running inside a picoprocess offers the POSIX ABI. The POSIX emulator consists of several subsystems such as the virtual file system and IP multiplexer, and is implemented based on the Embassies ABI. This approach needs a large emulation layer to realize the POSIX ABI on top of the Embassies ABI, which is narrower than normal system calls. Embassies inherits the same benefits and drawbacks as the picoprocess work.

Foreign LINUX (flinux) [20] is emulation software that runs unmodified Linux binaries on Windows. It performs binary translation on Linux binaries to allow a user-space implementation without losing flexibility. In flinux, system calls are intercepted via translated trampoline code. However, memory layout configuration is not so flexible because a guest process shares the memory space with the corresponding host process. Additionally, it is significantly slower than ours due to the online scan-and-patch process.

Dune [4] resembles our work in that both run guest programs in VMs with higher-level interfaces than the machine architecture. However, their goals are different; Dune aims at providing user programs direct access to hardware features, whereas we emulate the kernel interface of a different OS. We only use ring 3 in VMX non-root mode for running the guest process, whereas Dune consists of processes running in different rings and VMX modes.

Multiverse [13] has a goal similar to that of Dune. It gives Linux applications the ability to utilize a privileged Hybrid Runtime (HRT) environment. Multiverse leverages a VMM to trap privileged operations and emulate Linux behaviors in HRT just like our architecture. However, Multiverse has no host OS under the VMM and emulates privileged operations with the actual Linux kernel that runs cooperatively with the HRT. In contrast, our architecture handles them in the host OS to realize resource sharing and seamless communication.

Barrelfish [3] is an OS that adopts the multikernel model. Barrelfish has a similar architecture to our work in that the core kernel component is implemented as multiple user-space processes called monitor processes. However, Barrelfish adopts the message passing model for IPC to improve scalability on multi-core and heterogeneous systems, whereas our architecture uses the traditional shared memory model to improve communication performance between a guest process and the OS compatibility layer.

NOVA [18] is a redesign of VMMs from the viewpoint of microkernels. Both NOVA and our work put complicated components, such as page table management, in user space in order to improve robustness. However, they differ in that NOVA only isolates complex parts of VMMs from the host kernel, whereas we isolate all kernel components from the guest kernel and put them in a host process.

OSv [12] and our work are similar in that both aim at constructing a lightweight Linux kernel interface from scratch. However, their goals and implementations are distinct. On one hand, OSv is purely an operating system for VMs. OSv focuses on performance improvement rather than compatibility; it even exposes non-POSIX interfaces to user programs and is optimized to run faster with executables specially modified for OSv. On the other hand, ours is not an operating system but an OS compatibility layer. Its main aim is to accomplish full compatibility with Linux while giving up as little performance as possible.

3 DESIGN
Figure 1 shows the design of our OS compatibility layer. It consists of three components: a VMM module, guest VMs, and monitor processes. The VMM module is a component in the host OS kernel that provides a VM management interface to user-space applications. We can exploit several OS-standard VMM modules; for example, Linux has KVM, FreeBSD has vmmapi, and macOS has Hypervisor.framework. The guest VMs work as containers of guest binaries and are used to trap access from guest processes. The monitor processes are regular processes of the host OS that emulate system calls and manage VMs through the VMM module.

Figure 1: The design of our OS compatibility layer. [Diagram: in the host OS, a user-space monitor process loads and manages a guest VM and emulates the system calls trapped from the guest process; the VMM module in the host kernel up-calls the monitor process; the guest VM contains only the guest process in user space and no kernel.]

A guest application is executed in the following way. First, a monitor process corresponding to the guest application is launched. It creates a VM through the VMM module and loads the guest binary image into the memory space of the VM. Then, the monitor process asks the VMM module to start execution of the VM from the entry point of the guest binary, thereby passing control to the guest process. While the guest process is running, it will issue system calls. These system calls are trapped by the VMM module and the VMM module up-calls the monitor process. The monitor process emulates the trapped system calls by using the system calls of the host OS. The monitor process then returns the result
of emulated system calls to the VM by way of the VMM module, and the control is returned to the guest process.

In our design, a guest process runs in a VM and a monitor process is created for each VM. Therefore, when a guest process tries to create another process, the corresponding monitor process first creates another monitor process, and the new monitor process creates a new VM. In the case of processes having a parent-child relationship like in UNIX, the new monitor process clones the VM state of the original guest process and sets that state in the newly created VM. Then, the original monitor process returns the result of the process creation to the original guest process, and the new monitor process passes the control to the new guest process.

One advantage of this design is that process scheduling can be left to the host OS. A guest process gets control when the corresponding monitor process is scheduled by the host OS; therefore, in effect, the guest process is scheduled by the host OS as an ordinary process. This allows fair scheduling among multiple guest and host processes. Another advantage is that resource sharing between guest and host processes becomes seamless. For example, they can share a file system tree because both processes access the file system of the host OS. They can also use a pipe to communicate with each other. A pipe access from the guest process is converted to that of the host OS by the monitor process, and the communication is handled by the host OS. Therefore, the guest and host processes can communicate with each other as if they were running on the same OS, not on different OSs.

An important characteristic of our design is that OS compatibility layers can be implemented in user space. This characteristic leads to two advantages. The first is robustness. Existing OS compatibility layers with ABI compatibility are usually implemented in kernel space; therefore, a bug in this layer might cause a kernel crash. In contrast, bugs in the monitor process in our design do not lead to kernel crashes. Although the VMM module runs in the kernel, it is relatively robust because it is small and well-maintained as a part of the standard kernel. The second is portability. The monitor process is implemented as a user-space process and is loosely coupled with the internal APIs of the host OS kernel. User-space processes are relatively easy to port to another OS, and they can also use any cross-platform utility libraries or even high-level languages such as Rust, Go, and Ruby.

While our design gains robustness by implementing OS compatibility layers in user space, it also has the flexible and powerful emulation ability thanks to virtualization technology. By running a guest process inside a VM, system calls and other privileged events such as page faults are trapped by the hardware and the control is passed to the monitor process via the VMM module. The monitor process has total control of the memory layout with the ability to manipulate page tables. Therefore, it is possible to implement complex memory management of the target OS such as copy-on-write page mappings between processes. Therefore, this approach can achieve ABI compatibility without requiring modifications to either guest processes or host kernels.

4 IMPLEMENTATION
We have implemented a Linux compatibility layer for macOS based on our design. The implementation, Noah, targets x86-64 Linux 4.6 and macOS 10.12 Sierra or later. For the virtualization foundation, we utilized Hypervisor.framework, a built-in library of macOS that provides a set of user-space APIs to create and manage VMs. By relying on the built-in library of the OS, we could avoid writing kernel modules, improving robustness and reducing implementation costs. Note that macOS runs only on Intel CPUs, not on AMD CPUs.

4.1 Boot Process
A guest Linux process is created when the noah command is executed or a guest process issues a fork(2) system call. When a guest process is being created, the monitor process first launches a new VM using Hypervisor.framework. Then, to avoid injecting custom boot code into the VM, the monitor process manipulates the VM registers so that the VM directly enters x86-64 long mode. Additionally, to prevent any code from running in privileged mode in the VM, the monitor process initializes some control registers (such as CR0 and CR4) and model-specific registers (including IA32_EFER) with empty settings. In particular, to trap system calls issued by the guest Linux process, the IA32_EFER.SCE bit is cleared, thereby disabling the SYSCALL instruction.

In x86-64, some system registers need to hold physical memory addresses of in-memory data structures (e.g., page tables and segment descriptor tables). To allocate such data structures, we reserved a region of 1 GB in the physical address space in the VM, which is not mapped in user space. The monitor process allocates data structures from this region and initializes them with empty settings, except for page tables (see Section 4.3 for details of memory management).

4.2 ELF Loader
We implemented our own ELF loader to load Linux ELF executable files into VMs. The loader is invoked just after the noah command is executed or a guest Linux process issues the execve(2) system call. When the loader is given a path to a Linux ELF executable file, it first opens the Linux ELF loader file (ld.so) through the virtual file system in the monitor process (see Section 4.5 for details). It then uses the internal version of mmap() to map the content of the loader file into the guest Linux address space in the VM. After setting up the execution environment of ld.so, it passes the control to ld.so with the Linux ELF executable file as
an argument. Finally, ld.so loads the Linux ELF executable, constructs memory segments, and resolves dynamically linked libraries in the emulated environment as usual.

The ELF loader also supports the setuid bit. If the monitor process is executed as root and the setuid bit of the target ELF executable file is set, the monitor process changes the effective user ID of the guest Linux process to that of the file owner. For example, a setuid-root Linux command can write to files owned by root. Note that guest Linux processes share the same user name and ID with the host macOS.

4.3 Memory Management
Since we use a VM to run a Linux guest process, we need to manage page tables by ourselves. In general, a VM involves two page tables: the VM’s page table and the Extended Page Table (EPT). To avoid the cost of handling two page tables, we should fix one page table and manipulate only the other. Which one should be fixed is a design choice.

We chose to fix the VM’s page table and manipulate the EPT. One reason for this is performance. When a VM is switched to another, the TLB of the VM’s page table is flushed because the page table is changed. On the other hand, the TLB of the EPT is not flushed on VM switches if the tagged TLB for EPT, called Virtual Processor ID (VPID), is supported. Therefore, we can reduce the number of page walks by using huge pages in the VM’s page table. Another reason is to make debugging easier. At the early stage of development, we designed a VM and its monitor process to share the same mapping from virtual addresses to physical addresses for a particular region. To do so in the VM’s page table, we need to obtain the physical address corresponding to the virtual address. Unfortunately, macOS does not support such an operation. On the other hand, Hypervisor.framework supports an API to set the physical address of an EPT entry by specifying a virtual address of the monitor process.

In the VM’s page table, we use straight (identity) mapping where the virtual address is identical to the physical address. This mapping is simple but has a limitation due to the hardware bus width. The physical address width of the current Intel CPU series is no more than 39 bits, whereas the virtual address width is 48 bits. Therefore, we cannot map the upper 9 bits of the virtual address space in a VM with this mapping. Fortunately, current Linux does not use this part. In addition, we have never observed any application exhausting 512 GB (i.e., the maximum addressable size with 39 bits) of virtual address space. The exception is the top 1-GB region. No physical page is mapped here in the virtual address space to hide it from user space. It is used to locate system data structures such as page tables and segment descriptor tables (described above in Section 4.1). Consequently, the VM’s page table has a 511-GB straight mapping and a 1-GB empty mapping.

The implementation of the other parts of the memory management subsystem is surprisingly simple. We manage memory regions with vm_area_struct structures like Linux. When a guest Linux process issues mmap(2) or other memory-related system calls, the monitor process manipulates them as well as the EPT by way of Hypervisor.framework. Since the API of Hypervisor.framework to manage EPTs accepts a virtual address, physical page management can be left to macOS.

To support multiple guest processes, monitor processes need to communicate with each other through an IPC mechanism. We chose to use our own shared memory allocator, mainly for performance improvement. When the noah command starts, we pre-allocate a memory region of a few gigabytes using mmap(2) with the MAP_SHARED flag in the monitor process. Data structures to be shared among monitor processes are allocated from it. Note that the pre-allocated buffer consumes little memory owing to lazy page allocation.

4.4 Process Management
We implemented a subset of the clone(2) system call; only simple forking and thread creation are supported. Unfortunately, Hypervisor.framework does not support a fork of a process holding a VM, so we need a workaround. The actual handling of a fork is as follows. First, the monitor process saves the current VM state and destroys the VM. Second, the monitor process forks. Third, each of the two processes launches a new VM. Finally, they restore the saved state and start the execution. We synchronize these VM restarts using condition variables to avoid race conditions. Thread creation is straightforward because Hypervisor.framework provides such APIs. Process-specific data, such as memory region mappings, futex structures, signal handlers, and vfs structures, are stored in the shared memory area (see Section 4.3).

4.5 Virtual File System
File access from guest Linux processes is basically forwarded to the host macOS file system. However, to support Linux-style file system trees, we emulate the virtual file system (VFS) of Linux in the monitor process. The VFS consists of a path translator and an object-oriented programming (OOP) component. The path translator converts a virtual path to the host path by resolving symbolic links and virtual mount points. The OOP component provides an interface to install custom file systems. The VFS allows us to expose macOS’s root file system to Linux programs without breaking the Linux user space. Also, virtual file systems like sysfs and procfs can be implemented upon this facility. The implementation of the VFS is designed to be independent of the host OS architecture for the most part. This design significantly reduces the cost of porting Noah to platforms other than macOS.
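
As an illustration of what a path translator of this kind does, here is a minimal Python sketch under stated assumptions; it is not Noah's code. It assumes the virtual tree is described by two plain dictionaries, one for virtual mount points and one for virtual symbolic links, and it never touches the real file system. The class name `PathTranslator`, its methods, and the example paths (such as `/opt/noah/tree`) are hypothetical.

```python
import posixpath


class PathTranslator:
    """Toy virtual-to-host path translator (hypothetical, for illustration)."""

    def __init__(self, mounts, symlinks=None, max_links=40):
        # mounts: virtual mount point -> host directory (assumed layout)
        self.mounts = {posixpath.normpath(k): v for k, v in mounts.items()}
        # symlinks: virtual path of a link -> link target (absolute or relative)
        self.symlinks = dict(symlinks or {})
        self.max_links = max_links  # guard against symbolic link loops

    def resolve_links(self, virtual_path):
        """Resolve virtual symbolic links one path component at a time."""
        pending = [p for p in virtual_path.split("/") if p]
        resolved = "/"
        hops = 0
        while pending:
            part = pending.pop(0)
            if part == ".":
                continue
            if part == "..":
                resolved = posixpath.dirname(resolved)
                continue
            candidate = posixpath.join(resolved, part)
            target = self.symlinks.get(candidate)
            if target is None:
                resolved = candidate
                continue
            hops += 1
            if hops > self.max_links:
                raise OSError("too many levels of symbolic links")
            if target.startswith("/"):
                resolved = "/"  # absolute link: restart from the virtual root
            # Splice the link target in front of the remaining components.
            pending = [p for p in target.split("/") if p] + pending
        return resolved

    def to_host(self, virtual_path):
        """Map a virtual path to a host path via the longest matching mount."""
        path = self.resolve_links(virtual_path)
        for mount in sorted(self.mounts, key=len, reverse=True):
            if mount == "/" or path == mount or path.startswith(mount + "/"):
                rest = path[len(mount):].lstrip("/")
                host_root = self.mounts[mount]
                return posixpath.join(host_root, rest) if rest else host_root
        raise FileNotFoundError(virtual_path)
```

With the virtual root mounted on a host directory and `/Users` passed through, a call such as `to_host("/bin/sh")` first follows the virtual links and then prefixes the host directory of the longest matching mount point. Resolving links before mount lookup is one possible ordering; a real VFS would interleave the two, since a link target can itself cross a mount point.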
4.6 Other Subsystems
Most of the other Linux system calls are passed through to macOS with flag conversion and structure adjustment. For example, the monitor process handles getpid(2) by simply calling the equivalent system call of macOS and returning the value to the guest process. Most signals sent to a monitor process are routed to the corresponding VM with proper conversion. Signal handlers of the monitor process just record the arrivals of the signals. Before the next entry to the VM, the monitor process checks the records, and if signal arrivals are recorded, it creates a new signal stack frame in the VM and sets the IP to the registered signal handler.

5 EVALUATION
This section evaluates the performance and compatibility of Noah. For performance, we ran a couple of benchmarks on Noah and on native macOS. The benchmarks are divided into two groups: macro benchmarks and micro benchmarks. The former shows the overall performance of Noah in real-world applications, and the latter shows the overhead on system calls. For compatibility, we describe the current compatibility status of Noah with the Linux kernel.

We carried out all evaluations on a MacBook Pro (Early 2015) with a 2-core / 4-thread 3.1 GHz Intel Core i7, 16 GB of DDR3 memory, and a 512 GB SSD. It runs the userland of Ubuntu 16.04 on macOS Sierra 10.12.5. Noah’s git commit revision is 2bfd3bb4244d9b171091ee21188c2048fff93be0.

Before the evaluations, we need to mention an important bug in the current Hypervisor.framework. It is a performance degradation bug in VM creation speed. This bug slows down VM creation as more VMs are created. In addition to the creation speed, kernel_task, a core kernel thread of macOS, consumes more memory and never releases it, and eventually macOS freezes. Therefore, we suspect that this is a memory leak bug. Due to this bug, the performance of fork on Noah degrades as processes are forked. Therefore, the measured results are worse than they would be without the bug. This will be fixed in a future version of Hypervisor.framework.

5.1 Macro Benchmarks
We measured the performance of Dhrystone from UnixBench [14] and of compress-7zip, sqlite, and postmark from Phoronix Test Suite [19]. We ran all benchmarks from UnixBench on a single core because some benchmarks became unstable with multiple cores on Noah. Other benchmarks ran on 4 cores. Dhrystone and compress-7zip are CPU-bound, and sqlite and postmark are I/O-bound. We also measured the Linux kernel build time on a single core. The Linux kernel is based on version 3.4.113. On macOS, we manually modified the kernel so that its minimal configuration can be compiled with cross tool chains. The kernel configuration is “allnoconfig”. We built “vmlinux” instead of the usual “bzImage” because even more troublesome work was needed to build the latter on macOS.

Table 1: Macro benchmark results on macOS and Noah

Benchmark name         macOS        Noah         Overhead
Dhrystone (LPS)        34438960.2   38265208.1   -10%
compress-7zip (MIPS)   9724         9277         4.80%
sqlite (sec)           3.55         4.11         15.8%
postmark (TPS)         2308         929          148%
kernel build (sec)     106.2        113.8        7.2%

Table 1 shows the result. The units are Loops Per Second (LPS) for Dhrystone, Million Instructions Per Second (MIPS) for compress-7zip, Transactions Per Second (TPS) for postmark, and seconds for sqlite and kernel build. For LPS, MIPS, and TPS, higher is better, and for seconds, lower is better. The results show that Noah incurred low overheads on CPU-bound applications. This is reasonable because the Linux process runs directly on processors in hardware-assisted VMs. We are not sure why Dhrystone is faster on Noah than on macOS, but it might be caused by the binary differences. On the other hand, I/O-bound applications had relatively large overheads. This would be caused by system call emulation (see Section 5.2 for details). In particular, postmark incurred a high overhead of 148%, but this benchmark uses a server workload for production environments and we believe this is an acceptable overhead in development environments. The overhead of the Linux kernel build is within a reasonable range (7.2%). This suggests that Noah incurs reasonable overhead.

5.2 Micro Benchmarks
We ran most benchmarks from UnixBench that could run on Noah. In addition, we wrote micro benchmarks for “fork + exec” and “System call” by ourselves. “fork + exec” measures the number of fork() and exec() system calls executed in a second. “System call” measures the number of getpid() system calls, without caching, executed in a second.

Table 2 shows the results. They show that “System call” incurred the highest overhead (1121%). This is the expected result because Noah incurs six context switches for each system call: the guest VM to the VMM, the VMM to the monitor process, the monitor process to the host kernel, and the three reverse directions. This benchmark measured only the context switch overhead and was the worst-case scenario. On the other hand, in “execl”, Noah outperforms macOS (2.4 times faster) since “execve” in Noah just replaces the contents of VMs without depending on macOS while macOS
Table 2: Micro benchmark results on macOS and Noah

Benchmark name                            macOS       Noah        Unit  Overhead
System call                               8279169.8   738597.3    LPS   1121%
fork + exec                               408.7       393.9       LPS   3.77%
UnixBench/execl                           546.2       1325.1      LPS   -58.8%
UnixBench/file read 1024 bufsize * 2000   1698850.0   455603.0    KBps  273%
UnixBench/file read 4096 bufsize * 8000   4062016.5   1468246.2   KBps  177%
UnixBench/file write 1024 bufsize * 2000  1502291.3   445372.9    KBps  137%
UnixBench/file write 4096 bufsize * 8000  4271771.7   1530728.9   KBps  179%
UnixBench/pipe                            1209016.7   246907.7    LPS   390%
UnixBench/pipe based context switching    175200.2    75834.3     LPS   131%
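
As a rough cross-check of Table 2, the following Python sketch recomputes the Overhead column from the two throughput columns. The paper does not state the formula, so the definition used here (macOS throughput divided by Noah throughput, minus one) is an assumption, and the function names are hypothetical. Under that assumption most rows match the reported column to within rounding, while “System call” and “file write 1024” come out near 1021% and 237%, so the reported values for those two rows may rest on a different calculation. The last function expresses the bound discussed for the file benchmarks in Section 5.2: throughput in KBps cannot exceed the system call rate times the buffer size.

```python
# Throughput figures copied from Table 2 (higher is better):
# name -> (macOS, Noah, overhead in % as reported in the table).
TABLE2 = {
    "System call": (8279169.8, 738597.3, 1121.0),
    "fork + exec": (408.7, 393.9, 3.77),
    "UnixBench/execl": (546.2, 1325.1, -58.8),
    "UnixBench/file read 1024 bufsize * 2000": (1698850.0, 455603.0, 273.0),
    "UnixBench/file read 4096 bufsize * 8000": (4062016.5, 1468246.2, 177.0),
    "UnixBench/file write 1024 bufsize * 2000": (1502291.3, 445372.9, 137.0),
    "UnixBench/file write 4096 bufsize * 8000": (4271771.7, 1530728.9, 179.0),
    "UnixBench/pipe": (1209016.7, 246907.7, 390.0),
    "UnixBench/pipe based context switching": (175200.2, 75834.3, 131.0),
}


def overhead_percent(macos: float, noah: float) -> float:
    """Assumed definition: relative slowdown of Noah, from throughput ratios."""
    return (macos / noah - 1.0) * 100.0


def recompute(table=TABLE2):
    """Return {benchmark: (recomputed overhead, reported overhead)}."""
    return {
        name: (overhead_percent(macos, noah), reported)
        for name, (macos, noah, reported) in table.items()
    }


def syscall_bound_kbps(syscalls_per_sec: float, bufsize_bytes: int) -> float:
    """Throughput ceiling in KBps if each read()/write() moves bufsize_bytes."""
    return syscalls_per_sec * bufsize_bytes / 1024.0


if __name__ == "__main__":
    for name, (calc, reported) in recompute().items():
        print(f"{name:42s} recomputed {calc:8.1f}%  reported {reported:8.2f}%")
    # Ceiling for the 1024-byte file read case, using Noah's "System call" rate.
    print(syscall_bound_kbps(TABLE2["System call"][1], 1024))
```

With a 1024-byte buffer the ceiling equals the “System call” rate itself, and the measured 455603.0 KBps for Noah stays below it, which is consistent with the argument in the text.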

requires microkernel-based complicated process replacement and memory management operations.
“fork + exec” has almost the same performance on Noah and macOS (3.77% overhead). This indicates
that VM creation, snapshot, and restore for fork emulation are fast enough. Note that this number
will get even better if the performance degradation bug of Hypervisor.framework is fixed.

The “file” and “pipe” benchmarks showed modest overheads. Since we used an SSD, both benchmarks
were relatively CPU bound rather than I/O bound. Since Noah’s VFS is a thin emulation layer, the
overheads mostly came from those of system calls. In fact, the file benchmark measures file access
performance by calling the read() or write() system call many times. Therefore, the file access
speed is limited by the buffer size of each system call multiplied by the number of “System call”
executions per second. For example, in the file read experiment with a buffer size of 1024, the
performance in KBps equals the number of read() system calls issued per second. Since Noah can
issue system calls at most 738597.3 times per second, its upper limit is the same, although the
actual value is lower than that. There are non-negligible costs in file access and the other micro
benchmarks, but we also note that the macro benchmarks indicated that such overhead for
real-world applications was in an acceptable range.

5.3 Compatibility with Linux Kernel
How compatible Noah is with the Linux kernel is an important question. To measure the quality of
Linux compatibility layers, the Linux Test Project [5] can be used. It runs many test cases for
all system calls to make sure that the system implements the expected behaviors of system calls.
However, we did not carry out such experiments because the current implementation of Noah is
still at an early stage. The implementation lacks a large part of the full Linux kernel interface;
currently, 157 of 329 system calls are unimplemented. Hence, such experiments would not yet yield
insightful results. We leave a thorough quality evaluation to future work.

Although many features are still unimplemented, many real-world applications already work:
package managers such as apt and pacman; development tools such as gcc, vim, make, and Ruby; daily
commands such as bash and ls; and even X applications including xeyes, xfwrite, and DooM3. Network
tools like nc also work. Many more applications will work as the development proceeds.

6 SUMMARY AND FUTURE WORK
This paper described the design and implementation of a novel OS compatibility layer. Our design
improves robustness, compatibility, seamlessness, and portability by exploiting virtualization
technology. In our design, every virtual guest process runs inside a VM and the system calls are
trapped by the corresponding monitor process. The implementation, Noah, demonstrated that
unmodified Linux executables, including gcc and X applications, run on macOS.

The implementation still lacks a large part of the full Linux kernel interface. Some of the
missing parts will require hard work or new techniques to keep the implementation fast and
concise. For example, ptrace(2) is one such system call; it is left unimplemented because of
difficulties in implementing inter-process synchronization with satisfactory performance. We will
complete the implementation in our future work.

ACKNOWLEDGMENTS
This work is partly supported by Mitoh, a financial assistance program by the government of Japan
for outstanding young students and engineers.

AVAILABILITY
Noah is publicly available (under the dual MIT / GPL licenses) from
https://github.com/linux-noah/noah.
APSys ’17, September 2, 2017, Mumbai, India T. Saeki, Y. Nishiwaki, T. Shinagawa, and S. Honiden

REFERENCES
[1] Apple. 2017. Hypervisor | Apple Developer Documentation. https://developer.apple.com/documentation/hypervisor. (2017). [accessed 2017-06-14].
[2] Cygwin authors. 2017. Cygwin. https://www.cygwin.com. (2017). [accessed 2017-06-14].
[3] Andrew Baumann, Paul Barham, Pierre-Evariste Dagand, Tim Harris, Rebecca Isaacs, Simon Peter, Timothy Roscoe, Adrian Schüpbach, and Akhilesh Singhania. 2009. The Multikernel: A New OS Architecture for Scalable Multicore Systems. In Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles. ACM, 29–44. https://doi.org/10.1145/1629575.1629579
[4] Adam Belay, Andrea Bittau, Ali José Mashtizadeh, David Terei, David Mazières, and Christos Kozyrakis. 2012. Dune: Safe User-level Access to Privileged CPU Features. In Proceedings of the 10th USENIX Symposium on Operating Systems Design and Implementation (OSDI 2012). 335–348.
[5] LTP developers. 2012. LTP - Linux Test Project. https://linux-test-project.github.io/. (2012). [accessed 2017-06-17].
[6] John R. Douceur, Jeremy Elson, Jon Howell, and Jacob R. Lorch. 2008. Leveraging Legacy Code to Deploy Desktop Applications on the Web. In Proceedings of the 8th USENIX Symposium on Operating Systems Design and Implementation (OSDI 2008). 339–354.
[7] gyf304. 2016. MLton crashes and BSODs. https://github.com/Microsoft/BashOnWindows/issues/847. (2016). [accessed 2017-06-01].
[8] Jack Hammons. 2016. Windows Subsystem for Linux Overview. https://blogs.msdn.microsoft.com/wsl/2016/04/22/windows-subsystem-for-linux-overview/. (2016). [accessed 2017-06-14].
[9] Brian N. Handy, Rich Murphey, and Jim Mock. 2017. Chapter 10. Linux Binary Compatibility. https://www.freebsd.org/doc/handbook/linuxemu.html. (2017). [accessed 2017-06-14].
[10] Jon Howell, Bryan Parno, and John R. Douceur. 2013. Embassies: Radically Refactoring the Web. In Proceedings of the 10th USENIX Symposium on Networked Systems Design and Implementation (NSDI 2013). 529–545.
[11] Jon Howell, Bryan Parno, and John R. Douceur. 2013. How to Run POSIX Apps in a Minimal Picoprocess. In Proceedings of the 2013 USENIX Annual Technical Conference. 321–332.
[12] Avi Kivity, Dor Laor, Glauber Costa, Pekka Enberg, Nadav Har’El, Don Marti, and Vlad Zolotarov. 2014. OSv - Optimizing the Operating System for Virtual Machines. In Proceedings of the 2014 USENIX Annual Technical Conference. 61–72.
[13] Conor Hetland Kyle C. Hale and Peter Dinda. 2017. Multiverse: Easy Conversion of Runtime Systems into OS Kernels via Automatic Hybridization. In Proceedings of the 14th IEEE International Conference on Autonomic Computing (ICAC 2017).
[14] Kelly Lucas and developers. 1989. Byte-UnixBench. https://github.com/kdlucas/byte-unixbench. (1989). [accessed 2017-06-17].
[15] The Cloud Market. 2017. EC2 Statistics. http://thecloudmarket.com/stats. (2017). [accessed 2017-06-01].
[16] MinGW.org. 2017. MinGW | Minimalist GNU for Windows. https://www.mingw.org. (2017). [accessed 2017-06-14].
[17] Stack Overflow. 2016. Developer Survey Results. https://insights.stackoverflow.com/survey/2016. (2016). [accessed 2017-06-01].
[18] Udo Steinberg and Bernhard Kauer. 2010. NOVA: A Microhypervisor-Based Secure Virtualization Architecture. In Proceedings of the 5th European Conference on Computer Systems (EuroSys 2010). 209–222. https://doi.org/10.1145/1755913.1755935
[19] Phoronix Test Suite. 2017. Phoronix Test Suite. https://www.phoronix-test-suite.com/. (2017). [accessed 2017-06-17].
[20] Xiangyan Sun. 2015. Foreign LINUX - Run unmodified Linux applications inside Windows. https://github.com/wishstudio/flinux. (2015). [accessed 2017-06-14].
[21] W3Techs. 2017. World Wide Web Technology Surveys. https://w3techs.com/. (2017). [accessed 2017-06-01].
