
Cilium: Networking and security for containers with BPF and XDP

Wednesday, November 2, 2016


This is a guest post by Daniel Borkmann, who was recently recognized through the Google Open Source Peer Bonus program for his work on the Cilium project. We invited Daniel to share his project on our blog.
Our open source project, called Cilium, started as an experiment for Linux container networking tackling four requirements:
Scale: How can we scale in terms of addressing and with regard to network policy?
Extensibility: Can we be as extensible as user space networking in the Linux kernel itself?
Simplicity: What is an appropriate abstraction away from traditional networking?
Performance: Do we sacrifice performance in the process of implementing the aforementioned aspects?
We realize these goals in Cilium with the help of eBPF. eBPF is an efficient and generic in-kernel bytecode engine that allows for full programmability. Many subsystems in the Linux kernel utilize eBPF, mainly in the areas of networking, tracing and security.
eBPF can be attached to key ingress and egress points of the kernel's networking data path for every network device. As input, eBPF operates on the kernel's network packet representation and can thus access and mangle various kinds of data, redirect the packet to other devices, perform encapsulations, etc.
This is a typical workflow: eBPF is programmed in a subset of C and compiled with LLVM, which contains an eBPF back end. LLVM then generates an ELF file containing program code, specifications for maps and related relocation data. In eBPF, maps are efficient key/value stores in the kernel that can be shared among eBPF programs and also with user space. Given the ELF file, tools like tc (traffic control) can parse its content and load the program into the kernel. Before the program is executed, the kernel verifies the eBPF bytecode to make sure that it cannot affect the kernel's stability (e.g. by crashing the kernel or accessing memory out of bounds) and that it always terminates, which requires programs to be free of loops. Once it has passed verification, the program is JIT (just-in-time) compiled.
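To make the loop-free requirement concrete, here is a minimal sketch in Python of a toy check that accepts a program only if every jump moves strictly forward. It is an illustration under simplifying assumptions, not the kernel's verifier: the `Insn` type, the three opcodes and the `verify` function are hypothetical, and the real verifier also tracks register state and memory accesses.

```python
from typing import List, NamedTuple, Optional


class Insn(NamedTuple):
    """One instruction of a toy program (not real eBPF bytecode)."""
    op: str                       # "alu", "jmp", or "exit"
    target: Optional[int] = None  # absolute index to jump to, for "jmp" only


def verify(prog: List[Insn]) -> bool:
    """Accept a program only if every path provably reaches "exit".

    Simplifying assumption: a jump is either taken or falls through, and
    any jump that does not move strictly forward is treated as a loop.
    """
    if not prog or prog[-1].op != "exit":
        return False  # execution could run off the end of the program
    for pc, insn in enumerate(prog):
        if insn.op == "jmp":
            if insn.target is None or not (0 <= insn.target < len(prog)):
                return False  # jump target is out of bounds
            if insn.target <= pc:
                return False  # back-edge: the program might never terminate
    return True
```

Because the program counter can only increase and the last instruction is an exit, any program that passes this check terminates after at most `len(prog)` steps.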
Today, architectures such as x86_64, arm64, ppc64 and s390 can compile a native opcode image out of an eBPF program, so that instead of being executed through an in-kernel eBPF interpreter, the resulting image runs natively like any other kernel code. tc then installs the program into the kernel's networking data path, and with a capable NIC, the program can also be offloaded entirely into the hardware.
Cilium acts as a middle layer, plugs into container runtimes and orchestrators such as Kubernetes, Docker or CNI, and can generate and atomically update eBPF programs on the fly without requiring a container to restart. Thus, unlike with connection proxies, an update of the datapath does not cause connections to be dropped. These programs are specifically tailored and optimized for each container. For example, a feature that a particular container does not need can simply be compiled out, and the majority of the configuration becomes constant, which allows LLVM to optimize further.
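One way to picture the per-container specialization described above is a generator that renders each container's configuration as C preprocessor constants before compilation. The following Python sketch is purely illustrative: `render_header`, the `ENABLE_` prefix and the constant names are hypothetical and are not taken from the Cilium code base.

```python
from typing import Dict, Iterable


def render_header(container_id: int,
                  features: Iterable[str],
                  constants: Dict[str, int]) -> str:
    """Render a hypothetical per-container C header.

    Enabled features become ENABLE_<NAME> macros; a feature that is not
    listed gets no macro, so code guarded by #ifdef is compiled out.
    Configuration values become compile-time constants.
    """
    lines = [f"#define CONTAINER_ID {container_id}"]
    for name in sorted(features):
        lines.append(f"#define ENABLE_{name.upper()} 1")
    for key in sorted(constants):
        lines.append(f"#define {key.upper()} {constants[key]}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Example: a container that needs NAT64 and uses a fixed port.
    print(render_header(42, ["nat64"], {"lb_port": 8080}), end="")
```

On the C side, a block wrapped in `#ifdef ENABLE_NAT64` would then disappear entirely for containers that do not list that feature, and values such as `LB_PORT` would be visible to the compiler as constants.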
We have implemented many building blocks in Cilium using eBPF, such as NAT64, L3/L4 load balancing with direct server return, a connection tracker, port mapping, access control, NDisc and ARP responders, and integration with various encapsulations like VXLAN, Geneve and GRE, just to name a few. Since all these building blocks run in the Linux kernel and have a stable API, there is of course no need to cross the kernel/user space boundary, which makes eBPF a well-suited and flexible technology for container networking.
One step further in that direction is XDP, which was recently merged into the Linux kernel and allows for DPDK-like performance in the kernel itself. The basic idea is that XDP is tightly coupled with eBPF and hooks into a very early ingress path at the driver layer, where it operates with direct access to the packet's DMA buffer. This is effectively as low-level as it can get to reach near-optimal performance, which mainly allows for tailoring high-performance load balancers or routers with commodity hardware. Another advantage of XDP is that it reuses the kernel's security model for accessing the device, as opposed to user space based mechanisms. It doesn't require any third party modules and works in concert with the Linux kernel. XDP and tc with eBPF are complementary to each other, and constitute a bigger piece of the puzzle for Cilium itself.
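To illustrate what a decision made directly on the raw packet buffer looks like, here is a small user space model in Python. It is only a sketch of the idea and not an XDP program: real XDP programs are written in restricted C and run in the driver. The function `xdp_filter` and its `blocked_src` parameter are hypothetical; the verdict values mirror `XDP_DROP` and `XDP_PASS` from the kernel's `enum xdp_action`.

```python
import struct
from typing import Set

XDP_DROP = 1  # same numeric values as enum xdp_action in the kernel
XDP_PASS = 2

ETH_HLEN = 14        # Ethernet header length in bytes
ETH_P_IP = 0x0800    # EtherType for IPv4
IPV4_MIN_HLEN = 20   # IPv4 header length without options


def xdp_filter(packet: bytes, blocked_src: Set[bytes]) -> int:
    """Return a verdict for one raw Ethernet frame.

    Every read is preceded by a length check, mirroring the bounds
    checks that the eBPF verifier demands before packet access.
    """
    if len(packet) < ETH_HLEN + IPV4_MIN_HLEN:
        return XDP_PASS  # too short to be IPv4; leave it to the stack
    (ethertype,) = struct.unpack_from("!H", packet, 12)
    if ethertype != ETH_P_IP:
        return XDP_PASS  # not IPv4
    # The source address sits at offset 12 of the IPv4 header.
    src = packet[ETH_HLEN + 12:ETH_HLEN + 16]
    return XDP_DROP if src in blocked_src else XDP_PASS
```

Dropping at this point means the packet is discarded before the kernel allocates its regular packet representation, which is where much of the performance benefit comes from.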
If you're curious, check out the Cilium code or demos on GitHub.
By Daniel Borkmann, Cilium contributor
