
Hardware Accelerating Linux Network Functions
Part I: Virtual Switching Technologies in Linux

Toshiaki Makita
NTT Open Source Software Center

Proceedings of netdev 0.1, Feb 14-17, 2015, Ottawa, On, Canada

Copyright © 2015 NTT Corp. All Rights Reserved.


Part I topics

• Virtual switching technologies in Linux
• Software switches and NIC embedded switch
• Userland APIs and commands for bridge

• Introduction to recent features of bridge (and others)
• FDB manipulation
• VLAN filtering
• Learning / flooding control
• Non-promiscuous bridge
• VLAN filtering for 802.1ad (Q-in-Q)

• Demo
• Setting up non-promiscuous bridge


Who is Toshiaki Makita?

• Linux kernel engineer at NTT Open Source Software Center
• Technical support for NTT group companies
• Active patch submitter on kernel networking subsystem
• bridge, vlan, etc.


Switching technologies in Linux

• Linux (kernel) has 3 types of software switches
• bridge
• macvlan
• Open vSwitch

• A NIC embedded switch in an SR-IOV device can also be used instead of software switches

• These are often used as the network backend in server virtualization


bridge

• HW switch like device (IEEE 802.1D)
• Has FDB (Forwarding DB), STP (Spanning tree), etc.
• Uses promiscuous mode, which allows it to receive all packets
• Without promiscuous mode, common NICs filter out unicast frames whose dst is not their own mac address
• Many NICs also filter multicast / vlan-tagged packets by default

[Figure: without bridge, eth0 delivers packets to the kernel TCP/IP stack; with bridge, eth0 and eth1 are in promiscuous mode and a handler hook passes packets to the bridge, which passes them to the upper layer (br0, TCP/IP) if dst mac is bridge device]

bridge with KVM

• Used with tap device
• Tap device
• packet transmission -> file read
• file write -> packet reception

[Figure: qemu/vhost reads / writes an fd (via the kernel vfs) backed by tap0; tap0 and eth0 are ports of the bridge in the kernel; the Guest sees its own eth0]
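
A minimal sketch of the host-side wiring described above, assuming iproute2 and root privileges. The function name and the example names br0, eth0 and tap0 are illustrative and not from the talk; in practice qemu or libvirt usually creates the tap device itself.

```shell
# Create a bridge, attach an uplink NIC and a tap device to it.
# usage: attach_tap_to_bridge <bridge> <uplink> <tap>
attach_tap_to_bridge() {
    br=$1 uplink=$2 tap=$3
    ip link add "$br" type bridge &&
    ip link set "$uplink" master "$br" &&
    ip tuntap add dev "$tap" mode tap &&   # persistent tap, opened later via /dev/net/tun
    ip link set "$tap" master "$br" &&
    ip link set "$tap" up &&
    ip link set "$br" up
}
```

Example call (as root): attach_tap_to_bridge br0 eth0 tap0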


macvlan

• VLAN using mac address instead of 802.1Q tag
• 4 types of mode
• private
• vepa
• bridge
• passthru
• Uses unicast filtering if supported, instead of promiscuous mode (except for passthru)
• Unicast filtering allows the NIC to receive frames for multiple mac addresses

[Figure: macvlan0 (MAC address A) and macvlan1 (MAC address B) sit on the macvlan handler hook of eth0 in the kernel; eth0 uses unicast filtering]


macvlan (bridge mode)

• Light weight bridge
• No source learning
• No STP
• Only one uplink
• Allow traffic between macvlans (via macvlan stack)

[Figure: macvlan0 (MAC address A) and macvlan1 (MAC address B) on top of macvlan over eth0 in the kernel; eth0 is the single uplink to the External SW]
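
A sketch of creating macvlan devices with iproute2, assuming root privileges. `add_macvlan` is a hypothetical helper name; the mode argument is one of the four modes listed earlier and defaults to bridge here.

```shell
# Create a macvlan device on top of a lower device.
# usage: add_macvlan <lower-dev> <name> [private|vepa|bridge|passthru]
add_macvlan() {
    lower=$1 name=$2 mode=${3:-bridge}
    ip link add link "$lower" name "$name" type macvlan mode "$mode" &&
    ip link set "$name" up
}
```

Calling add_macvlan eth0 macvlan0 and then add_macvlan eth0 macvlan1 gives two bridge-mode macvlans on one uplink.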


macvtap (private, vepa, bridge) with KVM

• macvtap
• tap-like macvlan variant
• packet reception -> file read
• file write -> packet transmission

[Figure: each qemu/vhost reads / writes an fd backed by macvtap0 or macvtap1; both macvtap devices sit on macvlan over eth0 in the kernel; each Guest sees its own eth0]
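
A sketch of creating a macvtap device, assuming iproute2, root privileges and a udev-based system where the character device is named /dev/tap<ifindex>. `add_macvtap` is a hypothetical helper name.

```shell
# Create a macvtap device and print the character device that
# qemu/vhost would read from and write to.
# usage: add_macvtap <lower-dev> <name> [private|vepa|bridge|passthru]
add_macvtap() {
    lower=$1 name=$2 mode=${3:-bridge}
    ip link add link "$lower" name "$name" type macvtap mode "$mode" &&
    ip link set "$name" up &&
    # The device node is numbered after the interface index of the macvtap.
    echo "/dev/tap$(cat "/sys/class/net/$name/ifindex")"
}
```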


Open vSwitch
• Supports OpenFlow
• Can be used as a normal switch as well
• Has many features (VLAN tagging, VXLAN, Geneve, GRE, bonding, etc.)
• Flow based forwarding
• Control plane in user space
• A flow miss causes an upcall to the userspace daemon

[Figure: in user space, the daemon (ovs-vswitchd) holds the Flow table and FDB (control plane) and talks to the OpenFlow controller; in the kernel, openvswitch (datapath) holds a Flow table (cache) (data plane) and sends upcalls to the daemon; a handler hook is installed on eth0 and eth1, which are in promiscuous mode]


Open vSwitch with KVM

• Configuration is the same as bridge
• used with tap device

[Figure: qemu/vhost reads / writes an fd (via the kernel vfs) backed by tap0; tap0 and eth0 are ports of openvswitch in the kernel; the Guest sees its own eth0]
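
A sketch of the equivalent Open vSwitch setup, assuming ovs-vswitchd is running and ovs-vsctl is installed. `ovs_attach` is a hypothetical helper name; --may-exist only makes the calls safe to repeat.

```shell
# Create an OVS bridge and attach the given ports to it.
# usage: ovs_attach <bridge> <port>...
ovs_attach() {
    br=$1; shift
    ovs-vsctl --may-exist add-br "$br" || return 1
    for port in "$@"; do
        ovs-vsctl --may-exist add-port "$br" "$port" || return 1
    done
}
```

Example call (as root): ovs_attach br0 eth0 tap0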


NIC embedded switch (SR-IOV)

• SR-IOV
• In addition to the normal PCI physical function (PF), allows adding light weight virtual functions (VF)
• VF appears as a network interface (eth0_0, eth0_1...)
• Some SR-IOV devices have switches in them
• allow PF-VF / VF-VF communication

[Figure: in the kernel, eth0 (PF), eth0_0 (VF) and eth0_1 (VF) are connected to the embedded switch of the SR-IOV supported NIC]


NIC embedded switch (SR-IOV)

• SR-IOV with KVM
• Use PCI-passthrough to attach VF to guest

[Figure: each qemu Guest is attached directly to a VF (eth0_0, eth0_1); the VFs and eth0 connect to the embedded switch of the SR-IOV supported NIC]
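
A sketch of enabling VFs before passing them through, assuming a kernel and driver that expose the sriov_numvfs sysfs attribute. `enable_vfs` and its optional third argument (an alternative sysfs root, useful only for trying the function without hardware) are hypothetical and not from the talk.

```shell
# Ask the PF driver to create <num> virtual functions, then list them.
# usage: enable_vfs <PF> <num> [sysfs-root]
enable_vfs() {
    pf=$1 num=$2 sysfs=${3:-/sys}
    echo "$num" > "$sysfs/class/net/$pf/device/sriov_numvfs" &&
    ip link show dev "$pf"   # VFs appear as "vf 0 ...", "vf 1 ..." under the PF
}
```

Example call (as root): enable_vfs eth0 2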


Userland APIs and commands (bridge)

• Various APIs
• ioctl
• sysfs
• netlink

• Netlink is preferred for new features
• Because it is extensible
• sysfs is sometimes used

• Commands
• brctl (in bridge-utils, using ioctl / sysfs)
• ip / bridge (in iproute2, using netlink)


Userland APIs and commands (bridge)

• brctl
# brctl addbr <bridge> ... create new bridge
# brctl addif <bridge> <port> ... attach port to bridge
# brctl showmacs <bridge> ... show fdb entries

• These operations can be performed by netlink based commands as well (Since kernel 3.0)
# ip link add <bridge> type bridge ... create new bridge
# ip link set <port> master <bridge> ... attach port
# bridge fdb show ... show fdb entries

• And recent features can only be used by netlink based ones or direct sysfs write
# bridge fdb add
# bridge vlan add
etc...


Recent features of bridge (and others)

• FDB manipulation
• VLAN filtering
• Learning / flooding control
• Non-promiscuous bridge
• VLAN filtering for 802.1ad (Q-in-Q)



FDB manipulation

• FDB
• Forwarding database
• Learning: packet arrival triggers entry creation
• Source MAC address is recorded together with the incoming port
• Flood if no entry is found for the destination MAC address
• Flood: deliver packet to all ports except the incoming one

[Figure: a packet from aa:bb:cc:dd:ee:ff arrives on eth0 of the bridge; learning adds the FDB entry (MAC address aa:bb:cc:dd:ee:ff, Dst eth0)]


FDB manipulation

• FDB manipulation commands
• Since kernel 3.0
# bridge fdb add <mac address> dev <port> master temp
# bridge fdb del <mac address> dev <port> master

[Figure: the FDB entry (MAC address = specified mac, Dst = specified port) is added to the bridge that the specified port (eth0) belongs to]


FDB manipulation
# bridge fdb add <mac address> dev <port> master temp
• What's "temp"?
• There are 3 types of FDB entries
• permanent (local)
• static
• others (dynamically learned by packet arrival)
• "temp" means static here
• "bridge fdb"'s default is permanent
• permanent here means "deliver to bridge device" (e.g. br0)
• permanent doesn't deliver to specified port

[Figure: if a packet matches a permanent entry, the bridge delivers it to the bridge device (br0) instead of the specified port (eth0)]


FDB manipulation
# bridge fdb add <mac address> dev <port> master temp
• What's "master"?
• Remember this command?
# ip link set <port> master <bridge> ... attach port
• "bridge fdb"'s default is "self"
• It adds entry to specified port (eth0) itself!

[Figure: "master" refers to the bridge that the specified port (eth0) is attached to; "self" refers to the specified port itself]
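
The difference between "master" and "self" can be sketched with two small wrappers around the bridge command from iproute2. The helper names are hypothetical; the explicit "self" keyword is spelled out here only for clarity, since it is the default.

```shell
# Static entry in the FDB of the bridge the port is attached to:
# frames for <mac> are forwarded to <port>.
# usage: fdb_add_static <mac> <port>
fdb_add_static() {
    bridge fdb add "$1" dev "$2" master temp
}

# Entry on the device itself, not on the bridge above it.
# usage: fdb_add_self <mac> <dev>
fdb_add_self() {
    bridge fdb add "$1" dev "$2" self
}
```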


FDB manipulation

• When to use "self"?
• Unicast / multicast filtering
• Use case: SR-IOV embedded SW
• VTEP-Mac mapping table (vxlan)

[Figure: "master" targets the bridge on top of eth0 (PF); "self" targets eth0 itself, which is connected with the VFs eth0_0 and eth0_1 to the embedded switch of the SR-IOV supported NIC]


FDB manipulation

• Example: Intel 82599 (ixgbe)
• Some people think of using both bridge and SR-IOV due to the limited number of VFs
• bridge puts eth0 (PF) into promiscuous mode, but...
• A frame from a VF to an unknown MAC address goes to the wire, not to the PF

[Figure: Guest 1 (eth1, MAC A) is attached through a tap device to a bridge on the PF eth0 (MAC B); Guest 2 (MAC C) uses the VF eth0_0; a frame from the VF with Dst. A enters the embedded switch of the Intel 82599 (ixgbe)]


FDB manipulation

• Example: Intel 82599 (ixgbe)
• Type "bridge fdb add A dev eth0" on host
• Traffic to A will be forwarded to bridge

[Figure: same setup as before, with an fdb entry for MAC A added on the PF eth0 (MAC B); the frame from the VF eth0_0 with Dst. A now reaches Guest 1 through the PF, the bridge and the tap device]


VLAN filtering

• 802.1Q Bridge
• Since kernel 3.9
• Filter packets according to vlan tag
• Forward packets according to vlan tag as well as mac address
• Insert / strip vlan tag

[Figure: the FDB gains a Vlan column (e.g. MAC address aa:bb:cc:dd:ee:ff, Vlan 10, Dst eth0); the bridge inserts / strips vlan tags and filters disallowed vlans on eth0 and eth1]


VLAN filtering

• Ingress / egress filtering policy
• Incoming / outgoing packets are filtered according to the filtering policy
• Per-port per-vlan policy
• Default is "disallow all vlans"
• Since kernel 3.18, vid 1 is allowed by default
• All packets are dropped except for untagged or vid 1

Filtering table
Port   Allowed Vlans
eth0   10, 20
eth1   20, 30

[Figure: a VID 10 packet is filtered by vlan at ingress on eth0 (allow 10) and filtered by vlan at egress on eth1 (disallow 10)]


VLAN filtering

• PVID (Port VID)
• Untagged (and VID 0) packet is assigned this VID
• Per-port configuration
• Default PVID is 1 (Since kernel 3.18)
• Egress policy untagged
• Outgoing packet that matches this policy gets untagged
• Per-port per-vlan policy

Filtering table
Port   Allowed Vlans   PVID   Egress Untag
eth0   10                     ✔
eth0   20              ✔      ✔
eth1   20              ✔      ✔
eth1   30

[Figure: an untagged packet arrives on eth0, where the pvid is applied (insert vid 20); on egress at eth1 the untagged policy is applied (strip tag 20)]


VLAN filtering

• Commands
• Enable VLAN filtering (disabled by default)
# echo 1 > /sys/class/net/<bridge>/bridge/vlan_filtering
• Add / delete allowed vlan
# bridge vlan add vid <vid> dev <port>
# bridge vlan del vid <vid> dev <port>
• Set pvid / untagged
# bridge vlan add vid <vid> dev <port> [pvid] [untagged]
• Dump settings
# bridge vlan show

• Note: bridge device needs "self"
# bridge vlan add vid <vid> dev br0 self
# bridge vlan del vid <vid> dev br0 self


VLAN with KVM

• Traditional configuration
• Use vlan devices
• Needs bridges per vlan
• Low flexibility
• How many devices?
# ifconfig -s
Iface ...
eth0
eth0.10
br10
eth0.20
br20
eth0.30
br30
eth0.40
br40
...

[Figure: each qemu Guest (eth0) connects through tap0 / tap1 to its own bridge br10 / br20, which sits on the vlan device eth0.10 / eth0.20 on top of eth0 in the kernel]


VLAN with KVM

• With VLAN filtering
• Simple
• Flexible
• Only one bridge
# ifconfig -s
Iface ...
eth0
br0

[Figure: each qemu Guest (eth0) connects through tap0 (pvid/untag vlan 10) or tap1 (pvid/untag vlan 20) to the single bridge br0; eth0 carries vlan10 / 20]
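
A sketch of the single-bridge layout above, assuming vlan_filtering is already enabled on the bridge and that the ports are already attached to it. The helper names are hypothetical; the bridge vlan syntax is the one from the Commands slide.

```shell
# Guest-facing port: untagged frames are assigned <vid> on ingress,
# and frames in <vid> leave the port untagged.
# usage: vlan_access_port <port> <vid>
vlan_access_port() {
    bridge vlan add vid "$2" dev "$1" pvid untagged
}

# Uplink port: allow the listed vlans, keeping their tags.
# usage: vlan_trunk_port <port> <vid>...
vlan_trunk_port() {
    port=$1; shift
    for vid in "$@"; do
        bridge vlan add vid "$vid" dev "$port" || return 1
    done
}
```

For the layout above: vlan_access_port tap0 10, vlan_access_port tap1 20, vlan_trunk_port eth0 10 20. On kernels since 3.18, vid 1 stays allowed on each port unless it is removed with bridge vlan del.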


VLAN with KVM

• Other switches
• Open vSwitch
• Can also handle VLANs
# ovs-vsctl set Port <port> tag=<vid>

• NIC embedded switch
• Some of them support VLAN (e.g. Intel 82599)
# ip link set <PF> vf <VF_num> vlan <vid>


Learning / flooding control
• Limit the mac addresses a guest can use
• Reduce FDB size
• Used with static FDB entries ("bridge fdb" command)

• Disable FDB learning on particular port
• Since kernel 3.11
• No dynamic FDB entry
• Don't flood unknown mac to specified port
• Since kernel 3.11
• Control packet delivery to guests

• Commands
# bridge link set dev <port> learning off
# bridge link set dev <port> flood off

[Figure: the guest-facing ports tap0 and tap1 of the bridge are set to no learning / no flooding; eth0 keeps learning and flooding]

Non-promiscuous bridge

• Since kernel 3.16
• If there is only one learning / flooding port, it can be non-promisc
• Instead of promisc mode, unicast filtering is set for static FDB entries
• Automatically enabled if some conditions are met
• There is one or zero learning or flooding port
• bridge itself is not in promiscuous mode
• VLAN filtering is enabled

[Figure: the guest-facing ports tap0 and tap1 of the bridge are set to no learning / no flooding; eth0 is the only learning / flooding port and is non-promisc]


802.1ad (Q-in-Q) support for bridge

• Since kernel 3.16

• 802.1ad allows stacked vlan tags

MAC | .1ad tag | .1Q tag | payload

• Outer 802.1ad tag can be used to separate customers
• Example: Guest A, B -> Customer X
Guest C, D -> Customer Y

• Inner 802.1Q tag can be used inside customers
• Customer X and Y can use any 802.1Q tags

• Command
# echo 0x88a8 > /sys/class/net/<bridge>/bridge/vlan_protocol

802.1ad (Q-in-Q) support for bridge
• Bridge preserves guest .1Q tag (vid 30) when inserting .1ad tag (vid 10)
• .1ad tag will be stripped at another end point of .1ad network

[Figure: Guest A sends .1Q VID 30 frames from eth0.30 through tap0 (pvid/untag vlan 10); Guest C uses tap1 (pvid/untag vlan 20); the bridge (.1ad mode) sends frames out of eth0 (vlan10 / 20) with .1ad VID 10 and .1Q VID 30; after the .1ad network, the Customer's another site receives .1Q VID 30]

Demo



Non-promiscuous bridge

• Let's set up a non-promiscuous KVM environment!

• Steps
• Create bridge
• Enable vlan filtering
• Attach guests (by libvirt)
• Add FDB entries
• Set port attributes (learning / flooding)

[Figure: the guest-facing ports vnet0 and vnet1 of the bridge are set to no learning / no flooding; eth0 is the learning / flooding port and is non-promisc]


Non-promiscuous bridge setup

• Commands
• Create bridge
# ip link add br0 up type bridge
# ip link set eth0 master br0
• Enable vlan filtering
# echo 1 > /sys/class/net/br0/bridge/vlan_filtering
• Attach guests
# virsh start guest1
# virsh start guest2
• Add FDB entries ("append" overwrites if exists)
# bridge fdb append 52:54:00:xx:xx:xx dev vnet0 master temp
# bridge fdb append 52:54:00:yy:yy:yy dev vnet1 master temp
• Set port attributes
# bridge link set dev vnet0 learning off flood off
# bridge link set dev vnet1 learning off flood off
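
One way to check the result of the steps above is to read the promiscuity counter of the uplink. This is a sketch, assuming an iproute2 version whose detailed link output includes a "promiscuity N" field; `promiscuity` is a hypothetical helper name.

```shell
# Print the promiscuity counter of a device (0 means not promiscuous).
# usage: promiscuity <dev>
promiscuity() {
    ip -d link show dev "$1" | sed -n 's/.*promiscuity \([0-9][0-9]*\).*/\1/p'
}
```

Example call: promiscuity eth0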


Non-promiscuous bridge via libvirt xml

• libvirt (>= 1.2.11 with kernel >= 3.17) can automatically handle these settings
• Network XML
# virsh net-edit <network>
...
<bridge name="br0" macTableManager="libvirt"/>
...


Some more useful commands...

• Filter FDB dump per bridge / port (Since 3.17)
• Filter per bridge
# bridge fdb show br <bridge>
• Filter per port
# bridge fdb show brport <port>
• VLAN range (Coming soon... 3.20?)
• Add vlans
# bridge vlan add vid <vid_begin>-<vid_end> dev <port>
• Show vlans in compressed format
# bridge -c vlan show


Summary

• Linux has several types of switches
• bridge, macvlan (macvtap), Open vSwitch
• SR-IOV NIC embedded switch can also be used

• Bridge's recent features
• FDB manipulation
• VLAN filtering
• Learning / Flooding control
• Non-promiscuous bridge
• 802.1ad (Q-in-Q) support