
Moore's law and architectural improvements in processor technology, viz. caches, pipelining, vector processing, hyper-threading, superscalar execution and multi-core designs, offer more and more processing power to build systems that can host multiple applications and servers on a single machine. Multi-core devices offer higher performance and widely differentiated services in next-generation systems, allowing system vendors to increase system performance and add new services while staying within their power budgets. A system can run its control and data planes, as well as all additional services, on a single multi-core processor where they were previously spread across multiple discrete chips.

Advancement in silicon is not restricted to the CPU subsystem: device capabilities and density per chip and/or board are increasing, so a single piece of equipment can now provide services that were previously delivered by several separate solutions. Networking equipment targeted at the enterprise market is now equipped not only with processors offering layer-2 and layer-3 routing but also with specialized appliances that add many layer-4 through layer-7 services. A network element residing in the core network has a switching fabric with gigabits to terabits of switching capacity, with I/O interfaces based on multiple technologies added to a single operating environment. The growing number of options for accessing the WAN (wide area network) leads to a growing number of communication interfaces in access-network products, as the value add of a product depends directly on its capability to support the traffic carried over the public network. Traffic traversing WAN interfaces can be anything: voice, video or data (ATM, Frame Relay, multi-link PPP, etc.). A much larger set of applications, along with different trunk or interface management software for the different interfaces of a product, resides in a single operating environment. Even a device in our pocket, say a mobile phone, offers Wi-Fi, Bluetooth, USB, HDMI and other interfaces, with diverse applications ranging from multimedia applications playing streaming voice and video to navigation software and IP-based services.

All these applications, servers and management software share the platform resources, viz. CPU, memory and devices, by multiplexing resource access at the process level with the support of an operating system. This process-level multiplexing allows protection between applications using conventional OS techniques, but research reveals that such systems do not adequately support performance isolation: the scheduling priority, memory demand and network traffic of one process impact the performance of others. Also, when adding more features to a product by adding new devices, the need arises to port applications and software stacks, optimized for performance on one operating system, to a different operating system running on the existing product, which restricts time to market and raises the cost of feature addition. As major hardware vendors expand their 64-bit offerings because of the performance, scalability and value 64-bit platforms can provide, a 64-bit version of all software is a big plus. Yet we cannot move all the services of an existing product in one shot; a step-by-step transition requires the platform to provide both 64-bit and 32-bit operating environments running simultaneously on the product. All of this leads to the point that multiplexing the ample physical resources of a modern system needs to be done at the granularity of an operating system to take full advantage of them.
In simple words, multiple operating systems, multiple versions of an operating system, or a commodity operating system and an RTOS should gracefully co-exist and execute concurrently. One of the techniques by which a range of standard operating systems can gracefully co-exist with their respective applications is para-virtualization. As it does not require changes to the application binary interface (ABI), the applications run unmodified, but the operating systems hosting these applications need to be para-virtualized, i.e. modifications to the operating systems are required to run them on the hardware platform, as ownership of system resources is no longer with the operating system but with another layer of software known as the hypervisor or virtual machine manager. Hypervisors abstract system resources and provide mechanisms for operating systems to use them.

On a machine para-virtualized using the Xen hypervisor, with a processor without VT (virtualization technology) support, the hypervisor partitions the system into domains: domain 0 is the privileged domain and runs the operating system with direct access to all system resources, while domain U hosts guest operating systems. A single machine can run multiple instances of an operating system, or different operating systems, by creating a domain U for each of them. For synchronous calls from a domain to Xen (the hypervisor), domains make use of hypercalls provided by the hypervisor, and notifications are posted to domains using event channels. The following table gives an overview of three broad aspects of the system, viz. CPU, memory management and device I/O, on a system with a processor without VT support running the Xen hypervisor.

CPU: The hypervisor keeps a virtual CPU for each domain. When a guest OS writes to a protected register, the hypervisor instead writes the value to the corresponding virtual register in the virtual CPU. The CPU is scheduled among domains according to the borrowed virtual time (BVT) scheduling algorithm.

Protection: The hypervisor runs at a more privileged level; whenever a guest OS would execute a privileged instruction, a hypercall is made into the hypervisor, which performs the operation on the guest's behalf.

Memory management: Each operating system gets some memory reserved for it. Each time a guest OS needs memory for any operation, say process creation, it allocates memory from its reservation and registers it with the hypervisor. If memory management uses paging, the OS has read access to the page tables but updates are done via the hypervisor, which has write access. Similarly, for segmentation-based systems, updates to the hardware segment tables are processed through the hypervisor.

Interrupts: Hardware interrupts to a guest operating system are received through the hypervisor. The hypervisor uses an event delivery mechanism to notify the domains (operating systems) registered to receive them: it maps interrupts to event channels and delivers them asynchronously to the target domain.

Exceptions: Each guest OS registers a table of exception handlers with the hypervisor.

System calls: These can be trapped through the hypervisor, but for fast handling of system calls it is possible to bypass the hypervisor by enabling the fast system call handling option.

Device I/O: The hypervisor exposes device abstractions. I/O data is transferred to and from each domain (OS) via the hypervisor using shared-memory, asynchronous buffer-descriptor rings. A split device driver architecture is used to access devices from domain U (guest domains).
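As an illustration of the memory-management row above, the following is a minimal sketch of how a para-virtualized guest asks Xen to update a page-table entry it can only read. It assumes a 2.6-era x86 Xen guest; set_guest_pte() is a hypothetical helper, while HYPERVISOR_mmu_update(), struct mmu_update and MMU_NORMAL_PT_UPDATE come from the Xen public interface, and exact wrapper signatures vary by kernel version.

    /* Sketch: a para-virtualized guest updating one of its page-table
     * entries through Xen instead of writing the table directly.
     * Assumes a 2.6-era x86 Xen guest; signatures vary by version. */
    #include <xen/interface/xen.h>      /* struct mmu_update, DOMID_SELF */
    #include <asm/xen/hypercall.h>      /* HYPERVISOR_mmu_update()       */

    static int set_guest_pte(unsigned long pte_machine_addr,
                             unsigned long new_pte_val)
    {
            struct mmu_update req;
            int success_count = 0;

            /* ptr holds the machine address of the PTE to update; Xen
             * validates the request because the guest has only read
             * access to its page tables.                               */
            req.ptr = pte_machine_addr | MMU_NORMAL_PT_UPDATE;
            req.val = new_pte_val;

            /* One batched request, applied by Xen on the guest's behalf. */
            return HYPERVISOR_mmu_update(&req, 1, &success_count, DOMID_SELF);
    }

In practice guests batch many such updates into one hypercall to amortize the cost of entering the hypervisor.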

Various OS subsystems are modified to use system resources as per the abstractions and mechanisms provided by the hypervisor. To understand the modifications required in the device driver subsystem of an OS, let us take up a case study. Consider a product where an Ethernet interface and an HDLC interface are clubbed onto one board. Applications utilizing the IP stack of product A and the IPX stack of product B, running in their isolated Linux operating environments, can take advantage of the additional interface. Whether a packet is IP or IPX, traversing the Ethernet or the HDLC interface, the Linux networking stack is neutral, as it is designed to be protocol independent. This much of the software will run as it is, without any modification or porting effort, but the OS subsystem which provides access to the devices, i.e. the device driver, needs modification for the device to be usable by applications from multiple domains: the driver must multiplex the access requests from two isolated operating environments. Do we need to re-design the whole device driver? The answer is no. We illustrate the modifications required, and estimate the effort needed, to port a NAPI-compliant network device driver for an Ethernet interface onto a platform para-virtualized by Xen (hypervisor). We begin with an overview of an Ethernet media access controller (MAC) device, followed by the device driver in a standalone Linux-based system and then the para-virtualized device driver.

A typical 802.3-compliant Ethernet MAC device offers Ethernet packet transmit and receive functionality supported by two DMA engines, one for transmitting and one for receiving Ethernet packets, with an interrupt facility to indicate asynchronously the occurrence of one or more events signalling packet reception and transmission. The DMA engines use buffer descriptor rings, one each for Tx and Rx, to move packets from and into memory. The device-dependent part of a device driver typically initializes the device by configuring various options related to speed, duplex or mode of operation and the interrupt settings that select which events generate a notification, followed by initialization of the buffer descriptor rings and starting of the receive and transmit engines. A net_device structure (network adapter) is initialized to plug the device into the driver architecture usable by the kernel and hence by applications. Network drivers also support a number of administrative tasks, such as setting addresses, modifying transmission parameters and maintaining traffic and error statistics, besides the mechanisms required to accept a packet from the kernel for transmission and to push a received packet into the kernel. The kernel interfaces to the device and the application binary interfaces are coded accordingly.
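To make the descriptor-ring idea concrete, here is a hypothetical sketch of what such a ring might look like from the driver's point of view. The field names and ownership convention are illustrative only; every MAC defines its own descriptor format in its datasheet.

    /* Hypothetical layout of a DMA buffer descriptor ring for an 802.3 MAC.
     * Field names and the ownership convention are illustrative; a real
     * controller's descriptor format comes from its datasheet. */
    #include <linux/types.h>

    #define RX_RING_SIZE 256

    struct mac_bd {                       /* one descriptor, shared with the DMA engine   */
            __le32 buf_addr;              /* DMA (bus) address of the packet buffer       */
            __le16 length;                /* buffer length / bytes received               */
            __le16 flags;                 /* ownership, wrap, error and status bits       */
    };

    struct mac_rx_ring {
            struct mac_bd bd[RX_RING_SIZE];  /* descriptors consumed in order by the MAC  */
            unsigned int  next_to_clean;     /* first descriptor the driver has not seen  */
            unsigned int  next_to_use;       /* first descriptor not yet given to the MAC */
    };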
A code snippet from the device initialization routine for the MAC device integrated into the ICH9 I/O controller, showing the API exported to the kernel by the driver implementation of a given device, is listed for reference (drivers/net/e1000e/netdev.c):

    /* construct the net_device struct */
    netdev->open = &e1000_open;
    netdev->stop = &e1000_close;
    netdev->hard_start_xmit = &e1000_xmit_frame;
    netdev->get_stats = &e1000_get_stats;
    netdev->set_multicast_list = &e1000_set_multi;
    netdev->set_mac_address = &e1000_set_mac;
    netdev->change_mtu = &e1000_change_mtu;
    netdev->do_ioctl = &e1000_ioctl;

The corresponding driver entry points, as declared by the kernel in struct net_device, are:

    int (*open)(struct net_device *dev);
    int (*stop)(struct net_device *dev);
    int (*hard_start_xmit)(struct sk_buff *skb, struct net_device *dev);
    struct net_device_stats *(*get_stats)(struct net_device *dev);
    int (*set_config)(struct net_device *dev, struct ifmap *map);
    int (*do_ioctl)(struct net_device *dev, struct ifreq *ifr, int cmd);
    void (*set_multicast_list)(struct net_device *dev);
    int (*set_mac_address)(struct net_device *dev, void *addr);
    int (*change_mtu)(struct net_device *dev, int new_mtu);
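Since the driver in question is NAPI compliant on a 2.6.18-era kernel, its interrupt handler does not process packets itself but schedules the driver's poll routine. Below is a minimal, hedged sketch of those hooks; the mydev_* names are hypothetical, while netif_rx_schedule(), netif_rx_complete(), dev->poll and dev->quota are the pre-2.6.24 NAPI interfaces (later kernels moved to struct napi_struct).

    /* Hedged sketch of pre-2.6.24 NAPI hooks; mydev_* names are hypothetical. */
    #include <linux/kernel.h>
    #include <linux/netdevice.h>
    #include <linux/interrupt.h>

    static irqreturn_t mydev_isr(int irq, void *dev_id, struct pt_regs *regs)
    {
            struct net_device *netdev = dev_id;

            /* Acknowledge/mask the MAC interrupt here (device specific),
             * then defer all packet processing to the poll routine.      */
            netif_rx_schedule(netdev);
            return IRQ_HANDLED;
    }

    static int mydev_poll(struct net_device *netdev, int *budget)
    {
            int work_done = 0;
            int work_to_do = min(*budget, netdev->quota);

            /* Walk the Rx descriptor ring, build sk_buffs and hand them to
             * the stack with netif_receive_skb(), incrementing work_done
             * up to work_to_do.                                           */

            *budget -= work_done;
            netdev->quota -= work_done;

            if (work_done < work_to_do) {
                    netif_rx_complete(netdev);   /* ring drained: leave polling mode */
                    /* re-enable the MAC's Rx interrupt here */
                    return 0;
            }
            return 1;                            /* more work: stay on the poll list */
    }

    /* registered alongside the other net_device hooks shown above: */
    /*     netdev->poll   = &mydev_poll;                            */
    /*     netdev->weight = 64;                                     */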

In a virtualized environment like the one provided by Xen (the hypervisor), all this driver functionality needs to be implemented by only one operating system environment, the one in domain 0. Operating systems in other domains access the device through domain 0 using the split driver architecture.

Brief overview of the split driver architecture: the device driver is split into two parts. The frontend runs in the guest domain, and the backend runs in a domain which has access to the real device hardware, domain 0. The frontend driver appears to the guest operating system as if it were a real device. It receives I/O requests from its kernel as usual and then issues them to the backend. The backend driver is responsible for receiving these I/O requests; it appears to its own kernel as a normal user of in-kernel I/O functionality. When the I/O completes, the backend notifies the frontend that the data is ready for use, and the frontend is then able to report I/O completion to its own kernel. Split drivers exchange requests and responses in shared memory, with an event channel for asynchronous notifications of activity. When the frontend driver comes up, it uses XenStore to set up a shared memory frame and an inter-domain event channel for communication with the backend. Once this connection is established, the two can communicate directly by placing requests and responses into shared memory and then sending notifications on the event channel.

Network split device driver: the network backend driver consists of a number of logical Ethernet devices. Each of these has a logical direct connection to a frontend virtual network device in another domain. Each virtual interface uses two descriptor rings, one for transmit and one for receive. Each descriptor identifies a block of contiguous machine memory allocated to the domain. The transmit ring carries packets to transmit from the guest to the backend domain; its return path carries messages indicating that the contents have been physically transmitted and that the backend no longer requires the associated pages of memory. To receive packets, the guest places descriptors of unused pages on the receive ring. The backend returns received packets by exchanging these pages in the domain's memory with new pages containing the received data, and passes back descriptors for the new packets on the ring. A domain must keep its receive ring stocked with empty buffers, or packets destined to it may be dropped. On the transmit path, the ring provides the application with feedback on the rate at which packets are able to leave the system.
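As an illustration of that handshake, the following is a minimal sketch of a frontend allocating its shared ring and event channel, loosely modeled on what a netfront-style driver of that era does. setup_my_frontend() is a hypothetical helper; SHARED_RING_INIT/FRONT_RING_INIT, gnttab_grant_foreign_access() and xenbus_alloc_evtchn() are the Xen interfaces involved, though exact signatures vary between Xen and kernel versions.

    /* Hedged sketch: frontend side of the split-driver handshake.
     * setup_my_frontend() is illustrative; the grant-table, event-channel
     * and ring macros are the Xen interfaces a netfront-style driver uses. */
    #include <linux/mm.h>                 /* get_zeroed_page(), PAGE_SIZE    */
    #include <xen/xenbus.h>               /* struct xenbus_device            */
    #include <xen/grant_table.h>          /* gnttab_grant_foreign_access()   */
    #include <xen/interface/io/netif.h>   /* netif_tx_sring, ring macros     */
    #include <asm/xen/page.h>             /* virt_to_mfn()                   */

    static int setup_my_frontend(struct xenbus_device *dev,
                                 struct netif_tx_front_ring *tx_ring,
                                 int *evtchn, grant_ref_t *gref)
    {
            struct netif_tx_sring *sring;
            int err;

            /* Shared page that will hold the request/response ring. */
            sring = (struct netif_tx_sring *)get_zeroed_page(GFP_KERNEL);
            if (!sring)
                    return -ENOMEM;
            SHARED_RING_INIT(sring);
            FRONT_RING_INIT(tx_ring, sring, PAGE_SIZE);

            /* Grant the backend domain access to that page...            */
            *gref = gnttab_grant_foreign_access(dev->otherend_id,
                                                virt_to_mfn(sring), 0);

            /* ...and allocate an event channel for notifications.        */
            err = xenbus_alloc_evtchn(dev, evtchn);
            if (err)
                    return err;

            /* The grant reference and event-channel port are then written
             * to XenStore (xenbus_printf under a transaction) so the
             * backend can map the ring and bind the channel.              */
            return 0;
    }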

A code walk-through of the Linux 2.6.18 sources shows the additional code to be added around the network device driver:

Backend driver, drivers/xen/netback (2299 lines of code*):
    netback.c, xenbus.c, accel.c, common.h, interface.c, Makefile

Frontend driver, drivers/xen/netfront (2205 lines of code*):
    accel.c, netfront.c, netfront.h, Makefile

*generated using David A. Wheeler's SLOCCount

Here we go! If you would like to move a product with E1/T1/E3/T3/SONET/SDH interfaces, running call processing, alarm handling and SNMP handling applications on an existing trunk management software stack (alarm management, configuration and signaling software), with the same performance, using the framers and mappers you relied on in your earlier successful product line, onto an upcoming processor without worrying about the MIPS budget, then get your para-virtualized device drivers ready and move to a virtualized platform.

About the author: Girish Jain is Founder and Researcher at Technology Support Group (TSG).
Girish.jain@researchsolutions.in

References:
1. Frederic Beck and Olivier Festor, "Syscall Interception", Technical Report, MADYNES Project.
2. Håvard Bjerke, "HPC Virtualization with Xen on Itanium", thesis.
3. Paul Barham, Boris Dragovic, Keir Fraser, Steven Hand, Tim Harris, Alex Ho, Rolf Neugebauer, Ian Pratt and Andrew Warfield, "Xen and the Art of Virtualization", University of Cambridge.
4. SCOPE Alliance, WhitePaper.SCOPE-Virtualization-UseC: describes a number of use cases for virtualization in the telecommunications space.

Web pages:
1. http://www.linuxtopia.org/online_books/linux_virtualization/xen_3.0_interface_guide/linux_virtualization_xen_interface_36.html
2. http://wiki.xensource.com/xenwiki/XenIntro#headea24be3f9c1144310a32c4ca3047f6e4560a9e76
