Windows Offloaded Data Transfers Overview http://technet.microsoft.com/en-us/library/hh831628.aspx
February 28, 2012

Abstract

This paper provides the technical details about the Offloaded Data Transfer (ODX) operation and design requirements for intelligent storage devices in Windows 8 Consumer Preview. It also provides conceptual guidelines for developers of intelligent storage devices to understand the operation of ODX in Windows 8 Consumer Preview.

This information applies to the following operating systems:
Windows 8 Consumer Preview
Windows Server 8 Beta
References and resources discussed here are listed at the end of this paper. The current version of this paper is maintained on the Web at: Offloaded Data Transfer (ODX) with Intelligent Storage Arrays
Disclaimer: This document is provided as-is. Information and views expressed in this document, including URL and other Internet website references, may change without notice. Some information relates to pre-release product that may be substantially modified before it is commercially released. Microsoft makes no warranties, express or implied, with respect to the information provided here. You bear the risk of using it. Some examples depicted herein are provided for illustration only and are fictitious. No real association or connection is intended or should be inferred. This document does not provide you with any legal rights to any intellectual property in any Microsoft product. You may copy and use this document for your internal, reference purposes.

© 2012 Microsoft. All rights reserved.
Document History

Date                Change
February 28, 2012   First publication
Contents
Introduction
Problem Statement
Overview of Offloaded Data Transfer (ODX)
Identify ODX-Capable Source and Destination
ODX Read/Write Operations
    Synchronous Command Adoption and APIs
    Offload Read Operations
        ROD Token Policy and Management
    Offload Write Operations
        Result from Receive Offload Write Result
        Offload Write with Well-Known ROD Token
Performance Tuning Parameters of ODX Implementation
    Minimum Copy Offload File Size
    Maximum Token Transfer Size and Optimal Transfer Count
    Optimal and Maximum Transfer Lengths
ODX Error Handling and High Availability Support
    ODX Error Handling
    ODX Failover in MPIO and Cluster Server Configurations
ODX Usage Models
    ODX across Physical Disk, Virtual Hard Disk and SMB Shared Disk
    ODX Operation with One Server
    ODX Operation with Two Servers
    Massive Data Migration
    Host-Controlled Data Transfer within a Tiered Storage Device
Conclusion
Resources
Introduction

Increasing demands for high-speed data transfer in system virtualization and cloud storage data migration are pushing system and storage platform innovation toward a more efficient data transfer mechanism. Today, IT pros and end users perform data transfer through client-server networks or storage networks using extended copy commands. To advance storage data movement, Microsoft developed a new data transfer technology: Offloaded Data Transfer (ODX). Instead of using buffered read and write operations, ODX starts the copy operation with an offload read, retrieves a token representing the data from the storage device, and then uses an offload write command with the token to request data transfer from the source disk to the destination disk. The copy manager of the storage device then moves the data according to the token. ODX operates in the backend storage array, which eliminates buffered data movement on the client-server network; CPU usage and client-server network bandwidth consumption drop to near-zero levels.

In Windows 8, IT pros and storage administrators can use ODX to interact with the storage device to move large files or data through high-speed storage networks. ODX significantly reduces client-server network traffic and CPU time during large data transfers, because all of the data movement occurs in the backend storage network. ODX can be used in virtual machine deployment, massive data migration, and tiered storage device support. Also, the cost of physical hardware deployment can be reduced by combining ODX with thin provisioning storage features.
Problem Statement

A traditional copy operation reads data from a source file into a reserved buffer and writes the buffered data into a destination file. Many copy offload solutions on the current market try to speed up the copy operation through transfer link enhancement, quality of service (QoS) traffic control, or enhanced data buffer coordination to achieve high-performance data transfer rates on the client-server network. However, moving data through the client-server network can consume a large amount of network bandwidth and CPU time on the host systems.

ODX allows the host system to interact with the storage array to move data through the high-speed storage area network (SAN). Because ODX uses the backend storage network, traffic on the front-end client-server network and CPU usage are nearly zero. In Windows 8, ODX can help users copy or move data across virtual hard disks (VHDs), Server Message Block (SMB) shares, and physical disks. ODX is an end-to-end design and support feature; it works on storage devices that comply with the T10 XCOPY Lite specification. This white paper covers the technical overview, design guide for storage arrays, and usage models of ODX.
Overview of Offloaded Data Transfer (ODX)

Offloaded Data Transfer (ODX) introduces a tokenized operation to move data on the storage device. The source file and destination file can be on the same volume, on two different volumes hosted by the same machine, on a local volume and a remote volume through SMB2, or on two volumes on two different machines through SMB2.

The copy offload operation using ODX follows this algorithm:
1. The copy offload application sends an offload read request to the copy manager of the source storage device.
2. The application sends a receive offload read result request to the copy manager, which returns a token. The token is a representation of the data to be copied.
3. The application sends an offload write request with the token to the copy manager of the destination storage device.
4. The application sends a receive offload write result request to the copy manager.
5. The copy manager moves the data from the source to the destination and returns the offload write result to the application.
The following figure shows a diagram of the copy offload operation using ODX.

Figure 1. Diagram of ODX Operations
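The copy offload sequence above can be sketched as a small simulation. The CopyManager class and its methods are hypothetical stand-ins for the storage array's copy manager; a real application would issue FSCTL offload requests or the corresponding SCSI commands rather than calling Python methods.

```python
# Minimal sketch of the ODX copy sequence: offload read produces a token,
# offload write consumes it, and the "array" moves the data internally.
import secrets

class CopyManager:
    """Simulates the copy manager of a storage device (illustrative only)."""
    def __init__(self):
        self._tokens = {}   # token -> (source lun, offset, length)
        self._luns = {}     # lun name -> bytearray backing store

    def offload_read(self, lun, offset, length):
        # Steps 1-2: create an opaque token representing the source data.
        token = secrets.token_bytes(16)
        self._tokens[token] = (lun, offset, length)
        return token

    def offload_write(self, token, dest_lun, dest_offset):
        # Steps 3-5: validate the token and move the data inside the array.
        src_lun, src_offset, length = self._tokens[token]
        data = self._luns[src_lun][src_offset:src_offset + length]
        self._luns[dest_lun][dest_offset:dest_offset + length] = data
        return length  # transfer count reported back to the application

cm = CopyManager()
cm._luns = {"src": bytearray(b"hello world!"), "dst": bytearray(12)}
tok = cm.offload_read("src", 0, 12)
moved = cm.offload_write(tok, "dst", 0)
```

Note that the host never sees the data itself, only the token; that is the property that keeps the client-server network out of the transfer path.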
Identify ODX-Capable Source and Destination

To support ODX, storage arrays must implement the T10 standard specifications. The following sections describe how to identify ODX-capable storage arrays, offloaded read and write operations, and offload write with a well-known token.

During LUN device enumeration (a system boot or a plug-and-play event), Windows gathers or updates the ODX capability information of the storage target device through the following steps:
1. Query copy offload capability.
2. Gather the required parameters and limitations for copy offload operations.

By default, Windows 8 tries the ODX path if both the source and destination LUNs are ODX-capable. If the storage device fails the initial ODX request, Windows marks the combination of the source and destination LUN as a not-ODX-capable path.

ODX Read/Write Operations

ODX consists of four major steps:
1. Offload read operation
2. Receive offload read result
3. Offload write with the token
4. Receive offload write result
To avoid SCSI command time-outs and ensure robust Multipath I/O (MPIO) and failover clustering support, Windows 8 adopted synchronous offload SCSI commands.

Synchronous Command Adoption and APIs

Windows 8 adopts the synchronous offload read/write operation and splits a large offload write request using the following algorithm to ensure a robust synchronous offload write:
If the target storage device does not provide an optimal transfer size, set the optimal transfer size at 64 MB.
If the optimal transfer size specified by the storage target device is greater than zero and less than 256 MB, use the device-specified optimal transfer size.
If the optimal transfer size set by the target device is greater than 256 MB, set the optimal transfer size at 256 MB.
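The transfer-size selection rules above reduce to a small function. The 64 MB default and 256 MB cap come from the text; the function name is illustrative and not a Windows API.

```python
# Sketch of how a host might pick the transfer size used to split a large
# offload write, following the three rules stated above.
MB = 1024 * 1024
DEFAULT_TRANSFER = 64 * MB    # used when the device reports nothing usable
MAX_TRANSFER = 256 * MB       # upper clamp on device-reported values

def optimal_transfer_size(device_reported):
    """Return the transfer size for splitting a large offload write."""
    if device_reported is None or device_reported <= 0:
        return DEFAULT_TRANSFER       # no usable value from the device
    if device_reported > MAX_TRANSFER:
        return MAX_TRANSFER           # clamp oversized device values
    return device_reported            # honor the device's preference
```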
Synchronous offload read and offload write SCSI commands reduce the complications of MPIO and cluster failover scenarios. Windows expects the copy manager to complete the synchronous offload read/write SCSI commands within 4 seconds.

In Windows 8, applications can use FSCTL, DSM IOCTL, or SCSI_PASS_THROUGH APIs to interact with storage arrays and execute copy offload operations. To avoid data corruption or system instability, Windows 8 restricts applications from writing directly to a volume that is mounted by a file system without first obtaining exclusive access to the volume, because such writes may collide with file system writes. When such collisions occur, the contents of the volume may be left in an inconsistent state.

Offload Read Operations

The offload read request of the application can specify the token lifetime (inactivity time-out). If the application sets the token lifetime to zero, the default inactivity timer is used as the token lifetime. The copy manager of the storage array maintains and validates the token according to its inactivity time-out value and credentials. The Windows host also limits the number of file fragments to 64. If the offload read request consists of more than 64 fragments, Windows fails the copy offload request and falls back to the traditional copy operation.

ROD Token Policy and Management

After completing the offload read request, the copy manager prepares a representation of data (ROD) token for the receive offload read result command. The ROD token field specifies the point-in-time representation of user data and protection information. The ROD can be user data in open-exclusively or open-with-share format. The copy manager can invalidate the token according to its ROD policy setting.
If the ROD is opened exclusively for a copy offload operation, the ROD token can be invalidated when the ROD is modified or moved. If the ROD is in open-with-share format, the ROD token remains valid when the ROD is modified. The ROD token is in a 512-byte format, and the first 4 bytes are used to describe the ROD token type.
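The 512-byte layout just described (a 4-byte ROD token type followed by bytes that are opaque to the host) can be split mechanically. The field names below are illustrative; a host never interprets the opaque portion, only the type field.

```python
# Sketch of splitting a raw 512-byte ROD token into its type and opaque ID.
# Big-endian byte order is assumed here, following general SCSI convention.
import struct

TOKEN_SIZE = 512

def split_rod_token(token):
    """Split a raw ROD token into (rod_type, opaque_id)."""
    if len(token) != TOKEN_SIZE:
        raise ValueError("ROD token must be exactly 512 bytes")
    (rod_type,) = struct.unpack_from(">I", token, 0)  # first 4 bytes: type
    opaque_id = token[4:]                             # remaining 508 bytes
    return rod_type, opaque_id
```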
Figure 2. Token Format

Size in Bytes   Contents in the Token
4 bytes         ROD Token Type
508 bytes       ROD Token ID

Because the ROD token is granted and consumed only by the storage array, its format is opaque, not guessable, and highly secured. If the token is modified, not validated, or expired, the copy manager can invalidate the token during the offload write operation. The ROD token returned from the offload read operation has an inactivity time-out value to indicate the number of seconds that the copy manager must keep the token valid for the next Write Using Token usage.

Offload Write Operations

After receiving the ROD token from the copy manager, the application sends the offload write request with the ROD token to the copy manager of the storage array. When a synchronous offload write command is sent to the target device, Windows expects the copy manager to complete the command within 4 seconds. If the command is terminated because of a command time-out or other error condition, Windows fails the command, and the application falls back to the legacy copy operation according to the returned status code.

Result from Receive Offload Write Result

The offload write request can be completed with one or multiple Receive Offload Write Result commands. If the offload write is partially completed, the copy manager returns the estimated delay and the transfer count to indicate the copy progress. The transfer count specifies the number of contiguous logical blocks that were written without error from the source to the destination media. The copy manager can perform offload writes in a sequential or scatter/gather pattern. When a write failure occurs, the copy progress counts contiguous logical blocks from the first logical block to the failure block. The client application or copy engine resumes the offload write from the failed block.
When the offload write is completed, the copy manager completes the Receive ROD Token Information command with the estimated status update delay set to zero and the data transfer progress at 100 percent. If successive receive offload write results return the same data transfer progress, Windows fails the copy operation back to the application after four retries.

Offload Write with Well-Known ROD Token

A client application can also perform the offload write operation with a well-known ROD token. This is a predefined ROD token with a known data pattern and token format. One common implementation is called a zero token; a client application can use a zero token to fill one or more ranges of logical blocks with zeros. If the well-known token is not supported or recognizable, the copy manager fails the offload write request with Invalid Token.
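The stalled-progress rule above (fail back to legacy copy after four retries with no change in the transfer count) can be sketched as a polling loop. The receive_result callable is a hypothetical stand-in for the Receive Offload Write Result command; it returns the cumulative transfer count.

```python
# Sketch of host-side progress polling for a partially completing offload
# write: the same transfer count four times in a row means no progress,
# so the host gives up and falls back to the legacy copy.
def poll_offload_write(receive_result, total_blocks, max_stalls=4):
    last_count = -1
    stalls = 0
    while True:
        count = receive_result()          # cumulative blocks written so far
        if count >= total_blocks:
            return True                   # transfer is 100 percent complete
        if count == last_count:
            stalls += 1
            if stalls >= max_stalls:
                return False              # stalled: fall back to legacy copy
        else:
            stalls = 0                    # progress was made; reset the counter
        last_count = count
```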
Figure 3. Well-Known Token Format

Size in Bytes   Contents in the Token
4 bytes         ROD Token Type
2 bytes         Well-Known Pattern
506 bytes       Reserved

In an offload write with a well-known ROD token, a client application cannot use an offload read to request a well-known token. The copy manager verifies and maintains the well-known ROD tokens according to its policy.

Performance Tuning Parameters of ODX Implementation

The performance of ODX does not depend on the transport link speed of the client-server network or the SAN between the server and the storage array; the data is moved by the copy manager and the device servers of the storage array. For example, the copy manager of a 1-Gbit iSCSI storage array could complete a 3-GB file copy within 10 seconds, a data transfer rate greater than 300 MB per second, which already outperforms the maximum theoretical transfer speed of the 1-Gbit Ethernet interface.

However, not every copy offload benefits from ODX technology; copy performance for files below a certain size may not improve. To optimize performance, the use of ODX can be restricted with a minimum file size and maximum copy lengths. To tune the performance of ODX, adjust the following parameters.

Minimum Copy Offload File Size

Windows sets a minimum file size requirement for copy offload operations. Currently, the minimum copy offload file size is set at 256 KB in the copy engine. If a file is smaller than 256 KB, the copy engine falls back to the legacy copy process.

Maximum Token Transfer Size and Optimal Transfer Count

The Windows host uses the maximum token transfer size and optimal transfer count to prepare the optimal transfer size of an offload read or write SCSI command. The total transfer size, in number of blocks, must not exceed the maximum token transfer size.
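Following the Figure 3 layout, a well-known zero token can be assembled byte by byte. The type value 0xFFFF0001 below is the zero-data ROD token type used by Windows (STORAGE_OFFLOAD_TOKEN_TYPE_ZERO_DATA); treat it as an assumption here and verify it against your target's documentation, since this paper does not state the value.

```python
# Illustrative construction of a 512-byte well-known zero token:
# 4-byte type + 2-byte well-known pattern + 506 reserved bytes.
import struct

ZERO_DATA_TOKEN_TYPE = 0xFFFF0001  # assumed value; confirm against the spec

def make_zero_token():
    rod_type = struct.pack(">I", ZERO_DATA_TOKEN_TYPE)  # 4-byte ROD token type
    pattern = b"\x00\x00"                               # 2-byte well-known pattern
    reserved = bytes(506)                               # 506 reserved bytes
    token = rod_type + pattern + reserved
    assert len(token) == 512
    return token
```

An application would pass such a token to an offload write to zero-fill ranges of logical blocks without transferring any data over the wire.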
If the storage array does not report an optimal transfer count, Windows uses 64 MB as the default.

Optimal and Maximum Transfer Lengths

The optimal and maximum transfer length parameters specify the optimal and maximum number of blocks in one range descriptor. Copy offload applications can comply with these parameters to achieve optimal file transfer performance.
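The eligibility rules scattered through this paper (the 256 KB minimum file size, the 64-fragment limit on offload reads, and the not-ODX-capable path marking) combine into one gating decision. The function below is an illustrative composite, not the actual copy engine logic.

```python
# Sketch of the copy engine's go/no-go decision for attempting the ODX path,
# using the thresholds stated in this paper.
MIN_OFFLOAD_SIZE = 256 * 1024   # bytes: below this, use the legacy copy
MAX_FRAGMENTS = 64              # offload read limit on file fragments

def use_odx(file_size, fragment_count, path_is_odx_capable):
    """Return True if the copy engine should attempt the ODX path."""
    if not path_is_odx_capable:
        return False            # source/destination pair marked not capable
    if file_size < MIN_OFFLOAD_SIZE:
        return False            # under the 256 KB minimum file size
    if fragment_count > MAX_FRAGMENTS:
        return False            # too fragmented for a single offload read
    return True
```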
ODX Error Handling and High Availability Support

When an ODX operation fails a file copy request, the copy engine and the Windows file system (NTFS) fall back to the legacy copy operation. If the copy offload fails in the middle of the offload write operation, the copy engine and NTFS resume with the legacy copy operation from the first failure point in the offload write.

ODX Error Handling

ODX uses a robust error handling algorithm in accordance with the storage array's features. If the copy offload fails in an ODX-capable path, the Windows host expects the application to fall back to the legacy copy operation; the Windows copy engine already implements this fallback to the traditional copy mechanism. After a copy offload failure, NTFS marks the source and destination LUN as not ODX-capable for three minutes. After this period passes, the Windows copy engine retries the ODX operation. A storage array can use this behavior to temporarily disable ODX support on some paths during highly stressful situations.

ODX Failover in MPIO and Cluster Server Configurations

Offload read and write operations must be completed or canceled from the same storage link (I_T nexus). When an MPIO or a cluster server failover occurs during a synchronous offload read or write operation, Windows handles the failover using the following algorithm:

Synchronous offload read/write command with MPIO configuration: Windows retries the failed command after the MPIO path failover. If the command fails again, Windows does the following:
Without the cluster server failover option: Windows issues a LUN reset to the storage device and returns an I/O failure status to the application.
With the cluster server failover option: Windows starts the cluster server node failover.
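The three-minute not-ODX-capable marking described above amounts to a small cache keyed by the source/destination LUN pair. The class below is an illustrative model of that behavior, not a Windows interface; the injectable clock exists only to make the sketch testable.

```python
# Sketch of NTFS's temporary "not ODX-capable" marking: after a copy offload
# failure, a path is penalized for three minutes before ODX is retried.
import time

DISABLE_SECONDS = 180  # three minutes, per the text above

class OdxPathCache:
    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._disabled_until = {}   # (src, dst) -> monotonic deadline

    def mark_failed(self, src, dst):
        """Record a copy offload failure on this source/destination pair."""
        self._disabled_until[(src, dst)] = self._clock() + DISABLE_SECONDS

    def is_odx_capable(self, src, dst):
        """True if the path has no penalty or the penalty has expired."""
        deadline = self._disabled_until.get((src, dst))
        return deadline is None or self._clock() >= deadline
```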
With cluster server configuration: The cluster storage service fails over to the next preferred cluster node and then resumes the cluster storage service. The offload application must be cluster-aware to retry the offload read/write command after the cluster storage service failover.
If the offload read or write command fails after the MPIO path and cluster node failover, Windows issues a LUN reset to the storage device after the failover, and the storage device terminates all outstanding commands and pending operations under the LUN. Currently, Windows does not issue asynchronous offload read or write SCSI commands from the storage stack.
ODX Usage Models

This section discusses the usage models of ODX technology:
High-performance data transfer across physical disks, virtual hard disks, and SMB shared disks
Massive data migration
Host-controlled data movement within a tiered storage device
ODX across Physical Disk, Virtual Hard Disk and SMB Shared Disk

To perform ODX operations, the application server must have access to both the source LUN and the destination LUN with read/write privileges. The copy offload application issues an offload read request to the source LUN and receives a token from the copy manager of the source LUN. The copy offload application then uses the token to issue an offload write request to the destination LUN, and the copy manager moves the data from the source LUN to the destination LUN through the storage network.

ODX Operation with One Server

In a single-server configuration, the copy offload application issues the offload read and write requests from the same server system.
Figure 4. ODX Operation with One Server

In the previous illustration, Server1 or Virtual Machine1 has access to both the source LUN (VHD1 or Physical Disk1) and the destination LUN (VHD2 or Physical Disk2). The copy offload application issues an offload read request to the source LUN and receives the token from the source LUN; it then uses the token to issue an offload write request to the destination LUN. The copy manager moves the data from the source LUN to the destination LUN within the same storage array.

ODX Operation with Two Servers

In the two-server configuration, there are two servers and multiple storage arrays managed by the same copy manager.
Figure 5. ODX Operation with Two Servers

In the previous illustration:
Server1 or Virtual Machine1 is the host of the source LUN, and Server2 or Virtual Machine2 is the host of the destination LUN.
Server1 shares the source LUN with the application client through the SMB protocol, and Server2 shares the destination LUN with the application client through the SMB protocol. The application client has access to both the source LUN and the destination LUN.
The source and destination storage arrays are managed by the same copy manager in a SAN configuration.

From the application client system, the copy offload application issues an offload read request to the source LUN and receives the token from the source LUN, and then issues an offload write request with the token to the destination LUN. The copy manager moves the data from the source LUN to the destination LUN across two different storage arrays in two different locations.
Massive Data Migration Massive data migration is the process of importing a large amount of data such as database records, spreadsheets, text files, scanned documents, and images to a new system. Data migration could be caused by a storage system upgrade, a new database engine, or changes in application or business process. ODX can be used to migrate data from a legacy storage system to a new storage system, if the legacy storage system can be managed by the copy manager of the new storage system.
Figure 6. Data Migration from Legacy to New Storage System

In the previous illustration:
Server1 is the host of the legacy storage system, and Server2 is the host of the new storage system.
Server1 shares the source LUN with the data migration application client through the SMB protocol, and Server2 shares the destination LUN with the data migration application client through the SMB protocol. The application client has access to both the source and destination LUN.
The legacy storage system and the new storage system are managed by the same copy manager in a SAN configuration.

From the data migration application client system, the copy offload application issues an offload read request to the source LUN and receives the token from the source LUN, and then issues an offload write request with the token to the destination LUN. The copy manager moves the data from the source LUN to the destination LUN across two different storage systems at two different locations. Massive data migration can also be operated with one server at the same location.

Host-Controlled Data Transfer within a Tiered Storage Device

A tiered storage device categorizes data into different types of storage media to reduce costs, increase performance, and address capacity issues. Categories can be based on levels of protection needed, performance requirements, frequency of usage, and other considerations. The data migration strategy plays an important role in the end result of a tiered storage strategy. ODX enables host-controlled data migration within the tiered storage device. The following diagram is an example of ODX in a tiered storage device.

Figure 7. Data Transfer within Tiered Storage Device
In the previous illustration:
The server is the host of the tiered storage system.
The source LUN is the Tier 1 storage device, and the destination LUN is the Tier 2 storage device.
All tiered storage devices are managed by the same copy manager.

From the server system, the data migration application issues an offload read request to the source LUN and receives the token from the source LUN, and then issues an offload write request with the token to the destination LUN. The copy manager moves the data from the source LUN to the destination LUN across the two storage tiers. When the data migration task is completed, the application deletes the data from the Tier 1 storage device and reclaims the storage space.
Conclusion

ODX introduces tokenized read and write operations for offloaded data movement. It also allows the host server to interact with the copy manager during the copy offload operation and to fall back to the legacy copy process when a failure occurs during the copy offload operation. The key benefits of ODX are:
Copy offload across physical servers and virtual machines.
High-performance data transfer rates through the storage network.
Low server CPU usage and network bandwidth consumption during the offload read/write operations.
Intelligent data movement options that allow applications to optimize offload read/write solutions.
ODX can be operated from a physical server system or a virtual machine, and the source and destination disks can be physical disks, VHDs, or SMB shared disks. Many technologies, such as volume snapshots, copy-on-write, and extended copy implementations, have been applied to storage arrays to enhance massive data transfer. ODX provides a highly secured and efficient front-end interface between the host server and the copy manager of the storage systems, and it is the first implementation based on the T10 XCOPY Lite specification. The application can interact with the storage device servers to perform offload read/write operations across storage arrays under the same copy manager.

In the future, the platform design will need to involve the server cluster and storage cluster, because the data movement and transactions can also be performed by the copy manager of the storage cluster. The storage cluster can span different vendors' storage arrays when the ROD token format is secured and recognized by different vendors' products. Because we expect the T10 Committee to continue the development of the ROD token format, the offload read and write operations could be implemented in all industry-standard storage products in the future.

Resources

T10 XCOPY Lite Specification (11-059r8)
http://www.t10.org/cgi-bin/ac.pl?t=d&f=11-059r8.pdf

Windows Offloaded Data Transfer Logo Requirement
http://msdn.microsoft.com/en-us/windows/hardware/gg487403