
SAN INTRODUCTION

SAN Introduction:
Problems of DAS: Throughout the 1980s, the standard way of connecting hosts to storage devices was point-to-point, direct-attached storage through interfaces such as Integrated Drive Electronics (IDE) and parallel SCSI. Parallel SCSI offered relatively fast (5 or 10 MB/sec) access to SCSI-enabled disks, and several disks could be connected at once to the same computer through the same interface. This worked well for the time, with fairly reliable, fast connections allowing administrators to attach internal and external storage through simple ribbon cabling or multiconductor external cables. However, as storage subsystems became larger and computers faster, a new problem emerged: external storage (which at one time was just a simple disk drive on the desk next to a machine) started to get bigger. Tape libraries, Redundant Array of Inexpensive Disks (RAID) arrays, and other SCSI devices began to require more and more space, requiring the parallel SCSI connection to be stretched farther and farther away from the host. Input/Output (I/O) rates also increased, pushing on the physics of keeping signal integrity in a large bundle of wires (wide parallel data buses). Parallel SCSI variants were devised to enable longer distances and to address the signal integrity issues, but they all eventually ran up against the difficulties of carrying high-speed signals across the parallel SCSI bus architecture.

Fig. DAS

Solutions from SAN: The solution to all of this was slow in coming, but eventually the storage industry settled on using a serial protocol with high-speed transceivers, offering good noise immunity, ease of cabling, and plentiful bandwidth. Different specifications (Serial Storage Architecture [SSA] and Fibre Channel, as well as more advanced parallel SCSI technologies) competed for adoption, and companies began experimenting with different serial communications media. New high-speed circuits made serial transfers (using a simple pair of wires to transmit bits serially, in order, rather than a large number of wires to transfer several bytes or words of data at a time) the most practical solution to the signal problems. The high speed of the circuits enabled Fibre Channel to offer up to 100 MB/sec transfers, versus the slower 10 to 20 MB/sec parallel limitations. (At present FC provides 1/2/4 Gbit/sec transfers.) When Fibre Channel was first applied to the area of storage connections, the primary attraction of the technology was the extended distances and simplified cabling it offered. This extension of direct-attach operation basically replaced the old parallel SCSI attachments with a high-speed serial line (Figure 1.2). The new Fibre Channel connections offered a much faster interface and simplified cabling (four copper wire connections through DB-9 connectors, as well as optical cabling), and could be used to distribute storage as far as 10 km away from a host computer, or 30 km away with optical extenders.

Figure 1.2 Using Fibre Channel to Extend Distances from Storage

The connections to disks at this time began using the Fibre Channel Arbitrated Loop (FC-AL) protocol, which enabled disks to negotiate their addresses and traffic on a loop topology with a host (Figure 1.3). Because of the combined ability to easily cable and distribute storage, users were now able to add separate racks of equipment and attach them to hosts. A new component, the Fibre Channel hub, began to be used to make it easier to plug in devices. The hub, a purely electrical piece of equipment that simply connected pieces of a Fibre Channel loop together, made it possible to dynamically add and remove storage from the network without requiring a complete reconfiguration. As these components began to be used in increasingly complex environments, manufacturers began to add "intelligence" to these Fibre Channel hubs, enabling them to independently deal with issues such as failures in the network and noise from loops being added and removed. An alternative to the hub came in the form of the Fibre Channel switch, which, unlike a hub, did not just connect pieces of a loop together, but instead offered the packet-switching ability of traditional network switches.

Figure 1.3 Arbitrated Loop Disk Configurations Attached to a Single Host

Because there was now a Fibre Channel network available, other hosts (not storage) were added to take advantage of the same network. With the addition of SAN-aware software, it was suddenly possible to share storage between two different devices on the network. Storage sharing was the first realization of the modern SAN, with companies in the multimedia and video production areas paving the way by using the Fibre Channel network to share enormous data files between workstations, distribute jobs for rendering, and make fully digital production possible (Figure 1.4).

Figure 1.4 Multiple Host Arbitrated Loop for Storage Sharing

The next big step in Fibre Channel evolution came with the increased reliability and manageability of the Fibre Channel switched fabric. Early implementations of FC-AL were sometimes difficult to manage, unstable, and prone to interoperability problems between components. Because the FC-AL protocol was quite complex, the result was sometimes an inability for anything on a loop to communicate and stay operational. The solution was a move to a switched fabric architecture, which not only enhanced the manageability and reliability of the connection, but also provided switched, high-speed connections between all nodes of a network instead of a shared loop. As a result, each port on a switch now provides a full 1 Gbit/sec of available bandwidth rather than just a portion of a total 1 Gbit/sec shared between all the devices connected to the loop. Fabrics now make up the majority of Fibre Channel installations. A typical Fibre Channel switched fabric installation (Figure 1.5) has multiple hosts and storage units all connected into the same Fibre Channel network cloud through one or more Fibre Channel switches.

Figure 1.5 Switched Fabric, Multiple Host, and Storage Unit Configuration

Today, the modern SAN looks much like any other modern computer network. Network infrastructure components such as switches, hubs, bridges, and routers help transport frame-level information across the network, and network interface cards connect computer systems to that same network (called HBAs in the SAN world, as they replaced SCSI Host Bus Adapters).

Figure 1.6 shows an example of how these components could be used in conjunction with Fibre Channel switches.

Figure 1.6 Typical Deployed SAN Configuration with Multiple Hosts, Storage, and Tape Devices

FC Technology: 1. What is Fibre Channel?

Fibre Channel (FC) is a serial, high-speed data transfer technology that can be utilized by networks and mass storage. Fibre Channel is an open standard, defined by ANSI and ISO, and it supports the most important higher-level protocols, such as Internet Protocol (IP), ATM (Asynchronous Transfer Mode), IEEE 802 (Institute of Electrical and Electronics Engineers standard), HIPPI (High Performance Parallel Interface), SCSI (Small Computer System Interface), etc. Fibre Channel is fast (data transfer), flexible (supports many topologies), simple, and scalable. FC storage devices available in the current storage industry include FC controllers and HBAs, FC hard disks, FC HD enclosures, FC storage arrays, FC hubs and switches, FC connectors and cables, and other devices.

2. FC Layers:

The FC-0 layer defines the specification for media types, distance, and signal electrical and optical characteristics. The FC-1 layer defines the mechanism for encoding/decoding data for transmission over the intended media and the command structure for accessing the media. The FC-2 layer defines how data blocks are segmented into frames, how the frames are handled according to the class of service, and the mechanisms for flow control and for ensuring frame data integrity.

The FC-3 layer defines facilities for data encryption and compression. The FC-4 layer is responsible for mapping SCSI-3 protocols (FCP) and other higher layer protocols/services into Fibre Channel commands.

FC Layers & OSI Layers Comparison

FC Topologies
1. Point-to-Point Topology: Up to 2 devices (ports) can be connected point to point. Each device is called a node, and each node port is designated as an N_Port. A point-to-point connection gives the two nodes access to the entire 100 MB/sec bandwidth available with FC for communication between them.

2. Arbitrated Loop Topology: Arbitrated Loop (AL) allows up to 127 ports to be connected in a circular daisy chain. The ports in an AL are designated as NL_Ports, and only two ports can be active simultaneously. The other ports function as repeaters and simply pass the signal along. This means, of course, that the bandwidth of 100 MB/sec is shared among all devices. Arbitrated Loop uses AL_PA (8-bit) addressing.

3. Switched Fabric Topology: Switched fabric allows up to 16 million devices to connect together. FC devices are connected to the network via F_Ports or FL_Ports. The connection between the individual ports on the network functions similarly to a telephone system. Switched fabric uses dynamic (24-bit) addressing.


FC Addressing & Ports

WWN (World Wide Name): static 64-bit address for each port, assigned by the IEEE in blocks to manufacturers.
AL_PA (Arbitrated Loop Physical Address): dynamic 8-bit address assigned when the port is connected to an arbitrated loop.
S_ID (Native Address Identifier): dynamic 24-bit address assigned to a node in a fabric.

Basic Port Types

N_Port (Node): end point, typically an HBA or disk; connects to other N_Ports or F_Ports.
F_Port (Fabric): found only in a switch; connects directly to N_Ports.
E_Port (Expansion): found only in a switch; connects directly to E_Ports in other switches to expand the SAN.

Used in Arbitrated Loops

NL_Port (N_Port with arbitrated loop capabilities): connects directly to an N_Port, F_Port, NL_Port, or FL_Port.
FL_Port (F_Port with arbitrated loop capabilities): switch port that connects to N_Ports and NL_Ports.
G_Port (Generic): switch port that can be an F_Port, FL_Port, or E_Port.
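To make these address formats concrete, the following is a minimal Python sketch (an illustration only, not vendor code) that renders a 64-bit WWN in the usual colon-separated notation and splits a 24-bit fabric address into the Domain, Area, and Port fields commonly used by switches; the example values are made up.

# Sketch: illustrating FC address formats (WWN and 24-bit fabric address).
# Assumption: the fabric address is split Domain/Area/Port, 8 bits each.

def format_wwn(raw: int) -> str:
    """Render a 64-bit WWN in the usual colon-separated hex notation."""
    if not 0 <= raw < 2**64:
        raise ValueError("WWN must be a 64-bit value")
    digits = f"{raw:016x}"
    return ":".join(digits[i:i + 2] for i in range(0, 16, 2))

def split_fcid(fcid: int) -> dict:
    """Split a 24-bit fabric address into domain, area, and port fields."""
    if not 0 <= fcid < 2**24:
        raise ValueError("fabric address must be a 24-bit value")
    return {
        "domain": (fcid >> 16) & 0xFF,  # switch that owns the port
        "area":   (fcid >> 8) & 0xFF,   # port group / port on that switch
        "port":   fcid & 0xFF,          # device (AL_PA for public loop devices)
    }

if __name__ == "__main__":
    print(format_wwn(0x10000000C9201234))   # made-up WWN: 10:00:00:00:c9:20:12:34
    print(split_fcid(0x010200))             # {'domain': 1, 'area': 2, 'port': 0}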

FC Flow Control

Buffer-to-Buffer Credit: This type of flow control deals only with the link between an N_Port and an F_Port, or between two N_Ports. Both ports on the link exchange values of how many frames each is willing to receive at a time from the other port. This value becomes the other port's BB_Credit value and remains in effect for as long as the ports stay logged in.

End-to-End Credit: End-to-end flow control is not concerned with individual links, but rather with the source and destination N_Ports. The concept is very similar to buffer-to-buffer flow control. When the two N_Ports log into each other, they report how many receive buffers are available for the other port. This value becomes EE_Credit.
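As a rough illustration of credit-based flow control (a simplified sketch, not the actual FC-2 state machine), the sender below may only transmit while it holds credits, and each R_RDY returned by the receiver restores one buffer-to-buffer credit.

# Simplified sketch of buffer-to-buffer credit flow control.
# Assumption: one sender, one receiver, BB_Credit agreed at login time.

class Link:
    def __init__(self, bb_credit: int):
        self.credits = bb_credit          # frames the sender may still send
        self.receiver_queue = []          # frames waiting to be processed

    def send_frame(self, frame: str) -> bool:
        """Transmit a frame only if a credit is available."""
        if self.credits == 0:
            return False                  # must wait for an R_RDY
        self.credits -= 1
        self.receiver_queue.append(frame)
        return True

    def receiver_processes_one(self):
        """Receiver frees a buffer and returns an R_RDY, restoring a credit."""
        if self.receiver_queue:
            self.receiver_queue.pop(0)
            self.credits += 1

if __name__ == "__main__":
    link = Link(bb_credit=2)
    print(link.send_frame("frame-1"))   # True
    print(link.send_frame("frame-2"))   # True
    print(link.send_frame("frame-3"))   # False: out of credits
    link.receiver_processes_one()       # R_RDY comes back
    print(link.send_frame("frame-3"))   # True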


Fig. Flow control: End-to-End Credit (ACK) is managed between N_Port 1 (Node A) and N_Port 2 (Node B) across the fabric, while Buffer-to-Buffer Credit (R_RDY) is managed on each individual link between an N_Port and its attached F_Port.

FC Class of Service

Class 1
Guaranteed Bandwidth and Delivery
Dedicated Connection
End-to-End Flow Control (ACK)

Class 2
Guaranteed Delivery (ACK required)
Connectionless Service
Buffer-to-Buffer & End-to-End Flow Control (R_RDY and ACK)
Out-of-Order Delivery of Frames Allowed

Class 3
Delivery Managed Exclusively by Buffer-to-Buffer Flow Control (R_RDY)
Connectionless Service
Out-of-Order Delivery of Frames Allowed

Intermix
Enhanced Class 1, allows Class 2 or Class 3 frames between Class 1 frames.


Class 4 Class 4 can be used only with the pure Fabric topology. One N_Port will
set up a Virtual Circuit (VC) by sending a request to the Fabric indicating the remote N_Port as well as quality of service parameters. The resulting Class 4 circuit will consist of two unidirectional VCs between the two N_Ports.

Class 5 The idea for Class 5 involved isochronous, just-in-time service. However,
it is still undefined.

Class 6 Class 6 provides support for multicast service through a Fabric at the well-known address of hex 'FFFFF5'.

FC AL Initialization

Loop initialization happens mainly for two reasons.


When a loop is newly formed with all its FC devices and is about to come up on the network.
When any loop failure happens, initialization starts.

Three main functions happen during loop initialization.

LIP primitive sequences. LIP is transmitted by an L_Port after it powers on, or when it detects a loop failure (loss of synchronization at its receiver). The LIP propagates around the loop, triggering all other L_Ports to transmit LIP as well. At this point, the loop is not usable.
Selection of the Loop Master. This is done by the L_Ports continuously transmitting Loop Initialization Select Master (LISM) frames.
Selection of an AL_PA by all the devices in the loop. The concept of an AL_PA bitmap is used, where each L_Port selects (and sets) a single bit in the bitmap of a frame originated by the Loop Master and repeats the frame back onto the loop. There are 127 available bits, corresponding to the 127 valid AL_PAs. This process is done using the following four frames:

LIFA: the L_Port claims the AL_PA that was assigned to it by the fabric.
LIPA: the L_Port claims the valid AL_PA it held before this initialization.
LIHA: the L_Port has a certain hard-assigned AL_PA that it tries to claim.
LISA: the L_Port claims the first available AL_PA that is left.

Two additional frames, LIRP and LILP, may be sent by the Loop Master, but only if all L_Ports on the loop support them.
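The bitmap-based AL_PA selection can be sketched in Python as shown below. This is a simplification that assumes 127 bitmap positions and the LIFA/LIPA/LIHA/LISA priority order; the Loop Master election and frame forwarding are omitted.

# Sketch of AL_PA selection during loop initialization (LIFA/LIPA/LIHA/LISA).
# Assumption: the bitmap has 127 positions, one per valid AL_PA.

def claim_alpa(bitmap, fabric_assigned=None, previous=None, hard_assigned=None):
    """Claim an AL_PA position in the Loop Master's bitmap, in priority order."""
    # LIFA, LIPA, LIHA: claim a remembered AL_PA if we have one and it is free.
    for candidate in (fabric_assigned, previous, hard_assigned):
        if candidate is not None and not bitmap[candidate]:
            bitmap[candidate] = True
            return candidate
    # LISA: otherwise take the first free position that is left.
    for position, taken in enumerate(bitmap):
        if not taken:
            bitmap[position] = True
            return position
    raise RuntimeError("no AL_PA available: loop already has 127 participants")

if __name__ == "__main__":
    bitmap = [False] * 127
    print(claim_alpa(bitmap, previous=5))          # 5  (LIPA)
    print(claim_alpa(bitmap, previous=5))          # 0  (LISA, 5 already taken)
    print(claim_alpa(bitmap, hard_assigned=10))    # 10 (LIHA)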

FC AL Arbitration

Arbitrated Loop is not a token-passing scheme. When a device is ready to transmit data, it first must arbitrate and gain control of the Loop. It does this by transmitting the Arbitrate (ARBx) Primitive Signal, where x = the Arbitrated Loop Physical Address (AL_PA) of the device. Once a device receives its own ARBx Primitive Signal, it has gained control of the Loop and can now communicate with other devices by transmitting an Open (OPN) Primitive Signal to a destination device. Once this happens, there essentially exists point-to-point communication between the two devices. All other devices in between simply repeat the data.

If more than one device on the Loop is arbitrating at the same time, the x
values of the ARB Primitive Signals are compared. When an arbitrating device receives another device's ARBx, the ARBx with the numerically lower AL_PA is forwarded, while the ARBx with the numerically higher AL_PA is blocked.

Unlike token-passing schemes, there is no limit on how long a device may retain control of the Loop. This demonstrates the "Channel" aspect of Fibre Channel.


There is, however, an Access Fairness Algorithm, which prohibits a device from arbitrating again until all other devices have had a chance to arbitrate. The catch is that the Access Fairness Algorithm is optional.
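A minimal sketch of the arbitration rule and the optional fairness window might look like the following; it ignores primitive-signal details and simply grants the loop to the lowest AL_PA among the current requesters, with each winner waiting out the rest of the window.

# Sketch of loop arbitration with the optional Access Fairness Algorithm.
# Assumption: within a fairness window, a device that has already owned the
# loop does not arbitrate again until every other requester has had a turn.

def run_fairness_window(requesters):
    """Grant the loop repeatedly; the lowest AL_PA wins each round."""
    pending = set(requesters)
    grant_order = []
    while pending:
        winner = min(pending)       # ARBx with the lowest AL_PA is forwarded
        grant_order.append(winner)
        pending.remove(winner)      # fairness: winner waits out this window
    return grant_order

if __name__ == "__main__":
    # Devices 0xE8, 0x01, and 0x23 arbitrate at the same time.
    print([hex(a) for a in run_fairness_window({0xE8, 0x01, 0x23})])
    # -> ['0x1', '0x23', '0xe8']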

Fibre Channel over IP
Two popular solutions for extending Fibre Channel over the IP network are FCIP and iFCP. The main reasons for running Fibre Channel over IP networks are the following:
Leverage existing storage devices (SCSI and Fibre Channel) and networking infrastructures (Gigabit Ethernet);
Maximize storage resources to be available to more applications;
Extend the geographical limitations of DAS and SAN access;
Use existing storage applications (backup, disaster recovery, and mirroring) without modification; and
Manage IP-based storage networks with existing tools and IT expertise.

FCIP Protocol:


FCIP is currently the most widely supported IP-based extension protocol, probably because it is simple and easy to implement. The basic concept of FCIP is a tunnel that connects two or more Fibre Channel SAN islands through an IP network. Once connected, the two SAN islands logically merge into a single fabric across the IP tunnel. An FCIP gateway is required to encapsulate Fibre Channel frames into TCP/IP packets, which are then sent through the IP network. On the remote side, another FCIP gateway receives the incoming FCIP traffic and strips off the TCP/IP headers before forwarding the native Fibre Channel frames into the SAN (Figure). The FCIP gateway can be a separate device, or its functionality can be integrated into the Fibre Channel switch. The obvious advantage of using FCIP is that existing IP infrastructure can be used to provide the distance extension.

How FCIP works
FCIP solutions encapsulate Fibre Channel packets and transport them via TCP/IP, which enables applications that were developed to run over Fibre Channel SANs to be supported under FCIP. It also enables organizations to leverage their current IP infrastructure and management resources to interconnect and extend Fibre Channel SANs. FCIP is a tunneling protocol that uses TCP/IP as the transport while keeping Fibre Channel services intact. FCIP relies on IP-based network services and on TCP/IP for congestion control and management. It also relies on both TCP/IP and Fibre Channel for data-error and data-loss recovery. In FCIP, gateways are used to interconnect Fibre Channel SANs to the IP network and to set up connections between SANs, or between Fibre Channel devices and SANs. Like iSCSI, there are a number of "pre-standard" FCIP products on the market.

iFCP Protocol:
How iFCP works
Fibre Channel devices (e.g., switches, disk arrays, and HBAs) connect to an iFCP gateway or switch. Each Fibre Channel session is terminated at the local gateway and converted to a TCP/IP session via iFCP. A second gateway or switch receives the iFCP session and initiates a Fibre Channel session. In iFCP, TCP/IP switching and routing elements complement and enhance, or replace, Fibre Channel SAN fabric components. The protocol enables existing Fibre Channel storage devices or SANs to attach to an IP network. Sessions include device-to-device, device-to-SAN, and SAN-to-SAN communications. From the IP side, each of the Fibre Channel devices connected to the iFCP gateway is given a unique IP address, which is advertised in the IP network. This allows individual Fibre Channel devices to be reached through the IP network via the iFCP gateway. The ability to individually address devices gives iFCP some advantages compared to the FCIP protocol. The biggest advantage is stability. Using FCIP between two Fibre Channel SAN islands will cause the islands to merge into one, which means that if there are perturbations in the IP network, they can potentially cause the fabric to rebuild on both sides of the IP tunnel. Using iFCP, the connectivity is between individual devices, and the fabrics stay separate. If perturbations occur in the network, they may affect individual connections, but they will not cause fabric rebuilds, leading to more stable fabrics on both sides of the IP network. The disadvantage compared to FCIP is the limited availability of iFCP solutions in the marketplace. This could be because FCIP is very simple to implement, so FCIP solutions are widely available from a number of different manufacturers, while iFCP is supported by only a limited number of vendors.
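Conceptually, the FCIP gateway described above wraps whole Fibre Channel frames in a TCP byte stream and unwraps them on the far side. The sketch below only illustrates that tunnelling idea with a hypothetical 4-byte length prefix; it is not the actual FCIP encapsulation header defined by the standard.

# Illustrative sketch of tunnelling FC frames over TCP, as an FCIP gateway does.
# The 4-byte length prefix used here is a simplification, not the real FCIP header.
import struct

def encapsulate(fc_frame: bytes) -> bytes:
    """Wrap a raw FC frame for transmission on the TCP tunnel."""
    return struct.pack("!I", len(fc_frame)) + fc_frame

def decapsulate(stream: bytes):
    """Yield FC frames back out of a received TCP byte stream."""
    offset = 0
    while offset + 4 <= len(stream):
        (length,) = struct.unpack_from("!I", stream, offset)
        offset += 4
        yield stream[offset:offset + length]
        offset += length

if __name__ == "__main__":
    tunnel_bytes = encapsulate(b"FC-frame-A") + encapsulate(b"FC-frame-B")
    print(list(decapsulate(tunnel_bytes)))   # [b'FC-frame-A', b'FC-frame-B']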

Fibre Channel Switch


I. FC Switch configuration
1. Open a HyperTerminal session.
2. Log in as admin.
3. Enter "password" for the password.
4. Type configure (configures the entire switch).
5. Type help (lists the possible commands).
6. Type ipAddrSet.
7. Enter the Ethernet IP address (get it from Tom York).
8. Enter the common subnet mask (255.255.252.0).
9. Hit Enter twice after the subnet mask (use the default values).
10. Enter the gateway address, which is the same throughout the lab (147.145.175.254).
11. When asked to set, respond by entering <y>.
12. Type ipAddrShow and make sure the IP address held.
13. Reboot (this will take several minutes).

II. Enable the Switches
1. Connect to the internet.
2. Enter the address http://<IP address of the switch>.
3. Click on Zone Admin.
4. Enter admin for the user name.
5. Enter "password" for the password.
6. For zone selection, select switch/port-level zoning.
7. Click <OK>.
8. Click the Port Zone tab.
9. Click Create Zone.
10. Name the zone.
11. Go to switch port, domain and select ports 0 through 7.
12. Click Add Mem =>.
13. Create another zone.
14. Select ports 8 through 15.
15. Select the Port Config tab.
16. Highlight both new zones and add them (under file zones).
17. Click Add Mem =>.
18. Click Enable Config, then Apply and OK (all located at the bottom of the screen).
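Before applying the ipAddrSet values from the procedure above, it can help to sanity-check that the chosen address, the 255.255.252.0 mask, and the gateway are consistent. The helper below is a hypothetical aid for that check, not part of the switch firmware; the example IP address is made up.

# Hypothetical helper: check the ipAddrSet inputs used in the lab procedure.
import ipaddress

def check_switch_ip(ip: str, netmask: str, gateway: str) -> bool:
    """Return True if the gateway sits in the same subnet as the switch IP."""
    network = ipaddress.ip_network(f"{ip}/{netmask}", strict=False)
    ok = ipaddress.ip_address(gateway) in network
    print(f"switch {ip}/{network.prefixlen}, gateway {gateway}: "
          f"{'OK' if ok else 'gateway not in subnet'}")
    return ok

if __name__ == "__main__":
    # Example values; the real switch IP comes from the lab administrator.
    check_switch_ip("147.145.172.10", "255.255.252.0", "147.145.175.254")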

Switch Behavior: Switch Initialization:

At power-on, the boot PROM diagnostics:
Verify CPU DRAM memory.
Initialize the base Fabric Operating System (FOS).

The initialized FOS then does the following:
Execute the Power-On Self Test (POST) on the switch.
Initialize ASICs and the front panel.
Initialize the link for all ports (put them online).
Explore the fabric and determine the Principal Switch.
Assign addresses to ports.
Build the unicast routing table.
Enable N_Port operations.


Fabric Port Initialization Process: (from the switch perspective)

Transition 1: At the beginning, verify whether anything is plugged into the switch port.
Transition 2: At the FL_Port, check whether any loop connections are present on the switch.
Transition 3: At the G_Port, verify whether any other devices (switches or hubs) are connected.
Transition 4: After the G_Port, verify whether switch or point-to-point devices are connected.

Communication Protocols:

Fabric devices typically perform: FLOGI, PLOGI to the Name Server, SCR to the Fabric Controller, Register & Query [using the FC Common Transport (FC_CT) protocol], and then LOGON (PLOGI) to the target devices.

Loop devices typically perform:

PRIVATE NL: LIP (PLOGI & PRLI will enable private storage devices that accept PRLI and thus appear fabric capable).

PUBLIC NL: LIP, FLOGI, PLOGI, SCR, Register & Query, LOGO, and then PLOGI and communicate with other end nodes in the fabric.

The LIP process includes: LIP, LISM, LIFA, LIPA, LIHA, LISA, and LIRP & LILP.

Switch Commands: (general, from Brocade switches)
help
switchshow
fabricshow
switchenable / switchdisable
nsshow
nsallshow
zoneshow / alishow / cfgshow
cfgenable
cfgdisable
cfgcreate
zonecreate
errdump
licenseshow
portcfgdefault
portenable / portdisable
wwn
urouteshow


Switch or Fabric Zoning: SAN implementations make data highly accessible; as a result, there is a need for data-transfer optimization and finely tuned network security. Fabric zoning sets up the way devices in the SAN interact, establishing a certain level of management and security.

What is zoning?
Zoning is a fabric-centric, enforced way of creating barriers on the SAN fabric to prevent set groups of devices from interacting with other devices. SAN architectures provide port-to-port connections between servers and storage subsystems through bridges, switches, and hubs. Zoning sets up efficient methods of managing, partitioning, and controlling pathways to and from storage subsystems on the SAN fabric, which improves storage subsystem utilization, data access, and security on the SAN. In addition, zoning enables heterogeneous devices to be grouped by operating system, with further demarcation based on application, function, or department.

Types of zoning
There are two types of zoning: soft zoning and hard zoning. Soft zoning uses software to enforce zoning. The zoning process uses the name server database located in the FC switch. The name server database stores port numbers and World Wide Names (WWNs) used to identify devices during the zoning process. When a zone change takes place, the devices in the database receive a Registered State Change Notification (RSCN). Each device must correctly process the RSCN to change the related communication paths. Any device that does not correctly process the RSCN, yet continues to transfer data to a specific device after a zoning change, will be blocked from communicating with its targeted device. Hard zoning uses only WWNs to specify each device for a specific zone. Hard zoning requires each device to pass through the switch's route table so that the switch can regulate data transfers by verified zone. For example, if two ports are not authorized to communicate with each other, the route table for those ports is disabled, and the communication between those ports is blocked.


Zoning components
Zone configurations are based on either the physical port that devices plug into, or the WWN of the device. There are three zoning components:
Zones
Zone members
Zone sets

What is a zone?
A zone is composed of servers and storage subsystems on a SAN that access each other through managed port-to-port connections. Devices in the same zone recognize and communicate with each other, but not necessarily with devices in other zones, unless a device in that zone is configured for multiple zones. Figure 1 shows a three-zone SAN with zones 1 and 3 sharing the tape library in zone 2.

Figure 1: Three-Zone SAN Fabric

Zone types
Port zoning (all zone members are ports)
WWN zoning (all zone members are WWNs)
Session-based zoning (zone members are a mixture of WWNs and ports)


Zone database
The zone database consists of zone objects. A zone object can be an alias, a zone, or a configuration. Configurations contain zones, which contain aliases. For any object, the available commands allow you to create, delete, add, remove, or show:
cfgcreate/delete/add/remove/show
zonecreate/delete/add/remove/show
alicreate/delete/add/remove/show

Alias
An alias is a name for a device in the fabric. The alias contains the name of the device and either the WWN of the device, or the domain and port the device is attached to.
WWN alias: alicreate alias1,10:00:00:00:01:01:02:02
Port alias: alicreate alias2,100,15

Every switch in the fabric has the same copy of the entire database. To clear the zone database from a switch, use cfgclear.
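Because zone objects are just named collections, the alias/zone/configuration commands shown above can be generated from a simple description. The sketch below prints a Brocade-style alicreate/zonecreate/cfgcreate/cfgenable sequence for an assumed, made-up layout; it does not talk to a real switch.

# Sketch: generate Brocade-style zoning commands from a simple description.
# The WWNs, alias names, and zone names below are made-up examples.

def zoning_commands(aliases, zones, cfg_name, cfg_zones):
    cmds = [f"alicreate {name},{wwn}" for name, wwn in aliases.items()]
    cmds += [f"zonecreate {zone},{';'.join(members)}" for zone, members in zones.items()]
    cmds.append(f"cfgcreate {cfg_name},{';'.join(cfg_zones)}")
    cmds.append(f"cfgenable {cfg_name}")
    return cmds

if __name__ == "__main__":
    aliases = {
        "host1_hba0": "10:00:00:00:01:01:02:02",
        "array_ctrlA": "20:00:00:00:0a:0b:0c:0d",
    }
    zones = {"zone1": ["host1_hba0", "array_ctrlA"]}
    for cmd in zoning_commands(aliases, zones, "cfg1", ["zone1"]):
        print(cmd)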

What is a zone member? Zone members are the devices within the same assigned zone. See Figure 2. Zone member devices are restricted to intra-zone communications, meaning that these devices can only interact with members within their assigned zone. A zone member cannot interact with devices outside its assigned zone unless it is configured in other zones.


Figure 2: Zone Members

How is a zone member identified?
Each zone member is identified by a WWN or port number. Each device has a unique WWN, a 64-bit number that uniquely identifies each zone member.

What is a zone set?
A zone set is a group of zones that function together on the SAN. Each zone set can accommodate up to 256 zones. All devices in a zone see only devices assigned to their zone, but any device in that zone can be a member of other zones. In Figure 3, all 4 zones see Member A.


Figure 3: Zone Set

Configurations
A configuration is a set of zones. You can have multiple defined configurations, but only one active configuration in a fabric at any time.
cfgcreate cfg1,zone1
To enable a configuration, use cfgenable cfg1. This is now called the effective configuration.
To disable the effective configuration, use the cfgdisable command. Note that when you disable zoning, all devices can now see each other!


Zone Commit
A zone commit is the process of updating all switches in the fabric when making a zone change. A zone commit is executed for the cfgdisable, cfgenable, or cfgsave commands. Zone commit uses the RCS protocol: the switch making the commit communicates with each switch individually to ensure the commit took place.


When a zone commit takes place, the entire zoning database is sent to all switches, even if only a small change has taken place.

RCS [Reliable Commit Service]
RCS is used for zoning, security, and some other functions. For zoning, RCS ensures that a zone commit happens on every switch in the fabric, or on none at all. There are four phases to RCS: ACA, SFC, UFC, RCA.
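The four RCS phases can be pictured as an all-or-nothing commit across the fabric. The sketch below is a loose conceptual model (ACA acquires change authorization, SFC stages the new database, UFC applies it, RCA releases), not the actual Brocade implementation.

# Conceptual sketch of an RCS-style all-or-nothing zone commit.
# Phases modelled loosely: ACA (lock), SFC (stage), UFC (apply), RCA (release).

class Switch:
    def __init__(self, name, healthy=True):
        self.name, self.healthy = name, healthy
        self.locked, self.staged, self.zoning_db = False, None, {}

    def aca(self):            # acquire change authorization
        self.locked = self.healthy
        return self.locked

    def sfc(self, db):        # stage the full zoning database
        self.staged = dict(db)
        return True

    def ufc(self):            # apply the staged database
        self.zoning_db, self.staged = self.staged, None

    def rca(self):            # release change authorization
        self.locked = False

def zone_commit(switches, new_db):
    if not all(sw.aca() for sw in switches):          # any refusal aborts the commit
        for sw in switches:
            sw.rca()
        return False
    for sw in switches:
        sw.sfc(new_db)                                # the entire database is sent
    for sw in switches:
        sw.ufc()
        sw.rca()
    return True

if __name__ == "__main__":
    fabric = [Switch("sw1"), Switch("sw2"), Switch("sw3", healthy=False)]
    print(zone_commit(fabric, {"cfg1": ["zone1"]}))   # False: commit aborted everywhere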

Zoning limitation Currently, fabric zoning cannot mask individual tape or disk storage LUNs that sit behind a storage-subsystem port. LUN masking and persistent binding are used to isolate devices behind storage-subsystem ports.

Components of FC-SAN
While SAN configurations can become very complex, a SAN can be simplified to three basic entities: the host system or systems, the network, and the storage device.

1. Host System(s)
Application software (SAN management software, CLI interface, and others)
Middleware (e.g., Volume Manager or host RAID)
Operating system / file system
Host Bus Adapter (HBA) driver
Host Bus Adapter (HBA)
Host Bus Adapter firmware

2. Storage Network/Communications Infrastructure
Physical links (FC, iSCSI, Ethernet)
Transceivers (GBIC, SFP, or any other transceiver)
Switches and switch firmware (switches and directors)
Routers and router firmware
Bridges or extenders and their firmware


3. Storage Device(s)
Interface adapter
Interface adapter driver/firmware
Storage controller firmware
Storage device (e.g., disk, JBOD, storage arrays, tape, or tape library)
Storage media

Storage Area Network Management

1. Storage Management Software
2. SAN Protection and Security
3. Storage Backup, Disaster Recovery & Data Replication
1. SAN Management Software
Though typically spoken of in terms of hardware, SANs very often include (or require) specialized software for their operation. In fact, configuring, optimizing, monitoring, and securing a contemporary SAN will almost certainly involve advanced software, particularly centralized management tools. When considering more complex options, such as High Availability configurations, selecting the proper management software can be just as critical as choosing the equipment. Though somewhat recent in its development, SAN management software borrows heavily from the mature ideas, benefits, and functionality that have long been available for traditional LANs and WANs. Ideally, this new category of software would be universal and work with any SAN equipment. But in today's multi-vendor, hardware-diverse SAN environments, this software is very often proprietary and/or tied to certain products and vendors. While this situation is beginning to change, SAN management software today must be selected with great care. Much consideration has to be given to the SAN equipment manufacturers, OS platforms, firmware revisions, HBA drivers, client applications, and even other software that may be running on the SAN. Until SAN management software becomes truly universal, it will continue to be quite important, and even vital, to work closely with product (and total solution) providers in order to successfully implement, and realize, the best features that SANs have to offer.


Using management software, the following actions can be performed:

Drive Management (fail drive, rebuild drive, initialize drive, online/offline drive, hot-spare drive, drive firmware upgrade/downgrade).
Controller Management (volume ownership, active/passive mode, online/offline, controller firmware & NVSRAM upgrade/downgrade).
Storage Array Management (monitor performance, event database, collect logs; array management: array profile, add/remove array, rename or modify array, reset array configuration, connectivity status; storage management: create logical drive / LD group, delete LD / LD group, modify LD capacity (DCE & DVE), DRM, modify LD settings, LUN mapping, LUN masking).
Switch management can be done using the switch vendor's management software.

Fig. SAN Storage Array


Storage Array
In data storage, an array is hardware consisting of multiple storage devices designed to fulfill the need for data storage. It consists of one or more controller modules (boxes) and one or more drive modules (boxes).

Command Module
The command module is the housing for the controllers. It allows the user to hot swap controllers and components, which is made possible by the redundant nature of the controller canisters. The components that the user can hot swap include redundant power supplies, batteries, fans, and communications modules. Additionally, it houses two controllers that are hot swappable. Below is the front and back of a command module.

Controller
The controller is the brains behind the array. It can be loaded with different controller firmware code that enables the different features the array can perform. The controllers are redundant (there are two housed in the command module) and they are hot swappable. One controller can fail and the other will control the array until the failed controller is replaced. The Heartbeat light should be blinking during normal operation. Below is the front view of a controller.

Gigabit Interface Converter (GBIC) and Small Form Factor Pluggable (SFP)
The GBICs and SFPs allow us to connect our host and drive trays to the controller canister through Fibre Channel cables. There are two sizes of Fibre Channel mini-hubs, GBICs, and cables. The LC small form factor connector cable corresponds to 2 Gb/s Fibre Channel, while the larger SC connector cable corresponds to 1 Gb/s Fibre Channel. Below are the 1 Gb/s GBIC and the 2 Gb/s SFP.


FIRMWARE
A type of software on controllers, drives, or any other storage components that contains instructions for their operation. It includes the RAID algorithms and other implemented features, the real-time kernel, the Diagnostics Manager, the firmware to initialize the hardware, and the firmware to upload and initialize the other parts of the downloadable firmware.

NVSRAM
The acronym stands for Non-Volatile Static Random Access Memory: a controller file that specifies default settings for the controller. The file uses either a permanently connected battery or takes advantage of the non-volatile cache to store data indefinitely in the event of a power failure.

VOLUME or Logical Drive
A volume is a region of storage that is provided by the controller and is visible to external I/O hosts for data access. Each volume has a RAID level associated with it. A given volume resides in exactly one volume group.

VOLUME GROUP or Logical Drive Group
A volume group is a collection of volumes whose storage areas reside on the same set of physical drives in the array. A volume group contains one or more volumes and consists of one or more physical drives. A volume group comes into existence when the first volume is created on it.

HOT SPARING
The purpose of the hot spare drives is to serve as "immediate" replacements for failed drives that had been configured as part of a storage volume. When a configured drive fails, the firmware automatically recognizes that the failure has occurred and selects one of the hot spare drives to replace the failed drive; data reconstruction to the selected spare begins immediately once it has been integrated into the volume containing the failed drive. Once reconstruction has completed and the user has replaced the original failed drive, data from the spare is copied back to the original drive's replacement. Once the copy-back operation is complete, the hot spare is returned to the spare pool. All reconstruction and copy-back operations happen without significant interruption to I/O processing for the affected storage volume.
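Below is a rough Python sketch of the hot-spare life cycle just described (a drive fails, data is reconstructed onto a spare, data is copied back to the replacement, and the spare returns to the pool); it models the states only, not any particular controller firmware.

# Sketch of the hot-spare life cycle: fail -> reconstruct -> copy back -> spare pool.

class VolumeGroup:
    def __init__(self, drives, spares):
        self.drives = set(drives)      # drives currently holding volume data
        self.spares = set(spares)      # idle hot-spare pool
        self.in_use_spare = None

    def drive_failed(self, drive):
        """Firmware picks a spare and reconstructs data onto it."""
        self.drives.discard(drive)
        if not self.spares:
            raise RuntimeError("degraded: no hot spare available")
        self.in_use_spare = self.spares.pop()
        self.drives.add(self.in_use_spare)      # reconstruction target
        print(f"{drive} failed, reconstructing onto spare {self.in_use_spare}")

    def failed_drive_replaced(self, new_drive):
        """Copy data back to the replacement, return the spare to the pool."""
        self.drives.discard(self.in_use_spare)
        self.spares.add(self.in_use_spare)
        self.drives.add(new_drive)
        print(f"copy-back complete, {self.in_use_spare} returned to spare pool")
        self.in_use_spare = None

if __name__ == "__main__":
    vg = VolumeGroup(drives={"d1", "d2", "d3"}, spares={"hs1"})
    vg.drive_failed("d2")
    vg.failed_drive_replaced("d2-new")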


Dynamic RAID Migration
The Dynamic RAID Migration feature provides the ability to change the RAID level for the volumes of a drive group. Changing the RAID level causes the volumes in the drive group to be reconfigured such that the data is mapped according to the definition of the new RAID level.

Dynamic Capacity Expansion
The Dynamic Capacity Expansion feature provides the ability to add drives to a drive group. Adding drives to a drive group causes the volumes to be reconfigured such that the data is spread over the drives in the newly expanded drive group. After reconfiguration, all unused capacity is evenly distributed across all drives following the last volume. This unused capacity may be used to create additional volumes on the drive group.

Dynamic Volume Expansion
The Dynamic Volume Expansion feature provides the ability to increase the size of a volume if there is a sufficient amount of free capacity on the drive group. If there is not enough free capacity, DVE can be coupled with Dynamic Capacity Expansion to add additional capacity.

Active-Active Controller Setup Mode
In a traditional Active-Active configuration, both controllers work concurrently to serve host I/O requests and transfer data. In this mode, when both controllers are operating normally, the system is theoretically able to handle twice the workload and traffic, doubling the speed of the system compared to the Active-Passive configuration. In practice, however, the performance increase is much less significant. In the event of a controller failure in a traditional Active-Active configuration, the remaining controller automatically assumes responsibility for handling all I/O requests and data transfer. Once the failed controller is replaced, the controllers will automatically read the configuration of drives and LUNs in the system and return to normal operation.


Active-Passive Controller Setup Mode
Active-Passive is a dual-controller configuration in which two controllers provide full redundancy to all disks, disk enclosures, and Fibre Channel host connections. In an Active-Passive configuration, the primary (active) controller services all host I/O requests and performs all data transfers, while the passive controller remains alert to the active controller's status using bi-directional heartbeat communications. Typically, the available space in the RAID array is divided up into an arbitrary number of logical units (LUNs). The capacity of each LUN can be spread across multiple controller Fibre Channel ports and disk drives. In this configuration, both the active and the passive controller know the logical volume configuration. In the event of a primary controller failure, the passive controller automatically and seamlessly assumes I/O and data transfer activities without interrupting system performance or operation. It is important to note that one of the advantages of Active-Passive is that there is no degradation of performance when one controller fails or is taken offline for maintenance.

SAN Failover Mechanisms
Storage Array or Controller-Side Failover Mechanisms
RAID controllers generally have two different characteristics for access to the LUNs:

1. Active/Active. & 2. Active/Passive


Higher-end and enterprise controllers are always Active/Active. Mid-range and lower-end controllers can be either. How the controller manages internal failover, together with your server-side software and hardware, will have a great deal to do with your choices for accomplishing HBA and switch failover. Before developing a failover or multipathing architecture, you need to fully understand the behavior of the RAID controller. With Active/Active controllers, all LUNs are seen by, and can be written to by, any controller within the RAID. Generally, with these types of RAID controllers, failover is not a problem, since the host can write or read through any path. Basically, all LUN access is equal, and load balancing I/O requests and access to the LUNs in case of switch or HBA failover is simple: all you have to do is write to the LUN from a different HBA path.

Active/Passive Increases Complexity
If your RAID controller is active/passive, the complexity for systems that require HBA failover can increase greatly. With active/passive controllers, the RAID system is generally arranged as a controller pair where both controllers see all the LUNs, but each LUN has a primary path and a secondary path. If the LUN is accessed via the secondary path, ownership of the LUN changes from the primary path to the secondary path. This is not a problem if the controller itself has failed, but it is a problem if only the controller path (the HBA or switch) has failed and other hosts are still accessing that LUN via its primary path. Now each time one of those hosts accesses the LUN on the primary path, the LUN moves back to ownership on the primary path; then, when the LUN is again accessed on the secondary path, it fails over to the secondary path again. This ping-pong effect will eventually cause the performance of the LUN to drop dramatically.

Host-Side Failover Options

On the host side, there are three options for HBA and switch failover and, in some cases, depending on the vendor, load balancing of I/O requests across the HBAs. Here they are in order of hierarchy in the operating system:
1. Volume manager and/or file system failover
2. A failover and/or load-balancing driver
3. HBA driver failover
Each of these has some advantages and disadvantages; what they are depends on your situation and the hardware and software you have in the configuration. In the drawing below, we have an example of a mid-range RAID controller connected in an HA configuration with dual switches and HBAs, and with a dual-port RAID controller, for both Active/Active and Active/Passive.

Fig. Active/Active Controller Setup; Active/Passive Controller Setup


With an Active/Active RAID controller configuration, the failover software knows the path to each of the LUNs and ensures that it will be able to get to the LUN through the appropriate path. With this Active/Active configuration, you could access any of the LUNs via any of the HBAs with no impact on the host or another host, and both controllers can equally access any LUN. If this were an Active/Passive RAID controller, it would be critical to access LUNs 0, 2, and 4 through primary controller A if a switch or HBA failed. You would only want to access LUNs 0, 2, and 4 from controller B if controller A failed. If a port on controller A failed, you would want to access the LUNs via the other switch and port, and not via controller B. If you did access them via controller B while another host accessed the LUNs via controller A, the ownership of the LUNs would ping-pong and the performance would plummet.
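The LUN-ownership rules for Active/Passive controllers can be summarized in a small path-selection sketch: prefer any path to the owning controller, and only move the LUN to the other controller when the owner has no usable path at all, which avoids the ping-pong effect described above. This is a conceptual model, not a specific multipathing driver.

# Sketch: choosing a path to an active/passive LUN without causing ping-pong.
# paths: list of (hba, switch, controller) tuples that are currently usable.

def choose_path(paths, owning_controller):
    """Prefer paths to the owning controller; fail over only when forced."""
    preferred = [p for p in paths if p[2] == owning_controller]
    if preferred:
        return preferred[0], owning_controller          # no ownership change
    if paths:
        new_owner = paths[0][2]                          # LUN moves to this controller
        return paths[0], new_owner
    raise RuntimeError("no path to LUN")

if __name__ == "__main__":
    paths = [("hba0", "switch1", "A"), ("hba1", "switch2", "B")]
    print(choose_path(paths, owning_controller="A"))    # stays on controller A
    paths_after_failure = [("hba1", "switch2", "B")]    # controller A unreachable
    print(choose_path(paths_after_failure, owning_controller="A"))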

Volume Manager and File System Failover Options
Volume managers such as Veritas VxVM and file systems such as ADIC StorNext, along with products from a number of Linux cluster file system vendors, understand and are able to maintain multiple potential paths to a LUN. These types of products are able to determine what the appropriate path to the LUN should be, but oftentimes, for Active/Passive controllers, it is up to the administrator to determine the correct path(s) to access the LUNs without failing over the LUNs to the other controller unnecessarily.

Failover at this layer was the initial type of HBA and storage failover available for Unix systems. Failover at the file system layer allows the file system itself to understand the storage topology and load balance across it. On the other hand, you could be doing a great deal more work in the file system that might belong at lower layers, which have more information about the LUNs and the paths. Volume manager and file system multipathing also support HBA load balancing.

Loadable Drivers

Loadable drivers from vendors such as EMC (PowerPath) and Sun (Traffic Manager) are examples of drivers that manage HBA and switch failover. You need to make sure that the hardware you plan to use with these types of drivers is supported. For example, according to the EMC Web site, EMC PowerPath currently supports only EMC Symmetrix, EMC CLARiiON, Hitachi Data Systems (HDS) Lightning, HP XP (Hitachi OEM), and IBM Enterprise Storage Server (Shark). According to Sun's Web site, Sun Traffic Manager currently supports Sun storage and the Hitachi Data Systems HDS Lightning. Other vendors are developing products that will provide similar functionality. As with the volume manager and file system method for failover, loadable drivers also support HBA load balancing as well as failover.

HBA Driver Failover

HBA drivers on some systems provide the capability for the driver to maintain and understand the various paths to the LUNs. In some cases, this failover works only for Active/Active RAIDs; in other cases, depending on the vendor and the system type, it works for both types of RAIDs (Active/Active and Active/Passive). Since HBA drivers often recognize link failures and link logins faster than the other methods, using this failover mechanism generally allows for the fastest resumption of I/O, since the lowest level has the greatest knowledge of the link state.

SAN Features for Data Backup, Replication and High Availability

Snapshots for Backup


Snapshots can be of two types: a point-in-time image of the actual volume, and a full image of the actual volume.

Point-in-time image of the actual volume: A logical point-in-time image of the actual volume, which can simply be called a snapshot. A snapshot is the logical equivalent of a complete physical copy, but you create it much more quickly than a physical copy and it requires less disk space. A repository volume is an additional volume associated with the snapshot, and it saves overwritten blocks from the actual or base volume. The repository volume contains the original image of any modified data, along with metadata describing the location where it is stored in the repository. The repository volume is not accessible to external hosts. A point-in-time image (snapshot) can be created using a "copy-on-write" scheme.

Note: Exactly one repository volume is created per snapshot. Increasing the capacity of the base volume does not change existing snapshots. When a base volume is deleted, all associated snapshot volumes and their repositories are also deleted. When a snapshot is deleted, its associated repository volume is also deleted. The snapshot feature allows the end user to quickly create a single point-in-time image or "snapshot" of a volume. The primary benefit of this feature is data backup. Online backup images can be created periodically during the course of the day without disrupting normal operations.
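The "copy-on-write" scheme mentioned above can be illustrated in a few lines: before a block of the base volume is overwritten, its original contents are saved once into the repository, so the snapshot can always reconstruct the point-in-time image. This is a toy model of the idea only.

# Toy model of a copy-on-write snapshot: the repository stores the original
# contents of any base-volume block that is modified after the snapshot.

class SnapshotVolume:
    def __init__(self, base_blocks):
        self.base = list(base_blocks)   # the live base volume
        self.repository = {}            # block index -> original data

    def write_base(self, index, data):
        """Host write to the base volume; preserve the old block first."""
        if index not in self.repository:          # copy on first write only
            self.repository[index] = self.base[index]
        self.base[index] = data

    def read_snapshot(self, index):
        """Point-in-time view: repository copy if present, else base block."""
        return self.repository.get(index, self.base[index])

if __name__ == "__main__":
    vol = SnapshotVolume(["A0", "B0", "C0"])
    vol.write_base(1, "B1")                          # overwrite after the snapshot
    print(vol.base)                                  # ['A0', 'B1', 'C0']
    print([vol.read_snapshot(i) for i in range(3)])  # ['A0', 'B0', 'C0']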


Full image of the actual volume (cloning):
The Volume Full Copy or Clone feature is used to copy data from one volume (the source) to another volume (the target) on a single storage array. This feature can be used to back up data, to copy data from volume groups that use smaller-capacity drives to volume groups using greater-capacity drives, or to restore snapshot volume data to the associated base volume.

When you create a volume copy, a copy pair is created, which consists of a source volume and a target volume located on the same storage array. The source volume is the volume that accepts host I/O and stores data. The source volume can be a standard volume, a snapshot volume, or the base volume of a snapshot volume.

When a volume copy is started, data from the source volume is copied in its entirety to the target volume. The source volume is available for read I/O activity only while a volume copy has a status of In Progress, Pending, or Failed. After the volume copy is completed, the source volume becomes available to host applications for write requests.

A target volume contains a copy of the data from the source volume. The target volume can be a standard volume or the base volume of a failed or disabled snapshot volume. While the volume copy has a status of In Progress, Pending, or Failed, read and write requests to the target volume are rejected by the controllers. After the volume copy is completed, the target volume automatically becomes read-only to hosts, and write requests to the target volume are rejected. The Read-Only attribute can be changed after the volume copy has completed or has been stopped.

Additionally, volume copy can be used to redistribute data, moving volumes from older, slower disk drives to newer, faster, or higher-capacity drives to optimize application performance and/or capacity utilization.

Remote Volume Mirroring:
Remote Volume Mirroring (RVM) allows for protection against, and recovery from, disasters or catastrophic failures of systems or data centers. When a disaster occurs at one site, the secondary (or backup) site takes over responsibility for computer services. RVM maintains a fully synchronized image of key data at the secondary site so that no data is lost and minimal interruption of overall computing services occurs if a disaster or failure happens. It is a controller-level, firmware-based mechanism for ensuring fully synchronized data replication between the primary and secondary sites. A mirroring relationship is composed of exactly two volumes, each residing on a separate array. One volume acts in the primary role, servicing host I/O; the other acts as the backup secondary volume. Replication is managed on a per-volume basis. This allows the storage administrator to associate a distinct remote mirror volume with any/every primary volume of a given storage array. A given array's primary volumes can be mirrored to secondary volumes that reside on multiple distinct remote storage arrays. The following figure shows one possible configuration with a primary and backup data center.

Mirroring relationships are established at the volume level between two storage arrays. Our terminology refers to the primary volume as the one receiving host I/O; the secondary volume is the standby mirrored image of the primary. The array controllers manage synchronization activities, both in the initial image synchronization from primary to secondary and in replicating host write data. One host channel of each array is dedicated to inter-array data movement. The dedicated host ports of the two arrays must be connected via a Fibre Channel fabric with a name service. The name service function allows the two arrays to locate each other in the fabric network and perform the required login initialization.

2. SAN Protection and Security
When the word security is used in association with a SAN, thoughts can easily lead to computer hackers infiltrating the network and causing havoc. Although hacker invasions are a concern, there is another security issue associated with a SAN that must be addressed: the issue of technology containment. For example, Windows NT servers will naturally claim every available Logical Unit Number (LUN) visible to them. In brief, technology containment keeps servers from gaining unauthorized or accidental access to undesignated areas within the SAN. The two major areas of concern with SAN implementations are data access and fabric management security.

Security at Different Stages
Open systems offer many different file systems, volume and disk management formats, and software, requiring that security issues be considered and then implemented during the SAN design and development phase, for the following reasons:
A. Data access and security
B. Fabric management and security (protection from outside threats)
C. Higher levels of availability to data and the applications that use the data

2.A. Data access and security
2.A.I. Questions concerning data access and security
Concerning data access and security on a SAN, consider the following questions:
1. How can we segregate operating systems at the port level on the SAN fabric?
It is not advisable to have Windows NT and Sun Solaris systems accessing the same RAID-array port on the SAN fabric, because Windows NT will attempt to write disk signatures to all new disk LUNs it finds attached to the SAN fabric. This creates the need for a network fabric-enforced way of segregating ports into logical groups of visibility.
2. How can we segregate different application types on the SAN fabric?
For example, it may be necessary to ensure that finance systems on the SAN fabric cannot access the data owned by engineering systems or web systems. This creates the need for a fabric-enforced way of grouping ports on the SAN fabric into zones of visibility based on application, function, or departmental rules.
3. How can we isolate any single LUN on an array, permitting only certain host(s) access to that LUN and no others?


A basic advantage of a SAN is that a large number of hosts can share expensive storage resources. For RAID storage subsystems, this demands that multiple hosts have access to the disk storage LUNs through a single shared port on the array. Therefore, it is necessary to employ security methods to ensure that the LUNs behind a port are accessible only by the intended hosts. Without special software and architectures to manage multi-host block-level read/write access (when multiple systems access the same LUN concurrently), data corruption or data loss could occur.
4. How can we, from the host side, ensure that hosts see their storage ports and storage LUNs consistently when adding new storage LUNs, and after each reboot?
In the world of SANs, the assignment of Small Computer System Interface (SCSI) target IDs is moved from the storage side to the host / Fibre Channel (FC) Host Bus Adapter (HBA) side. Thus, SCSI target IDs can be dynamically reassigned as new storage LUNs are added to an individual host via the SAN. Since this feature is a fundamental advantage of SAN architectures, the assignment of SCSI target IDs requires management to ensure their consistency across storage subsystems, SAN fabrics, and host configuration changes.

2.A.II. Data access and security methodologies
The following are data access and security methodologies:
Fabric zoning is fabric-centric enforcement: It provides a fabric port- and host/storage-level point of logical partitioning and can help ensure that different OS types or applications are partitioned on the SAN. Fabric zoning is managed and enforced on the SAN fabric. Fabric zoning cannot mask individual LUNs that sit behind a port: all hosts connected to the same port will see all the LUNs addressed through that port.

LUN Masking is RAID storage subsystem-centric enforcement: LUN masking is configured at the RAID storage subsystem level; it helps ensure that only the designated hosts assigned to a given storage port can access the specified RAID LUN. LUN masking is a RAID system-centric enforced method of masking multiple LUNs behind a single port. LUN masking configuration occurs at the RAID-array level, using the World Wide Port Names (WWNs) of the server FC HBAs. See Figure 4. LUN masking allows disk storage resources to be shared across multiple independent servers. With LUN masking, a single large RAID subsystem can be subdivided to serve a number of different hosts that attach to it through the SAN fabric. Each LUN (disk slice, portion, unit) inside the RAID subsystem can be limited so that only one server, or a limited number of servers, can see that LUN. LUN masking can occur either at the server FC HBA or at the RAID subsystem (behind the RAID port). It is more secure to mask LUNs at the RAID subsystem, but not all RAID subsystems have LUN masking capability; therefore, some FC HBA vendors allow persistent binding at the driver level to mask LUNs.
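LUN masking amounts to a lookup table enforced at the array port: given the WWN of the requesting host HBA, the port exposes only the LUNs assigned to that WWN. A minimal sketch with made-up WWNs follows.

# Sketch of LUN masking at the array port: filter LUNs by initiator WWN.
# The masking table and WWNs below are illustrative only.

MASKING_TABLE = {
    "10:00:00:00:c9:aa:bb:01": {0, 1},     # finance server sees LUNs 0 and 1
    "10:00:00:00:c9:aa:bb:02": {2},        # engineering server sees LUN 2 only
}

def visible_luns(initiator_wwn, all_luns):
    """Return only the LUNs this initiator is allowed to see."""
    allowed = MASKING_TABLE.get(initiator_wwn, set())
    return sorted(lun for lun in all_luns if lun in allowed)

if __name__ == "__main__":
    array_luns = [0, 1, 2, 3]
    print(visible_luns("10:00:00:00:c9:aa:bb:01", array_luns))   # [0, 1]
    print(visible_luns("10:00:00:00:c9:aa:bb:99", array_luns))   # [] unknown host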

Figure 4: LUN Masking

Persistent Binding is host-centric enforcement: It consistently forces a host to see a specific storage-subsystem port as a particular SCSI target. Persistent binding also helps ensure that a specific storage-subsystem port on the SAN is always seen as the same SCSI target ID on the host, across the host and fabric, and throughout storage configuration changes. The OS and upper-level applications (such as LAN-free backup software) typically require a static or predictable SCSI target ID for storage reliability purposes. Persistent binding is a host-centric enforced way of directing an operating system to assign certain SCSI target IDs and LUNs. For example, a specific host will always assign SCSI ID 3 to the first router it finds, and LUNs 0, 1, and 2 behind that port to the three tape drives attached to the router, as shown in Figure 5. Operating systems and upper-level applications (such as backup software) typically require a static or predictable SCSI target ID for their storage reliability; persistent binding makes that possible.
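Persistent binding is essentially a fixed mapping kept on the host from a storage port's WWPN to a SCSI target ID, so the ID survives reboots and fabric changes. The sketch below shows that idea with a hypothetical mapping file; real HBA drivers keep this in their own configuration.

# Sketch of host-side persistent binding: WWPN -> fixed SCSI target ID.
# The binding file and the example WWPN are hypothetical.
import json

BINDINGS_FILE = "persistent_bindings.json"

def load_bindings():
    try:
        with open(BINDINGS_FILE) as f:
            return json.load(f)
    except FileNotFoundError:
        return {}

def bind_target(wwpn, bindings):
    """Return the bound SCSI target ID, assigning a stable one if the WWPN is new."""
    if wwpn not in bindings:
        used = set(bindings.values())
        bindings[wwpn] = next(i for i in range(256) if i not in used)
        with open(BINDINGS_FILE, "w") as f:
            json.dump(bindings, f)
    return bindings[wwpn]

if __name__ == "__main__":
    bindings = load_bindings()
    # The same storage port always comes back as the same target ID across reboots.
    print(bind_target("50:06:0b:00:00:aa:bb:cc", bindings))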

Figure 5: Persistent Binding

LUN Mapping, in addition to persistent binding, is another host-centric method of storage visibility management. LUN mapping selectively allows a system administrator to scan for specified SCSI targets and LUNs at storage-driver boot time and to selectively ignore non-specified SCSI targets and LUNs. The advantage of LUN mapping is that it provides a level of security management in SANs where LUN masking is not an option, perhaps because it is not supported on the storage hardware. The disadvantage is that LUN mapping is configured and enabled on a host-by-host basis. It requires good coordination among the administrators of the systems sharing the storage, to ensure that only one host sees certain storage unless otherwise planned, as in a clustered server configuration.

2.B. Fabric management and security (protection from outside threats)
2.B.I. Questions concerning SAN fabric-level security
Concerning SAN fabric-level security, consider the following questions:
1. How can we manage switch-to-switch security on the SAN fabric, and how can we enforce security policies that prohibit unauthorized switches or hosts from attaching to the SAN fabric?


In early SAN infrastructures, additional switches (configured with a default password and login) could easily attach to an existing operating SAN fabric, and that new non-secure switch could be used as a single point of configuration administration for the entire SAN fabric. There is a need for technologies that enforce access control at the fabric-level, and ensure only authorized and authenticated switches can be added to the fabric. 2. How can we centrally manage security and configuration changes on a SAN fabric? In the initial phases of SAN evolution and even today, large SAN fabrics are frequently composed of many 8- or 16-port FC switch-building blocks. Each switch features both inband and out-of-band management components (Simple Network Management Protocol (SNMP), telnet, etc.), and a switch-centric security control model. As large SANs evolve, so does the need for technologies to centrally control security, in regards to SAN data access and fabric management; also, to minimize the number of administrative access and security control points on the SAN fabric. 3. How can we ensure that only authorized hosts connect to the SAN fabric and to a specific port designated by an administrator? Initially, in SAN configurations, a host FC HBA could attach to any point in a SAN fabric and if the FC HBA was capable of basic SAN fabric login, that FC HBA became a participating member of the SAN fabric. There is a need for technologies that allow a fabric-centric method of access control for determining which hosts can attach to a specific port or switch on the SAN fabric. This would prevent a rogue attacker with a Windows NT system and a FC HBA from attaching to a non-secure SAN fabric for the purpose of configuration changes, or data access. 4. How can we ensure that the tools used to manage the SAN fabric, and SAN management requests are coming from an authorized source? Multiple in-band and out-of-band methods are used to manage SAN fabric configurations. A tunnel of communication must exist between SAN management consoles and frameworks, and the targeted SAN fabric being managed. That tunnel of communication must be secure and confirmed as authentic to prevent an attacker from using a management tool to access the nonsecure SAN fabric. 5. How can we ensure that configuration changes on the SAN fabric are valid when there are multiple points of configuration management? In early SAN configurations, multiple administrators could log into different switches on the same SAN fabric and perform fabric-configuration changes concurrently. After enabling


After enabling and propagating those configuration changes fabric-wide, corruption could occur due to configuration conflicts. Corruption of the SAN fabric usually occurs when configuration changes are made through multiple points on the fabric. Technologies are needed that ensure fabric configuration changes occur only through a central, secure point on the fabric, and that those changes do not introduce configuration conflicts.

2.B.II. Fabric Management and Security Technologies

The following technologies protect and manage the fabric:

Fabric-to-Fabric Security technologies use Access Control Lists (ACLs) to allow or deny the addition of new switches to the fabric. Public Key Infrastructure (PKI) technology may be applied as a mechanism for validating the identity of a new switch. Fabric-wide security databases also help ensure that every authorized switch added to the fabric inherits the fabric-wide security policies, so that a new out-of-the-box switch does not become a non-secured access point.

Host-to-Fabric Security technologies can apply ACLs at the port level on the fabric, allowing or denying a particular host's FC HBA to attach to that port. This prevents an unauthorized intruder host from attaching to the fabric via any port; the host's ability to log into the fabric is explicitly defined and granted under this model.

Management-to-Fabric technologies can use PKI and other cryptographic mechanisms (such as MD5-based authentication) to ensure a trusted and secure management-console-to-fabric communication layer. This helps ensure that the management console or framework used to control the SAN fabric is valid and authorized.

Configuration Integrity technologies ensure that propagated fabric configuration changes come from only one location at a time and are correctly propagated to all switches on the fabric. Distributed lock managers can ensure that only serialized, valid configuration changes are enabled on the fabric. A sketch of a simple port-level ACL check appears below.
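As an illustration of the host-to-fabric ACL idea, here is a minimal Python sketch that binds switch ports to the World Wide Port Names (WWPNs) allowed to log in through them; the WWPNs and port numbers are made up, and real switches implement this in firmware with vendor-specific commands.

```python
# Hypothetical port-level ACL: each fabric port lists the HBA WWPNs that are
# permitted to perform a fabric login (FLOGI) through it.

PORT_ACL = {
    1: {"10:00:00:00:c9:aa:bb:01"},   # port 1 - database server HBA
    2: {"10:00:00:00:c9:aa:bb:02"},   # port 2 - backup server HBA
    3: set(),                          # port 3 - unused, nothing may log in
}

def flogi_allowed(port: int, wwpn: str) -> bool:
    """Return True if the HBA identified by wwpn may log in on this port."""
    return wwpn in PORT_ACL.get(port, set())

if __name__ == "__main__":
    print(flogi_allowed(1, "10:00:00:00:c9:aa:bb:01"))  # True - authorized host
    print(flogi_allowed(3, "10:00:00:00:c9:ff:ff:99"))  # False - rogue HBA rejected
```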


3.A - Backup Solutions

Data backup methods

There are three effective methods used to back up data:

A. Distributed
B. Centralized (conventional)
C. SAN

3.A.1. Backups in distributed environments

In distributed environments, storage subsystems are directly attached to servers (see Figure 2). Distributed backups require IT personnel to physically touch each system (for example, to handle tapes) to perform backup operations. If the server data exceeds the tape capacity, which is usually the case, the operator must monitor the operation and load new tapes at the proper time. Distributed environments are fragmented in the following circumstances:

When storage is isolated on individual servers (storage islands)
When there are only point-to-point SCSI connections
When there is a one-to-one relationship between servers and storage subsystems, creating storage islands that scale poorly and are difficult to manage centrally

Figure 2: Distributed Backup Environment

3.A.2. Backups in conventional centralized environments

In conventional centralized environments, a storage subsystem is attached to one server, and all other systems are backed up to that storage subsystem through the server and over the Local Area Network (LAN). See Figure 3. Conventional centralized backups limit management overhead to a single storage subsystem. The challenge is not managing the storage subsystem, but getting the data to it. Conventional centralized backup solutions rely on an Internet Protocol (IP) network as the data path. The problem with this is that the Transmission Control Protocol/Internet Protocol (TCP/IP) processing associated with transporting the sheer volume of data can consume a great deal of server CPU time. This results in long backup cycles that exceed the scheduled backup window. Therefore,


conventional centralized backups often overflow into user uptime, resulting in poor network response and generally unacceptable server performance. This method is an improvement over the distributed method, but it still has inefficiencies:

Pros:
Centralizes the storage in fewer locations and on fewer platforms
Requires fewer backup servers and software packages
Uses centralized administration
Results in fewer human errors

Cons:
Backup bottlenecks develop on the LAN
Bottlenecks become more frequent as storage needs grow
Multiple separate backup servers must still be managed
The same LAN is typically used for production traffic and data backups
A many-to-one relationship exists between servers and the storage subsystem

Figure 3: Conventional Centralized Backup Environment

3.A.3. Backups in SAN environments

In SAN environments, storage subsystems are attached to the SAN fabric, where all servers potentially have equal access to them. See Figure 4. SANs offer the following efficiencies and advantages over conventional centralized and distributed backup methods (a rough timing comparison follows the list):

The entire storage-network infrastructure can be off-loaded from the LAN, promoting LAN-free backups (20% or more of LAN traffic can be due to backups)
Significant improvements in backup times, since data is moved at Fibre Channel (FC) speeds over a dedicated storage network rather than at Ethernet speeds over a shared network


Fewer network interruptions when adding incremental storage hardware
Reduced or eliminated backup windows
On-the-fly, non-disruptive scaling rather than pre-planned downtime windows
Extended server life expectancy
Off-host backups, where data transfers directly from storage disks to tape libraries, bypassing the server and reducing server load
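To put the speed claim in perspective, a back-of-envelope Python calculation is shown below; the 1 TB data set and the assumption of roughly 80% sustained link utilization are illustrative figures, not measurements.

```python
# Rough comparison of raw transfer times for a 1 TB backup over different links.
# Assumes ~80% sustained utilization and ignores tape speed, protocol overhead,
# and compression, so treat the results as order-of-magnitude estimates only.

DATA_TB = 1.0
DATA_BITS = DATA_TB * 8 * 10**12      # one terabyte expressed in bits (decimal units)
UTILIZATION = 0.8

LINKS_GBPS = {
    "100 Mb/s Ethernet": 0.1,
    "1 Gb/s Fibre Channel": 1.0,
    "2 Gb/s Fibre Channel": 2.0,
    "4 Gb/s Fibre Channel": 4.0,
}

for name, gbps in LINKS_GBPS.items():
    seconds = DATA_BITS / (gbps * 10**9 * UTILIZATION)
    print(f"{name:>22}: {seconds / 3600:5.1f} hours")
```

Even this crude estimate shows why moving backup traffic from a shared 100 Mb/s LAN onto a dedicated multi-gigabit FC path can turn an overnight job into a sub-hour one.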

Figure 4: Storage Area Network

One of the most valuable time- and cost-saving features of SAN architecture is its ability to offload backup operations from LANs and servers. This capability can significantly increase the LAN bandwidth available to network clients and end users during backup operations. When traditional backup servers are relieved from "handling" backup data, they can be repurposed and made available for other tasks.


Figure: Traditional Tape Drive Backup

SAN (LAN-free) backup

SAN technology provides an alternative path for data movement between the Storage Manager client and the server. Shared storage resources (such as disk and tape) are accessible to both the client and the server through the SAN. Data is off-loaded from the LAN and from the server processor, which allows greater scalability. LAN-free backups decrease the load on the LAN by introducing a storage agent. The storage agent can be thought of as a small Storage Manager server (without a database or recovery log) that is installed and run on the Storage Manager client machine. The storage agent handles the communication with the Storage Manager server over the LAN but sends the data directly to SAN-attached tape devices, relieving the Storage Manager server of the actual I/O transfer. A LAN-free backup environment is shown in the figure below; a conceptual sketch of the split between control traffic and data traffic also follows.
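The essential idea is that metadata goes over the LAN while bulk data goes over the SAN. The Python sketch below illustrates that split in the abstract; the class and function names are invented for illustration and do not correspond to any real Storage Manager API.

```python
# Conceptual model of a LAN-free backup: the storage agent on the client
# exchanges only metadata with the backup server over the LAN, while the
# bulk data is written straight to a SAN-attached tape drive.

class LanLink:
    def send_metadata(self, record: dict) -> None:
        # In reality: small control messages to the Storage Manager server.
        print(f"LAN -> server metadata: {record}")

class SanTapeDrive:
    def write(self, data: bytes) -> None:
        # In reality: block I/O over Fibre Channel to a shared tape device.
        print(f"SAN -> tape: {len(data)} bytes")

def lan_free_backup(files: dict, lan: LanLink, tape: SanTapeDrive) -> None:
    for name, payload in files.items():
        tape.write(payload)                                       # heavy traffic stays on the SAN
        lan.send_metadata({"file": name, "bytes": len(payload)})  # only bookkeeping crosses the LAN

if __name__ == "__main__":
    sample = {"db_dump.dat": b"x" * 4096, "mailbox.edb": b"y" * 8192}
    lan_free_backup(sample, LanLink(), SanTapeDrive())
```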


LAN-free backup solutions can optimize backup operations by offloading backup traffic from the LAN to a SAN, thereby increasing the amount of LAN bandwidth available.

SAN (Server-less) backup

Server-less backup, on the other hand, extends these performance gains even further by offloading more than 90 percent of the administrative burden that is usually placed upon a dedicated backup server as backups are performed. This is typically achieved by embedding some of the backup intelligence into the data storage devices themselves (RAID systems and tape drives) or into SAN connectivity peripherals (switches, hubs, or bridges). This can free up traditional backup servers significantly by releasing them from data-moving duties and from large portions of a backup operation's administration. When implemented properly, these SAN-based backup solutions let administrators optimize network and server utilization, dramatically shorten backup times, and regain processor and network resources.

Server-free backups are made possible by a SAN's flexible architecture and can improve overall performance significantly. Even storage reliability can be greatly enhanced by special features made possible within a SAN. Options like redundant I/O paths, server clustering, and run-time data replication (local and/or remote) can ensure data and application availability. Adding storage capacity and other storage resources can be accomplished easily within a SAN, often without the need to shut down or even quiesce the server(s) or their client networks. These and other features can quickly add up to big cost savings, painless expansion, reduced network loading, and fewer network outages. A minimal sketch of the server-free data-mover idea follows the figure.

Figure: Server-Free Backup
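In practice, server-free backup is often built on the SCSI Extended Copy (third-party copy) command, in which a data mover inside the fabric copies blocks from disk to tape without the data passing through any server. The Python sketch below only models that control flow; the class and method names are invented, and no real SCSI command set is being exercised.

```python
# Conceptual model of server-free backup: the backup server merely instructs a
# data mover (e.g., an intelligent switch, bridge, or tape library) which block
# extents to copy; the blocks themselves travel disk -> tape inside the SAN.

from dataclasses import dataclass

@dataclass
class Extent:
    lun: int
    start_block: int
    block_count: int

class DataMover:
    """Stand-in for a fabric device that implements third-party copy."""
    def copy_extents(self, source: Extent, destination: str) -> None:
        print(f"Data mover: copying {source.block_count} blocks "
              f"from LUN {source.lun} (block {source.start_block}) to {destination}")

def server_free_backup(extents: list, mover: DataMover, tape: str) -> None:
    # The only work the backup server does is issue these copy directives
    # and record what was copied in its catalog.
    for extent in extents:
        mover.copy_extents(extent, tape)

if __name__ == "__main__":
    extents = [Extent(lun=0, start_block=0, block_count=2048),
               Extent(lun=0, start_block=2048, block_count=2048)]
    server_free_backup(extents, DataMover(), tape="tape_drive_3")
```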


SAN data backup and access benefits

SANs promote the following benefits:

Improved data availability and performance
The number of connections to storage subsystems can be easily scaled for both availability and performance
Access to data is faster, easier, and more reliable

3.B - Disaster Recovery

Planning backup and restoration of files for disaster recovery

Planning the backup and restoration of files is the most important step in protecting data from accidental loss, whether through data deletion or a hard disk failure. The backup copy can be used to restore lost or damaged data. For taking backups and restoring files, Microsoft provides a utility called Backup. The Backup utility copies data from a computer's hard disk and archives it on another storage medium. Any storage medium, such as removable disks, tapes, or logical drives, can be used as backup storage. While backing up files, the Backup utility creates a volume shadow copy of the data to produce an accurate copy of the contents, including any open files or files that are in use by the system. Users can continue to access the system while the Backup utility is running, without the risk of losing data.

Volume Shadow Copy


The Backup utility can back up files that are open or in use by a user or by the system. This feature is known as volume shadow copy. Volume shadow copy takes a point-in-time copy of the volume at the start of the backup process, so files that change during the backup are still copied correctly from that consistent view. Because of this feature, applications can continue writing data to the volume during a backup operation, and backups can be scheduled at any time without locking out users. A conceptual sketch of the snapshot-then-copy idea is shown below.
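As a rough analogy (not the real VSS API), the Python sketch below freezes a point-in-time view of a set of files and then copies from that frozen view, so later writes do not affect the backup.

```python
# Toy illustration of the snapshot-then-copy principle behind volume shadow copy:
# the backup reads from a frozen point-in-time view, so writes that happen while
# the backup is running never make the copied data inconsistent.

import copy

class Volume:
    def __init__(self, files: dict):
        self.files = files

    def snapshot(self) -> dict:
        # Real VSS uses copy-on-write at the block level; a deep copy stands in here.
        return copy.deepcopy(self.files)

def backup_from_snapshot(snap: dict) -> dict:
    return {name: data for name, data in snap.items()}

if __name__ == "__main__":
    vol = Volume({"ledger.db": "v1", "mail.pst": "v1"})
    snap = vol.snapshot()               # point-in-time view taken at backup start
    vol.files["ledger.db"] = "v2"       # application keeps writing during the backup
    archive = backup_from_snapshot(snap)
    print(archive["ledger.db"])         # "v1" - backup reflects the snapshot, not the later write
```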

Types of Backups

The Windows Backup utility provides several types of backups. When planning a backup strategy, it is important to choose an appropriate type, or a combination of types. The backup type determines which files are transferred to the destination media. Each backup type works with the archive (A) attribute maintained for every file. The archive attribute is set when a file is created or changed; while it is set, it indicates that the file has not been backed up since it last changed.

Note: When a file is said to be "marked as backed up", it means that the archive attribute of the file has been cleared.

Normal Backups

When an administrator chooses a normal backup, all selected files and folders are backed up and the archive attribute of each file is cleared. A normal backup does not use the archive attribute to determine which files to back up. A normal backup is the first step of any backup plan and is combined with other backup types when planning an organization's backup strategy. Normal backups are the most time-consuming and resource-intensive, but restoration from a normal backup is more efficient than from other backup types.


Incremental Backups

An incremental backup backs up files that have been created or changed since the last normal or incremental backup; that is, it backs up files whose archive attribute is set, and it clears the archive attribute afterward. An incremental backup is the fastest backup process. Restoring data from incremental backups requires the last normal backup plus all subsequent incremental backups, restored in the same order in which they were created.

Note: If any media in the incremental backup set is damaged or its data becomes corrupt, the data backed up after that point cannot be restored.

Differential Backups

A differential backup backs up files that have been created or changed since the last normal backup. It does not clear the archive attribute after the backup. Restoring files from a differential backup is more efficient than restoring from incremental backups, because only the last normal backup and the last differential backup are needed.

Copy Backups

A copy backup copies all selected files and folders. It neither uses nor clears the archive attribute, so it is generally not part of a planned, scheduled backup routine.

Daily Backups

A daily backup backs up all selected files and folders that have changed during the day, using the files' modified dates. It neither uses nor clears the archive attribute.

The sketch below summarizes how each backup type uses the archive attribute.
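The following Python sketch is a simplified model of how the five backup types decide which files to copy and whether they clear the archive attribute; real backup software tracks considerably more state, and the file set shown is hypothetical.

```python
# Simplified model of Windows backup types and the archive (A) attribute.
# Each file is represented by its archive bit and the date it last changed.

from datetime import date

files = {
    "report.doc": {"archive": True,  "modified": date(2024, 1, 10)},
    "budget.xls": {"archive": False, "modified": date(2024, 1, 5)},   # already backed up
    "notes.txt":  {"archive": True,  "modified": date(2024, 1, 11)},
}

def normal(fs):        # backs up everything and clears the archive bit
    selected = list(fs)
    for meta in fs.values():
        meta["archive"] = False
    return selected

def incremental(fs):   # archive bit set -> back up, then clear the bit
    selected = [n for n, m in fs.items() if m["archive"]]
    for n in selected:
        fs[n]["archive"] = False
    return selected

def differential(fs):  # archive bit set -> back up, but leave the bit alone
    return [n for n, m in fs.items() if m["archive"]]

def copy_backup(fs):   # everything; archive bit neither used nor cleared
    return list(fs)

def daily(fs, today):  # files modified today; archive bit neither used nor cleared
    return [n for n, m in fs.items() if m["modified"] == today]

if __name__ == "__main__":
    print("differential:", differential(files))
    print("incremental: ", incremental(files))       # clears bits as a side effect
    print("incremental again:", incremental(files))  # nothing left to back up
```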


Combining backup types

The simplest backup plan is to take a normal backup every night, which ensures that data can be restored from a single job the next day. Although restoring data from a normal backup is easy, taking the backup is time-consuming, so an administrator needs to design an optimal backup plan. An administrator must consider the following points before creating a backup plan:

The time involved in taking the backup
The size of the backup job
The time required to restore a system in the event of a system failure

The most common solutions for the needs of different organizations combine normal, differential, and incremental backups.

Combination of Normal and Differential Backups

An administrator can combine normal and differential backups to save time both when taking backups and when restoring data. In this plan, a normal backup is taken on Sunday and differential backups are taken every night from Monday through Friday. If data becomes corrupt at any time, only the normal backup and the most recent differential backup need to be restored. Although this combination is simpler and restores faster, the differential backups take progressively more time if data changes frequently.

Combination of Normal and Incremental Backups

A combination of normal and incremental backups can be used to shorten backup times further. In this plan, a normal backup is taken on Sunday and incremental backups every night from Monday through Friday. If data becomes corrupt at any time, the normal backup and all incremental backups taken since it are required for the restore. The sketch below contrasts the restore chains of the two plans.
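To illustrate the difference in restore effort, here is a small Python sketch that lists which backup sets must be restored after a failure on a given weekday under each plan; the weekly Sunday-normal schedule mirrors the plans described above.

```python
# Which backup sets must be restored after a failure, for a weekly schedule of
# a normal backup on Sunday plus nightly differential or incremental backups.

WEEK = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri"]

def restore_chain(plan: str, failure_day: str) -> list:
    # Nightly backups taken before the failure (the failure is assumed to occur
    # during the day, before that night's backup runs).
    nights = WEEK[1:WEEK.index(failure_day)]
    if plan == "normal+differential":
        # Only the Sunday normal and the most recent differential are needed.
        return ["Sun normal"] + ([f"{nights[-1]} differential"] if nights else [])
    if plan == "normal+incremental":
        # The Sunday normal plus every incremental, in the order they were taken.
        return ["Sun normal"] + [f"{day} incremental" for day in nights]
    raise ValueError("unknown plan")

if __name__ == "__main__":
    print(restore_chain("normal+differential", "Thu"))
    # ['Sun normal', 'Wed differential']
    print(restore_chain("normal+incremental", "Thu"))
    # ['Sun normal', 'Mon incremental', 'Tue incremental', 'Wed incremental']
```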


Backing up System State data

System State data contains critical elements of the Windows 2000 and Windows Server 2003 operating systems. The following components are included in the System State data:

Boot files, including the system files and all files protected by Windows File Protection (WFP)
Active Directory (on domain controllers only)
SYSVOL (on domain controllers only)
Certificate Services (on certification authorities only)
Cluster database (on cluster nodes only)
Registry
IIS metabase
Performance counter configuration information
Component Services Class registration database

To back up the System State of a computer, include the System State node as part of the backup selection in the Backup utility; a hedged command-line example follows.

Note: On domain controllers, the System State can be restored only by restarting the domain controller in Directory Services Restore Mode. NTDSUTIL is used to recover deleted objects in Active Directory.
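As a sketch of how this could be scripted on Windows Server 2003, where the Backup utility also ships a command-line front end (ntbackup), the fragment below shells out to ntbackup from Python; the exact switches and the output path are assumptions to verify against the ntbackup documentation on your system.

```python
# Hypothetical scripted System State backup on Windows Server 2003 using the
# ntbackup command line. Switch names and the destination path are assumptions;
# check "ntbackup /?" on the target system before relying on them.

import subprocess

BACKUP_FILE = r"D:\Backups\systemstate.bkf"   # assumed destination .bkf file

def backup_system_state() -> int:
    cmd = ["ntbackup", "backup", "systemstate",
           "/J", "Nightly System State",       # job name (assumed switch)
           "/F", BACKUP_FILE]                  # backup-to-file destination (assumed switch)
    result = subprocess.run(cmd)
    return result.returncode

if __name__ == "__main__":
    raise SystemExit(backup_system_state())
```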

System Recovery

In the event of a system failure, recovering the system is difficult and tedious for administrators. Recovery involves reinstalling the operating system, mounting and cataloging the backup tape, and then performing a full restore. To make this process easier, Windows provides a feature called Automated System Recovery (ASR). ASR restores the System State data and services after a major system failure; an ASR restore includes the configuration information for devices. ASR backs up the system data and the local system partition.

How to create an ASR set?

Take the following steps to create an Automated System Recovery (ASR) set by using the Backup or Restore Wizard:

1. Run Backup from Start Menu > Programs > Accessories > System Tools > Backup.
2. On the welcome screen of the Backup or Restore Wizard, click the Advanced Mode link.
3. On the welcome page of the Advanced Mode of the Backup utility, choose the ASR Wizard option from the Tools menu.
4. On the welcome screen of the ASR Wizard, click the Next button.
5. On the Backup Destination page, specify the location of the backup, and click the Next button.
6. Click the Finish button.


Note: An ASR backup does not include data folders and files.

Best practices for Backup

According to Microsoft, administrators should take the following steps to ensure recovery in case of a system failure:

Develop backup and restore strategies, and test them.
Train appropriate personnel.
In a high-security network, ensure that only administrators are able to restore files.
Back up all data on the system and boot volumes, along with the System State.
Back up the data on all volumes and the System State data at the same time.
Create an Automated System Recovery backup set.
Create a backup log.
Keep at least three copies of the media.
Keep at least one copy off-site in a properly controlled environment.
Perform trial restorations.
Secure devices and media.
Do not disable the default volume shadow copy backup method and revert to the pre-Windows Server 2003 backup method.
Back up your server cluster effectively.
Back up the cluster disks from each node.

3.C - Data Replication

Data replication provides many benefits in today's IT environments. For example, it allows system administrators to create and manage multiple copies of vital information across a global enterprise. This enables disaster recovery solutions, maximizes business continuity, and permits file server content to be distributed over the Internet. Replication options can even improve host processing efficiency by moving data sets onto secondary (often remote) servers for backup operations. In some cases, these data replication capabilities are required by the "high availability" and "server clustering" features provided by many of today's SAN architectures. Remote data replication is typically achieved with one of two basic strategies:

Storage replication is focused on the bulk transfer of files, or block data, from one server to one or more other servers. This type of replication generally allows application(s) to be


running on a server while they, and/or their data, are being replicated to another off-site server.

Application-level replication is specific to a particular application, such as a database or web server, and is typically performed at the transaction level (field, row, table, etc.) by the application itself.

Many replication products can transfer data synchronously or asynchronously. With synchronous transfers, each packet of transmitted data is acknowledged by the receiving server before more data is sent. This is a slower form of replication, but it is very reliable. Asynchronous transfers allow data packets to be sent ahead of the acknowledgements for previously sent packets. This method is usually faster, but more data can be lost if a link fails. In either case, all transmitted packets must eventually be acknowledged by the receiving system. The sketch below contrasts the two acknowledgement models.
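The following Python sketch contrasts the two acknowledgement models in the abstract; the link behaviour is simulated, and the write counts and failure point are arbitrary illustrations rather than properties of any real replication product.

```python
# Toy model of synchronous vs. asynchronous replication. The primary sends
# writes to a remote replica; the difference is when it waits for the ack.

def replicate(writes, mode, link_fails_after=None):
    """Return (acknowledged_writes, writes_still_unacknowledged) for the given mode."""
    acknowledged, in_flight = [], []
    for i, w in enumerate(writes):
        if link_fails_after is not None and i >= link_fails_after:
            break                                  # link goes down; nothing more is sent
        if mode == "synchronous":
            acknowledged.append(w)                 # ack received before the next write is sent
        else:  # asynchronous
            in_flight.append(w)                    # send ahead of acks
            if len(in_flight) >= 3:                # acks trickle back in batches
                acknowledged.extend(in_flight)
                in_flight.clear()
    # Writes still awaiting acknowledgement when the link fails may be lost.
    return acknowledged, in_flight

if __name__ == "__main__":
    writes = [f"write-{n}" for n in range(8)]
    print(replicate(writes, "synchronous", link_fails_after=5))
    # -> 5 writes safely acknowledged, 0 unacknowledged
    print(replicate(writes, "asynchronous", link_fails_after=5))
    # -> 3 writes acknowledged, 2 still unacknowledged (potentially lost)
```

The synchronous mode never has unacknowledged data outstanding, which is exactly why it is slower: the sender stalls on every round trip to the remote site.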

