
Copyright 2006 EMC Corporation. Do not Copy - All Rights Reserved.

Section 5 - Monitoring and Managing the Data Center

Introduction

2006 EMC Corporation. All rights reserved.

Welcome to Section 5 of Storage Technology Foundations: Monitoring and Managing the Data Center. Copyright 2006 EMC Corporation. All rights reserved. These materials may not be copied without EMC's written consent. EMC believes the information in this publication is accurate as of its publication date. The information is subject to change without notice. THE INFORMATION IN THIS PUBLICATION IS PROVIDED "AS IS". EMC CORPORATION MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND WITH RESPECT TO THE INFORMATION IN THIS PUBLICATION, AND SPECIFICALLY DISCLAIMS IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Use, copying, and distribution of any EMC software described in this publication requires an applicable software license. EMC2, EMC, Navisphere, CLARiiON, and Symmetrix are registered trademarks and EMC Enterprise Storage, The Enterprise Storage Company, The EMC Effect, Connectrix, EDM, SDMS, SRDF, TimeFinder, PowerPath, InfoMover, FarPoint, EMC Enterprise Storage Network, EMC Enterprise Storage Specialist, EMC Storage Logix, Universal Data Tone, E-Infostructure, Access Logix, Celerra, SnapView, and MirrorView are trademarks of EMC Corporation. All other trademarks used herein are the property of their respective owners.

Monitoring and Managing the Data Center - 1


Section Objectives
Upon completion of this section, you will be able to:
- Describe areas of the data center to monitor
- Discuss considerations for monitoring the data center
- Describe techniques for managing the data center


The objectives for this section are shown here. Please take a moment to read them.


In This Section
This section contains the following modules:
- Monitoring in the Data Center
- Managing in the Data Center


This section contains two modules: Monitoring in the Data Center and Managing in the Data Center.


Monitoring in the Data Center


After completing this module, you will be able to:
- Discuss data center areas to monitor
- List metrics to monitor for different data center components
- Describe the benefits of continuous monitoring
- Describe the challenges in implementing a unified and centralized monitoring solution in heterogeneous environments
- Describe industry standards for data center monitoring


In this module, you will learn about different aspects of monitoring data center components, including the benefits of proactive monitoring and the challenges of managing a heterogeneous environment (hardware and software from multiple vendors).


Monitoring Data Center Components


[Slide diagram: clients connect over an IP network to clustered hosts/servers running applications (with a keep-alive heartbeat between cluster nodes); the hosts connect through HBA ports to the SAN and storage arrays. Each component is monitored for health, capacity, performance, and security.]

The Business Continuity Overview module discussed the importance of resolving all single points of failure when designing data centers. Having designed a resilient data center, the next step is to ensure that all of its components are functioning properly and are available on a 24x7 basis. The way to achieve this is by monitoring the data center on a continual basis. System monitoring is essential to ensure that the underlying IT infrastructure and business-critical applications are operational and optimized. The main objective is to ensure that the various hosts, network systems, and storage are running smoothly, and to know how loaded each system and component is and how effectively it is being utilized. The major components within the data center that should be monitored include:
- Servers, databases, and applications
- Networks: SAN and IP networks (switches, routers, bridges)
- Storage arrays
Each of these components should be monitored for health, capacity, performance, and security.


Why Monitor Data Centers


Availability
- Continuous monitoring ensures availability
- Warnings and errors are fixed proactively

Scalability
Monitoring allows for capacity planning and trend analysis, which in turn helps to scale the data center as the business grows

Alerting
- Administrators can be informed of failures and potential failures
- Corrective action can be taken to ensure availability and scalability


Continuous monitoring of the health, capacity, performance, and security of all data center components is critical to ensure data availability and scalability. For example, information about component failures can be sent to appropriate personnel for corrective action. Ongoing trends show that the data storage environment continues to grow at a rapid pace. According to the International Data Corporation (IDC), external storage-system capacity will grow at a compound annual growth rate (CAGR) of approximately 50% through 2007. This represents a doubling of the current capacity every two years or so. Automated monitoring and alerting solutions are therefore becoming increasingly important. Monitoring the data center closely and effectively optimizes data center operations and avoids downtime.
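The two-year doubling claim follows directly from the CAGR arithmetic; this small sketch (the function name is illustrative, not from the course) checks it:

```python
# Years for capacity to double at a given compound annual growth rate (CAGR).
import math

def years_to_double(cagr: float) -> float:
    """Solve (1 + cagr) ** t = 2 for t."""
    return math.log(2) / math.log(1 + cagr)

t = years_to_double(0.50)
print(f"At 50% CAGR, capacity doubles every {t:.2f} years")  # ~1.71 years
```

At 50% growth the strict answer is about 1.7 years, consistent with the narration's "every 2 years or so".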


Monitoring Health
Why monitor health of different components
Failure of any hardware/software component can lead to outage of a number of different components
Example: A failed HBA could cause degraded access to a number of data devices in a multi-path environment or to loss of data access in a single path environment

Monitoring health is fundamental and is easily understood and interpreted


- At the very least, health metrics should be monitored
- Typically, health issues need to be addressed as a high priority


Health deals with the status/availability of a particular hardware component or a software process (e.g., the status of a SAN device or port, whether a database instance is up or down, HBA status, or a disk drive failure). If a component has failed, it could lead to downtime unless redundancy exists. Monitoring the health of data center components is very important and is easy to understand and interpret: a component is either available or it has failed. Monitoring for capacity, performance, and security depends on the health and availability of the different components.


Monitoring Capacity
Why monitor capacity
- Lack of proper capacity planning can lead to data unavailability and an inability to scale
- Trend reports can be created from the collected capacity data
- The enterprise is well informed of how IT resources are utilized

Capacity monitoring helps prevent outages before they occur


More preventive and predictive in nature than health metrics
- Based on reports, one knows that a file system is 90% full and is filling up at a particular rate
- If 95% of the ports in a particular SAN fabric are in use, a new switch should be added before more arrays/servers are attached to that fabric


From a monitoring perspective, capacity deals with the amount of resources available. Examples:
- Free/used space on a file system or a database tablespace
- Space left in a RAID group
- Disk space available on storage arrays
- File system or mailbox quota allocated to users
- Available ports on a switch (e.g., 52 of 64 ports in use, leaving 12 free for expansion)


Monitoring Performance
Why monitor Performance metrics
- We want all data center components to work efficiently and optimally
- See whether components are pushing their performance limits or are being underutilized
- Performance metrics can be used to identify performance bottlenecks

Performance Monitoring/Analysis can be extremely complicated


- Dozens of interrelated metrics, depending on the component in question
- The most complicated of the various aspects of monitoring


Performance monitoring measures the efficiency of operation of different data center components. Examples:
- Number of I/Os through a front-end port of a storage array
- Number of I/Os to disks in a storage array
- Response time of an application
- Bandwidth utilization
- Server CPU utilization


Monitoring Security
Why monitor security
Prevent and track unauthorized access
Accidental or malicious

Enforcing security and monitoring for security breaches is a top priority for all businesses


Security monitoring prevents and tracks unauthorized access. Examples include:
- Login failures
- Unauthorized storage array configuration/reconfiguration
- Physical access (via badge readers, biometric scans, video cameras, etc.)
- Unauthorized zoning and LUN masking in SAN environments, or changes to existing zones


Monitoring Servers
Health
Hardware components
HBA, NIC, graphic card, internal disk

Status of various processes/applications

Capacity
- File system utilization
- Database tablespace/log space utilization
- User quota


Any failure of a hardware component, such as an HBA or NIC, should be immediately detected and rectified. As seen earlier, component redundancy can prevent a total outage. Mission-critical applications running on the servers should also be monitored continuously. A database might spawn a number of processes that are required to ensure operations; failure of any of these processes can cause non-availability of the database. Databases and applications usually have mechanisms to detect such errors and report them. Capacity monitoring on a server involves monitoring file system space utilization. By continuously monitoring file system free space, administrators can estimate the growth rate of the file system and effectively predict when it will become 100% full. Corrective action, such as extending the file system, can then be taken well ahead of time to avoid a file-system-full condition. In many environments, system administrators enforce space utilization quotas on users. For example, a user cannot exceed 10 GB of space, or a particular file cannot be greater than 100 MB.
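The growth-rate prediction described above can be sketched as a simple linear extrapolation. This is a hypothetical helper, not a tool from the course:

```python
# Estimate when a file system will fill up from periodic usage samples.
# A minimal sketch: compute average daily growth and extrapolate to 100% full.

def days_until_full(samples: list[tuple[int, float]], capacity_pct: float = 100.0) -> float:
    """samples: (day_number, percent_used) pairs, oldest first."""
    (d0, u0), (d1, u1) = samples[0], samples[-1]
    rate = (u1 - u0) / (d1 - d0)          # percent used per day
    if rate <= 0:
        return float("inf")               # not growing
    return (capacity_pct - u1) / rate

# e.g. 70% used on day 0, 82% used on day 6 -> 2% per day -> 9 days left
print(days_until_full([(0, 70.0), (6, 82.0)]))  # 9.0
```

A real tool would fit more than two samples and account for bursty growth, but the principle is the same: project the trend forward and act before it crosses 100%.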


Monitoring Servers
Performance
CPU utilization Memory utilization Transaction response times

Security
- Login
- Authorization
- Physical security (data center access)


Two key server performance metrics are CPU and memory utilization. A continuously high value (above 80%) for CPU utilization is an indication that the server is running out of processing power. During periods of high CPU utilization, applications running on the server, and consequently end users of those applications, will experience slower response times. Corrective action could include upgrading or adding processors, shifting some applications to different servers, or restricting the number of simultaneous client accesses. Databases, applications, and file systems use server physical memory (RAM) to stage data for manipulation. When sufficient memory is not available, data has to be paged in and out of disk; this also results in slower response times. Login failures and attempts by unauthorized users to execute code or launch applications should be closely monitored to ensure secure operations.
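The 80% CPU threshold mentioned above could be checked over a sampling window like this. The function name and sample data are illustrative:

```python
# Flag sustained high CPU utilization: every sample in the recent window
# must exceed the threshold before we call the server overloaded.
def cpu_overloaded(samples: list[float], threshold: float = 80.0) -> bool:
    """samples: recent CPU utilization percentages, oldest first."""
    return bool(samples) and all(s > threshold for s in samples)

print(cpu_overloaded([85.2, 91.0, 88.7]))  # True
print(cpu_overloaded([85.2, 42.0, 88.7]))  # False
```

Requiring the whole window to exceed the threshold (rather than a single spike) avoids alerting on momentary bursts.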


Monitoring the SAN


Health
Fabrics
Fabric errors, zoning errors

Ports
Failed GBIC, status/attribute change

Devices
Status/attribute Change

Hardware Components
Processor cards, fans, power supplies

Capacity
- ISL utilization
- Aggregate switch utilization
- Port utilization

Uninterrupted access to data over the SAN depends on the health of its physical and logical components. GBICs, power supplies, and fans in switches, along with the cables, are the physical components; any failure in these must be immediately reported. Constructs such as zones and fabrics are the logical components. Errors in zoning, such as specifying the wrong WWN of a port, will result in failure to access that port; these have to be monitored, reported, and rectified as well. By way of capacity, the number of ports on each switch that are currently used/free should be monitored. This will aid in planning expansion, whether by adding more servers or by connecting to more storage array ports. Utilization metrics at the switch and port level, along with utilization of Inter-Switch Links (ISLs), are also part of SAN capacity measurements. These can be viewed as performance metrics as well.


Monitoring the SAN


Performance
Connectivity ports
- Link failures
- Loss of signal
- Loss of synchronization
- Link utilization
- Bandwidth (MB/s or frames/s)

Connectivity devices
Statistics are usually a cumulative value of all the port statistics


A number of SAN performance/statistical metrics can be used to determine or predict hardware failure (health). For example, an increasing number of link failures may indicate that a port is about to fail. The following metrics describe these failures:
- Link failures: the number of link failures occurring on a connectivity device port. A high number of failures could indicate a hardware problem (bad port, bad cable).
- Loss of signal: the number of loss-of-signal events occurring on a connectivity device port. A high number indicates a possible hardware failure.
- Loss of synchronization: the number of loss-of-synchronization events occurring on a connectivity device port. High counts may indicate hardware failure.
Connectivity device port performance can be measured with the receive or transmit link utilization metrics. These calculated values give a good indication of how busy a switch port is, based on the assumed maximum throughput. Heavily used ports can cause queuing delays on the host.
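Link utilization as described above, bytes moved per interval against an assumed maximum throughput, can be sketched as follows (the 400 MB/s figure is a hypothetical port maximum, not a value from the course):

```python
# Link utilization from two cumulative byte-counter readings, expressed as a
# percentage of the port's assumed maximum throughput.
def link_utilization(bytes_start: int, bytes_end: int,
                     interval_s: float, max_mb_per_s: float) -> float:
    mb_per_s = (bytes_end - bytes_start) / interval_s / 1_000_000
    return 100.0 * mb_per_s / max_mb_per_s

# 2 GB transferred over 10 s on a hypothetical 400 MB/s port
print(round(link_utilization(0, 2_000_000_000, 10.0, 400.0), 1))  # 50.0
```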


Monitoring the SAN


Security
Zoning
Ensure communication between dedicated sets of ports (HBA and Storage Ports)

LUN Masking
Ensure that only certain hosts have access to certain storage array volumes

Administrative Tasks
- Restrict administrative tasks to a select set of users
- Enforce strict passwords

Physical Security
Access to Data Center should be monitored


SAN Security includes monitoring the fabrics for any zoning changes. Any errors in the zone set information can lead to data inaccessibility. Unauthorized zones can compromise data security. User login/authentication to switches should be monitored to audit administrative changes. Ensure that only authorized users are allowed to perform LUN masking tasks. Any such tasks performed should be audited for proper authorization.


Monitoring Storage Arrays


Health
All hardware components
- Front end
- Back end
- Memory
- Disks
- Power supplies

Array Operating Environment


- RAID processes
- Environmental sensors
- Replication processes


Storage arrays typically have redundant components so that they can continue to function when individual components fail, although the array's performance might be affected during such failures. Failed components should be replaced quickly to restore optimal performance. Some arrays can send a message to the vendor's support center in the event of a hardware failure; this feature is typically known as call-home. It is equally important to monitor the various processes of the storage array operating environment. For example, failure of replication tasks will compromise disaster recovery capabilities.


Monitoring Storage Arrays


Capacity
- Configured/unconfigured capacity
- Allocated/unallocated storage
- Fan-in/fan-out ratios

Performance
- Front-end utilization/throughput
- Back-end utilization/throughput
- I/O profile
- Response time
- Cache metrics


Physical disks in a storage array are partitioned into LUNs for use by hosts:
- Configured capacity is the amount of space that has been partitioned into LUNs
- Unconfigured capacity is the remaining space on the physical disks
- Allocated storage refers to LUNs that have been masked for use by specific hosts/servers
- Unallocated storage refers to LUNs that have been configured but not yet masked for host use
Monitoring storage array capacity enables you to predict and react to storage needs as they occur. Fan-in/fan-out ratios and the availability of unused front-end ports (ports to which no host has yet been connected) are useful when new hosts/servers have to be given access to the storage array. Performance: numerous performance/statistical metrics can be monitored for storage arrays. Some of the key metrics are the utilization rates of the various components that make up the array. Extremely high utilization rates can lead to performance degradation.
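The four capacity categories defined above can be computed from a LUN inventory. This is an illustrative sketch, not an actual array API:

```python
# Storage array capacity accounting following the definitions above:
# configured = space carved into LUNs; allocated = LUNs masked to hosts.
def capacity_report(total_gb: float, luns: list[dict]) -> dict:
    configured = sum(l["size_gb"] for l in luns)
    allocated = sum(l["size_gb"] for l in luns if l["masked"])
    return {
        "configured_gb": configured,
        "unconfigured_gb": total_gb - configured,
        "allocated_gb": allocated,
        "unallocated_gb": configured - allocated,
    }

luns = [{"size_gb": 100, "masked": True},
        {"size_gb": 200, "masked": False}]
print(capacity_report(1000, luns))
# {'configured_gb': 300, 'unconfigured_gb': 700, 'allocated_gb': 100, 'unallocated_gb': 200}
```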


Monitoring Storage Arrays


Security
LUN Access
- Ensure that only certain hosts have access to certain storage array volumes
- Disallow WWN spoofing

Administrative tasks
Most arrays allow the restriction of various array configuration tasks
- Device configuration
- LUN masking
- Replication operations
- Port configuration

Physical Security
Monitor access to data center


World Wide Name (WWN) spoofing is a security concern. For example, an unauthorized host can be configured with an HBA that has the same WWN as an authorized host. If this host is then connected to the storage array via the same SAN, zoning and LUN masking restrictions will be bypassed. Storage arrays have mechanisms in place that can prevent such security breaches. Auditing of array device configuration tasks, as well as replication operations, is important to ensure that only authorized personnel perform them.


Monitoring IP Networks
Health
Hardware Components
Processor cards, fans, Power Supplies, ...

Cables

Performance
- Bandwidth
- Latency
- Packet loss
- Errors
- Collisions

Security

Network performance is vital in a storage environment. Monitor network latency, packet loss, availability, traffic, and bandwidth utilization, including:
- I/O (bandwidth usage)
- Errors
- Collisions


Monitoring the Data Center as a Whole


Monitor data center environment
- Temperature, humidity, airflow, hazards (water, smoke, etc.)
- Voltage/power supply

Physical security
Facility access (Monitoring cameras, access cards, etc.)


Monitoring the environment of a data center is just as crucial as monitoring the individual components. Most electrical/electronic equipment is extremely sensitive to heat, humidity, and voltage fluctuations. Data center layout and design have to account for correct levels of ventilation, accurate control of temperature and humidity, uninterrupted power supplies, and correction of voltage fluctuations. Any changes to the environment should be monitored and reported immediately. Physical security monitoring is equally straightforward: facility access should be tracked, for example via monitoring cameras and access cards.


End-to-End Monitoring
[Slide diagram: the same client/IP network/cluster/SAN/storage array topology as the earlier components slide, annotated to show that a single failure produces multiple symptoms, requiring root cause analysis and assessment of business impact.]

A good end-to-end monitoring system should be able to quickly analyze the impact that a single failure can cause. The monitoring system should be able to deduce that a set of seemingly unrelated symptoms is the result of a single root cause. It should also be able to alert on the business impact arising from different component failures.
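Root cause analysis of this kind can be modeled as a walk over a component dependency graph: given one failed component, find everything downstream of it. The topology and component names below are hypothetical:

```python
# Impact analysis: given one failed component, find every dependent
# component affected, by walking the dependency graph.
def impacted(dependents: dict[str, list[str]], failed: str) -> set[str]:
    seen, stack = set(), [failed]
    while stack:
        node = stack.pop()
        for dep in dependents.get(node, []):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen

# Hypothetical topology: an array port feeds a switch path used by three hosts
dependents = {
    "array_port_A": ["SW1_path"],
    "SW1_path": ["H1", "H2", "H3"],
    "H1": ["payroll_app"],
}
print(sorted(impacted(dependents, "array_port_A")))
# ['H1', 'H2', 'H3', 'SW1_path', 'payroll_app']
```

Inverting this map (from symptom back to candidate causes) is the root-cause direction; the same graph supports both queries.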


Monitoring Health: Array Port Failure


[Slide diagram: hosts H1, H2, and H3, each with two HBAs, connect through switches SW1 and SW2 to shared storage array ports; after an array port failure, all three hosts run with degraded access.]

Here is an example of the importance of end-to-end monitoring. In this example, three servers (H1, H2, and H3) have two HBAs each and are connected to the storage array via two switches (SW1 and SW2). The three servers share the same storage ports on the storage array. If one of those storage array ports fails, it will have the following effects on the whole data center:
- Since all servers share the ports, all the storage volumes that were accessed via SW1 will be unavailable.
- The servers will experience path failures. Redundancy enables them to continue operations via SW2.
- The applications will experience reduced (degraded) performance, because the number of available paths to the storage devices has been cut in half.
- If the applications belong to different business units, all of them would be affected even though only a single port has failed.
This example illustrates the importance of monitoring the health of storage arrays. By constantly monitoring the array, you can detect the fault as soon as it happens and fix it right away, minimizing the time that applications have to run in a degraded mode.


Monitoring Health: HBA failure


[Slide diagram: the same topology; host H1 runs degraded after one of its two HBAs fails, while H2 and H3 are unaffected.]

The scenario presented here is the same as the previous one (three servers, H1, H2, and H3, have two HBAs each and are connected to the storage array via two switches, SW1 and SW2; the three servers share the same storage ports on the storage array). In this example, if a single HBA fails, the server with the failed HBA will experience path failures to the storage devices it had access to, and application performance on that server will be affected.


Monitoring Health: Switch Failure

[Slide diagram: all hosts connect to the storage array through switches SW1 and SW2; when SW1 fails entirely, every host runs degraded.]

In this example, a number of servers (with two HBAs each) are connected to the storage array via two switches (SW1 and SW2). Each server has independent paths to the storage array, one via SW1 and one via SW2. What happens if switch SW1 fails completely? All the hosts that were accessing storage volumes via SW1 will experience path failures, and all applications on those servers will run in a degraded mode. Notice that the failure of a single component (a switch in this case) has a ripple effect on many data center components.


Monitoring Capacity: Array


[Slide diagram: existing servers connect to the storage array via switches SW1 and SW2; a new server is to be added. Can the array provide the required storage to the new server?]

This example illustrates the importance of monitoring the capacity of arrays. A number of servers (with two HBAs each) are connected to the storage array via two switches (SW1 and SW2), and each has been allocated storage on the array. A new server now has to be deployed, and an application on it must be given access to storage devices from the array via switches SW1 and SW2. Monitoring the amount of configured and unconfigured space on the array is critical for deciding whether this is possible. Proactive monitoring will help from the initial planning stages through final deployment.


Monitoring Capacity: Servers File System Space


[Slide diagram: without monitoring, a file system fills silently; with monitoring, a warning fires at 66% full and a critical alert at 80% full, prompting the administrator to extend the file system in time.]

This example illustrates the importance of monitoring capacity on servers. On the left is an application server which is writing to a file system without monitoring the file system capacity. Once the file system is full, the application will no longer be able to function. On the right is a similar setup. An application server is writing to a file system. In this case, the file system is monitored. A warning is issued at 66%, then a critical message at 80%. We can take action and extend the file system before the file system full condition is reached. Proactively monitoring the file system can prevent application outages caused by lack of file system space.


Monitoring Performance: Array Port Utilization


[Slide diagram: hosts H1-H3 share storage array ports via switches SW1 and SW2; a line graph of port utilization (%) plots the combined H1+H2+H3 load against the 100% limit, with a green line (headroom available) and a red line (near saturation) for the case of adding new server H4.]

This example illustrates the importance of monitoring performance metrics on storage arrays. Three servers (H1, H2, and H3) have two HBAs each and are connected to the storage array via two switches (SW1 and SW2). The three servers share the same storage ports on the storage array. A new server, H4, has to be deployed and must share the same storage ports as H1, H2, and H3. To ensure that the new server does not adversely affect the performance of the others, it is important to monitor array port utilization. In this example, the utilization of the shared ports is shown using the green and red lines in the line graph. If the actual utilization prior to deploying the new server is at the green line, there is room to add the new server; otherwise, the deployment will impact the performance of all servers.
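The headroom question, whether combined utilization stays below 100% after adding H4, reduces to a simple check (the numbers below are illustrative, standing in for the slide's green-line and red-line cases):

```python
# Headroom check before sharing array ports with a new server: will the
# combined port utilization stay under 100%?
def can_add_server(current_util_pct: float, new_server_util_pct: float) -> bool:
    return current_util_pct + new_server_util_pct < 100.0

print(can_add_server(60.0, 25.0))  # True  (green-line case: room to add H4)
print(can_add_server(85.0, 25.0))  # False (red-line case: port would saturate)
```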


Monitoring Performance: Servers


Critical: CPU Usage above 90% for the last 90 minutes


Most servers have tools that allow you to interactively monitor CPU usage. For example, Windows Task Manager displays CPU and memory usage (as shown above). Interactive tools are fine if only a few servers are being managed. In a data center with potentially hundreds of servers, the tool must be capable of monitoring many servers simultaneously, and should send a warning to the system administrator whenever CPU utilization exceeds a specified threshold.


Monitoring Security: Servers



Critical: Three successive login failures for username Bandit on server H4, possible security threat


Login failures could be accidental (mistyping) or could be the result of a deliberate attempt to break into a system. Most servers will allow two successive login failures and will not allow any further attempts after a third successive failure. In most environments, this information may simply be logged in a system log file. Ideally, you should monitor for such security events: in a monitored environment, when there are three successive login failures, a message can be sent to the system administrator to warn of a possible security threat.
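The three-successive-failures rule can be sketched over an ordered event log. The events and usernames below are hypothetical (echoing the slide's "Bandit" example):

```python
# Raise a security alert after three successive login failures for a user.
# A success for that user resets the streak; other users' events are ignored.
def check_logins(events: list[tuple[str, bool]], user: str, limit: int = 3) -> bool:
    """events: (username, succeeded) pairs in time order."""
    streak = 0
    for name, ok in events:
        if name != user:
            continue
        streak = 0 if ok else streak + 1
        if streak >= limit:
            return True
    return False

events = [("Bandit", False), ("Bandit", False), ("alice", True), ("Bandit", False)]
print(check_logins(events, "Bandit"))  # True
```

Note the counter only resets on a success by the same user, so failures interleaved with other users' logins still count as successive.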


Monitoring Security: Array Local Replication

[Slide diagram: a storage array shared by two workgroups, WG1 and WG2, each with its own array ports; a WG1 user's replication command against WG2 devices is denied, raising: "Warning: Attempted replication of WG2 devices by WG1 user - Access denied."]

This example illustrates the importance of monitoring security breaches in a storage array. A storage array is a shared resource; in this example, it is shared between two workgroups. The data of WG1 should not be accessible by WG2, and likewise the data of WG2 should not be accessible by WG1. A user from WG1 may try to make a local replica of data that belongs to WG2. Typically, mechanisms will be in place to prevent such an action. However, if the action is not monitored or recorded in some fashion, the administrator will be unaware that someone is trying to violate security protocols. If the action is monitored, a warning message can be sent to the storage administrator.


Monitoring: Alerting of Events


Warnings require administrative attention
  File systems becoming full
  Soft media errors

Errors require immediate administrative attention
  Power failures
  Disk failures
  Memory failures
  Switch failures


Monitoring and Managing the Data Center - 31

Monitoring systems allow administrators to assign different severity levels to different conditions in the data center. Health-related alerts are usually classified as Critical or Fatal, meaning that a failure in a component has immediate adverse consequences. Other alerts can be arranged in a spectrum from Information to Fatal. Generically:
Information: useful information requiring no administrator intervention, e.g. an authorized user has logged in
Warning: administrative attention is required, but the situation is not critical. For example, a file system has reached the 75% full mark; the administrator has time to decide what action should be taken
Fatal: immediate attention is required, because the condition will affect system performance or availability. If a disk fails, for example, the administrator must ensure that it is replaced quickly
The sources of monitoring messages may include hardware components, such as servers and storage systems, and software components, such as applications. Continuous monitoring, in combination with automated alerting, enables administrators to:
Reactively respond to failures quickly
Proactively avert failures by looking at trends in utilization and performance
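The Information/Warning/Fatal spectrum for a single metric can be sketched as a threshold classifier. The 75% warning mark comes from the text; the 90% critical mark is an assumed value for illustration only.

```python
# Sketch: classify a file-system-utilization reading into an alert severity.
# Thresholds are configurable; 75% (warning) is from the text, 90% (critical)
# is an assumption for the example.

def fs_severity(percent_full, warn_at=75, critical_at=90):
    """Map file system utilization to a severity on the alert spectrum."""
    if percent_full >= critical_at:
        return "Critical"
    if percent_full >= warn_at:
        return "Warning"
    return "Information"
```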


Monitoring: Challenges
EMC Hitachi

Storage Arrays
CAS HP IBM SUN NAS DAS SAN TLU

NetApp

Cisco

Servers
UNIX WIN MF

Network
McData SAN IP Brocade

Databases Oracle Informix

Applications MS SQL


Monitoring and Managing the Data Center - 32

The core elements of the data center are the storage arrays, networks, servers, databases, and applications. Storage arrays could be NAS, CAS, DAS, SAN-attached, or Tape/Disk Library Units. The network consists of the SAN and the IP network. Servers could be Open Systems (UNIX or Windows) or Mainframe. Numerous vendors supply these data center components, and the challenge is to monitor and manage each of them. Typically, each vendor provides monitoring/management tools for its own components. As a consequence, in order to successfully monitor and manage a data center, administrators must learn multiple tools and terminologies. In an environment where multiple tools are in use, it is almost impossible to get a complete picture of what is going on from a single page. Most data center components are inter-related (e.g. a SUN host is connected to an EMC storage array via a Cisco SAN). Ideally, the monitoring tool should correlate the information from all objects in one place, so that you can make an informed decision based on any of the metrics that are monitored.


Monitoring: Ideal Solution


One UI

Monitoring/Management Engine Storage Arrays

Storage Arrays
CAS NAS DAS SAN TLU

Network Servers, Databases, Applications

Servers
UNIX WIN MF

Network

SAN Databases

IP

Applications
Monitoring and Managing the Data Center - 33

The ideal solution for monitoring all data center objects from all vendors would be a monitoring/management engine that gathers information on all the objects and manages them via a single user interface. The engine should also perform root cause analysis and indicate how individual component failures affect various business units. Such a solution should provide:
A single interface to monitor all objects in the data center
Root cause analysis: multiple symptoms may be triggered by a single root cause
Insight into how individual component failures affect various business units
A mechanism to inform administrators of events via e-mail, page, SNMP traps, etc.
The ability to generate reports
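The "how do individual component failures affect various business units" idea can be sketched as a walk up a dependency graph: a failure is propagated to everything that depends, directly or indirectly, on the failed element. All component and application names below are hypothetical.

```python
# Sketch: trace the business impact of a component failure through a
# dependency graph. Keys depend on the elements in their value lists.

DEPENDS_ON = {
    "Payroll app":    ["Server H4"],
    "Billing app":    ["Server H7"],
    "Server H4":      ["SAN switch SW1"],
    "Server H7":      ["SAN switch SW1"],
    "SAN switch SW1": ["Array port FA-1"],
}

def affected_by(failed, graph=DEPENDS_ON):
    """Return every element that directly or indirectly depends on `failed`."""
    impacted = set()
    changed = True
    while changed:               # propagate until no new element is impacted
        changed = False
        for element, deps in graph.items():
            if element not in impacted and (failed in deps or impacted & set(deps)):
                impacted.add(element)
                changed = True
    return impacted
```

A single root cause (the switch) thus explains multiple symptoms (both applications going down), which is exactly the correlation an ideal engine should perform.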


Without Standards
No common access layer between managed objects and applications (vendor specific)
No common data model
No interconnect independence
Multi-layer management difficulty
Legacy systems cannot be accommodated
No multi-vendor automated discovery
Policy-based management is not possible across entire classes of devices
Host Management Storage Management Database Management Network Management Applications Management

Interoperability!


Monitoring and Managing the Data Center - 34

SAN administrators have often been faced with the dilemma of integrating multi-vendor hardware and software under a single management umbrella. It is relatively easy for administrators to monitor individual switches, but monitoring a set of switches together and correlating their data is a more complex challenge. Users and administrators want the flexibility to select the most suitable products for a particular application or set of applications and then easily integrate those products into their computing environments. Traditionally this has not been possible, for the reasons listed above. Without standards, policy-based management is not possible across entire classes of devices, which poses a big dilemma for diverse environments.


Simple Network Management Protocol (SNMP)


SNMP
Meant for network management
Inadequate for complete SAN management

Limitations of SNMP:
No Common Object Model
Security: only newer SAN devices support v3
Positive response mechanism
Inflexible: no auto-discovery functions
No ACID (Atomicity, Consistency, Isolation, and Durability) properties
Richness of canonical intrinsic methods
Weak modeling constructs

Until recently, the Simple Network Management Protocol (SNMP) has been the protocol of choice, used quite effectively to manage multi-vendor SAN environments. However, SNMP, being primarily a network management protocol, is inadequate when it comes to providing detailed treatment of the fine-grained elements in a SAN. Some of the limitations of SNMP are shown here. While SNMP still retains a predominant role in SAN management, newer and emerging standards may change this.

Monitoring and Managing the Data Center - 35


Storage Management Initiative (SMI)


Created by the Storage Networking Industry Association (SNIA)
Integration of diverse multi-vendor storage networks
Development of more powerful management applications
Common interface for vendors to develop products that incorporate the management interface technology
Key components:
  Inter-operability testing
  Education and collaboration
  Industry and customer promotion
  Promotions and demonstrations
  Technology center
  SMI specification
  Storage industry architects and developers
Tape Library
MOF

Management Application Integration Infrastructure


Object Model Mapping Vendor Unique Features

SMI-S Interface

Platform Independent Distributed Automated Discovery Security Locking Object Oriented

CIM/WBEM Technology

Switch
MOF

Array
MOF

Many Other
MOF

Standard Object Model per Device Vendor Unique Function


Monitoring and Managing the Data Center - 36

The Storage Networking Industry Association (SNIA) has been engaged in an initiative to develop a common, open storage and SAN management interface based on the Distributed Management Task Force's (DMTF) Common Information Model. This initiative is known as the Storage Management Initiative (SMI). One of its core objectives is to create a standard, adopted by all storage and SAN vendors (hardware and software alike), that brings about true interoperability and allows administrators to manage multi-vendor and diverse storage networks using a single console or interface.

The Storage Management Initiative Specification (SMI-S) offers substantial benefits to users and vendors. With SMI-S, developers have one complete, unified, and rigidly specified object model, and can turn to one document to understand how to manage the breadth of SAN components. Management application vendors are relieved of the tedious task of integrating incompatible management interfaces, letting them focus on building management engines that reduce cost and extend functionality. Device vendors are empowered to build new features and functions into subsystems.

SMI-S-compliant products will lead to easier, faster deployment and accelerated adoption of policy-based storage management frameworks. A test suite developed by the SNIA will certify compliance of hardware components and management applications with the specification. Certified components will also be subjected to rigorous interoperability testing in an SMI laboratory.

Storage Management Initiative Specification (SMI-S)


Based on:
Web Based Enterprise Management (WBEM) architecture Common Information Model (CIM)
Graphical User
Storage Resource Management Performance Capacity Planning Removable Media

Management
Management Tools Container Management Volume Management Media Management Other

Users
Data Management File System Database Manager Backup and HSM

Storage Management Interface Specification


Managed Objects Physical Components Removable Media Tape Drive Disk Drive Robot Enclosure Host Bus Adapter Switch Logical Components Volume Clone Snapshot Media Set Zone Other

Features:
A common interoperable and extensible management transport
A complete, unified and rigidly specified object model that provides for the control of a SAN
An automated discovery system
New approaches to the application of the CIM/WBEM technology


Monitoring and Managing the Data Center - 37

SMI-S forms a layer that resides between managed objects and management applications. The following features of SMI-S provide the key to simplifying SAN management:
Common data model: SMI-S is based on Web Based Enterprise Management (WBEM) technology and the Common Information Model (CIM). SMI-S agents interrogate a device, such as a switch, host, or storage array, extract the relevant management data from CIM-enabled devices, and provide it to the requester.
Interconnect independence: SMI-S eliminates the need to redesign the management transport and lets components be managed using in-band or out-of-band communications, or a mix of the two. SMI-S offers further advantages by specifying the CIM-XML over HTTP protocol stack and utilizing the lower layers of the TCP/IP stack, both of which are ubiquitous in today's networking world.
Multilayer management: SMI-S has been developed to work with server-based volume managers, RAID systems, and network storage appliances, a combination that most storage environments currently employ.
Legacy system accommodation: SMI-S has been developed to incorporate the management mechanisms in legacy devices with existing proprietary interfaces through the use of a proxy agent. Other devices and subsystems can also be integrated into an SMI-S network using embedded software or a CIM object manager.
Automated discovery: SMI-S-compliant products announce their presence and capabilities to other constituents. Combined with the automated discovery systems in WBEM to support object model extension, this will simplify management and give network managers the freedom to add components to their SAN more easily.
Policy-based management: SMI-S includes object models applicable across entire classes of devices, which lets SAN managers implement policy-based management for entire storage networks.

Common Information Model (CIM)


Describes the management of data
Details requirements within a domain
Information model with required syntax


Monitoring and Managing the Data Center - 38

The Common Information Model (CIM) is the language and methodology for describing management data. Information used to perform tasks is organized or structured so that disparate groups of people can use it. This can be accomplished by developing a model, or representation, of the details required by people working within a particular domain; such an approach can be referred to as an information model. An information model requires a set of legal statement types, or syntax, to capture the representation, and a collection of actual expressions necessary to manage common aspects of the domain. A CIM schema includes models for systems, applications, networks (LAN), and devices. The CIM schema enables applications from different developers on different platforms to describe management data in a standard format so that it can be shared among a variety of management applications.
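As a loose analogy (not actual CIM syntax), a shared information model is a schema that every tool agrees on before exchanging data. The sketch below validates an instance against a minimal, hypothetical "disk drive" model: any tool that honors the model can interpret any conforming instance.

```python
# Sketch: a minimal information model as a property->type schema, and a
# validator that checks an instance supplies every required property.
# The model and property names are illustrative, not real CIM classes.

DISK_DRIVE_MODEL = {"Name": str, "CapacityGB": int, "OperationalStatus": str}

def validate(instance, model=DISK_DRIVE_MODEL):
    """True if the instance provides every property the model requires,
    with the required type."""
    return all(isinstance(instance.get(prop), typ)
               for prop, typ in model.items())
```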


Web Based Enterprise Management (WBEM)


Monitoring and Managing the Data Center - 39

Web Based Enterprise Management (WBEM) is a set of management and Internet standard architectures developed by the Distributed Management Task Force (DMTF) to unify the management of enterprise computing environments, traditionally administered through management stacks like SNMP and CMIP. WBEM provides the ability for the industry to deliver a well-integrated set of standards-based management tools leveraging emerging web technologies. The DMTF has developed a core set of standards that make up WBEM: a data model (the CIM standard), an encoding specification (the xmlCIM encoding specification), and a transport mechanism (CIM Operations over HTTP).


Enterprise Management Platforms (EMPs)


Graphical applications
Monitoring of many (if not all) data center components
Alerting of errors reported by those components
Management of many (if not all) data center components
Can often launch proprietary management applications
May include other functionality
  Automatic provisioning
  Scheduling of maintenance activities

Proprietary architecture

Enterprise Management Platforms (EMPs) are complex applications, or suites of applications, that simplify the tasks of managing and monitoring data center environments. They monitor data center components such as network switches, SAN switches, and hosts, and alert the user to any problems with those components. At a minimum, the icon associated with the component in the GUI changes color to indicate its condition; other forms of alerting, such as email or paging, may also be used. In addition to the monitoring functionality, management functionality is usually included as well. This may take the form of native management by code embedded in the EMP, or may involve launching the proprietary management utility supplied by the manufacturer of the component. Other included functionality often allows easy scheduling of operations that must be performed on a regular basis, as well as provisioning of resources such as disk capacity.

Monitoring and Managing the Data Center - 40


Module Summary
Key points covered in this module:
It is important to continuously monitor data center components to support the availability and scalability initiatives of any business
Components include the server, SAN, network, and storage arrays

The four areas of monitoring:


Health
Capacity
Performance
Security

There are attempts to define a common monitoring and management model



These are the key points covered in the module. Please take a moment to review them.

Monitoring and Managing the Data Center - 41


Managing in the Data Center


After completing this module, you will be able to:
Describe individual component tasks that would have to be performed in order to achieve overall data center management objectives
Explain the concept of Information Lifecycle Management


Monitoring and Managing the Data Center - 42

The objectives for this module are shown here. Please take a moment to review them.


Managing Key Data Center Components


Client

HBA Port HBA Port


IP

Keep Alive

IP

SAN Storage Arrays Availability Reporting Capacity Performance

Network

Cluster

Hosts/Servers with Applications



Security
Monitoring and Managing the Data Center - 43

In the module on Monitoring, we learned about the importance of monitoring the various data center components for Health, Capacity, Performance, and Security. In this section, we will focus on the management tasks that need to be performed to ensure that Capacity, Availability, Performance, and Security requirements are met. The major components within the data center to be managed are: IP networks; servers and all applications and databases running on the servers; the Storage Area Network (SAN); and storage arrays. Data Center Management can be broadly categorized as Capacity Management, Availability Management, Security Management, Performance Management, and Reporting. Specific management tasks can address one or more of the categories. For example, a LUN Masking task addresses Capacity (storage capacity is provided to a specific host), Availability (if a device is masked via more than one path, a single point of failure is eliminated), Security (masking prevents other hosts from accessing a given device), and Performance (if a device is accessible via multiple paths, host-based multipathing software can improve performance by load balancing).


Data Center Management


Capacity Management
Allocation of adequate resources

Availability Management
Business Continuity
Eliminate single points of failure
Backup & Restore
Local & Remote Replication


Monitoring and Managing the Data Center - 44

Capacity Management ensures that there is adequate allocation of resources for all applications at all times, and involves tasks that need to be performed on all data center components to achieve this goal. Take the example of allocating storage from an intelligent storage array to a new application that will be deployed on a new server (we will explore this specific example in much more detail later in this module). To achieve this objective, the following tasks would have to be performed:
Storage Array: device configuration, LUN Masking
SAN: unused ports, zoning
Server: HBA configuration, host reconfiguration, file system management, application/database management
Availability Management ensures business continuity by eliminating single points of failure in the environment and ensuring data availability through the use of backups, local replication, and remote replication (discussed in Section 4, Business Continuity). Availability management applies to all data center components. In this example of a new application/server, availability is achieved as follows:
Server: at least two HBAs, multi-pathing software with path failover capability, clustering, backup
SAN: the server is connected to the storage array via two independent SAN fabrics; the SAN switches themselves have built-in redundancy of various components
Storage Array: devices have some RAID protection; array devices are made available to the host via at least two front-end ports (via independent SAN fabrics); the array has built-in redundancy for various components; local and remote replication; backup


Data Center Management, continued


Security Management
Prevent unauthorized activities or access

Performance Management
Configure/Design for optimal operational efficiency
Performance analysis
  Identify bottlenecks
  Recommend changes to improve performance


Monitoring and Managing the Data Center - 45

Security Management prevents unauthorized access to, and unauthorized configuration tasks on, the data center components, as well as unauthorized access to data. In the new application/server deployment example, security management is addressed as follows:
Server: creation of user logins, application/database logins, and user privileges; volume/application/database management can only be performed by authorized users
SAN: zoning (restricts access to front-end ports by specific HBAs); administrative/configuration operations can only be performed by authorized users
Storage Array: LUN Masking (restricts access to specific devices by specific HBAs); administrative/configuration operations, and replication operations, can only be performed by authorized users
Performance Management ensures optimal operational efficiency of all data center components. Performance analysis of the metrics collected is an important part of performance management and can be complicated, because data center components are all inter-related: the performance of one component can have an impact on others. In the new application/server deployment example, performance management involves:
Server: volume management, database/application layout, writing efficient applications, multiple HBAs, and multi-pathing software with intelligent load balancing
SAN: designing sufficient ISLs in a multi-switch fabric; fabric design (core-edge, full mesh, partial mesh)
Storage Array: choice of RAID type and layout of the devices (LUNs) on the back end of the array; choice of front-end ports (are the front-end ports shared by multiple servers? are the ports maxed out?); LUN Masking devices on multiple ports for multi-pathing

Data Center Management, continued


Reporting
Encompasses all data center components and is used to provide information for Capacity, Availability, Security, and Performance Management
Examples:
Capacity Planning
  Storage Utilization
  File System/Database Tablespace Utilization
  Port usage
Configuration/Asset Management
  Device Allocation
  Local/Remote Replica
  Fabric configuration
  Zones and Zonesets
  Equipment on lease/rotation/refresh
Chargeback
  Based on Allocation or Utilization
Performance reports

Reports can be generated for all data center components. Data center reports can be used for trend analysis, capacity planning, chargeback, basic configuration information, etc.

Monitoring and Managing the Data Center - 46


Scenario 1 Storage Allocation to a New Server


Host
Storage Allocation Tasks
File / Database Mgmt File System Mgmt Volume Mgmt

SAN
SAN Zoning Allocate Volumes Hosts

Array
Assign Volumes Ports Config New Volumes

File System Host / Database Used Used

Host Allocated

Reserved

Mapped

Unconfigured

Configured Volume Group Allocated


Monitoring and Managing the Data Center - 47

Let us explore the various management tasks with the help of an example. Assume that a new server has to be deployed in an existing SAN environment and allocated storage from a storage array. The allocated storage is to be used by an application which uses a relational database, and the database uses file systems. The picture breaks down the individual allocation tasks, which we will explore in the next few slides.
Storage Array Management: configure new volumes on the array for use by the new server; assign the new volumes to the array front-end ports
SAN Management: perform SAN zoning (zone the new server's HBAs via redundant fabrics to the front-end ports of the storage array); perform LUN Masking on the storage array (give the new server access to the new volumes via the array front-end ports)
Host Storage Management: configure HBAs on the new server; configure the server to see the new devices after zoning and LUN Masking are done; volume management (LVM tasks); file system management; database/application management
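The allocation workflow above can be summarized as an ordered checklist. The step names paraphrase the slide; a real environment would map each string to a vendor-specific operation.

```python
# Sketch: the end-to-end storage allocation workflow as an ordered list of
# (owning layer, task) pairs. Layer names and task wording are illustrative.

ALLOCATION_STEPS = [
    ("array", "configure new volumes (LUNs)"),
    ("array", "assign volumes to front-end ports"),
    ("san",   "zone new server HBAs to array ports via redundant fabrics"),
    ("array", "LUN mask new volumes to the server HBAs"),
    ("host",  "rescan/reboot so the server sees the new devices"),
    ("host",  "volume management: volume groups, logical volumes"),
    ("host",  "file system creation"),
    ("host",  "database/application layout and startup"),
]

def steps_for(layer):
    """Return the allocation tasks owned by one layer (array, san, or host)."""
    return [task for owner, task in ALLOCATION_STEPS if owner == layer]
```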

Array Management Allocation Tasks


Configure new volumes (LUNs)
Choose RAID type, size, and number of volumes
Physical disks must have the required space available

Assign volumes to array front end ports


This is automatic on some arrays while on others this step must be explicitly performed Intelligent Storage System
Front End
Host Connectivity

Back End Cache

Physical Disks
LUN 0

LUN 1 RAID 0 RAID 1 RAID 5



As we learned previously, the physical disks at the back end of the storage array are not directly presented as LUNs to a host. Typically, a RAID Group or RAID set is created, and then LUNs are created within the RAID set; these LUNs are eventually presented to a host, where they appear as physical disks. Space on the array's physical disks that has not been configured for use as a host LUN is considered unconfigured space and can be used to create more LUNs. Based on the storage requirements, configure enough LUNs of the required size and RAID type. On many arrays, when a LUN is created it is automatically assigned to the front-end ports of the array; on some arrays, the LUNs have to be explicitly assigned to the front-end ports, an operation called Mapping.
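Carving LUNs from a RAID group's unconfigured space, plus the explicit Mapping step that some arrays require, can be sketched as follows (sizes in GB; all names are hypothetical, not any array's real API):

```python
# Sketch: LUN creation consumes unconfigured space in a RAID group; mapping
# then makes a LUN visible on chosen front-end ports.

class RaidGroup:
    def __init__(self, name, usable_gb):
        self.name = name
        self.free_gb = usable_gb    # unconfigured space remaining
        self.luns = {}              # lun id -> size in GB
        self.port_map = {}          # lun id -> front-end ports it is mapped to

    def create_lun(self, lun_id, size_gb):
        if size_gb > self.free_gb:
            raise ValueError("not enough unconfigured space in RAID group")
        self.free_gb -= size_gb
        self.luns[lun_id] = size_gb

    def map_to_ports(self, lun_id, ports):
        # The explicit "Mapping" step; on many arrays this is automatic.
        self.port_map[lun_id] = list(ports)
```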

Monitoring and Managing the Data Center - 48


Server Management HBA Configuration


Server must have HBA hardware installed and configured
Install the HBA hardware and the software (device driver) and configure

HBA Driver

New Server

Multi-path

HBA

HBA

Optionally install multi-pathing software


Path failover and load balancing

The installation of the HBA hardware and software (device driver), and the HBA configuration, have to be performed before the server can be connected to the SAN. Multi-pathing software can optionally be installed; most enterprises opt to use multi-pathing because of availability requirements. Multi-pathing software can also perform load balancing, which helps performance.

Monitoring and Managing the Data Center - 49


SAN Management Allocation Tasks


Perform Zoning
Zone the HBAs of the new server to the designated array front end ports via redundant fabrics
Are there enough free ports on the switch? Did you check the array port utilization?
Storage Array SW1
HBA HBA Port Port Port

New Server

SW2

Port

Perform LUN Masking


Grant the HBAs on the new server access to the LUNs on the array

Zoning and LUN Masking operations were discussed in detail in the section on FC SAN. Zoning tasks are performed on the SAN fabric, while LUN Masking operations are typically performed on the storage array. The switches should have free ports available for the new server; also check the array port utilization if a port is shared between many servers.
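How zoning and LUN Masking combine to gate access can be sketched as two independent checks, both of which must pass: the HBA must be zoned with a port the LUN sits behind, and the masking table must grant the HBA that LUN. The WWN-style names below are made up.

```python
# Sketch: access control as the intersection of fabric zoning and array
# LUN masking. All identifiers are illustrative.

ZONES = [{"HBA_A1", "FA-1"}, {"HBA_A2", "FA-2"}]      # each zone is a set of members
MASKING = {("HBA_A1", "LUN0"), ("HBA_A2", "LUN0")}    # (hba, lun) grants on the array
LUN_PORTS = {"LUN0": {"FA-1", "FA-2"}}                # front-end ports per LUN

def can_access(hba, lun):
    """True only if the HBA is zoned to one of the LUN's ports AND masked to the LUN."""
    zoned = any(hba in zone and zone & LUN_PORTS[lun] for zone in ZONES)
    masked = (hba, lun) in MASKING
    return zoned and masked
```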

Monitoring and Managing the Data Center - 50


Server Management Allocation


Reconfigure Server to see new devices
Perform Volume Management tasks
Perform Database/Application tasks

VG LV FS HBA HBA

DB App

Reconfigure the server to see the new devices: a bus rescan or a reboot.
Perform Volume Management tasks: create Volume Groups, Logical Volumes, and File Systems. The number of Logical Volumes and File Systems depends on how the database/application is to be laid out.
Perform Database/Application tasks: install the database/application on the Logical Volumes/File Systems that were created, then start it up.

Monitoring and Managing the Data Center - 51


Scenario 2 Running out of File System Space


Solutions:
Offload non-critical data
  Delete non-essential data
  Move older/seldom-used data to other media
    ILM/HSM strategy
    Easy retrieval if needed
Extend the File System
  Operating System and Logical Volume Manager dependent
  Management tasks seen in Scenario 1 will apply here as well

Warning: FS is 66% Full
Critical: FS is 80% Full



In this scenario, we will explore the data center management tasks you might have to execute to prevent a file system from becoming 100% full. When a file system is running out of space, either:
Actively perform tasks that offload data from the existing file system (keeping the file system the same size): delete unwanted files, or offload files that have not been accessed for a long time to tape or some other media from which they can easily be retrieved if necessary
Extend the file system to make it bigger
Considerations for extending file systems: dynamic extension of file systems is dependent on the specific operating system or logical volume manager (LVM) in use. The possible tasks to extend file systems are discussed in more detail in the next slide. In reality, a good data center administrator should constantly monitor file systems, offload non-critical data, and be ready to extend the file system if necessary.

Monitoring and Managing the Data Center - 52


Scenario 2 Running out of File System Space, continued


Correlate File System with Volume Group or Disk Group.
Is there free space available in the VG?
  Yes: execute the command to extend the File System. Done.
  No: does the server have additional devices available?
    Yes: execute the command to extend the VG, then extend the File System.
    No: does the array have configured LUNs that can be allocated?
      Yes: allocate LUNs to the server, extend the VG, then extend the File System.
      No: does the array have unconfigured capacity?
        Yes: configure new LUNs, allocate them to the server, extend the VG, then extend the File System.
        No: identify/procure another array.
In all cases: is the File System being replicated? If yes, perform tasks to ensure that the larger File System and Volume Group are replicated correctly.
Monitoring and Managing the Data Center - 53

The steps and considerations prior to the extension of a file system are illustrated in the flow chart. The goal is to increase the size of the file system to avoid an application outage. Other considerations revolve around any local/remote replication protection employed for the application: for instance, if the application is protected via remote/local replication and a new device is added to the Volume Group, ensure that this new device is replicated as well. The steps include:
Correlate the file system to the logical volume and volume group, if an LVM is in use
If there is enough space in the volume group, extend the file system
If the volume group does not have space, determine whether the server has access to other devices which can be used to extend the volume group; if so, extend the volume group, then extend the file system
If the server does not have access to additional devices, allocate additional devices to the server. Many or all of the steps discussed in Scenario 1 apply here: configure new LUNs on the array, perform LUN masking, reconfigure the server to recognize the new devices, extend the volume group, and extend the file system
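The flow chart can be expressed as a decision function in which each argument answers one diamond in the chart; a sketch:

```python
# Sketch: the file-system-extension decision flow. Each boolean argument
# answers one question in the flow chart; the return value is the ordered
# list of tasks the administrator would perform.

def extension_plan(vg_has_space, host_has_spare_devices,
                   array_has_configured_luns, array_has_unconfigured_capacity):
    steps = []
    if not vg_has_space:
        if not host_has_spare_devices:
            if not array_has_configured_luns:
                if not array_has_unconfigured_capacity:
                    return ["identify/procure another array"]
                steps.append("configure new LUNs on the array")
            steps.append("allocate LUNs to the server (LUN masking, rescan)")
        steps.append("extend the volume group")
    steps.append("extend the file system")
    steps.append("adjust replication to cover the larger file system, if replicated")
    return steps
```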


Scenario 3 Chargeback Report


[Diagram: Hosts/Servers with Applications (each running an App, DB, File System, Logical Volume, and Volume Group stack) connect through ports on switches SW1 and SW2 to the Storage Arrays: the Production array (green), a Local Replica (blue), and a Remote Replica (red).]


Scenario 3: In this scenario, we will explore the various data center tasks that will be necessary to create a specific report. A number of servers (50; only 3 are shown in the picture), each with 2 HBAs, are connected to a storage array via two switches, SW1 and SW2. Each server has independent paths (2 HBAs) to the storage array via switch SW1 and switch SW2. Applications run on each of the servers, and array replication technology is used to create local and remote replicas. The production devices are represented by the green devices, the local replicas by the blue devices, and the remote replicas by the red devices. A report documenting the exact amount of storage used by each application (including that used for local and remote replication) has to be created. The amount of raw storage used must be reported as well, and the cost of the raw storage consumed by each application must be billed to the application owners. A sample report is shown in the picture. The report shows the information for two applications. Application Payroll_1 has been allocated 100 GB of storage. The production volumes are RAID 1 volumes, hence the raw space used by the production volumes is 200 GB. The local replicas are on unprotected (no fault tolerance) volumes, hence the raw space used by the local replicas is 100 GB. The remote replicas are on RAID 5 (5-disk group) volumes, hence the raw space used by the remote replicas is 125 GB. What are the various data center management steps to perform in order to create such a report?


Scenario 3 Chargeback Report Tasks


- Correlate Application -> File Systems -> Logical Volumes -> Volume Groups -> Host Physical Devices -> Array Devices (Production)
- Determine array devices used for local replication
- Determine array devices used for remote replication
- Determine storage allocated to the application based on the size of the array devices

Example:
[Diagram: the application stack (App, DB, FS, LV, VG) maps to Source Vol 1 and Source Vol 2 in Array 1; Array 1 also holds Local Replica Vol 1 and Local Replica Vol 2; the Remote Array holds Remote Replica Vol 1 and Remote Replica Vol 2.]


The first step in determining the chargeback costs associated with an application is to correlate the application with the array devices that are in use. As indicated in the picture, trace the application to the file systems, logical volumes, volume groups, and eventually to the array devices. Since the applications are being replicated, also determine the array devices used for local replication and those used for remote replication. In the example shown, the application is using Source Vol 1 and 2 (in Array 1). The replication devices are Local Replica Vol 1 and 2 (in Array 1) and Remote Replica Vol 1 and 2 (in the Remote Array). Keep in mind that this can change over time: as the application grows, more file systems and devices may be used. Thus, before a new report is generated, the correlation of application to array devices should be redone to ensure that the most current information is used. After the array devices are identified, the amount of storage allocated to the application can be easily computed. In this case, Source Vol 1 and 2 are each 10 GB in size, so the storage allocated to the application is 20 GB (10 + 10). The storage allocated for replication is 20 GB for local and 20 GB for remote. The allocated storage is the actual storage that can be used; it does not represent the raw storage consumed by the application. To determine the raw space, determine the RAID protection that is used for the various array devices.
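As a sketch, the correlation chain can be modeled as a series of lookups walked from the application down to the array devices; the mapping structure and names here are illustrative, not the API of any actual management tool:

```python
def trace_app_to_devices(app, app_to_fs, fs_to_lv, lv_to_vg, vg_to_devices):
    """Follow application -> file systems -> logical volumes ->
    volume groups -> array devices, and return the set of production
    devices the application ultimately uses.  Rerun this before each
    report, since the chain changes as the application grows."""
    devices = set()
    for fs in app_to_fs.get(app, []):
        lv = fs_to_lv[fs]          # each file system sits on one logical volume
        vg = lv_to_vg[lv]          # each logical volume belongs to one volume group
        devices.update(vg_to_devices[vg])  # the VG is built from array devices
    return devices
```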


Scenario 3 Chargeback Report Tasks, continued


- Determine RAID type for production/local replica/remote replica devices
- Determine the total raw space allocated to the application for production/local replication/remote replication
- Compute the chargeback amount based on price per raw GB of storage
- Repeat the steps for each application and create the report
- Repeat the steps each time the report is to be created (weekly/monthly)

Example:
2 Source Vols = 2 * 10 GB RAID 1 = 2 * 20 GB raw = 40 GB
2 Local Replica Vols = 2 * 10 GB unprotected = 2 * 10 GB raw = 20 GB
2 Remote Replica Vols = 2 * 10 GB RAID 5 = 2 * 12.5 GB raw = 25 GB
Total raw storage = 40 + 20 + 25 = 85 GB
Chargeback cost = 85 * $0.25/GB = $21.25

To determine the raw space, review the steps displayed on the slide using the example listed.
- Determine the RAID type for the production/local replica/remote replica devices. In the example shown, the production devices are 10 GB RAID 1, the local replica devices are 10 GB with no protection, and the remote replica devices are 10 GB RAID 5 (5-disk group) devices.
- Determine the total raw space allocated to the application for production, local replication, and remote replication. Based on the values from step 1, the total raw space used by the application is 85 GB (40 + 20 + 25).
- Compute the chargeback amount based on price per raw GB of storage. Given the cost per GB of storage (in the example, $0.25/GB), the chargeback cost can be computed: 85 * $0.25/GB = $21.25.
- Repeat these steps for each application and create the report. Repeat the steps each time the report is to be created (weekly/monthly).
The exercise would have to be repeated for every single application in the enterprise in order to generate the required report. These tasks can be done manually, which may be acceptable if only one or two applications exist, but the process becomes extremely tedious with many applications. The best way to create this report is to automate these tasks.
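The arithmetic above generalizes to a small helper. The RAID overhead factors follow the worked example (mirroring doubles the raw capacity; a 5-disk RAID 5 group uses 5/4 of the usable space); the function and factor names are illustrative:

```python
# Raw capacity = usable capacity * overhead factor for the RAID type.
RAID_FACTOR = {
    "raid1": 2.0,         # mirrored: two copies of every block
    "unprotected": 1.0,   # no fault tolerance
    "raid5_5disk": 1.25,  # 4 data + 1 parity disk: 5/4 overhead
}

def chargeback(devices, price_per_raw_gb):
    """devices: iterable of (usable_gb, raid_type) pairs covering the
    production, local-replica and remote-replica volumes of one
    application.  Returns (total_raw_gb, cost)."""
    raw = sum(gb * RAID_FACTOR[raid] for gb, raid in devices)
    return raw, raw * price_per_raw_gb
```

With the example's six 10 GB volumes and $0.25 per raw GB, this reproduces the 85 GB and $21.25 figures from the slide.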


Information Lifecycle Management


- Information Management Challenges
- Information Lifecycle
- Information Lifecycle Management: Definition, Process, Benefits, Implementation


Information Lifecycle Management (ILM) is a key approach for assuring availability, capacity, and performance. Let's look at some of the aspects of ILM.


Key Challenges of Information Management


Information growth is relentless. Information is more strategic than ever. Information changes in value over time. The challenges:
1. Scaling infrastructure within budget constraints
2. Scaling resources to manage complexity
3. Access, availability, and protection of critical information assets at optimal cost
4. Reducing risk of non-compliance
5. Ability to prioritize information management based on data value


Companies face three key challenges related to information management:
- Strong growth of information: the post-dot-com rate of growth is around 50%, driven by digitization, increased use of e-mail, etc. Just planning for growth can take up to 50% of storage resources, and meeting growth needs has increased the complexity of customer environments.
- Information is playing a more important role in determining business success: new business applications provide more ways to extract a competitive advantage in the marketplace. Companies like Dell, WalMart, and Amazon have the strategic use of information at the heart of their respective business models.
- Finally, information changes in value, and often not in a linear fashion. For example, customers become inactive, reducing the need for account information; pending litigation makes certain information more valuable; and so on. Understanding the value of information should be at the heart of managing information in general.


The Information Lifecycle


[Chart: Sales Order Application Example. Information value plotted against time: value is highest at New Order Record and Order Processing, declines after Orders Fulfilled, rises again for a Warranty Claim, and falls once the Warranty is Voided. Lifecycle stages along the time axis: Create, Access, Protect, Migrate, Archive, Dispose.]


Information that is stored on a computer has a different value to a company depending on how long it has been stored. In the example above, the sales order carries differing value to the company from the time it is created to the time the warranty is eventually voided. In a typical sales example such as this one, the value of information is highest when a new order is created and processed. After order fulfillment, there is potentially less need for real-time access to customer/order data, unless a warranty claim or other event triggers that need. Similarly, after the product reaches end of life, or after the account is closed, there is little value in the information and it can be disposed of.


Information Lifecycle Management Definition


Information Lifecycle Management is a strategy, not a product or service in itself; further, this strategy is proactive and dynamic in helping plan for IT growth as it relates to business needs, and reflects the value of information in a company. A successful information lifecycle management strategy must be:
- Business-centric: tied closely to the key processes, applications, and initiatives of the business
- Centrally managed: providing an integrated view into all information assets of the business, both structured and unstructured
- Policy-based: anchored in enterprise-wide information management policies that span all processes, applications, and resources
- Heterogeneous: encompassing all types of platforms and operating systems
- Aligned with the value of data: matching storage resources to the value of the data to the business at any given point in time


Information Lifecycle Management Process


Policy-based Alignment of Storage Infrastructure with Data Value
AUTOMATED:
- Classify data/applications based on business rules
- Implement policies with information management tools
- Manage the storage environment
- Tier storage resources to align with data classes
FLEXIBLE: a storage infrastructure that is Application and Lifecycle Aware



The process of implementing and continually refining an Information Lifecycle Management strategy consists of four activities:
- Classify data and applications on the basis of business rules and policies to enable differentiated treatment of information
- Implement policies with information management tools, from creation to disposal of data
- Manage the environment with integrated tools that interface with multi-vendor platforms and reduce operational complexity
- Tier storage resources to align with data classes, storing information in the right type of infrastructure based on its current value
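To make the classify-and-tier activities concrete, here is a toy policy function; the value classes, age thresholds, and tier names are purely illustrative assumptions, not part of any ILM product:

```python
def assign_tier(business_value, age_days):
    """Map an information asset's business value ('high', 'medium'
    or 'low') and its age in days to a storage tier, in the spirit
    of 'tier storage resources to align with data classes'."""
    if business_value == "high" and age_days < 90:
        return "tier-1 (high-performance array)"
    if business_value == "low" or age_days > 365:
        return "tier-3 (archive)"
    return "tier-2 (midrange array)"
```

A real implementation would draw these rules from centrally managed, enterprise-wide policies rather than hard-coded thresholds.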


Information Lifecycle Management Benefits


Information growth is relentless. Information is more strategic than ever. Information changes in value over time.
1. Improve utilization of assets through tiered storage platforms
2. Simplify and automate management of information and storage infrastructure
3. Provide more cost-effective options for access, business continuity, and protection
4. Ensure easy compliance through policy-based management
5. Deliver maximum value at lowest TCO by aligning storage infrastructure and management with information value


Implementing an ILM strategy delivers key benefits that directly address the challenges of information management:
- Improved utilization, through the use of tiered platforms and increased visibility into all enterprise information
- Simplified management, by integrating process steps and interfaces to the individual tools in place today, and by increased automation
- A wider range of options for backup, protection, and recovery, to balance the need for continuity against the cost of losing specific information
- Painless compliance, through better control up front in knowing what data needs to be protected and for how long
- Lower TCO while meeting required service levels, by aligning infrastructure and management costs with information value, so that resources are not wasted and complexity is not introduced by managing low-value data at the cost of high-value data


Path to Enterprise Wide ILM


[Diagram: applications and data evolving from Automated Networked Storage, to ILM for Specific Applications, to Cross-Application ILM, with cost lowered through increased automation.]
Step 1 - Networked Tiered Storage:
- Enable networked storage
- Automate the environment
- Classify applications/data
Step 2 - Application-specific ILM:
- Define business policies for various information types
- Deploy ILM components into principal applications
Step 3 - Enterprise-wide ILM:
- Implement ILM across applications
- Policy-based automation
- Full visibility into all information



Implementing ILM enterprise-wide will take time; no one believes it can be done instantaneously. A three-step roadmap to enterprise-wide ILM is illustrated. Steps 1 and 2 are tuned to products and solutions available today, with the goal of being ILM-enabled across a few enterprise-critical applications. In Step 1, the goal is to reach an automated networked storage environment. This is the basis for any policy-based information management, and the value of tiered storage platforms can already be exploited manually. In fact, many enterprises are already in this state. Step 2 takes ILM to the next level with detailed application/data classification and linkage to business policies. While this is done manually, the resulting policies can be executed automatically with tools for one or more applications, resulting in better management and optimal allocation of storage resources. Step 3 of the vision is to automate more of the front-end classification and policy management activities so as to scale to a wider set of enterprise applications. It is consistent with the need for more automation and greater simplicity of operations.


Module Summary
Key points covered in this module:
- Individual component tasks that would have to be performed in order to achieve overall data center management objectives were illustrated:
  - Allocation of storage to a new application server
  - Running out of file system space
  - Creating a chargeback report

Concept of Information Lifecycle Management


These are the key points covered in this module. Please take a moment to review them.


Section Summary
Key points covered in this section:
- Areas of the data center to monitor
- Considerations for monitoring the data center
- Techniques for managing the data center


This completes Section 5 Monitoring and Managing the Data Center. Please take a moment to review the key points covered in this section.


Course Summary
Key points covered in this course:
- Storage concepts and architecture
- Evolution of storage and storage environments
- Logical and physical components of storage systems
- Storage technologies and solutions
- Core data center infrastructure elements and activities for monitoring and managing the data center
- Options for Business Continuity


This completes the Storage Technology Foundations training. Please take a moment to review the key points covered in this course.
