Symmetrix VMAX3 Internals: Essentials

Monitoring Workloads and System Activity

This module reviews workload characteristics and how they impact performance on the
Symmetrix. It also introduces the principles of performance management and several tools for
viewing metrics that show how the workload characteristics affect the performance of the
storage system.

Copyright 2015 EMC Corporation. All Rights Reserved.


The objective of this module is not to make you an expert on performance, but rather to
introduce some key concepts and provide several tools that can be used to understand the
relationship between workload characteristics and the performance of the array.


As with many questions in systems design, the answer to many questions about performance
expectations is "It depends." This module explores some of the dependencies that impact
performance.
With the VMAX3, one of the objectives is to minimize the number of knobs and levers that
can be tweaked and to focus on standard layouts based on proven best practices, including
wide workload distribution using virtual provisioning, pooling of CPU resources with dynamic
allocation, and FAST automation to ensure compliance with agreed-upon Service Level
Objectives.


The high-level architecture of the VMAX has not changed with the VMAX3, but how it is
implemented has. Hosts still connect through the frontend, physical disks are connected
through the backend, and IOs go through cache. The key to achieving optimum system
performance is maintaining a balanced workload across system components. The challenge is
how to achieve this when the array typically handles many different workloads that are
constantly changing.
On the VMAX3, the system is 100% virtually provisioned and the pools are preconfigured from
the factory following proven best practices for size, protection, and layout. The value of virtual
provisioning is well known and, when configured following best practices, will achieve the best
possible workload distribution. FAST (Fully Automated Storage Tiering) is always on to ensure
that as workloads change, data is placed in the correct pools in order to get the best overall
performance and maintain compliance with Service Level Objectives.
Global memory sizes have increased on the VMAX3; the intelligent cache management
algorithms ensure that more IOs are serviced from cache, thus providing the best possible
response time.
Workloads tend to come in bursts. Although the system may show low utilization on average,
bursts of activity, which are typical of many environments, can stress the system.
The VMAX3 addresses this by pooling CPU resources. Rather than dedicating
CPU cores to ports, CPU cores are pooled and work together to handle the workload across
ports.

Historically, performance has been treated as a break-fix issue: "My application is slow; it's
your array!" This puts EMC in a defensive position, doing the analysis to prove that it isn't the
array. Although it is possible that it is the array, it's more likely that the requirements were
not understood and the system was not sized properly, or the workload has changed, or the
system was not laid out following best practices.
Over the years EMC has developed customer tools for monitoring storage and has attempted
to instill capacity management, including performance management, as an ongoing discipline
in the data center. This has helped by providing a better understanding of what is
normal and, when something changes, by focusing the analysis.
Each generation of Symmetrix introduces new capabilities and configuration options.
Sometimes there are too many choices, and these may not be fully understood. This
sometimes leads to systems not being configured optimally.
With the VMAX3, not only have there been significant architectural implementation changes,
but also there has been a move to service level based provisioning, using easily understood
response time objectives. FAST and ongoing compliance monitoring have shifted the focus to
understanding the workload requirements upfront, modeling the requirements to develop the
optimum design with the right set of resources, and then building the system in a manner
that follows known best practices. The result is a system that can handle a known workload at
a specified SLO as measured by easily understood response time objectives. This allows for
better prediction of when additional resources are required.

Performance tuning is all about identifying bottlenecks in the systems so they can be
eliminated. Bottlenecks occur when one or more components are overutilized and new work
becomes queued up and waits before being processed. A physical disk drive is a good example
of a potential bottleneck. Because of mechanical and bandwidth limitations, a physical disk
drive is capable of processing only a certain number of IOs in a given timeframe. If there is
more work than can be processed, some work will queue up.
The goal of performance analysis is to identify these bottlenecks or hotspots. Once a
bottleneck is identified, it can be removed by rebalancing the workload. Quite often, after a
bottleneck is removed, another will appear. Therefore, performance analysis is an iterative
process of monitor, identify, resolve, monitor again.
While our focus is on the Symmetrix, there are many other factors that can be the root cause
of perceived performance problems. Oftentimes, we monitor to confirm that the problem is not
with the Symmetrix. At other times, real bottlenecks can be identified through monitoring and
tuning is necessary to resolve them.
Performance can be defined by raw numbers, but it is usually the user experience in real-world
environments that prompts the investigation of performance.


An IO is a unit of work, or a complete transfer, between two end points. There are many
end points in an Enterprise storage environment. From the time the operation is initiated by
the application until it is processed by the storage system, different protocols may be
involved and the data transformed. A single IO may be broken into multiple IOs and
reassembled. Keep in mind that more than one complete transfer might be taking place at
different points in the data path.
The illustration shows how a single file might be broken up as it passes from the application
to the array. The file write may be multiple IO operations at the file system, operating
system, and Host Bus Adapter.
The Fibre Channel HBA will transfer each IO it receives as a series of frames of up to 2 KB
to the next connectivity device. The frames are routed through the fabric to the fibre
adapter (FA) on a Symmetrix, where they are re-assembled into the original IO as sent by
the HBA.
At the VMAX, the frontend director allocates one or more cache slots depending on the
logical block address range. When the backend director persistently stores the data, smaller
writes may be combined into larger writes, and the backend director applies the RAID
protection, so a single host write to cache may result in multiple IOs to disk.
Note: For this discussion, we are looking at the IO from the array perspective only.
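To make the framing step concrete, the following is a minimal sketch of how many frames one IO becomes on the link. It assumes only the "up to 2 KB" frame payload mentioned above; the function name and the sample IO sizes are hypothetical and chosen for illustration.

```python
import math

# Payload carried by one frame: "up to 2 KB", as stated in the notes above.
FRAME_PAYLOAD_BYTES = 2 * 1024


def frames_for_io(io_size_bytes: int, payload_bytes: int = FRAME_PAYLOAD_BYTES) -> int:
    """Return the number of frames needed to carry one IO of the given size."""
    if io_size_bytes <= 0:
        raise ValueError("io_size_bytes must be positive")
    # A partially filled last frame still counts as a whole frame.
    return math.ceil(io_size_bytes / payload_bytes)


if __name__ == "__main__":
    for size_kb in (4, 8, 64, 128):
        print(f"{size_kb:>4} KB IO -> {frames_for_io(size_kb * 1024)} frames")
```

The point is simply that one host IO is several transfers at the link level, all of which are reassembled at the FA.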


Transferring a single IO between points involves more than just transferring the data. In
every data transfer protocol, several steps are involved and there is additional link overhead
that includes negotiation, header, and acknowledgement components.
• Negotiation and Acknowledgement: Both endpoints must agree to the transfer and
manage the operation. This includes any handshaking tasks that processors use to
schedule and organize the activity. Most protocols require some setup negotiation to start
the IO and a final finish message to terminate it.
• Header: Includes identifier or address information and may include other information for the
endpoints to understand the data. This may include CRC or parity bits used to error-check
the data, which add overhead on the channel.
• Data: The actual data. This does not include the overhead of the header, negotiation, and
acknowledgement.
The Negotiation, Header, and Acknowledgement components are largely fixed in size
regardless of IO size. An 8 KB IO has the same negotiation and header information as a 32 KB
IO. A larger-block IO simply has a larger data component.
The illustration graphically shows the parts of an IO on a time graph. As you can see, the
overhead components take time and consume bandwidth.
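Because the overhead is roughly fixed per IO, the share of each transfer that is useful data grows with IO size. The sketch below illustrates this; the 512-byte overhead figure is a hypothetical value chosen only for illustration and is not a measured protocol number.

```python
# Hypothetical fixed per-IO overhead (negotiation + header + acknowledgement).
# This value is an assumption for illustration, not a protocol constant.
ASSUMED_OVERHEAD_BYTES = 512


def payload_efficiency(io_size_bytes: int, overhead_bytes: int = ASSUMED_OVERHEAD_BYTES) -> float:
    """Fraction of the bytes moved for one IO that is actual data."""
    if io_size_bytes <= 0 or overhead_bytes < 0:
        raise ValueError("sizes must be positive (overhead may be zero)")
    return io_size_bytes / (io_size_bytes + overhead_bytes)


if __name__ == "__main__":
    for size_kb in (8, 32, 128):
        eff = payload_efficiency(size_kb * 1024)
        print(f"{size_kb:>4} KB IO: {eff:.1%} of the transfer is data")
```

Whatever the real overhead is, the trend is the same: the larger the IO, the smaller the fraction of channel time spent on overhead.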


IOs per second (IOPS) is a measure of the number of transfers per second between end
points, typically between the host and the array but also between cache and backend and
backend to/from disk.
IOPS is not a standalone metric and should be considered in the context of other metrics. For
example, IOPS may be much higher in an OLTP (Online Transaction Processing) environment,
which typically uses smaller-block IO, than in a DSS (Decision Support System) environment,
which typically uses larger-block IO.


Throughput measures the volume of data transferred per second through an IO channel.
From the array perspective, throughput measures the useful part of the IO and does not
include the header and negotiation overhead.
Throughput will be highest with larger IOs because there is less overhead on the link,
and larger-block IOs make more efficient use of cache, require less processing, and can be
read from or written to disk more efficiently.
A throughput value is often used to rate the speed of a channel. This is more accurately
termed bandwidth. This number is a theoretical maximum for all traffic on the channel,
including headers and negotiation. Since performance measurement tools report the data
volume and ignore header and negotiation, measured throughput will not reach the rated
maximums in practice.


IOs per second, throughput, and IO size are closely related.

When small IOs are transferred, less time is taken on each IO, increasing the number that can
be moved in a given time period. The IOPS measure will be greater for small IOs than for
larger IOs; however, the data payload will be smaller because a larger share of each transfer
is overhead for header, link negotiation, and acknowledgement. While more IOs may be sent with
a small block size, the total data throughput will be less.
The bottom graph illustrates this relationship. As the IO size increases, the IOPS measure
decreases. The graph also illustrates how throughput increases as the IO size increases and at
the same time the total number of IOs goes down.
Larger IOs are more efficient for the array to process; however, the size of the IO is
outside the control of the storage administrator or architect. Understanding the IO size
nevertheless helps explain the observed behavior.
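The relationship between IO size, IOPS, and throughput can be sketched with a very simple model: each IO costs a fixed overhead time plus a transfer time proportional to its size. The overhead (100 microseconds) and channel rate (800 MB/s) below are hypothetical values chosen for illustration; real systems have many more factors.

```python
# Hypothetical model parameters, assumed for illustration only.
ASSUMED_OVERHEAD_S = 100e-6           # fixed per-IO overhead time
ASSUMED_BANDWIDTH_BYTES_S = 800e6     # data transfer rate of the channel


def iops_and_throughput(io_size_bytes: int,
                        overhead_s: float = ASSUMED_OVERHEAD_S,
                        bandwidth_bytes_s: float = ASSUMED_BANDWIDTH_BYTES_S):
    """Return (IOPS, throughput in bytes/s) for back-to-back IOs of one size."""
    time_per_io_s = overhead_s + io_size_bytes / bandwidth_bytes_s
    iops = 1.0 / time_per_io_s
    throughput = iops * io_size_bytes   # throughput = IOPS x IO size
    return iops, throughput


if __name__ == "__main__":
    for size_kb in (4, 8, 32, 128):
        iops, tput = iops_and_throughput(size_kb * 1024)
        print(f"{size_kb:>4} KB: {iops:8.0f} IOPS, {tput / 1e6:6.1f} MB/s")
```

Under these assumptions the output shows the same shape as the graph: IOPS falls and throughput rises as the IO size grows.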


Considerations other than IO size also influence IOPS and throughput.
Workload skew and locality of reference are very important. Skew can be defined as the
percentage of the workload that is performed on a given percentage of the data. A common skew
might be 85-15, where 85% of the work is performed on 15% of the data. When the
workload is skewed in this manner, which is typical of most environments, tiered storage and
FAST can be very efficient at positioning the most active data on the fastest storage. Locality
of reference is the sequentiality of the IO. If a host reads consecutive logical block addresses,
prefetching can stage data in cache and increase the cache hit rate. Writes to consecutive
blocks allow more efficient writes to the backend. For example, sequential small block writes
can be folded into more efficient larger writes on the backend. An example of this is a full
stripe write with RAID5, as opposed to reading the old data and parity before writing new data
and new parity for each small block write.
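The RAID5 comparison can be put into numbers. The sketch below counts backend IOs for the two cases described above: the read-modify-write sequence for each small write (read old data, read old parity, write new data, write new parity) versus one full stripe write. The 3+1 layout in the usage lines is an example configuration chosen for illustration.

```python
def raid5_backend_ios_small_writes(host_writes: int) -> int:
    """Backend IOs when each small host write is handled on its own.

    Each write costs 4 IOs: read old data, read old parity,
    write new data, write new parity.
    """
    return host_writes * 4


def raid5_backend_ios_full_stripe(data_members: int) -> int:
    """Backend IOs for one full stripe write.

    Every data strip is written plus one parity strip; no reads are needed
    because parity is computed from the new data alone.
    """
    return data_members + 1


if __name__ == "__main__":
    data_members = 3  # example RAID5 3+1 layout
    print("separate small writes:", raid5_backend_ios_small_writes(data_members))
    print("one full stripe write:", raid5_backend_ios_full_stripe(data_members))
```

For a 3+1 group, three separate small writes cost 12 backend IOs, while the same data folded into one full stripe write costs 4.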
Utilization is another consideration. While ideally you would want to see utilization of 100%,
as utilization increases, queueing also increases, and this affects the observed response time.
For example, if it takes 10 ms to service an IO from disk and there are 10 IOs ahead of it in
the queue, the IO waits about 100 ms before it is serviced, so the response time observed by
the application will be more than 100 ms. This may not be acceptable.
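The queueing arithmetic can be written out explicitly. This is a minimal sketch that assumes a simple first-in, first-out queue in which every IO takes the same service time; the function names are hypothetical.

```python
def queue_wait_ms(service_time_ms: float, ios_ahead: int) -> float:
    """Time an IO spends waiting for the IOs queued ahead of it."""
    return ios_ahead * service_time_ms


def response_time_ms(service_time_ms: float, ios_ahead: int) -> float:
    """Queue wait plus the IO's own service time."""
    return queue_wait_ms(service_time_ms, ios_ahead) + service_time_ms


if __name__ == "__main__":
    print("wait:    ", queue_wait_ms(10, 10), "ms")
    print("response:", response_time_ms(10, 10), "ms")
```

With a 10 ms service time and 10 IOs ahead, the wait alone is 100 ms, and the IO's own service time is added on top of that.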


Every component in the system has limits on the amount of work it can perform while
maintaining a reasonable response time. Again, while it is ideal to get 100% utilization on
system components, the thing to remember is that as utilization increases, queueing occurs,
and queueing affects response time. As a general rule, we like to plan for utilization in the
40% to 60% range. If on average there is 50% utilization, there are very likely bursts of
activity that can peak at 100%, which can lead to response time issues.
Also take into account degraded mode: the impact if a director fails and the
surviving director in the engine must take on the additional workload, or if a
drive fails and the data needs to be rebuilt. This places an additional burden on the system
that will likely increase utilization and response times.
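To see why planning for 40% to 60% utilization leaves useful headroom, the sketch below uses the textbook single-server (M/M/1) queueing approximation, where response time = service time / (1 - utilization). This formula is not from the course material; it is a simplifying assumption brought in to illustrate the trend, and real components behave in more complex ways. The failover helper simply adds the two directors' utilizations, which is also a simplification.

```python
def mm1_response_time_ms(service_time_ms: float, utilization: float) -> float:
    """Approximate response time using the M/M/1 formula S / (1 - U)."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in the range [0, 1)")
    return service_time_ms / (1 - utilization)


def utilization_after_failover(util_surviving: float, util_failed: float) -> float:
    """Utilization of the surviving director if it absorbs its partner's work.

    A result of 1.0 or more means the survivor would be oversubscribed.
    """
    return util_surviving + util_failed


if __name__ == "__main__":
    for util in (0.4, 0.5, 0.6, 0.8, 0.9):
        print(f"{util:.0%} busy -> {mm1_response_time_ms(10, util):6.1f} ms")
    print("after failover:", utilization_after_failover(0.4, 0.4))
```

In this model a 10 ms service time becomes 20 ms at 50% utilization and about 100 ms at 90%. Two directors running at 40% each leave the survivor at roughly 80% after a failure, which is already on the steep part of the curve.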


Response time is often the most important measure of performance to the end user and is used
with IOs per second and throughput. A system may be able to do more IOPS at the expense of
response time. Response time measures the total time taken to process an IO from the
frontend perspective. This is from the time the request is received until the time it is
complete. The response time measured by the Symmetrix does not include SAN connectivity
devices, HBA delays, or host queueing.
Host response times are also measured by the host operating system. The time from the start of
the application's request to the operating system's final acknowledgement to the
application is the host response time. This measure includes queueing, HBA and connectivity
delays, as well as the time it takes the Symmetrix to process the request.
With VMAX3, response time is a key component in SLO based provisioning; the system
monitors compliance with the SLO to ensure that response time expectations are met.


Real time analysis concerns what is going on right now in the Symmetrix. The Solutions
Enabler symstat command provides an interface for looking at system activity in real time.
Unisphere for VMAX, using the storstpd daemon, can also report on real time activity.
Real time monitoring may help detect bursts of activity where the peaks are often lost due to
averaging over longer term samples. Because of the frequency of data collection with real
time monitoring, typically fewer metrics are available for these short-term samples.


The statistics command (symstat) performs the following:

• Queries Symmetrix devices to capture raw performance counts and store them in
memory.
• Retrieves the performance counts for the Symmetrix array as a whole.
• Retrieves the performance counts for a director or director port.
• Retrieves the performance counts for one or more Symmetrix devices.
• Retrieves the performance counts for one or more Symmetrix device groups, composite
groups, or RDF groups.
• Retrieves the performance counts for a selection of, or all, Symmetrix disks.
• Retrieves the timestamp of the performance count sample.
• Retrieves and displays replication session statistics for SRDF/A.
• Retrieves GigE iSCSI network statistics.


Oftentimes when doing capacity planning and system monitoring, a longer term perspective is
required. However, for troubleshooting an immediate issue or when performing testing and
benchmarking, an understanding of what is happening right now is needed. This is where
symstat proves its value as a tool.
When using symstat, you need to specify a collection interval using the -i flag, and optionally
a count using -c. In earlier versions of symstat it was possible to set a collection interval of a
few seconds. Today, if an interval of less than 60 seconds is specified, symstat will use 60 as
the minimum. The count flag is optional; if it is not specified, symstat runs continuously until
the command is terminated using Ctrl-C.
The default is to report activity against all devices on the system. On a busy array with
potentially thousands of active devices, it is more likely you are interested in a subset of
devices. This can be specified using a device group or specifying specific devices or directors
in order to focus the output.
Remember that like other Solutions Enabler commands, the request and the response are
sent in-band using the same path as used for normal IO. Running symstat on the same host
and ports that are being used to generate IO may skew the results; if a front-end port is
already stressed, running symstat may give unpredictable results.
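The interval and count rules described above determine how long a symstat run lasts. The sketch below is only a model of that described behavior, not the tool's own code; the function name and return shape are hypothetical.

```python
from typing import Optional, Tuple

# Minimum interval that symstat enforces, as described in the notes above.
MIN_INTERVAL_S = 60


def sampling_plan(interval_s: int, count: Optional[int] = None) -> Tuple[int, Optional[int]]:
    """Return (effective interval, total run time in seconds).

    Total run time is None when no count is given, because the command
    then runs until it is interrupted.
    """
    effective_interval = max(interval_s, MIN_INTERVAL_S)
    total = None if count is None else effective_interval * count
    return effective_interval, total


if __name__ == "__main__":
    print(sampling_plan(5, 3))    # a short interval is raised to the minimum
    print(sampling_plan(120))     # no count: runs until interrupted
```

This is a useful sanity check before starting a capture: a request for three 5-second samples actually takes three minutes.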


symstat presents different types of information giving different perspectives on the workload.
Requests is the most common set of statistics and reports on activity coming into the
Symmetrix. Other metrics report on activity to and from cache and activity to disk.
There is also a set of statistics for SRDF/A replication.


The default type of statistics for the symstat command is requests; therefore, if a type is not
specified with the -type option, requests are presented. The -c argument defines the number
of samples. The default for this argument is continuous sampling. If you do not specify this
argument, but you specify an -i value, the command produces a continuous statistical output,
requiring a cancel (Ctrl-C) to stop the process.
To filter the output to display only information of interest, you can use -g and specify a
device group. For example: symstat -i 60 -c 3 -g test_dg
Here is an example of how to create a device group:
C:/> symdg create test_dg
C:/> symdg -g test_dg addall -devs 154:15d


The display here shows an example of the output of the symstat command. Because a type is not
specified, the default type of statistics, requests, is shown. In this example,
we specified a device group using the -g option. This limits the output to only the devices in
the device group. If a device group is not specified, symstat reports against all active devices.
Using the -i option, the output is generated with an update interval of 60 seconds, and the -c
option specifies a count of 5 intervals before the command exits.
IO/Sec and throughput are self-explanatory. The %hit of approximately 46% would be
reasonable for a semi-random workload on a lightly used Symmetrix as is the 100% cache hit
for writes. This indicates 100% of writes are to tracks that are already in cache and marked
as write pending. As cache utilization increases with other workloads, it is unlikely to sustain
100% hits for writes.
Notice the Device Write pending; while these numbers may seem high, they look normal as
neither the total write pending nor the individual device write pending is approaching the
ceiling.


The VMAX3 can have up to 16 TB of global memory, but cache is still not an infinite resource. On
the Symmetrix every IO must go through cache; thus cache is the heart of the system. From
a user's data perspective, cache serves two primary functions:
• Maintains recently accessed data, making it readily available. Statistically, users will access
data that was recently used. The longer data resides untouched, the less likely it is to be
accessed again. The system uses tag-based caching, a Least Recently Used (LRU)
algorithm, to determine which cache slots to overwrite.
• Buffers writes until they can be de-staged to a persistent location on disk. When a host
writes data to the VMAX, it is written to cache, and the host is notified that the write is
complete. Data is asynchronously written to disk by the backend directors. The priority of
this is dependent on the number of tracks that are write pending.
Through the Virtual Matrix, the memory on each director board forms a single global memory
address space. In addition to being used for caching of reads and writes, global memory also
contains metadata such as track ID tables for each device configured. This metadata is used
by the system to determine if data is in cache and the status of the data. Cache is dynamic:
it is the global memory left after the metadata. Creating new devices and/or adding
snapshots consumes global memory and reduces the cache slots available.
To ensure fairness and to prevent a device from consuming too much global memory, there
are two ceilings imposed: System Write Pending and Device Write Pending limits.
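The Least Recently Used idea can be illustrated with a few lines of code. This is a generic textbook LRU sketch, not the array's tag-based implementation; the class name, the track labels, and the tiny capacity are hypothetical.

```python
from collections import OrderedDict


class LRUCache:
    """Generic LRU cache of track identifiers (illustration only)."""

    def __init__(self, capacity: int):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.slots = OrderedDict()  # oldest entry first, newest last

    def read(self, track: str) -> bool:
        """Access a track; return True on a cache hit, False on a miss."""
        if track in self.slots:
            self.slots.move_to_end(track)   # mark as most recently used
            return True
        if len(self.slots) >= self.capacity:
            self.slots.popitem(last=False)  # overwrite the least recently used slot
        self.slots[track] = True
        return False


if __name__ == "__main__":
    cache = LRUCache(capacity=2)
    for track in ("A", "B", "A", "C", "B"):
        print(track, "hit" if cache.read(track) else "miss")
```

In the usage lines, re-reading A is a hit, and bringing in C pushes out B (the least recently used track), so the final read of B is a miss.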

There are two Write Pending ceilings on the VMAX:

• System Write Pending: 75% of usable cache. When the Symmetrix is at the System
Write Pending ceiling, all new writes are delayed until the number of write pending
tracks is below this ceiling.
• Device Write Pending: Will not allow a single device to have more than 5% of the
available System Write Pending limit. If the limit is reached, a Device Write Pending
event occurs, and the new writes to the device are delayed until the Device Write
Pending is below the ceiling.
When new write requests come into the frontend faster than they can be de-staged on the
backend, the number of write pending tracks increases. As long as the number of write
pendings is less than 50% of the available cache slots, new requests are serviced on the
frontend at a higher priority than de-staging write pendings on the backend. When the
number of write pendings increases to over 50%, de-stage activity is given a higher priority.
When either the Device or System Write Pending ceiling is reached, de-staging write pendings
is given the highest priority. Effectively, hitting either Write Pending limit makes the system
respond to writes at disk speeds. Typically this happens when the backend disks are
oversubscribed; the solution is either to change the disk type or to add more disks to spread
the workload.
These ceilings are dynamic and based on the amount of global memory and device
configuration in the system. The command symcfg list -v displays these limits.
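The percentages above can be turned into a small worked calculation. The sketch below uses only the figures stated in these notes (75%, 5%, and 50%); the slot count, function names, and priority labels are hypothetical, and the real limits should always be read from the array.

```python
def write_pending_limits(usable_cache_slots: int):
    """Return (system WP limit, device WP limit) in cache slots.

    System limit: 75% of usable cache.
    Device limit: 5% of the system limit.
    Integer arithmetic is used to avoid floating point rounding.
    """
    system_limit = usable_cache_slots * 75 // 100
    device_limit = system_limit * 5 // 100
    return system_limit, device_limit


def destage_priority(system_wp: int, max_device_wp: int, usable_cache_slots: int) -> str:
    """Classify de-stage priority using the thresholds described above."""
    system_limit, device_limit = write_pending_limits(usable_cache_slots)
    if system_wp >= system_limit or max_device_wp >= device_limit:
        return "highest"    # a ceiling has been reached; new writes are delayed
    if system_wp * 2 > usable_cache_slots:
        return "elevated"   # more than 50% of slots are write pending
    return "normal"         # frontend requests take priority over de-staging


if __name__ == "__main__":
    slots = 1_000_000  # hypothetical number of usable cache slots
    print(write_pending_limits(slots))
    print(destage_priority(100_000, 1_000, slots))
    print(destage_priority(600_000, 1_000, slots))
    print(destage_priority(100_000, 37_500, slots))
```

Note that a single busy device can trigger the highest priority long before the system as a whole is near its ceiling.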

The -type memio option shows cache statistics, including Write Pendings and Prefetch
activities. In this example, taking into account the previous output of the symcfg command, we
see that we are not close to either the system or device write pending ceiling.


This diagram compares the previous architecture of the VMAX, where there were four frontend
emulations and four backend emulations with ports and CPU cores dedicated to each instance,
with that of the VMAX3. With VMAX3 each director has one instance of frontend and one instance
of backend that are serviced by a pool of CPU cores; each port can leverage the full processing
power of the pool of CPU resources. This enables a VMAX3 to better respond to bursts of activity.
Activity on a VMAX3 can be measured at both the director level and the
port level; the director level shows all IO for all ports on a director.
Frontend directors, also called channel directors, are configured in pairs, providing
redundancy and continuous availability in the event of repair or replacement.


Frontend activity on a VMAX3 can be measured at both the director and the port
level, with the director level showing all IO activity to all ports for a director.
There are queues for each device on a frontend director, and the CPU resources process IOs
from these queues. For best performance, a host should be connected through multiple
frontend directors. When configured using PowerPath or other multipathing software, IO
operations can be spread across multiple queues serviced by multiple cores and the
multithreaded frontend emulation code. It is recommended that a host
be connected to a minimum of two frontend directors; this also minimizes the impact of a
failure in any component of the IO path.


In this example we are looking at the IO activity for a director. This includes all IO for all
ports. Note: The generic director type -SA was specified. This includes all open systems
frontend directors, including Fibre Channel, iSCSI, and FCoE.


Using -type PORT shows the activity to individual ports. The key point is to balance
workload across available ports.
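One simple way to judge balance from a set of per-port samples is to compare the busiest port with the average. This is a sketch of that idea only; the port labels and IO rates are made-up sample values, and the ratio is a hypothetical indicator rather than a metric reported by symstat.

```python
from statistics import mean


def port_imbalance(io_per_sec_by_port: dict) -> float:
    """Ratio of the busiest port's IO rate to the average across ports.

    1.0 means perfectly even; larger values mean one port carries
    a disproportionate share of the load.
    """
    rates = list(io_per_sec_by_port.values())
    if not rates:
        return 0.0
    average = mean(rates)
    return max(rates) / average if average else 0.0


if __name__ == "__main__":
    # Made-up sample values for four ports.
    sample = {"port-a": 300, "port-b": 100, "port-c": 100, "port-d": 100}
    print(f"imbalance ratio: {port_imbalance(sample):.2f}")
```

A ratio well above 1.0 suggests reviewing host pathing and multipathing policy before concluding that the array is short of frontend resources.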


There are three components involved with processing an IO from the backend perspective:
• Logical Device: IOs between a host and a Logical Device that result in backend activity,
for example, read misses and new writes to tracks that are not currently in cache and
marked as write pending.
• Disk Director: IO activity to/from the physical disks controlled by the director. This
includes host IO and the protection writes to mirrored and RAID devices that result from host
IOs. On the VMAX3, the EDS emulation supports the backend director for other IO activity
that is not directly the result of an IO request. This includes FAST movement and local
replication.
• Physical Disk: All reads and writes to physical disk that are the result of host IO, RAID
protection, rebuilds, and local and remote replication.


In this example, we are looking at the total IOs from a DA director to disk. Again, the key to
best performance is to balance workload across all directors.


The frontend director detects sequentiality and dispatches a task to the backend
director to begin prefetching.
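The notes do not describe how sequentiality is detected, so the following is only a generic sketch of the idea: treat a stream as sequential once several requests in a row each start where the previous one ended. The threshold of three requests and the (start LBA, block count) request format are assumptions made for illustration.

```python
def is_sequential(requests, threshold: int = 3) -> bool:
    """Return True if `threshold` consecutive requests are contiguous.

    `requests` is a list of (start_lba, block_count) tuples in arrival order.
    The threshold is a hypothetical value, not the array's actual criterion.
    """
    run_length = 1
    for (prev_lba, prev_count), (lba, _count) in zip(requests, requests[1:]):
        if lba == prev_lba + prev_count:
            run_length += 1      # this request continues the previous one
        else:
            run_length = 1       # the run is broken; start counting again
        if run_length >= threshold:
            return True
    return False


if __name__ == "__main__":
    print(is_sequential([(0, 8), (8, 8), (16, 8)]))      # contiguous reads
    print(is_sequential([(0, 8), (100, 8), (16, 8)]))    # scattered reads
```

Once a stream is flagged in this way, reading ahead of the host becomes worthwhile because the next addresses are predictable.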


The actual prefetching is performed by the DA. On a system with low activity, you may see
more prefetching because it is done as a lower priority task.


Physical disks, especially mechanical disks, are often the slowest link in the IO chain. As we
have seen from prior discussions, ideally most IOs are serviced from cache. However, read
misses require IO to disk and this means the IO is performed at the speed of the mechanical
disk, which includes positional and rotational latency. The impact grows sharply when disks
are oversubscribed and there is queueing at the drive level. While a mechanical drive may be
able to service an IO in 10 ms, if there is queueing and there are 10 IOs ahead of it, the IO
waits about 100 ms before it is serviced, which is likely to cause unacceptable performance. The best way
to minimize the impact is to use the fastest storage for the most active data and to spread the
workload wide across as many drives as possible.


Using the -type DISK option with the symstat command displays IO requests and
throughput on a physical disk. The drive is identified by the DA director number and spindle
ID. We generally use the rule of thumb that a 15K RPM drive can perform approximately 150
IOPS while maintaining reasonable response time. For a 7.2K RPM drive, the rule of thumb is
closer to 50 IOPS.
In the example, we are seeing ~1300 IOPS to some drives. These are EFD drives, which can
easily do 10 times the IOPS of a 15K RPM mechanical disk.
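The rules of thumb above give a quick first estimate of how many mechanical drives a backend workload needs. This sketch uses only the two figures quoted in the notes (150 IOPS for 15K RPM and 50 IOPS for 7.2K RPM); the workload value and function name are hypothetical, and a real sizing exercise would also account for RAID overhead and cache hit rates.

```python
import math

# Rule-of-thumb IOPS per drive at reasonable response time, from the notes above.
RULE_OF_THUMB_IOPS = {"15K": 150, "7.2K": 50}


def drives_needed(backend_iops: int, drive_type: str) -> int:
    """Minimum number of drives needed to absorb the given backend IOPS."""
    per_drive = RULE_OF_THUMB_IOPS[drive_type]
    return math.ceil(backend_iops / per_drive)


if __name__ == "__main__":
    workload = 3000  # hypothetical backend IOPS
    for drive_type in RULE_OF_THUMB_IOPS:
        print(f"{drive_type:>5} RPM: {drives_needed(workload, drive_type)} drives")
```

The same 3000 IOPS needs three times as many 7.2K RPM drives as 15K RPM drives, which is why the most active data belongs on the fastest tier.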


While real time tools allow you to see what is happening right now, diagnosing performance
problems and capacity planning require longer term monitoring. Solutions Enabler includes
the storstpd (Symmetrix Trends and Performance) daemon, which collects both real time and
diagnostic information and makes it available to performance management tools such as
Unisphere for VMAX.
This daemon, like all Solutions Enabler daemons, is managed by the stordaemon command.


Before Unisphere for VMAX (U4V) can be used to display SLO compliance information, or any
performance information, it must register with the local storstpd daemon to collect the
necessary performance information. There are two levels of data collection: Root Cause
Analysis (diagnostic) and Real Time. Real Time collects high level KPIs at 5 second intervals and is useful
for displaying bursts of activity that may be lost in averaging in the Root Cause Analysis
collection, which by default uses a 5 minute interval but collects a much more extensive set of
performance metrics.
Generally you want to collect diagnostic information at all times, but you may enable and
disable real time collections as needed, as they do increase syscall traffic.
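
To see why a burst can be lost in averaging, the sketch below compares the peak of a series of short real time samples with the single averaged value a 5 minute collection would report for the same window. It assumes 5 second real time samples, and the workload numbers (500 IOPS steady with a 30 second burst of 5000 IOPS) are made up for illustration.

```python
SAMPLE_SECONDS = 5          # assumed real time sample interval
WINDOW_SECONDS = 5 * 60     # default Root Cause Analysis interval
SAMPLES_PER_WINDOW = WINDOW_SECONDS // SAMPLE_SECONDS


def summarize(samples):
    """Return (average, peak) for one window of real time samples."""
    return sum(samples) / len(samples), max(samples)


if __name__ == "__main__":
    # Hypothetical window: 54 quiet samples and 6 burst samples (30 seconds).
    samples = [500] * 54 + [5000] * 6
    assert len(samples) == SAMPLES_PER_WINDOW
    average, peak = summarize(samples)
    print(f"5 minute average: {average:.0f} IOPS, real time peak: {peak} IOPS")
```

The averaged value is less than a fifth of the peak, so a short burst that hurts response time can be nearly invisible in the diagnostic data.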

New with VMAX3 and Unisphere for VMAX 8.0 is compliance reporting for Service Level Objectives (SLOs).
This is typically observed at the Storage Group level, as SLOs are applied to a Storage Group.
In this example, we see that there are nine Storage Groups defined. Five of them have SLOs
defined and all are in compliance with the response time requirements. There are also four
storage groups using the default SLO of Optimize.
This view also shows the general approach with Unisphere for VMAX of starting with a high
level view and drilling down as appropriate.
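
The idea of checking compliance per Storage Group can be sketched as follows. This is a simplified model, not how Unisphere computes compliance: it assumes each SLO reduces to a single response time target, and the group names, targets, and observed values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StorageGroup:
    name: str
    observed_rt_ms: float
    # None models a group left on the default SLO with no explicit target.
    target_rt_ms: Optional[float] = None


def compliance(group):
    """Classify a storage group against its (assumed) response time target."""
    if group.target_rt_ms is None:
        return "default SLO"
    if group.observed_rt_ms <= group.target_rt_ms:
        return "compliant"
    return "out of compliance"


if __name__ == "__main__":
    groups = [
        StorageGroup("oltp_sg", observed_rt_ms=0.9, target_rt_ms=1.0),
        StorageGroup("dw_sg", observed_rt_ms=9.5, target_rt_ms=8.0),
        StorageGroup("test_sg", observed_rt_ms=4.2),
    ]
    for group in groups:
        print(group.name, compliance(group))
```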

The performance views are organized into three levels:

Monitor: High level information that can help determine if the system is working
optimally. Within this view there is heatmap, summary, and dashboard level information.
Analysis: Provides greater detail than the Monitor views and allows viewing of more
Key Performance Indicators (KPIs).
Charts: Allows a user to create custom charts of KPIs.

The illustration here is a view of the heatmap. This chart is based on component utilization
with a color coded representation of the major subsystems, with the color representing the
utilization; red indicates 100% utilization. This view is a good starting point when evaluating a
system to see the distribution of workload and if any component is oversubscribed. The hover
capability shows the details of the components; as we can see in this example, an FA director
is red and the hover shows that it is over 85% utilized.

This is the Monitor Summary view; it provides more detail about the specific workload. While
this specific view is for Storage Groups, you can see from the dropdown that there is similar
summary level information for other components.
If there is information of interest, from this view you can drill down to the Analysis level for
more details.

Dashboards present information similar to the summary view, but in graphical
form. As with the summary view, if there is information of interest you can drill down to the
Analysis level for more details by clicking the Navigate to Analyze button.

Under Analysis, there are three levels of detail:

Real Time: Updated at 5 second intervals and reflects the last hour of activity. The
default behavior is to overwrite older data. If the information is of interest, it can be saved
as a real time trace for later analysis. It is also possible to schedule the capture of a real
time trace for some time in the future.
Root Cause Analysis: Data is averaged and updated at 5 minute intervals by default. It
allows up to 24 hours to be displayed in one view.
Trending and Planning: Data is averaged and reflects a minimum of 24 hours of
activity but can display up to 12 months of activity. This is ideal for understanding trends
for long term planning.

The example here is of Root Cause Analysis data and, as you can see from the dropdown, it
can be displayed for different system components.
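
The relationship between the three levels and the time window being examined can be sketched as a simple lookup. The intervals and window limits are the ones stated above. The function names are hypothetical, and the choice of level is a simplification of what an analyst would actually weigh.

```python
def pick_level(window_hours):
    """Suggest the Analysis level whose time span covers the requested window."""
    if window_hours <= 1:
        return "Real Time"
    if window_hours <= 24:
        return "Root Cause Analysis"
    return "Trending and Planning"


def points_in_view(interval_seconds, window_hours):
    """Number of data points collected over a window at a given interval."""
    return (window_hours * 3600) // interval_seconds


if __name__ == "__main__":
    # Real Time: 5 second interval over the last hour.
    print(pick_level(1), points_in_view(5, 1))
    # Root Cause Analysis: 5 minute interval over 24 hours.
    print(pick_level(24), points_in_view(300, 24))
    print(pick_level(24 * 30))
```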

Charts allow a user to create their own charts using real time, diagnostic, or historical
information. Simply select the object to be analyzed, select one or more metrics, and click the
chart button to build a chart that can be displayed, saved as a dashboard, or exported
to be used elsewhere.

There are a number of tools available for generating workloads. For this class we will be using
Iometer, an open source tool that is widely used in Windows environments. It allows a user to
specify the characteristics of a workload and the target devices to send it to, and graphically
displays the results.

While Iometer can be run in a cluster environment with multiple dynamos on different servers, for
our lab exercises we will run a local dynamo and multiple workers. The first step is to select the
targets. For most tests, we want to use the raw device. If the device has a partition or has
been initialized as a dynamic disk, it will not show in the list.

Next we create an Access Specification. This defines the block size, the level of
sequentiality, and the read/write percentage. An Access Specification can be assigned a name
and reused.
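
To make the three parameters concrete, the sketch below models an Access Specification as a small validated record. This is not Iometer's configuration file format. The class and field names are hypothetical, and the 8 KB, 100% random, 70% read example is an arbitrary illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessSpec:
    name: str
    block_size_bytes: int
    percent_random: int   # 0 = fully sequential, 100 = fully random
    percent_read: int     # the remainder is writes

    def __post_init__(self):
        if self.block_size_bytes <= 0:
            raise ValueError("block size must be positive")
        for value in (self.percent_random, self.percent_read):
            if not 0 <= value <= 100:
                raise ValueError("percentages must be between 0 and 100")

    @property
    def percent_write(self):
        return 100 - self.percent_read


if __name__ == "__main__":
    spec = AccessSpec("8K random 70/30", 8 * 1024, percent_random=100, percent_read=70)
    print(spec.name, spec.block_size_bytes, spec.percent_write)
```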

The last setup step is to add an Access Specification to the workers.

To start the test, click the green start flag. When prompted, specify a location to save the
results file. The results can be displayed as an average since the start of test or since the last
update. The update interval can be set as low as 1 second.
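
The difference between the two display modes can be sketched with a running total of completed IOs. The code assumes a cumulative IO counter is read at every update. The function names and counter values are hypothetical and are not taken from Iometer.

```python
def interval_iops(cumulative_ios, update_seconds):
    """IOPS for each update interval (the 'since last update' view)."""
    pairs = zip(cumulative_ios, cumulative_ios[1:])
    return [(later - earlier) / update_seconds for earlier, later in pairs]


def average_since_start(cumulative_ios, update_seconds):
    """Average IOPS over the whole test (the 'since start of test' view)."""
    elapsed = (len(cumulative_ios) - 1) * update_seconds
    return (cumulative_ios[-1] - cumulative_ios[0]) / elapsed


if __name__ == "__main__":
    # Hypothetical counter readings taken once per second.
    totals = [0, 1000, 2000, 6000]
    print(interval_iops(totals, 1))
    print(average_since_start(totals, 1))
```

The per-update view reacts immediately to a change in workload, while the average since the start of test smooths it out, so the two can differ noticeably after a shift in activity.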

SPEED refers to a tool both for maintaining and distributing performance information. There
are separate SPEED groups and qualification processes for different platforms.
Some of the performance information is highly EMC confidential and only available to SPEED
Gurus. A Guru is someone who has demonstrated basic knowledge and competencies in
performance by passing a qualification exam and has agreed to the covenants of the program.
This includes respecting the confidentiality of the information and tools and only sharing as
appropriate. It is also understood as part of the program that you will contribute to the
community and help others with performance issues whenever possible.
The three-day Engineering Education Symmetrix Internals: Performance class, along with
self-study materials, will prepare you to successfully complete the qualification exam.
Cindy O'Toole manages the program.

In this class, we introduced some key concepts and provided insight into several tools that
can be used to understand the relationship between workload characteristics and the
performance of the VMAX3 array.
