
HPE Master ASE Server Solutions eBook (Exam HPE0-S22)
First Edition

HPE Master ASE - Advanced Server Solutions Architect V3
Official Certification Study Guide
Miriam Allred

© 2016 Hewlett Packard Enterprise Development LP.

Published by:
Hewlett Packard Enterprise Press
660 4th Street, #802
San Francisco, CA 94107
All rights reserved. No part of this book may be reproduced or transmitted in any form or by any
means, electronic or mechanical, including photocopying, recording, or by any information storage
and retrieval system, without written permission from the publisher, except for the inclusion of brief
quotations in a review.

ISBN: 978-1-942741-33-6

WARNING AND DISCLAIMER

This book provides information about the topics covered in the HPE Master ASE - Advanced Server
Solutions Architect V3 certification exam (HPE0-S22). Every effort has been made to make this book
as complete and as accurate as possible, but no warranty or fitness is implied.

The information is provided on an “as is” basis. The author, and Hewlett Packard Enterprise Press,
shall have neither liability nor responsibility to any person or entity with respect to any loss or
damages arising from the information contained in this book or from the use of the discs or
programs that may accompany it.

The opinions expressed in this book belong to the author and are not necessarily those of Hewlett
Packard Enterprise Press.

TRADEMARK ACKNOWLEDGEMENTS

All third-party trademarks contained herein are the property of their respective owner(s).

GOVERNMENT AND EDUCATION SALES

The publisher offers discounts on this book when ordered in quantity for bulk purchases, which may
include electronic versions. For more information, please contact U.S. Government and Education
Sales at 1-855-447-2665 or email sales@hpepressbooks.com.

Feedback Information

At HPE Press, our goal is to create in-depth reference books of the best quality and value. Each book
is crafted with care and precision, undergoing rigorous development that involves the expertise of
members from the professional technical community.
Readers’ feedback is a continuation of the process. If you have any comments regarding how we
could improve the quality of this book, or otherwise alter it to better suit your needs, you can contact
us through email at hpepress@epac.com. Please make sure to include the book title and ISBN in your
message.

We appreciate your feedback.

Publisher: Hewlett Packard Enterprise Press

HPE Contributors: Jim Robinson, Chris Powell, Chris Bradley, Jeff Holderfield, Andrew Leber,
Brian Beneda

HPE Press Program Manager: Michael Bishop


About the Author
Miriam Allred has spent the last ten years configuring, testing, and troubleshooting HPE wired and
wireless networks. She also has extensive knowledge of servers, storage, and cloud technologies.
Miriam combines this wide range of technical expertise with pedagogy and instructional design
training, allowing her to create technical training courses for both advanced and entry-level
networking professionals. Miriam Allred has a master’s degree from Cleveland State University and a
bachelor’s degree from Brigham Young University.

Introduction
Based on the Architecting Advanced HPE Server Solutions course, this self-study guide helps you
prepare for the HPE Master ASE - Advanced Server Solutions Architect V3 certification exam
(HPE0-S22). This certification validates that you can design, differentiate, and deploy advanced enterprise
server solutions, including HPE Integrity, Apollo, and Moonshot servers. Additionally, this
certification validates your ability to design and demonstrate the best solution based on customers’
technical, financial, and business needs.

Certification and Learning


Hewlett Packard Enterprise Partner Ready Certification and Learning provides end-to-end continuous
learning programs and professional certifications that can help you open doors and succeed in the
New Style of Business. We provide continuous learning activities and job-role based learning plans to
help you keep pace with the demands of the dynamic, fast paced IT industry; professional sales and
technical training and certifications to give you the critical skills needed to design, manage and
implement the most sought-after IT disciplines; and training to help you navigate and seize
opportunities within the top IT transformation areas that enable business advantage today.

As a Partner Ready Certification and Learning certified member, your skills, knowledge, and real-
world experience are recognized and valued in the marketplace. To continue your professional and
career growth, you have access to our large HPE community of world-class IT professionals, trend-
makers and decision-makers. Share ideas, best practices, business insights, and challenges as you gain
professional connections globally.

To learn more about HPE Partner Ready Certification and Learning certifications and continuous
learning programs, please visit http://certification-learning.hpe.com

Audience
This book is designed for consultants, sales engineers, and presales technical architects who
recommend, design, and demonstrate HPE server solutions for large-scale, more complex, or
mission-critical scenarios.

Assumed Knowledge
To achieve the HPE Master ASE - Advanced Server Solutions Architect V3 certification, it is assumed
that you have a minimum of six years’ experience with architecting HPE server solutions. Candidates
are expected to have advanced level industry-standard technology knowledge and business acumen
from training and hands-on experience.

Relevant Certifications
After you pass the exam, your achievement may be applicable toward more than one certification. To
determine which certifications can be credited with this achievement, log in to The Learning Center
and view the certifications listed on the exam’s More Details tab. You might be on your way to
achieving additional certifications.

Preparing for Exam HPE0-S22


This self-study guide does not guarantee that you will have all the knowledge you need to pass the
exam. It is expected that you will also draw on real-world experience and would benefit from
completing the hands-on lab activities provided in the instructor-led training.

Recommended HPE Training


Recommended training to prepare for each exam is accessible from the exam’s page in The Learning
Center. See the exam attachment, “Supporting courses,” to view and register for the courses.

Obtain Hands-on Experience


To pass the exam, Hewlett Packard Enterprise strongly recommends a combination of training,
thorough review of additional study references, and sufficient on-the-job experience.

Exam Registration
To register for an exam, go to

http://certification-learning.hpe.com/tr/certification/learn_more_about_exams.html
Chapter 1 Recognize Industry Trends

EXAM OBJECTIVES
• Describe the trends affecting enterprises and explain how these trends lead to the four
Transformation Areas.
• Describe the key business challenges enterprises are facing.
• Review the role of a server architect, emphasizing how the architect helps companies.
• Provide an overview of the HPE enterprise server solutions covered in this ebook:
✓ Apollo solutions
✓ Moonshot
✓ Integrity Superdome X

Assumed knowledge
Before reading this chapter, you should meet the following criteria:
• Knowledge of server components, including processors, DDR3 and DDR4 memory, hard disk drives
(HDDs), solid-state drives (SSDs), and RAID levels for storage volumes
• Experience with HPE ProLiant rack and blade servers and options for them such as HPE Smart
Array Controllers
• Knowledge of HPE BladeSystems including interconnect modules and Virtual Connect (VC)
modules
• Experience managing and maintaining servers including iLO, Intelligent Provisioning, UEFI, HPE
Insight Remote Support, HPE Insight Online, HPE Smart Update Manager (SUM), and HPE Insight
Control server provisioning (ICsp)
• Familiarity with HPE OneView capabilities
Chapter topics
In this chapter, you will first briefly review the HPE Server Certification paths. Next, you will look at
the major trends facing the IT industry and how HPE Transformation Areas address these changes.
You will then learn about the HPE software-defined data center (SDDC) and learn about the HPE
server solutions that are covered in this ebook.

HPE Server Certification overview


This section outlines the HPE Server Certification, focusing on how this ebook fits within that
certification and what you will gain as an HPE architect.

HPE Server Certification Paths Overview

Figure 1-1 HPE Server Certification Paths Overview

The information in this ebook is designed for architects and integrators following the Server
Solutions Architect path, shown in Figure 1-1. The ideal candidate should have enterprise-level server
architecture and design expertise and be interested in gaining a solid understanding of HPE
Superdome X, Apollo 6000, Moonshot, DL7XX, and underlying technologies through the ebook
activities and validation of their skills through examination. After passing the exam associated with
this ebook (HPE0-S22), you will be certified as a Master Accredited Solutions Expert.

Keep in mind that although the certification exam is associated with this ebook, the exam also tests
you on your mastery of prerequisite training and HPE OneView—as much as 10%–20% of the items
might be on these subjects.
What you will gain from this ebook as an HPE architect
In this ebook, you will learn how to become a trusted adviser for your customers. This chapter will
introduce you to new trends in IT that have become a vital part of almost every company’s day-to-day
operations, as well as a revenue generator. By understanding the key ways that customers need to
transform to prosper in the new idea economy, you will be able to design HPE server solutions that
better meet customers’ needs.

The rest of this ebook guides you through architecting HPE Apollo, Moonshot, and Integrity
Superdome X solutions, teaching you how to design solutions based on customer business
requirements. It also helps you understand how to present the benefits of these solutions to customers
—maximizing the opportunity for the customer to accept your proposal.

HPE Transformation Areas for the new idea economy


In this section, you will learn about the pressures placed on today’s businesses and how the HPE
Transformation Areas address these concerns.

The idea economy is here

Figure 1-2 The idea economy is here

Ideas have always fueled business success. Ideas have built companies, markets, and industries.
However, there is a difference today.

As you see in Figure 1-2, businesses operate in the idea economy, which is also called the digital,
application, or mobile economy. Doing business in the idea economy means turning an idea into a
new product, capability, business, or industry. This has never been easier or more accessible—for
you and for your competitors.
Today, an entrepreneur with a good idea has access to the infrastructure and resources that a
traditional Fortune 1000 company would have. That entrepreneur can rent compute capacity on
demand, implement a software-as-a-service enterprise resource planning system, use PayPal or
Square for transactions, market products and services using Facebook or Google, and have FedEx or
UPS run the supply chain.

Companies such as Vimeo, One Kings Lane, Dock to Dish, Uber, Pandora, Salesforce, and Airbnb
used their ideas to change the world with very little start-up capital. Uber had a dramatic impact after
launching its application connecting riders and drivers in 2009. Three years after its founding, the
company expanded internationally. Without owning a single car, Uber now serves more than 300
cities in 58 countries (as of May 28, 2015). The company has disrupted the taxi industry; San
Francisco Municipal Transportation Agency reported that cab use in San Francisco has dropped 65%
in two years.

In a technology-driven world, it takes more than just ideas to be successful, however. Success is
defined by how quickly ideas can be turned into value.

Creating disruptive waves of new demands and opportunities

Figure 1-3 Creating disruptive waves of new demands and opportunities

Figure 1-3 illustrates how the idea economy presents an opportunity and a challenge for most
enterprises. On the one hand, cloud, mobile, big data, and analytics give businesses the tools to
accelerate time to value. This increased speed allows organizations to combine applications and data
to create dramatically new experiences, even new markets.

On the other hand, most organizations were built with rigid IT infrastructures that are costly to
maintain. This rigidity makes it difficult, if not impossible, to implement new ideas quickly.

Creating and delivering new business models, solutions, and experiences require harnessing new
types of applications, data, and risks. It also requires implementing new ways to build, operate, and
consume technology. This new way of doing business no longer just supports the company—it
becomes the core of the company.
IT must become a value creator that bridges the old and the new

Figure 1-4 IT must become a value creator that bridges the old and the new

To respond to the disruptions created by the idea economy, IT must transform from a cost center to a
value creator, as shown in Figure 1-4. In order to evolve, IT must shift focus
• From efficiently hosting workloads and services to continuously creating and delivering new
services
• From simply providing hardened systems and networks to proactively managing and mitigating
risks
• From just storing and managing data to providing real-time insight and understanding
• From using software to automate business systems to differentiating products and services

Customers need to make IT environments more efficient, productive, and secure as they transition to
the idea economy. They need to enable their organizations to act rapidly on ideas by creating,
consuming, and reconfiguring new solutions, experiences, and business models.

One of the first steps in achieving this kind of agility is to break down the old infrastructure silos that
make enterprises resistant to new ideas internally and vulnerable to new ideas externally. Designing
compelling new experiences and services does not work if the infrastructure cannot support them.

The right compute platform can make a significant impact on business outcomes and performance.
Examples include storage that “thinks” as much as it stores, networking that moves information faster
and more securely than ever before, and orchestration and management software that provides
predictive capabilities.

Each company is on a unique journey to the cloud, custom-made for the way it consumes and
allocates resources, transforms to the changing landscape, implements financial models, and achieves
desired outcomes.

This unique journey starts with four transformation areas


Figure 1-5 This unique journey starts with four transformation areas

This unique journey starts with four transformation areas, shown in Figure 1-5. The HPE
Transformation Areas are designed to
• Generate revenue and profitable growth
• Increase agility and flexibility
• Deliver remarkable customer experience
• Amplify employee productivity
• Reduce cost and risk

These transformation areas reflect what customers consider most important:


• Transforming to a hybrid infrastructure—A hybrid infrastructure enables customers to get
better value from the existing infrastructure and delivers new value quickly and continuously from
all applications. This infrastructure should be agile, workload optimized, simple, and intuitive.
• Protecting the digital enterprise—Customers consider it a matter of when, not if, their digital
walls will be attacked. The threat landscape is wider and more diverse than ever before. A complete
risk management strategy involves security threats, backup and recovery, high availability, and
disaster recovery.
• Empowering the data-driven organization—Customers are overwhelmed with data; the solution
is to obtain value from information that exists. Data-driven organizations generate real-time,
actionable insights.
• Enabling workplace productivity—Many customers are increasingly focused on enabling
workplace productivity. Delivering a great digital workplace experience to employees and
customers is a critical step.

Transform to a hybrid infrastructure


Figure 1-6 Transform to a hybrid infrastructure

An organization might see cloud services as a key component to access the IT services they need, at
the right time and the right cost. A hybrid infrastructure is based on open standards, is built on a
common architecture with unified management and security, and enables service portability across
deployment models. Getting the most out of hybrid infrastructure opportunities requires planning
performance, security, control, and availability strategies (Figure 1-6). For this reason, organizations
must understand where and how a hybrid infrastructure strategy can most effectively be applied to
their portfolio of services.

The Hewlett Packard Enterprise perspective on hybrid infrastructure


Customers struggle with rigid infrastructures and need to transform to an agile, hybrid infrastructure
that generates business value. Based on extensive research, Hewlett Packard Enterprise has defined a
strategy for helping customers in this transformation.

The following paragraphs describe the HPE point of view.

Open matters

Companies are transforming to a hybrid infrastructure because they need flexibility and agility. HPE
will help them to avoid vendor lock-in so that they can maximize flexibility in the future.

Expertise matters

Hewlett Packard Enterprise has decades of experience helping customers design their data centers and
get the most out of their IT infrastructure.

In addition, a majority of companies are seeking help in moving to the cloud. Hewlett Packard
Enterprise design services help companies to obtain the private or hybrid cloud solution that adapts to
their needs.
Control matters

IT needs to become a service provider for line of business (LOB). HPE provides converged,
software-based tools that help in this endeavor. They bring the entire infrastructure under control,
automating provisioning and management as much as possible. In this way, provisioning times
decrease from weeks to minutes.

Infrastructure matters

Every workload is unique. The HPE converged, software-defined hybrid architecture lets companies
optimize for the needs of each workload. Balancing the needs of the particular use case, the company
can tune efficiency, availability, and performance to the right levels.

Business continuity matters

While eager to obtain the promised agility and efficiency of cloud, CIOs are concerned about the
integrity of their data. HPE designs its solutions to protect companies’ business information, whether
hosted on- or off-premises, from external threats. HPE solutions also protect companies from the
inherent risks of lost or improperly managed data that occur with rapid data growth.

Protect the digital enterprise

Figure 1-7 Protect the digital enterprise

Protecting a digital enterprise requires alignment with key IT and business decision makers for a
business-aligned, integrated, and proactive strategy to protect the hybrid IT infrastructure and data-
driven operations, as well as enable workplace productivity (Figure 1-7). By focusing on security as a
business enabler, HPE brings new perspectives on how an organization can transform from
traditional, static security practices to intelligent, adaptive security models to keep pace with business
dynamics.

HPE solutions help customers protect their data in a variety of ways. HPE StoreOnce delivers simple
and secure data backup and recovery for the entire enterprise. HPE ProLiant Gen9 servers support
options such as UEFI Secure Boot to prevent untrusted code and potential malware from booting. This ebook,
though, focuses on the capabilities of HPE Integrity Superdome X systems in preventing unplanned
downtime or data loss for mission critical workloads—you will learn more about these systems later
in this chapter and throughout this ebook.

Empower the data-driven organization

Figure 1-8 Empower the data-driven organization

A data-driven organization leverages valuable feedback that is available consistently from both
internal and external sources (Figure 1-8). By harnessing insights from data in the form of
information, organizations can determine the best strategies to pave the way for seamless integration
of agile capabilities into an existing environment. Because both technical and organizational needs
must be considered, HPE helps organizations define the right ways to help ensure that processes,
security, tools, and overall collaboration are addressed properly for successful outcomes.

Later in this ebook, you will learn how HPE server solutions provide the ideal infrastructure for a
variety of data workloads.

Enable workplace productivity


Figure 1-9 Enable workplace productivity

Organizations seeking to improve their efficiency and speed place a premium on creating a desirable
work environment for their employees, including offering technology employees want and need.
They believe they must enable employees to work how, where, and whenever they want.

HPE solutions for the workplace provide secure, easy, mobile collaboration and anywhere, anytime
access to data and applications for better productivity and responsiveness (Figure 1-9). According to
2014 IDC survey results, spending growth for mobile resources is expected to be twice the level of IT
spending growth in general.

Later in this ebook, you will learn about the HPE server solutions that are purpose-built for
supporting the application and desktop delivery solutions that enhance employee productivity.

The HPE software-defined data center


Next, you will look at the HPE software-defined data center, which is abbreviated as SDDC.

The HPE software-defined data center


Figure 1-10 The HPE software-defined data center

SDDC is a concept in which the infrastructure of an organization’s data center extends the use of
virtualization technology by abstracting, pooling, and automating all of the physical data center
resources. A basic business definition of the term is “systems and procedures used in a manner that
enable infrastructure resources to be controlled at the software level in response to changing business
conditions.”

Currently, the most typical response to changing business conditions is to burst out to additional
virtual machines (VMs) using the hybrid cloud model. This is a useful step, but is only a one-
dimensional response to business conditions. What if the network conditions change, the storage
requirements change, or both? That is why a business needs to progress toward an SDDC, where
computing resources can be more fully adapted and can conform to the changing characteristics of
business activities.

Implementing an SDDC in effect amounts to delivering an IT as a Service (ITaaS) solution, illustrated
in Figure 1-10. In an SDDC, the various elements of the infrastructure (which include network,
storage, compute, and security resources) are virtualized and delivered as a service. Although ITaaS
might represent an outcome of an SDDC, the focus of the SDDC solution is more for the benefit of
the data center architects and IT staff, instead of the users or the consumers of the resources. Software
abstraction in the data center infrastructure is not visible to the consumers.

An SDDC can take the form of various potential implementation scenarios being offered by vendors.
Consequently, some critics see the SDDC as an evolving marketing tool, whereas proponents expect
that software will define data centers of the future and so they accept that the SDDC is a work in
progress.

An SDDC encompasses many concepts and data center infrastructure components where each
component can be provisioned, operated, and managed through a programmatic user interface (a
brief sketch of this idea follows the list below). The core architectural components that comprise a
given vendor’s SDDC solution might include the following:
• Compute virtualization, which is a software implementation of a computer’s processor, memory,
and I/O resources. This is, of course, commonly referred to as hypervisor software.
• Software-defined networking (SDN) or network virtualization. This might involve provisioning
VLANs on a switch, Ethernet ports operating as a single or aggregated link, ports supporting
access or VLAN trunking, security settings, and so forth.
• Software-defined storage or storage virtualization. This might involve provisioning storage
LUNs on a storage array and HBA zoning on a SAN switch.
• Management and automation software that enables an administrator to provision, control, and
manage all SDDC components.
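
To make the idea of programmatic control concrete, the short Python sketch below shows how a single workload request might drive compute, network, and storage provisioning through one interface. Every class and method name in it is a hypothetical, vendor-neutral illustration of the concept, not the API of any real SDDC product; the objects simply print what a real implementation would provision.

# Hypothetical stand-ins for an SDDC control layer; they only print what a
# real implementation would actually provision.

class Compute:
    def create_vm(self, name, vcpus, memory_gb):
        print(f"compute: created VM {name} ({vcpus} vCPUs, {memory_gb} GB RAM)")
        return name

class Network:
    def create_vlan(self, vlan_id):
        print(f"network: provisioned VLAN {vlan_id}")
        return vlan_id

    def attach(self, vm, vlan_id):
        print(f"network: attached {vm} to VLAN {vlan_id}")

class Storage:
    def create_lun(self, size_gb):
        print(f"storage: provisioned {size_gb} GB LUN and zoned it on the SAN")
        return size_gb

    def attach(self, vm, size_gb):
        print(f"storage: presented {size_gb} GB LUN to {vm}")

def provision_workload(compute, network, storage, name, vcpus, memory_gb, vlan_id, lun_gb):
    """One request drives all three resource types instead of three manual workflows."""
    vm = compute.create_vm(name, vcpus, memory_gb)
    vlan = network.create_vlan(vlan_id)
    lun = storage.create_lun(lun_gb)
    network.attach(vm, vlan)
    storage.attach(vm, lun)
    return vm

provision_workload(Compute(), Network(), Storage(),
                   name="web-01", vcpus=4, memory_gb=16, vlan_id=110, lun_gb=500)

The point of the sketch is the shape of the workflow: the management and automation layer exposes all of the component types behind one programmatic surface, so a change in business conditions can adjust compute, network, and storage together rather than one silo at a time.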

An SDDC is not the same thing as a private cloud because a private cloud only has to offer a virtual
machine self-service solution. Within the private cloud, the IT administrators could use traditional
provisioning and management interfaces. The SDDC instead envisions a data center that could
potentially support private, public, and hybrid cloud offerings.

Some of the commonly cited benefits of an SDDC include improved efficiencies by extending
virtualization across all resources, increased agility to provision resources for business applications
more quickly, improved control over application availability and security through policy-based
definitions, and the flexibility to run new and existing applications in multiple platforms and clouds.

In addition, an SDDC implementation could further reduce a company’s energy usage by enabling
servers and other data center hardware to run at decreased power levels or be turned on and off as
needed. The SDDC is also likely to further reduce the costs for data center hardware and challenge
traditional hardware vendors to develop new ways to differentiate their products through software
and services.

In summary, further acceleration of access to data center resources will require new control options,
which suggests software-defined solutions will be needed to accomplish such objectives.

Role of IT in an SDDC

Figure 1-11 Role of IT in an SDDC

The HPE journey toward a comprehensive SDDC solution must address the role of IT, where IT
transitions from operating primarily as a cost center to a business value center. The table in Figure 1-
11 lists some of the typical objectives of a traditional IT organization operating as a cost center, as
opposed to the objectives more typical of an IT organization evolving into a business value center.

For example, this IT transition needs to address:


• Supporting hybrid cloud operations instead of the strictly traditional on-premises scenario
• Meeting expectations where the data center can be used for developing integrated applications and
workflows supporting software as a service (SaaS), as opposed to being used primarily to deploy
commercial off-the-shelf (COTS) applications
• Putting infrastructure in place in a matter of hours instead of weeks
• Enabling business projects to be completed in 3–6 months instead of 9–12 months
• Determining success based on key performance indicators (KPIs) instead of more basic IT
operational metrics

HPE envisions policy-based automation using open architectures as a key underpinning to an SDDC
solution.

HPE SDDC architecture

Figure 1-12 HPE SDDC architecture

The HPE architecture for the SDDC can be viewed as consisting of three major layers, which are
shown in Figure 1-12:
• Application—This layer is a next-generation applications platform supporting business
applications and their related infrastructure applications.
• Control—This layer provides control functions at the IT administrator, LOB, and application levels.
The control layer implements the software-defined abstractions or constructs that map to the
infrastructure resources needed to support application and service requests.
• Infrastructure—The infrastructure layer presents a unified physical and virtual view. This layer
supports open, standards-based, programmatic access to the underlying disparate physical and
virtual infrastructure resources (compute, storage, and networking) and hardware platforms.

Collectively, these three layers in the HPE SDDC architecture unify the key functions of the IT
organization: operations, security, governance, and business processes.

HPE SDDC infrastructure layer

Figure 1-13 HPE SDDC infrastructure layer

In the HPE SDDC architecture, the infrastructure layer is responsible for provisioning, managing,
and supporting the relationship between the physical and virtual resources of the IT infrastructure, as
shown in Figure 1-13. The virtual infrastructure serves as an overlay upon the underlying physical
hardware components.

The underlying physical infrastructure is referred to as an underlay. For each of the major
components of the physical infrastructure (compute, storage, networking, security, and facilities),
there is a corresponding abstracted element: vCompute, vStorage, vNetworking, vSecurity, and
vFunctions.

Programmatic control and infrastructure management are tightly linked, and in some cases, the same
tools can be used manually as well as through an application programming interface (API). HPE
OneView is one example—each action that can be performed by an IT administrator through the
graphical user interface (GUI) can also be done through the Representational State Transfer (REST)
API. This allows HPE OneView to be part of a toolset for the IT administrator to use, or part of the
programmed actions initiated by control panel applications.
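
As a hedged illustration of that parity between the GUI and the REST API, the Python sketch below authenticates to an HPE OneView appliance and lists the managed server hardware. The appliance address, credentials, and X-API-Version value are placeholders, and the endpoint paths and response fields shown here should be verified against the REST API reference for your OneView release.

import requests

APPLIANCE = "https://oneview.example.local"   # placeholder appliance address
headers = {"X-API-Version": "200", "Content-Type": "application/json"}

# Authenticate and obtain a session token (verify=False only because many lab
# appliances use self-signed certificates; use proper certificates in production).
login = requests.post(f"{APPLIANCE}/rest/login-sessions",
                      json={"userName": "administrator", "password": "<password>"},
                      headers=headers, verify=False)
login.raise_for_status()
headers["Auth"] = login.json()["sessionID"]

# List the managed server hardware, the same inventory an administrator sees in the GUI.
servers = requests.get(f"{APPLIANCE}/rest/server-hardware", headers=headers, verify=False)
servers.raise_for_status()
for member in servers.json().get("members", []):
    print(member.get("name"), member.get("powerState"))

Because the same operations are reachable through the API, a control-layer application can invoke them as part of automated workflows; HPE also publishes SDKs and language bindings for the OneView REST API that wrap calls like these.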

HPE OneView also provides management and analysis connections to power, cooling, and facilities
management utilities. This can help ensure that changing requirements of the infrastructure resources
do not get ahead of the associated facilities, power, and cooling support needs. For example, this
helps to avoid situations where the IT group moves all the web traffic to one section of the data
center, but forgets to adjust the power and cooling for that area of the facility.

This ebook focuses mainly on designing the infrastructure layer, as well as server solutions that
support transformation to an SDDC. You will also learn how those solutions integrate with higher
layers.

Designing server solutions to help customers transform the business
This section introduces you to the HPE server solutions that you will learn about in this ebook.

HPE Apollo 2000

Figure 1-14 HPE Apollo 2000

The HPE Apollo 2000 System, shown in Figure 1-14, is the enterprise bridge to scale-out
architecture. As IT strives to be a value creator, even the most conservative enterprise customers are
looking for ways to save space and become more efficient. The Apollo 2000 delivers twice the
density of traditional rack mount systems and the efficiency of a shared infrastructure; however, it
maintains a familiar form factor, the same racks, cabling, serviceability access, operations, and
system management. There is no retraining of personnel or cost of change for introducing efficient,
space saving, scale-out architecture.

The Apollo 2000 System brings HPE ProLiant Gen9 server technology, including iLO4, into this 2U,
multi-server chassis. The HPE ProLiant XL170r Gen9 Server and the HPE ProLiant XL190r Gen9
Server offer more configuration choices that cover a much wider range of scale-out workloads.
Storage and I/O flexibility enable customers to optimize for performance or economy—the right
compute for the right workload.

The r2800 chassis with 24 SFF drives allows customers to choose how they allocate the hard drives
across the server nodes. Up to four expansion slots in the XL190r support accelerators or other full-
size cards. And you can mix and match trays to build a unique solution or partially populate, leaving
room for growth in the future.

HPE Apollo 4000 Family


Figure 1-15 HPE Apollo 4000 Family

Figure 1-15 introduces the Apollo 4000 Gen9 family of products, targeting big data solutions.
Starting from the Apollo 4200 Gen9 server and moving up to the Apollo 4510 Gen9 Server, this
server family handles data-intensive workloads that range from Hadoop analytics to object storage.

Enterprise and SME customers who want to start or grow big data solutions with purpose-built,
density-optimized infrastructure that is ready to scale will find exactly what they need in the HPE
Apollo 4200 Gen9 Server. This new system is ideal for customers wanting to deploy smaller object
storage systems; Hadoop and NoSQL-based big data analytics solutions; and smaller, data-intensive,
high-performance computing clusters.

The HPE Apollo 4510 System is purpose-built for object storage solutions. Customers can deploy
cost-effective, HPE Apollo 4510 Systems optimized to meet the needs of their object storage solution
requirements at any scale. HPE Apollo 4510 Systems can be configured to form the foundation
platform for the whole variety of big data object storage solutions—from cost-effective, high-
capacity content repositories that address petabyte-scale data volumes, to the tuned responsiveness
required for content distribution systems.

The HPE Apollo 4530 System is purpose-built for big data analytics. It can be configured to optimally
match technology requirements for economical large-scale, Hadoop-based data analytics or it can be
configured for more complex, compute-intensive analytics with high-performance processors.

HPE Apollo 6000


Figure 1-16 HPE Apollo 6000

The HPE Apollo 6000 System, shown in Figure 1-16, delivers 4x better performance per dollar per
watt than a competing blade, using 60% less floor space. From the beginning, HPE designed this
platform for scalability and efficiency at rack-scale, delivering a total cost of ownership (TCO)
savings of U.S. $3M per 1000 servers over 3 years.

The Apollo 6000 provides the flexibility to tailor the system to precisely meet the needs of your
customers’ workloads. They can scale by chassis or rack with a single modular infrastructure, an
external power shelf that dynamically allocates power to help maximize rack-level energy efficiency,
and easy management. The system supports up to 160 1P servers or 80 2P servers per 48U rack with
8 chassis. You will look at the various compute options, optimized for various HPC
workloads, later in this ebook.

Efficiency at rack scale is fueled by HPE’s unique external power shelf, dynamically allocating power
to help maximize rack-level energy efficiency while providing the right amount of redundancy for
your customers.

HPE Apollo 8000


Figure 1-17 HPE Apollo 8000

For large compute problems, such as predicting agricultural parameters for optimal crop growth or
finding a medical cure, researchers are excited about the new HPE Apollo 8000 System (shown in
Figure 1-17), fueling ground-breaking research in science and engineering with HPE’s leading-edge
technology.

The HPE Apollo 8000 System reaches new heights of performance density, with 144 teraflops/rack.
That’s up to 4x the teraflops per square foot and up to 40% more FLOPS/watt than comparable air-
cooled servers. In fact, the environmental advantages of the HPE Apollo 8000 System can be taken
one step further by leveraging the water used to cool the solution to heat your customers’ facilities—
which National Renewable Energy Laboratory (NREL) estimates will save them $1,000,000 in OPEX,
including the money that would otherwise be used to heat the building.

At the same time, HPE Apollo 8000 System helps reduce your customers’ carbon footprint, saving up
to 3800 tons of CO2 per year. That is about the same amount of CO2 produced per year by 790 cars.

HPE Moonshot

Figure 1-18 HPE Moonshot


Figure 1-18 shows the HPE Moonshot System, a revolutionary server design that addresses the speed,
scale, and specialization required for the IT of today that is emerging around the converging trends
of mobility, cloud, social media, and big data.

From its position as the leading provider of x86 servers for Internet environments, HPE has created
the HPE Moonshot System, the second offering from HPE Project Moonshot. HPE Moonshot System
is the world’s first software-defined server platform to deliver breakthrough efficiency and scale by
aligning just the right amount of compute, memory, and storage to get the work done.

The HPE Moonshot System adopts a federated approach to server design that saves energy and cost
and enables extreme scale-out without a corresponding increase in complexity and management
overhead. HPE Moonshot 1500 Chassis incorporates common components that include management,
fabric, storage, cooling, and power elements and accommodates up to 45 individually serviceable
hot-plug cartridges. The innovative software-defined cartridges can include one or more servers and
are designed for specific Internet of Things (IoT) solutions, providing optimal results for a given
workload. The workload range extends from dedicated hosting, data analytics, and web front end to
more advanced functions made possible by graphics processing units (GPUs), digital signal
processors (DSPs), and field-programmable gate arrays (FPGAs). HPE Moonshot enables enterprises to
maximize their ability to innovate and speed their time to market with new services while reducing
costs and energy use.

HPE Integrity Superdome X

Figure 1-19 HPE Integrity Superdome X

The HPE Integrity Superdome X servers are purpose-built and optimized for mission-critical
workloads that require the highest availability, scalability, performance, and efficiency (as shown in
Figure 1-19). They provide a way for enterprises with the most critical and demanding business
processing, decision support, and database workloads to gain the benefits of an x86 platform. As you
will learn later in this ebook, Superdome X servers offer built-in Reliability, Availability, and
Serviceability (RAS) features that customers have previously found only in UNIX-based or
mainframe systems. Superdome X servers can, therefore, detect and recover from errors and failed
or failing components, keeping your customers’ mission-critical applications running.

Superdome X servers also offer unique hard partitioning features, which provide greater reliability
than virtual partitioning.
These servers are highly scalable. They support up to 16 sockets and 24 TB of memory, delivering
nine times the performance of 8-socket servers.
Chapter 1 Activity
Take a few minutes to review the high-level benefits of HPE server solutions for customers seeking
to transform and embrace the new idea economy. Specifically, consider how the HPE server solutions
covered in this ebook (HPE Moonshot, Apollo, and Integrity Superdome X) support the four key
ways that customers want to transform.
• List the HPE products that support each Transformation Area:
– Transform to a hybrid infrastructure
– Protect the digital enterprise
– Empower the data-driven organization
– Enable employee productivity
• List at least two benefits of these products in helping customers to transform

You can use what you just learned for this activity, as well as the “Supplemental content” section at the
end of this chapter. Do not worry if these benefits are high level at this point. You will learn much
more about these products and how they support customer business requirements throughout this
ebook.

Summary
In this chapter, you learned that in today’s idea economy, enhanced access, data, and connections are
driving exponential innovation, which creates disruptive new challenges and opportunities for IT. In
this idea economy, organizations must protect their digital enterprise, empower the data-driven
organization, enable workplace productivity, and transform to a hybrid infrastructure.

You also reviewed the SDDC infrastructure and learned how HPE has solutions to support the
architecture. Finally, you were introduced to the HPE server solutions that will be covered in the rest
of this ebook.

Learning check
Review what you have learned by answering these questions. Then check your answers in Appendix
A: Answers to Learning Checks.
1. What is one way that an SDDC differs from a traditional data center?
a. It focuses on functionality.
b. It helps IT act as a cost center.
c. It focuses on usability and experience.
d. It enables project delivery to occur in 9–12 months.

2. Which HPE solution is part of the scale-up compute portfolio?


a. HPE Moonshot
b. HPE Integrity Superdome X
c. HPE Apollo 2000
d. HPE Apollo 6000

For answers, See Chapter 1 in Appendix A.

Supplemental content
HPE perspective on hybrid infrastructure server solutions: Exceptional
technology innovation

Figure 1-20 HPE perspective on hybrid infrastructure server solutions: Exceptional technology
innovation

Hewlett Packard Enterprise offers exceptional technology innovations that help businesses achieve
rapid service delivery and exceptional growth, as you see in Figure 1-20. With the right compute
solutions, your customers can take the business to the next step in automation because HPE servers
are software-defined and cloud-ready.

HPE OneView, which uses easy-to-program RESTful APIs to communicate with management
capabilities embedded within HPE servers, helps to automate the server lifecycle. HPE ProLiant
servers (especially blade servers in an HPE BladeSystem), together with HPE OneView, deliver a
whole new experience for IT with the Power of One—one infrastructure, one management platform,
from one company to speed the delivery of services. Only the Power of One delivers leading
infrastructure convergence, availability with federation, and agility through data center automation.

For customers who need a private or a hybrid cloud, the solutions integrate seamlessly with HPE
Helion CloudSystem, which lets IT define various resource pools for individual use cases. The
company can then easily deploy the right workload to the right location on the fly. Of the solutions
that you will focus on in this ebook, HPE Moonshot solutions are supported by HPE Helion
CloudSystem.

HPE server solutions are composable, which means that their components can be combined to meet
particular use cases. They are also scalable so that companies can easily expand to increase their
capacity and servers. Finally, the server solutions are converged with networking and storage
solutions such as StoreVirtual VSA, making it simple for the company to orchestrate services rather
than just servers.

Empower a data-driven organization with HPE

Figure 1-21 Empower a data-driven organization with HPE

When customers are struggling to extract value from their data, the root cause might lie not only in
their data analytics tools but also in an infrastructure that is not optimized to support and manage
large volumes of data.

Hewlett Packard Enterprise helps customers to lay the foundation for data-driven computing, as you
see in Figure 1-21. As required by their particular workloads, customers can scale the infrastructure
up (by adding power to single systems) or out (by adding systems). Customers can scale up with HPE
ProLiant servers, designed for virtualization density. Some mid-market customers with advanced
needs can scale up even more with HPE Integrity, which provides the leading performance and
availability that mission-critical applications need.

For the customers who need solutions tailored for the precise demands of big data and big data
analytics, you can deploy density-optimized, scale-out solutions. In the next sections, you will learn
how to choose HPE Moonshot and HPE Apollo systems for the appropriate roles within a big data
solution.

HPE ProLiant and Moonshot are the foundation for a data-driven organization
HPE ProLiant servers can scale up to meet the high demands of a data-driven organization. These
servers offer impressive performance and scalability. HPE gives customers the flexibility to choose
from a variety of options based on their compute and application requirements. In this way, they
obtain the proper expandable solution for their data center without overprovisioning. Now they can
achieve breakthrough efficiency at a compellingly low TCO. At the same time, HPE ProLiant servers
deliver top-of-the line resiliency features and support experience so that customers can achieve high
uptime levels.
The higher-end ProLiant servers easily support customers’ ballooning data and the applications that
draw on that data.

Larger mid-market and enterprise customers who need to scale out find exactly what they need in the
HPE Moonshot solutions. These solutions offer unrivaled scalability. To obtain greater compute
density and flexibility as they grow, customers simply add more server cartridges. HPE Moonshot
continues to deliver excellent throughput, supporting a growing user base with up to 1.7 times more
operations per second than traditional 2U 2P rack servers (based on HPE Internal testing). At the same
time, they offer a 66% lower TCO than those traditional servers (based on HPE Internal testing when
the servers have an 80% read-heavy workload).

This ebook focuses on the HPE Moonshot servers. (You learned how to architect solutions with HPE
ProLiant servers in prerequisite training.)

Distinguish HPE Apollo as the ideal foundation for big data


Purpose-built for mid-market or enterprise big data, HPE Apollo 4000 servers are ideal for
customers who need to deploy smaller object storage systems, Hadoop, and NoSQL-based big data
analytics solutions. These systems provide storage density, easy scalability, flexible configurations,
performance and efficiency, and simple management converged with other HPE solutions.

In a later chapter, you will learn how to design HPE Apollo 4000 solutions that hold the Hadoop
Distributed File System (HDFS) and act as storage nodes in the HPE Big Data Reference Architecture.
You will also learn how to use HPE Apollo 4000 for big data analytics and object storage.

Dense storage capacity

The Apollo 4200 servers provide more storage density than any other 2U server: up to 28 or 54 hot-
plug drives, depending on the model. The Apollo 4500 family can scale even further with support for
up to 68 drives, depending on the model.

Easy scalability

These servers’ ultra-dense storage makes it easy for customers to scale their big data solution.

Flexible configurations

These servers can be configured for industry-leading storage density. They can also be configured
for performance and throughput. Whatever your customers need, from object storage to data
analytics to high-performance computing data-intensive applications, the Apollo systems can deliver.

Performance and efficiency

The Apollo servers can be configured for high performance and throughput. They support up to 16
memory DIMM slots with up to 1024 GB of memory, delivering the performance required for in-memory
data processing for near real-time analytics. Fast SAS and SSD drives with up to 12 Gb/s throughput
speed data transfer for analytics workloads. Customers will notice the difference in performance,
unlocking the power of their analytics applications and giving them immediate competitive
advantages from their data.

Common management

These Apollo servers integrate seamlessly into traditional enterprise data centers with the same rack
dimensions, cabling, service options, administration procedures, and tools. They are the ideal bridge
systems for enterprises that want to implement a purpose-built big data server infrastructure today and
scale in affordable increments.

Enable workplace productivity with HPE

Figure 1-22 Enable workplace productivity with HPE

Your customers cannot afford to ignore technologies that allow their employees to use the network,
communicate, and collaborate in new and more productive ways.

Under constant pressure to work faster and smarter, your customers’ employees need real-time access
to information, whether they are on the road or in the office. With HPE Moonshot for Citrix XenApp,
shown in Figure 1-22, your customers can quickly scale app delivery solutions to hundreds or
thousands of users.

Whether your customers choose solutions that streamline application delivery, offer hosted desktop
infrastructure, or deliver a mobile workspace, HPE and Citrix offer what your customers need to
boost mobile productivity while maintaining IT operational control. Innovative HPE Moonshot with
Citrix solutions enable your customers to
• Address specific mobile workspace challenges and requirements
• Improve compliance and security, with all data residing on centralized servers, enabling IT to have
greater control over apps and data
• Boost cost-efficiency by using the right compute for each specific workload, so there are no
wasted resources
• Improve space and environmental efficiency (HPE Moonshot’s high-density design reduces space,
cooling costs, and the energy footprint)
• Support up to 2000 users in a single HPE Moonshot chassis
With HPE Moonshot and Citrix, organizations receive the technology they need to boost mobile
productivity and speed innovation while maintaining IT operational control and improving
operational efficiency.

HPE optimized compute portfolio

Figure 1-23 HPE optimized compute portfolio

HPE has a portfolio of purpose-built solutions for a variety of workloads, as you see in Figure 1-23.
These platforms support both scale-out and scale-up architectures to meet workload requirements.
The sections below provide a brief overview of the solutions. You will dive into greater detail on the
HPE Apollo, Moonshot, and Integrity Superdome X solutions throughout the rest of this ebook.

Scale-out compute

The scale-out compute part of the portfolio includes the Apollo family: the Apollo 2000 for general-
purpose scale-out compute, the Apollo 4000 family for big data and object storage, Apollo 6000 and
8000 for HPC, and Moonshot for next generation apps.

Apollo systems provide leading storage density along with compute performance flexibility and the
same iLO, APM, and Insight CMU management to meet the needs of a full range of big data
workloads. This combination delivers leading space and power efficiency while lowering overall
TCO.

Apollo 6000 delivers rack-scale efficiency for HPC with
• Up to 4x better performance per watt per dollar when compared to the competition
• Leading performance per dollar per watt

HPE Moonshot System is unlike any other server that exists today. It is a huge leap forward in
infrastructure design that delivers breakthrough efficiency and scale by aligning just the right amount
of compute, memory, and storage to get the work done. The idea is very simple—replace general-
purpose processors with more energy-efficient SoCs (Systems-on-Chip) containing integrated
accelerators tailored for specific workloads.

Scale-up compute
The scale-up compute part of the portfolio consists of Integrity Superdome X platforms, the ProLiant
DL580 and DL560 servers, and the BladeSystem BL660c series.

HPE is the only vendor unifying UNIX and x86 with a single architecture so that customers have
choice and investment protection from a suite of products for mission-critical workloads (Integrity,
Superdome, NonStop, MCx86; enterprise). HPE has enabled x86 workloads (Linux, Microsoft
Windows) on the Integrity Superdome X server platform.

The HPE ProLiant DL Servers are general-purpose rack servers for enterprise applications. They
deliver top performance, reliability, and efficiency for on-premises and cloud-hosted database, data
warehouse, consolidated/virtualized IT apps, and high-performance computing workloads.
Chapter 2 Gather Customer Requirements

EXAM OBJECTIVES
• Identify key decision makers and explain how to engage them in a discussion about the company’s
business requirements and challenges
• Obtain data and documentation required to understand the company’s business requirements
• Explain best practices for creating requirements statements and documents

Assumed knowledge
Before reading this chapter, you should meet the following criteria:
• Knowledge of server components, including processors, DDR3 and DDR4 memory, hard disk drives
(HDDs), solid-state drives (SSDs), and RAID levels for storage volumes
• Experience with HPE ProLiant rack and blade servers and options for them such as HPE Smart
Array Controllers
• Knowledge of HPE BladeSystems, including interconnect modules and Virtual Connect (VC)
modules
• Experience managing and maintaining servers, including iLO, Intelligent Provisioning, UEFI, HPE
Insight Remote Support, HPE Insight Online, HPE Smart Update Manager (SUM), and HPE Insight
Control server provisioning (ICsp)
• Familiarity with HPE OneView capabilities
Chapter topics
In this chapter, you will learn strategies for gathering information about customer requirements,
including business continuity and availability requirements, IT management requirements, and facility
requirements. You will also review key decision makers and their top-of-mind issues so that you are
better prepared to engage them in discussions about their company’s business and technical
requirements.

Customer requirements
You will begin by examining general strategies for discussing requirements with customers,
documenting information about the customer’s existing solutions and needs, asking questions at the
right level for each decision maker’s business role, and defining meaningful and effective
requirements statements.

Understand the scope and constraints of the design

Before you begin collecting the basic requirements from the customer, you should understand the
scope of the project. Specifically, you should obtain from the customer a clear definition of the
following:
• General scope and purpose—Obtain a basic understanding of the scope of the design. Are you
designing a server solution for a new application? Or does the customer already use the application
and need a hardware refresh to improve performance? Or are you simply scaling out an existing
solution? You should have a clear understanding of what the customer needs from the solution in
general terms, and you should make sure that you and the customer agree completely upon the
scope and purpose before you begin your design.
• Implementation timelines and timeframes—Understand when the customer expects or plans to
have the solution completely installed and operational. Defining the timeframe also includes
defining whether the customer intends to implement the solution all at once or in phases. If the
customer has not offered up the reason for the specific timeframe, you might want to ask. If a
deadline is particularly tight, you should understand what is at stake and what is pushing an
accelerated timeframe.
• Budget—While some would argue that putting in place a proper design should drive the
architecture, the fact is that budget will likely play a large role in many projects. You should
understand the budget for the project and also try to assess what kind of leeway you have to exceed
the budget. Budget needs to take into account not only hardware but also installation services and
any training that the staff might need to become comfortable with the implementation. Customers
often request that these components be broken out into separate sections for clarity.

In some cases, you might wish to present a customer with multiple designs. One design might meet
the budgetary constraints. Another design might exceed the budget, but will show decision makers the
features and functionality that they would get if they were to increase the budget for the project. A
contrast and comparison between two designs, one optimal and one that fits within the budget, might
open the door to helping you create a better solution for the customer. When making these
comparisons, ensure that you are focusing on the business needs of the customer. Decision makers
need a clear understanding of how a cost-reduced solution will impact business operations.

As you gain an understanding of the scope and constraints of the design, you should do your best to
reflect them accurately to the customer. You should both be in agreement on these parameters before
you begin your design, so it is important that there is no misunderstanding between you and the
customer.

Focus on business requirements

Figure 2-1 Focus on business requirements

You should always begin your process of defining solution requirements at a high level. Server
solutions address business issues, and customers seek new solutions because they have a need or a
problem that they hope these solutions can ameliorate.

This high-level view is useful because it reminds you that decision makers are pushed on one
side by the problems that they have encountered in the past and pulled on the other side by the benefits
that they hope to achieve. You can win these customers with a design that addresses the problems and
provides new benefits.

For example, the company might be attempting to improve operations, address existing deficiencies,
or reduce the company’s risk. Figure 2-1 provides some examples of these objectives. Note that these
high-level objectives are not intended as rigid divisions. For example, one company might look at
obtaining more compute power per rack unit as a way of improving efficiency. Another customer
might have an application that does not perform well, but be prevented from scaling out due to physical
constraints. For this customer, obtaining more power per rack unit is a way to address current
deficiencies.

You will also need to use your understanding of the current trends in security threats, your knowledge
of existing security measures, and your understanding of the customer’s security requirements to
create a solution that helps the company reduce their risk.

You need to be familiar with the regulations that govern particular industries. For example, in the
United States, companies that provide health care must comply with the Health Insurance Portability
and Accountability Act (HIPAA). Retail organizations must comply with the Payment Card Industry
Data Security Standard (PCI DSS 2.0). Because most companies’ business activities extend beyond
country borders, you must understand the company’s overall requirements for complying with
regulations.

Most regulations extend well beyond personal firewalls and local data encryption into extensive
security requirements. Even when regulations do not prescribe specific technical controls,
companies will want to add extra layers of protection for applicable servers and data in order to
protect themselves from security breaches that could result in fines or other types of penalties. These
fines can add up quickly, since each piece of data that is compromised can incur a fine. One security
breach could compromise thousands of records and incur fines that can place an organization’s
financial stability at risk.

Begin to identify the applications and workloads that the solution must
support
In your initial discussions, you should ask which applications and workloads the solution must
support. For example, ask about applications such as OLTP, big data analytics, and cloud-native workloads.
(These are just a few examples; it is by no means a comprehensive list.) You will focus on designing
solutions for particular applications and workloads in Chapter 3.

Assess business continuity and risk management requirements


You need to ensure that the customer can provide its services with minimal interruptions. Service
interruptions can lead directly or indirectly to lost revenue, depending on the importance of the
application. For example, if a transactional database goes down, the company’s operations grind to a
halt. You can calculate the cost to the company with this formula:

(Annual revenue / Annual hours) × Business reliance on service = Revenue lost per hour of downtime
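
As a minimal Python sketch of this calculation, the following applies the formula above; the revenue, hours, and reliance figures are hypothetical values chosen only for illustration:

# Rough cost-of-downtime estimate based on the formula above.
# All input values are hypothetical examples.
annual_revenue = 100_000_000      # revenue in dollars per year
annual_hours = 24 * 365           # hours in a year (8760)
business_reliance = 0.3           # fraction of the business that depends on the service

revenue_lost_per_hour = (annual_revenue / annual_hours) * business_reliance
print(f"Estimated revenue lost per hour of downtime: ${revenue_lost_per_hour:,.0f}")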

The other main risk that you need to help the customer mitigate is that of lost data, which might occur
through hardware failure. Again, the risks are greater for mission-critical data, such as that stored in
transactional databases.

Help the customer to consider all the costs of unplanned downtime and data losses, including
• Lost revenue—As mentioned above, an outage or data loss can cause the company to lose revenue.
Also consider the impact that losing data related to the revenue stream could have on the
company’s budget sheet. If income can only be reported after the data is manually entered
following an outage, this could equate to revenue not being recognized on a balance sheet until
weeks after the outage. This can be more significant if the outage is during a prime processing
time, or if the data is not entered until the next fiscal year of the organization.
• Damage to reputation—How does downtime affect the brand or the reputation of the company?
• Impact to human resources—How are the personnel in the company affected by an outage? Does
downtime equate to late nights or weekends in order to make up for lost productivity? Does it mean
filling out paperwork by hand, only to have to re-enter the data into the system at a later time?
• Impact to regulatory compliance or contractual obligations—Will a service outage jeopardize
compliance or create a breach of contract? If so, what are the ramifications of noncompliance or a
breach of contract?
• Cost to recover—What are the actual costs to recover from a failure? Aside from the actual cost
in dollars, what kinds of reactions from senior-level managers and executives will a service outage
provoke?

Quantify availability
Your customer might request specific availability levels. When you calculate availability, you should
look at these metrics:
• Mean-time between failures (MTBF)—A measure (in hours) of the time between failures or
outages. This is sometimes also referred to as mean-time between service outages or MTBSO.
• Mean-time to repair (MTTR)—A measure (in hours) of the time it takes to recover from a failure.

Availability is expressed as a percentage that is derived from MTBF and MTTR:

Availability = MTBF / (MTBF + MTTR)

For example, if the MTBF is 4000 hours and the MTTR is 1 hour, availability is roughly 99.98%
(4000 / (4000 + 1) ≈ 0.9998).

This type of availability would be roughly four 9s availability. The number “four” refers to the number
of nines in the percentage of uptime. Desired availability generally ranges from three 9s to five 9s.
Table 2-1 provides the allowable downtime based on the required availability.
Table 2-1 Availability calculations
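
As a quick illustration of the formula, the following minimal Python sketch reuses the MTBF and MTTR values from the example above:

# Availability derived from MTBF and MTTR (both expressed in hours),
# using the example values from the text.
mtbf = 4000.0   # mean-time between failures
mttr = 1.0      # mean-time to repair

availability = mtbf / (mtbf + mttr)
print(f"Availability: {availability:.3%}")   # about 99.975%, rounded to 99.98% in the text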

As you discuss the availability requirements with the customer, it is important to specify a timeframe.
A company might be able to accept a cumulative downtime of 87 hours in a year, but could not
tolerate an outage of more than an hour or two on any given day. Discuss with the customer the
absolute requirements for applications in the smallest timeframe that is acceptable.

The higher the availability, the greater the cost, so you should encourage the customer to invest in
greater availability for the truly mission-critical applications. For example, a transactional database
handles mission-critical operations and data; the customer should invest in a solution that provides
99.999% availability here. On the other hand, a server that supports just one node in an HPC cluster
does not need to provide the highest levels of availability.

Study the information in Table 2-2 to learn more about the severity levels for applications.
Table 2-2 Application criticality

Mission-critical
• Requires 99.99% or 99.999% availability.
• Downtime will disrupt the core business operations on which the customer bases its mission; downtime will cause large-scale loss of revenue, loss of business, loss of productivity, loss of reputation, or otherwise significantly harm the company financially.
• Downtime impacts multiple segments of the company.

Business-critical
• Requires 99.9% availability.
• Downtime will disrupt employees’ ability to do their jobs and might indirectly lead to loss of revenue, loss of business, loss of productivity, or loss of reputation if sustained.
• Downtime might impact one segment or several segments of the company.

Noncritical
• Can tolerate 99% availability.
• Downtime does not pose a significant risk or will not cause significant loss of revenue.
• Downtime affects one or only a handful of individuals within a group or segment of the company.

Create effective requirements statements


After you identify the high-level business needs, you must then transform them into specific, clearly
defined requirements statements.

A precise requirements document not only reassures the customer that your solution will align with
their needs and vision, but it also protects you from unwarranted blame in the future. If a customer
later indicates that the solution does not meet specific criteria, you can turn to the design requirements
to show that those criteria were either not listed or not given priority. Thus, you have protected
yourself from an unfavorable situation.

Some customers will come with an RFP that already has specific requirements. Others need help
elaborating their high-level needs into more precise ones. For each business need, you should create
several design requirements statements of increasing specificity.

Each requirements statement should accomplish the following:


• Accurately reflect what the customer desires
• Define the requirement at a precise enough level that you can design a solution that unambiguously meets the requirement
• Assign a value that places the requirement into a hierarchy based on its importance to the customer

The IETF recommends the use of key words in the construction of requirements statements (see RFC
2119). Table 2-3 provides some examples.
Table 2-3 Creating effective requirements statements

Importance: Critical (Absolute must have)
Keywords: Must/shall/required or Shall not/must not
Examples:
• The server that hosts the application virtualization controller MUST remain available with the failure of up to one link.
• Data MUST remain available with the failure of up to one disk.
• The NoSQL database solution MUST be able to perform 1,000,000 read or write operations per second.

Importance: High (Preferable, but not an absolute requirement)
Keywords: Should/Recommended or Should not/Not recommended
Examples:
• The NoSQL database solution SHOULD be able to perform 1,500,000 read or write operations per second.
• The solution SHOULD support automated OS provisioning.

Importance: Low (Desirable, but not at all required)
Keywords: May/optionally
Examples:
• The server MAY use load balancing on its adapters.

Create a requirements traceability matrix (RTM)

Figure 2-2 Create a requirements traceability matrix (RTM)

As you define the business and technical requirements more precisely, you can begin to plan the
technical tasks that support those requirements. A Requirements Traceability Matrix (RTM) such as
the one in Figure 2-2 helps you to track the requirements throughout a project and ensures that each is
fulfilled. Use the RTM to define each task required to fulfill the requirement. Fully define the task,
including deliverables that will indicate that the task is complete.

Discussions with decision makers


You will now consider the various decision makers you typically work with to gather the information
you need to propose a solution. You will consider the concerns that drive each decision maker so that
you are prepared to engage each one in a meaningful discussion about their company’s requirements.
You will later be able to pitch a solution to each one.

Identify key decision makers


Figure 2-3 Identify key decision makers

You will meet and talk with different types of decision makers, such as those you see in Figure 2-3. You
should be aware of each decision maker’s top-of-mind issues so that you can tailor your questions to
the role. Business leaders such as Chief Executive Officers (CEOs) will be able to answer questions
about the company’s strategic goals or give you an idea about the future expansion of the company.
However, you should not ask CEOs detailed technical questions, such as specifics about application
architecture.

Customers might start with general statements. They might not know exactly what they need. You must
act as a sounding board, soliciting increasingly detailed information that you can then consolidate
into specific and concise requirements statements. For example, a customer might begin by
explaining that they need a server refresh to support big data analytics. You must draw out more
information. What type of analytics does the customer plan to use? Is the customer dealing with
structured data in a SQL database or dealing with unstructured data? Will users run queries on older
data that might take several hours to complete? Or is the customer looking for real-time results?

As you follow this process, remember the difference between business, technical, and financial
questions:
• Business questions—Ask how IT impacts the business, avoiding technical details and focusing on
the underlying business needs.
• Technical questions—Ask about the specific technology or solutions in place or required to meet
business needs.
• Financial questions—Ask questions to determine the available budget for the solution. You should
also begin to determine how the company measures success or failure. For example, how does the
company determine if the solution delivers the business outcomes it seeks?

You will explore total cost of ownership (TCO) and return on investment (ROI) in Chapter 10, but at
this point you should get an idea of what the financial decision makers’ expectations are and how they
measure the value of their investment.

The questions, of course, overlap. The business questions direct you toward particular high-level
business needs, while the technical questions help you to figure out the best ways to approach
fulfilling those needs. You need to remember that technical questions should always flow from
business requirements, particularly when you hear different information from different individuals.
Sometimes these individuals are simply phrasing things differently, but are essentially saying the
same thing about what is required. But other times you are truly hearing differing messages that you
need to resolve with the key decision maker.

You will now take a closer look at key decision makers and review each one’s concerns and roles so
that you can tailor your message accordingly.

Understand CEO’s requirements


CEOs focus on the overall corporate strategy. (See Table 2-4 for their top concerns.) When you
engage with one of these decision makers, stay focused on business requirements and benefits. For
example, a CEO’s top priority might be bringing products to market more quickly. In this case, you
must be able to explain how a given solution can help the company achieve this goal.
Table 2-4 CEO’s concerns

Who: CEO (Business decision maker)
Concerns:
• Increasing market share and profitability
• Gaining competitive advantage
• Reducing costs
• Improving the customer experience
• Enhancing productivity
• Mitigating risk
• Enhancing shareholder returns
Roles:
• Approving early decisions about major business initiatives

Understand Line of Business manager’s requirements


In the past several years, as IT solutions have become embedded in all aspects of the business, line of
business (LOB) decision makers (such as the VP of Product Development or VP of Sales) have taken
an increasingly prominent role in IT purchasing decisions.

According to a survey by Harvard Business Review, an average of 5.4 people have “formal sign off
on each purchase.” Furthermore, these people have a variety of jobs and functions and are even
located in different geographies. (See “Making the Consensus Sale,” March 2015.) These decision
makers will quickly focus in on their business outcomes and priorities. When an LOB manager has
goals and responsibilities that rely on IT applications, the manager will have firm opinions and
detailed requirements about how the application performs. These requirements will not be detailed in
the technical sense—an LOB manager probably would not tell you that a server must provide a
specific amount of memory. But they will be detailed in terms of what the managers need to gain
from the solution. For example, the VP of Product Development might require the electronic design
automation (EDA) application used by designers to complete jobs within a certain amount of time.
Table 2-5 outlines the LOB manager’s roles and top concerns.
Table 2-5 LOB manager’s concerns

Who: LOB manager (Business decision maker)
Concerns:
• Identify tools that will make employees more productive
• Identify applications that will attract new customers and retain existing customers
• Work with IT to obtain the tools and applications the business units need and get them implemented in a timely manner
Roles:
• Obtain the solutions that will drive business
• Attract new customers
• Retain existing customers
• Improve customer experience

Understand CIO’s and IT VP’s requirements


Like CEOs, Chief Information Officers (CIOs) and IT VPs focus on the overall corporate objectives.
However, CIOs and IT VPs have the specific responsibility of driving IT strategy to deliver these
overall objectives. For example, they must ensure that the company complies with regulations,
thereby reducing the company’s risk. (See Table 2-6 for a summary of their concerns.)

The IT VP might also be responsible for defining policies and best practices, and any change or new
solution must fit within these practices. For example, the customer might have policies about how data
is stored in order to prevent data compromise or data loss. These policies are often part of a larger
set of security policies and best practices, defined by the IT directors or perhaps a Chief Information
Security Officer (CISO). Other policies might define minimum requirements for the infrastructure
used for particular applications.

Furthermore, CIOs and IT VPs are motivated by budget concerns and might be frustrated by the high
costs of operating a data center. You could interest such a customer in solutions that will reduce
operating expenses. For example, you might suggest a density-optimized solution that delivers more
compute power in a smaller space and with reduced power requirements.
Table 2-6 CIO’s and IT VP’s concerns

Who: CIO and IT VP (Business and high-level technical decision maker)
Concerns:
• Upholding SLAs
• Reducing costs
• Ensuring compliance with all regulations
• Ensuring network security
Roles:
• Driving IT strategy
• For SMBs (VP/Director IT), controlling infrastructure
• Controlling budget

Understand IT director’s and manager’s requirements


IT directors and managers focus on the technical level—although they must still understand how
technical solutions and decisions affect the business. IT directors and managers are responsible for
day-to-day operations. Table 2-7 lists some of the many day-to-day operations that IT must handle.

First and foremost, they are concerned with keeping the data center running efficiently and mitigating
operational risks—whether those risks come from outside threats, improperly scoped hardware, or
faulty change management practices.

They want to do more than keep the data center running. They want to improve uptime and minimize
the time and effort required to manage and maintain the solution.

Some of your customers might be in the process of implementing the Information Technology
Infrastructure Library, or ITIL, which defines the organizational structure and skill requirements of
an IT organization. ITIL also imposes a formal process for managing incidents, problems,
configurations, changes, releases, and even the service desk itself. (For more information, visit
www.itlibrary.org.)

Larger companies might also have a solution architect. This decision maker will advocate for the
right hardware for the company’s workloads.
Table 2-7 IT operations decision makers

Who: Server Operations Director or Manager (Technical decision maker)
Concerns:
• Avoid operational risks
• Improve uptime
• Minimize time and effort required to manage and maintain the solution
• Enforce data standards and security policies
• Manage patch/software releases
• Implement ITIL (www.itlibrary.org)
Roles:
• Running core and edge networks
• Managing changes, upgrades, maintenance, and troubleshooting

Who: Solution Architect or Planning Engineer (Technical influencer and possible decision maker)
Concerns:
• Ensuring infrastructure meets application needs
• Minimizing the time and effort required to manage and maintain the solution
• Ensuring the long-term viability of the solution
Roles:
• Ensuring the long-term viability of the solution
• Directing changes to the solution

Understand CFO’s requirements


In most companies today, IT budgets remain flat, but expectations for IT solutions continue to
increase. LOB managers hold IT departments accountable for services provided, and managers and
employees alike demand less downtime and increased productivity.

Meeting these challenges requires not only higher productivity and better utilization of IT assets but
also an alignment between business goals and IT objectives. CFOs are interested in solutions that
reduce costs and increase the efficiency of IT operations through productivity tools and improved
utilization of resources. CFOs are also looking for risk mitigation technologies that allow them to
make more informed decisions and maximize business profits.

The CFO is focused on ways to control expenditures by tracking and consolidating all IT expenses by
asset, project, contract, and owner. The CFO wants to maximize the value of existing assets and
support intelligent financial decisions while being able to capture, monitor, measure, and manage
costs associated with assets, contracts, or projects. The CFO needs to reduce costs by retiring,
offshoring, or outsourcing IT services while still delivering IT services on time, within budget, and with
established quality standards. Table 2-8 summarizes the CFO’s requirements.
Table 2-8 CFO’s requirements

Who: Chief Financial Officer (CFO)
Concerns:
• Controlling IT spending
• Tracking operational and capital expenditures holistically
• Increasing availability and performance of revenue-producing and revenue-tracking applications to prevent revenue loss
Roles:
• Managing financial risks of the business
• Aligning IT needs and shrinking corporate budgets
• Financial planning, record keeping, and reporting
Understand procurement manager’s requirements
With the increasing attention on fiscal responsibility and improved management, the procurement
manager is challenged to improve costs, mitigate compliance and security risks, and provide
information to drive business decisions, as you see in Table 2-9. The procurement manager is
focused on improving measurement tools to provide visibility into how the IT organization is doing,
while finding the right resources at the right price. The procurement manager is often in charge of
gathering other individuals involved in the buying process.
Table 2-9 Procurement’s requirements

Who: Procurement manager
Concerns:
• Enforcing corporate standards
• Obtaining services and products the business needs to operate
• Finding the right resources to enable IT, at the right price
Roles:
• Provide information to drive business decisions
• Control all procurement processes
• Purchasing decisions

Maintain awareness of political climates


In addition to being able to discuss business issues and solutions with different decision makers, you
must always assess the politics and culture of the company. By understanding and navigating the
customer’s politics, you can create a proposal that the key decision makers are more likely to adopt.

You should be aware of political factors such as these:


• Organization of IT—Does the company have converged teams that include both server and
networking specialists? Or is there a server team and a networking team?
• What are the group dynamics? Do you notice any hostility between certain groups? Does one
group seem to have more to say than another group? Do some groups have similar objectives but
cannot see eye-to-eye on a solution?
• Remember to ask about the ramifications of the server solution design. Who will take the most
responsibility for the success or failure? What are the rewards and who will be rewarded for
success? What are the ramifications for failure?
• Impact to employees, customers, or partners—Who will the server design affect? Will certain
groups within the company be taking on more or less responsibility after the design
implementation? If certain groups will be taking on less responsibility, does this imply
downsizing? Is it possible that some of the people helping you with the design could be eliminated
after the design is completed and implemented?
• History of organization—Find out as much about the history of the previous IT implementations as
possible; this will help to avoid past pitfalls. If previous successful implementations immediately
led to downsizing, this might hinder employees’ willingness to implement new technologies. If the
organization has a history of poor implementations, then it is vital to find out the source of the
failures to avoid repeating these mistakes.

While your job as a server architect is not to moderate the political climate, you might find that
understanding a company’s dynamics can make it easier for you to get the job done. If you can see the
commonalities between different groups within the company, you can help design solutions that meet
the needs of a broader segment of the company. If you understand who stands to gain from the new
server solution, you might be able to make a friend or an ally that can provide you with the data you
require. And if you understand how the server solution design will impact groups and personnel within the
business, you may gain a better insight into why certain individuals are not as forthcoming as they
might otherwise be.

Gather information about new requirements


You now have a sense of the many different stakeholders with whom you will interact. Next, you will
turn to exploring strategies for collecting the information you need from these decision makers in
order to architect a solution that meets their needs.

You can take several approaches to gathering information about the executive, IT operation, and LOB
requirements, and you have probably used at least some of these many times. In addition to meeting
with decision makers for personal interviews, you can ask these decision makers if they would work
with you in conducting user or IT staff surveys and questionnaires. Such surveys can uncover issues
and pain points of which high-level decision makers are less aware. They can also give you valuable
information about how employees actually use applications and what they expect from the solution.

You might be designing a solution to host a new application that the customer is rolling out. But often
you will be providing an upgrade intended to deliver better performance, greater efficiency, or
greater scale for an existing application. You need to understand as much as possible about the
application and the current solution. Request information such as current server specifications,
logical topologies, and application architectures. Also, request information about current
performance and resource utilization. This information will prove invaluable as you design your
solution. For example, if you know that the existing servers are constantly reaching their memory
limits, you would know to provision more memory for the corresponding servers in your solution.

It is important to note that you should treat any document that the customer provides you with respect
and confidentiality. If you have not already done so, be prepared to sign a Non-Disclosure Agreement
(NDA) or some other form of confidentiality agreement before gaining access to this information.
When a customer has sensitive government information, you may even be required to have a security
clearance.

You should also treat the documentation the customer provides you as subject to some uncertainty
regarding its accuracy. IT jobs are demanding and keeping documentation up to date is not always a
priority. The information the customer provides is there to help inform the decisions that you make
when you design the customer’s server solution. A rule of thumb is to verify the revision date of
any document that the customer provides you. The further the current day is from the revision date,
the less credence you should give to the document, even if the customer asserts that things have not
changed.

It can be surprisingly easy for important requirements to remain undefined when you rely solely on
discussions. You can job shadow an SME to learn exactly how applications are architected and used,
to uncover the precise infrastructure requirements, and to gain insight into IT processes. You might
also be able to uncover pain points or inefficiencies that you can solve in your solution, making the
solution more attractive to the customer. For example, you might observe that server administrators
spend a significant amount of time provisioning servers with their OS or that they struggle to give
you the information that you have requested about resource utilization. You would then know that the
customer might be a candidate for a provisioning and monitoring solution such as HPE Cluster
Management Utility (CMU).

Ask questions
You will now explore some of the different types of questions that you might ask during the personal
interview: Verification, New information, Golden nuggets, Opinion, and Commitment. You do not
need to memorize these categories or worry about determining whether a specific question fits one
category or another. What is important is that you consider all the types of questions that you can ask
and know how to ask appropriate ones.

Verification

Figure 2-4 Verification

You will be gathering a great deal of information from many different sources. You must verify that
you have understood what stakeholders have told you, seeking to avoid assumptions that could lead to
design errors. Your sales partners pass on some information to you, but this information is often
high-level, and you must verify and deepen it. In the example in Figure 2-4, the architect confirms that
he or she understands all of the EDA tools a customer is using in order to ensure that the high
performance computing (HPC) solution meets the needs.

New information
You must ask many questions to uncover new information. Even if you have worked with a customer
before, do not assume that you understand the environment. Ask for updates.

Use these types of questions to work with SMEs to fill in any knowledge gaps. For example, you often
are planning a server refresh in order to deliver better performance for an application that a customer
is already using. You need to learn as much as possible about how the application is performing now
so that you can understand what needs to change. In the ongoing EDA example, you might ask, “Have
administrators monitored resource usage during analysis jobs and discovered any overutilized
resources?”

And although your sales partners have probably already discovered many business needs in
preliminary discussions, keep your ears open for other business requirements that you might be able
to meet.
Golden nuggets
You will find “golden nuggets” of information as you ask fact finding, problem identification, and
implication questions that lead decision makers toward understanding the importance of the solution
that you will propose in meeting their business requirements.

You begin by finding facts about the current state of affairs. For example, you might say, “I’m told
that EDA jobs can take hours to run. What do designers do while they wait?” The customer’s answer
will probably point toward a problem. Your next question should make that problem explicit: “Did I
understand correctly that designers cannot continue working while they wait for their jobs to
complete? Is the backlog affecting deadlines?”

After the customer has acknowledged the problem, you can draw out the implications, pointing the
customer toward the ways in which your solution can solve the problem and meet the customer
business requirements. For this example, you might ask, “Could you bring products to market more
quickly if you had hardware that could better support your EDA jobs?”

Opinion
Sometimes asking decision makers to share their opinions is the best way to discover unidentified
issues. For example, you could ask managers of the department using an EDA application, “Do you
believe that your designers have the help they need to work efficiently?” Questions like this can
reveal requirements that stakeholders might not otherwise have mentioned, but that can transform
your proposal from a merely adequate one to the one the customer chooses to implement. Such
questions also demonstrate to stakeholders that you care about their issues and opinions.

Finally, you can gain valuable information about stakeholder attitudes. As you know, when you plan a
solution, you are not only wrestling with technical requirements but also with political issues. Do key
stakeholders have a bias toward particular types of solutions or designs, such as InfiniBand versus
Ethernet for an HPC interconnect? Do they seem likely to want the best solution money can buy, or do
they want you to balance their requirements with their budgetary constraints?

Commitment
This last question category is intended to help win decision makers to your side and to make them
more likely to commit to your proposal. Acknowledge stakeholders’ expertise and ask for their
honest thoughts about the project and their objectives. For example, “You have been in this role for a
number of years, and I am still learning about the organization. What are your thoughts about this
project and the intended goals of the project?”

When you know what is important to the stakeholder—and when the stakeholder knows that you value
what they value—you can create a proposal that the stakeholder is more likely to accept.

IT management requirements
You should also become familiar with the customer’s IT processes and governance requirements.
You can then recommend the appropriate management solutions and lifecycle services to meet the
customer’s needs.

Management domains

Figure 2-5 Management domains

Customers often divide IT management into various domains, such as the ones shown in Figure 2-5.
The domains of most relevance to you are hardware management, software management, and
facilities management. However, modern data centers require more convergence and cooperation
between teams. The customer might have a siloed IT governance culture, but it is important that you
avoid falling into the silo trap. A storage or network manager might have a crucial piece of
information about the current data center infrastructure that will affect your design. For example, in
order to propose uplink modules for HPE Moonshot chassis, you must understand how the chassis
will fit in the data center network. Take care to involve all stakeholders in discussions to avoid
changes to plans at a late date.

Understanding a company’s IT governance processes and a customer’s particular goals for a project
can also help you to offer the correct solution to the customer. For example, sometimes customers
have relatively standalone projects: they are deploying a new application and want to get the
complete infrastructure required for that application without extensive efforts across siloed
management teams. Offer these customers HPE ConvergedSystems, which bring together the servers,
networking, and storage required for various applications—delivering a proven solution that is up
and running in a fraction of the time for a typical project.

Management and monitoring tools


Figure 2-6 Management and monitoring tools

Customers require tools that operate at several different levels, as shown in Figure 2-6. Element-level
tools manage and monitor a single component, such as one server. Resource pool tools manage
multiple resources, such as all of the customer’s storage arrays or servers. Finally, a solution stack
tool manages and monitors converged resources, including all of the compute, storage, and network
resources required for a solution.

All of these management tools can have a role to play within a customer’s overall processes. You will
learn about HPE solutions that you can recommend at each level.

IT processes and HPE solutions to transform them


As you meet with the customer, you should assess the level of standardization and automation that has
already been achieved with its IT processes, as well as the level that the customer wants to achieve.
You can help customers understand that they can avoid costly human errors, reduce IT operational
costs, and roll out applications more quickly by standardizing, automating, and aligning the
infrastructure with LOB requirements. You can then propose the HPE management solutions that
support the level of transformation that the customer wants. (You will learn more about these
solutions throughout this ebook.)
Chapter 2—Activity 1
To review the information you have learned, spend a few minutes completing an activity. You will
read about a fictitious company called Make Things Better (MTB) and try to uncover the company’s
pain points and identify key initiatives.

The situation
MTB is a large manufacturer of health products, pharmaceuticals, and consumer packaged goods.
The company’s tagline is “Products for a healthier and happier world.” It is perhaps best known for a
groundbreaking medication that slows the ravages of Alzheimer’s disease.

Headquartered in the New York metropolitan area, MTB comprises about 250 subsidiary companies
with operations in 63 countries and products sold in 172 countries. The company had worldwide sales
of $78 billion during 2013. (Note: All financial figures in this scenario are US dollars.) MTB
employs about 110,000 people worldwide.

MTB has two enterprise data center pairs, one pair in Pennsylvania, USA, some 35 miles (~56 km)
apart, and another pair in the Netherlands about 10 miles (~16 km) apart. It also has six regional data
centers, located in Brazil, Australia, Singapore, South Africa, India, and China. In addition to these
data centers, MTB has 160 remote locations and roughly 105 manufacturing sites all around the
globe, each with varying IT requirements.

You are a solutions architect with HiP Solutions, an HPE partner based in Brooklyn, New York, and
you have an ongoing relationship with MTB.

Pain points
Read about MTB’s pain points below and then answer the multiple choice question that follows.

Over the years, MTB has allowed its business units to shape their own IT solutions, even as it has
tried to wrap some global governance policies around IT in an effort to streamline operations and
improve the procurement process. Practically speaking, however, this has not worked, and MTB is
experiencing issues with its aging data center, such as outdated environments, nonstandard products,
different vendors, and a mismatch of tools. In addition, MTB’s subsidiary companies that manufacture
pharmaceutical products must comply with local laws and regulations.
1. Which statement best describes the insight you gained about the customer’s pain points?
a. Although MTB wants to improve its global governance of IT, it still has a massive distributed behavior.
b. Replacing MTB’s aging data center infrastructure with HPE solutions will ease the customer’s difficulties while allowing it to
continue functioning with separate business units.
c. MTB has reached the cloud-readiness stage, but needs help moving from a CAPEX to an OPEX model.
d. MTB’s number one priority is to document the different IT governance and procurement policies in various business units.

You can check your answer by referring to Appendix B: Answers to Activities.


HPC, R&D, and Big Data
Read about MTB’s HPC, R&D, and Big Data initiatives and then answer the following multiple choice
questions.

To address MTB’s top-line goal of speeding up manufacturing, the MTB research group is designing
a new manufacturing process that requires unusually high levels of speed and efficiency. This group
is driven by the innovations they have been able to realize from new compute capabilities. As the CIO,
Amrita Deva, said about the group’s activities, “There is no end to the insatiable demands for every
increment of HPC from this research group.”

The biggest challenge for MTB in this space is maximizing compute power within budget for capital
expenditures, personnel, and facilities. The group has looked at the Open Compute Project (OCP),
which is championed by Facebook and other large IT companies, but members are worried about the
commitment of low-cost manufacturers to a given product platform because MTB’s projects can last
for years.

You have lunch with a friend who is a scientist in the pharmaceutical field. She tells you casually that
the IT department of one of MTB’s local business units is looking to refresh its HPC environment.

Even if you do not know the details of a customer’s HPC environment, it is still possible to
demonstrate the value of an HPE solution. The HPE Hyperscale Business Value Calculator enables
you to compare by workload a density-optimized solution to a traditional rack system. Using a simple
comparison of an Apollo 6000 solution with a SuperMicro SD-5038ML-H8TRF with no
customizations, including list pricing, you can show 11% TCO savings over three years. This number
should be enough to pique the customer’s interest.
2. What does this information tell you about HPC and MTB? (Select two.)
a. MTB’s new manufacturing process and the lead from your friend tell you that HPC is a hot topic within MTB.
b. Only one or two MTB business units and operating companies are looking at HPC.
c. If you can bring HPC into the SDDC environment, a flexible pool of HPC resources might benefit MTB in general and also
allow for better control through centralization.
d. MTB may be interested in the HPC solution, but only because you have demonstrated that it may result in cost savings.

3. MTB’s big data environment for clinical trials currently resides on the Teradata platform. This
environment is starting to become a bottleneck, so Teradata has recently submitted a proposal
for expansion. One of your coaches told you that the proposal on the table is for $26M. What
can you do?
a. Explain that MTB would save money by switching to an all HPE platform.
b. Propose a solution that can offload data from the Teradata environment, allowing MTB to extend the life of the current
environment without performing a complete migration or paying Teradata a large amount of money.
c. Do nothing, as MTB is clearly invested in the Teradata platform and it would cost the company more money to integrate
another vendor’s solutions.

You can check your answers by referring to Appendix B: Answers to Activities.

Decision makers
Read about MTB’s key decision makers. Then answer the question.

Knowing that HPC is a hot topic with MTB, you are now entering discussions with key decision
makers. Before you meet, you review what you have learned about them.

You know that MTB has been in turmoil since the chief executive officer (CEO) resigned two years
ago for a career outside MTB. A new CEO joined the company 14 months ago and, as is customary,
brought along his friends, including the new CFO. The four top executives include:
– The CEO, Rick Jaggers, previously worked in the financial services sector. He is eager to prove
his value to the company, which has not had a breakthrough new drug for several years.
– Amrita Deva, the chief information officer (CIO), most recently worked for another large
pharmaceutical company. She is familiar with Teradata, EMC storage, HPE servers, and Cisco
networking. She has been working hard to bring global governance to disparate business units and
to standardize IT services.
– The chief financial officer (CFO) and economic buyer is Denzel Walker. His most recent role was
CFO for a financial services company, where he reported to Jaggers. He has mentioned several
times that he has stepped into a big job, imposing restrictions on ballooning R&D budgets and
trying to sort out why some projects are consuming budget but producing fewer results.
– Janet Choi, the chief technology officer (CTO), previously worked at one of the Big Four
consulting companies. She is a personal friend of Walker’s and strongly favors HPE servers,
storage, and networking solutions. She is a key driving force behind initiatives for speeding up
development and manufacturing with HPC and is frustrated that she cannot get working
environments up and running as quickly as she wants.
4. Which question is most appropriate for each decision maker? (Match the question to the
decision maker.)
a. Jaggers (CEO)
b. Deva (CIO)
c. Walker (CFO)
d. Choi (CTO)
__Could you tell me more about how developers are using HPC? What do they do when they cannot get the compute resources
they need to run a job?
__What is the biggest stumbling block stopping IT from deploying HPC environments that meet manufacturing’s insatiable
demands at the pace they require?
__A year from now, what do successful R&D and manufacturing departments look like to you? How will they be using HPC to
get products on shelves more quickly?
__I am hearing R&D and manufacturing say that they need more HPC compute power to finish their projects. Would you be
interested in giving them that without expanding the data center physical footprint and power costs?

You can check your answers by referring to Appendix B: Answers to Activities.

Facilities requirements
You will now consider what you need to know about the facilities to ensure that the products that
comprise the solution can be delivered, moved to the data center, and installed successfully.
Discuss requirements with facilities manager
You need to call or meet with the facilities manager and go over the requirements for delivering the
equipment. Ask questions such as:
• What information do you need me to provide?
• What is the process for delivering large shipments?
• What is the delivery address?
• Are there packaging restrictions, such as size and weight limitations?
• Is the dock large enough for a semitrailer?
• Who will meet you onsite for the delivery?
• How will the equipment be moved from receiving to the data center? What route will be taken?

Carefully document the answers to these questions and send this document to the facilities manager to
validate the information.

Site survey
You should conduct a site survey. Ideally, ask the facilities manager to meet you onsite and walk you
through the path that the movers will take to move the products from receiving to the data center. If
you can use an elevator, how big is the elevator? Is there a freight elevator? Will the products fit
inside the elevator? If you must move the equipment up or down stairs, how wide are the stairs? How
sharp are the turns?

Measure doorways: How high are they? How wide?

What time can the equipment be moved? Does it need to be moved after work hours?

How many people are required and what equipment is needed to move the products?

Site survey: data center requirements


You should visit the data center and ensure that it meets the requirements for the solution you are
proposing. You need to consider power requirements and environmental requirements. You also need
to determine where the new products will be housed and how they will be arranged. Is there enough
room? Will you need to remove legacy products before you can install the new ones? How will that
migration happen?

What safety regulations must be followed when the equipment is being moved and installed? What are
the security regulations? Do you need a temporary access card to get into the data center?

Data center facility availability tiers


Figure 2-7 Data center facility availability tiers

You might also need to understand the availability level for which the data center facility is designed.
The Telecommunications Industry Association (TIA) has defined a standard for data center design,
TIA-942, which includes architectural, security, electronic, mechanical, and telecommunications
requirements (see Figure 2-7). TIA-942 defines the availability levels for the infrastructure systems
that support servers, such as the power and cooling systems, with four tiers. (The 2014 release of
TIA-942 replaced the term “tier” with “rating”; however, you might still encounter customers who
use the “tier” terminology.)
• Tier 1 provides one distribution system and no redundant components.
• Tier 2 provides one distribution system with redundant components.
• Tier 3 provides multiple distribution systems, only one of which is active, and also redundant
components.
• Tier 4 provides multiple active distribution systems, each with redundant components.

Tier 1 and Tier 2 systems are subject to both planned and unplanned downtime (Tier 2 systems are
less so, due to the redundant components). Tier 3 and Tier 4 systems do not require planned downtime
for system maintenance. Tier 3 systems are still vulnerable to some unplanned downtime, while Tier 4
systems are protected against at least one worst-case event.

Different systems within a data center can meet the requirements of different tiers. The data center’s
tier, based on the system with the lowest rating, defines a guaranteed availability level:
• Tier 1 = 99.671% or annual downtime of 28.8 hours
• Tier 2 = 99.741% or annual downtime of 22 hours
• Tier 3 = 99.982% or annual downtime of 1.6 hours
• Tier 4 = 99.995% or annual downtime of 0.4 hours
Chapter 2—Activity 2
In this activity, you will consider the power and environment requirements that you should determine
before designing a solution. You will further consider what you need to ask the IT director about the
placement and arrangement of the products in the data center. Finally, you will consider what you
need to know about the company’s safety and security regulations.

You are performing a site survey to ensure the site is ready for the new HPE equipment. You must
understand the power and environment requirements for the solution you are recommending and
ensure the customer’s data center can meet those requirements. You must also know where the new
solution will be housed in the data center, and follow the safety and security regulations when you
install the new solution.
1. What power requirements must you consider when installing a new solution in the data center?
2. What environmental requirements must you consider when installing a new solution in the data
center?
3. What should you ask about the placement and arrangement of the products in the new solution?
4. What should you ask about the safety and security regulations?

You can check your answers by referring to Appendix B: Answers to Activities.

Find specific requirements for HPE products

Figure 2-8 Find specific requirements for HPE products

It is important to carefully consider the unique requirements of the equipment you are designing for.
To find these requirements—which vary from product to product—visit the HPE Information Library
at http://h17007.www1.hpe.com/us/en/enterprise/servers/solutions/info-
library/index.aspx#.VtCVkPIrKUk.

Here, you can find user and installation guides for a specific product or solution, which include site
considerations and setup requirements. Figure 2-8 shows a sample of listings for Servers &
Management software.

Summary
This chapter has given you strategies for discussing customer requirements, such as business
continuity and availability, with key stakeholders. You have also considered the customer’s IT
management processes and the need to assess how ready the customer is for transformation. Finally,
you have considered the logistical considerations you must take into account for delivering, moving,
and installing the solution.

Learning check
Review what you have learned by answering the following questions. Then check your answers in
Appendix A: Answers to Learning Checks.
1. What is most likely to be a concern for an LOB manager?
a. That IT solutions follow best practices
b. That IT solutions meet security standards
c. That IT solutions meet their tactical requirements
d. That IT solutions support automated patch management

2. If a server provides 99.999% availability over a year, how much unplanned downtime can it
experience?
a. 26.3 seconds
b. 5.3 minutes
c. 44 minutes
d. 8.7 hours

For answers, See Chapter 2 in Appendix A.


Chapter 3 Advanced Architecture for Server
Solutions

EXAM OBJECTIVES
• Analyze the special needs of data, High-Performance Computing (HPC), and mission-critical
workloads
• Given a customer’s specific requirements, architect a solution for data, HPC, and mission-critical workloads

Assumed knowledge
Before reading this chapter, you should have a basic understanding of the following:
• Design concepts such as server-to-storage ratio and scale-out deployments
• Server components, including processors, DDR3 and DDR4 memory, hard disk drives (HDDs), solid-state drives (SSDs), and RAID levels for storage volumes
• HPE ProLiant rack and blade servers and options for them such as HPE Smart Array Controllers
• HPE BladeSystems, including interconnect modules and Virtual Connect (VC) modules
Chapter topics
In this chapter, you will analyze the requirements for data-driven organizations and learn how to
architect solutions to meet these requirements. You will also consider requirements for HPC and
mission-critical applications and review solutions for each one.

Architecture for data-driven organizations


In this section, you will look at the variety of workloads that have emerged in data-driven
organizations, from traditional relational SQL databases to massive object storage solutions. You will
consider the unique needs of each workload and learn about general strategies for meeting those
needs. This discussion will lay the groundwork for later chapters, when you learn in more detail how to
architect the appropriate HPE server solution to meet the needs of various applications and
workloads.

Data management challenges


Knowledge is power: data has become a key resource and potential revenue generator for almost
every industry. But companies find it difficult to harness the complex and vast amounts of big data.
Industry analysts characterize the emerging world of big data in three ways:
• Volume—Data has been growing exponentially for years, and it promises to continue to do so. Data
is transforming from being counted in terabytes to being counted in zettabytes, and analysts estimate
that people will have generated 40 ZB of data by 2020. To deal with the challenges of volume,
customers require scalable solutions.
• Velocity—Users are constantly generating new data from which companies need to extract real-
time value. Enterprises do not have the luxury of replicating different types of data from globally
distributed sources into a single data warehouse and processing it every night. Cost and time
limitations drive companies to process data more efficiently and even in real-time in some cases.
Only in this way can the company gain true business value from its data. High-velocity data
demands high-performance technologies, systems that can process data instantly in shrinking time
windows, and systems that can scale on demand.
• Variety—Adding to the complexity of managing big data is its variety. In addition to structured
data in relational databases, companies must store millions and even billions of unstructured data
objects.

Structured data is organized in a way that facilitates automated processing and searching.
Unstructured data is not organized and does not facilitate automated processing or searching. Such
data is becoming more and more common; in fact, roughly 85% of data today is unstructured.
Examples of unstructured data include voice mail, memos and other correspondence, meeting notes,
image files, audio and video files, and email messages.

Companies need solutions that can store each type of data efficiently.

Some analysts add other Vs, including veracity (the accuracy of data, added by IBM) and value.
However, most focus on volume, velocity, and variety.
Note that these data challenges apply not only to big data, usually associated with Hadoop, but to all of
a company’s data assets. The following sections discuss many types of data workloads.

Scaling models required by different workloads

Figure 3-1 Scaling models required by different workloads

To fully leverage all of its data assets, a company must use the technology best suited for that asset’s
value and data personality. You can classify data applications in two broad categories: ones that
require scale-out compute and ones that require scale-up compute, as shown in Figure 3-1.

Scaling out involves deploying a high density of less powerful servers. The scale out model has
become popular in recent years because it often provides more flexibility and allows companies to
grow in a more cost-effective manner.

However, scale-up solutions still have a role to play for the right applications and workloads. These
solutions use powerful servers with a large number of processors, a large memory capacity, and
perhaps a great deal of storage. They deliver high performance, high availability, high reliability, and
disaster tolerance for mission-critical workloads.

Parallelization on scale-up and scale-out systems


Figure 3-2 Parallelization on scale-up and scale-out systems

Parallelization lets an application take advantage of the compute resources that are available in either
a scale-up or a scale-out solution. Some applications are easily parallelized while others can only be
partially parallelized, if at all. For example, a big data analytics application might need to analyze
millions of records to find each record that mentions a specific word. The application can easily split
the task into smaller tasks, each of which analyzes a different set of records. This type of task is called
embarrassingly parallel. If, however, the application then needs to total the number of mentions, this
final part of the task cannot be parallelized because it depends on the completion of previous tasks.

Application designers can make demanding applications run more quickly on scale-out systems using
distributed computing in which a resource scheduling mechanism divides a process into multiple jobs
and assigns jobs to different servers. In later sections about data workloads that require scale-out
compute, you will see the different approaches that applications can take to distributing tasks to nodes.

A scale-up system has many processors with multiple cores, each core essentially being a processor
that shares memory with other cores on the same processor. Even a server in a scale-out system might
have two processors and several cores on each processor.

Multi-threading lets the server take advantage of all of these processors and cores.

A workload essentially consists of a series of operation instructions that a processor core executes in
order. Instruction level parallelization, illustrated in Figure 3-2, lets the server assign some of the
operations in the series to different processor cores to execute in parallel. The server can only do this
when the results of the operations do not affect each other. Therefore, instruction level parallelization
can only take advantage of a limited number of processor cores.

An application running on a single system can implement a higher degree of parallelization through multi-threading. The application process creates multiple threads for each part of the task, and each thread executes on a different processor core. (Time slicing lets multiple threads share a core, but each thread then takes longer to execute.) For example, a SQL database assigns a different worker thread to handle queries from each concurrent user.

Different applications support different levels of multi-threading, depending on how easily parallelized they are, as well as on how developers choose to program the application.
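
As a rough illustration of the multi-threading model, the hypothetical Python sketch below hands concurrent requests to a pool of worker threads, much as a database assigns a worker thread to each concurrent user; the query function is a placeholder, and note that in CPython the global interpreter lock limits how much CPU-bound work threads can truly run in parallel.

```python
# Minimal sketch of multi-threading: a pool of worker threads handles
# concurrent "queries" so that each request can be serviced independently.
# The query function and the request data are hypothetical placeholders.
from concurrent.futures import ThreadPoolExecutor

def run_query(user_id):
    # Stand-in for work done on behalf of one concurrent user.
    return f"result for user {user_id}"

with ThreadPoolExecutor(max_workers=8) as executor:       # one worker thread per request slot
    futures = [executor.submit(run_query, uid) for uid in range(32)]
    results = [f.result() for f in futures]

print(len(results), "queries answered")
```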

Modern data applications and workloads that require scale-out compute


Figure 3-3 Modern data applications and workloads that require scale-out compute

Now that you have an idea about some of the differences between scale-out and scale-up models, you
will examine data applications that require scale-out compute (shown in Figure 3-3).

Each application is best served by a specific technology:


• Object storage allows companies to achieve massive content storage.
• Virtualized storage helps to make block and file storage more cost effective and simpler to
manage.
• Hadoop is designed for analyzing large volumes of unstructured data, commonly called big data.
• Not only SQL (NoSQL) databases provide simple, flexible databases for large amounts of unstructured data.

Each technology is optimized across scale, performance, and cost efficiency attributes to deliver a specific value proposition. The following sections explain each technology in more detail, characterizing the workload and explaining how scale-out compute best meets the workload needs by providing:
• Distributed performance that can be aggregated across scale-out building blocks
• Density optimization that reduces the data center footprint and power consumption
• Direct attach storage (DAS) for eliminating storage complexity and achieving better performance
• Configuration flexibility to reconfigure storage and compute ratios as necessary (as you will learn throughout this ebook)

Comparing block, file, and object storage


Figure 3-4 Comparing block, file, and object storage

Before you look at technologies for supporting block, file, and object storage, take a moment to
compare these types of storage that you see in Figure 3-4.

Block storage

Block storage allows devices such as servers or virtual machines (VMs) to access data on remote disk
arrays at the block level. A storage area network (SAN) connects the devices together through a technology such as Fibre Channel (FC), a networking technology separate from Ethernet; FCoE, a technology that encapsulates FC for transmission over Ethernet; or iSCSI, a TCP/IP protocol that can run on Ethernet. The accessing device is called an initiator; controllers for the disk arrays are called targets.
The SAN helps an initiator discover targets and logical unit numbers (LUNs) on those targets, a LUN
being a part of a disk drive, a disk drive, or a RAID set.

To the initiator, the LUNs appear as local drives, which it can access at the block level. The FC, FCoE, or iSCSI array stores raw storage volumes, and the initiator imposes the file system. Block storage provides high performance for use cases such as providing the boot image for VMs.

File storage

File storage, or network attached storage (NAS), follows a client/server model. A NAS server hosts
the data, as well as a file system for that data. The NAS itself might store the data on direct attached
storage (DAS) or block storage accessed through a SAN. When NAS clients connect to the NAS
server, their OS views the file system as a mounted volume. When the client needs to read or write to
a file, it must send the request to the NAS server, and the client interacts with the data at the file level,
rather than the block level. The NAS server is responsible for serving the data and ensuring
consistency as multiple clients connect.

File storage does not provide as high a performance as block storage, but it is better suited for
situations in which multiple clients need to share access to a file system. Traditionally, NAS only
supports a single server or two servers acting in an active/passive failover design, so the NAS server
could be a bottleneck for read and write IOPS. NAS clusters help to alleviate this issue.

Object storage
Object storage also follows a client/server model. A cluster of object servers stores data as generic objects over an underlying file structure. (You will learn more about objects in the next section.) Clients can read and write data at the object level. Generally, a client application performs the reads and writes rather than the OS treating the data as files on a mounted volume. Therefore, applications must be aware of the object storage solution. However, some object storage solutions allow the OS to view objects as files on a mounted volume.

Object storage for mass content: Object definition


You will now look at object storage in more detail, examining how it meets the needs for storing
massive amounts of content.

An object provides a flexible way to store data of any type or size. This unstructured data might be
voice mail, memos and other correspondence, meeting notes, image files, audio and video files,
email messages, or any other type of data. In addition to the data, the object includes metadata, which
provides contextual information about the object. The customizable metadata might specify
information that helps to index the object, that informs clients about the object’s usage, that marks the
data as confidential for a specific user or security group, and so on. Each object also has a unique ID.
Unlike files in a file system, objects are stored without any hierarchy, making the flat object store very scalable.

Object storage for mass content: Example architecture (OpenStack Swift)

Figure 3-5 Object storage for mass content: Example architecture (OpenStack Swift)

OpenStack is an open source system for providing infrastructure as a service (IaaS) cloud computing.
The OpenStack Swift component provides cloud-based object storage. In the Swift architecture
(illustrated in Figure 3-5), a cluster of object storage servers hosts the storage devices, which are
generally DAS. Swift maps these devices into a ring, which does the following:
• Lists the location of each storage device (the IP address and TCP port for the object server and the
physical device ID)
• Maps partitions to the storage device that should hold that partition—More precisely, the ring maps
a replica of a partition to the storage device; each partition has three replicas (by default) on
different devices.
• Specifies the length for hashes—When a server needs to determine the partition for storing an
object, it hashes the object ID, which produces the ID for the partition to be used.

In short, the ring provides object storage servers with all the information that they need to replicate objects and distribute them across one another.

All client requests for objects begin with a request to a proxy server for the object location. The
proxy server also stores the ring so that it can inform the client of the location. The client then reads
and writes to the object through a direct interaction with the object server.

Swift further defines containers that store listings for the objects. The container can define policies
for storing data such as how data is replicated, allowing organizations to set up different tiers of
service. Accounts store lists of containers. An account corresponds to a tenant, permitting multi-tenant
cloud solutions.
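
As a concrete illustration, the following hypothetical sketch uses the python-swiftclient library to store and retrieve an object; the authentication URL, credentials, container name, and metadata are placeholders, not values from a real deployment.

```python
# Hypothetical sketch of a client interacting with Swift object storage via
# the python-swiftclient library; the auth URL, credentials, and container
# name are placeholders for illustration only.
from swiftclient.client import Connection

conn = Connection(
    authurl="http://proxy.example.com:8080/auth/v1.0",   # proxy server handles auth and lookup
    user="tenant:analyst",
    key="secret",
)

conn.put_container("meeting-notes")                      # a container groups related objects
conn.put_object(
    "meeting-notes",
    "2016-04-notes.txt",
    contents=b"Quarterly planning notes...",
    content_type="text/plain",
    headers={"X-Object-Meta-Department": "R&D"},         # custom metadata travels with the object
)

headers, body = conn.get_object("meeting-notes", "2016-04-notes.txt")
print(headers.get("x-object-meta-department"), len(body), "bytes")
```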

Several object storage solutions exist, each with its unique architecture. However, most solutions have
many features in common with the Swift architecture, including a map that helps replicate and
distribute objects across many storage servers, proxy servers for informing clients of the location of
objects, and the ability for clients to send object requests directly to object storage servers.

As this description has made clear, object storage is optimized for extreme scale. Object storage
servers use relatively simple, cost-effective DAS. They follow a simple model for distributing data—
and object delivery services—across many servers, making a density-optimized, scale-out solution a
natural fit.

Block and file storage: Virtualized block storage

Figure 3-6 Block and file storage: Virtualized block storage

Virtualized block storage, shown in Figure 3-6, can offer a more cost-effective solution than a
traditional SAN. A traditional SAN storage array that has adequate capacity can be relatively
expensive, and establishing a SAN can be quite complex. HPE StoreVirtual makes it possible for
customers to replace SAN storage arrays with cost-effective servers with DAS. The StoreVirtual
solution enables the servers to act as iSCSI targets, providing block storage to initiator servers or
VMs over an Ethernet network.

Block and file storage: Virtualized file storage

Figure 3-7 Block and file storage: Virtualized file storage

Similarly, HPE StoreEasy helps to make NAS simpler and more cost-effective, as you see in Figure
3-7.

A standalone NAS server has to scale up expensive hardware and still has the potential to become a
bottleneck. HPE StoreEasy provides NAS clustering so that relatively cost-effective and simple to
manage scale-out servers can serve a large number of NAS clients. For storage, these servers can use
an HPE StoreVirtual solution or a block storage array; the StoreVirtual option delivers a solution
built on cost-effective DAS.

Hadoop 2 for unstructured data analytics: Apache Hadoop 2 architecture

Figure 3-8 Hadoop 2 for unstructured data analytics: Apache Hadoop 2 architecture

You should understand a bit about how Hadoop is architected (see Figure 3-8) so that you know which
HPE solutions to position for various components in a big data solution.

At the foundation of the architecture lies the data itself. Hadoop Distributed File System (HDFS)
handles distributing the data across storage nodes in a variety of file formats, including CSV, JSON, Optimized Row Columnar (ORC), and Parquet files. HBase might run on top of the file
system, organizing the data into a columnar map or NoSQL database. Data processing applications
such as MapReduce2 (MR2) applications query the file system or database in order to complete data
analysis jobs.

Originally, MapReduce was the only framework for Hadoop data processing applications, and it was
responsible for scheduling analysis jobs as well as completing them. In Hadoop 2, Yet Another Resource Negotiator (YARN) has taken over the scheduling functions. In addition, YARN permits the
integration of other frameworks for data processing applications into the same resource scheduling
framework.

You will now look at the components of this architecture in more detail.

Hadoop 2 for unstructured data analytics: HDFS

Figure 3-9 Hadoop 2 for unstructured data analytics: HDFS

HDFS runs on a cluster of storage or data nodes. Designed for a scale-out approach, HDFS distributes
the vast number of files required for a big data solution across multiple data nodes. To provide fault
tolerance for single nodes, HDFS replicates data, typically by a factor of three. In other words, each
file is stored on three different nodes, as shown in Figure 3-9.

The nodes work together to handle data replication, reads, and writes. A name node stores the file system metadata. When an application needs to access a file, it contacts the name node, which tells
the application which data nodes store the file. Because the responsibility for responding to requests
to read and write files is distributed across many nodes, the solution scales well.
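
The hypothetical sketch below shows this interaction from a client's point of view using the third-party HdfsCLI (hdfs) Python package, which talks to the name node over WebHDFS; the name node URL, user, and paths are assumptions for illustration.

```python
# Illustrative sketch using the HdfsCLI ("hdfs") package, which contacts the
# name node over WebHDFS; the name node URL, user, and paths are hypothetical.
from hdfs import InsecureClient

client = InsecureClient("http://namenode.example.com:50070", user="hadoop")

# The client asks the name node for metadata, then streams blocks
# to and from the data nodes that actually hold the file.
client.write("/data/clickstream/2016-04-01.csv",
             data="user,item,timestamp\n", overwrite=True)

with client.read("/data/clickstream/2016-04-01.csv") as reader:
    header = reader.read()

print(client.list("/data/clickstream"))
```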

YARN Hadoop 2 for unstructured data analytics

Figure 3-10 YARN Hadoop 2 for unstructured data analytics


Figure 3-10 illustrates that data processing or data analytics applications run on compute nodes using
data in the HDFS.

Figure 3-11 YARN Hadoop 2 for unstructured data analytics

These applications are designed for parallel processing, so they require a solution to assign pieces of
a task to available nodes, shown in Figure 3-11. YARN and the application running on YARN work
together to provide that solution. The YARN Resource Manager schedules the running of an
application with a Node Manager on a Hadoop compute node. The Node Manager creates a container
with the CPU, memory, and bandwidth resources allocated for the task. The container runs an Application Master, which is specific to the particular YARN application. The Application Master is responsible for dividing the task into pieces that it can assign to other compute nodes. Those
compute nodes’ Node Managers check with the YARN Resource Manager to determine which
resources they can allocate to the task.

Hadoop 2 for unstructured data analytics: YARN applications

Figure 3-12 Hadoop 2 for unstructured data analytics: YARN applications

This architecture will make more sense when you consider an example, illustrated in Figure 3-12.
MR2 is a common model for applications. A retail company might have an MR2 application that helps
it to analyze customer shopping patterns. This application could run a query about the average
number of days until shoppers made their next purchase, based on what product they originally
purchased.

The first step of the job would involve mapping. Every compute node would be assigned a series of
customer records to analyze. The compute node would fetch those records from storage and make a
result file that maps keys (in this case, product bought) with values (in this case, the number of days
until the next purchase).

For the next part of the analysis, the compute nodes need to shuffle their results such that the same
compute node has the results for all the same keys. For example, one compute node might be assigned
all of the results for books. This step is required so that the compute node has all necessary
information for running its reduce step—in this case, totaling the number of days until the next
purchase after all book purchases and then calculating an average.
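
The following plain-Python sketch mimics the map, shuffle, and reduce phases of this example; it is not MR2 code, and the purchase records are invented.

```python
# Plain-Python sketch of the map/shuffle/reduce flow described above; this is
# not MR2 code, and the purchase records are hypothetical.
from collections import defaultdict

purchases = [                      # (product bought, days until next purchase)
    ("book", 12), ("music", 30), ("book", 20), ("book", 4), ("music", 10),
]

# Map: each compute node emits (key, value) pairs from its slice of records.
mapped = [(product, days) for product, days in purchases]

# Shuffle: pairs with the same key are routed to the same node.
shuffled = defaultdict(list)
for product, days in mapped:
    shuffled[product].append(days)

# Reduce: each node aggregates the values for its keys.
averages = {product: sum(days) / len(days) for product, days in shuffled.items()}
print(averages)   # e.g., {'book': 12.0, 'music': 20.0}
```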

Apache Spark is another general framework for applications that use parallel computing to process
large data sets. Spark can run processes similar to MapReduce, as well as other types of processes,
using a general framework in which applications apply actions to collections called resilient
distributed datasets (RDDs). The application creates these RDDs by pulling in HDFS files or by
transforming other RDDs with filters. Spark is optimized to speed up processing for frequently
accessed, “hot” data, which is placed in the compute node’s memory. For some purposes, Spark can
operate much more quickly than MapReduce applications, particularly for in-memory processing.
Spark can use YARN as its resource management component and draw on data stored in HDFS. It can
also use different resource managers such as Apache Mesos.
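
A hypothetical PySpark sketch of the same analysis might look like the following; the HDFS path and the master setting are placeholders, and in practice such a job is typically launched with spark-submit.

```python
# Hypothetical PySpark sketch of the purchase-gap analysis using RDDs; the
# HDFS path and YARN master setting are placeholders.
from pyspark import SparkConf, SparkContext

conf = SparkConf().setAppName("purchase-gaps").setMaster("yarn")
sc = SparkContext(conf=conf)

# Each line is assumed to look like: product,days_until_next_purchase
lines = sc.textFile("hdfs:///data/purchases/*.csv")
pairs = lines.map(lambda line: line.split(",")) \
             .map(lambda cols: (cols[0], (float(cols[1]), 1)))

pairs.cache()   # keep the frequently accessed, "hot" RDD in memory across actions

sums = pairs.reduceByKey(lambda a, b: (a[0] + b[0], a[1] + b[1]))
averages = sums.mapValues(lambda s: s[0] / s[1])

print(averages.collect())
sc.stop()
```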

As you see, different YARN applications might require relatively more or fewer processing and memory
resources. Those resources are distributed across many nodes for extreme scalability and efficiency.
Throughout this ebook, you will learn about flexible approaches for designing the right relationship
between compute and storage.

Simple NoSQL databases

Figure 3-13 Simple NoSQL databases


HDFS is designed primarily to deliver files to data processing applications that analyze “cold” data
(data that is stored over time and accessed relatively infrequently) and that do not need to return
immediate results. However, some applications need to produce results more quickly. In addition,
rather than analyzing long sequences of records, some applications might need random access—the
ability to read and write to different parts of files.

Rather than run directly on HDFS, an MR2, Spark or other data processing application can run on a
database. That database helps to organize the data within a file system, and the proper type of database
can significantly speed up operations for applications that require low latency read/writes to files.

You are probably familiar with SQL databases, the primary form of relational databases. A relational
database consists of rows, each of which is called a record or a key, and columns with values. For
example, a record might be a customer account. Columns could be customer name, email address,
pending transaction IDs, and so on. In a relational database, every row has a value for every column.

Relational SQL databases work with structured data, but Hadoop is designed for handling
unstructured data. In addition, relational databases are optimized for reading and writing to a record
as individual transactions.

NoSQL databases are designed to organize structured and unstructured data. NoSQL databases can
take different forms, but in essence, they map a row, or a key, to values in a less rigid way than
relational databases. Take HBase, the NoSQL database for Apache Hadoop, as an example (see Figure
3-13). Based on Google BigTable, HBase allows developers to create flexible tables (or maps) that
meet their needs. The table has fixed column families, which organize related columns together.
However, new columns can be freely added, and a row (or a key) can have values for whichever
columns the developer chooses. The table is stored sparsely, meaning that when a row does not have a
value for a column, the column simply does not exist for that row, and no space is consumed for it.

A NoSQL column-oriented database is optimized for analytics, so you will often encounter customers
who require such databases as part of their big data and analytics solution.

The HBase database is distributed across HBase Region Servers, each of which holds part of the
database and is responsible for handling read and write queries to that part of the database. The
Region Server operates as much as it can in memory, so the compute node that runs a Region Server
requires generous memory. This scale-out approach provides extreme scale and efficiency.
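
The hypothetical sketch below uses the happybase Python library, which accesses HBase through its Thrift interface, to show how rows store only the columns they need; the host, table, and column family names are assumptions.

```python
# Illustrative sketch using the happybase library (HBase's Thrift interface);
# the Thrift host, table name, and column families are hypothetical, and the
# table is assumed to already exist.
import happybase

conn = happybase.Connection("hbase-thrift.example.com")
table = conn.table("customers")

# Columns live inside fixed column families ("profile", "orders" here), but
# each row stores only the columns it actually has values for.
table.put(b"cust-001", {
    b"profile:name": b"Ana",
    b"orders:last_purchase": b"book",
})
table.put(b"cust-002", {
    b"profile:name": b"Raj",          # no "orders" columns: nothing is stored for them
})

row = table.row(b"cust-001")
print(row[b"profile:name"], row[b"orders:last_purchase"])
```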

Cassandra, another example of a NoSQL database, differs in some ways, but similarly consists of
rows with values for a flexible number of columns that are organized into column families (called
tables in Cassandra). Cassandra can run on HDFS or on a different file system such as Cassandra File
System (CFS). And Cassandra also distributes parts of the database across compute nodes.

Design decisions for Hadoop and NoSQL databases that run on HDFS
Now that you understand the characteristics for each scale-out application’s workloads, you can look
at the design decisions for some of the more complicated applications. For Hadoop and NoSQL
databases that run on HDFS, you must choose:
• Whether the data and compute nodes are colocated or not
• How to plan the ratio of compute resources versus storage resources

You will examine each of these decisions in the next few sections.

Traditional architecture: Colocated compute nodes and storage nodes

Figure 3-14 Traditional architecture: Colocated compute nodes and storage nodes

Traditionally, Hadoop has operated under the principle of bringing compute to storage for data
processing. That is, compute nodes are colocated on the data nodes in the form of servers with direct
attach storage (DAS), as shown in Figure 3-14. A YARN application can then assign a piece of a job to
a node that stores the data for that job locally.

This architecture made sense in the past when network bandwidths did not allow moving remote data
to compute nodes in a timely manner. Remember: processors can operate on data most quickly when
the data resides in memory, and next most quickly when it is on local storage. Remote storage
traditionally provides the slowest access.

However, this architecture has led to many inefficiencies as companies’ data and data analysis needs
have expanded. Colocating the compute nodes for an application with the data nodes constrains the
data to that application. However, a company’s needs are rarely met by one application. Therefore,
isolated clusters with the same data proliferate, with one running a MapReduce application, another
running Apache Spark, and so on. Companies are already dealing with data explosions, and this
inefficient model leads to unnecessary duplication, expense, and management complexity.

In addition, the traditional architecture treats compute and storage as one unit, so the two are forced to
scale together. Traditionally, IT has scaled solutions with one spindle, or disk drive, per processor
core. However, some workloads are computationally intense and would benefit from more cores per
drive. Some applications such as Apache Spark applications might benefit from more memory. Other
applications might benefit from more compute power. With the traditional architecture, you cannot
design to meet these particular needs.
HPE Big Data Reference Architecture: Optimized compute nodes and
storage nodes

Figure 3-15 HPE Big Data Reference Architecture: Optimized compute nodes and storage nodes

HPE has discovered that modern Hadoop applications experience better performance when compute
nodes and data nodes are separated, as shown in Figure 3-15. This model allows you to design a
compute layer with HPE servers that are optimized for intense data processing and analytics
workloads. Now you can select servers that have the compute or memory resources that the particular
analytics application requires—without worrying about the server’s storage capacity. Equally, you
can design a storage layer with HPE servers that are optimized for storing and delivering data. A high-speed 10 GbE fabric provides enough bandwidth that bringing data to the compute nodes does
not interfere with performance—in fact, this fabric can provide higher bandwidth than some local
storage subsystems.

This model returns flexibility and scalability to the data center. If the customer requires more
compute power, you can add compute nodes to the solution. If the customer’s data expands, you can
scale the storage nodes. Perhaps even more crucially, you can avoid creating isolated clusters. If the
customer has multiple analysis applications, you can plan a cluster of compute nodes for each while
allowing the clusters to share the same storage.

HPE has also found in testing that this model can enhance performance, increasing read IOPS by as
much as 30%.

How to plan the ratio of compute to storage


With the HPE Big Data Reference Architecture, you now have the choice for how to balance compute
resources and storage resources. The traditional Hadoop guidelines—about one core per drive—can
give you a starting point for planning. However, you will need to consider the particular needs of
your customer.

Factors that affect the ratio of compute to storage include


• The type of analytics that the customer intends to use
• The number of applications and jobs that the solution must handle
• How quickly the customer requires results

How to balance compute resources versus storage resources: Application


requirements

Figure 3-16 How to balance compute resources versus storage resources: Application requirements

You can classify data analysis tasks into two categories: CPU bound or IO bound tasks (shown in the
table in Figure 3-16). When most tasks are intense CPU bound ones, you want a higher ratio of
compute to storage node. If tasks are IO bound, you can have a more traditional balanced compute to
storage node ratio. For the IO bound tasks, if you are using the HPE reference architecture with
separated compute and storage nodes, keep in mind the need for 10 GbE speeds between compute and
storage nodes.

Certain Hadoop frameworks for applications also require relatively more compute or memory
resources. These include
• Hive—Hive is a data warehouse that acts much like a structured, SQL database built on top of
HDFS. Hive provides metadata and indexing that can help to speed analyses and queries.
• Spark—Spark, as mentioned previously, is an alternative application framework to MapReduce,
optimized for faster, more random queries.
• Solr—Solr provides indexing and searching for data in HDFS.

How to balance compute resources versus storage resources: Usage


requirements
Also consider how the customer plans to use data analysis. Is the customer’s big data solution primarily for archival with occasional analysis jobs? If so, you can plan a lower compute to storage ratio. Or does the customer plan to run many queries and analysis tasks at once? In the latter case, you must raise the compute to storage ratio so that enough compute nodes are available to run the processes.

Discuss, too, how quickly the customer needs results. The more quickly results are required, the more compute power and memory per TB you must provide.
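
As a back-of-the-envelope illustration of this kind of sizing, the following Python sketch estimates node counts from a few assumed ratios; the numbers are placeholders, not HPE sizing guidance.

```python
# Back-of-the-envelope sizing sketch; the ratios and hardware capacities are
# illustrative assumptions, not HPE sizing guidance.
import math

raw_data_tb = 300
replication_factor = 3                 # HDFS default
cores_per_tb = 1.0                     # raise this for CPU-bound analytics
drives_per_storage_node = 12
drive_capacity_tb = 4
cores_per_compute_node = 24

usable_tb_needed = raw_data_tb * replication_factor
storage_nodes = math.ceil(usable_tb_needed / (drives_per_storage_node * drive_capacity_tb))
compute_nodes = math.ceil((raw_data_tb * cores_per_tb) / cores_per_compute_node)

print(f"Storage nodes: {storage_nodes}, compute nodes: {compute_nodes}")
```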

Modern data applications and workloads that require scale-up compute

Figure 3-17 Modern data applications and workloads that require scale-up compute

You will now turn your attention to data applications and workloads that require a scale-up approach.
As you see in Figure 3-17, these include structured databases used for business transactions, as well as
in memory databases. Both of these types of databases require the extreme performance, high
availability, reliability, and disaster tolerance provided by a scale-up approach.

Structured database
Structured or relational databases consist of related tables. A table includes rows (which are called
records) and fixed columns (which define parameters for those records). For example, a relational
database might store customer records for a retail organization. Columns might include first name,
last name, phone number, and so on. Every record has a value for every column.

Applications can read and write data to the relational database using Structured Query Language
(SQL). SQL databases are by far the most common form of structured, relational database.

Customers often use structured databases for business operations. These databases must support
complex online transactional processing (OLTP). They might also be used for complex online
analytic processing (OLAP). The next sections describe these workloads in more detail.

Because the business operations supported by OLTP are often mission critical, the databases require a high-performance infrastructure optimized for high availability, disaster tolerance, and business continuance.

Structured database: OLTP


Applications can interact with databases in two ways: using online transaction processing (OLTP) or
online analytics processing (OLAP). You will examine OLTP first. OLTP or transactional applications
involve small, simple insert and delete operations to structured databases. A user making a purchase
from an online retailer is an example of an online transaction. Data entry is another example.
OLTP databases typically have high-performance demands. The process might divide into many
threads to handle many users and their transactions with the database concurrently. The application
must be responsive and able to read and write data quickly because users typically interact with it in
real time. Therefore, the multiple threads benefit from multiple processor cores, speeding their
response time. Because OLTP applications are multi-threaded and must maintain data consistency,
they work well with a scale-up model. In addition, they relate to critical business operations, making
safeguards against data loss or corruption crucial.
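
To make the shape of an OLTP transaction concrete, the minimal sketch below uses Python's built-in sqlite3 module; the schema is hypothetical, and a production OLTP system would run on a full relational database, but the pattern of short, consistent insert and update transactions is the same.

```python
# Minimal sketch of an OLTP-style transaction using Python's built-in sqlite3;
# the schema is hypothetical and a production OLTP database would differ.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)")
conn.execute("CREATE TABLE inventory (sku TEXT PRIMARY KEY, on_hand INTEGER)")
conn.execute("INSERT INTO inventory VALUES ('BOOK-42', 10)")

# One purchase = one short transaction: a small insert plus a small update,
# committed together so the data stays consistent.
with conn:
    conn.execute("INSERT INTO orders (customer, total) VALUES (?, ?)", ("Ana", 19.99))
    conn.execute("UPDATE inventory SET on_hand = on_hand - 1 WHERE sku = ?", ("BOOK-42",))

print(conn.execute("SELECT on_hand FROM inventory WHERE sku = 'BOOK-42'").fetchone())
```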

Structured database: OLAP

Figure 3-18 Structured database: OLAP

OLTP can be complemented by OLAP, which analyzes data in order to extract business intelligence
from it. For example, OLAP might help a company to analyze customer records in order to make
better decisions about how to attract customers. Applications such as SAP Customer Relationship
Management (CRM) often rely heavily on the insights from OLAP.

OLAP typically works with large datasets over a longer period of time than a fast OLTP transaction.
Relational OLAP (ROLAP) runs queries on an OLTP database. However, an OLTP database is
optimized for simple deletes and inserts to rows. Therefore, ROLAP does not provide the best
performance for complex queries.

Multi-dimensional OLAP (MOLAP) can combine and slice data in different ways, permitting complex
queries and analysis. It operates on data in a data warehouse, which is a structured database that is
designed to accommodate the different needs of analytics and BI. For example, the database is often
column oriented. Companies must move data from the OLTP database to the OLAP warehouse using
an extract, transform, and load process on a daily or weekly basis (as illustrated in Figure 3-18).

To support the complex queries, OLAP data warehouses also require a high-performance, high-
availability scale-up model.

In-memory database for real-time analytics


Figure 3-19 In-memory database for real-time analytics

Innovative new in-memory databases are designed to provide faster and more powerful analysis. SAP
HANA is the most common example of an in-memory database, although some customers might use
the in-memory capabilities of structured databases such as Microsoft SQL Server and Oracle.

As you learned, OLAP traditionally requires replication of datasets from an OLTP database, which
takes time. Because companies only replicate the data periodically, queries run on out-of-date data.
SAP HANA resolves this issue by establishing a single database for OLTP and OLAP. The SAP HANA
database appears as one database to users, as shown in Figure 3-19. However, it includes a component
optimized for OLTP and a component optimized for OLAP, into which up-to-date data is streamed. An
in-memory database holds OLAP datasets in memory. Because processors can operate on in-memory
data much more quickly than they can on data on a local or remote disk drive, analysis runs much
more quickly and users can receive real-time results.

Such databases require vast amounts of memory, of course, and generally high performance. If they
support mission-critical processes, they must also provide high levels of reliability and availability.
Thus, in-memory databases are suited to scale-up infrastructure.

Summary of compute requirements to address data challenges

Figure 3-20 Summary of compute requirements to address data challenges


The table in Figure 3-20 provides an at-a-glance summary of the characteristics of the workloads that
you have explored throughout this section.

Optimized compute solutions for data-driven organizations

Figure 3-21 Optimized compute solutions for data-driven organizations

Figure 3-21 shows a summary of the HPE ISV partners who provide the different types of
applications that you have explored. HPE also provides cloud and software solutions for data-driven
organizations; however, these are not the focus for this ebook. Also note that HPE provides services
—and, of course, you can deliver your own services to help customers meet their availability
requirements.

HPE optimized compute portfolio for data-driven organizations

Figure 3-22 HPE optimized compute portfolio for data-driven organizations

HPE delivers the optimized infrastructure for these applications. Figure 3-22 summarizes the
solutions optimized for scale-out compute, including:
• HPE Apollo 2000, 4000, 6000, and 8000 Systems
• HPE Moonshot Systems

It also shows the compute solutions optimized for scaling up, including HPE Integrity Superdome X
Systems. HPE scale-up rack servers and Integrity blade servers can also provide scale-up compute,
but they are not the focus for this ebook.

Architecture for HPC


You will now look briefly at the architecture for HPC applications.

High performance computing (HPC)


HPC uses extremely complex computations to solve complex problems. HPC applications can model
systems in which many factors interact in many ways in order to predict how the systems will behave.
For example, an HPC application might model weather systems and predict that you will need an
umbrella that night. Another HPC application might simulate an electronic chip to help engineers
assess and improve the design.

To perform these computations, an HPC application requires vast amounts of computing power. This
power is typically measured in floating-point operations per second (FLOPS); a floating-point
operation is any operation that involves numbers with decimal points. HPC applications require
systems that can perform at the level of teraFLOPS. (For some applications, you will need to know
both the single-precision FLOPS, which refers to the rate for operations on 32-bit numbers, and
double-precision FLOPS, which refers to the rate for operations on 64-bit numbers.)
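
As a rough illustration of how peak FLOPS is estimated, the following sketch multiplies out node, core, clock, and per-cycle figures; the FLOPs-per-cycle value is an assumption that varies by processor generation and vector width.

```python
# Rough theoretical-peak arithmetic for a cluster; the FLOPs-per-cycle figure
# is an assumption that varies by processor generation and vector width.
nodes = 100
sockets_per_node = 2
cores_per_socket = 12
clock_ghz = 2.5
dp_flops_per_cycle = 16          # e.g., a core with FMA and 256-bit vectors, double precision

peak_gflops = nodes * sockets_per_node * cores_per_socket * clock_ghz * dp_flops_per_cycle
print(f"Theoretical peak: {peak_gflops / 1000:.1f} double-precision teraFLOPS")
```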

HPC clusters

Figure 3-23 HPC clusters

Today, HPC applications often run on clusters of powerful servers, each of which contributes processing power and memory to the overall task. A cluster consists of one or more management nodes and multiple compute nodes, sometimes called worker nodes (see Figure 3-23). The compute nodes are the servers that contribute their processor cores, accelerators, memory, and disk space to performing computations. Each node runs a cluster-capable OS such as Linux CentOS or Microsoft
HPC Pack 2012, which acts as the platform for the HPC application or applications and also enables
the node to communicate with the other nodes.

Many HPC applications are programmed to break down jobs into smaller tasks, which might run at
least partially in parallel. To run such a job correctly, the nodes must communicate closely.
Applications use libraries known by the cluster OS—most commonly Message Passing Interface
(MPI)—to program these communications.
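
A minimal MPI sketch, assuming the mpi4py Python bindings, looks like the following; each rank computes a partial result for a placeholder workload, and rank 0 reduces the pieces into a total.

```python
# Minimal MPI sketch using mpi4py: each rank computes a partial result and
# rank 0 gathers the total; the workload itself is a hypothetical placeholder.
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

# Each worker node handles its own slice of the overall job.
local_result = sum(i * i for i in range(rank, 1_000_000, size))

# MPI synchronizes the pieces: reduce the partial sums onto rank 0.
total = comm.reduce(local_result, op=MPI.SUM, root=0)

if rank == 0:
    print(f"Combined result from {size} ranks: {total}")
```

Such a script would typically be launched across the assigned nodes with a command like mpirun, with the workload manager determining which nodes participate.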

In other words, HPC often takes a scale-out approach similar to the approach that you examined with
big data analytics. However, HPC focuses on processing power and complex computations on smaller
sets of data.

Often, many users need to use the HPC cluster to run thousands of computations, or jobs, a day. Some
jobs might take hours to complete, and the cluster has a finite set of resources. If users had to
manually initiate a job on a set of compute nodes, they would have to constantly ask each other which resources they could use, and they would interfere with each other’s work. A job scheduling or workload management program allows users to request HPC jobs and manages the assignment of available compute nodes to the jobs. The program might assign one or more nodes to a job. Some programs can assign a specific processor on a multi-processor server to a job or even assign a core on a processor. The program might also be able to match a job to a node with the proper resources, such
as a minimum processor speed or RAM size.

The scheduling program is only responsible for initiating the job on the right resources. After a
parallelized application begins to run on the assigned worker nodes, MPI (or a similar interface)
handles the synchronization of the job.

Examples of scheduling programs include Adaptive Moab, Altair PBS Professional, and UNIVA Grid
Engine.

HPE optimized compute portfolio for HPC


Figure 3-24 HPE optimized compute portfolio for HPC

The HPE Apollo 6000 and 8000 Systems are optimized for HPC at midrange and large scale, while
the Apollo 2000 Systems can provide good HPC solutions at a smaller scale (see Figure 3-24). These
modular solutions deliver vast amounts of computing power in a small physical footprint with power
and cooling efficiency. Customers can easily scale out enclosures populated with mix and match
compute options tailored to specific requirements. You will learn how to plan an HPC cluster using
these solutions in the next chapter.

Architectures for mission-critical applications


In this section, you will consider how to use redundancy and resiliency in scale-up and scale-out
architectures. You will also learn about RAS, which stands for reliability, availability, and
serviceability.

Meeting availability requirements

Figure 3-25 Meeting availability requirements


When designing and planning availability for the server solution, you should be familiar with the
concepts of resiliency and redundancy and their relationship to each other and to availability.
• Redundancy—The inclusion of multiple components that provide the same function
• Resiliency—The ability to quickly adapt to change and to recover from errors such as hardware
failures

The two concepts are closely related. Redundancy provides the foundation for resiliency while
resiliency ensures that the redundant components do not go to waste by automatically adapting to
failures and quickly accepting the viable, redundant alternative. For example, a server might have two
redundant NICs, but it is only when the integrator sets up NIC bonding that the server can take
advantage of the redundancy. Similarly, RAID lets a server’s storage controller distribute mirrored copies of data or parity information across disk drives so that a drive failure can occur without data loss.

You should also consider how the scaling model affects the best way to deliver availability. In a scale-
up model, each server provides a critical service that other servers cannot. The server hardware
should be optimized for reliability, availability, and serviceability (RAS). (The next sections describe
RAS in more detail.) In a scale-out architecture (illustrated in Figure 3-25), on the other hand,
multiple servers fulfill the same function, building in greater availability for the solution as a whole.
Typically, the cluster of servers can tolerate the loss of one node with minimal impact on the overall
service delivery.

Defining RAS
A server optimized for RAS must deliver reliability. That is, it must detect and correct errors to
ensure that data is never lost or corrupted. Further, the server must identify and contain uncorrectable
errors, signaling other components so they can take the appropriate action.

The server must also provide availability, guaranteeing uninterrupted operation. Redundancy built
into the hardware—extra processors, extra DIMMs, extra network adapters, and so on—help to
protect from unplanned downtime; however, the server must also have the resiliency to instantly and
automatically fail over to a redundant component if an active one fails or must be deactivated.
Further, the server must be able to isolate failing components to prevent issues from spreading. The
system might also need to provide clustering features that allow for upgrades and maintenance on a
single node without affecting the service.

Finally, the server must be serviceable. As well as handling failed components reactively, it should
use predictive analysis to identify potentially failing components, deactivating these components so
that the system can continue operating with the healthy ones without data loss or corruption. System
partitioning should isolate workloads, making it simpler to maintain one workload without affecting
others. As much as possible, the server should heal itself so that it can continue functioning until
replacement components are installed. The system should also allow for hot-pluggable replacements
that allow uninterrupted service.

RAS hardware features


To support mission-critical workloads, a server needs RAS features embedded throughout the
hardware. Each system should work to ensure data integrity, to proactively detect errors, and to
mitigate potential issues before they cause data loss or downtime.

The server should have a processor such as an Intel Xeon E7 processor that is designed for RAS.
When a traditional processor detects a data error that it cannot correct—whether data in the memory
or cache or data crossing a system bus—the processor produces a “Machine Check Exception” that
can crash the system. A processor designed for RAS, on the other hand, will not produce an exception
and crash. Instead, it will flag the bit with the error in order to contain the error and to inform the
firmware and OS of the problem. The processor should also provide additional features for
detecting, flagging, and containing various types of errors so that they do not propagate over the
network or to storage. Many of these features involve informing firmware of the issue and having the
firmware handle the error. Therefore, it is critical that the firmware supports the RAS processor
features; otherwise, the server will not benefit from them.

Enterprise servers typically have DIMMs that support error-correcting code (ECC) memory. ECC uses extra
bits to encode data along with parity information so that if a bit is corrupted, the memory can detect
the problem and recover the bit. ECC protects memory from single-bit errors, in which one bit is
flipped due to issues such as background radiation or a failing DRAM. ECC can also detect, although
not correct, double-bit errors. This capability is called single error correcting and double error
detecting (SECDED). For mission-critical workloads, though, SECDED is not enough. The memory
must proactively detect multiple-bit errors and prevent them from accumulating. In addition, it must
protect from persistent errors (such as those caused by a failing DRAM as opposed to background
radiation). Persistent errors can cause multiple-bit errors to accumulate, resulting in corrupt data and
a potential system crash. The memory must be able to deactivate the failing DRAM so that the DIMM
can continue to function without data corruption using the healthy components.
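
The toy Python sketch below illustrates the parity idea that underlies ECC; real ECC DIMMs use SECDED Hamming-style codes with enough check bits to locate, and therefore correct, a single flipped bit rather than merely detect it.

```python
# Toy illustration of the parity idea behind ECC; real ECC DIMMs use SECDED
# Hamming-style codes with enough check bits to locate (and so correct) a
# single flipped bit, not just detect it.
def parity(word: int) -> int:
    return bin(word).count("1") % 2          # 0 = even number of 1 bits

stored_word = 0b10110100
stored_parity = parity(stored_word)

corrupted = stored_word ^ (1 << 3)           # a single bit flips, e.g., from background radiation

if parity(corrupted) != stored_parity:
    print("Single-bit error detected")       # ECC memory would also correct it
```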

All hardware paths within the server must work to ensure reliable data delivery. Transmitters should
resend data if they do not receive acknowledgements from receivers, and receivers should use cyclic
redundancy checks (CRC), a short code added to the data that will no longer be the same if the data
changes, to verify the received data’s integrity. The hardware should detect issues on a path and take
steps to create a path that avoids bad wires.
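
As a toy illustration of the CRC principle, the following sketch uses Python's zlib.crc32; real hardware computes CRCs in silicon on the data path, but the check works the same way.

```python
# Toy illustration of a CRC check using Python's zlib.crc32; hardware links
# compute CRCs in silicon, but the principle is the same.
import zlib

payload = b"block of data in flight"
crc_sent = zlib.crc32(payload)

corrupted = b"block of dataXin flight"       # one byte altered in transit
if zlib.crc32(corrupted) != crc_sent:
    print("CRC mismatch: receiver requests a retransmission")
```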

Other system components such as system clocks should be fully redundant and hot-swappable.
Finally, all power and cooling systems should have redundant components so that the system can
continue running optimally even if one or more components fail. Fans and power supplies should be
hot-swappable to support simple serviceability.

RAS software features


Some of the hardware features mentioned in the previous section involved flagging errors to be
handled by the server firmware or OS. Thus, the firmware forms an integral part of the server’s
RAS features. In addition to helping to isolate and contain errors, the firmware should provide
analysis engines for monitoring all hardware components. By detecting failing or failed components
early, the firmware can prevent those components from causing issues. It deactivates the faulty
component and perhaps helps the server instantaneously fail over to a redundant component. The
system can then continue to operate using the healthy components without risk of downtime or data
corruption until the installation of replacement components. For the server to continue working
optimally, of course, the server must be designed with redundant components throughout. For
example, it should have more processor cores than required for the workload in case some must be
deactivated.

This section has given you an overview of the type of hardware and software RAS features that
mission-critical workloads require. You will examine specific RAS features in Chapter 8 “HPE
Integrity Superdome X.”
Chapter 3—Activity
You will now return to the MTB scenario introduced in Chapter 2—Activity 1. You will learn more
about MTB’s initiatives and begin to assess ways to help MTB fulfill these initiatives.

Last month, one of your colleagues held an executive briefing at MTB, which Jaggers, Deva, Walker,
and Choi attended. At this briefing, you learned that MTB has decided on a new IT strategy:
• A software-defined data center (SDDC) is MTB’s future direction. The executives considered a
cloud solution, but they decided to aim toward SDDC.
• MTB is reworking its data center strategy.
• Deva will issue an RFI in the next few weeks.
• Walker and Choi are investigating their manufacturing execution system (MES), which is built on a
transactional database. Employees complain that the system is not always responsive or available,
so they cannot use it the way that it is intended.
• After fixing the issues with MES, Choi wants to enhance the solution with Business Intelligence (BI)
analysis.
• HPC is another avenue that Deva is investigating. Currently, the R&D facilities of various operating
companies within MTB purchase and manage their own HPC environments without the
involvement of MTB’s central IT. HPC clusters that have grown organically offer different levels
of service, some performing well and others less so. Expanding clusters are causing IT sprawl.
• Manufacturing departments are becoming interested in wading into big data analytics. Although
R&D facilities are using Teradata big data environments, the license period is ending. Also
manufacturing IT members are biased toward open source frameworks, and they want to use
Hadoop on their choice of infrastructure. They have a lot of unstructured data that they would like to
start storing in a more scalable way immediately. But they are still working on fully identifying
their analytics needs and developing applications.

Now answer the following questions.


1. What approach would you recommend that MTB take for increasing the responsiveness and
availability of the MES solution? Also, what would help MTB continue to scale in the future?
2. How well does MTB’s current approach to deploying HPC applications fit with its desire to move
toward SDDC? Should MTB change its approach and, if so, how?
3. What type of server infrastructure will meet the needs for manufacturing’s Hadoop solution?
What advantages does the HPE Big Data Reference Architecture provide?

You can check your answers in Appendix B: Answers to Activities.

Summary
This chapter has introduced you to various types of data applications, as well as HPC applications.
You have learned how to architect server solutions that meet the particular needs of each type of
application and workload. You also learned about how server solutions can fulfill the RAS
requirements of mission-critical workloads.

Learning check
Review what you have learned by answering these questions. Then check your answers in Appendix
A: Answers to Learning Checks.
1. What characterizes OLTP workloads?
a. Very large datasets
b. Distributed datasets over multiple systems
c. The need for scale-up architectures
d. Computationally complex queries

2. A customer requires a solution for a mission-critical transactional database. Why do Intel Xeon
E7 series processors provide a good fit for this workload?
a. These processors have built-in RAS features for workloads that cannot tolerate any data loss or corruption.
b. These processors provide the highest clock speed per core but relatively few cores—the best fit for transactional workloads.
c. These processors provide high performance for a low TCO, enabling fast scale out for the mission-critical workload.
d. These processors are specifically designed for use with scale-out, clustered applications.

3. For which customer need does object storage provide the best solution?
a. Need to provide block-level access to remote drives
b. Need to store structured databases for transactional processing
c. Need to store billions of voice, video, and email files
d. Need to provide a remote drive from which VMs can boot

For answers, See Chapter 3 in Appendix A.


Chapter 4 HPE Apollo Solutions for HPC

EXAM OBJECTIVES
• Explain the features and benefits of HPE Apollo 2000, 6000, and 8000 solutions
• Position HPE Apollo 2000 and 6000 solutions for the right use cases and workloads
• Create an implementation plan for an HPE Apollo 2000 or 6000 solution, including plans for the
proper performance, scalability, high availability, and management

Assumed knowledge
Before reading this chapter, you should have a basic understanding of the following:
• Advanced architectural concepts (which are outlined in Chapter 3, “Advanced Architecture for
Server Solutions”)
• Processors; DDR3 and DDR4 memory; hard disk drives (HDDs), solid state drives (SSDs), and RAID levels for storage volumes
• HPE ProLiant rack and blade servers and options for them such as HPE Smart Array Controllers
• HPE BladeSystems, including interconnect modules and Virtual Connect (VC) modules
Chapter topics
This chapter begins with an overview of the HPE Apollo family. Then you will examine use cases for
the solution. Finally, you will learn about planning the architecture before examining how to manage
the Apollo family servers.

HPE Apollo 2000, 6000, and 8000 overview


This section introduces you to the HPE Apollo 2000, 6000, and 8000 families.

HPE Apollo 2000

Figure 4-1 HPE Apollo 2000

HPE Apollo 2000 Systems offer an alternative solution for smaller high-performance computing
(HPC) clusters and for companies taking their first steps toward HPC. The HPE Apollo 2000 System,
shown in Figure 4-1, is the enterprise bridge to scale-out architecture. It delivers twice the density of
traditional rack mount systems and the efficiency of a shared infrastructure, but maintains a familiar
form factor—the same racks, cabling, serviceability access, operations, and system management. No
retraining of personnel or cost of change is required to introduce efficient, space-saving, scale-out
architecture.

The Apollo 2000 System brings HPE ProLiant Gen9 server technology, including iLO4, into this 2U,
multi-server chassis. Storage and I/O flexibility enable customers to optimize for performance or
economy—the right compute for the right workload.

Apollo 2000 System offerings


Figure 4-2 Apollo 2000 System offerings

The Apollo 2000 System is a density-optimized, 2U shared infrastructure chassis for up to four
ProLiant Gen9 independent, hot-plug servers. It has all the traditional data center attributes, including
support for standard racks and cabling, as well as rear-aisle serviceability access (see Figure 4-2).

A 42U rack fits up to 20 Apollo r2000 series chassis, accommodating up to 80 servers per rack.
Apollo 2000 System servers provide the flexibility to tailor the system to the precise needs of each
workload, with a range of compute, I/O, and storage options. Apollo 2000 System servers can be
“mixed and matched” within a single chassis to support different applications. A chassis can even be
deployed with a single server, leaving room to scale as customers’ needs expand.

The Apollo 2000 chassis comes with four new-generation, single-rotor fans, and an additional four
fans can be added for redundancy. The power can be managed by the HPE Advanced Power Manager
(HPE APM), an optional rack-level manager discussed in Chapter 9, “Monitoring and Managing HPE
Solutions.”

HPE ProLiant XL170r—Gen9 1U Node


Figure 4-3 HPE ProLiant XL170r—Gen9 1U Node

The ProLiant XL170r Gen9 Server (shown in Figure 4-3) is a 1U half-width, two-processor server
with configuration options for the following:
• Performance and efficient central processing units (CPUs)—Intel Xeon E5-2600v3 or v4 series
processor options with choices from 4 cores to 22 cores, 1.6 GHz–3.5 GHz CPU speed, and power ratings between 85 W and 145 W. Customers can also choose Intel Xeon E5-1600 v3 series processors with choices from 4 cores to 8 cores and 3.2 GHz–3.7 GHz CPU speed.
• 16 memory DIMM slots with up to 512 GB double data rate fourth generation (DDR4) memory at
up to 2133 MHz.
• Two I/O slots for a choice of fabric and clustering options, including 1 GbE, 10 GbE, 40 GbE, 56 Gb/s FDR InfiniBand, and Fibre Channel (FC), with options for either one PCIe slot plus a FlexibleLOM or two PCIe slots.

The Apollo 2000r series chassis accommodates up to four independently serviceable ProLiant
XL170r Gen9 servers, supporting up to 80 servers in a 42U rack.

HPE ProLiant Apollo XL190r—Gen9 2U Node

Figure 4-4 HPE ProLiant Apollo XL190r—Gen9 2U Node


The ProLiant Apollo XL190r Gen9 Server (shown in Figure 4-4) is a 2U half-width, two-processor
server with similar configuration options as the XL170r for CPU and memory. However, this server
adds additional PCIe slots in multiple configurations, providing support for additional expansion
cards and for two integrated accelerators per server. The tray supports a variety of NVIDIA and AMD
graphics processing units (GPUs) and Intel Xeon Phi coprocessors; you should check the tray’s
QuickSpecs for up-to-date information.

This server leverages Intel’s latest Xeon E5-2600 v3 or v4 series processors, increasing performance
up to 30%–40%. It supports DDR4 HPE SmartMemory with speeds of up to 2133 MHz and 512GB
maximum, boosting bandwidth and efficiency up to 50% over previous generation servers. The dense
and flexible HPE Apollo 2000 Chassis can also dramatically accelerate professional applications with
the GPUs or coprocessors.

Apollo 2000 storage flexibility

Figure 4-5 Apollo 2000 storage flexibility

The Apollo 2000 has two chassis options (shown in Figure 4-5), with different storage
configurations. The HPE Apollo r2200 Chassis includes 12 large form factor (LFF) hot-plug SAS or
SATA HDDs or SSDs allocated equally across server nodes. The HPE Apollo r2600 Chassis includes
24 small form factor (SFF) hot-plug SAS or SATA HDDs or SSDs, also allocated equally across
server nodes. The HPE Apollo r2800 Chassis provides 24 SFF hot-plug SAS or SATA HDDs or SSDs,
but it lets customers flexibly map the desired number of drives to each node.

The ProLiant XL170r and XL190r servers have embedded SATA storage controllers. Customers can
also purchase PCIe Host Bus Adapters (HBAs) for SAS connectivity, as well as Smart Array
Controllers to add features such as HPE Smart Cache to improve performance and RAID 10 to
improve fault tolerance and uptime.

All Apollo 2000 Chassis are built with the following:


• Four server slots per chassis
• Up to two 800W/1400W power supplies
• HPE Thermal Logic technology for lower power consumption and airflow
• Four single-rotor fans (standard) and options for four additional single-rotor fans for redundancy
• Improved power consumption and acoustics

HPE Apollo 6000

Figure 4-6 HPE Apollo 6000

HPE Apollo 6000 systems are designed to help customers obtain the right performance for their HPC
applications with the right economics (see Figure 4-6). The extremely dense system can deliver up to
20 servers in 5U, giving customers up to four times more performance per dollar and per watt while
using 60% less rack space compared with traditional servers. The systems consist of several chassis
that share dynamically allocated power, making it easy to scale the solution, as well as maximizing
rack-level energy efficiency and simplifying management.

The modular system lets you choose the right compute, memory, fabric, and storage options for the
customer's workloads. By tailoring the solution to the requirements, you enhance the performance
while decreasing total cost of ownership (TCO) by as much as $3 million.

Apollo a6000 Chassis


Figure 4-7 Apollo a6000 Chassis

The HPE Apollo a6000 Chassis is designed with density optimization in mind to help you manage and
scale to your business computing demands. The new modular HPE Apollo a6000 Chassis was
designed to hold various compute servers and/or accelerator trays to fit your specific workload (see
Figure 4-7).

Each chassis can hold up to 10 single-slot trays or up to 20 servers. Cooling concerns are reduced by
five dual rotor fans that share a cooling zone and, as an additional feature, power can be managed by
an HPE Advanced Power Manager (APM) option at the server, chassis, or power shelf level.

Quick stats include the following:


• 5U tall
• Fits standard 19-inch rack, ideal for 1.0m depth rack
• Holds 10 single compute trays vertically
• Rear NIC cabled
• 5x80mm redundant fans
• Connects to Power Shelf for pooled power (no internal power)

The chassis has these features:


• One slot and two slot tray support
– 10 single slot trays
– Five double slot trays
• Mix-and-match trays
• Shared cooling
• 12V DC power distribution
• Up to 5700W per chassis

The chassis also offers these serviceability features:


• Front serviceable trays
• Standard rear cabling
• Front serviceable hot-plug drives
• Redundant, hot-plug fans

Apollo 6000 Power Shelf

Figure 4-8 Apollo 6000 Power Shelf

The HPE Apollo 6000 Power Shelf, shown in Figure 4-8, offers pooled power for rack-level
efficiency as well as N+N redundancy to support your customers’ data center needs. Depending on the
power configurations of the trays within a chassis, the power shelf can support two to four fully
populated HPE Apollo a6000 Chassis with maximum DC power up to 15.9 kW. The HPE Apollo 6000
Power Shelf, with its redundant hot-plug power supplies, can also be configured for single- or three-
phase input.

Quick stats for the shelf include the following:


• 1.5U tall
• Efficient pooled/shared power infrastructure
• Holds up to six power supplies max
– 2650W Platinum hot-plug (15.9kW nonredundant)
– 2400W Platinum hot-plug (14.4kW nonredundant)
• Supports N+1 or N+N redundancy
• One power shelf can support three to four fully loaded enclosures, depending on the power
capacity required per enclosure

Apollo 6000 server options

Figure 4-9 Apollo 6000 server options

The Apollo 6000 offers three server options, shown in Figure 4-9. For single-threaded workloads, the
HPE ProLiant XL220a Gen8 v2 Server has two single-socket servers in each front-accessible server
tray. Each server uses an Intel® Xeon® E3-1200 v3 processor with four dedicated DDR3 memory
slots, each capable of holding up to an 8 GB UDIMM. Each server also has two hot-plug SFF drives
and one dedicated Serial/USB/Video (SUV) port.

The HPE ProLiant XL230a Server delivers 2P performance, while taking advantage of the Apollo
6000 System’s modular flexibility and rack-scale efficiency. This server leverages Intel’s latest Xeon
E5-2600 v3 and v4 series processors, increasing performance up to 70%, and DDR4 HPE
SmartMemory, which boosts bandwidth and efficiency up to 50% over previous generation servers.
The modular HPE Apollo a6000 Chassis accommodates up to 10 single-slot XL230a server trays to
address various workload needs.

The HPE ProLiant XL250a Server delivers 2P performance with dual accelerators, while taking
advantage of the Apollo 6000 System’s modular flexibility and rack-scale efficiency. This server
leverages Intel’s latest Xeon E5-2600 v3 and v4 series processors, increasing performance up to 70%,
and DDR4 HPE Smart Memory, which boosts bandwidth and efficiency up to 50% over previous
generation servers. The modular HPE Apollo a6000 Chassis can accommodate up to five double-slot
XL250a server trays to address various workload needs. For acceleration, customers can choose
from a variety of NVIDIA and AMD GPUs, as well as Intel Xeon Phi coprocessors; you should check
the tray’s QuickSpecs for up-to-date information.
HPE Apollo 8000

Figure 4-10 HPE Apollo 8000

For customers with the greatest HPC demands, HPE offers the HPE Apollo 8000, a supercomputer
solution that is the water-cooled counterpart of the HPE Apollo 6000 (see Figure 4-10). The HPE Apollo 8000
can hold up to 144 densely packed powerful compute nodes or 72 compute nodes with accelerators. It
also holds InfiniBand switches to interconnect the nodes at lightning speed. This solution packs so
much computing power into a rack by using innovative water cooling to allow more powerful
processors in a smaller space, differentiating it from the competition.

The HPE Apollo 8000 water-cooled rack supports four times as many teraflops per square foot as
air-cooled systems for more than 250 trillion floating-point operations per second (TFLOPS) per
rack. Not only more powerful, the HPE Apollo 8000 is also greener, delivering 40% more floating-
point operations (FLOPS) per watt and consuming 28% less energy than air-cooled systems.

The HPE Apollo 8000 includes many patented features, including dry disconnect servers. The cooling
system is sealed off so that IT staff can remove servers for maintenance without disrupting the
system.

In addition to saving a company’s cooling costs, the HPE Apollo 8000 can actively help make the
company greener in other ways. The company can recycle the water heated by the system and use it to
heat the facility. In these ways, the HPE Apollo 8000 can save up to 3800 tons of CO2 per year (or the
equivalent of 790 fewer cars).

The HPE Apollo 8000 can meet the needs for scientific organizations that need to perform research
computing, climate modeling, and protein analysis. It can also provide product modeling,
simulations, and material analysis for manufacturing companies—as well as meet many other
supercomputing use cases.

Partners cannot sell this solution, but you can refer customers who might benefit from an HPE Apollo
8000 solution to HPE for an assessment of their needs. Because the Apollo 8000 is not the focus of
this chapter, this section will not go into detail concerning the 8000’s components and options. You
can learn more about the HPE Apollo 8000 by visiting the HPE website.
HPE Apollo 2000 and 6000 use cases
You will begin by learning about the high-performance computing (HPC) use cases for which HPE
Apollo 2000 and 6000 solutions are designed.

Why HPC: To out-compute is to out-compete

Figure 4-11 Why HPC: To out-compute is to out-compete

The HPE Apollo systems that you just reviewed are purpose-built to support enterprise HPC (see
Figure 4-11). HPC no longer belongs to large research facilities; enterprises across many verticals
have recognized that to out-compute is to out-compete. By embracing HPC, they can bring better
products to market more quickly. In fact, 97% of companies that had adopted supercomputing said
they could no longer compete or survive without it.

The competitive advantages extend from the enterprise level to the national level as well, with
political leaders and governments recognizing the trend and encouraging the adoption of HPC.

HPC applications

Figure 4-12 HPC applications


Because so many enterprises are adopting HPC, you will find opportunities to design HPC solutions
for customers across many verticals, as you see in Figure 4-12.

Electronics continue to make vast strides as smartphones get smarter, cars become more efficient and
connected, and hardware manufacturers pack more and more power into smaller packages. Engineers
could not keep up this pace without the aid of computers themselves. Manufacturers use computer-
aided engineering and electronic design automation (EDA) to simulate and design better and better
chips.

In the healthcare vertical, HPC helps researchers model systems at the level of the ecosystem or the
molecule. Pharmaceutical researchers use HPC to design drugs that are safer and more effective.
Scientists use HPC to develop greener ways to run the world—from more efficient solar chips to new
types of batteries. HPC also powers their research in innovative fields such as genetics and
computational fluid dynamics.

You see the products of HPC in almost any movie, where computer-generated imagery (CGI) effects
convince us to believe the unbelievable. HPC also has a home in the music industry, where it is
helping to produce higher-quality audio.

You should be ready to ask about the need for HPC in proposals for government and education
entities, many of which use HPC for research.

Financial institutions require HPC to inform decisions such as where to invest money or which types
of loans to make. This particular type of HPC is called a Monte Carlo simulation—a simulation that
informs decisions that are influenced by many variables, some of them random. But Monte Carlo
simulations are not just about finance. A retail company might need to choose the best location to
open a new branch. A software company might need to decide how much to devote to developing a
particular project.

Note that you might also encounter customers who are looking for a cloud solution as a way to scale
out and obtain the resources that they need for HPC.

HPC application requirements

Figure 4-13 HPC application requirements


The wide range of HPC applications has many requirements in common (see Figure 4-13). They all
demand the highest possible levels of performance and efficiency—the type of efficiency that density-
optimized solutions such as HPE Apollo 6000 solutions can deliver without compromising
performance. Customers need accessible solutions with options for smaller solutions if their
application has fewer requirements. And the solution must be able to scale easily as requirements
increase.

Demand for infrastructure optimized for the application

Figure 4-14 Demand for infrastructure optimized for the application

Although all HPC applications have performance and efficiency requirements in common, no single
type of server hardware gives the right fit to every HPC application because applications differ in
their architecture. Customers need an infrastructure that is tailored for their application’s
requirements, as illustrated in Figure 4-14.

A multi-threaded HPC application lets a compute node divide a job process into many threads. A
server with multiple cores or—as you will learn in more detail a bit later—GPU or coprocessor
accelerators can run the threads in parallel and complete the process more quickly. A node with fewer
cores could still run the job, but the threads would have to time-share the cores, and the job would
take longer to complete. Deploying nodes with many cores and accelerators might increase the
performance because jobs can run on fewer nodes, decreasing the chance of the interconnect acting as
a bottleneck.

Remember what you learned about HPC applications in an earlier chapter. These applications often
divide tasks among worker compute nodes using a mechanism such as Message Passing Interface
(MPI). Threading is a bit different from parallel processing across an HPC cluster. Threading applies
to how a single node handles the process. Distributed processing applies if the node makes an MPI
call to another node to help run the job. An HPC application can use both multi-threading and parallel
processing.
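The distinction can be seen in a minimal sketch. The code below is illustrative only and assumes the mpi4py package and a trivially parallel workload; it is not drawn from any particular HPC application, and a CPU-bound Python thread pool would not scale like native threads.

# Illustrative only: MPI distributes work across nodes (distributed processing),
# while a thread pool spreads each node's share across its cores (multi-threading).
from concurrent.futures import ThreadPoolExecutor
from mpi4py import MPI

def simulate(item):
    return item * item        # placeholder for one unit of computation

comm = MPI.COMM_WORLD
rank = comm.Get_rank()        # this node's index within the MPI job
size = comm.Get_size()        # total number of nodes in the job

work = list(range(1000))
my_share = work[rank::size]   # each node takes a slice of the overall job

with ThreadPoolExecutor(max_workers=8) as pool:     # threads within one node
    local_results = list(pool.map(simulate, my_share))

all_results = comm.gather(local_results, root=0)    # collect on rank 0
if rank == 0:
    print(sum(len(r) for r in all_results), "items processed")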

EDA and Monte Carlo simulations tend to be single-threaded or lightly threaded. This means that each node can
use only one or a few processor cores due to the application architecture. Such applications get the
best performance boosts from increasing the power of each processor core in preference to
increasing the number of processor cores per node. In addition, a 1P server might even be able to
execute the job more efficiently than a 2P server; the 1P server introduces less latency because it does
not need to maintain cache coherency.

You need to help your customers find the right fit for their HPC application.

HPE Apollo 2000 and 6000 architecture


This section focuses on HPC use cases addressed by the HPE Apollo 6000. It also touches on
appropriate situations in which to deploy the HPE Apollo 2000.

The next section guides you through architecting HPE Apollo 2000 and 6000 solutions, helping you
to choose components to meet customer requirements, plan for rack-level efficiency, and scale out
the design.

Tailoring to the workload

Figure 4-15 Tailoring to the workload

You will now learn more about tailoring the solution to the workload (see Figure 4-15). The next
several sections give you guidelines for selecting compute trays, accelerators, memory, storage, and
fabric components for the HPE Apollo 2000 and 6000 solutions.

Tailoring the compute tray to the workload: HPE Apollo 6000

Figure 4-16 Tailoring the compute tray to the workload: HPE Apollo 6000
HPE Apollo a6000 Chassis provide ten compute tray slots, each of which you can populate with one
of three compute trays (shown in Figure 4-16).

Optimized for single-threaded HPC, the HPE ProLiant XL220a Gen8 v2 compute tray includes two
one-processor (1P) servers for a total of 20 per-chassis. A chassis is 5U, so the servers have four
times the density of a traditional 1U server. As you learned, single-threaded HPC applications (such as
EDA, as well as some engineering, risk analysis, and life sciences applications) benefit from
processors with higher clock speeds, even if those processors might have fewer cores.

The XL220a delivers the fastest clock speed, up to 3.7 GHz (4.1 GHz with Turbo Boost), with an
Intel Xeon E3-1200 v3 four-core processor. Each core provides better per-thread performance than a
core on a 2P server. Because 20 of these 1P servers fit in the same space as 10 2P servers, the system
as a whole is optimized. For some single-threaded HPC applications, deploying these servers can
improve efficiency by 35% over deploying 2P servers. And according to a SPECjbb2013-MultiJVM
benchmark of June 2014, the XL220a is the industry-leading 1P server with 16,252 max-jOPS and
4721 critical-jOPS.

For lightly threaded and multi-threaded HPC, you can receive more power from the HPE ProLiant
XL230a Gen9 compute tray. This tray includes one 2P server for a total of 10 2P servers per chassis.
These servers support the latest generation Intel Xeon E5-2600 v3 or v4 processors, which provide up
to 70% more performance and 36% more efficiency than the previous generation. Examples of HPC
applications that run well on this tray include risk analysis (Monte Carlo simulation) and oil and gas
seismic processing.

The HPE ProLiant XL250a Gen9 compute tray boosts performance for multi-threaded HPC
applications. This tray features the same 2P server as the ProLiant XL230a tray, but adds support for
up to two accelerators. You can select an NVIDIA Tesla accelerator tray, an Intel accelerator tray, or
an AMD accelerator tray. You can then install up to two accelerators of the corresponding type in the
tray (NVIDIA Tesla K40, Tesla K80, Tesla M60, Tesla M60 LAF, or Grid K1 Quad GPUs, Intel Xeon
Phi 5110P or 7120P coprocessors, or AMD FirePro S9150 GPUs, as of the publication of this ebook).
To make room for the accelerators, the XL250a is a double-width server tray. Therefore, the density
of servers per-chassis is lower—five 2P servers per-chassis. However, for the right HPC applications,
the accelerators can more than make up for this lower density.

Examples of HPC applications that benefit from acceleration include seismic analysis, risk
analysis, Monte Carlo simulation, weather simulation, and genomics.

Note that some types of HPC, such as Monte Carlo simulation, could fit well on various processors;
you should examine the needs of your particular customer's application and use case. In a moment,
you will learn more about how you can determine whether an application will benefit from
accelerators.

Tailoring the compute tray to the workload: HPE Apollo 2000


Figure 4-17 Tailoring the compute tray to the workload: HPE Apollo 2000

As mentioned in the previous section, you should choose the Apollo 2000 System when the customer
requires a smaller deployment. In addition, perhaps the customer is just getting started with HPC and
wants a solution with a familiar form factor.

The HPE Apollo 2000, shown in Figure 4-17, provides this familiar 2U form factor. It supports two
options for compute trays:
• The ProLiant XL170r is quite similar to the XL230a. The XL170r also provides a 2P server with
Intel Xeon E5-2600 v3 or v4 processors and is well suited to lightly multi-threaded HPC. The Apollo
2000 chassis can hold four of these trays.
• The ProLiant XL190r provides a 2P server with Intel Xeon E5-2600 v3 or v4 processors, as well as
support for up to two accelerators with similar options as the XL250a (as of the publication of this ebook: NVIDIA Quadro
K4000, NVIDIA Tesla K40 or K80 GPUs, NVIDIA GRID K2-RAF PCIe GPUs, NVIDIA GRID M60-
RAF Dual GPUs, AMD S9150 accelerators, and Intel Xeon Phi 5110P coprocessors). An Apollo
r2000 chassis can hold only two of these trays.

Why GPU and coprocessor acceleration

Figure 4-18 Why GPU and coprocessor acceleration


CPUs were designed to meet the needs of many different types of workloads, including single-
threaded processes and multi-threaded ones. GPUs, on the other hand, were originally designed for
just one purpose: rendering graphics. Rendering each pixel constituted one task, separate from other
tasks, so GPUs were optimized for multi-threading, rendering as many pixels as possible in parallel,
illustrated in Figure 4-18. Many HPC applications also feature workloads that can be parallelized and
divided into many threads. These applications benefit highly from running on a CPU that is enhanced
with a GPU.

The NVIDIA GPUs can boost performance up to ten times depending on the application. The Tesla
K40 GPU provides up to 4.29 single-precision TFLOPS and 1.43 double-precision TFLOPS with
NVIDIA GPU Boost, 12 GB memory, and 288 GB/s memory bandwidth. The K80 dual GPU provides
up to 8.73 single-precision TFLOPS and 2.91 double-precision TFLOPS with NVIDIA GPU Boost, 24
GB memory, and 480 GB/s memory bandwidth. Refer to NVIDIA materials for the latest
specifications and information on other GPUs.

The AMD S9150 provides 5.07 single-precision TFLOPS, 2.53 double-precision TFLOPS, 16 GB
memory, and up to 320 GB/s memory bandwidth.

Instead of GPU accelerators, you can install Intel coprocessors. The coprocessor consists of a dense
group of cores (60 for the Intel Xeon Phi 5110P and 61 for the Intel Xeon Phi 7120P) and solid
memory and memory bandwidth (8 GB and 320 GB/s for the 5110P, 16 GB and 352 GB/s for the
7120P). These coprocessors, like GPUs, are also optimized for highly parallelized tasks and can
speed those tasks with up to 1.2 double-precision TFLOPS. See Intel materials for more precise
benchmarks.

Before you choose accelerators, it is crucial that you discover whether your customer's application is
architected to take advantage of that accelerator. NVIDIA and Intel provide searchable lists of such
applications:
• http://www.nvidia.com/object/gpu-applications.html
• https://software.intel.com/en-us/xeonphionlinecatalog

The NVIDIA site provides estimates of how much the GPU will accelerate the performance for the
application.

All of these GPUs and coprocessors, including the AMD FirePro S9150, are OpenCL 1.2-compliant.
(OpenCL is an open standard for developing parallel computing, graphics, and other types of
applications that can run on a variety of hardware.) You can look up libraries and applications that use
OpenCL at https://www.khronos.org/opencl/resources.
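As a practical check, you can also enumerate the OpenCL devices that a configured node actually exposes. The sketch below assumes the pyopencl package and installed vendor drivers; it simply lists platforms and devices and is not an HPE tool.

# List OpenCL platforms and devices visible on a node (requires pyopencl and drivers)
import pyopencl as cl

for platform in cl.get_platforms():
    for device in platform.get_devices():
        print(platform.name, "|", device.name, "|", device.version)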

After you know that your customer's application can use the acceleration, you can consider whether
you need one of the higher memory and performance options.

Also remember to check the particular accelerators that are supported by the XL server.

Tailoring memory to the workload: Capacity


Figure 4-19 Tailoring memory to the workload: Capacity

You will now move on to the next choices for tailoring the compute tray options: choosing the
number and type of DIMMs (see Figure 4-19).

You will need to work closely with the customer to determine the memory capacity requirements for
their application. To keep the computation running as quickly as possible, the application needs to be
able to work with data in memory rather than on a drive. Some HPC applications work with smaller
sets of data, while others work with very large ones. By increasing the capacity of the memory to
hold as much of the dataset as possible, you can improve the performance for the application.

Also consider the number of processor cores, because all of the cores share the same memory. A
multi-threaded HPC application can use the cores intensively. In addition, HPC schedulers often
allocate jobs per-processor core. If a processor might be handling several different jobs on its cores,
you should take care to plan enough memory so that the jobs do not contend too much, which would
decrease the performance of the solution. In other words, you would try to plan enough memory to
hold the dataset for several jobs.

As a general rule, provision at least 2 GB per core. Preferably, provision at least 4 GB or even 8 GB
per core, depending on the application’s demands. Note, though, that this is only a guideline intended
to give you a minimal starting point for planning. Understanding the dataset size and number of jobs
per processor is critical. Later in this chapter, you will also learn a bit about benchmarking
application needs.
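A back-of-the-envelope calculation based on those guidelines might look like the following; the core counts, per-core figure, and dataset sizes are assumptions for illustration only.

# Starting-point memory sizing per node, using the per-core guideline above
cores_per_processor = 12
processors_per_node = 2
gb_per_core = 4                      # guideline range: 2 GB minimum, 4-8 GB preferred

per_core_estimate_gb = cores_per_processor * processors_per_node * gb_per_core
print(per_core_estimate_gb, "GB per node from the per-core rule")           # 96 GB

# Cross-check against the expected working sets of concurrently scheduled jobs
jobs_per_node = 4
dataset_gb_per_job = 20              # assumed working-set size per job
dataset_estimate_gb = jobs_per_node * dataset_gb_per_job
print(max(per_core_estimate_gb, dataset_estimate_gb), "GB per node after the dataset check")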

The XL220a compute tray supports up to 32 GB of RAM per processor, which is often enough for
single-threaded applications. You should generally provision up to this level to get the best
performance. If your customer requires more memory, you can select the XL230a instead. The
XL170r, XL190r, XL230a, and XL250a compute trays have four memory channels with two slots each
on each processor. Currently, the XL230a and XL250a support DIMMs with capacities up to 64 GB,
for up to 512 GB of RAM with one processor and 1024 GB with two processors, providing ample
memory for processors with many cores. The XL170r and XL190r currently support DIMMs with
capacities up to 32 GB for 256 GB with one processor and 512 GB with two.

Tailoring memory to the workload: Performance


Figure 4-20 Tailoring memory to the workload: Performance

Often HPC requires you to maximize for performance, so you should select higher speed memory, as
you see in Figure 4-20. For the XL170r, XL190r, XL230a, and XL250a compute trays, you should
also consider which type of memory to install: registered DIMMs (RDIMMs) or load-reduced DIMMs
(LRDIMMs). LRDIMMs generally provide better performance with some costs in higher energy use.
Also note that standard rather than low voltage memory provides better performance.

Performance also depends on how you distribute the memory. To obtain the best performance, you
should balance the DIMMs (UDIMMs, RDIMMs, or LRDIMMs) in each of the memory channels on
the processor. For example, if you need 64 GB for an XL230a processor, select four 16 GB DIMMs—
one for each channel—rather than two 32 GB DIMMs. You must install the memory in the correct
DIMM slots based on which processors you are using, how many DIMMs you are using, and the
number of ranks the memory provides. Visit http://h22195.www2.hp.com/DDR4memoryconfig to
obtain valid memory configurations.
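The balancing rule can be expressed as a small helper; the DIMM sizes listed are examples, and any result must still be validated against the memory configuration tool above.

# Find a balanced layout of identical DIMMs across the four channels of one processor
def balanced_config(target_gb, channels=4, dimm_sizes=(8, 16, 32, 64)):
    for slots_per_channel in (1, 2):                  # each channel has two slots
        per_dimm = target_gb / (channels * slots_per_channel)
        if per_dimm in dimm_sizes:
            return channels * slots_per_channel, int(per_dimm)
    return None                                        # no balanced layout with these sizes

print(balanced_config(64))     # (4, 16): four 16 GB DIMMs, one per channel
print(balanced_config(256))    # (4, 64): four 64 GB DIMMs, one per channel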

See Table 4-6 in the “Supplemental content” section at the end of this chapter for an overview of the
compute tray options; for details, refer to the compute tray’s QuickSpecs.

Tailoring storage to the workload


Figure 4-21 Tailoring storage to the workload

Because HPC typically works within a cluster of compute nodes, each of which might need access to
the same files, shared storage plays a crucial role. However, local storage can still be important to the
functioning of the application. For example, the application might use local drives for temporary files
to which it needs to read and write quickly during a particular job, as illustrated in Figure 4-21.

For both types of storage, you need to consider the vast demands that HPC can place on storage. HPC
calls for both high performance and high capacity.

First, consider the performance needs. As you know, storage performance is generally measured in
random input/output operations per second (IOPS), which measures how many different read or write
requests the storage can accommodate per second, and in sequential IOPS, which measures how
quickly the drive can deliver a sequence of data such as a complete file.
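The relationship between the two metrics and raw throughput can be seen with a quick calculation; the I/O sizes and rates below are assumptions for illustration, not specifications for any HPE drive.

# Relate IOPS, I/O size, and throughput
io_size_kb = 4                          # typical small random I/O
random_iops = 75_000                    # assumed SSD-class random IOPS
print(round(random_iops * io_size_kb / 1024), "MB/s sustained from small random I/O")   # ~293 MB/s

sequential_mb_s = 500                   # assumed sequential transfer rate
file_mb = 2048
print(round(file_mb / sequential_mb_s, 1), "seconds to stream a 2 GB file")             # ~4.1 s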

For local storage, the random IOPS versus sequential IOPS demands depend largely on how the
application works. You will consider various factors in the next section.

HPC can create very high demands for random IOPS in the shared storage because, as each compute
node works on its job or portion of a job, the node accesses the shared storage. Many different nodes
might access a shared drive at the same time, asking for different files or portions of files. If
computations involve accessing many different small files—as does, for example, the physical design
portion of EDA—the random IOPS must be particularly high.

If the HPC application calls for nodes to work with large files, a high sequential IOPS might be
important as well.

HPC can also create large capacity demands. The application might be working with vast data sets and
large, complex file systems.

The next sections give some guidelines for maximizing IOPS, particularly, random IOPS. Storage
I/O can be the slowest part of a job, so enhancing performance can pay off in speeding up the job’s
runtime. (On the other hand, storage I/O might only form a small part of the job, in which case
performance increases are less important. Consider the particular needs of the customer application
as you optimize.) The next sections also point to ways that shared storage can scale to meet the needs
of large HPC clusters that work with a great deal of data.
Tailoring local storage to the workload: IOPS and throughput

Figure 4-22 Tailoring local storage to the workload: IOPS and throughput

You will now look more closely at planning the local storage on each compute node. First, consider
some of the options that you have for different types of drives. In the HPE Apollo 6000 Systems, each
XL compute tray has its own drives. The HPE Apollo 2000 chassis, on the other hand, provide the
drives for their compute trays—depending on the chassis either allocating the same number of drives
to each tray in a fixed manner or flexibly allocating them, as you learned earlier. In either case, the
compute trays support both SAS and SATA HDDs, as well as SATA SSDs and SAS SSDs (SAS SSDs
are not currently supported on the XL220a).

Figure 4-22 indicates generally how these options compare in the performance that they provide.
SSDs cost more than HDDs, but they outperform HDDs in several important areas. They provide
higher sequential IOPS and much higher random IOPS.

Consider how the customer and HPC application will be using this storage. If the local storage is
intended for purposes unrelated to the HPC application, you can propose lower-performance
options. When the HPC application is using the local storage for temporary files, though, optimizing
for performance can be critical.

Assess how intensively the HPC application will use the local drives. Do they need to read or to write
from the drives frequently? In that case, the higher cost SSDs might be worthwhile for the customer.
Will the application need to read from and write to different portions of the file throughout the
computation? In this case, the local storage must deliver a high random IOPS. Or will a particular job
bring a small file into its memory, use its memory, and then write a result to the drive at the end of the
computation? In this case, sequential IOPS might be more important. In either case, SSDs deliver the
best performance.

If you need to propose HDDs as a less expensive alternative, always recommend enterprise-class
drives. Select the higher rotations per minute (RPM) option to optimize random IOPS.

Also consider the protocol, which affects throughput. The 12 Gbps options, of course, provide higher
throughput and also higher sequential IOPS, which depends to a large degree on throughput.
Traditionally, SAS drives generally provide better performance and reliability, but a high capacity
SAS drive is more expensive than a SATA drive with the same capacity.

If the customer requires the highest performance for reading and writing local data, consider adding
an HPE Value Endurance (VE) PCIe Workload Accelerator to the compute tray. These accelerators
increase the IOPS for connected SSDs and can provide very low latency and four times more
transactions per server.

Tailoring storage to the workload: Other considerations

Figure 4-23 Tailoring storage to the workload: Other considerations

You should also discuss with the customer the reliability requirements. How mission critical is data
stored on local drives? In many cases, files in local storage are copied to shared storage, but the
company might have special requirements. Also consider the endurance requirements. You might
want to propose high-endurance SSDs because drives often get a lot of use as temporary files
are saved to them over many jobs. Note that HPE provides SSDs that are optimized for different
purposes, whether read-intensive, write-intensive, or mixed-use. You should discuss which types of
use the customer's HPC application requires. (Note that you cannot reach the maximum capacity
indicated in Figure 4-23 with some varieties.)

You should have now selected the type of drive. Next, determine the required capacity. Discuss with
the customer whether the drives will be used for temporary files only or whether files will
accumulate on them. Sizing the local storage to accommodate the full temporary needs can speed up
the job by ensuring that the node does not have to interact with shared storage many times throughout
the job. The HPC application might give guidelines as to the local needs. For example, an EDA
application might require twice as much local storage space as memory.

As you see in Figure 4-23, the compute trays for the Apollo 6000 servers support high-capacity
options for both HDDs and SSDs, so you should be able to meet the customer requirements no matter
which type of drive you have selected based on the performance requirements. The Apollo r2000
Chassis can provide an even larger amount of local storage capacity for their compute trays,
depending on the drive type and chassis, as shown in Table 4-1. As always, this table and the capacity
information in Figure 4-23 are provided for your convenience; you should check QuickSpecs for the
latest information.

Table 4-1 Local storage for HPE Apollo 2000


Also note that the compute trays include an embedded controller for the drives (the HPE Dynamic Smart
Array B140i SATA controller). However, you might propose a Smart Array controller instead,
which is supported in a PCIe expansion slot. As you know from prerequisite training, Smart Array
controllers provide additional benefits such as HPE Secure Encryption and SSD Smart Cache. The
customer might also require the Smart Array Controller to support SAS drives. As of the publication
of this ebook, supported controllers besides the embedded controller include the following:
• For the XL220a—HPE Smart Array P430/2G and 4G SAS controller
• For the XL230a and XL250a—HPE Smart Array P440 SAS controller
• For the XL170r and XL190r:
– HPE Smart Array P440/4G Controller
– HPE Smart Array P441/4G Controller
– HPE Smart Array P840/4G Controller
– HPE Smart Array P841/4G Controller

For more details, refer to the compute tray’s QuickSpecs.

Assessing how the current environment affects shared storage and fabric
choices
Your final choices for tailoring the solution to the customer scenario include selecting network
adapters for the compute trays and adding additional components, such as servers to host shared
storage or top of rack (ToR) switches to support the HPC interconnect fabric. These choices depend
on the customer's current environment, which you should assess during conversations with the
customer. If the customer already has an HPC application, your questions should reveal what type of
shared storage the application uses and how compute nodes reach that storage. You should also
discover the type of interconnect fabric, if different from the fabric used to connect compute nodes
and storage.

With this knowledge, you can ensure that your final choices for the solution fit with the customer's
current environment. You can also use surveys to assess the customer's satisfaction with the current
environment and general expectations. The request for proposal (RFP) might also include updating
the storage and fabric components of the solution, so you need to be ready to architect that portion of
the solution or to work with a team member to architect it.

The next sections provide guidelines for assessing the shared storage solution, proposing a new
solution if necessary, and also proposing network adapters that fit the customer's environment and
requirements.

Shared storage approaches that you might encounter

Figure 4-24 Shared storage approaches that you might encounter

You will now look at some of the shared storage options that you might encounter, which are
explained in the sections below. It is important that you understand these options (shown in Figure 4-
24) to ensure that your solution fits with them. You will probably encounter Network Attached
Storage (NAS) most often, although parallel distributed storage is becoming more common.

Storage area network (SAN) shared disk

A SAN provides block storage through a technology such as FC or iSCSI. To the compute node
operating system (OS), the disks in the FC array appear as local drives, which they are allowed to
access at the block level. A SAN can provide high random IOPS as well as high sequential IOPS, with
the sequential IOPS depending largely on the network bandwidth.

However, although FC provides a limited degree of access control, block storage technologies were
not designed to manage multiple nodes accessing the same shared disks. Each compute node
connected to the SAN requires a shared disk solution to manage shared access. For example, Oracle
Cluster File System (OCFS) helps a node lock a file before altering it.

For this solution, all compute nodes require a connection to the SAN. If the SAN uses iSCSI and the
compute nodes use Ethernet for their interconnect, the compute nodes can use their interconnect for
the storage traffic as well. Compute nodes traditionally required FC HBAs to reach an FC SAN.
However, they can now use Converged Network Adapters (CNAs), which carry both Ethernet and FC
over Ethernet (FCoE) traffic, allowing the nodes to use the same links for the interconnect and for
storage. Note that the fabric infrastructure must also support FCoE as well as the Data Center
Bridging (DCB) technologies that provide low latency and lossless delivery.

To minimize the number of nodes that connect to the SAN, some HPC applications distinguish
between compute nodes and IO nodes. The application is designed to allow compute nodes to direct
their file requests to IO nodes. Only the IO nodes connect to the SAN and run the shared disk solution.
If your customer's application takes this approach, you will need to determine whether to propose a
solution for the IO nodes, which often have different requirements from the compute nodes, since you
need to optimize them for serving files rather than for running computations.

Traditional Network Attached Storage (NAS) solution

In a NAS solution, compute nodes receive access to files on shared storage drives through a NAS
server. As long as the node is set up as a NAS client, the shared drive appears as a local drive to the
OS just as it does in a SAN shared disk solution, allowing the HPC application to call up files without
special coding. However, the compute nodes only access storage at the file level, not the block level,
and a NAS server controls each node’s access. This solution can be more reliable than a shared disk
solution in which a misbehaving node might improperly write to a file.

Network File System (NFS) is the typical NAS solution for Linux nodes. Each compute node is an
NFS client to the NFS server. The NFS server holds the file system and connects to the shared storage
drives, which are generally directly attached to the server—although they could be attached through a
SAN. The NFS server is responsible for serving all files to the compute nodes.

The NFS server's ability to meet random IOPS demands depends on the capabilities of its local disks
as well as the server's ability to handle sessions with many clients. You can optimize sequential IOPS
for this solution by increasing throughput across the path between compute nodes and the shared
storage.

Scale-out solutions

Traditional NAS allows only one NAS server (or perhaps one NAS server and a standby server for
high availability) per file system. Because a single NAS server cannot always meet the high IOPS and
capacity needs of HPC, companies are looking for ways to scale out.

HPE IBRIX Fusion is an example of a scale-out NAS solution, which combines traditional NAS
protocols such as NFS with a clustered file system. The cluster of NAS servers connects to shared
storage in a SAN. Because nodes have many more servers to address their needs, the storage solution
can scale much further.

In an architecture that is perhaps more common (because it eliminates the need for a SAN), a cluster
of servers can each contribute its local disk drives to the solution, and data is striped across these
disks. This approach is called a parallel file system. Lustre is one of the most common parallel file
systems for Linux HPC environments. Another example is GlusterFS, which can use a shared SAN or
local disk drives.

Lustre includes metadata servers (which store information such as file names, directory names, and
file access rules) and object servers (which store the actual files). Compute nodes are clients. When a
client needs to access a file, it first contacts a metadata server to learn where the file is stored. It then
contacts the multiple object servers that store pieces of the file. Because clients interact with many
servers in parallel and because data is distributed across each server's drives, the solution can scale
predictably with the addition of more servers.

Like NFS, Lustre enables the compute node OS to view shared files as local.

Options for when the customer needs a shared storage solution: HPE
ProLiant SL4540

Figure 4-25 Options for when the customer needs a shared storage solution: HPE ProLiant SL4540

If the customer wants to keep their current storage solution, you can move on to selecting network
adapters, keeping in mind what you learned about how compute nodes reach the shared storage.

In some cases, though, you will need to propose a shared storage solution. HPE ProLiant SL servers
are optimized for scale-out NAS and parallel distributed file solutions. For example, an HPE ProLiant
SL4540 provides up to three powerful compute nodes and up to 60 SFF drives (one-node
configuration). You can add the required number of SL servers to the HPE Apollo 6000 rack or scope
out a rack to serve multiple HPE Apollo racks.

You can choose one of three models (see Figure 4-25). The 1x60 model has one server node that has
60 drives. The 2x25 model has two nodes, each of which has 25 drives for 50 total. The 3x15 model has
three nodes, each of which has 15 drives for 45 total. The models with fewer server nodes offer more
capacity but can serve fewer clients. To maximize random IO, you would
choose a model with more nodes.

Other training provides guidelines on planning HPE ProLiant SL solutions, but Tables 4-2, 4-3, and
4-4 below show the maximum capacity to give you an idea of the number of systems that you will
require.
Table 4-2 Maximum storage capacity for HPE SL4540 1x60 Model

Disk type  Protocol  Maximum capacity*

HDD        SATA      360 TB (60 x 6 TB)

HDD        SAS       360 TB (60 x 6 TB)

SSD        SATA      48 TB (60 x 800 GB)

*Two slots for SFF drives are also provided, adding up to 2 TB (HDD) or 1.6 TB (SSD)

Table 4-3 Maximum storage capacity for HPE SL4540 2x25 Model

Disk type  Protocol  Maximum capacity*

HDD        SATA      300 TB (50 x 6 TB)

HDD        SAS       300 TB (50 x 6 TB)

SSD        SATA      40 TB (50 x 800 GB)

*Four slots for SFF drives are also provided, adding up to 4 TB (HDD) or 3.6 TB (SSD)

Table 4-4 Maximum storage capacity for HPE SL4540 3x15 Model

Disk type  Protocol  Maximum capacity*

HDD        SATA      270 TB (45 x 6 TB)

HDD        SAS       270 TB (45 x 6 TB)

SSD        SATA      36 TB (45 x 800 GB)

*Six slots for SFF drives are also provided, adding up to 6 TB (HDD) or 4.8 TB (SSD)
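To translate the tables into a rough unit count, you can divide the required raw capacity by the per-model maximums; the 200 TB target is an example, and the calculation ignores RAID, file-system, and replication overhead, which you must add for a real design.

# Hypothetical sizing sketch using the HDD maximums from Tables 4-2 to 4-4
import math

model_max_tb = {"1x60": 360, "2x25": 300, "3x15": 270}
target_tb = 200                       # example raw capacity requirement

for model, max_tb in model_max_tb.items():
    print(model, "->", math.ceil(target_tb / max_tb), "system(s)")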

Options for when the customer needs a shared storage solution: HPE Apollo 2000 local storage
Figure 4-26 Options for when the customer needs a shared storage solution: HPE Apollo 2000 local
storage

If you are architecting an HPE Apollo 2000 solution and you need to propose a shared storage
solution, you might choose to use the local storage for this purpose (see Figure 4-26). These HPC
clusters tend to be relatively small, and the Apollo r2000 chassis provides a higher density of storage,
sometimes enabling it to meet the cluster's needs.

The HPE Apollo 6000 compute trays each have their own storage. The HPE Apollo r2000 chassis,
on the other hand, provides the storage to installed trays:
• HPE Apollo r2200 chassis—Provides 12 LFF SAS or SATA HDDs or SSDs, equally distributed
(three per XL170r server or six per XL190r)
• HPE Apollo r2600 chassis—Provides 24 SFF SAS or SATA HDDs or SSDs, equally distributed (six
per XL170r server or 12 per XL190r)
• HPE Apollo r2800 chassis—Provides 24 SFF drives like the r2600; however, you can choose how
many to allocate to each server

The HPE Apollo r2800 chassis can be a good choice for the HPC solution. You can select one or two
servers to act as the file servers or hosts for shared storage and assign all or most of the drives to
them.

HPE Apollo 2000 Systems can also act as alternatives to HPE SL 4540, providing shared storage for
HPE Apollo 6000 HPC clusters.

Tailoring fabric to the workload: Options


Figure 4-27 Tailoring fabric to the workload: Options

You are now ready to select adapters for the solution. The HPE Apollo a6000 Chassis provides ten
Innovation Zones, where you install the fabric options (shown in Figure 4-27). Each Innovation Zone
is dedicated to a compute tray slot, and you can mix and match options so that you can select the right
option for each.

For each XL220a, XL230a, or XL250a compute tray, you can either install an IO module, which
includes two 1GbE ports, or a Dual FlexibleLOM riser, which supports up to two FlexibleLOM cards.
For each XL230a or XL250a tray, you can alternatively install a PCIe/FlexibleLOM riser, which
supports one FlexibleLOM card and one card with a PCIe form factor. For example, you could install
an FC HBA for connecting to an FC Storage Array.

HPE Apollo XL170r and XL190r servers for r2000 Chassis also support FlexibleLOM cards, as well
as other network adapters. You must choose the appropriate risers to support the FlexibleLOM cards,
and your choices also affect how many PCIe expansion slots you can use for components such as
Smart Array Controllers, HBAs, and accelerators (for the XL190r).

The FlexibleLOM cards and I/O network adapters include options for four-port 1GbE, two-port 10G
Ethernet, two-port 10G FlexFabric (which might provide special features described in the next section),
and two-port InfiniBand.

For details about the exact adapter models, refer to the compute tray’s QuickSpecs.

Tailoring fabric to the workload: Choosing options


Figure 4-28 Tailoring fabric to the workload: Choosing options

To choose between the adapter options, you need to consider what you learned about the customer's
existing environment as well as collect information about the HPC application requirements.

For highly parallelized, multi-threaded HPC applications, which run jobs across many nodes in a
cluster, the interconnections between compute nodes can act as a bottleneck, slowing down the
computation. Properly provisioning the interconnections, on the other hand, can give the
cluster a noticeable performance boost.

Consider questions such as these:


• Does the customer have an existing fabric solution in which your proposal must fit? If so, you
must, of course, select adapters that match the current solution.
• If you have more freedom in the proposal, does the customer have a preference toward Ethernet or
toward InfiniBand? Does the customer IT staff have more experience with Ethernet? If so, an
Ethernet adapter that can meet the performance requirements might be the best choice.
• How parallelized is the application? Do nodes need to share large amounts of data with each other
at very low latency? Or are nodes running batch jobs that run largely independently?

Whenever HPC applications are highly parallelized, such as with Message Passing Interface (MPI),
the interconnect must deliver high throughput and low latency. The InfiniBand options for HPE
Apollo 6000 compute trays can provide 10 Gbps or 56 Gbps. InfiniBand also delivers extremely low
latency. InfiniBand avoids the traditional IO stack and instead uses Remote Direct Memory Access
(RDMA) to connect nodes at the memory level, essentially extending internal fabric between nodes.

Ethernet can provide high speeds, but traditionally it has higher latency. However, if the customer
prefers Ethernet, certain HPE FlexFabric 10 GbE adapters can support RDMA over Converged
Ethernet (RoCE), which reduces latency (see Figure 4-28). Note that Converged Ethernet uses DCB
technologies to ensure the low latency and lossless delivery required for RoCE. Make sure that the
fabric infrastructure also supports these technologies.

The FlexFabric adapters also feature offloading of traffic processing, which prevents precious
compute power from being consumed by processing traffic.

If the compute nodes are running independent batch jobs, 1 GbE might meet their needs adequately.
However, keep in mind the nodes' storage needs as well (as discussed below).
• What type of remote storage solution are you planning? And how does the HPC application use
files retrieved from this storage? If the application is interacting often with the shared storage, low
latency is a must. What size files does the application work with? Larger files require more
bandwidth (higher speeds). Will a single NIC, or a pair of redundant adapters, provide enough
bandwidth for the interconnect and the storage traffic?
– If the customer requires an FC SAN solution, you might want to select FlexFabric adapters that
support FCoE. The two 10 GbE ports can then provide both the interconnect and the SAN
connection, as well as redundancy for both of these connections.
– If the customer requires a NAS or a parallel distributed file solution, the compute nodes
generally connect to the servers using Ethernet (although InfiniBand might be used in some
cases). If you are planning to use Ethernet for both the interconnect and shared storage, plan for
10 GbE, not 1 GbE. If you are using InfiniBand for the interconnect, you will probably need to
plan two cards for each compute node: one InfiniBand and one Ethernet.
• What type of availability is required?

Many HPC applications or their management applications have mechanisms for dealing with the loss
of a single node within a cluster. If not, the loss of a node could compromise the completion of an
important task that has taken hours to compute. Based on the capabilities of the customer's software,
decide whether one port on each server is sufficient for the interconnect or whether two are required
for better resiliency. If you are using Ethernet, you can bond the adapters (NIC teaming) and use a
bonding mode such as Link Aggregation Control Protocol (LACP) that allows both
adapters to be active to increase throughput.
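When weighing these options, a rough transfer-time comparison can help frame the discussion. The link speeds below are nominal line rates, the file size and efficiency factor are assumptions, and LACP balances traffic per flow, so a single stream is still limited to one physical link.

# Approximate time to move a large file over the fabric options discussed above
def transfer_seconds(file_gb, link_gbps, efficiency=0.9):
    return (file_gb * 8) / (link_gbps * efficiency)

file_gb = 50                                   # assumed simulation output size
for name, gbps in (("1 GbE", 1), ("10 GbE", 10), ("2 x 10 GbE bond", 20), ("56 Gb/s FDR InfiniBand", 56)):
    print(name, "->", round(transfer_seconds(file_gb, gbps), 1), "seconds")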

Options for when the customer needs a fabric proposal

Figure 4-29 Options for when the customer needs a fabric proposal
As discussed earlier, you might need to deliver top-of-rack (ToR) switches as part of the solution,
ensuring the proper technologies for maximizing the cluster performance.

Figure 4-29 shows an example design for an HPC cluster with many servers, which are housed in
several HPE Apollo a6000 chassis. Several HPE ProLiant SL servers are providing the shared
storage.

This design uses a pair of HPE FlexFabric 5930 switches, just one example of an HPE switch model
that meets the needs. Using Intelligent Resilient Framework (IRF), these switches can act as a single
virtual switch, giving you the freedom to connect one NIC in a bonded pair to one switch and the
other NIC to the other switch for better resiliency. Because the two switches act as one, the bonded
adapters can use a bonding mode that requires aggregation on the switch side, such as LACP, so the
solution provides load-balancing on the bonded adapters as well.

The two 5930 switches could support a smaller cluster of about 100 nodes on their own. If you need
to scale the cluster beyond that, the ToR switches could connect to another tier of switches (such as
HPE FlexFabric 7900 switches) in a leaf-spine topology. The switches support Transparent
Interconnection of Lots of Links (TRILL) to ensure that the full bandwidth on the uplinks is used. Note
that an HPC interconnect often requires a nonblocking or near to nonblocking design, so network
designers will need to plan the ratio of 40 G uplinks to 10 G server links accordingly.
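The uplink ratio mentioned above comes down to simple arithmetic; the port counts below are assumptions chosen for illustration, not a recommendation for a specific switch model.

# Check the 40 G uplink count needed for a nonblocking leaf switch
import math

server_ports = 48              # assumed 10 GbE server-facing ports on one ToR switch
server_speed_gbps = 10
uplink_speed_gbps = 40

downlink_gbps = server_ports * server_speed_gbps
uplinks_needed = math.ceil(downlink_gbps / uplink_speed_gbps)
print(uplinks_needed, "x 40 GbE uplinks for a nonblocking design")                            # 12

print("Oversubscription with six uplinks:", downlink_gbps / (6 * uplink_speed_gbps), ": 1")   # 2.0 : 1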

These switches also support FCoE, so they can support servers that use FCoE to connect to shared FC
or FCoE storage. They can even connect directly to native FC storage using flexible ports that can
operate in FC or Ethernet mode (see Figure 4-30).

Figure 4-30 Options for when the customer needs a fabric proposal with FCoE support

If you have selected InfiniBand for the interconnect, you must ensure that the customer has Mellanox
FDR edge switches that support the same adapter type (10 Gb/s or 56 Gb/s). The edge switches will
generally need to connect to core director switches in a nonblocking fat tree topology. Examples of
this topology are shown in Figure 4-31 and Figure 4-32. Note that you can establish one link or two
links per server.
Figure 4-31 Options for when the customer needs a fabric proposal, InfiniBand with one link per
server

Figure 4-32 Options for when the customer needs a fabric proposal, InfiniBand with two links per
server

Creating a design for tiered capabilities

Figure 4-33 Creating a design for tiered capabilities

Generally, an HPC cluster features many nodes with identical hardware configurations. Sometimes,
though, you can help the customer optimize performance with the right economics by creating a plan
with different types of nodes to meet the needs of an application with varying requirements.
For example, an EDA application often runs not only many smaller jobs but also a few larger ones
that combine pieces of a simulation. If the customer has a solution for scheduling jobs to run on
servers with the correct resources, you might design two types of servers, most with less memory and
some with more memory. Remember that you can also mix and match compute options. Again, you
can maximize the efficiency of the plan by creating different tiers of servers, some with more
compute power and some with less, as required by the mix of workloads in the customer
environment, illustrated in Figure 4-33.
Chapter 4—Activity 1
Next, take time to review what you have learned by designing an HPE Apollo solution for the
following customer scenario.

Your sales partner has discovered an opportunity with an automotive company. This company relies
on its EDA application to design more efficient, safer, and more powerful cars. Designers are
beginning to complain that their jobs are backing up, and wait times are interfering with their ability
to meet deadlines. The company is looking for a server infrastructure upgrade to improve
performance for the EDA application.

Current solution

The solution that you will replace consists of


• 120 2P servers:
– Four core 2.5GHz Intel Xeon E5-2609 v2 processors
– 16 GB RAM on each processor (32 GB total per server)
– Four 1 Gbps ports (two per processor)
• One NAS (and a standby NAS) with 120 TB storage (including replicated data)

Workload

Assume that you have discussed the workload requirements with the customer and discovered:
• The customer uses an array of Synopsys EDA tools, including IC Compiler, Design Compiler,
Saber, Proteus, and more (see Table 4-5).
• The company’s IC Compiler is set up for light-threading with a maximum of four threads. It does
not use the Design Compiler Ultra edition that permits multi-threading. Other tools also tend to be
single-threaded.
• Most jobs are scheduled to run on individual nodes. Some IC Compiler jobs, however, use
distributed computing.
• Most jobs use and produce quite small files, between 8 KB and 16 KB. The final simulations,
though, produce very large files.
Table 4-5 Example Synopsys EDA applications

Application        Description

IC Compiler        Tool for designing physical chips

Design Compiler    Tool for enhancing and speeding up IC Compiler

Saber              Platform for simulating and validating systems

Proteus            Tool for analyzing proximity effects on full chips and building a correct model

Requirements
The customer wants to reduce the run time for jobs by 30%. IT administrators find it difficult to
monitor their existing solution. However, after extensive analysis, they believe that both inadequate
processing power and inadequate memory are delaying jobs. The company also wants to increase
shared storage to 200 TB (including replicated data) and provide 100 GB of local storage on each
compute node for temporary files.

The NAS is acting as a bottleneck. The customer wants to migrate to a distributed file system with a
cluster of file servers to handle the requests. You need to propose hardware for this solution.

The solution can tolerate the loss of a node whether because of a problem with the hardware or a link
that connects to the server. The solution should continue to operate as normal if up to one power
supply fails.

Design questions

As you review these questions, record your answers. Refer to Table 4-6 in the “Supplemental content”
section at the end of this chapter for an overview of the compute tray options; for details, refer to the
compute tray’s QuickSpecs.
1. Does the customer's application support GPU or coprocessor acceleration? Visit these links to
search for the application:
• http://www.nvidia.com/object/gpu-applications.html
• https://software.intel.com/en-us/xeonphionlinecatalog
• https://www.khronos.org/opencl/resources
2. Which compute tray will you recommend for the solution and why?
3. How much memory will you recommend for each processor? How much total for your
selected compute tray?
4. Visit http://h22195.www2.hp.com/DDR4memoryconfig. Use the tool to plan how you will
configure the DIMMs. If you are given multiple choices, choose a design that will help optimize
performance. Record your configuration and explain the reasoning behind your choices.

If you are planning to use HPE XL230a or XL250a trays, select DDR4 and then choose the Apollo
system and your tray.

The tool does not currently support the HPE XL220a compute trays. However, you can plan the
memory for a Gen8 server with similar options to get an idea of a valid configuration. Select DDR3,
HPE ProLiant DL servers, and HPE DL320 Gen8 v2. You can then plan the memory for one server.
Remember that the XL220a compute tray contains two 1P servers, which must each have the same
memory configuration. Therefore, double the design that you select when you record it.
5. You will need to test to come to a final decision about how many compute trays to propose. At
this point, about how many compute trays and HPE Apollo 6000 chassis will you plan to
propose?
6. You have several options for drives that meet the customer's capacity requirements. What are
some additional questions that you can ask the customer to help you make these choices?
7. For the shared storage solution, which server model will you choose? How many are required
to meet the customer's needs? Justify your choices.
8. Will you use InfiniBand, 10 GbE, or 1 GbE? How many ports will you plan for each server?
Explain your reasoning. Also list further questions that you might ask the customer if you
cannot make a choice.

You can check your answers by referring to Appendix B: Answers to Activities.

Planning for power at the rack level

Figure 4-34 Planning for power at the rack level

You now know how to plan the building blocks of your solution. But HPC clusters need to scale out;
customers often require hundreds or even thousands of processors. As the first step in scaling out the
solution, you can install several Apollo a6000 Chassis in a rack.

The chassis are designed to transform the rack into one easily managed, efficient unit. The chassis
itself does not provide any power. Instead, an external power shelf powers two or four fully loaded
chassis or six partially loaded chassis (see Figure 4-34). The shelf intelligently and dynamically
allocates power to each chassis as required. By pooling power across multiple systems, the Apollo
6000 Systems save the customer power and cooling costs as well as valuable data center space.

Begin planning the power by treating the rack as the unit of design. You will probably be able to
support four to six chassis in the rack. Choose a number in that range for your initial plans. You can
then scale that number up or down as you plan more precisely how many chassis a power shelf can
support for your customer's requirements. Depending on these requirements, you might need
multiple power shelves in a rack.

First, fully scope out the components that you plan to install in each chassis, including all the
compute, memory, storage, and fabric options covered in the previous sections. Then collect
additional requirements with questions such as these:

• What input voltage does the customer's site use?

• What level of power redundancy does the customer require? N+1 redundancy allows the shelf to fully
power the solution even if one of the power supplies fails. N+N redundancy delivers even higher
availability: a backup power supply for each power supply. (A simple sketch of how each scheme
affects the power supply count follows this list.)

• Will you use single-phase power or three-phase power? Three-phase power delivers power as three
alternating currents, each shifted in phase by one-third of a cycle. This phase shifting means that the
delivered power never drops to zero, which allows power-hungry devices to draw power more
efficiently. Keep these requirements in mind:
– You must use either single-phase or three-phase power for all devices powered by the same
shelf.
– You must select PDUs that support the correct phase.
– Three-phase power supports N+N redundancy but not N+1 redundancy.
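
To make the redundancy options concrete, here is a minimal Python sketch, using assumed (hypothetical)
chassis loads and power supply ratings, of how each scheme affects the number of supplies a shelf needs.
Treat it only as an illustration of the arithmetic; the HPE Power Advisor remains the authoritative sizing tool.

import math

def supplies_needed(total_load_w, supply_rating_w, redundancy="N"):
    """Return the number of power supplies for a given redundancy scheme."""
    base = math.ceil(total_load_w / supply_rating_w)   # N: just enough capacity for the load
    if redundancy == "N":
        return base
    if redundancy == "N+1":
        return base + 1                                # one spare supply for the shelf
    if redundancy == "N+N":
        return base * 2                                # a backup supply for every supply
    raise ValueError("redundancy must be 'N', 'N+1', or 'N+N'")

# Assumed example only: four fully loaded chassis drawing about 4,800 W each,
# fed by 2,650 W supplies (check the QuickSpecs and Power Advisor for real ratings).
load = 4 * 4800
for scheme in ("N", "N+1", "N+N"):
    print(scheme, supplies_needed(load, 2650, scheme))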

When you have the answers to these questions, you are ready to plan the power solution using the
HPE Power Advisor. You can download the advisor from
http://www.hpe.com/info/poweradvisor/download.

You can also use the online advisor, which is always up to date, at
http://www.hpe.com/info/poweradvisor/online.

Guidelines for testing the solution

Figure 4-35 Guidelines for testing the solution

After you have made your initial choices for the solution, you are ready to test its performance. You
might begin by benchmarking performance on compute trays in one chassis. Figure 4-35 lists
examples of tools for measuring metrics that are important to HPC applications.

However, keep in mind that benchmarks such as these only give you a starting point for verifying
that the solution will meet the customer's needs. Each HPC application is unique, and parallelized
HPC applications rely on interoperation between multiple nodes. Only testing the customer's
application can give a true sense of how your solution will perform.
You should develop a proof of concept (POC) solution. Then work with the customer to select a few
typical jobs, including ones that require fewer resources, ones that require an average amount of
resources, and ones that require the most resources. Determine how many processors or processor
cores the customer plans to run these jobs on. Then scope out a POC solution with the required
number of processors—perhaps one chassis or one rack. Run the jobs and assess whether the time is
what the customer expects or whether you need to adjust the plan.

If the latter, you can use server diagnostics such as those provided by HPE Insight Cluster
Management Utility (Insight CMU) to assess what is slowing down the job. (You will learn more about
Insight CMU later in this ebook.) Is the CPU, the memory, the disk I/O, or the network acting as the bottleneck? When
you know the answer, you can plan which resources you need to enhance.
Chapter 4—Activity 2
You will now scope out your solution more fully while also planning power for it. You will use the
HPE Online Power Advisor for this purpose. (Keep in mind that the tool might have changed since
this ebook was published.)
1. What should you discuss with the customer before planning the power for the HPE Apollo 6000 solution?
2. Access the tool at http://www.hpe.com/info/poweradvisor/online.
3. You might need to activate Silverlight.
4. Agree to the License Agreement (see Figure 4-36).

Figure 4-36 HPE Online Power Advisor: License Agreement

5. Create a profile by filling out your name and email and selecting your country. Then click OK (see Figure 4-37).

Figure 4-37 HPE Online Power Advisor: Profile Information

6. The customer data center uses 220VAC for the input voltage, as shown in Figure 4-38.
Figure 4-38 HPE Online Power Advisor: Input voltage

7. In the navigation pane on the left, expand racks (see Figure 4-39).

Figure 4-39 HPE Online Power Advisor: Racks

8. Choose a 47U Intelligent rack and name it HPC rack, as shown in Figure 4-40. Then click OK.
Figure 4-40 HPE Online Power Advisor: Select the Rack Description

9. Expand Enclosures > HPE Apollo Enclosures and select HPE Apollo 6000, as you see in Figure 4-41.

Figure 4-41 HPE Online Power Advisor: Enclosures

10. Select the chassis that appears in your rack. Click the Config button at the top of the window (see Figure 4-42).
Figure 4-42 HPE Online Power Advisor: Config

11. Your plan probably calls for more than four enclosures. Begin by planning to support four enclosures on one power shelf.
12. Select Single for the Power Phase and choose the power redundancy based on the customer requirements.
13. Now configure the enclosures (see Figure 4-43). For the purposes of the activity, plan the same configuration for all enclosures
and compute trays.

Figure 4-43 HPE Online Power Advisor: General Configuration

Select All Enclosures same as 1 and click Config, as shown in Figure 4-44.
Figure 4-44 HPE Online Power Advisor: Enclosure Configuration

14. Choose the tray that you selected in the previous activity and click Add (see Figure 4-45).

Figure 4-45 HPE Online Power Advisor: Tray Configuration

15. Select enough trays to fill the enclosure.


16. Click Config, as shown in Figure 4-46.

Figure 4-46 HPE Online Power Advisor: Tray Configuration

17. Select All Trays same as 1, as shown in Figure 4-47.

Figure 4-47 HPE Online Power Advisor: Tray Configuration

18. In the real world, you might need to gather more information. For the sake of the activity, assume that you have discussed
options with the customer and decided on:

– If you are proposing an XL220a, a four-core 3.5GHz processor, which is the E3-1241 v3
– If you are proposing an XL230a or XL250a, a 12-core 2.5GHz processor, which is the E5-
2670v3
19. Click Add. Select 1 for an XL230a or XL250a and 2 for an XL220a (see Figure 4-48).
Figure 4-48 HPE Online Power Advisor: Model Configuration

20. Choose the memory options based on the configuration you determined in the previous activity.
21. Click Add and select 1 or 2 (see Figure 4-49).

Figure 4-49 HPE Online Power Advisor: Model Configuration

HPE Apollo 2000 and 6000 management


You will now learn about the onboard management options for HPE Apollo solutions.

Overview of management tools


Figure 4-50 Overview of management tools

HPE Apollo solutions provide built-in tools to help you manage the solutions at the system (server
and chassis) level. They also support tools that help you to manage at the rack level and the solution
level (see Figure 4-50). This section covers the chassis-level tools. Chapter 9, “Monitoring and
Managing HPE Solutions,” covers the rack- and solution-level tools, which also support other HPE
ProLiant servers.

HPE Apollo management modules

Figure 4-51 HPE Apollo management modules

The HPE Apollo a6000 Chassis provides a Management Module with an iLO port through which
administrators reach iLO functions on the servers (shown in Figure 4-51). The HPE Apollo 6000
Management Module simply aggregates the iLO connections for ProLiant XL servers installed in the
Apollo chassis. Administrators still contact and manage the server at its own iLO IP address like a
traditional rack server. If the customer wants to control functions such as power for XL servers on a
wider scale, you should propose the HPE APM, which is covered later in this ebook.
The HPE Apollo 2000 chassis supports an optional Rack Consolidation Management (RCM) module.
Companies have the option of installing the RCM and using its iLO port to reach XL170r or XL190r
iLO functions or of using a dedicated iLO port on each server node. The RCM also provides a port
for connecting to APM.

Planning iLO connections

Figure 4-52 Planning iLO connections

To allow customers to take advantage of the iLO Management Engine, you must establish the correct
connections. (However, if the customer is planning to use HPE APM, APM will provide the iLO
connections instead, as described later in this ebook.) As you design the connections for these iLO
ports, keep in mind that you should typically isolate the management network from the network that
the Apollo servers are using. Plan to add a 1GbE switch (such as an HPE Aruba 3800) for the iLO
connections.

Each Apollo 6000 management module or Apollo 2000 RCM has two iLO ports. The two ports are
not bonded together, which means that if you connect them incorrectly, you could create a loop. Take
care to follow the directions below carefully.

You can choose to connect each Apollo chassis directly to the switch that you selected for the iLO
connections. Connect only one port on each chassis to avoid loops. As an alternative design, you
could connect several Apollo a6000 Chassis in a daisy chain and then connect the final chassis to the
switch. You would then use both iLO ports on most of the chassis, as shown in Figure 4-52. This
design does not introduce a loop, and it uses fewer ports on the network switch; a single switch could
support many chassis in many racks. However, this design is less fault resistant. If one connection
fails, the customer can no longer reach the iLO management engines on all chassis below the failed
connection. For some customers, the increased availability is well worth the limited expense of
purchasing 1 Gbps switches with enough ports for all of the chassis. Other customers do not require
high availability for the iLO functions and prefer to use fewer ports.
Chapter 4—Activity 3
You will now use the HPE Proposal Web to prepare a presentation of your solution benefits. As you
do, keep in mind that you have learned this in product discussions:
• The CEO is very concerned about the environment and making operations as green as possible.
• IT managers found it difficult to assess how resources in the existing solution were being utilized.
• Some decision makers are worried that they would not be able to get a powerful enough solution in
just a few racks.
• Decision makers need to demonstrate the cost effectiveness of the solution that they choose.

Make sure to address these concerns, as well as to list other benefits. You will use the HPE Proposal
Web to help in demonstrating values. Instructions for accessing and using HPE Proposal Web are
provided below. You can also draw on your power plan.

You require access to the HPE Partner Portal to access HPE Proposal Web. If you do not have such
access, skip this activity.

HPE Proposal Web


1. Log into the HPE Partner Portal at https://partner.hpe.com.
2. Select My Workspace > Create Proposals.
3. Click Go (see Figure 4-53).
Figure 4-53 Proposal Web

4. Click Partner Login (see Figure 4-54) and enter your credentials again.

Figure 4-54 Proposal Web log on


5. Choose your language portal (see Figure 4-55).

Figure 4-55 Proposal Web: choosing a language

6. Click the Wizards tab.


7. Choose Enterprise Server, Storage, Networking, and Solutions Wizard (see Figure 4-56).

Figure 4-56 Proposal Web: Enterprise Group Wizards

8. Select the components for your solution (the Apollo solutions are included with ProLiant servers). Then click Next. See Figure
4-57.
Figure 4-57 Proposal Web: Enterprise Group Wizards

9. Choose your precise models (see Figure 4-58). Then click Next.

Figure 4-58 Proposal Web


10. Continue clicking through the wizard, customizing the elements as you choose.
11. Download your proposal.
12. Customize the proposal to remove, for example, mention of compute trays that you are not proposing.

Summary
In this chapter, you have learned why so many companies are turning to HPC solutions to obtain a
competitive edge. You examined HPC use cases and saw how different applications have different
needs. You then learned to address these needs with density-optimized HPE Apollo solutions,
tailoring compute, memory, storage, and fabric to the workload. Finally, you explored ways to
simplify managing, monitoring, and provisioning the solutions.

Learning check
Review what you have learned by answering these questions. Then check your answers in Appendix
A: Answers to Learning Checks.
1. You are planning to propose an HPE Apollo 6000 System, and you have determined that a
customer ’s HPC application will benefit from GPU acceleration. Which compute tray should
you propose?
a. HPE ProLiant XL190r
b. HPE ProLiant XL220a
c. HPE ProLiant XL230a
d. HPE ProLiant XL250a

2. A customer tells you that its HPC application uses a SAN shared disk solution. What should you
make sure to include in your proposal?
a. HPE FlexFabric adapters that support FCoE (or FC HBAs) for the compute tray
b. HPE ProLiant SL4540 server to act as a NAS
c. HPE Apollo 2000 System to connect to the SAN
d. PCIe riser and HPE Smart Array Controller P430 or P440 for the compute tray

For answers, See Chapter 4 in Appendix A.

Supplemental content
Table 4-6 is provided for use during the Activities. You should check the latest QuickSpecs for the
most up-to-date information.
Table 4-6 HPE Apollo 6000 compute tray options
Chapter 5 HPE Apollo 4000 for Data-Driven
Organizations

EXAM OBJECTIVES
• Briefly describe the HPE Apollo 4000 portfolio
• Position HPE Apollo 4000 solutions for the right use cases
• Create an implementation plan for an HPE Apollo 4000 solution, including plans for the proper
performance, scalability, and high availability

Assumed knowledge
Before reading this chapter, you should have a basic understanding of the following:
• Processors, including DDR3 and DDR4 memory, hard disk drives (HDDs), solid state drives
(SSDs), and RAID levels for storage volumes
• HPE ProLiant rack and blade servers and options for them such as HPE Smart Array Controllers
• HPE BladeSystems, including interconnect modules and Virtual Connect (VC) modules
• Server management and maintenance, including experience with iLO, Intelligent Provisioning,
UEFI, HPE Insight Remote Support, HPE Insight Online, HPE Smart Update Manager (SUM), and
HPE Insight Control server provisioning (ICsp)
• HPE OneView capabilities
Chapter topics
This chapter begins with an overview of the HPE Apollo 4000 family. You then review the scenarios
that call for HPE Apollo 4000 solutions. Finally, you learn about decision points for architecting the
solution.

HPE Apollo 4000 overview


Begin with the overview of the HPE Apollo 4000 family.

HPE Apollo 4000 Family—Purpose-built big data servers

Figure 5-1 HPE Apollo 4000 Family—Purpose-built big data servers

Starting from the Apollo 4200 Gen9 server and moving up to the Apollo 4510 Gen9 server, this
server family handles data-intensive workloads that range from Hadoop analytics to object storage,
as you see in Figure 5-1.

The Apollo 4200 Gen9 is a dense storage server product that comes in the familiar 2U form. The
large form factor (LFF) version is perfect for object storage deployments, reaching over 4.4 PB per
rack (20 servers per 42U rack). The small form factor (SFF) model is ideal for Hadoop analytics,
content delivery, and other applications requiring fast spindles.

The Apollo 4500 Gen9 Servers come in 1- or 3-node configurations. The Apollo 4510 1-node
configuration is purpose-built for object storage. It delivers a total of 68 drives per system, up to 544
TB per system, and over 5 PB per rack when it uses 8 TB drives.

The Apollo 4530 3-node configuration is purpose-built for Hadoop analytics. The Apollo 4530
supports Intel Xeon E5-2600v3 processors, as well as up to 16 dual in-line memory modules
(DIMMs), for up to 1024 GB memory per system. It has 15 drives per node (45 total drives per
system).
Apollo 4200

Figure 5-2 Apollo 4200

The HPE Apollo 4200 Gen9 Server was designed as a versatile, entry-level, density-optimized big
data server that integrates seamlessly into traditional enterprise data centers with the same rack
dimensions, cabling, and serviceability and uses the same administration procedures and tools (see
Figure 5-2). This makes it the ideal bridge system for enterprises wanting to start implementing
purpose-built big data server infrastructure today and scale in affordable increments.

This 2U rack server has industry-leading storage capacity of up to 224 TB with up to 28 hot-plug LFF
hard disk drives/solid-state drives (HDDs/SSDs) per server. It can also be configured for
performance and throughput to cover the range of big data solutions technologies from object
storage to data analytics and HPC data-intensive applications. For high-performance computing, there
are options for Top Bin CPUs, integrated accelerators (GPUs and coprocessors), high-performance,
low-latency cluster, and networking input/output (I/O).

HPE Apollo 4200 Gen9 Server model options


Figure 5-3 HPE Apollo 4200 Gen9 Server model options

The HPE Apollo 4200 Gen9 Server, shown in Figure 5-3, offers market-leading storage capacity
among 2U storage servers, supporting up to 24 LFF hot-plug drives or 48 SFF hot-plug drives.

For object storage customers looking for the lowest cost-per-gigabyte economics, the LFF model fits
up to 224 TB of storage using 8 TB LFF HDDs. A 42U rack of Apollo 4200s can fit over 4.4 PB.

For high-performance Hadoop-based workloads, the HPE Apollo 4200 can support up to 54 serial-
attached SCSI (SAS) SFF drives with 12 Gb/s throughput and speeds of up to 15 K revolutions per minute. The Apollo 4200
Gen9 Server is a perfect match for parallel processing applications like Hadoop, with up to two Intel
Xeon E5-2600v3 processors per server that can reach up to 18 cores, as well as Xeon E5-2600v4
processors.

For object stores needing fast performance with small objects, or in-memory data processing for
analytics software, the Apollo 4200 offers up to 1024 GB of double data rate fourth generation
(DDR4) memory (16 DIMM slots with DIMMs up to 64 GB).

Other features include


• Embedded 2x1 Gb Ethernet, FlexibleLOM expansion slot
• Up to five peripheral component interconnect express (PCIe) slots to give flexibility for future I/O
upgrades
• Embedded HPE Smart Array P840ar Controller
– Supports up to 16 drives with two ports
– Provides point-to-point connectivity to SSDs to reduce latency
– Includes Smart Storage Battery
• Standard rack depth (31.5 inches)
• HPE Integrated Lights-Out 4 (iLO4) to significantly simplify server monitoring, manage fault
tolerance, and provide prefailure warnings for drives
• Single hot-plug drives for easy swapping, additions, and subtractions
• Option to equip up to 10 fans for redundancy
• Familiar HPE hard drive, networking, memory, and controller options for seamless transition to a
density-optimized infrastructure

Familiar 2U form factor designed for dense storage

Figure 5-4 Familiar 2U form factor designed for dense storage

Figure 5-4 shows the architecture of the 2U form factor in the Apollo 4200 Gen9 servers. This
example is an LFF model.

Apollo 4510—Purpose-built HyperScale object storage server

Figure 5-5 Apollo 4510—Purpose-built HyperScale object storage server


The HPE Apollo 4510 System is purpose-built for object storage solutions, as shown in Figure 5-5.
Customers can deploy cost-effective, HPE Apollo 4510 Systems optimized to meet the needs of their
object storage solution requirements at any scale. HPE Apollo 4510 Systems can be configured to
form the foundation platform for the whole variety of big data object storage solutions—from cost-
effective, high-capacity content repositories that address petabyte-scale data volumes to the tuned
responsiveness required for content distribution systems. The space-saving storage capacity (up
to 544 TB per system and 5.44 PB per 42U rack) can grow to meet object storage solution needs at
any scale, to hundreds of petabytes and beyond.

The HPE Apollo 4510 System is ideal for a wide variety of object storage solutions, ranging from
collaboration and content distribution, to content repositories and active archives, to back-up
repositories and cold storage, and everything in between.

The HPE Apollo 4510 system is an ideal platform for the variety of object storage solutions
supported by the HPE HyperScale Data Eco-System partners, including Cleversafe, Scality, Ceph, and
OpenStack/Swift; and it also forms the building blocks for HPE’s own Helion Content Depot.

The HPE Apollo 4510 brings HPE ProLiant Gen9 server technology into its 4U, one-server density-
optimized chassis. It includes
• New levels of rack-scale storage server density
– Up to 68 hot-plug HDDs or SSDs per server—544 TB capacity
– Up to 8 TB SAS/SATA drives
– Up to 5.44 PB in 42U rack (based on 10 Apollo 4510 Systems with 68 8TB LFF SAS HDDs per
system)
• New shorter chassis form factor that allows 1 additional chassis in a 42U rack
– Up to 10 Apollo 4510 Systems in 42U
• Up to four PCIe slots with flexible performance and I/O options to match the variety of object
storage workload response and throughput criteria
• Up to 16 DIMMs per node with DIMMs up to 64 GB, for 1024 GB total memory

Apollo 4530—Massive Density for Hadoop and Big Data Analytics

Figure 5-6 Apollo 4530—Massive Density for Hadoop and Big Data Analytics
The HPE Apollo 4530 System, shown in Figure 5-6, is purpose-built for big data analytics. It can be
configured to optimally match technology requirements for economical large-scale Hadoop-based
data analytics, or it can be configured for more complex compute intensive analytics with high-
performance processors, up to 1024 GB DDR4 memory per server, SSDs, high-performance disk
controllers and fast, high-capacity I/O options.

The HPE Apollo 4530 System is ideal for the wide variety of big data analytics solutions. This
includes parallel Hadoop-based data mining to develop a 360-degree view of customers to improve
the cost-effectiveness of advertising and promotion, increase web commerce sales with “next-product
buy recommendations,” and even provide “crowd-sourced quality control” by matching product
return data with social media sentiment information.

The Apollo 4530 System brings HPE ProLiant Gen9 server technology into this 4U, three-server
density-optimized, shared-infrastructure chassis. It also provides:
• More storage capacity per server and per rack with 8TB LFF HDDs:
– Up to 45 LFF top loading hot-plug HDDs or SSDs
– Up to 8 TB SAS/SATA drives
– Up to 120 TB per server
– Up to 10 chassis per 42U rack with 30 servers
– Up to 3.6 PB capacity
• CPU choices to optimize for performance or economy
– E5-2600v3 or v4 series
– 4–20 cores (1.6 GHz–2.6 GHz CPU speed)
– Power ratings between 55 and 135 Watts
– Up to 1024 GB DDR4 memory at up to 2133 MHz
• Up to 5 PCIe slots with flexible performance and I/O options
• Up to 16 DIMMs per node with DIMMs up to 64 GB for 1024 GB total

HPE Apollo 4500 Gen9 (2S/4U)

Figure 5-7 HPE Apollo 4500 Gen9 (2S/4U)


Table 5-1 lists the features found in the Apollo 4510 and 4530 Gen9 servers (which are shown in
Figure 5-7).
Table 5-1 HPE Apollo 4500 Gen9 server features

Feature          Detail

Processors       Up to two Intel® Xeon® E5-2600 v3 (Haswell-EP) or v4 (Broadwell-EP), up to 135 W, C610 Series Chipset

Memory           16 DIMMs (eight per processor), registered DDR4 (1866/2133) with ECC

Drive support    • 60 LFF SAS (12 Gb)/SATA (6 Gb) drives in the 1-node configuration (1x60), with an 8 LFF option in the back
                 • 15 LFF SAS (12 Gb)/SATA (6 Gb) drives per node in the 3-node backplane (3x15)
                 • Support for SFF drives in a converter
                 • PMC Belmont SAS expander

Network          Dual-port 1 GbE with FlexibleLOM support

Expansion        Up to four low-profile PCIe Gen3 slots
                 • CPU0: One ALOM x8 LP @x16 slot (G3, 25 W), one Smart Array with HBA x8 LP @x16 slot (G3, 25 W), one
                 Smart Array with HBA x8 LP @x16 slot (G3, 75 W)
                 • CPU1: Two x8 LP @x16 slots (G3, 75 W)

I/O              Front: two external USB ports per node, video, and Power/Health/UID buttons and LEDs

Management       iLO 4 plus one optional dedicated iLO NIC port

Other features   4U chassis height, hot-plug redundant fans, HPE Gen9 Flex Slot power supplies (AC and DC versions)

Compute node     HPE ProLiant XL450 Gen9

HPE Apollo 4500 Gen9 chassis—Top view

Figure 5-8 HPE Apollo 4500 Gen9 chassis—Top view

Figure 5-8 shows the architecture of the 4500 Gen9 chassis from the top view.
HPE Apollo 4500 Gen9 chassis—Rear view

Figure 5-9 HPE Apollo 4500 Gen9 chassis—Rear view

Figure 5-9 shows the 4500 chassis from the rear view. Notice the placement of the management
module, power supplies, and slots for PCI Express Gen3. The rear also provides FlexibleLOM connectivity.

HPE Apollo use cases


You will now examine the use cases for HPE Apollo 4000 servers, as well as review the workloads
for which they are designed.

Object storage use case

Figure 5-10 Object storage use case

Companies that are dealing with exploding amounts of unstructured data cannot take traditional
approaches to storing that data. Block storage is simply too expensive for the petabyte scale that
customers require to store their billions of objects. Typically, customers are archiving the data for
infrequent access, so they do not require the performance of block storage, optimized for heavy read-
writes and speedy I/O. On the other hand, tape (traditionally used for data archival) is too slow,
failing to provide timely access to the data when required.

Object storage provides the right balance for customers with petabytes of unstructured data, lowering
the total cost of ownership (TCO) per gigabyte. However, to deliver the balance of right performance
and right economics that the customer is looking for, the object storage solution must be built on the
right infrastructure (see Figure 5-10). Customers who attempt to build petabyte scale object storage
solutions on “white box” hardware often find that the solution fails to deliver the required
performance and reliability. IT staff must manage a complex set of components that might not work
well together, and the savings that the company gained through lower capital expenditures are lost
to higher operating expenditures.

HPE Apollo 4000 solutions solve these woes with tested hardware that provides the right performance
and management simplicity. As you learned earlier, customers have a choice of flexible hardware.
These solutions also provide simple HPE Secure Encryption, a key requirement for many enterprises.

Big data analytics and NoSQL use case

Figure 5-11 Big data analytics and NoSQL use case

A large majority of customers, 75%, agree that insights from big data reduce costs and increase
revenue (see Gartner 2013 CEO Study). Therefore, customers are highly motivated to find big data
solutions that help them to harness their data for day-to-day decision-making processes and for
competitive value (see Figure 5-11). For example, companies often have a great deal of data from
business transactions with their clients. They can mine this data for patterns that could tell them the
best times of day to contact clients, the most effective marketing campaigns, and so on.

Customers recognize and are eager to obtain these potential benefits. Analytics and business
intelligence are the top technology priorities for small and medium-sized business chief information
officers (SMB CIOs). (See Annual CIO Study, Gartner 2014.)

But companies are finding it increasingly difficult to extract value from data. According to Forrester,
66% of customers find doing so very or extremely challenging (see Forrester Study 2015). As much
as customers need the right big data analytics software, they need the right infrastructure bolstering
that software. Many companies’ issues with big data stem from a lack of servers that are optimized
for the correct workloads.

HPE Apollo 4000 solutions are purpose-built to support big data analytics, giving customers faster
results and the real insights that they need to make day-to-day decisions. In this way, they help
customers achieve a real return on investment (ROI) on their data.

HPE Apollo 4000 architecture


This section teaches you how to design HPE Apollo 4000 solutions for object storage and big data.

Traditional big data architecture

Figure 5-12 Traditional big data architecture

Traditionally, Hadoop has operated under the principle of bringing compute to storage for data
processing, as shown in Figure 5-12. That is, compute nodes are colocated on the data nodes in the
form of servers with direct attach storage (DAS). A YARN application can then assign a piece of a job
to a node that stores the data for that job locally.

HPE has designed a new, more flexible architecture that divides compute from storage. This
architecture is discussed in a later chapter. This chapter focuses on customers who want to take the
traditional approach.

The next sections guide you through the decision points for designing a big data analytics solution.

Selecting the HPE Apollo 4000 model for big data analytics
Figure 5-13 Selecting the HPE Apollo 4000 model for big data analytics

You learned about the HPE Apollo 4000 models earlier in the chapter. In Figure 5-13, you see an at-a-
glance view of the two models that are recommended for big data analytics: an HPE Apollo 4200 SFF
or the more powerful HPE Apollo 4530 System with up to three ProLiant XL450 servers. The Apollo
4200 is intended as a bridge to big data for customers who want a traditional 2U server and a smaller-
scale big data analytics solution.

As shown in Figure 5-13, the 4530 System provides a higher density of HDD storage capacity
(HDDs being the typical choice for this solution), as well as more processing power and memory,
enabling it to meet the demands of more complex analytics applications. (Note that capacity values are
accurate as of the publication of this ebook and provided for your convenience. However, HPE might
add new HDDs and SSDs; you should check QuickSpecs for the latest values.)

Scoping big data storage needs

Figure 5-14 Scoping big data storage needs

You and the customer should discuss the storage capacity that their big data solution requires. These
discussions will cover the current amount of data that the customer has, as well as the ingest rate—the
rate at which the data is expanding (see Figure 5-14). For example, perhaps the customer currently has
300 TB of data and is ingesting data at the rate of 400 GB per day. If the solution should continue to
meet the needs for two years, then 1.33 PB of storage capacity is required.

From the desired capacity, you can calculate the capacity that you must deliver with the HPE Apollo
4000 solution. First, the storage capacity that is usable by Hadoop distributed file system (HDFS) is
generally about 90% of the raw storage capacity. Next, take into consideration that HDFS will
replicate the data, typically three times.

You must also take into account that applications will need to store files temporarily as part of the
analysis job. For example, as compute nodes complete map tasks, they store the result files to the file
system so that they will be available for shuffling and analysis during the reduce phase. You should
leave about 25% of the space free for these files.

To calculate the required storage capacity, multiply the desired capacity by four (the three-way
replication factor combined with the 25% of space reserved for temporary files) and then divide by 0.9
to account for the HDFS overhead. For example, if the customer needs 1.33 PB capacity, the solution
should provide at least 5.91 PB.

However, HPE recommends that customers decrease the capacity requirements by using compression.
Discuss with the customer what forms of compression they intend to use. Often customers use Google
Snappy (an open source compression library) for data that is frequently accessed because, although
Snappy does not compress data to the smallest file possible, it optimizes compression and
decompression times. Gzip (another compression library) can compress data to a greater degree, but
takes longer to do so. The amount of space saved through compression depends on the type of files
that are being compressed. For example, Google cites Snappy as having compression ratios of about
1.5–1.7 for plain text and 2–4 for HTML. Files that are already compressed (such as JPEGs) cannot be
further compressed.

You will need to agree with the customer on the amount of space that will be made available by
compression, taking care to estimate for a worst-case scenario. For example, suppose that the
customer is using Snappy and has a mix of file types; most of the data is HTML, but some data are
compressed images. You agree on planning for a 1.5 compression factor. Instead of providing 5.91
PB, you will provide 3.94 PB.
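
The following short Python sketch walks through the same arithmetic, using the replication factor,
free-space headroom, HDFS efficiency, and compression factor discussed above. The function name and
structure are illustrative only; adjust the parameters to match the values that you agree on with the customer.

def raw_capacity_needed(usable_pb, replication=3, free_space=0.25,
                        hdfs_efficiency=0.90, compression=1.0):
    """Return the raw capacity (PB) that the data nodes must provide."""
    capacity = usable_pb * replication          # three copies of the data
    capacity /= (1 - free_space)                # keep about 25% free for temporary files
    capacity /= hdfs_efficiency                 # about 90% of raw space is usable by HDFS
    capacity /= compression                     # agreed worst-case compression factor
    return capacity

print(round(raw_capacity_needed(1.33), 2))                    # ~5.91 PB without compression
print(round(raw_capacity_needed(1.33, compression=1.5), 2))   # ~3.94 PB with a 1.5 compression factor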

You might also need to add a bit of space for supporting files. (The servers support two micro SSDs
or M.2 2280 SSDs, which you could use for the image.)

Note that you do not need to plan for RAID; instead, the disk drives act as Just a Bunch of Disks
(JBOD). HDFS handles replication and distribution of data. (On the other hand, you might want to put
some other data, such as the OS or HDFS metadata, on disks set up for RAID 10 to provide
redundancy for that data.)

Choosing the drive type


As you have learned, SSDs provide faster random I/O and sequential I/O than HDDs. However, the
performance differences are greatest in random I/O; HDDs can also provide good sequential I/O.
MapReduce applications and many analytic applications that operate on HDFS read files as a whole,
making sequential I/O most important. SSDs might provide somewhat better performance than HDDs,
but for many big data analytics purposes, the difference in performance is not worth the more
significant difference in cost per byte. Thus HDDs often provide the best choice for meeting the
customer's capacity requirements at the right TCO.

The HPE Apollo 4000 family does support SSDs, which you should choose for specialized
requirements or a subset of the data. For example, the customer might want to store the most
frequently accessed data on SSDs. For some shuffle-heavy applications, you can improve
performance by placing the intermediate result files for MapReduce jobs, which need to be written
and shuffled during the course of the analytic process, on SSDs. In addition, some big data
applications such as NoSQL databases or Interactive Hive require faster, random access to data, and
the more significant increases in performance might make the SSDs worthwhile to your customer.

Scoping compute and memory requirements for big data analytics


The HPE Apollo 4200 and ProLiant XL450 servers support flexible compute and memory
requirements so that you can match them to the workload.

As a base processor that works for most environments, you might choose the Intel Xeon-E5 2650v3,
which provides 10 cores that operate at 2.3 GHz. With two processors per server with 15 disks, this
choice provides over a 1:1 core to spindle ratio (the typical minimum requirement recommended by
Hadoop). If the customer has CPU-bound applications such as Impala, Spark, and Solr Search, you
can choose a processor with more cores or with more cores and a higher clock speed. (Table 5-2
shows examples of CPU-bound tasks.)

As a general starting guideline, big data analytic workloads require at least 4 GB memory per core.
This guideline would mean that the 10-core processor would require at least 40 GB memory, and a 2P
Apollo 4200 or XL450 server requires at least 80 GB. To maximize memory performance, you also
need to follow the recommendation of balancing DIMMs across all memory channels (four per
processor). Therefore, you should generally round up to using at least eight 16 GB DIMMs, for 128
GB total.

Again, you have the flexibility to provision more memory. For example, you could use 32 GB
DIMMs instead, increasing the capacity to 256 GB. You can scale as high as 1024 GB per server by
using 64 GB DIMMs in all 16 DIMM slots (both slots on every channel). Scale the memory up when the server must support
memory-bound applications such as Interactive Hive, Impala, Spark, or HBase or other NoSQL
databases.
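
As a rough illustration of this guideline, the following Python sketch (with assumed DIMM sizes and the
channel counts from the text) picks the smallest balanced DIMM configuration that meets the 4 GB-per-core
minimum. It is a planning aid only; always confirm supported DIMM options in the QuickSpecs.

import math

def memory_plan(cores_per_cpu, cpus=2, gb_per_core=4,
                channels_per_cpu=4, dimm_sizes=(16, 32, 64)):
    """Return (DIMM size, DIMM count, total GB) for a balanced configuration."""
    minimum_gb = cores_per_cpu * cpus * gb_per_core
    dimm_count = channels_per_cpu * cpus             # one DIMM per channel keeps channels balanced
    for size in dimm_sizes:                          # choose the smallest DIMM size that fits
        if size * dimm_count >= minimum_gb:
            return size, dimm_count, size * dimm_count
    return dimm_sizes[-1], dimm_count, dimm_sizes[-1] * dimm_count

# Two 10-core E5-2650 v3 processors: at least 80 GB, rounded up to 8 x 16 GB = 128 GB
print(memory_plan(10))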

As always, you should refer to HPE reference architectures for the customer application when
possible.

Table 5-2 Examples of CPU-bound tasks

CPU bound
Classification
Clustering
Complex data mining
Feature extraction
Natural language processing

Other solution components

Figure 5-15 Other solution components

Remember that a Hadoop solution requires a server for the management node, through which users
submit queries, and two head nodes, which provide active/standby Resource Managers and other
services. HPE recommends that you use three rack servers, such as HPE ProLiant DL360 servers, for
these roles (see Figure 5-15).

Typically, a cluster uses an isolated private network for communications between all worker nodes,
the management node, and head nodes. A connection to an extract, transform, load (ETL) network is
required for ingesting the data. Discuss how the customer plans to have the cluster ingest data. Some
server administrators dual-home all data nodes on the cluster network and the ETL to distribute the
ingesting work. In this case, you would need to ensure that you work with the network architect to
plan the HPE Apollo 4200 or XL450 server FlexibleLOM or I/O adapters to support these
requirements.

Other administrators use an edge node or two redundant edge nodes to handle ingesting and staging
all the data for the cluster. This design protects other worker nodes from the external network. If your
customer wants to use an edge node or nodes, remember to add an HPE DL360 server for each edge
node. This server must have disks with enough capacity to handle the ingest rate.

Guidelines for testing big data analytics


As always, you should develop a Proof of Concept (POC) that demonstrates to the customer the
performance and the efficiency of the HPE solutions and your design. The POC should match your
design as closely as possible.

Before you run the test, it is also important that you tune the nodes to better support the application.
Table 5-3 lists HPE reference architecture documents that explain the tuning guidelines. This tuning
will ensure the best results from the test. You should also recommend that the system integrator
completes the same steps for the final solution so that it operates most efficiently.
Table 5-3 HPE traditional big data reference architectures

Solution Reference architecture

Cloudera HPE Verified Reference Architecture for Cloudera Enterprise 5 on HPE Apollo 4530 with RHEL

Hortonworks Data Platform HPE Verified Reference Architecture for Hortonworks HDP 2.2 on HPE Apollo 4530 with RHEL

You are then ready to test. Benchmarking tools provide generic metrics—for example, the throughput
for reads and writes to the HDFS cluster. Table 5-4 lists some benchmarking tools for big data and
analytics.
Table 5-4 Example benchmarking tools

Solution              Benchmarking tool                 Description

NoSQL databases       Yahoo! Cloud Serving Benchmark    Tests throughput for read/write queries to the database

HDFS                  TestDFSIO                         Tests throughput and average I/O rate for reads and writes to HDFS

HDFS and MapReduce    TeraSort                          Tests the time for sorting data (one large job)
                      MRBench                           Tests the average time for completing many small jobs

Benchmarks might have a role to play in your testing, but your real goal is to determine how well
your customer's application runs.

Plan several tests using the customer applications with datasets of various sizes, including one that
meets or exceeds the customer's maximum needs. You should also choose tests that place various
demands on the solution, including worst-case scenario demands. For example, for an HBase test, you
might run read-heavy tests and write-heavy tests, as well as balanced read-write tests. You should also
test how the solution handles a high degree of random I/O requests.

After you run the test, determine whether the execution time and other metrics are acceptable or
whether you need to adjust the solution. The application that you are testing might provide you with
valuable metrics for this purpose. For example, Hortonworks Data Platform (HDP) uses Ambari to
collect and expose metrics; the Cloudera Manager also tracks metrics. Table 5-5 gives examples of
some metrics that you might examine as you test. You can find a complete list of Hadoop metrics at
https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/Metrics.html.
Table 5-5 Example metrics

HBase metrics:
• regionserver.Server.blockCacheEvictedCount: Number of blocks that had to be evicted from the block cache due
to heap size. If this stays at 0, all of your data fits completely into the HBase blockcache (stored in
node memory), which is the most desirable case.
• regionserver.Server.blockCacheExpressHitPercent: The percentage of time that requests with the cache turned on
hit the cache. Values under 100 mean that the hot data being processed cannot fit entirely into the
blockcache. If the number is too far below 100, scale up the number of compute nodes.
• regionserver.Server.storeFileSize: Aggregate size of the store files on disk. Make sure this value is similar on
all region servers in order to properly balance the HBase load.
• regionserver.Server.blockCacheFreeSize: Number of bytes that are free in the blockcache. This value indicates
how much of the cache is used. It is a good indicator of whether your data has been “warmed” by moving it
into cache, so a low value is good.
• regionserver.Server.readRequestCount: The number of read requests received. You can use this metric to see how
many requests the solution is handling.
• regionserver.Server.flushQueueLength: Current depth of the memstore flush queue. This metric should stay about
the same over time. If it increases, the node is falling behind with clearing memstores out to HDFS.

Metrics for any YARN application:
• QueueMetrics PendingMB and QueueMetrics PendingvCores: The current memory or CPU resource requests that are
not yet scheduled. A high number might indicate that you need to scale out the number of nodes so that
they provide more memory or cores.
• QueueMetrics running_0, running_60, running_300, and running_1440: The current number of applications whose
elapsed time is less than 60 minutes, between 60 and 300 minutes, between 300 and 1440 minutes, and more
than 1440 minutes. You can use these metrics to determine whether jobs are completing in the customer’s
desired execution time.
• AppsSubmitted, AppsRunning, AppsPending, and AppsCompleted: The number of applications that have been
submitted to the resource manager for scheduling, that are running, that are waiting to be scheduled, and
that are completed. You can use these metrics to determine whether the solution can handle the required
number of jobs. For example, you can see how many applications are running when the wait for pending
applications begins to reach an unacceptable time.

Object storage architecture


Figure 5-16 Object storage architecture

You learned about object storage in a previous chapter. Figure 5-16 reviews the basic design for an
object storage solution. A cluster of object storage servers store objects, which they distribute and
replicate across their disks based on rules dictated by the application. When clients need to read or
write to an object, they send a request to a front-end server, which might be called a proxy server,
connector, gateway server, or something else depending on the application. The server tells the client
where the client can access an object by using a map that is often called a ring. Often the architecture
calls for two such servers for redundancy. After a client knows the location of an object, it obtains the
object directly from the object storage server.

You can generally use traditional rack servers such as HPE ProLiant DL360p servers for the proxy
server role. This chapter focuses on the design for the HPE Apollo 4200 or 4510 Systems, which play
the role of object storage servers.

Be aware that the architecture shown in Figure 5-16 is highly simplified. Each object storage
application such as OpenStack Swift for cloud deployments, Ceph, Scality RING, and Cleversafe has
its own architecture and terminology. For example, OpenStack Swift also defines account and
container servers, which help to ensure that data for different tenants is isolated and also that data is
stored according to customizable policies. Nonetheless, the applications generally follow a model
such as the one shown here. You can refer to specific reference architectures for more precise
information about that solution’s architecture, as shown in Table 5-6.
Table 5-6 HPE object storage reference architectures

Solution Reference architecture

Ceph Ceph on HPE Apollo 4200/4500 System Servers

Scality Scality RING on HPE Apollo 4200

Cleversafe Cleversafe on HPE Apollo 4500

Selecting the HPE Apollo model for object storage


Figure 5-17 Selecting the HPE Apollo model for object storage

The HPE Apollo 4510, which supports one ProLiant XL450 server, is purpose-built for object
storage. As you see in Figure 5-17, this system is optimized for storage density as opposed to
processor or memory density. (Note that values are accurate as of the publication of this ebook.
However, HPE might add new HDDs and SSDs; you should check QuickSpecs for the latest values.)

The Apollo 4200, on the other hand, is designed for more general-purpose storage solutions, so it
balances processors, memory, and storage. The HPE Apollo 4200 System often makes a good choice
for customers who need entry-level object storage solutions around the 1 PB scale. Even for larger
solutions, some customers might prefer the Apollo 4200, which provides less storage per node;
customers might not want the solution to have to recover up to 544 TB of data if a server goes down.
(The solution recovers data by copying it from other replicated copies.) On the other hand, the
Apollo 4510 provides the best density and optimization for object storage.

Scoping object storage capacity

Figure 5-18 Scoping object storage capacity


To scope the capacity that an object storage solution requires, follow a process similar to the one
used for scoping a big data storage solution, as shown in Figure 5-18.

First, discuss the needs with the customer, making sure that the customer informs you about how
quickly data is accumulating and how long the customer expects this solution to accommodate the
new data without additional scaling. Then multiply this data by three to account for the standard
replication factor. Make sure, though, to discuss the replication factor with the customer because
some customers might choose to use a different factor.

Also take into account that your systems will need some disk space for other purposes, such as
storing an image and various supporting files. For example, OpenStack Swift requires all object
storage servers to store the ring that maps data to its location. Other object storage solutions have
similar requirements. You should plan to reserve 6%–10% of the disk space for this purpose. (The
servers provide two M.2 2280 SSDs, which you could use for the image.) When you have taken all of
these factors into account, the ratio of required capacity to stored data will be just over 3:1. You can
look to HPE reference architectures for requirements tailored to the application. For example, Ceph
recommends a ratio of about 3.2:1.

Finally, discuss with customers whether they want you to take compression into account when you
propose a solution. As you learned earlier, compression might reduce the size of stored data to a
lesser or a higher degree, depending on the type of compression and the type of files.
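
The Python sketch below illustrates this scoping arithmetic with a hypothetical 1.8 PB data set, three-way
replication, and an assumed 8% system reserve (within the 6%-10% range above). The resulting ratio of
roughly 3.3:1 is in line with the "just over 3:1" and 3.2:1 guidance; use the application's reference
architecture for the definitive figure.

def object_store_raw_capacity(data_pb, replication=3, system_reserve=0.08,
                              compression=1.0):
    """Return the raw capacity (PB) needed to hold the stored objects."""
    raw = data_pb * replication / compression   # replicated (and optionally compressed) data
    raw /= (1 - system_reserve)                 # reserve ~6-10% for rings, images, and other files
    return raw

data = 1.8                                      # hypothetical: 1.8 PB of objects to store
print(round(object_store_raw_capacity(data), 2))          # ~5.87 PB of raw capacity
print(round(object_store_raw_capacity(data) / data, 2))   # effective ratio, about 3.26:1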

In many cases, you will not need to plan for redundant array of independent disks (RAID) because the
object storage application handles distributing and replicating data. However, you should look at the
guidelines for the customer's particular application. In some cases, RAID 0 might be recommended.

Planning servers, drive types, and drive number


You will also need to choose a type of drive for the solution. Midline HDDs, using either SAS or
SATA as the customer prefers, provide the best option for most object storage. Some cases, though,
do exist in which greater performance and reliability of SSDs might pay off. For example:
• Account and container data for OpenStack Swift when accounts contain millions of containers or
containers have listings for millions of objects
• Ceph journals

Clients can perform random read-writes to the journal, which are then synchronized to the object,
enabling better performance and consistency (in case multiple clients are accessing a file at once).
Journals can be colocated with the objects, or they can be located on separate SSDs. Using the SSD
option enables faster read-writes, but does add to the cost of the solution and might make design and
failover more complicated. Refer to the Ceph on HPE Apollo 4200/4500 System Servers reference
architecture for more details.

You can use the required storage capacity and the maximum storage capacity for the selected HPE
Apollo model to begin planning the number of systems for the solution. For example, you have
determined that you need to provide 5.8 PB of data, and you have selected HPE Apollo 4510 Systems.
Each Apollo 4510 System provides up to 544 TB, so 11 systems are required.
This estimate provides a starting point. You sometimes need to plan more servers, each supporting
less than the maximum capacity (a short sketch of the rounding logic follows this list):
• Discuss how the customer intends to use the object storage solution. Will many clients need fast
responses to their requests? In that case, you might choose to use lower capacity drives. Each drive
will have fewer demands on it, so the system can provide better disk I/O.
• Also examine the object storage solution’s recommendations for the minimum number of cluster
members. These recommendations might affect how well the application is able to protect data
from loss. For example, most clusters should have at least three physical systems so that replicated
data is distributed across three different systems.
• You should also round up requirements so that every object server has an identical disk
configuration with the same number and type of disk drives. For example, to cover the 5.8 PB
requirement above, plan 11 fully loaded HPE Apollo 4510 Systems, which actually provide about
5.98 PB. This best practice helps to ensure consistency and good performance.
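
As mentioned above, the rounding logic can be expressed as a simple ceiling calculation. This Python
sketch uses the 544 TB per-system figure from the example and an assumed three-system minimum cluster
size; it is illustrative only, so confirm per-system capacity in the current QuickSpecs.

import math

def systems_required(required_tb, per_system_tb=544, minimum_systems=3):
    """Return the number of identically configured object storage servers."""
    count = math.ceil(required_tb / per_system_tb)   # round up to whole, fully loaded systems
    return max(count, minimum_systems)               # honor the application's minimum cluster size

count = systems_required(5800)                       # the 5.8 PB requirement from the example
print(count, "systems providing", count * 544, "TB") # 11 systems providing 5984 TB (~5.98 PB)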

Planning compute, memory, and fabric for object storage


Compute

You should pay careful attention to the processors that you recommend, particularly for HPE Apollo
4510 Systems, which support a higher density of disks per processor. Each drive represents a device
to which clients can make requests. The more cores and the higher clock speed that a processor
provides, the better the processor is able to handle these requests. Discuss how the customer intends to
use the unstructured data. Is the solution primarily for data archival—“cold” data? Or will clients
interact with the data to a fair degree—“hot” data? In the latter case, plan for more powerful
processors with more cores.

Memory

The general industry guideline for object storage memory is about 0.5 GB RAM per 1 TB storage. If
an HPE Apollo 4510 System is at full 544 TB capacity, it will require about 272 GB, or perhaps 256
GB for a balanced configuration. Discuss whether the solution will include certain more frequently
accessed objects. If so, adding more memory will provide more room for servers to store these
objects in a cache and could improve performance.

Fabric

HPE Apollo 4200 and 4510 Systems support FlexibleLOM and PCIe expansion cards, which include 1
GbE and 10 GbE options. To choose between 1 GbE and 10 GbE, consider the need for a speedy
recovery in case of a failure. Replicating 1 TB of data across 1 GbE links takes about three hours.
The same process can take just 20 minutes with 10 GbE connectivity. Also consider how “hot” or
“cold” the solution is. Hot solutions require greater network bandwidth.
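
The following Python sketch gives a rough, best-case estimate of the re-replication times cited above
(the 70% link efficiency figure is an assumption for illustration, not an HPE specification) and applies
the 0.5 GB-per-TB memory guideline from the previous section.

def transfer_hours(data_tb, link_gbps, efficiency=0.7):
    """Estimate hours to copy data_tb terabytes over a link of link_gbps Gbit/s."""
    bits = data_tb * 8e12                        # TB to bits (decimal units)
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 3600

def memory_gb(storage_tb, gb_per_tb=0.5):
    """Memory guideline: about 0.5 GB RAM per TB of object storage."""
    return storage_tb * gb_per_tb

print(round(transfer_hours(1, 1), 1))    # ~3.2 hours to re-replicate 1 TB over 1 GbE
print(round(transfer_hours(1, 10), 1))   # ~0.3 hours (about 20 minutes) over 10 GbE
print(memory_gb(544))                    # ~272 GB for a fully loaded Apollo 4510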

Guidelines for testing object storage solutions


The industry does not have widely accepted benchmarks for object storage performance. However,
you can still create a POC and demonstrate to the customer how well the HPE Apollo 4000 solution
will perform. Remember to include all components, such as HPE ProLiant DL360p servers to act as
proxy or gateway servers.

Test the solution with a variety of request scenarios, including


• All GET requests (reads), all PUT requests (writes), and then a mix of GETs and PUTs based on the
customer expectations
• Requests (GETs, PUTs, and then mix) for files of different sizes

For each test, continue to scale up the number of requests while monitoring the performance. You
want to see that throughput scales more or less linearly as requests are added. Also monitor latency to
determine when it begins to rise over an acceptable level for the client (usually, three or so seconds).
Make sure that the solution can handle the required number of requests with acceptable latency. If you
detect issues, you can use a solution such as HPE Insight CMU or OS tools to check resource
utilization and determine what is acting as the bottleneck.
Chapter 5—Activity 1
You will now examine a customer scenario and plan an HPE Apollo 4000 solution to meet the
customer's needs. In this plan, you will
• Design a solution to host big data analytics
• Architect the solution to meet the customer's needs, including
– Providing enough storage capacity
– Meeting the compute needs

Scenario
A retailer operates a chain of grocery stores throughout a region. The company has a great deal of
data about inventory, customers, purchases, and so on. The company is just venturing into big data
solutions and plans to deploy Cloudera Hadoop. The customer wants a more scalable and reliable
way to store data. The customer also wants to start analyzing that data to make more informed
decisions. For example, the customer hopes to learn more about the most loyal customers and the
highest-spending customers so that marketing can make better decisions about how to brand the
company.

The retailer has a relatively small data center with traditional rack servers. The CIO has seen projects
fail before due to outdated infrastructure. She wants to ensure that the new big data solution is a
success and is pushing the purchase of servers specifically designed to meet the needs of such a
solution.

Workload requirements
You have discussed the workload requirements with the customer and discovered that
• The customer currently has 1.4 PB of data and an ingest rate of 1 TB a day. The customer wants the
storage solution to provide the necessary capacity for one year before needing to scale out.
• HDFS will use the typical three times replication rate.
• The customer plans to use MapReduce2 applications to analyze data on a weekly basis. Currently,
the customer has just a few standard queries that it will run each week, and the queries can take
hours to complete.

Select HPE products


While discussing needs with key decision makers, you have determined that this customer has a
strong bias toward the Hadoop traditional architecture. You decide to propose an HPE server solution
to support a traditional big data architecture.
1. What are two HPE server solutions that might fit this customer's needs?
2. What should you discuss with the customer to help you determine which of these servers you
should propose?
You can check your answers by referring to Appendix B: Answers to Activities.

Scope the storage requirements


Record your answers to these questions.
3. What should you discuss with the customer when planning how you can underprovision storage
based on the fact that the data will be compressed?
4. Assume that you and the customer have agreed that you can plan on a compression factor of 1.5
(in other words, a 1.5 MB file will take up 1 MB). How much storage capacity should you plan for
the data nodes?

Remember to take into consideration the current data, the ingest rate, the replication factor, and space
for result files. (Refer to the “Workload requirements” section above.)
5. Assume that you have discussed the factors that you listed in the first part of this activity and that
you have decided to propose HPE Apollo 4530 Systems. Which ProLiant XL server do you
propose for this system, and how many can you propose per system?
6. How many HPE Apollo 4530 Systems will you propose? More than one answer could be valid,
but think about how you would justify your answer and what you would discuss with the customer
to help you make your choice.
7. You learned about baseline processors and memory for a solution such as this, as well as ones that
meet enhanced needs. Do you think that this customer has baseline or enhanced needs?
8. You will propose two processors per XL server. Based on your answer to the previous question,
which of these processors provides the best choice? (Refer to the information presented in the
chapter to remind yourself of the baseline recommendations. Note that the HPE Apollo 4530
System supports more types of processors; you can find a complete list in the QuickSpecs.)
a. Intel Xeon E5-2698v3 with 2.3GHz frequency and 16 cores
b. Intel Xeon E5-2690v3 with 2.6GHz frequency and 12 cores
c. Intel Xeon E5-2650v3 with 2.3GHz frequency and 10 cores
d. Intel Xeon E5-2603v3 with 1.6GHz frequency and 6 cores

9. Based on your answer to question 4, how much memory capacity will you plan for each XL
server in the Apollo System? The server has 16 DIMM slots (four memory channels with two
slots each on each processor). It supports 4/8/16/32 GB RDIMMs and 8/16/32/64 GB LRDIMMs.
Which DIMMs will you propose?

Although a complete solution has more aspects that you must plan, you have planned the fundamental
components for the HPE Apollo 4530 solution. You will plan a big data solution in more depth in
Chapter 7 when you learn about the HPE Big Data Reference Architecture.

You can check your answers by referring to Appendix B: Answers to Activities.

Summary
This chapter has introduced you to the HPE Apollo 4000 family and the use cases for which it is
optimized. You have learned how to design best practice solutions to meet customer requirements for
big data analytics based on the traditional Hadoop architecture and for object storage solutions.

Learning check
Review what you have learned by answering these questions. Then check your answers in Appendix
A: Answers to Learning Checks.
1. Which HPE Apollo server is purpose-built for object storage?
2. Which type of disk drive meets the typical requirements for HDFS?

For answers, See Chapter 5 in Appendix A.


Chapter 6 HPE Moonshot Solutions

EXAM OBJECTIVES
• Briefly describe the HPE Moonshot portfolio
• Position HPE Moonshot solutions for the right use cases
• Explain options and best practices for designing the networking component of an HPE Moonshot
solution

Assumed knowledge
Before reading this chapter, you should have a basic understanding of the following:
• Processors, DDR3 and DDR4 memory, hard disk drives (HDDs), solid-state drives (SSDs), and
RAID levels for storage volumes
• HPE ProLiant rack and blade servers and options for them such as HPE Smart Array Controllers
• HPE BladeSystems, including interconnect modules and Virtual Connect (VC) modules
• Server management and maintenance, including experience with Integrated Lights-Out (iLO),
Intelligent Provisioning, UEFI, HPE Insight Remote Support, HPE Insight Online, HPE Smart
Update Manager (SUM), and HPE Insight Control server provisioning (ICsp)
• HPE OneView capabilities
Chapter topics
This chapter introduces you to the HPE Moonshot portfolio, solutions, and the customer use cases that
they were specifically designed to address. It then covers some general information that you need to
know as you design any HPE Moonshot solution, including how to architect the networking
components as well as how to manage the solution. The next chapter will give you specific guidance
in designing HPE Moonshot solutions for particular use cases and workloads.

HPE Moonshot overview


This topic introduces you to the HPE Moonshot product and its components.

HPE Moonshot System

Figure 6-1 HPE Moonshot System

The HPE Moonshot System, shown in Figure 6-1, is a huge leap forward in infrastructure design. It
delivers breakthrough efficiency and scale by aligning just the right amount of compute, memory,
and storage to get the work done. The idea is very simple—replace general-purpose processors with
more energy-efficient Systems-on-Chip (SoCs) containing integrated accelerators tailored for
specific workloads.

The Moonshot Chassis incorporates everything that is a common resource in a traditional server—
power, cooling, management, fabric, switches, and network uplinks are all shared across 45 hot-
pluggable server cartridges in a dense form factor. This enables massive scale-out without a
corresponding increase in complexity and management overhead. It gives customers the right
compute for their workloads at the right economics so that they can get the most out of their
infrastructure. With HPE Moonshot, customers can
• Optimize application performance—Avoid paying for IT they are not fully utilizing by using the
best solution for their workload
• Realize breakthrough economics—Make better use of their data center space and power while
reducing complexity
• Accelerate business innovation—Respond more quickly to business needs and stay on the leading
edge of technology
HPE Moonshot components

Figure 6-2 HPE Moonshot components

HPE Moonshot converges compute, storage, and networking within a single chassis. The HPE
Moonshot 1500 Chassis houses 45 server cartridges, each of which provides one processor or four
processors. Each processor is called a cartridge node and is a server with its own operating system
(OS). A 1500 chassis fully populated with four-processor (4P) cartridges has a maximum density of 180
servers in one chassis (see Figure 6-2).

The chassis provides a dense fabric that interconnects the cartridges to each other; it also has two
switch modules and two uplink modules to provide cartridges with external connectivity.

The chassis houses and manages all power and cooling elements for the cartridges, creating an
efficient and green solution.

A chassis management (CM) module provides a single point of access to the essential chassis
functions and server management functions. It consolidates iLO functions for the chassis and all
installed components. This single connection vastly simplifies the management network required to
manage servers at scale. In this module resides the logic that controls the chassis functions, such as
power distribution and cooling. The iLO CM provides a command line interface (CLI) and graphical
user interface (GUI) management interface, as well as a representational state transfer (REST)
interface and Intelligent Platform Management Interface (IPMI) for scripting. You will learn more
about these components throughout this chapter.

HPE Moonshot 1500 Chassis front and rear view


Figure 6-3 HPE Moonshot 1500 Chassis front and rear view

Front view
The HPE Moonshot 1500 Chassis (shown in Figure 6-3) is the essential foundation to unlock total
cost of ownership (TCO) savings. The efficient design enables 45 hot-plug server/storage cartridges
and 2 low-latency network switch cartridges in a 4.3U chassis (5U bezel option available). A 42U rack
can easily hold 9 HPE Moonshot 1500 Chassis. The hot-plug cartridges are architected for efficiency,
flexibility, and density. Because a chassis can hold 45 servers (1p cartridges) or 180 servers (4p
cartridges), a standard 42U rack can hold up to 1620 servers (405 servers in a rack using the single-
node server cartridges).

The switch modules are centrally located to enable high bandwidth and low-latency switching. They
connect to uplink modules (visible from the rear view), as well as to the cartridges. You will look at
the internal switching fabric in more detail later.

At a glance, the front panel display conveys the overall health status of the HPE Moonshot System as
well as the individual health of each cartridge in the chassis. UIDs are located on the front panel
display and on each cartridge.

Rear view
The HPE Moonshot 1500 Chassis builds upon the HPE ProLiant SL family by sharing power supplies
and cooling for energy efficiency and cost savings. And like the HPE ProLiant BL family, Moonshot
adds the benefits of embedded network switching to enable cable consolidation and right-sized
networking capabilities for optimizing switch port costs.

HPE Moonshot 1500 Chassis supports up to 45 or 180 servers, all sharing the five dual-rotor fan
modules and two to four common-slot power supplies. The dual-rotor design provides a total of 10
fans capable of up to ~4500 W of cooling. Only two power supplies are required, but up to four power
supplies can be added to achieve redundancy. Currently, HPE supports common-slot power supplies
for the HPE Moonshot System, but check for updates in the QuickSpecs.

The HPE Moonshot network uplink modules are paired and matched to corresponding switch
cartridges in the chassis. The switches/uplink modules are stackable and provide the standard rear
cabling of rack-mount servers, but with the cable consolidation of a top-of-rack (ToR) switch.

From the rear, you can also access the HPE Moonshot 1500 iLO CM module, which as you learned,
provides a single point of access to the essential chassis functions and server management functions.

HPE Moonshot application-focused silicon

Figure 6-4 HPE Moonshot application-focused silicon

HPE Moonshot cartridges consist of SoCs. An SoC compresses all the elements connected in a
traditional server motherboard—processor, memory, video card, management interface, adapters,
and storage controller—on a single chip (as shown in Figure 6-4). HPE Moonshot supports 1P
cartridges, which have one SoC, and 4P cartridges, which have four SoCs. It is the SoC’s small form
factor that allows the HPE Moonshot chassis to host such a high density of servers, saving the
customer power, space, and cost. HPE has also designed each SoC to focus on a specific type of
workload, such as video processing.

The SoC design includes space for future features, allowing Independent Hardware Vendors (IHVs)
to distinguish their offerings by meeting new needs. When customers need to handle such a future use
case, they can simply purchase a new cartridge tailor-made for it.

ProLiant cartridge options


Figure 6-5 ProLiant cartridge options

Figure 6-5 and Table 6-1 provide at-a-glance information about the cartridges available at the time of
the publication of this ebook. You will learn more about selecting cartridges in the next chapter.

Table 6-1 HPE Moonshot cartridges


Highly flexible fabrics

Figure 6-6 Highly flexible fabrics

HPE Moonshot is designed with four communication fabrics to reduce complexity and enable
flexibility. (See a summary of the benefits in Figure 6-6.) The fabrics are connected via a passive
baseboard/backplane for low cost, high reliability, and future expansion.
• The Ethernet fabric is made possible by a standard Moonshot switch cartridge and optional, second
switch cartridge. The stackable, low-latency switches are on separate fabrics for isolation and
redundancy.
• The storage fabric enables optimization of CPU cores to storage from minimum storage
applications to storage-rich applications and from multiple servers sharing a single drive to
multiple drives dedicated to a single server.
• The Moonshot CM module manages the infrastructure power and cooling, but its biggest benefit is
a single point of management of the 45 cartridges via a dedicated management fabric with point-to-
point connections to each module in the chassis.
• The Moonshot architecture was designed with an integrated 2D toroid cluster fabric that allows
point-to-point connectivity from cartridge to cartridge. The traces are simply copper traces, and the
protocol and functionality can be driven by the requirements of applications and cartridges
installed.

There are 29 lanes for the four fabrics: 1 for management, 8 for Ethernet (4+4, cartridge to switch for
external connectivity), 4 for storage (2+2), and 16 for the 2D fabric (4 x 4, cartridge-to-cartridge for
local connectivity).

You will learn about these fabrics in the networking section of this chapter (with the exception of the
storage fabric because you cannot sell storage cartridges at the time of the publication of this ebook).

Storage options

Figure 6-7 Storage options

HPE Moonshot cartridges provide some local storage despite their small form factor. The m300
cartridge gives customers a choice of HDDs with capacities up to 500 GB; it also provides M.2 SATA
SSDs. Other cartridges support M.2 SATA SSDs of various capacities; refer to Table 6-1 for details.

Some applications require greater amounts of storage than the cartridges can store locally. The HPE
Moonshot cartridge adapters support iSCSI initiators, allowing the cartridges to connect to HPE
3PAR StoreServ Systems. The StoreServ Systems deliver highly scalable, high performance, and
easily managed external block storage to the HPE Moonshot System (see Figure 6-7). The StoreServ
Systems work with m300, m350, m700, m710, and m710p cartridges.

Alternatively, you can propose HPE density-optimized storage servers such as the HPE Apollo 4000
family (or ProLiant SL servers) to provide external storage. The Apollo 4000 servers can also
provide iSCSI block storage through HPE StoreVirtual (supported for m700, m710, and m710p
cartridges), or they can support file or object storage, depending on the customer needs.

HPE Moonshot use cases


In this topic, you will examine different HPE Moonshot use cases based on application requirements.

Why HPE Moonshot

Figure 6-8 Why HPE Moonshot

As IT services have penetrated further and further into day-to-day business operations, the
applications and workloads hosted in a modern data center have proliferated—not only in number but
also in variety. HPE has recognized that customers can no longer rely on all-purpose servers to meet
the need of every application. At the same time, provisioning separate infrastructures for each
application is inefficient, expensive, time-consuming, and complicated.

HPE Moonshot solutions reconcile these conflicting needs. They offer a variety of options for
cartridges, each designed to deliver the best performance for a specific type of workload. But the
cartridges also share storage and networking within the Moonshot chassis, adding up to a dense,
efficient solution that is simple to deploy and manage, as illustrated in Figure 6-8.

HPE Moonshot use cases


Figure 6-9 HPE Moonshot use cases

You can design HPE Moonshot solutions that are specialized for the four types of applications shown
in Figure 6-9. A brief overview of the use cases is provided below; the next chapter delves into more
detail on each.

Big data and analytics


Data is increasing at an exponential rate. According to the IDC Digital Universe Study: Big Data,
Bigger Digital Shadows, and Biggest Growth in the Far East, by 2020 every person will be producing
1.7 megabytes of information every second. Companies need solutions to ingest data, store data, and
then explore and analyze the data for business value. HPE Moonshot offers cartridges customized for
big data analytics.

Mobile workspace
Always connected Millennials make up an increasingly large segment of the workplace—and it is not
just Millennials who work more productively when they can receive network access anywhere at any
time and from any device. Companies are discovering that they can gain more by embracing a mobile
and Bring Your Own Device (BYOD) environment than by fighting it. Hosted desktop and application
streaming solutions allow employees to run enterprise-class applications, hosted in the data center, on
the devices of their choice. But these solutions can place high demands on the servers that host them,
particularly as the number of users scales. HPE Moonshot solutions provide the required
performance and density.

Media processing
Various HPE Moonshot cartridges are also designed to deliver to users the rich, high-quality media
content that they demand. Use cases include high-definition video processing and streaming, gaming,
and general content delivery.

Web infrastructure
Any company that hosts a website knows that revenue increases with user traffic, but keeping pace
with demand can be difficult. With its ability to host a mix and match set of cartridges, HPE Moonshot
can host an entire, multi-tiered Web infrastructure in a single chassis, creating an efficient, integrated,
and highly scalable multi-tiered solution.
HPE Moonshot Partner Program

Figure 6-10 HPE Moonshot Partner Program

HPE partners with IHVs such as Intel and AMD to give customers a choice of processors that meet
their computing needs.

HPE also partners with Independent Software Vendors (ISVs), such as those shown in Figure 6-10, to
give customers the peace of mind that comes from tested solutions. You should keep an eye on the
current list of Moonshot ISV partners so that you can easily identify opportunities for proposing HPE
Moonshot solutions. For example, you might encounter a customer who is seeking a server refresh to
improve the scalability of their Citrix XenApp application sharing solution. You would know that
HPE and Citrix tests have demonstrated stellar performance for HPE Moonshot 1500 Chassis with
m710 cartridges for this scenario. You could refer to a technical white paper for help in architecting a
similar solution for your customer, and you could share the HPE and Citrix test results as a
compelling proof point for the HPE value in your proposal.

Visit the HPE website for a current list of partners.

HPE Moonshot networking


Before you dive into the details of designing HPE Moonshot solutions for hosting specific
applications, you need to learn a bit more about the solution architecture and general design
guidelines. This chapter focuses on these general aspects, beginning with architecting the networking
component of the solution. The next chapter covers architecting solutions for specific workloads.

2D torus—Cartridge-to-cartridge connections
Figure 6-11 2D torus—Cartridge-to-cartridge connections

Certain cartridges can connect together in a 2D torus. These connections are provided by the chassis
fabric (four 10 Gbps lanes per connection) and do not require the use of an internal switch.

A 2D torus consists of nodes that connect in rings in two directions, as you see in Figure 6-11. The
Moonshot fabric supports a 3x15 2D torus, in which fifteen cartridges connect together in a ring.
Each of those cartridges also connects to two cartridges in a ring in another dimension. Each of those
other two cartridges is part of another fifteen-cartridge ring. Thus, each cartridge connects to four
other cartridges, and all 45 cartridges have multiple high bandwidth paths to all other cartridges with
eight or fewer hops.

Only cartridges that are designed for use cases that require high-speed server-to-server
communication, such as HPC, support the 2D torus connections (currently the m800 cartridges).
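To see where the eight-hop figure comes from, you can model the 3x15 torus and compute the worst-case hop count directly. The Python sketch below is a simple model of the ring-of-rings wiring described above; it is an illustration, not HPE fabric-management code.

# Model a 3x15 2D torus and confirm the maximum cartridge-to-cartridge hop count.
ROWS, COLS = 3, 15    # three rings of fifteen cartridges each

def torus_hops(a, b):
    """Minimum hops between cartridges a and b, each given as (row, col)."""
    dr = abs(a[0] - b[0]); dc = abs(a[1] - b[1])
    return min(dr, ROWS - dr) + min(dc, COLS - dc)    # wrap around both rings

nodes = [(r, c) for r in range(ROWS) for c in range(COLS)]    # 45 cartridges
worst = max(torus_hops(a, b) for a in nodes for b in nodes)
print(f"Maximum hops between any two of the {len(nodes)} cartridges: {worst}")
# Prints 8: one wrap-around hop in the short ring plus up to seven in the long ring.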

Cartridge-to-switch connections

Figure 6-12 Cartridge-to-switch connections

To connect cartridges to the data center network, the HPE Moonshot chassis provides slots for two
switches (see Figure 6-12). Each installed switch module has four 10.3 Gbps lanes to each cartridge.
These lanes connect to adapters on the cartridge. Different cartridges have different adapters, as you
will see in the next section.
The switch module also connects to an uplink module on 16x 10.3 Gbps lanes. The uplink module
provides the switch with its uplink ports, used to connect to the data center network. A switch module
and uplink module are always installed as a pair. By disaggregating the uplinks from the internal
switch, HPE Moonshot gives customers greater flexibility for deploying their choice of external
interconnects and future-proofing their investment.

Only one switch module and uplink module pair is required. However, if only one switch module is
installed, each cartridge can only use the half of its ports that connect to that module. The second
switch and uplink module pair provides redundancy or connectivity to a second network. Multiple
modules can be stacked within or across multiple chassis, reducing the cost of ToR switches and
providing failover in the event of a switch or an uplink failure.

Table 6-2 lists the features of each switch and uplink module.
Table 6-2 HPE Moonshot Switch and Uplink Modules

Switch modules

• HPE Moonshot-45G Switch Module: Together with the HPE Moonshot-6SFP Uplink Module, provides
1 GbE network connections to cartridges within the HPE Moonshot 1500 chassis. Intended
cartridges: m300.
• HPE Moonshot-45XGc Switch Module: Together with the HPE Moonshot-4QSFP+ Uplink Module,
provides 10 GbE network connections to cartridges within the HPE Moonshot 1500 chassis and 40
Gb/10 Gb connectivity external to the chassis. Intended cartridges: m400, m710, m710p.
• HPE Moonshot-180G Switch Module: Provides 1 GbE network connections to up to 180 nodes in the
HPE Moonshot 1500 chassis. Intended cartridges: m350, m700, m800.

Uplink modules

• HPE Moonshot-6SFP Uplink Module: Use up to two modules, each with six 10 GbE SFP+ ports. Each
uplink module delivers 60 GbE of aggregate bandwidth to connect the HPE Moonshot System to an
external network. Supported switches: 45G or 45Gc. Intended cartridges: m300.
• HPE Moonshot-16SFP+ Uplink Module: Use up to two modules, each with 16 10 GbE SFP+ ports.
Each uplink module delivers 160 GbE of aggregate bandwidth to connect the HPE Moonshot System
to an external network. Supported switches: 180G or 45XGc. Intended cartridges: m350, m400,
m700, m710, m710p, m800.
• HPE Moonshot-4QSFP+ Uplink Module: Use up to two modules, each with four 40 GbE QSFP+
ports. Each uplink module delivers 160 GbE of aggregate bandwidth to connect the HPE Moonshot
System to an external network. Supported switches: 180G or 45XGc. Intended cartridges: m350,
m400, m700, m710, m710p, m800.
Cartridge connectivity options

Figure 6-13 Cartridge connectivity options

The cartridge adapters that connect to the switch modules depend on the type of cartridge (see Figure
6-13). HPE Moonshot supports two types of 1P cartridges: ones with two 1 GbE ports and ones with
two 10 GbE ports. All 4P cartridges have two 10 GbE ports per node for eight total. For all
cartridges, one port on each node connects to one switch module and the other port connects to the
second module (if installed).

Selecting switch modules

Figure 6-14 Selecting switch modules

You will now learn guidelines for selecting the correct switch module based on the cartridge type
(illustrated in Figure 6-14). Begin by considering situations in which all cartridges installed in the
HPE Moonshot chassis are the same type. (Here, type refers to 1P G, 1P 10G, or 4P, not the precise
cartridge model.)

If you are using 4P cartridges, you must select HPE Moonshot 180G Switch Modules, which provide
enough ports for each of the four processors on the 45 cartridges.

If the chassis includes all 1P G cartridges, you have two options for switches that provide 45 1G ports.
Choose the HPE Moonshot 45G Switch Module for basic switching features such as VLANs, IP
routing, and support for Quality of Service (QoS). Choose the HPE Moonshot 45Gc Switch Module
(based on HPE Comware OS) when you need these basic features as well as advanced, data center
switching features such as the following:
• Transparent Interconnection of Lots of Links (TRILL), which lets switches interconnect on many
links without creating loops
• OpenFlow, which enables switches to be controlled by a software-defined networking (SDN)
solution

For 1P 10G cartridges, select HPE Moonshot 45XGc Switch Modules, which are based on the same
OS as the 45Gc switch modules but provide 10G connectivity.

Note that you must choose the same switch module types for both module slots.

Selecting switch modules for chassis with mixed cartridge types


You can mix and match cartridges of different models within the same chassis, tailoring the
requirements for the different workloads required by a solution. For example, a mobile workplace
solution might use m710p cartridges to provide hosted desktops but m300 cartridges to host
controllers. If any of these cartridges are different types, such as 1P G and 1P 10G, you need to take
care to select switches that support both.

The 4P cartridges can only connect to the HPE Moonshot 180G Switch Module, while 1P G and 1P
10G cartridges can work with any of the modules.

When a cartridge and a switch support different speeds, the lower speed is used. That is, although a
10G cartridge can be supported by a gigabit switch (45G, 45Gc, or 180G), it only receives a 1 Gbps
connection from this switch. When the 45XGc switch supports a mix of 1P cartridges, it provides 10G
connections for 10G cartridges, but only 1G for 1G cartridges. Finally, while a 180G switch module
can support 1P cartridges, it only provides such a cartridge with one connection because the cartridge
itself has only two ports (one of which connects to the other switch).

Based on these rules, you must select the 180G Switch Modules whenever the chassis includes 4P
cartridges. For this reason, you might try to avoid mixing 4P and 1P 10G cartridges in the same
chassis because the 10G ports will only be able to operate at 1 Gbps. Of course, if the cartridges do
not require the higher speed connections, then mixing these types of cartridges is permitted.

When you need to mix 1P G and 1P 10G cartridges in the same chassis, select the 45XGc switch
modules, which can support both types of cartridges and allow the 1P 10G cartridges to benefit from
the full bandwidth on their ports.

Selecting uplink modules


Figure 6-15 Selecting uplink modules

On its own, an HPE Moonshot switch module does not have any external ports for connecting to the
data center network. You must select an uplink module to provide these ports, as shown in Figure 6-
15.

The HPE Moonshot 45G and 45Gc Switch Modules support the HPE Moonshot-6SFP Uplink Module. This
module provides six 10 GbE connections, meaning no oversubscription for traffic flowing from the 45
cartridges (the typical flow for most use cases).

For the HPE Moonshot 45XGc and 180G Switch Modules, you have two options: the HPE Moonshot-
16SFP+ Uplink Module (which provides 16 10 Gbps SFP+ ports) and the HPE Moonshot-4QSFP+
Uplink Module (which provides four 40 Gbps QSFP+ ports). Both uplink modules support the same
total bandwidth.

Take the infrastructure at the customer ’s data center into account as you choose a module. Using 40
GbE links requires fewer upstream ports; on the other hand, the customer infrastructure might not yet
support 40 GbE to the rack because 40 GbE connections require different cabling with more fibers.
For example, 40GBASE-SR4 uses MPO cables. These cables have 12 fiber strands, eight of which are
used: four for transmitting and four for receiving.

If the customer has 10 GbE now, but wants the flexibility to upgrade to 40 GbE in the future, select the
4QSFP+ Uplink Module. You can install a QSFP+/SFP+ adapter kit, which converts a 40 GbE QSFP+
port into 10 GbE SFP+ port, enabling the module to fit into the existing infrastructure until the
upgrade. Or for short-range connections up to 5 meters, you can use QSFP+ to 4x10G SFP+ Direct
Attach Copper Splitter Cables. These splitter cables give the customer four 10 GbE links per QSFP+
port.
Note
Please disregard any compatibility error message while using a QSFP+/SFP+ adapter.

To choose between the 16SFP+ Uplink Module and the 4QSFP+ Uplink Module, also consider
whether you plan to implement stacking (180G) or IRF (45XGc) on the switch modules. You will
learn more about these features in a moment. For now, simply know that if you want to use these
features, you must dedicate some links to establishing the stack or Intelligent Resilient Framework
(IRF) fabric:
• At least four 10 GbE links on a 16SFP+ Uplink Module
• At least one 40 GbE link on a 4QSFP+ Uplink Module

These links are then no longer available for uplinks. It is best practice to establish at least two links
between two-member stacks or fabrics to prevent a situation in which the stack or the fabric splits. If
you are concerned about oversubscription, you might choose the 16SFP+ Uplink Module so that you
can establish multiple stacking or IRF links without having to dedicate a full 80 Gbps to these links.
Note, however, that for some situations 80 Gbps is sufficient. You can also use just one stacking or
IRF 40 GbE link. You should then, however, make sure that the network administrator sets up the
proper mechanism for dealing with split stacks or fabrics (Multi-Active Detection [MAD] on the
45Gc and 45XGc switches) in case that single link fails. Table 6-3 provides information about
oversubscription based on uplink module.
Table 6-3 Switch module oversubscription based on uplink module

Switch module and uplink module: oversubscription without and with stacking/IRF

• 45G or 45Gc with 6SFP: none; 1.1:1 with two 10 GbE links used for stacking/IRF
• 180G with 16SFP+: 1.1:1; 1.3:1 with four 10 GbE links used for stacking/IRF
• 180G with 4QSFP+: 1.1:1; 1.5:1 with one 40 GbE link, or 2.3:1 with two 40 GbE links used for stacking/IRF
• 45XGc with 16SFP+: 2.8:1; 3.8:1 with four 10 GbE links used for stacking/IRF
• 45XGc with 4QSFP+: 2.8:1; 3.8:1 with one 40 GbE link, or 5.6:1 with two 40 GbE links used for stacking/IRF
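The ratios in Table 6-3 follow directly from the aggregate downlink bandwidth versus the uplink bandwidth that remains after stacking or IRF links are dedicated, so you can reproduce or extend them for a specific design. The Python sketch below is a rough calculator based on the figures in this chapter, intended for planning discussions rather than as an official HPE sizing tool.

# Rough oversubscription calculator for Moonshot switch/uplink pairings.
# Downlink bandwidth figures (in Gbps) follow this chapter; approximation only.
DOWNLINK_GBPS = {"45G": 45, "45Gc": 45, "45XGc": 450, "180G": 180}
UPLINK_GBPS = {"6SFP": 60, "16SFP+": 160, "4QSFP+": 160}

def oversubscription(switch, uplink, stacking_gbps=0):
    # Stacking/IRF links consume uplink ports and are no longer available as uplinks.
    usable_uplink = UPLINK_GBPS[uplink] - stacking_gbps
    return DOWNLINK_GBPS[switch] / usable_uplink

print(f"45XGc + 16SFP+, no stacking:     {oversubscription('45XGc', '16SFP+'):.1f}:1")
print(f"45XGc + 16SFP+, 4x10G IRF links: {oversubscription('45XGc', '16SFP+', 40):.1f}:1")
print(f"180G + 4QSFP+, 1x40G stack link: {oversubscription('180G', '4QSFP+', 40):.1f}:1")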

Note
The Moonshot 6SFP and 16SFP+ Uplink Modules support various HPE SFP transceivers, SFP+ transceivers, and Direct Attach
Cables (DACs). The Moonshot 4QSFP+ Uplink Modules support various HPE QSFP+ MPO SR4 transceivers, QSFP+ DACs,
QSFP+ to 4x10G SFP+ DAC splitter cables, and QSFP+/SFP+ adapters. You should always refer to the latest module QuickSpecs
for a list of the transceivers and cables qualified and certified to work with this module. Transceiver and DAC cables from any
manufacturer will be accepted, but they will not be supported by HPE.

Providing redundancy and avoiding broadcast storms

Figure 6-16 Providing redundancy and avoiding broadcast storms


A redundant network design without the proper technologies in place can cause broadcast storms
(illustrated in Figure 6-16): network switches continually duplicate and forward broadcasts and
multicasts to each other until these packets consume all bandwidth on links—essentially an
unintentional denial of service (DoS) attack. In a moment, you will examine methods for properly
designing network topologies that provide redundancy and high bandwidth:
• Use link aggregation (either manual or LACP-based).

You can only aggregate links that connect one switch to the same other switch. However, as you will
see, a stack or IRF fabric counts as a single switch for link aggregation.
• Enable a spanning tree protocol such as Rapid Spanning Tree Protocol (RSTP) or Multiple
Spanning Tree Protocol (MSTP).

RSTP and MSTP handle loops across the broadcast domain, blocking redundant paths, which
automatically reactivate in case of failure.
• Enable TRILL.

TRILL provides a better solution than spanning tree for a modern data center because it allows
switches to load balance traffic across many links in a swiftly converging topology.
• Place links in different VLANs such that no one VLAN has a looped topology.

A VLAN defines a broadcast domain. As long as VLANs segment the looped topology such that no
loops exist within a VLAN, no broadcast storm occurs. Often, though, you want redundant
connections to carry the same VLAN, so this chapter will not examine this method in more detail.

You might use some of these methods in conjunction—for example, creating a link aggregation on
multiple ports that connect to the same switch and also enabling TRILL to handle redundant link
aggregations that connect to different switches. The next sections provide some example network
designs, demonstrating when and how you should use the most common methods.

Redundant design without stacking or IRF


Figure 6-17 Redundant design without stacking or IRF

This first design, shown in Figure 6-17, applies to situations where you want cartridge nodes to use
their two ports for redundancy within the same subnet. You also do not want to stack the switch
modules or implement IRF.

For this design, each node’s OS must bond its two ports in a mode that does not require awareness
from the connected switches. You can configure the ports in active/standby mode. Only one port will
forward and receive traffic. The other port will be on standby in case the active port fails. You can
assign the active NIC role to port 1 on half of the cartridge nodes and to port 2 on the other half so
that both switches are handling traffic during normal operation.

If the workload demands bandwidth from both ports, the bonded ports could load balance traffic;
however, the load balancing must use a mechanism that does not require aggregation on the switch
side. For example, a node with a Linux OS can use balance-tlb mode or balance-alb mode. Windows
Server 2012 can use switch independent mode with load balancing.

On the uplink side, both switch modules are taking advantage of all their uplink ports to connect to
data center switches, probably ToR switches. Although not shown in the illustration for simplicity,
the data center switches connect the Moonshot chassis at high speeds to other Moonshot chassis and
HPE servers in the rack, as well as to the data center core. The same holds true for the designs shown
in the next sections.

In this example, each 6-SFP Uplink Module has three connections to one data center switch and three
connections to another switch. You must aggregate the three links using either a static aggregation or
LACP; LACP is generally preferred, and the connected switch should support that protocol. You can
create similar designs for the 16SFP+ and 4QSFP+ modules.

Also enable spanning tree on both switch modules. Now the modules block links in order to create
only one path to the spanning tree root, which is somewhere upstream in the data center. As you see in
Figure 6-17, a link aggregation counts as a single link for spanning tree. If one of the active link
aggregations fails entirely—for example, a data center switch fails—a module can automatically
unblock the proper links. Note that you should leave the spanning tree priority on each module at the
default to prevent the switch module from becoming elected root (0 is the highest priority).

You should typically also define the downstream ports that connect to servers as STP edge ports,
which enables the ports to come up more quickly and prevents disruptions during failover events.
When you do this, make sure that the redundant server ports are not bridged, which would introduce a
loop. Remember: the ports should be bonded, and bonded in a mode that does not require awareness
on the switch modules.

RSTP and MSTP have failover times of just under a second, rather than the millisecond-level failover
that today's applications often require. Therefore, this design is shown for your reference, but you
should usually use one of the faster-converging methods for managing loops, described in a moment.

Redundant design with TRILL: 45Gc or 45XGc

Figure 6-18 Redundant design with TRILL: 45Gc or 45XGc

Instead of using RSTP or MSTP on the HPE Moonshot 45Gc and 45XGc switches, you can set up
TRILL—as long as the connecting switches in the data center support this protocol (see Figure 6-18).
Like the spanning tree protocols, TRILL prevents traffic from looping in a broadcast domain.
However, TRILL fails over more quickly and tends to be more stable than a spanning tree protocol. It
also allows switches to use all of their links, creating a lower latency, higher bandwidth, and more
scalable topology.

On the switch modules, you enable TRILL and set up the downlink ports as TRILL access ports.

You might use this design when the data center switches are not implementing IRF. Again, although
not shown in Figure 6-18, the data center switches connect the Moonshot chassis to other resources.

In this design, the switch modules are acting independently—perhaps because you want to ensure that
all uplink bandwidth is available for traffic (none used for stacking or IRF links). Therefore, you set
up the cartridge node ports as you do for the previous example: bonded in a mode that does not
require the upstream switch to know about the aggregation.

Redundant design using stacking or IRF

Figure 6-19 Redundant design using stacking or IRF

You can combine multiple HPE Moonshot 45Gc or 45XGc Switch Modules into a single IRF fabric, as
you see in Figure 6-19. Similarly, you can combine two HPE Moonshot 45G or 180G switches into a
single switch stack. The stack or fabric:
• Is managed from one interface
• Shares a control plane (which is proxied to each member in an IRF fabric)
• Appears as one switch to other switches and routers

As mentioned earlier, you must dedicate some of the uplink ports to stacking or IRF ports, which
connect the modules together. Table 6-4 shows the minimum number of stacking or IRF links for
each switch module. You could add more links if you expect that the links will need to carry more
traffic between the modules, but in the design illustrated above, the links should carry minimal traffic.
Refer to the switch module documentation for details about which port IDs you can use for the links.
Table 6-4 Minimum required stacking or IRF links

Switch Module Uplink Module Minimum required stacking or IRF ports for a two-member stack

45G or 45Gc 6-SFP Two

180G or 45XGc 16SFP+ Four

4QSFP+ One (two recommended to reduce chances of a split)


In this example, the modules are connecting to an IRF fabric composed of two data center switches, so
the design is simple. You place all links across the modules in a single aggregation. If one module or
all of its uplinks fail, failover to links on the other module occurs in milliseconds. Switch modules
support a maximum of 32 links per link aggregation, so you can aggregate all links on both modules
no matter what type of uplink module is used.

Figure 6-20 Redundant design using stacking or IRF

If the modules are connecting to multiple data center switches that are not in an IRF fabric, you need
to implement another method for managing loops: RSTP/MSTP or TRILL.

When you use stacking or IRF, illustrated in Figure 6-20, each cartridge node can implement NIC
bonding in any mode that meets the customer needs. If the use case calls for load balancing traffic on
both ports, you can use LACP mode (sometimes called 802.3ad mode). You then configure the stack
or IRF fabric to establish a link aggregation to each node using one port on each module.
Active/standby mode is also supported, but might increase the traffic on the stacking or IRF links
because traffic incoming from the uplinks might arrive on the module connected to the standby port;
the traffic must then cross the stacking links to reach the module connected to the active port.

When you plan connections for a stack or an IRF fabric that you intend to use for redundancy, create a
balanced design, as shown in the figure, in which each switch has the same number of connections to
each upstream or downstream device. This design prevents undue congestion on the stacking or the
IRF links. During normal operation, when traffic arrives on either member of the stack or the fabric,
that member can forward the traffic on a local link in the link aggregation connected to the
destination device. When you follow these guidelines, using stacking or IRF on two modules should
not affect the networking performance—and it will enhance redundancy.

Note that, during some failover situations, traffic will need to cross the stacking or IRF link because,
for example, half of the cartridge node traffic will still be arriving on a module with failed links.
(Network administrators could create a policy on a 45Gc or 45XGc switch to shut down the ports that
connect to cartridge ports if all uplink ports fail.)

Expanding stacking or IRF across multiple chassis

Figure 6-21 Expanding stacking or IRF across multiple chassis

When you want to use stacking or IRF to enhance resiliency, you should plan the topology as you saw
in Figure 6-20, with two members per stack or fabric and links balanced across the modules.
Sometimes, however, customers want to use stacking or IRF to reduce the number of data center
switch ports required to connect to the Moonshot chassis and perhaps to eliminate the need for a ToR
switch entirely.

For example, the customer shown in Figure 6-21 has four Moonshot chassis in a rack. The chassis are
populated with 1P 10G cartridges and use Moonshot-45XGc Switches installed in slot A together with
4QSFP+ Uplink Modules. This customer does not require redundancy for individual cartridge node
links, so modules are not installed in slot B. However, the customer does want link redundancy at the
chassis level, so if each switch module were acting on its own, at least eight 40 GbE ports would be
required to connect all chassis in the rack.

When you combine the switch modules on all these chassis into a single stack or fabric, the modules
can share uplinks. For example, you could establish four uplinks, one on each module. All four
chassis have link redundancy because if one of the chassis’ uplinks fails, it can use the stacking or IRF
links to reach another module's uplinks. However, only half as many ports are required for
supporting the rack, so the customer might be able to deploy end-of-row (EoR) or middle-of-row switches
instead of ToR switches. If the solution can tolerate more oversubscription for cartridge traffic, you
could reduce the uplinks further.

The increased oversubscription is a tradeoff in this design, as is increased latency. Some outgoing
traffic must traverse stacking or IRF links before reaching the uplinks; similarly, incoming traffic
might need to traverse stacking or IRF links before reaching the module that connects to the
destination cartridge node.

To ensure adequate performance, it is recommended that you use a ring topology as shown in Figure
6-21 and that you carefully consider how much bandwidth is required on the stacking links. When
traffic flows from cartridge nodes, a switch module that has an uplink in the egress link aggregation
forwards the traffic on its local uplink. If it does not have a local uplink, it forwards the traffic on a
stacking link to the closest member with an uplink. However, incoming traffic might arrive on any
module because the upstream switch decides on which link in the aggregation to send it. Then the
traffic must cross the stacking or IRF links to reach the connected cartridge. In many Moonshot use
cases, distributing the uplinks across the stack or fabric so that every module has or is close to an
uplink reduces traffic on the stacking or IRF links. If the traffic patterns for your use case differ, you
might need to add bandwidth to the stacking or IRF links. Inadequate bandwidth can cause lost packets.

Table 6-5 shows the maximum number of members per stack or fabric permitted on various switch
modules. Keep in mind, though, that performance can decrease as the number of members increases,
particularly for the 45G Switch Modules.
Table 6-5 Members per stack

Switch Module: Maximum number of members per stack or IRF fabric

45G: 9

180G: 2

45Gc and 45XGc: 4

Using VLANs
Figure 6-22 Using VLANs

Often a chassis is populated with cartridges that are part of a scale-out solution, and all cartridges belong
to the same network. Sometimes, though, you will need to place different cartridge nodes in different
networks. You enforce the network divisions at the switch module level by assigning the node ports to
the correct virtual local area network (VLAN) without tagging (node ports are access ports on 45Gc
and 45XGc modules), as illustrated in Figure 6-22.

You can assign different uplinks to different VLANs without tagging if you want to dedicate certain
uplinks for certain cartridges’ traffic. If you want cartridges to share uplink ports, assign all the
cartridge VLANs to the uplink ports (or aggregations) using tagging (the uplink ports are trunk ports
on 45Gc and 45XGc modules).
Chapter 6—Activity
Now take some time to complete an activity in which you will review what you have learned about
selecting switch modules and uplink modules, as well as properly designing redundant connections.
You can check your answers by referring to Appendix B: Answers to Activities.

Select switch modules


You will now answer several questions. You need to select the switch modules for several HPE
Moonshot chassis. For each scenario below, select the best switch. Assume that the customer wants the
highest bandwidth connections and the most advanced switch functions that are possible to achieve in
the scenario. You can use Table 6-6 to review the cartridge capabilities.

Switch module choices

a. HPE Moonshot 45G Switch Module


b. HPE Moonshot 45Gc Switch Module
c. HPE Moonshot 45XGc Switch Module
d. HPE Moonshot 180G Switch Module

Scenarios

1. The chassis has 45 m710p cartridges.


2. The chassis has 45 m350 cartridges.
3. The chassis has 45 m300 cartridges.
4. The chassis has 30 m400 cartridges and 15 m800 cartridges.
5. The chassis has 3 m300 cartridges and 42 m710 cartridges.
Table 6-6 HPE Moonshot cartridges

Select uplink modules


Listed below are three choices for uplink modules. Think about and record reasons to choose each
type of module. (The reasons for choosing some modules might be quite straightforward; in other
cases, you might need to consider more factors.)
1. HPE Moonshot 4QSFP+ Uplink Module
2. HPE Moonshot 16SFP+ Uplink Module
3. HPE Moonshot 6SFP Uplink Module

Plan redundant topologies


When you plan the connections for an uplink module, you must be careful to plan the correct
technologies. Figures 6-23, 6-24, 6-25, and 6-26 show several ways to connect switch modules to data
center switches, some more redundant than others. Some of the designs also use IRF fabrics.

Sketch these simple figures on a blank piece of paper. Then draw a circle around links that you would
combine in a link aggregation. Also label the figure with any technology that you would implement,
such as TRILL or RSTP.

Figure 6-23 Topology 1

Figure 6-24 Topology 2


Figure 6-25 Topology 3

Figure 6-26 Topology 4

You can check your selections in Appendix B: Answers to Activities.

HPE Moonshot management


Next, you will learn how to manage and provision HPE Moonshot solutions.

HPE Moonshot iLO CM module


Figure 6-27 HPE Moonshot iLO CM module

The iLO functions for a Moonshot chassis’ cartridges are managed through an HPE Moonshot iLO
CM module. The iLO CM acts as the single Ethernet gateway for the cartridges.

The CM communicates with satellite controllers (SCs), which are embedded throughout the chassis,
as shown in Figure 6-27. For example, each cartridge has an SC, as does the switching fabric. The
chassis also has more than 1500 sensors, which collect information for the CM. Administrators can
access the CM through a serial port and receive access to the CM CLI. Or they can connect through
the iLO port (after the CM has received an IP address) and access the CLI through SSH or the GUI
through HTTPS.

The device-neutral design allows a common interface to ARM SoC cartridges and x86 cartridges. As
new types of cartridges are invented, the device-neutral architecture enables them to be included in
the management fabric.

From the management interfaces, administrators can


• Monitor cartridge and switch health
• View logs
• Monitor and manage power utilization
• Set cartridges’ boot settings
• Manage the CM itself, including updating firmware for the module and SCs, setting the CM’s IP
address, and managing the accounts for users allowed access to the CM

The iLO CM also provides a serial console connection to each cartridge node’s virtual serial port
(VSP). The iLO CM hosts Intelligent Platform Management Interface (IPMI) and the REST API for the
cartridge nodes, allowing scripting for monitoring and maintenance tasks. Chapter 9, “Monitoring and
Managing HPE Solutions,” explains more about the REST API, which is the preferred API.
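Because the iLO CM exposes a REST interface, routine monitoring tasks can be scripted from a management station. The Python sketch below shows only the general pattern of polling a resource over HTTPS with basic authentication; the host name, credentials, and resource path are placeholders, and you should take the actual URIs and response schemas from the HPE iLO CM REST API documentation.

# Pattern for polling the iLO CM REST interface from a management station.
# Host, credentials, and the resource path are placeholders; consult the HPE
# iLO CM REST API documentation for the actual URIs and response schemas.
import requests

ILO_CM_HOST = "https://ilo-cm.example.local"    # placeholder address
AUTH = ("administrator", "password")            # placeholder credentials
RESOURCE = "/rest/v1/Chassis"                   # assumed, illustrative path

def get_resource(path):
    # verify=False is for lab use only; use proper certificates in production.
    resp = requests.get(ILO_CM_HOST + path, auth=AUTH, verify=False, timeout=10)
    resp.raise_for_status()
    return resp.json()

if __name__ == "__main__":
    data = get_resource(RESOURCE)
    print(sorted(data.keys()))    # explore the returned schema interactively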

In short, the iLO CM is quite similar to an HPE BladeSystem Onboard Administrator (OA) with
which you should be familiar from prerequisite training. Table 6-7 shows the various privileges that
administrators can assign to iLO CM users and their consistency with existing iLO systems.

HPE Moonshot chassis support a single iLO CM module. If the module fails, the iLO and
management functions are unavailable until the module is replaced. However, the cartridges and
internal switches continue to function, and production services are undisturbed.
Table 6-7 HPE Moonshot iLO CM privileges

Privilege Comparison to existing iLO-based platforms

Remote Console Privilege Allows access to cartridge node Virtual Serial Ports (VSPs)

Boot Priority Allows configuration of node level PXE/HDD boot settings

Power and Reset Remains consistent with iLO

Configure CM Provides same privileges as the Configure iLO option

Administer User Accounts Remains consistent with iLO

Planning iLO connections

Figure 6-28 Planning iLO connections

As you can for HPE Apollo solutions, you can choose to connect each chassis iLO CM port to a
network switch, or you can daisy chain several Moonshot chassis together and connect only one to a
network switch. The first option provides better availability for iLO CM functions, while the second
option conserves ports. Figure 6-28 illustrates how to use the iLO and Link ports on the chassis to
daisy chain them together. (Administrators also have to enable the daisy chain function from the iLO
CM CLI.)

In either case, it is recommended that you use a separate switch for iLO from the switches used for
production traffic.

Be very careful to avoid loops. On a standalone chassis, connect only the iLO port to the network
switch, not both the iLO and Link port. In a daisy chain, connect the iLO port on only one chassis to
the network switch.

Access to HPE Moonshot cartridge node

Figure 6-29 Access to HPE Moonshot cartridge node

The HPE Moonshot chassis is headless without any connectors for a video console, keyboard, or
USB device. Administrators and system integrators manage the chassis and its components through
the CM exclusively. They can access a node’s virtual serial port (VSP) through the iLO CM CLI (SSH
session only). Using the VSP, they can set boot settings and configure a one-time boot node, as well as
other basic tasks. This access is illustrated in Figure 6-29.

Administrators can also configure settings such as cartridge node boot options through the iLO CM
GUI. From the GUI, administrators can also establish a Remote Console session with keyboard,
mouse, video console, and virtual media—however, they can only do so if the cartridge node is
linked with a Moonshot Remote Console Administrator (mRCA).

HPE mRCA

Figure 6-30 Access to HPE Moonshot cartridge node

You should typically recommend at least one mRCA for the solution. An mRCA, when installed in an
HPE Moonshot chassis cartridge slot, links to a cartridge node and provides Remote Console access
to that node, as shown in Figure 6-30.

Which node is linked to the mRCA depends on the slot in which the mRCA is installed. HPE recommends
installing the cartridge that you want to manage in slot 41. If the cartridge is a 1P cartridge or if you
want to manage the first node on a 4P cartridge, install the mRCA in slot 44. To manage one of the
other nodes on a 4P cartridge installed in slot 41, place the mRCA as follows:

Node 2 = slot 40

Node 3 = slot 42

Node 4 = slot 38

Although these slots are recommended, the mRCA can be installed in different slots and link to
different cartridges and nodes. To look up in which slot you should install the mRCA cartridge to link
to a particular node—or in which slot you should install a cartridge to link to an mRCA already
installed in a specific slot—visit
http://h17007.www1.hpe.com/us/en/enterprise/servers/mrca/index.aspx#.VquEUdUrKM8.
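For quick reference during deployment planning, the recommended pairings listed above can be captured in a small lookup, as in the Python sketch below. It covers only the slot-41 recommendations from this section; for any other combination of slots, use the HPE lookup page referenced above.

# Recommended mRCA slot for managing a node on a cartridge installed in slot 41.
# Covers only the pairings listed in this section; see the HPE mRCA lookup page
# for other cartridge slots.
MRCA_SLOT_FOR_SLOT41_NODE = {1: 44, 2: 40, 3: 42, 4: 38}

def mrca_slot(node):
    try:
        return MRCA_SLOT_FOR_SLOT41_NODE[node]
    except KeyError:
        raise ValueError("Nodes are numbered 1 through 4 on a 4P cartridge") from None

print(f"To manage node 3 of a 4P cartridge in slot 41, install the mRCA in slot {mrca_slot(3)}")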

With mRCA installed, administrators and integrators can access the iLO CM from their own
management station and launch a Remote Console session with the linked cartridge node. They now
have keyboard, console, and mouse access to the cartridge node. The mRCA also provides virtual
media, so administrators can mount an image to the cartridge node from media connected to their
device. With access to the virtual media, they can also easily create a golden image, installing all the
necessary correct applications and configuring the correct settings for the customer solution.

Integrators can then use tools such as Microsoft Windows Deployment Services (WDS) to capture the
image (they must customize a capture boot file to enable the capture process to proceed correctly on a
headless device). They could then set up preboot execution environment (PXE) as described later in
this section, using the captured golden image as the install image file.

The mRCA also provides a Debugging Tool, which administrators can use to troubleshoot a node.

If the customer primarily wants to use mRCA for the initial deployment of the solution, one mRCA is
sufficient. Administrators can install the mRCA in slot 44 of one chassis and create the golden image
on node 1 of whichever cartridge is installed in slot 41.

If the customer plans to use the mRCA for debugging, you might recommend leaving slots 41 and
onward open in one chassis (with blanks installed). Then the administrators can simply install the
mRCA in slot 44 and move any cartridge that needs to be debugged to slot 41. If the customer has 4P
cartridges, leave slots from 38 onwards open in one chassis. Then administrators can move the
mRCA to slot 40, 42, or 38 to debug node 2, 3, or 4 on a cartridge in slot 41. To support more
extensive debugging, you might recommend one mRCA per HPE Moonshot chassis.

The mRCA supports x86 cartridges. For a current list of supported cartridges, visit
http://www.hp.com/go/moonshot.
Provisioning options

Figure 6-31 Provisioning options

HPE Moonshot cartridge nodes can use PXE to boot an image from the network for their initial OS
installation. The sections that follow provide more information about ensuring that PXE works for
headless Moonshot cartridges.

Keep in mind, though, that especially if you are architecting a large solution with multiple HPE
Moonshot chassis, you should propose one of two solutions for speeding the provisioning process:
HPE Moonshot Provisioning Manager (MPM) or HPE Insight Cluster Management Utility (CMU), as
you see in Figure 6-31. HPE Insight CMU provides monitoring capabilities in addition to the
provisioning ones and is geared for larger deployments. Chapter 9, “Monitoring and Managing HPE
Solutions,” covers these solutions.

The mRCA cartridge provides Remote Console (including Virtual Media) access to a node on a
linked server cartridge and is recommended for creating the golden image that will be deployed to
other nodes through one of the other methods.

Planning for a network deployment without HPE Provisioning Manager


or Insight CMU
Figure 6-32 Planning for a network deployment without HPE Provisioning Manager or Insight
CMU

If the customer does not use HPE MPM or Insight CMU, integrators would set up PXE using much the
same process as for other ProLiant servers. However, some special considerations apply.

For a Windows installation, they must set up a PXE server, DHCP server, and DNS server (which
could all be the same server) and load the proper boot, install, and driver files on the PXE server (as
shown in Figure 6-32). Windows Deployment Services (WDS) is a common PXE server for a
Windows environment, but HPE Moonshot supports other solutions.

Integrators must also use HPE Moonshot Windows Deployment Packs (MWDPs) to customize the
boot and install files with the proper drivers and settings for supporting a particular HPE Moonshot
cartridge. For example, the MWDP turns on Windows Emergency Management Services (EMS),
which allows the OS to install on the headless cartridge nodes. Integrators must also create answer
files that allow a cartridge node to go through the installation process without user interaction.

When using WDS and other solutions with similar capabilities, integrators should create a pre-staged
solution in which each cartridge node’s MAC address is bound to the correct boot and install files. To
obtain the MAC addresses, integrators can access the iLO CM and generate a list of the addresses.
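
For example, you could script the pre-staging step. The following Python sketch is illustrative only: it turns a list of cartridge node MAC addresses exported from the iLO CM into wdsutil pre-staging commands. The export format, node names, boot image path, and exact wdsutil syntax are assumptions that you would verify against the customer's WDS environment.

# Hypothetical helper: build WDS pre-staging commands from an iLO CM MAC list.
# The export format below and the device names are assumptions; verify the
# wdsutil options and the expected MAC format against your WDS version.

ILO_CM_EXPORT = """\
c1n1 8C:DC:D4:00:00:01
c2n1 8C:DC:D4:00:00:02
c3n1 8C:DC:D4:00:00:03
"""

def prestage_commands(export_text, boot_image=r"boot\x64\images\moonshot-boot.wim"):
    """Yield one wdsutil pre-staging command per cartridge node."""
    for line in export_text.strip().splitlines():
        node, mac = line.split()
        device_id = mac.replace(":", "")  # strip separators; confirm the format WDS expects
        yield (f'wdsutil /Add-Device /Device:{node} /ID:{device_id} '
               f'/BootImagePath:"{boot_image}"')

if __name__ == "__main__":
    for command in prestage_commands(ILO_CM_EXPORT):
        print(command)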

A Linux installation similarly requires a PXE server with the proper boot and configuration files, a
DHCP server, as well as a TFTP server, and an HTTP, NFS, or FTP server to deliver the OS
installation files. Again, the same server might provide all of these services. The PXE configuration
files require some special settings for the headless environment. Integrators will probably want to use
an automatic installation process using kickstart, pre-seed, or AutoYaST files (the precise file type
depends on the type of Linux OS). Integrators will need to customize these files to support the
headless environment.
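
As an illustration of the kind of automation an integrator might use, the following Python sketch writes one PXELINUX configuration file per node and adds serial-console kernel arguments for a headless kickstart installation. The MAC addresses, kickstart URLs, console device, and baud rate are placeholders to adapt to the actual cartridge and OS.

# Hypothetical sketch: write one PXELINUX configuration file per cartridge node
# with serial-console arguments for a headless kickstart installation. The MAC
# addresses, kickstart URLs, console device, and baud rate are placeholders.
from pathlib import Path

NODES = {  # MAC address -> kickstart file for that node (example values)
    "8c:dc:d4:00:00:01": "http://10.0.0.10/ks/node01.cfg",
    "8c:dc:d4:00:00:02": "http://10.0.0.10/ks/node02.cfg",
}

TEMPLATE = """default linux
prompt 0
label linux
  kernel vmlinuz
  append initrd=initrd.img ks={ks_url} console=ttyS0,115200n8 text
"""

def write_pxe_configs(tftp_root="tftpboot"):
    cfg_dir = Path(tftp_root, "pxelinux.cfg")
    cfg_dir.mkdir(parents=True, exist_ok=True)
    for mac, ks_url in NODES.items():
        # PXELINUX looks for a per-client file named 01-<mac-with-dashes>.
        (cfg_dir / ("01-" + mac.replace(":", "-"))).write_text(TEMPLATE.format(ks_url=ks_url))

if __name__ == "__main__":
    write_pxe_configs()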

The HPE Moonshot cartridge nodes are configured to boot from PXE by default. However,
integrators can access the iLO CM to set up booting from a local HDD or SSD or from iSCSI.

Complete instructions for deploying a supported Windows or Linux OS through the network are
provided in the Operating System Deployment on HPE ProLiant Moonshot Server Cartridges User
Guide.

Microsoft System Center Configuration Manager (SCCM) integration


Figure 6-33 Microsoft System Center Configuration Manager (SCCM) integration

HPE Moonshot solutions integrate with Microsoft SCCM (shown in Figure 6-33), another option for
the PXE deployment solution. SCCM helps to automate the deployment of Windows OS to multiple
nodes. It also allows integrators to create application packages and deploy those to nodes. Integrators
can streamline the provisioning process by using scripts to add HPE Moonshot nodes to SCCM.

HPE has published a guide to help integrators successfully navigate the deployment. See HPE
Moonshot Integration with Microsoft System Center Configuration Manager (SCCM).

Summary
This chapter has explained how HPE Moonshot solutions provide compact, converged compute,
storage, and networking that is tailored to the workload. You also learned general principles for
architecting HPE Moonshot networking. And you examined the many options that customers have for
managing Moonshot solutions.

The next chapter teaches you how to design HPE Moonshot solutions for particular workloads.

Learning check
Review what you have learned by answering these questions. Then check your answers in Appendix
A: Answers to Learning Checks.
1. When is an HPE Moonshot 180G Switch Module required for an HPE Moonshot 1500 Chassis?
a. Whenever the customer wants to install an HPE Moonshot 4QSFP+ Uplink Module
b. Whenever the chassis has a mixture of 10 GbE and 1 GbE cartridges
c. Whenever the customer wants to use both ports on a cartridge node
d. Whenever the chassis includes any cartridges with four processors

2. An architect plans to connect an HPE Moonshot chassis to data center switches as shown in
Figure 6-34. How should the architect plan to configure the four 10 GbE ports to prevent a
loop?
a. As a link aggregation that includes all four ports
b. As two link aggregations, each of which includes the two ports that connect to one of the data center switches
c. As four separate ports with the two ports that connect to one switch assigned to one VLAN and the two ports that connect to
the other switch assigned to another VLAN
d. As four separate ports, all of which are assigned to the same VLAN

Figure 6-34 Exhibit for learning check

3. How can administrators contact the VSP for an HPE Moonshot cartridge node?
a. Through the cartridge’s serial port
b. Through the iLO CM CLI
c. At the cartridge node’s iLO IP address
d. At the cartridge node’s IP address on its first port

For answers, see Chapter 6 in Appendix A.


7 HPE Moonshot Solutions for Particular
Workloads

EXAM OBJECTIVES
• Position HPE Moonshot cartridges for the right use cases and workloads
• Create an implementation plan for the following solutions, including plans for the proper
performance, scalability, and high availability:
– Big data and analytics solution
– Video processing solution
– Mobile workspace solution
– Web infrastructure solution

Assumed knowledge
Before reading this chapter, you should have a basic understanding of the following:
• Processors, DDR3 and DDR4 memory, hard disk drives (HDDs), solid-state drives (SSDs), and RAID levels for storage volumes
• HPE ProLiant rack and blade servers and options for them such as HPE Smart Array Controllers
• HPE BladeSystems, including interconnect modules and Virtual Connect (VC) modules
• Server management and maintenance, including experience with Integrated Lights Out (iLO),
Intelligent Provisioning, UEFI, HPE Insight Remote Support, HPE Insight Online, HPE Smart
Update Manager (SUM), and HPE Insight Control server provisioning (ICsp)
• HPE OneView capabilities
Chapter topics
This chapter teaches you how to architect HPE Moonshot solutions for four use cases:
• Big data and analytics
• Video processing
• Mobile workspace
• Web infrastructure

Big data and analytics


This topic explains how to architect HPE solutions for big data and analytics.

HPE Big Data Reference Architecture

Figure 7-1 HPE Big Data Reference Architecture

As you learned in Chapter 3, the HPE Big Data Reference Architecture (illustrated in Figure 7-1) can
improve flexibility and scalability, while enhancing performance, by separating compute nodes from
storage nodes. You are then free to select nodes that are optimized for each role. In addition, you can
control the balance of compute to storage nodes. Finally, you can allow multiple compute clusters
running different applications to share the same Hadoop Distributed File System (HDFS) cluster,
eliminating the sprawl of duplicated data.

You will now learn in more detail how to design this architecture.

The storage solution: HPE Apollo 4200


Figure 7-2 The storage solution: HPE Apollo 4200

The HPE Apollo 4000 family provides the ideal solution for the storage nodes (data nodes), which
run a file system such as HDFS to store and serve the data. These servers are purpose-built for big
data and object storage.

The Apollo 4200 System, shown in Figure 7-2, typically provides the best choice for storage nodes
when you divide the compute and storage nodes. This system supports up to two processors in the
Intel Xeon E5-2600 v3 or v4 family, with a wide array of choices ranging from 4 to 22 cores.
Because an HPE Moonshot System will provide the compute, you can choose a mid-range option. The
reference architecture for Cloudera recommends a 10-core E5-2660 v3 processor.

You also have many options for disks. You can choose either a large form factor (LFF) 4200 model, which supports up to 28 LFF disks, or a small form factor (SFF) model, which supports up to 54 disks. Table 7-1 indicates the server's maximum storage capacity for various types of storage as of the publication of this ebook (always check the latest QuickSpecs for updates). SATA HDDs will typically work well for this solution. These disk drives optimize capacity over performance, in keeping with the storage nodes' role of storing large amounts of data. The compute nodes, on the other hand, which operate on data and need to write and perhaps shuffle result files, use high-performance SSDs.

The Apollo 4200 System supports two embedded Smart Array controllers, the HPE Flexible Smart
Array P840ar and the HPE Dynamic Smart Array B140i. The P840ar provides many RAID levels
(RAID 0, 1, 5, 6, 60, and ADM); 4 GB for the flash-backed write cache (FBWC) used with the HPE
SmartCache feature to enhance performance for writes; HPE SSD SmartPath, which enhances read
performance on SSDs; and optional HPE Secure Encryption.
Table 7-1 Maximum local storage capacity for HPE Apollo 4200

Disk type   Protocol   Form factor   Maximum capacity when all disks are this type
HDD         SATA       SFF           108 TB (48 + 6 rear x 2 TB)
HDD         SATA       LFF           224 TB (24 + 4 rear x 8 TB)
HDD         SAS        SFF           108 TB (48 + 6 rear x 2 TB)
HDD         SAS        LFF           224 TB (24 + 4 rear x 8 TB)
SSD         SATA       SFF           207.36 TB (48 + 6 rear x 3.84 TB)
SSD         SATA       LFF           44.8 TB (24 + 4 rear x 1.6 TB)
SSD         SAS        SFF           86.4 TB (48 + 6 rear x 1.8 TB)
SSD         SAS        LFF           44.8 TB (24 + 4 rear x 1.6 TB)

The compute solution: HPE Moonshot cartridges

Figure 7-3 The compute solution: HPE Moonshot cartridges

HPE offers several Moonshot cartridges that are optimized for data processing and analytics
applications such as YARN applications (see Figure 7-3). Each cartridge is tuned for a slightly
different workload, as you can see in Table 7-2.

The HPE ProLiant m710 and HPE m710p cartridges are optimized for data processing and analytics
applications of many types. They are the recommended cartridges in HPE Reference Architectures
for Cloudera and for Apache Spark. They also provide excellent performance for NoSQL databases
such as HBase and Cassandra. In addition to their excellent processing power and memory, these
cartridges have two 10 GbE ports that support Remote Direct Memory Access (RDMA) over
Converged Ethernet (RoCE), which helps to reduce latency for communications between the
cartridges and the storage. Both cartridges offer a similar set of capabilities; however, the HPE
m710p has a more powerful CPU and GPU accelerator than the m710 does.

For real-time analysis, choose the HPE m800 cartridges. These cartridges bring the premium
performance of Digital Signal Processing (DSP) cores to the dense, efficient, and highly scalable
Moonshot chassis. Real-time analysis demands not only high performance, but also high bandwidth
and low latency data exchange between compute nodes. The m800 cartridges feature high bandwidth
links between the four nodes on the cartridge and also take advantage of the high-speed 2D torus
connections within the Moonshot chassis.

The HPE m300 and m350 provide a lower cost alternative for basic distributed analytic applications.
The m300 provides more processing power per core than the m350, but the m350, with four nodes
per cartridge, provides a higher density solution.
If the customer needs faster results for queries through in-memory analytics—that is, operating on
datasets in memory rather than on disk—choose the HPE m400 cartridges. The m400 cartridges meet
the needs with 64 GB RAM, the highest memory per processor of the HPE Moonshot cartridges. They
also support 10 GbE connectivity.
Table 7-2 HPE Moonshot cartridges for big data analytics
Figure 7-4 Selecting local storage for compute nodes

Many of the cartridge node features are fixed, being tuned already for the workloads for which that
cartridge is designed. As you see in Figure 7-4, you do have choices, though, for the amount of local
storage. Most of the cartridges support SSDs, which will provide high performance for the analytic
applications. The m300 gives a choice between a high-capacity HDD (up to 1 TB) and a higher-performance 240 GB SSD. For analytics, generally choose the SSD.

SSDs provide the low latency and high speed I/O required for the intermediate result or shuffle files
created by many Hadoop applications. For Spark applications, they provide high speed I/O for any
data that does not fit in memory. In either case, you should typically select the higher capacity SSDs to
ensure that these files can fit on the SSDs. The performance offered by SSDs might be particularly
important for I/O-bound jobs such as sorting, grouping, or transforming data.

For a NoSQL server, consider the maximum dataset size for a table. All of the servers together should
be able to hold this data in their local storage. Again, you should typically select the highest capacity
SSD for this workload. Then you will not have to add more compute nodes simply to obtain more
storage.

Three recommended designs

Figure 7-5 Three recommended designs

You have selected the HPE Moonshot cartridges and the HPE Apollo 4000 servers for the solution.
Next you must scope out the rack. See the recommended designs in Figure 7-5.

Architects for big data and analytics solutions often take storage capacity requirements as the
beginning point for planning. In the traditional architecture, in which the same servers provide
compute and storage, compute provisioning was tied closely to the storage requirements. Generally,
architects would select a server that provided one core per disk. The company then scoped out the
number of servers based on the storage requirements.
For customers who want a traditional balance of compute and storage, the HPE Big Data reference
architecture offers a balanced rack design. This rack consists of three HPE Moonshot 1500 Chassis,
each with 45 m710/m710p cartridges, for 135 cartridges (540 cores) total. (Note that you could adjust
the cartridge type for a particular use case such as real-time processing, as you learned earlier). The
rack also has five HPE Apollo 4200 servers. The Apollo 4200 can support up to 224 TB (using 28x 8 TB SATA HDDs), for about 1.1 PB per rack. However, some reference architectures call for the Apollo 4200 to use 4 TB SATA HDDs, which adds up to 112 TB per server and 560 TB per rack. The second configuration supports a lower storage capacity but the same I/O and compute power, making the rack a bit "hotter."

After you decide on the capacity for the rack, multiply out the number of required racks based on the
customer's capacity needs.

Remember that the HPE big data solutions give you more flexibility in design; compute is no longer
constrained by storage, but can scale at the right rate for the customer.

Perhaps the customer needs fast or even real-time results for queries. Perhaps the customer has many
applications with complex processing demands. Such a customer requires a “hot” solution with more
compute power as compared to storage capacity. A hot rack includes seven HPE Moonshot 1500
Chassis (315 nodes with 1P cartridges or 1260 nodes with 4P cartridges) and one HPE Apollo 4200
(112 TB or 224 TB).

Other customers might have a great deal of data, but place fewer computational demands on the data.
For example, they might be focused more on data archival with limited analysis of stored data. Such a
customer would have overprovisioned compute under the traditional model. You can help the
customer lower total cost of ownership (TCO) by suggesting a “cold” rack design. This rack
provides a great deal of storage, with seven HPE Apollo 4200s and a single HPE Moonshot 1500
Chassis (45 nodes with 1P cartridges or 180 nodes with 4P cartridges).

You can also make your own custom mix, but follow the guideline of eight total components
(Moonshot 1500 chassis or Apollo 4200 Systems) per rack. For example, perhaps you have opted to
provision the Apollo 4200 servers with 112 TB, but this makes a balanced rack a bit hotter than the
customer needs. You could plan two full HPE Moonshot chassis for every six Apollo 4200 Systems.
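
The rack-level arithmetic above is easy to capture in a short script when you are comparing design options with a customer. The following Python sketch uses the component counts and capacities from the text; the helper names and the example capacity target are illustrative only.

import math

CARTRIDGES_PER_CHASSIS = 45

RACK_DESIGNS = {  # (Moonshot 1500 Chassis, Apollo 4200 Systems) per rack
    "balanced": (3, 5),
    "hot": (7, 1),
    "cold": (1, 7),
}

def rack_summary(design, tb_per_apollo=224):
    """Cartridge count and raw storage for one rack of the given design."""
    chassis, apollos = RACK_DESIGNS[design]
    return {
        "cartridges": chassis * CARTRIDGES_PER_CHASSIS,
        "raw_storage_tb": apollos * tb_per_apollo,
    }

def racks_needed(required_capacity_pb, design="balanced", tb_per_apollo=224):
    """Racks required to reach the customer's raw storage capacity."""
    per_rack_tb = rack_summary(design, tb_per_apollo)["raw_storage_tb"]
    return math.ceil(required_capacity_pb * 1000 / per_rack_tb)

if __name__ == "__main__":
    print(rack_summary("balanced"))  # 135 cartridges, 1120 TB raw storage
    print(racks_needed(2.0))         # e.g., 2 PB raw capacity needs 2 balanced racks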

Strategies for selecting a mix of compute and storage

Figure 7-6 Strategies for selecting a mix of compute and storage


If you are unsure whether the customer needs a hot, cold, or balanced design, you can attempt to
determine more precisely how many compute nodes are required, as shown in Figure 7-6.

Discuss the number of data analysis application instances the customer expects to run at once and the
maximum amount of time in which an application should finish executing a job, whether that is an hour, two hours, or more.

Each HPE Moonshot chassis loaded with 45 m710/m710p cartridges can contribute a certain amount of
processing power, memory, and local storage to running an application. For example, if the customer
is using an MR2 application, HPE recommends up to 14 map tasks and 7 reduce tasks per m710/m710p
cartridge. This means that a Moonshot chassis would support up to 628 concurrent map tasks (630
minus 2 for necessary management processes) and a rack of three chassis could support 1888 (1890
minus 2). These limits are designed to ensure that each map task or reduce task has enough memory
for the default heap size (about 2 GB for map tasks and about 3 GB for reduce tasks, with reduce tasks
being limited until map tasks are complete). The application runs much better when the heap can be
loaded fully in the memory rather than the server needing to swap data to and from the SSD. Note
also that all together a chassis can hold 21.6/43.2 TB of data locally (m710p cartridges support the
higher number); a rack of three chassis could hold 64.8/129.6 TB.
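
A minimal sketch of this arithmetic in Python, assuming the recommended task limits for m710/m710p cartridges and the two reserved management task slots described above; the numbers are illustrative only.

# Assumes 14 map and 7 reduce task slots per m710/m710p cartridge and 2 slots
# reserved for management processes, as described above; illustrative only.
CARTRIDGES_PER_CHASSIS = 45
MAP_TASKS_PER_CARTRIDGE = 14
REDUCE_TASKS_PER_CARTRIDGE = 7
RESERVED_SLOTS = 2

def concurrent_map_tasks(chassis_count):
    return chassis_count * CARTRIDGES_PER_CHASSIS * MAP_TASKS_PER_CARTRIDGE - RESERVED_SLOTS

def concurrent_reduce_tasks(chassis_count):
    return chassis_count * CARTRIDGES_PER_CHASSIS * REDUCE_TASKS_PER_CARTRIDGE

def local_storage_tb(chassis_count, ssd_gb_per_cartridge=480):
    return chassis_count * CARTRIDGES_PER_CHASSIS * ssd_gb_per_cartridge / 1000

if __name__ == "__main__":
    print(concurrent_map_tasks(1))    # 628 for one chassis
    print(concurrent_map_tasks(3))    # 1888 for a balanced rack of three chassis
    print(local_storage_tb(3, 960))   # 129.6 TB with 960 GB SSDs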

Based on the dataset size and number of tasks involved, how long would it take a rack of three chassis
to execute an average application job and a maximum job? Or if you need more than five Apollo
4200 Systems to meet the storage needs, how long would it take the multiple racks to run the
applications? If the customer wants the application to run more quickly, you must make each rack
“hotter.” For example, you might plan a rack of four Moonshot chassis to four Apollo 4200 systems,
increasing the compute-to-storage ratio from 0.6 to 1. You would then need
to plan more racks to deliver the required total storage capacity.

You might also discuss which resource is most likely to cause a bottleneck for the type of application
that the customer is using: CPU, memory, local storage, or network bandwidth. If the customer
already has a solution in place, administrators can use that solution’s tools, such as Cloudera
Manager, to monitor resource usage. Scope the solution to meet the needs for the bottleneck resource.

Another way to think about the requirements is to consider whether the application is running CPU
bound or I/O bound tasks. You learned about this strategy in Chapter 3. Table 7-3 reminds you of CPU
bound tasks versus I/O bound ones. Also keep in mind that Impala, Spark, and Solr Search
applications tend to be CPU bound. If most tasks are intense CPU bound ones, you might want to make
the rack a bit hotter. If most tasks are I/O bound, make sure that you plan for 10 GbE RoCE
connections and sufficient storage nodes to handle the requests. You might, for example, plan more
HPE Apollo 4200 Systems using lower-than-maximum-capacity HDDs. If the application creates
intermediate files, make sure to select the maximum size SSD for compute nodes.
Table 7-3 Examples of task types

CPU bound                      I/O bound
Classification                 Sorting
Clustering                     Grouping
Complex data mining            Data transformation
Feature extraction
Natural language processing

When planning an HBase or other NoSQL solution, consider the maximum dataset size for read or
write queries. The solution performs much better when the dataset fits on the local SSDs (tens to
hundreds of times better based on HPE tests). How many cartridges are required for the full dataset to
fit on the cartridges’ total SSD capacity? For example, if a maximum dataset is 20 TB, 45 cartridges
with one 480 GB SSD each should meet the needs. For customers who want very fast results, you
might scale out the solution a bit more so that more of the dataset fits in the memory (which improves performance by 1.5 to 4 times compared to when data exceeds the memory but fits on the SSD, according to HPE tests).
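
If you want to capture this sizing rule in a quick calculation, the following Python sketch estimates how many cartridges are needed for a dataset to fit on the nodes' local SSDs; the dataset sizes shown are examples only.

import math

def cartridges_for_dataset(dataset_tb, ssd_gb_per_cartridge=480):
    """Cartridges needed for the full dataset to fit on local SSDs."""
    return math.ceil(dataset_tb * 1000 / ssd_gb_per_cartridge)

if __name__ == "__main__":
    print(cartridges_for_dataset(20))       # 42, so a 45-cartridge chassis suffices
    print(cartridges_for_dataset(50, 960))  # 53 cartridges with 960 GB SSDs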

Reference architectures and tools provided by the application provider (shown in Table 7-4) can also
provide insight in scoping the solution.
Table 7-4 HPE big data reference architectures

Solution                    Reference architecture
HBase                       HPE Verified Reference Architecture for running HBase on HPE BDRA
Cloudera                    HPE Verified Reference Architecture BDRA and Cloudera Enterprise implementation
Hortonworks Data Platform   HPE Big Data Reference Architecture: Hortonworks Data Platform reference architecture implementation
DataStax Cassandra          DataStax Enterprise on HPE Moonshot System with HPE ProLiant m710 Server Cartridges
MapR                        HPE Verified Reference Architecture BDRA and MapR Distribution implementation

Reviewing the need for other solution components

Figure 7-7 Reviewing the need for other solution components


The same requirements for additional big data solution components, which you examined in Chapter 5, "HPE Apollo 4000 for Data-Driven Organizations," hold true for the HPE big data architecture (see
Figure 7-7). In this architecture, though, the compute nodes do not need to connect to the extract
transform load (ETL) network, even if the solution does not use edge nodes. The storage nodes
would handle ingesting data in that case.

Planning the networking connections

Figure 7-8 Planning the networking connections

Providing low latency connections between the compute and the data nodes will help the applications
to perform better. The m710 and m710p cartridges support 10 GbE with RoCE, so if you are using
them, make sure to select the HPE 45XGc switch modules to ensure that the cartridges obtain the 10
Gbps speeds. For all cartridges, you should generally plan bonding or teaming the ports using load
balancing so that the node can use both of its ports. If the customer wants to use Link Aggregation
Control Protocol (LACP) mode, remember to plan stacking or Intelligent Resilient Fabric (IRF) links
between the two switch modules. The compute nodes should all be in the same network, and the
virtual local area network (VLAN) for that network can be applied on the top-of-rack (ToR) switches
rather than on the switch module.

Plan two HPE ToR switches, such as 5930 switches, combined in an IRF fabric to support
nonblocking speeds between the HPE Apollo 4200 servers and cartridges in the rack’s multiple
Moonshot chassis. Figure 7-8 shows an example in which the 45XGc switch modules connect together
on one IRF link (make sure that network administrators implement multi-active detection [MAD] to
avoid a split fabric). They then have six 40GbE links for connecting to the 5930 IRF fabric with an
LACP link aggregation. The 5930 IRF fabric also has two 10 GbE links to each HPE Apollo 4200
server.
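
As a quick sanity check on a design like the one in Figure 7-8, you can compare the aggregate cartridge-facing bandwidth of one chassis with its uplink bandwidth to the 5930 IRF fabric. The following Python sketch uses the port counts from the example above; whether the resulting oversubscription ratio is acceptable depends on the customer's workload.

# Port counts from the example design above; illustrative only.
CARTRIDGES = 45
PORTS_PER_CARTRIDGE = 2
CARTRIDGE_PORT_GBPS = 10
UPLINKS_TO_TOR = 6
UPLINK_GBPS = 40

downstream_gbps = CARTRIDGES * PORTS_PER_CARTRIDGE * CARTRIDGE_PORT_GBPS  # 900 Gbps per chassis
upstream_gbps = UPLINKS_TO_TOR * UPLINK_GBPS                              # 240 Gbps per chassis
print(f"Oversubscription ratio: {downstream_gbps / upstream_gbps:.2f}:1") # 3.75:1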

The 5930 IRF fabric will have at least twelve 40 GbE ports (six on each switch) available for uplinks. If the customer's needs require you to plan multiple racks, the ToR switches can connect on these
uplinks to an aggregation pair of 5930 switches in another IRF fabric.
The network administrator will also need to plan how the management node will link to an external
network and how edge nodes or storage nodes will link to the ETL network. The ToR switches could provide
these links, using VLANs to isolate the networks from the cluster network.

Remember also to include switches with 1 GbE edge ports in the design to provide iLO connections.

Guidelines for testing


The guidelines for testing this solution are similar to those for testing a solution that uses the
traditional big data architecture. They are repeated here for your reference.

Your proof of concept (POC) should match your design as closely as possible, including the HPE
Moonshot 1500 Chassis loaded with the correct components, the HPE Apollo 4000 servers, and the
HPE data center switches.

The HPE Discovery Lab provides you with a secure environment for testing applications on an HPE
Moonshot System. You can access the lab through a virtual private network (VPN) from any location.
To learn more about the lab and to set up a time to use it, visit
http://www8.hp.com/us/en/products/servers/proliant-server.html?compURI=1536877#.VrYR2tsrKM8.

Before you run the test, it is also important that you tune the nodes to better support the application.
Table 7-5 lists HPE reference architecture documents that explain the tuning guidelines. This tuning
will ensure the best results from the test, and you should also recommend that the system integrator
complete the same steps for the final solution so that it operates most efficiently.
Table 7-5 HPE big data reference architectures

Solution                    Reference architecture
HBase                       HPE Verified Reference Architecture for running HBase on HPE BDRA
Cloudera                    HPE Verified Reference Architecture BDRA and Cloudera Enterprise implementation
Hortonworks Data Platform   HPE Big Data Reference Architecture: Hortonworks Data Platform reference architecture implementation
DataStax Cassandra          DataStax Enterprise on HPE Moonshot System with HPE ProLiant m710 Server Cartridges
MapR                        HPE Verified Reference Architecture BDRA and MapR Distribution implementation

You are then ready to test. Benchmarking tools provide generic metrics—for example, the throughput
for reads and writes to the HDFS cluster. Table 7-6 lists some benchmarking tools for big data and
analytics.
Table 7-6 Example benchmarking tools

Solution              Benchmarking tool                      Description
NoSQL databases       Yahoo! Cloud Serving Benchmark (YCSB)  Tests throughput for read/write queries to the database
HDFS                  TestDFSIO                              Tests throughput and average I/O rate for reads and writes to HDFS
HDFS and MapReduce    TeraSort                               Tests time for sorting data (large job)
HDFS and MapReduce    MRBench                                Tests average time for completing many small jobs


Benchmarks might have a role to play in your testing, but you are more precisely attempting to
determine how well your customer's application runs.

Plan several tests using the customer applications with datasets of various sizes, including one that
meets or exceeds the customer's maximum needs. You should also choose tests that place various
demands on the solution, including worst-case scenario demands. For example, for a NoSQL test, you
might run read-heavy tests and write-heavy tests, as well as balanced read-write tests. You should also
test how the NoSQL solution handles a high degree of random I/O requests.

After you run the test, determine whether the execution time and other metrics are acceptable or
whether you need to adjust the solution. The application that you are testing might provide you with
valuable metrics for this purpose. For example, Hortonworks Data Platform (HDP) uses Ambari to
collect and expose metrics; the Cloudera Manager also tracks metrics. Table 7-7 gives examples of
some metrics that you might examine as you test. You can find a complete list of Hadoop metrics at
https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/Metrics.html.
Table 7-7 Example metrics

HBase metrics:
• regionserver.Server.blockCacheEvictedCount: Number of blocks that had to be evicted from the block cache due to heap size constraints. If this stays at 0, all of your data fits completely into the HBase block cache (stored in the cartridge node memory), which is the most desirable case. If you see values too far above 0, you need more compute nodes so that each can handle a smaller amount of data that fits in the memory.
• regionserver.Server.blockCacheExpressHitPercent: The percentage of time that requests with the cache turned on hit the cache. Values under 100 mean that the hot data being processed cannot fit entirely into the block cache. If the number is too far below 100, scale up the number of compute nodes.
• regionserver.Server.storeFileSize: Aggregate size of the store files on disk. Make sure this value is similar on all region servers in order to properly balance the HBase load.
• regionserver.Server.blockCacheFreeSize: Number of bytes that are free in the block cache. This value indicates how much of the cache is being used and is a good indicator of whether your data has been "warmed" by moving it into the cache; a low free value is good.
• regionserver.Server.readRequestCount: The number of read requests received. You can use this metric to see how many requests the solution is handling.
• regionserver.Server.writeRequestCount: The number of write requests received. You can use this metric to see how many requests the solution is handling.
• regionserver.Server.flushQueueLength: Current depth of the memstore flush queue. This metric should stay about the same over time. If it increases, the node is falling behind with clearing memstores out to HDFS.

Metrics for any YARN application:
• QueueMetrics PendingMB and QueueMetrics PendingVCores: The current memory or CPU resource requests that are not yet scheduled. A high number might indicate that you need to scale out the number of cartridges so that they provide more memory or cores.
• QueueMetrics running_0, running_60, running_300, and running_1440: The current number of applications whose elapsed time is less than 60 minutes, between 60 and 300 minutes, between 300 and 1440 minutes, and more than 1440 minutes. You can use these metrics to determine whether jobs are completing in the customer's desired execution time.
• AppsSubmitted, AppsRunning, AppsPending, and AppsCompleted: The number of applications that have been submitted to the resource manager for scheduling, that are running, that are waiting to be scheduled, and that are completed. You can use these metrics to determine whether the solution can handle the required number of jobs. For example, you can see how many applications are running when the number of pending applications begins to reach an unacceptable level.
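
If you want to pull a few of these counters programmatically during a test run, you can poll a region server's JMX servlet. The following Python sketch is illustrative only; the host name, web UI port (16030 is the usual region server default), bean name, and metric names are assumptions to verify against the customer's HBase version.

# Minimal sketch: read a few Server metrics from one HBase region server's
# /jmx servlet. Host, port, bean name, and metric names are assumptions.
import json
import urllib.request

METRICS = ("blockCacheEvictedCount", "blockCacheExpressHitPercent",
           "readRequestCount", "writeRequestCount")

def regionserver_metrics(host, port=16030):
    """Fetch a few Server metrics from one region server's /jmx endpoint."""
    url = (f"http://{host}:{port}/jmx"
           "?qry=Hadoop:service=HBase,name=RegionServer,sub=Server")
    with urllib.request.urlopen(url, timeout=10) as response:
        beans = json.load(response).get("beans", [])
    values = beans[0] if beans else {}
    return {metric: values.get(metric) for metric in METRICS}

if __name__ == "__main__":
    print(regionserver_metrics("rs01.example.net"))  # placeholder host name
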
Chapter 7—Activity 1
In this activity, you will design a big data solution using HPE Moonshot and HPE Apollo 4000
systems. In this plan, you will
• Design a solution to host big data analytics
• Architect the solution to meet the customer's needs, including
– Providing enough storage capacity
– Meeting the compute needs
– Adding HBase

Scenario

The scenario for this activity is similar to that for the Chapter 5 activity.

A retailer operates a chain of grocery stores throughout a region. The company has a great deal of
data about inventory, customers, purchases, and so on. The company is just venturing into big data
solutions and plans to deploy Cloudera Hadoop. The customer wants a more scalable and reliable
way to store data. The customer also wants to start analyzing that data to make more informed
decisions. For example, the customer hopes to learn more about the most loyal customers and the
highest-spending customers so that marketing can make better decisions about how to brand the
company.

The retailer has a relatively small data center with traditional rack servers. The CIO has seen projects
fail before due to outdated infrastructure. She wants to ensure that the new big data solution is a
success and is pushing the purchase of servers specifically designed to meet the needs of such a solution.

However, unlike the previous scenario, this customer is open to your suggestions about the best
architecture for the company’s needs. The customer looks forward to adding more analytics
applications as the company begins to reap the benefits of the insights into the data. The customer
wants to maintain the ability to scale the solution and the flexibility to add more types of analytics
applications.

Workload requirements

You have discussed the workload requirements with the customer and discovered that
• The customer requires 5.23 PB raw storage capacity (which you have calculated using the
replication rate and other guidelines discussed in Chapter 5).
• The customer plans to use MapReduce2 applications to analyze data on a weekly basis. Currently,
the customer has just a few standard queries that it will run each week, and the queries can take
hours to complete.
• In the future, the customer plans to develop applications for faster queries.
Select HPE products

You will propose a solution based on the HPE Big Data Reference Architecture to this customer.

Match an appropriate product to each role in the solution. You can use the same product more than
once. You do not have to use every product.

Products
a. HPE DL360
b. HPE Apollo 4200
c. HPE m700 cartridge
d. HPE m710p cartridge

Solution role
1. Active and standby head nodes
2. MR2 worker nodes (compute nodes)
3. HDFS data nodes (storage nodes)
4. Management node

Also answer this question:


5. The customer wants to distribute data ingestion rather than use an edge node. Which products
require dual-homed connections? Will you need to set up VLANs on the Moonshot switch
modules?

Scope the storage requirements

Record your answers to these questions.


6. How many HPE Apollo 4200 Systems will you propose? More than one answer could be valid,
but think about how you would justify your answer and what you would discuss with the customer
to help you make your choice.
7. When a customer is deploying a new Hadoop solution, you should generally begin with a
balanced rack design for the storage nodes and MR2 compute nodes. How many balanced racks
are required to meet the storage needs?

Scope the compute nodes


8. How many Moonshot chassis will you deploy to fill the balanced racks?
9. Assume that you are proposing 960 GB SSDs for the HPE m710p cartridges. How much data can
each HPE Moonshot chassis store locally? How much data can the compute node cluster store
locally?
10. How many map tasks can the cluster run at once? How many reduce tasks can it run at once?
11. What would be reasons for you to adjust your design to have relatively more or fewer Moonshot
chassis? What might you discuss with the customer?
Run tests
12. You have created a POC with your proposed servers, which you have set up with a solution as
close to the customer's as possible. What guidelines will you follow as you conduct tests?

Describe the benefits of the HPE Big Data Reference Architecture


13. Begin to outline a presentation for winning the CIO and other decision makers to your side using
the HPE Big Data Reference Architecture. (You will learn more about tools that you can use, such as the HPE Alinean TCO/ROI Calculator, in Chapter 10.)

Add HBase

After the initial successful rollout of the Cloudera Hadoop solution, the customer wants to support
faster analysis on smaller, more random datasets. The customer decides to add HBase. You need to
scope a solution for the HBase Region Servers. You discuss requirements with the customer, who
indicates that the dataset size is 100 TB.
14. Which Moonshot cartridges will you plan for the HBase Region Servers?
15. How many Moonshot cartridges will you plan for the HBase Region Servers? How many chassis?

You can check your answers by referring to Appendix B: Answers to Activities.

Video processing
You will next learn how to architect HPE Moonshot solutions to support video processing and
delivery as well as other types of content delivery.

Video transcoding and streaming


Various HPE Moonshot solutions are optimized for audio and video transcoding. Transcoding uses a codec to convert raw, uncompressed audio and video into a compressed digital stream. The codec defines how the signal is sampled, how color is encoded, which bit rate is used for
the stream, and so on. Examples of video codecs include H.265, H.264, H.263, H.262, Microsoft
WMV, and Google VP6.

Customers who require video transcoding solutions include film companies, TV companies, and
other creators of media content. Some content creators deliver the content themselves using over the
top (OTT) streaming, which refers to any delivery of content, including audio and video, directly
through the Internet rather than through a multi-system operator (MSO) such as a cable or broadcast
company.

Some streaming service companies focus on streaming content generated by others. You have
probably streamed video from Netflix, Hulu, and Amazon—all examples of OTT streaming services
—many times. Such companies also require transcoding solutions.

Rather than host their own web services, a company might use a Content Delivery Network (CDN)
solution. The CDN provider, typically located in a data center near a major Internet service provider
(ISP), hosts the company’s web content and guarantees a level of availability and responsiveness for
the web services. As multi-media components have become more common in web services, CDNs
have also had to move into the realm of video processing and streaming.

Types of video transcoding

Figure 7-9 Types of video transcoding

You should learn a bit more about various types of video transcoding, since your architecture
decisions might differ based on the type. Figure 7-9 shows the architecture for a Harmonic
transcoding solution (Harmonic is an HPE Moonshot partner). Other transcoding vendors have
similar architectures.

Live transcoding

A live transcoding solution receives live content, which it must immediately transcode and package
for streaming. The transcoding process is not a simple one-to-one conversion. A video streaming
server must be able to serve a variety of clients with a variety of needs. Different Web browsers use
different codecs to present videos. One client might request the video at high resolution and another at
lower resolution. The server might need to adjust the bit rate for streams, not just for different clients
but also for the same client if the client’s connection speed changes. (OTT companies prefer to
throttle the transmission speed just ahead of the viewer so that they do not pay the content provider
for content that no one views.) The transcoding server must ensure that a properly formatted video
stream is available for all of these clients. (Some refer to transrating for adjusting the rate, transsizing for adjusting the resolution, and transcoding for converting only the format; however, most people refer to any of these alterations more generally as transcoding.)

The application might separate the transcoding and streaming roles, or the same server might play
both. In any case, not only must the transcoding servers transcode and package multiple streams, they
must do so in real-time without the lags that cause users to leave negative comments and look for a
new provider. The demands on the servers’ computational power can be intense.

File-based transcoding
File-based transcoding refers to non-time-sensitive transcoding of video files that are then stored in a
video library rather than streamed immediately. For example, a TV company might have a
transcoding farm to transcode files that will need to be streamed by a multi-screen video on demand
(VOD) solution at a later date. Or a company might need to reformat its entire video library into a
more efficient codec. Or a company might provide disaster recovery services for content creators and continuously receive new video files that it must transcode.

In modern data centers, a transcoding farm works in parallel to transcode the video file, often into
multiple formats with various bit rates and resolutions. The same farm can also package the file into a
streaming format, ready for delivery to end users by a streaming server.

Transcoding types for which HPE Moonshot cartridges are tailored

Figure 7-10 Transcoding types for which HPE Moonshot cartridges are tailored

In order to meet the vast computational demands, video transcoding technology developers have
taken several approaches, evolving beyond simply using the general purpose central processing unit
(CPU) of an x86 machine. Some developers program for custom application-specific integrated
circuits (ASICs), which are hardware architectures specially designed for that application only. These
ASICs can deliver excellent performance, but they require specialized hardware dedicated to the
video transcoding application.

Other developers are programming to make use of graphics processing units (GPUs). As you learned
in a previous chapter, a GPU can accelerate easily parallelized processes, and video transcoding fits
this description. Intel Quick Sync Video technology provides general transcoding and streaming
features that use the Intel Iris Pro GPU. FEI is an open framework for enabling applications to use the
Intel Iris Pro GPU. The software vendor can program within this framework to deliver their own
customized, advanced video transcoding and streaming features.

HPE Moonshot m710 and m710p cartridges, with their Iris Pro GPUs, are optimized for any video
transcoding and streaming application that uses Quick Sync or FEI to make use of the GPU. In fact,
based on HPE tests, they can increase performance up to 20 times per rack unit compared to
traditional servers. They can also work for applications that use the CPU only. Although they do not
provide these applications with as much extra power as they bring to Quick Sync- and FEI-based
ones, they might increase performance up to 4.2 times per rack unit as compared to traditional
servers. The Moonshot solutions are not intended for applications that require custom ASICs.
Both the m710 and m710p provide excellent performance, but the m710p gives a performance boost
of about 20% beyond the m710, allowing it to support more video streams or to transcode files more
quickly.

Figure 7-10 provides examples of HPE partner independent software vendors (ISVs) that use Quick
Sync or FEI. As you see, these vendors also have applications that use the CPU only. You should
investigate the applications that the customer needs HPE servers to support and determine whether they are designed to use GPUs.

HPE Moonshot m800 cartridges provide four nodes, each with an ARM processor and eight digital signal processing (DSP) cores, enabling them to handle transcoding on the CPU. Select these cartridges also for various forms of voice over IP (VoIP) transcoding and telecommunications use cases. Thomson Video Networks is an example of a video transcoding and delivery company that partners with HPE and that has developed a reference architecture using HPE Moonshot m800 cartridges. You can visit
this link to view applications, including video transcoding codecs that can benefit from the m800 DSP
cores: http://www.ti.com/lsds/ti/processors/dsp/applications.page.

Table 7-8 provides more specific information about these cartridges.


Table 7-8 HPE Moonshot cartridges for transcoding
Scoping the number of required cartridges for video transcoding
How you scope the size of the solution depends on the intended workload. The sections below
provide general guidelines that you can use as a starting point. Remember to test your solution, as
described later in this section. Also, you should generally over plan by 20%–25% to allow for
expansion.

File-based transcoding

For file-based transcoding, you will follow similar guidelines to those you use for other parallelized,
non-time-critical applications such as high-performance computing (HPC) and big data analytics.
Consider how many minutes of files need to be transcoded each day and attempt to determine how
large of a transcoding farm is required to handle those files. Customers often measure file-based
transcoding performance in real-time ratio, which is the duration of the source video divided by the
transcoding time. For example, a transcoding farm with a 0.5 real-time ratio would be able to
transcode a 90-minute video in three hours.

The real-time ratio is affected not only by the total compute power and memory delivered by the
solution but also by the complexity of the job. As mentioned earlier, the solution might need to
transcode a video file into several different formats as part of the same job. Therefore, when
discussing needs with the customer, make sure that you understand the type of conversion that they
require.

As a starting point for your estimates, you can refer to results from Harmonic, an HPE Moonshot
partner for file-based transcoding solutions. Harmonic found that three m710 cartridges provided a
real-time ratio of 1 for a transcoding job that output eight different files with different bitrates and
resolutions. In other words, the three cartridges can transcode one hour of video in one hour. If the
customer needed 24 hours of video transcoded a day (assuming 24-hour-a-day operation), the three cartridges could provide it. A fully loaded Moonshot 1500 chassis could transcode 360 hours of
video a day, assuming near linear scaling.
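
You can turn this reasoning into a rough scoping calculation. The following Python sketch assumes a measured real-time ratio per cartridge (for example, 1/3 if three cartridges together achieve a ratio of 1) and adds the expansion headroom suggested earlier; the example daily workload is illustrative only.

import math

def cartridges_for_daily_hours(hours_per_day, ratio_per_cartridge,
                               operating_hours=24, headroom=0.25):
    """Cartridges needed to transcode a daily backlog of source video."""
    hours_per_cartridge_per_day = ratio_per_cartridge * operating_hours
    raw = hours_per_day / hours_per_cartridge_per_day
    return math.ceil(raw * (1 + headroom))  # add expansion headroom

if __name__ == "__main__":
    # If three cartridges achieve a real-time ratio of 1, each contributes 1/3.
    print(cartridges_for_daily_hours(120, 1 / 3))  # 19 cartridges for 120 hours/day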

Live transcoding and streaming

When the customer wants servers for live transcoding, you should scope the size of the solution
based on the number of video streams that the solution must support. The precise number that a
cartridge can support at any moment depends on a variety of factors such as the particular solution,
the codecs, and so on. HPE has tested the m710p as supporting up to 136 HD streams per rack unit and
the m710 as supporting up to 110 HD streams per rack unit. The HPE Moonshot 1500 chassis is 4.3U and has 45 cartridges, so each m710p cartridge supports up to 13 streams and each m710 cartridge, up to 10.

Identifying other CDN workloads for HPE Moonshot


Figure 7-11 Identifying other CDN workloads for HPE Moonshot

HPE Moonshot solutions’ ability to meet customers’ content delivery needs extends beyond video
transcoding, as you see in the examples in Figure 7-11.

The world of online gaming introduces its own complexities. Not only does the server need to deliver
high-resolution graphics, it must do so in an interactive, highly responsive manner. When supporting
one of the popular massively multiplayer online (MMO) games, the server's processing power is
taxed with the need to interact with multiple users, each with their own browser, connections, and
capabilities. Online gaming companies need servers that can meet the requirements while also
permitting easy scaling as the company adds subscribers.

HPE m710p cartridges with their top-of-the-line GPUs are recommended for most gaming
workloads. For customers with less intensive requirements, such as smaller numbers of players or
lower resolution graphics, m700 cartridges can meet the needs. In either case, the HPE Moonshot
System provides the extreme density and scalability that customers require.

Extreme file transfer refers to the transfer of very large files of perhaps several terabytes. These are
often files involved in the production of media content. For example, a media company might need to
transfer digital video files from the place of production to a centralized data center. HPE m710p and
m710 cartridges are optimized for these workloads, as well as for the video processing and
streaming workloads. In fact, the same company might require both.

As you learned, CDN providers deliver content on behalf of other companies. A CDN provider might
need to deliver video, games, or extreme files, in which case, you should plan an HPE Moonshot
solution for that provider much as you would for a company hosting its own delivery services.
Sometimes, the CDN provider is focused on less taxing workloads, such as the delivery of more
traditional web content. If you are designing a solution for this type of CDN, the m700 cartridges can
provide a more cost-effective alternative to the m710 and m710p cartridges.

Planning storage
Figure 7-12 Planning storage

The video processing or other CDN solution often requires external storage. A file-based transcoding
solution reads input files from this storage and outputs files to it. Typically, select SL servers or
Apollo 4000 servers such as the ones that you would use for big data storage nodes, shown here in
Figure 7-12.

Meeting the networking needs for file-based transcoding

Figure 7-13 Meeting the networking needs for file-based transcoding

As you learned in the previous chapter, the switch modules that you select depend primarily on the
selected cartridges. In most cases, you will be using m710p or m710 cartridges, so you should select
the HPE Moonshot 45XGc switches and either HPE Moonshot 16-SFP+ Uplink Modules or 4-QSFP+ Uplink Modules, based on the considerations covered in the previous chapter.

A file-based transcoding solution generates significant traffic between the HPE Moonshot cartridges
and storage. First, you should set up the solution in such a way that cartridges can use both of their
adapters. Typically, you should combine the switch modules in an IRF fabric and set up LACP NIC
bonding on the cartridge node adapters. These adapters support Remote Direct Memory Access
(RDMA) over Converged Ethernet (RoCE) for low latency communications with the storage, and
together they provide 20 Gbps bandwidth.

In a single chassis design, the chassis 40 GbE uplinks can connect directly to storage. (The HPE
reference architecture for Harmonic WFS Xpress calls for this design.) If multiple chassis must
connect to the storage, you can add HPE 5930 ToR switches to the plan to aggregate links for multiple
chassis. And if the solution must extend across racks, you can connect the ToR switches to
aggregation layer HPE 5930 switches or other HPE data center switches, as you see in Figure 7-13.

Meeting the networking needs for live transcoding and other content
delivery

Figure 7-14 Meeting the networking needs for live transcoding and other content delivery

For live transcoding and streaming, as well as online gaming and content delivery networks (CDN),
you must plan a connection to shared storage, as well as a connection to data center switches for
traffic to flow toward external clients.

Because the streams are often destined for end users with limited bandwidth, the streaming files are typically much smaller than the original files being transcoded. A high-definition (HD) stream
typically consumes between 5 Mbps and 20 Mbps. Therefore, even if an HPE Moonshot chassis is
operating at full capacity and supporting 585 streams, only 11.7 Gbps is required. Two 40 GbE links
or even two 10 GbE links, if the customer data center only supports 10 GbE, should be sufficient, as
shown in Figure 7-14. In the latter case, though, you might want to plan for four 10 GbE links (two on
each switch module) for failover situations.
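
The following Python sketch reproduces this back-of-the-envelope bandwidth check, using the per-rack-unit stream counts and per-stream bit rate cited above; treat the output as a starting point rather than a guarantee.

CHASSIS_RACK_UNITS = 4.3
HD_STREAMS_PER_U = {"m710p": 136, "m710": 110}  # from HPE tests cited above

def streams_per_chassis(cartridge_model):
    return round(HD_STREAMS_PER_U[cartridge_model] * CHASSIS_RACK_UNITS)

def required_gbps(stream_count, mbps_per_stream=20):
    return stream_count * mbps_per_stream / 1000

if __name__ == "__main__":
    streams = streams_per_chassis("m710p")       # about 585 HD streams
    print(streams, required_gbps(streams))       # roughly 11.7 Gbps at 20 Mbps each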

For gaming and other CDN workloads, discuss the traffic needs with the customer. However, their
bandwidth needs will probably be less than that for HD video streaming.

You might need to add more uplink bandwidth between the data center network and a Moonshot
chassis that supports cartridges used for extreme file transfer.

Guidelines for testing video processing and content delivery use cases
You should test a cartridge with the customer's application to verify that the cartridge can handle
desired workload under a variety of conditions. As mentioned before, you can perform these tests in
the HPE Discovery Lab. Plan a mix of workloads, including worst-case scenarios, such as all HD
streams for a video transcoding application.

When you assess performance, you should be aiming for a particular metric that the customer has
defined as the required level of performance. For example, for file-based transcoding, you might
calculate the real-time ratio: the length of the video file divided by the length that transcoding takes.
File-based transcoding, such as HPC and big data analytic applications, are designed to operate in
parallel across many servers, so it is important that you test on your planned number of cartridges.
After you assess the real-time ratio provided by these cartridges, you can adjust the number of
cartridges up or down as required. Of course, you might not necessarily adjust down. For example, if
45 cartridges provide a better real-time ratio than the customer requires, you might propose the faster
performance as a benefit of your solution.

For live transcoding, you can monitor CPU, memory, and networking usage, as you add streams to a
cartridge, stopping when one of these resources reaches near maximum utilization. (Monitoring
networking I/O is less important; CPU or memory is most likely to be the bottleneck.) You might also
monitor from the client side because, in the end, it is the user's experience that matters. The customer might have a particular metric related to the end user's experience that the solution must provide. For example, you might need to monitor the wait time to watch time ratio.

The streaming workloads isolate cartridges more than the file-based transcoding workloads. Each
cartridge streams to a number of users on its own. It is still often a good idea to test performance with
a fully loaded chassis. Generally, though, cartridge performance scales linearly within a chassis. In
other words, if one m710 handles 10 streams, a chassis should handle 450 streams.
Chapter 7—Activity 2
You will now complete an activity in which you will
• Design a video transcoding solution for a TV studio company
• Architect a solution to meet the customer's needs, including
– Transcoding up to 300 hours per day
– Storing video library

Scenario

Your sales partner has found an opportunity for selling HPE infrastructure to a TV studio.
Recognizing that more and more viewers are turning to the Internet to watch TV, this studio is seeking
to gain a competitive edge by allowing subscribers to stream their shows on demand. The company
needs a file-based transcoding solution to convert analog video to a digital format and to package it
for streaming. The company has selected Harmonic to provide the software solution, but tests have
shown that the company’s existing general purpose servers cannot handle the transcoding workload.

You must architect a solution to meet the needs.

Workload

The file-based transcoding solution will consist of a Harmonic WFS controller that controls a
transcoding farm of servers. These servers run ProMedia Carbon and ProMedia Xpress, which can
convert video to many different formats. Separate servers in the farm package the transcoded video
for multi-screen video on demand (VoD). The WFS solution manages distributing jobs to servers and
allows for automated and batch processing.

The solution will output video in several bit rates and resolutions using the H.264 video codec.

Requirements

The studio has several channels with 24-hour programming, which needs to be prepared for
streaming in advance of the content’s broadcast date. The studio also has a large backlog of video to
convert for on-demand viewing. Finally, the studio has a number of promotional clips and other
content that must be prepared for streaming. Decision makers have determined that the transcoding
solution must be able to package 300 hours per day. Because Harmonic WFS supports automation and
batch jobs, a day will be considered 24 hours. (The servers can go down for scheduled outages, but
normal operation will be 24 hours a day.)

The company has an FTP server for providing the source files. The company also already has servers
that meet the needs for the combined Harmonic Packager and Origin streaming servers. You must
plan servers to host the transcoding components. You also need to plan a server or servers to support
the video library to which the transcoding farm will output files. The company requires 200 TB of
storage (including duplicated data).
Select cartridge models

Record your answers to these questions based on the scenario above.


1. Select an HPE Moonshot cartridge model for the transcoding farm. Explain your choice.
2. The solution also requires a server for the WFS controller, WFS manager, and SQL database
roles. The controller must meet these specifications:
– Processor: Intel or AMD, 3.0 GHz (can include Turbo); quad-core preferred
– Memory: 12 GB or higher

You want to host the controller in the Moonshot chassis to conserve rack space and deliver a complete
solution with Moonshot. Select a cartridge that meets these needs. You might have more than one
option. Explain your choice.

Run tests

You are now developing a POC. The customer has told you that they want to separate the transcoding
role from the packaging role. The packaging role is less intensive, and the customer tells you to
provide two servers to play that role. But you need to test in order to determine how many cartridges
are required for transcoding. You will begin by testing how long it takes one cartridge to transcode a
file and one cartridge to package the file. (This section gives example test results for the purpose of
the activity only. In the real world, results might vary based on the cartridge that you use and the
customer's workload.)
1. What information do you need to discuss with the customer to set up the test?
2. As you discussed needs, the customer told you that the Harmonic services must be installed on either Microsoft Windows Server 2008 or Microsoft Windows Server 2012 R2. Visit
http://www8.hp.com/us/en/products/servers/management/operating-environments/os-support-
matrix.html and determine which OS is supported.
3. You have discussed the necessary information and set up the POC. You discover that the m710
cartridge running ProMedia Carbon and ProMedia Xpress can transcode one 60-minute video
into all required output formats in 180 minutes. What is the real-time ratio?
4. What real-time ratio does the customer require?
5. How many cartridges should you provide to transcode files?
6. How many Moonshot chassis should you provide? Remember to include the cartridge for hosting
the WFS controller.

Plan storage

Create a plan for how you will provide storage for the solution. Include the connections between the
HPE Moonshot solution and the storage.

You can check your answers to the questions in this activity by referring to Appendix B: Answers to
Activities.
Mobile workspace
In this topic, you will learn about designing HPE Moonshot solutions to support mobile workspace
applications.

Mobile workspace use case

Figure 7-15 Mobile workspace use case

Millennials—who have grown up with constant, ready access to laptops, tablets, and smartphones—
make up an increasingly large segment of the workforce. They—as well as many of their older
coworkers—work most efficiently when they can access their work from any device, whether they
are in their own cubicle, meeting with a coworker, or visiting a customer site. At the same time that
employees want to be able to work over the network on applications and projects that move with them
from device to device, employees also want to avoid the frustration of lags and poor performance.

Many companies have realized that they can benefit from, rather than struggle against, the Bring Your
Own Device (BYOD) trend. In a BYOD environment, users’ devices essentially become terminals for
applications and services that are hosted in the data center. A BYOD environment can help the
customer to save costs through investing less in managed devices. Users can move more freely
through an open workspace, inviting collaboration and again helping the company save money
through smaller space requirements. BYOD can also help to protect sensitive data, because the data is
hosted in the data center.

However, if employees are to gain the benefits of anywhere access, they must truly be able to use any
applications anywhere, as Figure 7-15 suggests. The solution must be flexible enough to cover a wide
range of applications. It must also give users the same high-quality experience that they expect at a
traditional workstation across a variety of devices. Meeting these challenges can be complex, and the
IT staff needs a solution that helps them to manage both users and devices easily without limiting their
choices. The solution must also help to enforce the proper security and isolation, preventing sensitive
corporate data from transferring to a user device improperly.
Different needs on different devices

Figure 7-16 Different needs on different devices

The BYOD or mobile workspace solution must adapt to the fact that users have different needs and
perform different tasks on different devices, shown in Figure 7-16. Users tend to create content on
more traditional, larger devices such as laptops, traditional company-managed desktops dedicated to
them, or company-managed devices shared with coworkers. On tablets, smartphones, and other small
glanceable, pocketable, grab-and-go type devices, users tend to consume content using apps.

Mobile workspace technologies

Figure 7-17 Mobile workspace technologies

Companies can choose mobile workspace solutions that meet the needs of the various ways their
employees work, as you see in Figure 7-17.

Employees such as bank tellers or call center operators tend to work with a few, primarily text-based
applications. Session or application virtualization works well for these types of tasks. A remote
virtualized application runs on a server in the data center. Users receive on-demand access to the
application through a client. This client logs in to the server and accesses the remote application
using a display protocol that lets the users interact with the application much as they would with an
application installed locally. Virtualized applications can cross the consume-and-create spectrum
presented earlier, and they can run on a wide array of devices, including smartphones, tablets, laptops,
and desktops. Citrix XenApp, a common example of a virtualized application solution, is an HPE
Moonshot partner.
Many office workers spend most of their day working with email, spreadsheets, and word processors.
These workers benefit from a virtual desktop infrastructure (VDI) solution. Like application
virtualization, VDI enables users to interact with an environment running on a remote server.
However, with VDI, this environment is a complete OS as opposed to one application. Data center
servers host virtual machines (VMs), each of which is set up with the basic applications and tools that
employees require. Employees then log in to their remote VM from whatever device they choose.

Traditional VDI can deliver good performance, user experience, and a low cost per seat for
employees who use simple office applications, but it falls short of meeting the needs for a large
segment of employees. These employees require access to multiple applications that feature graphics
and multimedia elements. For example, they might be web designers or software programmers. They
might be sales professionals who need to participate in video conferences. The list goes on. All of
these workers require applications that support robust web content, smooth video, and hardware-
assisted graphics—they are applications that fall on the “create” side of the spectrum that you
examined earlier.

Traditional VDI cannot give these users the essential combination of CPU and graphics performance.
A hosted physical desktop solution, such as that provided by Citrix XenDesktop or Leostream (HPE
Moonshot partners), allows users to access a desktop that is hosted in the data center. As with VDI, the
user logs in to the desktop remotely. However, the desktop is not a VM but rather a physical machine
running just that user’s OS. Because the machine has exclusive access to the physical hardware, it can
better run graphics-intensive applications. Hosted physical desktop solutions run on laptops and
desktops, whether those belong to the user or to the company, and whether the physical device is
dedicated to one user or shared. Sometimes the hosted physical desktop solution is called Hosted
Desktop Infrastructure (HDI).

Both VDI and HDI work well with HPE Thin Clients, clients designed to act as terminals for remote
desktops without allowing data to leave the data center.

As an alternative to hosted physical desktops for users who run highly demanding applications such
as medical imaging, computer-aided design/computer-aided manufacturing (CAD/CAM), or oil and
gas simulations, graphics-accelerated VDI can provide a good option.

Technologies for which HPE Moonshot is tailored


Figure 7-18 Technologies for which HPE Moonshot is tailored

Here you see highlighted the two technologies for which HPE Moonshot solutions are tailored:
physical hosted desktop and application virtualization. HPE offers ConvergedSystem solutions for
supporting the other technologies, but this ebook does not cover those solutions.

Figure 7-18 also shows how the relevant technologies compare in terms of scale, cost, security, and
per-user compute power. As you see, the more secure the solution and the more compute power it
provides per user, the greater the cost and the less simple it is to scale the solution. Hosted physical
desktop provides good security and user isolation, as well as significant per-user compute power.
Application virtualization offers the best scalability but less security and lower performance because
it does not isolate individual users or dedicate resources to them.

Selecting the correct HPE Moonshot cartridges


Figure 7-19 Selecting the correct HPE Moonshot cartridges

For a physical hosted desktop solution, such as a Citrix XenDesktop solution or a Leostream solution
with Connection Broker, select HPE ProLiant m710/m710p or m700 cartridges. Often, customers’
physical hosted desktop environments use the CPU for all applications, including rich media ones,
leading to slow performance and a poor user experience. Both m700 and m710 cartridges provide
GPUs, dramatically improving performance for users who need to run media-rich applications such
as video conferencing or design applications, as you see in Figure 7-19. The m710/m710p provides
progressively more power per user and 10 GbE ports. The four-node m700, on the other hand,
provides a higher density solution and 1 GbE ports.

The HPE ProLiant m710/m710p and m700 cartridges are also optimized for application virtualization
solutions such as Citrix XenApp. Once again, the m710 cartridges provide more power, as well as
higher speed connectivity. The m710p provides more power still, particularly for GPU-accelerated
workloads. See Table 7-9 for more specific information about these cartridges.

Unlike the big data and video processing workloads, the mobile workspace workloads do not
typically call for high-capacity local storage or connectivity to external storage. You could add a
smaller SSD to the proposal to hold the image for a local boot, as well as to provide some local
storage for users in a hosted physical desktop solution.
Table 7-9 HPE Moonshot cartridges for mobile workspace solutions
Scoping the number of required cartridges for mobile workspace
applications
Figure 7-20 Scoping the number of required cartridges for mobile workspace applications

You should find it fairly straightforward to scope the number of cartridges required for a physical
hosted desktop solution. (Figure 7-20 provides a summary of guidelines.) The intent of such a
solution is to provide one machine per user, so you simply need to know the number of users who
require desktops—which the customer should be able to tell you. Discuss with the customer decision
makers whether they have included room for expansion, new hires, and so on in their estimate.
Remember that the m710/m710p has one node, so you should plan one cartridge per user. The m700
with its four nodes can support four users. Recommend providing about 20%–25% more cartridges
than are currently required, to allow for future growth.
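
If it helps to make the arithmetic concrete, the short Python sketch below turns those guidelines into a quick estimate. The 600-user figure and the 25% growth allowance are illustrative assumptions, not customer data; the 45-slot figure is the HPE Moonshot 1500 chassis capacity.

import math

users = 600                    # hypothetical customer estimate (assumption)
growth = 0.25                  # 20%-25% headroom recommended above
planned = math.ceil(users * (1 + growth))

m700_cartridges = math.ceil(planned / 4)   # m700: four nodes, so four desktops per cartridge
m710_cartridges = planned                  # m710/m710p: one node, so one desktop per cartridge

print(planned, m700_cartridges, m710_cartridges)   # 750 desktops -> 188 m700s or 750 m710s

Adjust the inputs to the customer's own estimate once you have discussed expansion plans with the decision makers.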

To plan an HPE Moonshot solution to support application virtualization, you need to know the
maximum number of users who will concurrently access the application, as well as the type of
applications. The customer should be able to give you the maximum number of concurrent users
based on user surveys. You should discuss the application types with the customer and classify them
as either normal applications or media-rich ones. Citrix and HPE have tested HPE m710 cartridges
running Citrix XenApp. Based on these tests, one m710 cartridge can support about 50 users who are
using applications such as word processors and spreadsheets. It can support about 40 users who are
running more media-rich applications. The chassis performance scales quite linearly, so 45
cartridges can support about 2250 normal users or 1800 rich-media users.

Media-rich applications include Adobe Photoshop, CAD applications, and other applications that
render 2D and 3D graphics. The m700, m710, and m710p all provide GPUs, but the m710 and m710p
provide progressively more power.

The m710/m710p cartridge GPU supports OpenGL 4.2. Visit
https://en.wikipedia.org/wiki/List_of_OpenGL_programs for a list of more applications that use OpenGL.

As you see, the GPU enables the Moonshot cartridge to support almost as many rich-media users as
normal users—differentiating it from other servers and making Moonshot a great fit for companies
with rich-media users.

Use these values to begin planning the number of cartridges for your customer. Remember, though,
the importance of testing with your customer’s precise application and requirements. Also remember
the best practice of leaving room for 20%–25% growth. For example, your customer needs to
support 4000 rich-media users. You should plan 125 m710 cartridges, nearly filling three Moonshot
1500 Chassis, to support 5000 users as a starting point. You should then test your solution.
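
The sketch below reproduces that worked example so you can rerun it with your own customer's numbers. The per-cartridge user counts echo the HPE/Citrix XenApp test results cited above; the function name and 25% growth default are assumptions for illustration, and any result remains a starting point to validate in a POC.

import math

USERS_PER_M710 = {"normal": 50, "rich_media": 40}   # from the HPE/Citrix XenApp tests cited above
SLOTS_PER_CHASSIS = 45

def xenapp_cartridges(concurrent_users, workload="rich_media", growth=0.25):
    """Starting-point estimate only; validate with a POC on the customer's applications."""
    target = math.ceil(concurrent_users * (1 + growth))
    cartridges = math.ceil(target / USERS_PER_M710[workload])
    chassis = math.ceil(cartridges / SLOTS_PER_CHASSIS)
    return target, cartridges, chassis

print(xenapp_cartridges(4000))   # -> (5000, 125, 3), matching the worked example above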

Planning for the solution infrastructure

Figure 7-21 Planning for the solution infrastructure

The hosted physical infrastructure or application virtualization solution typically includes a few
additional servers beyond the hosted desktops or application servers themselves. These servers help
to manage the solution, set up the sessions, and so on. The example in Figure 7-21 shows that for
Citrix solutions, these servers might include a controller; Provisioning Services (PVS), which
deploys OS; XenMobile, which manages mobile devices and their applications; and Netscaler, a
gateway that helps to optimize application delivery. For Leostream, these servers might include load
balancers and a Connection Broker cluster. The customer should be able to provide you with a list.

Remember to plan cartridges to support these additional servers. The processing demands on these
servers are typically less intense, and they might be able to run as VMs. You can add one or two HPE
m300 cartridges to one of the planned chassis to host these VMs.

Planning the network connections


Figure 7-22 Planning the network connections

For mobile workspace solutions, network bandwidth is the resource least likely to cause a bottleneck,
whether you are using cartridges that support 10 GbE or 1 GbE connections. Even if users are
running very graphics-intensive workloads, their network bandwidth requirements should not exceed
about 30 Mbps. Therefore, an m700 cartridge can easily support even four users on one of its 1 GbE
ports. And a 6-SFP+ Uplink Module should be able to provide more than enough bandwidth for all 45
cartridges, even if you only use some of its ports.

A cartridge supporting application virtualization for many very media-rich users might need a bit
more than 1 Gbps bandwidth. You might want to set up both 1 GbE ports on m700 cartridges for load
balancing. The m710/m710p cartridges, with their 10 GbE ports, should not have difficulty meeting
the requirements (see Figure 7-22).

In this type of solution, most of the traffic flows between the cartridges and the external network that
connects to users. Work with the network architect to plan the uplink bandwidth. For example, for a
chassis with m700 cartridges, you might plan to use two links on each 6-SFP+ Uplink Module to stack
the switches and the other four for uplinks. For m710/m710p cartridges, you might use all four ports
on a 4-QSFP+ Uplink Module, or, if you are using two switch modules, you could use two ports on
each module for IRF and the other two on each module for uplinks.
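
To see how comfortably the uplinks cover this traffic, the sketch below compares the raw downlink speed and the expected per-user load against the planned uplink capacity. The chassis layout in the example (45 m700 cartridges, one active 1 GbE port per node, 30 Mbps per user, eight 10 GbE uplinks) and the helper name are assumptions for illustration; substitute the layout you settle on with the network architect.

def uplink_check(node_ports_gbps, expected_gbps, uplink_ports, uplink_gbps=10):
    """Compare raw downlink speed and expected load against planned uplink capacity."""
    uplink_capacity = uplink_ports * uplink_gbps
    oversubscription = node_ports_gbps / uplink_capacity
    headroom = uplink_capacity - expected_gbps
    return uplink_capacity, round(oversubscription, 2), round(headroom, 1)

# Full chassis of m700s: 45 cartridges x 4 nodes, one active 1 GbE port per node (180 Gbps raw),
# about 30 Mbps per user (180 x 0.03 = 5.4 Gbps expected), and 8 x 10 GbE uplinks
# (two 6-SFP+ modules with two ports each reserved for stacking).
print(uplink_check(node_ports_gbps=180, expected_gbps=5.4, uplink_ports=8))
# -> (80, 2.25, 74.6): 2.25:1 raw oversubscription and ample headroom for the expected load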

Remember to discuss the customer’s availability requirements. If the customer wants to provide link
redundancy for each cartridge node, you must include two switch modules and accompanying uplink
modules, even if they are not required for bandwidth.

Guidelines for testing


The HPE Discovery Lab remains a great resource for running tests and demonstrating the excellent
solution performance to your customer. Set up a chassis in the intended configuration and plan your
tests.

For a hosted physical desktop solution, you are primarily testing your selected cartridge. Load the
cartridge with the applications that the customer has told you users require. Then log in to the hosted
desktop and run each application, attempting to use the application as much like a user as possible.
Run multiple applications at once as users tend to do. Use the OS tools to monitor the CPU and
memory. Also assess your experience: Is the application responsive? Does it lag? If you detect any
issues or overutilization of the CPU or GPU, you might propose an HPE ProLiant m710 or even
m710p rather than an m700.
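
While you run these manual tests, it can help to capture utilization programmatically rather than only watching graphs. The minimal sketch below assumes a Python environment with the psutil package on the test cartridge; it is not part of any HPE or Citrix tooling, just one way to record peak CPU and memory while a tester exercises the hosted desktop. GPU utilization would still be checked with the GPU vendor's own tools.

import psutil

def sample_utilization(duration_s=300, interval_s=5):
    """Record CPU and memory utilization while a tester drives the hosted desktop."""
    samples = []
    for _ in range(int(duration_s / interval_s)):
        cpu = psutil.cpu_percent(interval=interval_s)   # blocks for interval_s, returns % CPU
        mem = psutil.virtual_memory().percent           # % of physical memory in use
        samples.append((cpu, mem))
    print(f"Peak CPU: {max(c for c, _ in samples)}%  Peak memory: {max(m for _, m in samples)}%")
    return samples

# Start this, then work in the hosted desktop as a real user would for the test duration.
# samples = sample_utilization()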

To test an application virtualization solution, you need to simulate many users running the virtualized
application. Select a tool such as LoginVSI for this purpose. Begin with one cartridge and simulate a
certain number of users, such as 10 or 20. Monitor metrics such as response time—the ultimate
indicator of whether the solution is functioning well. The customer might give you a particular
response time that cannot be exceeded—for example, three seconds. You can also track the other
metrics, including processor utilization, memory utilization, network utilization, disk throughput, and
GPU utilization (for rich application delivery) to get a sense of how many resources the application
instances demand.

Continue to add users until one of the resources (most likely CPU) reaches near full utilization or the
response time becomes too long. You can see that the resource is becoming a bottleneck because its
utilization will stop rising and plateau. You should record the number of users at this point as the
maximum number of users that the cartridge can support. You can then scope out the number of
cartridges required.
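
One way to turn the ramp-up results into a scoping number is sketched below. The three-second response limit, the 85% CPU ceiling, and the sample results are placeholders rather than HPE or Citrix test data; substitute the thresholds your customer gives you and the figures your own LoginVSI runs produce.

def max_supported_users(test_runs, max_response_s=3.0, max_cpu_pct=85.0):
    """Return the largest tested user count that stayed within the thresholds.
    test_runs is a list of (users, avg response time in s, CPU %) tuples."""
    supported = 0
    for users, response_s, cpu_pct in sorted(test_runs):
        if response_s <= max_response_s and cpu_pct <= max_cpu_pct:
            supported = users
        else:
            break
    return supported

# Example results from an incremental ramp-up on one cartridge (illustrative numbers only)
runs = [(10, 0.8, 30), (20, 1.1, 55), (30, 1.6, 72), (40, 2.4, 84), (50, 3.8, 97)]
print(max_supported_users(runs))   # -> 40 users per cartridge for this workload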

You should also test with multiple cartridges to demonstrate how the solution scales.

As you perform the tests, you should also monitor power usage in the iLO CM. One of the primary
benefits of an HPE Moonshot solution is that it helps to reduce power and cooling costs. Therefore,
the relatively low amount of power consumed as the solution delivers high performance can provide
a compelling selling point for your proposal. This recommendation also holds for other types of
Moonshot solutions.
Chapter 7—Activity 3
In this activity, you will plan an HPE Moonshot solution to provide application virtualization to your
customer. To create this plan, you will
• Gather information about the customer
• Plan a mobile workspace solution for the customer with:
– A rapidly growing workforce
– Users who want to use a mix of devices

Scenario
A company that designs specialized hardware for electronics has a rapidly expanding workforce,
particularly in the design department, but also in sales and other departments. The company is
outgrowing its floor space. At the same time, the primarily young and tech-savvy workforce prefers
to use a mix of tablets and traditional desktops to do their work.

One ongoing challenge is that developers working on the go could lose data. The latest debacle
involved a laptop that an employee left behind on the metro in Washington, D.C. Weeks of design
work was lost, costing the company hundreds of thousands of dollars. Worse, if an employee lost
clients’ private data, the company could fail an audit and be fined.

To avoid disasters like this and to boost productivity, the CFO and CISO want to shift the model and
provide employees with thin clients that act as terminals for virtualized applications. But the CIO is
cautious. A pilot program with virtual desktop solutions has been plagued with issues. The IT team
has struggled with capacity and sizing as well as trying to manage multiple vendor relationships.
Support tickets have resulted in finger pointing and plans for moving past the pilot change every time
the stakeholders discuss them.

Record your answers as you review the questions related to the tasks below. You can check your
answers by referring to Appendix B: Answers to Activities.

Gather information
1. What should you discuss with the customer to gain a better idea about the workload and the
solution requirements?
2. What should you discuss to set the CIO’s mind at ease about the solution?

After your further discussions, you have collected the information below.

The company has decided to use Citrix XenApp for the application virtualization solution.

At this point, the pooled resources include


• Microsoft Office Professional Plus 2013, which includes applications such as Word, Excel,
PowerPoint, OneNote, Outlook, Publisher, Access, and Skype for Business
• Adobe Reader XI
• Doro PDF printer
• SolidWorks eDrawings Viewer, a 3-D design application that supports OpenGL on hardware as
well as software
• Internet Explorer
• Adobe Photoshop

The solution must support 4000 users:


• 2000 designers, who use Microsoft Office Professional Plus 2013, including Access and Skype for
Business, eDrawings Viewer, and Adobe Photoshop
• 1000 sales professionals, who use Microsoft Office Professional Plus 2013, including Skype for
Business for video conferences
• 750 marketing employees, who use Microsoft Office Professional Plus 2013 and Adobe Photoshop
• 250 HR, accounting, and receptionists, who use Microsoft Office Professional Plus 2013

The company does not require redundancy for individual application server links. However, link
redundancy is required for the solution as a whole.

Because many employees will use design applications, which involve large files, the company would
prefer 10 GbE for the servers. The chassis can connect to the datacenter network infrastructure on 40
GbE links; the customer would prefer to consolidate ports as much as possible. The customer
network administrators tell you to ensure that uplinks have no more than 8:1 oversubscription.

The company plans to use hosted applications that are installed on the VDA servers locally. Each
server will need at least 90 GB of local storage for the server image, for installed applications, and
for temporary files while hosting sessions. The servers will also need to connect to a NAS, which the
company already has set up.

Plan the solution


1. Which applications will benefit from GPU acceleration?
2. Which type of HPE Moonshot cartridge should you recommend for the XenApp Virtual Desktop
Agent (VDA), which supports the user sessions?
3. What is your initial estimate for the number of cartridges required for the XenApp VDA servers?
4. Remember also that the company is growing. What percentage does HPE recommend adding to
the solution to account for this growth? How many cartridges total should you recommend for
VDA servers?
5. The company also requires three servers to host VMs for the XenApp controller and Netscaler.
Which cartridge provides the best choice for hosting these VMs?
a. m300
b. m700
c. m800
6. With the three additional cartridges, how many total cartridges have you planned? How many HPE
Moonshot 1500 chassis are required?

Plan the networking

Based on the requirements that you gathered, plan the following:


• Switch modules
– Type of switch module
– Number of modules per chassis
• Uplink modules
– Type of uplink module
– Number of modules per chassis
• IRF or stacking
– Do you plan to use this feature?
– If you do, which modules will you combine and how will you link the modules?
• Number of uplinks to connect

Plan provisioning
1. Server administrators explain that they plan to use WDS to provision cartridges with their
Windows Server 2012 R2 images. They will then use Citrix Machine Creation Services (MCS) to
deliver an image with proper applications to the VDA servers. MCS creates a thin-provisioned
clone of a master image, which the hypervisor on the VDA server hardware uses to create a VM
for the VDA server. The alternative solution is Citrix Provisioning Server (PVS), which uses a
dedicated server to deploy images to physical servers or VMs. Why might you recommend that
the customer use PVS instead of MCS?

Run tests
1. You have set up a POC with your proposed cartridges and deployed the proper OS and customer
applications. What guidelines will you follow as you conduct tests?

Web infrastructure
HPE Moonshot can deliver an all-in-one, scale-out Web infrastructure solution. HPE Apollo 2000
solutions (another family of HPE density-optimized servers) can also meet the requirements for such
a solution.

Web infrastructure hosting demands


Web services might not impose the same processing and memory requirements as the other
workloads that you have examined in this chapter. And you and your customers are probably very
familiar with designing solutions for them. However, the changing world has imposed new demands.
More and more users are living online—always connected through a smartphone. For many
companies, their web presence is becoming an increasingly important revenue generator, whether in
generating sales, subscriptions, advertisement revenue, social media presence, customer loyalty,
brand awareness, or a mix of many purposes. To continue to enjoy the benefits, customers need to
be able to scale out their web services as quickly as demand grows, and they must achieve the scale
out quickly, simply, and cost effectively.

Many customers are considering a public cloud Web hosting solution to provide the on-demand
scalability that they require. However, they hesitate to lose the control and security that comes with
having their own dedicated hardware.

HPE density-optimized solutions

Figure 7-23 HPE density-optimized solutions

HPE density-optimized solutions allow customers to balance these demands, as you see in Figure 7-
23. They retain control of the infrastructure, but they can scale out efficiently, compressing the
infrastructure that used to require racks to one or a few chassis. If the customer wants the true cloud
experience, you can even propose an HPE Helion CloudSystem solution to control the Moonshot
solution. These solutions are discussed in Chapter 9, “Monitoring and Managing HPE Solutions.”

Selecting the correct HPE Moonshot cartridges for web infrastructure


Figure 7-24 Selecting the correct HPE Moonshot cartridges for web infrastructure

Use HPE ProLiant m300 or m350 cartridges for the web infrastructure (see Figure 7-24). The m300
supplies greater compute power and memory. However, the m350 also provides good performance
for this type of workload, and it delivers higher density with four nodes per cartridge. Review the
information in Table 7-10 for more details about these cartridges.
Table 7-10 HPE Moonshot cartridges for web infrastructure

Cartridge: m300 | m350

Workload: Web infrastructure | Higher density web infrastructure

CPU
Number of processors: 1 | 4
Processor type: Intel® Atom™ Processor C2750 | Intel Atom Processor C2730
Frequency per core: 2.4 GHz | 1.7 GHz
Cores per processor: 8 | 8

GPU: — | —

Memory
DIMM type: 1600 MHz DDR3 UDIMM, ECC | 1600 MHz DDR3 SO-DIMM, ECC
Capacity: 32 GB (4x8 GB) | 64 GB (8x8 GB); 16 GB per processor

Network
Integrated NIC: 2x 1 GbE | 8x 1 GbE (2x 1 GbE per processor)
Intra-cartridge: — | —
Cartridge-to-cartridge: — | —

Storage
Local: One of: 500 GB SATA HDD, 1 TB SATA HDD, 240 GB SATA SSD, or 32/64 GB M.2 2242 SSD | Either: 4x 32 GB M.2 2230 SATA SSD or 4x 64 GB M.2 2230 SATA SSD
External capabilities: iSCSI software initiator | iSCSI software initiator

Power: <30 W | <60 W

Supported OS (see the latest at http://www8.hp.com/us/en/products/servers/management/operating-environments/): Ubuntu 13.10, RHEL 6.5, RHEL 7, SLES 11 SP3, Windows Server 2012 R2 | Ubuntu 14.04.4, RHEL 6.5, SLES 11 SP3 (plus PLDP for the 1 Gb NIC), CentOS 6.5, Windows Server 2012, Windows Server 2012 R2

Reference architecture for a three-tier RHEL solution

Figure 7-25 Reference architecture for a three-tier RHEL solution

An enterprise-scale web infrastructure solution typically has three tiers. Load balancers receive the
incoming requests and distribute them to available web servers. The web server sets up the HTTP or
HTTPS session with the client, sends web content, and otherwise interacts with the client. The web
servers communicate with backend databases that store content for the solution.

Figure 7-25 shows an example architecture for Red Hat Enterprise Linux (RHEL) that fits in one HPE
Moonshot chassis. It features four HAProxy load balancers, 19 Tomcat or Geronimo Web or
application servers, and 22 MySQL database servers. Each server is installed on one m300 cartridge
with an identical configuration. You would adjust this solution based on the customer’s workload and
needs. For example, a website might generate a great deal of traffic but have less content, such as user
accounts that need to be stored in a database. In that case, you might deploy more web servers and
fewer databases.

HPE has tested the example architecture as supporting 2400 operations per second and maximum
response times of 2.5 seconds with up to 115,000 users.

The storage and networking design for this solution are fairly straightforward. Choose an
appropriate size SSD for the databases, based on the database size. Due to the scale-out approach, the
customer might not require a second switch module and redundancy for the cartridge node adapters.
However, be sure to discuss the bandwidth requirements and whether the servers might need to use
both of their 1 GbE ports. The networking solution probably does not require advanced features, so
you can propose the 45G switch (or the 180G switch if you are proposing the HPE ProLiant m350
cartridges).

Expanding the solution

Figure 7-26 Expanding the solution

If the customer needs to support more than 115,000 users, you need to expand the solution; otherwise,
the response time will grow too large. You can scale out the number of cartridges configured as web
servers. However, you shouldn’t scale out the MySQL servers because structured databases, as you
learned in Chapter 3, “Advanced Architectures for Server Solutions,” work better with a scale-up
approach. Instead, add other HPE ProLiant servers such as DL or BL servers to the plan to provide the
database services, as you see in Figure 7-26. (You learned how to architect such solutions in
prerequisite training.)

HPE Apollo 2000 hyperscale web infrastructure

Figure 7-27 HPE Apollo 2000 hyperscale web infrastructure

Alternatively, you can create a hyperscale web infrastructure using HPE Apollo 2000 Systems. These
systems do not provide the ultra-high density of the HPE Moonshot Systems; however, they provide
more powerful processors and greater amounts of memory.

Each Apollo 2000 chassis can house four HPE ProLiant XL170r servers. As you learned previously,
these servers support high-end Intel Xeon processors and are powerful enough for smaller scale
HPC. At the same time, these powerful systems are twice as dense as traditional 1U rack servers. As
you remember, these systems also provide a generous amount of storage. The HPE Apollo r2200
chassis supports three SATA or SAS HDDs or SSDs per XL170r server, and the HPE Apollo r2600
chassis supports six per XL170r server. The HPE Apollo r2800 supports 24 drives like the r2600;
however, you can choose how many you want to allocate to each server.

As shown in Figure 7-27, the Apollo 2000 Systems give you the density and power to scale out an
architecture such as the one that you examined earlier. To support higher numbers of users, choose
dual-socket systems with higher numbers of cores and greater memory for the MySQL servers.

Alternative reference architecture using NoSQL databases

Figure 7-28 Alternative reference architecture using NoSQL databases

The customer might choose to use NoSQL databases as opposed to SQL databases. The NoSQL
databases work well with a scale-out approach and can enable the web infrastructure to handle many
concurrent transactions at the same time.

In the reference architecture for the NoSQL database solution, the cartridges run CentOS 6.5, and
operate in a three-tier application architecture, illustrated in Figure 7-28:
• Two cartridges host one VM each and act as redundant load balancers. Note that most of the
cartridges use NIC bonding and load balance traffic out both of their ports. However, these
cartridges use one NIC for connecting to the external network and the other NIC to exchange
heartbeats. The switch modules would need to apply VLANs to isolate the external network ports
and the heartbeat ports. The load balancers send users to one of the Web servers.
• Three cartridges host two VMs each, one NGINX Web server and one Node.js API server. The Web
servers run API servers, which contact the NoSQL servers.
• Three cartridges host one VM each—a Couchbase NoSQL server. Like the NoSQL servers
introduced earlier in this chapter, these servers act as a cluster in which all of the nodes are active
and can respond to requests.

The reference architecture also includes three cartridges to provide backend services:
• One cartridge acts as the preboot execution environment (PXE) and Dynamic Host Configuration
Protocol (DHCP) server to help deploy the OS to other cartridges.
• One cartridge hosts two VMs, a Lightweight Directory Access Protocol (LDAP) server, and a
monitoring server.
• One cartridge hosts one VM and a secondary LDAP server.

You can scale out the number of Web/API servers and NoSQL servers based on the number of users
and transactions that the customer must support. For the NoSQL servers, make sure to choose SSDs
of an appropriate size.

Guidelines for testing


When you set up your POC in your own lab or the HPE Discovery Lab, you will need to choose a
tool to test the solution. Apache JMeter tests application loads by simulating users logging in,
requesting pages, and performing actions. You should set up the tool to test according to the load that
your customer expects. Monitor metrics such as throughput, which indicates the number of requests
divided by the total time. For example, you run the test for 10 minutes (600 seconds), and the solution
responds to 1,440,000 requests. The throughput is 2400 requests per second.
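
As a quick check on that arithmetic, the sketch below reproduces the calculation from your own test summary; the variable names are placeholders, and JMeter reports the same figure in its own results.

total_requests = 1_440_000      # requests answered during the run
duration_s = 10 * 60            # a 10-minute test

throughput = total_requests / duration_s
print(f"Throughput: {throughput:.0f} requests per second")   # -> 2400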

You should also pay attention to the maximum and average response times. You do not want the
average response time to exceed about 2 seconds, or users will have a poor experience. If
response times are too high for the number of concurrent users that the customer requires, remember
the guideline about moving SQL databases to rack or blade servers while keeping the Web server in
the Moonshot chassis.

As mentioned in earlier sections, you should also monitor power during the tests so that you can
demonstrate to the customer the potential savings in operating costs.
Chapter 7—Activity 4
In this activity, you will demonstrate how the HPE Moonshot solution provides greater performance
per dollar, per Watt, and per rack unit than a traditional server solution for video transcoding. You
could demonstrate similar results for other scenarios, but you will not take the time to do so now. You
can check your answers by referring to Appendix B: Answers to Activities.
1. Review Table 7-11, which compares an HPE Moonshot solution with a traditional rack solution,
based partially on HPE and Harmonic tests.
2. Divide performance by rack units to obtain performance/U for each solution. Do the same for
performance/Watt and performance/$.
3. Next compare the two. Divide the Moonshot performance/U by the traditional performance/U
value to see how much more performance the Moonshot solution packs into a rack unit.
Table 7-11 HPE Moonshot transcoding performance

Solution: HPE Moonshot solution (1 chassis with 43 ProLiant m710 cartridges) | Equivalent traditional server solution (14x DL360 Gen9 servers)

Performance (real-time transcoding ratio, or minutes transcoded per minute): 14.1 | 14.4

Rack units: 4.3 | 14
Performance/U: _____ | _____
Moonshot provides _____ times more performance/U

Power consumption: 3195 W | 5878 W
Performance/W: _____ | _____
Moonshot provides _____ times more performance/W

Cost: $333,462 | $514,643
Performance/$: _____ | _____
Moonshot provides _____ times more performance/$
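
If you want to check your arithmetic before turning to Appendix B, a short script such as the sketch below organizes the divisions; the values are copied from Table 7-11, and the dictionary layout is simply one convenient way to hold them.

# Values copied from Table 7-11; run this to check your arithmetic against Appendix B.
solutions = {
    "Moonshot (43x m710)": {"perf": 14.1, "rack_u": 4.3, "watts": 3195, "cost": 333462},
    "Traditional (14x DL360 Gen9)": {"perf": 14.4, "rack_u": 14, "watts": 5878, "cost": 514643},
}

ratios = {
    name: {
        "performance/U": s["perf"] / s["rack_u"],
        "performance/W": s["perf"] / s["watts"],
        "performance/$": s["perf"] / s["cost"],
    }
    for name, s in solutions.items()
}

for metric in ("performance/U", "performance/W", "performance/$"):
    moonshot = ratios["Moonshot (43x m710)"][metric]
    traditional = ratios["Traditional (14x DL360 Gen9)"][metric]
    print(f"Moonshot provides {moonshot / traditional:.1f} times more {metric}")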

Summary
This chapter has guided you through architecting HPE Moonshot solutions tailored for specific
workloads. It has helped you to consider the solution as a whole, pointing out when the architecture
calls for other HPE servers such as HPE Apollo 4000 servers to provide storage for a big data
solution. You can now follow best practices in designing flexible, powerful architectures for big data
and analytics workloads. You can deploy HPE Moonshot solutions that support video transcoding and
streaming. You can also design systems that support the application virtualization and hosted physical
desktop solution that customers need in a mobile workspace and BYOD environment. Finally, you
examined architectures for a complete three-tier web infrastructure in a Moonshot chassis.
Learning check
Review what you have learned by answering these questions. Then check your answers in Appendix
A: Answers to Learning Checks.
1. What distinguishes the HPE Big Data Reference Architecture from a traditional big data
architecture?
a. The HPE architecture relies on a scale-up approach to compute resources.
b. The HPE architecture features separate compute and storage nodes, each optimized for their role.
c. The HPE architecture brings compute to data by co-locating compute resources on storage nodes.
d. The HPE architecture ensures that YARN applications always run on the same server that stores the data to be analyzed.

2. A customer needs a high-density hosted physical desktop solution. Why might the m700
cartridge provide a better solution for this customer than an m710 cartridge?
a. This cartridge has a two-port 10 GbE adapter, so it can stream more data to users.
b. This cartridge has a more powerful GPU than the m710, enabling it to support more media-rich applications.
c. This cartridge provides more memory than the m710, enabling it to support more users.
d. This cartridge provides four nodes, so each cartridge can support four physical desktops.

3. For which type of video transcoding are HPE m710 and m710p specifically designed?
a. Video transcoding that relies on the CPU alone
b. Video transcoding that uses Intel Quick Sync or FEI GPU acceleration
c. Video transcoding that is designed for hardware processing with custom ASICs
d. Video transcoding that is optimized to use DSP cores

For answers, See Chapter 7 in Appendix A.


Chapter 8 HPE Integrity Superdome X
Solution

EXAM OBJECTIVES
• Explain the benefits of the HPE Integrity Superdome X and describe its available options
• Explain the benefits of nPar and RAS features for HPE Integrity X solutions
• Position HPE Integrity Superdome X solutions for the right use cases
• Create an implementation plan for HPE Integrity X solutions, including plans for the proper
performance, scalability, fault tolerance, high availability, and manageability

Assumed knowledge
Before reading this chapter, you should have a basic understanding of the following:
• Architectural concepts such as scale-up and scale-out
• Processors, DDR3 and DDR4 memory, hard disk drives (HDDs), solid-state drives (SSDs), and
RAID levels for storage volumes
• HPE ProLiant rack and blade servers and options for them such as HPE Smart Array Controllers
• HPE BladeSystems, including interconnect modules and Virtual Connect (VC) modules
• Server management and maintenance, including experience with Integrated Lights Out (iLO),
Intelligent Provisioning, UEFI, HPE Insight Remote Support, HPE Insight Online, HPE Smart
Update Manager (SUM), and HPE Insight Control server provisioning (ICsp)
• HPE OneView capabilities
Chapter topics
After the HPE Integrity Superdome overview, you will examine appropriate use cases for the HPE
Integrity Superdome X solution. You will also learn more about the solution architecture and meeting
Reliability, Availability, and Serviceability (RAS) requirements.

HPE Integrity Superdome X overview


In this topic, you will learn about the capabilities and features that Integrity Superdome X offers.

HPE Integrity Superdome X

Figure 8-1 HPE Integrity Superdome X

The HPE Integrity Superdome X Server, shown in Figure 8-1, is the ideal platform to support the
most demanding mission-critical business processing and decision support workloads. Superdome X
blends x86 efficiencies with proven HPE mission-critical innovations, delivering mission-critical
reliability, availability, and serviceability (RAS) features with high-performance and cost-effective,
industry-standard x86 computing power. Superdome X provides superior single-scale Online
Transaction Processing (OLTP) performance, scaling performance 1.7 times as the system expands
from two to four or from four to eight sockets. Superdome X also delivers the same 1.7X scaling
when consolidating multiple workloads all the way out to its maximum, industry-leading 16-socket
size.
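
To put the scaling factor in concrete terms, the short sketch below simply compounds the published 1.7x figure as the socket count doubles; treat the output as an illustration of the stated claim rather than as a benchmark result.

relative = 1.0
for sockets in (2, 4, 8, 16):
    print(f"{sockets:>2} sockets: about {relative:.2f}x the 2-socket baseline")
    relative *= 1.7   # published 1.7x scaling factor per doubling of sockets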

Extreme scalability and performance


Figure 8-2 Extreme scalability and performance

Superdome offers extreme scalability, as you see in Figure 8-2. It supports up to 16 Intel® Xeon®
Processor E7 v3 and E7 v2 Family processors. Its 384 DIMM slots can support up to 24 TB of DDR4
memory (Gen9), providing a large memory footprint for the most demanding applications. Other
scalability features include
• Unique x86 hard partitioning, which optimizes resource utilization by enabling different
environments within a single enclosure.
• 1.7 times scalability factor as sockets double
• 1–8 scalable two-socket blades, providing a total of 2–16 sockets and a core count of 20–240

In addition to the 1 million jOPS, the Superdome X delivers up to nine times the performance of
HPE’s current eight-socket offering, the DL980 G7 system.

Comparison tests have revealed the Superdome X to be


• Number one in overall performance at 10,000 GB scale factor on nonclustered TPC-H benchmark
• Number one in overall 8-socket price/performance at 10,000 GB scale factor on nonclustered
TPC-H benchmark
Note
TPC-H results show the HPE Integrity Superdome X with a result of 780,346.9 QphH @ 10,000 GB and $2.27/QphH @ 10,000 GB
with system availability as of February 3, 2016; see tpc.org/3317. The TPC believes that comparisons of TPC-H results published
with different scale factors are misleading and discourages such comparisons.

• Number one in x86 16P SPECjbb2015-MultiJVM max-JOPS and critical-JOPS benchmark results
• Holder of the x86 16P world record on the SPECjbb2015 Multi-JVM benchmark
Note
The Standard Performance Evaluation Corporation (SPEC) results are from February 22, 2016; see spec.org.

• Highest performing 16-processor result on the two-tier SAP® Sales and Distribution (SD) standard
application benchmark of 100,000 SAP benchmark users and 545,780 SAPS
Note
HPE received certification on February 15, 2016, from SAP SE of the results of Superdome X on the two-tier SAP® SD standard
application benchmark performed in Houston on February 9, 2016. The Integrity Superdome X has achieved the leadership 16-
processor result on the two-tier SAP Sales and Distribution (SD) standard application benchmark of 100,000 SAP benchmark users
and 545,780 SAPS. To achieve this result, the Integrity Superdome X used 16 Intel® Xeon® Processors E7-8890 v3 at 2.5 GHz and
4 TB of memory running Microsoft Windows Server 2012 R2 Datacenter Edition, Microsoft SQL Server 2014, and SAP enhancement
package 5 for the SAP ERP application 6.0. See sap.com/benchmark for up-to-date information.

HPE Integrity Superdome X enclosure

Figure 8-3 HPE Integrity Superdome X enclosure

The HPE Integrity Superdome X enclosure, shown in Figure 8-3, is 18U high and supports the
following components:
• One to eight full-height two-processor HPE BL920s Gen8 or Gen9 server blades with
– Two multi-core Intel Xeon E7 v2 (Gen8) or E7 v3 (Gen9) series processors
– 24 DIMM sockets per CPU socket and a maximum of 48 DIMMs per blade
• Superdome midplane with lower (c7000) and upper midplanes, all in one
• Four XBAR Fabric Modules (XFMs)
• Two Global Partition Service Modules (GPSMs)
• Two Superdome 2 Onboard Administrator (OA) modules
• One Insight Display (also known as the LCD)
• One DVD module
• One to eight c-Class half-width interconnect modules used to connect the on-blade I/O to the
external network and storage arrays
• 12 2450W hot-swap power supplies
• 15 active cool hot-swap fans
• Two single-phase or three-phase AC input modules to connect to the power supplies
• A pull tab with enclosure model and serial number and the UUID, located at the top right of the
enclosure under the power supplies

Enclosure architecture

Figure 8-4 Enclosure architecture

HPE Integrity Superdome X provides an upper midplane through which blades connect to each other.

The right side of Figure 8-4 zooms in on one blade. As you see, the blade has two processors,
each connected to its own memory (up to 24 DIMMs) through scalable memory buffers (SMBs). The
processors connect to each other, as well as to the eXternal Node Controller (XNC2) controller, on 8
Gigatransfers per second (Gt/s) QPI links. The XNC2 controller then connects on eight 5Gt/s links
through the upper midplane to the four sx3000 XFMs. The XFMs connect to XNC2 controllers on
blades in other slots, creating the extremely high-speed fabric required for the blades to operate as
one massively scale-up system.

Each blade’s processors can access the memory on other blades, allowing the system to scale up to 24
TB (3 TB on each blade). The processor experiences slightly higher latency for memory on other
blades, creating a Non-Uniform Memory Access (NUMA) situation; however, the Superdome X
design offsets this issue with its high-speed links. The OS might also be NUMA aware and capable of
compensating for the latency discrepancies (for example, Microsoft Windows Server 2012 R2).

The XNC2 controllers are custom-designed Application-Specific Integrated Circuits (or ASICs) that
provide these features:
• Physical address support for up to 64 TB of main memory (well exceeding the system’s current
maximum)
• Large Remote Tag Cache for scalable coherency
• Link-level retry, link-width reduction, and end-to-end retry to provide fault tolerance from fabric
errors
• Management link for reading/writing ASIC registers to perform real-time error analysis

HPE Integrity Superdome X cabinets

Figure 8-5 HPE Integrity Superdome X cabinets

The HPE Integrity Superdome X general purpose server product family, shown in Figure 8-5,
includes HPE Integrity Superdome X servers configured as 2S (one-blade), 4S (two-blade), 8S (four-
blade), and 16S (eight-blade) models, each configured as one nPartition.

HPE Integrity Superdome X options

Figure 8-6 HPE Integrity Superdome X options

Figure 8-6 and Figure 8-7 provide a summary of the HPE Integrity Superdome X options.
Specifically, each BL920s Gen8 or Gen9 Server Blade includes up to:
• Two processors:
– Gen9 = Intel Xeon E7 v3 processors

Processor options range from 4-core 3.2GHz processors to 18-core 2.5GHz processors with several
options in between.
– Gen8 = Intel Xeon E7 v2 processors

Processor options range from 4-core 3.2GHz processors to 15-core 2.5GHz processors with several
options in between.
• 48 DIMM slots

DIMM slots are filled in multiples of 16. Gen9 blades have DDR4 load-reduced DIMMs (LRDIMMs)
of 16 GB, 32 GB, or 64 GB each. The Gen8 servers use DDR3 registered DIMMs (RDIMMs) of 16
GB or 32 GB.
• Two FlexLOM slots

FlexibleLOM adapters include 10 GbE and 20 GbE Converged Network Adapters (CNAs) that support
Fibre Channel over Ethernet (FCoE) and Remote Direct Memory Access (RDMA) over Converged
Ethernet (RoCE). Other options include FC Host Bus Adapters (HBAs) and InfiniBand Host Channel
Adapters (HCAs).
• Three mezzanine slots:
– One PCIe Gen3 x8 Mezzanine (Type A) slot
– Two PCIe Gen3 x16 Mezzanine (Type B) slots

Because each enclosure can house up to eight blades, a Superdome X server can have:
• Up to 16 processors
• Up to 24 TB memory (384 DIMMs of 64 GB each; only Gen9 blades support the 64 GB DIMMs)
• Up to 16 FlexLOM adapters
• Up to 8 Type A mezzanine slots and 16 Type B mezzanine slots

The Superdome X enclosure also supports:


• Two OAs
• Eight I/O Interconnect Bays

These bays can contain 1 GbE switch modules, 10 GbE switch modules, 10 GbE pass-thru modules,
16 Gb Fibre Channel modules, and InfiniBand Interconnect Modules.
• Four XFMs
• Two GPSMs

The GPSMs manage CAMNET distribution to all server blades and XFMs. They also provide the
redundant global clock source. The OA also uses the GPSMs to manage the fans and power supplies
in the upper section of the enclosure.
• A CAMnet Complete module (CCM)

A CCM is a filler blade. The Superdome X enclosure requires a blade server in bay 1 and a blade
server or CCM in bay 2 or bay 3 to provide a redundant manageability fabric from GPSMs to OAs. If
a customer orders an enclosure with just one BL920s, a CCM in bay 2 is automatically added to the
configuration.

HPE Integrity Superdome X options (cont.)

Figure 8-7 HPE Integrity Superdome X options (cont.)

The enclosure further supports 15 fans, providing N+1 redundancy (or more depending on how many
blades are installed) and 12 power supplies, providing N+N redundancy (twice the number of
required supplies).

The enclosure provides a DVD module, which is shared by all nPartitions. For local provisioning and
troubleshooting, it also provides a serial, USB, and video (SUV) dongle cable. The SUV dongle cable
can connect to a blade SUV port, enabling administrators to connect a crash cart or direct-attach USB
DVD.

Power distribution options


The Superdome X enclosure supports three main types of power distribution configuration.
• In a Single Phase only configuration, single phase power cords connect the 12 power supplies
directly to wall sockets. Customers can use single phase power distribution units (PDUs) to
aggregate power cords, decreasing the required number from 12 to 4.
• In a Single and Three Phase mix configuration, single phase cables connect single phase power
supplies to PDUs; the PDUs then use three phase cables to connect to the customer supplied
receptacles.
• The Three Phase only configuration uses three-phase cables to connect three-phase power supplies
directly to the customer supplied receptacles.

BL920s Gen9 Server Blade at a glance


Figure 8-8 BL920s Gen9 Server Blade at a glance

The two-processor BL920s Gen9 server blade (shown in Figure 8-8) provides the processing and
memory resources for the HPE Integrity Superdome X server platform. The blade enclosure supports
from one to eight of the two-socket HPE BL920s Gen9 server blades. (Gen8 server blades are also
available but, because they are less of a focus for this chapter, you will not examine the blade
hardware in detail.)

The HPE BL920s Gen9 server blade also has the following components:
• 48 slots for buffered DIMMs and their associated memory buffer chips and VRMs
– Each processor module is associated with 24 DIMMs
• One XNC2 Node Controller ASIC that serves as the interface between the processors on the server
blade and the processors on the other server blades
• Manageability logic: Processor Dependent Hardware (PDH) chip, PDH Controller (PDHC) chip,
and Local Power Management (LPM) chip
• HPE iLO 4 remote management processor
• One Platform Controller Hub ASIC that serves as the interface between processor 0 and the HPE
iLO 4 processor
• Connectors for three mezzanine cards (one Type A and two Type B)
• Connectors for two FlexLOM adapter cards that provide NIC capability to the server blade
• Connectors for the upper midplane and the lower midplane
• Front panel with status LEDs
• SUV interface connector (under a flip-open cover)

Blending trusted HPE Integrity Superdome 2 with standard x86 design


Figure 8-9 Blending trusted HPE Integrity Superdome 2 with standard x86 design

Now that you have examined the Superdome X enclosure and the blades that fit in it, you are ready to
examine the system as a whole. HPE has designed Integrity Superdome X to deliver the trusted HPE
Integrity Superdome RAS features on a standard x86 architecture. Figure 8-9 lists some of the RAS
features that were previously only available on UNIX systems, but that Superdome X delivers on x86.
In the next sections, you will examine each of the features—and more—beginning with hard
partitioning.

Hard partitioning with nPars

Figure 8-10 Hard partitioning with nPars

All blades in a Superdome X enclosure can work together as a single server with a single OS. The
HPE nPars technology also allows customers to divide a single physical server into a set of
independent units, or partitions, shown in Figure 8-10.

Unlike virtual partitioning, hard partitioning provides complete electronic isolation, thus providing
strong workload isolation and more efficient maintenance cycles. The partitions can boot
independently of each other, and each partition runs its OS and applications in isolation from the
others. You can reconfigure partitions as necessary, splitting a single partition into smaller partitions
—or combining existing partitions into fewer, larger partitions. You can remove resources from one
partition and add them to another using the Superdome Onboard Administrator (SD OA), without
having to manipulate the hardware physically. Partitioning changes require a server reboot to take
effect.
In addition to flexibility, electronically isolated partitioning also provides protection from hardware
or software failures in other partitions, as well as a high degree of security between partitions. Hard
partitions run at native speed, just like a standalone system, with no emulation or hypervisor layers
to traverse—and no added latency.

Error isolation with hard partition

Figure 8-11 Error isolation with hard partition

Because nPartitioning separates partitions at the hardware level, each partition is isolated from both
hardware and software failures on other partitions (as illustrated in Figure 8-11). Thus, nPartitions
provide isolation between the OS instances running within the partitions. Each nPar has its own
independent CPUs, memory, and I/O resources pooled from the blades that make up the partition.

Many systems use a shared backplane in which all blades are competing for the same electrical bus.
This design can cause high queuing delays and saturation of the shared backplane, which limits
performance scaling. The shared bus also means that a failure on one partition can affect other
partitions. On Superdome X, the fault-tolerant Xbar fabric logically separates the physical partitions,
providing isolation for a more reliable, higher performance, and a more scalable system.

Integrity Superdome X built-in RAS features


A growing emphasis on availability is driving customers to seek mission-critical Linux
environments. HPE Integrity Superdome X servers offer RAS features in key hardware subsystems—
processor, memory, and I/O—and provide the ideal foundation for mission-critical Linux operating
environments. They help to ensure that customers’ business is always on by providing availability
through a layered approach that offers application, file system, and OS protection. HPE’s mission-
critical Superdome X infrastructure and the Linux operating environment provide a comprehensive
RAS strategy that covers all layers—from application to hardware. (HPE Superdome X now also
supports Microsoft OSs, bringing many of the same RAS features to Microsoft environments.)

When faults occur that require support action, accurate diagnosis of the fault is critical to determine
what is wrong and how to fix the fault correctly the first time. Superdome X has built-in diagnostic
abilities through its OA with built-in Analysis Engine. These are designed to
• Minimize time to repair
• Capture enough data to diagnose failures the first time
• Allow the system to run after failure for complete error logging
• Diagnose all system components (software, firmware, and hardware) via complete error logging
• Provide Field Replaceable Unit (FRU) level granularity for repair
• Provide component level granularity for self-healing

Part of HPE Integrity Superdome X’s comprehensive strategy for fault management includes
“Firmware First” problem diagnosis. With Firmware First, firmware with detailed knowledge of the
Superdome X system is first on the scene of problems to quickly and accurately determine what is
wrong and how to fix it. Intel Xeon E7 processors’ Enhanced Machine Check Architecture (eMCA)
allows firmware a first look at error logs so that firmware can diagnose problems and take
appropriate actions for the platform before OS and higher-level software involvement. Firmware
First covers correctable and uncorrectable errors and gives firmware the ability to collect error data
and diagnose faults even when the system processors have limited functionality. Firmware First
enables many platform-specific actions for faults, including predictive fault analysis for system
memory, CPUs, IO, and interconnect.

While features such as hot-swappable, N+1 power supplies, and single-/multi-bit memory error
correction have become common in the industry, a number of RAS differentiators set Superdome X
servers apart from other industry-standard servers. Superdome X servers offer several types of RAS
differentiators:
• Self-healing capabilities
• Processor RAS
• Memory RAS
• Platform RAS
• Application RAS
• OS RAS

The next sections cover these differentiators. You can also refer to this link:
http://www8.hp.com/h20195/v2/getpdf.aspx/4AA5-6824ENW.pdf

Self-healing RAS features


Superdome X self-healing RAS features enable the system to react to failures and avoid unplanned
downtime. It does so primarily by disabling failed or failing components during boot and by
deactivating failed or failing components during runtime. Taking failed or failing hardware offline
allows the system to remain running with healthy hardware until the system can be serviced—
preventing unplanned system downtime.
Superdome X is architected to tolerate any single hardware failure.

Reactive and predictive fault analysis allows for deconfiguration of failed or failing memory DIMMs
and CPU cores. The system can remain available with only healthy memory DIMMs and CPU cores in
use. Deconfiguration capabilities can also proactively deal with serious faults on Superdome X blade
hardware. Any multi-blade nPartition can survive blade hardware faults by deconfiguring the faulty
blade and allowing the remaining blades to boot with healthy hardware.

Superdome X can deactivate failing resources during runtime, preventing their continued use. This
level of self-healing provides zero system downtime and allows for repair actions at the next planned
downtime event. System interconnects and the memory subsystem provide these self-healing
capabilities:
• System Crossbar Fabric self-healing with link width reduction, online port deactivation, and
alternate routing for fabric connections
• QuickPath Interconnect (QPI) link width reduction at runtime
• Memory interconnect link width reduction at runtime
• Enhanced Memory Double Device Data Correction (DDDC) to tolerate two failed devices on a
DIMM

Processor and memory RAS features


HPE Integrity Superdome X servers use the Intel Xeon E7 v2 (Gen8) and v3 (Gen9) processors. These
processors include extensive capabilities for detecting, correcting, and reporting hard and soft errors.
Since these RAS capabilities require firmware support from the platform, they are often not
supported in other industry-standard servers. Superdome X, on the other hand, lets customers
leverage all of the RAS functionality provided in Xeon E7 processors. These capabilities are
described in detail in the sections below.

Corrupt Data Containment

Corrupt Data Containment mode detects uncorrectable errors and, where possible, recovers the system
from them. When Corrupt Data Containment mode is enabled, the producer of the uncorrected
data no longer signals a Machine Check Exception. Instead, the corrupted data is flagged with an
“Error Containment” bit. When the consumer of the data receives the data with the “Error
Containment” bit set, the firmware and OS handle the error. Several recovery flows are possible,
including Uncorrected No Action (UCNA), Software Recovery Action Optional (SRAO), and
Software Recovery Action Required (SRAR). The mission-critical Superdome X infrastructure and
the Linux operating environment support all of these Corrupt Data error flows and provide end-to-
end hardware/firmware/software error recovery where possible.

Live Error Recovery Containment

Uncorrectable errors in a server’s PCIe subsystem can potentially propagate to other components,
resulting in a crash of the partition—if not the entire server. To minimize this risk in Superdome X
servers, HPE implemented specific firmware features leveraging Intel’s Live Error Recovery (LER)
mechanism. These features can trap errors at a root port to prevent error propagation. LER
containment allows the platform to detect a subset of Advanced Error Reporting (AER) and
proprietary-based PCIe errors in the inbound and outbound PCIe path. When a PCIe error occurs,
LER can contain the error by stopping I/O transfers, thus preventing corrupted data from reaching the
network and permanent storage. LER containment also avoids the propagation of the error and an
immediate crash of the machine.

In parallel with this error containment, LER also informs the Superdome X firmware. In turn, the OS
and upper-layer device drivers are made aware of the error. HPE’s enhancement of the AER PCIe
implementation allows Linux to better report the details of such errors in the Linux syslog files and to
better cooperate with device drivers to resume from recoverable PCIe errors. Superdome X’s
innovative solution for Live Error Recovery is not available on typical Xeon E7 processor-based
systems.

See this video demo for details on Superdome X LER containment, recovery, and detailed error
reporting: https://vrp.glb.itcs.hpe.com/SDP/Content/ContentDetails.aspx?ID=4376

Viral Error Containment

Superdome X servers further protect customer data from corruption by enabling Viral Error
Containment (VEC) mode in the E7 processor and scalable Server chipset. While Corrupt Data
Containment mode contains data errors, VEC mode contains address, control, and miscellaneous fatal
errors. By containing the error, VEC prevents it from being committed to the network or permanent
storage. VEC mode takes additional steps in hardware to detect and signal errors beyond those that
impact a single data packet. When the system enters VEC mode, all transactions that are possibly
corrupted are marked as contaminated so that no corrupt data can reach permanent storage.

Processor interconnect fault resiliency

All processor interconnects, including QuickPath Interconnect (QPI), the Memory Interconnect, and
PCIe, have extensive cyclic redundancy checks (CRCs) to correct data communication errors on the
respective busses. Additional self-healing mechanisms allow for continued operation through a hard
failure such as a failed link.

QPI and memory link self-healing automatically reduce full-width links to half-width when they
detect persistent errors on the QPI or memory link. This capability allows operation to continue until
repairs can be made. PCIe links also support width reduction and bandwidth reduction when full
width/full speed operation is not possible.

Advanced MCA recovery

The advanced MCA Recovery technology combines processor, firmware, and OS features. The
technology allows the OS to attempt to recover from errors that cannot be corrected within the
hardware alone. Without MCA recovery, the system would be forced into a crash. With MCA
recovery, the OS examines the error and determines whether it is contained to an application, a
thread, or an OS instance. The OS then determines how it wants to react to that error.

Intel Xeon E7 v2 and v3 processors expand upon previous Xeon E7 processor capabilities to
provide advanced error recovery. The v2 and v3 processors can now recover from uncorrectable
memory errors in the instruction and data execution path (Software Recovery Action Required, or
SRAR errors). In addition, they can handle nonexecution path uncorrectable memory errors
(Software Recovery Action Optional, or SRAO errors).

When certain uncorrectable errors are detected, the processor interrupts the OS or virtual machine
and passes the address of the error to it. The OS resets the error condition and marks the defective
location as bad so it will not be used again, and then continues operation.

In expanding on E7 processor memory error recovery, including SAP HANA application recovery
(Intel, 2011), HPE has done extensive development and testing of execution path recovery. See a demo
of this feature at: https://vrp.glb.itcs.hpe.com/SDP/Content/ContentDetails.aspx?ID=4407

Memory RAS

Main memory failures have been a significant cause of hardware downtime. Superdome X servers
use two key technologies to enhance memory reliability: proactive memory scrubbing and
Enhanced Double Device Data Correction (DDDC) +1. Additionally, HPE Smart Memory DIMMs are
qualified to provide both performance and quality.

Proactive memory scrubbing


To better protect memory, Integrity Superdome X servers implement a memory scrubber. The
memory scrubber actively scans through memory looking for errors. When an error is discovered,
the scrubber rewrites the correct data back into memory. This proactive scrubbing, combined with
ECC, helps to prevent multiple-bit, transient errors from accumulating.

Enhanced DDDC +1
The industry standard for memory protection is single error correcting and double error detecting
(SECDED) of data errors. Additionally, many servers on the market provide Single Device Data
Correction, also known as Chip Sparing or Chip-kill.

Single device correction protects the system from any single-bit data errors within a memory device,
whether they originate from a transient event such as a radiation strike or from persistent errors such
as a bad dynamic random access memory (DRAM) device. However, Single-Chip Sparing will
generally not protect the system from a failed DRAM and a single-bit error. Though detected, these
errors will cause a system to crash.

Combined with memory scrubbing, ECC helps to prevent multiple-bit, transient errors from
accumulating. However, persistent errors can still put the memory at risk for multiple-bit errors that
cannot be corrected and can result in data corruption.

DDDC +1 in Superdome X servers addresses these problems. DDDC +1 technology determines when
the first DRAM in a rank has failed, corrects the data, and maps that DRAM out of use by moving its
data to spare bits in the rank. Single Device correction is then still available for the corrected rank.
Thus, a total of two entire DRAMs in a rank of DIMMs can fail, and the memory is still protected with
ECC. This amounts to the system essentially being tolerant of a DRAM failure on every DIMM and
still maintaining ECC protection. (The “+1” indicates that this feature provides an additional layer of
protection for single-bit errors, even in the presence of two entire device failures.)

DDDC +1 drastically improves system uptime, as fewer failed DIMMs need to be replaced. This
technology delivers up to a 17-fold reduction in the number of DIMM replacements versus those
systems that use only Single-Chip Sparing technologies. Furthermore, DDDC +1 significantly
reduces the chances of memory-related crashes compared to systems that only have Single-Chip
Sparing capabilities.

Although DDDC +1 is based upon an Intel Xeon processor E7 v2 or v3 processor feature, Superdome
X has enhanced the feature with specific firmware and hardware algorithms. HPE’s enhanced DDDC
+1 provides a memory RAS improvement over Intel base code and reduces memory outage rates by
33%–95% over standard x86 offerings.

New Memory RAS with Intel® Xeon® processor E7 v3

E7 v3 processors and their DDR4 memory subsystem provide two new memory RAS features not
available in previous E7 versions. These new features are described below.

DRAM Bank Sparing


This feature is only supported on the BL920s Gen9 servers, which have Intel E7 v3 processors and a
DDR4 memory subsystem. To better target the most likely memory failure modes at the DRAM level,
DRAM Bank Sparing can move data away from a faulty Bank. DRAM Bank Sparing is automatically
enabled as part of HPE-enhanced DDDC +1 and provides up to 33% more error resiliency compared
with E7 v2 enhanced DDDC +1.

DDR4 Command/Address Parity Error Retry


This feature is only supported on the BL920s Gen9 servers, which have Intel E7 v3 processors and a
DDR4 memory subsystem. DDR4’s Command/Address bus is parity protected, and the E7 v3
integrated Memory Controller and Memory Buffer provide detection and logging of parity errors. In
previous E7 platforms, all Command/Address bus parity errors were fatal events, which caused an
OS crash. The v3 Memory Controller helps prevent crashes by automatically retrying any
transactions reporting parity errors. Command/Address Parity Error Retry, HPE enhanced DDDC +1
(with Bank Sparing), and memory interconnect self-healing work in harmony to provide resiliency
against errors across all memory interfaces and components.

Platform RAS features


Figure 8-12 Platform RAS features

The HPE Integrity Superdome X platform itself offers built-in RAS features including clock
redundancy, system fabric RAS, and fault-tolerant RAS, as illustrated in Figure 8-12.

Clock redundancy

Superdome X offers a fully redundant clock distribution circuit, which contains the clock source and
continues redundancy through the distribution circuit to the blade itself. The system clocks are
powered by two fully redundant Hot Swap Oscillators (HSOs), which support automatic,
“glitch-free” fail-over/reconfiguration and are hot pluggable under all system operating conditions.

During normal operation, the system selects one of the two HSOs as the platform’s clock source. If
only one HSO is installed, then its output is used (assuming it is of valid amplitude). If both HSOs are
plugged in and both outputs are valid, then one of the two is selected by the clock switch logic on the
blade. If one of the HSO outputs fails to have the correct amplitude, the clock switch logic will use the
valid HSO as the source of clocks and send an alarm to the system indicating which HSO failed. A
good HSO has a green LED, and a failed HSO has a yellow LED. The customer can then repair the
failed clock source through a hot-plug operation.

System fabric RAS

Superdome X leverages its crossbar interconnect from Integrity Superdome 2 and continues the HPE
strategy of attacking IT sprawl with standards-based, modular architectures for mission-critical
infrastructure. The heart of the Superdome X architecture is the fault-tolerant HPE Crossbar Fabric,
consisting of passive midplanes with end-to-end retry and link failover functionality. HPE’s innovative
scalable enterprise system chipset includes extensive self-healing, error-detection, and error
correction capabilities.

Fault-tolerant fabric

Superdome X sets the industry standard for fault-tolerant fabric resiliency. The fabric essentially
consists of high bandwidth links that provide multiple paths and a packet-based transport layer that
guarantees delivery of packets through the fabric. The physical links contain availability features such
as link-width reduction. If individual wires or I/O pads on devices fail, this feature can reconfigure
links to eliminate the bad wire. Strong CRCs are used to guarantee data integrity.
Beyond the reliability of the links themselves, the fabric implements end-to-end retry as the next stage
of defense. The receiver of a packet is required to send acknowledgement back to the transmitter. If
the transmitter receives no acknowledgement, it retransmits the packet over a different path to the
receiver. Thus, end-to-end retry guarantees reliable communication despite any disruption or failure
in the communication path, including bad cables and chips.

The system crossbar provides unprecedented containment between partitions. High reliability for
single partition systems derives from high-grade parts for the crossbar chipset and from fault-
tolerant communication paths between Integrity Superdome X blades and I/O. Furthermore, unlike
other systems with partitioning, HPE provides specific hardware dedicated to guarding partitions
from errant transactions generated on failing partitions.

OS and application level RAS


The OS benefits from the error recovery features introduced earlier, including recovery from
memory and PCIe faults. Many years of collaboration between the processor, firmware, OS, and
application design teams has led to the delivery of several advanced error recovery capabilities.

Customers need a way to build on these RAS features and ensure that their applications remain
available. HPE Serviceguard Solutions for Linux (SGLX) provides a high availability and disaster
recovery clustering solution for customer applications. It monitors the availability and accessibility
of critical IT services, including databases (DBs) and both standard and custom applications—and
everything these services rely on. SGLX meticulously monitors for faults in hardware, software, OS,
virtualization, storage, or network. If it detects a failure or threshold violation, it automatically and
transparently fails over the faulty component and resumes normal operations in mere seconds,
without compromising data integrity or performance.

HPE Serviceguard Storage Management Suite (SMS) allows the clustered solution to use a clustered
file system to achieve the highest levels of availability.

Customers can extend clusters across data centers to create a true disaster recovery solution using
HPE Serviceguard Metrocluster for Linux and HPE Serviceguard Continentalclusters, which provide support
for geographically dispersed clusters.

By utilizing multiple copies of data and multiple data centers separated by any distance, customers can
maintain access to critical data and applications without impacting data integrity and performance,
even if a data center fails. The toolkits and extensions for Linux simplify and quickly integrate
complex applications into a standardized and proven framework.

The next section illustrates an example SGLX solution and describes how it works in more detail.

Example disaster recovery design with SGLX


Figure 8-13 Example disaster recovery design with SGLX

Figure 8-13 illustrates a design for an SGLX cluster. The Superdome X enclosures could be in
separate data centers as part of a disaster recovery solution.

Hardware redundancy

SGLX, like all other high-availability (HA) clustering products, uses hardware redundancy to
maintain application availability. To achieve the highest level of availability, you must design the
solution to eliminate all single points of failure (SPOFs). For example, the Serviceguard
configuration guidelines require redundant networking paths between the nodes in the cluster. This
requirement protects against total loss of communication to a node if an adapter fails—the redundant
adapter simply takes over.

Cluster membership protocol

Similarly, the cluster provides complete redundancy for the components required to maintain a
system. The cluster shown in Figure 8-13 consists of two nodes—that is, two Integrity Superdome X
nPars. If a node in the cluster fails, another node is available to take over applications that were active
on the failed node. The SGLX cluster membership protocol determines which nodes in the cluster are
currently operational. The nodes exchange heartbeat messages and maintain a cluster quorum. After a
failure that results in loss of communication between the nodes, active cluster nodes execute a cluster
re-formation algorithm to determine the new cluster quorum.

It is important to note that if more than 50% of the nodes in the cluster fail at the same time, the
remaining nodes have an insufficient quorum to form a new cluster and fail themselves. Therefore,
you must carefully analyze the cluster configuration to prevent circumstances in which multiple
nodes will fail at the same time. For example, two nodes should not share a common power
distribution system in a cluster with three nodes.

HPE Integrity Superdome X use cases


You are now ready to examine the use cases for which HPE Integrity Superdome X solutions are
designed.

Scale up for the right workload


Companies can reduce server sprawl by consolidating many workloads and virtual machines (VMs)
on one physical host. Scale-up represents an attractive alternative for a variety of reasons:
• Operational cost reductions because of reduced server management staffing requirements
• Reduced power and cooling costs
• Reduced software licensing costs
• Reduced IT infrastructure costs

Scale-up can also improve performance and greatly reduce unplanned downtime. IDC reports that IT
organizations found that this type of scale-up consolidation, combined with the powerful effects of
virtualization, resulted in savings of over 35%. (“Could HPE’s Superdome X Be the Mission-Critical
x86 Platform We’ve Been Waiting For?” Dec. 2014.)

Target workloads

Figure 8-14 Target workloads

Whether customers have a large number of concurrent, short-lived queries or large complex queries,
Superdome X delivers high performance and low latency for business processing and decision
support workloads, as shown in Figure 8-14.

Business processing workloads include


• Enterprise Resource Planning (ERP) workloads, including transactional applications such as Oracle,
PeopleSoft, and SAP
• Other business commerce applications that facilitate business transactions or other task automation
over networks
• Departmental transactional applications that run on servers but do not tie directly to other
applications
• Customer Relationship Management (CRM), which automates customer-facing business processes
within an organization
• Online transaction processing (OLTP) workloads that use a database but are not part of an ERP system
• Batch workloads, such as traditional legacy mainframe-type processes that execute business
process transactions in a batch process

Decision support workloads include


• Data warehousing/data mart tools, which are used to create and run data warehouses and data marts
• Data analysis/data mining tools, which are used to access data warehouses for online analytical
processing

Example use cases by industry

Figure 8-15 Example use cases by industry

IDC has identified SAP, Oracle, and custom-developed application environments as prime candidates
for scale-up, but you will find applications in any industry that will benefit from a scale-up
implementation (as shown in Figure 8-15). For example, healthcare organizations use applications
that manage healthcare records, while government agencies might have massive payment processing
applications.
Chapter 8—Activity 1
In this activity, you will identify which customers are candidates for HPE Integrity Superdome X
solutions based on their concerns and requirements. You will also identify issues that might drive
customers toward the HPE solution. You can check your answers by referring to Appendix B:
Answers to Activities.

Identify scenarios for HPE Integrity Superdome X

You will now identify scenarios in which you should recommend an HPE Integrity Superdome X
solution. Record the number of each scenario for which the Superdome X is a good solution.
1. A healthcare institution has acquired several new hospitals. It is standardizing patient records on a
single Electronic Healthcare Record solution and needs to update its infrastructure for this
solution.
2. An automotive manufacturer needs a better platform for an Electronic Design Automation (EDA)
solution.
3. An automotive manufacturer needs a better platform for its SAP Enterprise Resource Planning
(ERP) solution.
4. A media company is beginning to live stream content digitally and needs a solution that can meet
the high demands of live transcoding.
5. A retail organization wants to mine structured data in its data warehouse for help in making
business decisions. It plans to deploy an Oracle OLAP cube for this purpose.
6. A social media site is deploying Cloudera HBase to organize its unstructured data for faster
processing and analysis.
7. A financial institution is rolling out a new Temenos core banking solution so that it can provide
more online banking services to its customers.
8. A government department has developed an application in-house for processing license fees.
Recently the servers hosting the application experienced unplanned downtime.
9. A software development company wants to boost employee productivity and reduce capital
expenditures by creating an open floor space for developers. As part of the initiative, the company
wants to move from traditional managed desktops to desktops hosted in the data center.
10. A call center uses a resource management solution to manage its workforce. However, the
solution is reaching its capacity and the call center needs a refresh.
11. A retail organization uses sensors to track inventory and plan just-in-time stocking. But the
current solution sometimes loses data, causing managers to make incorrect decisions.
12. A university is attempting to virtualize its data center services, as well as create a self-service
model for deploying applications.

Discuss deployment drivers

You will now examine one of the scenarios in more detail. Your sales partner has established contact
with a manufacturer that is looking for a better platform for its SAP ERP solution. You need to
discover more information about the customer’s issues and needs. What can you ask to help you to
determine whether the customer’s pain points could drive it toward an HPE Integrity Superdome X
solution?

HPE Integrity Superdome X solution architecture


You will now consider how to architect Integrity Superdome X solutions.

Reasons for implementing hard partitioning

Figure 8-16 Reasons for implementing hard partitioning

As you learned earlier in the chapter, hard partitioning—the ability to flexibly divide the system into
electronically isolated partitions—distinguishes Superdome X solutions. Hard partitioning allows
you to create different test, development, and production environments in a single enclosure, all
running different OS and application versions. Any disruption of service to one environment would
not impact the other environments.

Hard partitioning also provides guaranteed CPU, RAM, I/O, and network resources so mission-critical
applications do not compete for performance and resources with less important applications.

Hard partition systems offer software licensing cost advantages for customers using applications
from the most popular ISVs. Frequently, only the CPU cores in a hard partition (versus the total
number of cores in the entire system) are licensed, dramatically reducing the TCO.

In addition, hard partitions are 20 times more reliable than soft partitions alone: hard partitions have
about 5% the number of SPOFs of software-only partitions.

The HPE Superdome X can be partitioned into different mixes of nPars. The configurations shown in
Figure 8-16 allow the customer to hard partition their HPE Superdome X into 2-socket, 4-socket,
8-socket, or 16-socket configurations.

Licensing
Make sure to include the proper license in your proposal. The Basic Partition License is intended for
entry-level configurations. It permits only one partition, which can be up to four blades (eight
sockets) in size. If the customer wants multiple partitions, or a partition larger than four blades, or
both, propose the Advanced Partition License.

Guidelines for implementing hard partitioning


When loading partitions, you should load the largest partitions first. For BL920s blades in the same
partition, load the blades all in odd slots or all in even slots. For instance, a four-blade partition must
have blades loaded in slots 1/3/5/7 or 2/4/6/8.

A BL920s blade must be in slot 1 of the enclosure.

A BL920s blade or filler blade (HPE CAMnet Completer Module, CCM) must be in slots 2 or 3 of the
compute enclosure.

If there is only one single blade, a CCM is required to provide redundant manageability fabric from
the GPSMs to the OAs. In this case, a CCM is automatically included in the enclosure, in slot 2.

Follow these guidelines for selecting processors on blades that will form the same nPartition:
• You cannot mix processor types within an nPartition.
• Processors within the same nPartition cannot run at different frequencies or use different cache
sizes.

When you are planning processors for blades, keep in mind the current nPartition plans and also
future plans.

Valid partition designs

Figure 8-17 Valid partition designs


The HPE Superdome X can be partitioned into different mixes of nPartitions. The configurations
shown in the table in Figure 8-17 allow the customer to hard partition their HPE Superdome X into 2-
socket, 4-socket, 8-socket, or 16-socket configurations.

Keep in mind that the numbers in the “1-Blade,” “2-Blade,” and other rows in the table refer to the
number for the nPar. For example, if you want to create a two-blade (4-socket) nPar, use slots 1 and 3
(the slots with 1s in them in the table). To create a second two-blade nPar, use slots 5 and 7 (the slots
with 2s in them in the table).

Valid memory designs

Figure 8-18 Valid memory designs

HPE Superdome X uses the Intel Scalable Memory Buffer chip to translate between the Scalable
Memory Interconnect 2 (SMI2 or VMSE) technology on the memory controller and the DIMMs.
Figure 8-18 provides an at-a-glance view of the information below.

The following DIMMs are supported on BL920s Gen9 blades:


• 16 GB DDR4-2133 CAS-15-15-15 LRDIMMs for BL920s Gen9
• 32 GB DDR4-2133 CAS-15-15-15 LRDIMMs for BL920s Gen9
• 64 GB DDR4-2133 CAS-15-15-15 LRDIMMs for BL920s Gen9

The following DIMMs are supported on BL920s Gen8 blades:


• 16 GB PC3-12800R (DDR3-1600) Registered CAS-11 DIMMs for BL920s Gen8
• 32 GB PC3-14900R (DDR3-1866) Registered CAS-13 DIMMs for BL920s Gen8

Only DIMMs that Hewlett Packard Enterprise has qualified on BL920s Server Blades are supported.

The BL920s Server Blade supports 48 DIMMs and eight Intel Scalable Memory Buffer chips. This
equates to twelve DIMMs and two Scalable Memory Buffer chips per memory controller (two
memory controllers per processor).
Follow these general memory configuration rules:
• You must add DIMMs in groups of 16, which fills one slot on all eight memory channels on both
processors.
• For best performance, the amount of memory on each blade within the partition should be the
same.
• Use the same amount of memory on each processor module within a partition.
• Where possible, considering that you must add DIMMs in groups of 16, populate fewer DIMMs
of larger capacity to get the best performance.
• For example, use 16x 32 GB DIMMs instead of 32x 16 GB DIMMs if you need 512 GB of memory
per blade.
Chapter 8—Activity 2
In this activity, you begin to architect an HPE Integrity Superdome X solution. You will put what you
have learned into practice and design nPartitions and memory configurations for a customer
scenario.

You can check your answers by referring to Appendix B: Answers to Activities.

Scenario

A manufacturing company that designs and creates specialized hardware uses SAP ERP to manage its
manufacturing schedules, product releases, and many other processes. This solution forms a vital part
of many of the company’s day-to-day operations. The company has been expanding its business—
good news for its revenue, but bad news for its ERP solution which can no longer keep up with
demand. Employees are beginning to complain that they cannot use ERP effectively because response
times are so slow. Worse, the system experienced an unplanned outage last month, leading to
manufacturing delays that affected the company’s ability to deliver promised goods to its customers.

The company is looking for a new solution that will resolve these issues.

Workload and current solution

The ERP version is SAP ERP 6.0 with SAP enhancement package 5. The ERP application runs on
Microsoft Windows Server 2012 R2 Datacenter Edition using SQL Server 2014.

The application currently runs on a server with


• Four 15-core Intel Xeon Processor E7-4870 v2, 2.4 Ghz processors
• 512 GB memory
• Four 1GbE links

Requirements

The solution must support 30,000 users with response times of less than 1 second. The customer
wants a server that provides
• Eight Intel Xeon E7 v3 processors
• At least 1 TB memory
• At least four 10 GbE links

A second server must have exactly the same specifications. It will operate in a Windows cluster with
the SAP ERP and SQL database for availability purposes.

The storage solution must provide at least 4 TB of capacity.

Plan nPartitioning and memory


You are planning to propose an HPE Integrity Superdome X server to meet the customer’s needs.
Record your answers to these questions about the solution.
1. Your sales partner has told you that the SAP ERP solution must support 30,000 users. What
additional information might you want to discuss with the customer to flesh out this
requirement?
2. Estimate whether an HPE Integrity Superdome solution with the customer’s proposed
specifications will meet the customer’s needs.
a. Visit http://global.sap.com/solutions/benchmark/sd2tier.epx.
b. Search for Integrity Superdome and find a row that lists the customer’s solution.
c. Examine the specifications and the number of supported users. Do you feel confident that a solution with the customer’s
minimal requirements will meet the needs? Do you want to propose exceeding the requirements in any way?

3. Which server blade and processor will you propose? Table 8-2 at the end of this chapter gives
an overview of options. More than one option might be valid. Explain your choice.
4. How many nPartitions will you create?
5. Which license does the HPE Integrity Superdome X solution require?
6. Which slots will you combine in each nPartition? Use Figure 8-17 to make the plan.
7. How much memory will you recommend? Create a table similar to Table 8-1. Remember that
you must install memory in increments of 16 DIMMs (one DIMM in each of eight channels for
two processors). You can use this strategy to plan:
– Fill in the total capacity that you require.
– Divide that capacity by the number of blades in the nPartition.
– In Figure 8-18, find the row with the value that matches the capacity required per blade
(round up).
– Fill in the rounded up capacity per blade. Then fill in the other cells in your table.
Table 8-1 Memory plan

DIMM capacity Number of DIMMs per blade Capacity per blade Total capacity per nPartition

Options for connecting to storage


Integrity Superdome X does not have any local storage; storage for the OS and other data is delivered
via FC attached storage. Superdome X currently supports FC data connectivity and boot support with
the following HPE Storage devices:
• HPE 3PAR StoreServ 7000/7450/8400/10000 Storage
• HPE MSA 2040 Storage
• HPE XP7 Storage
• HPE XP P9500 Storage
• HPE EVA P6000 Storage
• HPE EVAx400 Storage

Check the latest storage options by visiting https://h20272.www2.hpe.com/spock/

You will learn how to establish the FC connections as you examine the external connectivity options
in the next sections.
Chapter 8—Activity 3
Earlier you saw how blades can connect together as a single system or nPartition through the upper
midplane. You will now look at how blades in different nPartitions connect to each other and to
external networks through the lower midplane and interconnect modules.

An HPE Integrity Superdome X nPartition (or unpartitioned server) can use any adapters available to
any of its blades. Mapping between a blade’s FlexibleLOM or mezzanine ports follows the same rules
as those for HPE c7000 Blade Enclosures, with which you should be familiar from prerequisite
training.

Review those rules by sketching Figure 8-19 on a piece of paper (or printing out a screen shot) and
mapping each FlexibleLOM or mezzanine port to the correct ICM Bay.

You can check your answers by referring to Appendix B: Answers to Activities.

Figure 8-19 Mapping between a blade’s FlexibleLOM or mezzanine ports to the ICM Bays

Connectivity through interconnect modules


Figure 8-20 Connectivity through interconnect modules

You should have drawn the connections shown here during Chapter 8—Activity 3.

Figure 8-20 shows the connections for just one blade, but every blade has the same connections. In
fact, blades in different nPartitions can use these connections and the interconnect module to reach
each other. For example, nPartition 1 and nPartition 2 can communicate on their first FlexibleLOM
port through interconnect module 1—no external cables on the interconnect module are required.
These connections experience low latency because the HPE Integrity Superdome X processors
connect to their FlexibleLOM or mezzanine ports without a southbridge chip. The same low latency
applies to external connections.

Requirements for FlexibleLOMs

Figure 8-21 Requirements for FlexibleLOMs

In addition to interconnecting blades in different nPartitions, the interconnect modules provide the
Superdome X server with its external connections, illustrated in Figure 8-21. You need to consider all
the types of connections that your customer’s solution might require. As you plan, also remember that
all of the ports on each blade within an nPartition are available to that nPartition.

Each blade requires at least one FlexibleLOM adapter. Various options provide two 10 GbE or two 20
GbE ports. Depending on the number of blades in the nPartition, the system will have one to eight
FlexibleLOMs. You can bond or team every corresponding port on a blade in the same nPartition,
providing a greater pool of bandwidth. For example, if an nPartition includes blades 1 and 3, you can
bond the FlexibleLOM port 1 on both using any type of bonding that you like, including LACP. The
interconnect module is an Ethernet switch or pass-thru module. If you use LACP, the interconnect
module (or the upstream switch, if you are using a pass-thru module) requires a corresponding LACP configuration.

You can optionally add a second FlexibleLOM adapter on any of the blades. Port 1 on this
FlexibleLOM also connects to the module in interconnect bay 1, making more bandwidth available
for the bonded adapters or NIC team. The partition can support up to 16 FlexibleLOMs.

Often you will use the FlexibleLOM ports to connect the system to an external network through which
client requests arrive. You need to discuss the needs with the customer. Is 10 GbE connectivity
sufficient, or is 20 GbE with RDMA over Converged Ethernet (RoCE) required?

Also discuss with the customer whether the server will be using any features such as Virtual
Extensible LAN (VXLAN) or Network Virtualization using Generic Routing Encapsulation (NVGRE)
tunneling. VXLAN and NVGRE are alternative technologies that meet similar needs. They create
overlay tunnels between VMs on different hosts that enable the VMs to connect at Layer 2 regardless
of the intervening network infrastructure. These technologies also help to maintain multi-tenant
environments in which many tenants have many VLANs each. The VLAN standard’s 12-bit ID space
supports only 4094 usable VLAN IDs (IDs 0 and 4095 are reserved). VXLAN and NVGRE tunnels encapsulate the
tenant VLAN ID in traffic, allowing tenants to use overlapping VLAN IDs and extending the number
of VLANs supported. If the customer needs to use either of these technologies, make sure that the
adapter supports tunnel offloading to ensure that these technologies do not affect the server
performance.
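
To make the overlay concept concrete, the commands below sketch how a VXLAN tunnel endpoint might be created on a Linux host using the standard iproute2 tools. The interface names, VXLAN Network Identifier (VNI), and IP addresses are placeholders for illustration only and are not part of a specific Superdome X configuration:

# Create a VXLAN interface (VNI 100) that encapsulates tenant traffic over eth0
# using the standard VXLAN UDP port 4789; names and addresses are examples
ip link add vxlan100 type vxlan id 100 dev eth0 dstport 4789 local 10.1.1.10 remote 10.1.1.20
ip link set vxlan100 up
# Attach the VXLAN interface to a bridge so that local VMs can join the overlay
ip link add br-tenant type bridge
ip link set vxlan100 master br-tenant
ip link set br-tenant up

In production, the hypervisor or virtual switch normally creates such tunnels automatically; the point of the sketch is that the tenant’s Layer 2 frames travel inside UDP packets, which is why adapter offload support for the encapsulation matters for server performance.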

If the customer wants to converge storage traffic on these ports, select options that support FCoE.

Table 8-3 and Table 8-4 at the end of this chapter provide a list of adapters and interconnect modules
as of the publication of this ebook.

Requirements for FlexibleLOMs: Adding a second interconnect module


Figure 8-22 Requirements for FlexibleLOMs: Adding a second interconnect module

Port 2 on each FlexibleLOM adapter connects to the interconnect module in bay 2. You might choose
to connect the second interconnect module to a different network, such as one dedicated to
management. In this case, the first port on FlexibleLOMs for all blades in the nPartition form one set
of bonded adapters, and the second ports form a second set.

Often, though, you want the pair of interconnect modules to connect to the same network for
redundancy. In this case, you can either set up a form of NIC bonding that does not require switch
awareness, or you establish an IRF fabric on the interconnect modules (as shown in Figure 8-22).

You should discuss whether the customer needs any other dedicated external connections such as a
management connection or, if the customer is using software-defined virtualization, a connection for
VM migration traffic. You can install additional mezzanines and interconnect modules to support
these connections.

Requirements for FC connections

Figure 8-23 Requirements for FC connections

As you learned, the Superdome X blades do not have local drives. You must install an FC HBA in a
Mezzanine on at least one blade in the nPartition. You must then dedicate at least one FC port to FC
boot. The interconnect bays connected to that mezzanine require FC switch modules. Figure 8-23
shows how those switches can then connect to an HPE MSA 2040, which provides the drives for
booting the nPartition. The MSA 2040 can support boot disks for all partitions. Storage
administrators should use the Explicit Mapping feature to assign boot disks. One controller on the
MSA 2040 is required, but two are recommended to enhance availability and permit online updates
for the controller firmware.

As an alternative, you can use an external SAN for the FC boot. In the example shown in the figure,
multiple blades in the nPartition connect to an FC SAN, which then connects to a supported FC
storage array (preferably, with two hops or fewer). The SAN should also provide redundant paths,
and the storage arrays should have built-in redundancy to allow online firmware updates without
disrupting the Superdome X nPars.

An nPar can use an external storage array for both its boot disks and other types of storage. Remember,
though, that at least one port must be dedicated to FC boot, not to other storage traffic. The second
port on the same adapter, however, can be used for other storage traffic. You might set up two ports
on different blades for FC boot for redundancy purposes.

If you are planning an SGLX solution for a customer, you should also dedicate two FC HBAs for this
purpose.

Example use case: 3PAR SAN solution for an OLTP solution

Figure 8-24 Example use case: 3PAR SAN solution for an OLTP solution

In the example shown in Figure 8-24, an Integrity Superdome X enclosure has two nPars, including
one that supports an OLTP database, that must connect to external storage. An HPE 3PAR storage
array was selected for both excellent storage capacity and scalable performance capabilities. The
3PAR and the Superdome X are connected via FC, and both the 3PAR and Superdome X are on a
shared 10 GbE network for end user and administrative connectivity.
The volumes in the 3PAR are configured via Thin Provisioning, with 5 TB allocation size using the
SSD_R1 (RAID 1) configuration, except for the operating system volume, which was allocated with 1
TB (thinly provisioned).

The Reference Architecture is split into two nPars: a scale-up OLTP system, composed of four blades,
and a system hosting Hyper-V VMs, also composed of four blades. The 3PAR system is divided into
four logical partitions, each of which is optimally served by two controller nodes (all nodes can
reach all storage, but these nodes do so with the least latency). The partitions optimally connected to
controller nodes 0 and 1 contain drives used by nPar 1. The other partitions, optimally connected to
nodes 2 and 3, contain drives for nPar 2.

In the reference architecture, the FC SAN switch modules use zones to map the nPar 1 HBAs
(provided by blades in bays 1, 3, 5, and 7) to ports on 3PAR controller nodes 0 and 1. Similarly, zones
map the nPar 2 HBAs (in bays 2, 4, 6, and 8) to ports on the 3PAR controller nodes 2 and 3. This
logical configuration, illustrated in the figure, ensures optimal forwarding for the storage traffic.
(Note that the figure only illustrates the mappings on the first SAN switch module, but the second
module enforces the same mappings.)

Physically, two uplinks on each FC SAN switch connect to each of the four 3PAR controllers, creating
a highly redundant and available design.
Chapter 8—Activity 4
You will now practice planning the FlexibleLOM adapters, mezzanine adapters, and interconnect
modules for the customer scenario. You will plan the LAN and the SAN connections for the
customer’s new solution for its SAP ERP application.

The customer needs 2 TB of storage. You are proposing an HPE 3PAR StoreServ 7440c storage array
with four controllers to provide the storage. You must plan how the HPE Integrity Superdome X
System will connect.
1. Refer to the customer requirements, the FlexibleLOM requirements that you just learned about,
and Table 8-3 at the end of this chapter. Select a FlexibleLOM option and indicate the number
that you will install.
2. Which interconnect module bays provide the uplinks for these ports?
3. Which module or modules will you plan for these bays? Refer to Table 8-4 at the end of this
chapter for options.
4. How will you set up NIC teaming on each nPartition? What should you discuss with the network
administrator for the switch module configuration?
5. The system needs high bandwidth connectivity to the 3PAR storage array: a two-port 16 Gbps
FC adapter on each server blade. What is a valid mezzanine slot for the adapters?
6. Which interconnect module bays provide the uplinks for ports installed in this mezzanine?
7. Which switch modules will you plan for these bays?
8. What requirement for an nPartition’s image can affect your plan for the FC connections?

You can check your answers by referring to Appendix B: Answers to Activities.

HPE Integrity Superdome X management


The final topic in this chapter discusses Integrity Superdome X management.

Managing Superdome X with built-in solutions


Figure 8-25 Managing Superdome X with built-in solutions

Integrity Superdome X offers extensive management capabilities through both built-in management
components and additional management resources. Its management components include the SD OA,
Insight Display, and iLO, as you see in Figure 8-25.

Superdome Onboard Administrator (SD OA)

Each HPE Integrity Superdome X chassis supports one or two Onboard Administrator (OA) modules.
The OA features are described in more detail in the next sections.

Insight Display

The Insight Display provides a graphical representation of the physical configuration of the
Superdome X enclosure. The Insight Display indicates when any device, configuration, power, or
cooling errors are detected. A display highlighted in green indicates no errors; a display highlighted
in amber indicates that an error has been detected.

iLO on Superdome

Each blade on the Superdome platform has an iLO management engine that provides virtual
media and virtual keyboard, video, mouse (KVM) features. One iLO is active for any given nPartition
and has its programmatic LAN interfaces enabled, but not its web GUI.

The SD OA provides the GUI. The flexible and aggregate nature of Superdome nPars and the SD OA
allows the SD OA to provide all the management interaction necessary for working with the servers
created within the Superdome enclosure. The SD OA has full inventory and status information by
nPartition as well as by blade. Administrators can launch a console and a virtual media directly from
the SD OA for any particular nPartition using iLO management.

HPE Superdome Onboard Administrator (SD OA)


Figure 8-26 HPE Superdome Onboard Administrator (SD OA)

The SD OA offers a built-in, always available platform and partition management system. While
based on the C-class Onboard Administrator, the SD OA has expanded functionality such as the ability
to manage partitions, gather detailed knowledge of component inventory and health, and evaluate
system faults with an Analysis Engine. You will examine these features in more detail in the next
sections.

The SD OA provides a user-friendly experience and makes managing the Superdome X much easier
by centralizing the control and building the management into the hardware and firmware of the
system (see Figure 8-26). It provides a choice of user interfaces:
• Command Line Interface (CLI) for easy scripting and power user convenience
• Graphical User Interface (GUI) for intuitive operation

HPE SD OA: Onboard Partition Manager and Firmware Manager


Figure 8-27 HPE SD OA: Onboard Partition Manager and Firmware Manager

The HPE SD OA provides an Onboard Partition Manager for creating and managing nPartitions.
Because the hard partitioning is implemented entirely in firmware, customers do not need to deploy
additional software tools, special hypervisors, and external management solutions to build their
desired partition configuration.

Administrators can manage the nPars entirely from the SD OA CLI or GUI. They can configure nPars
on a new system, as well as rearrange nPars in different configurations. They can also start and stop
individual nPars. As you recall, each nPar is electronically isolated, so administrators can boot each
one without affecting others.

Expanding an nPar from one to two blades or from two to four blades is simple: administrators shut
down the system, adjust the resources in the SD OA, and then restart the nPar. The OS running on the
nPar should dynamically adjust to the additional computing resources, as should external storage.

The Onboard Partition Manager also provides “ParSpecs,” which are a way to save, create, and build
partitions from resource definitions. In other words, the ParSpec acts as a template, defining in
advance the blades that belong to an nPar. The ParSpec can be applied later to build the actual nPar.
Multiple ParSpec definitions can have overlapping resources, as long as the booted nPars do not claim
the same resources at the same time.

For example, administrators might create ParSpecs for normal operation, assigning two blades to an
nPar for a production database, two blades for a development database, and so on. They might also
define ParSpecs that use more blades for intensive end-of-month jobs, in which the production DBs
draw on blades used by the development resources, as you see in Figure 8-27. The ParSpecs use
overlapping resources, but they are not applied and booted at the same time.

The Onboard Firmware Manager can scan a partition and report components with incompatible
firmware versions. Administrators can then easily update all blades to ensure consistency and proper
functionality.
HPE SD OA: Error Analysis Engine and Health Repository
As you learned earlier in this chapter, HPE Integrity Superdome X features an Error Analysis Engine
for hardware components: This engine can predict failures and initiate automatic recovery actions, as
well as notify administrators and HPE Insight Remote Support of the issues.

Administrators can use the SD OA Health Repository to monitor the actions taken by the Error
Analysis Engine. The Health Repository provides up-to-date status for hardware components and
subcomponents as well as a historical record of previous statuses and changes.

The Health Repository provides Indictment Records to help administrators learn about automatically
remediated issues, as well as resolve issues that the Error Analysis Engine could not fully address on
its own. Each record includes the time the error occurred, the error cause, and the subcomponent that
caused the error. If the Error Analysis Engine could not determine exactly which subcomponent
caused the error, it indicates the most likely suspect or suspects. The Indictment Record also indicates
what the engine did to resolve the error—such as deactivate the faulty subcomponent—and what it
recommends the administrator should do.

In short, the Health Repository makes it simple for administrators to interact with the sophisticated
Error Analysis Engine and to keep the HPE Integrity Superdome X server online and error-free.

HPE Integrity Superdome X optimization—Examples


This section covers options for optimizing a Linux or Microsoft OS to run on HPE Integrity
Superdome X and provides references for in-depth optimization steps.

Example: Optimizing network performance for Linux


A system’s networking performance is typically defined by its scalability, bandwidth, and latency. The
sections below provide recommended, tunable settings to achieve high network performance for
Superdome X servers running Linux.

HPE tested these settings using an 8-socket (120-core) Superdome X server. The server used Intel
82599 10 Gb FlexLOM adapters to connect to another Superdome X partition through the internal
Intelligent Resilient Framework (IRF) path feature of the 6125XLG switch modules. Tests were also
run between the Superdome X and DL360s Gen9 Servers accessed through an external 8212ZL 10
GbE switch. HPE evaluated both single NIC performance and performance on a four-port NIC bond.

9000 byte MTU (Jumbo Frames)

The default Ethernet frame size is 1500 bytes, typically consisting of 1460 bytes of data payload and
40 bytes of UDP/TCP/IP header information. Using Jumbo frames can improve the efficiency of
packet handling. Without Jumbo frames, any UDP message larger than 1460 bytes would be
fragmented into multiple IP datagrams and require reassembly by the receiver. Many driver
algorithms used to steer traffic cannot determine the optimal inbound queue assignment of these IP
fragments, resulting in default, nonoptimal processing of the frames. By using a 9000 byte MTU size,
larger messages can be managed more efficiently. Using Jumbo frames does require configuration of
the attached network switch equipment to properly support larger MTU sizes.

To set the frame MTU size:

ifconfig <interface> mtu 9000
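
On newer Linux distributions where the legacy ifconfig utility may not be installed, the equivalent iproute2 command can be used instead (a minimal sketch; eth0 is a placeholder interface name, and the setting must also be added to the interface configuration files to persist across reboots):

# Set a 9000-byte MTU with iproute2 (eth0 is an example interface name)
ip link set dev eth0 mtu 9000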

ixgbe UDP HW RSS mode

By default, the ixgbe driver Receive Side Scaling mechanism (RSS) steers incoming packets toward
its multiple queues based on its Flow Director logic. This default steering logic looks at the 4-tuple of
source IP address, destination IP address, source port, and destination port numbers to decide to which
of its multiple queues it should assign the incoming packets. For UDP, however, RSS only evaluates
the source and destination IP addresses, which can result in poor distribution and overloading of one
queue in certain environments. The following ethtool command enables the use of 4-tuple steering
logic for UDP packets, resulting in more efficient and even UDP traffic processing:

ethtool -N <interface> rx-flow-hash udp4 sdfn
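
To confirm the change, the corresponding ethtool show option displays which header fields are currently hashed for UDP over IPv4 (replace <interface> with the actual adapter name):

ethtool -n <interface> rx-flow-hash udp4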

Bonded adapters & Bond xmit_hash_policy

As you learned, a Superdome X nPartition often provides several adapters that connect to the same
interconnect module. By default, interfaces in Linux are not bonded together. You should bond (or
“trunk”) the adapters to increase aggregate bandwidth and provide greater sharing of the CPU
resources needed to support adapters and adapter drivers. Many bond configurations also include
redundancy schemes. Connections are spread across the available adapters based on various policies.
The ‘xmit_hash_policy’ chosen for the bond in HPE testing used a hash of Layer 3 and Layer 4
protocol header information (IP addresses and UDP/TCP port numbers) to assign connections to
adapters.
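
As a hedged illustration only, on RHEL-family distributions an 802.3ad (LACP) bond using a Layer 3+4 transmit hash policy could be defined with interface configuration files similar to the following. The bond name, member interface name, and option values are examples rather than the exact HPE-tested configuration, and the bonding mode must match what the interconnect modules or upstream switches are configured to support:

# /etc/sysconfig/network-scripts/ifcfg-bond0 (example values)
DEVICE=bond0
TYPE=Bond
BONDING_MASTER=yes
BONDING_OPTS="mode=802.3ad miimon=100 xmit_hash_policy=layer3+4"
BOOTPROTO=none
ONBOOT=yes

# /etc/sysconfig/network-scripts/ifcfg-eth0 (repeat for each member interface)
DEVICE=eth0
TYPE=Ethernet
MASTER=bond0
SLAVE=yes
BOOTPROTO=none
ONBOOT=yes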

ixgbe driver rx-usecs

The rx-usecs parameter controls the driver interrupt throttling mechanism. By default, the driver uses
a dynamic delay in asserting interrupts in hopes that more packets can be processed within a single
interrupt event, thus improving CPU efficiency. This added delay can impact latency-sensitive
applications. Setting this value to zero disables the interrupt throttling behavior. Disabling the driver interrupt
throttling is done with the following command:

ethtool -C <interface> rx-usecs 0
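
You can review the current interrupt coalescing settings before and after the change with the lowercase show option (replace <interface> with the actual adapter name):

ethtool -c <interface>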

net.core.rmem_max and net.core.wmem_max

These sysctl tunable parameters dictate the largest allowed receive (rmem_max) and send (wmem_max) socket buffer sizes. Defaults vary per
OS version, but HPE recommends 16 MB.

net.ipv4.tcp_rmem and net.ipv4.tcp_wmem

These sysctl tunable parameters define the minimum, default, and maximum TCP receive (tcp_rmem) and send (tcp_wmem)
socket buffer sizes used when no size is explicitly requested by the application. Defaults vary per OS version, but HPE recommends 16 MB for the maximum value.
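
As a minimal sketch, the values might be applied by adding lines such as the following to /etc/sysctl.conf (or a file under /etc/sysctl.d/) and loading them with sysctl -p. Only the 16 MB (16777216-byte) maximum reflects the recommendation above; the minimum and default values shown are common Linux defaults, not HPE-specified numbers:

# Maximum socket buffer sizes, in bytes
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
# TCP receive and send buffers: minimum, default, and maximum sizes in bytes
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216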
Power Savings

To reduce I/O latencies, be sure to follow the power savings recommendations in the whitepaper referenced below.

For more information and to read about power savings and storage I/O performance
recommendations, read the whitepaper Optimizing Linux performance on HPE Integrity Superdome
X at: http://www8.HPE.com/h20195/v2/GetPDF.aspx/4AA5-9698ENW.pdf

Example: Optimizing performance for Microsoft scale-up OLTP and consolidation solution


The HPE Verified Reference Architecture for Microsoft SQL Server 2014 on HPE Superdome X
provides configurations for two nPars:
• One nPar, which can consist of one, two, or four blades, hosts a scale-up OLTP solution using
Windows Server 2012 R2 Standard Edition and SQL Server 2014 Enterprise Edition.
• One nPar, consisting of four blades, runs Windows Server 2012 R2 Datacenter edition, with the Hyper-V
feature installed. Eight separate VMs are installed, each running Windows Server 2012 R2
Datacenter Edition and SQL Server 2014 Enterprise Edition.

The sections below outline guidelines for optimizing these solutions to run on an Integrity
Superdome X, which provides a high number of CPU cores and large amount of memory. For more
information, see HP Verified Reference Architecture for Microsoft SQL Server 2014 on HP
Superdome X at: http://www8.hp.com/h20195/v2/GetPDF.aspx/4AA6-1676ENW.pdf

Scale-up OLTP solution

Hyper-Threading should be enabled for all blades. Complete the installation and configuration using
default settings, with these exceptions (an illustrative sqlcmd sketch of the SQL Server settings
follows this list):
• Allow Instant File Initialization (service account given “Perform Volume Maintenance Tasks” ) to
allow faster database file creation
• Set Maximum Degree of Parallelism to 1 to optimize for a strictly OLTP workload
• Increase Maximum worker threads to 4000

Because the system provides a high number of CPUs combined with a large workload, you need to
enable the database to use more threads than are permitted by default. The value of 4000 was chosen
for a system with up to eight CPUs; adjust the number for your system accordingly.
• Set Max Server Memory (in MB) to an appropriate setting for the server:
– 4 Blades using 32x 32 GB DIMMs: 3800000
– 2 Blades using 32x 32 GB DIMMs: 1900000
– 1 Blade using 32x 32 GB DIMMs: 900000
• The following trace flags are enabled to scale performance with large numbers of processors and
large amounts of memory:
– T1118 (improve tempDB concurrency)
– T1117 (reduce allocation contention in tempdb)
– T8048 (reduce CMEMTHREAD waits)
– T3502 (extended checkpoint logging)
– T834 (use large page allocations in Windows)
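
The following sqlcmd sketch shows how several of these SQL Server settings might be applied; it is illustrative only, not the exact method used in the reference architecture. The <instance> name is a placeholder, the 3800000 MB value is the four-blade sizing from the list above, and the trace flags are normally added as -T startup parameters in SQL Server Configuration Manager rather than through sp_configure.

sqlcmd -S <instance> -Q "EXEC sp_configure 'show advanced options', 1; RECONFIGURE;"
sqlcmd -S <instance> -Q "EXEC sp_configure 'max degree of parallelism', 1; EXEC sp_configure 'max worker threads', 4000; EXEC sp_configure 'max server memory (MB)', 3800000; RECONFIGURE;"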

In Microsoft testing, this Superdome X solution provided excellent scalability as demands on the
OLTP database increased and blades were added to the nPar.

Consolidation solution

Administrators should turn on Hyper-Threading and map each VM to the number of Hyper-Threaded
cores available to it (twice as many as the physical cores). The reference architecture has eight VMs
on a four-blade (8S) nPar using 15-core processors. Therefore, each VM maps to a single CPU and is
assigned 30 “CPUs” in the Hyper-V configuration.

Microsoft has no specific guidelines for setting up the VM OS because installing VMs on Superdome
X is no different from installing VMs on any other Hyper-V host. Similarly, administrators can
follow typical installation and configuration procedures for individual instances of SQL Server 2014.
One exception applies: administrators should allow Instant File Initialization (service account given
“Perform Volume Maintenance Tasks” ) to allow faster database file creation.

The SQL Server program files are installed on the drive with the OS files. Data files are installed on
separate volumes exported from the 3PAR storage array, and log files are isolated to their own
volume.

In Microsoft testing, this solution supported excellent scalability and better performance than is
typically expected for VMs.

Finding reference architectures and additional information


You should review the HPE Superdome X reference architectures on your own before designing and
implementing a solution. To view the reference architectures, visit the following links:
• “HPE Reference Architecture for Microsoft SQL Server 2014 mixed workloads on HPE Integrity
Superdome X with HPE 3PAR StoreServ 7440c Storage Array”
– http://www8.hp.com/h20195/v2/GetPDF.aspx/4AA6-3436ENW.pdf
• “HPE Verified Reference Architecture for Microsoft SQL Server 2014 on HPE Superdome X”
– http://www8.hp.com/h20195/v2/GetPDF.aspx/4AA6-1676ENW.pdf

The “HPE Integrity Superdome X System Architecture and RAS” whitepaper provides more information about
the system design and RAS features:

http://www8.hp.com/h20195/v2/getpdf.aspx/4AA5-6824ENW.pdf?
Chapter 8—Activity 5
You will now prepare a presentation of the benefits of your proposal.

As you read in the scenario, the customer wants to


• Prevent future issues with data loss and unplanned downtime
• Improve responsiveness to stop employee complaints and encourage them to use the solution more
effectively

Make sure to address these concerns in your proposal, as well as to list other benefits. Focus in
particular on the RAS features that you just learned about. Your presentation will be unique; however, you
can refer to Appendix B: Answers to Activities to compare your plan to the suggested list of features
that you should cover.

As part of your presentation, you can use videos in the HPE Solution Demonstration Portal.
Instructions for accessing and using the Solution Demonstration Portal are provided here for your
reference.

HPE Solution Demo Portal


1. Navigate to the Solution Demonstration Portal at

    https://vrp.glb.itcs.hpe.com/SDP/default.aspx
2. Select Servers & Blades > Mission Critical (see Figure 8-28).
Figure 8-28 Select Servers & Blades in the SDP

3. Click Superdome (see Figure 8-29).

Figure 8-29 Click Superdome in the Mission-Critical section of the SDP

4. Download and watch relevant demonstrations such as those on the Error Analysis Engine or Microsoft SQL 2014.
5. You can explore the portal further if you want.

Summary
In this chapter, you learned how HPE Integrity Superdome X is bringing the RAS required by
mission-critical applications to an x86 architecture. You learned about the types of environments for
which this solution is recommended and how to architect the system for such environments, including
how to create nPartitions. You also explored management options, in particular the HPE SD OA.

Learning check
Review what you have learned by answering these questions. Then check your answers in Appendix
A: Answers to Learning Checks.
1. For which customer need does HPE Integrity Superdome X provide a good fit?
a. Storing HDFS files for a big data analytics solution
b. Supporting a CRM application
c. Providing live transcoding of high definition (HD) video streams
d. Supporting a NoSQL database on top of HDFS files

2. What is one rule that architects must follow when planning network adapters for the BL920s
Gen9 blades in an HPE Integrity Superdome X?
a. Only one blade in each nPartition should have a FlexibleLOM card.
b. The blades in the lowest numbered slot of each nPartition must use the same mezzanine cards.
c. Every blade in the nPartition must have the same number of mezzanine cards.
d. Every blade requires at least one FlexibleLOM card.

3. What are two management tasks supported by the HPE SD OA? (Select two.)
a. Registering a support case with HPE Support
b. Viewing 3D graphical displays of CPU, memory, and other resource utilization on BL920s blades
c. Viewing actions taken by the Error Analysis Engine to automatically mitigate potential issues
d. Auditing the firmware on all blades in an nPartition for consistency and updating them as required
e. Setting up storage controllers on the SAN arrays to which nPartitions connect

For answers, see Chapter 8 in Appendix A.

Supplemental content
The information in the following tables is provided for the activities in this chapter. You should check
the latest QuickSpecs for the most up-to-date information.
Table 8-2 Server blades and processor options for HPE Integrity Superdome X
Table 8-3 Adapter options for HPE Integrity Superdome X
Table 8-4 Interconnect modules for HPE Integrity Superdome X
Chapter 9 Monitoring and Managing HPE
Solutions

EXAM OBJECTIVES
• Recommend and substantiate the HPE management tools that optimize administrative operations
for various customer environments
• Explain the benefits of the HPE Representational State Transfer (REST) application program
interface (API)

Assumed knowledge
Before reading this chapter, you should have a basic understanding of the following:
• Processors, DDR3 and DDR4 memory, hard disk drives (HDDs), solid state drives (SSDs), and
RAID levels for storage volumes
• HPE ProLiant rack and blade servers and options for them such as HPE Smart Array Controllers
• HPE BladeSystems, including interconnect modules and Virtual Connect (VC) modules
• Server management and maintenance, including experience with iLO, Intelligent Provisioning,
UEFI, HPE Insight Remote Support, HPE Insight Online, HPE Smart Update Manager (SUM), and
HPE Insight Control server provisioning (ICsp)
• HPE OneView capabilities
Chapter topics
You will begin by reviewing HPE management solutions for ProLiant servers, which you learned
about in a previous training. You will then turn your attention to additional management solutions for
the systems that are the focus of this ebook. Earlier chapters covered chassis-level management tools.
In this chapter, you will move from the chassis-level to the rack-level, learning about HPE Advanced
Power Manager (APM), and then from the rack-level to the solution-level, examining HPE hyperscale
provisioning and management solutions for hyperscale servers.

Next, you will explore using the HPE REST API to automate management of HPE ProLiant servers,
including XL servers for Apollo solutions and Moonshot cartridges. Finally, you will learn about
taking the next step, which will take your organization beyond automation to aligning IT with line of
business (LoB) requirements and deploying HPE servers in a private cloud.

Review of HPE management solutions


In prerequisite training, you learned about several tools that help IT staff to manage, monitor, and
troubleshoot HPE ProLiant servers. You will now review these tools.
Chapter 9—Activity 1
In this activity, you will review what you have learned about iLO in prerequisite training. Prepare a
presentation about iLO as if you were presenting the benefits to a customer. Point out benefits in how
iLO aids in
• Setting up and provisioning servers
• Monitoring servers
• Diagnosing and troubleshooting servers
• Connecting to support services

After you create your presentation, compare it with the suggestions shown for this activity in
Appendix B: Answers to Activities.

Review built-in management solutions

Figure 9-1 Review built-in management solutions

The following sections review management options (shown in Figure 9-1) that are built into HPE
ProLiant XL servers, ProLiant Moonshot cartridges, and Integrity Superdome X servers.

iLO

The HPE iLO management engine ships standard with all ProLiant Gen8 and Gen9 servers. The
servers that are the focus of this ebook support these features:
• Agentless Management—A key value of Gen8 and Gen9 servers, this base hardware monitoring
and alerting capability is built into the system (running on the iLO chipset) and starts working the
moment that a power cord and an Ethernet cable are connected to the server.
• Active Health—This component acts as the 24 x 7 mission control for ProLiant servers. To help
mitigate the risk of costly unplanned downtime, Active Health and Insight Online automatically
analyze the health of Gen8 and Gen9 servers across 1600 data points, enabling clients to resolve
unplanned downtime issues much faster.
• Dynamic Power Capping—This feature enables the measurement of real-time power usage,
correlating power use and server performance, regulating central processing unit (CPU) power
usage based on workload, and regulating and capping power consumption by using a server power
microcontroller to measure and control power consumption. It brings a server experiencing a
sudden increase in workload back under its power cap in less than one-half second. This prevents
any surge in power demand that could cause a circuit breaker to trip.
• Intelligent Provisioning maintenance features—HPE ProLiant XL servers and Moonshot
cartridges do not support the OS provisioning features of Intelligent Provisioning. However, XL
servers do support the maintenance features, including Insight Diagnostics. Insight Diagnostics
provides logs, status information, tests, and diagnostics information—all designed to help IT staff
more easily pinpoint and address problems. (Note that HPE Apollo 4200 Systems, for which the
motherboard is called a ProLiant XL420 server, are an exception. They support both the
provisioning and maintenance features of Intelligent Provisioning.)

Unified Extensible Firmware Interface (UEFI)

Unified Extensible Firmware Interface (UEFI) is an industry-standard set of interfaces between the
system firmware and the operating system and between various components of the system firmware.

UEFI is responsible for initializing ProLiant Gen9 server hardware and then handing full hardware
control over to the operating system or the hypervisor. UEFI standardizes the environment for
booting operating systems and preboot applications (such as boot loaders, diagnostics, setup scripts,
and so forth).

Developed by a consortium of industry leaders that includes HPE and Microsoft, UEFI is processor
architecture-agnostic, supporting x86, x64, ARM, and Itanium processors.

The ProLiant system BIOS is a UEFI solution that is based on the latest revisions of UEFI
specification 2.4. ProLiant Gen9 servers are UEFI Class 2 solutions, supporting both legacy and UEFI
boot modes and allowing users to switch between either of these modes. These servers also support
UEFI Secure Boot, which helps to protect the server from malware by only running ROMs, preboot
applications, and OS boot loaders that have been signed by a trusted certificate.

Insight Display

HPE Integrity Superdome X includes Insight Display, which provides a graphical representation of
the physical configuration of the Superdome X enclosure. Insight Display displays an indication when
it detects a device, configuration, power, or cooling error. A display highlighted in green indicates no
errors; a display highlighted in amber indicates that an error has been detected.

Review external management solutions


Additional management resources such as HPE Insight Remote Support, HPE Insight Online, and HPE
SUM offer efficient and comprehensive monitoring and control of HPE servers from virtually
anywhere.
HPE Insight Remote Support

HPE Insight Remote Support connects to server iLO ports and chassis management modules. It helps
IT staff in monitoring for problems and in troubleshooting.

It works with the embedded HPE Integrity Superdome X Error Analysis Engine in the Superdome
Onboard Administrator (SD OA), and it can connect to the HPE back end to automatically notify HPE
Support when problems with the system occur. Various support contract levels are available
depending on the solution model.

HPE ProLiant servers, including XL servers, also integrate with Insight Remote Support. Refer to this
link for a support matrix:
http://h17007.www1.hp.com/us/en/enterprise/servers/supportmatrix/insight_rs.aspx

HPE Insight Online

HPE Insight Online provides one-stop, secure access to the information companies need to support
their servers with standard warranty and contract services. It is a new addition to the HPE Support
Center portal for IT staff members who deploy, manage, and support systems. Through the HPE
Support Center, Insight Online can automatically display devices that are remotely monitored by HPE.
It provides the ability to easily track service events and support cases, view device configurations, and
proactively monitor HPE contracts and warranties. This allows IT staff or HPE Authorized Services
Partners to be more efficient in supporting HPE environments. What’s more, they have the ability to
do all this from anywhere and at any time. HPE Insight Online also provides online access to reports
provided by HPE Proactive Care services.

The embedded management capabilities built into the Superdome X server have been designed to
seamlessly integrate with Insight Online and Insight Remote Support 7.0 (and later). Customers can
also set up their ProLiant servers, including XL servers, for Insight Online.

HPE Smart Update Manager

HPE SUM is HPE’s firmware management and update tool for enterprise environments. It can
remotely update firmware, software, and drivers on HPE ProLiant servers (including the ones
covered in this ebook), as well as firmware on HPE Integrity servers. HPE SUM gives
recommendations for firmware that needs updating and has an easy-to-use, browser-based interface
that provides reporting capabilities, dependency checking, and update installations in the correct
order through the command line interface (CLI), the graphical user interface (GUI), or both.

HPE Insight Control server provisioning (ICsp)

Insight Control server provisioning (ICsp) is designed to help IT staff streamline server-provisioning
administrative tasks. ICsp serves as the common multi-server provisioning capability. ICsp
significantly simplifies the process for deploying operating systems on ProLiant bare-metal servers.

ICsp uses resources such as operating system (OS) Build Plans and scripts to run deployment jobs.
ICsp allows IT staff to
• Install Windows, Linux, ESXi, and Hyper-V on ProLiant servers
• Update drivers, utilities, and firmware on ProLiant servers using Service Packs for ProLiant
(SPPs)
• Configure ProLiant system hardware, iLOs, BIOS, and HPE Smart Array
• Deploy to target servers with or without Preboot Execution Environment (PXE)—on ProLiant
Gen8 and later
• Run deployment jobs on multiple servers in parallel
• Customize ProLiant deployments with an easy-to-use, browser-based interface

Note that ICsp, as of the publication of this ebook, does not support HPE Moonshot cartridges or HPE
BL920s servers for Integrity Superdome X. Refer to the latest support matrix at:
http://www.hp.com/go/insightcontrol/docs

Rack-level management with HPE APM


This section covers HPE APM, a solution for managing shared infrastructure at a rack-level.

HPE Advanced Power Manager

Figure 9-2 HPE Advanced Power Manager

HPE APM is an optional appliance that you should recommend for simplifying the management of
the rack environment. Companies can use APM to monitor and manage power from a single console
for the following servers:
• Apollo 6000 System, including ProLiant XL220a, XL230a, and XL250a servers
• Moonshot 1500 System, including a variety of Moonshot cartridges
• SL6500
• SL4500
• SL2500
APM automatically discovers connected chassis, as well as all of the servers installed in them.
Organizations can then connect to the console (as you see in Figure 9-2) to monitor power and
cooling usage, as well as set up policies for dynamically allocating power.

Each APM can support up to 10 HPE Apollo chassis (for a total of 50–200 servers, depending on how
companies populate the chassis), as well as 20 HPE Moonshot 1500 chassis or ProLiant SL servers.
However, it is best practice to use the APM as a rack-level management solution; if you have planned
racks with five Apollo 6000 chassis each, for example, you can deploy one APM per rack rather than
one APM for every two racks. For a multiple-rack solution, companies can link the APMs together so that they can manage
all of them from one console.

Rack management
HPE APM automatically discovers and inventories servers connected to it. Administrators can tag
assets with intuitive names for simpler management. APM receives and logs events from managed
chassis. For example, APM logs fan and power supply status. With its integrated serial aggregator,
APM becomes the single point of connection for management access to all connected chassis. As you
will see in a moment, these chassis no longer require their own iLO connections to permit access to
their servers’ iLO functions.

Integrated power management


HPE APM helps to adjust power to the demands of workloads and to increase data center efficiency
by providing dynamic power allocation, measurement, and control. Administrators can enable power
pooling to provide for more efficient use of the power resources. Administrators can apply these
controls flexibly to control at the rack-, chassis-, or server-levels. For example, they can set power
caps at a global- or zone-level to prevent surges. Note that power capping requires an iLO Scale Out
or iLO Advanced license on the servers.

APM’s control extends across the complete environment. It controls power outlets and measures
current at the power distribution unit (PDU) level. It manages the Power Shelves that support HPE
Apollo 6000 Systems and also integrates with the HPE uninterruptable power supply (UPS)
subsystems.

Planning APM connections


Figure 9-3 Planning APM connections

To enable HPE APM to manage an HPE Apollo system, administrators connect the Apollo chassis
SLAPM1 port to an RDM1 port on APM using an HPE Consolidated Management cable. APM now
provides the iLO connection for the connected chassis. To support HPE Moonshot 1500 Chassis or
HPE ProLiant SL servers, APM must connect to an RDM using its RDM2 or RDM3 port, as shown in
Figure 9-3. The RDM then connects to the Moonshot iLO Chassis Manager (CM) or the SL server.

To enable iLO access to the servers installed in the chassis, connect the APM iLO port to a network
switch; do not connect the chassis iLO port.

It is very important that organizations do not connect the chassis iLO port to the network when the
chassis is connected to an APM. If they do, they will create a loop that can bring down the network.

Note that organizations must also connect the power shelf for Apollo 6000 chassis to APM. Connect
the shelf’s APM module connector to an APM Power Distribution Module port using a Micro DB9 to
DB9 cable or Micro DB9 to Micro DB9 cable. Each chassis still connects to the power shelf through
its MGMT port as it does in a solution without APM.

In addition to connecting the APM iLO port to a network switch, an organization must connect its
Ethernet port to provide remote management access to the console. APM supports these Ethernet
access methods:
• CLI: Secure Shell (SSH) and Telnet
• Simple Network Management Protocol (SNMP)
• Syslogd
• Hypertext Transfer Protocol (HTTP(S)) or XML

APM also permits CLI access through its serial console port. A separate service port provides read-
only access. SSH, Telnet, and serial access are secured by authentication to up to 11 internal user
accounts or to an external Remote Authentication Dial-in User Service (RADIUS) server.
HPE hyperscale server provisioning and management solutions
You will now learn about solutions that HPE provides for provisioning and managing HPE
hyperscale servers, such as HPE ProLiant XL servers and HPE Moonshot cartridge nodes.

HPE Apollo (ProLiant XL server) provisioning options

Figure 9-4 HPE Apollo (ProLiant XL server) provisioning options

Companies can provision HPE ProLiant XL servers that are installed in HPE Apollo chassis in most
of the same ways that they can provision other HPE ProLiant servers, as shown in Figure 9-4. They
can set up one server at a time using a local console, or they can initiate a Remote Console session
using iLO. They can also boot the servers from the network, using a PXE server provided by the
organization. Note that the XL servers do not support Intelligent Provisioning for installing the OS.
They do support the Intelligent Provisioning maintenance features.

The HPE Apollo solutions are designed for high density and hyperscale deployments, so companies
will often appreciate solutions that help them to provision, as well as to manage, the servers on a
larger scale. HPE ICsp is a good solution for automating provisioning and other maintenance tasks.
HPE Insight Cluster Management Utility (CMU) also provides OS provisioning services, but is
focused more on monitoring the health and performance of the hyperscale solution.

HPE Moonshot provisioning options


Figure 9-5 HPE Moonshot provisioning options

HPE Insight CMU also supports HPE Moonshot servers and, for many companies, provides faster and
more painless provisioning than a simple PXE solution can. Alternatively, if the customer is
deploying only a few Moonshot chassis, HPE Moonshot Provisioning Manager (MPM) is a good
choice. Figure 9-5 shows the provisioning options. The next sections provide more information about
these options so that you can select the correct solutions for your customer’s needs.

HPE MPM versus HPE CMU


You should generally propose either HPE MPM or Insight CMU for the Moonshot solution. To
choose between the two, keep these distinguishing features in mind:
• HPE MPM is focused on Moonshot solution provisioning, while HPE Insight CMU is designed to
manage any HPE cluster and can provision any HPE ProLiant servers, including those in HPE
Apollo and Moonshot systems.
• MPM is targeted as a provisioning tool, while Insight CMU is a complete lifecycle management
solution for hyperscale systems. Insight CMU provides extensive views of node status, health, and
resource use metrics; an alerting framework; and centralization of common cluster management
tasks.
• MPM meets needs for midsized Moonshot deployments. Insight CMU scales for large deployments
with features such as parallel cloning of an image to many servers at once.

HPE MPM
Figure 9-6 HPE MPM

Designed for midsized deployments, HPE MPM eliminates or streamlines many of the
provisioning steps described in Chapter 6, “HPE Moonshot Solutions.”

Integrators no longer need to customize boot and image files, nor do they need to create their own
answer and auto-install files. HPE MPM provides all of the necessary PXE, Dynamic Host
Configuration Protocol (DHCP), and file serving services for the deployment. HPE MPM is designed
to deploy the OS to the headless cartridges and comes with autoinstall templates that integrators can
easily customize with the correct settings for their organizations’ environments and load to HPE MPM.
Integrators can also load their choice of supported OSs to MPM.

HPE MPM runs as a virtual machine (VM) on Hyper-V or VMware Player, allowing integrators to
install it on a laptop that they can take onsite.

HPE MPM automatically discovers connected HPE Moonshot chassis, their switch modules, and their
cartridges. The station running HPE MPM can connect directly to a Moonshot CM iLO port. The CM
Link port then connects to the first port on the switch module installed in slot A, allowing HPE MPM
to discover and configure the switch as required for the OS deployment. Alternatively, integrators can
connect the CM iLO port and switch module port of one or more Moonshot chassis to a physical
switch, permitting discovery and provisioning of multiple chassis, as shown in Figure 9-6. The
station running MPM is also connected to this switch either directly or through an intervening
network. This latter option permits a remote installation, but must be monitored by a network
administrator to ensure that this network is isolated from other data center networks. Otherwise,
MPM’s DHCP, Network File System (NFS), and other services could interfere with existing services.

After MPM has discovered the chassis, integrators use the MPM GUI to select cartridge nodes and
install an OS on them. Integrators can also back up a node’s image (for example, a golden image
created using mRCA) and then clone this image to other nodes. For these deployment processes,
MPM uses a provisioning virtual local area network (VLAN) that it automatically creates inside the
chassis. It transparently assigns Internet Protocol (IP) settings to nodes on this VLAN, moves the
nodes’ ports to the VLAN (if the switch assigns them to a different one), and initiates the network
boot. MPM also automatically moves the node back to the correct VLAN after the deployment,
making the process entirely transparent.
In addition to providing provisioning, MPM shows cartridge node health status. It also allows
administrators to set up basic switch settings such as splitting a Quad Small Form-factor Pluggable
Plus (QSFP+) port into four 10Gbps ports or creating VLANs and assigning ports to them.

Managing the solution with HPE Insight CMU

Figure 9-7 Managing the solution with HPE Insight CMU

HPE Insight CMU can manage the lifecycle for most HPE ProLiant Gen8 and Gen9 servers, including
HPE ProLiant XL servers in Apollo chassis and HPE Moonshot cartridges. Insight CMU makes it
easier to deploy a high-performance computing (HPC), big data, or other hyperscale solution, as well
as to monitor and maintain it (see Figure 9-7).

Preparing to provision and manage the cluster with HPE Insight CMU

Figure 9-8 Preparing to provision and manage the cluster with HPE Insight CMU

As shown in Figure 9-8, HPE Insight CMU connects to the cluster servers on two networks: the
management, or iLO, network and a network for administrating the cluster. The second network can
be the fabric used to interconnect nodes in the cluster. You could also propose additional adapters for
the servers and set up a separate cluster administration network. HPE Insight CMU should also
connect to a site network that allows managers to access it.
To prepare the servers to be provisioned by HPE Insight CMU, you must plan the following settings
(for Moonshot solutions, the iLO settings are configured on the iLO CM module):
• Ensure that the servers have an iLO connection to the data center network, whether through their
chassis iLO port or through the iLO port of the HPE APM connected to their chassis.
• Assign each server a static iLO IP address on the same subnet. It is a best practice to assign a range
of consecutive addresses to servers in the same rack.
• Assign each server the same iLO username and password.
• Ensure that each server is connected to the administration network (which could be the cluster
interconnect) and has an IP address on this network.
• Assign each server a static IP address on the same subnet. It is a best practice to assign a range of
consecutive addresses to servers in the same rack.
• Make sure that neither the administration network nor the iLO network has a DHCP server or a
PXE server. Insight CMU will provide these services.
• Use the storage controller to configure a logical drive, which the OS will use after the cloning
process. Create the same logical drive on each compute node. Organizations can use any redundant
array of independent disks (RAID) level if they have multiple drives, but if they only have one
physical drive, they must configure it as RAID 0.
• Make sure that the BIOS settings support the Insight CMU clone process.
– Usually, the default settings work because booting from the network on the iLO adapter has top
priority. In addition, if another boot method has a higher priority, CMU can send a boot next
PXE command.
– Set the Virtual Serial Port (VSP) to COM1, which allows viewing the boot process through the
VSP.

Note that organizations can deploy a redundant Insight CMU appliance. Both appliances must connect
to the cluster servers (iLO and administration networks) as well as to the site network. They each have
their own IP addresses on these networks, but they present a third virtual IP address on each network,
which is the address at which they are contacted. The appliances must also connect to shared storage,
which hosts the Insight CMU directory. Finally, organizations must plan to install a Linux high-
availability solution on the appliances that host the two Insight CMUs.

Provisioning the solution with HPE Insight CMU


Figure 9-9 Provisioning the solution with HPE Insight CMU

After organizations set up the solution as described in the previous section, HPE Insight CMU
discovers all of the servers. Administrators can then select one server and install an image on it. This
server becomes the golden node. Administrators can assign this node to a logical group and easily
clone its image to all other servers assigned to the group; the clone executes on each server in
parallel. Administrators can set up different images for golden nodes assigned to other logical
groups. This simple process (represented in Figure 9-9) drastically reduces provisioning time; after
the clone begins, Insight CMU can provision a thousand servers in less than half an hour.

Controlling the solution with HPE Insight CMU

Figure 9-10 Controlling the solution with HPE Insight CMU

In addition to a CLI, HPE Insight CMU provides a GUI, shown in Figure 9-10. From the GUI,
administrators can control the connected servers. Administrators can copy files to servers. They can
control the servers’ state, shutting them down, booting them, and powering them on and off. For more
nuanced control, administrators can launch a Remote Console to servers, as well as launch SSH
connections. They can even open a connection to multiple servers at once, simultaneously
broadcasting commands from one window to all windows, which greatly speeds management tasks.
Among other settings, they can control the servers’ BIOS settings remotely.

HPE Insight CMU provides firmware audits. When it detects servers with noncompliant firmware,
administrators can initiate the update from Insight CMU.

Monitoring the solution with HPE Insight CMU

Figure 9-11 Monitoring the solution with HPE Insight CMU

HPE Insight CMU monitors servers in the hyperscale solution using a lightweight and efficient
monitoring client, shown in Figure 9-11. Administrators can deploy this monitoring client to the
golden node before cloning the image to other servers for a simple and efficient deployment.
Administrators can then monitor the health of nodes across the hyperscale solution from the Insight
CMU GUI. The GUI presents the entire system at a glance. Administrators can also drill down to
details about a node and assess its performance with metrics such as CPU, graphics processing unit
(GPU), memory, disk input/output (I/O), and network I/O usage.

Time views show usage over time in graphical 3D format. Adaptive stacking allows administrators to
monitor longer periods of time without sacrificing the 5-second granularity. The first 24 rings in the
view each depict 5 seconds, for 2 minutes total. As new data accumulates, the older data slides into an
intermediate ring that combines five rings into one, which depicts 30 seconds of data. The view shows
80 of these rings for 40 minutes of data.

Administrators can tag server nodes with colors in the view so that they can easily pick out the
servers that interest them.

Note that Insight CMU helps administrators to monitor and manage a cluster infrastructure, not the
application that runs on the infrastructure itself. Companies can use it to manage any type of
hyperscale deployment, but it can be especially useful for HPC clusters. In addition, it streamlines
operations by integrating with common HPC management tools, as shown in Table 9-1.
Table 9-1 HPC management solutions with which HPE Insight CMU integrates
Tool Description

Adaptive Moab A workload management solution that automates scheduling and managing HPC jobs on HPC clusters based on
company policies

Altair PBS Professional A workload management solution that automates scheduling and managing HPC jobs on HPC
clusters based on company policies

Mellanox UFM A solution for monitoring, managing, and optimizing the cluster interconnect fabric

UNIVA Grid Engine A solution for scheduling batch jobs on HPC clusters

ScaleMP vSMP A solution that aggregates multiple x86 servers into a virtual entity from the administration viewpoint

Ganglia A distributed monitoring solution for HPCs

HPE REST API


This section explains the benefits of the HPE REST API and provides general guidelines for scripting
to this API.

HPE REST API

Figure 9-12 HPE REST API

Representational State Transfer (REST) is a web service architecture that allows clients to use basic
HTTP commands to perform create, read, update, and delete (CRUD) operations on resources. When an
application provides a REST API, it is called a RESTful application. Developers can use their favorite
scripting or programming language to develop a client or a user interface to issue HTTP commands
and manipulate resources. They can also automate tasks with these scripts. Because REST APIs
provide a simple, stateless, and scalable approach to automating, they are common to two-thirds of
today’s top web environments, and customers’ IT staff should be quite familiar with developing to
them.
HPE has developed the HPE REST API, illustrated in Figure 9-12, as a common interface for
managing its ProLiant servers. Companies can develop scripts to automate tasks such as inventorying,
updating BIOS settings, checking temperature, and much more, helping them take a step toward a
software-defined data center.

Systems that host the REST API

Figure 9-13 Systems that host the REST API

HPE ProLiant Gen8 and Gen9 servers that support iLO 4 firmware version 2.0 support the HPE REST
API, as does the HPE Moonshot iLO CM. Clients can make calls to this API to check and change
settings on this system, as shown in Figure 9-13. The REST API replaces the former iLO Extensible
Markup Language (XML) API (RIBCL) and Intelligent Platform Management Interface (IPMI)
interfaces, enabling control over more features, simpler scripts, and greater scalability.

HPE management solutions such as HPE OneView and HPE Helion CloudSystem 9.x use the REST
API to interact with controlled servers.

Companies can also develop their own clients using a scripting language such as Python, which can
use the cURL library to send and receive the HTTPS traffic on which the REST interactions rely.
HTTPS, rather than HTTP, is required. The client can be programmed to ignore the certificate
presented by the server hosting the API (cURL uses the --insecure flag); however, companies should
not typically do so in a production environment.

HPE OneView currently does not support the servers that are the focus of this ebook. However, some
HPE ProLiant servers can be controlled by both OneView and the REST API. REST API clients
should not alter settings that are controlled by OneView on these servers; instead, they should interact
with the REST API hosted by OneView for these settings. As you see, HPE OneView can use REST
calls to manage systems, and it also hosts a REST API itself with which customer solutions can
interact.

Companies that are not interested in developing their own client, but want to take advantage of the
HPE REST API can use the HPE RESTful API Tool. This tool provides a CLI from which
administrators can manage systems that support the HPE REST API.
REST API operational model

Figure 9-14 REST API operational model

You will now look at how the REST API works in a bit more detail. The REST API defines resources,
each of which is a URL that a client can access. For example, the iLO 4 REST API makes a server’s
Secure Boot settings accessible at rest/v1/Systems/{item}/SecureBoot ({item} indicates the ID for the
server). A client can send an HTTP command to this URL just as a web browser can browse to a
website.

The table in Figure 9-14 indicates HTTP commands and how they are used with REST. A GET
command reads a resource. The API returns information in the form of a JavaScript Object
Notation (JSON) record, which works well with the scripting languages that developers most often
use. For example, a Python script could easily take this record and input its values into a dictionary. A
POST command creates a resource. One of the most common commands that scripts use is PATCH,
which updates one or more properties of a resource without affecting other properties in the
resource. Finally, DELETE removes a resource.

The resource type defines properties for the resource. Each property consists of a key and a value.
For example, one of the properties of the HPSecureBoot resource type is SecureBootEnable, which
defines the state for this setting. The resource type defines valid values for this key. In this case, valid
values include “true” or “false” (in other words, enabled or not). Other properties might use string
values.

The REST API also defines valid commands for each property per resource. For example, some
resources have read-only properties, for which the API responds only to GET commands. Other resources
permit PATCH commands for a property, enabling scripts to make calls to change the property.

Figure 9-14 provides an example for making a call to the REST API. A client sends a PATCH
command to /rest/v1/Systems/{item}/SecureBoot. The call includes the SecureBootEnable key and
sets the key to true—the client has enabled the Secure Boot setting.
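
As a concrete sketch of such a call, the following cURL command issues the same PATCH over HTTPS. The iLO address, credentials, and the item ID of 1 are placeholders, --insecure is shown only because lab systems often present self-signed certificates, and the URL is written out directly here only for brevity; as the next section explains, production scripts should crawl to the resource instead.

curl --insecure -u admin:password -X PATCH -H "Content-Type: application/json" -d '{"SecureBootEnable": true}' https://<ilo-address>/rest/v1/Systems/1/SecureBoot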

HATEOAS model
Figure 9-15 HATEOAS model

To provide greater flexibility, the HPE REST API uses a Hypermedia as the Engine of Application
State (HATEOAS) model. This model has clients crawl to resources using links that are embedded
within other resources. For example, a script for enabling Secure Boot should not specify the
/rest/v1/Systems/{item}/SecureBoot URL. Instead, the script should begin by getting the
rest/v1/Systems resource, which includes links to other resources. The script would then search
through the links iteratively—that is, “crawl” to the desired SecureBoot resource. (Note that links do
not follow a strict tree pattern; a resource might link to another resource in another part of the tree.
HPE recommends that scripts add visited links to a dictionary to prevent infinite cycling through
links.)

The HATEOAS approach allows the same script to work for different types of servers. For example,
a ProLiant DL server has just one node, while an HPE Moonshot chassis supports many nodes.
Therefore, the Moonshot iLO CM REST API adds a layer to specify the node and places the BIOS
settings at a lower level than a single-node server. Because the script crawls to these settings, though,
the same script works for both systems. The model also helps future-proof scripts so that they work
even if the API slightly changes resource locations.

Scripts can begin crawling at certain fixed resource URLs, which are listed in the table in Figure 9-15.
Begin at the resource that contains the general type of settings that you want to monitor or to
configure. The rest/v1/Systems URL contains compute-related resources such as BIOS settings. The
rest/v1/chassis URL contains physical resources of all sorts, even for traditional rack servers. Refer
to the table in the figure for other URLs.
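
For example, a client might start a crawl by retrieving the Systems collection and then follow the href links embedded in the returned JSON (the address and credentials are placeholders):

curl --insecure -u admin:password https://<ilo-address>/rest/v1/Systems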

Model for changing settings


Figure 9-16 Model for changing settings

Many settings that can be configured through the HPE REST API are associated with two resources,
making it simpler and cleaner for clients to change these settings. As you see in Figure 9-16, the first
resource has a particular URL, and the second resource has the same URL followed by /Settings. For
example, a server ’s boot settings are contained in /rest/v1/Systems/{item}/Bios/Boot and in
/rest/v1/Systems/{item}/Bios/Boot/Settings.

The first resource is read-only. It contains the current settings, as well as a link to the /Settings
resource of the same type. The /Settings resource permits the PATCH command so that clients can
alter the properties. In other words, this second resource holds the pending configuration. When the
server reboots, it applies the pending configurations. Clients can check the results in the read-only
resource’s SettingsResult property.

This model allows clients to change multiple settings without worrying about dependencies and
configuration order. Clients can also query the current setting and the pending setting separately.
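
A minimal sketch of this read-then-patch pattern for the boot settings follows. The address, credentials, and item ID are placeholders, and the property name and value in the PATCH body are purely illustrative; consult the data model reference listed later in this chapter for the real property names.

# Read the current, read-only configuration
curl --insecure -u admin:password https://<ilo-address>/rest/v1/Systems/1/Bios/Boot
# Write the pending configuration, which is applied at the next reboot
curl --insecure -u admin:password -X PATCH -H "Content-Type: application/json" -d '{"<Property>": "<value>"}' https://<ilo-address>/rest/v1/Systems/1/Bios/Boot/Settings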

Authentication

Figure 9-17 Authentication


The HPE REST API only accepts unauthenticated requests to the top level rest/v1 resource, which
simply lists links to the fixed URLs. Calls to any other resource require authentication (see Figure 9-
17); otherwise, the server hosting the API rejects the call. If a client is making a single request, it can
use basic authentication. That is, the REST API call must include an Authorization=Basic header and
the iLO username and password. (The cURL syntax uses -u username:password.)

For most purposes, the client should log in instead of using basic authentication. To log in, the client
sends a POST HTTPS request to rest/v1/sessions, including the iLO username and password. If these
credentials are correct, the REST API then responds with a message that contains an X-AUTH token
and a location for a session. The client must include the X-AUTH token in every call that it makes.
When the client wants to log out, it deletes the URL for its session.
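
A hedged cURL sketch of this login flow follows. The UserName and Password property names follow the iLO sessions convention (confirm them against the data model reference listed later in this chapter), the -i flag is included so that the returned X-Auth-Token and session Location headers are visible, and the address, credentials, and token are placeholders.

curl --insecure -i -X POST -H "Content-Type: application/json" -d '{"UserName": "admin", "Password": "password"}' https://<ilo-address>/rest/v1/Sessions
# Subsequent calls present the returned token instead of basic credentials
curl --insecure -H "X-Auth-Token: <token>" https://<ilo-address>/rest/v1/Systems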

Redfish conformance
Redfish 1.0 is an open, industry-standard REST API sponsored and controlled by the Distributed
Management Task Force (DMTF), an industry-recognized standards body. (See
http://www.dmtf.org/standards/redfish.) Redfish provides a schema for managing heterogeneous
servers in today’s cloud and web-based data center infrastructures, helping organizations to
transform to a software-defined data center.

As of firmware version 2.3, HPE iLO 4 conforms to Redfish. At the same time, HPE continues to
extend the capabilities of the API with features such as BIOS configuration. The Redfish-compliant
resources are available at redfish/v1 as opposed to rest/v1. Currently, the HPE REST API responds to
both URLs, and it returns properties for both the legacy REST API and Redfish API. Clients can
include the Redfish-required OData header to receive just the Redfish properties.

Companies should begin updating their scripts to use Redfish because HPE will eventually phase out
the legacy API.
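
For example, a client can request the Redfish view of the service root by adding the OData-Version header; the address and credentials are placeholders:

curl --insecure -u admin:password -H "OData-Version: 4.0" https://<ilo-address>/redfish/v1/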

Reference to materials for scripting guidelines


HPE provides the following resources to help you script to the HPE REST API:
• Managing HPE Servers Using the HPE RESTful API
http://h20564.www2.hpe.com/hpsc/doc/public/display?docId=c04423967

A User Guide with guidelines for creating scripts


• HPE RESTful API Data Model Reference for iLO 4—Redfish 1.0 Conformance
http://h22208.www2.hpe.com/eginfolib/servers/docs/HPRestfultool/iLo4/data_model_reference.html

A Resource map and resource definitions


• HPE RESTful Interface Tool 1.30 User Guide
http://h20564.www2.hpe.com/hpsc/doc/public/display?docId=c04423965

A User Guide for the HPE RESTful Interface tool


• https://github.com/HewlettPackard/python-proliant-sdk
Python library for the HPE ProLiant iLO REST API
• https://github.com/HewlettPackard/python-hpOneView

Python library scripts for the HPE OneView REST API

You might also want to watch a demonstration about the OneView PowerShell. Access the HPE
Solution Demo Portal by navigating to this link: https://vrp.glb.itcs.hpe.com/SDP/default.aspx

Click the Search The Portal button, and search for “OneView PowerShell.” Then select and watch the
“HP OneView PowerShell automation streamlines IT service delivery” video.

Deploying HPE servers in a cloud


This section explains how companies can take a further step toward the software-defined data center
by deploying their servers in an HPE cloud environment.

Adding HPE ProLiant servers and Moonshot to a cloud


Many of the HPE solutions that you have examined in this ebook are designed for scale-out
applications; companies can easily expand these ultra-dense systems by adding new components such
as new Moonshot cartridges. They can use HPE Helion cloud solutions to make their HPE solution
even more agile and scalable. HPE Helion CloudSystem 9.x can use any HPE servers that support the
required versions of VMware ESXi, Red Hat KVM, or Microsoft Hyper-V. These servers include
HPE Moonshot cartridge nodes as well as many other HPE ProLiant servers.

Cloud computing transforms IT resources such as physical servers, network connections, and storage
drives into abstract compute, network, and storage resources that can be combined into services.
Cloud consumers can then order these services and have them delivered over a network. Common
cloud services include Infrastructure as a Service (IaaS), which provides a VM with accompanying
network connections and storage resources, and Platform as a Service (PaaS), which builds on IaaS to
add tools and applications to the VM. Software as a Service (SaaS) gives users on-demand access to
an application. Consumers can pay for services that they order from a third party—the public cloud
model—or they can order services on demand from their own IT departments—the private cloud
model. Customers can also follow a hybrid cloud model, in which they provide some services in
house with a private cloud and obtain other services from a public cloud provider.

The scale-out model of many of the solutions that you have examined in this ebook makes those
solutions well suited to cloud computing environments, particularly when needs are variable.

For example, perhaps an HPE Moonshot solution provides application sharing services. The
company knows that demand for the services will rise but cannot predict exactly which services will
need additional capacity. Traditionally, IT would need to wait until the needs were known, then plan
the deployment of additional servers, and finally complete the deployment. With a private cloud
solution, cloud architects can set up abstract pools of resources and design services in advance. When
LOB users know what they need, they can instantly order the correct service, which is then
automatically deployed to an appropriate resource.
Deploying HPE servers in a cloud can also help organizations that want to become service providers.
For example, a company could use HPE Moonshot solutions with HPE cloud solutions to provide
application virtualization as a service.

Capabilities of HPE Helion cloud solutions

Figure 9-18 Capabilities of HPE Helion cloud solutions

Based on OpenStack, open source IaaS software, HPE Helion CloudSystem Foundation 9.x is the
basic HPE private cloud solution for IaaS. It includes controllers for transparently deploying cloud
workloads to infrastructure devices that are added as compute, networking, and storage nodes. It also
provides interfaces for managing the cloud, creating services, and ordering services.

HPE Helion Development Platform, based on Cloud Foundry and included with Helion CloudSystem
Foundation 9.x, delivers PaaS. This component allows cloud designers to set up complete platforms
with not only the correct OS but also the correct applications. Helion Development Platform supports
Docker containers, which contain all the files and supporting tools for an application. It also helps to
manage the application lifecycle.

To add HPE Moonshot and other HPE ProLiant servers to Helion CloudSystem Foundation,
organizations simply need to install a supported virtualization platform such as Microsoft Hyper-V
or KVM on the cartridge node and import the hypervisor into Helion CloudSystem Foundation (as
shown in Figure 9-18). The cartridge nodes or servers then become compute nodes. Cloud service
designers can create services for IaaS or PaaS. When cloud consumers order these services, the
proper platform is deployed to a compute node, such as a Moonshot cartridge node, in the form of a
VM.

HPE Helion CloudSystem Enterprise 9.x builds on CloudSystem Foundation to provide a complete
hybrid cloud solution for IaaS and PaaS. In addition to the CloudSystem Foundation capabilities just
described, CloudSystem Enterprise adds these components and capabilities:
• HPE Helion Cloud Service Automation, which
– Manages cloud services and workloads across a hybrid environment, including the Helion
CloudSystem Foundation private cloud and various public clouds
– Provides advanced cloud service design tools that can incorporate workflows
– Publishes cloud services as offerings in catalogs with customizable access controls and pricing
structures
– Provides a Marketplace Portal in which cloud consumers can easily browse catalogs and order
services
• HPE Operations Orchestration (OO) Studio and Central for creating and running workflows

Helion CloudSystem Enterprise can help organizations that need to manage their own mix of private
cloud and public cloud services, as well as organizations that want to become cloud service
providers.

Bare metal servers in the cloud

Figure 9-19 Bare metal servers in the cloud

As mentioned earlier, HPE Helion CloudSystem solutions currently support deploying virtualized
workloads to HPE Moonshot cartridge nodes and other HPE ProLiant servers that are set up as
hypervisors. HPE CloudSystem Enterprise 9.x—when integrated with HPE OneView and Insight
Control server provisioning (ICsp)—also supports provisioning physical servers with their OSs and
deploying workloads to such servers (see Figure 9-19). When HPE OneView and ICsp support HPE
Moonshot cartridges, these capabilities will extend to the HPE Moonshot solutions.
Chapter 9—Activity 2
In this activity, you will draw on what you have learned about managing HPE servers in this chapter,
earlier chapters, and prerequisite courses. Read each scenario below. Then select the solution or
solutions that meet the customer’s needs. You can select more than one letter if the scenario calls for
multiple solutions.

Record your choices and then refer to Appendix B: Answers to Activities to check your answers.

Solution choices
a. HPE Moonshot Provisioning Manager (MPM)
b. HPE mRCA
c. HPE Insight Cluster Management Utility (CMU)
d. HPE Helion CloudSystem Enterprise
e. HPE APM
f. HPE Superdome Onboard Administrator (SD OA)
g. HPE Smart Update Manager

Scenarios
1. You are proposing one HPE Moonshot Chassis to a customer who needs an application
virtualization solution. The customer needs a quick way to provision cartridges with Windows
Server 2012 R2, the XenApp Desktop Virtual Agent (DVA), and other supporting tools. The
customer does not have another solution for this provisioning process and wants you to
provide it.
2. You are proposing five HPE Apollo 6000 chassis to a customer. The customer is focused on
simplifying operations for the small IT team and wants a solution that allows admins to
monitor hardware components and set power policies for servers in all five chassis at once.
3. You are proposing an HPE Integrity Superdome X solution to a customer. The customer needs
an easy-to-use management solution that allows admins to create nPartitions and to set up
notifications in cases of component errors.
4. You are proposing three HPE Moonshot Chassis to a customer. The customer wants to set up a
self-service solution for deploying web server VMs to the cartridges. Your solution must also
help the customer provision the cartridges with the hypervisor.
5. A customer has a data center with HPE Integrity Superdome X and HPE Apollo 6000 Systems.
The customer needs a single solution for patching and updating the firmware on all the systems.
6. You are proposing three HPE Moonshot Chassis to a customer. The customer server
administrators need a way to troubleshoot and debug a cartridge that is not functioning
correctly.
7. You are proposing 12 HPE Moonshot Chassis and 6 HPE Apollo 4200s for a big data and
analytics solution. The customer wants to get the solution up and running as quickly as possible.
The customer also wants a way to monitor the Moonshot cartridge and Apollo XL server
performance and resource utilization.
Summary
In this chapter, you have reviewed the wide range of HPE management solutions that help
organizations to simplify and automate management of their HPE server infrastructure from the rack-
level to the solution-level.

Learning check
Review what you have learned by answering these questions. Then check your answers in Appendix
A: Answers to Learning Checks.
1. Which solution enables organizations to monitor the temperature for HPE scale-out servers at
the rack-level?
a. HPE ICsp
b. HPE APM
c. HPE mRCA
d. HPE Moonshot iLO CM

2. What advantage does the HATEOAS model for the HPE REST API provide?
a. Scripts remain valid for different types of systems, including future ones.
b. Developers have a list of precise URLs for each resource that they need to contact.
c. Users can authenticate securely by submitting a hash value for their password.
d. The client does not need to trust the certificate on the system hosting the REST API.

3. You have connected an HPE Apollo 6000 chassis to an HPE APM. How should you handle the
iLO ports on the Apollo 6000 chassis?
a. Connect both iLO ports in the same VLAN to which APM’s Ethernet port is connected.
b. Connect one and only one iLO port in the same VLAN to which APM’s Ethernet port is connected.
c. Avoid connecting either iLO port to the data center network.
d. Connect one iLO port to APM and the other port to another HPE Apollo chassis.

For answers, see Chapter 9 in Appendix A.


Chapter 10 Working with Customer Business
Financials

EXAM OBJECTIVES
• Demonstrate business acumen through an ability to analyze financial statements
• Define basic financial terms used when talking with a customer’s executive officers
• Calculate key performance indicators (KPIs) to analyze a customer’s financial health and
understand industry and company trends
• Use HPE tools to analyze a company’s financial position

Assumed knowledge
Before reading this chapter, you should have a basic understanding of the following:
• Return on investment (or ROI)
• Common financial concerns companies face

Overview of financial statements


You might have heard of financial metrics such as net cash flow, return on investment, or earnings
per share. These metrics tell a story in numbers about the company’s cash flow or financial health.
Each financial metric provides different information about a company and reveals a characteristic of
the bigger picture that might not be apparent from reviewing individual financial figures. It is also
important to evaluate these metrics both over time, including economic downturns, and relative to
competitors.

Most enterprise companies around the world publish some sort of financial statements. Publicly
owned companies in the United States are required to publish the following three primary financial
statements annually and quarterly:
• Income statement—A summary of the company’s performance over a given period. It itemizes the
revenues and expenses that led to the profitability of the company during that period, expressed as
net income or loss.
• Balance sheet—A summary of the company’s financial position on a specific date, usually the last
day of a year-end or quarter-end period. It itemizes what a company owns (assets) and how much it
paid for them, what the company owes (liabilities), and the net result of assets minus liabilities
(equity or retained earnings).
• Statement of cash flows—A summary of the company’s change in cash over a period. It is usually
detailed in three main sections:
– Operating activities (sales of goods)
– Investing activities (purchase or sale of fixed assets)
– Financing activities (stock or bond borrowings or retirement)

These three financial statements combine to provide the critical set of financial information required
to evaluate and manage a company’s business. These documents are published in several places:
• The company’s website, usually under an investor relations tab
• Various geography-specific government or reporting entities:
– USA: Securities and Exchange Commission (SEC) website—http://www.sec.gov/
– International: Bureau van Dijk—http://www.bvdinfo.com/en-gb/home
– UK: Companies House—http://www.companieshouse.gov.uk/
– China: China Company Research Services—http://www.ccrs.info/about.asp
– Japan: EDINET—http://disclosure.edinet-fsa.go.jp/
• Yahoo finance and other financial sites
• The HPE Sales Information Gateway (SIG)

Financial terms and concepts


Certain terms and key performance indicators (KPIs) are regularly used when working with financial statements. KPIs are not used in a vacuum; they are most valuable when they are evaluated over time and compared to other organizations in the customer's industry. Knowing the definition of these KPIs is essential to extracting the information that is most helpful to you.

Income statement metrics


• Return on investment (ROI)—A numerical representation (expressed as a percentage) of the
earning power of a company’s assets. ROI is calculated as the ratio of the company’s net income to
its average equity. ROI can be calculated for the entire company or a specific project. This ratio
can be compared to the company’s cost of capital to determine if a company or a project is
financially viable.
• Operating expenses (OPEX)—Expenses incurred to perform the company’s daily activities that
are not directly tied to the production of goods for sale. Operating expenses are typically broken
out between selling expenses and general and administrative (G&A) expenses. These expenses can
usually be found on a company’s income statement.
• Gross profit (GP)—Gross profit is calculated as the company’s revenue minus its cost of goods
sold (COGS). This is a measure of how effectively a company uses its supplies and labor in the
production process. This number is usually displayed on a company’s income statement.
• Gross margin—A way of expressing gross profit as a percentage. This is useful when comparing
one company to another. It is calculated as gross profit/sales revenue.
• Operating income (also called operating profit)—Income/profit realized from a company’s
primary business operations. It is less than gross profit because it includes additional expenses,
such as corporate overhead, other nonproduction costs, and depreciation. Operating income is a
synonym for earnings before interest and taxes (EBIT).
• Operating margin—A way of expressing operating profit as a percentage. This is useful when comparing one company to another. It is calculated as operating income/sales revenue (see the worked example following this list).
• Direct expenses—Accountants split a company’s costs into two categories: direct and indirect.
Direct expenses are those that vary with changes in sales volumes. Direct expenses are primarily
included in the cost of goods sold section on the income statement. For example, a manufacturing
company’s direct expenses include
– Raw materials used to make final product for sale
– Shipping costs to transport goods to and from the factory
– Labor costs to produce goods
– Packing and other finishing costs
• Indirect expenses—There are many more indirect expense categories than direct expense categories. These are generally called overhead and do not generally vary with changes in sales volume. Examples of indirect expenses are rent (both factory and headquarters), salaried compensation, administrative hourly wages, depreciation and amortization, and research and development. Indirect costs are included in the operating expense section on the income statement.
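To make the margin definitions above more concrete, the following short Python sketch works through the arithmetic with purely hypothetical figures; the revenue, cost of goods sold, and operating expense values are invented for illustration only and do not come from any real company.

# Hypothetical income statement figures (in millions), for illustration only
revenue = 500.0
cogs = 300.0                 # cost of goods sold (direct expenses)
operating_expenses = 120.0   # indirect expenses (overhead, G&A, selling, R&D)

gross_profit = revenue - cogs                          # 200.0
gross_margin = gross_profit / revenue                  # 0.40, or 40%
operating_income = gross_profit - operating_expenses   # 80.0 (EBIT)
operating_margin = operating_income / revenue          # 0.16, or 16%

print(f"Gross margin: {gross_margin:.0%}, Operating margin: {operating_margin:.0%}")

Notice that the same gross profit can produce very different operating margins depending on how heavy the company's indirect (overhead) costs are.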

Balance sheet metrics


• Net book value (NBV)—The value of an asset carried on a company’s balance sheet. It is the cost
of the asset less the total of accumulated depreciation on the asset.
• Fair market value (FMV)—As differentiated from NBV, FMV is the value of a company’s asset in
the marketplace given a reasonably knowledgeable buyer and seller.
• Current ratio—This is one of the most widely used tests of financial strength and is calculated by
dividing current assets by current liabilities. This ratio is used to determine whether a business is
likely to be able to pay its bills. A minimum acceptable ratio would be 1:1; otherwise the company
would not be expected to pay its bills on time. A ratio of 2:1 is much more acceptable, and the
higher, the better (see the worked example following this list).
• Quick ratio—This is sometimes called the acid test ratio because it concentrates on only the more
liquid assets of a business. It is calculated by dividing the sum of cash and receivables by current
liabilities. It excludes inventories or any other current asset that might have questionable liquidity.
Depending on the company’s history for collecting receivables, a satisfactory ratio is 1:1.
• Working capital—Bankers especially watch this calculation very closely because it deals more
with cash flow than just a simple ratio. Working capital equals current assets minus current
liabilities.
• Inventory turnover ratio—Not every business has an inventory that needs to be considered. This
ratio tells you if inventory is turning over fast enough and is calculated by dividing net sales by the
average inventory.
• Leverage ratio—This is another of the analyses used by bankers to determine if a business is
creditworthy. Basically, it shows the extent to which the business relies on debt to keep operating.
This ratio is calculated by dividing total liabilities by net worth (total assets minus total liabilities).
The higher the ratio, the more risky it becomes to extend credit to the business. This is often the
calculation a supplier to the business will make before extending credit.
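The liquidity and leverage ratios defined above follow the same pattern. The Python sketch below uses hypothetical balance sheet figures, chosen only to show the arithmetic and not taken from any real company.

# Hypothetical balance sheet figures (in millions), for illustration only
cash = 50.0
receivables = 70.0
inventory = 80.0
current_assets = cash + receivables + inventory       # 200.0
current_liabilities = 100.0
total_assets = 600.0
total_liabilities = 350.0

current_ratio = current_assets / current_liabilities          # 2.0, a healthy 2:1
quick_ratio = (cash + receivables) / current_liabilities      # 1.2, excludes inventory
working_capital = current_assets - current_liabilities        # 100.0
net_worth = total_assets - total_liabilities                  # 250.0
leverage_ratio = total_liabilities / net_worth                # 1.4

print(current_ratio, quick_ratio, working_capital, leverage_ratio)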

Cash flow statement and combination metrics


• Capital expenditures (CAPEX)—Cost to acquire or upgrade physical assets of the company.
These include property, industrial plants, production equipment, and IT infrastructure. The concept
is that these expenses provide benefit for more than just the current accounting period and need to
be included as an asset on the balance sheet. Total CAPEX (or capital spend) for any given period
can usually be found on the company’s statement of cash flows.
• Earnings before interest, taxes, depreciation, and amortization (EBITDA)—A common metric
used to analyze a company’s profitability. It is calculated by adding back depreciation and
amortization expenses to operating income (EBIT), resulting in a larger number. It can also be
calculated as revenue minus expenses (excluding tax, interest, depreciation, and amortization); a brief example follows.
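As a brief, hypothetical illustration of the EBITDA calculation just described (all figures invented for illustration only):

# Hypothetical figures (in millions), for illustration only
operating_income = 80.0                # EBIT
depreciation_and_amortization = 25.0

ebitda = operating_income + depreciation_and_amortization   # 105.0, always at least EBIT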

Project-specific metrics
• Total cost of ownership (TCO)—Cost of direct capital investment in hardware and software plus
indirect costs such as installation, training, downtime, and licenses. TCO measures the economics
of the IT assets over their useful service life.
Note
HPE provides several whitepapers, videos, TCO calculators, and other resources and tools related to TCO:
http://www.hpe.com/go/TCO

• Internal rate of return (IRR)—A discounted cash flow (DCF) method used to compare a
company’s investment alternatives. It takes the expected cash outflows and inflows over the
project’s life and calculates the rate that reduces the net present value of those cash flows to zero. If
the IRR is greater than the company’s desired rate of return on investment, then the project is
desirable.
• Net present value (NPV)—A second DCF method used to compare a company’s investment
alternatives. With this method, a company’s desired rate of return is applied to the expected cash
outflows and inflows and returns a dollar amount that represents the investment’s NPV. A positive
NPV means the investment earns a return greater than the company's desired return; a negative NPV means it earns less than that rate. The project with the highest NPV is the most desirable (a short calculation sketch follows this list).
• Total cost of acquisition (TCA)—Sum of all costs incurred when buying goods, including all
ordering, shipping, carrying (for example, storing or holding), and stockout/shortage.
• Cumulative average growth rate (CAGR)—Average annual rate of increase in the value of
investment compounded over several years.
• Payback period (PB)—The expected time needed to recover a company’s initial investment in a
project.
• Lease rate factor (LRF)—The lease payment as a percent of the total cost of the leased equipment.
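Because NPV, IRR, and payback period come up regularly in TCO/ROI discussions, the following Python sketch shows how they might be computed for a hypothetical project. The cash flows and the 10% desired rate of return are invented for illustration; in a real engagement you would use the customer's own figures or one of the HPE calculators described later in this chapter.

# Hypothetical project cash flows, for illustration only.
# Year 0 is the initial investment (outflow); years 1-3 are expected inflows.
cash_flows = [-100.0, 40.0, 50.0, 60.0]
desired_rate = 0.10   # the company's desired rate of return

# Net present value: discount each cash flow back to today at the desired rate.
npv = sum(cf / (1 + desired_rate) ** year for year, cf in enumerate(cash_flows))
# A positive NPV (about 22.8 here) means the project earns more than the desired 10%.

# Payback period (undiscounted): years until cumulative inflows recover the investment.
cumulative, payback_year = cash_flows[0], None
for year, cf in enumerate(cash_flows[1:], start=1):
    cumulative += cf
    if cumulative >= 0:
        payback_year = year
        break

print(f"NPV at 10%: {npv:.1f}, payback in year {payback_year}")
# IRR is the rate that makes the NPV equal zero; it is normally found numerically
# (for example, with the numpy_financial package) rather than by hand.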

Analyzing financial statements


Financial statements show the profitability of the business and its financial position at a specified date.
This information is helpful to understand how to position an IT solution relative to the customer's
budget.

It is valuable not just to look at these metrics for a specific company at a single point in time but also
to look at the trends over long periods of time, during recessions and expansions. In addition,
comparing a company’s metrics to others in their industry and to industry averages is a useful
exercise. It is helpful in identifying areas of strength and, more importantly, areas for improvement.
• Trend analysis—By comparing current year financial statements to financial statements from
earlier years, you can see how the business has evolved (a brief growth-rate sketch follows this list). This type of analysis allows you to answer questions such as
– At what rate have annual sales been growing (or declining)?
– How has the cost structure of the business changed, both in absolute dollars and as a percentage
of sales?
– Is the company becoming more efficient? You can assess this by looking at the profitability
trends.
– Is the company consistently generating positive cash flows? In which direction are they
trending?
• Industry comparisons—This analysis is a comparison of the customer's performance to others in
their industry. It is helpful to look at individual competitors as well as the industry as a whole. With
this information, you can identify areas where the company is exceeding or lagging its
competition.
• Actual compared to planned performance—Depending on your access to data, it is likely that you
can access the company’s previous financial projections. You can compare the projections to actual
results statements by line item. If the customer had fewer sales than planned, you should know or
find out why. Similarly, you should also understand the reasons for any cost overruns.
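As a minimal illustration of trend analysis, the Python sketch below computes year-over-year growth and the cumulative average growth rate (CAGR) from a hypothetical revenue series; the figures are invented and serve only to show the arithmetic.

# Hypothetical annual revenue (in millions), for illustration only
revenue_by_year = {2013: 400.0, 2014: 440.0, 2015: 462.0, 2016: 485.0}

years = sorted(revenue_by_year)
for prior, current in zip(years, years[1:]):
    growth = revenue_by_year[current] / revenue_by_year[prior] - 1
    print(f"{current}: {growth:.1%} year-over-year revenue growth")

# CAGR over the whole period
periods = years[-1] - years[0]
cagr = (revenue_by_year[years[-1]] / revenue_by_year[years[0]]) ** (1 / periods) - 1
print(f"CAGR {years[0]}-{years[-1]}: {cagr:.1%}")

The same approach applies to cost lines, margins, and cash flows, and the results can then be compared against competitors and industry averages.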

HPE financial tools


HPE provides tools such as these to help you analyze a company’s financial position:
• HPE Converged Infrastructure Business Value Calculator
• HPE Hyperscale Business Value Calculator
• HPE Client Virtualization ROI Calculator

HPE Converged Infrastructure Business Value Calculator


Figure 10-1 HPE Converged Infrastructure Business Value Calculator

Modern IT solutions generate significant cost savings that can affect a customer's bottom line. With
the HPE Converged Infrastructure Business Value Calculator (also known as the HPE Converged
Infrastructure ROI Calculator), you can determine how much a customer can save by moving to an
HPE Converged Infrastructure. In the tool, shown in Figure 10-1, you enter the number of servers,
number of virtual machines per virtual host, and storage capacity. The tool quickly predicts the ROI,
TCO, and payback period for several HPE solutions. It also generates a detailed report filled with
custom recommendations and cost breakdowns. Customers can discover savings on power, cooling,
labor, support, infrastructure, downtime, and more. Because the tool includes all of these factors, it
helps you to demonstrate the value of your proposal in the most compelling way possible. For
example, the cost of the HPE solution might be offset by savings in licensing fees or power costs.

There are many cases where you might want to present a single combined financial benefit view to a
customer when positioning servers, storage, networking, and services. This calculator enables you to
do that, as well as to compare a build-your-own solution to a ConvergedSystems solution.

Customers can use the customer-facing version to see which solution might suit their requirements
based on the building blocks of ConvergedSystems. This is helpful for lead generation. Sales and
presales teams can also help with qualifying the lead and detailed analysis. Currently, the calculator
enables you to compare legacy equipment to new solutions, but as enhancements are added to the tool,
some competitive information might be added.

You can generate a TCO/ROI analysis for your customer by completing these simple steps:
1. Capture the customer's current IT organization and IT environment details and input the data
into the tool.
2. Select the relevant HPE servers, storage, network devices, switches, and services that you are
proposing to the customer to replace their current IT infrastructure.
3. The tool then analyzes the various direct and indirect costs within the customer's IT
environment and creates an estimated cost comparison based on CAPEX/OPEX cash flow
forecasts. You can view the details of each cost item and edit the cost configuration where
possible in the tool.
4. The Financials section presents the financial numbers that show you the cost of doing nothing
versus investing in HPE Converged Infrastructure as a platform for IT applications. It also shows how soon the customer would see a payback and how large that payback would be.
Note
You can access the HPE Converged Infrastructure Business Value Calculator at:
https://roianalyst.alinean.com/ent_02/AutoLogin.do?d=56724514952587761

HPE Hyperscale Business Value Calculator

Figure 10-2 HPE Hyperscale Business Value Calculator

This calculator has been designed to capture the needs of the density-optimized solutions (see Figure
10-2). The calculator covers solutions for ProLiant SL and XL servers.
Note
HPE Moonshot is not included in this calculator; it is handled by the Moonshot Business Value calculator. The HPE Apollo 8000
server line will be added to this calculator in the future.

Currently, there are no customer-facing functions for this calculator. However, there are sales and
presales functions. The sales function has a minimal number of questions and should be used to
qualify a lead. The presales function is designed to enable you to meet with a customer and customize
the analysis to better match the customer's environment.

HPE Client Virtualization ROI Calculator


Figure 10-3 HPE Client Virtualization ROI Calculator

The Client Virtualization ROI Calculator (shown in Figure 10-3) is a detailed tool for comparing a
legacy environment of traditional desktops to a VDI-only or VDI/HDI hybrid environment. It includes
servers, storage, software, networking, and thin clients to produce a comprehensive ROI analysis.

This tool can be used together with a desktop assessment commonly performed during a client
virtualization opportunity.
Note
Eventually, this tool will be retired and moved into the Converged Infrastructure Business Value calculator.
Chapter 10—Activity
You will now return to the scenario that was introduced in Chapter 4, in which the automotive
company needs an HPC solution. You will practice using the HPE Alinean TCO Calculator to
persuade the CFO of the value of your solution. Record your answers and notes as you go through the
tasks. Refer to Appendix B: Answers to Activities to check your answers.

Scenario
Earlier, you designed an HPE Apollo 6000 solution for an automotive manufacturer who needed a
better foundation for its Electronic Design Automation (EDA) application. You began planning a proposal and a presentation of its benefits. You have won over the customer's CEO and CIO, but you still need to strengthen your case with the CFO.

You will now use the HPE Alinean TCO/ROI Calculator to make your presentation even more
compelling. This web-based tool allows you to compare the TCO between HPE hyperscale Apollo
solutions and competing solutions. It demonstrates savings over the life of the solution, making it
easy for you to build a business case that highlights the power and cooling savings delivered by an
HPE Apollo System.

You will also learn how to interpret the results of an Alinean analysis and communicate them to a
CFO.

The publicly available Alinean tool is a scaled-down version of the tool that HPE-certified Server
Master Architects would use when working with customers. This activity uses the more detailed
version of the tool. You require access to the HPE Partner Portal to complete the activity.

Use the HPE TCO/ROI calculator


1. Log into the HPE Partner Portal.
2. Select My Workspace > TCO/ROI Solutions.
3. Click Go (see Figure 10-4).
Figure 10-4 TCO/ROI Solutions page

4. Under Create New Analysis, expand HPE Servers and select HPE Hyperscale Business Value
Calculator v2.3.
5. Enter names for the company and the analysis.
6. Click Create a New Analysis (see Figure 10-5).
Figure 10-5 Create a New Analysis

7. Read through the tutorial if you like.


8. Click the Analysis Selection tab.
9. Select Pre-Sales – Prove Value (see Figure 10-6).

Figure 10-6 Analysis Selection tab

10. Click Proceed to Analysis.


11. Fill in the fields based on the scenario and your plan; refer back to Chapter 4—Activity 1.
(Assume that testing indicated that your plan is adequate.) Use Dedicated Power and Cooling, and set the max kW per rack to 7 kW. Run the analysis for three years (refer to Figure 10-7).
When you specify the number of servers, remember that each ProLiant XL220a Gen8 tray has two
servers. Specify the number of servers, not the number of trays.

Figure 10-7 HPE Hyperscale Business Value Calculator

12. Compare with SuperMicro, which is the competitor that the customer is considering. The SuperMicro SD-5038ML-H8TRF is also a one-processor server, so you should specify the same number of servers as for the HPE solution.
13. Scroll down to the results (see Figure 10-8). Note the high-level comparison, including the
number of racks required, the number of cores provided, the number of Watts consumed, and
the TCO. Begin planning how you will use this information in your presentation to the CFO.

Figure 10-8 Results—Three Year Analysis

14. You can click the Configuration button under the ProLiant XL servers to adjust the plan if you
like (see Figure 10-9). For example, you could change the support pack or the TOR switches,
or you could add HPE Insight CMU to the plan.

Figure 10-9 Configuration tab

15. Click the Assumptions tab to see a breakdown of the assumed costs (see Figure 10-10). You can
adjust these to reflect the customer's situation more closely. For example, you can adjust the
cost of power.

Figure 10-10 Assumptions tab


16. Click the Financial Results tab to review details for the TCO comparison.
17. For which types of costs is the HPE solution more expensive? For which types is it less expensive? Begin to plan how you will discuss the comparison with the CFO.
18. You can clear a check box for any type of cost to remove it from the comparison. Explore
clearing various check boxes. Then select them all again.
19. Click the graphic comparing the TCO for the solutions to enlarge it (see Figure 10-11).

Figure 10-11 TCO graph

20. Use this graphical representation to draft an explanation for the CFO about the difference in
TCO of an HPE Apollo 6000 solution after just three years.
21. The automotive company might be interested in an HPE Financial Services option. Click the
graphic to enlarge it (see Figure 10-12).
Figure 10-12 HPE Financial Services option

22. Use this graph to help you draft an explanation of how this service can help to decrease the
impact on the customer's cash flow for a single year.
23. Click Create a Report.
24. Use the Word template to create the report (see Figure 10-13).
Figure 10-13 Create a Report

25. You can open the report and edit it. In the real world, be sure to edit the report to customize the
results for your customer.

Talk with the CFO


Use the Alinean tool report and the notes from the previous task to answer the following questions.
1. How does the HPE Apollo 6000 solution help to support the company’s environmental
initiatives?
2. Based on what you learned earlier about the benefits of HPE Apollo 6000 and ProLiant XL220a
systems, how will the HPE solution deliver a favorable ROI?
3. Will you recommend the solution to the CFO?
4. Prepare a pitch to convince the CFO of the benefits of this solution.

Summary
Financial metrics tell a story in numbers about a company’s cash flow or financial health. Each
financial metric provides different information about the company and reveals a characteristic of the
bigger picture that might not be apparent from reviewing individual financial figures. It is also valuable to evaluate these metrics over time, including economic downturns, and relative to competitors.
Certain terms are regularly used when working with financial statements. Knowing the definition of
these terms is key to extracting the information most helpful to you when working with a customer.

Financial statements show the profitability of the business and its financial position at a specified date.
This information is helpful for you to understand how to position an IT solution relative to the
customer's budget.

HPE provides tools that help you analyze a company’s financial position, including the following:
• HPE Converged Infrastructure Business Value Calculator
• HPE Hyperscale Business Value Calculator
• HPE Client Virtualization ROI Calculator

Learning check
Review what you have learned by answering these questions. Then check your answers in Appendix
A: Answers to Learning Checks.
1. Which financial term describes the periodic rental payment, expressed as a percentage (or
decimal equivalent) of equipment cost? It is used to calculate payments, given the cost of
equipment (for example, 0.0240 on equipment cost of $10,000 requires a monthly payment of
0.0240 × $10,000 = $240).
a. Internal rate of return (IRR)
b. Net present value (NPV)
c. Lease rate factor (LRF)
d. Cumulative average growth rate (CAGR)

2. Which financial term describes the type of business outlay that is reflected on the company’s
balance sheet as assets? It creates a depreciation expense on the income statement for each year
of the asset’s depreciable life. This depreciation expense lowers reported income (profit),
thereby creating a tax savings for each of these years.
a. Cumulative average growth rate (CAGR)
b. Operating expenditure (OPEX)
c. Net present value (NPV)
d. Capital expenditure (CAPEX)

3. Which type of business outlay addresses spending on predictable, repeatable costs for items or
services that are not registered as assets and that are not depreciated? It impacts reported profit
and taxes on earnings only in the single reporting period it is incurred.
a. Internal rate of return (IRR)
b. Total cost of acquisition (TCA)
c. Operating expenditure (OPEX)
d. Gross Profit (GP)

4. What is the monetary amount by which an asset is valued in business records, a figure not
necessarily identical to the amount the asset could bring on the open market? It could also be
used to designate the sum of the assets on a portfolio or within a company.
a. Net investment value (NIV)
b. Net book value (NBV)
c. Fair market value (FMV)
d. Total cost of acquisition (TCA)

For answers, see Chapter 10 in Appendix A.


Chapter 11 Practice Exam

Introduction
This practice exam is designed to test your readiness for the HPE0-S22 exam.

The HPE0-S22 exam tests candidates' knowledge and skills in architecting advanced HPE server products and solutions. Topics covered in this exam include advanced server architectures and associated technologies, as well as their functions, features, and benefits. Additional topics include analyzing the server market, positioning HPE server solutions to customers, demonstrating server-related business acumen, and explaining how the HPE Transformation Areas relate to HPE server products and solutions.

Exam details

The following are details about the exam:


• Exam ID: HPE0-S22
• Number of items: 60
• Item types: Multiple choice (single response); multiple choice (multiple responses); matching
• Exam time: 90 minutes
• Passing score: 70%

HPE0-S22 testing objectives


The exam is designed to validate that candidates can successfully meet the following objectives. The
percentage next to each of the main objectives indicates how the objective is weighted in the exam.
15% Foundational server architectures and technologies
• Determine optimal processors for specific use cases and operational workloads.
• Determine interconnect (networking, storage) technologies based on customer/solution
requirements.
• Explain the benefits of APIs.

25% Functions, features, and benefits of HPE server products and solutions
• Differentiate and position the HPE server product offerings, architectures, and options.
• Explain the functions and benefits of HPE health and fault technologies.
• Compare and contrast management tools.
• Given a customer environment scenario, recommend and substantiate which HPE management
tools optimize administrative operations.

20% Analyzing the server market and positioning HPE server solutions to customers
• Determine an approach to address customers’ business requirements (TCO, ROI, IRR, NPV, TCA,
CapEx, OpEx, HPE financial services, and so forth).
• Explain how the four HPE Transformation Areas relate to given server solutions.

40% Planning and designing HPE server solutions


• Given a scenario with changed customer requirements, recommend modifications to the
implementation plan.
• Given a customer's storage infrastructure (for example, iSCSI, Fibre, NAS, DAS), determine an
appropriate configuration for server deployment.
• Given a customer's networking infrastructure, determine an appropriate configuration for server
deployment.
• Determine a customer's internal/external storage capacity and performance requirements.
• Given a scenario, determine the customer's IT maturity and recommend next steps.
• Given an anticipated performance bottleneck, determine an appropriate design solution.
Practice exam questions
As you take this practice exam, remember to read all the choices carefully because there might be
more than one correct answer. Answers and explanations are provided at the end of this chapter.
1. A server architect is planning an HPE Apollo 6000 solution for a customer's weather modeling
high performance computing (HPC) application. For the initial design, the architect needs to
select the proper compute tray for the workload. Which question helps the architect determine
whether the ProLiant XL250a could be a better fit than the XL230a?
a. Does the customer’s application support GPU acceleration?
b. Does the application have high memory requirements?
c. Which GPU vendor does the customer prefer?
d. Does the customer have a preference for InfiniBand or Ethernet fabrics?

2. An architect is proposing an HPE BladeSystem solution with Virtual Connect FlexFabric-20/40 F8 Modules. The customer needs an external Fibre Channel (FC) storage solution for the blade
servers. The solution must be simple to deploy and easy to manage, and the customer also
wants to reduce equipment. The architect is proposing an HPE 3PAR StoreServ System.
What should the architect propose for connecting the BladeSystem to the storage?
a. Connecting the Virtual Connect modules to Ethernet switches that support iSCSI and connect to the StoreServ System
b. Adding FC SAN switch modules to the BladeSystem and setting up a SAN to connect to the StoreServ System
c. Directly connecting the Virtual Connect modules to the StoreServ System
d. Adding Virtual Connect 16Gb 24-port Fibre Channel Modules to the BladeSystem and directly connecting them to the
StoreServ System

3. An architect is proposing an HPE Moonshot 1500 System that has 40 m710p cartridges and five
m300 cartridges. The architect is now planning the switch module solution. The customer
requires
– The highest possible bandwidth for the m710p cartridges
– Advanced data center technologies such as TRILL
– LACP NIC bonding on the cartridge ports
Which switch solution should the architect propose?
a. One Moonshot 45Gc switch and one Moonshot 45XGc switch
b. Two Moonshot 45XG switches
c. One Moonshot 45G switch and one Moonshot 45XG switch
d. Two Moonshot 45XGc switches

4. An architect is proposing several HPE c7000 Blade Enclosures that are managed by HPE
OneView. The customer has developed an in-house management solution for inventorying and
tracking assets. The customer wonders how the new HPE solution will fit with its existing
management solution.
What should the architect explain?
a. The customer should program the in-house solution to receive information about the new servers using the HPE OneView
REST API.
b. The customer should import the SNMP templates used by OneView into its in-house management solution.
c. The customer should replace the in-house solution with HPE OneView, which provides all the capabilities that the customer
requires.
d. The customer should program the in-house solution to use SOAP to communicate directly with blade servers.

5. An architect is planning an HPE Moonshot System for a customer who needs a hosted desktop
solution. The hosted desktops are for graphic designers who run a variety of rich media
applications. Decision makers have emphasized that they want the solution that provides the best
performance and user experience.
Which cartridges should the architect propose?
a. m400 cartridges
b. m700 cartridges
c. m710p cartridges
d. m800 cartridges

6. An architect is designing an HPE Apollo 2000 solution for a customer and needs to choose the
chassis. What is one reason to propose the r2800 chassis versus the r2600 chassis?
a. The customer needs a higher density of servers per chassis.
b. The customer needs flexibility in allocating local drives to servers.
c. The customer needs a higher density of local drives per chassis.
d. The customer needs the ability to aggregate server connections.

7. A customer needs a new server solution to host its transactional database and its business
intelligence application, both of which are growing in size. The database is licensed per processor core. Which solution should the architect propose to help both improve performance
and reduce licensing costs?
a. HPE Integrity Superdome X with an nPartition scoped to the size of each application
b. HPE Integrity Superdome X with a VMware ESXi virtual machine to host each application
c. HPE Moonshot with multiple cartridges scaled out to meet the needs of each application
d. HPE Moonshot with a dedicated m710p cartridge per application

8. A customer has HPE BladeSystem and HPE 3PAR StoreServ solutions that are managed by HPE
OneView 2.x. VMware ESXi hosts are deployed on the blade servers, which use StoreServ for
storage volumes. What is one benefit of HPE OneView for vCenter for this customer?
a. Integrated management of the StoreServ solutions from vCenter
b. A OneView Dashboard integrated into vCenter Operations Manager
c. Automated deployment of HPE StoreOnce to StoreServ Systems
d. A self-service portal for deploying cloud workloads to the hosts

9. A customer has been using HPE Virtual Connect Enterprise Manager (VCEM) to manage its
Virtual Connect solutions. The customer has been adding more HPE servers and storage
solutions to its data center and now wants to deploy HPE OneView 2.x to manage all HPE
servers, storage, and VC modules centrally.
What should the architect explain about how to make this change successfully?
a. Administrators should add the VC modules to OneView so IT staff can manage the solutions from both VCEM and OneView,
as they choose.
b. Administrators should remove VCEM and then migrate VC module management to OneView.
c. Administrators should discover VCEM from OneView, which will manage the VC domains through VCEM.
d. Administrators should add a OneView license to VCEM before deploying OneView to ensure that OneView and VCEM can
integrate successfully.

10. An architect is proposing an HPE Apollo 6000 solution to a customer. The solution includes
multiple racks of Apollo chassis. Although the customer does not require advanced monitoring
capabilities at this point, the customer needs to simplify and accelerate the deployment of
images to Apollo servers. The customer also wants a tool for simplifying the maintenance of
servers centrally.
Which HPE solution should the architect propose?
a. HPE Cluster Management Utility (CMU)
b. HPE OneView 2.x
c. HPE Onboard Administrator (OA)
d. HPE Smart Update Manager (SUM)

11. An architect is working with a customer who is considering an HPE server solution. The
customer needs to assess whether it is worthwhile for the company to invest in the solution.
Which value helps the customer compare investment alternatives, taking into consideration the
company’s desired rate of return, as well as each investment’s expected cash inflows and
outflows?
a. Cumulative average growth rate (CAGR)
b. Total cost of ownership (TCO)
c. Net present value (NPV)
d. Lease rate factor (LRF)

12. An online retailer collects a great deal of information about its customers and their purchases
in the form of structured databases, as well as emails, messaging boards, and social media
content. The retailer is looking for ways to become more competitive.
Which transformation area should the architect focus on?
a. Transform to a hybrid infrastructure
b. Protect the digital enterprise
c. Empower the data-driven enterprise
d. Enable employee productivity

13. Refer to Figure 11-1.


Figure 11-1 Exhibit for item 13

The architect was planning to connect HPE Apollo 6000 Management Module iLO ports to the
network as shown. Customer decision makers then indicated that they want a more highly
available design for iLO functions. How should the architect change the design?
a. Remove the links between the chassis and connect each chassis to the management network switch on one iLO link.
b. Remove the links between the chassis and connect each chassis to the management network switch on two iLO links.
c. Connect the second port on the bottom chassis to the management network switch.
d. Connect the second port on the bottom chassis to a different management network switch and make sure that both management
switches are connected on the same VLAN.

14. An architect is proposing several HPE Moonshot Systems for supporting the Cloudera
distribution of Hadoop MapReduce 2 and several HPE Apollo 4200 servers for supporting the
Hadoop Distributed File System (HDFS). The Moonshot Systems use:
– m300 cartridges
– Moonshot-45Gc Switch Modules
When testing the application on the proposed solution, the architect discovers high latency for
disk IO during the shuffle phase.
What should the architect consider to improve performance for the solution?
a. Replacing the 45Gc modules with Moonshot-45XGc Switch Modules
b. Replacing the HPE Apollo 4200 servers with HPE Integrity Superdome X servers
c. Ensuring that the m300 cartridges are using the highest capacity SSDs
d. Adding more DDR4 memory to the m300 cartridges
15. Refer to Figures 11-2, 11-3, and 11-4.

Figure 11-2 Exhibit 1 for item 15

Figure 11-3 Exhibit 2 for item 15


Figure 11-4 Exhibit 3 for item 15

An architect is proposing HPE Apollo 6000 Systems with ProLiant XL220a compute modules for
a customer ’s high performance computing (HPC) application. For each compute module, the
architect plans two nodes, each with:
– One Intel Xeon E3 1200 v3 series processor with four cores at 3.5 GHz
– Two 8GB DIMMs (16 GB total)
– Two 400GB SSDs (800 GB total)
The architect tests the application on the proposed solution and discovers the results shown in the exhibits.
What should the architect consider changing to resolve potential performance issues?
a. Replace the SSDs with higher capacity HDDs.
b. Add another processor to each node.
c. Select processors with more cores.
d. Add more memory capacity to each node.

16. An architect is proposing an HPE Integrity Superdome X System for a customer's business intelligence application. The application needs to have access to block-level storage for data
mining. What should the architect plan to fulfill this requirement?
a. External Network Attached Storage (NAS) on a server such as HPE Apollo 4200
b. External Fibre Channel (FC) storage such as HPE 3PAR StoreServ
c. HDDs local to each blade on the application’s nPartition
d. An SSD storage blade within the application’s nPartition

17. An architect is proposing an HPE Moonshot System with HPE Moonshot-45XGc Switch Modules.
The Moonshot System requires these connections on each uplink module:
– Four 10GbE connections to HPE Apollo 4200 servers in the same rack
– Two 40GbE connections to top of the rack (TOR) switches
Which uplink solution should the architect propose?
a. Moonshot-16SFP+ Uplink Modules with QSFP+ and SFP+ transceivers
b. Moonshot-6SFP+ Uplink Modules with QSFP+ and SFP+ transceivers
c. Moonshot-4QSFP+ Uplink Modules with QSFP+ transceivers and DAC splitter cables
d. Moonshot-4QSFP+ Uplink Modules with QSFP+ transceivers, QSFP+/SFP+ adapter kits, and SFP+ adapters

18. Match each member of the HPE Apollo 4000 Family with a typical situation for proposing it.
a. Apollo 4200
b. Apollo 4510
c. Apollo 4530
___ The customer requires a solution that provides both compute and storage for a complex data analytics application.
___ The customer needs a server for hosting its Scality object storage solution.
___ The customer is just getting started with big data analytics and needs an entry-level solution.

19. A customer requires a solution for supporting the Cloudera distribution of Spark. The architect is
proposing:
– Three HPE Moonshot Systems with m710p cartridges for supporting Spark
– Five HPE Apollo 4200 servers for supporting the Hadoop Distributed File System (HDFS)
Based on the typical requirements for the application, what should the architect consider changing
about the proposal?
a. Adding more Moonshot Systems
b. Adding more HPE Apollo 4200 servers
c. Replacing the m710p cartridges with m300 cartridges
d. Replacing the Apollo 4200 servers with Apollo 4530 servers

20. A customer needs a solution for a SAP HANA database. Which solution should the architect
propose?
a. HPE Moonshot System
b. HPE Apollo 4510
c. HPE Apollo 4530
d. HPE Integrity Superdome X

Practice exam answers


This section provides answers and explanations for the practice exam. If you need to review a topic in
more detail, see the provided reference.
1. A is correct. A primary distinguishing feature of the XL250a, as opposed to the XL230a, is its
support for GPU or coprocessor accelerators. Therefore, architects should ask about whether
the customer's application supports GPU acceleration early in the design process.
B is incorrect. Both the XL230a and XL250a modules support the same maximum memory, so
this question does not help the architect select the right module for the workload.
C is incorrect. This might be an important question later, but the XL250a supports GPUs from
several vendors, whereas the XL230a does not support any GPUs at all. This question is less
important at this point.
D is incorrect. Both of the modules in question support InfiniBand or Ethernet adapters, so this
question does not help the architect choose between the modules.
To review topics related to this question, refer to “HPE Apollo 2000 and 6000 architecture” in
Chapter 4.
2. C is correct. Virtual Connect FlexFabric-20/40 F8 Modules support direct attach to 3PAR
StoreServ Systems. On the downlink side, servers use FlexFabric adapters for both traditional
data and storage traffic. This option eliminates a great deal of SAN equipment and is simple to
set up.
A is incorrect. The customer requires FC storage, not iSCSI storage.
B is incorrect. This option would require FC adapters on the blade servers and additional SAN
equipment. It does not meet the customer's requirements for a simple solution.
D is incorrect. This option would require FC adapters on the blade servers, so it does not meet
the requirements to eliminate as much equipment as possible.
This item tests whether you have the required knowledge from prerequisite training, including
Architecting HP Server Solutions (ASE-level training). You should review features of
BladeSystems and Virtual Connect modules, as well as other topics, to prepare for the exam.
3. D is correct. The customer wants the highest bandwidth possible for the m710p cartridges, so
the architect must propose 45XG or 45XGc switches, which support 10GbE. (These switches
will also support the m300 cartridges, although the m300 cartridges will only receive 1GbE
connectivity.) The 45XGc is the correct choice because it supports the advanced technologies.
Finally, the customer requires two of these switches to support both of each cartridge's ports, which will use NIC bonding. (The switches can use their IRF technology to support LACP NIC
bonding.)
A and C are incorrect. Both switch modules must be the same type.
B is incorrect. The 45XG switches do not support the advanced technologies that the customer
requires.
To review topics related to this question, refer to “HPE Moonshot networking” in Chapter 6.
4. A is correct. The HPE REST API accepts calls from applications and helps customers automate
server monitoring, management, and maintenance using applications of their choice. Servers’
iLO engines support the REST API, and so does HPE OneView. In this case, the servers are
managed by HPE OneView, so the application should use the OneView REST API. HPE offers a
number of tools, such as a Python library, for helping customers to script to this API.
B is incorrect. Importing SNMP templates won’t help the in-house application integrate
inventory information from OneView.
C is incorrect. The customer wants to leverage the in-house application, and the architect should
explain to the customer how HPE helps.
D is incorrect. The OneView REST API offers the simplest way to integrate inventory and
server status information into the in-house application.
To review topics related to this question, refer to “HPE REST API” in Chapter 9.
5. C is correct. The m710p cartridge has a powerful GPU and is designed to provide high
performance for rich media hosted desktop solutions.
A and D are incorrect because those modules do not provide GPUs and are not recommended
for hosted desktop solutions.
B is incorrect. The m700 cartridges are suitable for some hosted desktop solutions and have the advantage of providing a higher density (four desktops per cartridge). However, this customer
has indicated that performance is the most important consideration. Therefore, the m710p is the
better choice.
To review topics related to this question, refer to “Mobile workspace” in Chapter 7.
6. B is correct. The r2600 allocates a fixed number of local drives to each server, whereas the r2800 chassis allows flexible mapping of any number of drives to any server.
A is incorrect. Both chassis support the same number of servers.
C is incorrect. Both chassis support the same total number of drives.
D is incorrect. Neither chassis aggregates server connections.
To review topics related to this question, refer to “HPE Apollo 2000 and 6000 architecture” in
Chapter 4.
7. A is correct. Integrity Superdome X solutions provide the scale-up approach that transactional
databases require. The nPartition technology allows customers to hard partition the system. The
customer saves money by licensing only for the cores in the nPartition.
B is incorrect. Database vendors such as Oracle and Microsoft do not accept soft partitioning,
such as VMware ESXi. When the system uses soft partitioning, the database must still be
licensed for all cores on the system. This is one benefit of nPartitioning that architects should
communicate to the customer.
C and D are incorrect. A scale-out solution such as Moonshot does not meet the needs for
transactional databases.
To review topics related to this question, refer to “HPE Integrity Superdome X solution
architecture” in Chapter 8 and “Architecture for data-driven organizations” in Chapter 9.
8. B is correct. HPE OneView for vCenter integrates a OneView Dashboard into vCenter
Operations Manager, helping to improve resource monitoring and troubleshooting.
A and C are incorrect. HPE OneView for vCenter does not integrate storage management or
deployment of StoreOnce into vCenter.
D is incorrect. For a self-service portal for deploying cloud workloads, the customer requires
an HPE Helion CloudSystem Enterprise solution.
This item tests whether you have the required knowledge from prerequisite training, including
Architecting HP Server Solutions (ASE-level training). You shou