
89 Fifth Avenue, 7th Floor

New York, NY 10003

www.TheEdison.com

@EdisonGroupInc

212.367.7400

White Paper

Cray XC Series Supercomputer Accelerates CVA Performance in Addressing Counterparty Risk
Printed in the United States of America

Copyright 2015 Edison Group, Inc. New York.

Edison Group offers no warranty either expressed or implied on the information contained
herein and shall be held harmless for errors resulting from its use.

All products are trademarks of their respective owners.

First Publication: December 2015

Produced by: Matthew Elkourie, Analyst; Manny Frishberg, Editor; Barry Cohen, Editor-in-Chief
Table of Contents

Executive Summary
Proactively Solving Counterparty Risk with CVA
Computational Requirements for CVA-Type Simulations
Meeting CVA Computing Requirements with the Cray XC Series Supercomputer
    System Design: Built Around Application Performance
    Cray Aries Interconnect
    Cray DataWarp Applications I/O Accelerator
    Cray Ecosystem
    Total Cost of Ownership
Enhancing ROI: Processing Capability Powered by Intel
Risk Workloads on Apache Spark
Conclusion

Executive Summary

Credit Valuation Adjustment (CVA) evaluation activity in financial institutions has grown in both demand and complexity, driving the need for improved infrastructure.

One driver for this is the increase in regulation and reporting requirements since the
credit crisis of 2008. To mitigate the impact, banks must try to decrease the cost of
compliance while using required regulatory calculations to better monitor and manage
risk within the firm. CVA is an example of this: whole portfolio CVA is performed as a compliance mandate, but many firms leverage it not only to manage risk but also to actively trade CVA internally.

Additionally, in only a few short years the management of counterparty credit risk (CCR) has shifted from passive to more active and continuous management requiring CVA. Like other valuation adjustments, active management of CVA needs to be considered at two levels:

- First, every credit derivative deal needs an accurate assessment of CVA to price it fairly.
- Second, management of the CVA of the bank's whole portfolio is critically important in answering questions such as:
  - How volatile is it?
  - How sensitive is it to varying market factors?
  - How can it be hedged?
  - What capital buffers and charges are appropriate?
From a computational perspective, the first consideration is more latency-sensitive, requiring near real-time calculation, while the second is a problem of vast scale. Both present extreme challenges to traditional scale-out grid infrastructures, which are already under pressure from the computing demand of normal risk workloads and CVA. When United States Federal Reserve CCAR stress tests are added, along with CVA sensitivity analyses, the growth in core count can be explosive.

To keep up and stay competitive, firms are thus forced to reevaluate their grid
infrastructure strategies and are investigating high performance computing (HPC)
solutions that can more cost-effectively provide better performance than the traditional
grid solutions. We believe that the Cray XC series supercomputer gives firms an
opportunity to significantly flatten the cost curve of core expansion by reducing TCO
and enabling development agility.

Proactively Solving Counterparty Risk with CVA

Before looking at infrastructure solutions that can best accelerate CVA evaluation
performance, it is useful to review the importance of CVA and how it is used in financial
institutions.

CVA, defined as the market value of counterparty credit risk, was until recently used
primarily as a passive insurance-style evaluation. Typically, large banks would use CVA
on a monthly basis to correct derivatives positions, ensuring counterparty risk was
evaluated. Where firms previously applied credit limits and other measures to cap possible exposure to counterparties, CVA provided a means to further evaluate future positions and more accurately predict and manage such exposure.
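
For context, unilateral CVA is often written in a standard textbook form (a generic formulation, not any particular firm's model), where R is the counterparty recovery rate, EE*(t) the discounted expected exposure at time t, and PD(t) the counterparty's cumulative default probability:

    CVA = (1 - R) \int_0^T EE^{*}(t)\, dPD(t) \approx (1 - R) \sum_{i=1}^{n} EE^{*}(t_i) \left[ PD(t_i) - PD(t_{i-1}) \right]

The expected exposure term is the quantity that the Monte Carlo simulations discussed later in this paper are used to estimate.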

Markets, and the credit partners and customers of financial institutions, are increasingly accessible globally. Recent shocks to the credit market caused by unexpected crises have turned active management of CVA from a nice-to-have into a requirement in business transactions, in order to minimize risk.

Where CVA previously ran on a monthly schedule, under active management strategies CVA calculations are now often performed at daily, intraday, and real-time (sub-second) intervals. CVA is most efficiently managed by centralizing the function for optimal netting and co-simulation. A centralized CVA desk applies per-deal transfer pricing to derivatives transactions and can manage its own P&L. As a result, firms can more easily manage the whole portfolio, its volatility, and the correlation between market risk factors.

The scale of computing required for traditional derivative valuation is far less than that required for CVA, which in turn is far less than that required for sensitivity analysis of the whole portfolio. The Basel III regulatory framework already incorporates CVA by establishing a minimum capital charge to capture the potential mark-to-market losses a bank faces from deterioration in a counterparty's creditworthiness. The Basel Committee is looking to expand this to take into account the variability arising from daily changes in market risk factors, further increasing the CVA workload.

Computational Requirements for CVA-Type Simulations

Monte Carlo simulation, a critical component of CVA calculation, provides a means for pricing and measuring counterparty risk as part of a CCR strategy. The Monte Carlo method is a computerized mathematical technique that allows institutions to account for
risk in quantitative analysis and decision making. With financial institutions facing
increasing numbers of credit decisions of varying complexity, the ability to leverage a
Monte Carlo simulation to view probabilistic results of a credit decision over many
possible outcome paths is a powerful means to account for, and assess, credit risk.

The ability to execute decisions quickly yet accurately largely depends on how quickly
Monte Carlo simulations can be performed. The capabilities and benefits of HPC
infrastructures are well aligned to accomplishing this goal.
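
As a concrete illustration of the kind of simulation being described, the sketch below estimates CVA for a single deal by Monte Carlo under deliberately simplified assumptions: a Gaussian driver for the deal's mark-to-market, a flat hazard rate, and a flat discount curve. All parameter values and names are illustrative and not tied to any vendor's library; this is a minimal Scala sketch, not a production pricer.

object MonteCarloCvaSketch {
  def main(args: Array[String]): Unit = {
    val rng      = new scala.util.Random(7)
    val nPaths   = 100000   // Monte Carlo scenario paths (illustrative)
    val nSteps   = 40       // quarterly steps over ten years
    val dt       = 0.25
    val vol      = 0.20     // volatility of the deal's value driver
    val r        = 0.02     // flat risk-free rate
    val hazard   = 0.03     // flat counterparty hazard rate
    val recovery = 0.40

    // Accumulate the discounted expected positive exposure at each time step.
    val epe = Array.fill(nSteps)(0.0)
    for (_ <- 0 until nPaths) {
      var value = 0.0       // simulated mark-to-market of the deal
      for (t <- 0 until nSteps) {
        value += vol * math.sqrt(dt) * rng.nextGaussian()
        val discount = math.exp(-r * (t + 1) * dt)
        epe(t) += math.max(value, 0.0) * discount / nPaths
      }
    }

    // CVA = (1 - R) * sum over steps of EE*(t_i) times the marginal default probability.
    val cva = (1 - recovery) * epe.zipWithIndex.map { case (ee, i) =>
      val marginalPd = math.exp(-hazard * i * dt) - math.exp(-hazard * (i + 1) * dt)
      ee * marginalPd
    }.sum

    println(f"Monte Carlo CVA estimate: $cva%.6f")
  }
}

Each path here is independent of every other path, which is why this per-deal case partitions cleanly onto one core per segment; the whole portfolio case discussed next does not.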

For example, consider typical Monte Carlo simulations like those used in Value at Risk (VaR) or derivative pricing, where the work partitions perfectly into independent segments, each given a compute core. The problem when performing whole portfolio CVA is that the complexity and sheer scale of the data can be far too large for the memory on any one node. The calculation is not independent but mutually interdependent, yet it must still be partitioned across some number of cores and will require results calculated on different nodes. This can significantly decrease throughput in a typical scale-out grid architecture; while on-node performance might be scaled for increasingly complex CVA calculations, bottlenecks such as connectivity and latency to other nodes are compounded as core counts increase.

The result is that these systems often do not scale sufficiently for users needing rapid
results.
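
To make the interdependence just described concrete, the fragment below (again a hedged sketch with illustrative names and random data, not any firm's risk engine) shows why netting prevents trade-level partitions from finishing independently: the netted exposure takes the positive part of a sum over all trades in a netting set, scenario by scenario, so per-trade values computed on different nodes must be gathered and recombined before that step.

object NettingInterdependenceSketch {
  type ScenarioValues = Array[Double]   // one simulated value per scenario path

  // Netted exposure: max(sum over trades, 0) evaluated per scenario. This step
  // needs every trade's value for every scenario, which is what forces
  // cross-node data movement when the trades are partitioned across nodes.
  def nettedExposure(perTradeValues: Seq[ScenarioValues]): ScenarioValues = {
    val nScenarios = perTradeValues.head.length
    Array.tabulate(nScenarios)(s => math.max(perTradeValues.map(_(s)).sum, 0.0))
  }

  def main(args: Array[String]): Unit = {
    val rng        = new scala.util.Random(42)
    val nScenarios = 10000
    // Pretend these three partitions of one netting set were valued on different nodes.
    val partitions: Seq[Seq[ScenarioValues]] =
      Seq.fill(3)(Seq.fill(50)(Array.fill(nScenarios)(rng.nextGaussian())))

    // Gathering the partitions is the cross-node communication step a grid must pay for.
    val allTrades = partitions.flatten
    val averageExposure = nettedExposure(allTrades).sum / nScenarios
    println(f"Average netted exposure per scenario: $averageExposure%.4f")
  }
}

The gather before the positive-part operator is the traffic that stresses conventional interconnects as core counts grow.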

As analytics and further financial instrument treatment may depend on the results of
many problem partitions, CVA needs a better way to scale and deliver ever faster results
more frequently. As examined in this white paper, the Cray XC series supercomputer
with Intel processors is a well-suited delivery platform to meet today's CVA scaling
challenges.

Meeting CVA Computing Requirements with the
Cray XC Series Supercomputer

As discussed above, CVA is a critical-path component in evaluating risk, requiring ever faster results for ever more complex analytics. Faced with such increased demands for computing capability, institutions have traditionally taken either a scale-out or a scale-up approach:

- The scale-out approach amounts to simply adding more compute, storage, and networking as separate grid nodes or cloud computing resources. This has historically been considered an adequate solution. However, as shown previously, it does not yield linear performance increases for complex problem-solving challenges like CVA.

- In a scale-up approach, financial institutions add more cores, RAM, etc., to a single compute node. This, too, does not solve the resulting problem of increased system latency and inter-system communication with other compute nodes. The issue might be compared to moving increasing volumes through a pipe: at some point it simply becomes too expensive to build pipes of ever-increasing diameter.

The Cray XC supercomputer addresses CVA compute scaling issues, whether scale-up or scale-out, in a number of ways. Let's identify and examine these challenges and see how the Cray XC supercomputer tackles them.

System Design: Built Around Application Performance

Conventional grid clusters are built by aggregating compute nodes connected over
10GbE or InfiniBand along with a shared NAS disk subsystem. The primary design
criterion is simple replication, and as a result these architectures (which were designed
for small jobs) struggle when applications demand greater portions of the resource pool.

Contrast this with a Cray XC40 supercomputer, where the primary design criterion is
sustained performance of the users application. The XC40 system is specifically
designed to run the most challenging workloads and provides performance efficiencies at all scales, from high throughput of small jobs to the very largest and most complex applications. This is achieved through a rigorous approach to performance efficiency and reliability throughout the hardware and software stack, as well as Cray's own unique interconnect technology, Aries (described in detail later in
this paper). The result is a highly integrated system that meets the requirements of
applications such as CVA and provides true economies of scale.

Why is this important?

While we will expand on the benefits of the Cray XC series in the next section, it is
worthwhile here to point out some of the high-level TCO and workload benefits that this
differentiated approach yields.

In today's application ecosystem, there are two primary constraints. One is data movement: organizations are grappling with, and taking advantage of, ever-increasing volumes of data. Network bottlenecks can bring even the most powerful grid clusters down to a snail's pace, and having a plethora of cores and memory does little good when data is inaccessible or slow to access across nodes. The other constraint is the ability of applications to scale and use significant portions of the system. These two constraints, of course, go hand in hand, as organizations need to leverage their data and run applications of greater value to compete in today's marketplace.

How are these challenges addressed?

The Cray XC series uniquely solves these challenges in several ways. It was designed to
rank among the world's most powerful and scalable computer systems, providing efficient data movement and computation through balanced performance between processors, memory, interconnect, and I/O. The Cray XC40 supercomputer is designed around leading Intel microprocessor technology, the Cray custom-designed Aries interconnect, Cray's DataWarp technology, and a highly optimized system software
stack. Developing applications using the Cray Programming Environment, which is
tightly integrated with the Cray XC40 system, yields high levels of sustained
performance and reliability.

In addition, Cray systems are both highly modular and upgradeable. The design of the
XC system allows the incorporation of new-generation Intel processors, including Xeon
Phi, while still delivering superior sustained performance on real applications. The
customer's investment in system infrastructure is preserved for several years. The
infrastructure of Cray systems is designed from the ground up to optimize total lifecycle
costs by providing upgradability across several generations of technologies and
maximizing power and cooling efficiencies.

Cray Aries Interconnect

Cray's industry-leading Aries interconnect enables the efficient performance of increasingly challenging whole portfolio CVA applications. The Aries interconnect network is packet-switched and features adaptive routing that avoids network congestion and failures. Aries focuses on providing cost-effective, scalable global bandwidth that is very high performing and resilient to workload variations and job placement.

A key aspect of the Cray XC series is its Aries network, built on a Dragonfly topology that addresses the performance and latency issues found in more traditional fat tree approaches. To do so, the XC40 system and the integrated
Aries interconnect incorporate a three-tier design for interconnecting nodes: an electrical
backplane, copper cables, and optical fiber. Compared to traditional fat tree InfiniBand
networks, the Aries interconnect and Dragonfly topology provide a two-to-one
advantage, where each packet takes one optical hop at most, compared to two in a
typical fat tree network topology.

The ability to leverage vastly increased bandwidth while avoiding the latency hit
incurred in fat trees from multiple network hops is itself a dramatic leap forward
in inter-node network performance.

Integrating the Aries interconnect on each blade enhances both the injection bandwidth and the system's global bandwidth while minimizing latency, and with no external switches there are no disproportionate additional costs when expanding the network. This technology is one of the reasons Cray dominates the list of the Top 100 global supercomputers.

More information on the design of the Aries interconnect and Dragonfly topology in the
XC series is available for users wanting a deeper dive into the technical advantages Cray
has to offer [1].

Cray DataWarp Applications I/O Accelerator

The Cray DataWarp applications I/O accelerator is a shared burst buffer filespace that is
present in specially designed I/O blades and connected to compute nodes via the speedy
Aries interconnect. Programmatically or transparently, the DataWarp burst buffers can
be dynamically allocated on a per-job basis. This means that not only can users avoid
purchasing SSDs for every node, they can simply purchase a few DataWarp blades
sufficient to supply the volume and bandwidth the system needs.

[1] http://www.cray.com/sites/default/files/resources/CrayXCNetwork.pdf

The DataWarp file system is extremely effective for sharing initial conditions and intermediate results. Users no longer need a massive and slow broadcast of initial conditions to all nodes, something common in commodity clusters; all that's required is to send files to a handful of DataWarp blades. This can significantly reduce the wall clock time of a job, and the sharing makes it easy to keep initial conditions consistent.

DataWarp technology provides a bridge to faster data access by avoiding the latency and throughput penalties incurred when nodes seek data hosted on disk-based storage arrays. Data access bottlenecks are addressed by bringing CVA I/O closer to compute resources and by providing faster access to data held on the Cray DataWarp infrastructure.

For most grid workloads, where each node shares many of the same files, this typically means a long initial step in which files are broadcast over the network to on-node disk or SSDs. With Aries and DataWarp technology it takes far less network time and bandwidth to transfer files to a few shared DataWarp I/O blades. Compared to a conventional HPC cluster, the DataWarp approach also frees users from having to individually and statically provision every node with large SSD capacity sized for the high-water mark of usage.

More information on Cray DataWarp technology can be found on the Cray website [2].

[2] http://www.cray.com/sites/default/files/resources/CrayXC40-DataWarp.pdf

Figure 1: Architectural Comparison, Commodity Hardware vs. Cray XC System with Aries Interconnect and DataWarp Technology

Cray Ecosystem
The Cray XC series brings many performance benefits to the table for institutional users
demanding the utmost in system performance and scalability. In addition to the design
focus on core infrastructure capability, Cray also provides a fully integrated software
environment to users and administrators, which increases manageability and thus
productivity.

While the features are too many to list in a brief white paper, it is important to note that
institutional users evaluate critical parameters other than just core infrastructure in purchase decisions. Having the fastest, most powerful system available is of little use
without a strong supporting cast of elements.

The Cray XC system comes with a complete, production-ready software environment known as the Cray Linux Environment (CLE), which includes a Linux-based OS distribution, system management tools, a job scheduler, a high performance file system, a hardware supervisory system, and an energy monitoring system. CLE provides a single cohesive environment for application developers and users. A key aspect of CLE is that Cray XC compute nodes run a lightweight version of Linux, an important ingredient in application scalability and runtime repeatability.

Financial systems administrators benefit from Cray's fully integrated approach to systems administration, monitoring, health, and maintenance. Rather than needing to maintain a pool of multitalented support engineers, XC system administrators gain from this holistic approach to integrated systems: they remain fully aware of system performance and can access the entire ecosystem via Cray system tools.

In addition, the Cray operating system and supporting software stack are fully
integrated at the system level, alleviating the need to find or procure additional software
elements key to running and supporting a Cray-based HPC environment. Because
software is fully integrated into the Cray stack, administrators spend less time, for
example, finding and fixing software libraries and drivers. Administrators spend less
time on management or scheduling downtime between jobs, and more time supporting
users.

Total Cost of Ownership


The Cray XC system delivers a substantial reduction in total cost of ownership (TCO)
when compared with commodity hardware:

- Cray's emphasis on sustained application performance ensures faster turnaround times and peak performance with less hardware.

- The embedded Aries network means no separate purchase, installation, and support of all that additional inter-node networking equipment. This is a huge savings, as the bandwidth of the Aries interconnect can be configured to match performance requirements and cost constraints.

- With the embedded and shared approach to providing fast SSD capability via DataWarp technology, files can be striped across DataWarp blades so the blades can be purchased based on bandwidth requirements. In a typical 1,000-node configuration, for example, you may need only a handful of DataWarp blades.

- Rather than having to provision nodes individually and statically with large SSD capacity sized for the high-water mark of usage, the DataWarp accelerator can be provisioned dynamically on a per-job basis and shared over Aries. This saves equally large sums on the purchase and support of SSDs in each node, since the solution can be provisioned for only the throughput users need. Similarly, the disk in a parallel file system can be sized to just the volume required. This reduces the network-saturating latency at the start of each job when common data is broadcast.

- At a time when data center resources and availability are at a premium, the Cray architecture is highly efficient in terms of power consumption and draw, floor space, and cooling requirements. Cray's approach creates a very streamlined node without on-board SSD or disk, which means nodes can be packed more densely and use less power; Cray's power and density are class-leading. Because the Cray XC system leverages a tightly integrated package, owners receive a highly scalable, high performance environment while also taking full advantage of reduced overhead on environmental factors like power, cooling, and space.

Enhancing ROI: Processing Capability Powered by Intel

An integral component in providing the highly scalable, reliable, compatible, high-performance solution demanded by critical users such as financial institutions lies in the compute capabilities offered in HPC systems. Intel processors play a key role in the Cray XC series strategy for end users.

The Cray XC40 system, configured with Intel processors, gives institutions several key
advantages. Intel processors are compatible with x86-based software and applications;
rather than having to custom tune or rewrite applications and institutional frameworks
to run on an Intel-based Cray XC40 system, users can get directly to the task at hand,
leveraging the built-in compatibility from day one.

As a globally accepted processor standard, the Intel-based solution gives users the
greatest flexibility in deciding what workloads will run on the cluster without having to
hunt down drivers and software libraries, or having to invest in substantially changing
application frameworks critical to daily business function. In addition, Intel processors are broadly available in the market. These key factors reduce overall cost of
ownership and time invested in implementation.

Further bolstering the Intel advantage in the Cray XC series, Intel processors' scalability at the socket level ensures that institutional investments made today continue to pay off in the future. As performance requirements continue to increase, users can take full advantage of socket-level scalability by upgrading only chips, not entire systems.

An exciting element of the Intel strategy in the Cray XC system is compatibility with and support for the Intel Xeon Phi co-processor [3]. The Intel-based XC supercomputer, already a powerful workhorse for financial institutions today, can be significantly boosted in performance and capability with the additional computational power the Intel Xeon Phi co-processor brings to the table.

Utilizing the Intel Many Integrated Core (MIC) compute architecture with 8GB of GDDR5 memory on each co-processor card, integrated with the Cray XC40 system, the high-density form factor Intel Xeon Phi 5120D can deliver over one teraflop of peak double precision floating point performance per co-processor. There's no need to re-code in a new language; code that is optimized for multithreading and vectorization can run on an Intel Xeon Phi without modification using common frameworks such as OpenMP. For institutions requiring the utmost performance and scalability, the joint Cray and Intel solution [4] makes perfect sense.

[3] http://www.intel.com/content/www/us/en/high-performance-computing/high-performance-xeon-phi-coprocessor-brief.html
[4] http://www.cray.com/sites/default/files/resources/CrayXC_IntelXeonPhiPDC.pdf

Risk Workloads on Apache Spark

Most banks are looking to significantly update their risk application suites for many
reasons: intraday and real-time risking is becoming pervasive, CCAR stress tests are
becoming more demanding, and enterprise and whole portfolio risk analytics are
becoming important not only for regulators but also for managing the bank. All these
conditions are putting pressure on IT departments.

Recently, banks have become interested in using high performance analytics platforms as workbenches for Apache Hadoop [5] and Apache Spark [6] data analytics workloads. With the arrival of Spark in particular, progressive banks are investigating how to refactor their whole risk application suite onto Spark for its efficiency in application development and execution.

Typical risk analytics are actually very well suited to Spark and the Scala language. Spark is efficient at memory-first, highly parallel analytics with shared files, which in risk analytics are initial conditions such as trial loss matrices and reference and market data that can often exceed 1 terabyte. Instead of sending these files to each node, Spark can be used to share data across the system. The Scala language is also very expressive and can easily be used to write Monte Carlo applications or describe parallel computations in just a few lines of code. Spark also increases productivity in that optimized linear algebra libraries can be accessed via JNI.
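
As a hedged sketch of the pattern just described, the Spark/Scala fragment below broadcasts shared market data once and then values a book of trades in parallel across the cluster. The Trade case class, the map of market data, and the toy payoff are illustrative stand-ins, not a real bank's risk library.

import org.apache.spark.sql.SparkSession

object SparkRiskSketch {
  final case class Trade(id: Long, notional: Double, strike: Double)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("spark-risk-sketch")
      .master("local[*]")   // for a quick local run; on a cluster the master comes from spark-submit
      .getOrCreate()
    val sc = spark.sparkContext

    // Shared inputs (reference and market data) are broadcast once rather than
    // copied to every task, which is the shared-files pattern described above.
    val marketData = sc.broadcast(Map("spot" -> 100.0, "vol" -> 0.2))

    // An illustrative book of trades; in practice this would be loaded from storage.
    val trades = sc.parallelize(
      (1L to 100000L).map(i => Trade(i, notional = 1.0e6, strike = 95.0 + (i % 10))))

    // Value each trade in parallel; the payoff below is a stand-in for a real pricer.
    val portfolioValue = trades.map { t =>
      val spot = marketData.value("spot")
      t.notional * math.max(spot - t.strike, 0.0) / spot
    }.reduce(_ + _)

    println(s"Portfolio value under the toy payoff: $portfolioValue")
    spark.stop()
  }
}

The broadcast variable replaces pushing the same large input files to every node, and the same structure extends naturally to per-trade Monte Carlo scenario generation.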

The risk suite is subject to substantial modification, some driven by greater regulatory demands and some by the need to run the bank more efficiently. Given that the application suite is changing so substantially, it makes sense to reconsider which framework to use. This is where Apache Spark comes in: Spark's development productivity efficiencies can be leveraged for real-time, intraday, and overnight workloads with rapid prototyping.

Because Spark workloads are highly iterative and sensitive to latency and bandwidth, Cray's analytics platforms are a natural fit for the demands of performance-oriented Apache Spark workloads.

In addition, as organizations search for ways to incorporate big data to enrich risk analytics, some firms are looking at graph databases. While these types of analytics are still at the drawing-board stage, firms should take a close look at Cray, as it uniquely provides the most scalable graph database available, converged with Hadoop and Spark workloads on a single low-TCO platform.

[5] https://hadoop.apache.org/
[6] http://spark.apache.org/

Conclusion

Counterparty credit risk modeling has become a focus area for banks' risk analytics since the 2008 financial crisis. The development of CVA and related analytics has been a key driver of recent grid growth, driven in particular by pre-trade CVA modeling, CVA trading, whole portfolio CVA, and sensitivity analyses. Workloads of this scale should prompt banks to ask whether their current way of managing risk is the best possible approach.

While Cray uses commodity Intel components to retain application compatibility, XC systems are built with a unique architecture that has several differentiating capabilities, including the Aries interconnect and DataWarp SSD technology. As a result, financial institutions using Cray systems to power CVA and related applications can not only significantly reduce TCO, but also improve performance and give their developers more options to improve agility.

Whether you're looking to extend traditional grid-style Monte Carlo applications or to refactor onto a newer technology such as Spark, we believe Cray's platform solutions offer significant advantages in developer agility, performance, and cost reduction.
