
THE DZONE GUIDE TO

Performance
Optimization & Monitoring
VOLUME III

BROUGHT TO YOU IN PARTNERSHIP WITH



DEAR READER,

A lot has changed in the world of performance tuning and monitoring over the years. Gone are the days when we had a handful of languages that compiled directly down to machine code, optimized and tuned with simple tools and print statements by directly monitoring a single computer system. Today's environments are complex, and typically involve loosely coupled architectures rife with multiple dependencies distributed over a virtual cloud infrastructure, making it difficult to pinpoint the bottlenecks in the system.

How do you navigate these systems to make sure users have a great experience? Just like Alice, you often feel like you are heading down the rabbit hole, uncovering layer after layer of complexity and discovering that there are many places a bottleneck can be hiding.

The good news is that there are many passionate people out there working to make these systems easier to profile and optimize, which has led to improved user experiences, fewer outages, and happier users.

In the 2017 DZone Guide to Performance: Optimization and Monitoring, we will be sharing with you the exciting work being done out there, and we have asked experts in the field to bring you the latest tools and techniques to help you break through those performance barriers and unleash the potential of your application. In this issue, we will explore best practices for load testing in the cloud, look at several profiling tools to help tune JavaScript applications, and cover strategies for tuning and monitoring cloud-based systems, monitoring services, infrastructure, and APIs. We interviewed 12 executives and received over 470 responses to this year's survey. The interviewees and respondents alike shared their insights into the tools they prefer, the biggest areas for improvement, and what skills developers need to optimize performance and monitoring. They also provided information on common application performance issues, which are illustrated in our infographic, "What Ails Your Application?," on page 24.

DZone is the knowledge-sharing company, and we are excited to bring you the latest volume of our Performance and Monitoring Guide. We hope you love it. Give it a read and let us know what you think.

BY JESSE DAVIS
CHIEF OPERATING OFFICER, DZONE, INC.
RESEARCH@DZONE.COM

TABLE OF CONTENTS

3 EXECUTIVE SUMMARY BY MATT WERNER
4 KEY RESEARCH FINDINGS BY G. RYAN SPAIN
8 WHERE DOES DEEP API MONITORING FIT INTO APM? BY DENIS GOODWIN
12 REAL WORLD PERFORMANCE AND THE FUTURE OF JAVASCRIPT BENCHMARKING BY BENEDIKT MEURER
16 WRITING 600 TIMES FASTER BY OREN EINI
19 METRIC OR LOG ANALYTICS: CHECKLIST BY STELA UDOVICIC
22 UNDERSTANDING AND MONITORING DEPENDENCIES IN CLOUD APPLICATIONS BY NICK KEPHART
24 WHAT AILS YOUR APPLICATION? INFOGRAPHIC
26 REINVENTING PERFORMANCE TESTING BY ALEX PODELKO
30 6 COMMON API MISTAKES BY HEITOR TASHIRO SERGENT
33 DIVING DEEPER INTO PERFORMANCE
36 EXECUTIVE INSIGHTS: PERFORMANCE OPTIMIZATION AND MONITORING BY TOM SMITH
40 THE SMOKE AND MIRRORS OF UX VS. APPLICATION PERFORMANCE BY OMED HABIB
43 PERFORMANCE SOLUTIONS DIRECTORY
48 GLOSSARY

PRODUCTION
CHRIS SMITH, DIRECTOR OF PRODUCTION
ANDRE POWELL, SR. PRODUCTION COORDINATOR
G. RYAN SPAIN, PRODUCTION PUBLICATIONS EDITOR
ASHLEY SLATE, DESIGN DIRECTOR

EDITORIAL
CAITLIN CANDELMO, DIRECTOR OF CONTENT + COMMUNITY
MATT WERNER, CONTENT + COMMUNITY MANAGER
MICHAEL THARRINGTON, CONTENT + COMMUNITY MANAGER
MIKE GATES, SR. CONTENT COORDINATOR
SARAH DAVIS, CONTENT COORDINATOR
JORDAN BAKER, CONTENT COORDINATOR
TOM SMITH, RESEARCH ANALYST

MARKETING
KELLET ATKINSON, DIRECTOR OF MARKETING
LAUREN CURATOLA, MARKETING SPECIALIST
KRISTEN PAGÁN, MARKETING SPECIALIST
NATALIE IANNELLO, MARKETING SPECIALIST
MIRANDA CASEY, MARKETING SPECIALIST

BUSINESS
RICK ROSS, CEO
MATT SCHMIDT, PRESIDENT & CTO
JESSE DAVIS, EVP & COO
MATT O'BRIAN, DIRECTOR OF BUSINESS DEVELOPMENT (sales@dzone.com)
ALEX CRAFTS, DIRECTOR OF MAJOR ACCOUNTS
JIM HOWARD, SR. ACCOUNT EXECUTIVE
JIM DYER, ACCOUNT EXECUTIVE
ANDREW BARKER, ACCOUNT EXECUTIVE
CHRIS BRUMFIELD, SR. ACCOUNT MANAGER
ANA JONES, ACCOUNT MANAGER

WANT YOUR SOLUTION TO BE FEATURED IN COMING GUIDES?
Please contact research@dzone.com for submission information.

LIKE TO CONTRIBUTE CONTENT TO COMING GUIDES?
Please contact research@dzone.com for consideration.

INTERESTED IN BECOMING A DZONE RESEARCH PARTNER?
Please contact sales@dzone.com for information.

SPECIAL THANKS to our topic experts, Zone Leaders, trusted DZone Most Valuable Bloggers, and dedicated users for all their help and feedback in making this guide a great success.


Executive Summary

BY MATT WERNER
CONTENT AND COMMUNITY MANAGER, DZONE

Back in the 1990s, it could take minutes to load a web page, while today it typically takes seconds. However, as Google discovered in 2012, even 400 milliseconds can be considered too slow for users, which may cause them to bounce or complain to the owners of the site they're trying to visit. Imagine how they might react when something in the application code stops them from instantly accessing the information they need by a factor of seconds or even minutes. As developers are expected to optimize their applications for security and rapid changes through DevOps methodologies, they are also starting to become more involved in application performance optimization. To monitor the state of the industry regarding performance optimization, DZone surveyed 471 tech professionals to discover how they prepared for performance issues and how they dealt with them.

LOCKING EVERYTHING DOWN

DATA: 46% of respondents build performance optimization into the development process. Of those, 30% were likely to find frequent code issues, compared to 38% of developers who build functionality first and then optimize for performance. Those who bake performance into the SDLC solve performance issues 35 hours faster on average than those who do not.

IMPLICATIONS: Incorporating performance optimization from the start of a project can drastically reduce headaches in the long term, and can make it easier to get users back on track as soon as possible. In addition to being faster, there are likely to be fewer failures in the first place, leading to saved cost in development time, so more resources can be allocated to different projects rather than stuck doing maintenance.

RECOMMENDATIONS: Developers need to start learning about the best ways to optimize their applications from day one. The extra time it may take to do this right from the start will be worth it in saved time and costs from fixing problems down the road. Leadership teams should also be educated by performance optimization experts and project managers on the long-term benefits of incorporating performance optimization early. For an interesting case study on how Oren Eini and his team optimized the RavenDB database, check page 16.

DESIGNING FOR PARALLEL EXECUTION HASN'T CAUGHT ON YET

DATA: The number of DZone users who design programs for parallel execution increased 1% over last year's survey to 44%. Of the parallel execution design techniques, load balancing was the most used at 68%. Multithreading is the most popular parallel programming model at 72%.

IMPLICATIONS: Load balancing continues to be an important part of running applications in production. Parallel execution was also seen as crucial for embedded apps and high-risk software by 54% of survey respondents, since failure of these applications can lead to the loss of life. The small amount of growth between last year and this year seems to indicate that redesigning existing applications, or spending time to design new applications for parallel execution, is less of a priority for apps where it is not seen as a crucial feature.

RECOMMENDATIONS: Development and operations teams need to invest more in load balancing in order to reduce the strain on their servers as traffic comes in. Fewer constraints on your resources means you should encounter fewer performance problems related to load and traffic. For high-risk or business-critical software, consider adopting parallel execution to improve speed when it absolutely counts. Developers working on applications outside of these fields should also consider these techniques and models. Not only will it be useful experience, but it can also potentially pay off in application speed and happy users.

TOOLING MATTERS

DATA: 64% of respondents reported that they use between 1 and 4 performance monitoring tools. The three most popular tools are Nagios (33%), LogStash (27%), and AWS CloudWatch (21%). Those who use monitoring tools are 7% less likely to discover problems through communication with users and support tickets than those who do not use them, and were 12% less likely to accidentally encounter performance issues.

IMPLICATIONS: LogStash and CloudWatch have both made large jumps in popularity since 2016 (5% and 6%, respectively), suggesting that more developers and organizations are adopting monitoring tools. These tools have proven their usefulness by helping to pinpoint performance issues before anyone notices or encounters them while using the application.

RECOMMENDATIONS: The only thing better than quickly fixing a performance issue for a user is to fix it before the user can find it. Monitoring tools are becoming critically important for maintaining applications. In addition to monitoring tools, we found that those who use multiple methods or tools, like application logs, to find the root cause of performance problems will find the root cause of an issue faster than those who don't use monitoring tools or only use one tool or method. For more detail, consult the Key Research Findings on the following page or Denis Goodwin's article on API monitoring on page 8.


Key Research Findings

BY G. RYAN SPAIN
PRODUCTION COORDINATOR, DZONE

471 respondents completed our 2017 Performance and Monitoring Survey. The demographics of the survey respondents include:

- 24% of respondents work at organizations with at least 10,000 employees; 18% work at organizations between 1,000 and 10,000; and 19% work at organizations between 100 and 1,000.
- 37% of respondents work at organizations in Europe, and 29% work at organizations in the US.
- Respondents had 15 years of experience as an IT professional on average; 29% had 20 years or more of experience.
- 30% of respondents identify as developers or engineers; 21% as developer team leads; and 20% as software architects.
- 83% of respondents work at organizations that use Java, and 79% work at organizations using JavaScript (45% only using client-side, 3% only using server-side, and 31% using both).

STARTING OFF
54% of this year's survey respondents said they worry about application performance only after they have built application functionality, a response similar to the results of DZone's 2016 Performance and Monitoring survey. However, the frequency with which respondents claimed to experience certain application performance issues was positively impacted by building performance into the application first. For example, the most frequent area for performance issues in this year's survey was application code, with 35% of respondents saying they have frequent issues with this part of their technology stack. On average, respondents who said they build performance in from the beginning of their application were 30% likely to find frequent performance issues in their application code, as opposed to 38% of respondents who worry about performance after functionality. Likewise, those who said they generally considered application performance from the beginning were able to solve performance issues 35 hours faster, on average, than those who did not (187 hours compared to 222). Of course, focusing too much on performance from the outset of a project can lead to unnecessarily lengthy design and development times, but having an idea of how performance will fit into an application from the start can save headaches later on in the SDLC.

[Chart: When was the last time you had to solve a performance problem in your software?]

[Chart: What tools does your team commonly use to find root cause for application performance problems?]


KEEPING AN EYE OUT
The majority of respondents (64%) said they use between 1 and 4 performance monitoring tools. The most popular monitoring tools were Nagios, used by 33% of respondents' organizations, and LogStash, used by 27%. Both LogStash and Amazon's CloudWatch saw significant growth from last year's results, with LogStash growing 5% and CloudWatch growing 6% to 21%, making it this year's third most popular performance monitoring tool. Increased usage of monitoring tools decreased how often performance issues were discovered through user support emails/social media or through dumb luck; respondents whose organizations use 3 or 4 monitoring tools were 7% less likely to find out about performance problems from users than those who used none (17% vs. 24%), and were 12% less likely to accidentally stumble upon performance issues through dumb luck (11% vs. 23%). The most popular types of monitoring were real user monitoring (34%) and business transaction monitoring (26%).

WHAT'S THE PROBLEM?
Much like last year, finding the root cause of an issue was found to be the most time-consuming part of fixing performance-related problems. 52% of respondents ranked this as the most time consuming, followed by 25% of respondents who said collecting and interpreting various metrics took the most time. Respondents said they use a number of different tools in order to search for the root cause of performance issues. The most popular of these methods included application logs (89%), database logs (69%), profilers (61%), and debuggers (60%). Individually, none of these tools had an impact on how time-consuming respondents found root cause discovery; however, respondents using more of these tools together were increasingly less likely to find root cause discovery time consuming, up to a peak of 6 tools (because of a sample size of less than 1%, responses showing 0 tools used were not considered in this analysis).

SPLITTING THE LOAD
The usage of parallel execution in application design has not taken off much since last year. 44% of respondents this year said they regularly design programs for parallel execution, only 1% higher than last year. The tools and methods for parallel design haven't changed much either; like last year, the ExecutorService framework in Java is the most frequently used framework/API among respondents, with 50% of those who design for parallel execution regularly using this framework often. Also, load balancing is again the most popular parallel algorithm design technique, with 68% of parallel execution designers using it often. And multithreading is at the top of the list for parallel programming models, with 72% of this subset of respondents using multithreading often. The choice to design for parallel execution in an application can be affected by multiple factors. For instance, the type of application being designed may increase the need for parallel execution; respondents who said they build embedded services or high-risk software (i.e. software in which failure could lead to significant financial loss or loss of life) were much more likely to regularly design for parallel execution, with over half of these respondents (54% each) answering this question positively.

[Chart: Which of the following performance tests and/or monitoring types does your organization use?]

[Chart: How do you prioritize performance in your application development process? 46% build performance into the application from the start; 54% build application functionality first, then worry about performance.]

SPONSORED OPINION

Performance Testing in an Agile World

SHOULD I LEFT-SHIFT PERFORMANCE TESTING?
Because testing environments are expensive to set up and maintain, performance testing is often left to the end of the release cycle. Unfortunately, many of the defects found require architectural changes to resolve, which can delay releases. You may not need to run performance tests on a daily basis, but using tools that integrate with your CI/CT toolset is imperative to enable testing early in the release cycle. Moreover, tools that allow testing from the cloud and leverage dockerized load generators can reduce the overhead traditionally associated with performance testing by up to 75%.

HOW FREQUENTLY SHOULD I TEST?
Applications that release frequently may require nightly performance sanity tests and a weekly soak, while others will run periodic spot checks along an extended release cycle. If you aren't sure how frequently you need to test, make sure you invest in tools that support flexible, consumable licensing models until you establish your necessary cadence. This will keep your team from overinvesting in performance and save your constrained budget for more pressing issues.

WHERE DOES OPEN SOURCE FIT IN?
Open source tools can be an important part of the development process for any new application! Tools such as JMeter, Gatling, and Selenium are particularly effective in the early stages of the development process, when testing individual modules or the relationships between two systems. Eventually, however, the scope will change: end-to-end, UI-driven testing becomes necessary, higher levels of traffic need to be generated on a global basis, different network conditions need to be simulated, or the root cause analysis becomes more complex. As you transition to a professional tool that supports those needs, make sure to identify tools that will allow you to reuse your open source scripts! The time spent developing those tests is valuable, and new products that need to get to market ASAP can't afford to discard that investment.

WRITTEN BY JASON BENFIELD
DIRECTOR, PERFORMANCE ENGINEERING PRODUCT MANAGEMENT, HPE

Performance Testing By HPE

LoadRunner, Performance Center, and StormRunner Load: a comprehensive product suite for testing any application, scenario, and protocol. Because performance matters.

CATEGORY: Performance Testing
NEW RELEASES: Quarterly

STRENGTHS
- Integrations enable use of Git, Jenkins, Docker, and many other DevOps tools
- Built-in root cause analytics with anomaly detection
- Detailed breakdown reports with optimization recommendations
- Asset and license sharing within a centralized repository
- Public, private, hybrid cloud, and on-premise support
- Global, cloud-based load generation to significantly reduce costs
- Network Visualization to better simulate traffic conditions
- Over 250,000 WW users

CASE STUDY
McGraw Hill was using HPE Performance Center for their Center of Excellence, but digital teams needed to test under high, globally distributed loads coming from different browsers and mobile devices. They adopted StormRunner Load to execute tests for up to 900,000 simultaneous users. StormRunner Load allowed them to reduce the cost of running larger tests by eliminating the need to build and maintain a local testing environment, while reusing test scripts built for smaller performance tests in Performance Center.

As a result, McGraw Hill continues to use Performance Center for testing internal applications, while using StormRunner Load to test web and mobile. The ease of use of StormRunner Load allows users to seamlessly switch between the environments and gain efficiencies.

Combine the power of the three tools with access to the industry's largest community of testing practitioners and there's no performance challenge that can't be beat.

NOTABLE CUSTOMERS: CodeFresh, Rockwell Automation, Sky, University of Copenhagen, JetBlue, Hexaware Technologies

WEBSITE go.saas.hpe.com/performanceengineering TWITTER @HPE_Loadrunner BLOG hpe.com/blog/loadrunner


Where Does Deep API Monitoring Fit Into APM?

BY DENIS GOODWIN
DIRECTOR OF PRODUCT MANAGEMENT FOR ALERTSITE, SMARTBEAR SOFTWARE

QUICK VIEW

01 Simple endpoint monitoring is not enough. Only through deep API monitoring can you ensure your API journeys are invoked in sequence with contextual data passed from one call to the next.

02 Availability monitoring is also not enough. Only with deep API monitoring can you ensure your sequenced APIs are available, fast, and functionally correct.

03 Your monitors are only as good as the underlying monitoring platform. Without a robust set of scheduling, playback, and alerting capabilities deployed across the globe and inside your firewall, you will not have the real-time coverage needed to ensure optimal user experience.

APIs are the backbone of modern applications, web and mobile alike. Even though end users may be oblivious to underlying APIs when they are using web and mobile applications, API issues can affect them directly. When APIs fail, one or more critical functions in the application become unavailable. It's even worse when APIs seem to be working but are functionally incorrect. That means end users can use the application, but the actions return unexpected results. That's where sophisticated API monitoring comes into play. Just checking an API endpoint for 200 OK is not enough when you are dealing with chained API transactions, where each call depends on the results from previous API calls. It's about time we revise the definition of APM to include advanced API monitoring functionality.

APIs are taking the world by storm. As you strive to deliver world-class web, mobile, and SaaS applications, make sure that the APIs that power them are running smoothly.

[Figure: APIs connect IoT, web, big data, real-time, and enterprise systems to mobile, desktop/browser, IoT, embedded, enterprise, and SaaS consumers over formats and protocols such as REST, SOAP, HTTP, JSON, XML, YAML, WSDL, SSL, HTML5, CoAP, Swagger, MQTT, and WebSockets.]

SmartBear's The State of API 2016 survey queried both API consumers and producers on a variety of topics impacting the API space. The survey had more than 2,000 respondents from more than 50 industries and 100 countries. While the focus of this survey was not API monitoring, it became clear to me from the results that many of the concerns of both API consumers and producers can be addressed through deep API monitoring. High quality delivery of APIs is critical to ensuring that the web and mobile applications that depend on them are meeting the increasingly high expectations of users. Some highlights of that survey include the following:

The top three concerns of API consumers are:
- Ease of use
- Performance
- Service reliability/uptime


The top two measures of success identified by API producers are:
- Performance
- Uptime/availability

The top two barriers to solving API issues are:
- Determining root cause
- Isolating the API as the cause of the issue

Looking at the results from these three questions, it is clear that performance and uptime/reliability are paramount to both producers and consumers. Adding to the clear need for performant and reliable APIs, ease of issue diagnosis also emerged as a key trend. Knowing exactly which API endpoint is failing within a chain of successive calls is a clear operational need.

Another key takeaway of the survey is that both API consumers and producers are in alignment on the key measures of success for API delivery. Since it appears that all involved agree that the need for deep API monitoring is obvious, what are the key attributes of a successful API monitoring approach?

[Figure: The three lenses of deep API monitoring: availability, performance, and correctness.]

Just checking if a single API endpoint is available is not enough when your application depends on APIs. It's essential that any monitoring tool also be able to measure and monitor performance and functional correctness. Deep API monitoring provides three lenses of API monitoring: availability, performance, and functional correctness. Knowing with certainty that the API is available, fast, and fully functional must be the standard going forward.

API endpoints are not called in isolation by web and mobile applications, so your monitoring shouldn't rely on simple endpoint availability checks. A monitoring platform must be able to execute and monitor chained API transactions, where API endpoints are invoked in sequence with contextual data passed from one call to the next. This is how your critical web and mobile applications consume those APIs, and that is how you must monitor them.
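To make the idea concrete, here is a minimal sketch of what a chained API check might look like in C#. The endpoints, JSON fields, and thresholds are hypothetical; a real monitoring platform would add scheduling, global playback locations, and alerting on top of this.

    using System;
    using System.Diagnostics;
    using System.Net.Http;
    using System.Text;
    using System.Text.Json;
    using System.Threading.Tasks;

    class ChainedApiCheck
    {
        // Hypothetical journey: create an order, then fetch it back using the ID
        // returned by the first call, checking availability, correctness, and speed.
        static async Task Main()
        {
            using var client = new HttpClient { BaseAddress = new Uri("https://api.example.com/") };
            var sw = Stopwatch.StartNew();

            // Step 1: availability - the first call must succeed.
            var createResponse = await client.PostAsync("orders",
                new StringContent("{\"sku\":\"ABC-123\",\"quantity\":1}", Encoding.UTF8, "application/json"));
            createResponse.EnsureSuccessStatusCode();

            // Contextual data from call 1 feeds call 2.
            using var created = JsonDocument.Parse(await createResponse.Content.ReadAsStringAsync());
            var orderId = created.RootElement.GetProperty("id").GetString();

            // Step 2: functional correctness - what we read back must match what we sent.
            var getResponse = await client.GetAsync($"orders/{orderId}");
            getResponse.EnsureSuccessStatusCode();
            using var fetched = JsonDocument.Parse(await getResponse.Content.ReadAsStringAsync());
            if (fetched.RootElement.GetProperty("sku").GetString() != "ABC-123")
                throw new Exception($"Order {orderId} returned the wrong SKU.");

            // Step 3: performance - the whole journey must finish within an agreed budget.
            sw.Stop();
            if (sw.ElapsedMilliseconds > 2000)
                throw new Exception($"Chained transaction took {sw.ElapsedMilliseconds} ms.");
        }
    }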

API monitors should be able to be easily created by reusing your existing test cases as the foundation for your monitors. In addition to freeing up development and operational resources from monitor creation duties, leveraging existing test cases provides the added benefit of allowing monitoring to become a key part of each version of the application as it progresses through the development lifecycle. With increased use of agile development, continuous integration, and delivery methodologies, pre-production monitoring makes more sense than ever.

In addition to the above-mentioned API-specific attributes, a monitoring solution will not be successful without robust scheduling, playback, and alerting capabilities. You will only reap the benefits of proactive monitoring if you are getting accurate alerts and results in real time.

Another important consideration when monitoring your APIs is location. Whether around the globe or behind your firewall, it is important to monitor where your users or API consumers are. Increasingly, more customers are concerned about users spread across both a mix of physical geographic locations and internal points of presence inside your infrastructure. These internal locations can be physical data centers, private cloud deployments, various office locations, or even vendor and customer locations.

You may be a producer or a consumer of APIs. You may be both, and we are increasingly seeing this scenario in our customer base. In either case, proactive deep API monitoring is critical to ensuring you are delivering and receiving excellent performance and reliability from the APIs that you depend on to run your business. The same holds true for the operations teams tasked with supporting the APIs and applications that use them. They need to know the health of those applications in real time, and they need to be able to quickly resolve issues when they occur. Deep API monitoring provides the foundation to do this.

Without deep API monitoring, your customers and users are the canary in the coal mine, informing you far too late of service delivery failures.

DENIS GOODWIN is a Director of Product Management at SmartBear Software and is responsible for the AlertSite synthetic monitoring product line. Denis has been leading the AlertSite product line at SmartBear for nearly three years and spent the previous five years managing monitoring platforms at other firms. Prior to this 8-year product management stretch, Denis was at Fidelity Investments for 10 years in various technical operations and analyst groups that culminated in him leading a team of analysts that was responsible for measuring and managing the user experience of Fidelity Investments' web properties. Included in this group's responsibilities were the managing of Fidelity's internal and vendor monitoring approach as well as defining and delivering SLA and health reports to Fidelity executive level management. When not helping companies provide the best user experience possible, he can be found hiking in New England with his family.



See Code Impact
in Production
Know first about anomalies in your cloud application in production.
Boost performance proactively. Drive accountability for everyone.

QUERY-DRIVEN ANALYTICS: The most powerful query language in monitoring, running against a unified, full-detail metrics store in real-time with no limits.

INTELLIGENT ALERTING: The results are higher quality alerts, anomaly detection, and crucially valuable insights that no other monitoring tool can offer.

MADE FOR ENTERPRISES: For serious SaaS and digital businesses where performance, reliability, scale, and support are essential to their business.

#1 IN SELF-SERVICE METRICS FOR DEVOPS AND DEVELOPER TEAMS


Find out why Wavefront helps you build and run great software at
wavefront.com

2017 Wavefront, Inc. All rights reserved.


SPONSORED OPINION

Unlearn You Must Do: 5 Tips for Devs to Relearn Monitoring of Code in the Cloud Era

Yoda understood that change is constant. Dev teams of modern cloud applications realize the same. Yesterday's static approaches to knowing code and services behavior must be unlearned.

Companies like Google and Twitter saw that once customers adopt a cloud service en masse, codebases and infrastructures inflate quickly. But while cloud can accommodate the growth of digital services, engineering teams don't grow at the same pace. How can modern dev teams avoid becoming victims of success? For hyper-growth SaaS leaders like Lyft, Intuit, and DoorDash, it meant rethinking monitoring. Unified Metrics Analytics from Wavefront is their way of the force.

Relearn their five ways here. One, their engineering teams unify application and infrastructure metrics in a self-service platform in order to fully see the impact of code changes, helping them to iterate faster with stable infrastructure. Two, they deploy dynamic, analytics-driven alerting so developers and SREs know of anomalies earlier, without exhausting alert noise. Three, they embrace a culture of sharing dashboards, thus accelerating collaborative learning for everyone. Four, they use metrics and analytics that go beyond reactive methods to proactive methods, allowing them to validate code optimizations and fix things well before customers are impacted. Five, they plan for growth and choose a monitoring platform that will scale, knowing demand for code metrics and analytics will only expand across teams.

"We moved to self-service metrics for scale. Now over 1,000 developers use Wavefront dashboards and alerts." -VP OF PLATFORM ENGINEERING AT WORKDAY

Yoda concluded: what defines a great monitoring product is how flexible it is, and how easily the engineering team can customize it to meet specific needs. Unlearn you must do, and embrace Unified Metrics Analytics from Wavefront.

WRITTEN BY STELA UDOVICIC
SR. DIRECTOR, PRODUCT MARKETING, WAVEFRONT

Cloud Application Metrics & Analytics By Wavefront

Wavefront gives you continuous metrics, analytics, and alerts at enterprise scale for applications, services, cloud, containers, and infrastructure. EVERYTHING.

CATEGORY: Cloud Applications Monitoring
NEW RELEASES: Continuous delivery
OPEN SOURCE: Some components

STRENGTHS
- Query-driven Analytics: see code performance anomalies across the entire stack.
- Smart Alerts: proactively monitor application issues, avoid false positives.
- Interactive Dashboards: investigate, iterate, and share insight across all teams.
- Complete API: automate monitoring, integrate across DevOps functions.
- Hyper-scale Custom Metrics: add and vary new code metrics in a snap.

CASE STUDY
uShip.com is a global market for shipping services. Their DevOps team was tired of maintenance and manual configuration of open-source monitoring tools. The lack of analytics and smart alerting caused too many unplanned incident responses. Immediately after running Wavefront, SREs saw anomalies sooner and resolved them faster. Wavefront's Query-driven Analytics helps uShip engineers push code to production quicker, with continuous insights from their application metrics. uShip developers use Smart Alerting to automate application monitoring, thus eliminating repetitive tasks so they can focus on code development. Shared insights from unified business and application performance analytics empower execs to make data-driven decisions in real-time.

NOTABLE CUSTOMERS: Workday, Lyft, OKTA, Box, Intuit, Groupon

WEBSITE wavefront.com TWITTER @WavefrontHQ BLOG wavefront.com/blog


Real World Performance and the Future of JavaScript Benchmarking

BY BENEDIKT MEURER
TECH LEAD OF THE JAVASCRIPT EXECUTION OPTIMIZATION TEAM, GOOGLE

QUICK VIEW

01 Web workloads are changing; performance metrics and tooling need to be adapted appropriately.

02 JavaScript engines are focusing on broadening the fast path beyond just peak scripting performance.

03 Whenever possible, modern JavaScript should be shipped to the browser to avoid the transpiler overhead.

04 Limiting the amount of JavaScript proportionally to what's visible on the screen is a good strategy.

In the last 10 years, an incredible amount of resources went into speeding up peak performance of JavaScript engines. This was mostly driven by peak performance benchmarks like SunSpider and Octane, and shifted a lot of focus toward the sophisticated optimizing compilers found in modern JavaScript engines, like Crankshaft in Chrome.

This drove JavaScript peak performance to incredible heights in the last two years, but at the same time, we neglected other aspects of performance like page load time, and we noticed that it became ever more difficult for developers to stay on the fine line of great performance. In addition to that, despite all of these resources dedicated to performance, the user experience on the web seemed to get worse over time, especially page load time on low-end devices.

This was a strong indicator that our benchmarks were no longer a reasonable proxy for the modern web, but rather turned into a caricature of reality. Looking at Google's Octane benchmark, we see that it spends over 70% of the overall execution time running JavaScript code.

Comparing this to profiles we see during startup of some 25 top web pages, we see that those are nowhere near the 70% JavaScript execution of Octane. They obviously spend a lot of time in Blink doing layouting and rendering, but also spend a significant amount of time in parsing and compiling JavaScript. On average, the time spent in executing JavaScript is roughly 20%, but more than 40% of the time is spent in just parsing, IC (inline cache) Miss, and V8 C++ (the latter of which represent the subsystems necessary to support the actual JavaScript execution, and the slow paths for certain operations that are not optimized in V8). Optimizing for Octane might not provide a lot of benefit for the web. In fact, parsing and compiling large chunks of JavaScript is one of the main problems for startup of many web pages nowadays, and Octane is a really bad proxy for that.

There's another benchmark suite named Speedometer, created by Apple in 2014, which shows a profile that is closer to what actual web pages look like. The benchmark consists of the popular TodoMVC application implemented in various web frameworks (i.e. React, Ember, and AngularJS).

As shown in the profile, the Speedometer benchmark is already a lot closer to what actual web page profiles look like, yet it's still not perfect: it doesn't take parse time into account for the score, and it creates 100 todos within a few milliseconds, which is not how a user usually interacts with a web page.




V8's strategy for measuring performance improvements and identifying bottlenecks thus changed from using mostly traditional JavaScript benchmark methods toward using browser benchmarks like Speedometer and also tracking real-world performance of web pages.

What's interesting to developers in light of these findings is that the traditional way of deciding whether to use a certain language feature by putting it into some kind of benchmark and running it locally or via some system like jsperf.com might not be ideal for measuring real-world performance. When following this route, it's possible for the developer to fall into the microbenchmark trap and observe mostly the raw JavaScript execution speedup, without seeing the real overhead accumulated by the other subsystems of the JavaScript engine (i.e. parsing, inline caches, slow paths triggered by other parts of the application, etc.) that negatively affect a web page's performance. At Chrome, we have been making a lot of the tooling that supported our findings available to developers via the Chrome Developer Tools.

You can now see parsing and compile buckets in the profiler. And, over the last few years, we've introduced another mechanism, called chrome://tracing, which allows you to record traces that collect all kinds of events. For example, you can analyze in detail how much time V8 spends in the different parsing steps, and thereby understand whether it might make sense to consider using a tool like optimize-js to mitigate the overhead of pre-parsing when it's not beneficial, for example when the function is executed immediately anyway.

Chrome Tracing provides you with a pretty detailed understanding of what's going on performance-wise by offering a view into the less obvious places. V8 has a step-by-step guide on how to use this. For most use cases, though, I'd recommend sticking to the Developer Tools, because they offer a more familiar interface and don't expose an overwhelming amount of the Chrome / V8 internals. But for advanced developers, chrome://tracing might be the swiss army knife that they were looking for.

Looking at the web today, we've discovered that it is important to significantly reduce the amount of JavaScript that is shipped to the browser, as we live in a world where more and more users consume the web via mobile devices that are a lot less powerful than a desktop computer and might not even have 3G connectivity.

One key observation is that most web developers use ECMAScript 2015 or later for their daily coding already, but for backwards compatibility compile all their programs to traditional ECMAScript 5 with so-called transpilers, like Babel, for example. This can have an unexpected impact on the performance of your application, because often the transpilers are not tuned to generate high performance code. Thus the final code that is shipped might be less efficient than the original code. But there's also the increase in code size due to transpilers: the generated code is usually 200-1000% the size of the original code, which means the JavaScript engine has up to 10 times the work in parsing, compiling, and executing your code.

Since not all browsers support all new language features, there's a certain period of time where new features require transpilation. But if you are building a web application today, consider shipping as much of the original code as possible. An intranet application with dedicated clients inside the company, all using some recent browser version, could as well be written and shipped as ES2015 code.

A good rule of thumb currently is to ship an amount of JavaScript proportional to what's on the screen. Think about code splitting from the beginning and design your web application with progressive enhancement in mind whenever possible. And independent of what kind of application you are developing, try to be as declarative as possible, using appropriate algorithms and data structures; i.e. if you need a map, use a Map. If it turns out to be slow in a certain browser, file a bug report. Focus optimization work on bottlenecks identified via profiling.

BENEDIKT MEURER joined Google in 2013 to work on the V8 JavaScript VM that powers both Node.js and Chrome. He is the tech lead of the JavaScript Execution Optimization team, focusing on the compiler architecture and performance of new language features. He contributed to various open source projects in the past, including OCaml, Xfce, and NetBSD. In his spare time, he's a father of two, and enjoys hiking and biking.
SPONSORED OPINION

Web Monitoring Just Isn't Good Enough Anymore

Monitoring tools for websites have been around for more than two decades now. They can tell you whether your site is up or down, how long it takes your webpage to load, and even how long it takes to complete a multi-step transaction.

All of this is done by monitoring the communication layers of the web, including TCP (Transmission Control Protocol) and HTTP (Hypertext Transfer Protocol). But today's web applications rely on so much more than these two protocols. They traverse a global network of Internet service providers, cloud infrastructure providers, content delivery networks, and internal and external domain name services (DNS); they call on third-party hosts and APIs, and they deploy tags for advertising and personalization services.

Basic web monitoring won't provide any insights into how this complex network performs and affects your end-user experience. Not having this visibility will prevent you from delivering the amazing customer experiences that drive successful businesses.

Monitoring today's digital infrastructure requires a global node network that goes everywhere your applications go. It requires synthetic monitoring to constantly test all the services your applications use, and real user measurement (RUM) to gauge what your actual end users are experiencing when they use your application. It requires powerful analytics to make sense of all of the data collected by these tools.

Gartner calls this type of performance monitoring Digital Experience Monitoring, or DEM. It expects that by 2020, 30% of global enterprises will have strategically implemented DEM technologies or services, up from fewer than 5% today. Are you utilizing digital experience monitoring?

WRITTEN BY DENNIS CALLAGHAN
DIRECTOR OF INDUSTRY INNOVATION, CATCHPOINT

Catchpoint

Work smarter. Act faster. Deliver better.

CATEGORY: DEM (Digital Experience Monitoring) / Performance Monitoring
NEW RELEASES: 8x Annually
OPEN SOURCE: No

STRENGTHS
- Digital Experience Monitoring (DEM) built for IT Ops by IT Ops
- Only DEM platform to simultaneously capture, index, and store object-level data
- The industry's most extensive test types and capabilities
- The industry's most comprehensive global node coverage

CASE STUDY
In order to provide their users with a consistently excellent customer experience, Overstock's operations team must overcome a number of performance obstacles. Some of the biggest challenges are mitigating the performance impact of the many different third-party marketing tags and advertisements which are hosted on the site, as well as managing the performance of third-party vendors such as CDNs and DNS providers. Another challenge lies in the mobile realm, where the need for insight and optimization of their pages becomes paramount as users deal with obstacles like bad wireless connections. These challenges become even larger during high-traffic shopping periods like Thanksgiving weekend, which only increases the need for a robust digital experience intelligence (DEI) solution. Continue reading Catchpoint's Overstock.com case study here.

NOTABLE CUSTOMERS: Comcast, Google, Honeywell, Kate Spade, Ask, Verizon, Overstock.com

WEBSITE www.catchpoint.com TWITTER @catchpoint BLOG blog.catchpoint.com


Writing 600 Times Faster

BY OREN EINI
CEO, HIBERNATING RHINOS LTD

QUICK VIEW

01 By changing our data access patterns to match what the hardware is capable of and optimizing for that, we were able to use the same overall architecture but get a performance boost of 600 times for RavenDB.

02 We utilized transaction merging and compression of the journal output to perform a large number of operations in a single transaction, but only write a fraction of the size of the data that we manipulate to the journal.

03 We made the transaction commit an asynchronous process, allowing processing, compression, and writing to disk to occur concurrently.

In my day job, I'm building RavenDB, a NoSQL document database. It is a very cool product, but not the subject of today's article. Instead, I'd like to talk about how we made RavenDB very fast, the techniques and processes we used to get there, as well as the supporting infrastructure.

During the design process for the next release of RavenDB, we set ourselves a pretty crazy goal. We wanted to get a tenfold performance improvement across the board. It's one thing to say that, but a lot harder to do. We've had to work quite hard to get there, re-working pretty much every layer in the system to make it better.

Here, we'll talk about how we improved the write speed of RavenDB (and more particularly, Voron, which is the low-level storage engine we wrote) by a tremendous margin, while maintaining the same transaction safety properties. This was neither easy nor fast, but it was a very rewarding journey.

I chose to write about the write speed improvements because it was a challenging task that doesn't respond very well to standard optimization techniques. You can't just throw more cores at the problem, and while you can buy faster hardware, at some point you'll reach the limit.

Before we get to the optimizations we implemented, let me explain what the problem is.

RavenDB is an ACID compliant database, meaning that once a transaction has been completed, it should remain in the database even in the face of a failure. In other words, if the database reported a successful transaction, pulling the plug from the machine immediately afterward would not impact the data. Once we restart, all the data that we committed is still there and still valid. At the same time, a transaction that was midway through (not committed to the database) is not going to be there at all.

That is a pretty standard requirement for databases, and most people don't give it a lot of thought, but, as it turns out, this is incredibly hard to do in a performant manner. The underlying reason is the hardware. We cannot buffer changes in memory, but instead must physically write those changes to disk. This means we have to flush all of our changes after every transaction, and that is slow.

In the graph at the top of the next page, you can see the results of running the following code on a variety of disks and using various sizes of buffers.

    const int count = 1000;
    using (var jrnl = File.Create("0001.journal"))
    {
        sp = Stopwatch.StartNew();
        for (int i = 0; i < count; i++)
        {
            jrnl.Write(buffer, 0, buffer.Length);
            jrnl.Flush(flushToDisk: true); // fsync
        }
        sp.Stop();
    }


[Graph: time to write and fsync 1,000 buffers of various sizes (4 KB to 64 MB) on HDD and SSD.]

Several interesting things are shown in this table. A 4 KB buffer is 512 times smaller than a 2 MB buffer, but writing 2 MB to disk is only 8 times slower on an SSD, and only about 50% slower on an HDD. It seems that the size of the writes rarely matters. In fact, we tested writes with a size of 64 MB and the results were 1,336.27 ms for HDD and 373.6 ms for SSD.

Databases usually implement their durability guarantees using some sort of journal; each transaction writes the data it modified to the journal, and the commit happens when we ensure that the data is on stable storage by calling fsync.

That leads to a lot of performance challenges. Writing to the disk at random places tends to offer the worst possible performance, so we really want to only write to the journal in a sequential manner. That, in turn, leads to the serialization of the access to the journal. The problem is the disk access is so slow that we are effectively turning our whole system into one big queue sitting in front of the disk I/O and waiting and waiting and then waiting some more.

The first version of Voron we built acquired an exclusive lock whenever it needed to write, and its top speed was roughly around 200 transactions per second. Note that this isn't quite saying that it was able to do 200 writes per second. As you can see in the table above, the size of the write isn't really that important, in the grand scheme of things. It is the number of times that you have to hit the disk.

We call it the bus factor because when you take the bus, its speed is rarely impacted by how many people are on board. The bus is the bus, and it will take you to the destination at its own pace.

These and other observations led us to try using traffic optimization methods to improve our performance. The very first thing we did was to implement the bus, known in database jargon as transaction merging. Instead of each transaction traveling alone all the way to the disk, they would now join together in a queue, and then travel to the disk in a group. Basically, each request would prepare all the information it needed (parsing the request, computing work, etc.) and then submit the work into a single queue. A dedicated thread would pull the work from the queue, apply it to the database, and then write it to the journal.

This means that even though our writes were sequential, we could still parallelize a lot of the work before we hit the actual database, and we were also able to send a lot more transactional work with every trip to the disk.
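To make the pattern concrete, here is a minimal sketch of transaction merging. It assumes a simplified model rather than Voron's actual internals: callers enqueue prepared work items, and a single writer thread drains the queue, writes a whole batch, and pays for one fsync per batch instead of one per transaction.

    using System.Collections.Concurrent;
    using System.Collections.Generic;
    using System.IO;
    using System.Threading;
    using System.Threading.Tasks;

    // One unit of prepared work plus a way to signal its caller when it is durable.
    sealed class PendingWrite
    {
        public byte[] JournalRecord;  // already serialized by the caller
        public TaskCompletionSource<bool> Completed = new TaskCompletionSource<bool>();
    }

    sealed class MergingJournalWriter
    {
        private readonly BlockingCollection<PendingWrite> _queue = new BlockingCollection<PendingWrite>();
        private readonly FileStream _journal = new FileStream("0001.journal", FileMode.Append);

        public MergingJournalWriter()
        {
            new Thread(WriteLoop) { IsBackground = true }.Start();
        }

        // Callers prepare their work up front, then await durability.
        public Task CommitAsync(byte[] journalRecord)
        {
            var pending = new PendingWrite { JournalRecord = journalRecord };
            _queue.Add(pending);
            return pending.Completed.Task;
        }

        private void WriteLoop()
        {
            while (true)
            {
                // Block for the first item, then greedily drain whatever else has queued up.
                var batch = new List<PendingWrite> { _queue.Take() };
                while (_queue.TryTake(out var more))
                    batch.Add(more);

                foreach (var item in batch)
                    _journal.Write(item.JournalRecord, 0, item.JournalRecord.Length);

                _journal.Flush(flushToDisk: true); // one fsync for the whole bus trip

                foreach (var item in batch)
                    item.Completed.SetResult(true);
            }
        }
    }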
This single change had a mind-blowing effect. We moved from an average of 200 writes per second to an average of over 20,000 writes per second. That was two orders of magnitude increase in performance, all because we were optimizing our I/O pattern.

The next thing we tackled was how much we are writing to the transaction journal. Now that we had merged transactions, they could get pretty fat. It takes 8 milliseconds to write 2 MB and flush it, but it takes 375 milliseconds to write 64 MB and flush it to SSD. On the other hand, compressing data using LZ4 can be done at a rate of over 625 MB/sec. Being a document database, most of the data we place in RavenDB is actually JSON documents, which compress extremely well and quite cheaply.

With compression, we're able to perform a large number of operations in a single transaction, but only write a fraction of the size of the data that we manipulate to the journal. With this technique, it meant that we traded off CPU time for I/O, but given the numbers that we see here, that was a very wise choice. Even if we spent a whole 100 milliseconds on compressing the data, we typically reduced it by so much that the saved I/O time is a net benefit.
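The sketch below shows the shape of that trade-off. It is only illustrative: it compresses the merged batch once before it is written to the journal, and it uses .NET's built-in DeflateStream in place of the LZ4 implementation RavenDB actually uses, since the point is the pattern (compress per batch, then a single write and fsync) rather than the specific codec.

    using System.IO;
    using System.IO.Compression;

    static class JournalCompression
    {
        // Compress the merged transaction data once, then write the (much smaller)
        // result to the journal with a single fsync.
        public static void WriteCompressed(FileStream journal, byte[] mergedTransactionData)
        {
            using (var compressed = new MemoryStream())
            {
                using (var deflate = new DeflateStream(compressed, CompressionLevel.Fastest, leaveOpen: true))
                {
                    deflate.Write(mergedTransactionData, 0, mergedTransactionData.Length);
                }

                // JSON documents typically shrink a lot, so we trade a little CPU for much less I/O.
                compressed.Position = 0;
                compressed.CopyTo(journal);
                journal.Flush(flushToDisk: true); // fsync once for the whole compressed batch
            }
        }
    }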


Those two optimizations (transaction merging and compressing the journal output) have managed to dramatically improve the performance of our system, but there was still a lot of additional work to be done. At that point, we pulled out a profiler and started going over the code, finding hotspots and improving them one by one.

The really nice thing about such work is that it is cumulative. That is, you improve 2% here and 0.5% there, and then suddenly it is like releasing the flood gates and you have a 5% increase in performance. A small increase in performance in one location can dramatically affect the whole system utilization.

A lot of the time this means analyzing what we are doing and finding ways to either avoid doing it entirely (caching, a different algorithm, etc.) or doing it more efficiently (better locality of data, more efficient instructions, etc.).

But there was one thing that truly annoyed me. As we kept improving things, the cost of compressing the data slowly took more and more of our time. It got to the point where it was the dominating factor in the transaction commit process. But, at the same time, it had such an impact on our performance that we couldn't just drop it.

Down below, I have visualized what this looked like. The green portion is the part that we care about: actually processing the transaction. The yellow portion is compression, and the red is the actual writing to disk. Note that without the compression, the work would be completely dominated by the disk, so we can't just drop it. But this type of behavior introduces a lot of jitter into the system. We are processing operations in a transaction, then compressing the data, then writing to disk. That means that the disk is actually idle for a non-trivial portion of the time, a crime in high performance circles.

[Diagram: transactions running serially, each one as TX PROCESSING, then COMPRESSION, then DISK, one after another.]

Instead, we made the transaction commit an asynchronous process. Whenever a transaction is completed, it will start a task to compress its data and write the data to disk. At the same time, we can already start the next transaction, which gives us much more concurrency. In fact, as long as we are waiting for the previous transaction to complete writing to the disk, we can continue accepting more work into the new transaction.

This gives us an additional benefit because the transaction size is now determined by the actual I/O speed of the system; we are in effect self-balancing, and we'll find the optimal balance of CPU vs. disk work in short order.

The overall behavior now looks like this:

[Diagram: pipelined commits, where one transaction's COMPRESSION and DISK write overlap with the next transaction's TX PROCESSING.]
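A rough sketch of that pipelining idea, again simplified and not RavenDB's actual code: committing returns a task that compresses and writes in the background, while the next transaction can start accumulating work immediately; awaiting the previous flush before writing keeps the journal writes ordered.

    using System.IO;
    using System.Threading.Tasks;

    sealed class AsyncCommitPipeline
    {
        private readonly FileStream _journal = new FileStream("0001.journal", FileMode.Append);
        private Task _previousFlush = Task.CompletedTask;

        // Called (by the single merging thread) once the current transaction has
        // finished its in-memory work. Compression and the disk write happen in
        // the background, so the next transaction can start processing right away.
        public Task CommitAsync(byte[] transactionData)
        {
            Task previous = _previousFlush;

            _previousFlush = Task.Run(async () =>
            {
                byte[] compressed = Compress(transactionData); // CPU-bound, off the hot path
                await previous;                                // keep journal writes in order
                await _journal.WriteAsync(compressed, 0, compressed.Length);
                _journal.Flush(flushToDisk: true);             // durability point for this batch
            });

            return _previousFlush;
        }

        private static byte[] Compress(byte[] data)
        {
            // Stand-in for the real compressor (RavenDB uses LZ4); see the earlier sketch.
            using (var output = new MemoryStream())
            {
                using (var deflate = new System.IO.Compression.DeflateStream(
                    output, System.IO.Compression.CompressionLevel.Fastest))
                {
                    deflate.Write(data, 0, data.Length);
                }
                return output.ToArray();
            }
        }
    }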

And while this change isn't as dramatic as the transaction merging one, it did manage to improve our performance by a total of 45%.

In total, we were able to reach a maximum of about 120,000 writes per second, counting all optimizations and running at full throttle. We've been able to manipulate the nature of physically accessing the hardware to merge a lot of individual writes into full bus trips to the disk, and then compress and parallelize the rest of the work even further.

OREN EINI is CEO of Hibernating Rhinos LTD, an Israeli-based company that produces RavenDB (NoSQL database) and developer productivity tools for OLTP applications such as NHibernate Profiler, Linq to SQL Profiler, Entity Framework Profiler (efprof.com), and more. Oren is a frequent blogger at ayende.com/Blog.


Metric or Log Analytics: Checklist

BY STELA UDOVICIC, SR. DIRECTOR, PRODUCT MARKETING, WAVEFRONT

To meet critical SLAs and maintain reliability, modern digital enterprises running applications in the cloud must measure the performance of their revenue-generating essential services, distributed applications, and infrastructures. For developers, DevOps, and TechOps engineers, it can be confusing to know when to use metric or log monitoring to isolate code performance anomalies and to proactively monitor and baseline their scaled-out, dynamic, and distributed applications.

Metrics describe numeric measurements in time. The metric format includes the measured metric name, the metric data value, the timestamp, the metric source, and an optional tag. Metrics convey small bits of information, much lighter than logs. Logs, unlike metrics, contain textual information about an event that occurred. Logs are meant to convey detailed information about application, user, or system activity. The primary purpose of logs is troubleshooting a specific issue after the fact, e.g., a code error, exception, security issue, or something else. This checklist will help you select the right approach for your environment.
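Before the checklist itself, here is a concrete and purely illustrative example of the difference. The short Python snippet builds one metric data point and one log event; the field names and the "payments-api" source are hypothetical, but they show how compact a metric is compared to the free-form detail a log carries.

import json
import time

# A metric: name, numeric value, timestamp, source, and optional tags.
# Small, regular, and cheap to store, aggregate, and alert on.
metric = {
    "name": "http.request.latency_ms",
    "value": 142.7,
    "timestamp": int(time.time()),
    "source": "payments-api",                 # hypothetical service name
    "tags": {"region": "us-east-1"},
}

# A log event: free-form text about what happened, kept for
# after-the-fact troubleshooting of a specific issue.
log_event = {
    "timestamp": int(time.time()),
    "level": "ERROR",
    "source": "payments-api",
    "message": "Charge failed for order 8841: gateway returned 502 after 3 retries",
}

print(json.dumps(metric))
print(json.dumps(log_event))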

Use metric analytics if you:
- Need to continuously measure and get split-second insights from your cloud application code performance, business KPIs, and infrastructure metrics at high scale. The almost instant insights are essential for digital businesses generating revenue from customer-facing applications.
- Need proactive, query-driven smart alerting.
- Are concerned with CPU, memory, or storage consumption, in particular when you are developing and monitoring complex distributed applications requiring benchmarking and storing large code performance data sets. As numeric measurements, metrics can be highly compressed.
- Run many microservices and containers.
- Use messaging pipelines for your application monitoring data, including Kafka or others.
- Work for an organization that has many developers that need to collaborate and share metrics analysis and dashboards (such as self-service analytics for engineering teams).
- Need to apply complex processing to your code performance measurements or business KPI data, such as using aggregates, histograms (distributions), and other mathematical transformations.

Use metric and log analytics if you:
- Need to process both continuous metric data events and logs. Metric analytics helps you get a first pane of glass across the entire application stack. Then use log monitoring to deep-dive into a specific issue and investigate the root cause after an issue happened.
- Are implementing DevOps principles and continuous delivery of your code.
- Need to troubleshoot and deep dive into a particular system, such as storage or network, after an issue occurred that generated a log.

Use log analytics if you:
- Need to analyze only unstructured, text-based data from your applications and infrastructure.
- Can afford application performance data under-sampling and coarser monitoring.
- Don't need to develop or run highly distributed applications that require high scalability.
- Are developing monolithic applications that typically do not require frequent code updates requiring continuous monitoring.
- Are not concerned with slower processing of your application performance data, such as in batch-like processing.

SPONSORED OPINION

How Should We Learn from Large-Scale Outages?

OUTAGES EXPOSE CRITICAL VULNERABILITIES
It's a time for reflection in the tech community, after huge numbers of popular and critical applications were rocked by the recent AWS S3 and Dyn DNS outages.

What can we learn from them? It's true that widely impactful outages target specific vulnerabilities, like the lack of redundancy and overdependence on AWS S3 and Dyn, but these outages also expose and publicize those same vulnerabilities.

Learn from each major outage (even if you weren't affected) and adjust your network architecture and monitoring strategies accordingly. Dealing with outages then becomes a process of incremental fortification. Your networks will be strengthened by each outage, and history won't repeat itself.

GET THE VISIBILITY TO LOCATE FAILURE POINTS
Having a full-stack network and application monitoring system in place for both internal and external services is key to learning the right lessons. To detect outages, diagnose root cause, and quickly resolve issues, find a monitoring solution with:
- End-to-end visibility, from source all the way to destination
- Visibility across the network and application stacks, including web, network, routing, and device layers, so you can correlate data and understand root cause
- The ability to share data with providers, team members, and affected users

With the right solutions in place, you'll have a bird's-eye view of critical applications and the networks that deliver them. You'll be able to rapidly deduce the root cause of issues, keep your providers accountable with actionable data, and be equipped with the knowledge to reinforce your environment against future events.

WRITTEN BY YOUNG XU, PRODUCT MARKETING ANALYST, THOUSANDEYES

Network & Application Monitoring By ThousandEyes
ThousandEyes delivers powerful insights for the Internet-centric enterprise by correlating application performance to network behavior.

CATEGORY: Network & Application Performance
NEW RELEASES: Bi-weekly
OPEN SOURCE: No

STRENGTHS
- SaaS-based solution that provides a unified view of performance from user to application
- Smart, lightweight, active monitoring probes deployed across the Internet and your network
- Pinpoint network dependencies and perform root cause analysis with intuitive visualizations
- Customizable alerts, integrations, and API transform insights into actions
- Interactive snapshots shared across internal and external teams to promote collaborative problem solving

CASE STUDY
Zendesk is a customer service platform that around 60,000 enterprises rely on to foster better customer relationships. As a SaaS service delivered over the Internet, the perceived performance of Zendesk is heavily dependent on application performance and network quality. "We would encounter situations where our application was working well but would still hear customers report slow performance," says Steve Loyd, Vice President of Engineering Operations at Zendesk. Zendesk uses ThousandEyes to get deep insight into application delivery that equips the operations team to react quickly to problems. Zendesk now uses ThousandEyes metrics as the ground truth to measure and share SLA metrics with their customers.

NOTABLE CUSTOMERS: Evernote, Twitter, Shutterfly, PayPal, RichRelevance, Wayfair, Craigslist, Avera Health, Lyft

WEBSITE www.thousandeyes.com  TWITTER @thousandeyes  BLOG blog.thousandeyes.com


Understanding and Monitoring Dependencies in Cloud Applications

BY NICK KEPHART, SR. DIRECTOR OF PRODUCT MARKETING, THOUSANDEYES

QUICK VIEW

01 Multiple regions can be used to comprise a single offering, multiple services are combined to provide another cloud product, or both. There are complex dependencies built into almost every cloud service on the market today.

02 Cloud-based services can be a cost-effective alternative to building your own, but each provider builds them differently. Have a monitoring strategy to understand the dependencies and foundational structures of these services.

03 APIs tie together cloud-based services and applications, but when APIs fail, so can everything else that relies on them. Continually test the performance of APIs and their connections for operational awareness.

It's been a nerve-wracking few months for teams managing cloud applications. In October 2016, a DDoS attack impaired Dyn's DNS services for hours, rendering myriad sites and services across the Internet unavailable. And in an unrelated, but similarly impactful event, the outage of AWS S3 at the end of February 2017 caused widespread and unpredictable collateral damage. With more applications leveraging more services hosted in just a few infrastructure environments, how can we make sense of application dependencies? How can we adopt a monitoring strategy that clearly accounts for the risks of improbable but hugely catastrophic service disruptions?

We'll dig into how you can identify and manage cloud dependencies by:
- Understanding underlying cloud architectures and failure scenarios
- Getting a handle on the API connections in your app and in customer interactions
- Developing a comprehensive monitoring strategy based on these requirements

MAKING SENSE OF IAAS ARCHITECTURES
Public cloud environments are a popular and powerful way to gain access to advanced services that would be costly to build or maintain on your own. But these services, from firewalls to DDoS mitigation to globe-spanning databases to data streaming platforms, are themselves composed of many other services. Like a digital matryoshka doll, it can be hard to know just how many layers and dependencies are bound up inside. In the case of the AWS S3 outage, many operations teams were surprised at how many different AWS offerings failed. They had not appreciated, and AWS had not communicated, just how interdependent various services were.

In your own data center, the failure of your entire file storage system would have a dramatic impact. In the cloud, it is the same story. The oldest services are building blocks from which other services are built (as represented in the AWS logo), and are foundational and critical to almost all other services. Basic compute (AWS EC2), storage (AWS S3), and networking (underpinning it all) are critical services that you should be monitoring and evaluating for failure scenarios. The same goes for Microsoft Azure (VMs, Blob Storage) and Google Cloud (Compute Engine, Cloud Storage). If you use cloud services that depend on these foundational elements, make sure they are part of your monitoring strategy.

Developing for the cloud also requires an understanding of failure isolation. AWS is built around the concept of regions, with previous outages typically corresponding to a single region. Unfortunately, many developers don't invest (sometimes wisely, sometimes naively) in cross-region failover strategies. So when US-East-1, the first and largest of the AWS regions, has an issue, the impact is unmistakable. Some services, like Google Spanner, have different isolation mechanisms that need to be evaluated.


When it comes to architecture planning, performance monitoring, and optimization, you'll want to monitor each potential failure domain. So if you are using cloud services in 4 different regions, make sure that you are collecting metrics on each.

IDENTIFYING API USAGE IN APPLICATIONS
Your applications depend on the specialized functionality of third-party applications, typically accessed via APIs. Don't think you rely on APIs for critical capabilities? Think again. APIs are very common in modern applications, hiding in plain view a complex set of dependencies. Some of these external services are important for just small portions of functionality. But many impact customer experience and revenue generation in fundamental ways.

What kind of APIs should you be monitoring? The specific APIs will be unique to your application, but some examples include:
- User authentication is accomplished with single sign-on APIs and services to detect fraud or abuse.
- Pricing and merchandising require the complex integration of many back-end applications to show an accurate price to a customer.
- Supply chain and logistics APIs ensure shipping is fulfilled.
- Payment gateways and billing systems are necessary to transact with your customers.
- Advertising is the lifeblood of many media sites and relies on APIs to display targeted products, images, descriptions, and reviews in real time.
- Customer chat, phone, and CRM systems use APIs to seamlessly integrate with sites, and typically are the difference between successfully communicating with your users and being dead in the water.

There are myriad APIs that make up your overall customer experience. Getting a handle on performance dependencies requires a clear appreciation for the APIs used by your application. You can enumerate your APIs in various ways: observing domains of objects on web pages, looking at connection logs from your application servers, and using documentation (well, hopefully it exists!) of embedded services.

MONITORING EXTERNAL SERVICES, INFRASTRUCTURE, AND APIS
Once you've figured out what to monitor, the next step to operational awareness is collecting data. There are several key elements you'll want as part of your monitoring toolkit:

1. Log errors of failed API connections and requests. Track trends over time to understand services that fail under your application load.

2. Actively monitor API servers and infrastructure services. Regularly test the reachability, response time, and response codes of these services with preconfigured tests. Don't know what targets to test? Your cloud provider typically has canary servers or endpoints (here is the list for AWS) they can point you to.
Taken together, these two approaches will give you an understanding of baseline performance and specific issues as they occur. As a bonus, tying both of these methods together with a correlation engine such as Splunk can be an effective way to make sense of seemingly disparate events that are actually all related.

FOUR STEPS TO TACKLING DEPENDENCIES
Cloud-based applications, and the business models that they support, rely on an increasingly diverse set of underlying services, tied together through APIs. The availability and efficacy of APIs and infrastructure services has, therefore, become a key element in monitoring and optimizing cloud applications.

As you are building out your next cloud application or rethinking ways to meet your SLAs, follow these four steps:

1. Map key cloud infrastructure and application APIs. It can be a monster task, but you can't optimize or mitigate services you don't know about.

2. Test each of the critical service dependencies with a combination of logging and active monitoring. Logs will give you forensic evidence, while active monitoring will provide a heads up to impending trouble.

3. Validate functionality and performance with event correlation, alerting, and baselining. With interdependent services, you may not know likely failure scenarios until you correlate your data.

4. Optimize performance over the long run to influence vendor or architectural decisions. From choosing vendors to investing in redundancy, it all starts with having clear insights from your monitoring data.

The next time a major cloud outage or service disruption hits, you'll be well aware of what is wrong, and with the proper planning, well positioned to ride out the storm.

NICK KEPHART leads Product Marketing at ThousandEyes, which develops Network Intelligence software, where he reports on Internet health and digs into the causes of outages that impact important online services. Prior to ThousandEyes, Nick worked to promote new approaches to cloud application architectures and automation while at cloud management firm RightScale.



Application performance issues are a lot like illnesses. There are thousands of possible culprits, and they can range from a mild
annoyance to you and those around you to potentially fatal. However, like sicknesses, you can take preventative measures to minimize
the occurrence and seriousness of performance problems. We surveyed 476 members of the DZone audience to learn about the most
common bugs that bring them down, how difficult they are to overcome, and how proactive they are in preventing them.

APPLICATION CODE (THE FLU)
36% of respondents encountered frequent performance issues with their code, while 47% had some issues. Those using their language's built-in tooling and thread dump analyzers were 4% less likely to find such issues to be challenging than those who did not, while developers using debuggers were 5% less likely.

DATABASE (STREP THROAT)
Frequent database issues plagued 24% of survey respondents. Those who build performance into their applications throughout the SDLC are 5% more likely to have no database issues than those who build application functionality first and worry about performance later.

WORKLOAD (BROKEN BONES)
16% of respondents encountered frequent workload issues, with 12% finding such issues to be challenging. Developers using APM tools were 5% more likely to solve workload issues easily compared to those who do not.

MEMORY (MIGRAINES)
15% of respondents were having problems with application memory. Those who build performance into their applications from the start see an 8% decrease in frequent memory issues compared to those who do not.

NETWORK (COMMON COLD)
Frequent network issues affected 14% of survey respondents, with 10% finding such problems to be challenging. Those who build performance into their applications from the start of development see a 5% decrease in frequent network issues compared to those who worry about performance later.


Reinventing Performance Testing

BY ALEX PODELKO, CONSULTING MEMBER OF TECHNICAL STAFF, ORACLE

QUICK VIEW

01 Cloud practically eliminated the lack of appropriate hardware as a reason for not doing load testing while also significantly decreasing the cost of large-scale tests.

02 With Agile development we've had a major shift left, allowing us to start testing early.

03 In Agile development, performance testing should be interwoven throughout the SDLC, not an independent step.

04 Dynamic architectures provide new challenges for performance testing; more sophisticated tools may be needed.

As the industry is changing with many modern trends, performance testing should change too. A stereotypical, last-moment performance validation in a test lab using a record-playback load testing tool is no longer enough.

CLOUD
Cloud practically eliminated the lack of appropriate hardware as a reason for not doing load testing while also significantly decreasing the cost of large-scale tests. Cloud and cloud services significantly increased the number of options for configuring the system under test and load generators. There are advantages and disadvantages to each option. Depending on the specific goals and the systems to test, one deployment model may be preferred over another.

For example, to see the effect of a performance improvement (performance optimization), using an isolated lab environment may be a better option for detecting even small variations introduced by a change. For load testing the whole production environment end-to-end to ensure the system will handle the load without any major issue, testing from the cloud or a service may be more appropriate. To create a production-like test environment without going bankrupt, moving everything to the cloud for periodic performance testing may be your best solution.

When conducting comprehensive performance testing, you'll probably need to combine several approaches. For example, you might use lab testing for performance optimization to get reproducible results and distributed, realistic outside testing to check real-life issues you can't simulate in the lab.

AGILE
Agile development eliminates the primary problem with traditional development: you need to have a working system before you may test it. Now, with agile development, we've had a major shift left, allowing us to start testing early.

Theoretically, it should be rather straightforward: every iteration you have a working system and know exactly where you stand with the system's performance. From the agile development side, the problem is that, unfortunately, it doesn't always work this way in practice. So, such notions as hardening iterations and technical debt get introduced. From the performance testing side, the problem is that if we need to test the product each iteration or build, the volume of work skyrockets.

Recommended remedies usually involve automation and making performance everyone's job. Automation here means not only using tools (in performance testing, we almost always use tools), but automating the whole process, including setting up the environment, running tests, and reporting/analyzing results. Historically, performance test automation was almost non-existent, as it's much more difficult than functional testing automation, for example. Setups are more complicated, results are complex (not just pass/fail) and not easily comparable, and changing interfaces is a major challenge, especially when recording is used to create scripts.

While automation will take a significant role in the future, it only addresses one side of the challenge. Another side of the agile challenge is usually left unmentioned. The blessing of agile development, early testing, requires another mindset and another set of skills and tools. Performance testing of new systems is agile and exploratory in itself. Automation, together with further involvement of development, offloads performance engineers from routine tasks. But testing early (the biggest


benefit being that it identifies problems early, when the cost of fixing them is low) does require research and analysis; it is not a routine activity and can't be easily formalized.

CONTINUOUS INTEGRATION
Performance testing shouldn't just be an independent step of the software development life cycle where testers get the system shortly before release. In agile development/DevOps environments, it should be interwoven with the whole development process. There are no easy answers here to fit every situation. While agile development/DevOps is becoming more and more mainstream, their integration with performance testing is just making its first steps.

What makes agile projects really different is the need to run a large number of tests repeatedly, resulting in the need for tools to support performance testing automation. The situation started to change recently as agile support became the main theme in load testing tools. Several tools recently announced integration with Continuous Integration servers (such as Jenkins and Hudson). While initial integration may be minimal, it is definitely an important step toward real automation support.

It doesn't look like we'll have standard solutions here, as agile and DevOps approaches differ significantly, and proper integration of performance testing can't be done without considering such factors as development and deployment processes, system, workload, and the ability to automate gathering and analysis of results.
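One common first step for that kind of integration is a simple pass/fail performance gate that a CI server runs on every build. The sketch below is only an illustration, assuming Python with the requests library; the endpoint and latency budget are made up. It times a batch of requests and exits non-zero when the 95th percentile exceeds the budget, which is all a CI job needs to fail the build.

import statistics
import sys
import time
import requests

URL = "https://staging.example.com/api/orders"   # hypothetical endpoint under test
SAMPLES = 50
P95_BUDGET_MS = 300.0                            # made-up latency budget

def measure_once() -> float:
    started = time.monotonic()
    requests.get(URL, timeout=10).raise_for_status()
    return (time.monotonic() - started) * 1000

def main() -> int:
    latencies = sorted(measure_once() for _ in range(SAMPLES))
    p95 = latencies[int(0.95 * (SAMPLES - 1))]
    print(f"median={statistics.median(latencies):.1f}ms p95={p95:.1f}ms budget={P95_BUDGET_MS}ms")
    return 0 if p95 <= P95_BUDGET_MS else 1      # non-zero exit fails the CI build

if __name__ == "__main__":
    sys.exit(main())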
NEW ARCHITECTURES
Cloud seriously impacts system architectures, having a lot of performance-related consequences.

First, we have a shift to centrally managed systems. Software as a Service (SaaS) offerings are basically centrally managed systems with multiple tenants/instances.

Second, to get the full advantage of cloud, such cloud-specific features as auto-scaling should be implemented. Auto-scaling is often presented as a panacea for performance problems, but, even if it is properly implemented, it just assigns a price tag to performance. It will allocate resources automatically, but you need to pay for them. Any performance improvement results in immediate savings.

Another major trend involves using multiple third-party components and services, which may not be easy to properly incorporate into testing. The answer to this challenge is service virtualization, which allows one to simulate real services during testing without actual access.

Cloud and virtualization triggered the appearance of dynamic, auto-scaling architectures, which significantly impact collecting and analyzing feedback. With dynamic architectures, we have a great challenge ahead of us: to discover configuration automatically, collect all necessary information, and then properly map the collected information and results to a changing configuration in a way that highlights existing and potential issues, and potentially, to make automatic adjustments to avoid them. This would require very sophisticated algorithms and sophisticated Application Performance Management systems.

NEW TECHNOLOGIES
New technologies may require other ways to generate load. Quite often, the whole area of load testing is reduced to pre-production testing using protocol-level recording/playback. Sometimes, it even leads to conclusions like "performance testing hitting the wall" just because load generation may be a challenge.

While protocol-level recording/playback was (and still is) the mainstream approach to testing applications, it is definitely just one type of load testing using only one type of load generation; such equivalency is a serious conceptual mistake, dwarfing load testing and undermining performance engineering in general.

Protocol-level recording/playback is the mainstream approach to load testing: recording communication between two tiers of the system and playing back the automatically created script (usually, of course, after proper correlation and parameterization). As far as no client-side activities are involved, it allows the simulation of a large number of users. But such a tool can only be used if it supports the specific protocol used for communication between two tiers of the system. If it doesn't, or it is too complicated, other approaches can be used.

UI-level recording/playback has been available for a long time, but it is much more viable now. New UI-level tools for browsers, such as Selenium, have extended the possibilities of the UI-level approach, allowing the running of multiple browsers per machine (limiting scalability only to the resources available to run browsers). Moreover, UI-less browsers, such as HtmlUnit or PhantomJS, require significantly fewer resources than real browsers.

Programming is another option when recording can't be used at all, or when it can, but with great difficulty. In such cases, API calls from the script may be an option. Often, this is the only option for component performance testing. Other variations of this approach are web services scripting or the use of unit testing scripts for load testing. And, of course, there is a need to sequence and parameterize your API calls to represent a meaningful workload. The script is created in whatever way is appropriate, and then either a test harness is created or a load testing tool is used to execute scripts, coordinate their executions, and report and analyze results.
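As a rough illustration of that last option, the sketch below generates load by calling an API directly from code rather than replaying a recorded protocol. It assumes Python with the requests library; the URL and concurrency numbers are hypothetical, and a real harness would add parameterization, pacing, and richer reporting.

from concurrent.futures import ThreadPoolExecutor
import time
import requests

URL = "https://test.example.com/api/search?q=laptop"   # hypothetical endpoint
VIRTUAL_USERS = 20
REQUESTS_PER_USER = 25

def virtual_user(_: int) -> list:
    # Each simulated user issues a series of requests and records latencies.
    timings = []
    with requests.Session() as session:
        for _ in range(REQUESTS_PER_USER):
            started = time.monotonic()
            session.get(URL, timeout=10)
            timings.append((time.monotonic() - started) * 1000)
    return timings

if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=VIRTUAL_USERS) as pool:
        all_timings = [t for user in pool.map(virtual_user, range(VIRTUAL_USERS)) for t in user]
    all_timings.sort()
    print(f"requests={len(all_timings)} "
          f"avg={sum(all_timings) / len(all_timings):.1f}ms "
          f"p95={all_timings[int(0.95 * (len(all_timings) - 1))]:.1f}ms")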
SUMMARY
Performance testing should reinvent itself to become a flexible, context-, and business-driven discipline. It is not that we just need to find a new recipe; now, we need to be able to adjust on the fly to every specific situation in order to remain relevant.

ALEX PODELKO has specialized in performance since 1997, working as a performance engineer and architect for several companies. Currently he is Consulting Member of Technical Staff at Oracle, responsible for performance testing and optimization of Enterprise Performance Management and Business Intelligence (a.k.a. Hyperion) products. Alex periodically talks and writes about performance-related topics, advocating tearing down silo walls between different groups of performance professionals. His collection of performance-related links and documents (including his recent papers and presentations) can be found at alexanderpodelko.com. He blogs at alexanderpodelko.com/blog and can be found on Twitter as @apodelko. Alex currently serves as a director for the Computer Measurement Group (CMG, cmg.org), an organization of performance and capacity planning professionals.



Premier Web Performance Monitoring Platform
Monitor your website and the whole infrastructure behind it

Insightful Reports Integrations line-up

Open API
Real-time Alerts

Start Monitoring Now

Trusted by 200,000+ users
SPONSORED OPINION

A study by research firm IHS Markit released last year reported that information and communication technology (ICT) downtime is costing North American organizations $700 billion per year. That's a monumental amount of lost revenue and lost productivity, not to mention the negative impacts on company branding. And it's completely unnecessary. With web performance monitoring, one can ensure that end-users are interacting with a website or web application as expected.

Web performance monitoring is a critical means to avert the negative consequences of unpredictable downtime and ensure proper functioning of the entire IT infrastructure. A contributor to the success of any business, web monitoring is still often viewed as an optional "nice to have" feature. This way of thinking misses the point. The same rationale for having an elegant, mobile-friendly website must also drive the decision to establish a clear, business-focused web monitoring strategy.

With insurance, you pay to protect your family, home, or car in case of an emergency. The same should apply to website performance. A website is the digital front for a product or service; nowadays it's not an online catalogue anymore but a mandatory business-generating engine.

Lots of time and money have been spent in building your website; wouldn't you want to take the last step to ensure it is protected and functioning properly?

It's always preferable to be notified of any unexpected downtime events in real time, rather than discovering them once it's too late. Better yet, shouldn't you see how real users are interacting with your website, and gain control over each step they are taking by making sure mission-critical flows work flawlessly?

These are just some of the immediate benefits of website performance monitoring. Think of it as an insurance policy on your digital landscape, which gives you peace of mind, empowers your IT team, and keeps your customers coming back.

WRITTEN BY RAFFI M. KASSARJIAN, EXECUTIVE DIRECTOR AND GENERAL MANAGER, MONITIS, A TEAMVIEWER COMPANY
MONITIS, A TEAMVIEWER COMPANY

Monitis

CATEGORY: Cloud-based Web Performance and End-User Monitoring solution

COMPANY
Monitis is a TeamViewer company founded in 2006. Today over 200,000 users in 150 countries rely on its world-class monitoring system to monitor the functionality of 300,000+ websites, servers, and applications. Monitis offers all the innovative solutions necessary to ensure that your mission-critical web applications are running continually and efficiently.

PRODUCTS

WEBSITE PERFORMANCE MONITORING
The Monitis Website Monitoring suite provides users with all the information they need about the availability and performance of critical web applications from a single dashboard, and identifies and isolates any problem associated with end-user experience before it harms the business. The right blend of Uptime, Real User Monitoring, and Synthetic Transaction Monitoring gives a complete picture of web application performance. With a static selection of test node locations all over the world, rather than Round-Robin location checks, a customer has a hand on the pulse of the most critical markets for the business.
- Real User Monitoring
- Uptime Monitoring
- Transaction Monitoring
- Full-Page Load

APPLICATION PERFORMANCE MONITORING
Monitis offers full SDKs for all popular languages, including Java, Perl, Python, PHP, Ruby, and C#.

OPEN API
To fulfill specific monitoring needs, Monitis also provides an open API for extending and customizing the platform.

INTEGRATIONS
To equip clients with the most innovative capabilities in web monitoring, Monitis has introduced a line-up of SaaS integrations to ensure that no performance issue ever goes unnoticed. Monitis supports integrations with the following industry-leading services: VictorOps, Slack, Zapier, JIRA, HipChat, OpsGenie, CloudWatch, WHMCS, PagerDuty, and more.

WEBSITE monitis.com  TWITTER @monitis  BLOG monitis.com/blog


6 Common API Mistakes

BY HEITOR TASHIRO SERGENT, COMMUNITY CONTENT LEAD, RUNSCOPE

QUICK VIEW

01 Documentation is crucial to the success of your API. Make sure that it stays up to date and that developers can quickly find the information they need.

02 Leverage tools that give you visibility into your APIs. They can help you quickly debug a wide array of issues, from missing headers to invalid certificates.

03 Put in the time to make your API response the best you can. Implement status codes correctly. Use clear, concise error messages to help save time for your users and support team.

Have you ever used an API that returned an HTML error page instead of the JSON you expected, causing your code to blow up? What about receiving a 200 OK status code with a cryptic error message in your response?

Building an API can be as quick as serving fast food. Frameworks like Express, Flask, and Sinatra combined with Heroku or Zeit's now can help any developer have an API up and running in a few minutes.

However, building a truly secure, sturdy, hearty API can take a little more work, just as a chef takes more time when crafting a great meal. You need great docs, clear and concise error messages, and to meet developers' expectations of how your API should work.

On the other side of the table, we have developers interacting with these APIs. And we, as developers, sometimes make mistakes. We can make false assumptions about how an endpoint should work, not read the docs closely enough, or just not have enough coffee that morning to parse an error message.

Our testing and monitoring tools can help you uncover issues that would otherwise stay hidden by a lack of integration tests or real-world use case scenarios. Working with thousands of developers to resolve their API problems has given us unique insight into the issues they often see when integrating and interacting with APIs.

Here's our list of 6 common mistakes that can catch you off guard, why they happen, and how you can avoid them.

1. USING HTTP:// INSTEAD OF HTTPS://
Forgetting a single "s" can get you in a lot of trouble when testing an API. Some APIs may only support HTTPS, while others may support HTTP for some endpoints and not others.

Even when an API supports both, you might still run into some errors. For example, some APIs redirect HTTP traffic to their HTTPS counterpart, but not all frameworks will be configured to follow a 302 status code. Node.js' request module, for example, will follow GET redirects by default, but you have to explicitly set followAllRedirects to true if you want to follow redirects to POST and other methods.
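An equivalent client-side check is easy to write in other languages, too. The following Python sketch is illustrative only and the URL is a placeholder; it shows how to detect an HTTP-to-HTTPS redirect explicitly instead of relying on a client's default behavior.

import requests

# Probe whether an API redirects plain HTTP to HTTPS. The URL is a placeholder.
resp = requests.get("http://api.example.com/v1/users", allow_redirects=False, timeout=10)

if resp.status_code in (301, 302, 307, 308):
    location = resp.headers.get("Location")
    print("Redirected to:", location)
    # Not every client re-sends POST/PUT bodies across redirects,
    # so it is safer to call the HTTPS URL directly in the first place.
    final = requests.get(location, timeout=10)
    print("Final status:", final.status_code)
else:
    print("No redirect; status:", resp.status_code)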

APIs may also stop supporting HTTP, so it's important to stay up to date with any changes. Good API providers will let users know beforehand via email and any social media channels they have. Another step you can take is to use a tool like Hitch, which lets you follow certain APIs and be notified if anything changes.

If you're asking yourself whether your API should support HTTPS, then the answer is yes. The process for getting certificates used to be a hassle, but with solutions like Let's Encrypt and Cloudflare, there's no excuse not to support HTTPS.


If you're unsure why you should do it, or don't think you should because you're not transmitting any sensitive data, I highly recommend reading "Why HTTPS for Everything?" from CIO.gov.

2. UNEXPECTED ERROR CODES
A good API error message will allow developers to quickly find why, and how, they can fix a failed call. A bad API error message will cause an increase in blood pressure, along with a high number of support tickets and wasted time.

I ran into this issue a couple of weeks ago while trying to retrieve an API's access token. The code grant flow would return an error message saying that my request was invalid, but it wouldn't give me any more details. After an hour banging my head against the wall, I realized I hadn't paid attention to the docs and forgot to include an Authorization header with a base64-encoded string of my application's client_id and client_secret.

Good usage of HTTP status codes and clear error messages may not be sexy, but it can be the difference between a developer evangelizing your API and an angry tweet.

Steve Marx had this to say in "How many HTTP status codes should your API use?": "...developers will have an easier time learning and understanding an API if it follows the same conventions as other APIs they're familiar with." As an API provider, you don't have to implement 70+ different status codes. Another great piece of advice from Steve:

"Following this pragmatic approach, APIs should probably use at least 3 status codes (e.g. 200, 400, 500) and should augment with status codes that have specific, actionable meaning across multiple APIs. Beyond that, keep your particular developer audience in mind and try to meet their expectations."

Twilio is a great example of best practices for status codes and error messages. They go the extra mile and include links in their responses, so the error message is concise while still providing the developer with more information in case they need it.

As API consumers, we need to be careful and not assume that an API 200 status code means the request made a successful call and returned the information we want. Some APIs, like Facebook's Graph API, always return a 200 status code, with the error being included in the response data. So, when testing and monitoring APIs, always be careful and don't automatically assume that a 200 means everything is OK.

Another great resource about response handling is Mike Stowe's blog post on API Best Practices: Response Handling.
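A defensive consumer-side check along those lines might look like the sketch below. It assumes Python's requests library; the endpoint is a placeholder, and the top-level "error" key simply mirrors the Graph-API-style convention described above rather than any specific contract.

import requests

resp = requests.get("https://graph.example.com/v2/me", timeout=10)  # placeholder URL

content_type = resp.headers.get("Content-Type", "")
payload = resp.json() if "json" in content_type else {}

# Do not trust the status code alone: some APIs return 200 with an error in the body.
if resp.status_code != 200 or "error" in payload:
    raise RuntimeError(f"API call failed: status={resp.status_code}, body={payload or resp.text[:200]}")

print("User id:", payload.get("id"))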
dealing with redirects:
3. USING THE WRONG HTTP METHOD
This is an easy one, but surprisingly common. A lot of the time this can be blamed on poor documentation. Maybe the endpoints do not explicitly say which methods are supported between GET/POST/PUT, etc., or they have the wrong verb.

Tools can also play tricks on you if you're not careful. For example, let's say you want to make a GET request with a request body (not a great practice, but it happens). If you make a curl request using the -d option and don't use the -XGET flag, it will automatically default to POST and include the Content-Type: application/x-www-form-urlencoded header.

This post by Daniel Stenberg (author and maintainer of curl) on the unnecessary use of curl -X also illustrates another way you might run into this issue when dealing with redirects:

"One of most obvious problems is that if you also tell curl to follow HTTP redirects (using -L or --location), the -X option will also be used on the redirected-to requests which may not at all be what the server asks for and the user expected."

Other times, we might fall into past assumptions and just use the wrong method. For example, the Runscope API uses POST when creating new resources, such as test steps or environments, and PUT when modifying them. But Stripe's API uses POST methods when creating and updating objects.

Both approaches are valid, and Stormpath has a great blog post talking about their differences and how to handle them as an API provider. No matter which one you choose, just be consistent throughout your API and make sure to have correct and up-to-date docs, so your users don't run into this error.

4. SENDING INVALID AUTHORIZATION CREDENTIALS
APIs that implement OAuth 2, such as PayPal, usually require the developer to include an Authorization header for each request. It's common to confuse that with "Authentication" instead, so if your request is failing, make sure you're using the correct word.

Another issue that pops up with Authorization headers is actually constructing them correctly. OAuth 2 tokens need to be prepended with "Bearer" for them to work:

Authorization: Bearer your_api_token

It's also important when using HTTP Basic authentication to pay close attention to the syntax of the header value. The form is as follows:

Authorization: Basic base64_encode(username:password)

Common mistakes include forgetting the "Basic " (note the space) prefix, not encoding the username and password, or forgetting the colon between them. If an API provider only requires a username without a password (like Stripe, where your API key is the username), you'll need that pesky colon after the username, even if there's no password.
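The Python sketch below builds both header shapes explicitly; the token and key are obviously placeholders. Note the trailing colon in the Basic case when the provider only uses a username, Stripe-style.

import base64
import requests

# Bearer token (OAuth 2): the word "Bearer", a space, then the token.
bearer_headers = {"Authorization": "Bearer your_api_token"}

# HTTP Basic: "Basic " plus base64(username:password). Keep the colon even
# when there is no password, e.g. an API key used as the username.
credentials = base64.b64encode(b"sk_test_your_api_key:").decode("ascii")
basic_headers = {"Authorization": f"Basic {credentials}"}

requests.get("https://api.example.com/v1/charges", headers=basic_headers, timeout=10)

# Most HTTP clients can also build Basic auth for you, colon included:
requests.get("https://api.example.com/v1/charges", auth=("sk_test_your_api_key", ""), timeout=10)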
5. NOT SPECIFYING CONTENT-TYPE OR ACCEPT HEADER
Accept and Content-Type headers negotiate the type of information that will be sent or received between a client and server. Some APIs will accept requests that don't contain either of those headers and just default to a common format like JSON or XML.

Other APIs are a little more strict. Some might return a 403 error if you're not explicit about the Accept header value and require you to include those headers on requests. That way, the server knows what information the client is sending, and also what format the client expects to receive in return.

This issue can also cause some confusion if you are testing your API with different tools. curl, for example, along with other popular testing tools, will automatically include an Accept header for any MIME type (*/*) with every request. We, at Runscope, don't add a default Accept header, so this can get you different results when testing the same endpoint.

6. APIS RETURNING INVALID CONTENT TYPES WHEN THERE IS AN ERROR
I can say that this is one of my pet peeves with APIs. Seeing that <!DOCTYPE HTML> line in a response makes my blood pressure go sky high.

Well, sometimes that's my fault. If you forget to send an Accept header with your request, the API can't be sure what response format you're expecting.

For API providers, some frameworks and web servers default to HTML. For example, Symfony, a PHP framework, defaults to returning a 500 HTML error. So, if you're creating an API that has no business returning HTML, make sure to check the default error responses.

Another reason this might happen may have nothing to do with your API, but with the routing mesh or load balancer that sits in front of your API. For example, if you have an Nginx instance fronting your API and it encounters a request timeout or other error, it may return an HTML error before your API instances even have a chance to know what's going on.
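Both problems, being explicit about what you send and expect, and not blowing up when an intermediary hands back HTML, can be softened with a little defensive client code. The Python sketch below is illustrative only; the endpoint is a placeholder.

import requests

headers = {
    "Accept": "application/json",        # the format we expect back
    "Content-Type": "application/json",  # the format we are sending
}

resp = requests.post("https://api.example.com/v1/items",
                     json={"name": "widget"}, headers=headers, timeout=10)

content_type = resp.headers.get("Content-Type", "")
if "application/json" in content_type:
    data = resp.json()
    print("Created item:", data)
elif resp.text.lstrip().lower().startswith("<!doctype"):
    # A proxy, load balancer, or framework default handed back an HTML error page.
    raise RuntimeError(f"Expected JSON, got HTML (status {resp.status_code})")
else:
    raise RuntimeError(f"Unexpected content type: {content_type!r} (status {resp.status_code})")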
CONCLUSION
These are some of the most common mistakes we have seen across multiple APIs. This list could go on for much longer, so if there's some other error you came across and you managed to fix it, please share it with us.

HEITOR TASHIRO SERGENT is the Community Content Lead at Runscope, which provides a cloud-based API Monitoring and Testing solution. He helps in all of the company's content efforts, including writing technical documentation, crafting tutorials and blog posts, and creating customer stories. Prior to Runscope, Heitor worked as a Developer Evangelist for SendGrid in Latin America, building a strong presence for the company by helping the local developer and startup communities.


Diving Deeper into Performance

TOP #PERFORMANCE TWITTER FEEDS
To follow right away: @Souders, @tyler_treat, @bbinto, @jaffathecake, @svenbaumgartner, @brianleejackson, @appperfeng, @ayende, @AndyDavies, @wp_maven

PERFORMANCE ZONES
Learn more & engage your peers in our topic portals:

Performance - dzone.com/performance
Scalability and optimization are constant concerns for the Developer and Operations manager. The Performance Zone focuses on all things performance, covering everything from database optimization to garbage collection to tweaks to keep your code as efficient as possible.

Web Dev - dzone.com/webdev
Web professionals make up one of the largest sections of IT audiences. We collect content that helps web pros navigate in a world of quickly changing language protocols, trending frameworks, and new standards for UX. The Web Dev Zone is devoted to all things web development, including front-end UX, back-end optimization, JavaScript frameworks, and web design.

Mobile - dzone.com/mobile
The Mobile Zone features the most current content for mobile developers. Here you'll find expert opinions on the latest mobile platforms, including Android, iOS, and Windows Phone. You can find in-depth code tutorials, editorials spotlighting the latest development trends, and insights on upcoming OS releases.

TOP PERFORMANCE REFCARDZ

Scalability and High Availability - dzone.com/refcardz/scalability
Provides the tools to define Scalability and High Availability, so your team can implement critical systems with well-understood performance goals.

Java Performance Optimization - dzone.com/refcardz/java-performance-optimization
Getting Java apps to run is one thing. But getting them to run fast is another. Performance is a tricky beast in any object-oriented environment, but the complexity of the JVM adds a whole new level of performance-tweaking trickiness and opportunity. This Refcard covers JVM internals, class loading (updated to reflect the new Metaspace in Java 8), garbage collection, troubleshooting, monitoring, concurrency, and more.

Getting Started with Real User Monitoring - dzone.com/refcardz/getting-started-with-real-user-monitoring
Teaches you how to use new web standards, like W3C's Beacon API, to see how your site is performing for actual users, letting you better understand how to improve overall user experience.

PERFORMANCE BOOKS
Writing High-Performance .NET Code by Ben Watson
Systems Performance: Enterprise & the Cloud by Brendan Gregg
High Performance JavaScript: Build Faster Web Application Interfaces by Nicholas C. Zakas

PERFORMANCE TOOLS
PageSpeed Insights - developers.google.com/speed/pagespeed/insights
KeyCDN Speed Test - tools.keycdn.com/speed
Page Scoring - pagescoring.com/website-speed-test



Maximize Your APM Investment with xMatters

Connect your APM alerts to your on-call schedule, ticket system, status page, chat, etc.

RELAY MONITORING ALERTS BETWEEN SYSTEMS | ENGAGE THE RIGHT PEOPLE TO RESOLVE INCIDENTS | SUPPORT CONTINUOUS DELIVERY PROCESSES

Identify and resolve incidents faster with xMatters. xMatters.com/maxAPM

SPONSORED OPINION

3 Ways to Maximize Your APM Investment

Because businesses work at a faster pace now, we depend more and more on technology in order to keep up. Developers have shortened release cycles. Operations teams support quicker application deployments. IT teams have to increase velocity while keeping older systems up and running. Here are 3 keys to keep things humming along:

TRANSFORM ALERTS WITH ACTIONABLE RESOLUTION OPTIONS
Get more value out of your APM by adding actionable responses to alerts. Want to create a ticket? Start an incident chat room? Post an update to a service status page? You can when you integrate your APM solution with xMatters. Turn that "SocketException" alert into a "We just averted total disaster. You're welcome" alert.

KEEP YOUR COMPANY IN THE LOOP
Do your teams love resolving an incident with an executive calling every 10 minutes? With xMatters you customize a message for multiple audiences, so engineers get detailed specifics while stakeholders get the business language they understand, all from the same notification. Focus your engineering teams on resolving the issue rather than sending updates.

Connect your entire DevOps toolchain to help you resolve issues quickly and collaboratively.

INCREASE THE VELOCITY OF YOUR CI/CD PROCESSES
Your release pipeline is automated; your communications need to be as well so you can drive critical processes forward and avoid potential service outages. Instead of copying and pasting (or worse, typing) alert data into tickets or chat rooms, etc., automatically push relevant data from your monitoring alerts to the tools you use to resolve incidents. Cuz nobody got time for typing, am I right?

With xMatters' powerful integrations, engineers can take action directly from notifications and move through the incident lifecycle quickly, focus on resolution activities, and avoid major incidents.

WRITTEN BY ABBAS HAIDER ALI, CTO, XMATTERS

xMatters
xMatters is a toolchain communication platform that relays data between systems while engaging the right people to resolve incidents.

CATEGORY: IT Alerts
NEW RELEASES: Quarterly
OPEN SOURCE: No

STRENGTHS
- Connect insights from APM tools to the people, teams, and systems needed to drive business processes forward
- Leverage group on-call schedules and rotations, escalation rules, and user device preferences to automatically engage the right resources
- Customized response options allow teams to quickly get up to speed and take action during incidents
- Robust self-service integration platform with 100s of prebuilt configurations that can be installed in minutes
- 15 years of experience integrating people into toolchains spanning DevOps, Ops, and Service Management solutions

CASE STUDY
Kellogg Cuts Resolution Times by 83%
In 2013, The Kellogg Company initiated a three-year plan to elevate company efficiency. Along with finding a new IT monitoring solution, they sought to automate IT event notifications and integrate them with the monitoring.

xMatters tied smoothly into Kellogg's broader cloud-based infrastructure, including its new monitoring solution and ITSM and CMDB applications. Events in the monitoring and help desk systems automatically kick off tailored alerts from xMatters for 2,000 potential events to the relevant people across 88 global teams via phone, SMS, email, and push, whichever format the recipient prefers. If someone doesn't accept the alert, xMatters escalates the issue to the next person on the list, ensuring a timely response and preventing alert fatigue.

With xMatters, Kellogg was able to:
- Reduce mean time to resolution by 83%
- Reach alerting accuracy exceeding 99.99%
- Cut resource costs by 92% in conjunction with monitoring and ticketing, a savings that is expected to add up to $2.5 million over 5 years

NOTABLE CUSTOMERS: 3M, Fujitsu, ViaSat, Dealertrack / Cox Automotive, The Kellogg Company, Walgreens, Fiserv, Manpower

WEBSITE xmatters.com/maxAPM  TWITTER @xmatters_inc  BLOG xmatters.com/blog


Executive Insights on Performance Optimization and Monitoring

BY TOM SMITH, RESEARCH ANALYST, DZONE

QUICK VIEW

01 Use real-time user monitoring to provide visibility into the entire pipeline to ensure an optimal user experience with video, applications, and web pages.

02 While there's a proliferation of tools providing visibility across networks, architectures, and devices, no one has developed a single, holistic solution.

03 In the future, there will be a single, holistic solution that uses machine learning to solve problems before they even occur for an optimal user experience.
To gather insights on the state of performance optimization and monitoring today, we spoke to 12 executives from 11 companies that provide performance optimization and monitoring solutions for their clients. Here's who we spoke to:

JOSH GRAY, Chief Architect, Cedexis
JEFF BISHOP, General Manager, ConnectWise Control
BRYAN JENKS, CEO and Co-Founder, DropLit.io
DORU PARASCHIV, Co-Founder, IRON Sheep TECH
YOAV LANDMAN, Co-Founder and CTO, JFrog
JIM FREY, V.P. Strategic Alliances, Kentik
ERIC SIGLER, Head of DevOps, PagerDuty
NICK KEPHART, Senior Director Product Marketing, ThousandEyes
KUNAL AGARWAL, CEO, Unravel Data
LEN ROSENTHAL, CMO, Virtual Instruments
ALEX RYSENKO, Lead Software Engineer, Waverly Software
EUGENE ABRAMCHUK, Sr. Performance Engineer, Waverly Software

Here are the key findings from the subjects we covered:

01 The keys to performance optimization and monitoring are the design of the infrastructure and real-time user monitoring (RUM) to ensure an optimal end-user experience (UX), whether it's videos, web pages, or applications. The proliferation of new services, requirements, and devices in diverse geographic locations has made visibility into the entire network critical. You need to be able to see where all of your data is residing to understand how performance is, or is not, being optimized.

02 There's a greater need for visibility, and there's a proliferation of tools coming online to provide that visibility. However, no one has developed a single solution to provide a complete view across a diverse collection of infrastructures and application architectures. Response times and page-load times have continued to decrease with the adoption of virtualization and microservices. We're evolving from performance monitoring to performance intelligence with the addition of easy-to-understand, contextually relevant, algorithmically-driven performance analytics. However, it's important to identify and focus on key business metrics, or else you run the risk of being overwhelmed with data.

03 The most frequently mentioned performance and monitoring tools used are AppDynamics, New Relic, and DataDog. However, these were just three of more than 30 mentioned, with a trend towards more granular and specialized offerings, and respondents mentioned just a few solutions that came to mind besides their own.

04 Real-world problems that are being solved with performance optimization and monitoring are time to market, optimization of UX, and reduction in time to

resolve issues through greater collaboration among teams. While more tools are coming online, some providers are enabling disparate tools to provide an integrated view to the client, which results in greater visibility into the entire pipeline and faster time to problem resolution. This visibility is also enabling clients to ensure service level agreements (SLAs) are being met by third-party providers.

05 Nonetheless, the most common issues continue to be the need to improve visibility, ease of use, performance, and knowledge of the impact that code has on the UX. Incomplete visibility throughout the pipeline prevents organizations from accurately finding the source of latency in the network, the application, or the endpoint. There continues to be a lack of knowledgeable professionals who know distributed computing and parallel processing. As such, the technical complexity of these tools must be reduced for companies to get the most value from them. Vendors should also improve ease of use through analytics, so IT operations do less data interpretation and can focus more on remediation. Understanding the product, load, load tests, and performance graphs is critical. Several developers do not understand the performance impact of their code and are not pre-optimizing it, which can lead to less readable code with more complex bugs. Ensure that you talk to end users in order to understand what they are experiencing and what's important to them. Do not assume you know what they want.

06 The biggest opportunities for improvement are the automatic reaction to, and correction of, issues, and more elegant, thoughtful design and testing resulting in an optimal UX. In the future, performance and monitoring tools will automatically react to issues and know the difference between mitigating and fixing problems. They'll be able to do this by collecting more data and identifying a dynamic system to determine what the problem may be before it affects the customer. Data will be more manageable with automated analysis. Application design will feature higher-level programming, better tools, and graceful degradation. Just as data is used to solve problems, it can also be used to change the way performance testing is done and measured. All monitoring products will monitor across the hybrid data center, including on-premise and public cloud-deployed applications.

07 The biggest concerns about performance and monitoring today are the lack of collaboration, identification of KPIs and how to measure them, and expertise. Companies are not moving quickly enough to share and integrate different viewpoints. Smaller teams can implement more iterative solutions more quickly, which allows them to learn faster and observe how small optimization differences can have massive hardware implications. It's important to identify and agree upon KPIs for each business unit, and how they will be measured. Premature optimization is a common pitfall in software development. It's common to see software being developed without concern for consistency or use cases, which dramatically affects the quality and speed of the software.

08 The skills needed by developers to optimize application performance and monitoring are: 1) understanding the fundamentals; 2) understanding the concept of benchmarking and improving; and 3) staying creative. Have an authoritative understanding of the underlying IT infrastructure and the expertise to keep it running in the face of constant change, independent of vendors or location. Understand the architecture of the system, how services talk to each other, how the database is accessed, and how messages are read by concurrent consumers. Keep a broad perspective, an open mind, and an understanding of the needs and wants of the end user. Don't assume the model you have in your mind is correct; know you're going to get it wrong. Get used to designing in a way that makes it easy to make a few small changes rather than having to rebuild the entire application. Set a reliable benchmark for the performance goals that are relevant to your business application and work to improve on those goals as you get more information.
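As a concrete illustration of point 2 above, the following minimal Java sketch shows one way to "benchmark and improve": time a baseline implementation, time a candidate, and compare the two. The class name, iteration counts, and the string-building example are illustrative assumptions, not something prescribed by the guide; a real project would more likely use a harness such as JMH, which controls for JIT warm-up and dead-code elimination far more rigorously.

import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Tiny benchmark harness: warm up, measure, and compare a candidate against a baseline. */
public class SimpleBenchmark {

    /** Runs the task repeatedly and returns the average time per call in microseconds. */
    static double averageMicros(Supplier<?> task, int warmupRuns, int measuredRuns) {
        for (int i = 0; i < warmupRuns; i++) {
            task.get();                        // warm up the JIT and caches
        }
        long start = System.nanoTime();
        for (int i = 0; i < measuredRuns; i++) {
            task.get();
        }
        long elapsed = System.nanoTime() - start;
        return TimeUnit.NANOSECONDS.toMicros(elapsed) / (double) measuredRuns;
    }

    public static void main(String[] args) {
        // Hypothetical operation under test: building a string in a loop.
        Supplier<String> naive = () -> {
            String s = "";
            for (int i = 0; i < 1_000; i++) s += i;       // quadratic copying
            return s;
        };
        Supplier<String> improved = () -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 1_000; i++) sb.append(i); // single growing buffer
            return sb.toString();
        };

        double baseline = averageMicros(naive, 1_000, 10_000);
        double candidate = averageMicros(improved, 1_000, 10_000);

        System.out.printf("baseline:  %.1f us/op%n", baseline);
        System.out.printf("candidate: %.1f us/op%n", candidate);
        System.out.println(candidate < baseline ? "improvement confirmed" : "no improvement - investigate");
    }
}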
09 An additional consideration raised by a few of our participants is the question of where performance monitoring begins and ends versus testing and validation. Once a problem is identified and remediation proposed, there is a need to test and validate that the change has completely fixed the problem. What effect will advancements in technologies such as AI, bots, BI, data analytics, Elasticsearch, natural language search, and new open source frameworks with standardized APIs have on performance and monitoring?

Let us know if you agree with their perspective or have answers to the questions they raised. We'd love to get your feedback.

TOM SMITH is a Research Analyst at DZone who excels at gathering insights from analytics, both quantitative and qualitative, to drive business results. His passion is sharing information of value to help people succeed. In his spare time, you can find him either eating at Chipotle or working out at the gym.



3-Second Root Cause Identification
Healthy Microservices - No Compromise
SPONSORED OPINION

Monitoring Modern Applications

Every day, billions of metrics and their aggregates flow from various systems into monitoring platforms. Thousands of collectors consume metrics through different protocols, at different times. At the other end, a person is watching all the metrics as they are surfaced and is applying rules to catch anomalies in behavior when they deviate from the assumed or statistically computed normal.

This approach might work fine in a small environment with a few dependencies, but not at scale.

In an IT organization, teams of people are typically responsible for specific IT systems: for their development, maintenance, health, and user satisfaction. In order to follow the direction of the organization, teams have goals within their own scope, and they use monitoring, among other things, to achieve them. The problem is that monitoring keeps people busy. They have to continuously watch the data in order to understand the health of the system.

By utilizing machine intelligence to filter noise, recognize recurrent patterns, and find technology-specific issues, people can focus on making decisions based on indicators provided by the tool. This is a model for managing quality of service. Even more: by defining objectives, people can tell the tool when to notify them and even what automated actions to trigger, should an important objective be missed. This clearly frees people's time and turns busy work into directed activity.

By using these three generalizations as your operational model to monitor applications, you can evolve monitoring from you doing all the work to managing, leaving the heavy lifting to a modern platform like Instana.

WRITTEN BY PAVLO BARON
CHIEF TECHNOLOGY OFFICER, INSTANA

Dynamic APM for Microservice Management

Built for agile organizations, Instana monitors and correlates data from every
aspect of the application stack. As IT teams integrate and deploy new code, Instana
automatically discovers and continuously aligns with any change.

CONTINUOUS ALIGNMENT: Continuous automatic discovery aligns to code deployment and your ever-changing (micro)service architectures.

SERVICE INTELLIGENCE: Precise correlation of data and microservice structure delivering automatic root cause identification.

BEST DIGITAL EXPERIENCE: Confidence to take risks, innovate faster, and continually optimize your business applications.

Organizations benefit from real-time impact analysis, improved quality of service, and
optimized workflows that keep applications healthy.

WEBSITE instana.com TWITTER @instanahq BLOG instana.com/blog


The Smoke and Mirrors of UX vs. Application Performance

BY OMED HABIB
DIRECTOR OF PRODUCT MARKETING, APPDYNAMICS

QUICK VIEW

01 True user experience improvement begins with precision application performance optimization that only an APM solution can provide.

02 The first step in approaching application performance monitoring is identifying an entirely new unit of measuring the success of your applications.

03 Ignoring underlying performance problems and covering them up with a UI band-aid only masks a symptom of the underlying problem.

Can a better UX simultaneously deliver a worse user experience? It sounds like a paradox, but it may be more common than you think. It describes a category of UX design practices that have little to do with improving the actual experience and everything to do with suggesting that the experience is a good one.

The difference can be subtle, but true user experience improvement begins with precision application performance optimization that only an APM solution can provide. APM diagnoses the cause of a slowdown, allowing developers to address the root cause and not the surface symptoms. Ignoring the underlying performance bottlenecks and tricking the user with a UI band-aid is akin to placing tape over a crack in your drywall: you're only covering up the symptom of an underlying problem. That's the difference between improving the quality of the software and generating economic waste for short-term gains.

DEFINING PERCEIVED PERFORMANCE
UX designers commonly deploy a set of techniques related to perceived performance. This is the principle that the user's perceptions about the app's performance need to be managed. One real-world example involves a test of loading screen animations for the mobile version of Facebook. The test demonstrated that users reported an improved perception of the app's loading speed when developers changed the design of the loading animation.

As iOS developer Rusty Mitchell reported, a Facebook test indicated that when users were presented with a custom loading animation in the Facebook iOS app, they blamed the app for the delay. But when users were shown the iOS system spinner, they were more likely to blame the system itself.

Perceived performance techniques are part of a larger trend in design beyond the world of software known as benevolent deception.

A STUDY OF BENEVOLENT DECEPTION
In a report by Microsoft and university researchers, author Eytan Adar said his team found increasing use of benevolent deception in a range of software applications. Adar said that these deceptions arise from the stress of opposing market forces. While enterprise and consumer apps are growing more complex, users increasingly prefer apps with a simpler, more intuitive interface. The complexities must be simplified on the front end, and deception is the most direct route.


Adar wrote, "We're seeing the underlying systems become more complex and automated to the point where very few people understand how they work, but at the same time we want these invisible public-facing interfaces. The result is a growing gulf between the mental model (how the person thinks the thing works) and the system model (how it actually works). A bigger gulf means bigger tensions, and more and more situations where deception is used to resolve these gaps."

BEYOND BENEVOLENT DECEPTION
One step beyond benevolent deception is volitional theater, which refers to functions and displays that don't correspond to the underlying processes. Outside the world of software-defined businesses, you can see these techniques at work in the fact that elevator "Close Doors" buttons don't do anything at all. They just give impatient riders something to do until the doors close on their own.

Pretty shocked? Yep, you better believe it.

The New York Times cited several other examples of volitional theater, such as the "Press to Walk" button on street corners. Remember all those times you kept pressing the crosswalk pedestrian button to cross the street? You were most likely pushing a button designed to calm your nerves that didn't have any actual impact on the timing of the street lights. Designers refer to non-operational controls as placebo buttons. They shorten the perceived wait time by distracting people with the imitation of control.

THREE TYPES OF VOLITIONAL THEATER
Adar's research identified three major trends in the way that volitional theater is being used in modern UX application design:

SYSTEM IMAGES DECEPTION: DESIGNED TO REFRAME WHAT THE SYSTEM IS DOING.
This category of deceptive UX includes sandboxing. That's where developers create a secondary system that operates in a way that's different from a full application. The example Adar gave is the Windows Vista Speech Tutorial. This works as a separate program from the main speech recognition system. In reality, the tutorial is a safe version of the environment that was built to be less sensitive to mistakes to simplify the learning process.

BEHAVIORAL DECEPTIONS: DESIGNED TO OFFER USERS THE APPEARANCE OF CHANGE.
This category goes back to the idea of placebo buttons. By giving users an "OK" or "Next" button, the system can smooth over performance delays. UX designers use a host of optical illusions and cinematographic techniques like blurring to suggest motion when nothing is happening on screen. Examples include pattern emergence, reification (fill-in-the-gaps), image multi-stability, and object invariance. These often cover over issues related to slow-performing software.

MENTAL MODEL DECEPTIONS: DESIGNED TO GIVE USERS A SPECIFIC MENTAL MODEL ABOUT HOW THE SYSTEM OPERATES.
Explainer videos may be the worst offenders in this category, since they apply dramatic and distracting metaphors in an attempt to engage distracted prospects. Many times, the more outlandish the model, the more memorable it is. These mental models help sales, but support teams then have to explain how things really work to frustrated customers. This category also covers popular skeuomorphs, like the sound of non-existent static on Skype phone calls.

BETTER PRACTICES IN UX
Despite these trends, there are many developers who remain strongly opposed to benevolent deception and volitional theater. Some of their insights and arguments are presented on the Dark Patterns website.

These developers don't feel that it's ethical to trick the user or remove their freedom to know what's going on with the software. Instead of masking issues, they feel that slowdowns or clunky design can and should be eliminated. What developers need to achieve that is a comprehensive monitoring solution that pinpoints the causes of latency. Then engineering teams can go in and make the code and infrastructure optimizations necessary for software to actually perform better. They consider volitional theater to be sloppy design. One way to start correcting those design flaws is by evaluating the software and corresponding infrastructure dependencies.

UX AND BUSINESS TRANSACTIONS
The first step in approaching application performance monitoring is identifying an entirely new unit of measuring the success of your applications. At AppDynamics, we've solved this by introducing the concept of a Business Transaction.

Imagine all the requests that are invoked across an entire distributed application ecosystem necessary to fulfill a specific request. For example, if a customer wants to check out their shopping cart, they click the Checkout button. Upon that single click, a request is made from the browser through the internet to a web server, probably a front-end Node.js application, which then calls an internal website, a database, and a caching layer. Each component invoked within that call is responsible for helping fulfill that click, so we call the lifeline of that request a Business Transaction.
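To make that lifeline concrete, here is a minimal, hypothetical Java sketch of the same idea: one checkout click is tagged with a single transaction ID that travels with every downstream call, so the whole request can be timed end to end. The service names, the header mentioned in the comments, and the timing logic are illustrative assumptions; a commercial APM agent performs this correlation automatically rather than through hand-written code like this.

import java.time.Duration;
import java.time.Instant;
import java.util.UUID;

/** Sketch: treat one checkout click as a single "business transaction" and trace it end to end. */
public class CheckoutTransaction {

    public static void main(String[] args) {
        handleCheckout("cart-42");
    }

    static void handleCheckout(String cartId) {
        // One ID for the whole lifeline of the click, passed to every downstream call.
        String transactionId = UUID.randomUUID().toString();
        Instant started = Instant.now();

        callDownstream(transactionId, "inventory-service", cartId);
        callDownstream(transactionId, "payment-service", cartId);
        callDownstream(transactionId, "order-database", cartId);
        callDownstream(transactionId, "cache-layer", cartId);

        Duration total = Duration.between(started, Instant.now());
        // This end-to-end measurement (and each hop's share of it) is what gets reported
        // per business transaction, e.g. "Checkout".
        System.out.printf("business transaction 'Checkout' %s finished in %d ms%n",
                transactionId, total.toMillis());
    }

    /** Stand-in for an HTTP/RPC call; a real call would carry the ID in a header such as X-Correlation-Id. */
    static void callDownstream(String transactionId, String service, String cartId) {
        Instant started = Instant.now();
        // ... perform the real call here ...
        System.out.printf("[%s] %s handled %s in %d ms%n",
                transactionId, service, cartId, Duration.between(started, Instant.now()).toMillis());
    }
}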


The Business Transaction perspective prioritizes the goals of the end user. What are your customers really trying to achieve? In the past, developers argued over whether application monitoring or network monitoring was more important. Here's how an AppDynamics engineer reframed the problem:

"For me, users experience Business Transactions; they don't experience applications, infrastructure, or networks. When a user complains, they normally say something like, 'I can't log in,' or, 'My checkout timed out.' I can honestly say I've never heard them say, 'The CPU utilization on your machine is too high,' or, 'I don't think you have enough memory allocated.' Now think about that from a monitoring perspective. Do most organizations today monitor business transactions, or do they monitor application infrastructure and networks? The truth is the latter, normally with several toolsets. So the question 'Monitor the application or the network?' is really the wrong question. Unless you monitor business transactions, you're never going to understand what your end users actually experience."

Starting from business transactions will help DevOps teams view your system as a function of business processes rather than individual requests firing off everywhere. This is how you can start solving the problems that matter most to customers. It's always better to diagnose and solve performance issues instead of merely covering them up or distracting the user with UX techniques.

ISOLATING CAUSES
This approach is much closer to approximating the true UX of an average user. Starting from the business transaction, you can use APM solutions to drill down from end-user clients to application code-level details. That's how you isolate the root cause of performance problems that matter most to specific users.

Of course, isolating the problem doesn't matter unless you resolve it. We've discussed this with application teams who have isolated problems related to runtime exceptions for Java-based applications in production, but they tended to gloss over those that didn't break the application.

That's a mistake we addressed in a series about Top Application Performance Challenges. Bhaskar Sunkar, AppDynamics co-founder and CTO, concluded that, "Runtime Exceptions happen. When they occur frequently, they do appreciably slow down your application. The slowness becomes contagious to all transactions being served by the application. Don't mute them. Don't ignore them. Don't dismiss them. Don't convince yourself they are harmless. If you want a simple way to improve your application's performance, start by fixing up these regularly occurring exceptions. Who knows, you just might help everyone's code run a little faster."
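As a rough sketch of the kind of fix Sunkar describes, consider the hypothetical Java snippet below: a configuration lookup that throws and swallows an exception on every miss in a hot path, next to an equivalent that handles the missing value without throwing. The map, key names, and defaults are invented for illustration; the point is simply that constructing and discarding exceptions thousands of times per minute is measurable overhead that disappears once the exception stops being thrown.

import java.util.Map;

/** Sketch of a frequently thrown, silently swallowed exception in a hot path, and a cheaper alternative. */
public class NoisyExceptionDemo {

    private static final Map<String, String> SETTINGS = Map.of("timeout", "30");

    // Anti-pattern: uses an exception for ordinary control flow on every call.
    static int readTimeoutNoisy(String key) {
        try {
            // Throws NumberFormatException whenever the key is missing (get() returns null).
            return Integer.parseInt(SETTINGS.get(key));
        } catch (RuntimeException ignored) {
            return 30;   // swallowed on every single miss
        }
    }

    // Fix: handle the expected "missing" case without throwing at all.
    static int readTimeoutQuiet(String key) {
        String raw = SETTINGS.get(key);
        return raw == null ? 30 : Integer.parseInt(raw);
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        for (int i = 0; i < 1_000_000; i++) readTimeoutNoisy("missing-key");
        System.out.printf("noisy: %d ms%n", (System.nanoTime() - start) / 1_000_000);

        start = System.nanoTime();
        for (int i = 0; i < 1_000_000; i++) readTimeoutQuiet("missing-key");
        System.out.printf("quiet: %d ms%n", (System.nanoTime() - start) / 1_000_000);
    }
}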
This approach is becoming even more critical as smaller devices attempt to crunch higher volumes of data. A good example is how applications built for the Apple Watch are expected to provide the same level of performance as those built for a tablet or smartphone. Users don't lower their expectations to compensate for processing power. In the end, users care about the benefits of the application, not the limitations of the device.

THE AGE OF EXPERIENCE
Gartner reported that 89 percent of companies expected to compete based on customer experience by 2017. However, customers want more from their software than just great UX. Beautiful design and clever tricks can't distract these time-sensitive users from being very aware of application performance issues.

As data volumes accelerate and devices shrink, it will grow harder to maintain optimal performance and continuous improvement schedules. Application teams need speed and precision tools to pinpoint areas for improvement. At the same time, more businesses are undergoing their own digital transformations and discovering the importance of performance management for the first time.

Developments in user hand movement recognition and motion control mapping have been accelerating along multiple fronts, such as VR best practices by Leap Motion and Google's Project Soli, which uses micro-radar to precisely translate user intent from the most minute finger gestures. These advancements likely represent what's coming next in terms of UX, but they will demand IT infrastructures with access to a great deal more data-processing power.

DRILLING DOWN TO MAXIMUM IMPACT
Excellence in UX for the next generation of applications has to start by troubleshooting business transaction performance from the user's point of view. From there, you'll be able to drill down to the code level and intelligently capture the root cause that impacts the user.

OMED HABIB is a Director of Product Marketing at AppDynamics. He originally joined AppDynamics as a Principal Product Manager to lead the development of their world-class PHP, Node.js and Python APM agents. An engineer at heart, Omed fell in love with web-scale architecture while directing technology throughout his career. He spends his time exploring new ways to help some of the largest software deployments in the world meet their performance needs.


Solutions Directory
This directory of monitoring, hosting, and optimization services provides comprehensive, factual
comparisons of data gathered from third-party sources and the tool creators' organizations.
Solutions in the directory are selected based on several impartial criteria, including solution maturity,
technical innovativeness, relevance, and data availability.

COMPANY | PRODUCT | CATEGORIES | FREE TRIAL | HOSTING | WEBSITE
Akamai | ION | CDN, Network & Mobile Monitoring & Optimization, FEO | Free tier available | SaaS | akamai.com/us/en/solutions/products/web-performance/web-performance-optimization.jsp
Apica | Apica Systems | APM, Infrastructure Monitoring | Limited by usage | SaaS | apicasystems.com
AppDynamics | AppDynamics | APM, Mobile and Web RUM, Database Monitoring, Infrastructure Visibility | Free tier available | On-premise or SaaS | appdynamics.com
AppNeta | AppNeta | APM, Synthetic Monitoring, Network Monitoring, ITOA, Real User Monitoring | Available by request | SaaS | appneta.com
Appnomic Systems | AppsOne | ITOA | Upon request | On-premise or SaaS | appnomic.com/products/appsone
Aternity | Aternity | APM, ITOA, Real User Monitoring | Upon request | On-premise | aternity.com
BigPanda | BigPanda | ITOA, Alert Software | 21 days | SaaS | bigpanda.io
BMC | BMC TrueSight Pulse | APM, Network Monitoring, ITOA, Database Monitoring | 14 days | SaaS | bmc.com/truesightpulse
BrowserStack | BrowserStack | FEO | Limited by usage | SaaS | browserstack.com
Bugsnag | Bugsnag | Application Monitoring | 14 days | On-premise or SaaS | bugsnag.com
CA | CA App Synthetic Monitor | APM, Synthetic Monitoring | Available by request | SaaS | ca.com/us/products/ca-app-synthetic-monitor.html
CA | CA App Experience Analytics | Mobile APM | Free tier available | SaaS | ca.com/us/products/ca-app-experience-analytics.html
CA | CA Unified Infrastructure Management | Infrastructure Monitoring | Available by request | On-premise | ca.com/us/products/ca-unified-infrastructure-management.html
Catchpoint | Catchpoint Suite | Synthetic, RUM, UEM | 14 days | On-premise or SaaS | www.catchpoint.com/products
Cedexis | Impact | Infrastructure Monitoring, FEO, ITOA | Available by request | SaaS | cedexis.com/products/impact
Circonus | Circonus | Infrastructure Monitoring, ITOA | Free tier available | SaaS | circonus.com
CloudFlare | CloudFlare | CDN, Network, Mobile, and Web Monitoring and Optimization, FEO | Free tier available | CDN | cloudflare.com
Correlsense | SharePath | APM, Network Monitoring, Middleware Monitoring | Available by request | On-premise or SaaS | correlsense.com/product
CoScale | CoScale | APM, Infrastructure Monitoring, ITOA, Real User Monitoring | 30 days | SaaS | coscale.com
Datadog | Datadog | Performance Metrics Integration and Analysis | 14 days | SaaS | datadoghq.com
Dotcom-Monitor | Dotcom-Monitor | APM, Infrastructure Monitoring, FEO | 30 days | SaaS | dotcom-monitor.com
Dyn | Dyn | Infrastructure Monitoring, Network Monitoring, ITOA | 7 days | On-premise | dyn.com
Dynatrace | Dynatrace Application Monitoring | APM, ITOA | 30 days | On-premise | dynatrace.com/solutions/application-monitoring
Dynatrace | Dynatrace Data Center RUM | RUM (web and non-web), Synthetic, ITOA | Available by request | On-premise | dynatrace.com/platform/offerings/data-center-rum
Dynatrace | Dynatrace SaaS and Managed | APM (cloud-native optimized) + AI | 30 days / 1000 hours | On-premise or SaaS | dynatrace.com/platform/offerings/ruxit
Dynatrace | Dynatrace Synthetic | Synthetic Monitoring, Managed Load Testing | Available by request | SaaS | dynatrace.com/capabilities/synthetic-monitoring
Dynatrace | Dynatrace UEM | Real User Monitoring (web and mobile) | 30 days | On-premise | dynatrace.com/platform/offerings/user-experience-management
Dynatrace | Keynote Platform | Mobile APM (Synthetic Monitoring, Test Automation) | 7 days | SaaS | keynote.com/platform
eG Innovations | eG Enterprise | APM & RUM for Web Applications, Citrix, Middleware, DB, and Virtualization, Analytics and Reporting | 15 days | On-premise or SaaS | eginnovations.com
Evolven | Evolven | ITOA | Available by request | On-premise | evolven.com
ExtraHop Networks | ExtraHop Networks | ITOA | Free tier available | SaaS | extrahop.com
f5 | Big-IP Platform | APM, Network Monitoring | 30 days | On-premise or SaaS | f5.com/products/big-ip
Fortinet | FortiSIEM | ITOA, Network Monitoring | 30 days | SaaS | fortinet.com/products/management/fortisiem.html
FusionReactor | FusionReactor | Java server monitor, production debugging, crash protection | 14 days | On-premise | fusion-reactor.com
HPE | HPE APM | APM, ITOA, Real User Monitoring | 30 days | On-premise | hp.com/us/en/software-solutions/application-performance-management
HPE | LoadRunner | Load Testing | Free tier available | On-premise or SaaS | hp.com/us/en/software-solutions/loadrunner-load-testing/try-now.html
HPE | StormRunner | Load Testing | 30 days | SaaS | hp.com/us/en/software-solutions/stormrunner-load-agile-cloud-testing
IBM | IBM API Connect | API Management Platform | Free tier available | On-premise or SaaS | ibm.com/software/products/en/api-connect
IBM | IBM Application Performance Management | APM, Infrastructure Monitoring, Real User Monitoring | 30 days | On-premise or SaaS | ibm.com/software/products/en/ibm-application-performance-management
Idera | SQL Diagnostic Manager | DB Monitoring | 14 days | SaaS | idera.com/productssolutions/sqlserver/sqldiagnosticmanager
Idera | Uptime Infrastructure Monitor | APM, Infrastructure Monitoring, Network Monitoring | 14 days | SaaS | idera.com/it-infrastructure-management-and-monitoring
InfoVista | 5view Applications | APM, Network Monitoring, Real User Monitoring | Available by request | On-premise | infovista.com/products/Application-Performance-Monitoring-and-Management
Instana | Instana | Infrastructure Management, Application Management | 14 days | 14 days | instana.com
INETCO | INETCO Insight | APM, Middleware Monitoring | Available by request | On-premise | inetco.com/products/inetco-insight
jClarity | Censum | JVM Garbage Collection Optimization | 7 days | On-premise or SaaS | jclarity.com/censum
jClarity | Illuminate | JVM Performance Diagnosis and Optimization | 14 days | On-premise or SaaS | jclarity.com/illuminate
JENNIFERSOFT | Jennifer | APM | 14 days | On-premise | jennifersoft.com/en/product/product-summary
Librato | Librato | Performance Metrics Integration and Analysis | 30 days | SaaS | librato.com
LiveAction | LiveNX | Network Monitoring and Diagnostics | 14 days | SaaS | liveaction.com/solutions/liveaction-network-performance-management
Logentries | Logentries | Log Management and Analytics | Free tier available | SaaS | logentries.com
Loggly | Loggly | Log Management and Analytics | 30 days | SaaS | loggly.com
LogMatrix | NerveCenter | ITOA, APM, Infrastructure Monitoring, Network Monitoring, Database Monitoring | Available by request | On-premise | logmatrix.com
ManageEngine | ManageEngine Applications Manager | APM, Network Monitoring, Infrastructure Monitoring | Available by request | On-premise | manageengine.com
Microsoft | System Center 2016 | APM | 180 days | On-premise | microsoft.com/en-us/cloud-platform/system-center
Monitis | Monitis | Network and IT Systems Monitoring | 15 days | On-premise or SaaS | monitis.com
Moogsoft | Incident.MOOG | Performance Metrics Integration, Analysis, and Response | Available by request | On-premise or SaaS | moogsoft.com/product
Nagios | Nagios XI | APM, Infrastructure Monitoring, Network Monitoring, FEO, ITOA | Open source | On-premise | nagios.com/products/nagios-xi
Nastel | AutoPilot | APM, Infrastructure Monitoring, FEO, Middleware Monitoring | Available by request | SaaS | nastel.com/products/autopilot-m6
NetScout | nGeniusONE | APM, Network Monitoring, ITOA | Available by request | On-premise | netscout.com/product/service-provider/ngeniusone-platform
Netuitive | Netuitive | APM, Infrastructure Monitoring, ITOA | 21 days | SaaS | netuitive.com
Neustar | Neustar Website Monitoring | FEO | 30 days | SaaS | neustar.biz/security/web-performance-management/monitoring
New Relic | New Relic APM | APM, Database Monitoring, Availability & Error Monitoring, Reports, Team Collaboration, Security | Free tier available | SaaS | newrelic.com/application-monitoring
op5 | op5 Monitor | APM, Infrastructure Monitoring, Network Monitoring, FEO, ITOA | Free tier available | SaaS | op5.com
OpsGenie | OpsGenie | Alert Software | Available by request | On-premise | opsgenie.com
Opsview | Opsview | APM, Network Monitoring, ITOA | 30 days | On-premise | opsview.com
Outlyer | Outlyer | Infrastructure Monitoring | 14 days | SaaS | outlyer.com
PagerDuty | PagerDuty | ITOA, Alert Software | 14 days | SaaS | pagerduty.com
Pingdom | Pingdom | APM, FEO | 30 days | SaaS | pingdom.com
Power Admin | PA Server Monitor | Infrastructure Monitoring, Network Monitoring | 30 days | On-premise | poweradmin.com/products/server-monitoring
Progress | Telerik Analytics | End-User Monitoring and Analytics | Free tier available | On-premise | docs.telerik.com/platform/analytics
Quest | Foglight | APM, Database Monitoring, RUM, ITOA | Available by request | On-premise | quest.com/foglight
Rackspace | Rackspace Monitoring | Cloud Monitoring | Free tier available | SaaS | rackspace.com/cloud/monitoring
Rigor | Rigor | Performance Monitoring and Optimization, RUM | Available by request | On-premise or SaaS | rigor.com
Riverbed | SteelCentral | APM, Infrastructure Monitoring, Network Monitoring, ITOA | 30-90 days | On-premise | riverbed.com/products/steelcentral
Sauce Labs | Sauce Labs | FEO, Automated Web and Mobile Testing | 14 days | SaaS | saucelabs.com
ScienceLogic | ScienceLogic Platform | APM, Infrastructure Monitoring, Network Monitoring | Available by request | SaaS | sciencelogic.com/product
SevOne | SevOne | Infrastructure Monitoring, Network Monitoring | Available by request | SaaS | sevone.com
SmartBear Software | AlertSite | APM, Synthetic Monitoring, Infrastructure Monitoring, Middleware Monitoring | Available by request | On-premise or SaaS | smartbear.com/product/alertsite/overview
SOASTA | SOASTA Platform | Real User Monitoring, Load Testing | Up to 100 users | SaaS | soasta.com/videos/soasta-platform-overview
SolarWinds Worldwide | Network Performance Monitor | Network Monitoring, ITOA, Database Monitoring, Log Management | 30 days | On-premise | solarwinds.com/network-performance-monitor
SpeedCurve | SpeedCurve | FEO, ITOA | Available by request | SaaS | speedcurve.com
Spiceworks | Spiceworks | Network Monitoring, ITOA | Free tier available | On-premise | spiceworks.com
Stackify | Stackify | APM, Network Monitoring, Database Monitoring, ITOA | Available by request | SaaS | stackify.com
Sysdig | Sysdig Cloud | Application Monitoring | 14 days | On-premise or SaaS | sysdig.com
TeamQuest | TeamQuest | ITOA | Available by request | On-premise | teamquest.com
ThousandEyes | ThousandEyes | Network Monitoring, ITOA | 15 days | SaaS | thousandeyes.com
Tingyun | Tingyun App | APM, FEO, Real User Monitoring | Available by request | SaaS | tingyun.com/tingyun_app.html
Unravel Data | Unravel | Application Monitoring | 30 days | On-premise or SaaS | unraveldata.com
VictorOps | VictorOps | Alert Software | Available by request | On-premise | victorops.com
Virtual Instruments | VirtualWisdom | Metrics Monitoring and Analytics | Available by request | SaaS | virtualinstruments.com
Wavefront | Wavefront | Metrics Monitoring and Analytics | Available by request | SaaS | wavefront.com
xMatters | xMatters | IT Alerts | Available by request | On-premise or SaaS | xmatters.com/maxAPM
Zabbix | Zabbix | Network Monitoring | Open source | On-premise | zabbix.com
Zenoss | Zenoss | Infrastructure Monitoring, Network Monitoring | Open source version available | On-premise or SaaS | zenoss.com/product
Zoho Corporation | Site24x7 | APM, FEO, Infrastructure Monitoring, Network Monitoring | Limited by usage | SaaS | site24x7.com


GLOSSARY

ACID (ATOMICITY, CONSISTENCY, ISOLATION, DURABILITY) A term that refers to the model properties of database transactions, traditionally used for SQL databases.

APPLICATION PROGRAMMING INTERFACE (API) A set of definitions, protocols, and tools that specify how software components should interact; the building blocks for modern web and mobile applications.

APPLICATION PERFORMANCE MONITORING (APM) Combines metrics on all factors that might affect application performance (within an application and/or web server, between database and application server, on a single machine, between client and server, etc.); usually (but not always) higher-level.

BENCHMARK A set of tests that run against a particular piece of software, used for gathering data and assessing the performance of what's being tested.

CONTENT DELIVERY NETWORK (CDN) Geographically and topologically distributed servers that cache content (often high-bandwidth static content, like videos, documents, or other large binaries) to minimize unnecessary transport and backbone overload.

cURL An open source software project, command-line tool, and library for transferring data with URLs.

DEEP API MONITORING The ability to monitor a series of API endpoints invoked in sequence with contextual data passed from one call to the next; involves measuring availability, performance, and functional correctness.

DISTRIBUTED DENIAL OF SERVICE (DDOS) A malicious attack on an application consisting of superfluous traffic designed to render the service unusable.

FUNCTIONAL TESTING The process of testing whether a system meets functional requirements and specifications.

HTTP HEADERS Fields that can be added to HTTP requests and responses, allowing the client and the server to pass additional information.

HTTP METHODS Indicate the desired action to be performed on an endpoint. GET and POST are the most commonly used. Also referred to as HTTP verbs.

KEY PERFORMANCE INDICATOR (KPI) A set of indicators to measure data or performance against a particular set of standards or requirements.

LOAD TESTING The process of measuring a software system's response when handling a specified load.

LOG Contains textual information about an event; a record of activities performed by an electronic device that is automatically created and maintained by another device; meant to convey detailed information about application, user, or system activity and to troubleshoot a specific issue after the fact.

METRIC A numeric measurement in time. The format includes the measured metric name, metric data value, timestamp, metric source, and optional tags, and is meant to convey small bits of information.

NOSQL A class of database systems that incorporates other means of querying outside of traditional SQL and does not use standard relational structures.

OAUTH 2 An open standard for authorization, commonly used by APIs that provide access to user information.

PEAK PERFORMANCE The maximum theoretical performance that can be achieved by software.

REAL-WORLD PERFORMANCE The actual performance of software when it is tested in real-life circumstances, as opposed to theoretical performance that measures against the ideal.

SEQUENCED API API transactions that are invoked by web and mobile applications in sequence with contextual data passed from one call to the next.

SERVICE LEVEL AGREEMENT (SLA) A contractual set of expectations for levels of service (like monthly uptime) between the cloud buyer (customer) and cloud provider (seller).

SHIFT LEFT TESTING An approach in which software testing is performed earlier in the software development lifecycle; conducive with, but not exclusive to, Agile and DevOps methodologies.

TRANSACTION JOURNAL Refers to the simultaneous, real-time logging of all data updates in a database. The resulting log functions as an audit trail that can be used to rebuild the database if the original data is corrupted or deleted.

TRANSACTION MERGING The queuing of operations to travel as a group to the disk instead of calling the journal directly for each operation.


