Anant Jhingran
Apigee Insights
Contents
Introduction
Data as Currency and Catalyst
Big, Noisy…and Useful
The Power of Context
Relief from the ETL and EDW Legacy
Broad Data and Signal Amplification
Increasing Signal to Noise
Correlating Data Across Channels
Considerations for Getting the Most From Your API Data
Conclusion
About the Author
©2012 Apigee. Confidential – All Rights Reserved. ©2013 Apigee. All Rights Reserved.
Introduction
Businesses are expanding their digital ecosystems into the app economy. The app
economy is changing the way users consume digital information and customers
interact with enterprises. As enterprises increasingly interact with partners,
developers, and internal resources through mobile, social media, and app
experiences, the number of sources of useful data keeps expanding. Data becomes
the currency of the digital ecosystem, and enterprises are rushing to make sense
of this new world of data as their highly structured internal systems meet the
ocean of unstructured and semi-structured data in the big data world. This eBook introduces you to broad
data, the bridge between big data and the world characterized by legacy ETL and
EDW technologies and internal enterprise systems.
Digital ecosystems enable an enterprise to broaden the reach of its digital assets,
increase revenue, accelerate and expand partnerships, and create innovation. In the
app economy, digital ecosystems are enabled quickly, securely, and at scale through
APIs. APIs are like the valves and the arteries that connect the heart of a company,
its digital assets, to the rest of the ecosystem. Just as assessing the health of a
person requires blood tests as well as information about their lifestyle, assessing and
optimizing the health of a digital ecosystem requires a breadth of data. It requires
that context about users, developers and partners from internal and external sources
be combined with API data to create a 360-degree view from which you can draw
actionable business insights.
Broad data starts with a focus on digital ecosystem data. That’s internal data at
the core of enterprise commerce as well as external data from interactions with the
enterprise. Broad data combines the relevant contextual signals from outside of the
system to deliver insights and help optimize the health of your business.
In order to participate and be relevant in the app economy, and to derive the most
valuable business insights, a whole new paradigm of data analysis is needed.
In this eBook, we explore how broad data and the power of context inform this
paradigm shift.
We describe why existing internal enterprise technologies and machine data analytics
tools don’t work in the broad data world. We explore how data transport systems need
to change. The traditional approach of putting large datasets on disks and shipping
them where they need to go is just not going to work in the world of broad data.
As it becomes impossible to move all the relevant external data to repositories inside
a business, companies will need to create new infrastructures, build new types of
data analysis systems, and forge new kinds of business relationships.
The challenge for companies is to access, and derive value from, diverse datasets
that they don’t own and cannot move. In short, businesses need to be able to
access data wherever it resides, to reduce the noise, and to harvest insights from
the signals in the data.
But even more importantly, this data, when harnessed, is a catalyst to improve
retailer interactions with the app, the developer, and ultimately the user. The
analysis of this data immediately helps improve the retailer’s business. Big data at
the edge of the enterprise is like water—it flows readily, but if harnessed using new
techniques, it also helps power the business and change it dramatically.
The world of connected devices and communications data has created a universe
of data that looks very different from the structured tables and rows at work in our
transactional systems of record, such as ecommerce platforms and databases.
Much attention has been paid to “unstructured” data, which can be misleading:
what matters most about big data is not its volume or lack of structure, but
the important signals buried in it that can be extracted and exploited.
Rather than worry about “speeds and feeds” and the obligation to process raw
volume, the right questions for the enterprise to ask are: What is the right data
source to explore? What data sources provide valuable signals for us?
Here are two examples where context lends powerful value to data analysis,
especially when cross-referenced:
›› The transactional context of app developers and users. How many times did
a developer access your API? How many purchases has an app user made in the
last month through your ecommerce app? How many times has the app user
phoned your support line? Much of this data is stored in internal systems.
›› The social context of both app developers and users, which can lead to
insights for a business on its developer innovation programs. What kind of
comments are the developers and users making in developer forums and
social media about your API or your brand? This data lies almost entirely
outside of the organization.
The app that he builds stores information on user behavior outside the telecom
API. The user of the app is a known customer of the telecom, which has records
of her payment and phone-activation history in its traditional systems of record.
With all of this contextual information, the telecom provider can now determine
how engaged the developer is, how persuasive his app is to the user, and how the
user is interacting with the telecom provider through the app and other channels.
This 360-degree view, when captured, is far more useful for engaging the user and
the developer than capturing only the traffic to and from the app through an API.
The good news is that when the data is highly structured—even if it is siloed—it
requires less logic to normalize and analyze. EDW and ETL were great for
answering known questions derived from known data sources. If an enterprise
wanted to know the churn rate of a channel, it could build a model with a high
degree of confidence, then fill its EDW with carefully selected data, filtered by
ETL. Once this connection was set up, the apparatus could repeatedly answer the
same question efficiently.
But today, new apps and APIs are constantly being developed and deployed,
sometimes with a “half life” of only weeks. Not only is there a lot of
semi-structured and unstructured data in the world of big data, there are also
many unknown structures, and new ones are added to the mix every day. The world of big data is
almost the opposite of the controlled world of the internal enterprise. The ETL and
EDW paradigm was not designed for such a dynamic world.
In the past, an enterprise collected all the data that it thought might be useful for
optimizing its business and centralized it for analysis. But in the new world, it can
no longer do so. While this might appear to be a problem, it is not. Wherever data
is being collected, it is also being exposed through APIs. The action is at the edge
of the enterprise, where the interactions happen, not in a central processing facility.
APIs form the bridge for an enterprise wanting to access external data that it does
not collect and store, such as Twitter feeds or GitHub posts, and correlate it with
data that is buried deep behind its own firewall. It becomes much easier to access
data that is outside of the enterprise’s direct control, without the need to ingest,
store, and process it—while its value can still be determined and put to use.
Enterprises need a system that delivers the best of both worlds—fast. That is
broad data.
The world at the edge of the enterprise has some key patterns—apps talk to
APIs, which produce data; likewise APIs expose data. Developers build apps, and
users interact with apps built by developers. Broad data starts with the business
questions that an enterprise is looking to answer and builds upon the interactions
at the edge, around the APIs and apps. Then it lays out a loose and expandable
structure around the core concepts of entities, events, and analytical needs that
can help normalize data, point it towards the business value sought, and deliver
time to value without old-world constraints.
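
One way to picture the loose and expandable structure of entities and events is as two record types, each with a small fixed core and an open-ended bag of properties. The Python sketch below is only an illustration under that assumption. The names `Entity`, `Event`, `normalize_event`, and `CORE_FIELDS`, and all field names, are hypothetical and do not come from any Apigee product.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Entity:
    """Something at the edge of the enterprise: an app, a developer, a user, an API."""
    kind: str                      # e.g. "app", "developer", "user", "api"
    entity_id: str
    properties: Dict[str, Any] = field(default_factory=dict)  # open-ended, may grow


@dataclass
class Event:
    """An interaction between two entities, e.g. an app calling an API."""
    name: str                      # e.g. "api_call", "purchase"
    actor: Entity
    target: Entity
    timestamp: float               # seconds since the epoch
    attributes: Dict[str, Any] = field(default_factory=dict)  # open-ended


# Hypothetical fixed core: the keys every raw record is assumed to carry.
CORE_FIELDS = {"name", "actor_kind", "actor_id", "target_kind", "target_id", "timestamp"}


def normalize_event(raw: Dict[str, Any]) -> Event:
    """Map a raw record onto the fixed core; keep every unknown key as an attribute."""
    extras = {key: value for key, value in raw.items() if key not in CORE_FIELDS}
    return Event(
        name=raw["name"],
        actor=Entity(raw["actor_kind"], raw["actor_id"]),
        target=Entity(raw["target_kind"], raw["target_id"]),
        timestamp=float(raw["timestamp"]),
        attributes=extras,
    )


# A record with a field ("device") that the core does not know about.
example = normalize_event({
    "name": "api_call",
    "actor_kind": "app", "actor_id": "a1",
    "target_kind": "api", "target_id": "catalog",
    "timestamp": 1,
    "device": "phone",
})
```

Because unknown keys are kept rather than rejected, a new data source or a changed event shape does not require redefining the model first. That is the kind of flexibility the loose structure is meant to provide.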
In the broad data/API world, the definition of “entity” is much more fluid, as the
properties of apps and their developers change all the time and the “events”
change too. It’s critical that the system you use for analysis has the flexibility to
recognize when events and entities change, to accommodate those changes,
and to produce a useful signal for your business right away. You need a loose
framework that allows you to ask new business questions all the time.
The data seen by such a system is not just big; it is broad, in the sense that
hundreds of data sources combine to produce some strong signals—as opposed to
the handful of signals collected by focusing only on data that was generated within
the enterprise.
This broad data platform encapsulates the common entities and events at the
edge, captures the common analytical needs, and allows simple extensions for
data, domains, and business problems. This is the future for insights at the edge of
the enterprise.
This is also the essence of signal amplification: Understand what each data source
is telling you, stitch it together with other data sources, and produce a more
reliable signal than any one data source can give you.
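
A small simulation can illustrate why stitching sources together yields a more reliable signal. The sketch below assumes that each source reports the same underlying value plus independent random noise of equal size, and it combines the readings with a plain average. Both are simplifying assumptions made for this example. Real sources are rarely independent, and the text does not prescribe a particular method. All function names and parameters are hypothetical.

```python
import random
import statistics


def observe(true_value, noise_sd, rng):
    """One noisy reading of the underlying signal from a single data source."""
    return true_value + rng.gauss(0.0, noise_sd)


def combine(readings):
    """Stitch readings from several sources into one estimate (plain average)."""
    return statistics.fmean(readings)


def error_spread(n_sources, trials=2000, true_value=10.0, noise_sd=1.0, seed=0):
    """Standard deviation of the combined estimate's error over many trials."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    errors = []
    for _ in range(trials):
        readings = [observe(true_value, noise_sd, rng) for _ in range(n_sources)]
        errors.append(combine(readings) - true_value)
    return statistics.stdev(errors)


single = error_spread(n_sources=1)    # error spread when relying on one source
combined = error_spread(n_sources=5)  # error spread when five sources are combined
```

Under these assumptions, `combined` comes out clearly smaller than `single`: the noise in individual sources partly cancels, while the shared signal does not.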
experience. Individual behaviors might vary, but the interactions across the two
channels are likely to be different. A website supports a more exploratory approach
while mobile apps need to be more streamlined. The essence of the broad data
challenge in this case is how to correlate two behavior patterns of one shopper and
how to meaningfully differentiate the behaviors of an aggregate population across
the channels so that we can build appropriate experiences.
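
The two halves of this challenge can be sketched in a few lines of Python: one view places a single shopper's behavior on each channel side by side, and another aggregates the whole population by channel. The sample sessions, the field layout, and the function names below are invented for illustration only. The sketch also assumes that the same shopper can already be identified on both channels, which in practice is often the hardest part.

```python
from collections import defaultdict
from statistics import fmean

# Hypothetical, hand-made sample data: (shopper_id, channel, pages_viewed, purchased)
sessions = [
    ("s1", "web", 12, False),
    ("s1", "mobile", 3, True),
    ("s2", "web", 8, True),
    ("s2", "mobile", 2, False),
    ("s3", "web", 15, False),
]


def per_shopper(sessions):
    """Group sessions by shopper so one person's behavior on each channel sits side by side."""
    view = defaultdict(lambda: defaultdict(list))
    for shopper_id, channel, pages, purchased in sessions:
        view[shopper_id][channel].append((pages, purchased))
    return view


def per_channel(sessions):
    """Aggregate the population by channel: average pages per session and purchase rate."""
    pages = defaultdict(list)
    purchases = defaultdict(list)
    for _, channel, n_pages, purchased in sessions:
        pages[channel].append(n_pages)
        purchases[channel].append(1.0 if purchased else 0.0)
    return {
        channel: {
            "avg_pages": fmean(pages[channel]),
            "purchase_rate": fmean(purchases[channel]),
        }
        for channel in pages
    }


shopper_view = per_shopper(sessions)   # one shopper, two behavior patterns
channel_view = per_channel(sessions)   # aggregate population, per channel
```

In this invented sample, web sessions involve more pages per visit than mobile sessions, which matches the exploratory versus streamlined distinction drawn above. Real data would need to confirm or refute that pattern.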
›› Always ensure that you have key business drivers in mind. These might
be information accounting, developer adoption, or user behavior. Convert the
business problem into bite-sized tasks that answer the questions:
What data do I need?
What entities and events should I monitor?
What analysis is appropriate?
What is the connection to the business problem?
Solve the problem from both ends—“problem down” and “technology up.”
Conclusion
In order to profitably participate in the app economy, a new paradigm of analysis
is needed—one that understands that the action is at the edge of the enterprise.
By analyzing not only the transactions but also the interactions and contextual
activity around an API, businesses can begin extracting the valuable, actionable
signals from the broad ocean of noise that big and broad data have delivered to
our doorsteps. Once we start extracting the signals and exploring the breadth of
data around our APIs and stop worrying so much about ingesting and processing
massive volumes of data, a rich context develops that provides new clues about
how to survive and thrive in the app economy.
About Apigee
Apigee is the leading provider of API technology and services for enterprises and
developers. Hundreds of companies including AT&T, Bechtel, eBay, Korea Telecom,
Telefonica and Walgreens, as well as tens of thousands of developers use Apigee to
simplify the delivery, management and analysis of APIs and apps. Apigee’s global
headquarters are in Palo Alto, California, and it also has offices in Bangalore, India;
London; and Austin, Texas. To learn more, go to apigee.com.