Table Of Contents
The Big Data Fabric Market Is Immature But Will Grow Rapidly
Big Data Fabric Evaluation Overview
  Evaluation Criteria: Current Offering, Strategy, And Market Presence
  Forrester's Evaluation Assesses The Capabilities Of 11 Big Data Fabric Vendor Offerings
Larger Providers Have An Edge With A Broader Range Of Functionality
Vendor Profiles
  Leaders
  Strong Performers
  Contenders
Supplemental Material
The Big Data Fabric Market Is Immature But Will Grow Rapidly
Big data is not an option; it has become a necessity for supporting next-generation insights.
Enterprises of all types and sizes are embracing big data, but the gap between business expectations
and the challenges of supporting big data technology (such as Hadoop) has become the primary
motivation to innovate with big data fabric. The collection of technologies enables enterprise architects
to integrate, secure, and govern various data sources through automation, simplification, and self-service capabilities. It reduces complexity and hides heterogeneity by embodying an abstracted
model of the data processing pipeline that reflects business requirements rather than the complexity of
the underlying systems.
Today, big data fabric is accelerating the delivery of insights by automating key processes for increased
agility while giving business users more autonomy in the data preparation process. Enterprises use it
to support many use cases, such as enabling 360-degree and multidimensional views of the customer,
internet-of-things (IoT) and real-time analytics, offloading data warehouses, fraud detection, integrated
analytics, and risk analytics. Enterprises are using big data fabric primarily because it:
Delivers new actionable insights with minimal effort. Big data fabric offers the ability to
aggregate, transform, cleanse, and integrate data from multiple big data sources, which can
then be presented in dashboards, reporting tools, and web applications. It leverages advanced
technologies such as machine learning, Apache Spark, Hadoop, Kafka, Storm, Ranger, and others
to deliver insights with zero to minimal coding.
Secures big data end-to-end. Big data fabric enables centralized data access and control, and
it enforces a stricter level of data-at-rest and data-in-motion security measures than traditional
approaches. It can remediate security risks with masking, auditing, and encryption across the
fabric. Today, large banks and insurance companies rely on big data fabric to ensure the protection
of critical siloed data.
Enables real-time integrated data across the business. Big data fabric enables data and
metadata sharing between peers, employees, partners, and customers. It allows any application,
process, dashboard, tool, or user to access any integrated data, regardless of where the data is
physically or logically located and regardless of the data format. Big data fabric offers consistent,
timely, and trusted data for internal and external users, creating a go-to place for integrated data
like Google does for searches.
Delivers a self-service data platform for business users. Until recently, data platforms were
mostly used by developers, architects, and data scientists, largely because of the platforms'
complexity and limited use cases. Big data fabric emphasizes self-service data preparation,
curation, orchestration, and integration services that nontechnical personnel can leverage. It
enables business users to blend, wrangle, and mash up their own data sets and share them among
peers and other groups for improved decision making.
© 2016 Forrester Research, Inc. Unauthorized copying or distributing is a violation of copyright law.
Citations@forrester.com or +1 866-367-7378
A standalone big data fabric solution. The vendors included in this evaluation provide a software
solution that organizations can implement independent of Hadoop distribution and the analytics/
visualization tool. The solution should not be technologically tied or bundled to any particular
application, product, or solution. The vendor must market the big data fabric as a standalone
product or solution. The solution can run on cloud and/or on-premises platforms.
Big data use cases. The solution must be able to support big data use cases such as customer
churn, the IoT, 360-degree views of customers and the business, advanced analytics, real-time
analytics, and others.
A referenceable install base. There should be 10 or more unique enterprise paying customers
using the big data fabric product that span more than one major geographical region. Each vendor
also provided at least two customer references whom Forrester interviewed.
A publicly available product. The participating vendors must have actively marketed a big data
fabric product as of August 1, 2016.
Customer interest. Forrester included only those vendors that customers mentioned during
Forrester inquiry calls during the past 12 months related to big data fabric topics.
Client inquiries and/or technologies that put the vendor on Forrester's radar. Forrester
clients often discuss the vendors and products through inquiries and interviews; alternatively, the
vendor may, in Forrester's judgment, warrant inclusion or exclusion in this evaluation because of
technology trends and market presence.
Vendor                Product evaluated      Product version evaluated
Denodo Technologies   Denodo Platform
Global IDs
IBM                                          11.5
Informatica           Informatica Platform   10.1
Oracle
Paxata
SAP
Syncsort
Talend
Trifacta
Waterline Data
Product versions listed without a recoverable vendor match: 6.0; 9; 12.2.0.1.1; 12.2.1.2.0; 12.2.1.1; 12.2.1.1; 1.3; SP 12; 7; 16; 9 and 1.4; 6.2
Informatica, IBM, Oracle, and Talend are Leaders. These vendors offer more comprehensive,
scalable platforms with broader use-case support. Each has a sweet spot enabling it to compete
vigorously in the market. They have had strong offerings in the traditional data integration space
and have been quick to expand their platform to leverage big data technologies. EA pros often
shortlist Informatica for its integration capabilities, but over the past two years it has extended
its platform to support a broader big data fabric that appeals to many enterprises. IBM's strong
data and information management offering, including its broad range of database, Hadoop, and
integration services, helps deliver the big data fabric. Oracle offers a scalable fabric software
and appliance. It continues to expand its existing data platform to support big data use cases,
leveraging its high-performance Hadoop loader, open source integration, and big data appliance.
Talend offers a big data fabric that delivers high scale and performance and supports various big
data use cases.
Denodo, Global IDs, Paxata, SAP, Syncsort, and Trifacta are Strong Performers. Strong
Performers can still be a strong choice, especially if price/performance, broader big-data-as-a-service, integration-as-a-service, and big data appliances are important. Denodo's mature data
virtualization technology broadens its coverage to support big data fabric use cases. Global
IDs leverages its core expertise in data discovery, governance, metadata, and data quality to
support various use cases. Paxata's platform has been expanding. It is built on Apache Spark
and optimized to run in Hadoop, leveraging distributed computing and machine learning. SAP's
Hana Vora supports big data initiatives by combining in-memory, Spark, Hadoop, and integration
services in a unique platform. Syncsort's solution supports new big data use cases by leveraging
technologies to collect, integrate, sort, and distribute data. Trifacta's data prep software continues
to expand to support big data fabric, leveraging machine learning, sophisticated transformations,
discovery, and enrichment.
Waterline Data is a Contender. Waterline provides a niche solution focused on the enterprise data
catalog space, but it is not a complete data fabric solution. Customers often use Waterline Data
with other vendor solutions, such as data prep software, to support big data fabric deployments.
[Figure: Forrester Wave graphic. Axes: Current offering (weak to strong) and Strategy (weak to strong). Legend: Market presence; Full vendor participation. Bands: Challengers, Contenders, Strong Performers, Leaders. Vendors plotted: Denodo Technologies, Global IDs, IBM, Informatica, Oracle, Paxata, SAP, Syncsort, Talend, Trifacta, Waterline Data.]
Go to Forrester.com to download the Forrester Wave tool for more detailed product evaluations, feature comparisons, and customizable rankings.
Criterion                 Forrester's weighting  Denodo Technologies  Global IDs  IBM   Informatica  Oracle  Paxata  SAP   Syncsort  Talend  Trifacta  Waterline Data
Current offering          50%                    3.54                 2.38        4.08  4.53         3.57    4.00    2.93  3.19      4.13    3.72      2.15
  Data ingestion          10%                    3.00                 2.00        3.00  4.00         3.00    3.00    3.00  4.00      4.00    2.00      1.00
  Data orchestration      15%                    4.00                 2.00        3.00  4.00         4.00    4.00    3.00  3.00      4.00    4.00      0.00
  Data discovery          15%                    4.00                 2.00        3.00  5.00         3.00    4.00    4.00  2.00      3.00    4.00      2.50
  Data management         20%                    4.20                 2.60        5.00  5.00         4.60    4.60    3.00  3.80      5.00    4.20      3.00
  (label missing)         20%                    3.00                 2.80        4.40  4.40         3.00    4.40    2.40  2.40      4.40    4.40      3.60
  Fabric management       20%                    3.00                 2.50        5.00  4.50         3.50    3.50    2.50  4.00      4.00    3.00      1.75
Strategy                  50%                    3.30                 3.00        4.05  4.05         3.70    3.00    3.60  3.00      3.65    3.00      2.60
  Ability to execute      35%                    3.00                 3.00        4.00  4.00         4.00    3.00    3.00  3.00      4.00    3.00      2.00
  Road map                30%                    3.00                 3.00        4.00  4.00         3.00    3.00    4.00  3.00      4.00    3.00      3.00
  Vision                  30%                    4.00                 3.00        4.00  4.00         4.00    3.00    4.00  3.00      3.00    3.00      3.00
  Professional services   5%                     3.00                 3.00        5.00  5.00         4.00    3.00    3.00  3.00      3.00    3.00      2.00
Market presence           0%                     2.50                 1.65        4.00  4.45         3.65    2.40    3.00  2.70      3.65    2.85      1.65
  Product revenue         35%                    2.00                 1.00        4.00  4.00         4.00    2.00    3.00  2.00      3.00    2.00      1.00
  Customer base           30%                    3.00                 2.00        4.00  5.00         4.00    2.00    3.00  3.00      5.00    3.00      2.00
  Market awareness        20%                    3.00                 2.00        4.00  4.00         3.00    4.00    3.00  4.00      4.00    4.00      2.00
  Partner ecosystem       15%                    2.00                 2.00        4.00  5.00         3.00    2.00    3.00  2.00      2.00    3.00      2.00
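The scorecard's category rows appear to be weighted averages of the sub-criterion rows beneath them. The sketch below recomputes the Market presence score for three vendors as a check on that reading. It assumes the eleven score columns run in alphabetical vendor order (Denodo Technologies first, Waterline Data last); the names `weighted_score`, `MARKET_PRESENCE_WEIGHTS`, and `SCORES` are illustrative and are not part of any Forrester tool.

```python
# Sub-criterion weightings for "Market presence", in percent, as printed in the scorecard.
MARKET_PRESENCE_WEIGHTS = {
    "Product revenue": 35,
    "Customer base": 30,
    "Market awareness": 20,
    "Partner ecosystem": 15,
}

# Sub-criterion scores for three vendors, copied from the scorecard.
# Assumption: score columns are in alphabetical vendor order.
SCORES = {
    "Denodo Technologies": {
        "Product revenue": 2.0,
        "Customer base": 3.0,
        "Market awareness": 3.0,
        "Partner ecosystem": 2.0,
    },
    "Global IDs": {
        "Product revenue": 1.0,
        "Customer base": 2.0,
        "Market awareness": 2.0,
        "Partner ecosystem": 2.0,
    },
    "Informatica": {
        "Product revenue": 4.0,
        "Customer base": 5.0,
        "Market awareness": 4.0,
        "Partner ecosystem": 5.0,
    },
}


def weighted_score(scores, weights):
    """Return the weighted average of sub-criterion scores (weights in percent)."""
    total_weight = sum(weights.values())
    weighted_sum = sum(scores[name] * weight for name, weight in weights.items())
    return weighted_sum / total_weight


if __name__ == "__main__":
    for vendor, vendor_scores in SCORES.items():
        score = weighted_score(vendor_scores, MARKET_PRESENCE_WEIGHTS)
        print(f"{vendor}: {score:.2f}")
```

For these three vendors the recomputed values agree with the Market presence row of the scorecard (2.50, 1.65, and 4.45).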
Vendor Profiles
Whether they are a Leader, Strong Performer, or Contender, every big data fabric vendor in this
Forrester Wave offers a credible solution to support new and emerging use cases. This evaluation of
the big data fabric market is intended to be a starting point only. We encourage clients to view the
detailed product evaluations and adapt the criteria weightings to fit their individual needs through
the Forrester Wave Excel-based vendor comparison tool. Clients can also schedule an inquiry to
have a conversation about the market and specific vendor products to discuss specific business and
technology requirements.
Leaders
IBM differentiates with its broad information management capabilities. IBM is known for
its strong data and information management offering, and now the company is extending it to
support big data fabric deployments. Unlike other big data fabric vendors, IBM provides its own
Hadoop distribution, yet it also provides connectors to support connectivity to Hadoop and Spark
ecosystems. IBM's key strengths lie in high-end scalability, support for complex data issues, end-to-end big data governance, integrated metadata, and granular security and privacy controls. In
addition, several reference customers mentioned that IBM Global Business Services helped them
implement a big data fabric more quickly through customized models, access patterns, and integration
with existing analytical tooling. IBM is a good fit for enterprises that have complex legacy data,
have multiple data lakes, require tight security controls, and want to leverage a hybrid platform.
Informatica provides a big data fabric with all the trimmings. With more than 7,000 firms
using Informatica for their information management initiatives, its technology is proven and
mature. Informatica's strength lies in increasing developer productivity via its intuitive visual and
metadata-driven development environment, which developers can leverage for big data sources
and prebuilt parsers, transformers, and connectors that help parse, integrate, cleanse, mask, and
match data natively on Hadoop. It also supports the reuse of workflow pipelines to support other
infrastructures. Informatica provides an enterprise information catalog, which catalogs data assets
across the enterprise using an inferred understanding of the data as well as crowdsourced input
from business analysts, stewards, and architects. Enterprises use Informatica's big data fabric
solutions to deliver enterprise data lakes for real-time analytics, IoT, integrated analytics, and real-time operational intelligence like fraud detection and proactive customer engagement.
Talend offers a compelling, flexibly priced big data fabric solution. The Talend big data fabric
combines several technologies to deliver a common set of easy-to-use tools for real-time, batch,
or dynamic integration running in on-premises, cloud, or hybrid environments. Talend Platform
for Big Data simplifies the process of working with Hadoop and Spark distributions, requiring no
coding to perform various activities. In the Eclipse-based Talend user interface, you can drag, drop,
and configure graphical components representing Hadoop-related data transformation and data
quality operations and natively connect to applications, databases, NoSQL, and the IoT. Talend
automatically generates the corresponding native Spark or MapReduce code for transforming data
using the Hadoop cluster. However, data preparation, discovery, and self-service are still emerging
functionality compared with leading big data fabric vendors.
Oracle offers a viable and scalable big data fabric solution. Oracle's GoldenGate replication
solution provides real-time capabilities, integrating with Oracle Data Integrator tools to deliver a
unified development experience. It also supports real-time big data integration to dynamically
push data into the HDFS, HBase, Hive, Flume, Storm, and Kafka big data frameworks. Oracle
Big Data SQL provides data federation with Hadoop; Oracle Big Data Connectors deliver a high-performance Hadoop-to-Oracle Database loader and enable optimized analysis using Oracle's
distribution of open source R directly on Hadoop data. Oracle's key strengths lie in its security and
governance capabilities, highly scalable data movement and transformations, and tight integration
with Oracle Big Data Appliance. Its customers use big data fabric to support various use cases,
including real-time analytics across disparate data sources (such as data lakes), customer
intelligence, IoT applications, and other big data applications and insights.
Strong Performers
Denodo Technologies extends its platform to support big data fabric. Unlike other large
software vendors in this evaluation, Denodo is a pure-play data virtualization vendor now extending
the platform to support big data initiatives. Today, several enterprises are leveraging Denodo to
support big data fabric deployments such as virtual big data marts, big data analytics, real-time analytics, and IoT data processing in various vertical industries. Denodo's key strength
is delivering a unified and centralized data services fabric with security and real-time integration
across multiple traditional and big data sources, including Hadoop, NoSQL, cloud, and software-as-a-service (SaaS). Customers like its easy-to-use, simple yet sophisticated data modeling
capabilities, search, and support for various big data sources.
Global IDs offers a viable big data fabric solution for all enterprises. Global IDs has been
providing data management solutions to retailers, financial services, telcos, pharmaceuticals,
and healthcare companies for more than 15 years. It addresses the data ecosystem problem
by leveraging its core expertise in data discovery, governance, profiling, lineage, and quality.
Enterprises can deploy the product in on-premises, cloud, and hybrid environments, and it
is optimized for performance on the Hadoop ecosystem. Business analysts can contribute
business terms and metadata within the product and focus on technology-management-business
collaboration. Global IDs provides extensive metadata functionality in its products to support
end-to-end big data fabric deployments. Enterprises with complex big data platforms that need
powerful metadata management and lineage should look at Global IDs.
Paxata offers an easy-to-use big data fabric focusing on self-service. Paxata's information
platform provides an interactive, analyst-centric data preparation solution that is powered by a
unified set of technologies designed to support data integration, quality, governance, collaboration,
and enrichment. Machine learning algorithms help business analysts easily understand, categorize,
integrate, and connect data more quickly. The platform is built on Apache Spark and optimized to
run in the Hadoop environment, leveraging distributed computing, machine learning, and a visual
workspace. Paxata focuses on delivering an easy-to-use solution that eliminates the need for
coding, scripting, and sampling. Enterprises are using Paxata to support ad hoc, operational,
predictive, and real-time analytics. However, customers report that Paxata's integration with a few
traditional and legacy data sources is not optimized.
SAP Hana Vora extends the SAP platform to support big data fabric. SAP offers a
comprehensive data management framework to support data access, data movement, data quality,
transformation, and integration. And with SAP Hana Vora, it extends the platform to support big
data initiatives, including those for Hadoop, Spark, NoSQL, and in-memory computing fabrics.
SAP Hana Vora couples tightly with Apache Spark to expose Vora data and processing to Spark.
Enterprises can deploy machine learning algorithms in Hana directly or to Spark. In addition,
organizations can distribute data preparation operations such as sorting, joining, and aggregation
across Hana and Spark clusters. Enterprises use SAP's big data fabric to support various use
cases, including a 360-degree view of the customer, fraud detection, IoT, and real-time insights.
Syncsort offers a scalable big data fabric solution. Syncsort provides a big data fabric solution
that focuses on simplifying the process of collecting, integrating, sorting, and distributing enterprise
data to deliver actionable insights, while requiring fewer resources. Syncsort's top use cases for
big data fabric include leveraging data from mainframes and other traditional systems in Hadoop,
while ensuring data lineage, security, and efficiency. Syncsort allows enterprises to deploy a full-featured ETL environment on-premises and on AWS EC2, Amazon Elastic MapReduce, and Google
Cloud Platform, with forthcoming support for Microsoft Azure. Data transformations are defined in
a visual, wizard-style GUI, and the same jobs can be executed natively in MapReduce, Spark, or
stand-alone servers, without any changes. Although DMX-h does not ship with built-in machine
learning capabilities, they can be included as task extensions and custom functions as part of the
data flows. Syncsort is still expanding its self-service capabilities.
Trifacta's solution makes self-service big data fabric easy to deploy. Trifacta's self-service data
preparation software enables enterprises to easily explore, transform, and join together raw and
diverse data sources into clean and structured outputs for a variety of analytic purposes. Trifacta
leverages machine learning algorithms to automate and simplify the interaction with data, making
data wrangling a self-service process for analysts and business users. The vendor supports batch
and on-demand ingestion natively and continuous ingestion through integrations with partners StreamSets
and Google Dataflow. It has extensive metadata management directly within the application and
through integrations with partners such as Cloudera Navigator, Apache Atlas, Waterline Data, and
Alation. Trifacta visually tracks and presents the lineage of data transformation steps for specific
data sets and across multi-data-set-wrangling workflows. However, enterprises are reporting that
Trifacta lacks high-end scalable big data fabric deployments.
Contenders
Waterline Data focuses on delivering a Smart Data Catalog for big data environments.
Waterline Data accelerates data discovery, governance, and time-to-value through its Smart Data
Catalog, which automates the cataloging of all data lake assets. It empowers business analysts
and data scientists to find, understand, and provision trusted data to extract insights and make
accurate business decisions without coding or manual exploration. In addition to automated
discovery, it also enables business analyst communities to crowdsource tagging and annotations
and allows data stewards to curate the data catalog using an agile approach. Waterline ensures
that the catalog is up to date by detecting changes and automatically cataloging new and updated
data assets including curated business metadata and data lineage. While Waterline supports on-premises and cloud deployments, hybrid support is planned for a future release.
Supplemental Material
Online Resource
The online version of Figure 2 is an Excel-based vendor comparison tool that provides detailed product
evaluations and customizable rankings.
Data Sources Used In This Forrester Wave
Forrester used a combination of three data sources to assess the strengths and weaknesses of each solution:
Vendor surveys. Forrester surveyed vendors on their capabilities as they relate to the evaluation
criteria. Once we analyzed the completed vendor surveys, we conducted vendor calls where
necessary to gather details of vendor qualifications.
Product briefings and demos. We asked vendors to conduct briefings and demonstrations of
their products' functionality. We used findings from these product briefings and demos to validate
details of each vendor's product capabilities.
Customer reference calls. To validate product and vendor qualifications, Forrester also conducted
reference calls or conducted surveys with at least one of each vendor's current customers.
The Forrester Wave Methodology
We conduct primary research to develop a list of vendors that meet our criteria to be evaluated in this
market. From that initial pool of vendors, we then narrow our final list. We choose these vendors based
on: 1) product fit; 2) customer success; and 3) Forrester client demand. We eliminate vendors that have
limited customer references and products that don't fit the scope of our evaluation.
After examining past research, user need assessments, and vendor and expert interviews, we develop
the initial evaluation criteria. To evaluate the vendors and their products against our set of criteria,
we gather details of product qualifications through a combination of lab evaluations, questionnaires,
demos, and/or discussions with client references. We send evaluations to the vendors for their review,
and we adjust the evaluations to provide the most accurate view of vendor offerings and strategies.
We set default weightings to reflect our analysis of the needs of large user companies and/or other
scenarios as outlined in the Forrester Wave document and then score the vendors based on a
clearly defined scale. These default weightings are intended only as a starting point, and we encourage
readers to adapt the weightings to fit their individual needs through the Excel-based tool. The final
scores generate the graphical depiction of the market based on current offering, strategy, and market
presence. Forrester intends to update vendor evaluations regularly as product capabilities and vendor
strategies evolve. For more information on the methodology that every Forrester Wave follows, go to
http://www.forrester.com/marketing/policies/forrester-wave-methodology.html.
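One way to see what adapting the weightings does, outside the Excel-based tool, is sketched below. It recomputes each vendor's Strategy score from the four Strategy sub-criterion scores in the scorecard, first with Forrester's default weightings and then with a hypothetical set that emphasizes road map. The custom weightings, the function names, and the assumption that scorecard columns run in alphabetical vendor order are illustrative only, and the weighted-average roll-up is inferred from the published figures rather than taken from Forrester's tool.

```python
CRITERIA = ("Ability to execute", "Road map", "Vision", "Professional services")

# Forrester's default Strategy weightings, in percent, from the scorecard.
DEFAULT_WEIGHTS = {
    "Ability to execute": 35,
    "Road map": 30,
    "Vision": 30,
    "Professional services": 5,
}

# Hypothetical weightings for a buyer who cares most about road map.
CUSTOM_WEIGHTS = {
    "Ability to execute": 15,
    "Road map": 50,
    "Vision": 30,
    "Professional services": 5,
}

# Strategy sub-criterion scores in CRITERIA order, copied from the scorecard.
# Assumption: score columns are in alphabetical vendor order.
STRATEGY_SCORES = {
    "Denodo Technologies": (3.0, 3.0, 4.0, 3.0),
    "Global IDs": (3.0, 3.0, 3.0, 3.0),
    "IBM": (4.0, 4.0, 4.0, 5.0),
    "Informatica": (4.0, 4.0, 4.0, 5.0),
    "Oracle": (4.0, 3.0, 4.0, 4.0),
    "Paxata": (3.0, 3.0, 3.0, 3.0),
    "SAP": (3.0, 4.0, 4.0, 3.0),
    "Syncsort": (3.0, 3.0, 3.0, 3.0),
    "Talend": (4.0, 4.0, 3.0, 3.0),
    "Trifacta": (3.0, 3.0, 3.0, 3.0),
    "Waterline Data": (2.0, 3.0, 3.0, 2.0),
}


def strategy_score(scores, weights):
    """Weighted average of the sub-criterion scores (weights in percent)."""
    total_weight = sum(weights[name] for name in CRITERIA)
    weighted_sum = sum(s * weights[name] for name, s in zip(CRITERIA, scores))
    return weighted_sum / total_weight


def rank_vendors(weights):
    """Return (vendor, score) pairs sorted from highest to lowest score."""
    scored = [
        (vendor, round(strategy_score(scores, weights), 2))
        for vendor, scores in STRATEGY_SCORES.items()
    ]
    # sorted() is stable, so tied vendors keep their alphabetical order.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for label, weights in (("default", DEFAULT_WEIGHTS), ("custom", CUSTOM_WEIGHTS)):
        print(label)
        for vendor, score in rank_vendors(weights):
            print(f"  {vendor}: {score:.2f}")
```

With the default weightings the results match the Strategy row of the scorecard. With the hypothetical road-map-heavy weightings, SAP moves from fifth to third on Strategy, which shows how sensitive the ordering can be to the weightings a reader chooses.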
Integrity Policy
All of Forrester's research, including Forrester Wave evaluations, is conducted according to our
Integrity Policy. For more information, go to http://www.forrester.com/marketing/policies/integritypolicy.html.
Endnotes
Increasing data volume is creating new challenges in integration, security, curation, administration, and governance.
Business users want real-time trusted data to make accurate business decisions, while technology management
wants to simplify administration and lower costs. Closing the big data platform gap is the goal of the emerging
collection of technologies that Forrester calls big data fabric. Enterprise architects should look at big data fabric to
accelerate their big data initiatives, monetize big data sources, and respond more quickly to business needs and
competitive threats. See the Forrester report "Big Data Fabric Drives Innovation And Growth."
Client support
For information on hard-copy or electronic reprints, please contact Client Support at
+1 866-367-7378, +1 617-613-5730, or clientsupport@forrester.com. We offer quantity
discounts and special pricing for academic and nonprofit institutions.
Forrester Research (Nasdaq: FORR) is one of the most influential research and advisory firms in the world. We work with
business and technology leaders to develop customer-obsessed strategies that drive growth. Through proprietary
research, data, custom consulting, exclusive executive peer groups, and events, the Forrester experience is about a
singular and powerful purpose: to challenge the thinking of our clients to help them lead change in their organizations.
132141
For more information, visit forrester.com.