
White Paper

Inside the Vibe Virtual Data Machine


A technical perspective on Vibe and its
"Map once, Deploy anywhere." capabilities
This document contains Confidential, Proprietary and Trade Secret Information ("Confidential
Information") of Informatica Corporation and may not be copied, distributed, duplicated, or otherwise
reproduced in any manner without the prior written consent of Informatica.
While every attempt has been made to ensure that the information in this document is accurate and
complete, some typographical errors or technical inaccuracies may exist. Informatica does not accept
responsibility for any kind of loss resulting from the use of information contained in this document.
The information contained in this document is subject to change without notice.
The incorporation of the product attributes discussed in these materials into any release or upgrade of
any Informatica software product, as well as the timing of any such release or upgrade, is at the sole
discretion of Informatica.
Protected by one or more of the following U.S. Patents: 6,032,158; 5,794,246; 6,014,670;
6,339,775; 6,044,374; 6,208,990; 6,208,990; 6,850,947; 6,895,471; or by the following
pending U.S. Patents: 09/644,280; 10/966,046; 10/727,700.
This edition published July 2014
Table of Contents
Introduction
What Is the Vibe Virtual Data Machine?
  The Core Concept in Practice
  High-Level Architecture for the Virtual Data Machine
  Vibe: Powering the Informatica Platform
Vibe Foundational Principles
  Metadata-Driven
  Extensible
  Adaptable to Change
  Reliable
Use Cases for the Vibe Virtual Data Machine
Case Studies
  Worldwide Financial Services Company
  Leading Health Services Provider
  Large North American Logistics Company
Conclusion
Introduction
The volume of data is exploding to the point that humans will create more information this year than in the last
5,000 years combined. The challenge for enterprise information management (EIM) is to deal with the volume
and complexity of this data while delivering business value. Business leaders are demanding much faster
project delivery from IT without compromising quality or costs.
Yet IT is struggling to keep up with these demands and is looking for ways to ensure timely, high-quality data
to accelerate delivery of innovative solutions. The problem is that enterprise data is highly fragmented, hard to
find, and hard to access. Traditionally, data has been tightly coupled to applications and repositories, which
created unconnected silos of data. Add to that the explosive growth of new datatypes such as third-party data,
cloud, social, mobile, Big Data, and the Internet of Things, and IT has a highly complex data environment
to manage.
Managing a next-generation enterprise data environment can be dramatically simplified with the Vibe virtual
data machine (VDM), a unique innovation from Informatica. With Vibe, users can map once and deploy
anywhere: in the cloud, on-premise, in databases, applications, middleware, on a Hadoop cluster, in batch,
request/response, or real time.
Vibe has been the key run-time component of Informatica tools such as PowerCenter, PowerCenter
Express, and PowerCenter Big Data Edition for years. It is now being extended to additional tools, execution
environments, and datatypes. Vibe is also embeddable, meaning that independent software vendors and
systems integrators can take advantage of the REST-based API framework to embed data integration and
data quality directly into their applications and services and make them accessible through the native UI of
those applications.
The key business benefit of Vibe is that new execution environments such as Hadoop or a data warehouse
appliance can be supported without any additional training in tools or languages. Vibe enables Informatica
developers to quite literally become Hadoop developers, insulating them from changes in technology while
enabling them to deliver clean, safe, connected data. The result is faster project delivery, overall cost savings,
and greater adaptability for change across an enterprise data management environment.
What Is the Vibe Virtual Data Machine?
Business requires faster, better, and less-expensive delivery of solutions from IT. This is a large and growing
challenge for architects. The problem of increasing data fragmentation means that data management is
becoming a significant choke point in the delivery cycle of business initiatives.
A data engine that can abstract differences in data storage and access technologies means that data users,
both business and IT, don't need to worry about the underlying physical data infrastructure. Vibe enables IT to
deal with data complexity using existing people, tools, and code. Because the number of technologies to be
learned and managed is reduced, it also speeds the delivery of data projects for the organization. In short,
data users can focus on managing data rather than managing technology.
Because Vibe is an embeddable data management engine, it can also be integrated in other applications or
tools. For example, it could be used to include the processing of data quality rules at the point of data entry
on an application, catching data errors at the source and significantly reducing the business impact of bad or
incomplete data.
Another key aspect of Vibe is that if underlying technologies change or new technologies evolve, you
can redeploy business mappings without recoding, redevelopment, or respecification. Vibe can
be embedded into applications, middleware infrastructure, and devices, wherever you need to access,
aggregate, and manage data. More importantly, Vibe allows organizations to reuse their existing Informatica
resources with little or no additional training on new technologies as they emerge. Companies can insulate
their employees and processes from technology change and substantially reduce data preparation time.
Figure 1. With Vibe, businesses enjoy unlimited data deployment
Just as a virtual operating system can run on multiple physical platforms, Vibe enables organizations to deploy
data management tools on multiple run-time environments with no impact to the business or IT users who are
creating the business logic to manage the data. This capability makes Vibe a virtual data machine.
The Core Concept in Practice
Vibe and the use of a single declarative language have enabled Informatica to integrate a comprehensive
suite of data management solutions on the Informatica platform. Users can create data transformation
mappings, data quality rules, and other such data integration logic for the virtual data machine.
For example, Vibe enables data mappings developed in PowerCenter to be reused in a data quality rule,
and a data quality rule to be executed in the cloud. The ability to create logic once and deploy it across
multiple Informatica solutions without having to regenerate code or worry about implementation details of the
underlying technology allows organizations to concentrate on solving business problems instead of system
integration challenges. Vibe lets IT be more efficient by using the same people, processes, and data mappings
both on-premise and in the cloud and across multiple technologies such as Hadoop, SQL databases, and
data appliances.
High-Level Architecture for the Virtual Data Machine
Figure 3 depicts the four key logical components of the Vibe virtual data machine, organized into three layers:
the definition layer (transformation library), implementation layer (optimizer and executor), and physical layer (connectors).
With Informatica Vibe, logic can be created once and deployed across multiple
Informatica solutions without having to regenerate code.
Definition Layer
The transformation library is in the definition layer, where transformation logic is defined in a visual
format. Using prebuilt Informatica transformations or custom-built transformations, developers can create a
transformational flow of data from source to target systems. The transformation logic definition is independent
of the physical implementation and allows creation of common metadata, supporting portability of
transformation logic across the platform.
Implementation Layer
The implementation layer includes the optimizer and the executor. These two components together form the
data processing layer. They are responsible for optimizing the mapping logic and business rules, converting
this optimized logic to Informatica code, and executing it in a run-time environment selected by the developer.
Optimizer
The optimizer provides an optimal execution path that takes
into account the characteristics and structure of the data used in
a mapping. The goal of this optimization step is to reduce the
volume of data that the data engine moves and processes. Figure
4 shows the multiple steps of optimization.
The optimizer parses the mapping, converts it to an internal
representation, and breaks it down into smaller units for analysis.
Based on the semantics of the mapping units and the data
selection criteria defined within those units, the semantic
optimizer pushes the execution of search expressions (comparison
operators such as =, !=, <, and >, and others such as ALL, IN,
and BETWEEN) defined in the WHERE clause to the source. This
step allows substantial reduction in the data extracted. Some of
the optimizations done at this stage are early selection, early
projection, and predicate inference.
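The effect of early selection and early projection can be illustrated with a short sketch. The Python below is not Informatica code and does not reflect how the Vibe optimizer is implemented; the `mapping` dictionary, the function names, and the sample `orders` table are all hypothetical, and SQLite (from the Python standard library) simply stands in for a source system. It contrasts extracting every row and column with pushing the filter and the column list down to the source.

```python
import sqlite3

# Hypothetical mapping: read `orders`, keep rows with amount > 100,
# and pass only two columns downstream.
mapping = {
    "source_table": "orders",
    "filter": ("amount", ">", 100),      # search expression from the mapping
    "columns": ["order_id", "amount"],   # columns the target actually needs
}

ALLOWED_OPS = {"=", "!=", "<", ">", "<=", ">="}


def naive_query(m):
    """Extract everything; the engine would filter and project afterwards."""
    return f"SELECT * FROM {m['source_table']}", ()


def pushed_down_query(m):
    """Early selection and early projection: the source does the work."""
    col, op, value = m["filter"]
    if op not in ALLOWED_OPS:
        raise ValueError(f"unsupported operator: {op}")
    cols = ", ".join(m["columns"])
    return f"SELECT {cols} FROM {m['source_table']} WHERE {col} {op} ?", (value,)


def run(conn, query):
    sql, params = query
    return conn.execute(sql, params).fetchall()


# Sample source data (made up for the illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "a", 50.0), (2, "b", 150.0), (3, "c", 300.0), (4, "d", 20.0)],
)

naive_rows = run(conn, naive_query(mapping))         # all rows, all columns
pushed_rows = run(conn, pushed_down_query(mapping))  # only what the mapping needs
print(len(naive_rows), "rows extracted without pushdown")
print(len(pushed_rows), "rows extracted with pushdown")
```

In this sketch the pushed-down query moves fewer rows and fewer columns out of the source, which is the reduction in extracted data that the optimization step aims for.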
[Figure 3. Virtual data machine components: parser, masking, transformation library, optimizer, executor, connectors.]
[Figure 4. Optimizer components: a user-defined mapping passes through the mapping pre-processor (compiler, translator), mapping fragmentor, semantic optimizer (predicate optimization), cost-based optimizer (cost engine, statistics manager), and deployment prep (mapping generator, physical optimization); the optimized mapping is then sent to the executor.]
Next, the mapping is run through a cost-based optimizer, which considers performance, throughput, and
resource consumption in determining the optimal execution path. It generates alternate semantically equivalent
mappings, evaluates the estimated cost of executing each mapping, and selects the one with the least cost. It
uses a statistics manager module to get on-demand or persisted statistics on cardinality, density, and
selectivity of the data constructs being used. Some examples of this optimization are join reordering and
aggregation positioning.
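Join reordering driven by statistics can be sketched in a few lines. The example below is an illustration under stated assumptions, not the Vibe cost engine: the table names, cardinalities, and selectivities are invented, the cost model (the sum of estimated intermediate row counts for a left-deep join order) is a deliberately simple textbook choice, and all function names are hypothetical. It generates the semantically equivalent join orders, estimates a cost for each, and picks the cheapest.

```python
from itertools import permutations

# Hypothetical statistics, as a statistics manager might report them.
cardinality = {"orders": 1_000_000, "customers": 50_000, "regions": 20}

# Join selectivity per pair of tables: the fraction of the cross product
# that survives the join predicate. Pairs without an entry have no predicate.
selectivity = {
    frozenset({"orders", "customers"}): 1 / 50_000,
    frozenset({"customers", "regions"}): 1 / 20,
}


def estimated_cost(order):
    """Sum of estimated intermediate row counts for a left-deep join order."""
    joined = {order[0]}
    rows = cardinality[order[0]]
    cost = 0.0
    for table in order[1:]:
        # Apply every predicate that connects the new table to the tables
        # already joined; with none, the step is a cross product (factor 1.0).
        factor = 1.0
        for other in joined:
            factor *= selectivity.get(frozenset({table, other}), 1.0)
        rows = rows * cardinality[table] * factor
        cost += rows
        joined.add(table)
    return cost


def cheapest_order(tables):
    """Enumerate the alternative join orders and keep the least costly one."""
    return min(permutations(tables), key=estimated_cost)


best = cheapest_order(["orders", "customers", "regions"])
print("chosen join order:", best)
print("estimated cost:", estimated_cost(best))
```

With these made-up numbers, the cheapest plans join the two small tables first and bring in the large `orders` table last, while any order that starts with a cross product of `orders` and `regions` is estimated to be far more expensive. Exhaustive enumeration is only practical for a handful of tables; real optimizers typically prune the search space.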
Finally, run-time optimizations, including resource allocation, partitioning, and scheduling, are applied on
the selected mapping. Once the optimization process is completed, the mapping units are recombined by the
mapping generator to provide an optimized mapping to the executor.
Executor
The executor is responsible for translating mapping operators into their corresponding run-time constructs for
the target run-time environment. There is an executor instance for every type of run-time environment that Vibe
supports. When a developer designs a mapping, they also specify the target run-time environments for the
mapping. The optimizer initiates the appropriate executor based on this specification. The executor is also
responsible for orchestrating the
execution of the mapping.
As an example, when running a mapping in a Hive run-time environment, the executor converts the mapping
into a series of HiveQL tasks and manages the execution of those tasks in the Hive run-time environment.
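The idea of one executor per run-time environment can be sketched as follows. This is a minimal illustration, not Informatica's executor or its generated code: the `mapping` dictionary and both executor classes are hypothetical, and SQLite from the Python standard library stands in for a SQL-speaking run-time so that the example can run anywhere. An executor targeting Hive would, in the same spirit, emit HiveQL text instead.

```python
import sqlite3

# A hypothetical, engine-neutral mapping: filter, then project.
mapping = {
    "source": "events",
    "where": ("status", "=", "ok"),
    "select": ["id", "status"],
}


class InMemoryExecutor:
    """Runs the mapping directly on Python rows (stand-in for a native engine)."""

    def run(self, m, rows):
        col, op, value = m["where"]
        if op != "=":
            raise NotImplementedError(op)
        return [tuple(r[c] for c in m["select"]) for r in rows if r[col] == value]


class SqlExecutor:
    """Translates the mapping into SQL text and lets a SQL engine run it."""

    def translate(self, m):
        col, op, _ = m["where"]
        if op != "=":
            raise NotImplementedError(op)
        return f"SELECT {', '.join(m['select'])} FROM {m['source']} WHERE {col} = ?"

    def run(self, m, conn):
        return conn.execute(self.translate(m), (m["where"][2],)).fetchall()


# The same sample data, held once as Python rows and once in a SQL table.
rows = [
    {"id": 1, "status": "ok"},
    {"id": 2, "status": "failed"},
    {"id": 3, "status": "ok"},
]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, status TEXT)")
conn.executemany("INSERT INTO events VALUES (:id, :status)", rows)

in_memory_result = InMemoryExecutor().run(mapping, rows)
sql_result = SqlExecutor().run(mapping, conn)
print(SqlExecutor().translate(mapping))
print(sorted(in_memory_result) == sorted(sql_result))
```

The mapping itself never mentions an engine; only the executor chosen for the target environment decides how it is run, which is the separation the section describes.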
Physical Layer
Informatica provides connectors to a variety of data sources out of the box. These include all standard
relational database systems, analytic appliances, legacy mainframe systems, and ERP applications. Using the
connectors, developers can access data scattered across on-premise data stores or cloud-based SaaS systems,
as well as social media sources such as LinkedIn, Twitter, and Facebook.
For real-time use cases, Vibe can connect to and process data from messaging systems such as JMS and
WebSphere MQ. Vibe provides change data capture connectors that can capture changes in operational
systems and propagate them downstream. Finally, Informatica supplies APIs that third-party developers can use
to design and add new connectors to access data through Vibe.
Vibe: Powering the Informatica Platform
Vibe is the run-time engine that most Informatica tools run on. This allows development of data integration
patterns around synchronization, data enrichment, masking, subsetting, data validation, and partner
management, all on a single platform. The design of the VDM allows for any of these integration patterns to
run in different modalities without any recoding or reprogramming of the code. Moreover, this code can be
deployed in the cloud or on-premise, in databases and enterprise applications, or on a Hadoop cluster.
Vibe provides new adapter and cloud SDKs for partners and a Vibe Starter Kit for the development
of custom data integration connectors using Eclipse-based authoring, allowing for third-party vendors to
implement and embed the virtual data machine for their data integration needs.
Vibe Foundational Principles
Vibe is guided by a set of core principles that help to shape product enhancements, acquisitions, future road
maps, and best practices. These principles have been key in developing a VDM that is:
Metadata-driven
Extensible
Adaptable to change
Reliable
Metadata-Driven
Data is at the heart of every business system. For example, financial systems contain accounting data and
sales systems contain customer transaction data.
At the heart of the VDM vision is the idea that integrated data is a business need rather than a technology
need. To fully satisfy business needs, an integration system must be centered on data about data, in other
words, metadata. Metadata tells an organization what data it has, where it is, its semantic meaning, its
business context, who is using it, and how it is changing (among other things).
Implications
There are many different kinds of metadata since data itself and virtually every enterprise application and
technology component contain metadata. Most enterprises have multiple metadata sources. Productive
enterprise data management environments must be able to exchange metadata seamlessly across platform
components to effectively manage data and automate data management processes to increase productivity.
Metadata-driven means more than just a static repository of data about data. Metadata can be leveraged to
automatically perform routine or common functions. It is also possible to regenerate a given mapping or data
quality rule simply by changing the model/metadata that was used to first create it.
Benefits
Visibility and transparency. Business owners can track the flow of data throughout their environment, from
initial capture to output or from business term back to the source.
Risk management. Data changes are governed and managed proactively with an audit trail, formal
definitions, clear ownership, and accountability.
Rapid time to market. Business analysis and solution design is faster and more efficient due to common
definitions, a well-defined system of record, ability to perform rapid impact analysis of planned changes,
and use of metadata-driven code generation and code deployment.
Business-IT co-development. Business-IT collaboration and communication can be enhanced by capturing
semantic business terms and attaching them to the underlying technical metadata.
Reduced operating costs. A searchable inventory of data flows and related documentation helps IT
departments recover from production incidents rapidly and reduce maintenance costs.
Extensible
Extensibility enables Vibe to evolve incrementally and continuously, whether as a result of internal development
by Informatica, product acquisition, or customer/partner development. The capability is present at all levels of
the integration software stack.
This principle is based on the belief that Informatica technologists are not the only ones with innovative ideas.
Anyone with the ability to develop a product improvement should be empowered to do so. The Informatica
Marketplace, with free and for-fee data integration technology from Informatica, partners, and customers, and
its wide range of solution and asset blocks from innovators around the world, is evidence of the power of
this principle.
Implications
Essential functionality in Informatica Platform components is exposed and supported through an API to allow
teams, whether inside or outside of Informatica, to develop new capabilities.
Benefits
Embeddable. Informatica customers and partners can embed the Vibe VDM in their applications to access,
aggregate, and run data quality rules locally within applications.
Scalable. As the volume and complexity of data increase, integration solutions can be extended through
partitioning, concurrency, and multithreaded architectures.
Incremental benefits. The features and functions of the Vibe VDM that are available today can be used
immediately. As new capabilities are introduced, they can be incorporated into existing solutions for
incremental benefits.
Adaptable to Change
Vibe provides an abstraction layer between tools that run on the VDM and the underlying data sources and
technology. This means that mappings created for use with these tools will apply to any target environment that
Vibe supports. Those environments include SQL databases, analytic appliances, Hadoop, and on-premise and
cloud applications. The people, processes, tools, and mappings created for enterprise data management in
this environment will apply to any environment that Vibe supports now and in the future.
Implications
It is critical to separate logical rules from physical implementation. Just as COBOL and Java have endured
new technology advances for decades by abstracting machine and operating system details, the Vibe engine
also has endured by using a data mapping and data rules language and protocol that are independent of the
underlying database or hardware/software platform. This future-proofing characteristic substantially reduces
the cost and time for delivering business initiatives.
Benefits
Portability. This principle allows the same mapping language to be used across multiple components of the
Informatica Platform, which enables "map once, deploy anywhere" capabilities.
Flexibility. Users have a choice of invoking the Vibe VDM from GUI tools or invoking the underlying code
through an API for more complex requirements.
Investment protection. Work to develop and capture business rules for data mapping, data quality, and
workflows can be done once and without the need for rework when the underlying technology changes. In
addition, these efforts can take advantage of technology changes that enhance performance, scalability,
agility, and security without the need to rework business logic.
Reliable
Vibe can handle temporary failures and support high availability and system recovery. The virtual data
machine also allows for horizontal scaling by running on a grid, increasing scalability in high-volume
environments. It is also able to isolate and control workloads so that one problem workload does not impact
others.
In addition, the Informatica tools that run on Vibe enable data profiling, data testing and validation, data
lineage analysis, impact analysis, and automatic deployment from development to test to production. These
capabilities contribute to an overall ecosystem that increases reliability.
Implications
Vibe has fault tolerance built into its design through features such as resiliency to, and recovery from, temporary
database or network failures, allowing data integration processes to run without any human intervention in
case of temporary glitches. Each data integration job can run as its own independent process to reduce and
isolate any impacts from rogue processes operating in the same environment.
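Resiliency to temporary failures is commonly achieved by retrying the failed step after a short, growing delay. The sketch below shows that general technique only; it is not a description of Vibe's internals, and `TransientError`, `run_with_retry`, and `flaky_load` are hypothetical names invented for the example.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (e.g., a dropped connection)."""


def run_with_retry(task, attempts=3, base_delay=0.01, sleep=time.sleep):
    """Run `task`, retrying on TransientError with exponential backoff.

    The last failure is re-raised so that a persistent problem still
    surfaces to an operator instead of being silently swallowed.
    """
    for attempt in range(1, attempts + 1):
        try:
            return task()
        except TransientError:
            if attempt == attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))


# Simulated load step that fails twice before succeeding.
calls = {"n": 0}


def flaky_load():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientError("connection reset")
    return "loaded"


result = run_with_retry(flaky_load)
print(result, "after", calls["n"], "attempts")
```

Only errors classified as transient are retried here; any other exception propagates immediately, which keeps genuine defects visible.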
Benefits
Recovery and failover. With built-in capabilities that allow automated or manual recovery of data
integration processes from the last point of failure, the Vibe VDM allows for easy restart of jobs.
Error handling. Vibe provides capabilities to capture and continue running integration jobs when data
errors occur. It supplies the flexibility for users to stop or continue processing remaining data after capturing
invalid data or rogue records from the source.
Stability. The ability to isolate running processes from each other helps maintain stability of the environment
and reduce any negative impacts due to bad design flows or rogue mappings.
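The error-handling behavior described above, capturing invalid records and then either continuing or stopping, can be sketched as follows. This is an illustrative pattern under stated assumptions rather than Vibe's actual mechanism: the `process` function, its `max_errors` parameter, and the sample rows are all hypothetical.

```python
def process(rows, transform, max_errors=None):
    """Apply `transform` to each row, capturing failures instead of aborting.

    max_errors=None means always continue; otherwise processing stops as
    soon as the number of rejected rows exceeds max_errors.
    """
    loaded, rejected = [], []
    for index, row in enumerate(rows):
        try:
            loaded.append(transform(row))
        except (ValueError, KeyError) as exc:
            # Keep the bad record and the reason so it can be reviewed later.
            rejected.append({"index": index, "row": row, "error": str(exc)})
            if max_errors is not None and len(rejected) > max_errors:
                break
    return loaded, rejected


def to_amount(row):
    """Hypothetical transformation: parse the amount field as a number."""
    return {"id": row["id"], "amount": float(row["amount"])}


rows = [
    {"id": 1, "amount": "10.5"},
    {"id": 2, "amount": "n/a"},   # invalid value
    {"id": 3, "amount": "7"},
    {"id": 4},                    # missing field
    {"id": 5, "amount": "2"},
]

# Continue past bad records, capturing them for later review.
loaded, rejected = process(rows, to_amount)

# Stop at the first bad record instead.
loaded_strict, rejected_strict = process(rows, to_amount, max_errors=0)

print(len(loaded), "loaded,", len(rejected), "rejected")
print(len(loaded_strict), "loaded before stopping")
```

Leaving the stop-or-continue choice as a parameter mirrors the flexibility described in the text: the same job definition can be run leniently for exploratory loads and strictly for production loads.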
Use Cases for the Vibe Virtual Data Machine
Vibe enables a number of powerful use cases using Informatica products, such as these examples:
Logic portability. With Vibe, logic created for data integration, data quality, data security, and data
masking can be deployed without change on-premise, in the cloud, on analytic appliances, and in Hadoop
clusters. This enables the reuse of resources (mappings, data quality rules, etc.) in ways that were never
possible before. Vibe enables you to map once, deploy anywhere.
Data virtualization. The Informatica Data Services data virtualization product provides a single virtualized
interface to a canonical business object. This creates a universal interface to data, no matter where the data
is stored (e.g., data warehouse, operational system, file, Web service, cloud) that can be accessed via a
SQL query, Web service call, or batch ETL process. Once again, you have "map once, deploy
anywhere" capabilities.
Agile development and prototyping. Vibe empowers developers and business users alike to create quick
integration prototypes without the heavy lifting associated with typical integration projects. Using features
such as SQL services and Web services, users can quickly query small loads of data in batch or on-demand
modes against the virtualized data model. The developer tool also enables midstream profiling, which
comes in handy when debugging mapping logic on the fly.
Embeddable DI/DQ services. It's possible to expose data quality mappings/rules and data extraction
routines as Web services or SQL services using the Vibe platform. These service calls can be embedded into
external solutions or invoked as part of SOA implementations. The services are deployed as applications
or Web services and can be invoked using JDBC calls or Web service calls. The benefit is that data quality
rules and checking can be moved to the point of input, capturing data errors at the input source and
keeping costly data errors from proliferating. Vibe makes this possible through APIs that enable remote Web
services to call the VDM.
Case Studies
Vibe is helping organizations in such industries as financial services, health services, and logistics increase the
impact and value of their enterprise data management initiatives.
Worldwide Financial Services Company
Business Challenge
This organization had done most of its business from physical locations. The challenge was to expand into
online channels, while growing its understanding of customers and improving the customer experience
regardless of the channel used to do business.
Solution
The financial services company implemented a data management strategy based on Informatica products,
powered by Vibe. The solution involved 18 diverse data sources including big data.
Business Benefits of Vibe
The organization was able to utilize existing Informatica skills to deploy data management mappings to
Hadoop clusters with a single click of a check box. No additional knowledge of Hadoop, languages, or
tools was required. The ability to "Map once, Deploy anywhere." resulted in significant cost savings and
dramatically reduced the time necessary to deliver a working solution to market.
Vibe enables Informatica to deploy mappings on Hadoop clusters for execution.
Leading Health Services Provider
Business Challenge
This health services organization had a large and disparate IT environment with many silos of unconnected
data, spread across 16 enterprise data stores and multiple technologies. As a result, key personnel were not
able to find and access the data they needed to deliver new business initiatives as quickly as some of the
organization's competitors.
Solution
The company used multiple Informatica products, including Informatica PowerCenter and Informatica Data
Quality. It also used Informatica Data Services to provide a common data virtualization layer and common
way of accessing its data.
Business Benefits of Vibe
Leveraging the power of Informatica technology running on Vibe, this company plans to continue to evolve
its data services layer by embedding data quality routines and including SaaS sources. As data volumes
increase, it sees Vibe being a key enabler in allowing the company the flexibility of deploying business
logic in the cloud or, for high-volume and critical performance needs, to run in a PowerCenter/Hadoop grid
environment. Using Informatica products, it plans to deploy data integration routines as applications that can
be invoked with JDBC connectivity or as Web service calls, which can augment the company's master data
management system as well.
Large North American Logistics Company
Business Challenge
This North American logistics company needed to manage a hybrid data environment, specifically integrating
Salesforce with its on-premise applications. The challenge was to make its salespeople more productive by
integrating Salesforce with information in its in-house systems, rather than having salespeople waste time
looking up information in multiple systems.
Solution
The company used a variety of Informatica products powered by Vibe to create a strategy to understand
(profile) its data and create a data management architecture to synchronize and share trusted and relevant
information across multiple applications, including Salesforce.
Business Benefits of Vibe
Because of the power of Vibe, the company was able to create data integration and data quality logic
(mappings) on its on-premise instance of Informatica products and have the same mappings run on its cloud
instance of Informatica products. The benefits included faster solution delivery due to the ability to reuse
mappings across the environment. The organization also reduced costs and simplified its data architecture
with the ability to reuse tools, processes, and personnel across a single environment.
Conclusion
With Vibe as the foundation, you have a fully integrated information platform to transform raw data into
information that provides insight and value. Because it is powered by Vibe, the Informatica Platform allows
you to easily adapt to technology changes, while delivering business value faster for your organization.
The Informatica Platform is the only platform that provides the tools and capabilities for the simplest entry-level
uses to the most complex cross-enterprise initiatives. And it is the most proven platform, with 5,000 customers
who rely on Informatica to harness the full potential of their information to compete in today's interconnected
information age.
Our vision is simple: With Vibe, we have created the only information platform that is built to write once, run
anywhere. With this platform, you can simplify your IT architecture, get more out of your current resources,
and truly put your information potential to work.
Worldwide Headquarters, 100 Cardinal Way, Redwood City, CA 94063, USA Phone: 650.385.5000 Fax: 650.385.5500
Toll-free in the US: 1.800.653.3871 informatica.com linkedin.com/company/informatica twitter.com/InformaticaCorp
© 2014 Informatica Corporation. All rights reserved. Informatica and Put potential to work are trademarks or registered trademarks of Informatica
Corporation in the United States and in jurisdictions throughout the world. All other company and product names may be trade names or trademarks.
IN09_0714_02677
