Informatica has released version 8.6, which rolls up all the hot fixes issued for the
prior version 8.5 and adds a few new features. Since version 8, a Unified Admin
Console has been available for managing the Integration and Repository Services. These
were discussed in earlier blogs.
What does PowerCenter 8.6 offer developers? Let us look at the PowerCenter 8.6 Client
enhancements that developers will find useful.
2. Invalid/Invalidated renamed
In PowerCenter 7, the two states of objects were known as Invalid and Invalidated.
The exact meaning of these states is as follows:
Invalid – an object will not run,
Invalidated – an object may be invalid or may not run.
The difference between the two terms was not very clear. Therefore, to avoid any
confusion, in PowerCenter 8.6, the two states have been renamed as Invalid and
Impacted. While the Invalid state still implies that an object will not run, Impacted
means that an object is affected by a change, and therefore, may not run.
Apart from the names, the icons have also changed in PowerCenter 8.
3. Propagating Port Descriptions
In the Designer, in addition to the other properties of port propagation, we can edit a port
description and propagate the description to other transformations in the mapping.
2. PowerCenter Repository
The PowerCenter Repository is one of the best metadata stores among ETL products.
The repository is sufficiently normalized to store metadata at a very detailed level, which
in turn means that updates to the repository are quick and team-based development is
smooth. The repository data structure is also useful to users for analysis and reporting.
Access to the repository through MX views and the SDK extends the repository's
capability from simple storage of technical data to a database for analysis of ETL
metadata.
PowerCenter Repository is a collection of 355 tables which can be created on any major
relational database. The kinds of information that are stored in the repository are,
When a user creates a folder, a corresponding entry is made in table OPB_SUBJECT;
attributes such as folder name, owner ID and folder type (shared or not) are stored.
When we create or import sources and define field names, datatypes etc. in the Source
Analyzer, entries are made in OPB_SRC and OPB_SRC_FLD.
When targets and their fields are created or imported from a database, entries are made
in tables such as OPB_TARG and OPB_TARG_FLD.
Table OPB_MAPPING stores mapping attributes such as mapping name, folder ID, valid
status and mapping comments.
Table OPB_WIDGET stores attributes such as widget type, widget name and comments.
Widgets are simply transformations; "widget" is the name Informatica uses for them
internally.
Table OPB_SESSION stores configurations related to a session task, and table
OPB_CNX_ATTR stores information related to connection objects.
Table OPB_WFLOW_RUN stores run details such as workflow name, workflow start
time, workflow completion time and the server node it ran on.
REP_ALL_SOURCES, REP_ALL_TARGETS and REP_ALL_MAPPINGS are a few of
the many views created over these tables.
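As a rough illustration of how a folder relates to its mappings, the sketch below builds a
tiny in-memory stand-in for OPB_SUBJECT and OPB_MAPPING and lists the mappings in each
folder. Only the table names come from the text above. The column names (SUBJ_ID,
SUBJ_NAME, MAPPING_ID, MAPPING_NAME, SUBJECT_ID, IS_VALID), the sample rows and the use
of SQLite are assumptions made for illustration and should be checked against a real
repository before being relied on.

```python
import sqlite3

# In-memory stand-in for two repository tables. Column names and rows are
# hypothetical; they only illustrate the folder -> mapping relationship.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE OPB_SUBJECT (SUBJ_ID INTEGER PRIMARY KEY, SUBJ_NAME TEXT);
    CREATE TABLE OPB_MAPPING (
        MAPPING_ID   INTEGER PRIMARY KEY,
        MAPPING_NAME TEXT,
        SUBJECT_ID   INTEGER,  -- folder the mapping belongs to
        IS_VALID     INTEGER   -- 1 = valid, 0 = invalid
    );
    INSERT INTO OPB_SUBJECT VALUES (1, 'SALES'), (2, 'FINANCE');
    INSERT INTO OPB_MAPPING VALUES
        (10, 'm_load_orders',    1, 1),
        (11, 'm_load_customers', 1, 0),
        (12, 'm_load_ledger',    2, 1);
""")


def mappings_by_folder(connection):
    """Return (folder name, mapping name, is_valid) rows ordered by name."""
    query = """
        SELECT s.SUBJ_NAME, m.MAPPING_NAME, m.IS_VALID
        FROM OPB_MAPPING m
        JOIN OPB_SUBJECT s ON s.SUBJ_ID = m.SUBJECT_ID
        ORDER BY s.SUBJ_NAME, m.MAPPING_NAME
    """
    return connection.execute(query).fetchall()


rows = mappings_by_folder(conn)
for folder, mapping, is_valid in rows:
    print(folder, mapping, "valid" if is_valid else "invalid")
```

For reporting against a live repository, the REP_ALL_* views mentioned above are likely
a more stable entry point than the underlying OPB tables.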
PowerCenter applications access the PowerCenter repository through the Repository
Service. The Repository Service protects metadata in the repository by managing
repository connections and using object-locking to ensure object consistency.
3. Administration Console
The Administration Console is a web application that we use to administer the
PowerCenter domain and PowerCenter security. The console has two pages, the
Domain page and the Security page.
We can do the following on the Domain page:
o Create & manage application services like Integration Service and Repository
Service
o Create and manage nodes, licenses and folders
o Restart and shutdown nodes
o View log events
o Other domain management tasks like applying licenses and managing grids and
resources
We can do the following on the Security page:
o Create, edit and delete native users and groups
o Configure a connection to an LDAP directory service. Import users and groups
from the LDAP directory service
o Create, edit and delete Roles (Roles are collections of privileges)
o Assign roles and privileges to users and groups
o Create, edit, and delete operating system profiles. An operating system profile is
a level of security that the Integration Service uses to run workflows
4. PowerCenter Client
Designer, Workflow Manager, Workflow Monitor, Repository Manager and Data
Stencil are the five client tools used to design mappings and mapplets, create
sessions to load data, and manage the repository.
A mapping is ETL code that pictorially depicts the logical data flow from source to
target, including the transformations applied to the data. The Designer is the tool used
to create mappings.
The Designer has five window panes: Source Analyzer, Warehouse Designer,
Transformation Developer, Mapping Designer and Mapplet Designer.
Source Analyzer:
Allows us to import source table metadata from relational databases, flat files, XML
and COBOL files. Note that the Source Analyzer imports only the source definition,
not the source data itself. The Source Analyzer also allows us to define our own
source data definitions.
Warehouse Designer:
Allows us to import target table definitions, which can come from relational databases,
flat files, XML and COBOL files. We can also create target definitions manually and
group them into folders. Unlike the Source Analyzer, it offers an option to create the
tables physically in the database. The Warehouse Designer does not allow two tables
with the same name, even if their column names differ or they come from different
databases/schemas.
Transformation Developer:
Transformations such as Filters, Lookups and Expressions that have scope for re-use
are developed in this pane. Alternatively, transformations developed in the Mapping
Designer can be made reusable by checking the ‘re-use’ option, after which they are
displayed under the Transformation Developer folders.
Mapping Designer:
This is the place where we actually depict our ETL process; we bring in source
definitions, target definitions, transformations like filter, lookup, aggregate and
develop a logical ETL program. In this place it is only a logical program because the
actual data load can be done only by creating a session and workflow.
Mapplet Designer:
We create a set of transformations to be used and re-used across mappings.
4. PowerCenter Client (contd)
Workflow Manager : In the Workflow Manager, we define a set of instructions called a
workflow to execute mappings we build in the Designer. Generally, a workflow contains
a session and any other task we may want to perform when we run a session. Tasks can
include a session, email notification, or scheduling information.
A set of tasks grouped together becomes a worklet. After we create a workflow, we run
it in the Workflow Manager and monitor it in the Workflow Monitor. The Workflow
Manager has the following three window panes:
Task Developer: Create the tasks we want to accomplish in the workflow.
Worklet Designer: Create a worklet. A worklet is an object that groups a set of tasks.
It is similar to a workflow, but without scheduling information. We can nest worklets
inside a workflow.
Workflow Designer: Create a workflow by connecting tasks with links. We can also
create tasks in the Workflow Designer as we develop the workflow.
The ODBC connection details are defined in the Workflow Manager “Connections” menu.
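The relationship between workflows, worklets and tasks described above can be pictured
as a small tree. The sketch below is only a conceptual model, not how PowerCenter stores
these objects: the class names, fields and sample object names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class Task:
    name: str
    kind: str  # e.g. "session" or "email"


@dataclass
class Worklet:
    """Groups tasks; like a workflow but with no scheduling information."""
    name: str
    tasks: List[Union[Task, "Worklet"]] = field(default_factory=list)


@dataclass
class Workflow:
    """Top-level container; may hold tasks and nested worklets."""
    name: str
    tasks: List[Union[Task, Worklet]] = field(default_factory=list)
    schedule: Optional[str] = None  # hypothetical free-text schedule


def task_names(container):
    """Walk a workflow or worklet and return the task names in order."""
    names = []
    for item in container.tasks:
        if isinstance(item, Worklet):
            names.extend(task_names(item))  # descend into nested worklet
        else:
            names.append(item.name)
    return names


wl_dims = Worklet("wl_dims", [Task("s_dim_customer", "session"),
                              Task("s_dim_product", "session")])
wf_daily = Workflow("wf_daily",
                    [wl_dims, Task("s_fact_sales", "session"),
                     Task("e_notify", "email")],
                    schedule="daily")
print(task_names(wf_daily))
```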
Workflow Monitor : We can monitor workflows and tasks in the Workflow Monitor. We
can view details about a workflow or task in Gantt Chart view or Task view. We can run,
stop, abort, and resume workflows from the Workflow Monitor. We can view sessions
and workflow log events in the Workflow Monitor Log Viewer.
The Workflow Monitor displays workflows that have run at least once. The Workflow
Monitor continuously receives information from the Integration Service and Repository
Service. It also fetches information from the repository to display historic information.
Output window – Displays messages from the Integration Service and Repository
Service.
Gantt chart view – Displays details about workflow runs in chronological format.
Repository Manager
We can navigate through multiple folders and repositories and perform basic repository
tasks with the Repository Manager. We use the Repository Manager to complete the
following tasks:
2. Add and connect to a repository. We can add repositories to the Navigator window
and client registry and then connect to the repositories.
3. Work with PowerCenter domain and repository connections. We can edit or remove
domain connection information. We can connect to one repository or multiple
repositories. We can export repository connection information from the client
registry to a file. We can import the file on a different machine and add the
repository connection information to the client registry.
4. Change your password. We can change the password for our user account.
5. Search for repository objects or keywords. We can search for repository objects
containing specified text. If we add keywords to target definitions, we can use a
keyword to search for a target definition.
8. Truncate session and workflow log entries. We can truncate the list of session and
workflow logs that the Integration Service writes to the repository. We can
truncate all logs, or truncate all logs older than a specified date.
5. Repository Service
General Properties
> OperatingMode: Values are Normal and Exclusive. Use Exclusive mode to perform
administrative tasks such as enabling version control or promoting a local repository
to a global repository.
Database Properties
Advanced Properties
> CommentsRequiredFor Checkin: Requires users to add comments when checking in
repository objects.
> Error Severity Level: Level of error messages written to the Repository Service log.
Specify one of the following message levels: Fatal, Error, Warning, Info, Trace & Debug
Environment Variables
The database client code page on a node is usually controlled by an environment variable.
For example, Oracle uses NLS_LANG, and IBM DB2 uses DB2CODEPAGE. All
Integration Services and Repository Services that run on this node use the same
environment variable. You can configure a Repository Service process to use a different
value for the database client code page environment variable than the value set for the
node.
You might want to configure the code page environment variable for a Repository
Service process when the Repository Service process requires a different database client
code page than the Integration Service process running on the same node.
For example, the Integration Service reads from and writes to databases using the UTF-8
code page. The Integration Service requires that the code page environment variable be
set to UTF-8. However, you have a Shift-JIS repository that requires that the code page
environment variable be set to Shift-JIS. Set the environment variable on the node to
UTF-8. Then add the environment variable to the Repository Service process properties
and set the value to Shift-JIS.
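The precedence described above, where a value set on the service process overrides the
value set on the node, can be sketched as a simple dictionary merge. The variable name
DB_CLIENT_CODEPAGE is a placeholder rather than a real database variable, and the values
are the code page names used in the example rather than actual settings; real variables
such as NLS_LANG take vendor-specific values.

```python
def effective_environment(node_env, process_overrides):
    """Return the environment a service process sees.

    Variables set on the service process take precedence over
    the values inherited from the node.
    """
    merged = dict(node_env)           # start from the node-level settings
    merged.update(process_overrides)  # process-level settings win
    return merged


# Placeholder variable name and values, for illustration only.
node_env = {"DB_CLIENT_CODEPAGE": "UTF-8"}

integration_service_env = effective_environment(node_env, {})
repository_service_env = effective_environment(
    node_env, {"DB_CLIENT_CODEPAGE": "Shift-JIS"}
)

print(integration_service_env["DB_CLIENT_CODEPAGE"])
print(repository_service_env["DB_CLIENT_CODEPAGE"])
```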
The three main components of the Integration Service that enable data movement are the
Integration Service Process, the Load Balancer and the Data Transformation Manager
(DTM).
The Integration Service starts one or more Integration Service processes (ISP) to run
and monitor workflows. When we run a workflow, the ISP starts and locks the workflow,
runs the workflow tasks, and starts the process to run sessions. The functions of the
Integration Service Process are,
The Load Balancer dispatches tasks to achieve optimal performance. It dispatches tasks
to a single node or across the nodes in a grid after performing a sequence of steps.
Before looking at these steps, we need to understand Resources, Resource Provision
Thresholds, Dispatch modes and Service Levels.
• Resources – we can configure the Integration Service to check the resources
available on each node and match them with the resources required to run the
task. For example, if a session uses an SAP source, the Load Balancer dispatches
the session only to nodes where the SAP client is installed
• Three Resource Provision Thresholds. Maximum CPU Run Queue Length: the
maximum number of runnable threads waiting for CPU resources on the node.
Maximum Memory %: the maximum percentage of virtual memory allocated on
the node relative to the total physical memory size. Maximum Processes: the
maximum number of running Session and Command tasks allowed for each
Integration Service process running on the node
• Three Dispatch modes. Round-robin: The Load Balancer dispatches tasks to
available nodes in a round-robin fashion after checking the “Maximum Processes”
threshold. Metric-based: Checks all three resource provision thresholds and
dispatches tasks in round-robin fashion. Adaptive: Checks all three resource
provision thresholds and also ranks nodes according to current CPU availability
• Service Levels establish priority among tasks that are waiting to be dispatched.
The three components of a service level are Name, Dispatch Priority and Maximum
Dispatch Wait Time. “Maximum dispatch wait time” is the amount of time a task
can wait in the queue, which ensures that no task waits forever
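To make the interaction between Dispatch Priority and Maximum Dispatch Wait Time
concrete, here is a minimal sketch of how a dispatch queue could be ordered. It rests on
two assumptions that the text above does not state: that a lower priority number means
"dispatch first", and that a task which has waited past its limit is promoted to the
highest priority. All class, field and task names are hypothetical.

```python
from dataclasses import dataclass

HIGHEST_PRIORITY = 1  # assumption: 1 is the most urgent priority value


@dataclass
class ServiceLevel:
    name: str
    dispatch_priority: int    # assumption: lower number = dispatched first
    max_dispatch_wait: float  # seconds a task may wait in the queue


@dataclass
class QueuedTask:
    name: str
    level: ServiceLevel
    waited: float             # seconds spent in the queue so far


def dispatch_order(queue):
    """Order waiting tasks; overdue tasks are promoted (assumed behaviour)."""
    def sort_key(task):
        overdue = task.waited >= task.level.max_dispatch_wait
        priority = HIGHEST_PRIORITY if overdue else task.level.dispatch_priority
        # Ties are broken in favour of the task that has waited longest.
        return (priority, -task.waited)
    return sorted(queue, key=sort_key)


high = ServiceLevel("High", dispatch_priority=1, max_dispatch_wait=600)
low = ServiceLevel("Low", dispatch_priority=5, max_dispatch_wait=1800)

queue = [
    QueuedTask("s_nightly", low, waited=2000),  # past its wait limit
    QueuedTask("s_orders", high, waited=30),
    QueuedTask("s_archive", low, waited=100),
]
ordered = [task.name for task in dispatch_order(queue)]
print(ordered)
```

In this sketch the low-priority task that has exceeded its wait limit moves to the front,
which is one way to guarantee that no task waits forever.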
When the Integration Service runs on a single node, the steps are:
1. The Load Balancer checks different resource provision thresholds on the node
depending on the Dispatch mode set. If dispatching the task causes any threshold
to be exceeded, the Load Balancer places the task in the dispatch queue, and it
dispatches the task later
2. The Load Balancer dispatches all tasks to the node that runs the master
Integration Service process
When the Integration Service runs on a grid, the steps are:
1. The Load Balancer verifies which nodes are currently running and enabled
2. The Load Balancer identifies nodes that have the PowerCenter resources required
by the tasks in the workflow
3. The Load Balancer verifies that the resource provision thresholds on each
candidate node are not exceeded. If dispatching the task causes a threshold to be
exceeded, the Load Balancer places the task in the dispatch queue, and it
dispatches the task later
4. The Load Balancer selects a node based on the dispatch mode
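The four grid steps above can be sketched as a chain of filters followed by a selection.
This is a simplified model rather than the Load Balancer's actual algorithm: the Node
fields, the threshold values, the rr_offset parameter used to rotate through nodes, and
the use of the shortest CPU run queue as a stand-in for "current CPU availability" are
all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    running: bool
    resources: frozenset      # resources available on the node, e.g. {"SAP"}
    cpu_run_queue: float      # runnable threads currently waiting for CPU
    memory_pct: float         # virtual memory allocated, % of physical memory
    processes: int            # Session and Command tasks currently running
    max_cpu_run_queue: float = 10.0  # illustrative threshold values
    max_memory_pct: float = 150.0
    max_processes: int = 10


def within_thresholds(node, mode):
    """Round-robin checks only Maximum Processes; other modes check all three."""
    ok = node.processes + 1 <= node.max_processes
    if mode in ("metric-based", "adaptive"):
        ok = (ok and node.cpu_run_queue <= node.max_cpu_run_queue
              and node.memory_pct <= node.max_memory_pct)
    return ok


def select_node(nodes, required, mode, rr_offset=0):
    """Return the node to dispatch to, or None if the task must be queued."""
    candidates = [n for n in nodes if n.running]                         # step 1
    candidates = [n for n in candidates if required <= n.resources]      # step 2
    candidates = [n for n in candidates if within_thresholds(n, mode)]   # step 3
    if not candidates:
        return None  # task stays in the dispatch queue for a later attempt
    if mode == "adaptive":                                               # step 4
        return min(candidates, key=lambda n: n.cpu_run_queue)
    return candidates[rr_offset % len(candidates)]


nodes = [
    Node("node1", True, frozenset({"SAP"}), 12.0, 80.0, 3),
    Node("node2", True, frozenset({"SAP"}), 2.0, 60.0, 4),
    Node("node3", False, frozenset({"SAP"}), 0.0, 10.0, 0),
    Node("node4", True, frozenset(), 1.0, 20.0, 1),
]
for mode in ("round-robin", "metric-based", "adaptive"):
    chosen = select_node(nodes, {"SAP"}, mode)
    print(mode, chosen.name if chosen else "queued")
```

Here node3 is excluded because it is not running and node4 because it lacks the required
resource. node1 exceeds the CPU run queue threshold, so only round-robin mode, which
ignores that threshold, would still dispatch to it.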
When the workflow reaches a session, the Integration Service Process starts the DTM
process. The DTM is the process associated with the session task. The DTM process
performs the following tasks: