About Ab Initio

• Ab Initio is a general-purpose data processing platform for enterprise-class, mission-critical applications such as data warehousing, clickstream processing, data movement, data transformation, and analytics.
• Supports integration of arbitrary data sources and programs, and provides complete metadata management across the enterprise.
• Proven best-of-breed ETL solution.
• Applications of Ab Initio:
    • ETL for data warehouses, data marts, and operational data sources.
    • Parallel data cleansing and validation.
    • Parallel data transformation and filtering.
    • High-performance analytics.
    • Real-time, parallel data capture.

Monday, May 27, 2019 Classification: Public 1


Architecture of Ab Initio

[Diagram: layered architecture, listed from top to bottom]

Applications
Application Development Environments: Graphical, C++, Shell
Ab Initio Metadata Repository
Component Library, User-defined Components, Third Party Components
Ab Initio Co>Operating System
Native Operating System: UNIX, Windows NT



Ab Initio base software consists of two main pieces:

• Ab Initio Co>Operating System and core components
• Graphical Development Environment (GDE)



Co>Operating System

• The Co>Operating System is core software that unites a network of computing resources (CPUs, storage disks, programs, datasets) into a production-quality data processing system with scalable performance and mainframe reliability.

• The Co>Operating System is layered on top of the native operating systems of a collection of computers. It provides a distributed model for process execution, file management, process monitoring, checkpointing, and debugging.



Run Process

What happens when you start executing a graph using GDE?

• Your graph is translated into a script that can be executed in the Shell Development Environment.

• This script and any metadata files stored on the GDE client machine are shipped (via FTP) to the server.

• The script is invoked (via SSH/REXEC/TELNET) on the server.

• The script creates and runs a job that may run across many nodes.

• Monitoring information is sent back to the GDE client.


Graphical Development Environment (GDE)

• GDE lets you create applications by dragging and dropping components onto a canvas, configuring them with familiar, intuitive point-and-click operations, and connecting them into executable flowcharts.

• These diagrams are architectural documents that developers and managers alike can understand and use, but they are not mere pictures: the Co>Operating System executes these flowcharts directly. This means that there is a seamless and solid connection between the abstract picture of the application and the concrete reality of its execution.



Sandbox and Project
What is a sandbox? A sandbox is a collection of directories such as bin, dml, mp, run, etc., which contain the metadata (graphs and their associated files).

Why create a sandbox? It helps manage the directory structure where this metadata is stored, and it also helps with version control, migration, and navigation. The sandbox provides a mechanism to maintain uniqueness while moving from the development to the production environment by means of switch parameters.

Note: A sandbox can be associated with only one project, but a project can have many sandboxes.

/Projects
    Sandbox X
        bin
        dml
        mp
        run
        xfr
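
As a rough illustration of the directory layout above, the following Python sketch builds the same five sub-directories in a temporary location. The helper `create_sandbox` and the name `sandbox_x` are hypothetical; a real sandbox is set up through Ab Initio's own tooling, and this sketch only mirrors the shape of the tree.

```python
import tempfile
from pathlib import Path

# Sub-directories named on the slide; nothing beyond these is assumed.
SANDBOX_DIRS = ["bin", "dml", "mp", "run", "xfr"]


def create_sandbox(projects_root: Path, name: str) -> Path:
    """Create <projects_root>/<name> with the sandbox sub-directories.

    Hypothetical helper for illustration only; it does not register
    anything with a project or with the EME.
    """
    sandbox = projects_root / name
    for sub in SANDBOX_DIRS:
        (sandbox / sub).mkdir(parents=True, exist_ok=True)
    return sandbox


# Usage: build the tree under a throwaway directory and list it.
with tempfile.TemporaryDirectory() as tmp:
    sb = create_sandbox(Path(tmp) / "Projects", "sandbox_x")
    print(sorted(p.name for p in sb.iterdir()))
```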
EME (Enterprise Meta>Environment) Datastore
The EME datastore is a system storage area in which every version you save of the files you work on is permanently preserved.

[Figure: several GDE clients connected to the EME, with Check Out and Locking operations]



EME uses

• Source control
• Documentation
• Analysis
• Job status
• Lifecycle management



Check In / Check Out

Check Out – Updates the sandbox with the latest version of the object from the EME datastore. At this stage, the object is still read-only, so the user cannot make changes.

Locking – To make a file editable, the user needs to lock that object in his or her sandbox. Only one user can lock an object at a time.

Check In – GDE imports the specified files into the EME datastore. EME automatically increments the version number of the object checked in.
[Diagram: three stages of check out, locking, and check in]

Stage 1: User A, User B, and User C each check out the object from the EME into their GDE. In all three cases, the object is still in read-only mode.

Stage 2: User A checks out the object and locks it. The object is locked by A and can only be edited by User A. User B and User C can check out the object, but they cannot lock it, and it is still in read-only mode for them.

Stage 3: User A checks in the object. Now either User B or User C can lock the object and make the required changes.
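
The check out, lock, and check in rules described above can be modelled with a small in-memory class. This is only a toy sketch: `ToyDatastore` and its method names are hypothetical and are not the EME's interface. It encodes just the three rules stated on the slides: a checked-out copy is the latest version, only one user can hold the lock, and a check in increments the version number and frees the object for other users to lock.

```python
class ToyDatastore:
    """In-memory stand-in for a versioned datastore (illustration only)."""

    def __init__(self):
        self._versions = {}  # object name -> list of saved contents
        self._locks = {}     # object name -> user currently holding the lock

    def add(self, name, content):
        """Store the first version of an object."""
        self._versions[name] = [content]

    def check_out(self, name):
        """Return (version number, content) of the latest version."""
        versions = self._versions[name]
        return len(versions), versions[-1]

    def lock(self, name, user):
        """Try to lock an object; only one user can hold the lock."""
        holder = self._locks.get(name)
        if holder is not None and holder != user:
            return False
        self._locks[name] = user
        return True

    def check_in(self, name, user, content):
        """Save a new version; the caller must hold the lock."""
        if self._locks.get(name) != user:
            raise PermissionError(f"{user} has not locked {name}")
        self._versions[name].append(content)
        del self._locks[name]  # other users may lock the object again
        return len(self._versions[name])


# Usage mirroring the three stages (object name is made up).
ds = ToyDatastore()
ds.add("load_customers.mp", "first draft")
print(ds.lock("load_customers.mp", "A"))   # User A obtains the lock
print(ds.lock("load_customers.mp", "B"))   # User B is refused
print(ds.check_in("load_customers.mp", "A", "second draft"))
print(ds.lock("load_customers.mp", "B"))   # User B can lock after check in
```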



Parallelism

Ab Initio supports three types of parallelism:

• Data Parallelism: The data is divided into partitions, and the same processing is applied to each partition in parallel.

• Pipeline Parallelism: One process passes each record to the next process as soon as its work on that record is done, without waiting for the whole data set to be processed.

• Component Parallelism: More than one component (process) runs at the same time, each working on different data.



Data Parallelism



Two Ways of Looking at Data Parallelism

[Figure panels: Expanded View, Global View]
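
To make the idea of data parallelism concrete, here is a minimal Python sketch in which records are split into partitions and the same transform runs on every partition at once. The function names, the round-robin split, and the doubling transform are all assumptions chosen for illustration; they are not Ab Initio components. Threads are used only to show the structure, so this says nothing about real performance.

```python
from concurrent.futures import ThreadPoolExecutor


def partition_round_robin(records, n_partitions):
    """Deal records out to n_partitions lists, one record at a time."""
    parts = [[] for _ in range(n_partitions)]
    for i, rec in enumerate(records):
        parts[i % n_partitions].append(rec)
    return parts


def transform(partition):
    """The same operation applied to every partition (here: double each value)."""
    return [rec * 2 for rec in partition]


def run_data_parallel(records, n_partitions=4):
    """Partition the data, transform all partitions concurrently, then gather."""
    parts = partition_round_robin(records, n_partitions)
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        results = list(pool.map(transform, parts))
    # Gather: concatenate the partitions; record order follows the partitions.
    return [rec for part in results for rec in part]


print(sorted(run_data_parallel(list(range(10)), n_partitions=3)))
```

Each worker sees only its own partition, which is what distinguishes this from simply running one transform over the whole data set.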



Pipeline Parallelism

Processing Record: 100

Processing Record: 99
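
The record-by-record handoff suggested by the labels above can be sketched with one thread per stage and small queues between them: while a downstream stage works on one record, the upstream stage is free to work on the next. This is an illustrative model only; the `stage` and `run_pipeline` helpers and the sentinel `_DONE` are hypothetical names, not part of Ab Initio.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the record stream


def stage(func, inbox, outbox):
    """Read records one at a time, process each, and pass it on immediately."""
    while True:
        item = inbox.get()
        if item is _DONE:
            outbox.put(_DONE)
            return
        outbox.put(func(item))


def run_pipeline(records, funcs):
    """Run each function as its own stage, connected by size-1 queues."""
    queues = [queue.Queue(maxsize=1) for _ in range(len(funcs) + 1)]
    workers = [
        threading.Thread(target=stage, args=(f, queues[i], queues[i + 1]))
        for i, f in enumerate(funcs)
    ]

    def feed():
        for rec in records:
            queues[0].put(rec)
        queues[0].put(_DONE)

    feeder = threading.Thread(target=feed)
    for t in workers + [feeder]:
        t.start()

    results = []
    while True:
        item = queues[-1].get()
        if item is _DONE:
            break
        results.append(item)

    for t in workers + [feeder]:
        t.join()
    return results


print(run_pipeline([1, 2, 3], [lambda x: x + 1, lambda x: x * 10]))
```

The small queue size means no stage waits for the whole data set: a record moves downstream as soon as the stage that holds it has finished with it.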



Component Parallelism

Sorting Transactions
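
As a hedged sketch of component parallelism, the example below runs two different components at the same time on two unrelated inputs. "Sorting transactions" comes from the slide label; the second component (sorting customers), the record fields, and the function names are assumptions added purely for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def sort_transactions(transactions):
    """Component 1: sort transaction records by their id."""
    return sorted(transactions, key=lambda t: t["id"])


def sort_customers(customers):
    """Component 2 (assumed for illustration): sort customers by name."""
    return sorted(customers, key=lambda c: c["name"])


def run_components(transactions, customers):
    """Start both components at once; each works on its own data."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        tx_future = pool.submit(sort_transactions, transactions)
        cust_future = pool.submit(sort_customers, customers)
        return tx_future.result(), cust_future.result()


# Usage with made-up records.
tx, cust = run_components(
    [{"id": 3}, {"id": 1}, {"id": 2}],
    [{"name": "Lee"}, {"name": "Ada"}],
)
print(tx)
print(cust)
```

Unlike data parallelism, the two workers here run different operations, and neither depends on the other's output.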
