
What is the use of Shared Folder?

A Shared Folder is like any other folder, but it can be accessed by all users (the access level can be changed). It is mainly used to share objects between folders for reusability. For example, you can create a shared folder to keep all the common mapplets, sources, targets and transformations, which can then be used across folders by creating shortcuts to them. By doing this we increase the reusability of the code, and changes made in one place are easily reflected in all the shortcuts.

Unit testing is of two types:

1. Quantitative testing

2. Qualitative testing

Steps:

1. First validate the mapping.

2. Create a session on the mapping and then run the workflow.

Once the session has succeeded, right-click on the session and go to the statistics tab.

There you can see how many source rows were applied, how many rows were loaded into the targets and how many rows were rejected. This is called quantitative testing.

Once the rows are loaded successfully, we go for qualitative testing.

Steps:

1. Take the DATM (DATM means the document where all business rules are mapped to the corresponding source columns) and check whether the data has been loaded into the target table according to the DATM. If any data is not loaded according to the DATM, go back to the code and rectify it.

This is called qualitative testing.

This is what a developer does in unit testing.
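As a minimal sketch of both checks in SQL (the table names src_orders and tgt_orders and the amount rule are hypothetical, used only for illustration):

-- Quantitative check: row counts that should reconcile with the session statistics
SELECT (SELECT COUNT(*) FROM src_orders) AS source_rows,
       (SELECT COUNT(*) FROM tgt_orders) AS target_rows
FROM dual;

-- Qualitative check: verify one DATM rule, e.g. target amount = source amount * exchange rate;
-- any row returned is a defect to be fixed in the mapping
SELECT s.order_id
FROM src_orders s
JOIN tgt_orders t ON t.order_id = s.order_id
WHERE t.order_amount <> s.amount * s.exchange_rate;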

What is the use of incremental aggregation? Explain in brief with an example.

It is a session option. When the Informatica server performs incremental aggregation, it passes new source data through the mapping and uses historical cache data to perform the new aggregation calculations incrementally. We use it for performance.
When using incremental aggregation, you apply captured changes in the source to
aggregate calculations in a session. If the source changes incrementally and you
can capture changes, you can configure the session to process those changes. This
allows the Integration Service to update the target incrementally, rather than
forcing it to process the entire source and recalculate the same data each time you
run the session.

For example, you might have a session using a source that receives new data every
day. You can capture those incremental changes because you have added a filter
condition to the mapping that removes pre-existing data from the flow of data. You
then enable incremental aggregation.

When the session runs with incremental aggregation enabled for the first time on
March 1, you use the entire source. This allows the Integration Service to read and
store the necessary aggregate data. On March 2, when you run the session again,
you filter out all the records except those time-stamped March 2. The Integration
Service then processes the new data and updates the target accordingly.

Consider using incremental aggregation in the following circumstances:

You can capture new source data. Use incremental aggregation when you can
capture new source data each time you run the session. Use a Stored Procedure or
Filter transformation to process new data.

Incremental changes do not significantly change the target. Use incremental aggregation when the changes do not significantly change the target. If processing the incrementally changed source alters more than half the existing target, the session may not benefit from using incremental aggregation. In this case, drop the table and recreate the target with complete source data.

Note: Do not use incremental aggregation if the mapping contains percentile or median functions. The Integration Service uses system memory to process these functions in addition to the cache memory you configure in the session properties. As a result, the Integration Service does not store incremental aggregation values for percentile and median functions in disk caches.

How to delete duplicate rows from a flat file source: is there any option in Informatica?

Use a Sorter transformation; it has a "distinct" option, make use of it.

How to use mapping parameters, and what is their use?

In the Designer you will find the mapping parameters and variables options, and you can assign a value to them there. Coming to their use: suppose you are doing incremental extractions daily, and your source system contains a day column. Every day you would have to go to that mapping and change the day so that the particular day's data is extracted; doing that by hand is a layman's job. That is where mapping parameters and variables come in. Once you assign a value to a mapping variable, it can change between sessions.
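As a sketch of the daily-extraction case above, a mapping parameter (here a hypothetical $$RUN_DAY, supplied from a parameter file) can be referenced in the Source Qualifier SQL override so the mapping itself never has to be edited:

-- hypothetical SQL override; Informatica expands $$RUN_DAY before running the query
SELECT order_id, amount, order_day
FROM src_orders
WHERE order_day = TO_DATE('$$RUN_DAY', 'YYYY-MM-DD');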

With mapping variables, the variable value is saved to the repository after the completion of the session, and the next time you run the session the server takes the saved value from the repository and starts assigning from the next value after it. For example, I ran a session and at the end it stored a value of 50 in the repository; next time when I run the session I want it to start with the value 70, not with the value 51.

How to do this:

After running the mapping, in the Workflow Manager (Start --------> Session), right-click on the session and you will get a menu; go to Persistent Values. There you will find the last value stored in the repository for that mapping variable. Remove it and put in your desired value, then run the session; your task should be done.

Can we use an aggregator/active transformation after an Update Strategy transformation?

You can use an Aggregator after an Update Strategy. The problem is that once you perform the update strategy, say you flagged some rows to be deleted, and you then perform an Aggregator transformation over all rows using, say, the SUM function, the deleted rows will be subtracted from the result of this Aggregator transformation.

Why are dimension tables denormalized in nature?

Because in data warehousing historical data should be maintained. Maintaining historical data means, for example, keeping an employee's details of where he previously worked and where he is working now, all in one table. If you keep a primary key on the natural employee id, it won't allow duplicate records with the same employee id, so to maintain historical data in data warehousing we go for surrogate keys (using an Oracle sequence for that key column).

So, because all the dimensions maintain historical data, they are denormalized; "duplicate entry" here does not mean an exactly duplicate record, but another record maintained in the table for the same employee number.

How do you handle decimal places while importing a flat file into Informatica?

While importing the flat file definition, just specify the scale for the numeric data type. In the mapping, the flat file source supports only the number datatype (no decimal or integer). The Source Qualifier associated with that source will have a decimal data type for that number port of the source.

If your workflow is running slow in Informatica, where do you start troubleshooting and what are the steps you follow?

When the workflow is running slowly you have to find out the bottlenecks, in this order:

target

source

mapping

session

system

If you have four lookup tables in the workflow, how do you troubleshoot to improve performance?

There are many ways to improve a mapping that has multiple lookups.

1) We can create an index on the lookup table if we have permissions (staging area).

2) Divide the lookup mapping into two: (a) dedicate one to inserts (source - target): these are new rows, and only the new rows come into the mapping, so the process is fast; (b) dedicate the second one to updates (source = target): these are existing rows, and only the rows that already exist come into the mapping.

3) We can increase the cache size of the lookup.

Can anyone explain error handling in Informatica with examples, so that it will be easy to explain in an interview?

Go to the session log file; there we will find information regarding the session initialization process, the errors encountered and the load summary. So by looking at the errors encountered during the session run, we can resolve them.

There is also a file called the bad file, which generally has the format *.bad and contains the records rejected by the Informatica server. There are two kinds of indicators, one for the rows and one for the columns. The row indicator signifies what operation was going to take place (i.e. insertion, deletion, update etc.). The column indicators contain information on why the column was rejected (such as violation of a not-null constraint, value error, overflow etc.). If one rectifies the errors in the data present in the bad file and then reloads the data into the target, the table will contain only valid data.

What are partition points?

Partition points mark the thread boundaries in a pipeline and divide the pipeline into stages. The Informatica Server sets partition points at several transformations in a pipeline by default. If you use PowerCenter, you can define other partition points. When you add partition points, you increase the number of transformation threads, which can improve session performance. The Informatica Server can redistribute rows of data at partition points, which can also improve session performance.

Can I start and stop a single session in a concurrent batch?

Just right-click on the particular session and go to the recovery option,

or

use event wait and event raise.

Can we run a group of sessions without using the Workflow Manager?

It is possible to run two sessions only (using pre-session and post-session commands) with pmcmd, without using a workflow. Not more than two.

If a session fails after loading 10,000 records into the target, how can you load the records starting from the 10,001st record when you run the session the next time in Informatica 6.1?

Running the session in recovery mode will work, but the target load type should be normal. If it is bulk then recovery won't work as expected.

What are mapping parameters and variables, and in which situations can we use them?

If we need to change certain attributes of a mapping every time the session is run, it would be very difficult to edit the mapping and then change the attribute. So we use mapping parameters and variables and define the values in a parameter file. Then we only have to edit the parameter file to change the attribute values. This makes the process simple.

Mapping parameter values remain constant. If we need to change a parameter value, we need to edit the parameter file.

But the value of a mapping variable can be changed by using a variable function. If we need to increment the attribute value by 1 after every session run, we can use mapping variables.

With a mapping parameter we need to manually edit the attribute value in the parameter file after every session run.

What is a worklet, what is the use of a worklet, and in which situation can we use it?

A set of workflow tasks is called a worklet.

Workflow tasks means:

1) Timer 2) Decision 3) Command 4) Event Wait 5) Event Raise 6) Email

What logic will you implement to load the data into one fact table from 'n' number of dimension tables?

Normally everyone uses

1) slowly changing dimensions

2) slowly growing dimensions

In the source, if we also have duplicate records and we have 2 targets, T1- for
unique values and T2- only for duplicate values. How do we pass the unique values
to T1 and duplicate values to T2 from the source to these 2 different targets in a
single mapping?

source ---> SQ ---> Exp ---> Sorter (with the Select Distinct check box enabled) ---> T1

---> Aggregator (with group by enabled and a count written in an output port) ---> T2

If you want only the duplicates in T2, you can follow this sequence:

---> Agg (with group by enabled, writing the expression decode(count(col),1,1,0)) ---> Filter (condition: the decode output is 0) ---> T2.
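The same split can be sketched in SQL, mirroring the decode(count(col),1,1,0) idea (src_customers and cust_id are hypothetical names):

-- rows whose key occurs exactly once go to T1; change the condition to key_count > 1 for T2
SELECT *
FROM (SELECT s.*, COUNT(*) OVER (PARTITION BY cust_id) AS key_count
      FROM src_customers s)
WHERE key_count = 1;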

A conformed dimension is one dimension that is shared by two (or more) fact tables.

Factless means a fact table without measures that contains only foreign keys; there are two types of factless fact tables, one for event tracking and the other a coverage table.

Bitmap indexes are preferred in data warehousing.

Metadata is data about data; everything is stored here, for example mappings, sessions, privileges and other data. In Informatica we can see the metadata in the repository.

Can anybody write a session parameter file which will change the source and targets for every session, i.e. different sources and targets for each session run?

You are supposed to define a parameter file, and in the parameter file you can define two parameters, one for the source and one for the target.

Give them like this, for example:

$Src_file = c:\program files\informatica\server\bin\abc_source.txt

$tgt_file = c:\targets\abc_targets.txt

Then go and define the parameter file:

[folder_name.WF:workflow_name.ST:s_session_name]
$Src_file = c:\program files\informatica\server\bin\abc_source.txt
$tgt_file = c:\targets\abc_targets.txt

If it is a relational DB, you can even give an overridden SQL at the session level as a parameter. Make sure the SQL is on a single line.

If you want to create indexes after the load process, which transformation do you choose?

It is usually not done at the mapping (transformation) level; it is done at the session level. Create a Command task which executes a shell script (on Unix) or any other script that contains the create index command, and use this command task in the workflow after the session. Alternatively, you can do it with a post-session command.
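For illustration, the script run by the command task or post-session command would simply contain statements like the following (the index and table names are hypothetical):

-- post-session SQL: rebuild the index that was dropped before the load
CREATE INDEX idx_tgt_orders_cust ON tgt_orders (cust_id);
-- the matching DROP INDEX idx_tgt_orders_cust; would go in the pre-session SQL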

Where is the cache stored in Informatica?

Cache is stored in the Informatica server memory, and overflowed data is stored on the disk in file format; these files are automatically deleted after the successful completion of the session run. If you want to keep that data you have to use a persistent cache.

What will happen if you are using Update Strategy Transformation and your
session is configured for "insert"?

What are the types of External Loader available with Informatica?

If you have a rank index for the top 10, but you pass only 5 records, what will be the output of such a Rank transformation?

1) If you are using an Update Strategy in any of your mappings, then in the session properties you have to set Treat Source Rows As to Data Driven. If you select insert, update or delete instead, the Integration Service will not consider the Update Strategy for performing any DB operations.

Otherwise, you can use the session-level options: instead of using an Update Strategy in the mapping, just select Update in Treat Source Rows As along with the Update else Insert option. This does the same job as the Update Strategy, but be sure to have a primary key on the target table.

2) For Oracle: SQL*Loader; for Teradata: TPump and MultiLoad.

3) If you pass only 5 rows to the Rank transformation, it will rank only those 5 records based on the rank port.

How can you delete duplicate rows without using a Dynamic Lookup? Tell me any other way of deleting duplicate rows using a lookup?

For example, you have a table Emp_Name with two columns, Fname and Lname, and the source table has duplicate rows. In the mapping, create an Aggregator transformation. Edit the Aggregator, select the Ports tab, select Fname, check the GroupBy check box and uncheck the output (O) port; then select Lname, uncheck the output (O) port and check GroupBy. Then create 2 new ports, uncheck the input (I) port on each, and click Expression on each port: in the first new port's expression type Fname, in the second new port type Lname. Then close the Aggregator transformation and link it to the target table.

In real time which one is better, star schema or snowflake schema? And which columns in the dimension table will the surrogate key be linked to?

In real time usually only the star schema is implemented, because queries take less time (fewer joins are needed). A surrogate key is there in each and every dimension table in a star schema, and this surrogate key is assigned as a foreign key in the fact table.

How do you check the source for the latest records that are to be loaded into the target? I.e. I loaded some records yesterday; today the file has been populated with some more records, so how do I find the records populated today?

a) Create a lookup on the target table from the Source Qualifier, based on the primary key.

b) Use an expression to evaluate the primary key returned by the target lookup (for a new source record, the lookup's primary key port for the target table should return null). Trap this with a DECODE and proceed.

How do you load the time dimension?

The time dimension is generally loaded manually, by using PL/SQL, shell scripts, Pro*C etc.
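A minimal sketch of such a manual load, assuming an Oracle database and a hypothetical dim_time table (one row per day of 2024):

INSERT INTO dim_time (date_key, cal_date, cal_year, cal_month, day_of_week)
SELECT TO_NUMBER(TO_CHAR(d, 'YYYYMMDD')),
       d,
       EXTRACT(YEAR FROM d),
       EXTRACT(MONTH FROM d),
       TO_CHAR(d, 'DY')
FROM (SELECT DATE '2024-01-01' + LEVEL - 1 AS d
      FROM dual
      CONNECT BY LEVEL <= 366);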

What properties should be noted when we connect a flat file source definition to a relational database target definition?

1. Whether the file is fixed width or delimited.

2. The size of the file: if it can be processed without performance issues then a normal load will work; if it is huge (in GB) then N-way partitions can be specified at the source side and the target side.

3. The file reader, source file name etc.

What is a hash table in Informatica?

In hash partitioning, the Informatica Server uses a hash function to group rows of data among partitions. The Informatica Server groups the data based on a partition key. Use hash partitioning when you want the Informatica Server to distribute rows to the partitions by group, for example when you need to sort items by item ID but do not know how many items have a particular ID number.
What is meant by EDW?

EDW is the Enterprise Data Warehouse, which means a centralised DW for the whole organization.

This is the Inmon approach, which relies on having a single centralised warehouse, whereas the Kimball approach says to have separate data marts for each vertical/department.

Advantages of having an EDW:

1. A global view of the data.

2. A single source of data for all the users across the organization.

3. The ability to perform consistent analysis on a single data warehouse.

The drawbacks to overcome are the time it takes to develop and the management required to build a centralised database.

What are measure objects?

Aggregate calculations like sum, avg, max and min; these are the measure objects.

How can we join tables if the tables have no primary and foreign key relation and no matching port to join on?

Without a common column or common data type we can join two sources using dummy ports:

1. Add one dummy port in each of the two sources.

2. In an Expression transformation assign '1' to each dummy port.

3. Use a Joiner transformation to join the sources on the dummy ports (use the join condition).
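What the dummy-port join does can be sketched in SQL: both sides expose a constant column and the joiner matches on that constant, which is effectively a cross join (table_a, table_b and their columns are hypothetical):

SELECT a.col1, b.col2
FROM (SELECT t1.col1, 1 AS dummy FROM table_a t1) a
JOIN (SELECT t2.col2, 1 AS dummy FROM table_b t2) b
  ON a.dummy = b.dummy;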

If the workflow has 5 sessions running sequentially and the 3rd session has failed, how can we run again from only the 3rd to the 5th session?

If multiple sessions in a concurrent batch fail, you might want to truncate all targets and run the batch again. However, if a session in a concurrent batch fails and the rest of the sessions complete successfully, you can recover the session as a standalone session. To recover a session in a concurrent batch: 1. Copy the failed session using Operations > Copy Session. 2. Drag the copied session outside the batch to be a standalone session. 3. Follow the steps to recover a standalone session. 4. Delete the standalone copy.

As per the question all the sessions are serial, so you can simply start the workflow with "Start workflow from task" on the 3rd session; from there it will continue to run the rest of the tasks.
What is the difference between STOP & ABORT at the Informatica session level?

Stop: we can restart the session.

Abort: we can't simply restart the session; we should truncate all the targets in the pipeline and then start the session again.

What is the difference between a stored procedure (DB level) and a Stored Procedure transformation (Informatica level)? And why should we use the SP transformation?

First of all, stored procedures (at DB level) are a series of SQL statements, stored and compiled on the server side. In Informatica, the Stored Procedure transformation uses those same stored procedures that are stored in the database. Stored procedures are used to automate time-consuming tasks that are too complicated for standard SQL statements. If you don't want to use a stored procedure, you have to create an Expression transformation and do all the coding in it.

What is a surrogate key? In which situation have you used it in your project? Explain with an example.

A surrogate key is a system-generated/artificial key or sequence number; it is a substitute for the natural primary key. It is just a unique identifier or number for each row that can be used as the primary key of the table. The only requirement for a surrogate primary key is that it is unique for each row in the table. It is useful because the natural primary key (i.e. Customer Number in the Customer table) can change, and that makes updates more difficult. In my project, the primary reason for the surrogate keys was to record the changing context of the dimension attributes (particularly for SCDs). They are kept as integers because integer joins are faster than joins on wider natural keys.
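A minimal sketch of how an Oracle sequence supplies such a key (dim_customer and the sample values are hypothetical):

CREATE SEQUENCE seq_customer_key START WITH 1 INCREMENT BY 1;

-- each load of the same customer number gets a new surrogate key,
-- which is what lets the dimension keep history
INSERT INTO dim_customer (customer_key, customer_no, customer_name, effective_date)
VALUES (seq_customer_key.NEXTVAL, 'C1001', 'John Smith', SYSDATE);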

In my source table there are 1000 records. I want to load records 501 to 1000 into my target table. How can you do this?

You can override the SQL query in the Workflow Manager, like:

select * from tab_name where rownum<=1000

minus

select * from tab_name where rownum<=500;

How can we eliminate duplicate rows from a flat file?

Use a Sorter transformation. When you configure the Sorter transformation to treat output rows as distinct, it configures all ports as part of the sort key, and it therefore discards duplicate rows compared during the sort operation.

I have a situation here loading a table in Informatica.

I have 5 temporary tables as sources. They look like:


Table1: (K - Key, N-Null, X,Y,Z - values)

K1 X N N N N
K2 X N N N N
--------------------------

Table2:

K1 N X N N N
K2 N X N N N
--------------------------

The other 3 tables follow the same pattern.

But there can be a situation where any of the tables contains duplicates, like:

K1 X N N N N
K1 Y N N N N
--------------------------

This kind of record should be errored out.

Because of this, we can't use an aggregator/group by, as we are not sure which of the duplicates should be removed.

How can we achieve this functionality in Informatica?

Do the following:

Map1: source instance S1 --- Source Qualifier --- Aggregator transformation --- T1 (target)

In the Aggregator transformation:

count_port (output port) = COUNT(key port), grouping by the key port.

Now take only your original ports whose key records are not duplicated into Map2.

Map2: source instance (T1) --- Source Qualifier --- take your original ports with the duplicate key records eliminated --- and now do your required design.

When do we use a dynamic cache and when do we use a static cache in connected and unconnected lookup transformations?

We use a dynamic cache only for a connected lookup. We use a dynamic cache to check whether the record already exists in the target table or not, and depending on that we insert, update or delete the records using an update strategy. Static cache is the default cache for both connected and unconnected lookups. If you select a static cache on the lookup table, Informatica won't update the cache and the rows in the cache remain constant. We use this to check the results, and also to update slowly changing records.

What is incremental loading?

Incremental load means adding/inserting only the changed or latest data from the source. In incremental loading, the history data can either remain as it is alongside the new data, or be overwritten by the incremental data.

How do you perform an incremental load?

1. Taking the target definition as a source and using a Joiner and an Update Strategy, we can do the incremental loading.

2. By using a Lookup transformation, keeping the lookup on the target and comparing.
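The lookup-on-target comparison in option 2 can be sketched as a SQL outer join (src_customers, dim_customer and the columns are hypothetical):

-- rows with no match in the target are new; rows whose attributes differ are changed
SELECT s.*
FROM src_customers s
LEFT JOIN dim_customer t ON t.customer_no = s.customer_no
WHERE t.customer_no IS NULL
   OR t.customer_name <> s.customer_name;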

How to delete duplicate records in Informatica?

If the source is a database, we can delete the duplicate records by enabling the Select Distinct option in the Source Qualifier properties, or by running the following query directly against the source table (a DELETE cannot be placed in the Source Qualifier filter):

delete from emp where rowid not in (select min(rowid) from emp group by empno);

If the source is a flat file, we can remove the duplicate records by enabling the Distinct option in the Sorter transformation.
How do you merge multiple flat files, for example 100 flat files, without using a Union transformation?

By using a file list we can merge more than one flat file. When we are importing more than one flat file we should set the Source filetype property to Indirect (the default is Direct), and we should give the file which holds the paths of all the flat files we are going to merge.
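For illustration, the list file handed to the indirect source is just a plain text file of paths, one per line (these paths are hypothetical):

/data/in/sales_01.dat
/data/in/sales_02.dat
/data/in/sales_03.dat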

What is a target update override?

By default the Integration Service updates the target based on key columns. But we might want to update non-key columns too; in that case we can override the UPDATE statement for each target in the mapping. The target update override takes effect only when the source rows are marked as update by an Update Strategy in the mapping.
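For illustration, an override for a hypothetical tgt_employee target could look like the following, where :TU.column_name refers to the value Informatica passes for that target port:

UPDATE tgt_employee
SET emp_name = :TU.emp_name,
    dept_no  = :TU.dept_no
WHERE emp_key = :TU.emp_key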
If you select a group-by port in the Aggregator, what is the output, and if you don't select a group-by port, what is the output?

If we select a group-by port: we get the last row of each group, based on the group-by port.

If we don't select a group-by port: we get only one row, which is the last row.

Is it possible to update the target table with a PK?

You need a PK for updating the target table, along with an Update Strategy, and you must select Data Driven at the session level.


What is a workflow variable?

A workflow variable is similar to a mapping variable, except that in a workflow variable we pass workflow statistics; and if you want to configure multiple runs of a workflow, you can do that with such a variable.

What are the limitations of bulk loading in Informatica, for all kinds of databases and transformations?

You need to disable constraints while performing bulk loading; also, as noted earlier, session recovery does not work as expected with bulk mode.
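For example, constraints can be disabled before the bulk load and re-enabled afterwards with statements like these (table and constraint names are hypothetical):

ALTER TABLE tgt_orders DISABLE CONSTRAINT fk_orders_cust;
-- ... bulk load runs here ...
ALTER TABLE tgt_orders ENABLE CONSTRAINT fk_orders_cust;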

What is meant by throughput in Informatica?

It is the number of rows per second processed by Informatica.

Explain your enterprise data warehouse project phase by phase.

It depends on the project scheduling.

In some projects it will be considered as:

ph1 -- Requirements gathering
ph2 -- Analysis and framework
ph3 -- Designing
ph4 -- Production migration, performance tuning and maintenance activities

On the other hand, some projects will be set up as:

ph1 -- Integrating the sources
ph2 -- Creating the data marts and aggregations
ph3 -- Reporting

So it depends on the size and requirements of the project.

I have a source flat file like 1 a, 1 b, 1 c, 2 a, 2 b, 2 c and I want the output as 1 a,b,c and 2 a,b,c. How can I achieve this?

Use 2 variables, one as a counter and another for the value. Sort all the records by c1, keep track of c1 and hold its value in v_1. When v_1 and c1 are equal, concatenate c2 with v_2; otherwise assign c2 to v_2. Finally connect to an Aggregator and take the last value, grouping the records by c1.

Take the mapping as (suppose the columns are c1, c2):

SD --> SQ --> Exp --> Agg --> Tgt

In Exp define:
c1 <-- c1
c2 <-- c2
v_2 <-- IIF(c1 = v_1, TO_CHAR(v_2) || ',' || TO_CHAR(c2), TO_CHAR(c2))
v_1 <-- c1
o_p1 <-- v_2

In Agg:
Group by c1
c2 <-- LAST(o_p1)
c11 <-- LAST(c1)

Tgt is connected as:
c1 <-- c11
c2 <-- c2

(v_ stands for variable port and o_ stands for output port. Note that v_1 must be defined below v_2 so that, when v_2 is evaluated, v_1 still holds the previous row's c1.)
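If the same data were in a relational source instead of a flat file, a SQL override could do the concatenation directly; a sketch using Oracle's LISTAGG (11g+), with hypothetical table and column names:

SELECT c1, LISTAGG(c2, ',') WITHIN GROUP (ORDER BY c2) AS c2_list
FROM src_table
GROUP BY c1;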

What is the difference between TRUNCATE and DELETE, and in which situation do you use DELETE and TRUNCATE in real time?

TRUNCATE removes all rows from a table. The operation cannot be rolled back and no triggers will be fired. As such, TRUNCATE is faster and doesn't use as much undo space as a DELETE.

The DELETE command is used to remove rows from a table. A WHERE clause can be used to remove only some rows; if no WHERE condition is specified, all rows will be removed. After performing a DELETE operation you need to COMMIT or ROLLBACK the transaction to make the change permanent or to undo it. Note that this operation will cause all DELETE triggers on the table to fire.

TRUNCATE is a DDL command, whereas DELETE is a DML command. As such, DELETE operations can be rolled back (undone), while DROP and TRUNCATE operations cannot be rolled back.
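A small sketch of the difference (stg_orders is a hypothetical staging table):

-- DELETE: row level, can be selective, and can be rolled back
DELETE FROM stg_orders WHERE load_date < SYSDATE - 30;
ROLLBACK;                 -- the deleted rows come back

-- TRUNCATE: DDL, removes every row, cannot be rolled back
TRUNCATE TABLE stg_orders;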

What is gap analysis?

It is the difference between what is needed and what is available.

How to call a stored procedure from the Workflow Monitor in Informatica version 7.1?

If the stored procedure is used to do any operations on the database tables (say dropping the indexes on the target table, renaming it or truncating it), then call it in the Pre SQL and Post SQL options in the session properties of the target.

Why use stored procedures in an ETL application?

I have used stored procedures for time conversion, dropping and creating indexes, loading the time dimension, knowing the status of the database and knowing the space available.
