[Figure: typical data warehouse architecture. Operational data and external data feed the load manager; the warehouse manager maintains detailed data, summarized data, and metadata inside the warehouse; the query manager serves reporting and OLAP tools.]
1. A data mart is used at the business division/department level. A data mart contains only the subject data required for local analysis.
1. Can two Fact Tables share the same Dimension Tables? How many Dimension tables are associated with one Fact Table in your project?
Ans: Yes, two fact tables can share the same dimension tables.
Ans: ROLAP (Relational OLAP), MOLAP (Multidimensional OLAP), and DOLAP (Desktop OLAP). In these three OLAP architectures, the interface to the analytic layer is typically the same; what is quite different is how the data is physically stored.
In MOLAP, the premise is that online analytical processing is best implemented by storing the data multidimensionally; that is, data must be stored multidimensionally in order to be viewed in a multidimensional manner.
In ROLAP, the premise is that the data should be stored in the relational model; that is, OLAP capabilities are best provided directly against the relational database.
DOLAP is a variation that exists to provide portability for the OLAP user. It
creates multidimensional datasets that can be transferred from server to desktop, requiring
only the DOLAP software to exist on the target system. This provides significant advantages
to portable computer users, such as salespeople who are frequently on the road and do not
have direct access to
their office server.
3. What is an MDDB, and what is the difference between MDDBs and RDBMSs?
Ans: There are two primary technologies that are used for storing the data used in OLAP
applications.
1. multidimensional databases (MDDB)
2. relational databases (RDBMS).
The major difference between MDDBs and RDBMSs is in how they store data. Relational databases store their data in a series of tables and columns. Multidimensional databases, on the other hand, store their data in large multidimensional arrays.
Advantages of MDDB:
1. Data retrieval is very fast, because the data corresponding to any combination of dimension members can be retrieved with a single I/O.
2. Data is clustered compactly in a multidimensional array.
3. Values are calculated ahead of time.
4. The index is small and can therefore usually reside completely in memory.
5. Storage is very efficient because the blocks contain only data.
6. A single index locates the block corresponding to a combination of sparse dimension members.
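The single-I/O retrieval claim above can be illustrated with a toy Python sketch (the dimension sizes and stored value are invented): in a dense multidimensional array, the position of any cell is a simple arithmetic function of its dimension indices, so no search is needed.

```python
# Toy illustration of MDDB-style dense storage: the cell for any
# combination of dimension members is located by arithmetic, not search.
# Dimension sizes are hypothetical: 12 months x 50 products x 10 regions.
DIM_SIZES = (12, 50, 10)

def cell_offset(indices, sizes=DIM_SIZES):
    """Row-major offset of a cell in a flattened dense array."""
    offset = 0
    for idx, size in zip(indices, sizes):
        offset = offset * size + idx
    return offset

# One flat list stands in for the array of pre-calculated values.
cells = [0.0] * (12 * 50 * 10)
cells[cell_offset((3, 7, 2))] = 1234.56   # store a measure

# Retrieval: a single indexed access, analogous to a single I/O.
value = cells[cell_offset((3, 7, 2))]
print(value)
```

The offset arithmetic is also why the index stays small: only the dimension sizes need to be kept, not a per-row index entry.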
Data merging: The process of standardizing data types and fields. Suppose one source system types integer data as smallint whereas another types similar data as decimal. The data from the two source systems needs to be rationalized when moved into the Oracle data format called NUMBER.
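As a minimal sketch of the rationalization described above (Python; the field name and sample values are invented), values arriving as smallint from one system and decimal from another are coerced to one common target type before load:

```python
from decimal import Decimal

# Hypothetical rows from two source systems that type the same field
# differently: one as smallint (int), one as decimal (Decimal).
siebel_row = {"qty": 42}               # smallint-style
oracle_row = {"qty": Decimal("42.0")}  # decimal-style

def rationalize(value):
    """Coerce either representation to one NUMBER-like target type (float here)."""
    return float(value)

merged = [rationalize(r["qty"]) for r in (siebel_row, oracle_row)]
print(merged)
```

After rationalization both values share one type and can be loaded into the same target column.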
Aggregation: The process whereby multiple detailed values are combined into a single summary value, typically a summation of numbers representing dollars spent or units sold. It generates summarized data for use in aggregate fact and dimension tables.
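The aggregation step can be sketched like this (Python; the product names and amounts are made up): detail rows are rolled up into one summary value per key, as would be loaded into an aggregate fact table.

```python
from collections import defaultdict

# Hypothetical detail-level sales rows: (product, dollars).
detail_rows = [
    ("widget", 10.0),
    ("widget", 15.0),
    ("gadget", 7.5),
]

# Roll detail values up into a single summary value per product,
# as an aggregate fact table would store them.
summary = defaultdict(float)
for product, amount in detail_rows:
    summary[product] += amount

print(dict(summary))  # {'widget': 25.0, 'gadget': 7.5}
```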
Data Transformation is an interesting concept in that some transformation can occur during the "extract," some during the "transform," or even, in limited cases, during the "load" portion of the ETL process. The type of transformation function you need will most often determine where it should be performed. Some transformation functions could even be performed in more than one place, because many of the transformations you will want to perform already exist in some form or another in more than one of the three environments (source database or application, ETL tool, or the target database).
OLAP stands for Online Analytical Processing. OLAP is a term that means many things to
many people. Here, we will use the term OLAP and Star Schema pretty much
interchangeably. We will assume that a star schema database is an OLAP system. (This is not the same thing that Microsoft calls OLAP; they extend OLAP to mean the cube structures built using their product, OLAP Services.) Here, we will assume that any system of read-only, historical, aggregated data is an OLAP system.
A data warehouse (or mart) is a way of storing data for later retrieval. This retrieval is almost always used to support decision-making in the organization. That is why many data warehouses are considered to be DSS (Decision-Support Systems).
Both a data warehouse and a data mart are storage mechanisms for read-only, historical, aggregated data. By read-only, we mean that the person looking at the data won't be changing it. If a user looks at yesterday's sales for a certain product, they should not have the ability to change that number.
The "historical" part may be just a few minutes old, but usually it is at least a day old. A data
warehouse usually holds data that goes back a certain period in time, such as five years. In
contrast, standard OLTP systems usually only hold data as long as it is “current” or active.
An order table, for example, may move orders to an archive table once they have been
completed, shipped, and received by the customer.
When we say that data warehouses and data marts hold aggregated data, we need to stress
that there are many levels of aggregation in a typical data warehouse.
8. If the data source is in the form of an Excel spreadsheet, then how do you use it?
Data Warehousing Interview Questions
Ans: PowerMart and PowerCenter treat a Microsoft Excel source as a relational database, not a flat file. Like relational sources, the Designer uses ODBC to import a Microsoft Excel source. You do not need database permissions to import Microsoft Excel sources. To import an Excel source definition, you need to complete the following tasks:
1. Install the Microsoft Excel ODBC driver on your system.
2. Create a Microsoft Excel ODBC data source for each source file in the ODBC 32-bit Administrator.
3. Prepare Microsoft Excel spreadsheets by defining ranges and formatting columns of numeric data.
4. Import the source definitions in the Designer.
Once you define ranges and format cells, you can import the ranges in the Designer. Ranges display as source definitions when you import the source.
10. What are the modules/tools in Business Objects? Explain their purpose briefly?
Ans: BO Designer, Business Query for Excel, BO Reporter, InfoView, Explorer, WebI, BO Publisher, Broadcast Agent, and BO (ZABO).
InfoView: IT portal entry into WebIntelligence & Business Objects.
Base module required for all options to view and refresh reports.
Reporter: Upgrade to create/modify reports on LAN or Web.
Explorer: Upgrade to perform OLAP processing on LAN or Web.
Designer: Creates semantic layer between user and database.
Supervisor: Administer and control access for group of users.
WebIntelligence: Integrated query, reporting, and OLAP analysis over the Web.
Broadcast Agent: Used to schedule, run, publish, push, and broadcast pre-built reports and spreadsheets, including event notification and response capabilities, event filtering, and calendar-based notification, over the LAN, e-mail, pager, fax, Personal Digital Assistant (PDA), Short Messaging Service (SMS), etc.
Set Analyzer: Applies set-based analysis to perform functions such as exclusions, intersections, unions, and overlaps visually.
Developer Suite: Build packaged, analytical, or customized apps.
11. What are Ad hoc queries and Canned Queries/Reports, and how do you create them?
Ans: The data warehouse will contain two types of query. There will be fixed queries that are
clearly defined and well understood, such as regular reports, canned queries (standard
reports) and common aggregations. There will also be ad hoc queries that are unpredictable,
both in quantity and frequency.
Ad Hoc Query: Ad hoc queries are the starting point for any analysis of a database. A business analyst wants to know what is inside the database, then proceeds by calculating totals, averages, and maximum and minimum values for most attributes within the database. These are the unpredictable element of a data warehouse. It is exactly that ability to run any query when desired, and expect a reasonable response, that makes the data warehouse worthwhile, and makes the design such a significant challenge.
The end-user access tools are capable of automatically generating the database query that answers any question posed by the user. The user will typically pose questions in terms that they are familiar with (for example, sales by store last week); this is converted into the database query by the access tool, which is aware of the structure of information within the data warehouse.
Canned queries: Canned queries are predefined queries. In most instances, canned queries contain prompts that allow you to customize the query for your specific needs. For example, a prompt may ask you for a school, department, term, or section ID. In this instance you would enter the name of the school, department, or term, and the query will retrieve the specified data from the warehouse. You can measure the resource requirements of these queries, and the results can be used for capacity planning and for database design.
The main reason for using a canned query or report rather than creating your own is that
your chances of misinterpreting data or getting the wrong answer are reduced. You are
assured of getting the right data and the right answer.
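A canned query with a prompt reduces to a fixed, pre-tested statement with a bound parameter; here is a minimal sketch using Python's built-in sqlite3 (the table and column names are invented):

```python
import sqlite3

# Stand-in warehouse table with invented names.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE enrollment (school TEXT, students INTEGER)")
conn.executemany("INSERT INTO enrollment VALUES (?, ?)",
                 [("Engineering", 1200), ("Arts", 800)])

# The "canned" part: a predefined, already-validated statement.
CANNED_QUERY = "SELECT students FROM enrollment WHERE school = ?"

# The "prompt" part: the user supplies only the parameter value.
school = "Engineering"
(count,) = conn.execute(CANNED_QUERY, (school,)).fetchone()
print(count)
```

Because the statement itself never changes, the user cannot misstate the join or filter logic; only the prompt value varies.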
12. How many Fact tables and how many Dimension tables did you have? Which table precedes which?
Ans: http://www.ciobriefings.com/whitepapers/StarSchema.asp
13. What is the difference between STAR SCHEMA & SNOW FLAKE SCHEMA?
Ans: http://www.ciobriefings.com/whitepapers/StarSchema.asp
14. Why did you choose the STAR SCHEMA only? What are the benefits of a STAR SCHEMA?
Ans: Because of its denormalized structure, i.e., the Dimension Tables are denormalized. Why denormalize? The first (and often only) answer is: speed. An OLTP structure is designed for data inserts, updates, and deletes, but not data retrieval. Therefore, we can often squeeze some speed out of it by denormalizing some of the tables and having queries go against fewer tables. These queries are faster because they perform fewer joins to retrieve the same recordset. Joins are also confusing to many end users. By denormalizing, we can present the user with a view of the data that is far easier for them to understand.
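The fewer-joins point can be made concrete with sqlite3 (the schema is invented for illustration): a denormalized dimension lets a single join answer a question that a normalized design would need several joins for.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Denormalized dimension: region attributes are folded into the store row,
# so no separate region table (and no extra join) is needed.
conn.executescript("""
CREATE TABLE dim_store (store_id INTEGER PRIMARY KEY,
                        store_name TEXT, region_name TEXT);
CREATE TABLE fact_sales (store_id INTEGER, amount REAL);
INSERT INTO dim_store VALUES (1, 'Downtown', 'East'), (2, 'Mall', 'West');
INSERT INTO fact_sales VALUES (1, 100.0), (1, 50.0), (2, 70.0);
""")

# Sales by region: one fact-to-dimension join answers the question.
rows = conn.execute("""
    SELECT d.region_name, SUM(f.amount)
    FROM fact_sales f JOIN dim_store d ON f.store_id = d.store_id
    GROUP BY d.region_name ORDER BY d.region_name
""").fetchall()
print(rows)  # [('East', 150.0), ('West', 70.0)]
```

In a snowflaked design the region name would live in its own table, adding a second join to the same query.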
16. (i) What is FTP? (ii) How do you connect to a remote host? (iii) Is there another way to use FTP without a special utility?
Ans: (i): The FTP (File Transfer Protocol) utility program is commonly used for copying files
to and from other computers. These computers may be at the same site or at different sites
thousands of miles apart. FTP is a general protocol that works on UNIX systems as well as other non-UNIX systems.
(iii): Yes. If you are using Windows, you can access a text-based FTP utility from a DOS prompt.
To do this, perform the following steps:
1. From the Start menu, choose Programs > MS-DOS Prompt.
2. Enter "ftp ftp.geocities.com". A prompt will appear.
(or)
Enter ftp to get the ftp prompt, then ftp> open hostname, e.g., ftp> open ftp.geocities.com (it connects to the specified host).
3. Enter your Yahoo! GeoCities member name.
4. Enter your Yahoo! GeoCities password.
You can now use standard FTP commands to manage the files in your Yahoo! GeoCities directory.
mput: Copies multiple files from the local machine to the remote machine.
Note: Discarded rows do not appear in the session log or reject files. To maximize
session performance, include the Filter transformation as close to the sources in the
mapping as possible. Rather than passing records you plan to discard through the mapping,
you then filter out unwanted data early in the flow of data from sources to targets.
You cannot concatenate ports from more than one transformation into the Filter
transformation; the input ports for the filter must come from a single transformation. Filter
transformations exist within the flow of the mapping and cannot be unconnected. The Filter
transformation does not allow setting output default values.
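The filter-early advice above amounts to dropping unwanted rows before any downstream work is done on them; a generic sketch in Python (the row shape and the condition are invented):

```python
# Filtering close to the source: discard rows before any downstream
# transformation touches them. The condition (amount > 0) is invented.
source_rows = [{"id": 1, "amount": 10}, {"id": 2, "amount": -3},
               {"id": 3, "amount": 5}]

def read_source(rows):
    """Generator standing in for a source reader."""
    for row in rows:
        yield row

def keep(row):
    return row["amount"] > 0

# Rows failing the condition never reach the later pipeline stages,
# so no work is wasted transforming rows that would be discarded.
filtered = (r for r in read_source(source_rows) if keep(r))
loaded = [r["id"] for r in filtered]
print(loaded)
```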
20. When do you create the Source Definition? Can I use this Source Definition with any Transformation?
Ans: When working with a file that contains fixed-width binary data, you must create the
source definition. The Designer displays the source definition as a table, consisting of
names, datatypes, and constraints. To use a source definition in a mapping, connect a
source definition to a Source Qualifier or Normalizer transformation. The Informatica Server
uses these transformations to read the source data.
3.Filter
4.Joiner
5.Normalizer
6.Rank
7.Source Qualifier
Note: If you use PowerConnect to access ERP sources, the ERP Source Qualifier is also an
active transformation. You can connect only one of these active transformations to the
same transformation or target, since the Informatica Server cannot determine how to
concatenate data from different sets of records with different numbers of rows.
Passive transformations that never change the record count include the following:
1.Lookup
2.Expression
3.External Procedure
4.Sequence Generator
5.Stored Procedure
6.Update Strategy
You can connect any number of these passive transformations, or connect one active
transformation with any number of passive transformations, to the same transformation or
target.
2. Type (database data type, such as Char, Varchar, Number, and so on).
3. Tablename (name of the table the field will be part of).
The other information that needs to be stored is the transformation or transformations that
need to be applied to turn the source data into the destination data:
Transformation:
4. Transformation(s)
- Name
- Language (name of the language the transformation is written in)
- Module name
- Syntax
The Name is the unique identifier that differentiates this from any other similar transformations. The Language attribute contains the name of the language that the transformation is written in. The other attributes are module name and syntax. Generally these will be mutually exclusive, with only one being defined. For simple transformations, such as simple SQL functions, the syntax will be stored. For complex transformations, the name of the module that contains the code is stored instead.
Data management:
Metadata is required to describe the data as it resides in the data warehouse. This is needed by the warehouse manager to allow it to track and control all data movements. Every object in the database needs to be described.
Metadata is needed for all the following:
1. Tables
- Columns
  - name
  - type
2. Indexes
- Columns
  - name
  - type
3. Views
- Columns
  - name
  - type
4. Constraints
- name
- type
- table
- columns
Aggregation and partition information also need to be stored in the metadata (for details refer to page #30).
Query Generation:
Metadata is also required by the query manager to enable it to generate queries. The same metadata that is used by the warehouse manager to describe the data in the data warehouse is also required by the query manager.
The query manager will also generate metadata about the queries it has run. This metadata can be used to build a history of all queries run and to generate a query profile for each user, each group of users, and the data warehouse as a whole.
The metadata that is required for each query is:
- query
- tables accessed
- columns accessed
- name
- reference identifier
- restrictions applied
- column name
- table name
- reference identifier
- restriction
25. What are the tasks that are done by Informatica Server?
Ans: The Informatica Server performs the following tasks:
1.Manages the scheduling and execution of sessions and batches
2.Executes sessions and batches
3.Verifies permissions and privileges
4.Interacts with the Server Manager and pmcmd.
The Informatica Server moves data from sources to targets based on metadata stored in a
repository. For instructions on how to move and transform data, the Informatica Server
reads a mapping (a type of metadata that includes transformations and source and target
definitions). Each mapping uses a session to define additional information and to optionally
override mapping-level options. You can group multiple sessions to run as a single unit,
known as a batch.
26. What are the two programs that communicate with the Informatica Server?
Ans: Informatica provides 1.Server Manager and 2.pmcmd programs to communicate with
the Informatica Server:
Server Manager: A client application used to create and manage sessions and batches, and
to monitor and stop the Informatica Server. You can use information provided through the
Server Manager to troubleshoot sessions and improve session performance.
Pmcmd: A command-line program that allows you to start and stop sessions and batches,
stop the Informatica Server, and verify if the Informatica Server is running.
28. (ii) What are the minimum conditions you need to have in order to use the Target Load Order option in the Designer?
Ans: You need to have multiple Source Qualifier transformations.
To specify the order in which the Informatica Server sends data to targets, create one Source
Qualifier or Normalizer transformation for each target within a mapping. To set the target load
order, you then determine the order in which each Source Qualifier sends data to connected
targets in the mapping. When a mapping includes a Joiner transformation, the Informatica
Server sends all records to targets connected to that Joiner at the same time, regardless of
the target load order.
1.Source Analyzer. Use to import or create source definitions for flat file, COBOL, ERP, and relational sources.
2.Warehouse Designer. Use to import or create target definitions.
3.Transformation Developer. Use to create reusable transformations.
4.Mapplet Designer. Use to create mapplets.
5.Mapping Designer. Use to create mappings.
Note:The Designer allows you to work with multiple tools at one time. You can also work
in multiple folders and repositories
To add a slight performance boost, you can also set the tracing level to Terse, writing the
minimum of detail to the session log when running a session containing the transformation.
31. What is the difference between a database, a data warehouse, and a data mart?
Ans:
A database is an organized collection of information.
A data warehouse is a very large database with special sets of tools to extract and cleanse
data from operational systems and to analyze data.
A data mart is a focused subset of a data warehouse that deals with a single area of data
and is organized for quick analysis.
32. What is Data Mart, Data WareHouse and Decision Support System explain
briefly?
Ans: Data Mart:
A data mart is a repository of data gathered from operational data and other sources that is
designed to serve a particular
community of knowledge workers. In scope, the data may derive from an enterprise-wide
database or data warehouse or be more specialized. The emphasis of a data mart is on
meeting the specific demands of a particular group of knowledge users in terms of analysis,
content, presentation, and ease-of-use. Users of a data mart can expect to have data
presented in terms that are familiar.
In practice, the terms data mart and data warehouse each tend to imply the presence of the
other in some form. However, most writers using the term seem to agree that the design of
a data mart tends to start from an analysis of user needs and that a data warehouse tends
to start from an analysis of what data already exists and how it can be collected in such a
way that the data can later be used. A data warehouse is a central aggregation of data
(which can be distributed physically); a data mart is a data repository that may derive from
a data warehouse or not and that emphasizes ease of access and usability for a particular
designed purpose. In general, a data warehouse tends to be a strategic but somewhat
unfinished concept; a data mart tends to be tactical and aimed at meeting an immediate
need.
Data Warehouse:
A data warehouse is a central repository for all or significant parts of the data that an
enterprise's various business systems collect. The term was coined by W. H. Inmon. IBM
sometimes uses the term "information warehouse."
Typically, a data warehouse is housed on an enterprise mainframe server. Data from various
online transaction processing (OLTP) applications and other sources is selectively extracted
and organized on the data warehouse database for use by analytical applications and user
queries. Data warehousing emphasizes the capture of data from diverse sources for useful
analysis and access, but does not generally start from the point of view of the end user or knowledge worker who may need access to specialized, sometimes local, databases. The latter idea is known as the data mart. Data mining, Web mining, and decision support systems (DSS) are three kinds of applications that can make use of a data warehouse.
Typical information that a decision support application might gather and present would be:
Comparative sales figures between one week and the next
Projected revenue figures based on new product sales assumptions
The consequences of different decision alternatives, given past experience in a context that
is described
A decision support system may present information graphically and may include an expert
system or artificial intelligence (AI). It may be aimed at business executives or some other
group of knowledge workers.
34. How do you use DDL commands in a PL/SQL block, e.g., accept a table name from the user and drop it if available, else display a message?
Ans: To invoke DDL commands in PL/SQL blocks we have to use dynamic SQL; the package used is DBMS_SQL.
36. Which package/procedure is used to find/check the free space available for DB objects like tables/procedures/views/synonyms, etc.?
Ans: The Package is DBMS_SPACE
The Procedure is UNUSED_SPACE
The Table is DBA_OBJECTS
37. Does Informatica allow it if EmpId is the primary key in the target table and the source data is 2 rows with the same EmpId? If you use a lookup for the same situation, does it allow loading 2 rows or only 1?
Ans: => No, it will not; it generates a primary key constraint violation. (It loads 1 row.)
=> Even then no, if EmpId is the primary key.
38. If Ename is varchar2(40) from one source (Siebel), Ename is char(100) from another source (Oracle), and the target has Name varchar2(50), then how does Informatica handle this situation? How does Informatica handle string and number datatypes from sources?
Unconnected Lookups:
An unconnected Lookup transformation exists separate from the data flow in the mapping. You write an expression using the :LKP reference qualifier to call the lookup within another transformation.
Some common uses for unconnected lookups include:
1. Testing the results of a lookup in an expression
2. Filtering records based on the lookup results
3. Marking records for update based on the result of a lookup (for example, updating slowly
changing dimension tables)
4. Calling the same lookup multiple times in one mapping
Ex:-
**************************
TO USE EXPLAIN PLAN FOR A QUERY...
**************************
SQL> EXPLAIN PLAN
2 SET STATEMENT_ID = 'PKAR02'
3 FOR
4 SELECT JOB,MAX(SAL)
5 FROM EMP
6 GROUP BY JOB
7 HAVING MAX(SAL) >= 5000;
Explained.
**************************
TO QUERY THE PLAN TABLE :-
**************************
SQL> SELECT RTRIM(ID)||' '||
2 LPAD(' ', 2*(LEVEL-1))||OPERATION
3 ||' '||OPTIONS
4 ||' '||OBJECT_NAME STEP_DESCRIPTION
5 FROM PLAN_TABLE
6 START WITH ID = 0 AND STATEMENT_ID = 'PKAR02'
7 CONNECT BY PRIOR ID = PARENT_ID
8 AND STATEMENT_ID = 'PKAR02'
9 ORDER BY ID;
STEP_DESCRIPTION
----------------------------------------------------
0 SELECT STATEMENT
1 FILTER
2 SORT GROUP BY
3 TABLE ACCESS FULL EMP
For example, to copy a mapping, the Mapping Designer must be active. To copy a Source
Definition, the Source Analyzer must be active.
Copying Mapping:
To copy the mapping, open a workbook.
In the Navigator, click and drag the mapping slightly to the right, not dragging it to the
workbook.
When asked if you want to make a copy, click Yes, then enter a new name and click OK.
Choose Repository-Save.
Repository Copying: You can copy a repository from one database to another. You use this
feature before upgrading, to preserve the original repository. Copying repositories provides
a quick way to copy all metadata you want to use as a basis for a new repository.
If the database into which you plan to copy the repository contains an existing repository,
the Repository Manager deletes the existing repository. If you want to preserve the old
repository, cancel the copy. Then back up the existing repository before copying the new
repository.
To copy a repository, you must have one of the following privileges:
Administer Repository privilege
Super User privilege
To copy a repository:
1. In the Repository Manager, choose Repository-Copy Repository.
2. Select a repository you wish to copy, then enter the following information:
If you are not connected to the repository you want to copy, the Repository Manager asks
you to log in.
3. Click OK.
4. If asked whether you want to delete existing repository data in the second repository, click OK to delete it. Click Cancel to preserve the existing repository.
Copying Sessions:
In the Server Manager, you can copy stand-alone sessions within a folder, or copy sessions
in and out of batches.
To copy a session, you must have one of the following:
Create Sessions and Batches privilege with read and write permission
Super User privilege
To copy a session:
1. In the Server Manager, select the session you wish to copy.
2. Click the Copy Session button or choose Operations-Copy Session.
The Server Manager makes a copy of the session. The Informatica Server names the copy
after the original session, appending a number, such as session_name1.
When the object the shortcut references changes, the shortcut inherits those changes. By using a shortcut instead of a copy, you ensure each use of the shortcut exactly matches the original object. For example, if you have a shortcut to a target definition, and you add a column to the definition, the shortcut automatically inherits the additional column.
Shortcuts allow you to reuse an object without creating multiple objects in the repository. For example, you use a source definition in ten mappings in ten different folders. Instead of creating 10 copies of the same source definition, one in each folder, you can create 10 shortcuts to the original source definition.
You can create shortcuts to objects in shared folders. If you try to create a shortcut to a non-shared folder, the Designer creates a copy of the object instead.
You can save space in your repository by keeping a single repository object and using shortcuts to that object, instead of creating copies of the object in multiple folders or multiple repositories.
You can configure a session to stop if the Informatica Server encounters an error while
executing pre-session shell commands.
For example, you might use a shell command to copy a file from one directory to another.
For a Windows NT server you would use the following shell command to copy the SALES_ADJ file from the target directory, L, to the source, H:
copy L:\sales\sales_adj H:\marketing\
For a UNIX server, you would use the following command line to perform a similar
operation:
cp sales/sales_adj marketing/
Tip: Each shell command runs in the same environment (UNIX or Windows NT) as the
Informatica Server. Environment settings in one shell command script do not carry over to
other scripts. To run all shell commands in the same environment, call a single shell script
that in turn invokes other scripts.
Note: You can only work within one version of a folder at a time.
51. What are the differences between 4.7 and 5.1 versions?
Ans: New transformations were added, such as the XML Transformation and the MQ Series Transformation, and PowerMart and PowerCenter are the same from version 5.1.
53. How many values does it (the Informatica Server) return when it passes through a Connected Lookup and an Unconnected Lookup?
Ans: A Connected Lookup can return multiple values, whereas an Unconnected Lookup will return only one value, that is, the Return Value.
Rank: Filters the top or bottom range of records, based on a condition you enter using an expression.
Update Strategy: Assigns a numeric code to each record based on an expression, indicating whether the Informatica Server should use the information in the record to insert, delete, or update the target.
In each transformation, you use the Expression Editor to enter the expression. The
Expression Editor supports the transformation language for building expressions. The
transformation language uses SQL-like functions, operators, and other components to build
the expression. For example, as in SQL, the transformation language includes the functions
COUNT and SUM. However, the PowerMart/PowerCenter transformation language includes
additional functions not found in SQL.
When you enter the expression, you can use values available through ports. For example, if
the transformation has two input ports representing a price and sales tax rate, you can
calculate the final sales tax using these two values. The ports used in the expression can
appear in the same transformation, or you can use output ports in other transformations.
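The price-and-tax example above reduces to a simple expression over two input ports; a minimal sketch in Python (the port names and values are invented):

```python
# Two input "ports" feed one expression, as in the Expression Editor
# example above. Names and sample values are invented for illustration.
def final_sales_tax(price, sales_tax_rate):
    """Tax owed on a price, given a fractional tax rate."""
    return round(price * sales_tax_rate, 2)

print(final_sales_tax(100.00, 0.0825))
```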
57. In the case of flat files (which come through FTP as source), if a file has not arrived, then what happens? Where do you set this option?
Ans: You get a fatal error which causes the server to fail/stop the session.
You can set the Event-Based Scheduling option in Session Properties under the General tab > Advanced options.
Event-Based Option: Indicator File to Wait For
Required/Optional: Optional
Description: Required to use event-based scheduling. Enter the indicator file (or directory and file) whose arrival schedules the session. If you do not enter a directory, the Informatica Server assumes the file appears in the server variable directory $PMRootDir.
58. What is the Test Load Option and when do you use it in the Server Manager?
Ans: When testing sessions in development, you may not need to process the entire source. If this is true, use the Test Load option (Session Properties > General tab > Target Options: choose Target Load options as Normal, with Test Load checked and a number of rows to test, e.g., 2000). You can also click the Start button.
64. What will happen if you increase the commit interval? And if you decrease the commit interval?
65. What kind of complex mapping did you do? And what sort of problems did you face?
67. Can you refresh the Repository in 4.7 and 5.1? And can you refresh pieces (partially) of the repository in 4.7 and 5.1?
70. BI Faq
Ans: http://www.visionnet.com/bi/bi-faq.shtml
DATA CLEANING: a two-step process including DETECTION and then CORRECTION of errors in a data set.
Repository: The place where you store the metadata is called a repository. The more sophisticated your repository, the more complex and detailed metadata you can store in it. PowerMart and PowerCenter use a relational database as the repository.
76. What is the filename which you need to configure in Unix while installing Informatica?
77. How do you select duplicate rows using Informatica, i.e., how do you use Max(Rowid)/Min(Rowid) in Informatica?
Business Objects
ANS:
BO DESIGNER,
BUSINESS QUERY FOR EXCEL,
BO REPORTER,
INFOVIEW,
EXPLORER,
Data Warehousing Interview Questions
WEBI,
BO PUBLISHER, AND
BROADCAST AGENT,
BO ZABO).
INFOVIEW: IT PORTAL ENTRY INTO WEBINTELLIGENCE & BUSINESS OBJECTS BASE MODULE
REQUIRED FOR ALL OPTIONS TO VIEW AND REFRESH REPORTS.
REPORTER: UPGRADE TO CREATE/MODIFY REPORTS ON LAN OR WEB.
EXPLORER: UPGRADE TO PERFORM OLAP PROCESSING ON LAN OR WEB.
DESIGNER: CREATES SEMANTIC LAYER BETWEEN USER AND DATABASE.
SUPERVISOR: ADMINISTER AND CONTROL ACCESS FOR GROUP OF USERS.
WEBINTELLIGENCE: INTEGRATED QUERY, REPORTING, AND OLAP ANALYSIS OVER THE WEB.
BROADCAST AGENT: USED TO SCHEDULE, RUN, PUBLISH, PUSH, AND BROADCAST PRE-BUILT REPORTS AND SPREADSHEETS, INCLUDING EVENT NOTIFICATION AND RESPONSE CAPABILITIES, EVENT FILTERING, AND CALENDAR-BASED NOTIFICATION, OVER THE LAN, E-MAIL, PAGER, FAX, PERSONAL DIGITAL ASSISTANT (PDA), SHORT MESSAGING SERVICE (SMS), ETC.
SET ANALYZER: APPLIES SET-BASED ANALYSIS TO PERFORM FUNCTIONS SUCH AS EXCLUSIONS, INTERSECTIONS, UNIONS, AND OVERLAPS VISUALLY.
DEVELOPER SUITE: BUILD PACKAGED, ANALYTICAL, OR CUSTOMIZED APPS.
1) There are five sessions in a batch; you want to run the first two in parallel and the next three in sequence. How do you do that?
Ans: We have to go for nested batches.
6) What are the differences between Informatica 4.7, 5.1 and 6.0?
0 Router Transformation is available from 5.0 onwards
1 Debugging in Designer
2 Partitioning of sessions in Session Manager
3 In 6.0, complete heterogeneous targets: one in Oracle, one in DB2, loading into multiple targets
4 Data partitioning runs as multiple sessions in Informatica 6.0
5 Repository Server (new component)
6 Workflow Manager (new component)
7 Workflow Monitor (new component)
7) What is the difference between 5.1 & 6.0?
1) One new transformation is added, called the Sorter transformation; a Sorter transformation can be used before an Aggregator for faster processing.
2) Repository Manager is the same in both.
3) Server Manager is called Workflow Manager.
4) A batch is called a workflow, and sub-batches are called worklets.
5) A workflow is the top-level batch, and you can have sub-batches (worklets) and sessions under the worklets.
6) Sessions can be run independently by right-clicking on them.
7) The output monitoring window in 6.x is called the Workflow Monitor.
DATAWAREHOUSE FAQ :
0 WHAT IS DATAWAREHOUSE?
1 WHO NEEDS DATAWAREHOUSE ?
2 WHAT ARE TYPES OF DATABASE SYSTEMS ?
3 WHAT ARE IMPORTANT CONCERNS OF OLTP AND DSS SYSTEMS ?
4 WHAT IS ARCHITECTURE OF DATAWAREHOUSE?
5 WHAT IS A DATA MART?
6 WHAT ARE CHARACTERISTICS OF DATA WAREHOUSE?
7 WHAT IS DIFFERENCE BETWEEN DATA MART AND DATAWAREHOUSE?
8 WHAT IS DATA MODELING?
9 WHAT IS AN ENTITY, ATTRIBUTE AND RELATIONSHIP?
10 WHAT ARE DIFFERENT TYPES OF RELATIONSHIPS ?
11 WHAT IS DIFFERENCE BETWEEN CARDINALITY AND NULLABILITY?
12 WHAT ARE DIFFERENT STEPS FOR DATA MODELING?
13 WHAT IS A PHYSICAL DATA MODEL?
14 WHAT IS A LOGICAL DATA MODEL ?
15 WHAT IS FORWARD, REVERSE AND RE-ENGINEERING?
16 WHAT IS NORMALIZATION, DENORMALIZATION?
17 WHAT ARE DIFFERENT FORMS OF NORMALIZATION?
18 WHAT IS ETL OR ETT ?
19 WHAT IS A STAR SCHEMA ?
20 WHAT ARE FACT AND DIMENSION TABLES?
21 WHAT IS A STAR-FLAKE OR SNOW-FLAKE SCHEMA?
22 WHAT IS VERY LARGE DATABASE?
23 WHAT ARE SMP AND MPP?
24 WHAT IS PARALLELISM ?
25 WHAT IS A PARALLEL QUERY ?
26 WHAT IS AN OLAP AND WHAT ARE ITS TYPES?
27 HOW OLTP IS DIFFERENT FROM OLAP ?
28 WHAT IS DATA MINING?
29 WHAT IS DIFFERENCE BETWEEN DATAWAREHOUSE AND OLAP?
30 WHAT ARE FACILITIES PROVIDED BY DW TO ANALYTICAL USERS?
39 WHAT IS COGNOS? WHAT ARE IMPORTANT PRODUCTS OF COGNOS AND THEIR USE?
40 WHAT IS A CATALOG OF COGNOS? WHAT ARE DIFFERENT TYPES OF CATALOG?
41 WHAT IS A DIMENSION,LEVEL,CATEGORY AND MEASURE IN COGNOS TRANSFORMER?
42 WHAT IS NAME OF ADMINISTRATOR USER IN COGNOS?
43 HOW TO CREATE A USER AND MANAGE A USER IN COGNOS?
44 WHAT ARE DIFFERENT TYPES OF REPORTS GENERATED USING COGNOS IMPROMPTU?
45 WHAT IS BUSINESS OBJECTS? WHAT ARE IMPORTANT PRODUCTS OF BUSINESS
OBJECTS?
46 WHAT ARE DIFFERENT USER PROFILES OF BUSINESS OBJECTS?
47 WHAT IS A UNIVERSE OF BUSINESS OBJECTS?
48 EXPLAIN USER HIERARCHY IN BUSINESS OBJECTS?
49 WHAT IS A CLASS,OBJECT,DIMENSION,DETAIL,MEASURE OF BUSINESS OBJECTS?
50 WHAT IS AN ETL TOOL? EXPLAIN EXTRACTION,TRANSFORMATION,LOADING
PROCESS?
51 WHAT IS METADATA REPOSITORY OF OWB?
52 WHAT ARE CODE GENERATOR AND INTEGRATORS OF OWB?
INFORMATICA FAQ :
Informatica :
1.Performance Enhancement in ETL.
2.Difficulties faced in ETL jobs? How did you overcome them?
3.What are Mapplets .
4.What are the OLTP Process you worked with ______.
5.What is Lookup Transformation?
6.Tell about the Cache Directory in Lookup.
7.What is a Conformed Dimension?
8.What are Slowly Changing Dimensions and how do you handle them?
9.Can a Mapplet have a Target?
10.How many Transformations are there in Informatica?
11.How do you connect to a remote database in a Mapping?
12.What is Data Driven in the Update Strategy Transformation?
13.What are Mapplets? Which repository objects are not supported in Mapplets, and why?
14.Difference between Oracle Warehouse Builder and Informatica.
15.How do you identify Fact and Dimension tables.
16.Difference between Connected and Unconnected Lookup.
17.What is Source Qualifier.
18.What are the OLTP Process you worked with ___RDBMS___.
Business Objects :
1.What is Universe.
2.How can we create Universe.
3.What are the parameters that we are using at time of Universe Creation.
4.What is Repository.
5.How can we restrict rows in the report in Business Objects.
6.What is .key file.
7.What are Domains.
8.Can we create a report with Data Providers.
9.What are Locks.
10.What is Broadcast Agent?
11.What is troubleshooting in Broadcast Agent?
12.What is an ad hoc report?
13.What are Loops in Business Objects, and how do you handle them?
14.Definition of Universe.
15.Explain grouped cross tab.
16.Who launches the Supervisor?
Business Objects:
2. How do u export the report data into personal files (.txt, .xls)
Open the report containing the data U wants to export in BO
Click the view data command on the data menu.
Click the Export button on the data manager box select the format U
Want to export.
4. What is the difference between applying a sort on the report and a sort in the Query Panel?
Sort on report: click the cell, column, row or chart element containing the data, then click the toolbar button for the sort you want to apply.
Sort in Query Panel: click an object in the Result Objects box, then click the Sort button on the toolbar.
5. What result set do you get if you drag objects with a measure object (sum) into the report? (How many rows do you get in the report?)
One row, with the sum of the measure object.
Synchronous mode allows a user to cancel queries only during the fetch phase. This is the default option.
12.When developing a report, how would you apply a single break with region, division and department in that order?
Go to Format Break -> Edit and add the variables that you want in a single break.
13.What is a context within the Reporter module (not a designer-created context)?
When you extend a formula, you will see contexts like Inbody and Inblock, where Inbody is an input context and Inblock is an output context.
15.Why would the result set you see on the screen, without applying any formatting or filtering, be different from what would be exported?
17.What do you have to do to execute a VBA macro when any document is opened?
Create an add-in in the VBA Editor, save it with a .rea extension, and load this add-in into Business Objects.
18.What is Repository?
It provides a centralized storage location for BO applications and secures access between the BO deployment and the data warehouse.
21. What is the use of Linked Universes, and Link Data Providers?
Linking two or more universes allows you to access multiple databases,
which may be deployed over different servers.
Linked universes are universes that share common components such as
parameters, classes, objects, or joins. One universe is said to be the kernel
or master universe while the others are the derived universes.
There are three approaches to linking: the kernel approach, the master approach, and the component approach.
29. What are cardinalities? What is unknown cardinality, and what does "cardinality not valid" mean?
Cardinality expresses the minimum and maximum number of instances of an
entity B that can be associated with an instance of an entity A.
32. How would you use the same LOV with many different objects?
Double-click the object, go to the Properties tab, copy the code under the List Name box, and paste it into the new object's List Name box.
33. How can you minimize the download time of a universe that has many custom LOVs that are exported with the universe and refreshed upon usage?
Set the "Do not retrieve the data" check box under Object Properties -> Edit -> Options.
43. When you export the universe to the repository, does it go directly to the repository?
Yes, it goes directly to the repository and resides in the universe domain.
46. What are the Pdac.lsi and Sdac.lsi files, and where are they stored?
These are the security and administration files.
Personal Data Account file (Pdac.lsi): stores security information concerning the user's personal connections to the database. Stored in the LocData folder on the client machine.
Shared Data Account file (Sdac.lsi): stores security information concerning the shared-type connections to the database. Stored in the ShData folder on the server.
50. How can you keep bomain.key files synchronized between the Cluster Manager and Cluster Nodes?
By copying the bomain.key file to the server and the client.
53. How does drill-down in WebIntelligence differ from drill-down in Business Objects?
WebI needs WebIntelligence Explorer to drill into reports. In Business Objects, drill-down is available, but to perform drill-down analysis you must install the BUSINESSOBJECTS EXPLORER module.
57. Explain the major differences between full-client reports run through InfoView and reports developed through WebIntelligence.
Full-client reports are compressed in InfoView, whereas Web-I reports are dynamically generated.
59. What is a DMZ?
Configuring the system to use a double firewall between the web server and the application server is called a DMZ (Demilitarized Zone).
Questions
Reader
Reporter
Explorer
Business query for excel
Business query server
Business miner
Designer
Supervisor
2. What is Supervisor?
Supervisor is the product you need in order to set up and maintain a secure environment for Business Objects products. With Supervisor you create the Business Objects repository, then define the users and user groups, as well as assign profiles to them.
General Supervisor: this user alone creates the repository, the first time he launches Supervisor. He creates user profiles, user groups and universes. This user has all the rights of Supervisor and Designer combined, and can access Supervisor, Designer and the Business Objects end-user products.
8. What are the Business Objects end-user products, and what are they for?
Users use BO end-user products to query, report and analyze data. They are:
Business Objects Reporter
Explorer
Business Query for Excel
Reporter and Explorer are used for multidimensional analysis. Business Query for Excel provides the queried data in Excel sheets for analysis.
It is a centralized set of data structures stored in a database. It enables Business Objects users to share resources in a controlled environment. The repository is made up of three domains:
Security domain
Universe domain
Document domain
The security domain contains information on the other domains (the universe and document domains) and on the identification of Business Objects users. The security domain is created with the wizard the first time Supervisor is launched.
The universe domain contains the information on the universes created and exported with Designer. The universe domain makes it possible to store, distribute and administer universes.
Regularly delete old or outdated documents from the repository. You can do this by using the Delete Document command in the Tools menu in Supervisor.
15. What are the factors that you have to consider while choosing the repository database?
A row-level locking mechanism allows for the highest degree of concurrency and minimum conflicts between multiple users accessing and updating data in the same repository domain(s).
When users exchange documents via the repository, the documents are stored in slices in the OBJ_X_documents table of the document domain. Depending on its size and the length of each slice, a single document might be stored in one or more rows of the table.
17.Define Universe?
Slowly Changing Dimensions
The basic assumption while designing a data warehouse is that the data in the warehouse will never change, but over a period of time certain attributes of a dimension will change; this is called an SCD.
Ex: for a customer, the address changes, the phone number changes, the email changes, etc.
TYPE I: we overwrite the previous data with the new data; no history is kept.
TYPE II: we add a new record (row) for each change, so the full history is kept.
TYPE III: we add a new attribute (column) that holds the previous value.
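A minimal Python sketch of the three strategies, applied to an email change on a single dimension row; the key, version and column names are invented for illustration and do not come from any specific tool:

```python
# One dimension row per employee; "key" stands in for a surrogate key
# and "version" for a Type II version number.
dim = [{"key": 1000, "empid": 10, "email": "shane@xyz.com",
        "prev_email": None, "version": 0}]

def scd_type1(empid, new_email):
    """Type I: overwrite the old value in place; no history is kept."""
    for row in dim:
        if row["empid"] == empid:
            row["email"] = new_email

def scd_type2(empid, new_email):
    """Type II: insert a new versioned row; the full history is kept."""
    current = max((r for r in dim if r["empid"] == empid),
                  key=lambda r: r["version"])
    dim.append({"key": max(r["key"] for r in dim) + 1,
                "empid": empid, "email": new_email,
                "prev_email": None,
                "version": current["version"] + 1})

def scd_type3(empid, new_email):
    """Type III: keep only the previous value in an extra column."""
    for row in dim:
        if row["empid"] == empid:
            row["prev_email"], row["email"] = row["email"], new_email

scd_type2(10, "shane@abc.co.in")
print(len(dim))  # 2 rows: version 0 and version 1
```

Note the trade-offs: Type I loses history, Type II grows the table with every change, and Type III remembers only one prior value per tracked column.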
Slowly Changing Dimensions - Type I

Source                             Target
Empid | Name  | Email              Empid | Name  | Email
1001  | Shane | Shane@xyz.com      1001  | Shane | Shane@xyz.com

When the email changes in the source, the target value is simply overwritten and the old value (Shane@xyz.com) is lost:

Source                             Target
Empid | Name  | Email              Empid | Name  | Email
1001  | Shane | Shane@abc.co.in    1001  | Shane | Shane@abc.co.in

Slowly Changing Dimensions - Type II

Source                          Target
Empid | Name  | Email           PM_PRIMARYKEY | Empid | Name  | Email         | PM_VERSION_NUMBER
10    | Shane | Shane@xyz.com   1000          | 10    | Shane | Shane@xyz.com | 0

Type II - Versioning: each change inserts a new row with a new surrogate key and an incremented version number.

Source: Empid 10 | Shane | Shane@abc.com

Target
PM_PRIMARYKEY | Empid | Name  | Email           | PM_VERSION_NUMBER
1000          | 10    | Shane | Shane@xyz.com   | 0
1001          | 10    | Shane | Shane@abc.co.in | 1
1003          | 10    | Shane | Shane@abc.com   | 2

Type II - Flag Current: a PM_CURRENT_FLAG column marks the active row.

Source: Empid 10 | Shane | Shane@xyz.com

Target
PM_PRIMARYKEY | Empid | Name  | Email         | PM_CURRENT_FLAG
1000          | 10    | Shane | Shane@xyz.com | Y

After the email changes to Shane@abc.co.in:

PM_PRIMARYKEY | Empid | Name  | Email           | PM_CURRENT_FLAG
1000          | 10    | Shane | Shane@xyz.com   | N
1001          | 10    | Shane | Shane@abc.co.in | Y

After a further change to Shane@abc.com:

PM_PRIMARYKEY | Empid | Name  | Email           | PM_CURRENT_FLAG
1000          | 10    | Shane | Shane@xyz.com   | N
1001          | 10    | Shane | Shane@abc.co.in | N
1003          | 10    | Shane | Shane@abc.com   | Y

Type II - Effective Date: PM_BEGIN_DATE and PM_END_DATE bracket the period during which each row was current.

Source: Empid 10 | Shane | Shane@xyz.com

Target
PM_PRIMARYKEY | Empid | Name  | Email         | PM_BEGIN_DATE | PM_END_DATE
1000          | 10    | Shane | Shane@xyz.com | 01/01/00      |

After the email changes to Shane@abc.co.in:

PM_PRIMARYKEY | Empid | Name  | Email           | PM_BEGIN_DATE | PM_END_DATE
1000          | 10    | Shane | Shane@xyz.com   | 01/01/00      | 03/01/00
1001          | 10    | Shane | Shane@abc.co.in | 03/01/00      |

After a further change to Shane@abc.com:

PM_PRIMARYKEY | Empid | Name  | Email           | PM_BEGIN_DATE | PM_END_DATE
1000          | 10    | Shane | Shane@xyz.com   | 01/01/00      | 03/01/00
1001          | 10    | Shane | Shane@abc.co.in | 03/01/00      | 05/02/00
1003          | 10    | Shane | Shane@abc.com   | 05/02/00      |

Slowly Changing Dimensions - Type III

Source                          Target
Empid | Name  | Email           PM_PRIMARYKEY | Empid | Name  | Email         | PM_Prev_ColumnName | PM_EFFECT_DATE
10    | Shane | Shane@xyz.com   1             | 10    | Shane | Shane@xyz.com |                    | 01/01/00

After the email changes to Shane@abc.co.in, the previous value moves to PM_Prev_ColumnName:

PM_PRIMARYKEY | Empid | Name  | Email           | PM_Prev_ColumnName | PM_EFFECT_DATE
1             | 10    | Shane | Shane@abc.co.in | Shane@xyz.com      | 01/02/00

After a further change to Shane@abc.com, only the immediately previous value is retained:

PM_PRIMARYKEY | Empid | Name  | Email         | PM_Prev_ColumnName | PM_EFFECT_DATE
1             | 10    | Shane | Shane@abc.com | Shane@abc.co.in    | 01/03/00
A dimension which has a single attribute can have that attribute placed directly in the fact table. This is called a Degenerate Dimension (DD).
2) Metadata
4) Cube
0 Multidimensional databases store information in the form of cubes.
1 A cube is a collection of facts and related dimensions stored together in arrays.
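As a toy illustration of facts and dimensions stored together in arrays, the following Python sketch keeps a sales fact in a dense array with one axis per dimension; the dimension members and the fact are invented for illustration:

```python
# A toy cube stored as a dense nested array: one axis per dimension,
# each cell holding the fact (a sales amount) for that combination
# of dimension members.

regions  = ["East", "West"]   # dimension 1
quarters = ["Q1", "Q2"]       # dimension 2
products = ["Pen", "Book"]    # dimension 3

# cube[r][q][p] -> sales fact for that member combination
cube = [[[0 for _ in products] for _ in quarters] for _ in regions]
cube[0][1][1] = 250   # East, Q2, Book

def cell(region, quarter, product):
    """Address a fact directly by its dimension members."""
    return cube[regions.index(region)] \
               [quarters.index(quarter)] \
               [products.index(product)]

print(cell("East", "Q2", "Book"))  # 250
```

This direct member-to-offset addressing is what gives MOLAP storage its fast retrieval: the cube is pre-materialized rather than computed from joins at query time.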
5) What is OLAP?
6) data cleansing?
7) Transformation
8) Mappings
9) Conformed Dimensions
0 Conformed dimensions are those which are consistent across data marts.
1 Causal dimensions can be used for explaining why a record exists in a fact table.
2 Causal dimensions should not change the grain of the fact table.
Causal Dimensions
• Causal dimensions should not change the grain of the fact table.
Helper Tables
Surrogate Keys
• Joins between fact and dimension tables should be based on surrogate keys
• These keys should be simple integers
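The surrogate-key idea can be sketched in Python: during the dimension load, each natural key is assigned a simple integer key, and fact rows join on that integer rather than on the natural key (the key values and names are illustrative):

```python
# Sketch of surrogate-key assignment during a dimension load:
# each natural key gets a stable, simple integer key, and the
# fact row carries that integer instead of the natural key.

import itertools

key_seq = itertools.count(1000)   # simple integer surrogate keys
dim_lookup = {}                   # natural key -> surrogate key

def surrogate_key(natural_key):
    """Return the existing key for this natural key, or assign one."""
    if natural_key not in dim_lookup:
        dim_lookup[natural_key] = next(key_seq)
    return dim_lookup[natural_key]

fact = {"empid": surrogate_key("EMP-10"), "amount": 99.0}
print(fact["empid"])              # 1000
print(surrogate_key("EMP-10"))    # 1000 again; stable per natural key
```

Keeping the join key a meaningless integer insulates the warehouse from changes in source-system identifiers and keeps fact-table rows narrow.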
Factless Fact Tables
• Coverage tables
• Event tracking tables