Data Warehousing Basics Interview Questions

1.
Data Warehousing Basics Interview Questions & Answers
1. What is hybrid slowly changing dimension?
Ans:- Hybrid SCDs are combination of both SCD 1 and SCD 2.
It may happen that in a table, some columns are important and we

need to track changes for them i.e capture the historical data for them
whereas in some columns even if the data changes, we don't care.
For such tables we implement Hybrid SCDs, where in some columns are
Type 1 and some are Type 2.
You can add that it is not an intelligent key but similar to a sequence
number and tied to a timestamp typically!
2. can a dimension table contains numeric values?
Ans:- Yes.But those datatype will be char (only the values can
numeric/char).
yes,diemesions even contain numerical because these are disscriptive

elements of
ur business
3. How to Create Surrogate Key using Ab Initio?
Ans:- There are many ways to create Surrogatekey but it depends on

your business logic. Here you can try these ways.
1. use next_in_sequence() function in your transform.
2.use Assign key values component (if your gde is higher than 1.10)
3.write a stored proc to this and call this stor proc wherever you need.
Yes, dimension table contains numerics but not contain measures and
facts
4. Differences between star and snowflake schemas ?
Ans:- Star schema
A single fact table with N number of Dimension

Snowflake schema
Any dimensions with extended dimensions are know as snowflake

schema
STAR MODEL :Radical arrangement of the organisation with one fact

table and more dimension tables SNOWFLAKE MODEL: The dimension
table subdivide itself forming new dimension table and start acting as a
fact table.
a datamart is subject oriented data base through which we can analyse

the business in each department in an organisation.
data marts are called high query performance structures
5. What is a CUBE in datawarehousing concept?
Ans:- Cubes are logical representation of multidimensional data.The

edge of the cube contains dimension members and the body of the
cube contains data
values.
6. Difference between Snow flake and Star Schema. What are

situations where Snow
flake Schema is better than Star Schema to use and when the
opposite is true?
Ans:- Star schema contains the dimesion tables mapped around one or
more fact tables.
It is a denormalised model.
No need to use complicated joins.
Queries results fastly.
Snowflake schema
It is the normalised form of Star schema.

contains indepth joins ,bcas the tbales r splitted in to many pieces.We can
easily do modification directly in the tables.
We hav to use comlicated joins ,since we hav more tables .
There will be some delay in processing the Query .
7. What is ER Diagram ?
Ans:- The Entity-Relationship (ER) model was originally proposed by Peter

in 1976 [Chen76] as a way to unify the network and relational database
views.
Simply stated the ER model is a conceptual data model that views the real
world as entities and relationships. A basic component of the model is the
Entity-Relationship diagram which is used to visually represents data
objects.
Since Chen wrote his paper the model has been extended and today it is
commonly used for database design For the database designer, the utility of
the ER model is:
it maps well to the relational model. The constructs used in the ER model
can easily be transformed into relational tables.
it is simple and easy to understand with a minimum of training. Therefore,
the model can be used by the database designer to communicate the design
to the end user.
In addition, the model can be used as a design plan by the database

developer to implement a data model in a specific database management
software.
8. What is degenerate dimension table?
Ans:- Degenerate Dimensions : If a table contains the values, which r

neither dimesion nor measures is called degenerate dimensions.Ex : invoice
id,empno
9. What is VLDB??
Ans:- The perception of what constitutes a VLDB continues to grow. A one

terabyte database would normally be considered to be a VLDB.
degenerate dimension:it doesn't have any link with dimensions and i wont
have any attribute.
10. What is meant by metadata in context of a Datawarehouse and

how it is important?
Ans:- Meta data is the data about data; Business Analyst or data modeler
usually capture information about data - the source (where and how the
data is originated), nature of data (char, varchar, nullable, existance, valid
values etc) and behavior of data (how it is modified / derived and the life
cycle ) in data dictionary a.k.a metadata. Metadata is also presented at the
Datamart level, subsets, fact and dimensions, ODS etc. For a DW user,
metadata provides vital information for analysis / DSS.
11. 1.what is incremintal loading?2.what is batch processing?3.what

is crass reference table?4.what is aggregate fact table??
Ans:- Incremental loading means loading the ongoing changes in the OLTP.
Aggregate table contains the [measure] values ,aggregated

/grouped/summed up to some level of hirarchy.
12. What are the possible data marts in Retail sales.?
Ans:- Product information,sales information
13. What is the main differnce between schema in RDBMS and

schemas in DataWarehouse....?
Ans:- RDBMS Schema

* Used for OLTP systems,* Traditional and old schema,* Normalized
* Difficult to understand and navigate.,* Cannot solve extract and complex
problems
* Poorly modelled
DWH Schema
* Used for OLAP systems.,* New generation schema.,* De Normalized
* Easy to understand and navigate., * Extract and complex problems can be
easily solved .,* Very good model
14. What are the vaious ETL tools in the Market?
Ans:- Various ETL tools used in market are:

1)Informatica.2)Data Stage 3) Oracle Warehouse Bulider 4)Ab Initio.,Data
Junction
Apart from the above Humming Bird Genio, Business Objects Data
Integrator
15. What is Dimensional Modelling?
Ans:- Dimensional Modelling is a design concept used by many data

warehouse desginers to build thier datawarehouse. In this design model all
the data is stored in two types of tables - Facts table and Dimension table.
Fact table contains the facts/measurements of the business and the
dimension table contains the context of measuremnets ie, the dimensions on
which the facts are calculated.
16. What is Data warehosuing Hierarchy?
Ans:- Hierarchies
Hierarchies are logical structures that use ordered levels as a means of
organizing data. A hierarchy can be used to define data aggregation. For
example, in a time dimension, a hierarchy might aggregate data from the
month level to the quarter level to the year level. A hierarchy can also be
used to define a navigational drill path and to establish a family structure.
Within a hierarchy, each level is logically connected to the levels above and
below it. Data values at lower levels aggregate into the data values at higher
levels. A dimension can be composed of more than one hierarchy. For
example, in the product dimension, there might be two hierarchies--one for
product categories and one for product suppliers.
Dimension hierarchies also group levels from general to granular. Query

tools use hierarchies to enable you to drill down into your data to view
different levels of granularity. This is one of the key benefits of a data
warehouse.
When designing hierarchies, you must consider the relationships in business

structures. For example, a divisional multilevel sales organization.
Hierarchies impose a family structure on dimension values. For a particular

level value, a value at the next higher level is its parent, and values at the
next lower level are its children. These familial relationships enable analysts
to access data quickly.
Levels
A level represents a position in a hierarchy. For example, a time dimension
might have a hierarchy that represents data at the month, quarter, and year
levels. Levels range from general to specific, with the root level as the
highest or most general level. The levels in a dimension are organized into
one or more hierarchies.
Level Relationships
Level relationships specify top-to-bottom ordering of levels from most
general (the root) to most specific information. They define the parent-child
relationship between the levels in a hierarchy.
Hierarchies are also essential components in enabling more complex

rewrites. For example, the database can aggregate an existing sales revenue
on a quarterly base to a yearly aggregation when the dimensional
dependencies between quarter and year are known.
17. What is data validation strategies for data mart validation after
loading process?
Ans:- Data validation is to make sure that the loaded data is accurate and
meets the business requriments.
Strategies are different methods followed to meet the validation requriments
18. Wht r the data types present in bo?n wht happens if we

implement view in the designer n report
Ans:- Three different data types: Dimensions,Measure and Detail.
View is nothing but an alias and it can be used to resolve the loops in the
universe.
19. What is surrogate key ? where we use it expalin with

examples???
Ans:- surrogate key is a substitution for the natural primary key.
It is just a unique identifier or number for each row that can be used for the
primary key to the table. The only requirement for a surrogate primary key
is that it is unique for each row in the table.
Data warehouses typically use a surrogate, (also known as artificial or

identity key), key for the dimension tables primary keys. They can use Infa
sequence generator, or Oracle sequence, or SQL Server Identity values for
the surrogate key.
It is useful because the natural primary key (i.e. Customer Number in

Customer table) can change and this makes updates more difficult.
Some tables have columns such as AIRPORT_NAME or CITY_NAME which are

stated as the primary keys (according to the business users) but ,not only
can these change, indexing on a numerical value is probably better and you
could consider creating a surrogate key called, say, AIRPORT_ID. This would
be internal to the system and as far as the client is concerned you may
display only the AIRPORT_NAME.
2. Adapted from response by Vincent on Thursday, March 13, 2003
Another benefit you can get from surrogate keys (SID) is :
Tracking the SCD - Slowly Changing Dimension.
Let me give you a simple, classical example:
On the 1st of January 2002, Employee 'E1' belongs to Business Unit 'BU1'
(that's what would be in your Employee Dimension). This employee has a
turnover allocated to him on the Business Unit 'BU1' But on the 2nd of June
the Employee 'E1' is muted from Business Unit 'BU1' to Business Unit 'BU2.'
All the new turnover have to belong to the new Business Unit 'BU2' but the
old one should Belong to the Business Unit 'BU1.'
If you used the natural business key 'E1' for your employee within your
datawarehouse everything would be allocated to Business Unit 'BU2' even
what actualy belongs to 'BU1.'
If you use surrogate keys, you could create on the 2nd of June a new record
for the Employee 'E1' in your Employee Dimension with a new surrogate key.
This way, in your fact table, you have your old data (before 2nd of June)
with the SID of the Employee 'E1' + 'BU1.' All new data (after 2nd of June)
would take the SID of the employee 'E1' + 'BU2.'
You could consider Slowly Changing Dimension as an enlargement of your

natural key: natural key of the Employee was Employee Code 'E1' but for
you it becomes
Employee Code + Business Unit - 'E1' + 'BU1' or 'E1' + 'BU2.' But the
difference with the natural key enlargement process, is that you might not
have all part of your new key within your fact table, so you might not be
able to do the join on the new enlarge key -> so you need another id.
19. What is a linked cube?
Ans:- Linked cube in which a sub-set of the data can be analysed into great
detail. The linking ensures that the data in the cubes remain consistent.
20. What is a source qualifier?-

Ans:- When you add a relational or a flat file source definition to a mapping,
you need to connect it to a Source Qualifier transformation. The Source
Qualifier represents the rows that the Informatica Server reads when it
executes a session.
21. What are the steps to build the datawarehouse??
Ans:- As far I know...

Gathering bussiness requirements,Identifying Sources,Identifying Facts,
Defining Dimensions,Define Attribues,Redefine Dimensions & Attributes
Organise Attribute Hierarchy & Define Relationship,Assign Unique Identifiers,
Additional convetions:Cardinality/Adding ratios.
1 business modeling 2 data modeling 3 data from the source databases 4
Extration Transformation Loading 5 DataWare house (Data Marts)
22. What are the different architecture of datawarehouse??
Ans:- I think, there are two main things.,

1. Top down - (bill Inmon)
2.Bottom up - (Ralph kimbol)
23. What is the difference between view and materialized view??
Ans:- View - store the SQL statement in the database and let you use it as a
table. Everytime you access the view, the SQL statement executes.
Materialized view - stores the results of the SQL in table form in the
database. SQL statement only executes once and after that everytime you
run the query, the stored result set is used. Pros include quick query results.
24. What is the main difference between Inmon and Kimball

philosophies of data warehousing?
Ans:- Both differed in the concept of building the datawarehosue..

According to Kimball ...
Kimball views data warehousing as a constituency of data marts. Data marts
are focused on delivering business objectives for departments in the
organization. And the data warehouse is a conformed dimension of the data
marts. Hence a unified view of the enterprise can be obtain from the
dimension modeling on a local departmental level.
Inmon beliefs in creating a data warehouse on a subject-by-subject area

basis. Hence the development of the data warehouse can start with data
from the online store. Other subject areas can be added to the data
warehouse as their needs arise. Point-of-sale (POS) data can be added later
if management decides it is necessary.
i.e.,
Kimball--First DataMarts--Combined way ---Datawarehouse
Inmon---First Datawarehouse--Later----Datamarts
25. what is junk dimension?

what is the difference between junk dimension and degenerated
dimension?
Ans:- Junk dimension: Grouping of Random flags and text Attributes in a

dimension and moving them to a separate sub dimension.
Degenerate Dimension: Keeping the control information on Fact table ex:
Consider a Dimension table with fields like order number and order line
number and have 1:1 relationship with Fact table, In this case this
dimension is removed and the order information will be directly stored in a
Fact table inorder eliminate unneccessary joins while retrieving order
information..
26. What is the definition of normalized and denormalized view and

what are the differences between them??
Ans:- Normalization is the process of removing redundancies.

Denormalization is the process of allowing redundancies.
27. Why fact table is in normal form?
Ans:- Basically the fact table consists of the Index keys of the
dimension/ook up tables and the measures.
so when ever we have the keys in a table .that itself implies that the table is
in the normal form.
28. What is Difference between E-R Modeling and Dimentional

Modeling.??
Ans:- Basic diff is E-R modeling will have logical and physical model.
Dimensional model will have only physical model.
E-R modeling is used for normalizing the OLTP database design.
Dimensional modeling is used for de-normalizing the ROLAP/MOLAP design.
29. What is Difference between E-R Modeling and Dimentional

Modeling.??
Ans:- Basic diff is E-R modeling will have logical and physical model.
Dimensional model will have only physical model.
E-R modeling is used for normalizing the OLTP database design.
Dimensional modeling is used for de-normalizing the ROLAP/MOLAP design.
30. What is conformed fact?
Ans:- Conformed dimensions are the dimensions which can be used across
multiple Data Marts in combination with multiple facts tables accordingly??
31. What are the methodologies of Data Warehousing.??
Ans:- Every company has methodology of their own. But to name a few
SDLC Methodology, AIM methodology are stardadly used. Other
methodologies are AMM, World class methodology and many more.
32. What is BUS Schema?
Ans:- BUS Schema is composed of a master suite of confirmed dimension
and standardized definition if facts.
Data Warehousing Concepts Interview Questions and Answers
33. What is the difference between datawarehouse and BI?
Ans:- Simply speaking, BI is the capability of analyzing the data of a

datawarehouse in advantage of that business. A BI tool analyzes the data of
a datawarehouse and to come into some business decision depending on the
result of the analysis.
34. What is a bo repository??

Ans:- Repository means set of database tables, Business object store
security information e.g user, group, access permission, user type etc. ,
universe information e.g. objects, classes, table name, column name,
relation ship etc.and document information..
35. What is the difference between Datawarehousing and

BusinessIntelligence?
Ans:- Data warehousing deals with all aspects of managing the
development, implementation and operation of a data warehouse or data
mart including meta data management, data acquisition, data cleansing,
data transformation, storage management, data distribution, data archiving,
operational reporting, analytical reporting, security management,
backup/recovery planning, etc. Business intelligence, on the other hand, is a
set of software tools that enable an organization to analyze measurable
aspects of their business such as sales performance, profitability, operational
efficiency, effectiveness of marketing campaigns, market penetration among
certain customer groups, cost trends, anomalies and exceptions, etc.
Typically, the term “business intelligence” is used to encompass OLAP, data
visualization, data mining and query/reporting tools.Think of the data
warehouse as the back office and business intelligence as the entire business
including the back office. The business needs the back office on which to
function, but the back office without a business to support, makes no sense.
36. Why do we override the execute method is struts? Plz give me

the details?
Ans:- As part of Struts FrameWork we can decvelop the Action

Servlet,ActionForm servlets(here ActionServlet means which class extends
the Action class is called ActionServlet and ActionFome means which calss
extends the ActionForm calss is called the Action Form servlet)and other
servlets classes.
In case of ActionForm class we can develop the validate().this method will
return the ActionErrors object.In this method we can write the validation
code.If this method return null or ActionErrors with size=0,the webcontainer
will call the execute() as part of the Action class.if it returns size > 0 it
willnot be call the execute().it will execute the jsp,servlet or html file as
value for the input attribute as part of the <action -mapping> attribute in
struts-config.xml file.
In case of Action class the execute() method retuen the ActionForward
object.in execute() we can write (return
mapping.findForward("success");)here mapping is the object for the
ActionMapping class.After that it will forward the request to the "success" jsp
file.(here success is context path for the jsp file,it is written in web.xml.
37. What is snapshot??
Ans:- You can disconnect the report from the catalog to which it is attached
by saving the report with a snapshot of the data. However, you must
reconnect to the catalog if you want to refresh the data.
38. Is OLAP databases are called decision support system ???
true/false?
Ans:- True.
38. What is active data warehousing?
Ans:- An active data warehouse provides information that enables decision-

makers within an organization to manage customer relationships nimbly,
efficiently and proactively. Active data warehousing is all about integrating
advanced decision support with day-to-day-even minute-to-minute-decision
making in a way that increases quality of those customer touches which
encourages customer loyalty and thus secure an organization's bottom line.
The marketplace is coming of age as we progress from first-generation
"passive" decision-support systems to current- and next-generation "active"
data warehouse implementations
39. Why Denormalization is promoted in Universe Designing?
Ans:- In a relational data model, for normalization purposes, some lookup

tables are not merged as a single table. In a dimensional data modeling(star
schema), these tables would be merged as a single table called DIMENSION
table for performance and slicing data.Due to this merging of tables into one
large Dimension table, it comes out of complex intermediate joins.
Dimension tables are directly joined to Fact tables.Though, redundancy of
data occurs in DIMENSION table, size of DIMENSION table is 15% only when
compared to FACT table. So only Denormalization is promoted in Universe
Desinging.
40. explain in detail about type 1,type 2(SCD),type 3 ?
Ans:- Type-1 Most Recent Value.

Type-2(full History) i) Version Number ii) Flag iii) Date
Type-3
Current and one Privileges value
41. Kindly numberWhat are the steps to be taken to schedule the

report?
Ans:- You can schedule any report using Business Objects (reporter) .1)
Open report in BO2) Select option " File->Send To- BCA"3) Select the BCA
name to which report has to be scheduled4) Set other options for report
scheduling like time , any macro , user etc.
42. what is aggregate table and aggregate fact table ... any
examples of both??
Ans:- Aggregate tableÂ contains summarised data. The materialized view
are aggregated tables. For ex in sales we have only date transaction. if we
want to create a report like sales by product per year. in such cases we
aggregate the dateÂ vales into week_agg, month_agg, quarter_agg,
year_agg. to retrive date from this tables we use @aggrtegate function.
43. What is the difference between OLAP and datawarehosue??
Ans:- Datawarehouse is the place where the data is stored for analyzing
where as OLAP is the process of analyzing the data,managing
aggregations,partitioning information into cubes for indepth visualization.
44. What is the difference between ODS and OLTP??
Ans:- ODS:- It is nothing but a collection of tables created in the

Datawarehouse that maintains only current data.
where as OLTP maintains the data only for transactions, these are designed
for recording daily operations and transactions of a business.
ods has broad enterprise wide scope ,but unlike the real enterprise data
warehouse ,data is refreshd in near real time and used for routin business
activity whereas o;yp is online transaction proces which comes across daily
bases data .
ODS: Operational Data Source/Store, The source from where we will get the
data for DWH is called ODS OLTP: We have content(Front End) associated
with the back end.
a fact table without facts.i,e there is no key measure to analyse the
business.
General Datawarehousing Interview Questions and Answers
45. What is real time data-warehousing?
Ans:- Real-time data warehousing is a combination of two things: 1) real-

time activity and 2) data warehousing. Real-time activity is activity that is
happening right now. The activity could be anything such as the sale of
widgets. Once the activity is complete, there is data about it.
Data warehousing captures business activity data. Real-time data

warehousing captures business activity data as it occurs. As soon as the
business activity is complete and there is data about it, the completed
activity data flows into the data warehouse and becomes available instantly.
In other words, real-time data warehousing is a framework for deriving
information from data as the data becomes available.
46. Name some of the real time data-warehousing tools.

Ans:- ??
47. What is ETL?
Ans:- ETL stands for extraction, transformation and loading.
ETL provide developers with an interface for designing source-to-target

mappings, ransformation and job control parameter.
· Extraction
Take data from an external source and move it to the warehouse pre-
processor database.
· Transformation
Transform data task allows point-to-point generating, modifying and
transforming data.
· Loading
Load data task adds records to a database table in a warehouse.
48. What Snow Flake Schema?
Ans:- Snowflake Schema, each dimension has a primary dimension table, to

which one or more additional dimensions can join. The primary dimension
table is the only table that can join to the fact table.
49. What is a dimension table?
Ans:- A dimensional table is a collection of hierarchies and categories along

which the user can drill down and drill up. it contains only the textual
attributes.
50. What are modeling tools available in the Market?
Ans:- There are a number of data modeling tools
Tool Name Company Name,Erwin Computer Associates ,Embarcadero

Embarcadero Technologies ,Rational Rose IBM Corporation ,Power Designer
Sybase Corporation ,
Oracle Designer Oracle Corporation
51. What are the vaious ETL tools in the Market?

Ans:- Various ETL tools used in market are:
1. Informatica,2. Data Stage,3. MS-SQL DTS(Integrated Services 2005)

4. Abinitio,5. SQL Loader,6. Sunopsis,7. Oracle Warehouse Bulider,
8. Data Junction
52. What is Dimensional Modelling? Why is it important ?
Ans:- Dimensional Modelling is a design concept used by many data

warehouse desginers to build thier datawarehouse. In this design model all
the data is stored in two types of tables - Facts table and Dimension table.
Fact table contains the facts/measurements of the business and the
dimension table contains the context of measuremnets ie, the dimensions on
which the facts are calculated.
Why is Data Modeling Important?

---------------------------------------
Data modeling is probably the most labor intensive and time consuming part
of the development process. Why bother especially if you are pressed for
time? A common response by practitioners who write on the subject is that
you should no more build a database without a model than you should build
a house without blueprints.
The goal of the data model is to make sure that the all data objects required
by the database are completely and accurately represented. Because the
data model uses easily understood notations and natural language , it can be
reviewed and verified as correct by the end-users.
The data model is also detailed enough to be used by the database

developers to use as a "blueprint" for building the physical database. The
information contained in the data model will be used to define the relational
tables, primary and foreign keys, stored procedures, and triggers. A poorly
designed database will require more time in the long-term. Without careful
planning you may create a database that omits data required to create
critical reports, produces results that are incorrect or inconsistent, and is
unable to accommodate changes in the user's requirements.
53. What is ER Diagram?
Ans:- The Entity-Relationship (ER) model was originally proposed by Peter in

1976 [Chen76] as a way to unify the network and relational database views.
Simply stated the ER model is a conceptual data model that views the real
world as entities and relationships. A basic component of the model is the
Entity-Relationship diagram which is used to visually represents data
objects.
Since Chen wrote his paper the model has been extended and today it is
commonly used for database design For the database designer, the utility of
the ER model is:
It maps well to the relational model. The constructs used in the ER model
can easily be transformed into relational tables.
It is simple and easy to understand with a minimum of training. Therefore,
the model can be used by the database designer to communicate the design
to the end user.
In addition, the model can be used as a design plan by the database

developer to implement a data model in a specific database management
software.
54. What is a Star Schema?
Ans:- Star schema is a type of organising the tables such that we can
retrieve the result from the database easily and fastly in the warehouse
environment.Usually a star schema consists of one or more dimension tables
around a fact table which looks like a star,so that it got its name.
55. What are Aggregate tables?
Ans:- Aggregate table contains the summary of existing warehouse data

which is grouped to certain levels of dimensions.Retrieving the required data
from the actual table, which have millions of records will take more time and
also affects the server performance.To avoid this we can aggregate the table
to certain required level and can use it.This tables reduces the load in the
database server and increases the performance of the query and can
retrieve the result very fastly.
56. What are the various Reporting tools in the Market?
Ans:- 1. MS-Excel ,2. Business Objects (Crystal Reports) ,3. Cognos

(Impromptu, Power Play) ,4. Microstrategy ,5. MS reporting services ,6.
Informatica Power Analyzer ,
7. Actuate ,8. Hyperion (BRIO) ,9. Oracle Express OLAP ,10. Proclarity,
57. What is the Difference between OLTP and OLAP?

Ans:- Main Differences between OLTP and OLAP are:-
1.User and System Orientation
OLTP: Customer-oriented, used for data analysis and querying by clerks,

clients and IT professionals.
OLAP: Market-oriented, used for data analysis by knowledge

workers( managers, executives, analysis).
2. Data Contents
OLTP: Manages current data, very detail-oriented.
OLAP: Manages large amounts of historical data, provides facilities for

summarization and aggregation, stores information at different levels of
granularity to support decision making process.
3. Database Design
OLTP: Adopts an entity relationship(ER) model and an application-oriented

database design.
OLAP: Adopts star, snowflake or fact constellation model and a subject-

oriented database design.
4. View
OLTP: Focuses on the current data within an enterprise or department.
OLAP: Spans multiple versions of a database schema due to the evolutionary

process of an organization; integrates information from many organizational
locations and data stores
57. What is Fact table?
Ans:- Fact Table contains the measurements or metrics or facts of business

process. If your business process is "Sales" , then a measurement of this
business process such as "monthly sales number" is captured in the Fact
table. Fact table also contains the foriegn keys for the dimension tables.
58. What are the Different methods of loading Dimension tables?

Ans:- Conventional Load:
Before loading the data, all the Table constraints will be checked against the
data.
Direct load:(Faster Loading)

All the Constraints will be disabled. Data will be loaded directly.Later the
data will be checked against the table constraints and the bad data won't be
indexed.
58. What is a lookup table?
Ans:- A lookUp table is the one which is used when updating a warehouse.
When the lookup is placed on the target table (fact table / warehouse) based
upon the primary key of the target, it just updates the table by allowing only
new records or updated records based on the lookup condition.
59. What is a general purpose scheduling tool?
Ans:- The basic purpose of the scheduling tool in a DW Application is to

stream line the flow of data from Source To Target at specific time or based
on some condition.
60. Name some of modeling tools available in the Market?
Ans:- These tools are used for Data/dimension modeling
1. Oracle Designer, 2. ERW in (Entity Relationship for windows),

3. Informatica (Cubes/Dimensions),4. Embarcadero,5. Power Designer
Sybase.
61. What are Data Marts?
Ans:- Data Marts are designed to help manager make strategic decisions
about their business.
Data Marts are subset of the corporate-wide data that is of value to a
specific group of users.
There are two types of Data Marts:
1.Independent data marts – sources from data captured form OLTP system,
external providers or from data generated locally within a particular
department or geographic area.
2.Dependent data mart – sources directly form enterprise data warehouses.
62. What is a data warehousing?

Ans:- Data Warehouse is a repository of integrated information, available for
queries and analysis. Data and information are extracted from
heterogeneous sources as they are generated.This makes it much easier and
more efficient to run queries over data that originally came from different
sources.
Typical relational databases are designed for on-line transactional processing

(OLTP) and do not meet the requirements for effective on-line analytical
processing (OLAP). As a result, data warehouses are designed differently
than traditional relational databases.
a data ware housing is analysing the business this is timevarient,non

volitail,integreated and subject oriented a data ware house is support
decision making.
63. Differences between star and snowflake schemas?
Ans:- Star schema - all dimensions will be linked directly with a fat table.
Snow schema - dimensions maybe interlinked or may have one-to-many
relationship with other tables.
64. Which columns go to the fact table and which columns go the
dimension table?
Ans:- The Primary Key columns of the Tables(Entities) go to the Dimension

Tables as Foreign Keys.
The Primary Key columns of the Dimension Tables go to the Fact Tables as
Foreign Keys.
65. What is ODS?
Ans:- 1. ODS means Operational Data Store.
2. A collection of operation or bases data that is extracted from

operation databases and standardized, cleansed, consolidated, transformed,
and loaded into an enterprise data architecture. An ODS is used to support
data mining of operational data, or as the store for base data that is
summarized for a data warehouse. The ODS may also be used to audit the
data warehouse to assure summarized and derived data is calculated
properly. The ODS may further become the enterprise shared operational
database, allowing operational systems that are being reengineered to use
the ODS as there operation databases.
66. What does level of Granularity of a fact table signify?
Ans:- Granularity
The first step in designing a fact table is to
determine the granularity of the fact table. By
granularity, we mean the lowest level of information
that will be stored in the fact table. This
constitutes two steps:
Determine which dimensions will be included.

Determine where along the hierarchy of each dimension
the information will be kept.
The determining factors usually goes back to the
Requirements
67. How are the Dimension tables designed?
Ans:- Most dimension tables are designed using Normalization principles

upto 2NF. In some instances they are further normalized to 3NF.
Find where data for this dimension are located.
Figure out how to extract this data.
Determine how to maintain changes to this dimension (see more on this in
the next section).
Change fact table and DW population routines.
68. What are conformed dimensions?

Ans:- Conformed dimentions are dimensions which are common to the
cubes.(cubes are the schemas contains facts and dimension tables)
Consider Cube-1 contains F1,D1,D2,D3 and Cube-2 contains F2,D1,D2,D4

are the Facts and Dimensions
here D1,D2 are the Conformed Dimensions
69. What is a level of Granularity of a fact table?
Ans:- Level of granularity means level of detail that you put into the fact
table in a data warehouse. For example: Based on design you can decide to
put the sales data in each transaction. Now, level of granularity would mean
what detail are you willing to put for each transactional fact. Product sales
with respect to each minute or you want to aggregate it upto minute and put
that data.
70. What is VLDB?

Ans:- It is an environment or storage space managed by a relational
database management system (RDBMS) consisting of vast quantities of
information.
VLDB doesnt refer to size of database or vast amount of information stored.

It refers to the window of opportunity to take back up the database.
Window of opportunity refers to the time of interval and if the DBA was
unable to take back up in the specified time then the database was
considered as VLDB.
71. What is SCD1 , SCD2 , SCD3?
Ans:- SCD Stands for Slowly changing dimensions.

SCD1: only maintained updated values.
Ex: a customer address modified we update existing record with new
address.
SCD2: maintaining historical information and current information by using

A) Effective Date,B) Versions,C) Flags,or combination of these
SCD3: by adding new columns to target table we maintain historical

information and current information.
72. What are slowly changing dimensions?
Ans:- SCD stands for Slowly changing dimensions. Slowly changing

dimensions are of three types
SCD1: only maintained updated values.
Ex: A customer address modified we update existing record with new
address.
SCD2: Maintaining historical information and current information by using
A) Effective Date,B) Versions,C) Flags,or combination of these
scd3: By adding new columns to target table we maintain historical
information and current information
73. What are Semi-additive and factless facts and in which scenario
will you use such kinds of fact tables?
Ans:- Snapshot facts are semi-additive, while we maintain aggregated facts

we go for semi-additive.
EX: Average daily balance
A fact table without numeric fact columns is called factless fact table.
Ex: Promotion Facts
While maintain the promotion values of the transaction (ex: product
samples) because this table doesn’t contain any measures.
74. What are non-additive facts?
Ans:- Non-Additive: Non-additive facts are facts that cannot

be summed up for any of the dimensions present in the
fact table.
75. What are conformed dimensions?
Ans:- Conformed dimensions mean the exact same thing with every possible
fact table to which they are joined
Ex:Date Dimensions is connected all facts like Sales facts, Inventory
facts..etc
76. How do you load the time dimension?
Ans:- Time dimensions are usually loaded by a program that

loops through all possible dates that may appear in
the data. It is not unusual for 100 years to be
represented in a time dimension, with one row per day.
77. Why are OLTP database designs not generally a good idea for a
Data Warehouse?
Ans:- Since in OLTP,tables are normalised and hence query response will be
slow for end user and OLTP doesnot contain years of data and hence cannot
be analysed.
78. What type of Indexing mechanism do we need to use for a

typical datawarehouse?
ANs:- On the fact table it is best to use bitmap indexes. Dimension tables
can use bitmap and/or the other types of clustered/non-clustered,
unique/non-unique indexes.
To my knowledge, SQLServer does not support bitmap indexes. Only Oracle
supports bitmaps.
79. Why should you put your data warehouse on a different system
than your OLTP system?
Ans:- A OLTP system is basically " data oriented " (ER model) and not "
Subject oriented "(Dimensional Model) .That is why we design a separate
system that will have a subject oriented OLAP system...
Moreover if a complex querry is fired on a OLTP system will cause a heavy
overhead on the OLTP server that will affect the daytoday business directly.
The loading of a warehouse will likely consume a lot of machine
resources. Additionally, users may create querries or reports that are very
resource intensive
because of the potentially large amount of data available. Such loads and
resource needs will conflict with the needs of the OLTP systems for resources
and will negatively impact those production systems.
80. Explain the advanatages of RAID 1, 1/0, and 5. What type of

RAID setup would you put your TX logs
Ans:- Transaction logs write sequentially and don't need to be read at all.
The ideal is to have each on RAID 1/0 because it has much better write
performance than RAID 5.
RAID 1 is also better for TX logs and costs less than 1/0 to implement. It has
a tad less reliability and performance is a little worse generally speaking.
RAID 5 is best for data generally because of cost and the fact it provides
great read capability.
81. What is Normalization, First Normal Form, Second Normal Form ,

Third Normal Form?
Ans:- 1.Normalization is process for assigning attributes to entities–

Reducesdata redundancies–Helps eliminate data anomalies–Produces
controlledredundancies to link tables
2.Normalization is the analysis offunctional dependency between attributes /

data items of userviews,It reduces a complex user view to a set of small
andstable subgroups of fields / relations
1NF:Repeating groups must beeliminated, Dependencies can be identified,

All key attributesdefined,No repeating groups in table
2NF: The Table is already in1NF,Includes no partial dependencies–No

attribute dependent on a portionof primary key, Still possible to exhibit
transitivedependency,Attributes may be functionally dependent on non-
keyattributes.
3NF: The Table is already in 2NF, Contains no transitivedependencies.
82. What is data mining?
Ans:- Data mining is a process of extracting hidden trends within a

datawarehouse. For example an insurance dataware house can be used to
mine data for the most high risk people to insure in a certain geographial
area.
***********************PAGE 1 and PAGE 2

Complete********************
http://www.newinterviewquestions.com/cat/General-Datawarehousing-
interview-questions

Data Warehousing Basics Interview Questions

Загружено:

Сведения о документе

Исходное описание:

Авторское право

Доступные форматы

Поделиться этим документом

Поделиться или встроить документ

Параметры публикации

Этот документ был вам полезен?

Это неприемлемый материал?

Авторское право:

Доступные форматы

Data Warehousing Basics Interview Questions

Загружено:

Авторское право:

Доступные форматы

1.

Data Warehousing Basics Interview Questions & Answers

1. What is hybrid slowly changing dimension?

Ans:- Hybrid SCDs are combination of both SCD 1 and SCD 2.

It may happen that in a table, some columns are important and we

2. can a dimension table contains numeric values?

yes,diemesions even contain numerical because these are disscriptive

3. How to Create Surrogate Key using Ab Initio?

Ans:- There are many ways to create Surrogatekey but it depends on

1. use next_in_sequence() function in your transform.

4. Differences between star and snowflake schemas ?

Ans:- Star schema

A single fact table with N number of Dimension

Any dimensions with extended dimensions are know as snowflake

STAR MODEL :Radical arrangement of the organisation with one fact

a datamart is subject oriented data base through which we can analyse

data marts are called high query performance structures

5. What is a CUBE in datawarehousing concept?

Ans:- Cubes are logical representation of multidimensional data.The

6. Difference between Snow flake and Star Schema. What are

No need to use complicated joins.

Queries results fastly.

It is the normalised form of Star schema.

We hav to use comlicated joins ,since we hav more tables .

There will be some delay in processing the Query .

Ans:- The Entity-Relationship (ER) model was originally proposed by Peter

In addition, the model can be used as a design plan by the database

8. What is degenerate dimension table?

Ans:- Degenerate Dimensions : If a table contains the values, which r

Ans:- The perception of what constitutes a VLDB continues to grow. A one

10. What is meant by metadata in context of a Datawarehouse and

11. 1.what is incremintal loading?2.what is batch processing?3.what

Aggregate table contains the [measure] values ,aggregated

12. What are the possible data marts in Retail sales.?

Ans:- Product information,sales information

13. What is the main differnce between schema in RDBMS and

Ans:- RDBMS Schema

14. What are the vaious ETL tools in the Market?

Ans:- Various ETL tools used in market are:

15. What is Dimensional Modelling?

Ans:- Dimensional Modelling is a design concept used by many data

16. What is Data warehosuing Hierarchy?

Dimension hierarchies also group levels from general to granular. Query

When designing hierarchies, you must consider the relationships in business

Hierarchies impose a family structure on dimension values. For a particular

Hierarchies are also essential components in enabling more complex

Strategies are different methods followed to meet the validation requriments

18. Wht r the data types present in bo?n wht happens if we

Ans:- Three different data types: Dimensions,Measure and Detail.

19. What is surrogate key ? where we use it expalin with

Ans:- surrogate key is a substitution for the natural primary key.

Data warehouses typically use a surrogate, (also known as artificial or

It is useful because the natural primary key (i.e. Customer Number in

Some tables have columns such as AIRPORT_NAME or CITY_NAME which are

2. Adapted from response by Vincent on Thursday, March 13, 2003

Another benefit you can get from surrogate keys (SID) is :

Tracking the SCD - Slowly Changing Dimension.

Let me give you a simple, classical example:

You could consider Slowly Changing Dimension as an enlargement of your

19. What is a linked cube?

20. What is a source qualifier?-

21. What are the steps to build the datawarehouse??

Ans:- As far I know...