
1. Define Data Warehouse?
A data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision-making process.
2. What is a junk dimension? What is the difference between a junk dimension and a degenerate dimension?
A "junk" dimension is a collection of random transactional codes, flags and/or text attributes that are unrelated to any particular dimension. The junk dimension is simply a structure that provides a convenient place to store the junk attributes. A degenerate dimension, by contrast, is data that is dimensional in nature but stored in the fact table.
Junk dimension: columns that are rarely or never used on their own (miscellaneous flags and indicators) are grouped together to form one dimension, which is called a junk dimension.
Degenerate dimension: a dimensional column that is kept in the fact table itself and has no dimension table of its own.
Ex.: the EMP table has empno, ename, sal, job, deptno. If only the identifier empno is carried into the fact table, with no separate EMP dimension table behind it, empno is a degenerate dimension.
3. Difference between Normalization and Denormalization?
Normalization is the process of removing redundancies. OLTP uses the normalization process.
Denormalization is the process of allowing redundancies. OLAP/DW uses the denormalized process to capture a greater level of detailed data (each and every transaction).
4. Why is the fact table in normal form?
A fact table consists of measurements of the business requirements and foreign keys to the dimension tables, as per the business rules.
A star schema as a whole is denormalized, but within it the fact table holds just surrogate keys (SKs) and measures. It would be a different matter if the dimensions carried foreign keys (FKs) of their own as well. Because the fact table is in normal form, more granularity is achieved with less coding, i.e. fewer joins while retrieving the facts.
5. What is the difference between E-R Modeling and Dimensional Modeling?
The basic difference is that E-R modeling has both a logical and a physical model, whereas a dimensional model has only a physical model. E-R modeling is used for normalizing the OLTP database design. Dimensional modeling is used for de-normalizing the ROLAP/MOLAP design.
Adding to the point:
E-R modeling revolves around the entities and their relationships, to capture the overall process of the system.
Dimensional (multidimensional) modeling revolves around dimensions (points of analysis) for decision-making, not around capturing the process.
In E-R modeling the data is in normalized form, so queries need more joins, which may adversely affect system performance. In dimensional modeling the data is denormalized, so queries need fewer joins and system performance improves.
6. What is a conformed fact?
Conformed facts are facts that are allowed to have the same name in separate tables and can be combined and compared mathematically.
Conformed dimensions are dimensions that can be used across multiple data marts in combination with multiple fact tables. They are tables with a fixed structure: there is no need to change their metadata, and they can go along with any number of facts in the application without any changes. In short, a dimension table that is used by more than one fact table is known as a conformed dimension.
7. What are the methodologies of Data Warehousing?
There are mainly 2 methods.
1. Ralph Kimball model (bottom-up approach: Data Marts --> Data Warehouse)
The Kimball model is always structured as a denormalized structure.
2. Inmon model (top-down approach: Data Warehouse --> Data Marts)
The Inmon model is structured as a normalized structure.
8. What are data validation strategies for data mart validation after the loading process?
Data validation makes sure that the loaded data is accurate and meets the business requirements. Strategies are the different methods followed to meet the validation requirements.
9. What is a surrogate key?
A surrogate key is the primary key of a dimension table. It is a substitute for the natural primary key.
Data warehouses typically use a surrogate key (also known as an artificial or identity key) as the primary key of the dimension tables. The values can come from an Informatica (Infa) sequence generator, an Oracle sequence, or SQL Server identity values.
It is useful because the natural primary key (e.g. Customer Number in the Customer table) can change, which makes updates more difficult. Surrogate keys are also used in SCDs to preserve historical data.
10. What is meant by metadata in the context of a Data Warehouse, and why is it important?
Metadata (or meta data) is data about data. Examples of metadata include data element descriptions, data type descriptions, attribute/property descriptions, range/domain descriptions, and process/method descriptions. The repository environment encompasses all corporate metadata resources: database catalogs, data dictionaries, and navigation services. Metadata includes things like the name, length, valid values, and description of a data element. Metadata is stored in a data dictionary and repository. It insulates the data warehouse from changes in the schema of operational systems.
Metadata synchronization is the process of consolidating, relating and synchronizing data elements with the same or similar meaning from different systems. Metadata synchronization joins these differing elements together in the data warehouse to allow for easier access.
In the context of a data warehouse, metadata means the information about the data. This information is stored in the designer repository. A business analyst or data modeler usually captures information about the data in the data dictionary, a.k.a. metadata: the source (where and how the data originated), the nature of the data (char, varchar, nullable, existence, valid values, etc.) and the behavior of the data (how it is modified or derived, and its life cycle). Metadata is also present at the data mart level, and for subsets, facts and dimensions, the ODS, etc. For a DW user, metadata provides vital information for analysis / DSS.
11. What are the possible data marts in Retail sales?
Product information, sales information.
12. What is the main difference between the schema in an RDBMS and schemas in a Data Warehouse?
RDBMS Schema
• Used for OLTP systems
• Traditional and old schema
• Normalized
• Difficult to understand and navigate
• Cannot solve extraction and complex analytical problems
• Poorly modelled for analysis
DWH Schema
• Used for OLAP systems
• New generation schema
• Denormalized
• Easy to understand and navigate
• Extraction and complex analytical problems can be easily solved
• Very good model for analysis
13. What is Dimensional Modeling?
In dimensional modeling, data is stored in two kinds of tables: fact tables and dimension tables.
A fact table contains fact data, e.g. sales, revenue, profit.
A dimension table contains dimensional data such as product id, product name, product description.
Dimensional modeling is a design concept used by many data warehouse designers to build their data warehouse. The fact table contains the facts/measurements of the business, and the dimension table contains the context of the measurements, i.e. the dimensions on which the facts are calculated.
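
The two kinds of tables can be illustrated with a minimal sketch using Python's built-in sqlite3 module. The table and column names (dim_product, fact_sales, product_key) and the sample rows are hypothetical, chosen only to mirror the product and sales example in the answer.

```python
import sqlite3

# Hypothetical table and column names, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_product (
        product_key  INTEGER PRIMARY KEY,   -- key referenced by the fact table
        product_name TEXT,
        description  TEXT
    );
    CREATE TABLE fact_sales (
        product_key INTEGER REFERENCES dim_product(product_key),
        revenue     REAL,                   -- measure
        profit      REAL                    -- measure
    );
    INSERT INTO dim_product VALUES
        (1, 'Pen', 'Blue ballpoint'), (2, 'Notebook', 'A5 ruled');
    INSERT INTO fact_sales VALUES
        (1, 10.0, 4.0), (1, 5.0, 2.0), (2, 20.0, 6.0);
""")

def revenue_by_product(connection):
    """Sum the fact measure in the context supplied by the dimension."""
    rows = connection.execute("""
        SELECT d.product_name, SUM(f.revenue)
        FROM fact_sales AS f
        JOIN dim_product AS d ON d.product_key = f.product_key
        GROUP BY d.product_name
        ORDER BY d.product_name
    """).fetchall()
    return dict(rows)

totals = revenue_by_product(conn)
```

The measures live only in the fact table; the dimension supplies the labels that the totals are grouped by.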
14. Why is Data Modeling Important?
The data model is detailed enough to be used by the database developers as a "blueprint" for building the physical database. The information contained in the data model will be used to define the relational tables, primary and foreign keys, stored procedures, and triggers. A poorly designed database will require more time in the long term. Without careful planning you may create a database that omits data required to create critical reports, produces results that are incorrect or inconsistent, and is unable to accommodate changes in the user's requirements.
15. What does the level of Granularity of a fact table signify?
The level of granularity indicates the extent of aggregation that will be permitted to take place on the fact data. More granularity implies more aggregation potential, and vice versa. It also drives the amount of space the database requires, since a finer grain means more rows. In simple terms, the level of granularity defines the extent of detail. As an example, let us look at a geographical level of granularity. We may analyze data at the levels of COUNTRY, REGION, TERRITORY, CITY and STREET. In this case, we say the finest (most detailed) level of granularity is STREET.
Level of granularity means the upper/lower level of the hierarchy up to which we can see/drill the data in the fact table.
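
As a rough sketch of the geography example above, the following Python snippet stores hypothetical fact rows at the STREET grain and rolls them up to coarser levels. The place names, amounts and the roll_up helper are invented for illustration.

```python
from collections import defaultdict

# Hypothetical fact rows stored at the finest grain (STREET).
# Each row: (country, region, territory, city, street, sales_amount)
FACT_ROWS = [
    ("US", "West", "WA", "Seattle", "Pine St", 100),
    ("US", "West", "WA", "Seattle", "Pike St", 50),
    ("US", "West", "OR", "Portland", "Oak St", 70),
    ("US", "East", "MA", "Boston", "Elm St", 30),
]
LEVELS = ["country", "region", "territory", "city", "street"]

def roll_up(rows, level):
    """Sum sales at the requested level of the geography hierarchy."""
    depth = LEVELS.index(level) + 1   # leading columns that form the group key
    totals = defaultdict(int)
    for row in rows:
        totals[row[:depth]] += row[-1]
    return dict(totals)

by_city = roll_up(FACT_ROWS, "city")
by_country = roll_up(FACT_ROWS, "country")
```

Because the rows are kept at STREET level, every coarser level can be derived from them; had the data been stored at CITY level, the street detail could not be recovered.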
16. What is a degenerate dimension table?
Dimension values that are stored in the fact table are called degenerate dimensions. These dimensions do not have dimension tables of their own.
17. How do you load the time dimension?
In a data warehouse we load the time dimension manually. Every data warehouse maintains a time dimension. It is kept at the most granular level at which the business runs (e.g. week day, day of the month and so on). Depending on the data loads, the time dimension is updated: a weekly process updates it every week, and a monthly process every month.
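
One common way to populate a time dimension is to generate it with a small script rather than extract it from a source system. The sketch below assumes a daily grain; the column names and the YYYYMMDD style of date_key are illustrative assumptions, not something the answer prescribes.

```python
from datetime import date, timedelta

def build_date_dimension(start, end):
    """Return one row per calendar day from start to end, inclusive."""
    rows = []
    current = start
    while current <= end:
        rows.append({
            "date_key": int(current.strftime("%Y%m%d")),  # assumed key convention
            "full_date": current,
            "day_of_month": current.day,
            "week_day": current.isoweekday(),             # 1 = Monday ... 7 = Sunday
            "month": current.month,
            "year": current.year,
        })
        current += timedelta(days=1)
    return rows

# A monthly load would call this once per month with that month's range.
january = build_date_dimension(date(2024, 1, 1), date(2024, 1, 31))
```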
18. Difference between Snowflake and Star Schema. In which situations is Snowflake Schema better than Star Schema, and when is the opposite true?
Star schema and snowflake schema both serve the purpose of dimensional modeling when it comes to data warehouses.
A star schema is a dimensional model with a (large) fact table and a set of (small) dimension tables. The whole set-up is totally denormalized.
Where the dimension tables are split into many tables, so that the schema is slightly inclined towards normalization (to reduce redundancy and dependency), we get the snowflake schema.
The nature/purpose of the data that is to be fed to the model is the key to the question of which is better.
Star schema
Contains the dimension tables mapped around one or more fact tables.
It is a denormalized model.
No need to use complicated joins.
Queries return results quickly.
Snowflake schema
It is the normalized form of the star schema.
Contains in-depth joins, because the tables are split into many pieces.
Modifications can easily be made directly in the tables.
We have to use complicated joins, since we have more tables.
There will be some delay in processing the query.
19. Why do you need Star schema?
1) It needs fewer joins
2) Simple database design
3) Supports drill-up options
20. Why do you need Snowflake schema?
Sometimes we need to split separate dimensions out of existing dimensions; in that case we go for a snowflake schema.
Disadvantage of snowflake: query performance is lower because more joins are involved.
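
To make the join difference concrete, here is a small sketch with sqlite3 that answers the same question ("sales by category") against a star-style and a snowflake-style product dimension. All table names, column names and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fact_sales (product_key INTEGER, amount INTEGER);

    -- Star: the category name is stored redundantly in the product dimension.
    CREATE TABLE dim_product_star (
        product_key INTEGER PRIMARY KEY, product_name TEXT, category_name TEXT);

    -- Snowflake: the category is split out into its own table.
    CREATE TABLE dim_category (
        category_key INTEGER PRIMARY KEY, category_name TEXT);
    CREATE TABLE dim_product_snow (
        product_key INTEGER PRIMARY KEY, product_name TEXT, category_key INTEGER);

    INSERT INTO fact_sales VALUES (1, 10), (2, 20), (3, 5);
    INSERT INTO dim_product_star VALUES
        (1, 'Pen', 'Stationery'), (2, 'Pencil', 'Stationery'), (3, 'Mug', 'Kitchen');
    INSERT INTO dim_category VALUES (10, 'Stationery'), (20, 'Kitchen');
    INSERT INTO dim_product_snow VALUES (1, 'Pen', 10), (2, 'Pencil', 10), (3, 'Mug', 20);
""")

# One join is enough in the star layout.
STAR_QUERY = """
    SELECT d.category_name, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product_star AS d ON d.product_key = f.product_key
    GROUP BY d.category_name
    ORDER BY d.category_name
"""

# The snowflake layout needs an extra join to reach the category name.
SNOWFLAKE_QUERY = """
    SELECT c.category_name, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product_snow AS p ON p.product_key = f.product_key
    JOIN dim_category AS c ON c.category_key = p.category_key
    GROUP BY c.category_name
    ORDER BY c.category_name
"""

star_result = conn.execute(STAR_QUERY).fetchall()
snowflake_result = conn.execute(SNOWFLAKE_QUERY).fetchall()
```

Both queries return the same totals; the snowflake version trades the extra join for a category name that is stored only once.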
22. What are conformed dimensions?
They are dimension tables in a star schema data mart that adhere to a common structure, and therefore allow queries to be executed across star schemas. For example, the Calendar dimension is commonly needed in most data marts. By making this Calendar dimension adhere to a single structure, regardless of which data mart in your organization it is used in, you can query by date/time from one data mart to another.
Conformed dimensions are dimensions which are common to the cubes (cubes are the schemas that contain fact and dimension tables).
Consider Cube-1, which contains F1, D1, D2, D3, and Cube-2, which contains F2, D1, D2, D4, where F are the facts and D the dimensions.
Here D1 and D2 are the conformed dimensions.
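
The Cube-1 / Cube-2 example can be expressed as a set intersection. This is a deliberately simplified sketch: it treats a shared name as proof of conformance, whereas in practice the shared dimensions must also have the same structure and content. The CUBES layout and function name are hypothetical.

```python
# Dimension sets per cube, mirroring the Cube-1 / Cube-2 example above.
CUBES = {
    "Cube-1": {"fact": "F1", "dimensions": {"D1", "D2", "D3"}},
    "Cube-2": {"fact": "F2", "dimensions": {"D1", "D2", "D4"}},
}

def conformed_dimensions(cubes):
    """Return the dimensions shared by every cube."""
    dimension_sets = [cube["dimensions"] for cube in cubes.values()]
    return set.intersection(*dimension_sets)

shared = conformed_dimensions(CUBES)
```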
23. What is a Fact table?
A fact table in a data warehouse describes the transaction data. It contains key figures (measures) together with the keys of the characteristics (dimensions) that they belong to. A dimension table, by contrast, is a table whose entries describe the data in a fact table; dimension tables contain the data from which dimensions are created.
24. What are Semi-additive and factless facts, and in which scenario will you use such kinds of fact tables?
Semi-additive: semi-additive facts are facts that can be summed up for some of the dimensions in the fact table, but not the others. For example:
Current Balance and Profit Margin are the facts. Current Balance is a semi-additive fact, as it makes sense to add balances up across all accounts (what is the total current balance for all accounts in the bank?), but it does not make sense to add them up through time (adding up all current balances for a given account for each day of the month does not give us any useful information). Profit Margin, being a ratio, cannot be meaningfully summed across any dimension.
Factless: a factless fact table captures the many-to-many relationships between dimensions, but contains no numeric or textual facts. Such tables are often used to record events or coverage information. Common examples of factless fact tables include:
- Identifying product promotion events (to determine promoted products that didn't sell)
- Tracking student attendance or registration events
- Tracking insurance-related accident events
- Identifying building, facility, and equipment schedules for a hospital or university
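
The Current Balance example can be sketched in a few lines of Python. The snapshot rows and function names below are hypothetical; the point is only that the balance is summed across accounts but not across days.

```python
# Hypothetical daily balance snapshots: (day, account, current_balance)
SNAPSHOTS = [
    (1, "A", 100), (1, "B", 200),
    (2, "A", 150), (2, "B", 200),
    (3, "A", 120), (3, "B", 250),
]

def total_balance_on(day, rows):
    """Summing across accounts for a single day is meaningful."""
    return sum(balance for d, _, balance in rows if d == day)

def closing_balance(account, rows):
    """Across time, take the latest snapshot instead of summing."""
    account_rows = [row for row in rows if row[1] == account]
    _, _, balance = max(account_rows, key=lambda row: row[0])
    return balance

bank_total_day3 = total_balance_on(3, SNAPSHOTS)
account_a_closing = closing_balance("A", SNAPSHOTS)
```

Adding account A's three daily balances would give 370, a number with no business meaning, which is why the time dimension gets a "latest value" (or an average) rather than a sum.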
25. What are the different methods of loading Dimension tables?
Conventional load: before loading the data, all the table constraints are checked against the data.
Direct load (faster loading): all the constraints are disabled and the data is loaded directly. Later the data is checked against the table constraints, and the bad data won't be indexed.
The conventional and direct load methods apply only to Oracle. The naming convention is not a general one applicable to other RDBMSs such as DB2 or SQL Server.
26. What are Aggregate tables?
Aggregate tables contain redundant data that is summarized from other data in the warehouse, e.g. yearly or monthly sales information. These tables are used to reduce query execution time.
An aggregate table contains a summary of existing warehouse data, grouped to certain levels of the dimensions. Retrieving the required data from the actual table, which may have millions of records, takes more time and also affects server performance. To avoid this we can aggregate the table to the required level and use that instead. These tables reduce the load on the database server, improve query performance, and return results very quickly.
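
A minimal sketch of building such a table with sqlite3 follows. The fact_sales and agg_sales_monthly names and the sample rows are hypothetical, and a real warehouse would also need to refresh the aggregate after each load.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fact_sales (sale_date TEXT, product_key INTEGER, amount INTEGER);
    INSERT INTO fact_sales VALUES
        ('2024-01-05', 1, 10), ('2024-01-20', 1, 15),
        ('2024-02-03', 1, 7),  ('2024-02-11', 2, 30);

    -- Aggregate table: the same data summarized to the month level.
    CREATE TABLE agg_sales_monthly AS
        SELECT substr(sale_date, 1, 7) AS sale_month,
               SUM(amount)             AS total_amount,
               COUNT(*)                AS sale_count
        FROM fact_sales
        GROUP BY substr(sale_date, 1, 7);
""")

# Reports read the small aggregate table instead of scanning the detail rows.
monthly = conn.execute(
    "SELECT sale_month, total_amount, sale_count "
    "FROM agg_sales_monthly ORDER BY sale_month"
).fetchall()
```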
27. What is a dimension table?
A dimension table is a collection of hierarchies and categories along which the user can drill down and drill up. It contains only the textual attributes.
25. Why are OLTP database designs not generally a good idea for a Data Warehouse?
An OLTP design is not meant to store historical information about the organization. It is used for storing the details of daily transactions, while a data warehouse is a huge store of historical information, obtained from different data marts, for making intelligent decisions about the organization.
26. What is the need for a surrogate key? Why is the primary key not used as the surrogate key?
A surrogate key is an artificial identifier for an entity. Surrogate key values are generated sequentially by the system (like the Identity property in SQL Server and a Sequence in Oracle). They do not describe anything.
A primary key is a natural identifier for an entity. Primary key values are entered manually by the user and uniquely identify each row, so there is no repetition of data.
Need for a surrogate key rather than the primary key:
If a column is made a primary key and its datatype or length later needs to change, all the foreign keys that depend on that primary key have to be changed too, making the database unstable.
Surrogate keys make the database more stable because they insulate the primary and foreign key relationships from changes in data types and lengths.
For example: you are extracting customer information from an OLTP source and, after the ETL process, loading it into a dimension table (DW). With SCD Type 1, you can use the source primary key CustomerID as the primary key of the dimension table. But if you would like to preserve the history of the customer in the dimension table, i.e. Type 2, then you need another unique number apart from CustomerID. There you have to use a surrogate key.
Another reason: if CustomerID is alphanumeric, a surrogate key is preferable in the dimension table. It is advisable to have a small, system-generated integer as the surrogate key so that indexing and retrieval are much faster.
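
The Type 2 case in the example above can be sketched as a toy in-memory dimension. The CustomerDimension class, its column names and the is_current flag are hypothetical simplifications; a real implementation would typically also keep effective dates and use a database sequence or identity column for the key.

```python
from itertools import count

class CustomerDimension:
    """Toy SCD Type 2 dimension; names and columns are hypothetical."""

    def __init__(self):
        self.rows = []
        self._next_key = count(1)   # stands in for a database sequence/identity

    def current_row(self, customer_id):
        """Return the current version of a customer, or None if unknown."""
        for row in self.rows:
            if row["customer_id"] == customer_id and row["is_current"]:
                return row
        return None

    def upsert(self, customer_id, city):
        """Insert a new version when the tracked attribute changes."""
        existing = self.current_row(customer_id)
        if existing and existing["city"] == city:
            return existing["customer_key"]      # nothing changed
        if existing:
            existing["is_current"] = False       # expire the old version, keep it as history
        new_row = {
            "customer_key": next(self._next_key),  # surrogate key
            "customer_id": customer_id,            # natural key from the source
            "city": city,
            "is_current": True,
        }
        self.rows.append(new_row)
        return new_row["customer_key"]

dim = CustomerDimension()
first_key = dim.upsert("C-100", "Pune")
second_key = dim.upsert("C-100", "Mumbai")   # same CustomerID, new version
```

The same CustomerID now appears in two rows, which is only possible because the surrogate key, not CustomerID, is the primary key of the dimension.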
27. What is data cleaning? How is it done?
Data cleansing: the act of detecting and removing and/or correcting a database's dirty data (i.e. data that is incorrect, out-of-date, redundant, incomplete, or formatted incorrectly).
It can be done using the existing ETL tools or using third-party tools like Trillium.
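
As a rough illustration of what such a cleansing step does, here is a small hand-written Python sketch. The record layout (name, email) and the rules are assumptions made for the example; dedicated ETL or data quality tools apply far richer rules.

```python
def clean_customers(records):
    """Normalize fields, drop incomplete or malformed rows, remove duplicates."""
    seen_emails = set()
    cleaned = []
    for record in records:
        # Collapse repeated whitespace and fix casing (incorrectly formatted data).
        name = " ".join((record.get("name") or "").split()).title()
        email = (record.get("email") or "").strip().lower()
        if not name or "@" not in email:
            continue            # incomplete or incorrect row
        if email in seen_emails:
            continue            # redundant row
        seen_emails.add(email)
        cleaned.append({"name": name, "email": email})
    return cleaned

RAW = [
    {"name": "  ann   LEE ", "email": "Ann@Example.com "},
    {"name": "Ann Lee", "email": "ann@example.com"},   # duplicate after normalizing
    {"name": "", "email": "x@example.com"},            # missing name
    {"name": "Bob", "email": "not-an-email"},          # malformed email
]
customers = clean_customers(RAW)
```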
28. What are slowly changing dimensions?
Dimensions that change over time are called Slowly Changing Dimensions. For instance, a product's price changes over time; people change their names for some reason; country and state names may change over time. These are a few examples of Slowly Changing Dimensions, since changes happen to them over a period of time.
29. What are Data Marts?
A data mart is a segment of a data warehouse that can provide data for reporting and analysis on a section, unit, department or operation in the company, e.g. sales, payroll, production. Data marts are sometimes complete individual data warehouses, which are usually smaller than the corporate data warehouse.
Put another way, a data mart is a small data warehouse. In general, a data warehouse is divided into small units according to the business requirements. For example, the data warehouse of an organization may be divided into individual data marts such as a Sales data mart, a Finance data mart, a Marketing data mart, an HR data mart, etc. Data marts are used to improve performance during the retrieval of data.
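30. Can a dimension table contain numeric values?
Not as measures: numeric measures belong only in the fact table. A dimension table may still hold numeric keys or numeric descriptive attributes, but these are used for filtering and grouping rather than for aggregation.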
31. Explain degenerated dimension in detail.
A degenerated dimension is a dimension that has no source table of its own in the OLTP system. Its value is generated at the time of the transaction, like an invoice number, which is generated when the invoice is raised.
It is not used for linking and it is not a foreign key, but it can be treated as (part of) the primary key of the fact table.
A degenerate dimension is a dimension which has only a single attribute. It is typically represented as a single field in a fact table.
Data items that are not facts and that do not fit into the existing dimensions are termed degenerate dimensions.
Degenerate dimensions are the fastest way to group similar transactions.
Degenerate dimensions are used when fact tables represent transactional data.
32. Give examples of degenerated dimensions.
A degenerated dimension is a dimension key without a corresponding dimension table. Example:
In the PointOfSale Transaction fact table, we have:
Date Key (FK), Product Key (FK), Store Key (FK), Promotion Key (FK), and POS Transaction Number.
The Date dimension corresponds to Date Key, the Product dimension corresponds to Product Key, and so on. In a traditional parent-child database, POS Transaction Number would be the key to the transaction header record that contains all the information valid for the transaction as a whole, such as the transaction date and store identifier. But in this dimensional model, we have already extracted this information into other dimensions. Therefore, POS Transaction Number looks like a dimension key in the fact table but does not have a corresponding dimension table; it is a degenerated dimension.
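
A short sketch of how the POS Transaction Number is typically used: it groups the line items of one basket directly from the fact rows, with no dimension lookup. The row layout follows the example above, but the values and the function name are invented, and amounts are kept in whole cents to avoid rounding noise.

```python
from collections import defaultdict

# Hypothetical POS fact rows:
# (date_key, product_key, store_key, promotion_key, pos_transaction_number, amount_cents)
POS_FACT = [
    (20240101, 1, 7, 0, "T-1001", 300),
    (20240101, 2, 7, 0, "T-1001", 200),
    (20240101, 1, 7, 5, "T-1002", 150),
]

def summarize_baskets(fact_rows):
    """Group line items by the degenerate dimension (the transaction number)."""
    baskets = defaultdict(lambda: {"lines": 0, "amount_cents": 0})
    for *_keys, transaction_number, amount_cents in fact_rows:
        baskets[transaction_number]["lines"] += 1
        baskets[transaction_number]["amount_cents"] += amount_cents
    return dict(baskets)

baskets = summarize_baskets(POS_FACT)
```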
33. What are the steps to build the data warehouse?
1. Gather the business requirements:
• Identify sources
• Identify facts
• Define dimensions
• Define attributes
• Redefine dimensions and attributes
• Organise attribute hierarchies and define relationships
• Assign unique identifiers
• Additional conventions: cardinality / adding ratios
• Understand the business requirements
2. Once the business requirements are clear, identify the grains (levels).
3. Once the grains are defined, design the dimension tables with the lower-level grains.
4. Once the dimensions are designed, design the fact table with the key performance indicators (facts).
5. Once the dimension and fact tables are designed, define the relationships between the tables using primary keys and foreign keys.
In the logical phase the database design looks like a star, so it is named star schema design.
34. What are the different architectures of a data warehouse?
There are two design approaches:
1. Top down (Bill Inmon)
2. Bottom up (Ralph Kimball)
There are three types of architecture.
• Data warehouse basic architecture:
In this architecture end users access data that is derived from several sources through the data warehouse.
Architecture: Source --> Warehouse --> End Users
• Data warehouse with staging area architecture:
Whenever the data derived from the sources needs to be cleaned and processed before being put into the warehouse, a staging area is used.
Architecture: Source --> Staging Area --> Warehouse --> End Users
• Data warehouse with staging area and data marts architecture:
When the warehouse architecture has to be customized for different groups in the organization, data marts are added and used.
Architecture: Source --> Staging Area --> Warehouse --> Data Marts --> End Users
