
Data Warehouse

Course content: advanced skills for exploiting information held in DATA WAREHOUSES
Objectives:

What is a data warehouse?
Grasp the most basic, fundamental concepts of the data warehouse. Understand the components of a data warehouse and the methods used to build one.

How is a data warehouse exploited?
Understand the issues of online analytical processing (OLAP) and the tools that support developing and exploiting a data warehouse in order to carry out business tasks and support decision making.

TABLE OF CONTENTS

Part I: The Data Warehouse

Chapter I: General introduction to the data warehouse
  1.1 Strategies for processing and distributing information
  1.2 The data warehouse
  1.3 Purpose of the data warehouse
  1.4 Characteristics of data in the data warehouse
  1.5 Distinguishing the DW from operational database systems
  1.6 Some basic concepts
    1.6.1 The local data warehouse (Datamart)
    1.6.2 Metadata
    1.6.3 The operational data store (operational database)
    1.6.4 Definition of the operational data store

Chapter II: Kinds of data
  2.1 Business data
    2.1.1 Characteristics used to classify business data
    2.1.2 Three kinds of business data
    2.1.3 Unstructured business data
  2.2 Metadata
    2.2.1 Three kinds of metadata
    2.2.2 Properties of metadata
  2.3 The virtual data warehouse

Chapter III: Data warehouse architecture
  3.1 Business data architecture
    3.1.1 Single-layer data architecture
    3.1.2 Two-layer data architecture
    3.1.3 Three-layer data architecture
  3.2 Metadata architecture
  3.3 Logical architecture of the DW
  3.4 Business data
  3.5 The data warehouse catalog
  3.5 Functions of the DW

Part II: Building the Data Warehouse

Chapter IV: Analyzing the requirements of the data warehouse
  4.1 Planning
  4.2 Analyzing the system requirements
    4.2.1 Identifying the requirements of owners and management
    4.2.2 Architectural requirements
    4.2.3 Identifying the requirements of system developers
    4.2.4 Requirements of end users

Chapter V: Data models
  5.1 Data models of the DW
    5.1.1 The star schema
    5.1.2 Issues related to star schema design
    5.1.3 Other issues in star schema design
  5.2 The snowflake schema
  5.3 The combined schema
  5.4 Solutions to the performance problems of the data model
  5.5 Complex data types
  5.6 The multidimensional data model
  5.7 Data aggregation and drill-down
  5.8 Example: designing a multidimensional data model

Chapter VI: Creating data warehouses
  6.1 Data sources
    6.1.1 Analyzing the data sources
    6.1.2 Collecting and creating data
  6.2 Designing the business data warehouse (BDW)
    6.2.1 From operational data to data in the BDW
    6.2.2 Data capture techniques
    6.2.3 The resulting captured data structures
  6.3 Applying data to the BDW
    6.3.1 Loading data
    6.3.2 Appending data
    6.3.3 Establishing the merge
    6.3.3 Transforming data
  6.4 Maintaining and deploying the data warehouse
    6.4.1 Deployment for the company
    6.4.2 Deployment for end users

Part III: Techniques for Analyzing and Exploiting Data

Chapter VII: Accessing and exploiting data
  7.1 Access and analysis
  7.2 Exploiting data
    7.2.1 Applications of data mining
    7.2.2 Online analytical processing (OLAP)
  7.3 Managing and administering the data warehouse
  7.4 The information delivery system
  7.5 Database technology used in DW exploitation methods
  7.6 Building a finance and budget data warehouse

Chapter VIII: Online analytical processing (OLAP)
  8.1 Why online analytical processing is needed
    8.1.1 Multidimensional data analysis
    8.1.2 Definition of OLAP
    8.1.3 OLAP architecture
  8.2 The principles of OLAP
  8.3 Evaluating OLAP servers and tools
  8.5 Des2000, a tool supporting information system analysis and design

Appendix
Data Warehousing and Oracle Discoverer/2000 TM
  Summary
  Data Warehouses
  Oracle Discoverer/2000
  The Data Warehouse Meta-Layer
  Data Query Component: A Technical Overview
  The Browser Component: A Technical Overview
  The Schema Editor
  Administration
  Conclusion

Part I: The Data Warehouse


Chapter I: General introduction to the data warehouse
- Information, and the data processing needed to obtain the required information
- The data warehouse (DW)
- Purpose of the data warehouse
- Characteristics of data in the data warehouse
- Distinguishing the DW from operational database systems
- Some basic concepts

1.1 Strategies for processing and distributing information

We live in the age of the knowledge economy. For any activity to be highly effective and to prevail under fierce competition, we must have methods for obtaining the necessary information and knowledge quickly and accurately. Information can be obtained anywhere, at any time, and in many different forms. Applying information technology to production and business practice brings great efficiency and benefit. The technology keeps developing and maturing to meet the ever higher demands of research, production management, and business. Scaling up from isolated applications to large information systems has led to outstanding business successes. Information systems that once handled only day-to-day processing have advanced to meet higher-level requirements. Managers and executives learn not only how work is currently proceeding but also what will happen next; that is, the information is analytical and the information system is able to support decisions.

Building such a system, however, runs into a number of technical limits, especially as the size and complexity of the information environment grow. Information systems built by traditional methods satisfy neither their users nor the people who manage them. The goals above are hard to reach because data keeps growing and is stored in a scattered way, in many mutually incompatible forms, some of them unstructured. Many existing database systems are incompatible with one another and with newly built information systems. Many customers are dissatisfied with current information systems.

The Internet opens up many possibilities and prospects for businesses and gives us rich categories of information that our activities need. The WWW provides information on every area of human society: research work, academic results, advertising, tourism, games and entertainment, electronic commerce, and so on. Very many of our activities can be carried out over the Internet. The question that arises, however, is how to organize and exploit such huge and varied volumes of data.

From the user's side, the usual difficulties are:

1. The necessary data cannot be found
   - Data is scattered across many systems with different interfaces and tools, so much time is lost moving from one system to another.
   - Several information sources may satisfy the request, but they differ from one another and it is very hard to tell which one is correct.
2. The necessary data cannot be retrieved
   - Expert help is needed constantly, so work piles up.
   - Some kinds of information cannot be retrieved without extending the capabilities of the existing system.
3. The data found cannot be understood
   - Data descriptions are poor and often far removed from familiar business terminology.
4. The data found cannot be used
   - Results often fall short in the nature of the data and in search time.
   - Data has to be converted by hand into the user's working environment.

Problems on the information system side:

1. Developing the various programs is not simple
   - One function appears in many programs, yet organizing and reusing it is very difficult because of technical limits.
   - Converting data from the various operational formats into a form suited to users is very difficult.
2. Maintaining these programs raises many problems
   - A change in one application affects every other related application.
   - The dependencies between programs are usually unclear or cannot be determined.
   - The complexity of the conversion work and of the whole maintenance process makes the source code of the programs extremely complicated.
3. The volume of stored data grows very fast
   - Overlapping data across information environments cannot be controlled, so the data volume grows quickly.
4. Data administration is complex
   - The lack of standard, uniform data definitions leads to loss of control over the information environment.
   - One data element exists in many different sources.

The solution to all of the problems above is to build a data warehouse.
1.2 The data warehouse

According to John Ladley [6], data warehouse technology is the set of methods, techniques, and tools that can be combined and support one another to deliver information to users by integrating data from many sources and many environments.

Definition: A data warehouse (DW) is a collection of integrated, subject-oriented databases designed to support the decision-support function, in which every unit of data relates to a specific period of time [5].

A data warehouse is usually very large, reaching hundreds of gigabytes or even terabytes. It is built to make access convenient across many sources and many kinds of data, so that it can both incorporate applications of modern technologies and inherit from systems that already exist.

Data that arises from day-to-day activity and is collected and processed to serve a specific business task of an organization is usually called operational data, and the activity of collecting and processing it is called online transaction processing (OLTP). A data warehouse, by contrast, serves analysis whose results carry a high information content. Information systems that collect and process data of this kind are also called online analytical processing (OLAP) systems.

In other words, a data warehouse is a collection of very large databases, holding hundreds of gigabytes or even terabytes of data drawn from many subsystems, stored and analyzed in order to provide information services related to the business of an organization, agency, or enterprise.

The flow of data within an organization (agency, enterprise, company, and so on) can be outlined as follows:

[Figure 1.1 shows these components]
  Legacy System (current data)
  Operational Data Store (current and near-current detail data)
  Data Warehouse (historical data)
  Data Mart (data subset, summarized data, historical data)
  Personal Data Warehouse
  MetaData

Figure 1.1 Data flow within an organization

Personal data lies outside the scope managed by the data warehouse administration system. It contains information extracted from the operational data systems, from the data warehouse, and from the local data warehouses of the related subjects, by joining, summarizing, or some other form of processing.
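
The OLTP/OLAP distinction drawn in this section can be illustrated with a minimal sketch. It uses Python's standard sqlite3 module only for convenience; the `sales` table, its columns, and the two function names are assumptions made for the illustration and do not come from the course material. The first function is a short transaction that touches one row, the second a read-only aggregate over the whole history.

```python
import sqlite3

# Hypothetical schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales ("
    "sale_id INTEGER PRIMARY KEY, sale_date TEXT, product TEXT, amount REAL)"
)


def record_sale(sale_date, product, amount):
    """OLTP-style work: one short transaction that writes a single row."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO sales (sale_date, product, amount) VALUES (?, ?, ?)",
            (sale_date, product, amount),
        )


def yearly_totals():
    """OLAP-style work: a read-only aggregate over all stored history."""
    return conn.execute(
        "SELECT substr(sale_date, 1, 4) AS year, product, SUM(amount) "
        "FROM sales GROUP BY year, product ORDER BY year, product"
    ).fetchall()


for sale in [
    ("2001-03-01", "widget", 10.0),
    ("2001-07-15", "widget", 5.0),
    ("2002-01-20", "gadget", 7.5),
]:
    record_sale(*sale)

print(yearly_totals())
```

In practice the two workloads would normally run on separate systems, which is the point of the data flow in Figure 1.1; a single in-memory database is used here only to keep the sketch self-contained.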
1.3 Purpose of the data warehouse

The main goal of a data warehouse is to meet these basic criteria:

1. It must be able to satisfy every information request of its users.
2. It must help the organization's staff do their work well and efficiently: making sound decisions quickly, selling more, raising productivity, earning higher profit, and so on.
3. It must help the organization identify, manage, and run its projects and business operations effectively and accurately.
4. It must integrate data and metadata from many different sources.

To meet these requirements, the DW must:

- Raise data quality by cleansing and refining data along defined subject lines
- Consolidate and connect data
- Synchronize the data sources with the DW
- Delimit and unify the operational database management systems so that they act as standard tools serving the DW
- Manage metadata
- Provide information that is integrated, summarized, or linked, and organized by subject
- Be usable in decision support systems (DSS), in operational information systems, or in support of ad hoc queries

The basic goal of every organization is profit, which can be described as follows:

[Figure 1.2 shows Profit broken down into two branches]
  Cost: fixed, variable, production costs
  Revenue: sales, proposals, pricing

Figure 1.2 Relationships between the ways of viewing the system

1.4 Characteristics of data in the data warehouse

A data warehouse (DW) is a collection of data with the following basic characteristics.

1. Integration

Data in the DW is organized in whatever ways are needed to make it conform to naming conventions and to be uniform in units of measurement, encoding scheme, physical structure, and so on. A DW is an enterprise-wide view of information that unifies different views into a single view of a given subject. A traditional OLTP (online transaction processing) system, for example, is built around one business area. A sales system and a marketing system may share one form of customer information, but financial matters need a different view of the customer. A DW holds a complete view of a customer, made up of the different pieces of data from finance and marketing.

Integration means that the data gathered in the warehouse is collected from many sources and merged into a unified whole.

Example: data from application programs running on operational databases is integrated under a uniform encoding and a uniform unit of measurement, as follows:

  Operational databases                      Data warehouse
  Appl. A: m, f
  Appl. B: 0, 1                         ->   m, f
  Appl. C: male, female

  Appl. A: pipeline, cm
  Appl. B: pipeline, inch (2.54 cm)     ->   cm
  Appl. C: pipeline, yard (91.44 cm)

Figure 1.3 Data integration

  ODS
    Checking Account System:    L Anh, Female, opened account 1994
    Savings Account System:     L Anh, F (code), opened account 1992
    Investment Account System:  L Anh, owns 25 shares Exxon, opened account 1995

  DW (integrated and transformed)
    Customer: L Anh, Female, owns 25 shares Exxon, customer since 1992

Figure 1.4 Data integration

2. Subject orientation

Data in the DW is organized by subject so that the organization can easily identify the information it needs in each of its activities. In an older financial management system, for example, data may be organized by function: lending, credit management, budget management, and so on. In a financial DW, by contrast, data is organized by subject, based mainly on entities: customers, products, enterprises, and so on. The difference between the two approaches leads to a difference in the content of the stored data:

- The DW does not store detailed data; it needs only summarized data, serving mainly the analysis that supports decisions.
- Operational application systems (OAS) and operational databases need detailed data that directly serves processing by the functions of the current application area. The data relationships in these systems therefore differ as well, and the data must be accurate, up to date, and so on.

3. Time-stamped, historical data

A data warehouse holds a large volume of historical data. Data is stored as a series of snapshots; each record reflects the values of the data at one point in time and expresses a view of a subject during one period. History can therefore be reconstructed and different periods compared accurately. The time element acts as part of the key, guaranteeing the uniqueness of each row and giving the data its time characteristic.

Data in an OAS must be accurate at the very moment of access, whereas data in the DW only needs to be valid for some period, typically 5 to 10 years or longer. Data in an operational database usually becomes historical after a certain time and is then moved into the data warehouse. This is exactly the valid data about the subjects that need to be kept.

  Operational database (business data)       DW (data snapshots)
  + Short time horizon: 30 to 60 days        + Long time horizon: 5 to 10 years
  + May or may not have a time element       + Always has a time element
  + Data can be updated                      + Once captured, data cannot be updated

Figure 1.4 The time dimension of data

4. Stability (nonvolatility)

Data in the DW is read-only: end users can inspect it but not modify it. Only two basic operations are allowed:

- Loading data into the warehouse
- Accessing the areas of the DW

5. Permanence

Information is loaded into the DW once the data in the operational system is considered too old. Data is stored in the warehouse for the long term. Although new data keeps being added, old data in the warehouse is not deleted. This makes it possible to provide information covering a long period and enough figures for business analysis and forecasting models, and from these to reach sound decisions that fit natural patterns of development.

6. Summarized data

Purely operational data is not stored in the DW. Summarized data is accumulated over many periods, by subject, as described above.
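
Characteristics 1, 3, and 4 above can be made concrete with a small sketch. It is an illustration under stated assumptions rather than a description of any product: the names `normalize_gender`, `to_cm`, and `Warehouse` are hypothetical, the mapping of Appl. B's codes 0 and 1 onto m and f is assumed (Figure 1.3 does not say which is which), and a Python dictionary stands in for the warehouse's storage.

```python
from datetime import date

# Source encodings and units follow Figure 1.3.
# Assumption: Appl. B uses 0 for male and 1 for female.
GENDER_CODES = {"m": "m", "f": "f", "0": "m", "1": "f", "male": "m", "female": "f"}
CM_PER_UNIT = {"cm": 1.0, "inch": 2.54, "yard": 91.44}


def normalize_gender(code):
    """Integration: map any source encoding to the warehouse codes m / f."""
    return GENDER_CODES[str(code).strip().lower()]


def to_cm(value, unit):
    """Integration: convert a length in a source unit to centimetres."""
    return round(value * CM_PER_UNIT[unit], 2)


class Warehouse:
    """Load-and-access store whose rows are keyed by (entity id, snapshot date)."""

    def __init__(self):
        self._rows = {}

    def load(self, entity_id, snapshot_date, record):
        """Add one snapshot; an existing snapshot is never overwritten."""
        key = (entity_id, snapshot_date)
        if key in self._rows:
            raise ValueError(f"snapshot {key} already loaded; snapshots are not updated")
        self._rows[key] = dict(record)

    def history(self, entity_id):
        """Return copies of all snapshots of one entity, oldest first."""
        ordered = sorted(self._rows.items(), key=lambda item: item[0])
        return [(day, dict(row)) for (eid, day), row in ordered if eid == entity_id]


dw = Warehouse()
dw.load("pipeline-7", date(2001, 1, 1), {"length_cm": to_cm(10, "inch")})
dw.load("pipeline-7", date(2002, 1, 1), {"length_cm": to_cm(1, "yard")})
for day, row in dw.history("pipeline-7"):
    print(day, row)
```

Because the snapshot date is part of the key, two states of the same entity coexist instead of one replacing the other, and the class deliberately offers no update or delete method.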
1.5 Distinguishing the DW from operational database systems

On the basis of the characteristics above, the DW can be distinguished from traditional operational database management systems:

- A data warehouse must be defined along subject lines. It is built around the intent of its end users, whereas operational database systems serve general application purposes.
- A DW manages a large volume of information stored on many different storage and processing media. Ordinary database systems manage small and medium volumes of information, while managing very large volumes is exactly what is specific to the data warehouse.
- A DW can join different versions of database structures. It consolidates information and presents it in forms that users find easy to understand.
- A DW integrates and connects information from different sources, on many kinds of storage and processing media, in order to serve online operational processing applications.
- A DW can store information summarized by a business subject, so as to produce information that serves users' analysis effectively.
- A DW usually holds historical data that links many past years of operational information, organized for efficient storage and easy readjustment. Data in an operational database is usually new and current only for a short period.
- Data from operational databases is filtered and summarized before it is moved into the DW environment. A great deal of data is never moved; only the data needed for management or decision support goes to the DW.

In general terms, the DW distributes data to many information-processing consumers (customers) in many forms, such as databases, SQL queries, reports, and so on.

[The figure shows three areas]
  Factory area:       application, application package, subject-area database
  Warehouse area:     data warehouse
  Distribution area:  SQL query, extract file (Datamart), reports

Figure: The areas of activity of the data warehouse
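
The last point of the comparison, together with the distribution area of the figure, can be sketched in a few lines. Everything named here is hypothetical (the row fields, `load_warehouse`, `extract_mart`); the sketch only shows the shape of the two steps: operational rows are filtered and summarized on the way into the warehouse, and a subject-area extract is then cut from the warehouse for one group of users.

```python
from collections import defaultdict

# Hypothetical operational rows; field names are assumptions for illustration.
operational_rows = [
    {"date": "2002-04-14", "region": "north", "product": "loan", "amount": 100.0, "clerk": "c01"},
    {"date": "2002-04-20", "region": "north", "product": "loan", "amount": 50.0, "clerk": "c02"},
    {"date": "2002-04-21", "region": "south", "product": "savings", "amount": 70.0, "clerk": "c03"},
    {"date": "2002-05-02", "region": "north", "product": "savings", "amount": 30.0, "clerk": "c01"},
]


def load_warehouse(rows):
    """Drop fields not needed for analysis and summarize by month, region, product."""
    totals = defaultdict(float)
    for row in rows:
        month = row["date"][:7]  # "YYYY-MM"
        totals[(month, row["region"], row["product"])] += row["amount"]
    return [
        {"month": month, "region": region, "product": product, "total": total}
        for (month, region, product), total in sorted(totals.items())
    ]


def extract_mart(warehouse, region):
    """Distribution step: an extract file covering one subject area (one region)."""
    return [row for row in warehouse if row["region"] == region]


warehouse = load_warehouse(operational_rows)
north_mart = extract_mart(warehouse, "north")
print(north_mart)
```

The clerk field never reaches the warehouse, which mirrors the statement that only data needed for management or decision support is moved across.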


1.6 Some basic concepts

1.6.1 The local data warehouse (Datamart)

A local data warehouse (Datamart, DM) is a database with the same characteristics as a data warehouse but on a smaller scale, storing data about one field or one specialty. A datamart is a subject-oriented data warehouse. Datamarts can be formed from a subset of the data warehouse, or they can be built independently and, once built, connected and integrated with one another to form a data warehouse. A data warehouse can therefore be built by starting with the datamarts, or the other way round, by building the warehouse first and then creating the datamarts.

A datamart is a secondary store of the integrated data of the DW. It is directed at one part of the data, usually called a subject area (SA), and is created for and dedicated to one group of users. The data in a datamart gives information about one defined subject, not about all the business activity going on in an organization. The most common form of a datamart is a physically separate data store, usually held on its own server on a local network and serving a specific group of people. Sometimes a datamart simply uses OLAP technology to create special star-shaped relations or data hypercubes for analysis by a group of people who share an interest in one range of data.

Datamarts fall into two kinds, dependent and independent:

- Dependent datamart: contains data taken from the DW, which is extracted, refined, and integrated at a higher level to serve the datamart's particular subject.
- Independent datamart: unlike a dependent datamart, this kind is built before the DW and takes its data directly from the various sources. The approach is simpler and cheaper, but it has weaknesses in return. Each independent DM integrates data in its own way, so data from several DMs is hard to reconcile.

Datamarts raise two problems: first, scalability in situations where an initially small DM grows quickly along many dimensions, and second, data integration. When designing a DM, close attention must therefore be paid to the scalability of the system, the consistency of the data, and manageability.

Data warehousing is not a product but a technical process of collecting, managing, and exploiting data sensibly from many different sources, establishing a data warehouse as a collection of unified data that reflects in detail part or all of an organization's business. In other words, it is the process of setting the vision for, planning, building, using, administering, maintaining, and upgrading the data warehouse and the datamarts. Whether a data warehouse or a datamart is being built, the process is very complex and never finished, and it centers on the business need for knowledge grounded in data.

1.6.2 Metadata

Metadata is data about the data used in the DW. It answers the questions who, what, when, why, and how about the data, and it is used to build, maintain, manage, and use the DW. Metadata is divided into three kinds: business, technical, and operational metadata.

1. Business metadata contains information that makes it easy for users to understand the context of the information stored in the DW. It gives all end users information about:
   - The subject areas (SA) and the kinds of information objects, including queries, reports, images, video, and audio clips
   - Home pages on the Internet
   - Other information supporting all the components that make up the DW, for example information about the information delivery systems, including schedules, details of delivery destinations, and query objects such as predefined queries, reports, and analyses
   - Operational information about the DW, such as data history (snapshots, versions), ownership, audit trails, and data usage
   - Descriptions of DW attributes by business name, definitions, description tables, and aliases
2. Technical metadata contains information about the data in the DW for the designers and administrators who develop and manage it. It includes:
   - Information about the data sources, both the operational sources and source systems outside the DW environment: location, file names, file types, field names and characteristics, aliases, version information, relationships, size, volatility, data owners, and the users with access rights
   - Descriptions of transformations, for example how the operational database is mapped onto the DW and the algorithms used to convert, enhance, or transform data
   - Definitions of the data structures and objects in the warehouse environment for the target data
   - The rules used to cleanse and enhance data
   - The data mapping operations applied when data is taken from the source systems and placed in the target database
   - Access rights and the history of backups, archiving, information delivery, data acquisition, data access, and so on
3. Operational metadata (OM) helps in maintaining and deploying the DW. OM describes the information held in the target tables: the core description, the ability to create the target database (creating tables and information in listing form), whether information is archived or online, data refresh dates, record counts, job schedules, and the users able to access the data.

Metadata gives users interactive access that helps them understand the content and find the data they need. One problem is that, in practice, data extraction tools and metadata still combine only crudely, so metadata-driven interfaces need to be created for users. Every component of the DW needs metadata and can draw data from it. Metadata is stored in a central area.

1.6.3 The operational data store (operational database)

The operational data store (ODS) is an integrated operational system used basically for decision support and analysis on operational transaction data. Put another way, the ODS is an architectural concept that supports day-to-day operational decision making and stores data of current value delivered from the operational applications. As a result, the data stored in the ODS changes frequently, whenever the related data in the operational systems changes. The ODS offers an alternative to having operational decision-support applications access data directly from the online transaction processing systems. The ODS and the DW are sometimes confused, so the two need to be told apart. In every case the ODS must be built separately, and it forms part of the DW environment.

[Figure 1.5 shows these components]
  Operational applications: A, B, C
  ODS
  DW
  EIS, DSS, ES

Figure 1.5 The separation between the ODS and the DW

One of the most basic and important differences lies in the content and the structures of the data stored. The ODS holds data of current or near-current value, while the DW holds historical data, valid for some period in the recent past. The ODS can be updated; the DW cannot.

  ODS                              DW
  + Current, near-current          + Historical data
  + Detailed data                  + Summary and detail
  + Updates                        + Nonvolatile snapshot

Figure 1.6 The difference between the two environments

In general the data in the DW is very large, larger than in the ODS; that is, the two differ in the quantity and scope of the data stored.

[Figure 1.7 contrasts the size of the ODS with the size of the DW]

Figure 1.7 Difference in quantity and scope of storage

The ODS concentrates on homogeneous data of current value, while the DW can hold a great deal of heterogeneous data at many different levels.

  ODS                    Data Warehouse
  Current data           Highly summarized
                         Lightly summarized
                         Current detail
                         Old detail

Figure 1.8 The variety of kinds of data in the DW

A further difference is the technology that supports the two systems. The ODS must be an environment that allows updating, writing, and changing the necessary data to suit the business and that answers user requests quickly. The DW, by contrast, only requires simple load-and-access.

  ODS                                  DW
  Operations: insert, delete,          Operations: load, access
  changes, access
  + Fast response time                 + Load and access
  + General purpose update             + No update

Figure 1.9 Two environments that differ in technology

ODSs can be classified by update frequency: real time or near real time, periodic, or overnight. Although the architecture of an ODS differs considerably from that of a DW, the last two classes of ODS are quite similar to a DW. This is why many ODS application requirements are met by accessing the operational data store directly and by improving the refinement processing that produces the DW. It can thus happen that data from the sources is not refined, transformed, and loaded directly into the DW, but is first loaded and transformed into the ODS and only then refined and cleansed for the DW or a DM.

Some major difficulties with the ODS remain, however. The issues still to be resolved include:

- Locating the appropriate data sources
- Transforming the source data to meet the needs of the ODS data model
- The complexity of propagating changes from the operational systems to the ODS
- A database management system that combines efficient query processing with transaction processing that guarantees the ACID transaction properties
- An optimal database design that supports the most demanding decision-support activity while keeping the number of indexes low, to minimize the impact on updates

Functionally, the ODS provides a centralized view of near-real-time data from the operational systems. Although most ODSs are refreshed daily (for DWs with a one-day cycle), in certain cases a quick analysis is needed to manage the business, and if the data exists in separate files, an ODS is the best fit for that analysis. In addition, the ODS can stand in for a change log used to refresh other DSS files in the company. In relation to the DW, the ODS can serve as the store in which data from different sources is gathered. On the other hand, the ODS does not act as an intermediate store for the DW when the DW needs data from external sources that are not in the ODS. In that case the DW can take its data separately from the ODS, or the external source can be fed into the DW's data refinement component.

1.6.4 Definition of the operational data store

Given the analysis above, what exactly is an operational data store? The ODS is a system that is:

- Subject oriented
- Integrated
- Changeable and updatable
- A collection of current or near-current data that supports day-to-day operational decisions

The ODS can therefore be said to differ from the DW mainly in the last two points.

Data from the many existing applications has to be transformed before it is stored in the operational data store. The transformation process comprises the following steps:


- Converting the data
- Deciding which of the data from multiple sources is best
- Encoding / decoding the data
- Altering the key structure
- Altering the physical structure
- Reformatting the data as appropriate
- Re-representing and recalculating the data.

Example: consider the records obtained from three applications, App1, App2 and App3, which are integrated as follows:

App1
  Account id = 23456
  Name = Tran Anh
  Date Acct opened = 14/4/02
  Credit rating = AA

App2
  Account id = A56
  Name = Tran Anh
  Address = 12 Tran Hung Dao
  Phone = 7571824
  Sex = male
  Credit limit = $10.000
  Credit rating = B

App3
  Account id = F123
  Name = ng Anh
  Employer = Hoa Sa
  Position = sales clerk
  Salary = $1.000/month
  Education = BS
  Date of birth = 10/5/1976

ODS
  Customer id = TA3
  Name = Tran Anh
  Credit rating = AA
  Address = 12 Tran Hung Dao
  Phone = 7571824
  Credit limit = $10.000
  Credit rating = B
  Employer = Hoa Sa
  Position = sales clerk
  Salary = $1.000/month
  Date of birth = 10/5/1976
  Acct type A = 23456
  Acct type B = A56
  Acct type C = F123

Figure 1.10 Creating an integrated data record from several applications

A customer may hold several accounts; in the figure above, Tran Anh has three accounts. A point to note in the ODS is that a common key (Customer id) must be created so that data from the different sources can be retrieved and placed in the corresponding record.

In summary, the main characteristics of operational data (ODS) and of the data warehouse (DW) are compared in the table below.
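The role of the common key in Figure 1.10 can be illustrated with a minimal Python sketch. It is not taken from the course material: the cross-reference table `KEY_MAP`, the function `build_ods_records` and the normalized field names are assumptions made for illustration, and only a few of the fields from the figure are used.

```python
# Hypothetical cross-reference: (application, account id) -> common Customer id.
KEY_MAP = {
    ("App1", "23456"): "TA3",
    ("App2", "A56"): "TA3",
    ("App3", "F123"): "TA3",
}

# A subset of the source records from Figure 1.10, with normalized field names.
SOURCES = [
    ("App1", {"account_id": "23456", "name": "Tran Anh", "credit_rating": "AA"}),
    ("App2", {"account_id": "A56", "name": "Tran Anh",
              "address": "12 Tran Hung Dao", "phone": "7571824",
              "credit_rating": "B"}),
    ("App3", {"account_id": "F123", "employer": "Hoa Sa",
              "position": "sales clerk", "date_of_birth": "10/5/1976"}),
]


def build_ods_records(sources, key_map):
    """Merge application records into one ODS record per Customer id."""
    ods = {}
    for app, record in sources:
        customer_id = key_map[(app, record["account_id"])]
        target = ods.setdefault(
            customer_id, {"customer_id": customer_id, "accounts": {}}
        )
        # Remember which account each application contributed.
        target["accounts"][app] = record["account_id"]
        for field, value in record.items():
            if field == "account_id":
                continue
            # Conflicting values are all kept; deciding which source is
            # "best" is a separate transformation step.
            values = target.setdefault(field, [])
            if value not in values:
                values.append(value)
    return ods


ods = build_ods_records(SOURCES, KEY_MAP)
print(ods["TA3"]["accounts"])
print(ods["TA3"]["credit_rating"])
```

Keeping both credit ratings mirrors the ODS record in the figure, where the two values appear side by side; a real ODS would normally apply a rule that selects one of them.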

Characteristic | Operational data | Data warehouse
Purpose | Day-to-day business operations; one application at a time, in the current environment or at a single point in time | Decision support, management, increased profit, competitive advantage; one or more subject areas at a time, holding data about a subject at successive points in time
Main application requirements | Known in detail | Vague
Data access | A small number of rows is read per call. High access frequency against moderate data volumes | Very large data sets are read to find information. Infrequent access, but against large data volumes
Data volume | Moderate, for day-to-day work | Very large, for analysis, statistics, forecasting, planning, reporting, etc.
Data retention | Keeps the day-to-day business data | Historical data is retained longer for comparison, analysis, etc.
Currency of data | Can be up to the minute | Processing is mostly static and the data does not change
Data availability | High availability on demand | High availability is not required
Unit of work | Small, manageable and predictable for each unit of work | Large and unpredictable; units of work change often
Work efficiency | High performance | Flexibility

Organizing data storage in the warehouse

There are two ways to store data multidimensionally:
- The multidimensional data model MDD (MultiDimensional Database) stores data in a cube structure. The corresponding exploitation technique is MOLAP.

[Diagram: a data cube whose three axes are Time, Market and Product]

- Storage based on the multidimensional relational data model uses a star schema:

POS Fact Table: time_key, mkt_key, prod_key, vendor_key, sales_unit, sales_cost
Time Dim Table: time_key, week, month, year
Product Dim Table: prod_key, upc, discript, category
Market Dim Table: mkt_key, store name, market
Manufacture Dim Table: vendor_key, vendor name, vendor address
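As an illustration of the star schema above, the following sketch builds the fact table and its four dimension tables in an in-memory SQLite database and runs one aggregate query. The table and column names follow the diagram, except that `discript` is spelled `description` and spaces become underscores. All sample rows are invented, and the choice of SQLite is an assumption made only to keep the example self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time_dim    (time_key INTEGER PRIMARY KEY, week INTEGER, month INTEGER, year INTEGER);
CREATE TABLE product_dim (prod_key INTEGER PRIMARY KEY, upc TEXT, description TEXT, category TEXT);
CREATE TABLE market_dim  (mkt_key INTEGER PRIMARY KEY, store_name TEXT, market TEXT);
CREATE TABLE vendor_dim  (vendor_key INTEGER PRIMARY KEY, vendor_name TEXT, vendor_address TEXT);
CREATE TABLE pos_fact (
    time_key   INTEGER REFERENCES time_dim(time_key),
    mkt_key    INTEGER REFERENCES market_dim(mkt_key),
    prod_key   INTEGER REFERENCES product_dim(prod_key),
    vendor_key INTEGER REFERENCES vendor_dim(vendor_key),
    sales_unit INTEGER,
    sales_cost REAL
);
""")

# Invented sample data, for illustration only.
conn.executemany("INSERT INTO time_dim VALUES (?, ?, ?, ?)",
                 [(1, 1, 1, 2002), (2, 5, 2, 2002)])
conn.executemany("INSERT INTO product_dim VALUES (?, ?, ?, ?)",
                 [(10, "0001", "Green tea", "Beverage"),
                  (11, "0002", "Rice", "Food")])
conn.executemany("INSERT INTO market_dim VALUES (?, ?, ?)",
                 [(100, "Store A", "North"), (101, "Store B", "South")])
conn.executemany("INSERT INTO vendor_dim VALUES (?, ?, ?)",
                 [(1000, "Vendor X", "Address X")])
conn.executemany("INSERT INTO pos_fact VALUES (?, ?, ?, ?, ?, ?)",
                 [(1, 100, 10, 1000, 5, 50.0),
                  (1, 101, 10, 1000, 3, 30.0),
                  (2, 100, 11, 1000, 2, 40.0)])

# Typical star join: the fact table is joined to the dimensions it is
# sliced by, then aggregated.
rows = conn.execute("""
    SELECT t.month, p.category, SUM(f.sales_unit), SUM(f.sales_cost)
    FROM pos_fact AS f
    JOIN time_dim AS t ON t.time_key = f.time_key
    JOIN product_dim AS p ON p.prod_key = f.prod_key
    GROUP BY t.month, p.category
    ORDER BY t.month, p.category
""").fetchall()
print(rows)
```

Every analytical question has the same shape in this layout: filter and group by dimension attributes, aggregate the measures in the fact table.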


Chapter 2: Data types

- Classification of data in the DW
- Metadata: basic characteristics
- Virtual data warehouse

2.1 Business data

Business data (BD) is the data used to run and manage an enterprise or organization. It reflects the activities of the enterprise and real-world objects such as customers, locations and products. It is created and used by transaction processing systems as well as by decision support systems (DSS).

2.1.1 Characteristics for classifying business data

Four characteristics are considered when classifying business data:
1. How the data is used in the business,
2. The scope of the data,
3. Whether the data is read/write or read-only,
4. The time dimension of the data.

1. Business usage: data is used in the business for two broad purposes:
- Operational data: used to run the business; it relates to short-term activities and decisions.
- Informational data: used to run and manage the enterprise over the long term.
Operational data is the original data of an organization and the source of all informational data. Both operational and informational data are structured according to how they are accessed and used.

2. Scope of the data: data may reflect a single item or transaction, or it may be a summary of items or transactions. Data can be:
- Detailed or atomic data: usually basic objects or transactions such as products, orders or customers.
- Summary data: used to manage the enterprise; it gives an overall view of business activity.


3. Read/write or read-only data: read/write data differs fundamentally from read-only data in how it is used and managed:
- Read/write data requires carefully designed update processes that preserve the integrity of the business rules. Its structure is optimized for writing to a database or file.
- Read-only data is designed to be used many times (repeatedly).

4. Data over time: the temporal nature of data describes its position on the time axis.
- Current data is a view of the business at the present moment. It is instantaneous data and therefore changes over time as business activity proceeds.
- Point-in-time data is a snapshot of the business data at a given moment, reflecting the state of the enterprise at that moment. Such data usually presents a view of the past; it can also be used as a plan or a forecast.
- Periodic data is a very important extended class of data. It shows how the business changes over each period of time.
These concepts are the basis for handling the historical data (periodic data and past snapshots) held in the warehouse.

[Diagram: a time axis labelled Past, Present and Future, with Point-in-time data marked on it]
Figure 2.1 Data types along the operational timeline

2.1.2 Three types of business data

The characteristics described above make it possible to identify three types of business data. The classification is based on structured data for two reasons: first, structured data is always the first to be loaded into the warehouse; second, the differences between the three types are clearer for structured data.

1. Real-time data: detailed, up-to-the-second data used to run the business and accessed in read/write mode through predefined transactions. Real-time data is created, manipulated and used by operational or production applications. It can be organized as files or databases. Examples of real-time data:

Data | Industry | Usage | Technology | Volumes
Customer file | All | Customer details | Legacy application, flat files, mainframe | Small-medium
Account balance | Finance | Control account activities | Legacy application, hierarchical database | Large
Point-of-sale data | Retail | Generate bills, manage stock | Client/server, relational database, UNIX system | Very large
Call record | Telecommunication | Billing | Legacy application, hierarchical database | Very large
Production record | Manufacturing | Control production | New application, relational database | Medium

2. Derived data: point-in-time or periodic data, at detailed or summary level, read-only, obtained by processing real-time data and used to manage the business. Derived data is the data set normally used for decision support. New data can be derived from combinations of existing fields or data records. Consider the following examples of derived data:

Data | Industry | Usage | Technology | Volumes
Sales summary | All | Historical sales patterns by month and year | Spreadsheet, PC-based | Small-medium
Market analysis | Retail | Analysis of campaigns by area | Multidimensional database, parallel computing | Very large, detailed
Claims analysis data | Insurance | Pattern analysis, detection of fraud | Relational database, mainframe | Very large, detailed

3. Reconciled data: reconciled data is a special kind of derived data, produced by a process designed to guarantee the internal consistency of the resulting data. The process works on real-time data at the detailed level and maintains or creates historical data.
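The step from real-time data to derived data (item 2 above) can be sketched in a few lines of Python. The transaction records and the function name `monthly_sales_summary` are invented for illustration; the sketch only shows detailed read/write records being turned into a read-only periodic summary.

```python
from collections import defaultdict
from datetime import date

# Hypothetical real-time point-of-sale transactions (detailed data).
pos_transactions = [
    {"sold_on": date(2002, 1, 5), "product": "tea", "amount": 12.0},
    {"sold_on": date(2002, 1, 20), "product": "rice", "amount": 30.0},
    {"sold_on": date(2002, 2, 3), "product": "tea", "amount": 8.0},
]


def monthly_sales_summary(transactions):
    """Derive a periodic (monthly) summary from detailed transactions."""
    totals = defaultdict(float)
    for tx in transactions:
        period = (tx["sold_on"].year, tx["sold_on"].month)
        totals[period] += tx["amount"]
    # A sorted tuple is returned so that the derived data cannot be
    # modified in place, reflecting its read-only nature.
    return tuple(sorted(totals.items()))


summary = monthly_sales_summary(pos_transactions)
print(summary)
```

The summary is recomputed from the detailed records instead of being edited directly, which is what distinguishes derived data from the real-time data it comes from.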

2.1.3 Unstructured business data

Traditional management information systems usually hold highly structured data. Data of this kind typically has the following characteristics:
- Each entity has many attributes (many fields per record or many columns per table).
- The entities are related to one another.
- Most attribute fields are small.

Unstructured data has the opposite characteristics; it is heterogeneous. Images, sound and film are examples of unstructured data. The importance of unstructured data is growing, both in business and in information systems.

Unstructured data is large, hard to manipulate and poorly supported by databases and other tools. Even so, a warehouse cannot normally do without data of this kind, but it is brought into the warehouse only after the structured data has been loaded. Unstructured real-time data corresponds to electronic images of business transactions that cannot easily be broken down into more specific data fields. Unstructured derived data can be regarded as a summary or abstraction of the real-time data, as with structured data.

2.2 Metadata

Metadata is data about the data used in the DW; it answers the questions who, what, when, why and how about the data. These attributes are used to build, maintain, manage and use the DW. Metadata is one of the most important aspects of the DW. At a minimum, metadata must describe the data held in the DW, including:
- The location and description of the DW and its data components (the objects of the DW).
- The names, definitions, structure and content of the DW, together with the users' views.
- Identification of the authoritative data sources.
- The data transformation and integration rules used in the DW, including the mappings from the operational databases to the DW and the transformation algorithms.
- The data transformation and integration rules used to deliver data to end users.
- Descriptive information about the information delivery system.
- Operational information about the DW, including the history of DW updates, refreshes, data replication, and so on.
- The metrics used to analyze the usage performance and effectiveness of the DW.
- The data security scheme and the access control lists.

The most important task is to describe what is in the data warehouse and how its contents are related. A suitable model for metadata is therefore the Entity-Relationship Model, or a class diagram in UML or OMT (Object Modeling Technique). These models contain entities, attributes and relationships. An Input-Output Object (IO_O) describes the data objects that enter or leave the data warehouse in the data model. Data Elements describe the basic units of accessible facts, such as the columns in a database. Relationship Members describe the participation of entities in a given relationship.

Example: the relational structure in Figure 2.2 makes it possible to define metadata about the relationships between entities. The relationship between the entity Employee and the entity Skill is Employee-Skill, shown at the metadata level in 2.2 (a); the corresponding data is shown in 2.2 (b).
[Diagram: three entities. Employee (Social Security Num); Employee-Skill (Social Security Num (FK), Skill Code (FK)); Skill (Skill Code)]

Figure 2.2 (a) Relational metadata
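The metadata concepts just introduced (Input-Output Object, Data Element, Relationship Member) can be made concrete with a small Python sketch. The class names, the `link` helper and the field layout are assumptions made for illustration and are not the repository design of any particular product; the relationship identifiers 12 and 13 are the ones used in Figure 2.2 (b).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IOObject:
    """An Input-Output Object with its data elements (field names)."""
    name: str
    fields: tuple


@dataclass(frozen=True)
class RelationshipMember:
    """Participation of one IO object in a relationship through a field."""
    relationship_id: int
    io_object: str
    field_id: str


employee = IOObject("Employee", ("Social Security Num",))
employee_skill = IOObject("Employee Skill", ("Social Security Num", "Skill Code"))
skill = IOObject("Skill", ("Skill Code",))


def link(relationship_id, parent, child, field_id):
    """Return the two member rows that tie parent and child together."""
    if field_id not in parent.fields or field_id not in child.fields:
        raise ValueError(f"{field_id!r} must exist in both objects")
    return [
        RelationshipMember(relationship_id, parent.name, field_id),
        RelationshipMember(relationship_id, child.name, field_id),
    ]


members = (
    link(12, employee, employee_skill, "Social Security Num")
    + link(13, employee_skill, skill, "Skill Code")
)
for member in members:
    print(member.relationship_id, member.io_object, member.field_id)
```

Each printed row is one Relationship Member entry: a relationship identifier, the participating object, and the field through which it participates.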


Input-Output Object
IO Object ID
Employee
Employee Skill
Skill

Relationship
Relationship ID
12
13

Relationship Member
Relationship ID | IO Object ID | Field ID*
12 | Employee | Social Security Num
12 | Employee Skill | Social Security Num
13 | Employee Skill | Skill Code
13 | Skill | Skill Code

Figure 2.2 (b) Instance of the metadata in the preceding figure

2.2.1 Three types of metadata

Metadata is divided into three types.

1. Business metadata

Business metadata contains information that helps users understand the context of the information stored in the DW. It includes information for all end users about:
- Subject areas and types of information objects, including queries, reports, images, video and audio clips.
- Home pages on the Internet.
- Other information that supports all components of the DW, for example information about the information delivery systems: schedules, details of delivery destinations, and query objects such as predefined queries, reports and analyses.
- Operational information about the DW, such as data history (snapshots, versions), ownership, audit trails and data usage.
- Descriptions of the DW attributes, giving business names, definitions, descriptions and aliases.

This information is intended to answer questions such as:
- What is the business name of a column, and what are its definition, description or aliases?
- How are the columns related to one another?
- Where can the information be found?
- Where does the data come from, and what is the source system?
- Which business rules and transformations are applied to the data at each business stage?
- Who owns the data? Knowing who holds the data is essential for changes, access, and/or questions about the data.
- When was the data last refreshed?

2. Technical metadata

Technical metadata contains information about the data in the DW for the designers and administrators who develop and manage it, including:
- Information about the data sources, both operational sources and source systems outside the DW environment: location, file names, file types, field names and characteristics, aliases, version information, relationships, size, volatility, data owners and authorized users.
- Descriptions of the transformations, for example how the operational database is mapped onto the DW and the algorithms used to transform, enhance or convert the data.
- Definitions of the data structures and objects in the warehouse environment for the target data.
- The rules used to cleanse and enhance the data.


- The data mapping operations applied when data is taken from the source systems and placed in the target database.
- Access rights, backup history, and information about archiving, information delivery, data acquisition, data access, and so on.

This information is determined from questions such as:
- Which data extraction jobs have been run?
- Which transformations have been applied to the data?
- Which data resides in the source systems and how is it processed?
- What is the physical layout of the target database, how is the catalog organized, and how are the tables accessed?

3. Operational metadata (OM)

OM helps in maintaining and deploying the DW. It describes the information held in the target tables: the level of granularity, the ability to create the target database (creating the tables and index information), whether information is archived or online, refresh dates, record counts, job schedules and the users able to access the data. OM answers questions such as:
- How long is information kept in the target table?
- What information do the target tables contain, and at what level of granularity?
- When are jobs scheduled to run?
- When did the jobs actually run?
- What are the numbers of records going in and out?
- When was the target table last loaded, how many records were loaded, and were any records rejected?
- Are there common queries run against a table, does the table need another index, and should a summary table be built?

Metadata directly supports users, helping them understand the content and find the data they need. In practice, data extraction tools and metadata are still poorly integrated, so metadata-based interfaces must be created for users.

Metadata is stored, managed and classified through a metadata repository and its accompanying software. Repositories are classified using a classification scheme called the information model. This model contains a list of metadata types and the relationships between them. The repository is a flexible, general-purpose metadata management tool.

Repository management software can be used to map source data to the target database, to generate code for data integration and transformation, and to control the movement of data into the DW. The software runs on a workstation and lets users see exactly how data is transformed, for example whether it is mapped, converted or summarized. Most available repositories use a relational database to store and manage metadata. Some newer repository solutions are based on object-oriented database management systems (OODBMS).

Metadata defines the content and location of the data in the DW, the relationship between the operational databases and the DW, and the DW data views that end-user tools can access. End users need metadata whenever they need data definitions or subject areas. In other words, metadata provides pointers that direct decision support to the DW and supplies the logical link between the DW and the decision-support applications.

A DW is designed to guarantee a mechanism for generating and maintaining the metadata repository, and every access path into the DW has metadata as its entry point. A well-designed DW must prevent any direct access to the DW (especially the ability to change data) that does not go through the metadata definitions. A metadata repository implemented in this way brings the following benefits:
- It provides an intelligent tool set for managing metadata across the whole company.
- It reduces and eliminates redundant, inconsistent and under-used information.
- It simplifies management and improves the organization, control and accounting of information assets.
- It improves the identification, understanding, coordination and use of the company's information assets.
- It provides effective data administration tools and better management of information assets through a full-function data dictionary.
- It increases the flexibility, control and reliability of the application development process and speeds up application development.
- It promotes investigation of the operational systems, with the ability to inventory and reuse existing applications.
- It provides an overall relational model through which heterogeneous RDBMSs can interact and share information.
- It follows CASE development standards and eliminates redundancy through the ability to share and reuse metadata.

A recurring problem in the DW is how to communicate to end users what information is in the DW and how it can be accessed. Metadata is the means by which users and applications reach the information stored in the DW. It can define all the data elements and their attributes. Metadata must be collected while the DW is being designed and built. Metadata must be available to all DW users to guide them in using the DW. Supporting tools are also available and should be evaluated before a purchase decision is made.

One of the important functional components of the metadata repository is the information directory. The directory stores and manages the metadata and is tied to the DW applications. It can be accessed by all programs inside the DW, such as extraction and transformation programs. It can also be accessed by end users for viewing, retrieving and querying data. The content of the information directory is metadata that helps both technical and business users exploit the power of the DW environment. The directory helps to integrate, maintain and view the contents of the DW system.

Based on the technical requirements, the information directory and the metadata repository should:
- Be a gateway to the DW environment, accessible from any platform through transparent connections.
- Support the distribution and replication of their contents with high performance and availability.
- Be searchable by business-oriented keywords.
- Serve as the basis for end users' analysis and data access tools.
- Support the sharing of information objects such as queries, reports and data collections, and collaboration between users.
- Support a rich set of scheduling options for requests, including on_demand, one_time, repetitive, event_driven and conditional delivery (relating to the information delivery system).
- Support the distribution of query results to one or more target workstations in any user-specified format.
- Support and provide interfaces to other applications such as electronic mail, spreadsheets and scheduling.
- Support end users in monitoring the state of the DW environment (relating to the management and administration components).


All components of the DW need metadata and can obtain data from it. Metadata is stored in a central area. Metadata can appear in many formats and can be transparent.

2.2.2 Characteristics of metadata

1. Historical: metadata shows end users how the DW has changed over time; it also records how the DW was constructed and how it has evolved.
2. Time-stamped: it must be known for which period a metadata definition represents the information. For example, metadata may be created for the second iteration before that iteration has been produced in the DW.
3. Not easily changed: metadata should be entered in one place, and updates should be made against that same original copy.
4. Open: metadata can be collected from and shared between different applications.
5. Read-only: end users are not allowed to update, delete or insert metadata. End users may customize their own views of the metadata. Control over incoming information is exercised by a designated group of people.

2.3 Virtual data warehouse

When building and exploiting a DW, the concept of the Virtual Data Warehouse (VDW) is also often used as a way to implement a DW quickly without copying many data sets. A VDW is a logical data warehouse in which users are given direct access to many different operational data sources through intermediate tools. The VDW is used to extend the capability of the network to every user tool, so that the required real-time data and derived data can be accessed from anywhere on the network.
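The idea of a virtual data warehouse can be sketched as a thin layer that answers each question from the live sources and stores nothing itself. The two in-memory SQLite databases, their tables and the `VirtualWarehouse` class are hypothetical and stand in for real operational systems and middleware.

```python
import sqlite3


def make_source(ddl, insert_sql, rows):
    """Create an in-memory database standing in for an operational source."""
    conn = sqlite3.connect(":memory:")
    conn.execute(ddl)
    conn.executemany(insert_sql, rows)
    return conn


orders_db = make_source(
    "CREATE TABLE orders (customer TEXT, amount REAL)",
    "INSERT INTO orders VALUES (?, ?)",
    [("Tran Anh", 120.0), ("Tran Anh", 30.0)],
)
billing_db = make_source(
    "CREATE TABLE invoices (customer TEXT, unpaid REAL)",
    "INSERT INTO invoices VALUES (?, ?)",
    [("Tran Anh", 50.0)],
)


class VirtualWarehouse:
    """Logical warehouse: queries the sources on demand, keeps no copy."""

    def __init__(self, orders, billing):
        self._orders = orders
        self._billing = billing

    def customer_overview(self, customer):
        ordered = self._orders.execute(
            "SELECT COALESCE(SUM(amount), 0) FROM orders WHERE customer = ?",
            (customer,),
        ).fetchone()[0]
        unpaid = self._billing.execute(
            "SELECT COALESCE(SUM(unpaid), 0) FROM invoices WHERE customer = ?",
            (customer,),
        ).fetchone()[0]
        return {"customer": customer, "ordered": ordered, "unpaid": unpaid}


vdw = VirtualWarehouse(orders_db, billing_db)
before = vdw.customer_overview("Tran Anh")

# A new operational transaction is visible immediately: nothing was copied.
orders_db.execute("INSERT INTO orders VALUES (?, ?)", ("Tran Anh", 10.0))
after = vdw.customer_overview("Tran Anh")
print(before, after)
```

The trade-off is visible in the sketch: results are always current, but every query runs against the operational sources and competes with their own workload.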


Chapter 3: Data warehouse architecture

- Data architectures
- Data architectures for metadata
- Logical architecture
- Business data
- Data warehouse catalog (DW catalog)
- Functions of the DW

3.1 Business data architecture

There are three data architecture models:
- Single-layer data architecture,
- Two-layer data architecture,
- Three-layer data architecture.

Business data comprises all the data used in carrying out and managing the business; it is often also called enterprise data.

3.1.1 Single-layer data architecture

The main principle of this architecture is that every data element is stored once and only once. This is also its main strength: it minimizes storage space and avoids the problem of keeping multiple copies of the data synchronized and consistent.

[Diagram: the Operational System and the Informational System both work on a single store of Real-time Data, the enterprise data]

Figure 3.1 Single-layer data architecture


Example: this architecture suits a warehouse of geological and geographic (GIS) information, such as the data used for oil exploration. The data involved is usually very large, and analyzing it requires searching many extremely detailed samples.

Weaknesses of the single-layer architecture:
- Operational applications and informational applications usually contend with each other, so data is not delivered in time.
- Distributed use of the data is not supported.

Note: in practice, developing a DW of historical data from a single source is usually faster than developing a similar warehouse from many different sources. As mentioned earlier, the Virtual Data Warehouse (VDW) can be used as a way to implement a DW quickly without copying many data sets.

3.1.2 Two-layer data architecture

An improvement on the architecture above is to separate the data areas used by the two kinds of system: the operational system and the informational system.

[Diagram: the Operational System works on Real-time Data (lower layer); the Informational System works on Derived Data (upper layer)]

Figure 3.2 Two-layer data architecture

The lower layer holds the data used by operational applications, with both read and write access; this is the real-time data. The upper layer holds derived data intended for informational applications. Derived data can be computed from real-time data or can simply be a copy of it.

Advantages:
- The architecture resolves the contention between the two kinds of system found in the single-layer architecture.
- It supports end users with differing needs for filtering the data stored in the real-time data area. In other words, many different sets of derived data can be produced from the same real-time data.

Disadvantages:
- Data may be heavily duplicated. Duplicated data costs storage space and, more importantly, makes management and maintenance far more complex.
- There is no one-to-one correspondence between real-time data and derived data. Example:

[Diagram: Operational System and Informational System, with Derived Data above Real-time Data]

Figure 3.3 Possible mismatch between real-time data and derived data in the two-layer architecture

Despite these disadvantages, the architecture is still used. It is used by the Info Center to build data warehouses of scientific and applied information. Today, the need to distribute data to the PCs of many users across wide area networks (WAN) and local area networks (LAN) calls for a different solution: the three-layer architecture.

3.1.3 Three-layer data architecture

The key point of this architecture is that real-time data is turned into derived data in two steps instead of one, as in the previous architecture:
1. Data from the real-time data sets is reconciled into an intermediate layer.
2. The reconciled data supplies derived data according to user requirements.

[Diagram: Operational System and Informational System, with three data layers from top to bottom: Derived Data, Reconciled Data, Real-time Data]

Figure 3.4 Three-layer data architecture

The intermediate reconciled data layer is in effect a way of normalizing the database. Its main purpose is to collect the varied data from the distributed operational information systems and combine it into a single data picture for the whole enterprise.

Example: data from the Customer File of the order-entry system and of the invoicing system is merged into a customer table (Customer table) in the business information system.
Order-entry System, Customer File: ID, Name, Order Addr
Invoicing System, Customer File: ID, Name, Bill Addr
Order-entry System, Customer Table: ID, Name, Order Addr, Bill Addr, Ship Addr

Figure 3.4 Example of combining data

Characteristics of the three-layer architecture:
- It supports requests for new information from the data.
- It supports re-engineering of the operational applications.

- It reduces the amount of data that has to be managed.
- It reduces data duplication.
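The customer example for the reconciled layer can be sketched as follows. The sample records, the customers other than Tran Anh and the function `reconcile_customers` are invented for illustration; the column names follow the Customer Table in the figure. No source in the example supplies a shipping address, so that column stays empty.

```python
# Hypothetical extracts of the two Customer Files.
order_entry_customers = [
    {"id": 1, "name": "Tran Anh", "order_addr": "12 Tran Hung Dao"},
    {"id": 2, "name": "Le Binh", "order_addr": "5 Le Loi"},
]
invoicing_customers = [
    {"id": 1, "name": "Tran Anh", "bill_addr": "PO Box 7"},
    {"id": 3, "name": "Pham Chi", "bill_addr": "9 Hai Ba Trung"},
]


def reconcile_customers(order_entry, invoicing):
    """Merge both Customer Files into one Customer Table keyed by ID."""
    table = {}
    for source in (order_entry, invoicing):
        for record in source:
            row = table.setdefault(
                record["id"],
                {"id": record["id"], "name": record["name"],
                 "order_addr": None, "bill_addr": None, "ship_addr": None},
            )
            # Copy whichever address columns this source provides.
            for column in ("order_addr", "bill_addr"):
                if column in record:
                    row[column] = record[column]
    return [table[key] for key in sorted(table)]


customer_table = reconcile_customers(order_entry_customers, invoicing_customers)
for row in customer_table:
    print(row)
```

Customers known to only one system still get a row, with the missing address left empty; this is one of the decisions a real reconciliation process has to make explicitly.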
3.2 Metadata architecture

As with business data, dividing metadata into several types makes it necessary to define a suitable data architecture for the metadata. The architecture commonly used for metadata today is tied to the three-layer architecture. The metadata structure consists of three interacting parts, as follows:

[Diagram labels: Build-Time, Control and Usage metadata; Business data; End user; Limited write access; Read-only access]

Figure 3.5 Data architecture of metadata

Build-time metadata

Build-time metadata is usually created and managed with tools that help to define and represent business information in a meaningful way. These are data modelling tools such as CASE tools (McFadden and Hoffer 1994). Build-time metadata is established by the CASE tools through the structure of the data, the way it is stored and the time at which it is captured. The structures of build-time metadata usually reflect the needs of designers and of application and database developers, so they are sometimes not well suited to end users. End users have differing skills and often need limited ability to update data, so they must be guided (controlled) with care.

Control metadata

Control metadata describes the currency and the utilization of the business data. Currency data is produced and updated from the business data by application programs or tools. Currency metadata exists at several levels of detail. It is stored at two levels:
- Currency metadata held at the file / table level (File/Table)
- Currency metadata held at the record / row level (Record/Row)
The same applies to utilization metadata. Establishing and maintaining utilization data is the responsibility of the tools used to access the data.

Usage metadata

The importance of usage metadata became apparent only with the advent of the DW and the large volumes of data created for users to exploit.
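The two levels at which currency metadata is kept, described under control metadata above, can be illustrated with a small lookup that prefers row-level information and falls back to table-level information. The dictionaries, the timestamps and the function `last_updated` are hypothetical.

```python
from datetime import datetime

# Currency metadata at file/table level: one timestamp per table.
table_currency = {"customer": datetime(2002, 4, 14, 23, 0)}

# Currency metadata at record/row level: one timestamp per (table, key).
row_currency = {("customer", "TA3"): datetime(2002, 4, 15, 9, 30)}


def last_updated(table, row_key=None):
    """Return the most specific currency timestamp that is available."""
    if row_key is not None:
        row_level = row_currency.get((table, row_key))
        if row_level is not None:
            return row_level
    return table_currency[table]


print(last_updated("customer", "TA3"))  # row-level entry exists
print(last_updated("customer", "XY9"))  # falls back to the table level
```

Row-level currency is more precise but costs more to store and maintain, which is why both levels are usually combined.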
3.3 Logical architecture of the DW

As analyzed above, business data comprises three types: real-time data, reconciled data and derived data. Each type can reside in a different layer, and each layer has its own physical structure. The logical data architecture of the DW can be built as follows:

[Diagram: Derived data = Business Information Warehouse; Reconciled data = Business Data Warehouse; Real-time data = Operational System. The Business Information Warehouse and the Business Data Warehouse together form the Data warehouse]

Figure 3.6 Three-layer architecture of the DW

Operational System

The operational system consists of the application software that carries out the business functions, with its data stored in file systems or databases. The operational systems contain the data sources of the DW. The data is created in the enterprise's day-to-day transaction processing systems.

Business Data Warehouse (BDW)

The BDW is the physical implementation of the reconciled data, designed to control and supply simple, consistent data to end users. The BDW has the following characteristics:
- Detailed,
- Historical,
- Consistent,
- Normalized.
The BDW is rarely used directly by users. It is the source of data for the business information warehouse.

Business Information Warehouse (BIW)

The BIW is the information system used for reporting, analysis and forecasting about the business. It includes management information, decision support and executive information systems, such as market analysis systems and information exploitation applications. A BIW is built either directly from the BDW or indirectly from other BDWs.

3.4 Business data

In the logical architecture all data flows run in one direction: from real-time data to reconciled data and finally to derived data, that is, from the OS to the BDW and then to the BIW. Many business requirements, however, call for data to flow in the opposite direction. These requirements are:

- Correction. When end users detect errors, they usually correct the data in their own private copy, and they may also want the original data corrected to keep the business consistent. Such corrections are needed in the operational systems, the BDW and the BIW.
- Adjustment. Adjustment is a change in the categories of data. Data is first used in its original form, but users later need to use or analyze it in different ways. This leads to the need to change data in the BDW and sometimes in the operational systems as well.
- Data reuse. Derived data can itself become input to other business processes. For example, when analyzing customers' purchasing needs, a manager may ask for some of the classified customer groups to be combined. The new customer groups are then used as a basis in the business systems; in other words, the operational system requires this information.
- Predictive data. Data used to predict trends and sets of operational states originates in the BIW and must be used to reset data in the operational systems. For example, the results of analyzing raw material prices from the derived data (in the third layer, the derived data layer) make it possible to recalculate the selling prices of the products the company makes and sells.

Personal data and public data

Personal (private) data is data controlled and managed by individual owners. In the three-layer architecture, personal data is allowed in the real-time data layer and in the derived data layer. At the conceptual level the two kinds of data need not be distinguished. There is no private data in the intermediate, reconciled data layer. At the logical level, however, public and private data must be distinguished in the derived data layer. Personal data warehouses can also be built from BDWs or from other BIWs. In the three-layer architecture shown below, real-time data belonging to a specific scope must be consolidated before it becomes part of the public BDW.

[Diagram labels: Business Informational warehouse with Personal and Public data; Business data warehouse with Public data; Operational System with Personal and Public data; the upper two layers form the Data warehouse]

Figure 3.7 Private and public data in the DW

3.5 Data warehouse catalog

The Data Warehouse Catalog (DWC) is the physical store of all the usage and control metadata in the DW, partitioned and distributed among the BDW and the many BIWs. It is the subset of the data warehouse often called the business data directory, the information directory, and so on. It contains all the metadata needed to use and administer the data warehouse. Normally this is all the usage metadata, part of the control metadata of the BDW and the BIW, and part of the usage metadata of the operational system (OS), as shown in Figure 3.8.

Build-time metadata is not held in the BDW, because building the DW is logically separate from using and managing the data in the warehouse. Part of the BIW control metadata need not be placed in the catalog, because it only supports setting up the structure of the warehouse. Part of the BDW control metadata handles scheduling related to the currency of the data. The catalog is used to implement the DW and to access the data in the data warehouse.

[Diagram labels: Build Time metadata; DW Catalog; Usage and Control metadata for each of the Business Informational warehouse, the Business data warehouse and the Operational System]

Figure 3.8 Catalog of the DW

3.6 Functions of the DW

As analyzed above, the three-layer architecture suits the DW: it reflects the relationships between the data and matches user requirements. The following functions relate to the components of this architecture:
1. Populating the BDW: collecting data from the operational systems.
2. Populating the BIW: collecting data from the BDWs.
3. Populating the catalog (DWC): collecting data from the metadata produced when the DW is built.
4. Administering the data warehouse system: storage, processing, access, transfer, and so on.

The first three functions gather data for the warehouse in line with the three-layer architecture. The fourth provides users with the services and utilities to exploit and process the data so as to meet everyday information needs.

5. Users can use data and metadata in different ways. Data can be explored and analyzed to obtain the required results, whereas metadata can only be explored, not analyzed; it merely helps us understand the data. Analysis searches for facts that exist in the data, while exploration is limited to finding relationships between different aspects of the data. This difference leads to two functional components:
- Business Information Interface (BII): provides the functions required for data. All access to data in the warehouse goes through the BII.
- Business Information Guide (BIG): provides the functions required for metadata. This function uses the DWC to understand the meaning and the benefit of the metadata before it is used.

[Diagram labels: Business Info. Interface; Business Info. Guide; Business Informational warehouse; Data warehouse catalog; BIW population; Business data warehouse; BDW population; DWC population; Build-time Metadata; Operational System; Data warehouse management: Data access, Process management, Data transfer, Archive and retrieval, Database management]

Figure 3.9 Complete logical architecture and functions of the DW

From these overall architectures we can give an overview of the data warehouse architecture, showing how data is loaded into the warehouse, how information is accessed and processed, and so on.

[Diagram labels, top to bottom: Legacy Systems; Gather; Information Acquisition Layer; Refine, Aggregation, Store; Information Store Layer; Data Warehouse; Staging Process; Data Marts (Credit management, Customer relationship, Financial control, Product Management); LAN; Information Delivery Layer]

Figure 3.10 Overview of the data warehouse architecture

One of the important functions of the DW is data warehouse management, which comprises the functions responsible for manipulating and administering all data in the warehouse environment and the components that have been built. These functions are:

- 43 -

Data Warehouse

1. Data access. All access to the DW goes through the BII. This functional block consists of two sub-blocks: an access sub-block and an analysis and reporting sub-block.
The first sub-block:
- Accesses the data warehouse directly.
- Accesses the data marts.
- Reprocesses and transforms data into more complex data structures.
The second sub-block:
- Provides standard tools for reporting, analysis, and business modeling.
- Provides decision support software and data mining software.
Both sub-blocks have their own metadata management mechanisms.

2. Process management. The components of the DW may run in different environments. The processes that build the BDW, BIW, and DWC can be partly independent, but for the most part they depend on one another. For example, construction of the BIW can start only once the BDW has been completed.

3. Data transfer. This function is responsible for physically moving data into or within the DW system. Transferring data over a network involves the following kinds of systems:
- Network protocols such as TCP/IP (the common rules for data exchange).
- Network management mechanisms, for example IBM NetView and SunSoft SunNet Manager.
- Network operating systems.
- Network types, for example Ethernet and Token Ring.
This function is supported by the following kinds of facilities:
- Database gateways, which translate between protocols.
- Message-oriented middleware, for example IBM MQSeries.
- Replication and propagation systems, such as IBM's relational data propagator.
Data security and access authorization requirements must also be satisfied during transfer.

4. Security. The DW holds data about an organization or enterprise, so there is always a need to control who may access and use that data. This is an important issue.

5. Database management. A data warehouse can be regarded as a collection of databases, both centralized and distributed, so managing them is essential. It comprises two main functions: data management and metadata management.

Data management. The data warehouse is itself a large information system, so, as in ordinary operational database management systems, data management plays a very important role, especially when a very large volume of historical and current data of many rich and varied kinds, stored on many types of media, has to be managed. Data management provides the working environment for the functional blocks themselves. Functions such as loading, reloading, extracting data, enforcing security, archiving, and restoring data in the data warehouse all depend on the data management layer. Its main functions are:
- Copying the relevant data from the selected data sources for refinement and reprocessing in the data warehouse.
- Monitoring and responding to requests for new data drawn from the various data sources.
- Preserving the data in the operational data sources, and reloading or updating and cleaning data.

In addition, the data management layer unifies the data management methods, procedures, and operations used for security, access authorization, archiving, and recovery. Parallel processing of queries, and recovery when parallel processing is used for data access, are also managed in this layer. The data management layer therefore has new management functions that differ from those of a conventional database management system.

Metadata management. Because of the variety of data types and the new data management methods compared with operational DBMSs, the amount of data used to define and identify data types, processing methods, data management methods, tables, and so on grows very large in a data warehouse, so its management must be planned for. A metadata management layer must therefore be set up in the warehouse to store and process this data. Metadata appears everywhere in data warehouse design. Data sources are characterized by the definitions of the data they supply. Adding time stamps requires defining those time stamps in the metadata, and so on. The metadata management layer also manages the data that fully and completely describes the data stored in the DW.


The main functions of this layer are to copy, create, store, restore, clean, and update the following metadata:
- The physical and logical data models of the data warehouse and data marts, together with the corresponding schemas and the technical and business glossaries stored and managed in them.
- Standard data definitions (both technical definitions and business descriptions) of the data stored in the DW.
- Metadata maintained and created in the refinement and reprocessing blocks.
- Metadata from the partitioning, joining, and summarization processes.
- Metadata describing reports and queries.
- Metadata describing the indexes and annotations used to access data.
- Metadata describing the rules that determine when data is copied, updated, and reloaded.

Part II: Building the Data Warehouse


Chapter IV: Analyzing the Data Warehouse Requirements
Planning; investigating and determining the requirements; analyzing the requirements.

Like other software systems, a data warehouse (DW) has a development cycle in which it is continually improved and refined. To develop a DW we carry out the following steps in order: planning, determining the requirements, analyzing the components, design, implementation, testing, and maintenance of the data warehouse system.
4.1 Planning

Once it has been agreed to set up a data warehouse development project to serve an agency, an organization, or its departments, the first task is to draw up an implementation plan with the following steps:

[Figure labels: determine the system development strategy; choose the method and model; planning; determine the system objectives; determine the system scope; build the system architecture; collect metadata.]
Figure 4.1 Steps in planning the construction of a DW

1. Determine the implementation strategy
This is the first and a very important step; it decides the organizational structure of the data warehouse. There are three main approaches:
- Top-down.
- Bottom-up.
- A combination of the two.
All three approaches involve both the business and the technology.

[Figure labels: technology; work and business; top-down approach; bottom-up approach; combined approach.]
Figure 4.2 Approaches to data warehouse development

The implementation strategy is chosen according to the conditions and business situation of the organization.
The top-down approach is chosen when:
- The development team has a firm command of information technology and experience in application development.
- The managers and investors have clearly defined the purpose and requirements of the system to be developed.
- The managers and investors have a clear idea of where the data warehouse will be used and how it will be used to support decisions.
The bottom-up approach is chosen when:
- The technology has not yet been settled and many new technologies still need to be studied and evaluated.
- The managers and investors have not yet clearly defined the purpose and requirements of the system to be developed.
The combined approach is chosen when:
- The development organization has experienced specialists with a firm command of information technology in application development.
- There is also a team ready to build the data warehouse, and the places where the data warehouse is needed have been clearly identified.

2. Choose the development method and model
There are two basic methods for developing a data warehouse system:
- The function-oriented method: functions are primary and data is secondary.

- The object-oriented method: the system is viewed as a set of objects, so the main focus is on data.
Each method has its strengths and weaknesses. The choice depends on the technology support available and on the capability of the project staff. Each method also comes with a number of models to choose from, such as the waterfall model, the fountain model, and the spiral model.

3. Determine the objectives of the data warehouse
Defining a data warehouse is very complex, because a data warehouse is a system of large, complex databases holding an enormous volume of data that is usually heterogeneous and covers many different fields. In addition, new technologies and new conditions appear constantly, and managers, analysts, and users hold differing notions of information and data, all of which makes it harder to determine the objectives. To help pin down the project objectives during planning, the following questions should be used:
- What is the market (the potential users) of the data warehouse? The answer depends on how well the need for, and value of, the data system is understood. The choice of target users must take into account the ability to work over wide area and local area networks.
- Which areas are using, or will have to use, the data warehouse? The answer should consider a very broad, multidimensional application domain that includes servers, workstations, network clients working in multimedia environments, high-technology communication systems, and so on.
- What has to be planned, chiefly which features and functions? They include:
  o Visible features and functions, which users can see and which external agents, including people outside the organization, may also use.
  o Hidden features and functions, which are not visible and which support the organization, processing, and administration of data in the warehouse.
- Which data sources can or must be integrated into the data warehouse?
- When will the data warehouse be deployed?

4. Determine the scope of the system
In most organizations, the reason for developing a data warehouse is to meet the need to manage and exploit information in order to carry out work, or to support decisions in managing and running the work of a group of people, a department, or the whole organization. The scope of the project can be considered along several dimensions:

[Figure labels: number and types of users served; number of data sources; set of models chosen; budget and financial capacity; project duration.]
Figure 4.3 Some dimensions that bound the scope of the system

Two groups of factors affect the scope of the system:
a/ Factors concerning the development prospects of the system. To determine them we must answer:
- Who, and which departments, need to use the data warehouse?
- What is the range of questions and information queries?
- How many dimensions does the data space have, how many reports are there, how much data must be loaded into the warehouse, and so on?
The answers to these questions are obtained by repeatedly merging, summarizing, and reprocessing the results of selection and cleaning operations over many different data sources.
b/ Factors that depend on technology. To determine them we must answer:
- How large is the metamodel of the data warehouse?
- How much data will the warehouse hold?
- Which data sources supply the data, and how many are there?
- How usable is the data from those sources? Are the data and the documentation describing the sources of good quality? The quality of the data and of the sources strongly affects the work of filtering, summarizing, and cleaning data before it is loaded.
- What support do the technology and the CASE tools provide?
- Are there similar data warehouses? How many people have experience with data exploitation tools?

5. Choose the architecture
A data warehouse can be built according to the following architectures:

- Data marts only. This architecture suits departments within an organization that have needs of their own which a simple data warehouse cannot fully meet.
- Data warehouse only. In this architecture the processing applied to the data sources (cleaning, integration, summarization, and so on) is shared by all applications. The logical data warehouse suits all users and supports decision making.
- Data warehouse together with data marts. Each department has its own sub-warehouse, placed within a unified structure called the general warehouse or federated data warehouse. This is the three-tier architecture analyzed at length in Chapter 3.
- Client/server architecture, with two main layers: the server layer and the client layer. The server runs the programs in the warehouse and data marts and stores data into the warehouse. The client runs the programs for exploitation and reporting, stores local data, and so on. Several DBMS products are designed to work in environments based on different hardware and system software:
  o Oracle
  o Sybase
  o Informix
  o DB2/6000
  o Microsoft SQL Server

6. Draw up the program and estimate the budget
- Align the action program with the project program. The action program consists of the overall plans for data warehouse applications and their role in the organization and in society. The project program consists of the concrete plans for implementing the data warehouse; it must follow the work priorities set out by the action program.
- Budget in proportion to the system development program, covering all operating costs: analysis, design, implementation, maintenance, and so on.
- Estimate the return on investment. We must consider:
  + Which costs can be saved by having a data warehouse?
  + What benefits can the system generate?
  + What is the market trend?
  + What competitive advantages are gained?
  + What is the level of customer satisfaction?

4.2 Analyzing the system requirements

The list of requirements plays a very important role not only in specifying and modeling the system but also throughout its construction and maintenance. According to statistics on software effectiveness, a great many finished products are unusable or are used very ineffectively because the system requirements were not determined correctly and precisely. The requirements of a data warehouse include:
- The requirements of the owner
- The requirements of the architects
- The requirements of the developers
- The requirements of the users.

4.2.1 Determining the requirements of the owner and of management
To determine the requirements of the managers and owners of the system, the following questions must be answered:
1. Why build a data warehouse and data marts? Which problems should it concentrate on solving?
2. What is the purpose of the organization or business?
3. Who are the investors, the sponsors, and the customers?
4. How much funding is provided?
5. When must the system be completed?
6. What can be invested in computers, peripherals, auxiliary equipment, network connections, data transmission lines, and so on? Which modern technologies can be applied?
7. What experience is there in carrying out projects?
8. What risks may arise?

4.2.2 Architecture requirements
As noted above, the system architecture is very important; it determines many properties and capabilities of the data warehouse. The architecture is the basis for setting up the components of a data warehouse to meet the present and future needs of an organization. The structural quality of the system depends largely on the following factors:
- The functional scope and the features the system will have
- The use of technology standards, and conformance to standard rules on processes, business practices, and open interfaces
- The extensibility and compatibility of the system.

When building a data warehouse, three kinds of architecture need attention:
1. Data architecture. This describes the data items and their relationships in the system. Data is the basis on which we create, process, and develop applications. The tool traditionally suited to data modeling in the function-oriented approach is the Entity Relationship Model (ERM). For the object-oriented approach, the data model now in common use is the UML language with its class diagrams.
2. Application architecture. The system is viewed as a set of application programs. To serve these programs well, the data warehouse can be regarded as a catalog of the programs that carry out the individual functions, together with their relationships in the system. Each program may create, read, update, write, or delete certain data items in its own data marts.
3. Technology architecture. This describes the technology components: servers, workstations, the graphical user interface (GUI), the DBMS, the data dictionary, and so on.

4.2.3 Determining the requirements of the system developers
Architects are concerned with the abstract model, whereas the people who build the system are concerned with the concrete issues of the data warehouse. They have requirements concerning data, application programs, technology, and development tools, and concerning basic matters such as computers, system software, and communication networks. These requirements are:
Technology requirements
- Knowledge of the data sources and metadata.
- Within the data warehouse: filtering, data administration, metadata, information networks, online operating environments, standardization, and so on.
- Within the data warehouse: the data refinement and regeneration process, the processors, and the operational environment.
- For data marts and metadata: the data refinement and regeneration process, the processors, and the operational environment.
- For the user tools and access block: knowledge of the software that supports access, search, processing, and analysis of data, OLAP, and so on.
Deployment requirements. These concern the ability of the data warehouse to allow access and to deliver the necessary information promptly and conveniently. They include requirements on access methods, messaging, access tools, network connectivity, and so on.
Product requirements. These concern:
- Maintaining the consistency and reliability of the data and the ability to process it concurrently

- Managing the metadata and data models in the warehouse
- Ensuring smooth information exchange between computers, application programs, and the data warehouses
- Managing access rights, with authorization that keeps the data of the whole system secure
- Managing large data warehouses effectively, even extremely large ones
- Continually improving access speed and processing data promptly so that answers are fast and accurate and meet users' requirements. These requirements are usually met through mechanisms such as table organization and indexing.
Requirements on the project participants. To implement a data warehouse that meets the requirements above, there must be a team of qualified, capable staff who cooperate in developing the software. They must be able to grasp the professional and business concepts, have experience in data processing, and work together to complete the project.

4.2.4 Requirements of the end users
Once the data warehouse has been built and the first data loaded into it, end users and data analysts have a basis for supporting decisions in their work. The users' goal is to process information, and they expect to use the data warehouse to do everything they possibly can.

Who are the end users?
The end users of a data warehouse are, first of all, businesses, technicians, and specialists in the relevant fields. Their common goal is to complete their assigned tasks faster and better. The goals of enterprises are to:
- Increase profit and income from business operations
- Expand the market and gain more customers
- Minimize the costs and expenses of production, trading, transactions, and so on.
A data warehouse can thus be seen as a means of helping users work more effectively and productively. Its users include people in marketing and sales, accountants, managers, engineers, statisticians, and so on.

[Figure: the data warehouse surrounded by its users: manager; researcher, engineering; sales; marketing; banking, accounting; actuary; businessman.]
Figure 4.4 Users of the data warehouse

Data warehouse users can be classified into groups such as occasional and regular users, or direct and indirect users.

Metadata and the user
One of the most important aspects for users is metadata. Metadata is data about data. Put simply, metadata is the catalog of the contents (the data) of the data warehouse.
Imagine a world with only raw data and users like those above. One day the director of an agency calls one of them, a data analyst, and asks for some information about a certain area. If the analyst knows where to find the data and how to answer the director's question, all is well. If not, what does the analyst do? Obviously the analyst tries every way of accessing and searching the data to obtain the information requested. With luck, useful data turns up for analysis; otherwise a great deal of time is spent searching, and the data found is often wrong or inaccurate.
Now consider the same scenario with metadata available. The analyst no longer has to search through the warehouse but can rely on the metadata to locate the necessary data and analyze it. Metadata can be compared to a city road map that guides people unfamiliar with the streets quickly to their destination.

Figure 4.5 shows some kinds of metadata that help every user obtain the necessary information quickly.

[Figure: data warehouse metadata at the center, with alias, mapping, metrics, data model, source, security, loading schedule, and DW descriptions around it.]
Figure 4.5 Metadata and the user

- Metrics: measures that make it possible to count and to know the shape of the data in the warehouse, for example the number of data records, the unit of each data field, and so on.
- Alias information: allows a data field to be identified by more than one name. One department may know an item in the warehouse under one name while another department calls the same item something else. This accommodates naming by each unit's own traditional conventions while still allowing standardized data names to be used.
- Data model: the data model is very useful to analysts; it provides a high-level map for finding the required data in the warehouse. The link between the data model and the data tables in the warehouse is very important in strategies for organizing and exploiting information.
- Security: a DW demands a higher level of data security than ordinary databases. Users must be assured that the data they retrieve is correct and is exactly what was recorded from actual business operations.
- Loading schedule: when accessing data, users often need to know when the data was last refreshed and when it was loaded (the previous day, the previous month, the previous year, and so on).
- Data warehouse description: users need to know about the data tables, the fields in each table, and the information describing them.

Querying the data warehouse
Users access the data warehouse through what is called a query. A query is a request to access information in the data warehouse, together with some processing of the data before the results are returned to the user.

User queries are very diverse. Some need to access a great deal of data, while others touch only one or two records. Many queries are repeated over different cycles, while others run only once. The most important issue today is access efficiency: how to obtain the required information from the data warehouse quickly and at the lowest possible cost. Meeting this need requires:
- Building the data warehouse with the most sensible design,
- Improving query execution.
The following techniques can improve the efficiency of queries against the data warehouse:
1. Use indexes to access the data in the warehouse. An index makes it possible to select only the records that are needed, without reading the large volume of other records that are not.
2. Optimize queries so that only the minimum amount of data required is accessed.
3. Use the DBMS resource governor. Some DBMSs have a resource governor that helps queries access exactly the data sources intended.
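The effect of the first technique can be observed in a small experiment. The following is a minimal sketch using Python's built-in sqlite3 module, not the method of any particular warehouse product; the table `sales`, its columns, and the index name `idx_sales_product` are hypothetical names chosen for illustration. It asks SQLite for its query plan before and after an index is created.

```python
import sqlite3

# Hypothetical table and column names, used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (product_code INTEGER, period_code INTEGER, units_sold INTEGER)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(p, t, (p * t) % 50) for p in range(1000) for t in range(1, 13)],
)

def query_plan(sql):
    """Return the 'detail' text of each step in SQLite's plan for the query."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

query = "SELECT SUM(units_sold) FROM sales WHERE product_code = 42"

plan_before = query_plan(query)   # without an index every row is scanned
conn.execute("CREATE INDEX idx_sales_product ON sales (product_code)")
plan_after = query_plan(query)    # with the index only matching rows are read

total = conn.execute(query).fetchone()[0]
print(plan_before)
print(plan_after)
print(total)
```

The exact wording of the plan text differs between SQLite versions, but the first plan reports a scan of the table and the second a search that uses the index, while the query result itself is unchanged.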
4.3 Components of the data warehouse

A data warehouse is an environment, not a mere product. Within the overall architecture, the data warehouse, the operational data store, and the functions for managing, processing, and analyzing data can be organized as follows:

[Figure: legacy and external data pass through data extraction, cleaning, transformation, and loading (via the ODS) into the data warehouse DBMS; data marts (MRDB, MDDB) feed the information delivery system and the applications and tools (data mining tools, OLAP tools, report, query, and EIS tools); metadata, repository, update process, administration, and the management platform support the whole.]
Figure 4.6 The data warehouse and its operations

4.3.1 Tools for acquiring, cleaning, and transforming data
An important task in building a data warehouse is selecting data from the ODS, loading it into the warehouse, and putting it into suitable formats. This work can be done with tools such as:
- Application programs, possibly written in COBOL
- Control commands or languages such as MVS Job Control Language
- Unix scripts
- SQL DDL (Data Definition Language)
These tools help to:
- Remove unneeded data from the operational database
- Convert data to the designed format
- Recover missing data, and so on.

4.3.2 Access tools
Data access is the data warehouse function through which users obtain the data and information they request. Users expect to be able to access many different platforms.
Requirements for access to the data warehouse:
- A common language for accessing the data warehouse
- Read-only access to distributed relational data. Users need not care where the data is physically stored; they only need to receive the requested information within an acceptable time.
- The ability to combine data from many different locations
- The ability to combine relational and non-relational data environments.
The general structure of the data access component is as follows:

[Figure: an enquiry enters data access, is translated to SQL, split into sub-queries, and translated to the database API of each source (data source 1, data source 2); the results are transferred back, translated to relational form, and combined into the response; a control and administration component oversees the process.]
Figure 4.7 Structure of the access component
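The split-and-combine flow of the access component can be sketched in a few lines. This is only an illustration under stated assumptions: two in-memory SQLite databases stand in for "data source 1" and "data source 2", and the `orders` table, its columns, and the helper names `make_source` and `total_by_customer` are hypothetical. A real access layer would also translate between different database APIs, which is omitted here.

```python
import sqlite3

def make_source(rows):
    """Create an in-memory database standing in for one data source."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    return conn

# Hypothetical sample data spread over two locations.
source_1 = make_source([("An", 10.0), ("Binh", 5.0)])
source_2 = make_source([("An", 2.5), ("Chi", 7.0)])

def total_by_customer(sources):
    """Send the same sub-query to every source and combine the partial results."""
    combined = {}
    sub_query = "SELECT customer, SUM(amount) FROM orders GROUP BY customer"
    for conn in sources:
        for customer, amount in conn.execute(sub_query):
            combined[customer] = combined.get(customer, 0.0) + amount
    return dict(sorted(combined.items()))

totals = total_by_customer([source_1, source_2])
print(totals)  # {'An': 12.5, 'Binh': 5.0, 'Chi': 7.0}
```

The caller sees one combined answer and never needs to know which source held which rows, which is the location transparency the requirements above ask for.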

At present no single tool on the market meets all the data warehouse access needs of users. The task is therefore to choose an access tool suited to each area and to each kind of access. Examples of such kinds of access are:
- Access to the data warehouse for reporting,
- Multidimensional data analysis,
- Statistical data analysis,
- Search queries and knowledge discovery queries.

Chapter V: Data Models


The structures of the data model; some issues related to the star model; complex data.

Modeling is the foundation of implementation. Understanding the source systems is essential for developing a view of the future scope and model.
5.1 Data models of the DW

The DW model is derived from an overall data model, the Enterprise Data Model (EDM). An EDM is the overall picture on which other models can operate. It is organized into subject areas (SAs); the SA is the main unit into which the work needed to meet user needs is divided. If an organization does not already have a suitable EDM, one can be built up progressively by adding new SAs.
To begin modeling, consider the view of both the current position and the near future. The current position means describing and understanding the data held in the legacy systems (the legacy sources). If the source systems are in an unstable state, continue with whatever work is necessary; a future course can be charted for capturing the source systems. Some companies start with a fully normalized model for their data warehouse and then apply DW modeling techniques.
A prominent difficulty in data modeling is that no answer is right for every situation. The DW data model is subject-oriented and depends on the business and on the problems that arise. The data model of a DW can be set up as:
- A star schema
- A snowflake schema
- A multidimensional model

5.1.1 The star schema
The star schema was first proposed by Dr. Ralph Kimball as a database design choice for the DW. It is called a star schema because the facts sit at the center of the model and are surrounded by the related dimensions, much like the points of a star. The star schema allows one system of objects to connect to many other objects. The model reflects the users' view of many business issues.

In a star schema, data is identified and classified into two kinds:
- Facts, organized into a Fact table
- Dimensions of the data, organized into Dimension tables.
The Fact table holds the basic, transaction-level business information that the applications need. For example, analyzing sales data requires data on the items sold in sales transactions: quantity, type, price, and so on. All of this is stored in the Fact table of the warehouse. Before the data is loaded, however, the fields commonly used as analysis dimensions must be chosen as references (acting as foreign keys in the relationships) and then placed in the dimension tables. Facts are the numeric measures of the business. Fact tables are usually very large, with millions of rows that are mostly numbers.
Dimension tables, by contrast, are usually relatively small compared with Fact tables and hold descriptive information. They act as filters or constraints on the facts in the Fact table. A Dimension table holds the data needed to look at business transactions along one dimension. For example, in a sales analysis application the Dimension tables include time, sales region, product type, and so on.
Figure 5.1 shows an example star schema consisting of a Fact table of sales data and three Dimension tables for product, time period, and market. The primary key of the Fact table consists of the product code (UPC, Universal Product Code), the period code, and the market code. In this example this key may not be unique, because the same product may be sold several times in the same period and the same market. The star schema has three one-to-many relationships linking rows in the Dimension tables to rows in the Fact table.
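The example schema of Figure 5.1 can be tried out directly. The sketch below, using Python's built-in sqlite3 module, loads the sample rows shown in the figure and runs one dimensional query. The table and column names (`period`, `market`, `product`, `sales_fact`, and so on) are assumptions made for this illustration, and the Outlet and Description columns are left out. Note that some fact rows in the figure refer to codes that do not appear among the sample dimension rows, so an inner join drops them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE period  (period_code TEXT PRIMARY KEY, year INTEGER, quarter INTEGER, month TEXT);
CREATE TABLE market  (market_code INTEGER PRIMARY KEY, country TEXT, region TEXT);
CREATE TABLE product (product_code INTEGER PRIMARY KEY, brand TEXT, package_type TEXT);
CREATE TABLE sales_fact (
    product_code INTEGER REFERENCES product (product_code),
    period_code  TEXT    REFERENCES period (period_code),
    market_code  INTEGER REFERENCES market (market_code),
    units_sold   INTEGER,
    sell_price   REAL
);
""")

# Sample rows as listed in Figure 5.1.
conn.executemany("INSERT INTO period VALUES (?, ?, ?, ?)", [
    ("001", 2000, 1, "January"), ("002", 2000, 1, "February"),
    ("003", 2000, 1, "March"), ("012", 2000, 4, "December"),
])
conn.executemany("INSERT INTO market VALUES (?, ?, ?)", [
    (1004, "England", "London"), (1019, "USA", "California"),
    (1104, "France", "Paris"), (2010, "Japan", "Tokyo"),
])
conn.executemany("INSERT INTO product VALUES (?, ?, ?)", [
    (14003, "Rsx", "Plastic"), (15125, "Kph", "paper"),
    (15467, "Kph", "paper"), (15678, "Rsy", "Can"),
])
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?, ?)", [
    (14003, "001", 1004, 20, 24.45), (15125, "002", 1019, 14, 120.00),
    (15467, "003", 1104, 18, 59.90), (23412, "002", 1004, 500, 10.90),
    (15467, "011", 2341, 4, 300.00), (15678, "012", 2010, 425, 20.99),
])

# Units sold per country and quarter: the fact table joined to two dimensions.
units_by_country_quarter = conn.execute("""
    SELECT m.country, p.quarter, SUM(f.units_sold)
    FROM sales_fact AS f
    JOIN period AS p ON p.period_code = f.period_code
    JOIN market AS m ON m.market_code = f.market_code
    GROUP BY m.country, p.quarter
    ORDER BY m.country, p.quarter
""").fetchall()
print(units_by_country_quarter)
# [('England', 1, 520), ('France', 1, 18), ('Japan', 4, 425), ('USA', 1, 14)]
```

The query follows two of the three one-to-many relationships: each dimension row can match many fact rows, and the dimension attributes (country, quarter) serve as the grouping criteria.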

Dimension table: time period
Period code   Year   Quarter   Month
001           2000   1         January
002           2000   1         February
003           2000   1         March
012           2000   4         December

Dimension table: market
Market code   Country   Region       Outlet
1004          England   London       ASD
1019          USA       California   XYZ
1104          France    Paris        MARS
2010          Japan     Tokyo        ZXC

Dimension table: product
Product code   Brand   Package type   Description
14003          Rsx     Plastic
15125          Kph     paper
15467          Kph     paper
15678          Rsy     Can

Fact table
Product code   Period code   Market code   Units sold   Sell price
14003          001           1004          20           24.45
15125          002           1019          14           120.00
15467          003           1104          18           59.90
23412          002           1004          500          10.90
15467          011           2341          4            300.00
15678          012           2010          425          20.99

Figure 5.1 Star schema data

Advantages of the star schema:

- It supports a wide variety of queries and processes them quite efficiently. For example, the sales data in Figure 5.1 can be analyzed along the time dimension quite efficiently without reordering the data in the fact table.
- It matches the way users see and use the data, and so makes the data easier to understand intuitively.
The basic principle of the star schema is a form of data redundancy that improves query performance. With a star schema the designer can easily emulate the functions of a multidimensional database. Denormalization can be seen as pre-joining the tables so that applications do not have to perform the joins, which reduces execution time. The star schema is clearly designed to overcome the limitations of the two-dimensional relational model. In a database designed as a star schema, queries that ask complex questions involving many tables and totals become simpler, and the work needed to produce the answer is smaller than with a normalized relational model. The star schema improves query time considerably and allows some multidimensional features. The schema is very intuitive and easy to use, and it presents a multidimensional view of the data using the semantics of a relational database.
The key of the fact table is formed from the keys of the dimension tables. All keys are defined with the same naming standard. For example, to retrieve information based on a particular customer's city, the customer key in the Fact table must be joined with the customer key in the Dimension table, and the customer's city attribute set to the city of interest. Fact tables contain the keys of the Dimension tables, possibly under different names so that each row remains unique. Dimension tables usually have a unique identifier and hold the information about their dimension.
A Fact table typically has three to five Dimension tables. Because the Fact table is pre-summarized and combined along many dimensions, it tends to have very many rows and to grow rapidly, whereas Dimension tables have few rows and are largely static. A Fact table may contain millions of rows. A Dimension table holds attributes that can be used as search criteria; it is usually much smaller and already very familiar to users. Its key is not a compound key like that of the fact table. If a Dimension table begins to rival the Fact tables, it may need to be split into several Dimension tables. If a Dimension table is split into a main Dimension and a secondary Dimension, the resulting structure is regarded as a snowflake schema (introduced later) or an extended star structure.
A simple star schema consists of just one Fact table and a few Dimension tables (as in Figure 5.1). A complex star schema may contain hundreds of Fact and Dimension tables. The following techniques can improve query performance in a star schema:
- Identify existing aggregates of the Fact tables or create new aggregates.
- Partition the Fact table so that most queries access only one partition.
- Create separate Fact tables.
- Create unique single-column indexes or use other techniques to improve join performance.
Note: neither the Fact table nor the Dimension tables have to be in normal form as in traditional design methods; that is, they contain redundant data.
This kind of schema accepts redundant storage in exchange for faster access, which suits complex multidimensional analytical questions. In essence the Fact table is in first normal form, with a very high degree of redundancy. The star schema can be described as a read-only database: updating the data is very difficult, if not impossible. Some Dimension tables hold data that can be added to the result through join queries, while others hold nothing beyond serving as an index to the data.

5.1.2 Issues in designing a star schema
Although most experts agree that the star schema is an appropriate modeling method for the DW, some relational DBMS issues remain in implementing it.

1. Indexing
Indexing can guarantee the uniqueness of keys and can improve read performance. The tables in a typical star design contain the entire hierarchy of attributes. For the Period dimension, for example, the hierarchy may be day, week, month, quarter, year, which produces a multi-part key of day, week, month, quarter, and year. This is acceptable in ordinary designs, but it raises several problems in a star schema:
- It requires several complex metadata definitions (one for each key component) to define a single relationship (one table). This makes the design more complex and performance much worse.
- Because the Fact table must carry all the key components as part of its primary key, adding or removing a level in the hierarchy requires physical changes to the tables involved, which takes a long time and limits query flexibility.
- Carrying every key segment of every Dimension in the Fact table increases the size of the index and strongly affects performance and stability.
One alternative to such a compound key is to combine the components into one single key (for example, a key that covers all the attributes day, week, month, quarter, year). This solves the first two problems, but the size of the index remains a problem.
The best approach is to replace the meaningful keys with a generated key, the smallest key possible that still guarantees the uniqueness of each record. The meaningful keys that are replaced need not be discarded; they can simply be moved to non-key attributes. The resulting star design consists of a Fact table whose primary key has exactly one key column per dimension, each of which is a generated key.
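The generated-key approach described above can be sketched as follows. This is an illustrative example with Python's built-in sqlite3 module, not a prescribed implementation: the tables `period_dim` and `sales_fact`, their columns, and the helpers `period_key_for` and `record_sale` are hypothetical names. The former key parts (day, week, month, quarter, year) stay in the dimension as ordinary attributes, and the fact table carries a single small integer per dimension.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE period_dim (
    period_key INTEGER PRIMARY KEY,   -- generated key, carries no meaning
    day INTEGER, week INTEGER, month INTEGER, quarter INTEGER, year INTEGER,
    UNIQUE (year, month, day)         -- former key parts, now plain attributes
);
CREATE TABLE sales_fact (
    period_key INTEGER NOT NULL REFERENCES period_dim (period_key),
    units      INTEGER NOT NULL
);
""")

def period_key_for(day, week, month, quarter, year):
    """Return the generated key for a date, adding a dimension row if needed."""
    row = conn.execute(
        "SELECT period_key FROM period_dim WHERE year = ? AND month = ? AND day = ?",
        (year, month, day),
    ).fetchone()
    if row is not None:
        return row[0]
    cur = conn.execute(
        "INSERT INTO period_dim (day, week, month, quarter, year) VALUES (?, ?, ?, ?, ?)",
        (day, week, month, quarter, year),
    )
    return cur.lastrowid

def record_sale(units, day, week, month, quarter, year):
    """Store one fact row that refers to the period only through its generated key."""
    key = period_key_for(day, week, month, quarter, year)
    conn.execute("INSERT INTO sales_fact VALUES (?, ?)", (key, units))

record_sale(20, 3, 1, 1, 1, 2000)
record_sale(14, 3, 1, 1, 1, 2000)      # same date, so the same key is reused
record_sale(425, 12, 50, 12, 4, 2000)

units_by_quarter = conn.execute("""
    SELECT p.quarter, SUM(f.units)
    FROM sales_fact AS f
    JOIN period_dim AS p ON p.period_key = f.period_key
    GROUP BY p.quarter
    ORDER BY p.quarter
""").fetchall()
period_rows = conn.execute("SELECT COUNT(*) FROM period_dim").fetchone()[0]
print(units_by_quarter, period_rows)
```

Adding a new level to the hierarchy, for example a fiscal period, would then only require a new column in `period_dim`; the fact table and its index stay unchanged.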
This approach gives the greatest flexibility, the least maintenance, and the best possible performance.

2. Level indicator
To navigate the dimensions successfully, the design of Dimension tables often includes a hierarchy level indicator for each record. Every query that retrieves data from a table storing both detail records and aggregated data must use this indicator as an additional constraint to obtain a correct result. The indicator is a useful tool in environments tightly controlled by DBAs and in environments where only a few ad hoc queries are allowed. If users ignore the level indicator, or its value is wrong, a query that is otherwise correct may still return an invalid result.
The best alternative to level indicators is the snowflake schema. In this schema the aggregate Fact tables are created separately from the tables that hold the detail data. In addition to the main Fact tables, a snowflake schema contains a separate Fact table for each level of aggregation, so no mistake can be made in selecting detail records. However, the snowflake schema is more complex than the star schema and usually requires more complex SQL statements to obtain an answer.

5.1.3 Other issues in star schema design
The issues above are the major ones for the star schema and concern database administrators. The issues discussed next concern the facilities of the relational DBMS and its optimization techniques.

1. The pairwise join problem
Relational DBMSs were not designed for the large set of complex queries that can be issued against a star schema. In particular, the ability to retrieve related information from several tables in a single query, known as join processing, is very limited. Some relational DBMSs (RDBMSs) can join only two tables at a time. If a complex join involves more than two tables, the RDBMS has to break the query into a series of pairwise joins. This is not a severe restriction for the simple questions handled by OLTP databases, but such join techniques are inadequate in a DW environment.
The limitation of pairwise joins is illustrated by the following example. Consider the star schema below, consisting of the table SalesFact(perkey, prodkey, mktkey, Value, Units, Price) and three Dimension tables: Period(perkey, ...), Product(prodkey, ...), and Market(mktkey, ...), as shown in Figure 5.2.


[Figure 5.2 (diagram): fact tables SalesDaily, SalesWeekly and SalesMonthly, with columns including perKey, prodKey, mktKey, dollars, weight, values and units, linked to three dimension tables:
Period(perKey, month, year, quarter)
Product(prodKey, product, color, model, size)
Market(mktKey, city, state, region)]
Figure 5.2 Aggregating data according to the entity-relationship model

The request is to list the contribution of all units sold for each product across the different markets, categories and periods. We therefore have to join data from four tables: SalesFact, Period, Market and Product. The number of pairwise join combinations to evaluate grows exponentially with the number of tables being joined, so choosing the join order becomes a problem that must itself be optimized if the query is to be planned in a reasonable time. There are also several join algorithms, and each of them has to be evaluated for every join. For example, with 5 join algorithms in the RDBMS, about 10! * 5 = 18 million join combinations have to be evaluated for a single query involving 10 tables. This number is so large that some databases refuse to run queries that try to join too many tables. A typical RDBMS must decide the pairwise join order before a query starts executing, so execution is delayed for a fairly long time.

2. The star schema join problem. Because the number of pairwise join combinations is usually too large to evaluate exhaustively, many RDBMS optimizers restrict the choice using a specific criterion, usually by picking tables that are directly related to each other. In the example above, joining SalesFact with Period is preferred over joining Product with Market. This strategy seems reasonable for a traditional OLTP schema, which contains a complex network of many directly related tables. It proves ineffective for a DW, because in a star schema only one table, the Fact table, is directly related to most of the other tables. The Fact table is therefore the natural candidate for the first join. But the Fact table is usually the largest table, so this strategy produces very large intermediate result sets, which hurts query performance.

This performance problem even assumes that the RDBMS can pick the best pairwise join order from the limited set of orders it evaluates. In an RDBMS optimized for OLTP, the query is parsed and a plan is chosen based on estimates of the sizes of the intermediate results. These estimates rely on statistics about the data and are often inaccurate. As in any computation, error propagation only makes things worse: an error in the first estimate is multiplied in every subsequent estimate, so a small error can become very serious. The net effect is that the RDBMS may discard the optimal order because of errors made during cost estimation.
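The 10! * 5 estimate quoted for join planning can be reproduced with a few lines of Python. This is only a sketch of the simplified counting used in the text (orderings of the tables multiplied by the number of join algorithms); the function name join_plans is an assumption made for illustration, and real optimizers count and prune plans differently.

```python
from math import factorial

def join_plans(num_tables: int, num_algorithms: int) -> int:
    """Count candidate plans as (orderings of the tables) * (join algorithms).

    This mirrors the simplified estimate in the text, not the search
    space of any particular optimizer.
    """
    return factorial(num_tables) * num_algorithms

# The count explodes as tables are added to the query.
for n in (4, 6, 8, 10):
    print(n, join_plans(n, 5))
```

For 10 tables and 5 algorithms the function returns 18,144,000, which is the "18 million" figure above.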
5.2 Snowflake schema

As noted above, the snowflake schema is an extension of the star schema in which each point of the star is not a single Dimension table but several tables.

[Figure 5.3 (diagram): fact tables SalesDaily, SalesWeekly and SalesMonthly, with columns including perKey, prodKey, mktKey, dollars, weight, values and units, linked to the dimension tables:
Period(perKey, month, year, quarter)
Product(prodKey, product, color, model, size)
Markets(mktKey, city, countryKey, state, region, regionKey)]

Figure 5.3 The snowflake schema as an extension of the star schema

In this kind of schema, each dimension table of the star schema is more normalized. The snowflake schema improves query performance, minimizes the disk space needed to store the data, and improves performance because joins involve smaller, normalized tables rather than large denormalized ones. It also makes applications more flexible, because normalization makes the tables less inherently dimensional. On the other hand, it increases the number of tables and the complexity of some queries that need to reference many tables. Some tools hide the physical database schema from end users and let them work at the conceptual level. These tools map user queries onto the physical schema. They need a database administrator to set up this mapping once, when the tool is first installed.
5.3 Hybrid schema

The hybrid schema combines the star schema, built on a Fact table and Dimension tables that are not normalized to first, second or third normal form, with the snowflake schema, in which all Dimension tables are normalized. In a hybrid schema only the large Dimension tables are normalized; the other tables contain many denormalized columns. Some databases and end-user query tools, especially online analytical processing (OLAP) tools, require the data model to be a star schema, because it is a relational data model designed to support the multidimensional data model at the core of OLAP. These databases and tools are tuned to execute queries against this model.
5.4 Solutions to the performance problems of the data model

The basic idea of this optimization is a pairwise join strategy that is no longer limited to directly related tables. In other words, the optimization allows products from well-known vendors such as ORACLE to join tables that are not related to each other. When two tables are joined and no column links them, every row of one table is combined with every row of the other. In relational algebra this is called the Cartesian product. For example, if the PRODUCTS table has 2 rows (bolts, nut) and the MARKETS table has 3 rows (east, west, central), the Cartesian product has 6 rows (bolts/east, bolts/west, bolts/central, nut/east, nut/west, nut/central). An RDBMS normally never treats a Cartesian product as a good join, but for a star schema such products sometimes improve query performance. The Fact table in a star schema is much larger than the Dimension tables, yet a conventional pairwise strategy joins with the Fact table first. That choice is poor, because it creates very large intermediate tables. Instead, a Cartesian product is first built over all the Dimension tables (by joining them pairwise, one after another), and the join with the Fact table is left until last. The main benefit is that the Fact table does not appear in any intermediate result. The main cost is building the Cartesian product of all the Dimension tables, which is cheaper than building intermediate tables by joining with the Fact table.

This simple optimization does not solve every performance problem. The strategy helps only when the Cartesian product of the selected Dimension rows is much smaller than the number of rows in the Fact table, so it is useful only for small joins. A DW, however, involves tables that are not small, so some vendors turn to parallel hardware and software to address the problem. A parallel system can reduce the execution time of a single query, or take on additional work without increasing execution time. In addition, using multiple processors can cut the time for a query from 500 seconds to 50 seconds. Even so, parallelism alone does not fully optimize star schema processing. The following are some of Red Brick's innovations for improving performance.

1. STARjoin and STARindex. Despite the performance problems of the star schema, the relational model remains the best fit for data warehousing. This is reinforced by the standardization of SQL, the wealth of RDBMS-related tools and the available experience. In short, the RDBMS is the natural choice for the data warehouse. A new method that processes complex queries against a DW database efficiently, without the problems described above, is STARjoin, which joins several tables in parallel. Red Brick's RDBMS can join more than two tables in a single, fast operation. Even when joining only two tables, STARjoin does not use the join methods implemented by traditional OLTP RDBMSs. The core of the technology is indexing, a technique used to speed up processing in all RDBMS products. An index is defined on selected columns of a table, and when a query's selection criteria are constrained by those columns, the RDBMS can use the index to locate the rows of interest faster. Red Brick's RDBMS supports a special index called STARindex that improves performance considerably. It differs from traditional index structures such as the B-tree or bitmap. It is created on one or more columns that serve as foreign keys of a Fact table. Unlike traditional indexes, which store information that maps a column value to a list of rows with that value, a STARindex contains compressed information that links the dimensions of a Fact table to the rows containing those dimensions. It is space-efficient, so it can be built and maintained very quickly. Because a STARindex is built on the foreign keys, the RDBMS can quickly identify the target rows of a Fact table that are needed for a particular set of dimensions. Any kind of query can use a STARindex to join the related tables as quickly as possible.

There are some similarities and some fundamental differences between a STARindex and a traditional multi-column index. The first difference is that a multi-column index references a single table, whereas a STARindex can reference several tables.
The second difference is that, with a multi-column index, if the WHERE clause of a query does not constrain all the columns of the composite index, the index cannot be fully used unless the constrained columns form a leading subset of the index columns. The STARjoin algorithm can use the power and flexibility of the STARindex to identify all the rows required by a particular join efficiently. For example, instead of generating the full Cartesian product of the Dimension tables, STARjoin can use the STARindex to join the Dimension tables with the Fact table without the cost of generating the product. The STARindex lets STARjoin quickly determine which regions of the Cartesian product space contain rows of interest. The STARjoin algorithm can then generate the Cartesian product only for the regions that contain required rows and skip the regions that contain none.

Consider the following example. Suppose the DW database has 500 PRODUCTS, 200 MARKETS and 300 PERIODS, and a FACTS table with one million records. Suppose further that a particular query selects 50 PRODUCTS, 20 MARKETS and 30 PERIODS, and that this selects 1000 rows in FACTS. A traditional pairwise join strategy would generate 111000 rows. Building the Cartesian product is better, because it generates only 50*20*30 = 30000 intermediate rows plus the 1000 rows from the FACTS table, 31000 rows in total. A well-constrained STARjoin would in practice generate only 1000 + (1000*10%) = 1100 rows.

2. Bitmap indexing. Another way to improve RDBMS performance is to use new indexing techniques that provide fast, direct access to the data. SYBASE IQ is an example of a product that uses this kind of index structure for data stored in the SYBASE DBMS. SYBASE IQ is not just a set of bitmap indexes running on top of a relational database; it is a separate SQL database. Data is loaded into SYBASE IQ much as it is into any relational database. On each load, SYBASE IQ converts all the data into bitmap strings, which are then compressed and stored on disk. SYBASE IQ can satisfy all data queries using its own structures. Unlike other implementations, the indexes do not point to data stored elsewhere; all the data is stored in the index structure itself. Sybase positions SYBASE IQ as a read-only database for data marts, with a practical size limit of 100 GB.

Data cardinality. In general, bitmap indexes are used for queries on low-cardinality data. For example, the cardinality of a state code column is 50 (a state code can take one of 50 values), and the cardinality of a gender attribute is 2 (male and female). For low-cardinality data, each distinct value has its own bitmap, consisting of one bit for each row in the table. Consider an employee table with 10000 rows and a gender column that has a bitmap index for the value M. The bitmap is a vector of 10000 bits, in which a bit is 1 if the corresponding record satisfies gender = M (male) and 0 otherwise. Bitmap indexes can become unwieldy, and even unsuitable, for high-cardinality data, where the range of values is large. For example, values such as income or revenue can take a practically unlimited number of values.
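The one-bitmap-per-distinct-value idea can be sketched in Python using integers as bit vectors. This is an illustrative toy, not SYBASE IQ's Bit-Wise implementation; the table contents and the helper names build_bitmap_index and rows_of are assumptions.

```python
from collections import defaultdict

def build_bitmap_index(values):
    """Build one bitmap (a Python int used as a bit vector) per distinct value.

    Bit i of bitmaps[v] is 1 exactly when values[i] == v.
    """
    bitmaps = defaultdict(int)
    for row, value in enumerate(values):
        bitmaps[value] |= 1 << row
    return dict(bitmaps)

def rows_of(bitmap):
    """Return the row numbers whose bit is set."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]

# Hypothetical employee table with two low-cardinality columns.
gender = ["M", "F", "F", "M", "M", "F"]
state = ["NY", "CA", "NY", "NY", "CA", "CA"]

gender_idx = build_bitmap_index(gender)
state_idx = build_bitmap_index(state)

# WHERE gender = 'M' AND state = 'NY' becomes a single bitwise AND.
match = gender_idx["M"] & state_idx["NY"]
print(rows_of(match))          # matching row numbers
print(bin(match).count("1"))   # COUNT(*) computed from the bitmap alone
```

The index holds as many bitmaps as the column has distinct values, which is why the structure stays compact for gender or state codes but grows quickly for a column such as income.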

An obvious solution is to represent such data as ranges of values, for example from $10 to $50 and from $51 to $100. But this limits the usefulness of the bitmap index and is often inefficient or meaningless for real-world work. Another solution is to use a B-tree (balanced tree) index structure. However, B-tree indexes can grow large as the data volume and the number of indexes increase, and they require constant maintenance as data is inserted, updated or deleted. Finally, a B-tree index can improve query performance significantly if the type of query is known in advance and the indexes are built to reflect the known access paths, but B-trees can be ineffective for the ad hoc queries typical of DW applications. SYBASE IQ uses a proprietary technology called Bit-Wise to build bitmap indexes for data with cardinalities above 1000 distinct values (compared with fewer than 250 values for traditional technology).

Index types. The first release of SYBASE IQ provides five indexing techniques. The choice depends on the cardinality of the data and on how the data is accessed. Most columns get two indexes: a default index called the Fast Projection index, and either a low-cardinality or a high-cardinality index. For low-cardinality data SYBASE IQ provides:
- the Low Fast index, used for queries involving aggregate functions such as SUM, AVERAGE and COUNT;
- the Low Disk index, optimized for disk space usage.
Similarly, for high-cardinality data SYBASE IQ provides the High Group and High Non-Group indexes. Both support join and retrieval queries, and High Group additionally supports GROUP BY queries.

3. Column local storage. Another approach to improving query performance in the DW environment comes from vendors of parallel systems. It is based on storing data column-wise instead of in the traditional row-wise way. A traditional RDBMS stores data one row at a time, and each row can be viewed and accessed as a single record. This layout suits the OLTP environment, where a transaction typically accesses one record at a time. In the DW environment, by contrast, ad hoc queries aim to retrieve many values from several columns. For example, to compute the average, maximum or minimum salary of the employees of a company, column-wise storage of the salary field requires the DBMS to read only one record. If the database supports column-wise storage, the values of the desired column from many rows can be stored as a single physical record in memory and on disk. The benefit is clear: a single I/O operation can fetch a long record covering just a subset of the columns. Combining this technique with a parallel RDBMS improves performance significantly.
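The difference between row-wise and column-wise layouts can be illustrated with plain Python structures. The sketch below is an in-memory analogy only, assuming a small hypothetical employee table; it shows which data each layout has to touch for a salary aggregate, not how any product stores data on disk.

```python
# Hypothetical employee data, for illustration only.
rows = [
    {"id": 1, "name": "An",   "dept": "Sales", "salary": 1200.0},
    {"id": 2, "name": "Binh", "dept": "IT",    "salary": 1500.0},
    {"id": 3, "name": "Chi",  "dept": "Sales", "salary": 900.0},
]

# Row-wise layout: one record per employee (suits OLTP lookups).
row_store = rows

# Column-wise layout: one "long record" per column.
column_store = {key: [r[key] for r in rows] for key in rows[0]}

def salary_stats_row_store(store):
    """Touches every full row just to read one field."""
    salaries = [r["salary"] for r in store]
    return min(salaries), max(salaries), sum(salaries) / len(salaries)

def salary_stats_column_store(store):
    """Reads a single column record."""
    salaries = store["salary"]
    return min(salaries), max(salaries), sum(salaries) / len(salaries)

print(salary_stats_row_store(row_store))
print(salary_stats_column_store(column_store))
```

Both functions return the same statistics; the column layout simply reads one contiguous list instead of walking every record.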

5.5 Complex data types

So far the discussion of the best DBMS architecture for data warehousing has been limited to traditional numeric and character data types. A clear trend in data management is the need for more complex data types, including text, images, full-motion video and sound. These types represent complex real-world objects that need to be managed as objects, together with behavior that is either defined in the object itself or inherited from another object. Some of these new data types do not fit the B-tree, bitmap or hashing index methods available from RDBMS vendors. The DW DBMS must be able to store, access and manipulate such data efficiently in the database, in the same way as structured data. The DBMS must be able not only to define new data structures but also to provide new functions to manipulate them and new access methods that give very fast access to the data. Without suitable access methods, data manipulation tends to be slow even on very fast computers, and performance degrades linearly as the data grows. Adding access methods to a typical RDBMS is a very difficult task that only an engineering team can carry out. One solution is to build a database system in which new data structures, functions and access methods can easily be installed as part of the DBMS. Such a DBMS is called an object-relational DBMS (ORDBMS). In an ORDBMS, data is stored as structures whose internal details are accessible to functions running on the server. ORDBMS function definitions include statements that tell the optimizer the CPU and I/O cost of the function, so that the optimizer can choose an intelligent query plan. Functions executed on the server are efficient because the data does not have to leave the server to be processed. Support for these data types has many applications in many fields. In short, support for spatial data is an essential requirement of any information system in which location is an important part of the decision-making process. One vendor that supports this kind of data is ORACLE, with its Spatial Data Option and Context Option.

Solving modern business problems such as market analysis and financial forecasting requires database schemas that focus on queries that are multidimensional and array-oriented by nature. Since the main database technology of the DW is the RDBMS, we will look at the design of the data schema as it relates to relational database technology.

5.6 The multidimensional data model

The multidimensional nature of business questions is reflected in the fact that, for example, marketing managers are no longer satisfied with a simple one-dimensional question such as "How much revenue did the new product generate?" Instead they ask questions such as "How much revenue did the new product generate by month, in the southwest region, by user demographic, by store, relative to the previous version of the product, and compared with plan?" This is a six-dimensional question. One way to look at a multidimensional data model is to view it as a cube. The following figure shows a query over three dimensions (product, market and time), with units sold as the measure.
Product    Market      Time   Units
Computer   HaNoi       Q1     1200
Computer   HaNoi       Q2     1500
Computer   HaNoi       Q3     1800
Computer   HaNoi       Q4     2100
Computer   Nam Dinh    Q1     1000
Computer   Nam Dinh    Q2     1100
...        ...         ...    ...
Printer    Hai Phong   Q1     250
Printer    Hai Phong   Q2     300

[Cube diagram: axes Product, Market and Time; the cells shown hold the units sold for Computer in Q1 to Q4: 1200, 1500, 1800, 2100.]

Figure 5.4 The multidimensional data model

The table on the left contains detailed sales data by product, market and time. The cube on the right represents units sold along the dimensions product, market and time, with the units variable organized as cells in an array. The cube can be extended with another array, price, associated with all or only some of the dimensions (the price of a product may or may not change over time, or from one city to another). A cube that supports matrix arithmetic can present the sales-amount array simply by performing one matrix operation on all the cells of the array (sales amount = units * price).

The response time of a multidimensional query depends on how many cells have to be added up on the fly. As the number of dimensions grows, the number of cells in the cube grows exponentially. Moreover, most multidimensional queries deal with high-level, summarized data. The solution for building an efficient multidimensional database is therefore to pre-aggregate all the logical subtotals and totals along all the dimensions. Pre-aggregation is especially valuable when the dimensions are hierarchical. For example, the time dimension can be broken down into year, quarter, month, week and day. A predefined hierarchy on the dimensions allows logical pre-aggregation and also makes it possible to drill down into the data, from a product group to individual products, from yearly sales to weekly sales, and so on.

Another way to reduce the size of the cube is to handle sparse data properly. Usually not every cell has meaningful data for every combination of dimensions (in many databases more than 95% of the cells are empty or contain zero). Another kind of sparsity arises when many cells contain repeated data (for example, if the cube has a price dimension, the same price may apply in all markets and in all quarters of the year). The ability of a multidimensional database to skip empty or repeated cells can greatly reduce the size of the cube and the amount of processing. Dimensional hierarchies, sparse data management and pre-aggregation are important because they significantly reduce the size of the database and the need to compute values. Such a design eliminates the need for multi-table joins and provides fast, direct access to the answers, which significantly speeds up multidimensional queries.
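The ideas of sparse storage, cell-wise arithmetic and pre-aggregation can be sketched with a Python dictionary standing in for the cube. The unit counts come from the sample table of Figure 5.4; the prices, the ALL marker and the function name pre_aggregate are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import product as cartesian

# Sparse cube: only non-empty cells are stored, keyed by (product, market, quarter).
units = {
    ("Computer", "HaNoi", "Q1"): 1200,
    ("Computer", "HaNoi", "Q2"): 1500,
    ("Computer", "Nam Dinh", "Q1"): 1000,
    ("Printer", "Hai Phong", "Q1"): 250,
}

# Hypothetical prices; here the price depends on the product only.
price = {"Computer": 500, "Printer": 100}

# Cell-wise operation: sales amount = units * price.
amount = {cell: n * price[cell[0]] for cell, n in units.items()}

ALL = "*"  # marker meaning "summed over this dimension"

def pre_aggregate(cube):
    """Pre-compute every subtotal: each dimension is either kept or rolled up to ALL."""
    totals = defaultdict(int)
    for cell, value in cube.items():
        for mask in cartesian((False, True), repeat=len(cell)):
            key = tuple(ALL if rolled else dim for dim, rolled in zip(cell, mask))
            totals[key] += value
    return dict(totals)

totals = pre_aggregate(units)
print(totals[("Computer", ALL, ALL)])   # all computers
print(totals[(ALL, ALL, "Q1")])         # everything sold in Q1
print(totals[(ALL, ALL, ALL)])          # grand total
```

Each stored cell contributes to 2^3 = 8 subtotals here, which hints at why the number of pre-aggregated cells grows so quickly with the number of dimensions.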
5.7 Data aggregation and drill-down

One of the fundamental principles of multidimensional databases is the idea of aggregation. As we know, managers at different levels require different levels of aggregation of the data in order to make appropriate decisions. To let a manager choose the level of aggregation, the warehouse must support drill-down, which adjusts the level of detail, even down to the original operational data. The following figure illustrates data aggregated at different levels:

Figure 5.4 Data aggregation

Among the kinds of aggregation, the most common is the roll-up. An example is taking the daily sales totals and rolling them up into a monthly sales table. More complex forms aggregate on the basis of logical and comparison operations. To speed up computation, most DWs are loaded in batch after the online systems have shut down, and all the aggregate values are stored in summary tables. In this sense a DW is a bimodal system: data is loaded during off-hours and queries run during working hours.
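Roll-up and drill-down over a time hierarchy can be sketched as follows. The daily sales figures and the function names roll_up and drill_down are hypothetical; the point is only that a summary is a grouping of the detail rows at a coarser key, and drill-down is the lookup back to those rows.

```python
from collections import defaultdict
from datetime import date

# Hypothetical daily sales facts: (sale date, amount).
daily_sales = [
    (date(2024, 1, 5), 100.0),
    (date(2024, 1, 20), 250.0),
    (date(2024, 2, 3), 80.0),
    (date(2024, 2, 3), 20.0),
]

def roll_up(facts, level):
    """Aggregate daily facts to the 'day', 'month' or 'year' level."""
    key_of = {
        "day": lambda d: (d.year, d.month, d.day),
        "month": lambda d: (d.year, d.month),
        "year": lambda d: (d.year,),
    }[level]
    summary = defaultdict(float)
    for day, amount in facts:
        summary[key_of(day)] += amount
    return dict(summary)

def drill_down(facts, year, month):
    """Return the detail rows behind one monthly total."""
    return [(d, a) for d, a in facts if (d.year, d.month) == (year, month)]

monthly = roll_up(daily_sales, "month")
print(monthly)
print(drill_down(daily_sales, 2024, 2))
```

In a warehouse the monthly summary would normally be computed once during the batch load and stored in a summary table, rather than recomputed for every query.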

Figure 5.5 Roll-up and drill-down techniques


5.8 Example: designing a multidimensional data model

The entity-relationship diagram (ERD) is a suitable tool for modeling a multidimensional data system. First design the ERD, then convert it into a multidimensional data model. Consider the ERD in Figure 5.6, which describes the relationships between the entity tables.

Region(Region ID, Region name)
Store(Store ID, Store name, Address, City, State, Region ID)
Sale(Sale date, Store ID, Product ID, Sale amount, Sale units)
Department(Dep. ID, Dep. describe)
Product Group(Prod Group ID, Prod Group desc., Dep. ID)
Product(Product ID, Product desc., Prod Group ID)
Inventory week(Store ID, Product ID, Quantity)

Figure 5.6 Part of the entity-relationship diagram of a sales system

The product dimension (Product) consists of department (Department), product group (Product Group) and product. The location dimension consists of region (Region) and store (Store). To build the data warehouse, the analyst must add a time dimension (Time) to these two, consisting of year, month, week and day. In the data warehouse we ignore inventory (Inventory) and focus on the sales events (Sale Fact). The multidimensional data model for this system is as follows:

[Diagram: Sale Fact at the center, with three dimension hierarchies: Department, Product Gr., Product Item; Region, Store; Year, Month, Week, Date.]

Figure 5.7 Converting the ERD into a multidimensional data model

In the multidimensional data model, entities are mapped to dimensions. Entities that participate in more than one dimension become facts. With this mapping, the multidimensional model in Figure 5.7 becomes the star schema shown in Figure 5.8. The key of the sales fact table (Sale Fact) consists of three foreign keys that link it to the dimensions.

TIME(TimeKey, Time_Desc, DateID, WeekID, MonthID, YearID, LevelID)
Location(LocKey, Loc_Desc, RegionID, StoreID, LevelID)
Sale(TimeKey, LocKey, ProdKey, SalePrice, SaleUnit)
Product(ProdKey, Prod_Desc, DeptID, ProdGrpID, ProdItemID, LevelID)

Figure 5.8 Star schema
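The star schema of Figure 5.8 can be tried out with Python's built-in sqlite3 module. The sketch below keeps the table and key names from the figure but omits some columns (such as LevelID) for brevity; the sample rows and the query are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE "Time"   (TimeKey INTEGER PRIMARY KEY, Time_Desc TEXT, MonthID INTEGER, YearID INTEGER);
CREATE TABLE Location (LocKey  INTEGER PRIMARY KEY, Loc_Desc  TEXT, RegionID INTEGER, StoreID INTEGER);
CREATE TABLE Product  (ProdKey INTEGER PRIMARY KEY, Prod_Desc TEXT, DeptID INTEGER, ProdGrpID INTEGER);
-- Fact table: its key is made of the three foreign keys to the dimensions.
CREATE TABLE Sale (
    TimeKey   INTEGER REFERENCES "Time"(TimeKey),
    LocKey    INTEGER REFERENCES Location(LocKey),
    ProdKey   INTEGER REFERENCES Product(ProdKey),
    SalePrice REAL,
    SaleUnit  INTEGER,
    PRIMARY KEY (TimeKey, LocKey, ProdKey)
);
""")

# Hypothetical sample rows.
conn.executemany('INSERT INTO "Time" VALUES (?, ?, ?, ?)',
                 [(1, "2024-01", 1, 2024), (2, "2024-02", 2, 2024)])
conn.executemany("INSERT INTO Location VALUES (?, ?, ?, ?)",
                 [(1, "HaNoi store", 10, 100), (2, "Hai Phong store", 20, 200)])
conn.executemany("INSERT INTO Product VALUES (?, ?, ?, ?)",
                 [(1, "Computer", 1, 11), (2, "Printer", 1, 12)])
conn.executemany("INSERT INTO Sale VALUES (?, ?, ?, ?, ?)",
                 [(1, 1, 1, 500.0, 3), (1, 2, 2, 100.0, 5), (2, 1, 1, 500.0, 2)])

# A typical star join: the fact table joined to each dimension, then grouped.
query = """
SELECT p.Prod_Desc, t.YearID, SUM(s.SalePrice * s.SaleUnit) AS revenue
FROM Sale s
JOIN "Time"   t ON t.TimeKey = s.TimeKey
JOIN Location l ON l.LocKey  = s.LocKey
JOIN Product  p ON p.ProdKey = s.ProdKey
GROUP BY p.Prod_Desc, t.YearID
ORDER BY p.Prod_Desc
"""
result = conn.execute(query).fetchall()
print(result)
```

Every join in the query goes through the fact table, which is exactly the shape that makes join ordering difficult for optimizers tuned for OLTP schemas.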


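[Diagram: a network of star schemas. Labels recoverable from the figure: delivery method (air, truck), orders, contract, order date, customer, time (year, quarter), product group, product, geography (region, country), store, partner, organization, competition.]

Figure 5.9 Star network model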


Chapter VI: Building the data warehouse

- Data sources
- Designing the business data warehouse (BDW)
- Populating the BDW
- Maintaining and deploying the data warehouse system

Building a data warehouse is the process of integrating data from different sources into one store. Business analysts can query the warehouse and produce reports and charts to support their decision making. A data warehouse may hold a large enterprise-wide database accessible to users and administrators, or it may combine several smaller systems usually called data marts (DM). Typically each DM is tied to one subject area within a larger warehouse.
6.1 Data sources

Data sources include data systems inside or outside an agency, organization or enterprise. The organization's own data systems, its internal data, are regarded as the source systems and are usually existing information systems (Legacy Systems, LS). These are operational systems that support business activities such as production or sales. They were developed with the technology available at the time and still meet current business needs. They may have been in operation for many years, with little or no documentation. External data is data that is not held in the organization's operational systems; it is data that end users require to complete the overall picture for their work.

6.1.1 Analyzing the data sources

Legacy systems were developed around the business areas of the organization. Applications were developed with data tailored to different needs, so the same data may appear under different names, with different units of measurement or with different definitions, even where the data requirements are similar. As a result, the data sources must be assessed and their definitions captured as metadata, with the following aims:
- Identify the different sources, file structures and platforms.
- Understand what data exists in the current source systems, the business definitions of the data, and any business rules that apply to it.
- Detect overlaps of information between systems.
- Decide which system holds the best data. The same data may exist in more than one system, so each system must be assessed to determine which has the clearer and more accurate data.

6.1.2 Collecting and preparing data

An important part of implementing a data warehouse is taking data from the operational systems, refining it and putting it into a format suitable for information applications. Tools for this perform all the conversion, summarization of key changes, structural changes and condensation needed to turn separate data into information usable by decision-support tools. They generate programs and control statements in COBOL, C, Unix scripts and SQL data definition language, needed to move data into the DW from multiple operational systems. They also maintain the metadata. The main functions include:
- Removing unwanted data from operational databases.
- Converting to common, general data names and definitions.
- Calculating summaries and derived data.
- Establishing defaults for missing data.
- Accommodating changes in source data definitions.
These tools can save considerable time and effort. However, many off-the-shelf tools are useful only for refining simple data, so custom refinement procedures have to be developed for some application areas. The stages are:
- Extract the data
- Filter and refine the data
- Validate the data
- Merge and aggregate the data
- Load the data into the warehouse
- Archive and distribute the data
The process of collecting and building the warehouse data consists of the following steps:

[Diagram: data flows from Source to Target through the stages Extract, Filter, Validate, Merge, Aggregate, Load and Archive.]

Figure 6.1 The DW data preparation process
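The stages of Figure 6.1 can be sketched as a chain of small Python functions. Everything here (the record layout, the cleaning rules, the dictionary standing in for the warehouse) is an assumption chosen to keep the example short; a real pipeline would read from operational systems and write to a DBMS.

```python
from collections import defaultdict

# Hypothetical source records from an operational system.
source = [
    {"order_id": 1, "region": "North", "amount": 120.0},
    {"order_id": 2, "region": "north", "amount": 80.0},
    {"order_id": 3, "region": "South", "amount": None},   # missing value
    {"order_id": 4, "region": "South", "amount": -5.0},   # invalid value
    {"order_id": 5, "region": "South", "amount": 50.0},
    {"order_id": 1, "region": "North", "amount": 120.0},  # duplicate
]

def extract(records):
    """Copy the selected records out of the source."""
    return [dict(r) for r in records]

def filter_clean(records):
    """Drop records with missing amounts and normalise region names."""
    return [{**r, "region": r["region"].title()} for r in records if r["amount"] is not None]

def validate(records):
    """Keep only records that satisfy the assumed business rule amount >= 0."""
    return [r for r in records if r["amount"] >= 0]

def merge(records):
    """Remove duplicates, keeping the first record seen per order_id."""
    seen = {}
    for r in records:
        seen.setdefault(r["order_id"], r)
    return list(seen.values())

def aggregate(records):
    """Summarise the amount per region."""
    totals = defaultdict(float)
    for r in records:
        totals[r["region"]] += r["amount"]
    return dict(totals)

def load(summary, target):
    """Write the summary into the target store (a dict standing in for the DW)."""
    target.update(summary)
    return target

warehouse = {}
stage = source
for step in (extract, filter_clean, validate, merge, aggregate):
    stage = step(stage)
load(stage, warehouse)
print(warehouse)
```

Keeping each stage as a separate function makes it easy to log or archive the intermediate result between any two steps.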

1. Extracting data (Extract)

Extraction is the process of taking predefined data out of the operational systems and external data sources. The source systems themselves may present problems: detailed information may be missing from the system, or end users may require information at a lower level of detail than the operational system is able to store. In the first case a decision is needed:
- Delete the missing or inaccurate information, or supplement it and extend the operational system so that it collects complete information.
- Or provide a selection interface in a separate area of the data warehouse to receive users' requests.
The second case involves building the ability to anticipate user requirements.

Deciding how to extract. The extraction process can read data from:
- the original database;
- an image copy of the database;
- disk or tape.
For large data sources, reading from tape is more efficient than processing the source files. This amounts to keeping a log of data changes, and the DBMS can readily access every change. Online response time and the response time of log access should be roughly the same for the source data. Logging data changes also allows the load process to apply only the increments instead of a total refresh.

Deciding when to extract (frequency and timing). An important step is to plan the frequency of the extraction process. The goal is to minimize the impact on the source systems and to run these tasks within a batch window. Different tables are extracted at different frequencies, depending on the impact on the source system and the type of data. For example, sales data should be extracted daily, while product data needs extracting only weekly. There are tools and utilities for extraction, such as fast-load utilities, utilities that make it easy to rebuild a database, and tools that generate extraction code in third-generation (3GL) or fourth-generation (4GL) languages. Issues around extraction include the time structure of the extracted data and the efficiency of the extraction.

Whichever extraction method is chosen, metadata plays an important role in the process. Typical metadata includes the definitions of the source systems, the physical formats, and the method and schedule of extraction. The metadata can be captured with tools or documented by hand.

Change data capture detects the changes made to data in the legacy system by reading the log tape. The changes are insert, update and delete actions, together with the column or row information related to each action. All changes are recorded and later applied in the order in which they occurred in the operational system.

2. Filtering and cleaning data (Filter, Cleaning)

After extraction, the data is refined by filtering and cleaning so that it is not distorted and matches the business data. Filtering and cleaning checks for and corrects possible errors to ensure the data is correct. The work includes scrubbing, changing and recalculating data. Cleaning involves some or all of the following tasks: checking all individual fields and cross-related fields, identifying and consolidating duplicate records, and collecting and matching identical records from different systems that lack common keys.

There are five steps to cleaning data:
1. Define the criteria for what counts as clean data.
2. Identify the users' concerns so that the data can be checked and assessed right after it is first accessed.
3. Produce a report of data errors wherever the data violates the criteria.
4. Find a strategy for correcting the errors, either manually or automatically.
5. Find a long-term strategy, either by fixing the operational system or by feeding information from the data warehouse back into it.

Data cleaning methods:
- Sorting and cleaning the data.
- Defining domains and ranges.
- Heuristic algorithms.
- Fuzzy logic.

Whichever method is chosen, the correction steps must be repeated for the operational system, or another cleaning pass must be run over data that has just been cleaned. For example, the cleaning process must find values outside the permitted range and relationships that do not exist (such as a customer order for an item that is not in stock).
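The checks mentioned above (values outside the permitted range, references to items that do not exist, duplicate records) can be sketched as a single cleaning pass that also produces an error report. The order records, the product list and the 1 to 1000 quantity rule are assumptions made for illustration.

```python
# Hypothetical reference data and order records.
products = {"P1", "P2"}
orders = [
    {"order_id": 1, "product": "P1", "qty": 5},
    {"order_id": 2, "product": "P9", "qty": 1},    # unknown product
    {"order_id": 3, "product": "P2", "qty": -4},   # out of range
    {"order_id": 1, "product": "P1", "qty": 5},    # duplicate
]

QTY_RANGE = range(1, 1001)   # assumed rule: 1 <= qty <= 1000

def clean(records):
    """Return (clean_records, error_report) after three kinds of check."""
    clean_records, errors, seen = [], [], set()
    for r in records:
        if r["order_id"] in seen:
            errors.append((r["order_id"], "duplicate record"))
        elif r["qty"] not in QTY_RANGE:
            errors.append((r["order_id"], "qty out of range"))
        elif r["product"] not in products:
            errors.append((r["order_id"], "unknown product"))
        else:
            clean_records.append(r)
        seen.add(r["order_id"])
    return clean_records, errors

good, report = clean(orders)
print(good)
print(report)
```

The error report corresponds to step 3 of the five cleaning steps; deciding whether each entry is fixed by hand or automatically is step 4.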

The cleaning process involves the following tasks:
- Check individual data fields (those unrelated to other fields) and fields that are related to other fields.
- Identify and consolidate duplicate records, and match up identical records from different systems that lack keys.
A good data-cleaning strategy comes from working directly with the data.

3. Validating and transforming data (Validate, Transforming)

Next, the data must be checked and validated to ensure its quality meets the needs of decision-support analysis. Tools that support this work rely on a set of predefined parameters and may use fuzzy logic or heuristic algorithms. Heuristic algorithms have an extensible rule set and mimic human reasoning, which speeds up the investigation. Before the data can be transformed, a system of measurement should be established and business meanings standardized. The purpose of transformation and integration is to turn data into information that is easier for end users to understand and use. The target data model may differ from the source data model, which happens when users' requirements differ from the form of the data. The process includes converting, manipulating, sorting and selecting data. Converting data from one form to another may involve one of the following functions:
- Moving data directly from one field to another.
- Rebuilding and reformatting data fields, possibly moving only part of a field or combining several fields into a new one.
- Normalization matrices.
- Converting a coded field into a fully descriptive field. For example, code F becomes the description Female and code M becomes Male.
- Deriving a target field from several source fields. For example: if Field1="xxxx" and Field2="yyyy" then Field3="zzzz"
- Aggregating and summarizing on the basis of a group or a relationship between data fields.
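Several of the transformation functions listed above can be sketched in one small Python function. The F/M mapping comes from the text; the source field names, target field names and the segment rule are hypothetical and stand in for whatever the real source and target models define.

```python
GENDER_CODES = {"F": "Female", "M": "Male"}   # mapping taken from the text

def transform(record):
    """Apply several of the listed transformations to one source record."""
    out = {}
    # Direct move from one field to another (target name is hypothetical).
    out["customer_id"] = record["cust_no"]
    # Coded field -> descriptive field.
    out["gender"] = GENDER_CODES.get(record["sex"], "Unknown")
    # Combine several source fields into one new field.
    out["full_name"] = f'{record["first"]} {record["last"]}'.strip()
    # Derive a target field from several source fields (rule is illustrative).
    if record["country"] == "VN" and record["channel"] == "web":
        out["segment"] = "domestic-online"
    else:
        out["segment"] = "other"
    return out

row = {"cust_no": 17, "sex": "F", "first": "Lan", "last": "Tran",
       "country": "VN", "channel": "web"}
print(transform(row))
```

Unknown codes are mapped to "Unknown" rather than raising an error, which is one possible way of establishing a default for missing or unexpected data.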

[Diagram labels: Legacy Source (two), External Source, Transformation (two), Data Warehouse, Operation Data Store, New C/S Application.]

Figure 6.2 Collecting and transforming data

4. Integrating, merging and aggregating data (Integrate, Merge, Aggregate)

When there are several data sources, they must be integrated to unify and reorganize the data to fit the architecture and the usage needs. Integration may combine the following operations: sorting and merging, splitting, identifying and resolving data integrity violations, and generating synthetic keys. Integrating information from one system into another involves:
- Sorting and merging, when a target table is built from several data sources. The data must be sorted and duplicate records removed; after sorting, the data is merged into a single data file.
- Splitting, when one source must produce several targets, or when a single attribute packs the data of several fields. In that case the file or field must be split down to its lowest level.
- Identifying and resolving data integrity violations. One solution is to store such data as dummy data.
- Creating synthetic (artificial) keys, which isolate and protect the warehouse from changes in the source systems. The DW must be built to withstand changes and modifications, especially to key data.

5. Loading data into the warehouse (Load)

Loading data into the warehouse can be done in the following ways:
- Refresh: no old data is kept in the table (for example, we do not care where a customer lived yesterday, only where they live today).
- Incremental: snapshot data is added to the table. Each new row is made unique by adding a time value to the key.
- Update in place: the key structure of the rows is kept, except for rows that have expired, or only non-key columns are updated (for example, updating a running total by adding this week's value to the previous total).
- Preload and load: preloading organizes and manages the files prepared for the utilities of the target DBMS. Loading is the physical integration of new or changed data into the target DBMS and may include operations such as updating and packaging the data.
- Repair and evaluate: errors can occur during cleaning, transformation and integration, so the source data environment must have a function for correcting them. The subsequent work may have to be manual; the data can be corrected in the source environment, or by protective routines built into the source programs, depending on the development approach. The remaining function is to evaluate the correctness and suitability of the data that has been read and corrected.
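The first three loading modes can be contrasted with a short Python sketch in which dictionaries stand in for warehouse tables. The customer records and the function names are hypothetical; the sketch only shows how the key is treated in each mode.

```python
from datetime import date

def refresh(table, new_rows):
    """Refresh: discard the old contents and keep only the current state."""
    table.clear()
    table.update({r["customer_id"]: r for r in new_rows})

def incremental(table, new_rows, as_of):
    """Incremental: append snapshots; the load date becomes part of the key."""
    for r in new_rows:
        table[(r["customer_id"], as_of)] = r

def update_in_place(totals, weekly_amounts):
    """Update in place: keys stay the same, only a non-key column changes."""
    for customer_id, amount in weekly_amounts.items():
        totals[customer_id] = totals.get(customer_id, 0.0) + amount

current, history, totals = {}, {}, {}

day1 = [{"customer_id": 1, "city": "HaNoi"}]
day2 = [{"customer_id": 1, "city": "Hai Phong"}]

for as_of, rows in ((date(2024, 1, 1), day1), (date(2024, 1, 2), day2)):
    refresh(current, rows)
    incremental(history, rows, as_of)

update_in_place(totals, {1: 100.0})
update_in_place(totals, {1: 40.0})

print(current)        # only the latest address survives a refresh
print(len(history))   # one snapshot per load in the incremental table
print(totals)         # running total updated in place
```

The refreshed table answers "where does the customer live today", while the incremental table keeps the history needed for trend analysis.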

Thi gian cho cng vic ti d liu v kho (c th l hng ngy, hng tun, hng thng, hng qu, ... ). 6. Lu tr v pht tn d liu (Archive and Distribute) D liu c phn b t mt platform ngun ti mt platform ch khc. S phn b ny c th xy ra trc, sau hoc trong khi xy ra cc qu trnh lm sch, bin i v tch hp d liu. Qu trnh ny c th bao gm cc thao tc nh vn chuyn, chuyn i v phn pht d liu. Vic phn tn ph thuc vo kin trc ca Data Warehouses. Vic phn tn ch cn thit i vi cc Data Warehouses m d liu ngun trn mt platform v Data Warehouses, ODS hoc OLAP ch li trn mt platform khc. im quan trng ca vic phn tn d liu: Kim tra m bo rng cc file phn tn ti c ch. Kim tra li xem d liu ca cc file cn ng khng hay b sai (v d nh do li ng truyn). Kim tra ni file n xem liu c nhiu h thng ngun tn ti hay cha. Thi gian to ra file trn platform. V d nh nn file n ch khi no? hng m hay hng tun? Liu c nn thm qu trnh khc khi file n hay khng?


- Plan how files are kept in step when a system is unavailable, for example when a system does not run at night.

File distribution should include a mechanism that automatically checks for corruption.
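The corruption check mentioned above can be sketched with a checksum comparison. This is a minimal illustration, not a prescribed mechanism: the function names and the sample extract are hypothetical, and it assumes the digest computed on the source platform travels with the file.

```python
import hashlib


def sha256_of(data: bytes) -> str:
    """Return the hex SHA-256 digest of a file's contents."""
    return hashlib.sha256(data).hexdigest()


def verify_transfer(sent: bytes, received: bytes) -> bool:
    """True when the received copy has the same digest as the copy that was sent."""
    return sha256_of(sent) == sha256_of(received)


# Hypothetical extract file contents, used only for illustration.
extract = b"k4568,Computer,2002-02-28,158\n"

intact = verify_transfer(extract, extract)
damaged = verify_transfer(extract, extract.replace(b"158", b"153"))
```

In practice only the digest, not the whole source file, would be compared on the target platform.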


6.2 Designing the business data warehouse (BDW)

The previous section discussed the techniques needed to collect and build data for a data warehouse in general. This section describes how data is gathered, in a refined and reliable form, from the operational systems (OS) and placed in the business data warehouse (BDW). There are three main issues:
- Capturing data from the operational systems
- Refining the captured data before it goes into the BDW
- Converting data between the source and target environments

Data used for analysis and decision support must come from the BDW, not directly from the operational systems, because:
1. Operational systems rarely keep a complete history of events, and their data changes frequently.
2. Source data in operational systems is often heterogeneous and may be unstructured.
3. The BDW holds historical records of the business across periods.
4. Target data in the BDW is structured and organized in a format that expresses the relationships between entities.

6.2.1 From operational data to BDW data

Real-time data in an operational system is only provisional and may be updated from period to period. Operational data is usually very large, loosely structured, and reflects different aspects of the work. Its characteristics are:
- Transient data, updated from period to period. Databases of orders, sales and customers, for example, are operational data. Some of these records may change several times in a month. For that reason only records that no longer change, that is, records whose values will not be altered by later transactions, are captured and placed in the BDW.
- Event data and periodic data. Every day a large amount of event data is stored in the operational system. Insurance data, for example, may change from period to period. Once the data no longer changes it can be captured and placed in the BDW.


6.2.2 Data capture techniques

- Static capture
- Application-assisted capture
- DBMS-based capture
- Timestamp-based capture
- File comparison

1. Static capture

This is the simple technique used to collect data for the BDW. Its task is to collect directly all data that is stable and will no longer change over time. The variety of data sources makes capturing operational data difficult. Operational systems may store their data as:
- Sequential or indexed files
- Relational databases
- Object-oriented databases

The physical representation and storage of the data is equally varied. What we need to capture is the most general data, in a simple format, as a subset of the operational data. Three dimensions must be considered when collecting data:

Entity dimension. All information about a given subject is captured. The resulting data placed in the BDW is a picture of the corresponding entities. In SQL, for example, all information about customers can be selected with:

    SELECT * FROM CUSTOMER_FILE

Attribute dimension. Often only some attributes of an entity are needed, not all of them. In SQL, for example:

    SELECT Name, Phone_No, CITY FROM CUSTOMER_FILE

Occurrence dimension. Some fields or records of an entity are captured only when they occur under given conditions. For example:

    SELECT * FROM CUSTOMER_FILE
    WHERE CITY = 'Ha Noi'
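The three capture dimensions above can be tried out with Python's built-in `sqlite3` module. This is a small sketch under stated assumptions: the in-memory database and the three sample customers are invented for illustration; only the table and column names come from the text.

```python
import sqlite3

# In-memory stand-in for the operational CUSTOMER_FILE; the rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CUSTOMER_FILE (Name TEXT, Phone_No TEXT, CITY TEXT)")
conn.executemany(
    "INSERT INTO CUSTOMER_FILE VALUES (?, ?, ?)",
    [("An", "0901", "Ha Noi"), ("Binh", "0902", "Hue"), ("Chi", "0903", "Ha Noi")],
)

# Entity dimension: every attribute of every customer.
entity_rows = conn.execute("SELECT * FROM CUSTOMER_FILE ORDER BY Name").fetchall()

# Attribute dimension: only some attributes of each customer.
attribute_rows = conn.execute(
    "SELECT Name, Phone_No, CITY FROM CUSTOMER_FILE ORDER BY Name"
).fetchall()

# Occurrence dimension: only the customers that satisfy a condition.
occurrence_rows = conn.execute(
    "SELECT * FROM CUSTOMER_FILE WHERE CITY = ? ORDER BY Name", ("Ha Noi",)
).fetchall()
```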

This technique can be depicted as follows:

[Diagram: inside the operational system, the business application function and the file or DB access component work through the DBMS on the operational data; static capture reads the operational data and produces the static capture data.]

Figure 6.3 Static data capture

2. Application-assisted capture

Application programs can supply many stable records derived from operational data. Through these programs we can collect data after it has been updated and will not change again, and place it in the warehouse.
[Diagram: inside the operational system, the business application function and the file or DB access component work through the DBMS on the operational data; application-assisted capture produces complete or incremental changed data.]

Figure 6.4 Application-assisted data capture

As we know, operational data tends to change from period to period, so the data we need to place in the warehouse is the data as it appears immediately after an update. Consider, for example, the update shown below:

    Before update:  k4568  Computer  Feb. 12, 2002  148
    UPDATE event:   k4568            Feb. 28, 2002   10
    After update:   k4568  Computer  Feb. 28, 2002  158

Figure 6.5 Updating a data record

Data can be captured once an event or a transaction has completed. There are, however, two issues:
1. Some fields (other than the key) are not changed by the transaction or event.
2. Some fields may be changed by the event.

In the first case, if the application must capture unchanged data such as "Computer", it has to go back and read it from the original files or database. In the second case, the application must read the original data and, using the update, compute the final data after the transaction. In some cases the DBMS can take over this work.

3. DBMS-based capture

When operational data is managed by a DBMS, the active mechanisms of that system (triggers, logs) can be used to capture data. A trigger can be viewed as a specialized procedure that is activated under given conditions or by given events. How triggers are implemented depends, of course, on the particular DBMS. To capture state data, the trigger or log mechanism must ensure that the capture procedure runs after the rows have been changed, so that what is captured is the data after the update.
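Trigger-based capture can be sketched with SQLite, whose `AFTER UPDATE` triggers fire once the row has changed. This is only an illustration: the table names `stock` and `stock_capture` and the ISO date format are assumptions, and other DBMSs use different trigger syntax. The sample row follows Figure 6.5.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical operational table and capture table.
    CREATE TABLE stock (item_key TEXT PRIMARY KEY, name TEXT, updated TEXT, qty INTEGER);
    CREATE TABLE stock_capture (item_key TEXT, name TEXT, updated TEXT, qty INTEGER);

    -- The trigger runs AFTER the update, so NEW.* holds the post-update values.
    CREATE TRIGGER capture_after_update AFTER UPDATE ON stock
    BEGIN
        INSERT INTO stock_capture
        VALUES (NEW.item_key, NEW.name, NEW.updated, NEW.qty);
    END;

    INSERT INTO stock VALUES ('k4568', 'Computer', '2002-02-12', 148);
    UPDATE stock SET qty = qty + 10, updated = '2002-02-28' WHERE item_key = 'k4568';
    """
)

captured = conn.execute("SELECT * FROM stock_capture").fetchall()
```

The captured row contains both the unchanged field (`name`) and the changed ones, which is the situation the two numbered cases above describe.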

[Diagram: inside the operational system, the business application function and the file or DB access component work through the DBMS on the operational data; trigger capture and log capture produce complete or incremental changed data.]

Figure 6.6 DBMS-based data capture

4. Timestamp-based capture

This is a special case of static capture restricted to a defined range. Data captured this way must contain at least one field holding time information. The records of interest are those changed since the capture application last ran, that is, those whose timestamp is later than that run.
[Diagram: inside the operational system, the business application function and the file or DB access component work through the DBMS on timestamped operational data; timestamp-based capture produces complete or incremental changed data.]

Figure 6.7 Timestamp-based data capture

This capture method does not itself create or maintain the time information. The applications that create and update the operational data must therefore manage the timestamps of that data, as in Figure 6.7. If the operational data can change over time, the data cannot simply be selected directly after each update. Figure 6.8 gives an example. An order placed at 9:30 on Monday is for 100 units of an item. After Monday, the order data could be loaded into the BDW. On Tuesday at 11:30 the customer decides to raise the order to 200 units, and the operational system records this by overwriting 100 with 200. At 4:00 p.m. the same day, however, the seller decides to deliver only 125 units against this order and moves the remaining 75 units to another order. The state of the order held in the operational system on Tuesday night is therefore 125 units. The order data is then captured and placed in the BDW once the company has finished delivering the goods to the customer.
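Timestamp-based selection can be sketched as a query for rows changed after the last capture run. The sketch below rests on assumptions: the `orders` table, its column names and the calendar dates are hypothetical, and timestamps are stored as ISO-style strings so that string comparison matches time order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical operational table; updated_at is maintained by the application.
conn.execute("CREATE TABLE orders (order_id TEXT, qty INTEGER, updated_at TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("A1", 125, "2002-03-05 16:00"),  # overwritten twice on Tuesday (200, then 125)
        ("B7", 40, "2002-03-04 08:15"),   # unchanged since Monday morning
    ],
)


def capture_since(connection, last_run):
    """Return the rows whose timestamp is later than the previous capture run."""
    return connection.execute(
        "SELECT order_id, qty, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY order_id",
        (last_run,),
    ).fetchall()


# Capture run on Tuesday night, after a previous run on Monday night.
changed = capture_since(conn, "2002-03-04 23:59")
```

Because the operational row is overwritten in place, the run sees only the final state of order A1 (125); the intermediate value 200 is never captured.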
Once the data has been placed in the warehouse, the corresponding data in the operational system can be deleted.

    Operational data (transient)        State at end of day
    Monday   09:30 AM   100             Monday:  100
    Tuesday  11:30 AM   200
    Tuesday  04:00 PM   125             Tuesday: 125

Figure 6.8 Example of timestamp-based data capture

5. File comparison

This is the last method that can be used to capture data from an operational system. It requires a copy of the data taken before it was updated (for example, last night), which is then compared with the current data file (tonight). Figure 6.9 illustrates capture by file comparison. The result is the set of data that changed during the period. The method has two main drawbacks:
- Two copies of the file or database must be kept, one from before and one from after the operations.
- A very large volume of records must be sorted and matched to find a small number of changes.

It also has advantages:
- It works under all conditions and is especially suitable when the data is stored in files rather than databases. When DBMS-based capture is not possible and the data carries no timestamps, file comparison is the only suitable method.
- It requires no change to the applications that maintain the source data, and no change to the structure or content of the data.
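The comparison itself can be sketched in a few lines, assuming each snapshot has already been read into a dictionary keyed by record key. The function name and the sample records are hypothetical.

```python
def compare_snapshots(previous, current):
    """Diff two {key: record} snapshots into inserted, updated and deleted records."""
    inserted = {k: current[k] for k in current.keys() - previous.keys()}
    deleted = {k: previous[k] for k in previous.keys() - current.keys()}
    updated = {
        k: current[k]
        for k in current.keys() & previous.keys()
        if current[k] != previous[k]
    }
    return inserted, updated, deleted


# Last night's copy and tonight's copy of a small, made-up order file.
last_night = {"001": ("An", 100), "002": ("Binh", 40), "003": ("Chi", 7)}
tonight = {"001": ("An", 125), "002": ("Binh", 40), "004": ("Dung", 12)}

inserted, updated, deleted = compare_snapshots(last_night, tonight)
```

Holding both snapshots in memory mirrors the first drawback noted above; for large files a sort-and-merge pass over the two copies would be the usual alternative.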
[Diagram: inside the operational system, the business application function and the file or DB access component maintain the current operational data; file comparison capture compares it with the previous copy of the operational data and produces complete or incremental changed data.]

Figure 6.9 Capturing data by file comparison

6.2.3 Structure of the captured data

The output of the capture process must meet two main requirements:
- The data must be stored in formats that are easy to use, easy to convert and compatible.
- The content and characteristics of the data must be described and documented.

The captured data can be loaded from the operational systems into the BDW periodically in batches, for example at night, during breaks, on days off or on holidays. The advantages are easy to see, but there is a drawback: a strict control strategy is needed to guarantee the integrity and consistency of the data.
6.3 Applying data to the BDW

Applying data is the part of data replication that takes the captured and transformed data into large data stores. One of four modes can be used:
1. Load. The data that has been loaded, or loaded in addition, is placed in the BDW.
2. Append. Items missing from the warehouse are added. Existing warehouse data is preserved and new records may be appended, depending on the incoming content and on the target database; records that already exist are left alone.
3. Destructive merge. Replacement data is merged into the existing warehouse data. When the key of an incoming record matches a record in the warehouse, the existing record is updated accordingly; when the keys do not match, the new record is added to the warehouse.
4. Constructive merge. This mode is similar to the previous one, with one difference. When the key of an incoming record matches a record in the warehouse, the existing record is marked as superseded but is not overwritten. Incoming data is therefore always added to the warehouse.

Which mode to choose in a given situation depends on the kind of time-dependent data:
- Snapshot data. Snapshot data can be brought into the warehouse by load. After loading, append can be used to extend the data when needed.
- Transient data. Transient data is also loaded and appended, and is maintained by destructive merge. It differs from snapshot data mainly in update frequency: transient data may change continuously and many times, whereas snapshot data changes only at defined intervals.
- Periodic data. Taking a historical view of the business, load can be used to create periodic data. Constructive merge is the most effective way to capture periodic data, because periodic records, once loaded, are never deleted.

Data is applied to the BDW in two situations:
1. Initially, as part of building the BDW
2. When maintaining the warehouse.

6.3.1 Load

This is the simplest and most widely used mode. It loads data in the format of the target database, for example a relational database. There are three possibilities:
1. If the relational tables do not yet exist, they must be created before the data is loaded.
2. If the tables already exist, the data is simply loaded into them.
3. If the target already holds data, the loaded data replaces the old data.

The three cases are shown below:
[Diagram: static capture data goes through "Create table" and "Load" into the BDW (day 0), then through "Load data" via an intermediate BDW into the BDW (day 1).]

Figure 6.10 Loading data into the BDW

6.3.2 Append

Once the warehouse has been created, append is used to extend the data. For example, rows or columns can be added to an existing table, as shown below:

[Diagram: static capture data and the BDW (day 1) go through "Append" to give the BDW (day 2).]

Figure 6.11 Appending data to the warehouse
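The difference between load, append and destructive merge can be sketched against a small SQLite table. This is one possible reading of the three modes, not the only one: the table `bdw`, its columns and the sample rows are hypothetical, and SQLite's `INSERT OR IGNORE` and `INSERT OR REPLACE` stand in for whatever the target DBMS offers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def load(connection, rows):
    """Load: create the table if needed, then replace whatever it held."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS bdw (cust_key TEXT PRIMARY KEY, city TEXT)"
    )
    connection.execute("DELETE FROM bdw")
    connection.executemany("INSERT INTO bdw VALUES (?, ?)", rows)


def append(connection, rows):
    """Append: add new keys only; existing warehouse rows are preserved."""
    connection.executemany("INSERT OR IGNORE INTO bdw VALUES (?, ?)", rows)


def destructive_merge(connection, rows):
    """Destructive merge: matching keys are overwritten, new keys are added."""
    connection.executemany("INSERT OR REPLACE INTO bdw VALUES (?, ?)", rows)


load(conn, [("001", "Ha Noi"), ("002", "Hue")])
append(conn, [("002", "Da Nang"), ("003", "Hai Phong")])
after_append = dict(conn.execute("SELECT cust_key, city FROM bdw"))

destructive_merge(conn, [("002", "Da Nang"), ("004", "Can Tho")])
after_merge = dict(conn.execute("SELECT cust_key, city FROM bdw"))
```

After the append, key 002 still holds its original value; only the destructive merge overwrites it.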

6.3.3 Constructive merge

The purpose of this mode is to build the periodic data set of the BDW from the captured data so that it correctly reflects both the current and the previous states of the operational source. It is carried out in two steps:
a/ Add the changed data to the BDW
b/ Adjust existing records according to the nature of the change.

The figure below shows new data being merged into the BDW on day 4. Incremental changed data passes through the constructive merge, which inserts new records and changes or closes existing ones.

    BDW (day 3)
    Key  Start  End    Flag  Data
    001  01-01  #      A
    002  01-01  01-02  A
    003  01-01  #      A
    004  01-02  01-03  A
    002  01-02  #      C
    004  01-03  #      C

    BDW (day 4)
    Key  Start  End    Flag  Data
    001  01-01  #      A
    002  01-01  01-02  A
    003  01-01  01-04  A
    004  01-02  01-03  A
    002  01-02  #      C
    004  01-03  01-04  C
    005  01-04  #      A
    003  01-04  #      D
    004  01-04  #      C
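The day 3 to day 4 transition in the tables above can be reproduced with a short sketch. It assumes SQLite and hypothetical column names (`rec_key`, `start_date`, `end_date`, `flag`), chosen to avoid SQL keywords such as `KEY` and `END`; `#` marks an open-ended record, as in the figure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bdw_t (rec_key TEXT, start_date TEXT, end_date TEXT, flag TEXT)"
)

# State of the warehouse on day 3, copied from the figure.
day3 = [
    ("001", "01-01", "#", "A"),
    ("002", "01-01", "01-02", "A"),
    ("003", "01-01", "#", "A"),
    ("004", "01-02", "01-03", "A"),
    ("002", "01-02", "#", "C"),
    ("004", "01-03", "#", "C"),
]
conn.executemany("INSERT INTO bdw_t VALUES (?, ?, ?, ?)", day3)


def constructive_merge(connection, changes, today):
    """Close the open version of each changed key, then add the new version."""
    for rec_key, flag in changes:
        # The old record is kept; only its end date is filled in.
        connection.execute(
            "UPDATE bdw_t SET end_date = ? WHERE rec_key = ? AND end_date = '#'",
            (today, rec_key),
        )
        # Incoming data is always added as a new open-ended record.
        connection.execute(
            "INSERT INTO bdw_t VALUES (?, ?, '#', ?)", (rec_key, today, flag)
        )


# Day 4 changes: 005 is added (A), 003 is deleted (D), 004 is changed (C).
constructive_merge(conn, [("005", "A"), ("003", "D"), ("004", "C")], "01-04")
day4 = conn.execute("SELECT * FROM bdw_t ORDER BY rowid").fetchall()
```

Nothing is overwritten or removed: the warehouse grows from six to nine rows, and the history of keys 003 and 004 remains readable.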

Figure 6.12 Constructive merge when applying data

Some records are added to the BDW, for example record 005. Its start time is 01-04 and its end time (End) is #, meaning some unspecified time; in other words the record is currently valid. Its flag is set to A, which indicates that the record is newly created and added to the BDW (Addition). Next, a record 003 is inserted whose timestamp follows on from the earlier record with the same key (its Start is the End of the earlier record) and whose flag is set to D. A record flagged D means that the corresponding record in the operational system has been deleted (Delete). The next step of the merge is to link records that belong together, for example the records with key 003: the new record is created and the old record, no longer valid, is closed by filling in its end time. Similarly, record 004 is inserted, and from its flag C the constructive merge recognizes it as an update of the earlier record 004. The updated content has start time 01-04 and end time #.

Constructive merge has two basic functions:

1. Insert the changed data, drawn as solid arrows in the figure. This can be implemented with SQL statements of the form:

    INSERT INTO BDW_T VALUES (00n, currentTime, ...)

2. Change or close existing data, drawn as dashed arrows. Run after the insert, the following statement fills in the end time of the previously open record for each key:

    UPDATE BDW_T A
    SET END = (SELECT MAX(START) FROM BDW_T B
               WHERE B.END = '#' AND A.KEY = B.KEY)
    WHERE A.END = '#'
      AND A.START < (SELECT MAX(START) FROM BDW_T C
                     WHERE C.KEY = A.KEY)

6.3.4 Data transformation

Data transformation is an important task in capturing data and placing it in the warehouse. It accepts the format of the source data and can change that format to suit the purpose of the warehouse. In a data warehouse, data formats often have to change according to usage requirements, from simple forms to more complex ones. For example, users may ask for operational data held on mainframes to be copied to PCs for personal use. The main concern here is knowing how the representation and storage of the data differ between the two environments. Or each branch of a bank may want to hold BDW data as a local data set containing only information about its own customers; the question then is how to bring that data into the central bank's warehouse.

Data transformation is thus the part of data replication that converts data with different logical and physical structures into one uniform form according to predefined rules. In environments that differ widely in business and technical requirements, transformation must provide the following functions:
- Selection: choose some or all of the fields of the loaded records according to some criterion.

- Separation / concatenation. Records are split or joined in a way that preserves their relationships through the primary key. Concatenation extends a record with more detailed data about the same subject. For example, information about product types may be kept in several operational databases for different departments: packaging types and product sizes are defined by the manufacturing plants, while the selling price comes from the Marketing department. All of this information is concatenated on the product key before it is placed in the BDW.

  [Diagram: separation splits one record into several; concatenation joins several records into one.]

  Figure 6.13 Separating / concatenating data

- Normalization / denormalization. This function is closely related to the previous one. Normalization splits an input record into several records; denormalization does the reverse.

  [Diagram: normalization splits a record containing parts 1 and 2 into separate records; denormalization recombines them.]

  Figure 6.14 Normalizing / denormalizing data

- Aggregation. Aggregation converts detailed data into summarized form.
- Conversion. This converts the values of a single field into a related form. For example:
  + Convert mixed-case text to all upper case.
  + Convert between measurement systems to a standard unit system.
  + Convert from one character encoding to another, such as ASCII and EBCDIC.
- Enrichment. This function uses data from one or more fields to derive and combine new fields.
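The transformation functions listed above can be illustrated on a handful of records. All field names and values below are hypothetical; `cp037` is the EBCDIC (US/Canada) codec shipped with Python.

```python
from collections import defaultdict

# Made-up detail records from a sales system and a packaging lookup from a plant.
sales = [
    {"product": "p1", "region": "north", "qty": 3, "unit_price": 2.5, "weight_lb": 10.0},
    {"product": "p2", "region": "south", "qty": 5, "unit_price": 4.0, "weight_lb": 2.0},
    {"product": "p1", "region": "south", "qty": 2, "unit_price": 2.5, "weight_lb": 10.0},
]
packaging = {"p1": "box", "p2": "bag"}

# Selection: keep only the records that meet a criterion.
selected = [r for r in sales if r["region"] == "north"]

# Concatenation: extend each record with data joined on the product key.
concatenated = [{**r, "packaging": packaging[r["product"]]} for r in sales]

# Aggregation: summarize detail rows into totals per product.
totals = defaultdict(int)
for r in sales:
    totals[r["product"]] += r["qty"]

# Conversion: upper-case text and convert pounds to kilograms.
converted = [
    {**r, "region": r["region"].upper(), "weight_kg": round(r["weight_lb"] * 0.45359237, 2)}
    for r in sales
]

# Conversion between character encodings: text to EBCDIC bytes.
ebcdic = "ABC".encode("cp037")

# Enrichment: derive a new field from existing ones.
enriched = [{**r, "revenue": r["qty"] * r["unit_price"]} for r in sales]
```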
6.4 Maintaining and deploying the data warehouse

Deployment concerns what lies outside the DW and outside the metadata: implementation, training and education, managing end users' requirements for query tools, and archiving old data. The deployment phase adds the fixtures and fittings to the data warehouse. At this stage the access and analysis tools must be chosen to deliver the result sets. Deploying a data warehouse affects three main areas:
- The company.
- The supporting information as a whole.
- The end users.

6.4.1 Deployment for the company

Managing metadata means having the means to manage the metadata associated with the distributed data and functions.


Metadata management tools

[Diagram: the metadata store (MD) at the center, connected to People, CASE tools, legacy applications, archive tools, middleware tools, scheduling, the RDBMS catalog, security tools and end-user access tools.]

Figure 6.15 Metadata management tools

The tools above share the following properties:
- They must fit the context of the operational data.
- They help manage explanations, for example a description of how last month's data is converted into this month's data.
- They provide metadata.
- They can import and export supporting information at a given place and time. Information can be imported from CASE tools, existing applications and relational data statistics, and exported to ad hoc query tools, EIS, DSS and reports.
- The metadata is easy to understand and easy to use.
- The metadata must be ready at the time the DW goes into use.

Metadata management tools

Metadata management tools are needed to link the information coming from existing applications, from the tools used to build, maintain and use the data warehouse, from the table definitions in the data warehouse, and from the knowledge of people in the company. The information can be accessed dynamically or moved by import and export. It is important to have a central metadata repository. If the lightly and highly summarized levels sit on different platforms, the metadata must be available on those platforms. The metadata may be presented to end users in converted form.


1. CASE tools. CASE tools may hold information about the legacy/OLTP applications, the design of the ODS and the core of the data warehouse. The metadata links carry all the information held by the CASE tools. This can include logical entities and relationships, primary keys, foreign keys and their definitions, physical tables and column definitions, business definitions of tables and columns, business rules, value ranges, and so on.
2. Legacy/OLTP applications. These information systems must be recorded in the metadata so that it is clear what data exists, how the data is linked, how long it is kept and how it is used. The metadata then holds operational, technical and business information.
3. Archive tools. These tools provide the storage needed to archive rows of a table or whole tables. The archived tables are stored together with metadata that records what has been archived and helps with searching and retrieval.
4. Middleware tools. These tools must pass on a description of each transfer as it actually takes place and supply figures on the number of records actually transferred.
5. End-user access and analysis tools. These may be shared across a community of different users. They need accompanying metadata that describes the queries, the reports and how they are used.
6. Security tools. These are required in a data warehouse, and the requirements depend on the kind of information. As data moves from the atomic level to the summarized level and becomes less sensitive, the security requirement decreases. Security applies at two levels: the table level and the column level.
7. RDBMS catalogs. These store information about tables, how they are used and the actual volume of data in them. The metadata needs this tool to manage that information and, together with other tools, to reorganize indexes and extend parameters.
8. Scheduling tools. These cover the estimates for each job and the time limits within which jobs must complete.
9. People. People are an extremely important part of defining metadata and of how it is used.


Maintaining the information system

[Diagram: information system deployment. Roll out: snowball effect, feedback loop. System ongoing maintenance: tuning, performance strategies. Archive strategies.]

Figure 6.16 Maintaining the system

The diagram shows three parts of deploying a data warehouse: roll out, ongoing maintenance and the archive strategy.

Roll out

The interface for exploiting the data warehouse depends on the work to be done, on the end users and on the information being analyzed. For the initial roll out, IS must define a few queries and reports. If an end user was receiving a report before the data warehouse tools were supplied, that report must be recreated. To manage the roll out:
- Run a pilot with a first group of end users.
- Have a strategic roll-out plan for the remaining end users.
- Break the procedure down so that requests are certain to be resolved quickly.

Ongoing maintenance

This is the most important activity and decides whether the data warehouse survives. Maintenance includes:
- Improving or extending the information in the subject areas. Changes to the data warehouse must be made quickly to keep up with usage needs.
- Monitoring and tuning the performance of the data warehouse with respect to the RDBMS and the physical structure, and reorganizing indexes.
- Identifying the summary tables.
- Tuning SQL queries and reports.
- Adding users and strengthening security.

According to industry figures, 85% of accesses go through pre-programmed forms and 15% are scattered, ad hoc accesses. To turn ad hoc queries into pre-programmed ones, select the users' needs, analyze the scope of their activity and create repeatable queries that satisfy those needs. Building the data warehouse, or end-user applications that access it in the manner of EIS applications, must be planned from the moment the system goes into use.

Archive strategy

This strategy has a business side and a technical side. The business archive strategy describes what has to be done with the information in the metadata. The technical archive strategy describes the environment in which the information is archived.

Business: formulating the archive strategy raises questions such as:
- When should information be archived?
- How is standalone information managed?
- How is the metadata for a partial archive obtained?
- Should the metadata be archived too, or kept permanently available?

Technical: the technical side of the archive strategy should be outlined as part of the infrastructure. The main issues the technical infrastructure must address are time, type, access cost and reusability.

6.4.2 Deployment for end users

The essential question in the end-user deployment strategy is: who are the end users? Identifying them makes it possible to target the analysis and access tools appropriately.


    End-user deployment
    Type of user    Tools                 Data level
    Executive       EIS                   Highly summarized
    Line manager    EIS, OLAP             Lightly summarized
    Analyst         OLAP, ad hoc query    All levels
    Technical       Monitoring tools      All levels

Figure 6.17 Deployment environment for end users

Classes of users

- Executives: people who need the information systems in order to carry out their work. Their access tool is the EIS. Data in the EIS is highly summarized, with predefined alerts. The interface is highly intuitive, with a few specifically defined options.
- Managers: the manager of a department needs the overall picture plus additional detail to clarify it. Their access tools combine EIS and OLAP. The data they use is summarized.
- Analysts: they analyze business data to set direction and decide on courses of action. Their access tools include OLAP tools, ad hoc query tools and predefined reports. They access data at all levels of the data warehouse and, in particular, may approach the data in many ways and in several stages.
- Technical staff and specialists: they represent the people concerned with the operational side of the business. Their main responsibility is to support all the other classes of users. Their access is continuous and includes requests for standard reports.

Types of users

End users can also be divided into types according to their experience and technical ability and the complexity of the analysis they can carry out.
- Casual users: a casual user is an end user who does not need much experience with query and analysis tools. The interface to the data warehouse for these users lets them make requests by simple menu selection or by pressing a key. Some casual users may move on to the "native" interface of the query tool to find existing queries and run them. Users of this type need access to basic metadata.
- Power users: these are people who spend considerable time learning the query tools and who take an interest in the structure of the database deployed in the data warehouse. They often write their own requests as SQL statements and also make regular use of existing queries. They need access to complete metadata, because they write their own queries and carry out more complex analysis. They need to understand both the business and the technical metadata.
- Technical users: users of this type tend to become expert in the access and analysis tools and exploit most of their advanced features. They write their own queries as well as complex queries for other users. The metadata must be complete and easy to understand for these users because of the ad hoc complexity of the queries they write. They need to know the complex relationships of the data in the data warehouse and to understand its origin in the operational systems.

Other issues

The classification above shows how important end users are to the querying process. End users check the correctness and accuracy of the data in the data warehouse. If the report they receive from the old reporting system differs from the one produced from the data warehouse, the difference must be explained. Through the metadata, end users can be shown the steps by which the information was transformed. The reports in the data warehouse are also an important factor for the end-user community.


PART III: Data analysis and exploitation techniques

Chapter VII: Data access and exploitation

- Data access and analysis techniques
- Online analytical processing (OLAP)
- Managing and administering the data warehouse
- Information delivery systems

7.1 Access and analysis

The main purpose of the DW approach is to provide information to business users so that they can make strategic decisions. These users interact with the DW through front-end tools. According to their purpose, the tools fall into five main groups:
- Reporting and query tools
- Application development tools
- Executive information system (EIS) tools
- Online analytical processing (OLAP) tools
- Data mining tools

[Diagram: application-based access (query, SQL, reports, file maintenance, dictionary, comments, history) and desktop tool access both reach the data warehouse at its highly summarized, lightly summarized and detail levels.]

Figure 7.1 Tools that support exploitation of the data warehouse



Reporting tools

Reporting tools are divided into production reporting tools and report writers. Production reporting tools generate regular operational reports or support high-volume batch jobs, such as calculating and printing paychecks. Tools of this kind use third-generation languages such as Cobol or fourth-generation languages such as those from Information Builders. A report writer is an inexpensive desktop tool designed for end users. Products such as Crystal Reports from Seagate let users design and run reports without relying on the information systems department. Report writers generally have a graphical interface and built-in charting functions. They can pull groups of data from many different sources and integrate them in a single report. Leading vendors in this area include Crystal Reports, Actuate Software Corp and IQ Software Corp. Vendors are trying to improve the stability of report writers by supporting three-tier architectures in which report processing runs on a Windows NT or Unix server. Report writers have also begun to offer object-oriented interfaces for designing and manipulating reports, and modules for ad hoc queries and OLAP analysis.

Managed query tools

These tools hide the complexity of SQL and of the database structure and make them transparent to users by inserting a metalayer between the user and the database. The metalayer is software that provides subject-oriented views of a database and supports generating SQL by point-and-click. These tools are very popular because they let less technical users access data without intervention from IS. Most tools of this kind have a three-tier architecture to improve stability. They also support asynchronous queries and integration with Web servers. Vendors are racing to embed support for OLAP and data mining features. Oracle's product in this category is Discoverer/2000.
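The idea of a metalayer can be sketched as a mapping from business terms to columns, from which SQL is generated. This is a toy illustration only: the mapping, the `customer` table and the function name are hypothetical and do not reflect how any particular product works.

```python
# Hypothetical metalayer: business terms mapped to columns of an assumed table.
METALAYER = {
    "Customer": "name",
    "City": "city",
    "Phone": "phone_no",
}


def build_select(table, picked, where=None):
    """Turn point-and-click choices into a parameterized SELECT statement."""
    columns = ", ".join(f'{METALAYER[label]} AS "{label}"' for label in picked)
    sql = f"SELECT {columns} FROM {table}"
    params = []
    if where:
        clauses = []
        for label, value in where.items():
            clauses.append(f"{METALAYER[label]} = ?")
            params.append(value)
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params


# The user ticks "Customer" and "Phone" and filters on "City".
sql, params = build_select("customer", ["Customer", "Phone"], {"City": "Ha Noi"})
```

The user only ever sees the business terms; the column names and the SQL text stay behind the metalayer.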

Executive information system tools

Executive information system (EIS) tools make it possible to build customized, graphical decision-support applications or briefing books that give managers and executives a high-level view of the business and access to sources outside the organization. EIS applications highlight exceptions to normal business activity or rules by means of color-coded graphics. The most popular tools include Pilot and Lightship. Express Analyzer is Oracle's product in this area.

EIS vendors are moving in two directions. Many are adding query functions of the kind found in the managed query tools above, in order to compete with other decision-support tools. The second trend is to build packaged applications organized horizontally by function, such as sales, budgeting and marketing, or vertically by industry, such as financial services.

The analytical needs of DW users often exceed the capabilities of reporting and query tools. Meeting them with those tools would require a set of queries and data models so complex that business users are reluctant to become experts in SQL and data modeling. This removes the ease of use that query and reporting tools are supposed to offer. In that situation organizations usually fall back on a proven approach: building applications in a graphical environment designed on the client/server model. Some of these application platforms integrate well with popular OLAP tools and can access all the major database systems, including Oracle, Sybase and Informix. Examples of such application development environments are PowerBuilder from PowerSoft and Visual Basic from Microsoft.

Application software as a tool

As mentioned above, these are easy-to-use, point-and-click tools. They either accept SQL statements or generate SQL to query the relational data stored in the DW. Some of these tools and applications format the retrieved data into readable reports, while others concentrate on presenting the data on screen. Users often choose tools of this kind. As the questions become more complex, however, the tools can no longer meet the need. Organizations then use their familiar application development approach to build a reporting and query environment for the DW. There are several reasons for doing so:
- An operational decision-support system or an EIS is still in use, and its reporting facilities are reasonably complete.
- An organization often has a large investment in a particular application development environment (for example Visual C++ or PowerBuilder) and has enough well-trained developers to deliver reporting and query solutions.
- A new tool may require further investment in developers with the skills to build software and in infrastructure, none of which was allowed for in the project plan.
- Users do not want to be involved in this implementation phase of the project and will continue to rely on the information systems organization to deliver periodic reports in familiar formats.
- Specific reporting needs may be too complex for an off-the-shelf reporting tool.

As a result, a whole range of competing vendor products has come onto the market, usually written in third-generation languages such as C and C++.
7.2 Data mining

Data mining is the technique of exploiting the data warehouse in depth. It can be understood as the process of searching, exploring and examining data at several levels in order to find relationships between data elements and to discover the trends, patterns and past experience hidden in the warehouse. It is therefore well suited to analyzing data in support of management and decision making. Data mining is a decision-support process in which we look for previously unknown and unexpected patterns of information in large, complex data stores. For mining to be effective, the data must be presented visually (data visualization). When analyzing data, people want not only numbers but also a picture of the data, so that they can discover new information and the development trends of whatever the data describes.

7.2.1 Applications of data mining

Data mining techniques can be applied to a wide variety of decision-making situations across a broad range of business activities.
- Marketing: analyze customer needs from purchasing patterns, classify customers and classify products over long periods, and from this determine business and advertising strategy and production and sales plans for different periods.
- Finance, banking and securities markets: analyze customers' ability to borrow and repay, assess the effectiveness of a bank's monetary operations, analyze securities investment markets, contracts and bonds, detect fraud in economic and financial activity, and so on.
- Production, manufacturing and technology: analyze production and manufacturing data to propose optimizations of resources, materials and labor in new production and manufacturing processes, and so on.
- Public health care: analyze the results of preventing and treating diseases and of community health care and protection, analyze the harm caused by drugs and other social problems, and so on.


Education, bioinformatics, Web mining, etc.

Figure 7.2 Application areas of the DW and the information-exploitation process (boxes shown: Research, Marketing, Production Scheduling, Sales, Order Processing, Inventory Management, Billing, Accounts Receivable, Product Development, Engineering, Manufacturing, Shipping, Purchasing, Receiving, Account Payroll, Payroll, Business Reporting, Strategic Planning, Recruiting, Training, Decision Support, Performance Appraisal, Benefits)

Statistics on the application areas are as follows:

Finance & Accounting: 7%
Customer Profiting: 17%
Other applications: 12%
Marketing & Sales: 39%
Operations: 25%

Figure 7.3 Statistics on data warehouse application areas

Once the application area has been identified, the next step is to decide how to use data processing and analysis techniques to support the tasks at hand. The process of using the data in the warehouse is described by the following steps:


1. Define the objectives and the problems to be solved.
2. Check the quality and characteristics of the data.
3. Build the means of access to the warehouse.
4. Access and search for information.
5. Analyze and process the data.
6. Decide and carry out the work.

Figure 7.4 Steps in using the data in the warehouse

7.2.2 Online analytical processing - OLAP

Online analytical processing (On-Line Analytical Processing, OLAP) is a tool for online analysis. The essence of OLAP is that data is taken from the DW or a DM, converted into a multidimensional model, and stored in a multidimensional data store (the data is held in arrays rather than in records as in the relational model). OLAP services (or tools) take data from the warehouse to perform ad hoc, complex, multidimensional analyses in support of decision-making. The star schema used to design the data model of a DW or DM is a relational data model, but it carries multidimensional properties, which makes it very convenient for implementing OLAP.

OLAP is a business intelligence function that makes the information in an enterprise understandable. It lets end users gain insight through fast, interactive access to many kinds of views of information that has been transformed from raw data to reflect the real multidimensional diversity of the company.

Multidimensional analysis: performed by creating views along several different dimensions at the same time. Dimensions are essentially the elements that appear in the business, such as time, geography, distribution channels, products, and customer types. Dimensions can be the values on the x and y axes, as in a spreadsheet, and there can also be a z axis. To make this more concrete, picture a box in which the x axis is the length, the y axis is the height, and the z axis is the third side of the box. If we then let x be customer type, y be product, and z be geography, we obtain product sales by customer type by region.

Executive Information Systems (EIS): the typical EIS interface is push-button. All objects in an EIS are programmed in advance, either to carry out actions that involve no query or to stand in for queries that are hard to write. They provide a fast, easy-to-understand way to reach high-level information.

What-if analysis: concerns thinking that goes beyond the closed scope of an organization. The analysis involves constructing a scenario that does not currently exist in the company, for example, what would happen if products were grouped differently.
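The multidimensional view and the what-if question above can be illustrated with a small sketch. It assumes a tiny, made-up set of sales facts with the three dimensions used in the text (customer type, product, region); the names `build_cube`, `view`, and `what_if_regroup` are hypothetical helpers, not part of any OLAP product.

```python
from collections import defaultdict

# Hypothetical sales facts: (customer_type, product, region, amount).
FACTS = [
    ("retail", "laptop", "north", 1200.0),
    ("retail", "laptop", "south", 800.0),
    ("retail", "phone", "north", 500.0),
    ("wholesale", "laptop", "north", 3000.0),
    ("wholesale", "phone", "south", 1500.0),
]

DIMENSIONS = ("customer_type", "product", "region")


def build_cube(facts):
    """Sum amounts into cells addressed by (customer_type, product, region)."""
    cube = defaultdict(float)
    for *coords, amount in facts:
        cube[tuple(coords)] += amount
    return dict(cube)


def view(cube, *dims):
    """Aggregate the cube onto the chosen dimensions, summing out the rest."""
    idx = [DIMENSIONS.index(d) for d in dims]
    out = defaultdict(float)
    for coords, amount in cube.items():
        out[tuple(coords[i] for i in idx)] += amount
    return dict(out)


def what_if_regroup(facts, regroup):
    """Return new facts with products mapped to hypothetical groups."""
    return [(c, regroup.get(p, p), r, a) for c, p, r, a in facts]


cube = build_cube(FACTS)
by_product = view(cube, "product")
by_customer_region = view(cube, "customer_type", "region")

# What-if: suppose laptops and phones were grouped as one product line.
scenario = what_if_regroup(FACTS, {"laptop": "devices", "phone": "devices"})
regrouped = view(build_cube(scenario), "product")
```

Choosing different arguments to `view` corresponds to looking at the same data along different axes; the what-if scenario simply rebuilds the cube from facts under an assumed regrouping.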
7.3 Managing and administering the data warehouse

A DW is roughly four times the size of an overall operational data store (ODS). It is not synchronized with the related operational data in real time, but it can be refreshed as often as once a day if the application requires it. Most DW products include gateways to the company's heterogeneous data sources, so that the software for converting, translating, and using the data does not have to be rewritten. In a heterogeneous DW environment, many different databases reside on separate systems, which calls for tools that work across networks. Although the DW has no networking technology of its own, a DW installation can rely on the same communications software used by transaction processing or messaging systems (for example NetWare, TCP/IP, or products based on DCE technology). This leads to the need to manage the infrastructure components.

DW management includes:

Managing security, confidentiality, and priorities
Managing updates from many different sources
Checking data quality
Managing and updating metadata
Auditing and reporting on the use and status of the DW (tracking usage time and resource usage, providing chargeback information, etc.)
Cleaning data
Replicating data, splitting data into subsets, and distributing data
Backing up and recovering data
Managing the DW stores
7.4 Information delivery system

This system carries out processing for users who subscribe to information in the DW and delivers that information to many different destinations according to an algorithm that depends on each user's schedule. In other words, it distributes data stored in the DW, and other information objects, to other DWs and to end-user products such as spreadsheets and local databases. Delivery is triggered either by the time of day or by the completion of an external event. The basic rationale for the system is that once a DW is installed and running, users should not have to care where it is located or how it is maintained. All they may need are reports and analyzed views of the data at a particular time of day or upon a particular related event. The best way to distribute data depends on where the data is going and how much of it must be moved. Middleware is used to move data from one system to another when they run on different (hardware) platforms.
7.5 Database technology used in the DW approach

Comparing operational databases with the DW: there are several fundamental differences between a database designed for operational needs and a decision-support database serving management (that is, the DW). The differences are shown in the table below:

Characteristic | Operational information | Decision-support information
Function | Records events and computes by formulas; serves day-to-day operations | Analyzes data and discovers information; supports management and decision-making
Data source | Internal | Internal and external
Indexing method | B-tree | Bitmap
Updates | Frequent, in small transactions | Data is not modified; low frequency, large volume
Database size | Gigabytes | Gigabytes to terabytes
Retention period | Short, mostly by year | Historical, long-term
Usage | Predictable, periodic; follows fixed templates | Not determined in advance; highly aggregated and analytical


Although operational databases are the source of data for decision-support databases, they lack the architecture and functions needed to analyze data for decision support easily and efficiently. Data in the DW is formed from stable data of the operational databases, whereas the data in an operational database is updated whenever a new transaction occurs.

Some methods and tools that serve well for building operational systems are almost entirely unsuited to the different requirements of a DW. This is especially true of database management systems. Traditional online transaction processing database systems were simply not designed for the requirements of the DW approach. DW projects have been forced to choose between a data model and associated schema that is intuitive for analysis but performs poorly, and a model and schema that performs better but is not well suited to analysis. As the DW approach has continued to develop, new approaches to schema design that fit analysis better have emerged, and they are essential to its success. One schema that has been widely adopted for the DW approach is the star schema (presented below).

Laying out data for the best access

The data model most commonly used for operational information systems is the relational model. It is based on mathematical principles and predicate logic, so relational database management systems offer powerful solutions for a rich variety of scientific and commercial applications. From this point of view, an important part of database design is to develop the data model and relational schema so that the relational database management system (RDBMS) involved achieves the best performance. The typical requirement on an RDBMS supporting an operational system is to handle efficiently a large number of read and write requests that may occur concurrently. The schema definition usually maximizes concurrency and optimizes delete, update, and insert operations by defining relational tables that match the operational requirements and by keeping the stored content needed to reach each individual record to a minimum.

The requirements on an RDBMS for a DW are very different from those on an operational system described above. A typical DW RDBMS must process large, complex, ad hoc queries that touch a great deal of data. Not only do these systems differ notably in how they use computing resources; the nature of the work itself calls for a different approach to defining the database schema. Since the main database technology of the DW is the RDBMS, we will consider schema design in the context of relational database technology. In the overall architecture of the data warehouse, the operational data store and the management, processing, and analysis functions can be organized as follows:


Figure 7.5 The data warehouse and the operational data stores (components shown: Management Platform; Update Process; Metadata; Information Delivery System; MRDB; Data Mining Tools; OLAP Tools; Transform; ODS; Load; Data Extract, Cleaning; Data Load; Data Warehouse DBMS; MDDB; Data Mart; Legacy & External Data; Admin; Application & Tools; Report, Query, EIS Tools; Platform; Repository)

The knowledge discovery process
Figure: The knowledge discovery process (stages shown: Data, Selection, Target Data, Preprocessing, Preprocessed Data, Transformation, Transformed Data, Data mining, Patterns, Interpretation/Evaluation, Knowledge)

Data selection: selecting data from the data sources, according to defined criteria, for the purpose of knowledge discovery. For example, from a sales database we select the data on customers, orders, invoices, etc.

Preprocessing: cleaning and enriching the data. This includes filling in missing data and handling noise, inconsistencies, etc. For example, one customer may be stored in several records with different names and addresses, which must be corrected so that the information about that customer is consistent and accurate. Data that differs in format, unit of measure, etc. needs agreed conventions and a way of converting it to a common form.

Data transformation: encoding the data and running utility programs that automate the extraction, transformation, and movement of data for mining.

Data mining: carrying out the analysis. This is the step in which mining techniques are applied to discover and extract patterns and special relationships in the warehouse.

Knowledge presentation and evaluation: the mining results can be summarized as reports to support decision-making. The forms of presentation should be visual: graphics, trees, tables, rules, etc.

Problems in data mining

Discovering data dependencies: (functional) dependency relationships are very important in designing, implementing, and maintaining a database.

Discovering association rules: given a set of transactions, each of which is a set of items, an association rule is an implication X → Y, where X and Y are sets of items. For example, 98% of customers who buy bread and butter also buy milk. The association rule problem is to find all rules that satisfy the minimum support and confidence specified by the user.

Dependency modeling: finding a model that describes significant dependencies among variables and components, and discovering dependencies among attributes.

Classification: determining mappings that assign items to one of a set of predefined classes. For example, classifying customers by age, gender, education level, etc.
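The association rule problem described above, finding rules X → Y that meet a minimum support and confidence, can be sketched as follows. This is a brute-force illustration on a tiny, invented set of transactions; the function names and thresholds are assumptions, and real tools use far more efficient algorithms.

```python
from itertools import combinations

# Hypothetical market-basket transactions; each is a set of items.
TRANSACTIONS = [
    {"bread", "butter", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"milk"},
]


def support(itemset, transactions):
    """Fraction of transactions containing every item in itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(x, y, transactions):
    """support(X union Y) / support(X): how often Y appears when X does."""
    return support(set(x) | set(y), transactions) / support(x, transactions)


def find_rules(transactions, min_support, min_confidence):
    """Brute-force search for rules X -> item; suitable for tiny data only."""
    items = sorted(set().union(*transactions))
    rules = []
    for size in range(1, len(items)):
        for x in combinations(items, size):
            for y in items:
                if y in x:
                    continue
                s = support(set(x) | {y}, transactions)
                if s < min_support:
                    continue  # also guarantees support(x) > 0 below
                c = confidence(x, {y}, transactions)
                if c >= min_confidence:
                    rules.append((x, y, s, c))
    return rules


rules = find_rules(TRANSACTIONS, min_support=0.4, min_confidence=0.6)
```

Each entry of `rules` is a tuple (X, item, support, confidence); with the sample data, the rule from bread and butter to milk has support 2/5 and confidence 2/3.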
7.6 Building the finance and budget data warehouse

The DW approach is used to build the national Finance and Budget database. Based on the requirements of finance and budget management, the characteristics of each specialized area, and the volume of data to be collected and stored, the Finance and Budget database can be divided into the following subject databases, or data marts (DM), to be built:
1. Budget revenue and expenditure database
2. Taxpayer database
3. Budget-using entities database
4. State capital and assets in enterprises database
5. Capital construction investment database
6. Debt database: domestic and foreign
7. Public resources and assets database
8. Related financial matters database
9. Legal documents database
10. Socio-economic and monetary database
11. Staffing, payroll, and social insurance database
12. Internal information database
...

The Finance and Budget DW is built after all of these DMs have been completed. Each DM has its own function and characteristics, its own input data sources, and its own information requirements, and each is relatively independent. They are nevertheless components of a single DW, so they must be designed and built according to common principles and must preserve the information relationships that reflect actual business operations and financial management requirements.

7.6.1 Data warehouse solutions from software vendors

The worldwide market for DW technology solutions is gradually coming to be dominated by the traditional database vendors. This is understandable, because the DW approach, as presented above, still rests on traditional relational databases, and the existing exploitation and development tools carry over. In the past there were vendors that specialized in building data warehouses, but as the market grew and customer demands rose, they could no longer keep up. At the same time, the well-known database vendors began to pay attention to this market, with the great advantage that they could extend their own database management systems to meet the technical requirements of a DW (such as the star schema, bitmap indexing, etc.). Oracle, IBM, Informix, RedBrick, and Sybase are the leading database management system vendors with advanced DW technology and the largest market shares in the world today. We compare these five vendors on a number of technical criteria to assess whether their database management systems are suitable for building a DW:

Scalability: scalability is essential because a data warehouse built while the organization is small must be able to grow as the organization grows. This is an inevitable trend of modern technology.

Very large database support: an important characteristic of a DW is its very large size.

Parallel processing capability: a large database means more data to scan, load, and index, and queries need more processing power to run quickly. These requirements are met by the system's parallel processing capability.

Manageability: administering and maintaining a large database, for example backup and recovery, is a very complex job that calls for effective, convenient, easy-to-use tools. Problems can arise that never occur with small databases. For a large database, maintenance and administration must not interrupt the operation of the system.

Handling complex (analytical) queries: processing queries with many conditions and different kinds of joins.

Bitmap index and star query support: for queries against a large table with a condition on a column that takes only a small number of distinct values, bitmap indexing significantly reduces processing time (the method is presented in the section on bitmap indexes). A common kind of analysis when exploiting a DW is the star query, which accesses a fact table (holding a very large amount of data) that is linked to other, smaller dimension tables (see the section on the star schema).

Support for common OLAP tools on the market, such as NETMAP, MineSet, DIAMOND, IMAGIX 4D, etc.

In addition, for the Vietnamese market we must consider several other factors, such as ease of use, reasonable price, familiarity in Vietnam, and availability of technical support in Vietnam.

Comparison of five DW solution vendors
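Technical criterion | ORACLE | IBM | INFORMIX | REDBRICK | SYBASE
Scalability | Very good | Limited | Yes | Yes | Yes
Very large database support | Very good | Good | Good | Yes | Limited
Parallel processing capability | Very good | Yes | Good | Yes | Limited
Manageability | Very good | Yes | Yes | Limited | Yes
Handling complex queries | Very good | Limited | Limited | Good | Yes
Bitmap index / star query support | Yes/Yes | No | No | Yes/Yes | Yes/No
OLAP tool support | Very good | Limited | Limited | Limited | Limited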

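For the Vietnamese market

Criterion | Ratings (three value columns as they appear in the source)
Ease of use | Good | Yes | Yes
Reasonable price | Good | Yes | Good
Familiarity in Vietnam | Very good | Limited | Limited
Technical support in Vietnam | Very good | Yes | Yes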

From the comparison and analysis above, we conclude that ORACLE's DW approach is the most suitable for this work. As stated above, we build this DM on ORACLE's DW technology, so the tools used to build it are ORACLE's.
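The bitmap indexing criterion discussed above can be made concrete with a small sketch. It assumes a made-up fact table held as a Python list and uses plain integers as bit vectors; `build_bitmap_index` and `rows_from_bitmap` are illustrative names and do not describe how any of the vendors listed implements the feature.

```python
# Hypothetical fact rows; the row number is the position in the list.
ROWS = [
    {"region": "north", "year": 1998, "amount": 10},
    {"region": "south", "year": 1998, "amount": 20},
    {"region": "north", "year": 1999, "amount": 30},
    {"region": "west", "year": 1999, "amount": 40},
    {"region": "north", "year": 1999, "amount": 50},
]


def build_bitmap_index(rows, column):
    """Map each distinct value to an int whose bit i is set if row i has it."""
    index = {}
    for i, row in enumerate(rows):
        index[row[column]] = index.get(row[column], 0) | (1 << i)
    return index


def rows_from_bitmap(bitmap):
    """Yield the positions of set bits, i.e. the matching row numbers."""
    i = 0
    while bitmap:
        if bitmap & 1:
            yield i
        bitmap >>= 1
        i += 1


region_idx = build_bitmap_index(ROWS, "region")
year_idx = build_bitmap_index(ROWS, "year")

# A condition such as region = 'north' AND year = 1999 becomes a bitwise AND.
matches = list(rows_from_bitmap(region_idx["north"] & year_idx[1999]))
total = sum(ROWS[i]["amount"] for i in matches)
```

Because each low-cardinality column needs only one bit per row per distinct value, combining several conditions reduces to cheap bitwise operations before any row is read.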


7.6.2 Tools for building the DW and DMs

Users usually have to design and build their systems on an integrated platform assembled from products sold by many different vendors, because choosing a single product that meets all the business requirements of an application is quite difficult. There are too many products from too many vendors that are not compatible with one another, and products designed and sold separately rarely work well together. To overcome these drawbacks, ORACLE offers an integrated solution consisting of two product groups: Oracle Application Data Warehouse (OADW) for building the DW, and Oracle Data Mart Suite (ODMS) for building DMs. These tool sets meet the requirements for building large databases and one or more subject databases, and they support every step from analysis and design through to building and managing the system. Both product groups are built on the Oracle 7.xx and 8.xx relational database management systems and use Oracle's traditional analysis, design, and development tools, Designer/2000 and Developer/2000. To build the Budget Revenue and Expenditure DM we use the ODMS tool group, which includes:

Oracle 7.3.2
Web/Application Server
Oracle Discoverer
Oracle Data Mart Admin: DM administration tool
Oracle Data Mart Builder: DM construction tool
Oracle Data Mart Designer: analysis and design tool for subject data stores

Of these, Oracle Data Mart Builder and Oracle Data Mart Designer are the ones mainly used. Oracle Data Mart Designer has the same functions as the main functions of the Designer/2000 tool set.

Some common data warehouse analysis tools on the market

There are currently many and varied software products on the market that provide data warehouse analysis tools. In this section we review a few products that can be considered good and that have been developed into commercial tools for data exploitation applications.

1. NETMAP

ALTA Analytics, Inc., http://www.alta.oh.com

NETMAP has been under development since the 1980s. It started on mainframes and was gradually extended to applications running on computer networks. The latest versions run on UNIX or Windows NT, with applications that can be written in C and extended through an API (Application Programming Interface).

2. IMAGIX 4D

Imagix Corporation, http://www.imagix.com

Imagix 4D is built on a combination of the C programming language and Tcl (Tool Command Language). The latest versions run on UNIX and Windows.

3. MineSet

Silicon Graphics, Inc., http://www.sgi.oh.com/Products/softwware/MineSet

MineSet is an information analysis support system that integrates a wide variety of analytical approaches in a visual way. It is based on a client/server architecture for very fast access to large databases.


Chapter VIII: Online analytical processing (OLAP)

Issues in online analytical processing
OLAP architecture
Relational and multidimensional OLAP
Evaluating OLAP servers and OLAP support tools

8.1 Why online analytical processing?

Managers today face two major challenges:
1. Running the business as efficiently as possible in order to maximize profit and gain an advantage in a fiercely competitive environment.
2. Planning and setting action programs for the activities of the organizations they manage.

Information processing and analytical processing are the two basic ways of obtaining valuable information from data warehouses to address these two challenges. Analytical processing aims to answer questions such as "What is happening in the business?" and "What will happen next?"

Large, multidimensional data warehouses often contain a great deal of hidden information (knowledge) that traditional tools such as SQL queries find very hard, and often impossible, to uncover. For example, the director of an enterprise wants to know: "Which item sold best in November, from the warehouses in the Northwest and mountain regions, and which customer segment (age group) bought the most?" This is a question with four dimensions (item, time, region, age group). It is not easy to answer such multidimensional questions with the traditional query techniques of relational data models, such as SQL; the answer has to rest on the results of multidimensional analysis. Moreover, because user requirements change constantly, answers must be processed in different orders: sometimes by region, sometimes by time, at other times by age group, and so on. This calls for online analytical processing over large, heterogeneous data sets.

Two approaches are commonly used to access the data warehouse directly. The first goes through multidimensional views and presents the data as a multidimensional structure for analysis and reporting at workstations. To perform OLAP efficiently on data views, work usually focuses on algorithms that automatically select summary tables and index the views. The second analyzes directly the multidimensional data cubes built from the warehouse, providing summarization and consolidation in support of decision-making and forecasting, trend analysis, and statistical analysis.

In practice, business data is multidimensional. It is interrelated and hierarchical. Each dimension is a factor relevant to the work being analyzed. Data therefore needs to be converted to multidimensional form before it is analyzed.

8.1.1 Multidimensional data analysis

All related data needs to be analyzed. In analytical processing the focus is on data analysis, and especially multidimensional analysis. In multidimensional analysis, data is described in terms of dimensions, for example product, region, and customer. Dimensions usually involve hierarchies, such as city, state, region, country, and continent. Time is a standard dimension with its own hierarchy: day, week, month, quarter, year.

To handle complex analysis, multidimensional analysis presents a view of the data that is close to the way users think. A user might look at the budget by department over the last four quarters for a set of products. The result can be rotated to change the position of the axes and the view. Users can also examine the dimensions by drilling down or rolling up through the members of each dimension. Drilling on the dimensions can produce further views. The scope of information processing is usually simpler (only two or three dimensions).

Analyzing historical data to understand the past is static analysis. Analytical processing can be used for complex historical analysis with extended operations, known as dynamic analysis: planning and forecasting that treat the past as a prologue to the future. In a DW, data is stored for querying and analysis, purposes that differ from those of OLTP, where data is collected and stored for operational and control purposes.

8.1.2 Definition of OLAP

OLAP is a technology for the online processing of new information created from existing data through a set of transformations and numerical calculations. In essence, an OLAP system stores summarized information and allows it to be presented in the form of two-dimensional tables.

Definition: OLAP is a data analysis technology that does the following:

Presents a logical, multidimensional view of the data in the DW. This view is completely independent of how the data is stored (it may be stored in a multidimensional data store or in a relational one).
Usually involves interactive analytical queries on the data. The interaction is often complex, involving drilling down to more detailed levels of data or rolling up to higher levels of summarization or aggregation.

Provides the ability to build analytical models, including a calculation engine for deriving ratios, variances, and so on, involving measurements or numerical data across the different dimensions.

Creates summaries and aggregations, hierarchies, and uses those summary and aggregation levels at every intersection of the dimensions.

Supports functional models for forecasting, trend analysis, and statistical analysis.

Retrieves and displays data in two- or three-dimensional tables, charts, and graphs, with easy pivoting of the axes. Pivoting matters because users need to analyze data from different perspectives, and the analysis from each perspective leads to another question, which is then checked against yet another view of the same data.

Responds quickly, so that the analysis process is not interrupted and the information does not go stale.

Uses a multidimensional data store engine that stores data in arrays (note that an array holds elements of the same type, unlike a record, whose elements are of different types). These arrays are the logical representation of the business dimensions.

The terms OLAP and multidimensional database are often used interchangeably, which blurs the two concepts. A multidimensional database is in essence a database architecture that holds summarized information in which all the main data items (the dimensions) are cross-referenced. OLAP, by contrast, is a front end that lets end users choose the dimensions and facts to cross-reference. The data sources for an OLAP application include multidimensional databases, relational databases, and spreadsheet data (drawn from any database architecture).

8.1.3 OLAP architecture

OLAP is a data reporting and analysis capability. It is an important component of the access and usage block in the architecture of a DW. The OLAP component represents the reporting and analysis capability of the OLAP services needed both when converting to a multidimensional structure and when accessing the DW or a DM. The DW reference architecture offers the following options:

Access data directly from the DW or a DM, convert it into a multidimensional structure, and store it in a local data store on a workstation.

Access data directly from the DW, convert it into a multidimensional structure, and store it at the DM, but in a multidimensional data store, ready for analysis and retrieval at the workstation.
Access data directly from the DW or a DM, turn it into a multidimensional view, and present it to the user as a multidimensional structure for analysis and reporting at the workstation.

These options illustrate the logical and physical components of the OLAP architecture itself.
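The roll-up and drill-down operations mentioned in the OLAP definition above can be sketched on a time hierarchy. The example assumes a few invented sales facts and a three-level hierarchy (year, quarter, month); `aggregate` and `drill_down` are hypothetical helper names.

```python
from collections import defaultdict

# Hypothetical sales facts: (year, quarter, month, amount).
SALES = [
    (1999, "Q1", "Jan", 100),
    (1999, "Q1", "Feb", 150),
    (1999, "Q2", "Apr", 200),
    (2000, "Q1", "Jan", 120),
]

# Time hierarchy from coarse to fine; positions match the tuple fields.
LEVELS = ("year", "quarter", "month")


def aggregate(facts, level):
    """Total amount grouped at the given level, keyed by the path down to it."""
    depth = LEVELS.index(level) + 1
    totals = defaultdict(int)
    for *path, amount in facts:
        totals[tuple(path[:depth])] += amount
    return dict(totals)


def drill_down(facts, level, key):
    """Show the next finer level under one member, e.g. the quarters of a year."""
    finer = LEVELS[LEVELS.index(level) + 1]
    return {
        k: v for k, v in aggregate(facts, finer).items() if k[: len(key)] == key
    }


by_year = aggregate(SALES, "year")        # roll-up to the top level
by_quarter = aggregate(SALES, "quarter")  # one level of detail further down
inside_1999 = drill_down(SALES, "year", (1999,))
```

Rolling up is simply aggregating at a coarser level of the hierarchy, while drilling down re-aggregates one level finer and keeps only the members under the selected parent.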

Logical architecture

It consists of two parts:

The OLAP view: the logical, multidimensional presentation of the data in the DW or DM to the user, regardless of how and where the data is stored.

The data storage technique: the choice of how and where the data is stored. The two most common options are a multidimensional data store and a relational data store (relational database).

Users care only about the multidimensional view of the data and an acceptable level of performance. Information providers, on the other hand, care about where the data is stored and how it is stored and accessed, so as to ensure acceptable performance and efficient data management.

The functional architecture of OLAP has three components: data storage services, OLAP services, and presentation services for the user. The functional architecture of OLAP is therefore a three-tier client/server architecture, which offers many choices of physical configuration for these three functional services.

Physical architecture

There are two basic types, based on the data storage technique: multidimensional data store and relational data store.

1. Multidimensional data store: the data store resides on the OLAP server (not on the server holding the DW or DM). This type is divided into two subtypes:

Subtype 1: the multidimensional data store is kept on the client workstation. This often causes network congestion (bottlenecks) when data is loaded onto the workstations. Performance and data security are further concerns.

Subtype 2: the multidimensional data store and the OLAP services are combined. Data is extracted from the DW, converted into a multidimensional structure, and stored on the server holding the DM. This is the classic DM configuration, in which several DMs are loaded with data that has been cleaned and rearranged from an overall DW. These functions concentrate on filtering and combining to create functional subsets, and they are applied to data taken from the DW to create functional DMs. With this subtype, the multidimensional data store on the DM server can also be separated from the OLAP services on the OLAP server when the multidimensional store is fairly large, when there are many users, or when the data needs to be shared. In that case the DM can take its data either from the DW, as above, or directly from the data sources.

2. Relational data store: the data is stored on the server holding the DW or DM, the OLAP server is separate, and the OLAP view resides on a separate workstation.

An OLAP system is usually set up on the basis of the data warehouse architecture, within which it can be linked to Excel spreadsheets, an MDDB, etc., to exploit the warehouse, discover information, and so on.
Figure 8.1 OLAP in the data warehouse architecture (elements shown: OLTP current data and tape; extraction, transformation, loading; Data Warehouse with metadata (definitions of all data, source mapping, calculations, etc.); subsetting, summarizing, indexing, merging/joining, categorizing; Sales/Marketing data mart and Financial data mart, each with metadata; OLAP and information; scales labeled Less/More for normalized, history, indexed, customized, public structures, and individual structures)


The architectures above bring out two issues: how the data is stored, and how and where the OLAP services are provided.

First issue: how is the data stored? Multidimensional versus relational data stores

A relational data store follows the relational data model. DWs (or DMs) that follow the relational model are built on a star schema with multidimensional properties, so although the store is relational, converting it to a multidimensional view at the workstation is easier. A relational data store keeps data as keyed records in tables, and the data is accessed through a common language, SQL. A multidimensional data store, in contrast, keeps data in arrays (holding data of the same type). There is no common multidimensional model and no standard method of accessing the data. Some products come with an engine that has an application programming interface (API) or a spreadsheet front end.

A relational data store can be very large. Its size is increased considerably by the use of index files and denormalization techniques (that is, using tables that have not been brought to third normal form) to achieve acceptable performance for multidimensional queries. The size of a multidimensional data store is generally limited, but compression technology (for example, sparse matrix compression) can be used to store more data in less space.

Second issue: how and where are the OLAP services provided?

The question is whether the multidimensional data store and the OLAP services are combined, or whether a relational data store is combined with an OLAP server to meet users' multidimensional requirements. These are two different approaches, usually called multidimensional OLAP (MOLAP) and relational OLAP (ROLAP). MOLAP and ROLAP are transparent to the end user: the front ends of these tools are similar and the decision-support formats are the same. There are, however, important differences between ROLAP and MOLAP at the level of operational detail, and a second difference is economic: MOLAP is usually cheaper than ROLAP.

MOLAP (Multidimensional OLAP)

This approach combines the multidimensional data store and the OLAP services on the same server. MOLAP is often regarded as a multidimensional database (MultiDimension DB, MDDB). As noted, an MDDB is an optimal structure for storing categorized facts together with their dimensions.
Data is organized according to the data view and stored in a form that has already been aggregated and summarized. The smaller index files make answers to complex queries very fast. Because the data is stored in arrays, updating values has little effect on the index files. This makes it easy to implement update or read-write applications such as forecasting and budget adjustment. MOLAP tools allow access to the detailed data in the RDBMS as follows:

Figure 8.2 MOLAP architecture (labels shown: Front-end Tools; Info Request; Result Set; MOLAP Server with Metadata and Request Processing; Load; SQL; Result Set; Database Server with RDBMS)

MOLAP is the best choice for applications with the following characteristics:

High query speed is required.

Complex data analysis is needed. MOLAP provides a more powerful analysis environment than ROLAP.

Ease of use: the data has been aggregated in advance and stored in the multidimensional data store. All the user has to do is specify the dimensions and the groups within those dimensions. ROLAP, by contrast, requires the user to understand the mapping to the operational databases.
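The idea that MOLAP stores pre-aggregated data in arrays, so that a query becomes an array lookup, can be sketched as below. The dimension members, detail rows, and the extra "all members" row and column are assumptions made for illustration; real MOLAP servers use their own storage formats.

```python
# Hypothetical dimension members; a member's position is its array index.
PRODUCTS = ["laptop", "phone"]
REGIONS = ["north", "south", "west"]

# Hypothetical detail rows as they might arrive from the warehouse.
DETAIL = [
    ("laptop", "north", 5),
    ("laptop", "south", 3),
    ("phone", "north", 2),
    ("phone", "west", 7),
    ("laptop", "north", 1),
]


def load_cube(detail):
    """Load rows into a dense 2-D array with an extra 'all' row and column.

    cube[p][r] holds the total for product p in region r; the last row and
    last column hold totals that are computed once, at load time.
    """
    n_p, n_r = len(PRODUCTS), len(REGIONS)
    cube = [[0] * (n_r + 1) for _ in range(n_p + 1)]
    for product, region, qty in detail:
        p, r = PRODUCTS.index(product), REGIONS.index(region)
        for i in (p, n_p):      # the cell's row and the all-products row
            for j in (r, n_r):  # the cell's column and the all-regions column
                cube[i][j] += qty
    return cube


def lookup(cube, product=None, region=None):
    """Answer a query by array indexing; None means 'all members'."""
    p = len(PRODUCTS) if product is None else PRODUCTS.index(product)
    r = len(REGIONS) if region is None else REGIONS.index(region)
    return cube[p][r]


cube = load_cube(DETAIL)
```

All aggregation work happens in `load_cube`; afterwards `lookup(cube, "laptop")` or `lookup(cube, region="north")` reads a single precomputed cell, which is the trade-off the text describes: fast queries in exchange for work and space at load time.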

Designing a MOLAP system involves the following basic steps:

Choose the business function, for example sales analysis and financial reporting.

Identify the numerical values, the measures, to be stored, such as sales amounts and customers.

Identify the dimensions (such as time, geography, product) and the granularity of each dimension, for example time by month and quarter, geography by state or region.

Define the logical model and load the multidimensional data store, either directly from the data sources or by filtering and combining selected contents of the DW or DM.

The main functions offered to users include:

Fast answers to extended queries, for example in what-if scenarios. Fast answers do not interrupt the analysis or the user's train of thought.

Interactive (read-write) updates of the multidimensional database, which serve forecasting, planning, budget adjustment, and similar applications.

Discovery of relationships among factors or dimension values, bringing unexpected relationships to light.

A powerful calculation engine and comparative analysis capabilities: ranking, comparison, percentages leading to classification, maximum, minimum, average, moving average, period-to-period comparison, etc.

Cross-dimensional calculations.

Extension of the basic functions with user-defined functions or embeddable modeling functions.

Built-in financial and statistical functions: currency conversion, depreciation, internal rate of return, trends, time-series analysis.

Drilling through to the detailed data in the DW.

Pivoting, drilling down, and moving dimensions (in the presentation) along one or more dimensions, and other powerful data display functions.

System administration and management under this approach require:

An initial data model that chooses the right dimensions and their granularity. Anticipating how the data will be accessed and selecting appropriate filters for loading data from the DW are important concerns.

Periodic data transfer and bulk updates, because the number of updates must be limited while the database is in use.

Aggregation, summarization, and precalculation during data loading.

Training in a different technology and in how to use the new skills.

Writing new applications in a proprietary language to extend and improve the standard front end (the end-user interface) of the database.

Some issues to consider when using this approach:

The size of multidimensional database that can be supported is smaller than that of a relational database. Sparse matrix technology, which finds unused elements in the multidimensional matrix, removes them, and compresses the arrays, is used to save storage space. Performance is improved, and because summarized and aggregated information is also stored, the storage requirement is smaller.


A side effect of storing data at a coarse level (totals, precalculated data, and data derived from other data) is that it is impossible to drill down to the detail level.

Access and security are available at the higher levels; there are no usage-based priorities or access controls at the sub-levels.

Changes to the multidimensional structure require the multidimensional database to be reorganized; the available storage and backup facilities are limited.

Special front-end applications are needed, which limits the choices available. Front-end extensions written for one multidimensional database cannot be used with another.

A multidimensional database with OLAP services is appropriate for work with the following characteristics:

Calculation-intensive applications with models and what-if scenarios.
Static, equal dimensions.
Read-write capability.
Very complex relationships among dimensions.
Calculations involving many dimensions and many rows.
Powerful financial, statistical, and calculation functions.
A database size that is acceptable for the business functions.

Some MOLAP tools that have been developed and are widely applied: Arbor Software Essbase, Oracle's Express Server, Sinper's TM/1.

Relational OLAP (ROLAP)

This approach comprises OLAP services and a relational database. The data is stored in relational tables and can reach hundreds of gigabytes in size. ROLAP systems provide extremely flexible query engines by making all operational data available to end users, so that data can easily be extracted and summarized on demand. ROLAP tools can extract data from many different relational database sources.


[Figure: front-end tools send information requests to the ROLAP server (metadata, request processing), which sends SQL to the RDBMS database server; result sets flow back along the same path.]

Figure 8.3 ROLAP architecture

In this approach, subcubes are computed in advance and stored in summary tables. For large data cubes it is impossible to preprocess everything, so only the data closely related to the expected queries is preprocessed. Preprocessing in ROLAP takes two steps:
1. Build the chosen subcubes, materialized as summary tables.
2. Create indexes on those tables.

Dividing space (and time) between these two steps is hard. If too much memory is given to indexes, very few subcubes can be preprocessed; conversely, if most memory goes to subcubes, the indexes become less effective.

A subcube is a part of the data cube.

Queries: dimensions are used as selection attributes (in SQL, a dimension is an attribute in the GROUP BY clause or in the WHERE condition). Example: the pair (Part, Customer) corresponds to a subcube holding the parts sold to each customer. In SQL, a GROUP BY clause creates this subcube, named pc, from the table R as follows:

    SELECT Part, Customer, SUM(Sales) AS TotalSales
    FROM R
    GROUP BY Part, Customer

Indexing: to speed up query processing we can use a B-Tree structure. For the subcube pc, for example, indexes can be built as follows:
1. Icp: a composite index on the two dimensions, c (Customer) first, then p (Part).
2. Ipc: a composite index on the two dimensions, p first, then c.

In these indexing schemes the order matters. With Ipc, given a value of p, all rows of the subcube pc that have that value of p can be located. This answers the question "Find all customers who bought part p".

ROLAP is the choice for a DW with the following characteristics:
- Frequently changing data: if the data in a warehouse is volatile and users demand near-instant aggregates, ROLAP is the only choice. MOLAP must extract and aggregate data off-line before loading it into the MDDB; moreover, most multidimensional databases require the whole database to be recalculated when a dimension is added, an aggregation schema changes, or new data is added. These traits make MOLAP unsuitable for decision support systems whose data sources change frequently.
- Large data volumes: for a DW of terabyte size the price of MOLAP is too high, since precalculating the data requires hundreds of terabytes of storage.
- Query patterns not known in advance: ROLAP allows querying and aggregation over any operational data source. This capability, however, brings complexity in use, namely in mapping to the operational data sources.
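The subcube and the two index orders described above can be tried out with a minimal sketch. It uses Python's built-in `sqlite3` module purely for illustration; the sample rows and the helper `customers_of` are assumptions, not part of the original text, and a production ROLAP server would make its own physical design choices.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE R (Part TEXT, Customer TEXT, Sales REAL)")
conn.executemany(
    "INSERT INTO R VALUES (?, ?, ?)",
    [("bolt", "Ann", 10.0), ("bolt", "Bob", 5.0),
     ("bolt", "Ann", 2.5), ("nut", "Ann", 7.0)],  # illustrative data only
)

# Step 1: materialize the subcube pc as a summary table.
conn.execute(
    "CREATE TABLE pc AS "
    "SELECT Part, Customer, SUM(Sales) AS TotalSales "
    "FROM R GROUP BY Part, Customer"
)

# Step 2: composite indexes on the same two dimensions, in both orders.
conn.execute("CREATE INDEX Ipc ON pc (Part, Customer)")
conn.execute("CREATE INDEX Icp ON pc (Customer, Part)")

def customers_of(part):
    """Answer 'find all customers who bought part p' from the subcube."""
    return conn.execute(
        "SELECT Customer, TotalSales FROM pc WHERE Part = ? ORDER BY Customer",
        (part,),
    ).fetchall()

print(customers_of("bolt"))
```

Because `Part` is the leading column of `Ipc`, a lookup by part can be served by that index, while `Icp` suits lookups that start from a customer. Which index the engine actually picks is up to its query planner.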

Although the data is stored in relational form (columns and rows), it is still presented to the user as business dimensions. To hide the storage format, a semantic metadata layer must be created. Its task is to map every dimension to the relational tables. Every summarization or aggregation needs additional metadata to improve response time. All this metadata is stored in the relational database, which means that one more metadata store has to be created, maintained and managed within the overall DW solution.

Relational OLAP design involves the following basic steps:
1. Build a multidimensional model using techniques such as denormalization, the star schema, the snowflake schema, or a hybrid of the two.
2. Add summary data and aggregate data.
3. Partition large data sets into smaller, manageable parts to improve performance. For example, time or organizational units can be split into partitions.
4. Add new, creative index sets or bitmap indexes to improve performance (note that this increases the size of the database and the time needed to build the index tables).


5. Create and store the metadata. The metadata includes the dimension definitions, the mapping of dimensions to the corresponding relational tables, the hierarchical relationships between dimensions, groupings, the definitions and descriptions of summary and aggregate data, formulas and calculations, usage management, and more.

From an operational viewpoint, a query is executed in the following steps:
- The client tool is started and works with a multidimensional view of the data.
- The client tool calls the OLAP server, which checks the metadata in real time.
- Multi-table SELECT statements and related queries are generated and sent to the relational database.
- Multidimensional functions, such as calculations and formulas or translating coded bits into business descriptions, are applied to the result of the database query.
- The result is returned to the client tool for further processing and display, or for immediate display.

The functions provided to system administrators include:
- A business view of relational data.
- Support for dimension hierarchies.
- Calculation, financial and statistical functions that users can extend.
- Drill-down to the detail level.
- A choice of front-end tools.
- Database administration that builds on existing investment in backup, archiving and in setting up databases for specific analyses.
- Data navigation using metadata.
- Multi-level user authorization for security.

System management and administration must meet the following conditions:
- The OLAP server has no periodic changes and needs no initial data load.
- Existing standard backup, archiving and security procedures are used.
- All the new metadata in the DW is managed, synchronized and maintained.
- Usage is managed to match performance. This can strongly affect the data model, the partitioning, and the aggregation and summary levels.
- Tuning is very complex. Improving performance by denormalizing tables and adding indexes can increase the size of the database and make searching harder, and it requires more data scanning, more disk space and more data buffers.

Some issues to consider when applying this method:
- Using star or snowflake schemas, partitioning, and denormalizing tables to improve performance strongly affects the flexibility and scalability of the relational database. Updates become difficult, and large blocks of data have to be updated at once. A star schema with its many variations, aggregations and summaries is designed on the assumption that the data is static except when a large block is loaded.
- Row-level calculations, for example profit = revenue - cost, require rows and columns to be transposed. This is hard to do even with multi-table SELECT statements.
- Managing and maintaining the metadata is difficult and expensive.

Combining a relational database with OLAP services is appropriate when the requirements are as follows:
- Data-centric applications that need to display detail data.
- Dynamic dimensional presentation with changes to the core structure.
- Read-only use with minimal write requirements.
- Minimal row-level and cross-dimensional calculation.
- A large database, simple relationships between dimensions, and dimensional views built on the fly.

All the points discussed above aim to keep users productive, with data volumes that are stable, managed and fit for purpose. Other factors to consider include:
- The interface the users prefer.
- The functions and features of the client tools.
- Awareness of open versus proprietary architectures.
- Determining which choice is best in order to plan investment in clients, servers, databases, front-end technologies, and the skills of database administrators and end users.
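The query execution steps listed above (check the metadata, generate a SELECT, then apply multidimensional functions to the result) can be sketched as follows. This is a minimal illustration, not how any particular ROLAP server works: the metadata dictionaries, the star-schema tables `fact_sales` and `dim_product`, and the functions `build_query` and `run` are all hypothetical names introduced here.

```python
import sqlite3

# Hypothetical semantic metadata: business names mapped to relational columns.
DIMENSIONS = {"product": "p.name", "region": "s.region"}
MEASURES = {"revenue": "SUM(s.revenue)", "cost": "SUM(s.cost)"}
FROM_CLAUSE = "fact_sales s JOIN dim_product p ON p.id = s.product_id"

def build_query(dims, measures):
    """Translate a multidimensional request into one SELECT statement."""
    dim_cols = [DIMENSIONS[d] for d in dims]
    select = [f"{DIMENSIONS[d]} AS {d}" for d in dims]
    select += [f"{MEASURES[m]} AS {m}" for m in measures]
    sql = f"SELECT {', '.join(select)} FROM {FROM_CLAUSE}"
    if dim_cols:
        cols = ", ".join(dim_cols)
        sql += f" GROUP BY {cols} ORDER BY {cols}"
    return sql

def run(conn, dims, measures):
    """Send the generated SQL to the database, then post-process the rows."""
    cur = conn.execute(build_query(dims, measures))
    names = [col[0] for col in cur.description]
    rows = [dict(zip(names, r)) for r in cur]
    for row in rows:  # derived measure computed outside the database
        if "revenue" in row and "cost" in row:
            row["profit"] = row["revenue"] - row["cost"]
    return rows

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_product (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE fact_sales (product_id INTEGER, region TEXT,
                             revenue REAL, cost REAL);
    INSERT INTO dim_product VALUES (1, 'bolt'), (2, 'nut');
    INSERT INTO fact_sales VALUES (1, 'north', 100, 60),
                                  (1, 'south', 50, 20),
                                  (2, 'north', 30, 25);
""")
result = run(conn, ["product"], ["revenue", "cost"])
print(result)
```

Keeping the name-to-column mapping in one place is what lets the user think in dimensions while the server speaks SQL; it is also the metadata that the text warns is costly to maintain.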


8.2 The principles of OLAP

The following 12 rules offer a unified view (not a benchmark) for evaluating and understanding the requirements placed on OLAP tools. The rules as stated here are drawn from the writings of E. F. Codd (1993).

1. Multidimensional conceptual view: a business user's view of the enterprise is inherently multidimensional, so the OLAP model must be multidimensional in nature. Users can then manipulate such multidimensional data models easily.

2. Transparency: the location of the analysis tool must be transparent to the user. OLAP should exist within an open systems architecture that lets the analysis tool be embedded wherever the user wishes, without adversely affecting the functionality of the host tool.

3. Accessibility: the OLAP tool must map its own logical schema to heterogeneous physical data stores, access the data, and perform any conversions needed to present a single, coherent and consistent view to the user. The physical data of these systems is transparent to the user and is the concern of the tool alone.

4. Consistent reporting performance: reporting performance must not degrade significantly as the number of dimensions grows, and a change in the number of dimensions of the warehouse should not affect report generation.

5. Client/server architecture: the server component of OLAP tools must be intelligent enough that many clients can attach to it easily and with little integration programming. The intelligent server must be able to map and consolidate data from disparate logical and physical databases. This is essential for transparency and for building a common conceptual, logical and physical schema.

6. Generic dimensionality: every data dimension must be equivalent in both its structure and its operational capabilities. Usually there is only one common structure for all dimensions. Any function applied to one dimension can also be applied to the others.

7. Dynamic sparse matrix handling: the physical structure of the OLAP server must adapt to the specific analytical model that is created and loaded, so that sparse matrices are handled optimally. When working with sparse matrices, the OLAP server can infer and choose the most efficient way to store the data. Physical access methods also change dynamically and offer different mechanisms, such as direct calculation, binary trees, hashing, or the best combination of such techniques.

8. Multi-user support: OLAP tools must provide concurrent access (retrieval and update), integrity and security to support users who work concurrently with the same analytical model or create different models from the same data.

9. Unrestricted cross-dimensional operations: in multidimensional data analysis all dimensions are created equal and play equal roles. OLAP tools handle the calculations associated with the dimensions and do not require the user to define them. Calculations that must be defined are written as formulas in a language, and that language must allow calculation and manipulation across any number of dimensions, unrestricted by relationships between cells and regardless of the number of common attributes of each cell's data.

10. Intuitive data manipulation: operations such as reorienting the consolidation path or drilling down across dimensions or rows are carried out by direct action on the cells of the analytical model, without requiring menus or repeated trips across the user interface. The dimensions defined in the analytical model contain all the information the user needs to perform these actions.

11. Flexible reporting: with the OLAP server and its tools, an end user can manipulate, analyze, synthesize and view the data in any way desired, including creating logical groupings or placing rows, columns and cells next to any others. The reporting facilities must likewise be flexible and present the synthesized information in whatever way the user wants to display it.

12. Unlimited dimensions and aggregation levels: an OLAP server should accommodate at least 15 dimensions within a common analytical model. Each of these dimensions allows an unlimited number of user-defined aggregation levels and consolidation paths.
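Rules 6 and 7 can be illustrated with a small sketch of a sparse cube. It assumes a dictionary keyed by coordinate tuples as the storage structure; the dimension names, sample cells and the functions `roll_up` and `density` are hypothetical and chosen only to show the idea, not to model any real OLAP server.

```python
from collections import defaultdict

DIMS = ("product", "region", "month")

# Sparse storage: only non-empty cells are kept, keyed by their coordinates.
cube = {
    ("bolt", "north", "1997-01"): 100,
    ("bolt", "south", "1997-01"): 50,
    ("nut", "north", "1997-02"): 30,
}

def roll_up(cells, keep):
    """Aggregate away every dimension that is not listed in `keep`."""
    idx = [DIMS.index(d) for d in keep]
    out = defaultdict(int)
    for coords, value in cells.items():
        out[tuple(coords[i] for i in idx)] += value
    return dict(out)

def density(cells, sizes):
    """Fraction of the full dimensional space that actually holds data."""
    total = 1
    for n in sizes:
        total *= n
    return len(cells) / total

by_product = roll_up(cube, ["product"])
by_region_month = roll_up(cube, ["region", "month"])
fill = density(cube, (2, 2, 2))  # 2 products x 2 regions x 2 months
print(by_product, by_region_month, fill)
```

The same `roll_up` function serves every dimension, which is the point of generic dimensionality, and the empty cells cost nothing because they are never stored.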
8.3 Evaluating OLAP servers and tools

The evaluation rests on five criteria.

1. Features and functions: OLAP is the on-line analytical processing technology that creates and delivers new information from existing data through calculation formulas and transformation rules. OLAP tools and servers do the following:
- Support multiple dimensions and the hierarchy within each of them.
- Aggregate, summarize, precalculate and derive data along one selected dimension or a selected set of dimensions.
- Apply logical calculations, formulas and analytical procedures to one selected dimension or a selected set of dimensions.


- Support the concept of an analytical model: a set of dimensions and their members, calculation logic, formulas, analytical procedures, and derived, summarized and aggregated data.
- Provide a function library.
- Provide strong calculation and comparative analysis, for example ranking, comparison, percentages, minimum, maximum, average, and so on.
- Perform cross-dimensional calculations.
- Provide time-intelligent services.
- Convert one dimension into another, which can be very useful after a merger or acquisition.
- Navigate and analyze using pivot, cross view, drill down and roll up along one or more dimensions.

Analytical processing is an essential user need, so it must run smoothly and without interruption.

2. Access to the features and functions: the user's interface and access to the OLAP services must offer many options and must foster the user's understanding and the ability to embed knowledge in the OLAP analytical model. The options include:
- Spreadsheets: at the very least, users must be able to load OLAP data into their spreadsheet tool for reporting and further analysis.
- Proprietary client tools: depending on the specific application.
- Third-party tools: these support the API (Application Programming Interface) of the OLAP server (if the API is proprietary, this amounts to a lock-in to that OLAP server).
- 4GL environments (fourth-generation programming language environments): these must support all the functions and features of the OLAP server.
- Interfaces to de facto standards: application environments such as VB and PowerBuilder, and interfaces such as OLE, DDE, and so on.
- Cube-oriented clients: third-party tools that can communicate with the OLAP services.

To make it possible to embed knowledge in the analytical model, the access interface must do the following:
- Access and filter subsets of the data based on hierarchy, model, time and other selected dimensions.
- Access several levels of a hierarchy with a single extraction request.


- Be aware of summary and aggregate data, partitions and index sets in order to generate correct queries.
- Optimize for a specific relational database, including its SQL extensions, when accessing a relational data store.

3. The OLAP service: whether it sits in an architecture with a relational data store or with a multidimensional one, the OLAP service must meet the technology, stability and performance requirements of the analytical model and application, independently of the store. Performance and stability were discussed in the preceding sections. The technology characteristics depend on the analytical model and its intended use. Some of them are:
- Read-write capability: relevant to interactive applications such as forecasting and budget adjustment.
- Multiple writers: support for multidimensional analysis shared by a group of people. This is harder to solve than in a relational database. Instead of touching a single row or table, an OLAP update or write request requires derived and calculated values to be recomputed, affecting many dimensions and the hierarchies within them. The scope of a write lock can be very wide, and the recalculation can be computation-heavy, so locks are held for a long time and throughput is low.
- Multiple databases: if each OLAP application has its own database, an interaction mechanism is required, because data derived from the database of one application may be the input of another application.
- Range of data types: from numbers to time, to descriptions (for reporting and display purposes), to BLOBs. Image types can further improve the communication of complex analyses.

4. Administration capabilities: the administration functions needed for preparation, installation and subsequent operation include:
- Defining the dimensional analytical model.
- Creating and maintaining the metadata store.
- Controlling access and usage privileges. The concern here is what users want to do and who may access the analytical model and its data.
- Loading the analytical model from the DW or a DM.
- Tuning performance to an acceptable level so that analysis runs without interruption.
- Reorganizing the database to improve performance, to change the dimensional model, or to update the data.


- Managing all parts of the system, including the middleware. A reference architecture provides a way to understand the scope of the task of managing the systems in an orderly manner.
- Distributing data to clients for further local analysis (on the client).

5. Overall architecture: from the viewpoint of the overall architecture, there is no simple choice between a multidimensional data store and a relational one. Users need to supply the criteria on which a sound choice can be made. The current trend is to provide OLAP services that combine an OLAP server at one end (a multidimensional store holding coarse data) with a relational data store at the other end (holding cleaned detail data). In practice, some companies start with a relational data store and add a multidimensional store when it becomes necessary.

In this architecture, frequently accessed information and frequently issued queries are precalculated, summarized and aggregated, then stored in the multidimensional store of the OLAP server. This can be done during the first load of the analytical model from the relational DW or DM. Complex, calculation-intensive queries and complex data derived from other data are likewise preprocessed and stored. This makes execution very fast. Data that is accessed infrequently, and values calculated from only a few dimension members, are computed only when a query arrives. Infrequently accessed data is not kept in the multidimensional store; the OLAP server fetches it from the relational store only when needed. A monitoring function can store previously infrequently accessed data, or the results of an infrequent query, in the multidimensional store for subsequent requests. This raises performance considerably. The hybrid configuration also allows drill-down to the finest detail, which is not available in the multidimensional store, by issuing a request that fetches the detail data from the relational store.

In short, a good OLAP solution strikes a balance between the five criteria above and the life-cycle cost of the solution (from acquisition, installation, training and maintenance through to operation).
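The hybrid configuration described under criterion 5 can be sketched in a few lines. This is an illustrative model under stated assumptions: a plain dictionary stands in for the multidimensional store, `sqlite3` stands in for the relational warehouse, and the class `HybridOlap`, its methods and the `sales` table are hypothetical names, not part of any product.

```python
import sqlite3

class HybridOlap:
    """Serve aggregates from a cube when possible, else from relational data."""

    def __init__(self, conn, hot_regions):
        self.conn = conn
        self.cube = {}  # stands in for the multidimensional store
        self.relational_hits = 0
        for region in hot_regions:  # precompute frequently requested totals
            self.cube[region] = self._from_relational(region)
        self.relational_hits = 0  # count only the queries made after loading

    def _from_relational(self, region):
        self.relational_hits += 1
        row = self.conn.execute(
            "SELECT COALESCE(SUM(amount), 0) FROM sales WHERE region = ?",
            (region,),
        ).fetchone()
        return row[0]

    def total(self, region):
        if region not in self.cube:  # infrequent: compute on demand, then keep
            self.cube[region] = self._from_relational(region)
        return self.cube[region]

    def drill_down(self, region):
        """Detail rows live only in the relational store."""
        return self.conn.execute(
            "SELECT item, amount FROM sales WHERE region = ? ORDER BY item",
            (region,),
        ).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, item TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("north", "bolt", 100), ("north", "nut", 30), ("south", "bolt", 50)],
)
olap = HybridOlap(conn, hot_regions=["north"])
print(olap.total("north"), olap.total("south"), olap.drill_down("north"))
```

The first request for an infrequent total goes to the relational store and later requests for it are answered from the cube, which mirrors the monitoring function mentioned in the text. A real system would also need a policy for invalidating stored results when the warehouse is reloaded.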
The ROLAP tool market: besides the MOLAP tools listed earlier, many tools work directly on relational data, for example:
- ROLAP databases: Oracle, Sybase, Informix's MetaCube, RedBrick, Db2, Ingres, DSS Agent
- Multi-dimensional: Arbor's Essbase
- Desktop ROLAP: Cognos PowerPlay, Brio's Query, Business Objects' Business Objects
- Desktop MDD: Cognos PowerPlay, Microsoft Excel, Business Objects' Business Objects, Pilot's Desktop, Lotus 1-2-3, Oracle Express

The following figure shows the categories of tools above accessing the data warehouse.

[Figure labels: ROLAP; Data Warehouse with Metadata; Data Mart with Metadata; MDD Data Cube; four workstations.]

Figure 8.4 Ways of accessing the data warehouse


8.5 Des2000: a tool supporting the analysis and design of information systems

Designer/2000 is a tool set that supports analysis and design and generates complete applications for Client/Server and Web environments. It can be used to model and generate a complete simple application without writing a single line of code. It is a collection of tools that assist every phase of system analysis, design and development, from strategic analysis to application generation. With it, system analysts can build their applications quickly, starting from the entity-relationship model, the function decomposition diagram, the data flow diagram, and so on. These are described in a structured way, stored in the data dictionary, and can easily be consulted throughout system development. At the end of the process we obtain a logical and a physical database; the only remaining work is adjusting them to meet the users' requirements.

In the system analysis phase, Des2000 provides the following tools:
- Process Modeller: a tool for modelling business functions.


- Entity Relationship Diagrammer: a tool for building the entity-relationship model. This is an important design step, because the data tables are created, also within Des2000, from the entities and the relationships between them.
- Function Hierarchy Diagrammer: a tool for building the functional decomposition diagram of the system.
- Dataflow Diagrammer: a tool for building the data flow diagram.

After the analysis phase, a conceptual-level database has taken shape. Des2000 then provides tools (the Database Design Wizard) that map the functions and entities when moving from analysis to design.

In the system design phase, Des2000 provides the following tools:
- Data Diagrammer: builds the data diagram and makes the necessary changes after the mapping from the entity-relationship diagram. After the mapping, the logical database has taken shape in the design phase.
- Module Structure Diagrammer: declares the structure of the program modules (which are created after the Generate command has created the physical tables).
- Module Data Diagrammer: a tool for modelling how the program modules use data (tables, views, and so on).

To create the actual physical database, run Generate SQL DDL, which calls the Server Generator to produce files of SQL statements (four files, with the extensions .con, .ind, .sql and .tab), and then run them. At that point the system's database really exists on the server. Des2000 also provides a cross-checking matrix tool (Matrix Diagrammer), a screen generator (Form Generator) and a report generator (Report Generator) for producing draft forms and reports. These drafts are then refined with Developer/2000.

Oracle Data Mart Builder (ODMB): a tool for building a DM that assists the whole process of collecting, transforming, refining and cleaning data and loading the processed data into the target Oracle DM. The analysis and design of the DM are done with the Des2000 tool set presented above, so a physical database with tables but no data already exists. The DM builder supports bringing data from different sources into this store. Before it enters the DM, the data is also refined, transformed and cleaned with the help of the tool. Given the effective tools ODMB provides, the DM builder's work is limited to setting up BaseViews, MetaViews, Plans, Snapshots and Transforms. These are also the main components of ODMB. In addition, ODMB helps generate and store metadata throughout the construction of the DM, from the metadata of the DM design made with Des2000 to the extraction, transformation and cleaning of the data loaded into the DM.


BaseView: the basic component of ODMB, a graphical representation of a database comprising its tables, columns and the joins between tables. A BaseView is created from exactly one database, never from more than one. Each BaseView establishes a connection path to its database (of whatever type), covering all its tables. If not all tables are wanted, the required ones can be selected, and more can be added later if necessary. Plan builders do not use BaseViews directly; they use them as the basis for creating MetaViews. Several different BaseViews can be created from one database.

MetaView: a MetaView is created from BaseViews, that is, from tables and the relationships between them. A MetaView can be built from one or more BaseViews and need not contain all the tables of a BaseView, only the ones required. Each table appears as a Category of the MetaView. Each field of a table is treated as a part that takes part in a Plan. Categories and MetaViews can be deleted, added, renamed or reorganized simply by dragging and dropping fields from the BaseView. Changes to a MetaView do not affect the corresponding BaseView. Conversely, when a change is needed in many MetaViews, it can be made in the BaseView, which changes all related MetaViews. The metadata is displayed in the Parts Bin; only users authorized to create data flows (in a Plan) may combine these Parts in a Plan. These rights are set by administrators using the Oracle Data Mart Builder Admin tool. Parts from different MetaViews cannot be combined.

Plans: setting up the Plans is the heaviest part of building a DM. A Plan is a set of instructions for retrieving and displaying information. Once organized into MetaViews, the source data is extracted, transformed and cleaned through Plans and loaded into the final tables of the DM database designed with Des2000. Each Plan represents a data flow from the MetaViews to the target table, together with the intermediate transformations applied to the data by helper tools called Transforms. A Plan is made up of Parts and Transforms together with the Data Flow.
Transforms are the tools in the Tool Bin that are used in a Data Flow. At each step of a Plan's Data Flow, a Transform performs customizable operations on the selected data in order to extract, convert and load the data into the DM. The tool also lets you create new Transforms for your own purposes. The ability to modify Plans depends on the rights the administrator has assigned to the user.
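The idea of a Plan as a data flow of Transforms can be sketched in plain Python. This is only a conceptual illustration and does not use or reproduce the ODMB interface: the function `run_plan`, the three transform functions and the sample rows are hypothetical.

```python
def trim_name(row):
    """Cleaning step: normalize whitespace and capitalization of the name."""
    return {**row, "name": row["name"].strip().title()}

def drop_missing_amount(row):
    """Filtering step: returning None removes the row from the flow."""
    return row if row["amount"] is not None else None

def to_cents(row):
    """Conversion step: store the amount as an integer number of cents."""
    return {**row, "amount": round(row["amount"] * 100)}

def run_plan(source_rows, transforms):
    """Push every source row through the transforms, in order, into the target."""
    target = []
    for row in source_rows:
        for step in transforms:
            row = step(row)
            if row is None:  # a transform rejected the row
                break
        else:
            target.append(row)
    return target

source = [
    {"name": "  ann ", "amount": 12.5},
    {"name": "BOB", "amount": None},
    {"name": "carol", "amount": 3.0},
]
loaded = run_plan(source, [trim_name, drop_missing_amount, to_cents])
print(loaded)
```

Each transform is an ordinary function, so adding a custom one, as the text says the tool allows, only means appending another function to the list.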

Snap: after a Plan has been run, it can be saved in two forms, as a Plan or as a Snap. Saving as a Plan stores the instructions for retrieving the data; saving as a Snap stores the actual result in the target table, derived from the data in the source table at that moment. Running the saved Plan at another time may give a different result, which can be saved as another Snap while the Plan stays the same. Opening a previously saved Snap naturally shows it unchanged.
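The difference between saving a Plan and saving a Snap can be shown with a short sketch. It is an analogy built on `sqlite3`, not the ODMB mechanism itself: the query string plays the role of the Plan, the fetched rows play the role of a Snap, and the table `source` and function `run_plan` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE source (item TEXT, qty INTEGER)")
conn.execute("INSERT INTO source VALUES ('bolt', 10)")

# The "Plan": stored instructions for retrieving the data.
plan = "SELECT item, qty FROM source ORDER BY item"

def run_plan():
    """Execute the saved instructions against the current source data."""
    return conn.execute(plan).fetchall()

snap_1 = run_plan()  # a "Snap": the concrete result at this moment

conn.execute("INSERT INTO source VALUES ('nut', 4)")  # the source changes

snap_2 = run_plan()  # same Plan, later run, different result
print(snap_1, snap_2)
```

The Plan text is identical for both runs, while each Snap keeps the rows as they were when it was taken.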

Glossary

Abbreviation     English term                           Vietnamese term
BDW              Business Data Warehouse                Kho dữ liệu nghiệp vụ
DW               Data Warehouse                         Kho dữ liệu
DM               Datamart                               Kho dữ liệu cục bộ
OLTP             On-line Transaction Processing         Xử lý giao dịch trực tuyến
OLAP             On-line Analytical Processing          Xử lý phân tích trực tuyến
SA               Subject Area                           Vùng chủ đề
DSS              Decision Support System                Hệ hỗ trợ quyết định
-                Data Warehousing                       Phương pháp kho dữ liệu
ODS              Operational Data Store                 Kho dữ liệu tác nghiệp
ACID             -                                      -
BD               Business Data                          Dữ liệu nghiệp vụ
MD               Metadata                               Siêu dữ liệu
OM               Operational Metadata                   Siêu dữ liệu tác nghiệp
LS               Legacy System                          Hệ thống có trước, kế thừa
RDBMS            Relational DB Management Sys.          Hệ CSDL quan hệ
MDDB             MultiDimensional Database              CSDL đa chiều
EID              Executive Information Database         CSDL thông tin thực thi
MOLAP            Multidimensional OLAP                  OLAP đa chiều
ROLAP            Relational OLAP                        OLAP quan hệ
Bảng Fact        Fact table                             Bảng sự kiện
Bảng Dimension   Dimension table                        Bảng chiều


Appendix
Data Warehousing and Oracle Discoverer/2000 TM
Summary

Giving users immediate access to the business information they need is one of the promises of the Information Age. The irony of this promise is that, as ever larger quantities of data are collected, it becomes increasingly difficult for users to sift through them and collect the pieces of information that most directly affect them. To resolve this issue, Oracle Discoverer/2000 (TM) is an end user analysis and query tool that empowers users to obtain specific information from complex production databases. Designed for both end users and MIS professionals, it allows users to perform business analysis and create business reports and graphs without any programming knowledge or database experience. This paper covers how Oracle Discoverer/2000 provides competitive advantage when used as part of the Oracle Warehouse.
Data Warehouses

A data warehouse is a database distinct from the OLTP database and may take source data from several separate OLTP systems. The data structures are optimized for rapid retrieval and analysis. The data itself is usually historical and is updated at intervals ranging from a day to a year, depending on the application. A data warehouse usually has the following key features:
- Source data is usually mainframe based.
- The data warehouse is usually based in a relational database. Some OLAP (On-line Analytic Processing) functionality is provided using multidimensional databases.
- Increasing volumes of data.
- Data structures that are denormalized and summarized (approximately 10-30% of the data is summarized).
- Analysis requirements.

The concept of a data warehouse is not new. It is designed to leverage the wealth of information a company collects about itself, its customers and the market in order to make better business decisions. Essentially, there are three components to the data warehouse:

1. Extracting and scrubbing the data from the OLTP systems (usually mainframe based). Data is extracted from multiple systems within a company and placed in one centralized data warehouse. As part of the extraction process, the data needs to be scrubbed. (For example, in one source system, personnel data may be stored with the letters "M" and "F" for male and female; in a second source system, the words "male" and "female" may be used; and in a third source system, the numbers "1" and "0". The process of translating these three definitions to a common denominator is called scrubbing.)

2. Designing the data warehouse structures. The IS department interviews users to find their requirements and creates an initial warehouse. Users access the warehouse and request new information. New data is loaded, and so the process continues. At present, most of the information contained in these warehouses is atomic, with about 10-30% being summarized.

3. Using front end tools. Ad-hoc query and reporting tools and front-end analysis tools are used against data warehouses. This paper focuses on how Oracle Discoverer/2000 is used against a relational warehouse.
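The scrubbing example above can be written as a short sketch. The canonical codes "M" and "F", the reading of "1" as male and "0" as female, and the function name `scrub_gender` are assumptions made for illustration; the text does not say which number stands for which value.

```python
# Assumed mapping from every source-system spelling to one canonical code.
CANONICAL = {
    "m": "M", "male": "M", "1": "M",    # assumption: 1 means male
    "f": "F", "female": "F", "0": "F",  # assumption: 0 means female
}

def scrub_gender(value):
    """Translate a source-system gender code to the common denominator."""
    key = str(value).strip().lower()
    if key not in CANONICAL:
        raise ValueError(f"unrecognized gender code: {value!r}")
    return CANONICAL[key]

raw = ["M", "female", 1, "0", " Male "]
clean = [scrub_gender(v) for v in raw]
print(clean)
```

Rejecting unknown codes instead of guessing keeps bad source data from slipping silently into the warehouse.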
Oracle Discoverer/2000

Oracle Discoverer/2000 is a low maintenance ad hoc query and analysis tool for the Oracle Warehouse. It uses a powerful server-based meta-layer (the End User Layer (TM)) to hide the complexity of the database from users. Discoverer/2000 has two components: Data Query, for the end user or executive who wants to query and analyze their data, and Browser, for the power user who may need to modify data or database schemas. Discoverer/2000 supports rapid querying and reporting, multi-dimensional analysis of Data Warehouses and powerful data mining. With the low maintenance End User Layer, Discoverer/2000 empowers users to make critical business decisions by placing information at their fingertips.

When running queries or performing analysis against the large volumes of data held in today's warehouses, there will be times when queries take more than a few seconds to respond. Both components use the query governor to control run-away queries and support background queries. Long-running queries that keep control over the PC are a serious issue with first generation query tools, reducing productivity and frustrating users. Support for background queries enables a query to be started and then frees the PC resources, allowing you to switch to another application (such as MS-Word or a spreadsheet) and continue working.


[Figure: Discoverer/2000 Components. Labels: End User Layer (TM); Data Warehouse; Administration Utility; Data Query (Query, Analysis, Charting); Browser (Query, DML, DDL).]

The Data Warehouse Meta-Layer

One issue with Data Warehouses is support for the meta-layer. A typical warehouse today contains two separate meta-layers:
- The meta-layer that defines the relationship between the source (OLTP) systems and the Warehouse itself (it contains the models of both systems and any relationships and transformation processes between them).
- The front-end meta-layer that gives an end user perspective of the data structures.

Maintaining two meta-layers is a time-consuming and error-prone process. To resolve this, integration between Discoverer/2000 and Oracle Designer/2000 (TM) enables you to populate the front-end meta-layer directly from Designer/2000, reducing the need to duplicate the models and the effort required to maintain the Warehouse.

A separate issue with first generation query tools is the amount of effort required to set up and maintain their meta-layer. The first tools in this market require heavy administration, which becomes increasingly expensive and intensive as the number of users accessing the system grows. With Discoverer/2000, one-third of the development effort was focused on the design of the End User Layer and Administration Utility, and feedback from early users was incorporated into the final product. This ensured that both set-up and maintenance are minimal, through an extremely intuitive interface. When a Business Area is put into production, it can be altered with little effort to accommodate changes in the business or unanticipated requests.

End User Layer Overview

The End User Layer is a second generation server-based meta-layer that simplifies and enhances the users' view of the data. It provides a business perspective by grouping information into logical Business Areas. You see complex concepts and data models as a list of familiar objects such as "Invoice" or "Annual Salary". The End User Layer automatically handles any relationships (joins), tables, views, default formatting, and more. The End User Layer provides:
- Business Areas to group related information
- Renamed fields and tables and associated descriptions
- Relationships (joins) between any schema objects
- Drill Relationships
- Derived (dynamically calculated) values
- Dynamic Aggregation
- Default formatting for reports

Data Query Component: A Technical Overview

Introduction: Using information from the End User Layer, Data Query enables the end user to create reports and perform powerful multi-dimensional analysis without understanding SQL or database structures. You drill through your data to quickly find problem areas that need immediate attention, areas that are doing well, or just to track business trends. Often, while analyzing business performance, a number of questions come to mind, but in order to answer them a new query must be created. Data Query allows you to answer these questions without building new queries, so business needs can be analyzed quickly without starting again from the beginning. You keep drilling, flipping and changing dimensions until all your questions are answered. Reports can be displayed as business charts to visually identify trends and highlight extremes. Chart styles include column, bar, line and pie charts in two or three dimensions. The charts are dynamically linked to your results, allowing you to data mine through the chart itself and immediately see trends and exceptions in your business.

Building Queries

Users build queries by clicking on objects from within a Business Area (a Business Area is simply a functional grouping of objects, such as Human Resources). As a user selects an object, they are automatically shown only the objects that are directly related. In this way, you do not need to understand relational concepts: Data Query leads you through the information available in the database.
Once you have selected the objects of interest, you are placed in a report template, where you place the items of interest and lay out the report exactly as you would like.

Parameters

Runtime parameters enable reports to be run on a regular basis against different sets of data. A parameter screen is displayed showing default values, which may be overridden. Parameters also support pick-lists of available data, so users can select their information easily without having to know the exact values stored in the warehouse.

Running queries

Data Query runs queries with both on-line and batch processing (where available). On-line queries are subject to a query time governor that automatically stops runaway queries. This is based on the Oracle7 governor, and its limit can be expressed as CPU time, row limit, number of blocks accessed, and so on. Once the limit is reached, the query stops automatically with a meaningful message, and any data already retrieved is displayed on the screen.

Batch processing is operating-system dependent. It uses a command file that may be tailored for individual users. For example, queries may be run at 2 a.m. and the results sent directly to a printer, ready to be picked up in the morning.

Graphics and spreadsheets

Query results can be graphed automatically in a variety of default styles and further customized using the graphics engine of Discoverer/2000. Query results may also be passed to spreadsheets (either dynamically or by flat file) and other PC packages. Through this open interface, Data Query can extract information from complex databases and pass it automatically to other tools for further processing.

Multi-Dimensional Analysis

When a query is run on-line, the results are displayed on the screen in the report browser. Users browse the information and perform analysis directly on the report shown. You can change the dimensions of your report by selecting from a list, and Data Query immediately retrieves the new information.
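As a minimal sketch of what changing dimensions means, the example below re-summarizes the same detail rows by different dimension lists. The sample data, the column names and the `summarize` helper are all assumptions made for illustration; Data Query itself retrieves the new summary from the database rather than from an in-memory list.

```python
from collections import defaultdict

# Hypothetical detail rows: (region, product, quarter, sales)
SALES = [
    ("North", "Widgets", "Q1", 100),
    ("North", "Widgets", "Q2", 150),
    ("North", "Gadgets", "Q1", 80),
    ("South", "Widgets", "Q1", 60),
    ("South", "Gadgets", "Q2", 90),
]
COLUMNS = ("region", "product", "quarter")

def summarize(rows, dimensions):
    """Total the sales figure, grouped by the chosen dimensions."""
    idx = [COLUMNS.index(d) for d in dimensions]
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[i] for i in idx)
        totals[key] += row[-1]
    return dict(totals)

# Same data, two different pairs of dimensions.
by_region_product = summarize(SALES, ["region", "product"])
by_region_quarter = summarize(SALES, ["region", "quarter"])

# Dropping a dimension gives the summary level (a drill up).
by_region = summarize(SALES, ["region"])
print(by_region)
```

Adding a dimension back to the list gives the more detailed view again, which corresponds to drilling down.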
Changing dimensions is a powerful way to see how different aspects of your business (sales by region and product, sales by region and time) are performing, without building and running large numbers of separate reports. By changing dimensions, users get an immediate perspective on how business functions affect each other.

You can also drill down to get more detailed information on an area of interest, or drill up to retrieve summary information. This allows you to retrieve information and rapidly focus on areas that require more analysis. As you drill further, objects are added as necessary and automatic joins are made. Users are unaware of the complexity of the reports they are creating: you just see the required information.

Report Styles

Data Query supports the following report styles:

1. Tabular reports
2. Break tabular reports with summaries and grand totals
3. Matrix or cross-tabular reports
4. Master-detail reports, with a master record displayed with any of the above

Tabular and matrix-style (cross-tabular) query builders are used to lay out reports in a WYSIWYG (what you see is what you get) fashion. The query builder displays the format and layout exactly as it appears when run. This makes report design straightforward and saves machine resources, because users can see the report format without actually running the query.

Reports may have header and footer information of any length. Headers and footers may include report items, as well as date and page number information. Each report can be up to 500 characters wide, with the page length set to fit individual printers or screen sizes.
The Browser Component: A Technical Overview

The Browser component completes the suite of tools in Discoverer/2000 by supporting the power user. Browser also runs against the End User Layer, providing integration and compatibility with the Data Query component. Queries may be transferred freely between the two.

Browser may also run directly against the database dictionary, giving privileged users database access and update capability without additional administration. The database objects are displayed visually and foreign key relationships are shown, making this an ideal tool for perusing database structures. You can directly access synonyms, snapshots and other databases via database links. With the correct security privileges you can load or modify data using a familiar spreadsheet-style interface, and graphically create or modify tables and other data structures.

Browser has tight integration with spreadsheets, allowing you to place data from your warehouse directly in the spreadsheet, modify it, and then transfer those modifications back to the database (assuming the correct privileges). You can paste data using the clipboard, dynamically link queries using your operating system's "hot link" feature, such as DDE, or save the data to WKS, SYLK, DIF or delimited ASCII formats.

Queries

The Browser User Edition enables users to run queries and reports directly against the native database structures (as well as the End User Layer). By displaying the structures and the relationships between them, Browser provides a powerful interface to the Warehouse. It visually displays the standard star structure of the Data Warehouse, allowing power users to browse rapidly around the data. Like the Data Query component, Browser supports parameters, conditions, breaks and sorts.

The Data Editor

The Data Editor allows the privileged user to edit and load data. In this mode the Results Window becomes a data editor, with a familiar spreadsheet-style interface where you browse and edit data. Rows or columns can also be updated by pasting them from the clipboard, and data can be loaded from popular desktop file formats.

This capability is particularly useful in conjunction with the tight integration between Browser and spreadsheets. Data is queried using Browser and dynamically passed through to your spreadsheet. You analyze your data and modify it as necessary in the spreadsheet, and then use Browser to commit the updates back to the database.
The Schema Editor

The Schema Editor allows you to create tables and other database objects by copying existing ones or by creating them from scratch, without programming. You can graphically add or modify indexes, constraints, comments and so on. These capabilities enable you to reduce the time and/or cost of running Decision Support queries against the warehouse by:

1. Turning a view (often with joins or aggregates) into a table
2. Adding indexes
3. Creating tables containing data subsets
4. Downloading a table from a central server to a local database for off-line processing
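The first three techniques in the list above correspond to ordinary SQL statements, which the Schema Editor lets you issue graphically. The sketch below shows the equivalent statements, using Python's built-in sqlite3 module as a stand-in for the warehouse database; the table, view and index names are hypothetical and the exact syntax on Oracle7 may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, product TEXT, amount REAL);
    INSERT INTO sales VALUES
        ('North', 'Widgets', 100), ('North', 'Gadgets', 80),
        ('South', 'Widgets', 60);

    -- An aggregating view: recomputed every time it is queried.
    CREATE VIEW sales_by_region AS
        SELECT region, SUM(amount) AS total FROM sales GROUP BY region;

    -- 1. Turn the view into a table so the aggregate is stored once.
    CREATE TABLE sales_by_region_tbl AS SELECT * FROM sales_by_region;

    -- 2. Add an index on the column used for lookups.
    CREATE INDEX idx_sales_by_region ON sales_by_region_tbl (region);

    -- 3. Create a table containing only a subset of the data.
    CREATE TABLE sales_north AS SELECT * FROM sales WHERE region = 'North';
""")

rows = conn.execute(
    "SELECT region, total FROM sales_by_region_tbl ORDER BY region").fetchall()
print(rows)
```

A table created this way is a snapshot: it answers queries faster than the view but does not follow later changes to the base table, so it has to be rebuilt when the underlying data is reloaded.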
Administration

Using the Browser component directly against the base structures requires no extra administration beyond the standard database access privileges. An easy-to-use administration utility is provided to enable users to run both Browser and Data Query with the End User Layer. Great emphasis was placed during development on automating administration as much as possible and keeping maintenance to a minimum.

Security

Users cannot see any information they do not have access to within the underlying database. Using Business Areas, the administrator can further restrict the objects that each user can use to build ad hoc queries. This is important where users are allowed to access certain tables through applications, but those tables are unsuitable for direct ad hoc queries. If database roles have been set up, access to Business Areas is handled automatically through these roles. If access through a role is altered within the server, the change is reflected immediately.

Computer Resources

Discoverer/2000 protects computer resources in an ad hoc environment. Each user has a profile with a distinct set of privileges. There is also a default profile for new users and for existing users without a personal profile. One of the privileges the profile specifies is whether a user is allowed to run reports on-line, via a background batch process, or either.

On-line reports are controlled with a query time governor. If a report exceeds the time set in this governor, it is automatically stopped and any output is displayed to the user. Users may then run the report in batch, where it is subject to all of the controls available in that environment.

Setting up the End User Layer

The End User Layer contains Business Areas with objects and items that reflect the on-line dictionary. Users are granted access to Business Areas either directly or through roles defined within Oracle7. The existence of primary and foreign key constraints within the database generates automatic joins in the End User Layer. Discoverer/2000 is then available for immediate use, with no further administration required. If administrators want to tailor this environment, the Administration Utility provides a powerful and easy interface to the End User Layer.
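The query time governor described under Computer Resources above can be approximated on the client side. The sketch below is only an illustration of the behavior (stop at a limit, keep what has been retrieved, report why); it is not the Oracle7 governor, which enforces limits inside the server. The function `run_governed` and its parameters are hypothetical, and Python's sqlite3 module stands in for the warehouse database.

```python
import sqlite3
import time

def run_governed(conn, sql, max_seconds=5.0, max_rows=1000, batch=100):
    """Run a query, stopping early when a time or row limit is reached.

    Returns (rows, message); rows holds whatever was fetched before the stop.
    """
    deadline = time.monotonic() + max_seconds
    cursor = conn.execute(sql)
    rows = []
    while True:
        if len(rows) >= max_rows:
            return rows[:max_rows], f"Stopped: row limit of {max_rows} reached."
        if time.monotonic() >= deadline:
            return rows, f"Stopped: time limit of {max_seconds} s reached."
        chunk = cursor.fetchmany(batch)
        if not chunk:
            return rows, "Completed."
        rows.extend(chunk)

# Hypothetical table with 250 rows to query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (n INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(250)])

rows, message = run_governed(conn, "SELECT n FROM t", max_rows=120, batch=50)
print(len(rows), message)
```

In this sketch the limits are checked only between batches, so a single slow fetch is not interrupted. A server-side governor does not have that weakness, which is one reason the limit is normally enforced in the database rather than in the client.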
Privileged users may also tailor the End User Layer as they build queries and reports. After Discoverer/2000 is installed, these users may immediately define standard public reports, modify the default formats, headings and automatic join conditions for objects and items, and define their own derived items.

Modifying data objects

Administrators may tailor objects, items and Business Areas as desired to refine the end user view of the data. The Administration Utility is used to:

1. Add or remove objects from Business Areas
2. Rename objects and items, with additional default headings and text describing the information they hold
3. Add or remove objects and items from the pick-lists that users see, to refine the available data
4. Handle null values automatically, so that users do not have to understand the differences between null values and zeros
5. Add derived items (such as annual_salary), hiding complex SQL structures
6. Add automatic join conditions and drill relationships to help hide the complex underlying database structures
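Items 2, 4 and 5 in the list above can be pictured as a view placed over a base table. The sketch below is an assumption about what such a layer amounts to in plain SQL, not a description of how the Administration Utility stores its definitions. It uses Python's sqlite3 module; the `emp` table, its columns, and the choice to treat a missing commission as zero are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp (name TEXT, monthly_salary REAL, commission REAL);
    INSERT INTO emp VALUES ('Ann', 3000, 500), ('Bob', 2500, NULL);

    -- Renamed item: the column is presented under a business-friendly heading.
    -- Derived item: annual_salary is calculated at query time, not stored.
    -- Null handling: COALESCE treats a missing commission as zero,
    -- so the derived value is never NULL.
    CREATE VIEW emp_eul AS
        SELECT name AS "Employee Name",
               monthly_salary * 12 + COALESCE(commission, 0) AS annual_salary
        FROM emp;
""")

rows = conn.execute(
    'SELECT "Employee Name", annual_salary FROM emp_eul ORDER BY 1').fetchall()
print(rows)
```

Without the COALESCE, Bob's annual_salary would come back as NULL, because any arithmetic involving NULL yields NULL. That is the kind of distinction item 4 is meant to hide from end users.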
Conclusion

Oracle has the most complete front-end solution for the Data Warehouse today, with Discoverer/2000 Release 1 and the suite of Express Tools and Applications. In the near future the result of extensive user testing and integrated development with the DBMS will provide us with an even more powerful, easier to use product.


