
VIETNAM NATIONAL UNIVERSITY, HANOI
COLLEGE OF TECHNOLOGY

Vũ Bội Hằng

DETECTING THE CAUSE-EFFECT SEMANTIC RELATION FROM TEXTS

Field: Information Technology. Code: 1.01.10

MASTER'S THESIS

SUPERVISOR: Assoc. Prof. Dr. Hà Quang Thụy

Hanoi - 2005

Detecting the cause-effect semantic relation from texts.

Opening words
With these first lines, I would like to express my most sincere and profound gratitude to my teacher, Dr. Hà Quang Thụy, who devotedly guided and advised me and gave me the best possible conditions from the start of my work to its completion. I would also like to respectfully thank the staff of the Department of Information Systems, College of Technology, Vietnam National University, Hanoi, for providing me with a well-equipped and convenient working environment. My thanks go to all my loved ones in my family and to all my friends, who always smiled and encouraged me whenever I ran into difficulties and dead ends. Finally, I sincerely thank Nguyễn Phương Thái, MSc (Department of Computer Science, College of Technology, Vietnam National University, Hanoi), PhD student Vũ Hải Long (University of Illinois at Urbana-Champaign, United States) and Mr. Mạnh Hùng (Elcom company), whose extremely helpful advice resolved the difficulties and obstacles I met while writing this thesis.

Vũ Bội Hằng - Master's thesis - College of Technology - 2005

TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
INTRODUCTION
CHAPTER 1 - OVERVIEW OF THE SEMANTIC WEB
1.1. Introduction
1.2. The concept of the Semantic Web
1.3. Applications of the Semantic Web
1.4. Technologies required for the Semantic Web
1.4.1. XML and the Semantic Web
1.4.2. Ontology
1.5. Ontology languages for the Semantic Web
1.5.1. The languages
1.5.2. Common features of the languages
1.6. Conclusion of Chapter 1
CHAPTER 2 - THE CAUSE-EFFECT RELATION AND AN ALGORITHM FOR DETECTING IT
2.1. Introduction
2.2. Semantic relations in natural language
2.3. The cause-effect relation
2.4. Cause-effect structures in human language
2.4.1. Explicit cause-effect structures
2.4.1.1. Causal connectives
2.4.1.2. Causal verbs
2.4.1.3. Complex sentences with a pair of causal words
2.4.2. Implicit causal structures
2.5. A data mining algorithm for detecting cause-effect relations in texts
2.5.1. Introduction
2.5.2. The cause-effect relation detection algorithm
2.6. Conclusion of Chapter 2
CHAPTER 3 - EXPERIMENTAL RESULTS OF THE ALGORITHM
3.1. Introduction
3.2. Data file format
3.3. The experimental program
3.4. Experimental results
3.5. Remarks
3.6. Conclusion of Chapter 3
CONCLUSION
REFERENCES
APPENDIX: Experimental results for noun pairs that occur more than 4 times.


LIST OF FIGURES
Figure 1: Stages in the development of "smart data"
Figure 2: Some ontology languages
Figure 3: Chart of the proportion of noun pairs carrying a cause-effect meaning, by frequency of occurrence
Figure 4: Chart of the proportion of noun pairs carrying a cause-effect meaning whose frequency exceeds a threshold value


LIST OF TABLES
Table 1: Causal verbs taken from WordNet
Table 2: Percentage of noun pairs found, by frequency of occurrence
Table 3: Percentage of pairs carrying a cause-effect meaning, by frequency of occurrence
Table 4: Proportion of noun pairs carrying a cause-effect meaning whose frequency exceeds a threshold value


INTRODUCTION
The World Wide Web is an enormous store of information with unlimited potential, much of which has not yet been exploited effectively. Web documents were originally created for people to read. Given the huge number of pages on the Internet, however, a person could spend a whole lifetime and still never read them all to obtain the knowledge they need. In response, many lines of research have emerged, attracting groups of scientists around the world, that aim to use computers to help people collect information and synthesize knowledge from Web pages. Examples include applying data mining techniques to extract information from Web documents and using agent technology in online business.

Until recently, however, this research has concentrated mainly on extracting information from individual words or from a few fixed page structures. It remains difficult for a computer to access and combine the information in documents at the semantic level. More recently, new lines of research have opened up that aim to combine the content of Web pages with semantic information, creating the Semantic Web. The Semantic Web is not a new, separate kind of Web but an upgrade of the current Web (the third generation of the Web) in which semantic information is better defined and is attached to the pages themselves. Reading and understanding Web pages can then be carried out not only by people but also by computers.


The arrival of the Semantic Web calls for a range of supporting technologies. One of the most important is the ontology. The basic components of an ontology are a set of objects (also called concepts), the attributes of those objects, and a set of relations between them. Building an ontology for an application domain is a process of synthesizing the knowledge of that domain, and it requires the builders to have enough understanding and knowledge to identify all the objects, attributes and relations.

Motivated by the need for methods that support the construction of ontologies for the Semantic Web, this thesis presents a method for detecting the cause-effect semantic relation. It is based on the ideas behind the Semantic Role Labeling problem (CoNLL Shared Task 2004 [31]) and on the algorithm for mining cause-effect relations developed by Corina Roxana Girju (PhD thesis, 2002 [11]). The output of the algorithm is exactly the information needed to help discover new objects, and the cause-effect relations between them, during ontology construction.

Apart from the introduction, the conclusion and the appendices, the thesis is divided into three main chapters:

Chapter 1 - Overview of the Semantic Web. Gives an overview of the needs that led to the third generation of the Web (the Semantic Web). The basic concepts and the essential technologies for developing the Semantic Web are also presented in this chapter.


Chapter 2 - The cause-effect relation and an algorithm for detecting it. This chapter analyses in depth the structure of the cause-effect semantic relation in human language and the forms it takes in text. On that basis, the thesis presents an algorithm that detects cause-effect relations in a collection of texts from the frequency of noun pairs in sentences that contain a causal verb.

Chapter 3 - Experimental results of the algorithm. This chapter presents the experimental results of the algorithm for detecting cause-effect relations in texts. The experimental program is written in Java. Judging by the values of the evaluation measures, the results of the program are promising.

The Conclusion summarizes the results of the thesis and outlines directions for further research on its topics.

Although I had a fairly well-equipped and convenient working environment, the thesis surely still contains many shortcomings. I would be very grateful for comments and suggestions that help me improve the results of my work.


CHAPTER 1 - OVERVIEW OF THE SEMANTIC WEB

1.1. Introduction
The Internet quickly became an enormous store of information. Today it holds billions of Web pages used by hundreds of millions of people around the world [18,20,24]. As the volume of information grows, however, searching, exploiting, organizing, accessing and maintaining that information becomes ever harder for users.

Consider an example. A user wants to find the home pages of Mr. and Mrs. Cook. All the user can remember is that their surname is Cook and that both of them work for the same employer, who is involved with an organization named "ARPA-123-4567". Given a well-structured knowledge base containing all the relevant facts, this would certainly be useful information for finding their home pages. It also seems to be enough information to find the pages by searching the World Wide Web. When the user actually searches, however, the following happens:
- Using an existing Web directory, the user can find the ARPA home page, but hundreds of subcontractors and research groups work for branch 123-4567.
- Searching for the keyword "Cook" returns thousands of Web pages about cooking.

- Searching for either "ARPA" or "123-4567" returns hundreds of results, while searching for all three keywords together returns nothing.

How, then, can this case be handled? The situation is quite common in searches on the World Wide Web [18,19]. The core problem is that Web data has too little semantic organization. As the Web keeps growing, this lack of semantic organization will make finding information harder and harder, even with the help of natural language processing techniques, indexing mechanisms and the like. In short, there is currently no effective way of searching the WWW [18,19] that can answer a query of the form:

    Find webpage for all x, y and z such that
      x is a person, y is a person, z is a person
    where lastName(x, Cook) and lastName(y, Cook)
      and employee(z, x) and employee(z, y)
      and married(x, y)
      and involvedIn(z, ARPA 123-4567)
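The query above is a conjunction of conditions over facts, so it can be answered mechanically once the facts are available in structured form. The following is only a minimal sketch of that idea in Python: the fact base, the names `alice`, `bob`, `carol` and `dave`, the `example.org` URLs, and the reading of `employee(z, x)` as "z employs x" are all assumptions made for illustration and do not come from the thesis.

```python
from itertools import permutations

# Hypothetical fact base: every person, relation and URL here is invented
# purely for illustration; none of it comes from the thesis.
PERSONS = {"alice", "bob", "carol", "dave"}
LAST_NAME = {"alice": "Cook", "bob": "Cook", "carol": "Smith", "dave": "Cook"}
EMPLOYEE = {("carol", "alice"), ("carol", "bob")}  # assumed: (z, x) means z employs x
MARRIED = {("alice", "bob"), ("bob", "alice")}
INVOLVED_IN = {("carol", "ARPA 123-4567")}
HOMEPAGE = {
    "alice": "http://example.org/~alice",
    "bob": "http://example.org/~bob",
    "dave": "http://example.org/~dave",
}


def find_homepages(project="ARPA 123-4567", surname="Cook"):
    """Return the sorted homepages of every couple that satisfies the query."""
    pages = set()
    # Try every assignment of three distinct persons to x, y and z.
    for x, y, z in permutations(PERSONS, 3):
        if (LAST_NAME.get(x) == surname and LAST_NAME.get(y) == surname
                and (z, x) in EMPLOYEE and (z, y) in EMPLOYEE
                and (x, y) in MARRIED
                and (z, project) in INVOLVED_IN):
            pages.update({HOMEPAGE[x], HOMEPAGE[y]})
    return sorted(pages)
```

Calling `find_homepages()` on this toy data keeps only the two married Cooks who share an employer involved in the project and leaves out the third Cook, which is exactly the distinction that keyword search cannot make.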


The inability to understand the context of words and the relations between search terms explains why search engines so often return wrong results while failing to find the documents the user wants [18,19,20,24]. If search engines could understand the semantic content of words or, better still, the semantic relations between words, search precision would improve greatly [19,24]. This is one of the reasons that led to the third generation of the Web: the Semantic Web [24].

1.2. The concept of the Semantic Web
Tim Berners-Lee, the inventor of the Web, defines the Semantic Web as follows: "The first step is putting data on the Web in a form that machines can naturally understand, or converting it to that form. This creates what I call a Semantic Web, a web of data that can be processed directly or indirectly by machines." [24]

The Semantic Web is not a separate Web. It is an extension of the current Web that carries more semantic information, so that computers and people can work together better [19,24].


The Semantic Web is not only for the World Wide Web. It comes with a set of technologies that can also work on the internal intranets of companies and enterprises [20,24].

1.3. Applications of the Semantic Web
Semantic search engines. Better search is one of the many potential benefits of the Semantic Web. Most current search mechanisms on the World Wide Web follow one of three approaches:
+ Keyword indexing [1,4,16].
+ Manual categorization [11,16].
+ Special mechanisms that collect semantic information from Web pages (but are very limited) [2,14,16].

Each approach has drawbacks. Keyword indexing links only to the words themselves without understanding their meaning, which can cause confusion (as in the example in the introduction to this chapter). Manual categorization consumes a great deal of labour and time. Special mechanisms for collecting semantic information are very limited, because Web pages carry very little semantic information or because the mechanisms depend on pages being laid out in particular structures. None of these approaches (except the last one, within a specific application domain) allows relations between Web pages to be inferred, other than the relations given by links. Queries like the one in the introductory example are therefore impossible to answer.

=> The solution to this problem is the Semantic Web. Instead of trying to extract knowledge from existing HTML pages, we attach semantic information directly to the pages, so that a computer can process that information by itself without human help [6,19,20].

Internet agents [19,24]. Internet agents, autonomous programs that interact with the Internet, could also be far more effective if they operated in a Semantic Web environment. To achieve its goal, an Internet agent may need to understand Web pages in order to invoke Web services. In theory, such an agent could go shopping, take part in an auction or plan a holiday. For example, an agent could be asked to book a trip to Jamaica: it would book the flights, find a car to rent and reserve a hotel room, all at the lowest available price that fits the user's needs. Agents that can perform some of these tasks already exist, but they are built to work on a limited set of Web pages known in advance and depend heavily on the fixed structure of those pages. It would be far better if, for any Web page, an agent could look at the meaning of the page rather than at its fixed layout.

Stovepipe systems [24]. A stovepipe system is one in which all the components are hardwired to work only with one another. Information therefore flows only inside the stovepipe and cannot be shared with another system or organization that needs it. Breaking down stovepipe systems is necessary at every tier of the enterprise information architecture, and Semantic Web technology is most effective at breaking down stovepipe database systems.

1.4. Technologies required for the Semantic Web
The way to make data processable by computers is to make the data smarter. The figure below shows the stages in the development of smart data [24].

Figure 1: Stages in the development of "smart data"


- Text documents and databases (pre-XML). Most data is proprietary to an application. At this stage the "smartness" belongs to the application, not to the data.
- XML documents using a single vocabulary. Data is independent of the application within a specific domain. It is now smart enough to move between applications in that domain. Examples: the XML standards of the health-care industry, the insurance industry and so on.
- Taxonomies and documents with mixed vocabularies. Data can be combined from several domains and classified accurately in a hierarchical taxonomy. In practice, the classification can be used to discover data, and the relations between categories in the taxonomy can be used to connect data. Data at this stage is therefore smart enough to be discovered and combined with other data.
- Ontologies and rules. At this stage, new data can be inferred from existing data by applying logical rules. The essential point is that data is now smart enough to be described with concrete relations and with formalisms sophisticated enough to support logical computation. This allows data to be broken into smaller parts and analysed in greater depth. One example at this stage is the automatic translation of a document in one application domain into the equivalent document in another.
1.4.1. XML and the Semantic Web

Although HTML is very widespread, it was designed almost entirely for presentation to people, and it is hard for a machine to extract the content of HTML documents and process them automatically. To address this, the World Wide Web Consortium (W3C) developed the eXtensible Markup Language (XML) [17,18,29]. XML is essentially a subset of the Standard Generalized Markup Language (SGML), a standard used by the text-processing community [18]. SGML is a meta-language: it can be used to define other languages, called SGML applications. Its advantages are that it is platform independent, separates content from formatting cleanly, and makes it possible to check whether documents conform to structural rules. XML keeps these properties but drops the features that were rarely used, confusing or hard to implement.

XML technology is built on Unicode characters and URIs (Uniform Resource Identifiers). Unicode allows XML documents to be written with an international standard character set. URIs are used to identify the concepts of the Semantic Web uniquely [24].

XML is not a language in itself. It is a set of syntactic rules for creating markup languages that carry meaning within a particular field, so XML can be applied to create a new language. Any language built on the XML rules (such as MathML) is called an XML application [18].

XML is the syntactic foundation layer of the Semantic Web [18]. All the other Semantic Web technologies are built on top of XML.


The syntax of XML is quite similar to that of HTML. This is not surprising, since HTML is an application of SGML, the parent language of XML. Like HTML (and SGML), XML adds tags enclosed in angle brackets to text data, and these tags supply additional information about the text. The following example is a piece of text with XML markup describing a catalogue of CDs:

    <?xml version="1.0"?>
    <catalog>
      <cd>
        <artist>Cracker</artist>
        <title>Kerosene Hat</title>
        <price currency="USD">15.99</price>
      </cd>
      <cd>
        <artist>Phair, Liz</artist>
        <title>Exile in Guyville</title>
        <price currency="USD">15.99</price>
      </cd>
      <cd>
        <artist>Soul Coughing</artist>
        <title>Irresistible Bliss</title>
        <price currency="USD">15.99</price>
      </cd>
    </catalog>

XML has three kinds of tag: start tags, end tags and empty-element tags. A start tag marks the beginning of the description of an object, an end tag marks the end of that description, and an empty-element tag marks an element that has no content. A start tag consists of a name and an optional set of attributes, enclosed in angle brackets. Each attribute is a name/value pair separated by "=". In the example above, the price tag has the attribute currency. An end tag carries the same name as its start tag, preceded by a slash "/", and has no attributes. Every start tag must be matched by an end tag. Empty-element tags look like start tags but have no end tag. Instead, a slash "/" is placed immediately before the closing bracket ">". For example, <img src="photo.jpg" /> is an empty-element tag. The data between a start tag and an end tag is called an element. An element may contain other elements, text, or a mixture of both.

The flexibility of XML makes it quick and easy to author arbitrary content, but the same flexibility makes processing by computer difficult. Unlike HTML, XML assigns no meaning to its tags, so most processing programs require the meaning of the tag set to be agreed through some standard convention. To support machine processing, XML allows a grammar to be defined for the tags. This information is held in a file called a document type definition (DTD) [18,27]. A DTD provides the syntax for an XML document but not its semantics. A person can infer the meaning of the elements in a DTD from their names, but software tools cannot obtain that meaning on their own. Exchanging XML documents that follow two different DTDs is therefore a hard problem.

One of the hardest parts is mapping between different representations of the same concept, which is the problem of reconciling DTDs. The first step is to identify and map differences in naming conventions. Like natural language, XML DTDs exhibit synonymy and polysemy. For example, <person> and <individual> may denote the same concept, while <spider> may denote either a piece of software or an animal.

An even harder problem is identifying and mapping structural differences. The flexibility of XML gives DTD designers many choices, so the same concept can be described in several ways. For example, here are three possible representations of the name of the same person:

    <person>
      <name>John Smith</name>
    </person>
    (The name is an element of the person, given as a string.)

    <person>
      <name><fname>John</fname><lname>Smith</lname></name>
    </person>
    (The name is an element whose content is other elements.)

    <person name="John Smith">
    (The name is an attribute.)

The first choice is whether the name is a plain string or an element with its own structure. The second is whether the name is an attribute or an element. One of the causes of this problem is the lack of semantic information in XML: no specific meaning is associated with attributes or with the content of elements. It is this lack of semantic information in XML DTDs that makes combining XML documents difficult.
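To make the structural problem concrete, the sketch below reads a person's name from each of the three representations using Python's standard `xml.etree.ElementTree` module. It is an illustration under stated assumptions, not part of the thesis: the helper `person_name` is hypothetical, and the attribute variant is written as a self-closing tag so that each snippet is a well-formed document on its own.

```python
import xml.etree.ElementTree as ET

# The three representations discussed above; the last one is self-closed
# here (an adjustment for this sketch) so that it parses on its own.
DOCS = [
    "<person><name>John Smith</name></person>",
    "<person><name><fname>John</fname><lname>Smith</lname></name></person>",
    '<person name="John Smith"/>',
]


def person_name(xml_text):
    """Hypothetical helper: recover the name whichever structure is used."""
    person = ET.fromstring(xml_text)
    # Case 3: the name is an attribute of <person>.
    if "name" in person.attrib:
        return person.attrib["name"]
    name = person.find("name")
    if name is None:
        return None
    # Case 2: the name is built from child elements such as <fname>/<lname>.
    parts = [child.text.strip() for child in name if child.text]
    if parts:
        return " ".join(parts)
    # Case 1: the name is the plain text content of <name>.
    return (name.text or "").strip()


names = [person_name(doc) for doc in DOCS]
```

Each structural variant needs its own hand-written branch, and nothing in the XML itself tells a program that the three forms mean the same thing. This is the gap that the ontology layer is meant to fill.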
1.4.2. Ontology

XML provides only a syntactic foundation. Sharing XML documents that carry semantic content works only when both parties understand the meaning of the concepts used in them [24]. For example, suppose one party labels a value as <price> $1200 </price> and another labels it as <cost> $1200 </cost>. A machine has no way of knowing that the two tags mean the same thing unless further Semantic Web technologies, such as ontologies, are added. An ontology defines the vocabulary and the concepts used to describe and represent a domain of knowledge [20,24].
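The `<price>` and `<cost>` case can be sketched as a lookup from local tag names to a shared concept. The mapping table `TAG_TO_CONCEPT` and the helper names below are assumptions made for illustration only. A real ontology would express this equivalence in an ontology language rather than in a Python dictionary.

```python
import xml.etree.ElementTree as ET

# Hypothetical shared vocabulary: both local tag names map to one concept.
TAG_TO_CONCEPT = {"price": "Price", "cost": "Price"}


def concept_value(xml_text):
    """Return (shared concept or None, stripped text) for a single element."""
    element = ET.fromstring(xml_text)
    concept = TAG_TO_CONCEPT.get(element.tag)
    return concept, (element.text or "").strip()


def same_meaning(xml_a, xml_b):
    """True when both elements map to the same known concept and value."""
    concept_a, value_a = concept_value(xml_a)
    concept_b, value_b = concept_value(xml_b)
    return concept_a is not None and (concept_a, value_a) == (concept_b, value_b)
```

Without the shared table, the two elements are just different strings to the parser. With it, `same_meaning("<price> $1200 </price>", "<cost> $1200 </cost>")` can be decided mechanically.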


A domain of knowledge is the set of matters surrounding some subject, for example medicine, commerce management, car repair, physics, finance or geography. Descriptions in a domain reflect the activities carried out in it. For example, a description of the car repair domain covers:
- Kinds of vehicle (sedans, sports cars, ...).
- Kinds of engine (gasoline, diesel, electric, hybrid).
- Manufacturers (Ford, General Motors, Chevrolet, Nissan, Honda, Volvo, Volkswagen).
- The parts that make up a car (engine, brake system, cooling system, electrical system, body) and the properties of those parts (an engine with 4, 6, 8 or 12 cylinders).

What matters in car repair is how to repair different kinds of vehicle, the parts of each kind, diagnosis and the tools for diagnosis and repair, estimating the cost of a repair, and so on. When describing a domain of knowledge, we describe things and phenomena, their attributes, and the relations between them. A description in an ontology includes the following kinds of concept [5,28,20,22,24]:
- Classes (things in general) in the domain of interest.
- Instances (particular things).
- Relations between those things.
- Attributes (and attribute values) of the things.

- Functions and processes involving the things.
- Constraints and rules involving the things.

Besides describing a domain of knowledge, we also need to represent the descriptions. Representation means encoding the descriptions by some method. The levels of representation needed for a representation model are syntax, semantics and pragmatics [18,22].
Syntax: specifies the relations between symbols (the words of the language).
Semantics: specifies the relations between symbols and things in the real world.
Pragmatics: building on syntax and semantics, specifies how symbols can be used for a particular purpose.

An example of an ontology expressed in the OIL language [Horrocks et al, 2000]:

    class-def animal                       % animals are a class
    class-def plant                        % plants are a class
      subclass-of NOT animal               % that is disjoint from animals
    class-def tree
      subclass-of plant                    % trees are a kind of plant
    class-def branch
      slot-constraint is-part-of           % branches are parts of trees
        has-value tree
    class-def leaf
      slot-constraint is-part-of           % leaves are parts of branches
        has-value branch
    class-def defined carnivore            % carnivores are animals
      subclass-of animal
      slot-constraint eats                 % that eat only other animals
        value-type animal
    class-def defined herbivore            % herbivores are animals
      subclass-of animal
      slot-constraint eats                 % that eat only plants or parts of plants
        value-type plant OR (slot-constraint is-part-of has-value plant)
    class-def giraffe                      % giraffes are animals
      subclass-of animal
      slot-constraint eats                 % and they eat leaves
        value-type leaf
    class-def lion
      subclass-of animal                   % lions are animals
      slot-constraint eats                 % but they eat herbivores
        value-type herbivore
    class-def tasty-plant                  % tasty plants are plants that are eaten by
      subclass-of plant                    % both herbivores and carnivores
      slot-constraint eaten-by
        has-value herbivore, carnivore
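As a rough illustration of what such definitions let a machine infer, the sketch below encodes part of the OIL example as plain Python dictionaries and classifies an animal by what it eats. This is not an OIL reasoner: the dictionaries, the function names and in particular the assumption that is-part-of is transitive (the listing above does not state it) are choices made for this sketch only.

```python
# Subclass, part-of and eats facts copied from the OIL example above.
SUBCLASS_OF = {"tree": "plant", "carnivore": "animal", "herbivore": "animal",
               "giraffe": "animal", "lion": "animal", "tasty-plant": "plant"}
PART_OF = {"branch": "tree", "leaf": "branch"}
EATS = {"giraffe": "leaf", "lion": "herbivore"}


def is_subclass(cls, ancestor):
    """Follow subclass-of links upward until the ancestor is found or the chain ends."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False


def is_plant_matter(cls):
    """True for plants and for parts of plants (assumes is-part-of is transitive)."""
    while cls is not None:
        if is_subclass(cls, "plant"):
            return True
        cls = PART_OF.get(cls)
    return False


def classify_eater(cls):
    """Classify a class as herbivore or carnivore from the single thing it eats."""
    food = EATS.get(cls)
    if food is None:
        return None
    if is_plant_matter(food):
        return "herbivore"
    if is_subclass(food, "animal"):
        return "carnivore"
    return None
```

On this data `classify_eater("giraffe")` follows leaf, branch and tree up to plant, while `classify_eater("lion")` follows herbivore up to animal. Neither fact is written down explicitly in the definitions, which is the kind of inference that the "defined" classes in the example are meant to enable.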

1.5. Ontology languages for the Semantic Web

1.5.1. The languages

Many ontology languages for the Semantic Web have been developed so far. Most of them are based on XML syntax, such as XOL (Ontology Exchange Language), SHOE and OML (Ontology Markup Language), and RDF (Resource Description Framework) and RDF Schema, the languages put forward by the W3C (World Wide Web Consortium). Two well-established languages built on top of RDF and RDF Schema are OIL and DAML+OIL [5].

Figure 2: Some ontology languages


Ontology Exchange Language (XOL): based on XML. The US bioinformatics community designed XOL for exchanging ontology definitions among a heterogeneous set of software systems in biology. The researchers created the language after finding that they needed a way to represent their specialist biological information [5].

Simple HTML Ontology Extension (SHOE): developed at the University of Maryland. It was created as an extension of HTML that incorporates semantic knowledge into HTML documents, with the knowledge marked up directly in the pages. With SHOE, agents can collect meaningful information about Web pages, which improves search and knowledge gathering. The process has three phases: defining an ontology, annotating HTML pages with the corresponding information from the ontology, and building an agent that searches for information automatically [5,20].

Ontology Markup Language (OML): developed at the University of Washington and partly based on SHOE. OML and SHOE therefore have a great deal in common [5].

Resource Description Framework (RDF) and RDF Schema: developed by the W3C to describe Web resources. They allow the semantics of data to be specified in a standardized way based on XML [29].

Ontology Interchange Language (OIL): developed by the OntoKnowledge project (www.ontoknowledge.org/OIL). It allows semantics to be exchanged between Web data stores. Its syntax and semantics are based on OKBC, XOL and RDF [12,30].


DARPA Agent Markup Language + OIL (DAML+OIL): developed by a European body (IST) in the context of a DARPA project. DAML+OIL has the same objectives as OIL [15,30].

1.5.2. Common features of the languages

Each ontology language has its own distinctive features, but ontological knowledge can be specified with five basic components: concepts (usually organized in a hierarchy), relations, functions, axioms and instances [5,24].

a) Concept
A concept may be abstract or concrete, simple or composite, real or imaginary. In short, a concept can be anything about which something is said, so it can also be the description of a task, a function, an action and so on. Concepts are also called classes (as in XOL, RDF, OIL and DAML+OIL), objects (as in OML) or categories (as in SHOE).

Concepts have attributes. Attributes are also called slots (as in XOL), functions (as in OML), properties (as in RDF and DAML+OIL), or binary relations and roles (as in SHOE and OIL). Attributes are of the following kinds:
- Instance attribute. An attribute whose value may differ for each instance of a concept.
- Class attribute. An attribute whose value is attached to the concept itself, meaning that the value is the same for all instances of the concept.

- Local attribute. An attribute with the same name attached to different concepts. For example, the two concepts Table and Chair may both have the attribute Colour.
- Global attribute. An attribute that applies to all concepts of the ontology.

Instance attributes and class attributes are commonly used to describe concepts. Whether local and global attributes are needed depends on the knowledge representation requirements of each application.

Class attributes come with the following kinds of specification:
- Default slot value (used to assign a value to an attribute when no explicit value has been defined for it).
- Type, also called range (used to constrain the kind of value the attribute takes).
- Cardinality constraints (used to constrain the minimum and maximum number of values).

Type and cardinality constraints specify which kinds of value an attribute may take and how many values it may have. For example, a Product has exactly one Price (an integer attribute) and may have from 1 to 5 Colours (a String attribute). A default value is used when we have no explicit information about the value of an attribute. For example, we can assume that the Depreciation of a Product is 0 if no specific value has been assigned.

Taxonomies are used to organize ontological knowledge. They generalize and specialize relations through multiple and single inheritance. A language that supports taxonomies must provide the following definitions:
- Subclass of (also called the subsumption relationship): specializes general concepts into more specific ones.
- Disjoint decomposition: a partition in which every concept is a subclass of another concept. The partition need not be complete, meaning that there may be an instance of the parent that is not an instance of any of the subclasses. For example, the concepts Table and Chair may form a decomposition of the concept Furniture, yet there are still instances of Furniture that belong to neither Table nor Chair (a Wardrobe, for example).
- Exhaustive subclass decomposition: a complete partition, meaning that any instance of the parent concept must also be an instance of one of the child concepts. For example, Computer memory has two subclasses, Internal memory and External memory.
- Not subclass: can be used to state that a concept cannot be divided into smaller concepts. It is used to represent primitive subclasses.

b) Relation and function

A relation is a link between concepts in a domain. In practice, relations can be defined by means of attributes (as in XOL, RDF and DAML+OIL). Relations are also called roles in OIL.

A function is a special kind of relation. It differs from a general relation in that the value of the last of its n arguments is unique for each combination of the preceding n-1 arguments. For example, we have the relation Buy(Buyer, Product, Amount) and the function Buy(Buyer, Product, Amount, PaidInFull), whose last argument, PaidInFull, takes only the values True or False.

c) Axiom

Axioms are sentences that are always true. They can be used for purposes such as constraining information and checking correctness. Axioms are also called assertions (as in OML). Axioms are not widely used in Semantic Web applications. They can be thought of as axioms in first-order predicate logic, that is, formulas that hold for every p, such as ∀p (p ∨ ¬p).

d) Instance

Instances represent the individual elements of an application domain; each instance is a concrete realization of a concept.

1.6. Conclusion of Chapter 1

The growth of the Internet has created the need for the next generation of the current Web: the Semantic Web. The Semantic Web is closely tied to XML and ontology technology. XML is the syntactic foundation and ontologies are the semantic foundation of the Semantic Web. The basic components of an ontology are classes (also called concepts), class attributes and relations.

CHAPTER 2 - CAUSE-EFFECT RELATIONS AND AN ALGORITHM FOR DISCOVERING CAUSE-EFFECT RELATIONS

2.1. Introduction

As discussed above, concepts and relationships are among the most important components of an ontology [5,6,18,24]. Concepts denote things and phenomena and usually correspond to nouns [5,24]. Relationships describe how concepts are related to one another. The more accurately and completely these components are built, the better the knowledge in the ontology is judged to be. Concepts and relationships can be defined from human experience and synthesized human knowledge [20,24]. It would be far better, however, to have a tool that can help discover concepts and the relationships between them automatically in order to support ontology construction.

This chapter presents a model for analyzing how cause-effect relations are expressed in natural language, and proposes an algorithm that finds cause-effect relations in a collection of text documents. The algorithm is intended to support building the knowledge of an ontology.

2.2. Semantic relations in natural language

In natural language, lexical, syntactic, semantic and world-knowledge information all play an important role in forming sentences [11]. Researchers have shown that the coherence of a text can be explained by semantic relations. For example, the clauses of the following sentence are linked by a causal relation (also called a cause-effect relation) signalled by the connective "so":

It is raining heavily, so the lane is flooded.

Discovering the relations in a text is essential for any model that aims to understand human language. Moreover, semantic relations are the core elements in the organization of a lexical-semantic knowledge base. In such a knowledge base, information is represented as concepts that are organized in a hierarchy and linked to one another by semantic relations [3,13]. A concept can be a simple text unit, such as a word, or a more complex structure, such as a complex noun phrase.

Some of the most important semantic relations in natural language are hypernymy (general-specific), meronymy (whole-part), cause-effect, synonymy and antonymy [11,13].

The general-specific relation is one of the basic semantic relations. It is used to classify entities in order to build a hierarchically structured ontology. A concept is a hypernym of another concept if it is more general than that concept. For example, red is more general than bright red.

Although it applies to both nouns and verbs, the general-specific relation is usually better suited to nouns.

The whole-part relation is a semantic relation that links a whole and one of its parts. For example, the hand is a part of the human body.

The synonymy relation: two words are considered synonyms if they denote the same semantic concept. Some words, however, are synonyms only in a specific context.

The antonymy relation is the opposite of synonymy. As with synonymy, some words are antonyms only in certain contexts.

The cause-effect relation has two components, one expressing the cause and one expressing the effect. For example: A lack of calcium brings about rickets.

2.3. The cause-effect relation

The cause-effect relation is regarded as one of the most important semantic relations contributing to the coherence of a text. Causality is present throughout natural processes, and it is therefore also expressed in human language [16].

Broadly speaking, causation refers to a way of knowing whether one state of affairs brings about another. Although the notion of cause is very old (it dates back to Aristotle), scientists and philosophers still debate the definition of cause and the conditions under which two states of affairs are said to be causally related.

The theory of causation is very broad, and perhaps the most interesting feature of work on causal relations over the past decades is its diversity. Several theories have been developed and, as a result, a great many studies have been published. This explosion of research can be explained in part by the variety of perspectives that researchers adopt and by the variety of fields involved: philosophy, statistics, linguistics, physics, economics, biology, medicine and others.

For example, Sowa's book Knowledge Representation names artificial intelligence as one of three classical disciplines concerned with causation (artificial intelligence, theoretical physics and philosophy). In artificial intelligence, many interesting questions about causation have been raised in order to develop theories that simulate human-like intelligent behavior, and much research on causation has been carried out. Planning in artificial intelligence, for instance, is the problem of finding a sequence of primitive actions that achieves some goal. The ability to reason about actions over time is fundamental to any intelligent entity that must make a sequence of decisions. It is hard, however, to represent the notion of an ongoing sequence of actions and the notion of the result of that sequence without using the notion of cause. Planning for robots requires reasoning about causes in terms of the order of actions and the time needed to carry them out. Determining the cause of a given state of affairs also means examining the state that precedes it in time.

2.4. Cause-effect constructions in human language

Causative constructions have played an important role in recent linguistics, mainly because their study involves the interaction between several components of linguistic description: semantics, syntax and morphology. This section focuses on the various linguistic expressions of causation used in human language.

Every cause-effect construction consists of two components: the cause and the effect. For example:

The bus fails to turn up. As a result, I'm late for a meeting.

In this example, the cause is the bus failing to arrive and the effect is being late for the meeting.

There are two kinds of cause-effect relations: explicit and implicit. Explicit cause-effect relations usually have a clear causal structure, such as "because ... so" or "due to ... so", or are accompanied by causal markers such as "therefore", "so" or "cause". Implicit cause-effect relations have a more complex structure and are harder to recognize. Recognizing them requires additional semantic analysis and background knowledge.
2.4.1. Explicit cause-effect constructions

The lexico-syntactic patterns of explicit cause-effect relations fall into the following groups:
- Causal connectives.
- Causative verbs.
- Complex sentences with a pair of causal markers.

2.4.1.1. Causal connectives

Causal connectives are divided into:
- Adverbial causal connectives.
- Conjunctive causal connectives.

a) Adverbial causal connectives

These constructions link two simple sentences with an adverbial in order to express a causal relation. For example:

The teacher is too strict. For this reason, Lin doesn't go to school.

Some common adverbial causal connectives: for this reason, as a result, the result that, and so on.
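As a small illustration of the adverbial pattern above, the following Python sketch splits a two-sentence text into a cause and an effect when the second sentence opens with one of the listed connectives. The function name, the regular expression and the choice of Python are assumptions made for this illustration; they are not part of the thesis program.

```python
import re

# Adverbial connectives taken from the list in the text.
# The regular expression built from them is an illustrative assumption.
ADVERBIAL_CONNECTIVES = ["for this reason", "as a result"]


def split_adverbial_causal(text):
    """Return (cause, effect) if the second sentence starts with a causal
    adverbial connective, otherwise None."""
    alternatives = "|".join(re.escape(c) for c in ADVERBIAL_CONNECTIVES)
    pattern = re.compile(
        rf"^(?P<cause>.+?[.!?])\s+(?:{alternatives}),?\s+(?P<effect>.+)$",
        re.IGNORECASE,
    )
    match = pattern.match(text.strip())
    if match is None:
        return None
    cause = match.group("cause").rstrip(".!?").strip()
    effect = match.group("effect").rstrip(".!?").strip()
    return cause, effect
```

A pattern of this kind only recognizes the surface form; it does not check that the two sentences are really causally related.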

b) Conjunctive causal connectives

These constructions link two clauses with a conjunction to form a cause-effect relation. For example:

It was cloudy, so the experiment was postponed.
The boy goes out because of the barking dog.

Some common causal conjunctions: because, because of, so, so that, for, since, as.

2.4.1.2. Causative verbs

Many linguists have paid close attention to causative verb constructions, mainly because their study bears on the standard syntax and semantic analysis of a language. According to Corina Roxana Girju [11], the first person to propose a lexical classification of causative verbs was the Russian linguist V.P. Nedjalkov, who divided them into the following classes:
- Simple causatives.
- Resultative causatives.
- Instrumental causatives.

a) Simple causative verbs:

These are verbs whose meaning itself expresses a cause-effect relation, such as cause, lead to, bring about, generate, make, force and allow. For example:

Earthquakes generate tidal waves.
A lack of calcium might bring about rickets.
Rain leads to flooded lanes.

b) Resultative causative verbs

These verbs express an action from which the result of the action can be inferred, so the result does not need to be mentioned in the sentence [11]. For example:

The thief killed the host.
(From the verb "kill" we know that the host is dead.)

The artist burned the paintings which he drew yesterday.

(From the verb "burn" we know that the paintings the artist drew yesterday have been burnt.)

Some resultative causative verbs: kill, burn, fire, poison, hit, shoot.

c) Instrumental causative verbs

These verbs express an action from which the means of the action can be inferred, so the means does not need to be mentioned in the sentence. For example:

The stepmother poisoned her husband's child.
(From the verb "poison" we know that the stepmother used poison.)

He is swimming to the island.
(From the verb "swim" we know that he must be swimming in water, although the sentence never mentions water.)

Some instrumental causative verbs: poison, swim, shoot, write, read.

2.4.1.3. Complex sentences with a pair of causal markers

These are compound sentences whose two clauses are joined by a pair of connectives that signals a cause-effect relation between them. For example:

It is raining so heavily that the lane is flooded.
If I have much money then I'll buy a beautiful house.

Some common pairs of causal connectives [11]: if ... then, so ... that.
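The two paired markers above can be recognized with simple surface patterns. The sketch below is only an illustration in Python under stated assumptions: the regular expressions and the name match_paired_connective are hypothetical, and real text would need a parser rather than regular expressions.

```python
import re

# One illustrative (assumed) pattern per connective pair named in the text.
PAIRED_PATTERNS = [
    ("if...then",
     re.compile(r"^if\s+(?P<cause>.+?),?\s+then\s+(?P<effect>.+?)[.!?]?$",
                re.IGNORECASE)),
    ("so...that",
     re.compile(r"^(?P<cause>.+?\bso\s+\w+)\s+that\s+(?P<effect>.+?)[.!?]?$",
                re.IGNORECASE)),
]


def match_paired_connective(sentence):
    """Return (pair name, cause clause, effect clause), or None if no
    paired causal marker is found."""
    for name, pattern in PAIRED_PATTERNS:
        match = pattern.match(sentence.strip())
        if match:
            return name, match.group("cause").strip(), match.group("effect").strip()
    return None
```

For the two example sentences, the first clause is returned as the cause and the second as the effect.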
2.4.2. Implicit causal constructions

This is the hardest category, because it requires inference based on semantic analysis and world knowledge. It includes the following constructions:
- Compound nominals.
- Verbs that imply causation implicitly.

a) Compound nominals expressing causation

Compound nominals are one of the hardest problems in natural language processing, mainly because they require fairly complex semantic analysis. A compound nominal is a noun phrase formed as an extension or specialization of a head noun, for example "English teacher" or "population growth rate". The ambiguity of these nominals makes sentence analysis harder. A base word can have more than one meaning, so a compound can have even more. Interpreting compounds fully requires extensive linguistic knowledge about the semantic content of the constituents of the sentence in a given context.

One of the relations that can link the two nouns of a compound nominal is the causal relation. The compound is then a noun phrase made up of two phrases, one of which is the cause and the other the effect:

P1 P2 => P1 causes P2, or P1 is caused by P2

where P1 and P2 are the first and second phrases. For example:

Tetanus germ (tetanus is caused by the germ).

b) Implicit causative verbs

This is a sequence of actions expressed by verbs in which a later action is usually the result of an earlier one. Causal connectives do not necessarily appear in this construction. For example:

Feeling sorry for what he did, the burglar confessed to the policeman.

(The act of confessing is the result of feeling sorry.)

2.5. A data mining algorithm for discovering cause-effect relations in text

2.5.1. Introduction

Natural language learning is an interesting topic that has been studied for many years. SIGNLL (Special Interest Group on Natural Language Learning) organizes a yearly conference on the subject, CoNLL (Conference on Computational Natural Language Learning). The eighth conference, held on 6-7 May 2004 (CoNLL-2004), had Semantic Role Labeling as its topic.

Semantic Role Labeling is the task of assigning semantic labels (semantic roles) to the syntactic constituents of a sentence. A semantic role is a relationship between a syntactic constituent of the sentence and a semantic property. Recognizing and labeling the semantic roles of the constituents of a sentence is an important step toward answering questions such as Who, What, When, Where and Why.

For example, the following sentence is annotated with semantic roles:

[A0 He ] [AM-MOD would ] [AM-NEG n't ] [V accept ] [A1 anything of value ] from [A2 those he was writing about ] .

Here the semantic labels are defined in the roleset corresponding to the symbols defined in the PropBank Frames (which specify the annotation symbols of the PropBank data bank) [19,20,21]:

- V: verb
- A0: the subject governing the verb "accept" (acceptor)
- A1: the object governed by the verb (thing accepted)
- A2: the secondary object following the preposition (accepted-from)
- AM-MOD: modal verb
- AM-NEG: negation

This is a large problem, and many papers presented at the conference proposed solutions to it, including:
- "Hierarchical Recognition of Propositional Arguments with Perceptrons" by Xavier Carreras and Lluís Màrquez (TALP Research Centre, Technical University of Catalonia) and Grzegorz Chrupała (GRIAL Research Group, University of Barcelona);
- "Semantic Role Labeling by Tagging Syntactic Chunks" by Kadri Hacioglu, Sameer Pradhan, Wayne Ward and James H. Martin (University of Colorado at Boulder) and Daniel Jurafsky (Stanford University);
- "Semantic Role Labeling using Maximum Entropy Model" by Joon-Ho Lim, Young-Sook Hwang, So-Young Park and Hae-Chang Rim (Department of Computer Science & Engineering, Korea University);
- "Semantic Role Labeling Via Generalized Inference Over Classifiers" by Vasin Punyakanok, Dan Roth, Wen-tau Yih, Dav Zimak and Yuancheng Tu (Department of Computer Science and Department of Linguistics, University of Illinois at Urbana-Champaign).

However, the accuracy of all of these proposed algorithms is still limited (precision < 75% and recall < 70%).
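To make the bracketed role notation used in this section concrete, the following Python sketch reads an annotated sentence into (role, text) pairs. The pattern and the function name parse_roles are assumptions for illustration only; the actual CoNLL-2004 data uses a column format rather than inline brackets.

```python
import re

# A bracketed chunk looks like "[A0 He ]": a role label, then the words.
ROLE_CHUNK = re.compile(r"\[(?P<role>[A-Z0-9-]+)\s+(?P<text>[^\]]*?)\s*\]")


def parse_roles(annotated):
    """Return the list of (role, text) pairs found in a bracketed sentence."""
    return [(m.group("role"), m.group("text")) for m in ROLE_CHUNK.finditer(annotated)]


example = ("[A0 He ] [AM-MOD would ] [AM-NEG n't ] [V accept ] "
           "[A1 anything of value ] from [A2 those he was writing about ] .")
roles = parse_roles(example)
```

Words outside any bracket, such as "from" in the example, carry no role and are skipped.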

Corina Roxana Girju [11], on the other hand, proposed an algorithm for finding verbs that express the causal relation and verbs that express the whole-part relation. Her work studies the structure of natural language in depth. Her algorithm searches for sentences with a cause-effect or whole-part structure and then rates the importance of the main verbs of those sentences by counting how often they occur in a large number of documents.

The algorithm we propose is a modification of Girju's algorithm [11]. We also search for sentences with a cause-effect structure in the same way, but instead of counting the frequency of the verb (as Girju does), we count the frequency of the cause-effect noun pairs themselves. The more often a noun pair occurs, the more likely it is to carry a cause-effect semantic relation.

This problem is a small part of the Semantic Role Labeling problem. Specifically, we concentrate only on sentences built around simple causative verbs (explicit causative verbs).
2.5.2. Algorithm for discovering cause-effect relations

As introduced and analyzed above, cause-effect relations are expressed in natural language in extremely rich, varied and complex ways. Merely analyzing a sentence to determine which causal construction it belongs to is already one of the hardest problems in natural language processing. The algorithm therefore does not cover every complex causal construction; it considers only explicit causal constructions expressed by a causative verb. Other kinds of causal relations are not addressed here.

An explicit causal relation with a causative verb can be written as:

<NP1 - causative verb - NP2>

where NP1 and NP2 are nouns (or noun phrases). They may correspond to concepts of the ontology. A noun phrase is a group of words that ends with a noun. It may start with a determiner (the, a, this, ...) and may contain adjectives, adverbs and nouns. A noun phrase does not start with a preposition.

Procedure for discovering causal relations

Overview of the algorithm:

Input: a list of causative verbs.
Output: a list of cause-effect pairs of the form (NP1, NP2).

Step 1: For each document in the data set, select the sentences of the form <NP1 - verb - NP2>, where NP1 and NP2 are nouns (or noun phrases).
Step 2: Compare the verb of the selected sentence with the verbs in the table of causative verbs. If it matches one of them, consider the pair (NP1, NP2):
- If the pair is already in the database, increase its frequency by 1.
- If the pair is not yet in the database, add it to the database.
Step 3: Repeat step 2 for all sentences of the form <NP1 - verb - NP2> in the document.
Step 4: Return to step 1 for every document in the data set.
Step 5: Sort the pairs (NP1, NP2) obtained in decreasing order of frequency.
Step 6: Select the first m pairs in the database. These are the causal pairs sought.

Detailed algorithm:

Input: V, the set of causative verbs.
Output: O, a set of pairs (NP1, NP2) that express a cause-effect relation.

1. C := ∅, the set that will hold triples (NP1, NP2, i), where NP1 and NP2 are the cause and effect nouns and i is the frequency of the pair.
2. For each document Di in the database
   2.1 For each sentence Sj in document Di
      2.1.1 If Sj has the form <noun 1 - verb - noun 2>
         2.1.1.1 Extract the pair (NP1, NP2) with NP1 = noun 1 and NP2 = noun 2.
         2.1.1.2 Set v := verb.
         2.1.1.3 If v is in V
            2.1.1.3.1 If (NP1, NP2) is in C, increase its frequency by 1.
            2.1.1.3.2 If (NP1, NP2) is not in C, set C := C ∪ {(NP1, NP2, 1)}.
3. Sort C in decreasing order of frequency.
4. Select the first m pairs of C and return them in O.

Note: For a sentence of the form <NP1 - causative verb - NP2>, NP1 may be the cause of NP2, or NP2 may be the cause of NP1. The resulting pairs (NP1, NP2) must follow one convention: NP1 is the cause and NP2 is the effect. We therefore need to know which pattern each causative verb follows, <cause - verb - effect> or <effect - verb - cause>, and assign the pair (NP1, NP2) accordingly. One way to do this is to give each causative verb an attribute that records its pattern.
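The counting procedure above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the thesis program (which is written in Java and uses an Oracle database): sentences are assumed to be already reduced to (NP1, verb, NP2) triples with lower-cased base-form verbs, and the verb table with its direction flag is a hypothetical illustration of the attribute suggested in the note.

```python
from collections import Counter

# Hypothetical verb table. True means <cause - verb - effect>,
# False means <effect - verb - cause> (for example "result from").
CAUSAL_VERBS = {"cause": True, "lead to": True, "generate": True, "result from": False}


def find_causal_pairs(documents, causal_verbs, m):
    """documents: iterable of documents, each a list of (np1, verb, np2)
    triples taken from sentences of the form <NP1 - verb - NP2>.
    Returns the m most frequent ((cause, effect), frequency) entries."""
    counts = Counter()                       # the set C with its frequencies
    for document in documents:               # step 2
        for np1, verb, np2 in document:      # step 2.1
            direction = causal_verbs.get(verb)
            if direction is None:            # verb is not in V: skip sentence
                continue
            # Normalize so that the first element is always the cause.
            pair = (np1, np2) if direction else (np2, np1)
            counts[pair] += 1                # add with frequency 1 or increment
    return counts.most_common(m)             # steps 3 and 4


docs = [
    [("smoking", "cause", "lung cancer"), ("rain", "lead to", "flooding"),
     ("company", "announce", "sale")],
    [("lung cancer", "result from", "smoking"),
     ("earthquakes", "generate", "tidal waves")],
]
top_pairs = find_causal_pairs(docs, CAUSAL_VERBS, 2)
```

In this toy input, "smoking cause lung cancer" and "lung cancer result from smoking" are counted as the same pair because the direction flag swaps the second triple.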

2.6. Conclusion of Chapter 2

This chapter presented the notion and significance of the cause-effect relation and analyzed in detail how it is expressed in human language. On that basis it proposed an algorithm for discovering cause-effect pairs in a collection of text documents. The experimental implementation of the algorithm and the evaluation of its results are presented in the next chapter.

CHAPTER 3 - EXPERIMENTAL RESULTS

3.1. Introduction

The experimental program implementing the data mining algorithm for discovering causal relations in text is written in Java and connects to an Oracle database. It consists of 1100 lines of code in five files, including:
- Main program file: coordinates the classes and runs the program.
- Class ConnectDBClass: utility procedures for connecting to the database.
- Class ConvertFileClass: procedures that convert the original Penn Treebank data format [7,8] into a format that can be processed.
- Class ReadFileClass: procedures that read the files, analyze the sentences, and extract verbs and nouns into the database.

Because the program was written for a specific purpose and has to parse files in the Penn Treebank data format, it does not reuse existing source code.

The data used to test the algorithm is a corpus extracted from the Penn TreeBank II data bank (http://www.cis.upenn.edu/~treebank). This data bank contains about 1 million words, taken from the Wall Street Journal published in 1989.

3.2. Data file format

The Penn Treebank data is stored in 2300 files. Each file contains a set of sentences that have already been syntactically annotated in the Penn TreeBank format [7,8]. For example, the following sentence is fully annotated:

Word        POS    Chunk  Clause  Named entity
The         DT     B-NP   (S*     O
$           $      I-NP   *       O
1.4         CD     I-NP   *       O
billion     CD     I-NP   *       O
robot       NN     I-NP   *       O
spacecraft  NN     I-NP   *       O
faces       VBZ    B-VP   *       O
a           DT     B-NP   *       O
six-year    JJ     I-NP   *       O
journey     NN     I-NP   *       O
to          TO     B-VP   (S*     O
explore     VB     I-VP   *       O
Jupiter     NNP    B-NP   *       B-LOC
and         CC     O      *       O
its         PRP$   B-NP   *       O
16          CD     I-NP   *       O
known       JJ     I-NP   *       O
moons       NNS    I-NP   *S)     O
.           .      O      *S)     O

The annotations of a sentence are given in a column representation, with columns separated by spaces. Each column encodes one kind of annotation by means of tags. For each sentence, the columns are:
1. Words.
2. Part of speech tags.
3. Chunks in IOB2 format.
4. Clauses in Start-End format.
5. Named Entities in IOB2 format.

The Words column lists the individual words of the sentence. The Part of speech tags column gives the part of speech of the corresponding word in the Words column. Some part-of-speech tags:
- JJ: adjective.
- JJR: adjective, comparative.
- JJS: adjective, superlative.
- RB: adverb.
- RBR: adverb, comparative.
- RBS: adverb, superlative.
- CC: coordinating conjunction.
- CD: cardinal number.
- DT: determiner.
- NN: noun, singular.
- NNS: noun, plural.
- NNP: proper noun, singular.
- NNPS: proper noun, plural.
- VB: verb, base form.
- VBD: verb, past tense.
- VBG: verb, present participle or gerund.

The IOB2 format represents consecutive, non-overlapping chunks. Words that belong to no chunk receive the tag O. Within a chunk of type $k, the first word receives the tag B-$k (Begin) and the following words receive the tag I-$k (Inside). Some commonly used symbols:
- ADJ: adjective
- ADJP: adjective phrase
- ADV: adverb
- ART: article
- N: noun
- NP: noun phrase
- S: sentence
- V: verb
- VP: verb phrase

The Start-End format represents phrases that are nested within one another. Each tag marks the phrases that start and end at a word and has the form STARTS*ENDS. A START tag has the form ($k and marks the position where a phrase of type $k begins. An END tag has the form $k) and marks the position where a phrase of type $k ends. Concatenating the tags of a sentence yields a bracketing structure. For example, the tag * marks a word that neither starts nor ends a phrase; the tag (A0*A0) marks a single word that forms the argument A0; the tag (S(S*S) marks a word that forms a basic clause (label S) and also starts a higher-level clause.

3.3. The experimental program

The experimental program implementing the cause-effect discovery algorithm runs on the syntactically annotated Penn TreeBank data described above. It was run on an IBM Pentium 4 computer with a 2.4 GHz CPU and 500 MB of RAM. One run of the program on this data set takes 8 hours 24 minutes in total.

The causative verbs used by the program are taken from WordNet 2.1 (http://wordnet.princeton.edu/).
No.  Verb
1    Induce
2    Cause
3    Make
4    Result (in/from)
5    Lead (to)
6    Produce
7    Generate
8    Create
9    Bring (about)

Table 1: Causative verbs taken from WordNet
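To show how the chunk column of the data format in Section 3.2 can yield sentences of the form <NP1 - verb - NP2>, the Python sketch below groups IOB2 tags into chunks and then picks out adjacent NP, VP, NP sequences. It is an assumption-laden illustration, not the thesis's Java code: the function names are hypothetical, only the word and chunk columns are used, and a real program would also reduce the verb group to its base form before looking it up in Table 1.

```python
def group_chunks(tokens):
    """tokens: list of (word, iob2_tag) pairs.
    Returns a list of (chunk_type, [words]); words tagged O get type "O"."""
    chunks = []
    for word, tag in tokens:
        if tag.startswith("B-"):
            chunks.append((tag[2:], [word]))        # a new chunk begins
        elif tag.startswith("I-") and chunks and chunks[-1][0] == tag[2:]:
            chunks[-1][1].append(word)              # continue the open chunk
        else:
            chunks.append(("O", [word]))            # word outside any chunk
    return chunks


def np_verb_np_triples(chunks):
    """Return (NP1, verb group, NP2) for every adjacent NP, VP, NP sequence."""
    triples = []
    for left, middle, right in zip(chunks, chunks[1:], chunks[2:]):
        if left[0] == "NP" and middle[0] == "VP" and right[0] == "NP":
            triples.append((" ".join(left[1]), " ".join(middle[1]), " ".join(right[1])))
    return triples


words = ("The $ 1.4 billion robot spacecraft faces a six-year journey "
         "to explore Jupiter and its 16 known moons .").split()
tags = ("B-NP I-NP I-NP I-NP I-NP I-NP B-VP B-NP I-NP I-NP "
        "B-VP I-VP B-NP O B-NP I-NP I-NP I-NP O").split()
triples = np_verb_np_triples(group_chunks(list(zip(words, tags))))
```

For the example sentence this produces two triples, neither of which contains a causative verb from Table 1, so the sentence would contribute no pair to the counts.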

WordNet is an online lexical reference system designed by a research group at Princeton University (http://www.princeton.edu/main/). It has been and continues to be used by many research groups in related fields.

3.4. Experimental results

In total, 34 033 noun pairs (or noun-phrase pairs) were found. Among them:
+ 2 pairs have the highest frequency, 9 occurrences: company-sale and smoking-lung cancer.
+ 4 pairs occur 8 times: smoking-pulmonary problem, traffic-noise, Standard & Poor-underwriter (this pair is not meaningful) and environmental change-erosion.

The results are summarized in the following table:

Frequency   Number of pairs   % of all pairs found
9           2                 0.005 %
8           4                 0.012 %
7           8                 0.024 %
6           23                0.068 %
5           30                0.088 %
4           99                0.29 %
3           263               0.77 %
2           502               1.48 %
1           33077             97.2 %

Table 2: Percentage of the noun pairs found, by frequency.

Computing, for each frequency, the percentage of noun pairs (or noun-phrase pairs) that carry a cause-effect meaning gives the following table:

Frequency   Number of pairs   Cause-effect pairs   % cause-effect
9           2                 1                    50 %
8           4                 3                    75 %
7           8                 4                    50 %
6           23                14                   61 %
5           30                15                   50 %
4           99                17                   17.2 %

Table 3: Percentage of pairs carrying a cause-effect meaning, by frequency.

The table above is shown as a chart in Figure 3.

Figure 3: Proportion of noun pairs carrying a cause-effect meaning, by frequency (bar chart; x-axis: frequency, 4 to 9; y-axis: number of noun pairs; series: pairs with and pairs without a cause-effect meaning).

Computing the percentage of noun pairs (or noun-phrase pairs) that carry a cause-effect meaning among all pairs whose frequency is at least a given threshold gives the following table:

Frequency threshold   Number of pairs   Cause-effect pairs   % cause-effect
>= 9                  2                 1                    50 %
>= 8                  6                 4                    66.7 %
>= 7                  14                8                    57.1 %
>= 6                  37                22                   59.4 %
>= 5                  67                37                   55.2 %
>= 4                  166               54                   32.5 %

Table 4: Proportion of noun pairs carrying a cause-effect meaning among pairs whose frequency is at least a threshold value.
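The cumulative figures of Table 4 follow from the per-frequency counts of Table 3 by summing from the highest frequency downward. The short Python sketch below reproduces that arithmetic; the variable and function names are chosen for this illustration only.

```python
# Rows of Table 3: frequency -> (number of pairs, number of cause-effect pairs)
TABLE_3 = {9: (2, 1), 8: (4, 3), 7: (8, 4), 6: (23, 14), 5: (30, 15), 4: (99, 17)}


def cumulative_precision(per_frequency):
    """Return rows (threshold, pairs, causal pairs, percentage) for all pairs
    whose frequency is at least the threshold."""
    rows = []
    total_pairs = total_causal = 0
    for freq in sorted(per_frequency, reverse=True):
        pairs, causal = per_frequency[freq]
        total_pairs += pairs
        total_causal += causal
        rows.append((freq, total_pairs, total_causal,
                     100.0 * total_causal / total_pairs))
    return rows


rows = cumulative_precision(TABLE_3)
for freq, pairs, causal, pct in rows:
    print(f">= {freq}: {causal}/{pairs} = {pct:.1f}%")
```

The counts match Table 4 exactly. The percentage for the threshold 6 is 22/37, about 59.46 %, which this script rounds to 59.5 % while the table reports the truncated value 59.4 %.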

The table above is shown as a chart in Figure 4.

Figure 4: Proportion of noun pairs carrying a cause-effect meaning among pairs whose frequency is at least a threshold value (bar chart; x-axis: frequency, 4 to 9; y-axis: number of noun pairs; series: pairs with and pairs without a cause-effect meaning).

3.5. Discussion

The result tables show that pairs with a high frequency are more likely to carry a cause-effect meaning. For pairs that occur at least 5 times, the proportion is at least 50 %. Precision is still modest (below 70 % at every threshold in Table 4), but the results show that the proposed algorithm can be used to find noun pairs (or noun-phrase pairs) that stand in a cause-effect semantic relation. This is the goal of this thesis.

3.6. Conclusion of Chapter 3

This chapter reported the experimental implementation of the algorithm presented in Chapter 2. The program is written in Java and runs on the syntactically annotated Penn Treebank data. Using the causative verbs taken from WordNet 2.1, the program found 34 033 noun pairs (or noun-phrase pairs). Among the pairs with a frequency of at least 4, 32.5 % carry a cause-effect meaning.

CONCLUSION

In the experiment, the algorithm found 54 noun pairs (or noun-phrase pairs) carrying a cause-effect meaning among the 166 pairs found with a frequency of at least 4. The information found by the algorithm will be useful for building ontologies and other Semantic Web applications.

The thesis is limited to finding the cause-effect semantic relation. As future work, the same approach can be applied to other semantic relations, such as whole-part and general-specific, by analyzing how these relations are expressed in sentences.

Beyond ontology construction for the Semantic Web, the results of the algorithm can also be applied in other areas, for example in building search engines that answer Who, What, When and Where questions.

The algorithm currently judges how strongly a noun pair (or noun-phrase pair) expresses a cause-effect meaning only by its frequency in the documents. This evaluation can be extended by assigning each pair a weight computed from parameters such as its frequency and the importance of the causative verb that links it.

The experimental precision is not yet high (< 70 %), because the algorithm was run on a data set that is not very large. Nevertheless, the results show that its output can be used as a reference for building relationships and finding concepts during ontology construction.

TI LIU THAM KHO


Ting Vit [1]. ng Tiu Hng (2004), Phng php biu din ng ngha ln cn siu lin kt cho my tm kim VietSeek, Lun vn thc s, Khoa Cng Ngh-i hc Quc gia H ni, tr 6-42. [2]. on Sn (2001), Cc phng php biu din v ng dng trong khai ph d liu vn bn, Lun vn thc s, Khoa Cng Ngh-i hc Quc gia H ni, tr 16-32. [3]. Phm Thanh Nam, Bi Quang Minh, H Quang Thy (2004). Gii php tm
kim trang Web tng t trong my tm kim VietSeek. Tp ch Tin hc v iu khin hc (nhn ng 1-2004)

[4]. Phan Xun Hiu (2003), Khai ph song song lut kt hp m, Lun vn thc s, Khoa Cng Ngh- i hc Quc gia H ni, tr 9-16, tr 42-58. Ting Anh
[5]. Asuncion Gomez-Perez and Oscar Corcho (January / February 2002),

Ontology Languages for the Semantic Web, IEEE intelligent systems, http://computer.org/intelligent.
[6]. Aubrey E.Hill (1998), Automated knowledge acquisition of case-based

semantic networks for interative enhancement of the dataming proccess, Doctor of Philosophy, University of Alabama at Birmingham, pp 14-32. [7]. Beatrice Santorini (1990), Part-of-Speech Tagging Guidelines for the Penn
TreeBank Project, Penn http://www.cis.upenn.edu/~treebank. Treebank II Project,

[8]. Beatrice Santorini (1991), Bracking Guidelines for Penn TreeBank Project,
Penn Treebank II Project, http://www.cis.upenn.edu/~treebank.

[9]. Chiristopher D. Manning, Hinrich Schuze (1999), Foundations of Statistical Natural Language Processing, The MIT Press, Cambridge, Massachusets London, England. [10]. Choochart Haruechaiyasak (2003), A dataming and Semantic Web frameworks for building a web based recomender system, Doctor of Philosophy, the University of Miami, pp 31-44, pp 50-59.

V Bi Hng-Lun vn cao hc-Trng i hc Cng ngh-2005

61 Pht hin quan h ng ngha Nguyn nhn-Kt qu t cc vn bn. [11]. Corina Roxana Girju (2002), Text mining for semantic relations, Doctor

of Philosophi in computer science, University of texas at Dallas, pp 25-63, pp 86-106.


[12]. Dieter Fensel and Frank van Harmelen (March/April 2001), OIL: an

ontology infrastructure for the Semantic Web, systems, http://computer.org/intelligent.

IEEE intelligent

[13]. Đoàn Thiện Thuật (2001), A concise Vietnamese grammar for non-native speakers, Nhà xuất bản Thế giới, pp 6-15, pp 20-29.

[14]. Ha Quang Thuy, Nguyen Tri Thanh (2003), A web site representation method using concept vectors and web site classifications. Submitted to Tạp chí Tin học và Điều khiển học (Journal of Computer Science and Cybernetics), October 2003.

[15]. I. Horrocks and F. van Harmelen (draft report, 2001), Reference Description of the DAML+OIL Ontology Markup Language, www.daml.org/2000/12/reference.html

[16]. J. Han and M. Kamber (2000), Data Mining: Concepts and Techniques, Morgan Kaufmann, ch 1, pp 3-31.

[17]. Jeff Heflin, James Hendler (2000), Semantic Interoperability on the Web, University of Maryland, http://www.cs.umd.edu/~heflin.

[18]. Jeffrey Douglas Heflin (2001), Toward the Semantic Web: a knowledge representation in a dynamic, distributed environment, Doctor of Philosophy, University of Maryland, pp 40-83.

[19]. Jingkun Hu (2004), Visual Modeling of XML constraints based on a new extensible constraint Markup Language, Doctor of Philosophy, Pace University, pp 9-44.

[20]. John Davies, Dieter Fensel, Frank van Harmelen (2003), Towards the Semantic Web: Ontology-driven Knowledge Management, John Wiley & Sons Ltd, pp 1-9, pp 16-18.

[21]. Lan Eric Gibson (2001), Data mining analysis of digital library database usage pattern as a tool facilitating efficient user navigation, Doctor of Philosophy, University of Alabama, pp 23-42.

[22]. Maedche, Alexander D. (2002), Ontology Learning for the Semantic Web, Kluwer Academic Publishers, pp 10-34.


[23]. Marie Meteer, et al. (1995), Dysfluency Annotation Stylebook for the Switchboard Corpus, Penn Treebank II Project, http://www.cis.upenn.edu/~treebank.

[24]. Michael C. Daconta, Leo J. Obrst, Kevin T. Smith (2003), The Semantic Web, Wiley Publishing, ch 1, 2, 7.

[25]. Paul Kingsbury, Martha Palmer, and Mitch Marcus (2002), Adding Semantic Annotation to the Penn TreeBank, In Proceedings of the Human Language Technology Conference, San Diego, California.

[26]. Scott Owen Farrar (2003), An ontology for linguistics on the Semantic Web, Doctor of Philosophy, Arizona State University, pp 12-14.

[27]. Sean Luke, Lee Spector, David Rager, Ontology-Based Knowledge Discovery on the World Wide Web, http://www.cs.umd.edu/~seanl.

[28]. Sean Luke, Lee Spector, David Rager, James Hendler, Ontology-based Web Agents, ARPA/Rome Laboratory Planning Initiative.

[29]. Stefan Decker, Frank van Harmelen, Jeen Broekstra, Michael Erdmann, Dieter Fensel, Ian Horrocks, Michel Klein, Sergey Melnik (2003), The Semantic Web - on the respective Roles of XML and RDF, IEEE Intelligent Systems, http://computer.org/intelligent.


[30]. Syed Ahmed (2003), Ontologies of electronic devices in DAML+OIL for automated product design services in the Semantic Web, Master of Engineering in Telecommunication Technology Management, Carleton University, Ottawa, Canada, pp 4-89.

[31]. Youngchoon Park (2002), A framework for description, sharing and retrieval of semantic visual information, Doctor of Philosophy, Arizona State University, pp 1-94.

[32]. CoNLL Shared Task: http://www.lsi.upc.edu/~srlconll/


APPENDIX: Experimental results for noun pairs that occur at least 4 times.


Run on the Penn Treebank data set, the program found the following noun pairs with an occurrence frequency of at least 4:
No. | Noun | Noun | Frequency
1 | Company | Sale | 9
2 | Smoking | lung cancer | 9
3 | Smoking | pulmonary problem | 8
4 | Traffic | Noise | 8
5 | Standard & Poor | underwriter | 8
6 | environmental change | erosion | 8
7 | daylight-saving time | Extra hour | 7
8 | over age | retirement | 7
9 | Jewel | robbery | 7
10 | net income | Share | 7
11 | Group | Share | 7
12 | Investors Service Inc. | underwriter | 7
13 | Bank | provision | 7
14 | Investor | Stock | 7
15 | Bad road | traffic jam | 6
16 | War | Death | 6
17 | Poverty | malaria | 6
18 | open-market | investment | 6
19 | poor rain | slower agriculture | 6
20 | each index | 100 | 6
21 | Chicago Board | Trade | 6
22 | program trading | market | 6
23 | Trader | market | 6
24 | HIV positive | sickness | 6
25 | good command | victory | 6
26 | dramatic environmental change | warmer climate | 6
27 | environmental change | ecosystem change | 6

Frequency
6 6 6 6 6 6 6 6 6 6 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5

No. 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61
Soil Fight

Noun
good crop

Noun
wounded people Failure Stock Dollar paid-up capital Merrill Lynch Capital Markets recession hard decision program Price poverty Breast cancer heart disease the close problem Cent Caft Trade Debt chief executive officer infection delayed flight Bay Area ice-melting Bank equaling annual cost concession warming tsunamis company producer equipment Share

Recklessness Company Billion bank underwriter investor Congress Remic issuance market arms race environmental stress high blood pressure each index problem company Cow Merc company president virus Fog damage temperature increase loan index major technological breakthrough volcanic effect undersea earthquake president Warner IBM charge

Frequency
5 5 5 5 5 5 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4

No. 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95
charge spokesman Fannie Mae money sale issue

Noun
Cent company program bank company

Noun

Merrill Lynch Capital Markets a national championship image bank bank cost smoking buy-out scotch and water scotch and water U.S. investor ton share scotch and water Congress president hairyknuckled knock Sierra Club door money power investor market time accident interest rate sleep average

the head coach chip provision bank company report Buy-out great disservice public dollar group company sale Clean Water Act president Congress scotch and water scotch and water scotch and water Trader president future announcement time carelessful driver Fed sleeping pill individual stock

Frequency
4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4

No. 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129
magnitude K mart poverty company K mart K mart

Noun
hazard

Noun
number one job sickness market market-share loss discount store spinal cord injury company average bid plant trading asset business problem retirement first home purchase purchase computer market volatility money state official phone line quake computer troubled thrift Damage computer close investor investor announcement

motor vehicle accident chief executive officer price Buy-out group company close sale planner Early intervention money money retirement money Way earthquake market Different tactic California computer Way Californians nation Earthquake quake announcement portfolio Two-third company

Frequency
4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4

No. 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160
shock wave market department course market Firm

Noun
market investor bill firm firm profit right share market third quarter recession

Noun

hard decision percentage basis Fear loss inflation right right Germany Fund Inc. Plan gainer right right right Congress offering responsibilitie hard decision hard decision group total volume group guardian guardian guardian provision

appropriate material and advice decision share company share life way rest right program guardian complaint fact alleged earlier violation program so-called prior-notice requirement stability price level measure paid-up capital

Note: the pairs marked with "v" are those that express a cause-effect relation.
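The appendix reports noun pairs whose occurrence count reaches a fixed threshold. A minimal sketch of that counting and filtering step is given here, assuming the noun pairs have already been extracted from the corpus (the extraction itself is not shown). The function name `frequent_pairs`, the parameter `min_count`, and the toy input are hypothetical and serve only as illustration; this is not the program used in the thesis.

```python
from collections import Counter


def frequent_pairs(pairs, min_count=4):
    """Count (noun, noun) pairs and keep those seen at least min_count times.

    Returns a list of (noun1, noun2, count) tuples sorted by descending
    count; pairs with equal counts keep their first-seen order.
    """
    counts = Counter(pairs)  # preserves first-seen order of distinct pairs
    rows = [(a, b, n) for (a, b), n in counts.items() if n >= min_count]
    rows.sort(key=lambda row: -row[2])  # stable sort, so ties stay in order
    return rows


# Hypothetical toy input: each tuple is one extracted occurrence of a pair.
observed = (
    [("smoking", "lung cancer")] * 9
    + [("war", "death")] * 6
    + [("fog", "delay")] * 3  # below the threshold, so it is dropped
)

# Print the surviving pairs with a running row number, as in the table.
for no, (noun1, noun2, count) in enumerate(frequent_pairs(observed), start=1):
    print(no, noun1, noun2, count)
```

Deciding which of the surviving pairs actually express a cause-effect relation is a separate step that this sketch does not attempt.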
