
3/4/2014

XML and Semistructured Data Management
Eren Erener
CSC 500 Presentation
Forms of Data

Unstructured Data
  data can be of any type
  not necessarily following any format or sequence
  does not follow any rules
  is not predictable
  examples: text, video, sound, images

Structured Data
  data is organized in semantic chunks (entities)
  similar entities are grouped together (relations)
  entities in the same group have the same attributes
  descriptions for all entities in a group (schema)
    have the same defined format
    have a predefined length
    are all present

Semistructured Data
  data is available electronically in
    file systems
    Web data
  attempt to reconcile database and document "worlds"
  semistructured data
    organized in semantic entities
    similar entities are grouped together
    entities in the same group may not have the same attributes
    order of attributes is not necessarily important
    not all attributes may be required
    size of the same attribute in a group may differ
    type of the same attribute in a group may differ
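The difference between a structured and a semistructured group of entities can be sketched in a few lines of Python. This is only an illustration: the record layouts below are invented (they reuse two book titles from the bibliography example), and the helper names `attribute_sets` and `follows_single_schema` are hypothetical.

```python
# Hypothetical records, invented for illustration only.

# Structured: every entity in the group follows one schema.
structured_books = [
    {"title": "Data Processing", "year": 2012},
    {"title": "Data on the Web", "year": 1999},
]

# Semistructured: same group, but attributes may be missing,
# repeated, reordered, or of a different type between entities.
semistructured_books = [
    {"title": "Data Processing", "year": 2012, "author": ["Kroenke", "Auer"]},
    {"title": "Data on the Web", "year": "1999"},         # year as text, no author
    {"author": "Abiteboul", "title": "Data on the Web"},  # different order, no year
]

def attribute_sets(entities):
    """Return the distinct sets of attribute names found in a group."""
    return {frozenset(e) for e in entities}

def follows_single_schema(entities):
    """True when all entities share the same attributes and attribute types."""
    if len(attribute_sets(entities)) > 1:
        return False
    typed = {
        tuple(sorted((k, type(v).__name__) for k, v in e.items()))
        for e in entities
    }
    return len(typed) <= 1

print(follows_single_schema(structured_books))      # True
print(follows_single_schema(semistructured_books))  # False
```

The second group is still organized into recognizable entities, which is what separates it from unstructured data; it simply cannot be described by one fixed schema.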
Examples of Data
Data
  Unstructured
    Analog data
    GPS tracking info
    A/V streams
  Semistructured
    XML
    E-mail
    File systems
  Structured
    Databases
    Data warehouses
    Enterprise systems (CRM, ERP, etc.)
Why Semistructured Data is Important
There are data sources such as
  the Web
  HTML
  XML
  file systems
We would like to treat this data as a database
But it cannot be constrained by a schema
Managing such data requires rethinking the design of the components of a DBMS
New Universal Data Exchange Format: XML
XML = data
  XML generated by applications
  XML consumed by applications
  Easy access: across platforms, organizations
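A minimal sketch of "generated by applications, consumed by applications", using Python's standard `xml.etree.ElementTree` module. The `produce` and `consume` functions and the `<book>` layout are assumptions made for illustration; the slides do not prescribe any particular message format.

```python
import xml.etree.ElementTree as ET

def produce(title, authors, year):
    """Producer side: serialize one record as an XML string."""
    book = ET.Element("book")
    ET.SubElement(book, "title").text = title
    for name in authors:
        ET.SubElement(book, "author").text = name
    ET.SubElement(book, "year").text = str(year)
    return ET.tostring(book, encoding="unicode")

def consume(xml_text):
    """Consumer side: parse the XML string back into plain Python values."""
    book = ET.fromstring(xml_text)
    return {
        "title": book.findtext("title"),
        "authors": [a.text for a in book.findall("author")],
        "year": int(book.findtext("year")),
    }

message = produce("Data Processing", ["Kroenke", "Auer"], 2012)
print(message)
print(consume(message))
```

The producer and the consumer share only the text of `message`, so they could just as well run on different platforms or in different organizations.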
XML?
eXtensible Markup Language (XML)
  a markup language
  defines a set of rules for encoding documents
    human-readable
    machine-readable
Motivation:
  HTML describes presentation
  XML describes content
HTML vs. XML

HTML (describes the presentation)
<h1>Bibliography</h1>
<p><i>Data Processing</i>
Kroenke, Auer
<br>Pearson, 2012
<p><i>Data on the Web</i>
Abiteboul, Buneman, Suciu
<br>Morgan Kaufmann, 1999

XML (describes the data)
<bibliography>
  <book>
    <title>Data Processing</title>
    <author>Kroenke</author>
    <author>Auer</author>
    <publisher>Pearson</publisher>
    <year>2012</year>
  </book>
  <book>
    <title>Data on the Web</title>
    ...
  </book>
</bibliography>
How to convert HTML to XML?
The objective of a wrapper is to extract the relevant information from the HTML page and to transform it into a format (XML) that can be easily reused by applications
Today we create and publish documents in XML format
  PPTX, DOCX, XLSX
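The wrapper idea can be sketched with Python's standard `html.parser` and `xml.etree.ElementTree` modules. This is one possible approach under strong assumptions, not a general converter: the class `BibliographyWrapper` and the function `html_to_xml` are hypothetical names, and the code only understands the page layout of the bibliography example (`<p><i>title</i> authors <br> publisher, year`).

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

PAGE = """<h1>Bibliography</h1>
<p><i>Data Processing</i>
Kroenke, Auer
<br>Pearson, 2012
<p><i>Data on the Web</i>
Abiteboul, Buneman, Suciu
<br>Morgan Kaufmann, 1999
"""

class BibliographyWrapper(HTMLParser):
    """Hypothetical wrapper for one layout: <p><i>title</i> authors <br> publisher, year."""

    def __init__(self):
        super().__init__()
        self.records = []  # one dict of text buffers per <p>
        self.field = None  # name of the buffer currently receiving text

    def handle_starttag(self, tag, attrs):
        if tag == "p":  # each <p> starts a new book entry
            self.records.append({"title": "", "authors": "", "imprint": ""})
            self.field = None
        elif tag == "i" and self.records:
            self.field = "title"
        elif tag == "br" and self.records:
            self.field = "imprint"

    def handle_endtag(self, tag):
        if tag == "i" and self.records:  # text after </i> lists the authors
            self.field = "authors"

    def handle_data(self, data):
        if self.field and self.records:
            self.records[-1][self.field] += data

def html_to_xml(html):
    """Extract the book entries from the page and rebuild them as an XML tree."""
    wrapper = BibliographyWrapper()
    wrapper.feed(html)
    wrapper.close()
    root = ET.Element("bibliography")
    for rec in wrapper.records:
        book = ET.SubElement(root, "book")
        ET.SubElement(book, "title").text = rec["title"].strip()
        for name in rec["authors"].split(","):
            if name.strip():
                ET.SubElement(book, "author").text = name.strip()
        publisher, _, year = rec["imprint"].rpartition(",")
        ET.SubElement(book, "publisher").text = publisher.strip()
        ET.SubElement(book, "year").text = year.strip()
    return root

root = html_to_xml(PAGE)
print(ET.tostring(root, encoding="unicode"))
```

Because the wrapper depends on presentation details such as `<i>` and `<br>`, it breaks as soon as the page layout changes, which is part of the motivation for publishing the data as XML in the first place.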
How are we going to model XML?
  Data Graph
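The slide names the data graph without defining it. One common reading, assumed here, is an edge-labeled graph: nodes are anonymous identifiers, edges carry element names, and leaf nodes hold text values. The function `xml_to_graph`, the `@` prefix for attributes, and the `None` source of the entry edge are illustrative choices, not something the slides specify.

```python
import xml.etree.ElementTree as ET
from itertools import count

def xml_to_graph(xml_text):
    """Return (edges, values).

    edges  : list of (parent_id, label, child_id) triples
    values : dict mapping leaf node ids to their text
    """
    ids = count()
    edges, values = [], {}

    def visit(elem, node_id):
        # Attributes become edges labeled "@name" that lead to value nodes.
        for name, val in elem.attrib.items():
            child = next(ids)
            edges.append((node_id, "@" + name, child))
            values[child] = val
        children = list(elem)
        if not children:
            # An element without child elements is a leaf holding its text.
            values[node_id] = (elem.text or "").strip()
        for sub in children:
            child = next(ids)
            edges.append((node_id, sub.tag, child))
            visit(sub, child)

    root = ET.fromstring(xml_text)
    root_id = next(ids)
    edges.append((None, root.tag, root_id))  # entry edge into the graph
    visit(root, root_id)
    return edges, values

edges, values = xml_to_graph(
    '<book category="WEB">'
    '<title lang="en">Learning XML</title>'
    "<author>Erik T. Ray</author>"
    "<year>2003</year>"
    "<price>39.95</price>"
    "</book>"
)
for parent, label, child in edges:
    print(parent, label, child, values.get(child, ""))
```

In this representation a path query such as `/book/title` amounts to following edges labeled `book` and then `title` from the entry point.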
Querying Semistructured Data
<?xml version="1.0" encoding="UTF-8"?>
<bookstore>
  <book category="COOKING">
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year>
    <price>30.00</price>
  </book>
  <book category="CHILDREN">
    <title lang="en">Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
  <book category="WEB">
    <title lang="en">XQuery Kick Start</title>
    <author>James McGovern</author>
    <author>Per Bothner</author>
    <author>Kurt Cagle</author>
    <author>James Linn</author>
    <author>Vaid Nagarajan</author>
    <year>2003</year>
    <price>49.99</price>
  </book>
  <book category="WEB">
    <title lang="en">Learning XML</title>
    <author>Erik T. Ray</author>
    <year>2003</year>
    <price>39.95</price>
  </book>
</bookstore>
Query 1:
doc("books.xml")/bookstore/book/title
Result 1:
<title lang="en">Everyday Italian</title>
<title lang="en">Harry Potter</title>
<title lang="en">XQuery Kick Start</title>
<title lang="en">Learning XML</title>
Query 2:
doc("books.xml")/bookstore/book[price<30]
Result 2:
<book category="CHILDREN">
  <title lang="en">Harry Potter</title>
  <author>J K. Rowling</author>
  <year>2005</year>
  <price>29.99</price>
</book>
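Both queries can be approximated in Python with the standard `xml.etree.ElementTree` module. This is a sketch, not an XQuery engine: ElementTree supports only a subset of XPath and has no numeric comparison predicates, so the `price<30` condition is applied in Python instead. The document is embedded as a string (without the XML declaration) in place of `books.xml`.

```python
import xml.etree.ElementTree as ET

BOOKS_XML = """<bookstore>
  <book category="COOKING">
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year>
    <price>30.00</price>
  </book>
  <book category="CHILDREN">
    <title lang="en">Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
  <book category="WEB">
    <title lang="en">XQuery Kick Start</title>
    <author>James McGovern</author>
    <author>Per Bothner</author>
    <author>Kurt Cagle</author>
    <author>James Linn</author>
    <author>Vaid Nagarajan</author>
    <year>2003</year>
    <price>49.99</price>
  </book>
  <book category="WEB">
    <title lang="en">Learning XML</title>
    <author>Erik T. Ray</author>
    <year>2003</year>
    <price>39.95</price>
  </book>
</bookstore>"""

# The parsed root element is <bookstore>, so paths start below it.
bookstore = ET.fromstring(BOOKS_XML)

# Query 1: /bookstore/book/title
titles = [t.text for t in bookstore.findall("./book/title")]
print(titles)

# Query 2: /bookstore/book[price<30]
# The numeric predicate is evaluated in Python, not by the path expression.
cheap = [
    book
    for book in bookstore.findall("./book")
    if float(book.findtext("price")) < 30
]
for book in cheap:
    print(book.get("category"), book.findtext("title"), book.findtext("price"))
```

The first book costs exactly 30.00, so the strict comparison excludes it and only the 29.99 title remains, matching Result 2.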
Thanks
