XML clustering methods
Sohn Jong-Soo
mis026@korea.ac.kr
Intelligent Information System Lab. Korea Univ.
2007.11.06
0. Index
Introduction
XML and XML schema
Relational vs. XML
Paper overview
My works
1. Introduction
XML
XML has become a standard for information exchange and retrieval.
With the continuous growth of XML data, the ability to manage massive collections of XML data and to discover knowledge from them has become essential for web-based information systems.
Clustering methods
Clustering has been applied to database objects, text data and multimedia data.
XML data is different: it is semi-structured and hierarchical.
[Figure: an XML file separates content (XML), structure and style (style sheets: XSL, CSS). XML documents XML-1 to XML-4 are processed with XSLT (DOM, SAX), producing documents labelled XML-13, XML-24 and XML-1234.]
[Figure: document tree in which a chap element contains nested sect elements; node labels: element, attribute, begin.]
<html>
<h1> Chapter 1 </h1>
some free text
<h2> Section 1 </h2>
some more free text
<h3> Section 1.1 </h3>
</html>
4. Paper overview
XML schema clustering with semantic and hierarchical similarity measures
This paper presents an XML schema clustering process that organizes heterogeneous XML schemas into groups.
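The grouping step can be illustrated with a generic clustering routine. The Python sketch below assumes that pairwise similarity scores between schemas have already been computed (the values are hypothetical placeholders for the paper's semantic and hierarchical measures) and applies average-linkage agglomerative clustering. It is an illustration under those assumptions, not the specific algorithm from the paper; `agglomerative_clusters` is a hypothetical name.

```python
from itertools import combinations


def agglomerative_clusters(names, sim, threshold):
    """Average-linkage agglomerative clustering over precomputed similarities.

    names: list of schema names
    sim: dict mapping frozenset({a, b}) -> similarity in [0, 1] (hypothetical input)
    threshold: stop merging once the best average similarity falls below it
    """
    clusters = [[n] for n in names]  # every schema starts in its own cluster

    def avg_link(c1, c2):
        # Average pairwise similarity between the members of two clusters.
        total = sum(sim[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Find the most similar pair of clusters.
        (i, j), best = max(
            (((i, j), avg_link(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best < threshold:
            break
        merged = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so delete j first to keep index i valid
        del clusters[i]
        clusters.append(merged)
    return clusters


if __name__ == "__main__":
    schemas = ["book", "novel", "invoice", "order"]
    # Hypothetical scores standing in for semantic + hierarchical similarity.
    pairs = {
        ("book", "novel"): 0.8, ("invoice", "order"): 0.7,
        ("book", "invoice"): 0.1, ("book", "order"): 0.2,
        ("novel", "invoice"): 0.1, ("novel", "order"): 0.15,
    }
    sim = {frozenset(p): s for p, s in pairs.items()}
    print(agglomerative_clusters(schemas, sim, threshold=0.5))
    # prints [['book', 'novel'], ['invoice', 'order']]
```

The threshold plays the role of a stopping criterion: raising it yields more, tighter groups of schemas, lowering it merges everything into fewer groups.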
Evaluating Structural Similarity in XML Documents
The paper defines an edit distance between XML document trees and develops a dynamic programming algorithm to compute this distance for any pair of documents.
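A minimal sketch of a dynamic-programming distance between XML document trees follows. It assumes a simplified top-down variant in the style of Selkow's tree edit distance: roots are always matched (relabel cost 1 if the tags differ), and child lists are aligned by deleting or inserting whole subtrees (cost = subtree size) or by matching children recursively. The paper's own edit operations and cost model are richer and are not reproduced here; `parse`, `size` and `tree_distance` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET
from functools import lru_cache


def parse(xml_text):
    """Convert an XML string into a hashable tree: (tag, (child, ...))."""
    def convert(elem):
        return (elem.tag, tuple(convert(child) for child in elem))
    return convert(ET.fromstring(xml_text))


def size(tree):
    """Number of nodes in a subtree (cost of inserting or deleting it whole)."""
    return 1 + sum(size(child) for child in tree[1])


@lru_cache(maxsize=None)
def tree_distance(a, b):
    """Top-down edit distance between two ordered trees."""
    relabel = 0 if a[0] == b[0] else 1
    ca, cb = a[1], b[1]
    m, n = len(ca), len(cb)
    # d[i][j] = cost of turning the first i children of a into the first j of b
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = d[i - 1][0] + size(ca[i - 1])
    for j in range(1, n + 1):
        d[0][j] = d[0][j - 1] + size(cb[j - 1])
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(
                d[i - 1][j] + size(ca[i - 1]),  # delete a child subtree of a
                d[i][j - 1] + size(cb[j - 1]),  # insert a child subtree of b
                d[i - 1][j - 1] + tree_distance(ca[i - 1], cb[j - 1]),  # match
            )
    return relabel + d[m][n]


if __name__ == "__main__":
    doc1 = parse("<chap><sect/><sect><sect/></sect></chap>")
    doc2 = parse("<chap><sect/><sect/></chap>")
    print(tree_distance(doc1, doc2))  # prints 1: delete the nested sect
```

Memoizing `tree_distance` on the immutable tuple trees keeps repeated sub-comparisons from being recomputed, which is what makes the recursion a dynamic program.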
A matching algorithm for measuring the structural similarity between an XML document and a DTD and its applications
This paper proposes a matching algorithm for measuring the structural similarity between an XML document and a DTD.
The paper focuses on five applications of the algorithm, including:
(1) the classification of XML documents against a set of DTDs;
(2) the generation of a new schema for a DTD by extracting structural information during the classification of XML documents.
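The first application, classifying a document against a set of DTDs, can be sketched as "score the document against every DTD and keep the best". In the Python sketch below, each DTD is reduced to a hypothetical mapping from element names to allowed child names, and the score is simply the share of structural facts (declared root, permitted parent/child pairs) that the DTD allows. This is an assumed stand-in; the paper's similarity measure is considerably more detailed.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified "DTDs": element name -> allowed child names.
# Real DTDs also constrain order, repetition (?, *, +), choices and attributes.
DTDS = {
    "book": {
        "book": {"chap"},
        "chap": {"title", "sect"},
        "sect": {"title", "sect"},
        "title": set(),
    },
    "invoice": {
        "invoice": {"item", "total"},
        "item": {"name", "price"},
        "name": set(),
        "price": set(),
        "total": set(),
    },
}


def edges(elem):
    """Yield every (parent tag, child tag) pair in the document tree."""
    for child in elem:
        yield elem.tag, child.tag
        yield from edges(child)


def structural_score(root, dtd):
    """Share of checks passed: the root element is declared, and each
    parent/child pair in the document is permitted by the DTD."""
    checks = [root.tag in dtd]
    checks += [child in dtd.get(parent, set()) for parent, child in edges(root)]
    return sum(checks) / len(checks)


def classify(xml_text, dtds):
    """Assign the document to the DTD with the highest score."""
    root = ET.fromstring(xml_text)
    scores = {name: structural_score(root, dtd) for name, dtd in dtds.items()}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    doc = "<book><chap><title/><sect><title/></sect></chap></book>"
    label, scores = classify(doc, DTDS)
    print(label, scores)
```

Because the score is a ratio rather than a yes/no validity check, a document that only partially conforms (for example an invoice with one undeclared element) is still assigned to the closest DTD.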
Schema Matching for Transforming Structured Documents
The paper studies the matching problem in the context of structured document transformations and develops matching methods whose output serves as the basis for the automatic generation of transformation scripts.
Four basic matching processes:
(1) linguistic matching
(2) datatype compatibility
(3) designer type hierarchy
(4) structural matching
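Two of the four processes, linguistic matching and datatype compatibility, can be combined into a single element-to-element score. The Python sketch below uses `difflib.SequenceMatcher` string similarity as an assumed stand-in for linguistic matching, a hypothetical type-compatibility table and a hypothetical weight `w_ling`; designer type hierarchy and structural matching are left out.

```python
from difflib import SequenceMatcher

# Hypothetical compatibility scores between XML Schema built-in type names.
TYPE_COMPAT = {
    ("integer", "decimal"): 0.8,
    ("decimal", "integer"): 0.8,
    ("integer", "string"): 0.4,
    ("date", "string"): 0.4,
}


def linguistic_sim(a, b):
    """Character-based name similarity in [0, 1]; a stand-in for
    dictionary- or thesaurus-based linguistic matching."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def type_sim(t1, t2):
    """Datatype compatibility: 1.0 for identical types, else table lookup."""
    return 1.0 if t1 == t2 else TYPE_COMPAT.get((t1, t2), 0.0)


def match(source, target, w_ling=0.6):
    """For each source element, pick the target element with the best
    weighted combination of linguistic and datatype similarity.

    source, target: dict element name -> XML Schema type name
    w_ling: hypothetical weight of the linguistic score
    """
    result = {}
    for s_name, s_type in source.items():
        scored = [
            (w_ling * linguistic_sim(s_name, t_name)
             + (1 - w_ling) * type_sim(s_type, t_type), t_name)
            for t_name, t_type in target.items()
        ]
        score, best = max(scored)
        result[s_name] = (best, round(score, 2))
    return result


if __name__ == "__main__":
    source = {"authorName": "string", "pubYear": "integer", "price": "decimal"}
    target = {"author": "string", "year": "integer", "cost": "decimal"}
    for name, (best, score) in match(source, target).items():
        print(f"{name} -> {best} ({score})")
```

With the sample schemas, `authorName` pairs with `author` and `pubYear` with `year` mainly on name similarity, while `price` pairs with `cost` because the datatype score outweighs the weak name match.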
5. My works
XML data classification
Using an XML schema and its XML files
Applying the ID3 algorithm with a classification tool on XML data
Problems
XML data is hierarchical
It cannot be represented directly as a flat table
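One possible workaround for the table problem is to flatten each XML record into a row keyed by leaf paths and then apply the ID3 split criterion (information gain) to those columns. The sketch below assumes a hypothetical `<customers>` document in which each `<customer>` element is one record; repeated elements within a record are not handled, and `flatten`, `entropy` and `information_gain` are hypothetical helper names.

```python
import math
import xml.etree.ElementTree as ET
from collections import Counter


def flatten(elem, prefix=""):
    """Turn one XML record into a flat row: leaf path -> text.
    Repeated sibling paths overwrite each other (a simplification)."""
    path = f"{prefix}/{elem.tag}"
    children = list(elem)
    if not children:
        return {path: (elem.text or "").strip()}
    row = {}
    for child in children:
        row.update(flatten(child, path))
    return row


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    total = len(labels)
    return -sum(c / total * math.log2(c / total) for c in Counter(labels).values())


def information_gain(rows, attr, target):
    """ID3 split criterion: entropy of the target minus the weighted
    entropy of the target within each value group of attr."""
    groups = {}
    for row in rows:
        # Missing paths (semi-structured data) form their own None group.
        groups.setdefault(row.get(attr), []).append(row[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[target] for row in rows]) - remainder


XML_DATA = """
<customers>
  <customer><age>young</age><income><level>high</level></income><buys>yes</buys></customer>
  <customer><age>young</age><income><level>low</level></income><buys>no</buys></customer>
  <customer><age>old</age><income><level>high</level></income><buys>yes</buys></customer>
  <customer><age>old</age><income><level>low</level></income><buys>no</buys></customer>
</customers>
"""

if __name__ == "__main__":
    rows = [flatten(c) for c in ET.fromstring(XML_DATA.strip())]
    target = "/customer/buys"
    for attr in ("/customer/age", "/customer/income/level"):
        print(attr, information_gain(rows, attr, target))
```

In this toy data the income level separates buyers perfectly, so ID3 would split on `/customer/income/level` (information gain 1 bit) rather than `/customer/age` (gain 0). Path-keyed columns keep some of the hierarchy in the column names, but information carried by nesting depth or repetition is lost.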
References
E. Bertino, G. Guerrini, M. Mesiti, A matching algorithm for measuring the structural similarity between an XML document and a DTD and its applications, Information Systems 29 (1) (2004) 23-46.
A. Boukottaya, C. Vanoirbeek, Schema matching for transforming structured documents, paper presented at the 2005 ACM Symposium on Document Engineering, Bristol, United Kingdom, November 2-4, 2005.
A. Nierman, H.V. Jagadish, Evaluating structural similarity in XML documents, paper presented at the Fifth International Workshop on the Web and Databases (WebDB 2002), Madison, Wisconsin, USA, 2002.
R. Nayak, W. Iryadi, XML schema clustering with semantic and hierarchical similarity measures, 2006.
http://www.w3c.org/xml