Given the intrinsic meaning of the term “ontology” and the effort required
by ontology building, reuse and reusability are very important issues for
cost-effective, high-quality ontology engineering. While several
methodologies describing the reuse process have already emerged in the
Semantic Web community, the implications of reuse in concrete application
settings have not yet been examined to a satisfactory extent. In this paper
we analyze the costs and benefits of the reuse process on the basis of two
case studies, which attempt to build new ontologies in the domains of
eRecruitment and medicine by means of ontological knowledge sources
available on the Web.
Introduction
Elena Paslaru Bontas, Malgorzata Mochol
1. In fact, large amounts of domain knowledge are encoded in thesauri like Cyc or UMLS (a
medical ontology containing over 300,000 concepts), using proprietary formats or even
natural language, without any tool support for translation.
Towards a reuse-oriented methodology for ontology engineering
The presented methodology, though relatively straightforward and not
tapping the full potential of the newest approaches in ontology
matching/merging, has proved to be very useful and cost-saving in the
application domains presented below, since it does not have to cope with the
limitations related to the heterogeneity of the available source ontologies.
Such heterogeneity issues currently make the automatic usage of matching
techniques a tedious and error-prone process.
2. Ontological primitives are denominated by complex phrases, in which single words are
capitalized or delimited by space, underscore etc.
3. See (Lopez, 2002) for a detailed description of ontology building and its sub-tasks.
The reused size Size_R is divided into the size of the directly integrated
(Size_dir), translated (Size_trans) and/or modified (Size_mod,
Size_transmod) components with different cost drivers: the unfamiliarity of
ontologists and domain experts (OUNF), ontology understanding (OU),
evaluation (OE), modification (OM) and translation (OT). For a detailed
explanation of the cost drivers see (Paslaru and Mochol, 2005).
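The partition of the reused size can be pictured as a small record. The Python sketch below is purely illustrative: the class and field names are hypothetical and merely mirror the size symbols used above, the concept counts are invented, and treating the four components as disjoint parts that add up to the reused size is an assumption made for the example, not a formula quoted from ONTOCOM. The mapping of cost drivers to components is deliberately left out; see (Paslaru and Mochol, 2005) for the actual model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReusedSize:
    """Reused ontology size, split by how each component was reused.

    All values are counts of ontological primitives (invented for the example).
    """
    direct: int               # Size_dir: integrated as-is
    translated: int           # Size_trans: translated, not modified
    modified: int             # Size_mod: modified, not translated
    translated_modified: int  # Size_transmod: translated and modified

    def total(self) -> int:
        # Assumption: the four components are disjoint and sum to the reused size.
        return (self.direct + self.translated
                + self.modified + self.translated_modified)

    def shares(self) -> dict:
        """Fraction of the reused size that falls into each component."""
        size_r = self.total()
        if size_r == 0:
            raise ValueError("no reused components")
        return {
            "direct": self.direct / size_r,
            "translated": self.translated / size_r,
            "modified": self.modified / size_r,
            "translated_modified": self.translated_modified / size_r,
        }


# Hypothetical numbers, not taken from either case study.
example = ReusedSize(direct=600, translated=300, modified=60,
                     translated_modified=40)
print(example.total())
print(example.shares())
```

Keeping the components separate makes it possible to attach a different effort multiplier to each one later, which is the point of distinguishing them in the first place.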
We now turn to the presentation of the two case studies on ontology reuse
from the domains of recruitment and medicine.
The “Knowledge Nets” [4] project explores the potential of the Semantic Web
from a business and a technical viewpoint by means of pre-selected use
scenarios. One of the scenarios analyzed the online job seeking and job
procurement processes and the implications of Semantic Web technologies in
this area (Mochol et al., 2004; Bizer et al., 2005).
The first step towards the realization of the e-Recruitment scenario was the
creation of a human resources ontology (HR-ontology). The requirements
analysis revealed the necessity of aligning the resulting ontology with com-
monly used domain standards and classifications in order to maximize the in-
tegration of job seeker profiles and job postings.
First, we identified the sub-domains of the application setting (skills,
types of professions, etc.) and several useful knowledge sources covering
them (approx. 25). As candidate ontologies we selected some of the most
relevant classifications in the area, deployed by federal agencies or
statistical organizations: Profession Reference Number Classification – BKZ
(text file), Standard Occupational Classification – SOC [5] (text file),
Classification of Industrial Sector – WZ2003 [6] (text file), North American
Industry Classification System – NAICS [7] (text file), Human Resources
XML – HR-XML [8] (XML schema), HR-BA-XML (XML schema) and KOWIEN Skill
Ontology [9] (DAML+OIL).
Depending on the language used in the knowledge sources (Eng-
lish/German) we generated lists of concept names. Except for the KOWIEN
ontology, additional ontological primitives were not supported by the candi-
4. http://nbi.inf.fu-berlin.de/research/wissensnetze
5. http://www.bls.gov/soc/
6. http://www.destatis.de/allg/d/klassif/wz2003.htm
7. http://www.census.gov/epcd/www/naics.html
8. http://www.hr-xml.org
9. KOWIEN - Cooperative Knowledge Management in Engineering Networks;
http://www.kowien.uni-essen.de/
The project “A Semantic Web for Pathology” [10] analyzes the impact of
ontologies within a retrieval system for image and text data for the medical
domain. The underlying ontology is used for concept-based search techniques
and for the semantic annotation of medical data (i.e. medical reports in text
form) (Paslaru et al., 2004).
In order to generate the ontology using available medical sources we
applied the reuse-oriented methodology described in Section “A reuse-centered
methodology for ontology engineering”. First, we identified and analyzed
relevant knowledge sources describing aspects of pathology-related
knowledge and diagnosis procedures. However, the sources to be reused in this
setting differ to a large extent in content area and granularity,
representation format and degree of formality: (i) SNOMED [11] and
DigitalAnatomist [12] describe the anatomy of the lung and typical diseases
(database); (ii) the UMLS Semantic Network [13] contains generic and core
medical concepts as part of UMLS (database format); (iii) XML-HL7 is an
XML-based format for the representation of patient data; and (iv) the
Immunohistology Guidelines are a list of stains to be applied in diagnosis
procedures in our partner healthcare organization (textual description).
10. http://nbi.inf.fu-berlin.de/research/swpatho/deutsch/projektbeschreibung.htm
11. http://www.snomed.org
12. http://www.digitalanatomist.com/
13. Unified Medical Language System, National Library of Medicine;
http://www.nlm.nih.gov/research/umls
The cost and benefit analysis of the presented case studies focused on
estimating the presumed cost savings achieved by reuse. The actual costs
incurred in the two projects were compared with the predicted costs of
building the corresponding ontologies from scratch. The costs of the second
approach were calculated using ONTOCOM (Paslaru and Mochol, 2005).
In the recruitment scenario we found several taxonomies for the
description of skills and the classification of job profiles and industrial
sectors, which we wanted to reuse in our ontology. 15% of the total time was
spent on gathering the relevant sources, while about 35% was invested in
their customization. Several ontologies have been fully integrated into the
resulting ontology, while KOWIEN and the XML-based sources required
additional customization. This part of the ontology building process
produced over 40% of the total engineering costs. The last phase of the
ontology building, refinement and evaluation, cost 10% of the overall
resources.
In our experience, reusing existing knowledge sources was profitable for
the HR domain and for our application setting. A cost estimation for a new
implementation revealed that the reuse approach was more cost-effective
(2.5 PM for the HR-ontology with reuse vs. 4 PM for development from
scratch). At the same time, reusing standard classifications is expected to
considerably increase the usability of our e-Recruitment application.
Nevertheless, there is a need for reliable tools for translating between
various representations and for ontology customization in order to
further optimize reuse costs.
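The saving implied by these two figures can be made explicit with a short calculation. The Python sketch below is only illustrative arithmetic on the two person-month (PM) values reported above; the function name `reuse_savings` is hypothetical and not part of ONTOCOM.

```python
def reuse_savings(reuse_pm: float, scratch_pm: float) -> tuple:
    """Return (absolute saving in PM, relative saving in percent).

    A negative result means that reuse was more expensive than
    building the ontology from scratch.
    """
    if scratch_pm <= 0:
        raise ValueError("scratch_pm must be positive")
    saved = scratch_pm - reuse_pm
    return saved, 100.0 * saved / scratch_pm


# Figures reported for the HR-ontology: 2.5 PM with reuse vs. 4 PM from scratch.
saved_pm, saved_pct = reuse_savings(reuse_pm=2.5, scratch_pm=4.0)
print(f"saved {saved_pm} PM ({saved_pct}%)")  # saved 1.5 PM (37.5%)
```

In other words, reuse saved 1.5 PM, or 37.5% of the estimated from-scratch effort, in the recruitment scenario.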
The main challenge of the second scenario was the evaluation of existing
sources. Medicine is one of the best examples of application domains where
ontologies have already been deployed at large scale and have already
demonstrated their utility (Gangemi et al., 1999). However, most of the
available ontologies in this domain are very comprehensive knowledge bases,
which differ in the formalized domain, quality and appropriateness for
certain application tasks. Additionally, most of the available medical
ontologies lack a “reuse-friendly” representation format.
Since the retrieval system using the ontology is still under development,
the product-oriented benefits of the reuse process cannot be fully
evaluated at this point. However, we may say that, for this scenario, the
effort related to the customization of the source ontologies required over
45% of the time necessary to build the target ontology. A further 15% of the
engineering time was spent on translating the input representation
formalisms to OWL. The reuse-oriented approach gave rise to considerable
effort to evaluate and extend the outcomes (approx. 40% of the total
engineering time).
In our experience, in this case study the benefits of reuse were
outweighed by its costs, because of the difficulties related to the
evaluation and (technical) management of large-scale ontologies and because
of the costs of the subsequent refinement phase. Using ONTOCOM, we
approximated the costs of semi-automatically building a similar ontology on
the basis of a domain-specific document corpus. From a resource point of
view, building the reuse-based ontology involved four times as many
resources as a new implementation (5 person months for the UMLS-based
ontology with 1200 concepts vs. 1.25 person months for a manually developed
ontology). At the same time, the recall of the ontology w.r.t. the semantic
annotation task would consequently be improved in the latter case, because
the generation method stays close to the text.
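To give a feel for what these shares mean in absolute terms, the following Python sketch distributes the 5 person months over the three phases and computes the ratio to the from-scratch estimate. It is illustrative arithmetic only: the text reports "over 45%" and "approx. 40%", so treating the shares as exactly 45, 15 and 40 percent is an assumption, and the names used are hypothetical.

```python
# Assumed exact shares (percent of total engineering time); the reported
# values are "over 45%", "15%" and "approx. 40%".
PHASE_SHARES_PERCENT = {
    "customization of sources": 45,
    "translation to OWL": 15,
    "evaluation and extension": 40,
}

REUSE_PM = 5.0     # UMLS-based ontology, as reported
SCRATCH_PM = 1.25  # estimate for a new implementation, as reported


def effort_by_phase(total_pm: float, shares_percent: dict) -> dict:
    """Split a total effort (in PM) according to percentage shares."""
    if sum(shares_percent.values()) != 100:
        raise ValueError("shares must add up to 100 percent")
    return {phase: total_pm * pct / 100
            for phase, pct in shares_percent.items()}


breakdown = effort_by_phase(REUSE_PM, PHASE_SHARES_PERCENT)
for phase, pm in breakdown.items():
    print(f"{phase}: {pm} PM")

cost_ratio = REUSE_PM / SCRATCH_PM
print(f"reuse cost {cost_ratio} times the from-scratch estimate")
```

Under these assumptions, customization alone (2.25 PM) already exceeds the 1.25 PM estimated for a new implementation.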
In this paper we described two case studies on ontology reuse and a simple
methodology underlying them. Furthermore, we introduced a method to
estimate the costs incurred in ontology building processes, which was used
to analyze the costs and the benefits of reuse in the mentioned case studies.
Ontology integration means not only the translation of the representation
languages to a common format, but also the matching of the resulting
schemes. Our experience during the presented case studies showed that, due to
scalability and heterogeneity issues, neither of these steps can be performed
efficiently using current techniques. This was the fundamental motivation for
applying a possibly less technically sophisticated reuse methodology in the case
studies. However exploiting incrementally the “lowest common denomina-
Acknowledgements
This work is a result of the cooperation within the Semantic Web
PhD-Network Berlin-Brandenburg [14] and has been partially supported by the
KnowledgeWeb - Network of Excellence, by the project “A Semantic Web
for Pathology” funded by the German Research Foundation DFG and by the
“Knowledge Nets” project, which is part of the InterVal - Berlin Research
Centre for the Internet Economy, funded by the German Ministry of Research
BMBF.
Do, H. & Melnik, S. & Rahm, E.; Comparison of schema matching evalua-
tions. Proceedings of the 2nd International Workshop on Web Databases
(German Informatics Society), 2002
14. http://nbi.inf.fu-berlin.de/research/KnowledgeWeb/phd/phd.html
Gruber, T. R.; Toward principles for the design of ontologies used for
knowledge sharing. Int. J. Hum.-Comput. Stud., 43(5-6):907–928, 1995.
Grüninger, M. & Fox, M.; Methodology for the Design and Evaluation of
Ontologies. Proceedings Workshop on Basic Ontological Issues in Knowl-
edge Sharing, IJCAI95, 1995.
Linstone, H. A. & Turoff, M.; The Delphi Method: Techniques and
Applications, Addison-Wesley, 1975.
Paslaru Bontas, E. & Mochol, M.; A Cost Model for Ontology Engineering.
Technical Report, TR-B-05-03, FU Berlin,
ftp://ftp.inf.fu-berlin.de/pub/reports/tr-b-05-03.pdf, 2005.