1 Research Scholar, School of Information Technology and Engg., Vellore Institute of Technology, Vellore, saleenaameen@gmail.com
2 Senior Professor, Dept. of Electronics and Instrumentation Engg., St. Joseph's College of Engg., Chennai, profsks@rediffmail.com
3
Abstract
Existing web retrieval systems return a great deal of irrelevant data and lack the ability to provide correct, related information for a particular domain. This paper presents a framework for knowledge extraction that couples the concepts of agents and ontologies. A web of semantically enriched resources is created based on the domain ontology. Software agents annotate the web resources, generate RDF, and create the knowledge base based on the domain ontology. The Java Agent DEvelopment (JADE) Framework was used to implement the agents. A machine-readable ontology is created, and the annotated documents are categorized into relevant information based on that ontology and stored in the knowledge base; a query agent helps in querying according to the user's request. The system was evaluated on a few queries, and the results were judged on their relevance to the users' needs.
Keywords: Semantic web, Agents, Knowledge Extraction, Annotation, Ontology, RDF
1. Introduction
The World Wide Web is a repository of all kinds of information for humans. Because of the rapid growth of the web and its resources, retrieving the required and relevant data is an intimidating process. It is done by visiting a list of web sites, retrieving pieces of information from each of them, and consolidating them manually to assimilate the required knowledge. With the increasing complexity of our systems and our needs, we need to move toward human-level interaction, maximize the amount of semantics we can utilize, and make it easier for users to retrieve meaningful information from the web. Semantic web technologies are a promising way to retrieve the relevant and semantically related resources in a single search. An effective method is to design a system that extracts information from diverse web resources and presents the gathered knowledge in a structured form. Providing a domain-specific vocabulary and sharing web resources gives better access to relevant knowledge and also helps in describing the contents of that knowledge. The semantic web is a way of representing data on the World Wide Web in a meaningful form: a network of information linked up in such a way as to be easily processable by machines. Agents can reduce the work of users by performing routine background tasks such as searching through thousands of documents.
November Issue
Page 1 of 88
International Journal of Advances in Science and Technology, Vol. 3, No. 5, 2011
The word Semantic in the Semantic Web does not mean that computers are going to understand the meaning of anything, but that the logical pieces of meaning can be mechanically manipulated by a machine to useful ends [3]. A standard machine-understandable format called the Resource Description Framework (RDF) is used to represent the data and serves as the foundation on which the semantic web is built. RDF is a language for representing information about web resources. It is designed to be read by computers, not to be displayed to people, and it uses XML as its interchange syntax. The name breaks down as follows:
Resource: anything identifiable with a URI
Description: statements about properties of resources
Framework: a common model for statements using diverse vocabularies
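The statement model described above can be illustrated with a minimal sketch in plain Python. The namespace `http://example.org/tourism#`, the property names `type` and `locatedIn`, and the helper `match` are assumptions made for illustration only; they are not taken from the paper or from any RDF library.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF-style statement about a resource."""
    subject: str    # the resource being described (a URI)
    predicate: str  # a property of that resource (a URI)
    obj: str        # the value: another URI or a literal


# Hypothetical namespace, chosen only for this illustration.
EX = "http://example.org/tourism#"

statements = [
    Triple(EX + "Ooty", EX + "type", EX + "HillStation"),
    Triple(EX + "Ooty", EX + "locatedIn", "Tamil Nadu"),
    Triple(EX + "Kodaikanal", EX + "type", EX + "HillStation"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


# All resources stated to be hill stations.
hill_stations = [
    t.subject
    for t in match(statements, predicate=EX + "type", obj=EX + "HillStation")
]
```

Because every statement has the same three-part shape, a machine can filter and combine statements from different vocabularies without understanding what the terms mean, which is the sense of mechanical manipulation described above.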
1.2 Ontology
An ontology is an explicit specification of a conceptualization. It is the key component in using the semantic web approach to search repositories. The relationships described as part of the ontology allow the user to search on the basis of semantically related terms. At present, web resources associated with the same domain still differ in syntax, structure and semantics. Ontologies help us to integrate the relevant documents under a common framework using RDF. They are meant to provide an understanding of the static domain knowledge that facilitates knowledge sharing and reuse. There are four types of ontologies [5]: domain ontology, generic or common-sense ontology, method ontology, and metadata ontology. The ontology used in this system is a domain ontology that represents the tourism domain. The tourism domain covered in this project relates to three areas: type of trips, accommodation and hotel. The type of trips (sub-domain) is further classified into hill stations, pilgrimage and education trips. Sub-domains can be drilled down further based on an expert's knowledge of the domain.
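As a rough illustration of searching on semantically related terms, the sub-domains named above can be written as a small concept hierarchy and used to expand a query term. This is a sketch under stated assumptions: the identifier spellings (`TypeOfTrip`, `HillStation`, and so on) and the functions `descendants` and `expand_query` are hypothetical, and the paper's actual ontology is richer than this tree.

```python
# Concept hierarchy built from the sub-domains named in the text.
# The identifier spellings are assumptions made for this sketch.
ONTOLOGY = {
    "Tourism": ["TypeOfTrip", "Accommodation", "Hotel"],
    "TypeOfTrip": ["HillStation", "Pilgrimage", "EducationTrip"],
}


def descendants(concept, ontology=ONTOLOGY):
    """Return every concept below `concept`, in depth-first order."""
    found = []
    for child in ontology.get(concept, []):
        found.append(child)
        found.extend(descendants(child, ontology))
    return found


def expand_query(term, ontology=ONTOLOGY):
    """Expand a search term with all of its narrower concepts."""
    return [term] + descendants(term, ontology)


expanded = expand_query("TypeOfTrip")
```

A query for the broader concept then also reaches documents filed under its narrower concepts, which a plain keyword match would miss.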
2. Related Work
The growth of the semantic web has led to effective knowledge management of the diverse information present on the web. A few research works carried out in the area of the semantic web and agents are discussed here. The Artequakt project of Harith Alani et al. (2003) links a knowledge-extraction tool with an ontology to achieve continuous knowledge support and guide information extraction. The extraction tool searches online documents and extracts knowledge that matches the given classification structure [1]. It provides this knowledge in a machine-readable format that is automatically maintained in a knowledge base. Problems related to this task, such as the identification and consolidation of duplicated knowledge and the verification of inconsistent knowledge, are highlighted. The work of Joshua Tauberer (2006) deals with the need for the Resource Description Framework (RDF) to represent knowledge in the semantic web [3], in which computer applications make use of distributed, decentralized, structured information spread throughout the current web. Yi Xiao et al. (2007) propose an agents-based intelligent retrieval framework for the semantic web [8]. It is combined with other technologies such as information retrieval, knowledge modeling and ontology construction to perform the retrieval. There is much ongoing research on the semantic web, on intelligent agents, and on combining the two to extract information and respond to users' search queries, including in the tourism domain. However, none of these systems focuses on providing details about the different amenities available at a tourist spot in a single search to guide the tourist efficiently. Traditional search engines require many searches to gather information about a single tourist spot.
For instance, to retrieve information about a particular hill station and to find the hospitals or ATMs in that location, different search queries have to be posted. The proposed system is designed so that the agents gather all the information pertaining to a particular tourist spot to guide the tourist efficiently. To accomplish this task, a combination of semantic web and agent technologies is used.
3. System Architecture
This paper aims to use semantic web technologies and intelligent agents to design a framework for knowledge extraction, as shown in Figure 1. To design this framework, the following three phases were implemented using semantic web technologies and agents:
Annotation of web resources
Creating a knowledge base
Knowledge extraction
The system consists of an annotation agent, a knowledge agent and a query agent. Together they convert the web resources to RDF, a machine-readable format that is used to create a web of interrelated and meaningful information (a semantic web) based on the domain ontology. The domain used to implement this prototype is tourism.
A sample RDF file is generated by annotating the details of a website for the tourism domain. The website contains the details of hill stations in Tamil Nadu. The example in Figure 3 shows the RDF generated for a web site registered by an owner for the Ooty and Kodaikanal hill stations. The sample RDF is not designed to be displayed to people; it only shows what an RDF file looks like. The main goal of the annotation agent is to provide a machine-readable description of the contents of web-accessible resources.
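The annotation step can be sketched as a function that turns a few facts about a web resource into an RDF/XML description. This is a minimal illustration using only the Python standard library, not the JADE agent used in the paper and not a reproduction of Figure 3. The vocabulary namespace `http://example.org/tourism#`, the resource URI, the property names and the function `annotate` are all assumptions.

```python
import xml.etree.ElementTree as ET

# The RDF namespace is the standard one; the tourism vocabulary is hypothetical.
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX_NS = "http://example.org/tourism#"


def annotate(resource_uri, properties):
    """Build an RDF/XML document describing a single web resource."""
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("ex", EX_NS)
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    # rdf:Description with rdf:about identifies the resource being described.
    description = ET.SubElement(
        root,
        f"{{{RDF_NS}}}Description",
        {f"{{{RDF_NS}}}about": resource_uri},
    )
    # Each property becomes a child element holding a literal value.
    for name, value in properties.items():
        ET.SubElement(description, f"{{{EX_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical site details, standing in for what a site owner registers.
rdf_xml = annotate(
    "http://example.org/sites/ooty",
    {"place": "Ooty", "category": "HillStation", "state": "Tamil Nadu"},
)
print(rdf_xml)
```

The output is a single `rdf:Description` element whose child elements are the annotated properties. A production annotator would more likely rely on a dedicated RDF library, so this sketch only shows the shape of the result.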
The knowledge agent takes the generated RDF and groups it into triples. The objects are grouped based on the subject and the predicate of the triples, and this grouping follows the ontology. A small portion of the domain ontology is shown in Figure 5.
Figure 5: Sample Domain Ontology
An ontology library is created based on the domain ontology. The ontology library (repository) automatically creates folders that follow its tree structure. The objects in the triples are matched to the corresponding folders. In this way the knowledge agent gathers all the related information from different websites and groups it into a common domain, so that the end user receives only relevant information during a search.
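The folder-based library described above can be sketched as follows. The paper does not spell out how objects are matched to folders, so this sketch assumes that each predicate names an ontology concept and that objects are written to a text file named after the subject. The concept names, the sample triples and the functions `create_library` and `store` are hypothetical.

```python
import tempfile
from pathlib import Path

# Ontology tree as nested dictionaries; concept spellings are assumptions.
ONTOLOGY = {
    "Tourism": {
        "TypeOfTrip": {"HillStation": {}, "Pilgrimage": {}, "EducationTrip": {}},
        "Accommodation": {},
        "Hotel": {},
    }
}

# Hypothetical triples; real subjects would be URIs, shortened here
# so that they can serve as file names.
TRIPLES = [
    ("site1", "HillStation", "Ooty"),
    ("site1", "HillStation", "Kodaikanal"),
    ("site2", "Hotel", "Hotel Lakeview"),
    ("site2", "Weather", "Cold"),  # not an ontology concept, so it is skipped
]


def create_library(tree, root):
    """Create one folder per concept, mirroring the tree; map concept to path."""
    paths = {}
    for concept, children in tree.items():
        folder = root / concept
        folder.mkdir(parents=True, exist_ok=True)
        paths[concept] = folder
        paths.update(create_library(children, folder))
    return paths


def store(triples, paths):
    """Group objects by (subject, predicate) and file them under the concept."""
    grouped = {}
    for subject, predicate, obj in triples:
        grouped.setdefault((subject, predicate), []).append(obj)
    for (subject, predicate), objects in grouped.items():
        folder = paths.get(predicate)
        if folder is None:
            continue  # predicate does not correspond to any ontology concept
        with open(folder / f"{subject}.txt", "a", encoding="utf-8") as handle:
            handle.write("\n".join(objects) + "\n")
    return grouped


with tempfile.TemporaryDirectory() as tmp:
    library = create_library(ONTOLOGY, Path(tmp))
    grouped = store(TRIPLES, library)
    stored = sorted(
        p.relative_to(tmp).as_posix() for p in Path(tmp).rglob("*.txt")
    )
```

Under these assumptions, information from different sites about the same concept ends up side by side in one folder, so a later query only has to look in the folder for the concept it asks about.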
Table 1: Results of Information Extracted from Knowledge Base
                              Nature of Information Extracted
                              Exact    Relevant    Irrelevant
Retrieval without Stemming    43       39          20
6. References
[1] Harith Alani, Sanghee Kim, David E. Millard, Mark J. Weal, Wendy Hall, Paul H. Lewis, and Nigel R. Shadbolt, Automatic Ontology-Based Knowledge Extraction from Web Documents, IEEE Intelligent Systems, 2003
[2] Natalya F. Noy and Deborah L. McGuinness, Ontology Development 101: A Guide to Creating Your First Ontology, http://protege.stanford.edu/publications/ontology_development/ontology101-noymcguinness.html
[3] Joshua Tauberer, What is RDF and what is it good for?, http://www.rdfabout.com/intro/, July 2006
[4] Phil Cross, Libby Miller, Sean Palmer, Using RDF to Annotate the (Semantic) Web, http://www.ilrt.bris.ac.uk/publications/researchreport/rr1015/report_html, June 2003
[5] A. Aldea, R. Banares, J. Bocio, J. Gramajo, D. Isern, A. Kokossis, L. Jimenez, A. Moreno, D. Riano, An Ontology-Based Knowledge Management Platform, http://www.isi.edu/info-agents/workshops/ijcai03/papers/DIsern-article-ijcai.pdf, 2003
[6] White, M., Korelsky, T., Cardie, C., Ng, V., Pierce, D., Wagstaff, K.: Multi document Summarization via Information Extraction. Proc. of Human Language Technology Conf. (HLT 2001), San Diego, CA, 2000.
[7] Reidsma, D., Kuper, J., Declerck, T., Saggion, H., Cunningham, H.: Cross document annotation for multimedia retrieval. EACL Workshop on Language Technology and the Semantic Web, Budapest, 2003.
[8] Yi Xiao, Ming Xiao, Fan Zhang, Agents-based Intelligent Retrieval Framework for the Semantic Web, International Conference on Wireless Communications, Networking and Mobile Computing (IEEE), 21-25 Sept. 2007, pp. 5357-5360
Authors Profile
Dr. S.K. Srivatsa was born in Bangalore on 21st July 1945. He received his Bachelor of Electronics and Telecommunication Engineering degree (Honors) from Jadavpur University (securing first rank and two medals), and his Master's degree in Electrical Communication Engineering (with distinction) and Ph.D. from the Indian Institute of Science, Bangalore. He retired as a Professor of Electronics Engineering from Anna University in 2005 and has been working as a Senior Professor at St. Joseph's College of Engineering since August 2005. He has taught twenty-two different courses at the P.G. level during the last 34 years. He has functioned as a member of the Board of Studies in some educational institutions. He is a life Fellow/Member of about two dozen registered professional societies. He has received about a dozen awards. He is the author of well over 450 publications in reputed journals and conferences. He has produced 34 Ph.D.s. His research interests pertain to Electronics and Computer Science.
B. Saleena is currently pursuing her research at Vellore Institute of Technology (V.I.T.) University. She received her MCA degree from Madras University and her M.E. (Comp. Science) from Anna University (with distinction). For the past 14 years she has been working at B.S. Abdur Rahman Crescent Engineering College, having progressed to the level of Asst. Professor (Sel. Gr.) in the Department of Computer Applications. Her research interests are in the area of the semantic web and ontology engineering. She has taught ten different courses at the P.G. level during the last 14 years. Her research work has been presented and published in both national and international conferences and journals.
M. Manickasundaram is currently working as a Senior Test Analyst in the Royal Bank of Scotland group. He received his MCA degree from Madras University. For the past 7 years he has been working in the software testing field. Major accomplishments during this period include establishing a standardized test methodology for testing data warehouses, publishing a factory model for validating Business Intelligence reports, and formulating and standardizing a test strategy for validating iPhone applications. His research interests are in the area of the Semantic Web and Agent Programming.