1 INTRODUCTION
The significance of semantic search has led to the emergence of natural language interfaces (NLIs) that let users express their information needs in natural language, which substitutes for formal knowledge of ontologies. Because ontologies support knowledge sharing and exchange, they act as a substantial pillar of the semantic web. Although NLIs have been through many evolutionary stages, they still cannot generate precise results because they cannot understand the whole query: most NLIs recognize only part of a natural language query. In addition, NLIs do not tell users which resources are available to search, which leaves users uncertain about which queries will be answered appropriately. This issue is raised in [3]: "Users need knowledge of what it is possible to ask in a particular domain", and in [4]: "Often, users would attempt to paraphrase a sentence many times when the reason for the system's lack of understanding was due to the fact that the system did not have data about the query being asked." This divergence is called the habitability problem. Guided systems are another type of semantic search engine; they are not as flexible as NLIs, but they have high precision rates. Semantic search tools can be divided into four groups depending on their user interface approach: keyword-based, view-based, natural-language-based, and form-based systems [5]. Keyword-based systems accept several keywords as input and generate their equivalent semantic entities. These keyword-based systems give the
R. El-Deeb is with the Arab Academy for Science, Technology & Maritime Transport, Cairo, Egypt. A. Hegazy is with the Arab Academy for Science, Technology & Maritime Transport, Cairo, Egypt. A. Fahmy is with the Faculty of Computers and Information, Cairo University, Cairo, Egypt.
impression of regular information retrieval systems, but they allow users to identify their information needs precisely by interpreting each query term as a semantic phrase. View-based systems support query creation and domain exploration through the presentation and navigation of the ontology structure. Natural language systems translate the natural language sentences submitted by the user into ontological queries via various linguistic techniques. Form-based systems guide the user in constructing semantic queries by means of form structure and form controls that reflect the ontology structure. The main difficulty with this approach is its scalability to large ontologies: scrolling lists limit the number of items they can hold, and a form can contain only a limited number of controls. These issues may limit the usability of form-based interfaces. That is why we believe that data profiling is an essential step toward overcoming the limitations of form-based systems, in addition to improving data quality and the understanding of the data. Data profiling is one of the most valuable technologies for enhancing data accuracy. It examines the data in an existing data source and gathers information and statistics about that data. Data profiling uses aggregates such as count and sum, as well as various descriptive statistics such as minimum, maximum, mean, standard deviation, and variance. Different analyses are carried out at different structural levels. For instance, each column can be profiled independently to understand the frequency distribution of its types and values and how the column is used. Data profiling has various techniques; one of them is the five-number summary, a descriptive statistic that provides information about a set of observations.
It is composed of the five most important sample percentiles: the sample minimum, the lower quartile, the median, the upper quartile, and the sample maximum. Compared with the mean and standard deviation, the five-number summary is in most cases better, especially for describing a skewed distribution or a distribution with extreme outliers. The mean and standard deviation are reasonable for symmetric distributions without outliers, but in real life we cannot always expect the data to be symmetric. It is common practice to report the number of observations (n), mean, median, standard deviation, and range when summarizing data.
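As an illustration of the profiling statistics described above, the following minimal sketch computes the basic aggregates and the five-number summary for a single numeric column. It is not the GHSSA implementation; the function names and the sample column are hypothetical, and the quartiles follow the "inclusive" convention of Python's statistics module, which is only one of several accepted conventions.

```python
import statistics
from collections import Counter


def five_number_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    data = sorted(values)
    if len(data) < 2:
        raise ValueError("need at least two observations")
    # method="inclusive" interpolates between the observed values;
    # other quartile conventions can give slightly different q1 and q3.
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, q2, q3, data[-1]


def profile_column(values):
    """Collect simple aggregates and descriptive statistics for one column."""
    return {
        "count": len(values),
        "sum": sum(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),
        "variance": statistics.variance(values),
        "frequencies": Counter(values),
        "five_number": five_number_summary(values),
    }


# Hypothetical numeric column with one extreme outlier (100).
column = [1, 2, 3, 4, 5, 6, 7, 8, 100]
summary = five_number_summary(column)
profile = profile_column(column)
```

In this hypothetical column the single outlier pulls the mean far above the median, while the quartiles stay close to the bulk of the data, which is the behaviour that makes the five-number summary attractive for skewed value ranges.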
2.1 KEYWORD-BASED SYSTEMS
These systems exploit the availability of unambiguous semantics to improve the performance of conventional keyword search. The most important benefit of keyword-based tools is that they let end-users specify queries in a straightforward manner that is very familiar to them. End-users can use these systems without any prior knowledge of the ontology's exact vocabulary or structure, and without needing to master a special query language. The success of the search depends on how the search algorithm processes the queries and on its keyword selection method. The TAP search engine [6], a keyword-based semantic search system, was one of the pioneers in building such systems. It makes use of conventional keyword search algorithms. The keyword-matching mechanisms of present tools operate at the syntactic level, using string-matching techniques. This makes them domain independent, because they are not tied to domain ontologies, but it also leaves them unable to recognize the information needs of end-users. Therefore, they do not always generate successful search results. As a result, a keyword-based semantic search routine should integrate both semantic and syntactic matching mechanisms, employing domain-specific ontologies and lexical resources such as WordNet to match the user's keyword with its semantic equivalents. This was demonstrated in ZOOM and in the distinctive features presented in [7] and [8], which generated interesting semantic matching results.
2.2 FORM-BASED SYSTEMS
All computer users use forms in their typical day-to-day interactions, which makes forms an accepted approach for semantic search interfaces. By making users select query values from lists of valid expressions, form-based interfaces can overcome the mapping issues that arise in other interaction modes. They let the user envision what acceptable searches look like by showing what exists in the domain, thereby supporting the user's understanding of it. Form-based interfaces are supported by the Corese library tool, which is a portable tool that can assemble and tune parameterized searches. It has been tested on more than ten diverse domains [9]. However, it does not effectively support exploratory browsing like Magnet [10], which is a module of Haystack. Among other approaches, mSpace [11] produces form-like interfaces directly from the domain structure and introduces the user to an unfamiliar information space whose structure the user does not know. On the other hand, it is incapable of searching in a heterogeneous environment.
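The form-based idea described above, in which users pick query values from lists of valid expressions, can be sketched as follows. This is a minimal illustration under assumed names (FormField, build_query, and the sample geography fields are all hypothetical); it does not reproduce Corese, mSpace, or any other cited system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FormField:
    """One form control: a relation plus the values the user may pick."""
    relation: str
    valid_values: tuple


def build_query(fields, selections):
    """Turn the user's selections into (relation, value) constraints.

    Every selection must name a known relation and one of its listed
    values, so a query the domain cannot answer is rejected up front.
    """
    by_relation = {f.relation: f for f in fields}
    constraints = []
    for relation, value in selections.items():
        field = by_relation.get(relation)
        if field is None:
            raise KeyError(f"unknown relation: {relation}")
        if value not in field.valid_values:
            raise ValueError(f"{value!r} is not a valid value for {relation}")
        constraints.append((relation, value))
    return constraints


# Hypothetical geography form with two controls.
fields = [
    FormField("state", ("Texas", "Ohio", "Utah")),
    FormField("feature", ("river", "city", "mountain")),
]
query = build_query(fields, {"state": "Texas", "feature": "river"})
```

Because the user can only submit values that the form lists, the mapping between user vocabulary and domain vocabulary never has to be guessed. The cost is that each list must stay short enough to display, which is the scalability concern raised for large ontologies.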
JOURNAL OF COMPUTER SCIENCE AND ENGINEERING, VOLUME 17, ISSUE 2, FEBRUARY 2013 23
However, if we review the constraints of semantic QA systems, we find that they do not give end-users any hints about the domain they are querying, which means that end-users must already be acquainted with the domain in order to create valid questions. In other words, these systems do not help the user understand the domain, much like keyword search systems. The GINO system [17] tried to solve these limitations by presenting step-by-step directions to the end-user while a query is created in a quasi-English form, to guarantee that only suitable queries are passed on. Another example is ORAKEL [18], which uses two different lexicons: the domain lexicon and the generic, domain-independent lexicon. The domain-independent lexicon consists of general English words, for example question pronouns such as "when" and "how". The domain lexicon is produced on the fly from each knowledge base and thus differs from one application to another. However, a natural language system does have its limitations. The flexibility of constructing a natural language query with numerous query parts comes at the cost of the ability to interpret the whole query, since there is a limit on the number of phrases that can be parsed correctly. Therefore, not all complex parameterized queries can be interpreted. What should be noted, however, is that such a system scales to large ontologies very well.
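The split between a generic lexicon and a domain lexicon generated from the knowledge base can be sketched as below. This is only an illustration of the idea, not ORAKEL's actual implementation; the triple format, the function names, and the sample knowledge base are assumptions made for the example.

```python
# Generic, domain-independent lexicon: a few English question words.
GENERIC_LEXICON = {"when", "how", "what", "which", "where", "who"}


def build_domain_lexicon(triples):
    """Derive a domain lexicon from (subject, relation, object) triples."""
    lexicon = set()
    for subject, relation, obj in triples:
        lexicon.update(term.lower() for term in (subject, relation, obj))
    return lexicon


def classify_tokens(question, domain_lexicon):
    """Label each token as 'generic', 'domain', or 'unknown'."""
    labels = []
    for token in question.lower().rstrip("?").split():
        if token in GENERIC_LEXICON:
            labels.append((token, "generic"))
        elif token in domain_lexicon:
            labels.append((token, "domain"))
        else:
            labels.append((token, "unknown"))
    return labels


# Hypothetical knowledge base about U.S. geography.
triples = [("Austin", "capital", "Texas"), ("Texas", "borders", "Oklahoma")]
domain_lexicon = build_domain_lexicon(triples)
labels = classify_tokens("Which capital borders Oklahoma?", domain_lexicon)
```

Tokens labelled "unknown" correspond to the parts of a question that such a system cannot map to the knowledge base, which is where the habitability problem shows up for the user.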
The main intention of this research is to offer a generic hybrid semantic search approach (GHSSA) that accomplishes the following goals: (1) enlarging the utilization scope and maximizing the benefits of the SFBGSS so that it can be used with any standard English data set on a data-set-independent basis; (2) displaying different comparative operands according to the data type of each relation; (3) performing data profiling on data values for a better user experience; and (4) validating the robustness of this novel approach using an Arabic data set, which adds a new dimension by making the approach multilingual and flexible.
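The goal of displaying different comparative operands according to the data type of each relation can be illustrated with a short sketch. The operator sets, the type names, and the type-inference rule below are assumptions chosen for the example; the paper does not specify which operands the GHSSA offers for each type.

```python
# Hypothetical mapping from a relation's data type to the comparison
# operators a form would offer for it.
OPERATORS_BY_TYPE = {
    "number": ("=", "!=", "<", "<=", ">", ">="),
    "string": ("=", "!=", "contains"),
    "boolean": ("=",),
}


def infer_data_type(values):
    """Guess a relation's data type from its values."""
    if not values:
        raise ValueError("cannot infer a type from an empty value list")
    if all(isinstance(v, bool) for v in values):
        return "boolean"
    # bool is a subclass of int in Python, so exclude it explicitly here.
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
        return "number"
    return "string"


def operators_for(values):
    """Return the comparison operators to display for a relation's values."""
    return OPERATORS_BY_TYPE[infer_data_type(values)]


# Hypothetical relations and their values.
population_ops = operators_for([3.5, 10, 27])
name_ops = operators_for(["Texas", "Ohio"])
```

With such a mapping, a numeric relation would be shown with ordering operators such as "<" and ">=", while a textual relation would only be shown with equality-style operators.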
These data sets were chosen specifically because they have been used in most of the related work in this area, which makes them a credible and standard source and facilitates comparison with other systems. Each data set contains a collection of English queries in addition to the domain knowledge base.
4 GHSSA IMPLEMENTATION
Fig. 5. Geo Data Set Layout Arabization.
announcements posted in the newsgroup austin.jobs. Geo Query Data in Arabic: data for parsing queries about a simple U.S. geography database, translated into Arabic.
The following eight figures show screenshots of the GHSSA with different data sets. Each pair illustrates a user query and its corresponding system response.
5 EXPERIMENTAL RESULTS
The GHSSA was tested on the three Mooney data sets mentioned above and then compared with the NLP-Reduce NLI [19]. Fig. 14 shows the precision and recall performance measures for both systems. Fig. 15 shows the total precision and total recall of the SFBGSS (phase 1) [1] and the GHSSA (phase 2) compared with NLP-Reduce in both phases.
Fig. 14. Comparison between the precision and recall of the 3 Data Sets.
Fig. 15. Comparison between the precision and recall of phase 1 and phase 2.
6 CONCLUSION
In the past few years, various semantic search engines have emerged, each implementing a different approach. The purpose of this research was to enhance the precision of guided systems by adding a form-based interface to them. The GHSSA succeeded in implementing a generic hybrid semantic search engine that overcomes two limitations of natural language interfaces: the habitability problem, by providing the user with the data values in the domain, and the limit on the number of query parts in a phrase that can be interpreted correctly, by displaying all existing relations so that the user can choose as many as required. With regard to the scalability limitation of form-based search engines, we provided data profiling using the five-number summary, which can be applied to any range of values no matter how large it gets. The GHSSA also includes some features that were implemented in previous semantic search engines, such as portability, in addition to some new features that, as far as we know, were not available in other semantic search engines, such as displaying different comparative operands according to the data type of each relation and querying an Arabic data set.
REFERENCES
[1] R. El-Deeb, Abdel Fatah A. Hegazy, Aly Aly Fahmy. Semantic Form-Based Guided Search System. (2012). In the 22nd International Conference on Computer Theory and Applications. Alexandria, Egypt.
[2] L.R. Tang, R.J. Mooney, Using multiple clause constructors in inductive logic programming for semantic parsing. In: 12th European Conf. on Machine Learning, Freiburg, Germany. 2001, pp. 466-477.
[3] A. Bernstein & E. Kaufmann, GINO - A Guided Input Natural Language Ontology Editor. In Proceedings of the 5th International Semantic Web Conference (ISWC 2006). Athens, Georgia, 2006, pp. 144-157.
[4] A. Bernstein, E. Kaufmann, & C. Kaiser, Querying the Semantic Web with Ginseng: A Guided Input Natural Language Search Engine. In Proceedings of the 15th Workshop on Information Technology and Systems (WITS 2005). Las Vegas, NV, 2005, pp. 45-50.
[5] Victoria Uren, Yuangui Lei, Vanessa Lopez, Haiming Liu, Enrico Motta, Marina Giordanino. (2007). The usability of semantic search tools: a review. The Knowledge Engineering Review, pp. 361-377.
[6] Guha, R., McCool, R. & Miller, E. 2003 Semantic search. In 12th International Conference on World Wide Web. pp. 700-709.
[7] Mihalcea, R. & Moldovan, D. 2005 Semantic indexing using WordNet senses. In Proceedings of the ACL-2000 workshop on Recent advances in natural language processing and information retrieval: held in conjunction with the 38th Annual Meeting of the Association for Computational Linguistics. pp. 35-45.
[8] Buscaldi, D., Rosso, P. & Sanchis Arnal, E. 2005 A WordNet-based query expansion method for geographical information retrieval. In CLEF 2005 Workshop at GeoCLEF 2005. Vienna, Austria.
[9] Corby, O., Dieng-Kuntz, R., Faron-Zucker, C. & Gandon, F. 2006 Searching the semantic web: approximate query processing based on ontologies. IEEE Intelligent Systems 21(1), 20-27.
[10] Sinha, V., Karger, D. R. 2005 Magnet: supporting navigation in semistructured data environments. In 2005 ACM SIGMOD International Conference on Management of Data. Baltimore, Maryland, ACM Press, pp. 97-106.
[11] schraefel, m.c., Wilson, M., Russell, A., & Smith, D.A., 2006 mSpace: improving information access to multimedia domains with multimodal exploratory search. Communications of the ACM 49(4), 47-49.
[12] Athanasis, N., Christophides, V. & Kotzinos, D. 2004 Generating on the fly queries for the semantic web: the ICS-FORTH graphical RQL interface (GRQL). In 3rd International Semantic Web Conference (ISWC04). Hiroshima, Japan, pp. 486-501.
[13] Catarci, T., Di Mascio, T., Franconi, E., Santucci, G. & Tessaris, S. An ontology based visual tool for query formulation support. In 16th European Conference on Artificial Intelligence (ECAI04). 2004. Valencia, Spain, pp. 308-312.
[14] Hyvonen, E., Saarela, S. & Viljanen, K. 2003 Ontogator: combining view- and ontology-based search with semantic browsing. In XML Finland 2003, Open Standards, XML, and the Public Sector. Kuopio, Finland, pp. 82-85.
[15] McGuinness, D. 2004 Question answering on the semantic web. IEEE Intelligent Systems 19(1), 82-85.
[16] Lopez, V., Pasin, M. & Motta, E. 2005 AquaLog: an ontology-portable question answering system for the semantic web. In 2nd European Semantic Web Conference (ESWC 2005). Heraklion, Crete, Greece, pp. 546-562.
[17] Bernstein, A. & Kaufmann, E. 2006 GINO - a Guided Input Ontology Editor. In Proceedings of the International Semantic Web Conference. pp. 144-157.
[18] Cimiano, P. 2004 ORAKEL: A natural language interface to an F-Logic knowledge base. In 9th International Conference on Applications of Natural Language to Information Systems (NLDB). pp. 401-406.
[19] E. Kaufmann and A. Bernstein, "How Useful Are Natural Language Interfaces to the Semantic Web for Casual End-Users?," Proceedings of the 6th International Semantic Web Conference (ISWC 2007), Busan, Korea: 2007, pp. 281-294.
[20] A. Bernstein and E. Kaufmann. Making the semantic web accessible to the casual user: Empirical evidence on the usefulness of semiformal query languages. IEEE Transactions on Knowledge and Data Engineering, under review.
[21] A. Bernstein, E. Kaufmann, C. Kaiser, and C. Kiefer, "Ginseng: A Guided Input Natural Language Search Engine for Querying Ontologies," Jena User Conference, Bristol, UK: 2006.
[22] Esther Kaufmann, Abraham Bernstein, Renato Zumstein, Querix: A Natural Language Interface to Query Ontologies Based on Clarification Dialogs, In: 5th International Semantic Web Conference (ISWC 2006), Springer, November 2006.
[23] C. W. Thompson, P. Pazandak, and H. R. Tennant. Talk to your semantic web. IEEE Internet Computing, 9(6):75-78, 2005.