
IPASJ International Journal of Computer Science (IIJCS)

A Publisher for Research Motivation ........

Volume 2, Issue 1, January 2014

Web Site: http://www.ipasj.org/IIJCS/IIJCS.htm Email: editoriijcs@ipasj.org ISSN 2321-5992

A LITERATURE SURVEY ON INFORMATION RETRIEVAL IN WEB-BASED DATA


D. Saravanan 1, S. Vaithyasubramanian 2, K. N. Jothi Venkatesh 3
1 Assistant Professor, Department of Computer Applications, Sathyabama University, Chennai-600119, Tamil Nadu, India
2 Assistant Professor, Department of Mathematics, Sathyabama University, Chennai-600119, Tamil Nadu, India
3 Assistant Professor, Department of Computer Applications, Sathyabama University, Chennai-600119, Tamil Nadu, India

ABSTRACT
Over the last few decades, the amount of web-based information available has increased dramatically, and gathering useful information from the web has become a challenging task for users. Current web information gathering systems attempt to satisfy user requirements by capturing their information needs. Many existing solutions assume that all documents are generated from a single common template, and are therefore applicable only when all documents are guaranteed to conform to that template. In real applications, however, it is not trivial to classify massively crawled documents into homogeneous partitions so that these techniques can be applied. In this paper, we present a literature survey on information retrieval for web-based documents.

Keywords: Data mining, Information retrieval, Web information, Web-based data, Ontology.

1. INTRODUCTION
To simulate user concept models, ontologies, which are knowledge description and formalization models, are utilized in personalized web information gathering. Such ontologies are called ontological user profiles or personalized ontologies. To represent user profiles, many researchers have attempted to discover user background knowledge through global or local analysis.

Global analysis uses existing global knowledge bases for user background knowledge representation. Commonly used knowledge bases include generic ontologies (e.g., WordNet [26]), thesauruses (e.g., digital libraries), and online knowledge bases (e.g., online categorizations and Wikipedia). Global analysis techniques produce effective performance for user background knowledge extraction; however, global analysis is limited by the quality of the knowledge base used. For example, WordNet was reported to be helpful in capturing user interest in some areas but useless in others.

Local analysis investigates a user's local information or observes user behavior in user profiles. For example, Li and Zhong discovered taxonomical patterns from the users' local text documents to learn ontologies for user profiles. Some groups learned personalized ontologies adaptively from users' browsing history, while Sekine and Suzuki analyzed query logs to discover user background knowledge. In other works, users were provided with a set of documents and asked for relevance feedback, and user background knowledge was then discovered from this feedback for user profiles. However, because local analysis relies on data mining or classification techniques for knowledge discovery, the discovered results occasionally contain noisy and uncertain information. As a result, local analysis suffers from ineffectiveness at capturing formal user knowledge. From this, we can hypothesize that user background knowledge can be better discovered and represented if global and local analysis are integrated within a hybrid model: the knowledge formalized in a global knowledge base constrains the background knowledge discovery from the user's local information. Such a personalized ontology model should produce a superior representation of user profiles for web information gathering.

2. LITERATURE REVIEW


Chris Buckley, Ellen M. Voorhees, "Evaluating Evaluation Measure Stability"

2.1 PROBLEM FORMULATION
Information retrieval has a well-established tradition of performing laboratory experiments on test collections to compare the relative effectiveness of different retrieval approaches. The experimental design specifies the evaluation criterion used to determine whether one approach is better than another. Retrieval behavior is sufficiently complex that it is difficult to summarize in a single number, and many different effectiveness measures have been proposed.

2.1.1 RESEARCH DESIGN
The test collection must have a reasonable number of requests. Sparck Jones and van Rijsbergen suggested a minimum of 75 requests, while the TREC program committee has used 25 requests as a minimum and 50 requests as the norm; five or ten requests is too few. The experiment must use a reasonable evaluation measure. Average Precision, R-Precision, and Precision at 20 (or 10 or 30) documents retrieved are the most commonly used measures. Measures such as Precision at one document retrieved (i.e., whether the first retrieved document is relevant) or the rank of the first relevant document are not usually reasonable evaluation measures. Conclusions must be based on a reasonable notion of difference. Sparck Jones suggested that a difference in scores between two runs greater than 5% is noticeable, and a difference greater than 10% is material.

2.1.2 FINDINGS
This paper examines these three rules-of-thumb and shows how they interact with each other. A novel approach is presented for experimentally quantifying the likely error associated with the conclusion "method A is better than method B" given a number of requests, an evaluation measure, and a notion of difference. As expected, the error rate increases as the number of requests decreases. More surprisingly, a striking difference in error rates for various evaluation measures is demonstrated. For example, Precision at 30 documents retrieved has almost twice the error rate of Average Precision. These results do not imply that measures with higher error rates should not be used; different evaluation measures evaluate different aspects of retrieval behavior, and measures must be chosen to match the goals of the test. The results do mean that, for a researcher to be equally confident in the conclusion that one method is better than another, experiments based on measures with higher error rates require either more requests or larger differences in scores than experiments based on measures with lower error rates.

2.1.3 CONCLUSION
This paper presents a method for quantifying how the number of requests, the evaluation measure, and the notion of difference used in an information retrieval experiment affect the confidence that can be placed in the conclusions drawn from the experiment. It is shown that some evaluation measures are inherently more stable than others. For example, Precision after 10 documents are retrieved has more than twice the error rate of Average Precision. Conclusions drawn from experiments using more requests are more reliable than conclusions drawn from experiments using fewer requests. Requiring a larger difference between scores before considering the respective retrieval methods to be truly different increases reliability, but at the cost of not being able to discriminate between as many methods.
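To make the measures named above concrete, the following minimal sketch (toy data, not taken from the surveyed paper) computes Precision at k and Average Precision for a single ranked result list.

```python
# Illustrative sketch (not from the surveyed paper): two of the evaluation
# measures discussed above, computed for one ranked result list.
# `ranking` is a list of document ids in retrieval order; `relevant` is the
# set of ids judged relevant for the request.

def precision_at_k(ranking, relevant, k):
    """Fraction of the top-k retrieved documents that are relevant."""
    top_k = ranking[:k]
    return sum(1 for doc in top_k if doc in relevant) / k

def average_precision(ranking, relevant):
    """Mean of the precision values at the rank of each relevant document."""
    hits, precision_sum = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant) if relevant else 0.0

# Example: relevant documents d1 and d4 are retrieved at ranks 1 and 3.
ranking = ["d1", "d7", "d4", "d9", "d2"]
relevant = {"d1", "d4"}
print(precision_at_k(ranking, relevant, 5))   # 0.4
print(average_precision(ranking, relevant))   # (1/1 + 2/3) / 2 = 0.833...
```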
2.1.4 IMPLICATION
The goal is to use the data from the Query Track to quantify the error rate associated with deciding that one retrieval method is better than another, given that the decision is based on an experiment with a particular number of topics, a specific evaluation measure, and a particular value used to decide whether two scores are different. The approach is as follows. First, an evaluation measure and a fuzziness value are chosen. The fuzziness value is the percentage difference between scores such that, if the difference is smaller than the fuzziness value, the two scores are deemed equivalent. For example, if the fuzziness value is 5%, any scores within 5% of one another are counted as equal. A query set is then picked and the mean of the evaluation measure over that query set is computed for each of the nine retrieval methods. For each pair of retrieval methods, a comparison is made as to whether the first method is better than, worse than, or equal to the second method with respect to the fuzziness value. Another query set is selected and the comparison is repeated multiple times. This results in a 9x9 triangular matrix giving the number of times each retrieval method was better than, worse than, and equal to each other retrieval method over the query sets used.
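The fuzziness-based comparison of two methods on one query set can be sketched as below. The relative-difference convention (dividing by the larger mean) is an assumption made for illustration, not necessarily the exact formula used by Buckley and Voorhees.

```python
# Illustrative sketch (assumed details, not the authors' code): compare two
# retrieval methods on one query set using a fuzziness value, as described above.

def compare_methods(scores_a, scores_b, fuzziness=0.05):
    """Return 'better', 'worse', or 'equal' for method A versus method B.

    scores_a, scores_b: per-query values of the chosen evaluation measure.
    fuzziness: relative difference below which the two means are deemed equal.
    """
    mean_a = sum(scores_a) / len(scores_a)
    mean_b = sum(scores_b) / len(scores_b)
    baseline = max(mean_a, mean_b)            # assumed normalization choice
    if baseline == 0 or abs(mean_a - mean_b) / baseline < fuzziness:
        return "equal"
    return "better" if mean_a > mean_b else "worse"

# Repeating this over many sampled query sets and over all method pairs fills
# the triangular matrix of better/worse/equal counts described above.
```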


Susan Gauch, Jason Chaffee, Alexander Pretschner, "Ontology-Based Personalized Search and Browsing"

2.2 PROBLEM FORMULATION
The Web has experienced continuous growth since its creation. As of March 2002, the largest search engine contained approximately 968 million indexed pages in its database. As the number of Internet users and the number of accessible Web pages grow, it is becoming increasingly difficult for users to find documents that are relevant to their particular needs.

2.2.1 RESEARCH DESIGN
The ontologies used for browsing content at a Web site are generally different for each site that a user visits. Even if there are similarly named concepts in the ontology, they may contain different types of pages. Frequently, the same concepts appear with different names and/or in different areas of the ontology. Not only are there differences between sites, but between users as well: one user may consider a certain topic to be an Arts topic, while a different user might consider the same topic to be a Recreation topic. Thus, although browsing provides a very simple mechanism for information navigation, it can be time consuming for users when they take the wrong paths through the ontology in search of information.

2.2.2 FINDINGS
One increasingly popular way to structure information is through the use of ontologies, or graphs of concepts. One such system is OntoSeek [Guarino 99], which is designed for content-based information retrieval from online yellow pages and product catalogs. OntoSeek uses simple conceptual graphs to represent queries and resource descriptions, and uses the Sensus ontology [Knight 99], which comprises a simple taxonomic structure of approximately 70,000 nodes. The system presented in [Labrou 99] uses Yahoo! [YHO 02] as its ontology: it semantically annotates Web pages using Yahoo! categories as descriptors of their content, and uses Telltale [Chower 96a, Chower 96b, Pearce 97] as its classifier. Telltale computes the similarity between documents using n-grams as index terms. The ontologies used in the above examples rely on simple structured links between concepts. A richer and more powerful representation is provided by SHOE [Heflin 99, Luke 97], a set of Simple HTML Ontology Extensions that allow WWW authors to annotate their pages with semantic content expressed in terms of an ontology. SHOE provides the ability to define ontologies, create new ontologies that extend existing ontologies, and classify entities under an "is-a" classification scheme.

2.2.3 CONCLUSION
A personalized search system was created that made use of the automatically created user profiles. Documents in the result set of an Internet search engine were classified based on their titles and summaries, and those documents that were classified into concepts that were highly weighted in the user's profile were promoted by a re-ranking algorithm. Overall, an 8% improvement in top-20 precision resulted from this personalized re-ranking, with the biggest improvement seen in the top-ranked results. The personalized search results reported here are promising, but they exposed two areas of possible improvement.
First, the quality of the results is affected by the quality of the classification of documents into concepts, which in turn is affected by the quality of the training data for each concept. Second, working as a post-process on the search results limits the ability of the system to achieve dramatic gains in search performance: if few of the twenty documents returned by the search engine address the user's information needs, then re-ranking and/or filtering cannot help. In addition to personalized search, the use of classification techniques is investigated to map between user-created ontologies and the reference ontology in order to provide personalized browsing. OBIWAN's Local and Regional Browsing agents allow users to browse Web sites with respect to a consistent conceptual arrangement of the world. Web sites have their contents spidered and classified with respect to the reference ontology, after which the reference ontology can be used to browse the spidered Web pages. By mapping from the user's own ontology to the reference ontology, users get a consistent arrangement of content that matches their own world view rather than the system's. Five users created their own ontologies and provided sample Web pages as training data. The system was able to map from the personalized concepts to the reference ontology concepts and then use these mappings to browse Web sites that were pre-mapped into the reference ontology.
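The profile-based re-ranking step summarized in the conclusion above can be pictured with a short sketch. The data structures (a user profile as a concept-to-weight map, per-document concept scores from a classifier, and a blending parameter alpha) are assumed for illustration; this is not the authors' implementation.

```python
# Illustrative sketch (assumed data structures, not the authors' code): promote
# search results whose classified concepts carry high weight in the user profile.

def rerank(results, profile, alpha=0.5):
    """results: list of (doc_id, engine_score, {concept: classifier_score}).
    profile: {concept: weight} learned for the user.
    Returns doc ids ordered by a blend of the engine score and profile match."""
    def personalized_score(item):
        doc_id, engine_score, concepts = item
        profile_match = sum(score * profile.get(c, 0.0)
                            for c, score in concepts.items())
        return alpha * engine_score + (1 - alpha) * profile_match
    ordered = sorted(results, key=personalized_score, reverse=True)
    return [doc for doc, _, _ in ordered]

profile = {"Arts": 0.1, "Recreation": 0.9}
results = [("d1", 0.8, {"Arts": 0.7}), ("d2", 0.6, {"Recreation": 0.8})]
print(rerank(results, profile))   # d2 is promoted above d1
```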


2.2.4 IMPLICATION
Each user submits a personal ontology, a hierarchical tree of concepts that represents their view of the world. For the experiments, the tree was required to contain at least ten concepts with at least five sample pages for each concept. The goal of the mapping phase is to map every concept in the reference ontology to a concept in the personal ontology. However, since personal ontologies tend to be much smaller and more narrowly focused than the reference ontology, many concepts will remain unmapped. Thus, the personal tree is augmented with an extra concept called All-Others to hold the concepts from the reference ontology that do not map to a corresponding concept in the personal ontology. A multi-phase approach is taken to map each reference ontology concept to the best matching personal ontology concept. While it is possible for a reference ontology concept to map to multiple personal ontology concepts, this would indicate that the personal concepts are more fine-grained than the reference concepts. Since the reference ontology is very large (5,863 concepts), this is not likely to occur, so the mapping algorithm was simplified to map each reference concept to the single best matching personal concept. In practice, users tended to create concepts that were at least as broad as or broader than the reference concepts.
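A minimal sketch of this mapping step is given below; the similarity function and the threshold are hypothetical placeholders for whatever classifier-based score the system actually uses.

```python
# Illustrative sketch (hypothetical similarity function, not the OBIWAN code):
# map each reference-ontology concept to its single best-matching personal
# concept, or to "All-Others" when nothing matches well enough.

def map_reference_to_personal(reference_concepts, personal_concepts,
                              similarity, threshold=0.5):
    """Return {reference_concept: personal_concept or 'All-Others'}.

    similarity(ref, per) is assumed to score how well a personal concept's
    training pages match a reference concept (e.g., a text classifier score).
    """
    mapping = {}
    for ref in reference_concepts:
        best, best_score = "All-Others", threshold
        for per in personal_concepts:
            score = similarity(ref, per)
            if score > best_score:
                best, best_score = per, score
        mapping[ref] = best
    return mapping
```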
Zhiqiang Cai, Danielle S. McNamara, Max Louwerse, Xiangen Hu, Mike Rowe, Arthur C. Graesser, "NLS: A Non-Latent Similarity Algorithm"

2.3 PROBLEM FORMULATION
Computationally determining the semantic similarity between textual units (words, sentences, chapters, etc.) has become essential in a variety of applications, including web search and question answering systems. One specific example is AutoTutor, an intelligent tutoring system in which the meaning of a student answer is compared with the meaning of an expert answer (Graesser, P. Wiemer-Hastings, K. Wiemer-Hastings, Harter, Person, & the TRG, 2000). In another application, called Coh-Metrix, semantic similarity is used to calculate the cohesion in text by determining the extent of overlap between sentences and paragraphs (Graesser, McNamara, Louwerse & Cai, in press; McNamara, Louwerse, & Graesser, 2002).

2.3.1 RESEARCH DESIGN
This paper focuses on vector space models. The specific goal is to compare Latent Semantic Analysis (LSA; Landauer & Dumais, 1997) to an alternative algorithm called Non-Latent Similarity (NLS). The NLS algorithm makes use of a second-order similarity matrix (SOM). Essentially, a SOM is created using the cosines of the vectors from a first-order (non-latent) matrix. This first-order matrix (FOM) could be generated in any number of ways; here, a method modified from Lin (1998) is used. The paper describes the general concept behind vector space models, describes the differences between the metrics examined, and presents an evaluation of the metrics' ability to predict word associates.

2.3.2 FINDINGS
2.3.2.1 Latent Semantic Analysis (LSA)
LSA is one type of vector-space model that is used to represent world knowledge (Landauer & Dumais, 1997). LSA extracts quantitative information about the co-occurrences of words in documents (paragraphs and sentences) and translates this into an N-dimensional space. The input of LSA is a large co-occurrence matrix that specifies the frequency of each word in each document. Using singular value decomposition (SVD), LSA maps each document and word into a lower-dimensional space; in this way, the extremely large co-occurrence matrix is typically reduced to about 300 dimensions. Each word then becomes a weighted vector on K dimensions, and the semantic relationship between words can be estimated by taking the cosine between two vectors.

2.3.3 CONCLUSION
In summary, an alternative algorithm, NLS, is provided which makes it possible to use any non-latent similarity matrix to compare text similarity. This algorithm uses a SOM that is created from the cosines of the vectors of a first-order (non-latent) matrix. This FOM could be generated in any number of ways; a modified form of Lin's (1998) algorithm is used to extract non-latent word similarity from corpora. The evaluation of NLS compared its ability to predict word associates to the predictions made by the FOM and by LSA. The critical difference between the algorithms concerns the latency of the word representations: the use of SVD results in latent word representations in LSA, whereas the use of the syntax parser in NLS results in a non-latent representation. It was found that NLS, using the generated similarity matrix, identified the associates of modifiers and nouns relatively well. Both LSA and NLS were equally able to identify the associations of the modifiers. In contrast, none of the metrics successfully identified the associates of the verbs.
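The LSA steps summarized in Section 2.3.2.1 can be illustrated with a small numeric sketch; the toy matrix and the choice of k below are illustrative only, not the corpus or dimensionality used in the surveyed work.

```python
# Minimal sketch of the LSA steps described above (toy data, not the authors'
# implementation): build a word-by-document count matrix, reduce it with a
# truncated SVD, and compare words by the cosine of their reduced vectors.
import numpy as np

# Rows = words, columns = documents; entries = raw term frequencies.
counts = np.array([
    [2, 0, 1, 0],   # "retrieval"
    [1, 0, 2, 0],   # "search"
    [0, 3, 0, 1],   # "ontology"
    [0, 1, 0, 2],   # "concept"
], dtype=float)

k = 2                                   # number of latent dimensions kept
U, S, Vt = np.linalg.svd(counts, full_matrices=False)
word_vectors = U[:, :k] * S[:k]         # each row is a word in the latent space

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# "retrieval" and "search" co-occur in the same documents, so their latent
# vectors are far more similar than "retrieval" and "ontology".
print(cosine(word_vectors[0], word_vectors[1]))
print(cosine(word_vectors[0], word_vectors[2]))
```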


2.3.4 IMPLICATION
2.3.4.1 Non-Latent Similarity (NLS) Model
NLS is proposed as an alternative to latent similarity models such as LSA. NLS relies on a first-order, non-latent matrix that represents the non-latent associations between words. The similarity between words (and documents) is calculated from a second-order matrix, which is created from the cosines between the vectors for each word drawn from the FOM. Hence, for NLS, the cosines are calculated from the non-latent similarities between the words, whereas for LSA, the similarities are based on the cosines between the latent vector representations of the words. The following subsection describes the components and algorithms used in NLS.

2.3.4.2 First-Order Matrix
LSA is referred to as latent because the content is not explicit or extractable after SVD; the features that two similar words share are latent. In contrast, every feature extracted from the matrix using Lin's (1998) algorithm is explicit and directly extractable. Hence it is non-latent, and can be used as a first-order similarity matrix (FOM).
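The construction of the second-order matrix from a first-order matrix, as described above, amounts to taking cosines between FOM rows. The following sketch uses a toy FOM and is not the authors' implementation.

```python
# Illustrative sketch of the second-order matrix (SOM) construction described
# above (toy first-order matrix, not the authors' code): the SOM entry for two
# words is the cosine between their rows in the first-order matrix.
import numpy as np

# A small symmetric first-order matrix (FOM) of non-latent word associations,
# e.g., as could be produced by a Lin-style similarity over parsed corpora.
fom = np.array([
    [1.0, 0.6, 0.1],
    [0.6, 1.0, 0.2],
    [0.1, 0.2, 1.0],
])

norms = np.linalg.norm(fom, axis=1, keepdims=True)
unit_rows = fom / norms
som = unit_rows @ unit_rows.T   # som[i, j] = cosine(FOM row i, FOM row j)
print(np.round(som, 3))
```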
Kyung Soon Lee, W. Bruce Croft, James Allan, "A Cluster-Based Resampling Method for Pseudo-Relevance Feedback"

2.4 PROBLEM FORMULATION
Most pseudo-relevance feedback methods (e.g., [12, 19, 7]) assume that a set of top-retrieved documents is relevant and then learn from the pseudo-relevant documents to expand terms or to assign better weights to the original query. This is similar to the process used in relevance feedback, when actual relevant documents are used [23]. But in general, the top-retrieved documents contain noise: when the precision of the top 10 documents (P@10) is 0.5, five of them are non-relevant. This is common and even expected in all retrieval models. This noise, however, can cause the query representation to drift away from the original query.

2.4.1 RESEARCH DESIGN
This paper describes a resampling method that uses clusters to select better documents for pseudo-relevance feedback. Document clusters in the initial retrieval set can represent aspects of a query, especially on large-scale web collections, since the initial retrieval results may involve diverse subtopics for such collections. Since it is difficult to find one optimal cluster, several relevant groups are used for feedback. By permitting overlapping clusters over the top-retrieved documents and repeatedly feeding dominant documents that appear in multiple highly ranked clusters, an expansion query can be constructed that emphasizes the core topics of a query.

2.4.2 FINDINGS
The motivation for using clusters and resampling is as follows. The top-retrieved documents form a query-oriented ordering that does not consider the relationships between documents. The pseudo-relevance feedback problem of learning expansion terms closely related to a query is viewed as similar to the classification problem of learning an accurate decision boundary from training examples. The problem is approached by repeatedly selecting dominant documents to move the expansion terms toward the dominant documents of the initial retrieval set, much as the boosting method for a weak learner repeatedly selects hard examples to move the decision boundary toward those examples. The hypothesis behind using overlapping document clusters is that a good representative document for a query may have several nearest neighbors with high similarities, participating in several different clusters. Since it plays a central role in forming clusters, such a document may be dominant for the topic. Repeatedly sampling dominant documents can emphasize the topics of a query, rather than randomly resampling documents for feedback.

2.4.3 CONCLUSION
Resampling the top-ranked documents using clusters is effective for pseudo-relevance feedback. The improvements obtained were consistent across nearly all collections, and for large web collections, such as GOV2 and WT10g, the approach showed substantial gains. The relative improvements on the GOV2 collection are 16.82% and 6.28% over LM and RM, respectively. The improvements on the WT10g collection are 19.63% and 26.38% over LM and RM, respectively. The relevance density was shown to be higher than for the baseline feedback model on all test collections, which explains why expansion by the cluster-based resampling method helps. Experimental results also show that overlapping clusters are helpful for identifying dominant documents for a query.
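The idea of selecting dominant documents from overlapping clusters can be sketched as follows; the thresholds and data structures are assumptions made for illustration, not the parameters used in the surveyed paper.

```python
# Illustrative sketch (assumed data structures, not the authors' code): pick
# "dominant" feedback documents as those that appear in several of the
# highly ranked, overlapping clusters built over the top-retrieved documents.
from collections import Counter

def dominant_documents(clusters, min_clusters=2, max_docs=10):
    """clusters: list of sets of doc ids (overlapping clusters over the
    top-retrieved documents, ranked best first). Returns up to max_docs ids
    that occur in at least min_clusters of those clusters."""
    counts = Counter(doc for cluster in clusters for doc in cluster)
    dominant = [doc for doc, n in counts.most_common() if n >= min_clusters]
    return dominant[:max_docs]

# Example: d5 and d3 sit in several overlapping clusters, so they are the
# documents fed back for query expansion.
clusters = [{"d1", "d3", "d5"}, {"d3", "d5", "d7"}, {"d2", "d5"}, {"d8", "d9"}]
print(dominant_documents(clusters))   # ['d5', 'd3']
```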


2.4.4 IMPLICATION
The robustness of the baseline feedback model and of the resampling method over the baseline retrieval model is analysed. Here, robustness is defined as the number of queries whose performance is improved or hurt as a result of applying these methods. For homogeneous newswire collections such as WSJ, AP, and ROBUST, the relevance model and the resampling method showed a similar pattern of robustness. The resampling method shows strong robustness on each test collection. For the GOV2 collection, the resampling method improves 41 queries and hurts 9, whereas the relevance model improves 37 and hurts 13. For the WT10g collection, the resampling method improves 30 and hurts 19, whereas the relevance model improves 32 and hurts 17; although the relevance model improves the performance of 2 more queries than the resampling method, the improvements obtained by the resampling method are significantly larger. For the ROBUST collection, the resampling method improves 63 and hurts 36, whereas the relevance model improves 64 and hurts 35. Overall, the resampling method improves effectiveness for 82%, 61%, 63%, 66%, and 70% of the queries for GOV2, WT10g, ROBUST, AP, and WSJ, respectively.

Xing Jiang, Ah-Hwee Tan, "Mining Ontological Knowledge from Domain-Specific Text Documents"

2.5 PROBLEM FORMULATION
An ontology is an explicit specification of a conceptualization, comprising a formal description of concepts, relations between concepts, and axioms about a target domain. Considered the backbone of the Semantic Web, domain ontologies enable software agents to interact and carry out sophisticated tasks for users.

2.5.1 RESEARCH DESIGN
To reduce the effort of building ontologies, ontology learning systems have been developed to learn ontologies from domain-relevant materials. However, most existing ontology learning systems focus on extracting concepts and taxonomic (IS-A) relations. For example, SymOntos, a symbolic ontology management system developed at IASI-CNR, made use of shallow NLP tools, including a morphologic analyzer, a part-of-speech (POS) tagger, and a chunk parser, to process documents, and employed text mining techniques to produce large ontologies from document collections. The concept extraction method was, however, domain-dependent and had limited applicability. Text-To-Onto, also based on shallow NLP tools, was able to extract key concepts and semantic relations from texts. Selection of concepts was based on the tf-idf measure used in the field of information retrieval, and semantic relations were extracted using an association rule mining algorithm and predefined regular expression rules. However, as tf-idf was designed primarily for IR, the system extracted both domain-specific and common concepts. Also, the identification of semantic relations was based on POS tags, limiting the accuracy of the relations extracted.

2.5.2 FINDINGS
In this paper, a novel system known as Concept-Relation-Concept Tuple based Ontology Learning (CRCTOL) is presented for mining rich semantic knowledge, in the form of an ontology, from domain-specific documents. By using a full text parsing technique and incorporating statistical and lexico-syntactic methods, the knowledge extracted by the system is more concise and contains richer semantics than that extracted by alternative systems. A case study is conducted wherein CRCTOL extracts ontological knowledge, specifically key concepts and semantic relations, from a terrorism-domain text collection.
Quantitative evaluation, comparing against the Text-To-Onto system, has shown that CRCTOL produces much better accuracy for concept and relation extraction, especially on sentences with complex structures.

2.5.3 CONCLUSION
Traditional ontology learning systems performed concept extraction based on words. First, keywords were identified from the text; these were typically single-word terms and were treated as the concepts. Then, possible multi-word terms were formed by combining these keywords. As a result, the multi-word terms generated were not natural, and most concepts extracted were only single-word terms. When the NLP component was used to process documents, it was found that most noun terms in the text were multi-word terms. As it has also been shown that 85% of the terms in text are multi-word terms [6], traditional systems focusing on single-word term extraction will miss many concepts. A different strategy for concept extraction is therefore adopted: first, multi-word terms are induced directly from the text; then, single-word terms are extracted if they appear frequently within the multi-word terms or are found to be related to the multi-word terms through certain semantic relations. This strategy reduces the chance of missing important concepts.
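The multi-word-first extraction strategy can be sketched as follows; this is a deliberate simplification (frequency thresholds and a pre-parsed list of noun phrases are assumed), not the CRCTOL algorithm itself.

```python
# Illustrative sketch of the concept-extraction strategy described above (a
# simplification, not the CRCTOL algorithm): induce multi-word terms first,
# then keep single-word terms that occur frequently inside those terms.
from collections import Counter

def extract_concepts(noun_phrases, min_phrase_freq=2, min_word_freq=3):
    """noun_phrases: list of noun-phrase strings produced by a parser."""
    phrase_counts = Counter(p.lower() for p in noun_phrases)
    multi_word = {p for p, n in phrase_counts.items()
                  if len(p.split()) > 1 and n >= min_phrase_freq}

    # Single-word terms are admitted only if they recur inside the accepted
    # multi-word terms (a crude stand-in for the semantic-relation criterion).
    word_counts = Counter(w for p in multi_word for w in p.split())
    single_word = {w for w, n in word_counts.items() if n >= min_word_freq}
    return multi_word | single_word

phrases = ["terrorist attack", "terrorist group", "terrorist attack",
           "bomb attack", "bomb attack", "attack", "terrorist group",
           "suicide bomb attack", "suicide bomb attack", "government"]
print(extract_concepts(phrases))   # multi-word terms plus the shared head "attack"
```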


2.5.4 IMPLICATION
Natural Language Processing (NLP): This component incorporates NLP tools, such as Eric Brill's POS tagger for attaching POS tags to words and Michael Collins's syntactic parser for parsing sentences. With the NLP component, the full text parsing technique can be used for text analysis, which distinguishes the system from alternative systems that use only shallow NLP techniques.
Algorithm Library: The algorithm library consists of a statistical algorithm that extracts key concepts from a document collection, a rule-based algorithm that extracts relations between the key concepts, and a modified generalized association rule mining algorithm that builds the ontology.
Domain Lexicon: The domain lexicon contains terms specific to the domain of interest. These terms are used by the NLP component when analyzing documents. The domain lexicon is manually built and can be updated during the process of ontology learning.
The overall procedure for ontology learning is summarized as follows.
Data Preprocessing: The CRCTOL system assumes that the input documents are in plain text format; files in other formats are converted to plain text before processing.
NLP Analysis: The input files are processed by the NLP component, and syntactic and POS tags are assigned to individual words in the documents.
Concept Extraction: Concepts are identified in the text by a statistical algorithm; these are called the key concepts of the target domain.
Semantic Relation Extraction: Semantic relations among the key concepts, both taxonomic and non-taxonomic, are extracted from the text.
Ontology Building: The ontology is built by linking the extracted concepts and relations. The final ontology is presented in the form of a semantic network.

3. CONCLUSION
Several methods have been proposed and implemented for searching web documents effectively. This paper presents various existing techniques proposed by researchers; most of the methods described here were tested with different techniques and produce different results. A promising direction for future research is a new generation of retrieval systems that produce effective results based on the user's query.

BIBLIOGRAPHY
[1] Chris Buckley, Ellen M. Voorhees, "Evaluating Evaluation Measure Stability," Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '00), pp. 33-40, 2000.
[2] Susan Gauch, Jason Chaffee, Alexander Pretschner, "Ontology-Based Personalized Search and Browsing," Web Intelligence and Agent Systems, vol. 1, no. 3-4, pp. 219-234, Dec. 2003.
[3] Zhiqiang Cai, Danielle S. McNamara, Max Louwerse, Xiangen Hu, Mike Rowe, Arthur C. Graesser, "NLS: A Non-Latent Similarity Algorithm," Aug. 2004.
[4] Kyung Soon Lee, W. Bruce Croft, James Allan, "A Cluster-Based Resampling Method for Pseudo-Relevance Feedback," 2004.
[5] Xing Jiang, Ah-Hwee Tan, "Mining Ontological Knowledge from Domain-Specific Text Documents," 2005.
[6] D. Saravanan, S. Srinivasan, "Matrix Based Indexing Technique for Video Data," Journal of Computer Science, vol. 9, no. 5, pp. 534-542, 2013.
[7] D. Saravanan, S. Srinivasan, "A Proposed New Algorithm for Hierarchical Clustering Suitable for Video Data Mining," International Journal of Data Mining and Knowledge Engineering, vol. 3, no. 9, pp. 569-572, July 2011.
[8] S.E. Middleton, N.R. Shadbolt, and D.C. De Roure, "Ontological User Profiling in Recommender Systems," ACM Transactions on Information Systems (TOIS), vol. 22, no. 1, pp. 54-88, 2004.
[9] G.A. Miller and F. Hristea, "WordNet Nouns: Classes and Instances," Computational Linguistics, vol. 32, no. 1, pp. 1-3, 2006.
[10] D.N. Milne, I.H. Witten, and D.M. Nichols, "A Knowledge-Based Search Engine Powered by Wikipedia," Proceedings of the ACM Conference on Information and Knowledge Management (CIKM '07), pp. 445-454, 2007.
[11] D. Saravanan, S. Srinivasan, "Data Mining Framework for Video Data," Proceedings of the International Conference on Recent Advances in Space Technology Services & Climate Change (RSTS&CC-2010), Sathyabama University, Chennai, November 13-15, 2010, pp. 196-198.
[12] D. Saravanan, S. Srinivasan, "Video Image Retrieval Using Data Mining Techniques," Journal of Computer Applications, vol. V, no. 1, pp. 39-42, Jan.-Mar. 2012, ISSN: 0974-1925.
[13] R. Navigli, P. Velardi, and A. Gangemi, "Ontology Learning and Its Application to Automated Terminology Translation," IEEE Intelligent Systems, vol. 18, no. 1, pp. 22-31, Jan./Feb. 2003.


[14] S. Nirenburg and V. Raskin, Ontological Semantics, The MIT Press, 2004.

AUTHORS
D. Saravanan is currently working as an Assistant Professor at Sathyabama University. His areas of interest are data mining, image processing, and database management systems.

S. Vaithyasubramanian is currently working as an Assistant Professor at Sathyabama University. His areas of interest are data mining, password generation, password cracking, and IDS/IPS.

K. N. Jothi Venkatesh is currently working as an Assistant Professor at Sathyabama University. His areas of interest are data mining, image processing, and database management systems.
