Resource Discovery

It refers to the process of finding and retrieving information on the
Internet.

Major Differences between IR Systems and WWW Search Engines
1. WWW documents are distributed around the Internet while
documents in an IR system are centrally located;
2. The number of WWW documents is much greater than that of an IR
system;
3. WWW documents are more dynamic and heterogeneous than
documents in an IR system;
4. WWW documents are structured with HTML while the documents in
an IR system are normally plain text;
5. WWW search engines are used by more users and more frequently
than IR systems.

Consequences of WWW Documents Being Dynamic and Heterogeneous

1. A huge vocabulary must be used to cope with the large number of
documents.
2. It is difficult to make use of domain knowledge to improve
retrieval effectiveness, as documents come from many different
domains.
3. Document frequency, which is needed to calculate term weights,
cannot be obtained reliably, as the Web database is built
progressively and is never complete.
4. The vector space model is not suitable because document size
varies and this model favors short documents.
5. The index must be updated constantly, as the documents change
constantly.
6. The search engine must be robust to cope with the unpredictable
nature of documents and Web servers.
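
To illustrate the point about document frequency, here is a minimal sketch. It assumes the common inverse document frequency form log(N / df) as the term weight component; the notes do not name a specific formula, and the tiny `crawled` collection and the `idf` helper are made up for illustration. The same term receives a different weight as more pages are added to the collection.

```python
import math

def idf(term, docs):
    """Inverse document frequency log(N / df) over the documents seen so far."""
    n = len(docs)
    df = sum(1 for d in docs if term in d.split())
    return math.log(n / df) if df else 0.0

# Hypothetical pages collected so far (each string stands for one document).
crawled = [
    "web search engine",
    "image retrieval by color",
    "speech recognition training",
]
before = idf("web", crawled)   # N = 3, df = 1

# The collection keeps growing, so N and df keep changing.
crawled += ["web crawler index", "web server robustness"]
after = idf("web", crawled)    # N = 5, df = 3

print(before, after)
```

Because the collection is never complete, any weight computed this way is only a snapshot and has to be recomputed or approximated as the index grows.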

General Structure of WWW Search Engines

All search engines have three major elements:
1. The first is the spider, also called the crawler or robot. The
spider visits a Web page, reads it, and then follows links to other
pages within the site. The spider may return to the site on a regular
basis, such as every month, to look for changes.
2. The second part of a search engine is the index. The index,
sometimes called the catalog, is like a giant book containing a copy
of every Web page that the spider finds. If a Web page changes, then
this book is updated with the new information. Sometimes it can take
a while for new pages or changes that the spider finds to be added to
the index. Thus, a Web page may have been "spidered" but not yet
"indexed." Until it is added to the index, it is not available to
those searching with the search engine.
3. The third part of a search engine is the search engine software.
This is the program that sifts through the millions of pages recorded
in the index to find matches to a search and ranks them in order of
what it estimates to be most relevant. Different search engines use
different similarity measures and ranking functions. However, they
all use term frequency and term location in one way or another.
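
The three elements above can be sketched in a few lines. This is a toy illustration under stated assumptions, not how any particular search engine works: the `PAGES` dictionary stands in for the Web, the `TITLE_WEIGHT` boost is an arbitrary way to model "term location", and the function names are hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical in-memory "Web": url -> (title, body, outgoing links).
PAGES = {
    "a.html": ("Search engines", "a spider visits pages and follows links", ["b.html", "c.html"]),
    "b.html": ("Index", "the index stores a copy of each page the spider finds", ["a.html"]),
    "c.html": ("Ranking", "ranking uses term frequency and term location", []),
}

TITLE_WEIGHT = 3  # assumed boost for a term that appears in the title

def crawl(start):
    """Spider: visit a page, then follow its links breadth-first."""
    seen, queue, order = {start}, deque([start]), []
    while queue:
        url = queue.popleft()
        order.append(url)
        for link in PAGES[url][2]:
            if link in PAGES and link not in seen:
                seen.add(link)
                queue.append(link)
    return order

def build_index(urls):
    """Index: term -> {url: weight}, counting occurrences (term frequency)."""
    index = defaultdict(lambda: defaultdict(int))
    for url in urls:
        title, body, _ = PAGES[url]
        for term in title.lower().split():
            index[term][url] += TITLE_WEIGHT
        for term in body.lower().split():
            index[term][url] += 1
    return index

def search(index, query):
    """Search software: sum the weights of the query terms and rank the pages."""
    scores = defaultdict(int)
    for term in query.lower().split():
        for url, weight in index.get(term, {}).items():
            scores[url] += weight
    return sorted(scores, key=lambda u: (-scores[u], u))

urls = crawl("a.html")
index = build_index(urls)
results = search(index, "index spider")
print(results)
```

A page that has been crawled but not yet passed to `build_index` would not appear in any result, which mirrors the "spidered but not yet indexed" situation described above.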

Basic Concepts of ASR

An automatic speech recognition (ASR) system operates in two stages:
training and pattern matching.
In the training stage, features of each speech unit are extracted
and stored in the system.
In the pattern matching (recognition) stage, features of an input
speech unit are extracted and compared with each of the stored
features, and the speech unit with the best-matching features is
taken as the recognized unit.
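
The two stages can be sketched as simple template matching. This is only an illustration: the two toy features (mean absolute amplitude and zero-crossing rate), the sample values, and the function names are assumptions made for this example, and practical systems use much richer features and matching methods.

```python
import math

def extract_features(samples):
    """Toy features: mean absolute amplitude and zero-crossing rate."""
    energy = sum(abs(s) for s in samples) / len(samples)
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if a * b < 0)
    return (energy, crossings / (len(samples) - 1))

def train(labelled_units):
    """Training stage: extract and store the features of each speech unit."""
    return {label: extract_features(s) for label, s in labelled_units.items()}

def recognize(templates, samples):
    """Matching stage: return the stored unit whose features are closest."""
    feats = extract_features(samples)
    return min(templates, key=lambda label: math.dist(templates[label], feats))

# Hypothetical training data: one short sample sequence per speech unit.
templates = train({
    "low":  [0.5, 0.6, 0.5, 0.4, -0.5, -0.6, -0.5, -0.4],
    "high": [0.5, -0.5, 0.5, -0.5, 0.5, -0.5, 0.5, -0.5],
})

unknown = [0.4, -0.6, 0.5, -0.4, 0.6, -0.5, 0.4, 0.5]
recognized = recognize(templates, unknown)
print(recognized)
```

Here "best matching" is taken to mean the smallest Euclidean distance between feature vectors, which is one possible choice among many.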

Describe the four common approaches to image retrieval. What are their
strengths and weaknesses?

1- Image contents are modeled as a set of attributes extracted
manually and managed within the framework of conventional database
management systems.
2- An integrated feature-extraction/object-recognition subsystem is
used. This subsystem automates the feature extraction and object
recognition. However, automated approaches to object recognition
are computationally expensive, difficult, and tend to be domain
specific.
3- Free text is used to describe (annotate) images, and IR
techniques are employed to carry out image retrieval.
4- Low-level image features such as color and texture are used to
index and retrieve images. The advantage of this approach is that
the indexing and retrieval process is carried out automatically and
is easily implemented. It has been shown that this approach produces
quite good retrieval performance.
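
The fourth approach can be sketched with color histograms. The example below is a minimal illustration under assumptions of its own: images are represented as plain lists of RGB tuples, each channel is quantized into two bins, and similarity is measured with the L1 distance between normalized histograms. The image names and pixel values are invented.

```python
def color_histogram(pixels, bins_per_channel=2):
    """Normalized color histogram over coarsely quantized RGB values."""
    step = 256 // bins_per_channel
    hist = [0.0] * bins_per_channel ** 3
    for r, g, b in pixels:
        idx = ((r // step) * bins_per_channel + g // step) * bins_per_channel + b // step
        hist[idx] += 1
    return [count / len(pixels) for count in hist]

def l1_distance(h1, h2):
    """Sum of absolute bin differences; smaller means more similar."""
    return sum(abs(a - b) for a, b in zip(h1, h2))

def retrieve(query_pixels, database):
    """Rank the indexed images by histogram distance to the query image."""
    q = color_histogram(query_pixels)
    return sorted(database, key=lambda name: l1_distance(q, database[name]))

# Hypothetical tiny "images", each a list of RGB pixels.
RED, BLUE, GREEN = (220, 20, 20), (20, 20, 220), (20, 200, 20)
images = {
    "sunset": [RED] * 6 + [BLUE] * 2,
    "ocean": [BLUE] * 7 + [GREEN] * 1,
    "forest": [GREEN] * 6 + [BLUE] * 2,
}

# Indexing is fully automatic: one histogram per image.
database = {name: color_histogram(px) for name, px in images.items()}
ranking = retrieve([RED] * 5 + [BLUE] * 3, database)
print(ranking)
```

No manual annotation is involved at any point, which is the advantage the notes attribute to this approach.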
