
Web Search Engine

Advanced Computing Concepts


University of Windsor

Guided By:
SCOUT Luis Rueda
What is a Search Engine?
• SEARCH:
COMPUTING to examine a computer file, disk,
database, or network for particular information.

• ENGINE:
Something that supplies the driving force or
energy to a movement, system or trend.

• SEARCH ENGINE
A computer program that searches for
particular keywords and returns a list of
documents in which they were found, especially a
commercial service that scans documents on the
internet.
How does a Search Engine work?
• Crawling:
• Follow links to find
information
• Indexing:
• Record what words appear
where
• Ranking:
• What information is a good
match to the user query?
• What information is inherently
good?
• Suggestions:
• Suggest similar words
• Serving:
• Handle queries, find pages,
display results
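The crawling step can be sketched as a breadth-first walk over links. This is a minimal illustration, not the project's crawler: it assumes the pages are already held in memory as a dictionary of HTML strings, and the names LinkParser, crawl and SITE are hypothetical.

```python
from collections import deque
from html.parser import HTMLParser


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(pages, start):
    """Breadth-first crawl over `pages`, a dict mapping URL -> HTML text.

    Returns the URLs in the order they were discovered.
    """
    seen = [start]
    queue = deque([start])
    while queue:
        url = queue.popleft()
        parser = LinkParser()
        parser.feed(pages.get(url, ""))
        for link in parser.links:
            # Only follow links to known pages that were not visited yet.
            if link in pages and link not in seen:
                seen.append(link)
                queue.append(link)
    return seen


# Hypothetical three-page site, used only for illustration.
SITE = {
    "a.html": '<a href="b.html">B</a> <a href="c.html">C</a>',
    "b.html": '<a href="a.html">A</a>',
    "c.html": "<p>no links</p>",
}

print(crawl(SITE, "a.html"))
```

A real crawler would fetch pages over the network and resolve relative URLs; both are left out here to keep the sketch self-contained.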
Features Implemented
• Spell checker – Checks the spelling of the
entered string and finds the closest matching word.
• Page Ranking – Scores each page by counting
the occurrences of the search word and ranks
the pages by that score.
• Word Suggestion – Uses the spell checker
together with a TST and edit distance to
suggest similar and related words.
• Pattern Matching – Extracts words from crawled
web pages to build a dictionary.
• Inverted Index – An inverted index is
an index data structure storing a mapping from
content, such as words or numbers, to its
locations in a document or a set of documents.
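The inverted index and the occurrence-based page score can be sketched together. This is a minimal illustration under stated assumptions: documents are plain strings, words are runs of letters, and the names build_index, rank and DOCS are hypothetical rather than taken from the project.

```python
import re
from collections import defaultdict


def build_index(docs):
    """Map each word to {document name: number of occurrences}."""
    index = defaultdict(dict)
    for name, text in docs.items():
        for word in re.findall(r"[a-z]+", text.lower()):
            index[word][name] = index[word].get(name, 0) + 1
    return index


def rank(index, word):
    """Return the documents containing `word`, most occurrences first."""
    counts = index.get(word.lower(), {})
    # Ties are broken by document name so the order is deterministic.
    return sorted(counts, key=lambda name: (-counts[name], name))


# Hypothetical documents, used only for illustration.
DOCS = {
    "page1": "Python is a language. Python is popular.",
    "page2": "Java is a language.",
    "page3": "Learn Python and Java.",
}
INDEX = build_index(DOCS)

print(rank(INDEX, "python"))
```

Looking a word up in the index avoids scanning every document at query time, which is the point of inverting the mapping.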
EDIT DISTANCE
• Edit distance is a way of
quantifying how dissimilar two
strings are by the number of
steps it takes to turn one
into the other, where a step is
defined as a single-character
change (an insertion, deletion
or substitution).
• Example: The words 'computer'
and 'commuter' are very similar;
a change of just one letter,
p -> m, turns the first word
into the second.
• In our search engine, we use
edit distance for word
suggestion, i.e. suggesting
words close to the one entered.
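A minimal sketch of edit distance and of word suggestion built on it follows. It uses the standard dynamic-programming formulation (Levenshtein distance); the function names, the sample dictionary and the max_distance threshold are assumptions made for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance: insertions, deletions and substitutions cost 1."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute, or keep a matching letter
            ))
        prev = curr
    return prev[-1]


def suggest(word, dictionary, max_distance=2):
    """Return dictionary words within max_distance edits, closest first."""
    scored = sorted((edit_distance(word, w), w) for w in dictionary)
    return [w for d, w in scored if d <= max_distance]


print(edit_distance("computer", "commuter"))
print(suggest("computr", ["computer", "commuter", "python"]))
```

Comparing the query against every dictionary word is fine for a small dictionary; a larger one would usually narrow the candidates first.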
TST
• A ternary search tree is a special
trie data structure where the child
nodes of a standard trie are ordered
as a binary search tree.
• Ternary search trees are well
suited to applications such as
spell checking and autocompletion.
• Autocomplete, or word
completion, is a feature in which
an application predicts the rest of a
word a user is typing.
• Spell checking is a feature that
checks for misspellings in a text.
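The structure described above can be sketched as follows. This is an illustrative ternary search tree with insertion, lookup and prefix completion, not the project's implementation; the class and method names are hypothetical.

```python
class Node:
    def __init__(self, char):
        self.char = char
        self.left = self.mid = self.right = None
        self.is_word = False  # True if a stored word ends at this node


class TernarySearchTree:
    def __init__(self):
        self.root = None

    def insert(self, word):
        if word:
            self.root = self._insert(self.root, word, 0)

    def _insert(self, node, word, i):
        char = word[i]
        if node is None:
            node = Node(char)
        if char < node.char:
            node.left = self._insert(node.left, word, i)
        elif char > node.char:
            node.right = self._insert(node.right, word, i)
        elif i + 1 < len(word):
            node.mid = self._insert(node.mid, word, i + 1)
        else:
            node.is_word = True
        return node

    def _find(self, word):
        """Return the node matching the last character of word, or None."""
        node, i = self.root, 0
        while node is not None and word:
            if word[i] < node.char:
                node = node.left
            elif word[i] > node.char:
                node = node.right
            elif i + 1 == len(word):
                return node
            else:
                node, i = node.mid, i + 1
        return None

    def contains(self, word):
        node = self._find(word)
        return node is not None and node.is_word

    def complete(self, prefix):
        """Return all stored words that start with prefix, in sorted order."""
        node = self._find(prefix)
        if node is None:
            return []
        results = [prefix] if node.is_word else []
        self._collect(node.mid, prefix, results)
        return results

    def _collect(self, node, prefix, results):
        if node is None:
            return
        self._collect(node.left, prefix, results)
        if node.is_word:
            results.append(prefix + node.char)
        self._collect(node.mid, prefix + node.char, results)
        self._collect(node.right, prefix, results)


tree = TernarySearchTree()
for w in ["search", "seat", "sea", "engine", "see"]:
    tree.insert(w)
print(tree.complete("sea"))
```

Here contains serves the spell-checking use (is the word in the dictionary?) and complete serves autocompletion.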
QUICK SELECT
• Quickselect is a selection
algorithm to find the k-th
smallest element in an
unordered list.
• In our search engine, it is
used in page ranking: each
page is scored by counting
the occurrences of the word,
and the pages are ranked
by that score.
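A minimal quickselect sketch follows. It uses a random pivot and list partitioning for clarity rather than the in-place variant; the function name and the sample scores are assumptions made for illustration.

```python
import random


def quickselect(items, k):
    """Return the k-th smallest element of items (k is 1-based)."""
    if not 1 <= k <= len(items):
        raise ValueError("k out of range")
    items = list(items)  # work on a copy, leave the caller's list untouched
    while True:
        pivot = random.choice(items)
        smaller = [x for x in items if x < pivot]
        equal = [x for x in items if x == pivot]
        larger = [x for x in items if x > pivot]
        if k <= len(smaller):
            items = smaller
        elif k <= len(smaller) + len(equal):
            return pivot
        else:
            # Skip everything up to and including the pivot values.
            k -= len(smaller) + len(equal)
            items = larger


# Hypothetical page scores (word occurrence counts).
scores = [3, 12, 7, 1, 9]
# The 2nd highest score is the (len - 2 + 1)-th smallest.
print(quickselect(scores, len(scores) - 2 + 1))
```

Unlike a full sort, quickselect only recurses into the partition that contains the answer, which gives linear expected time.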
REGEX
• A regular expression (regex or regexp) is a sequence
of characters that defines a search pattern.
• In our search engine, regex is used to retrieve words from HTML
files to create the word dictionary.
• Regex is also used in spell checking, to check whether the
entered word contains any special characters.
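Both uses of regex can be sketched in a few lines. The patterns below are assumptions chosen for illustration (words are runs of ASCII letters, and anything other than a letter or digit counts as a special character); the project's actual patterns are not given in the slides.

```python
import re

TAG = re.compile(r"<[^>]+>")            # any HTML tag
WORD = re.compile(r"[A-Za-z]+")         # a run of ASCII letters
SPECIAL = re.compile(r"[^A-Za-z0-9]")   # anything that is not a letter or digit


def extract_words(html):
    """Strip tags and return the distinct lower-cased words, sorted."""
    text = TAG.sub(" ", html)
    return sorted({w.lower() for w in WORD.findall(text)})


def has_special_character(word):
    """Return True if word contains a character other than a letter or digit."""
    return SPECIAL.search(word) is not None


page = "<html><body><h1>Search Engine</h1><p>A search tool.</p></body></html>"
print(extract_words(page))
print(has_special_character("py@thon"))
```

Stripping tags with a regex is a rough approach: it does not handle script or style contents, comments or HTML entities, so a dedicated HTML parser may be preferable for real pages.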
HTTRACK
• HTTrack allows you to download a World Wide Web site from the
Internet to a local directory, recursively building all directories and
getting HTML, images, and other files from the server to your
computer.
• Websites that we downloaded:
• Wikipedia
• Python.org
• Oracle.org
