Ten Forces That Flattened the World (Thomas Friedman)
• The fall of the Berlin Wall on November 9, 1989
• Netscape IPO on August 9, 1995
• Software with compatible interfaces and file formats
• Open Source
• Outsourcing
• Offshoring
• Supply Chaining
• Insourcing
• In‐forming
• The steroids (technological development)
Definition of search engine
A software system designed to search
for information on the World Wide Web.
The search results are generally presented in
a line of results often referred to as search
engine results pages (SERPs).
The information may be a mix of web pages,
images, and other types of files.
How Search Engine Works
Three parts of search engine
• 1st part: An automated web browser (spider) follows
every link on a website, collects and analyzes each
page, and parses its words
• 2nd part: The search engine then uses a proprietary
algorithm to create an index (or catalog) so that
meaningful results are returned for each query
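The crawl-and-index steps above can be sketched as a toy inverted index in Python (the pages and words here are hypothetical; real engines use far more sophisticated parsing, ranking signals, and storage):

```python
from collections import defaultdict

# Toy corpus standing in for crawled pages (hypothetical data).
pages = {
    "page1.html": "search engines index the web",
    "page2.html": "the web grows every day",
}

# Build an inverted index: word -> set of pages containing it.
index = defaultdict(set)
for url, text in pages.items():
    for word in text.lower().split():
        index[word].add(url)

print(sorted(index["web"]))  # -> ['page1.html', 'page2.html']
```

Answering a query then becomes a lookup in `index` rather than a scan over every page, which is what makes the catalog worth building.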
3rd part of a search engine (Query
processing)
• Query processor
– A program that receives your search request, compares it
to the entries in the index, and returns results to you
– Manages the relevance and ranking
• “On search” by Tim Bray http://www.tbray.org/ongoing/When/200x/2003/07/30/OnSearchTOC
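The compare-and-return step of the query processor can be sketched as an AND-query over a hand-built index (the postings and term frequencies below are made up; real relevance ranking combines many more signals):

```python
# Minimal query processor over a hypothetical inverted index:
# term -> {page: term frequency}.
index = {
    "search": {"a.html": 3, "b.html": 1},
    "engine": {"a.html": 2, "c.html": 5},
}

def query(terms):
    # Keep only pages containing every term (AND semantics).
    results = None
    for term in terms:
        postings = set(index.get(term, {}))
        results = postings if results is None else results & postings
    results = results or set()
    # Rank by summed term frequency -- a crude stand-in for relevance.
    return sorted(results, key=lambda p: -sum(index[t][p] for t in terms))

print(query(["search", "engine"]))  # -> ['a.html']
```

The sort at the end is where the "relevance and ranking" responsibility of the query processor lives, even in this miniature form.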
How Query Works
Search Engine Market Share*
• Google: 74.41%
• Baidu: 11.74%
• Bing: 8.05%
• Yahoo!: 3.42%
• Yandex: 1.24%
• Ask: 0.53%
• DuckDuckGo: 0.28%
• Naver: 0.12%
• AOL: 0.06%
https://netmarketshare.com/search-engine-market-share.aspx?
What are people searching?
Google Trends
– https://trends.google.com/trends/
Search Engine ‐ Early Days
• In the early days of Internet development, its users
were a privileged minority and the amount of
available information was relatively small
• Access was mainly restricted to employees of various
universities and laboratories who used it to access
scientific information
• In those days, the problem of finding information on
the Internet was not nearly as critical as it is now
History of Search Engine (1990‐1994)*
AliWeb
• AliWeb (Archie Like Indexing for the WEB)
• It allowed users to submit the pages they
wanted indexed, along with their own page
descriptions
• This meant it needed no bot to collect data
and did not use excessive bandwidth. The
downside of AliWeb was that many people did
not know how to submit their sites.
WebCrawler
• Developed by Brian Pinkerton at the University
of Washington; launched on April 20, 1994
• It was the first crawler to index entire
pages
• AOL eventually purchased WebCrawler and
ran it on their network.
History of Search Engine (1995‐1998)
AltaVista
• Developed by DEC (Digital Equipment Corp)
• Indexed many web pages and provided efficient
search
• AltaVista also provided numerous search tips
and advanced search features.
• This made AltaVista the first searchable, full‐
text database of a large part of the World
Wide Web
• The “Google of its day”
History of Search Engine
(After 2000)
2000: Baidu
2009: Bing
http://www.searchenginehistory.com/
Yahoo Search ‐ Directory
• Site directories were one of the first methods used to
facilitate access to information resources on the network
• Links to these resources were grouped by topic
• As the number of sites in the Yahoo directory inexorably
increased, the developers of Yahoo made the directory
searchable. Of course, it was not a search engine in its true
form, because searching was limited to those resources whose
listings had been put into the directory
Web Search Until 1998
• In the past, search engine results were often filled
with irrelevant content, spam, and other kinds of
malicious material
• Because ranking was based on “on‐page factors”
– Keyword tag spam produced poor-quality search
results (ordering)
• Results were also influenced heavily by marketers
with big budgets
Google PageRank
Page and Brin proposed to
compute the absolute quality of a
page (PageRank) based on the
number and quality of pages
linking to a page (votes)
Link:www.ust.hk
Link:http://www.ust.hk/page.html
Link:www.ust.hk –site:www.ust.hk
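The voting idea behind PageRank can be illustrated with a small power-iteration sketch (the three-page link graph is hypothetical, and the damping factor 0.85 is the commonly cited default rather than a value from these slides):

```python
# Tiny PageRank by power iteration over a hypothetical 3-page link graph:
# A links to B and C, B links to C, C links back to A.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
d = 0.85                                  # damping factor
rank = {p: 1 / len(links) for p in links}  # start with uniform rank

for _ in range(50):  # iterate until the ranks stabilise
    new = {p: (1 - d) / len(links) for p in links}
    for page, outs in links.items():
        # Each page splits its rank (its "votes") among its out-links.
        for out in outs:
            new[out] += d * rank[page] / len(outs)
    rank = new

# C receives votes from both A and B, so it ends up ranked highest.
print(max(rank, key=rank.get))  # -> 'C'
```

This is the sense in which PageRank measures the "number and quality" of incoming links: a vote from a highly ranked page is worth more than one from an obscure page.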
Whole Internet on servers
• Google has created its own copied version of the Internet. The
site has scanned and ranked several billion Web pages — the
figure increases daily — and stored them on its own
computers, known as servers
• The pages are organized by subject and every time you click
“Search” hundreds of thousands of computers get to work
collecting different document links and returning those links
to your screen in the blink of an eye
• Search is sped up because Google stores three copies
of all its previous searches, so it doesn’t have to scan the
entire Web if two people ask the same question
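The reuse of previous searches described above is, in essence, a query cache; a minimal sketch, where the lookup function is a hypothetical stand-in for scanning the index:

```python
calls = 0  # counts how often the "real" lookup actually runs

def expensive_lookup(query):
    # Hypothetical stand-in for scanning the full index.
    global calls
    calls += 1
    return [query + "-result"]

cache = {}

def search(query):
    # Serve repeated queries from the cache instead of re-searching.
    if query not in cache:
        cache[query] = expensive_lookup(query)
    return cache[query]

print(search("hkust"))  # first time: runs expensive_lookup
print(search("hkust"))  # repeat query: answered from the cache
```

Since popular queries repeat constantly, even this simple memoization avoids most of the expensive work.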
Search Engine – Political Analogy
• AltaVista – anarchy
• Yahoo (web directory) – planned economy
• Google – people’s power
http://www.archive.org/web/web.php
http://web.archive.org/web/19961017235908/http://www2.yahoo.com/
Database of Intention