Ten Forces That Flattened the World (Thomas Friedman)
• The fall of the Berlin Wall on November 9, 1989
• Netscape IPO on August 9, 1995
• Software with compatible interfaces and file formats
• Open Source
• Outsourcing
• Offshoring
• Supply Chaining
• Insourcing
• In‐forming
• The steroids (technological development)
Definition of search engine
A software system designed to search
for information on the World Wide Web.
The search results are generally presented in
a line of results often referred to as search
engine results pages (SERPs).
The information may be a mix of web pages,
images, and other types of files.
How Search Engine Works
Three parts of search engine
• 1st part: An automated web browser (spider) follows
every link on a website, collects and analyzes each
page, and parses its words
• 2nd part: The search engine then uses a proprietary
algorithm to create an index (or catalog) so that
meaningful results are returned for each query
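The crawl-and-index steps above can be sketched as a toy inverted index in Python (the pages and words here are hypothetical; real engines use far more sophisticated parsing, ranking signals, and storage):

```python
from collections import defaultdict

# Toy corpus standing in for crawled pages (hypothetical data).
pages = {
    "page1.html": "search engines index the web",
    "page2.html": "the web grows every day",
}

# Build an inverted index: word -> set of pages containing it.
index = defaultdict(set)
for url, text in pages.items():
    for word in text.lower().split():
        index[word].add(url)

print(sorted(index["web"]))  # -> ['page1.html', 'page2.html']
```

Answering a query then becomes a lookup in `index` rather than a scan over every page, which is what makes the catalog worth building.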
3rd part of a search engine (Query
processing)
• Query processor
– A program that receives your search request, compares it
to the entries in the index, and returns results to you
– Manages the relevance and ranking
• “On search” by Tim Bray http://www.tbray.org/ongoing/When/200x/2003/07/30/OnSearchTOC
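The compare-and-return step of the query processor can be sketched as an AND-query over a hand-built index (the postings and term frequencies below are made up; real relevance ranking combines many more signals):

```python
# Minimal query processor over a hypothetical inverted index:
# term -> {page: term frequency}.
index = {
    "search": {"a.html": 3, "b.html": 1},
    "engine": {"a.html": 2, "c.html": 5},
}

def query(terms):
    # Keep only pages containing every term (AND semantics).
    results = None
    for term in terms:
        postings = set(index.get(term, {}))
        results = postings if results is None else results & postings
    results = results or set()
    # Rank by summed term frequency -- a crude stand-in for relevance.
    return sorted(results, key=lambda p: -sum(index[t][p] for t in terms))

print(query(["search", "engine"]))  # -> ['a.html']
```

The sort at the end is where the "relevance and ranking" responsibility of the query processor lives, even in this miniature form.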
How Query Works
Search Engine Market Share*
• Google: 74.41%
• Baidu: 11.74%
• Bing: 8.05%
• Yahoo!: 3.42%
• Yandex: 1.24%
• Ask: 0.53%
• DuckDuckGo: 0.28%
• Naver: 0.12%
• AOL: 0.06%
https://netmarketshare.com/search-engine-market-share.aspx?
What are people searching?
Google Trends
– https://trends.google.com/trends/
Search Engine ‐ Early Days
• In the early days of Internet development, its users
were a privileged minority and the amount of
available information was relatively small
• Access was mainly restricted to employees of various
universities and laboratories who used it to access
scientific information
• In those days, the problem of finding information on
the Internet was not nearly as critical as it is now
History of Search Engine (1990‐1994)*
AliWeb
• AliWeb (Archie Like Indexing for the WEB)
• It allowed users to submit the pages they
wanted indexed, along with their own page
descriptions
• This meant it needed no bot to collect data
and did not use excessive bandwidth. The
downside of AliWeb was that many people did
not know how to submit their sites.
WebCrawler
• Developed by Brian Pinkerton at the University
of Washington; launched on April 20, 1994
• It was the first crawler to index entire
pages
• AOL eventually purchased WebCrawler and
ran it on their network.
History of Search Engine (1995‐1998)
AltaVista
• Developed by DEC (Digital Equipment Corp)
• Indexed many web pages and provided efficient
search
• AltaVista also provided numerous search tips
and advanced search features.
• This made AltaVista the first searchable, full‐
text database of a large part of the World
Wide Web
• The “Google of its day”
History of Search Engine
(After 2000)
2000: Baidu
2009: Bing
http://www.searchenginehistory.com/
Yahoo Search ‐ Directory
• Site directories were one of the first methods used to
facilitate access to information resources on the network
• Links to these resources were grouped by topic
• As the number of sites in the Yahoo directory inexorably
increased, the developers of Yahoo made the directory
searchable. Of course, it was not a search engine in its true
form, because searching was limited to those resources whose
listings had been put into the directory
Web Search Until 1998
• In the past, search engine results were often filled
with irrelevant content, spam, and other kinds of
malicious material
• Because ranking was based on “on‐page factors”
– Keyword tag spam produced poor-quality search
results (ordering)
• Results were also influenced heavily by marketers
with big budgets
Google PageRank
Page and Brin proposed to
compute the absolute quality of a
page (PageRank) based on the
number and quality of pages
linking to a page (votes)
Link:www.ust.hk
Link:http://www.ust.hk/page.html
Link:www.ust.hk –site:www.ust.hk
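The voting idea behind PageRank can be illustrated with a small power-iteration sketch (the three-page link graph is hypothetical, and the damping factor 0.85 is the commonly cited default rather than a value from these slides):

```python
# Tiny PageRank by power iteration over a hypothetical 3-page link graph:
# A links to B and C, B links to C, C links back to A.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
d = 0.85                                  # damping factor
rank = {p: 1 / len(links) for p in links}  # start with uniform rank

for _ in range(50):  # iterate until the ranks stabilise
    new = {p: (1 - d) / len(links) for p in links}
    for page, outs in links.items():
        # Each page splits its rank (its "votes") among its out-links.
        for out in outs:
            new[out] += d * rank[page] / len(outs)
    rank = new

# C receives votes from both A and B, so it ends up ranked highest.
print(max(rank, key=rank.get))  # -> 'C'
```

This is the sense in which PageRank measures the "number and quality" of incoming links: a vote from a highly ranked page is worth more than one from an obscure page.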
Whole Internet on servers
• Google has created its own copied version of the Internet. The
site has scanned and ranked several billion Web pages — the
figure increases daily — and stored them on its own
computers, known as servers
• The pages are organized by subject and every time you click
“Search” hundreds of thousands of computers get to work
collecting different document links and returning those links
to your screen in the blink of an eye
• Search is sped up because Google stores three copies
of all its previous searches, so it doesn’t have to scan the
entire Web if two people ask the same question
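The reuse of previous searches described above is, in essence, a query cache; a minimal sketch, where the lookup function is a hypothetical stand-in for scanning the index:

```python
calls = 0  # counts how often the "real" lookup actually runs

def expensive_lookup(query):
    # Hypothetical stand-in for scanning the full index.
    global calls
    calls += 1
    return [query + "-result"]

cache = {}

def search(query):
    # Serve repeated queries from the cache instead of re-searching.
    if query not in cache:
        cache[query] = expensive_lookup(query)
    return cache[query]

print(search("hkust"))  # first time: runs expensive_lookup
print(search("hkust"))  # repeat query: answered from the cache
```

Since popular queries repeat constantly, even this simple memoization avoids most of the expensive work.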
Search Engine – Political Analogy
• AltaVista – anarchy
• Yahoo (web directory) – planned economy
• Google – people’s power
http://www.archive.org/web/web.php
http://web.archive.org/web/19961017235908/http://www2.yahoo.com/
Database of Intention