Добро пожаловать в Scribd!

Web Crawlers

Загружено:

0% нашли этот документ полезным (0 голосов)

34 просмотров7 страниц

Web crawlers Crawl or visit web pages and download them. How good / bad, efficient a crawler is Mainly depends on crawling policies used Crawl policies Selection policy Re-visit policy Politeness policy Parallelization policy Selection policy +Pageranks +Path ascending +Freshness +Age Politeness +So that crawlers donPt overload web servers +Set a delay between GET requests Parallelization +Distributed web crawling +To maximize download rate Nu

Исходное описание:

Авторское право

Доступные форматы

PPT, PDF, TXT или читайте онлайн в Scribd

Поделиться этим документом

Поделиться или встроить документ

Параметры публикации

Этот документ был вам полезен?

Это неприемлемый материал?

Пожаловаться на этот документ

Авторское право:

Attribution Non-Commercial (BY-NC)

Доступные форматы

Скачайте в формате PPT, PDF, TXT или читайте онлайн в Scribd

Отметить как неприемлемый контент

0% нашли этот документ полезным (0 голосов)

34 просмотров7 страниц

Web Crawlers

Загружено:

Anil Yellumahanthi

Авторское право:

Attribution Non-Commercial (BY-NC)

Доступные форматы

Скачайте в формате PPT, PDF, TXT или читайте онлайн в Scribd

Отметить как неприемлемый контент

Перейти к странице

Вы находитесь на странице: 1из 7

Поиск в документе

Web Crawlers

Nutch

Agenda
What are web crawlers Main policies in crawling Nutch Nutch architecture

Web crawlers
Crawl or visit web pages and download them Starting from one page determine which page(s) to go to next This is where we know how good/bad, efficient a crawler is Mainly depends on crawling policies used

Crawl policies
Selection policy Re-visit policy Politeness policy Parallelization policy Selection policy
Pageranks Path ascending Focused crawling

Re-visit policy
Freshness Age

Politeness
So that crawlers dont overload web servers Set a delay between GET requests

Parallelization
Distributed web crawling To maximize download rate

Nutch
Is a Open Source web crawler Nutch Web Search Application
Maintain DB of pages and links Pages have scores, assigned by analysis Fetches high-scoring, out-of-date pages Distributed search front end Based on Lucene

http://lucene.apache.org/nutch/

Вам также может понравиться

The Subtle Art of Not Giving a F*ck: A Counterintuitive Approach to Living a Good Life
От Everand
The Subtle Art of Not Giving a F*ck: A Counterintuitive Approach to Living a Good Life
Mark Manson
Рейтинг: 4 из 5 звезд
4/5 (5794)
Purchasing Methods
Документ5 страниц
Purchasing Methods
Anil Yellumahanthi
Оценок пока нет
The Yellow House: A Memoir (2019 National Book Award Winner)
От Everand
The Yellow House: A Memoir (2019 National Book Award Winner)
Sarah M. Broom
Рейтинг: 4 из 5 звезд
4/5 (98)
Constitutional Amendments
Документ12 страниц
Constitutional Amendments
Anil Yellumahanthi
Оценок пока нет
A Heartbreaking Work Of Staggering Genius: A Memoir Based on a True Story
От Everand
A Heartbreaking Work Of Staggering Genius: A Memoir Based on a True Story
Dave Eggers
Рейтинг: 3.5 из 5 звезд
3.5/5 (231)
Purchase Manual
Документ57 страниц
Purchase Manual
Anil Yellumahanthi
Оценок пока нет
Hidden Figures: The American Dream and the Untold Story of the Black Women Mathematicians Who Helped Win the Space Race
От Everand
Hidden Figures: The American Dream and the Untold Story of the Black Women Mathematicians Who Helped Win the Space Race
Margot Lee Shetterly
Рейтинг: 4 из 5 звезд
4/5 (895)
AutoCAD For Beginners
Документ82 страницы
AutoCAD For Beginners
dalaalstreet09
Оценок пока нет
The Little Book of Hygge: Danish Secrets to Happy Living
От Everand
The Little Book of Hygge: Danish Secrets to Happy Living
Meik Wiking
Рейтинг: 3.5 из 5 звезд
3.5/5 (400)
All Lessons
Документ21 страница
All Lessons
Er Harish Kumar
Оценок пока нет
Shoe Dog: A Memoir by the Creator of Nike
От Everand
Shoe Dog: A Memoir by the Creator of Nike
Phil Knight
Рейтинг: 4.5 из 5 звезд
4.5/5 (537)
UML (Unified Modeling Langauage)
Документ17 страниц
UML (Unified Modeling Langauage)
Prince Balu
Оценок пока нет
Never Split the Difference: Negotiating As If Your Life Depended On It
От Everand
Never Split the Difference: Negotiating As If Your Life Depended On It
Chris Voss
Рейтинг: 4.5 из 5 звезд
4.5/5 (838)
Elon Musk: Tesla, SpaceX, and the Quest for a Fantastic Future
От Everand
Elon Musk: Tesla, SpaceX, and the Quest for a Fantastic Future
Ashlee Vance
Рейтинг: 4.5 из 5 звезд
4.5/5 (474)
Grit: The Power of Passion and Perseverance
От Everand
Grit: The Power of Passion and Perseverance
Angela Duckworth
Рейтинг: 4 из 5 звезд
4/5 (588)
Yes Please
От Everand
Yes Please
Amy Poehler
Рейтинг: 4 из 5 звезд
4/5 (1891)
The Emperor of All Maladies: A Biography of Cancer
От Everand
The Emperor of All Maladies: A Biography of Cancer
Siddhartha Mukherjee
Рейтинг: 4.5 из 5 звезд
4.5/5 (271)
On Fire: The (Burning) Case for a Green New Deal
От Everand
On Fire: The (Burning) Case for a Green New Deal
Naomi Klein
Рейтинг: 4 из 5 звезд
4/5 (74)
Team of Rivals: The Political Genius of Abraham Lincoln
От Everand
Team of Rivals: The Political Genius of Abraham Lincoln
Doris Kearns Goodwin
Рейтинг: 4.5 из 5 звезд
4.5/5 (234)
Devil in the Grove: Thurgood Marshall, the Groveland Boys, and the Dawn of a New America
От Everand
Devil in the Grove: Thurgood Marshall, the Groveland Boys, and the Dawn of a New America
Gilbert King
Рейтинг: 4.5 из 5 звезд
4.5/5 (266)
The Hard Thing About Hard Things: Building a Business When There Are No Easy Answers
От Everand
The Hard Thing About Hard Things: Building a Business When There Are No Easy Answers
Ben Horowitz
Рейтинг: 4.5 из 5 звезд
4.5/5 (344)
Rise of ISIS: A Threat We Can't Ignore
От Everand
Rise of ISIS: A Threat We Can't Ignore
Jay Sekulow
Рейтинг: 3.5 из 5 звезд
3.5/5 (137)
The World Is Flat 3.0: A Brief History of the Twenty-first Century
От Everand
The World Is Flat 3.0: A Brief History of the Twenty-first Century
Thomas L. Friedman
Рейтинг: 3.5 из 5 звезд
3.5/5 (2259)
Fear: Trump in the White House
От Everand
Fear: Trump in the White House
Bob Woodward
Рейтинг: 3.5 из 5 звезд
3.5/5 (738)
The Gifts of Imperfection: Let Go of Who You Think You're Supposed to Be and Embrace Who You Are
От Everand
The Gifts of Imperfection: Let Go of Who You Think You're Supposed to Be and Embrace Who You Are
Brené Brown
Рейтинг: 4 из 5 звезд
4/5 (1090)
Principles: Life and Work
От Everand
Principles: Life and Work
Ray Dalio
Рейтинг: 4 из 5 звезд
4/5 (599)
John Adams
От Everand
John Adams
David McCullough
Рейтинг: 4.5 из 5 звезд
4.5/5 (2409)
The Unwinding: An Inner History of the New America
От Everand
The Unwinding: An Inner History of the New America
George Packer
Рейтинг: 4 из 5 звезд
4/5 (45)
The Glass Castle: A Memoir
От Everand
The Glass Castle: A Memoir
Jeannette Walls
Рейтинг: 4.5 из 5 звезд
4.5/5 (1712)
Angela's Ashes: A Memoir
От Everand
Angela's Ashes: A Memoir
Frank McCourt
Рейтинг: 4.5 из 5 звезд
4.5/5 (440)
Steve Jobs
От Everand
Steve Jobs
Walter Isaacson
Рейтинг: 4.5 из 5 звезд
4.5/5 (806)
Bad Feminist: Essays
От Everand
Bad Feminist: Essays
Roxane Gay
Рейтинг: 4 из 5 звезд
4/5 (1015)
The Outsider: A Novel
От Everand
The Outsider: A Novel
Stephen King
Рейтинг: 4 из 5 звезд
4/5 (1839)
The Light Between Oceans: A Novel
От Everand
The Light Between Oceans: A Novel
M.L. Stedman
Рейтинг: 4.5 из 5 звезд
4.5/5 (789)
The Sympathizer: A Novel (Pulitzer Prize for Fiction)
От Everand
The Sympathizer: A Novel (Pulitzer Prize for Fiction)
Viet Thanh Nguyen
Рейтинг: 4.5 из 5 звезд
4.5/5 (121)
Brooklyn: A Novel
От Everand
Brooklyn: A Novel
Colm Toibin
Рейтинг: 3.5 из 5 звезд
3.5/5 (1937)
The Woman in Cabin 10
От Everand
The Woman in Cabin 10
Ruth Ware
Рейтинг: 3.5 из 5 звезд
3.5/5 (2322)
A Man Called Ove: A Novel
От Everand
A Man Called Ove: A Novel
Fredrik Backman
Рейтинг: 4.5 из 5 звезд
4.5/5 (4609)
The Perks of Being a Wallflower
От Everand
The Perks of Being a Wallflower
Stephen Chbosky
Рейтинг: 4.5 из 5 звезд
4.5/5 (2103)
Wolf Hall: A Novel
От Everand
Wolf Hall: A Novel
Hilary Mantel
Рейтинг: 4 из 5 звезд
4/5 (3811)
Little Women
От Everand
Little Women
Louisa May Alcott
Рейтинг: 4 из 5 звезд
4/5 (104)
Manhattan Beach: A Novel
От Everand
Manhattan Beach: A Novel
Jennifer Egan
Рейтинг: 3.5 из 5 звезд
3.5/5 (792)
The Art of Racing in the Rain: A Novel
От Everand
The Art of Racing in the Rain: A Novel
Garth Stein
Рейтинг: 4 из 5 звезд
4/5 (4200)
The Constant Gardener: A Novel
От Everand
The Constant Gardener: A Novel
John le Carré
Рейтинг: 3.5 из 5 звезд
3.5/5 (104)
A Tree Grows in Brooklyn
От Everand
A Tree Grows in Brooklyn
Betty Smith
Рейтинг: 4.5 из 5 звезд
4.5/5 (1929)
Her Body and Other Parties: Stories
От Everand
Her Body and Other Parties: Stories
Carmen Maria Machado
Рейтинг: 4 из 5 звезд
4/5 (821)
Sing, Unburied, Sing: A Novel
От Everand
Sing, Unburied, Sing: A Novel
Jesmyn Ward
Рейтинг: 4 из 5 звезд
4/5 (1103)