
Modern Information Retrieval

Parallel and Distributed IR

信息检索实验室 (Information Retrieval Laboratory)
Outline

- Parallel Computing
- Performance Measures
- Parallel IR (MIMD)
- Distributed IR

Parallel Computing

- Parallel computing is the simultaneous application of multiple processors to solve a single problem, where each processor works on a different part of the problem.

[Figure: a problem divided into parts 1, 2, and 3, each part assigned to a different processor (Processor 1, Processor 2, Processor 3)]

Four classes of parallel architecture (Flynn's taxonomy):

- SISD: single instruction stream, single data stream
- SIMD: single instruction stream, multiple data stream
- MISD: multiple instruction stream, single data stream
- MIMD: multiple instruction stream, multiple data stream

MIMD machines range from tightly coupled (shared memory) to loosely coupled (message passing).
Performance Measures

Speedup:

    S = (running time of best available sequential algorithm) / (running time of parallel algorithm)

The perfect speedup is S = N, where N is the number of processors.

We cannot achieve the perfect speedup because of:

- problem partitioning
- parallel architecture
- inherently sequential components
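The effect of an inherently sequential component can be quantified by Amdahl's law, which bounds the speedup S given the sequential fraction f of the work; a minimal sketch (the fraction values are illustrative):

```python
# Amdahl's law: with a fraction f of the work inherently sequential,
# the speedup on N processors is bounded by S = 1 / (f + (1 - f) / N).

def speedup(f: float, n: int) -> float:
    """Upper bound on speedup for sequential fraction f on n processors."""
    return 1.0 / (f + (1.0 - f) / n)

# Even a 5% sequential component caps 16 processors well below 16x:
print(round(speedup(0.05, 16), 2))  # 9.14
print(speedup(0.0, 16))             # perfect speedup: 16.0
```

This is why the perfect speedup S = N is unattainable in practice: f > 0 for any real retrieval system.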

Parallel IR

- Two directions:
  - Develop new retrieval strategies that directly lend themselves to parallel implementation (e.g., neural networks)
  - Adapt existing, well-studied information retrieval algorithms to parallel processing

[Figure: multitasking model — a broker receives user queries, forwards each query to several search engines, and collects their results]

Design concerns:
- response time
- resource balance

Parallel IR

[Figure: a broker mediates between the user and multiple parallel search engines, passing the user query to each engine and merging their results]
Parallel IR — Data Partitioning
K1 K2 … Ki … Kt
D1 W1,1 W2,1 … Wi,1 … Wt,1
D2 W1,2 W2,2 … Wi,2 … Wt,2
… … … … … … …
Dj W1,j W2,j … Wi,j … Wt,j
… … … … … … …
Dn W1,n W2,n … Wi,n … Wt,n

Each document is a vector d_j = (w_{1,j}, ..., w_{t,j}), the query is a vector q = (w_{1,q}, ..., w_{t,q}), and retrieval computes the similarity sim(d_j, q).
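With the vector representation above, sim(d_j, q) is commonly computed as the cosine of the angle between the two weight vectors; a minimal sketch:

```python
import math

def cosine_sim(d: list[float], q: list[float]) -> float:
    """Cosine similarity between a document vector and a query vector."""
    dot = sum(wd * wq for wd, wq in zip(d, q))
    norm_d = math.sqrt(sum(w * w for w in d))
    norm_q = math.sqrt(sum(w * w for w in q))
    if norm_d == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_d * norm_q)

print(cosine_sim([3.0, 4.0], [3.0, 4.0]))  # identical vectors: 1.0
print(cosine_sim([1.0, 0.0], [0.0, 1.0]))  # orthogonal vectors: 0.0
```

Computing sim(d_j, q) over all documents is the work that data partitioning divides among processors.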

Partitioning Methods:

- Document partitioning
  - Logical document partitioning
  - Physical document partitioning
- Term partitioning

Parallel IR — Data Partitioning

Logical Document Partitioning

[Figure: the inverted list for term i is divided among processors P0, P1, and P2]

Indexing:
- Partition the documents among the processors
- Run a separate indexing process on each processor in parallel
- Merge all inverted lists into the final inverted file
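The three indexing steps above can be sketched as follows; the multiprocessing pool and the tiny corpus are illustrative choices, not part of the original slides:

```python
from collections import defaultdict
from multiprocessing import Pool

def index_partition(docs):
    """Build an inverted list {term: [doc_id, ...]} for one document partition."""
    inverted = defaultdict(list)
    for doc_id, text in docs:
        for term in set(text.split()):
            inverted[term].append(doc_id)
    return dict(inverted)

def merge_indexes(partial_indexes):
    """Merge per-partition inverted lists into the final inverted file."""
    final = defaultdict(list)
    for partial in partial_indexes:
        for term, postings in partial.items():
            final[term].extend(postings)
    return {term: sorted(postings) for term, postings in final.items()}

if __name__ == "__main__":
    docs = [(0, "parallel information retrieval"), (1, "distributed retrieval"),
            (2, "parallel computing"), (3, "information systems")]
    partitions = [docs[0::2], docs[1::2]]      # step 1: partition the documents
    with Pool(2) as pool:                      # step 2: index partitions in parallel
        partials = pool.map(index_partition, partitions)
    index = merge_indexes(partials)            # step 3: merge into the final index
    print(index["retrieval"])                  # [0, 1]
```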

Parallel IR — Data Partitioning

Physical Document Partitioning

[Figure: the document set is physically split into sub-collections, each with its own inverted file]

Result merging:
- Global term statistics
- Two-phase approach
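One reading of the two bullets above: in a first phase the partitions pool their local term statistics into global ones, and in a second phase every partition scores its documents with those shared statistics, so scores are comparable across partitions. A hypothetical sketch (the data structures are illustrative):

```python
import math

def global_stats(partition_stats):
    """Phase 1: combine per-partition (local_df, n_docs) into global statistics."""
    total_docs = sum(n for _, n in partition_stats)
    df = {}
    for local_df, _ in partition_stats:
        for term, count in local_df.items():
            df[term] = df.get(term, 0) + count
    return df, total_docs

def idf(term, df, total_docs):
    """Phase 2: each partition weights terms with the *global* idf."""
    return math.log(total_docs / df[term]) if df.get(term) else 0.0

# Two partitions of 4 documents each report their local document frequencies:
parts = [({"ir": 2, "web": 1}, 4), ({"ir": 1}, 4)]
df, n = global_stats(parts)
print(df["ir"], n)  # 3 8
```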

Parallel IR — Data Partitioning

Term Partitioning

[Figure: the inverted file is partitioned by term ranges (e.g., terms 1 to 1000 on one processor), over the shared document set]

Document partitioning vs. term partitioning:

- Architecture
- Performance

Distributed IR

- Distributed IR is very similar to MIMD, but:
  - The communication channel is a network protocol
  - Document partitioning is the best choice
- Engineering issues:
  - Search protocol
  - Search server
  - Request broker
- Algorithmic issues:
  - How to distribute documents across the distributed search servers
  - How to select which server should receive a particular search request
  - How to combine the results from the different servers

Distributed IR

- Collection Partitioning
  - Replicate the collection across all of the search servers
    - Each server indexes its replica of the documents
    - Each server indexes a subset of the documents, and all sub-indexes are merged into the final index file at each search server
  - Randomly distribute the documents (as search engines do)
  - Explicitly partition the documents by semantics
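Random distribution is often implemented by hashing each document identifier to a server; a minimal sketch (the choice of MD5 is an assumption for illustration, not from the slides):

```python
import hashlib

def assign_server(doc_id: str, n_servers: int) -> int:
    """Map a document to a server deterministically by hashing its identifier."""
    digest = hashlib.md5(doc_id.encode()).hexdigest()
    return int(digest, 16) % n_servers

# Documents spread roughly evenly across the servers:
counts = [0, 0, 0]
for i in range(3000):
    counts[assign_server(f"doc-{i}", 3)] += 1
print(counts)  # roughly [1000, 1000, 1000]
```

The hash makes each server's sub-collection a statistically similar sample of the whole, which is why random distribution pairs well with broadcasting queries.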

Distributed IR

- Source Selection
  - Always broadcast the query to each search server
  - Use a cosine similarity measure
    - Block technique
  - Build a content model of each collection by training
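Similarity-based source selection can be sketched as scoring each server's collection content model against the query and keeping the top-scoring servers; the term-frequency content models below are illustrative, not from the slides:

```python
import math

def score_collection(query_terms, content_model):
    """Cosine-style score of a query against a collection's term statistics."""
    dot = sum(content_model.get(t, 0.0) for t in query_terms)
    norm = math.sqrt(sum(w * w for w in content_model.values()))
    return dot / norm if norm else 0.0

def select_sources(query_terms, models, k=2):
    """Return the ids of the k collections whose content models best match the query."""
    ranked = sorted(models,
                    key=lambda cid: score_collection(query_terms, models[cid]),
                    reverse=True)
    return ranked[:k]

models = {
    "news":   {"election": 5.0, "sports": 1.0},
    "sports": {"football": 6.0, "sports": 4.0},
    "tech":   {"parallel": 3.0, "retrieval": 2.0},
}
print(select_sources(["parallel", "retrieval"], models, k=1))  # ['tech']
```

Broadcasting to every server avoids this selection step entirely, at the cost of load on servers that hold no relevant documents.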

Distributed IR

- Query Processing
  - Select collections to search
  - Distribute the query to the selected collections
  - Evaluate the query at the distributed collections in parallel
  - Combine the results from the distributed collections into the final result
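The four query-processing steps above can be sketched as a broker loop; the thread pool and the server interface are assumptions for illustration:

```python
from concurrent.futures import ThreadPoolExecutor

def process_query(query, servers, select, merge):
    """Broker: select servers, dispatch the query in parallel, merge the results."""
    chosen = select(query, servers)                           # 1. select collections
    with ThreadPoolExecutor(max_workers=len(chosen)) as pool:
        futures = [pool.submit(s.search, query) for s in chosen]  # 2-3. distribute & evaluate
        partials = [f.result() for f in futures]
    return merge(partials)                                    # 4. combine results

class FakeServer:
    """Stand-in for a remote search server returning (doc_id, score) hits."""
    def __init__(self, hits):
        self.hits = hits
    def search(self, query):
        return self.hits

servers = [FakeServer([("d1", 0.9)]), FakeServer([("d2", 0.7)])]
merged = process_query("ir", servers,
                       select=lambda q, s: s,   # trivial selection: broadcast to all
                       merge=lambda parts: sorted((h for p in parts for h in p),
                                                  key=lambda x: -x[1]))
print(merged)  # [('d1', 0.9), ('d2', 0.7)]
```

In a real system `search` would be a network call, and `merge` is where the ranking problem below arises.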

- Ranking
  - Merge the ranked hit-lists returned by each search server
    - Global term statistics
    - Two-phase search
  - Reranking
    - Weight document scores by the collection similarity computed during the source-selection step
    - Requires term statistics and reranking

    w = 1 + |C| · (s − s̄) / s̄

where:
- |C| — number of collections searched
- s — the collection's score
- s̄ — the mean of the collection scores
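The reranking weight above can be applied directly to rescale each server's document scores; a small sketch (the example selection scores are illustrative):

```python
def collection_weight(s: float, scores: list[float]) -> float:
    """w = 1 + |C| * (s - s_mean) / s_mean for one collection's score s."""
    s_mean = sum(scores) / len(scores)
    return 1 + len(scores) * (s - s_mean) / s_mean

# Three collections with source-selection scores 1.0, 2.0, 3.0 (mean 2.0):
scores = [1.0, 2.0, 3.0]
print(collection_weight(3.0, scores))  # 2.5  -> above-average collection boosted
print(collection_weight(1.0, scores))  # -0.5 -> below-average collection penalized
```

Each document's final score would then be its server-local score multiplied by its collection's weight, pushing hits from well-matching collections up the merged ranking.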
