17 December 2007 Non-electronic documents and calculators are authorized. Name : Semester :
Exercise 1 : Definitions
Define the following terms : tokenization, permuterm index, champion list.
Assuming the best dictionary reduction rate achievable with linguistic preprocessing (stop-word removal, stemming), what is the size of the dictionary (number of terms) ?
Consider an index where the average length of a non-positional posting list is 200. Estimate the total number of postings in this index.
How many bytes would you allocate, without compression, to encode a dictionary term ? And a non-positional posting ?
What are the sizes (in megabytes or gigabytes) of the resulting dictionary and posting lists ?
If you compress your dictionary using the dictionary-as-a-string method, what is the new size of the dictionary ?
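The size comparison asked for here can be sketched as simple arithmetic. This is only an illustration under stated assumptions: the function names are hypothetical, the byte widths (20 bytes per fixed-width term, 4 bytes for a document frequency, 4 bytes for a postings pointer, 8 characters per term on average, 1 byte per character) are assumed defaults rather than values given in the exam, and the term count in the usage lines is made up.

```python
def fixed_width_dictionary_bytes(num_terms, term_bytes=20,
                                 freq_bytes=4, postings_ptr_bytes=4):
    """Size of an uncompressed dictionary stored as fixed-width entries."""
    return num_terms * (term_bytes + freq_bytes + postings_ptr_bytes)


def dictionary_as_string_bytes(num_terms, avg_term_chars=8,
                               freq_bytes=4, postings_ptr_bytes=4):
    """Size when all terms are concatenated into one long string.

    Each entry keeps a frequency, a postings pointer and a pointer
    into the string; the string itself is counted once.
    """
    string_bytes = num_terms * avg_term_chars  # assumes 1 byte per character
    # Bits needed to address any position in the string, rounded up to bytes.
    ptr_bits = max(1, (string_bytes - 1).bit_length())
    term_ptr_bytes = (ptr_bits + 7) // 8
    per_entry = freq_bytes + postings_ptr_bytes + term_ptr_bytes
    return num_terms * per_entry + string_bytes


# Hypothetical dictionary of 400,000 terms (not a figure from the exam).
print(fixed_width_dictionary_bytes(400_000) / 1e6, "MB")
print(dictionary_as_string_bytes(400_000) / 1e6, "MB")
```

The term pointer width is derived from the string length instead of being hard-coded, so the estimate adapts when the assumed term count or average term length changes.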
rigorous
placement
Which posting list can be decoded from the variable byte code 10001001 00000001 10000010 11111111 ?
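A minimal sketch of variable byte decoding may help to check such an answer by hand. It assumes the common textbook convention in which the high bit of a byte is set to 1 on the last byte of each gap and the remaining 7 bits carry the payload; the exam does not state its convention, and the function names are hypothetical. The input in the usage lines is a separate illustrative byte sequence, not the one from the question.

```python
from itertools import accumulate


def vb_decode(bits):
    """Decode a space-separated string of 8-bit groups into a list of gaps."""
    gaps = []
    n = 0
    for byte in bits.split():
        value = int(byte, 2)
        if value < 128:
            # High bit 0: more bytes follow for this gap.
            n = 128 * n + value
        else:
            # High bit 1: last byte of the gap, strip the marker bit.
            n = 128 * n + (value - 128)
            gaps.append(n)
            n = 0
    return gaps


def gaps_to_doc_ids(gaps):
    """Turn a list of gaps into absolute document identifiers."""
    return list(accumulate(gaps))


# Illustrative input (assumed, not taken from the exam).
example = "00000110 10111000 10000101 00001101 00001100 10110001"
gaps = vb_decode(example)
print(gaps, gaps_to_doc_ids(gaps))
```

Decoding yields gaps first; the posting list itself is obtained by summing the gaps cumulatively, which is what `gaps_to_doc_ids` does.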
What would be the encoding of the same posting list using a γ-code ?
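If the code meant here is the Elias γ-code, each gap is written as a unary length followed by an offset, where the offset is the binary form of the gap without its leading 1. The sketch below assumes that reading and the convention of writing the unary part as a run of 1s closed by a 0; `gamma_encode` and `gamma_encode_gaps` are hypothetical names, and the gaps in the usage line are illustrative values, not the exam's.

```python
def gamma_encode(gap):
    """Return the gamma code of a positive integer as a bit string."""
    if gap < 1:
        raise ValueError("gamma codes are defined for integers >= 1")
    # bin(gap) looks like '0b1xxx'; drop '0b' and the leading 1 bit.
    offset = bin(gap)[3:]
    # Unary length of the offset: one '1' per offset bit, closed by '0'.
    length = "1" * len(offset) + "0"
    return length + offset


def gamma_encode_gaps(gaps):
    """Concatenate the gamma codes of a sequence of gaps."""
    return "".join(gamma_encode(g) for g in gaps)


# Illustrative gaps (assumed, not taken from the exam).
print(gamma_encode_gaps([1, 5, 13]))
```

Because γ-codes are bit-aligned and prefix-free, the codes of successive gaps are simply concatenated with no separator.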
Compute the vector representations of d1, d2 and d3 using the tf-idf_{t,d} weighting and Euclidean normalisation.
Give the ranking returned by the system for the query "movie trailer".
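The two steps above (weighting with normalisation, then ranking) can be sketched as follows. Several choices here are assumptions rather than the exam's specification: raw term frequency multiplied by log10(N/df) as the tf-idf weight, a query treated as a set of terms each weighted 1, and whitespace tokenization. The three documents are invented toy data, since the exam's collection is not reproduced in this text, and the function names are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Map each document name to a Euclidean-normalised tf-idf vector."""
    n = len(docs)
    counts = {name: Counter(text.split()) for name, text in docs.items()}
    # Document frequency: number of documents containing each term.
    df = Counter(term for c in counts.values() for term in c)
    vectors = {}
    for name, c in counts.items():
        weights = {t: tf * math.log10(n / df[t]) for t, tf in c.items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        vectors[name] = {t: (w / norm if norm else 0.0)
                         for t, w in weights.items()}
    return vectors


def rank(query, vectors):
    """Order documents by dot product with a query of unit-weight terms."""
    terms = query.split()
    scores = {name: sum(vec.get(t, 0.0) for t in terms)
              for name, vec in vectors.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy collection (hypothetical, not the exam's documents).
docs = {
    "d1": "movie trailer review",
    "d2": "movie review review",
    "d3": "car trailer park",
}
print(rank("movie trailer", tfidf_vectors(docs)))
```

Normalising the query vector as well would divide every score by the same constant, so it is omitted here without affecting the ranking.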
What workaround would you propose for this insertion ? Give an algorithm for inserting a key-value pair.