
Volume 4 Issue 4 April 2012

Editor's Desk

Federate to Aggregate or Meta Search to Segregate?


Discovery services are looking to federated search engines (FSEs) to aggregate search results from several e-resources of one's choice. True, no one would like to repeatedly execute searches on different sources. But we often forget that the merged set of search results comes at the cost of some information (such as which sources yielded a particular record). Further, combined results are better than the segregated result sets usually provided by meta search engines only if there is effective and efficient de-duplication of the resultant set, and the combined results need to be ranked effectively after merging and de-duplication. Do we really have FSEs with the capability to do quality de-duplication and re-ranking of merged results? Let us examine just these two aspects to better appreciate the end result of FSEs.

De-duplication: As in a typical clustering process, exact replicas can be eliminated easily, but defining and eliminating duplicates based on a similarity measure is somewhat tricky; after all, de-duplication is not clustering. Variations between items occur not only intentionally, at the hands of the content creator, but also due to errors and the varying formats of content hosts. It is a general experience that even popular search engines are hardly able to auto-remove duplicates accurately. Thus, handling partial duplicates, managing the large number of results from the Web, and the copyright issues around branded and proprietary material from commercial publishers' databases are the challenges faced in aggregation.

Ranking of Hits: Ranking of Web sites in terms of Web Impact Factor (WIF) or any other criterion is a matter of routine, similar to the citation ranking model. Ranking of hits is a different case altogether; Google's PageRank technique in particular has grown monstrous, with hundreds of factors that confuse end-users. Whatever the criteria adopted and however complex the formula developed, it is the simple sort by date (latest reference first) that emerges as the default (or at least as one of the few options) ranking criterion for scholarly publications. Since the ranking of hits has been made so complicated, the majority of end-users shy away from knowing and understanding what happens in the background. When each search engine has its own formula and tools to rank hits, one can imagine how much more complicated it becomes for an FSE to rank a merged set of heterogeneously ranked results.
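To make the de-duplication challenge above concrete, here is a minimal sketch in Python of similarity-based duplicate removal over a merged result set. The record fields (title, source), the use of difflib's SequenceMatcher, and the 0.9 threshold are illustrative assumptions for this example, not features of any particular FSE.

```python
# A minimal, illustrative sketch of similarity-based de-duplication.
# Record fields and the 0.9 threshold are assumptions for this example.
from difflib import SequenceMatcher

def normalize(title):
    """Reduce trivial formatting differences before comparison."""
    return " ".join(title.lower().split())

def is_duplicate(a, b, threshold=0.9):
    """Exact replicas score 1.0; the tricky part is choosing a
    threshold that separates true duplicates from merely similar records."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio() >= threshold

def dedupe(records):
    """Keep the first occurrence of each approximate duplicate, but
    remember every source that supplied it -- the very information
    that is otherwise lost when results are merged."""
    kept = []
    for rec in records:
        for seen in kept:
            if is_duplicate(rec["title"], seen["title"]):
                seen["sources"].append(rec["source"])
                break
        else:
            kept.append({**rec, "sources": [rec["source"]]})
    return kept
```

Even this toy version compares every pair of results and says nothing about partial duplicates or proprietary records, which hints at why production-quality de-duplication at Web scale is so hard.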

The easiest solution, as mentioned earlier, is to rank the merged set of results by date of publication (latest reference first). The next option is to retain the ranks assigned by the individual search engines/e-resources and present the merged results interleaved in a series like A1, B1, C1 and so on. This kind of presentation is justified since ranks are not uniformly decided by the constituent search engines and sources. A third option, given the difficulty of freshly ranking the result sets, is to give varied weightages to results from different search engines/sources: for example, if A is given double the importance of B and C, the series becomes A1, A2, B1, C1, A3, A4, B2, C2, and so on (a rough sketch of both interleaving schemes follows). Many variations in weightage are possible, but weighting has to be done carefully, and only when one is aware of the significance of each source.
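As an illustration of the second and third options, here is a minimal Python sketch of round-robin interleaving with an optional per-source weight; the source names, hit lists, and weights are invented for the example.

```python
# A rough sketch of interleaved merging that preserves each source's
# own ranking. Sources, hits, and weights below are invented examples.
def interleave(ranked_lists, weights=None):
    """Merge per-source ranked lists round-robin; a source with
    weight w contributes its next w hits in each round."""
    weights = weights or {name: 1 for name in ranked_lists}
    queues = {name: list(hits) for name, hits in ranked_lists.items()}
    merged = []
    while any(queues.values()):
        for name in queues:
            take = weights.get(name, 1)
            merged.extend(queues[name][:take])   # next `take` hits from source
            del queues[name][:take]
    return merged

results = {"A": ["A1", "A2", "A3", "A4"],
           "B": ["B1", "B2"],
           "C": ["C1", "C2"]}
print(interleave(results))
# ['A1', 'B1', 'C1', 'A2', 'B2', 'C2', 'A3', 'A4']
print(interleave(results, weights={"A": 2, "B": 1, "C": 1}))
# ['A1', 'A2', 'B1', 'C1', 'A3', 'A4', 'B2', 'C2'] -- the weighted series above
```

Note that neither scheme re-scores anything; both simply trust each source's own ordering, which is exactly why the choice of weights matters.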

It would not be correct, however, to judge whether to go for aggregation or segregation based only on de-duplication and ranking of hits. When a user has to search across many dissimilar sources and databases, such as OPACs, journal article databases, patents and so on, federation is still the appropriate option. Unfortunately, as with advanced search options in search engines, very few end-users bother to check how effective the de-duplication and ranking in a search result really are. Of course, an ideal FSE is one which not only allows one to pick and choose among various e-resources/databases for aggregation, but also provides an option to de-aggregate the result set (i.e., to get back to segregated results).

Happy Searching!

M S Sridhar
sridhar@informindia.co.in

We are happy to acknowledge a significant increase in the number of our newsletter readers and a regular online stream of responses to the Quiz we introduced recently. We request Quiz winners who scored 100% in the first attempt to forward the result-declaring e-mail from us to prasad@informindia.co.in, along with their address and phone number, so that the gift can be arranged. We are also introducing a new column of case studies on searching J-Gate and invite our end-users to share their experiences of searching (not more than 200 words, please). The case studies that are accepted and published will be suitably rewarded with gifts.

We use a list of e-mail IDs to trigger alert messages when a new issue of the newsletter is released. Those who wish to add their e-mail IDs to the list may send them to Sumitha@informindia.co.in or Jayashree@informindia.co.in.
