What Every CIO Needs to Know About Enterprise Search and Search Engines

Introduction
Search engines are omnipresent. Every business application or Web site that contains a significant amount of content will have one. From embedded search engines to sophisticated, enterprise-scale platforms, a small number of established principles can guide organizations to the effective use of search. This brief white paper gives CIOs and other senior decision makers an overview of five key issues that affect the success of search applications.

About Search Technologies


Search Technologies is the world's largest independent provider of search engine expertise, consulting and implementation services. We serve more than 200 corporate and government customers, and maintain expertise in a range of leading search products, including Microsoft FAST and SharePoint, the Google Search Appliance, Autonomy, Attivio, Exalead, LucidWorks, NXT / Folio and open source alternatives such as Solr/Lucene.

#1 Data Size & Findability


Summary
- Recognize that finding information is proportionally harder as the data set grows.
- Just because a search engine is capable of indexing tens of millions of documents and serving search results, it doesn't mean that people will find what they need.

Details
Think of needles in haystacks: your needle would be twice as hard to find if your haystack were double the size. This principle applies fully to search. Back in the late 1990s, many large companies were implementing a new groupware tool called Lotus Notes. It had an embedded search engine, and the overwhelming majority of initial implementations used it. At the time, it was seen as functional and competent. Four years later, the majority of Lotus Notes systems had implemented an add-on search product. What changed? As more data came to reside in Lotus Notes, it became harder to find anything. User satisfaction declined, and pressure built to implement a new search solution. The embedded engine had some performance issues too.

The purple PowerPoint slide shows how one of the add-on vendors depicted the issue at the time. (Corporate color schemes have improved since.) Their proposed solution, "better relevancy ranking makes search more accurate", was rather oversold, as it still is today. However, their observation that users browse only a very small percentage of documents before becoming frustrated or giving up was profound, and it remains pertinent. Did these add-ons help findability? Not necessarily. Systems such as Verity, RetrievalWare and an early version of Autonomy addressed the performance issues, but findability didn't always improve, for reasons discussed in the following sections. For Lotus Notes, substitute SharePoint, Documentum, a corporate intranet or a content-rich Web site: the principle of data size vs. findability remains exactly the same.
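The haystack principle can be put into rough numbers. This is a deliberately simplified model, not data from any vendor. It assumes that a fixed, hypothetical fraction of documents (`match_rate`) matches a typical query, and that a user looks at only the first `results_viewed` results before giving up. The model ignores ranking. It describes the case where ranking gives the wanted document no special help, so the share of matches the user reviews is roughly the chance of seeing that document.

```python
def matches_for_query(corpus_size: int, match_rate: float) -> int:
    """Documents matched by a typical query, assuming a constant (hypothetical) match rate."""
    return max(1, round(corpus_size * match_rate))


def fraction_reviewed(corpus_size: int, match_rate: float, results_viewed: int = 10) -> float:
    """Share of the matching documents that a user actually looks at."""
    return min(1.0, results_viewed / matches_for_query(corpus_size, match_rate))


# Hypothetical figures: 0.1% of documents match a query; the user reads 10 results.
for size in (10_000, 100_000, 200_000, 10_000_000):
    share = fraction_reviewed(size, match_rate=0.001)
    print(f"{size:>10,} documents: user reviews {share:.1%} of the matches")
```

Under these assumptions, doubling the collection from 100,000 to 200,000 documents halves the share reviewed, from 10% to 5%.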

#2 The Subjectivity of Search Relevancy


Summary
- Relevancy is in the eye of the beholder.
- If you're relying primarily on "don't worry, our relevance ranking is great", then you're probably going to be disappointed.
- The nature of your data is very influential on relevancy, in ways that cannot usually be addressed by technology alone.
- If your search application is important, you will benefit from tuning relevancy to your specific circumstances.
- Use the personalization of search results with care. Indiscriminate use causes more problems than it solves.

Details
The need for tuning
Today, most leading search vendors provide good (and broadly similar) relevance ranking functionality, along with the ability to customize it to suit the circumstances. The need for tuning can be illustrated by comparing a patent search application with a financial trading system. To a lawyer doing research for a client, a 15-year-old patent may be just as relevant as a 15-day-old one. A city trader, by contrast, may view anything more than a day old as unimportant. The same relevancy setup is unlikely to suit both applications, because one should take far more account of document age than the other.

Automated personalization
Within some search applications, results personalization technology is very useful. Sadly, automated personalization is also proposed (to buyers who want to believe in simplicity and total automation) for a wider range of search applications, where it does not belong.
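The patent-versus-trading contrast from the tuning discussion can be made concrete. One common way to give document age more weight in one application than another is to multiply a decay factor into the text-matching score. The sketch below assumes an exponential decay with a tunable half-life. The scores, ages and document names are hypothetical. Real engines expose their own vendor-specific boosting mechanisms, and this is not any particular vendor's ranking formula.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Hit:
    doc_id: str
    text_score: float  # relevance from text matching alone (hypothetical values)
    age_days: float


def recency_factor(age_days: float, half_life_days: Optional[float]) -> float:
    """Exponential decay: a document loses half its weight every half_life_days.
    None means document age is ignored entirely."""
    if half_life_days is None:
        return 1.0
    return 0.5 ** (age_days / half_life_days)


def rank(hits: list[Hit], half_life_days: Optional[float]) -> list[str]:
    """Order hits by text score multiplied by the recency factor."""
    ordered = sorted(
        hits,
        key=lambda h: h.text_score * recency_factor(h.age_days, half_life_days),
        reverse=True,
    )
    return [h.doc_id for h in ordered]


hits = [
    Hit("patent-15-years", text_score=0.9, age_days=15 * 365),
    Hit("filing-15-days", text_score=0.8, age_days=15),
    Hit("news-this-morning", text_score=0.6, age_days=0.5),
]

# Patent research profile: age is irrelevant, so text matching alone decides.
print(rank(hits, half_life_days=None))
# Trading desk profile: anything older than about a day fades quickly.
print(rank(hits, half_life_days=1.0))
```

The same three documents come back in opposite orders. The patent profile puts `patent-15-years` first, and the trading profile puts `news-this-morning` first. This is the kind of difference a single, untuned relevancy setup cannot serve.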

Personalization techniques can be based on the individual's previous behaviour, on data volunteered by the individual, or on the behaviour of the user's peer group. Search applications in which this works well (based on peer group behaviour, in these examples) include:
- E-commerce: a "people who bought this also bought that" approach can be used to boost certain search results.
- Customer support portals: promoting the documents that, according to embedded analytics, were liked by others who searched for the same thing.

Yet the personalization of search results can also be problematic. It relies on the assumption that a person's informational needs remain constant. Sometimes this is the case; mostly it is not. The author recently observed this effect on eBay, having bought a used bicycle for a teenager. For the next few weeks, eBay mistakenly promoted other used bicycles, even though no further bike purchases were planned. Even where users do have constancy of task, they may not appreciate search results being skewed by their previous behaviour. It annoys the typical knowledge worker as much as it helps, because an expert in a subject will typically wish to study multiple aspects of that subject. Perhaps consider promoting personalized results in a side-column widget. However, if automated personalization is not obviously right for your application, it probably isn't. For general applications such as enterprise search, personalization of results is not a silver bullet.
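Following the side-column suggestion above, a "people who bought this also bought that" signal can be kept out of the main result list entirely. The sketch assumes purchase history is available as simple baskets of item names (hypothetical data) and counts how often items were bought together. It only produces suggestions for a separate widget and never reorders the engine's own results.

```python
from collections import Counter
from itertools import permutations


def co_purchase_counts(baskets: list[list[str]]) -> dict[str, Counter]:
    """For each item, count how often every other item was bought with it."""
    counts: dict[str, Counter] = {}
    for basket in baskets:
        for a, b in permutations(set(basket), 2):
            counts.setdefault(a, Counter())[b] += 1
    return counts


def side_column(main_results: list[str], counts: dict[str, Counter], top_n: int = 3) -> list[str]:
    """Suggestions for a separate widget; main_results is never modified."""
    scores: Counter = Counter()
    for item in main_results:
        scores.update(counts.get(item, Counter()))
    for item in main_results:
        scores.pop(item, None)  # don't suggest what is already on screen
    # Highest count first; ties broken alphabetically so the output is stable.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]


baskets = [
    ["bike", "helmet"],
    ["bike", "helmet", "lock"],
    ["bike", "helmet", "pump"],
    ["bike", "lock"],
    ["tent", "stove"],
]
counts = co_purchase_counts(baskets)
results = ["bike"]  # what the search engine returned, shown unchanged
print(results, side_column(results, counts))
```

Keeping the suggestions in their own list means a shopper whose needs have moved on, like the bicycle buyer above, can simply ignore them.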

#3 The Importance of Search Navigators


Summary
- Almost all important search applications can benefit from search navigators.
- Search navigators are now a well-established approach, from Amazon and eBay down to departmental intranet applications.
- They provide a generally applicable solution to the data growth vs. findability issue.
- It is important to understand why they work, and under what circumstances they work best.

Details
Depending on the vendor, search navigators are also called dynamic navigators, faceted search, dynamic navigation, guided navigation, etc.

Data set reduction
Search navigators reduce the data set from which search results are being served, one easy click at a time. As Lotus Notes systems demonstrated a decade ago, and countless others have done since, search is easier if the data set is small. A separate white paper discusses in detail why search navigators are so effective.

Self-personalization of results
Thanks to Amazon, eBay and many others, search users are now familiar with the search navigator paradigm. They understand what will happen when they click a link with a little number next to it. This is the general solution to personalizing search results: a self-service solution in which the user is in control. Because users are familiar with search navigators, deploying them is a much safer bet than relying on inference-based personalization.
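The data set reduction described above can be shown in a few lines. This is a minimal in-memory sketch with hypothetical documents and field names. A real engine computes these counts inside its index, but the principle is similar. The code counts the values of a metadata field across the current results, which gives the "little numbers" next to each link. It then narrows the results when the user clicks a value.

```python
from collections import Counter

# Hypothetical result set for a query; each document carries metadata fields.
results = [
    {"id": 1, "type": "policy", "department": "HR"},
    {"id": 2, "type": "policy", "department": "Finance"},
    {"id": 3, "type": "form", "department": "HR"},
    {"id": 4, "type": "report", "department": "Finance"},
    {"id": 5, "type": "policy", "department": "HR"},
]


def facet_counts(docs: list[dict], field: str) -> Counter:
    """The numbers shown next to each navigator link for `field`."""
    return Counter(d[field] for d in docs if field in d)


def refine(docs: list[dict], field: str, value: str) -> list[dict]:
    """One click on a navigator link: keep only documents with that value."""
    return [d for d in docs if d.get(field) == value]


print(facet_counts(results, "department"))   # navigator before any click
step1 = refine(results, "department", "HR")  # user clicks "HR (3)"
print(facet_counts(step1, "type"))           # navigator recomputed on the smaller set
step2 = refine(step1, "type", "policy")      # user clicks "policy (2)"
print(len(results), "->", len(step1), "->", len(step2))
```

Each click shrinks the haystack, here from five documents to three and then to two, and the user chooses the direction rather than an inference engine.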

The importance of accuracy
If you present search navigators to users, it is important that they are accurate; otherwise users won't trust them and will stop using them. Then you're back to square one. This is a metadata quality issue.
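Navigator accuracy depends on metadata, so it can pay to measure a field before exposing it as a navigator. The sketch below is a hypothetical audit that assumes an agreed list of allowed values already exists for the field. It reports documents with no value, which no navigator link will ever reach. It also reports values outside the list, such as spelling variants that split one facet into two.

```python
from collections import Counter

# Hypothetical controlled vocabulary for a "department" navigator.
ALLOWED_DEPARTMENTS = {"Finance", "HR", "Legal"}


def audit_field(docs: list[dict], field: str, allowed: set[str]) -> dict:
    """Estimate how trustworthy a navigator built on `field` would be."""
    missing = [d["id"] for d in docs if not d.get(field)]
    invalid = Counter(d[field] for d in docs if d.get(field) and d[field] not in allowed)
    usable = len(docs) - len(missing) - sum(invalid.values())
    return {
        "missing": missing,               # documents the navigator cannot reach
        "invalid_values": dict(invalid),  # variants that would show as extra facets
        "coverage": usable / len(docs) if docs else 0.0,
    }


docs = [
    {"id": 1, "department": "HR"},
    {"id": 2, "department": "Human Resources"},
    {"id": 3, "department": ""},
    {"id": 4, "department": "Finance"},
]
print(audit_field(docs, "department", ALLOWED_DEPARTMENTS))
```

In this toy data only half of the documents would be correctly reachable through the navigator, which is the kind of result that should be fixed before users see it.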

#4 Don't Neglect Data Quality


Summary
- Data quality impacts all aspects of search, including relevancy ranking and search navigator accuracy.
- The more diverse your data is, the more attention you need to pay to it, to ensure that the data plays well together behind a single search box.
- Almost all of the important functions in a modern search engine rely on good data structure.

Details
Data quality greatly influences search engine effectiveness. Poor data is the leading cause of poor search systems.

Dirty data
Dirty data causes false positives in search results, which annoy users. Issues include unwanted headers and footers on documents, menu structures that should not have been indexed with the document, unhelpful comments associated with a document, etc.

Wrong granularity
We recently worked with a customer whose entire intellectual property was contained in fifty enormous documents. They were so big that for most searches, you'd get matches on all fifty. Their search users needed a system in which content was indexed in sensibly searchable chunks. In this case, there were 70,000 chunks in the deployed (and highly successful) search application. Automated techniques were applied to the data prior to indexing to split it into smaller units. For some search applications it can also be beneficial to join documents together into virtual documents. An example is gathering all known data about an individual's skills, qualifications and project experience into a single record for search purposes, even though the original data is kept in a variety of different business systems.

Metadata quality
If you're lucky, all of your data will already have great metadata that is accurate and complete. Unfortunately, in most situations it is neither. Great metadata requires processes and people to support the technology. To be specific, don't assume that technology alone will provide you with the quality of metadata you need. If vendors tell you otherwise, ask them to prove it using your data set. Every organization has a unique data set. You need to start by understanding your data and the type and quality of metadata you need to drive a successful search application [1]. Only then should you consider the technology options. You wouldn't knowingly allow poor data to pollute your SAP system. Search typically deals with unstructured data, but that doesn't mean you can neglect data quality.
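Three of the pre-indexing fixes described under dirty data and wrong granularity can be sketched as follows: removing page furniture, splitting oversized documents, and joining scattered records into virtual documents. Everything here is an illustrative assumption. The boilerplate patterns, the `## ` section marker used to find split points, and the `person_id` join key are hypothetical. Real projects derive such rules from their own data.

```python
import re
from collections import defaultdict

# Hypothetical page furniture that should never be indexed with the content.
BOILERPLATE = [
    re.compile(r"^Page \d+ of \d+$"),
    re.compile(r"^Home \| Products \| Contact$"),
]


def strip_boilerplate(lines: list[str]) -> list[str]:
    """Dirty data: drop lines that match a known header, footer or menu pattern."""
    return [ln for ln in lines if not any(p.match(ln.strip()) for p in BOILERPLATE)]


def split_into_chunks(doc_id: str, text: str, marker: str = "## ") -> list[dict]:
    """Too coarse: produce one searchable unit per section.
    Assumes each section starts with a line beginning with `marker`."""
    chunks: list[dict] = []
    title, body = "", []

    def flush() -> None:
        content = "\n".join(body).strip()
        if title or content:
            chunks.append({"id": f"{doc_id}#{len(chunks)}", "title": title, "text": content})

    for line in strip_boilerplate(text.splitlines()):
        if line.startswith(marker):
            flush()
            title, body = line[len(marker):].strip(), []
        else:
            body.append(line)
    flush()
    return chunks


def build_virtual_records(*sources: list[dict]) -> dict[str, dict[str, list]]:
    """Too scattered: merge rows about one person from several systems.
    Assumes every row carries a shared `person_id` key."""
    merged: dict = defaultdict(lambda: defaultdict(list))
    for rows in sources:
        for row in rows:
            for field, value in row.items():
                if field != "person_id":
                    merged[row["person_id"]][field].append(value)
    return {pid: dict(fields) for pid, fields in merged.items()}


manual = "Home | Products | Contact\n## Scope\nApplies to model A.\nPage 1 of 2\n## Claims\nClaim one."
for chunk in split_into_chunks("manual-07", manual):
    print(chunk)

hr = [{"person_id": "e42", "qualification": "Chartered Engineer"}]
projects = [{"person_id": "e42", "project": "Bridge A"}, {"person_id": "e42", "project": "Tunnel B"}]
skills = [{"person_id": "e42", "skill": "geotechnics"}]
print(build_virtual_records(hr, projects, skills))
```

Splitting and joining are two directions of the same decision: choosing the unit that a user actually wants to see as one search result.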

[1] Search Technologies provides a fixed-price Assessment Service for this purpose.

#5 Search Engines Need Maintenance


Summary
- We don't just mean software patches and telephone support.
- Data evolves, and this will affect data structure and cleanliness, search relevancy and navigator accuracy.
- It doesn't have to be hard work to maintain optimum search system efficiency, but you do need a plan.

Details
Implementing search will never be a one-time exercise, because both data sets and user needs constantly evolve. The addition of a new data set to an enterprise search system, if done carelessly, can adversely affect the user experience by clogging results with the new data (which may not be relevant for everybody). A change of platform for an important data set (upgrading to a new CMS, for example) will affect the nature of the data being sent to the search engine, and this in turn will change search results, not necessarily for the better. The good news is that established best practices exist for maintaining great search applications. These don't need to be very time-consuming or expensive. But they do need to be planned.
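One common maintenance practice is a small relevancy regression test. This uses a fixed set of benchmark queries, each paired with a document users expect to see near the top, and re-runs them after every significant change such as a new data set or a CMS migration. The sketch assumes such judgments exist. The query strings, document IDs and the two toy search functions are hypothetical stand-ins for a real engine.

```python
from typing import Callable

SearchFn = Callable[[str], list[str]]


def queries_passing(search: SearchFn, judgments: dict[str, str], k: int = 10) -> set[str]:
    """Benchmark queries whose expected document appears in the top k results."""
    return {q for q, expected in judgments.items() if expected in search(q)[:k]}


def regressions(before: SearchFn, after: SearchFn, judgments: dict[str, str], k: int = 10) -> list[str]:
    """Queries that passed before a change but fail after it."""
    return sorted(queries_passing(before, judgments, k) - queries_passing(after, judgments, k))


# Hypothetical benchmark: query -> document users expect to see near the top.
judgments = {"expense policy": "hr-017", "vpn setup": "it-204"}

# Toy stand-ins for the engine before and after a new data set is added.
old_index = {"expense policy": ["hr-017", "fin-003"], "vpn setup": ["it-204"]}
new_index = {"expense policy": ["news-1", "news-2", "hr-017"], "vpn setup": ["it-204", "news-9"]}


def search_before(query: str) -> list[str]:
    return old_index.get(query, [])


def search_after(query: str) -> list[str]:
    return new_index.get(query, [])


print(regressions(search_before, search_after, judgments, k=2))
```

Here the new data set pushes the expected expense policy out of the top two results. The check flags that query while leaving the unaffected one alone, which gives a cheap, repeatable signal that the addition was not done carefully enough.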

-------------------------

Search Technologies Corporation 590 Herndon Parkway, Suite 375 Herndon, VA 20170 T: +1 703 953 2791
jback@searchtechnologies.com

Search Technologies Limited Kingswick House Sunninghill, Berkshire T: +44 1344 292 292
gcharlesworth@searchtechnologies.com

www.searchtechnologies.com
