Application
Marc Krellenstein
CTO
Lucid Imagination
Abstract
If you think you need a search application, there are some useful first steps to take:
validating that full-text search is the right technology; producing sets of ideal results you’d
like to return for a range of queries; considering the value of supplementing a basic search
results list with document clustering; and producing more specific requirements and
investigating technology options.
There are many ways that people come to the conclusion that they need a search
application, and a variety of ways in which they can then proceed. Here are some of the
steps that users often consider (or maybe should consider) before they’re ready to start
building an application.
such as pictures, video, and audio. The unstructured text you’re searching will usually exist
together with structured “fields” or related pieces of information—the price of a product,
the date of an article. But what makes full-text search useful is that some of the data you are
looking through is unstructured.
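As a rough illustration of unstructured text living alongside structured fields, the sketch below indexes a free-text body and filters on a price field. The document layout, the field names (body, price) and the helper functions are all assumptions made for this example. A real search engine does far more (ranking, stemming, phrase matching).

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Map each term in the unstructured body to the IDs of documents containing it."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for term in tokenize(doc["body"]):
            index[term].add(doc_id)
    return index

def search(index, docs, query, max_price=None):
    """Return IDs of documents whose body contains every query term,
    optionally restricted by the structured price field."""
    terms = tokenize(query)
    if not terms:
        return []
    matches = set.intersection(*(index.get(t, set()) for t in terms))
    if max_price is not None:
        matches = {d for d in matches if docs[d]["price"] <= max_price}
    return sorted(matches)

# Hypothetical product records: free text plus one structured field.
docs = {
    "p1": {"body": "Compact digital camera with zoom lens", "price": 199.0},
    "p2": {"body": "Professional camera body, lens sold separately", "price": 899.0},
    "p3": {"body": "Tripod for any camera", "price": 45.0},
}
index = build_index(docs)

text_only = search(index, docs, "camera lens")
text_and_field = search(index, docs, "camera lens", max_price=500)
```

Here the text query alone matches p1 and p2, and adding the price limit narrows the result to p1.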
Full-text search is good at a variety of information requests that can be hard to satisfy with
other technologies. These include:
• Finding the most relevant information about a specific topic, or an answer to a
particular question,
• Locating a specific document or content item, and
• Exploring information in a general area, or even browsing the collection of
documents or other content as a whole (this is often supported by clustering; see
below).
Full-text search is useful even if you have a way of navigating to a particular item of
interest. Early on, Yahoo! pioneered the idea of a directory of websites, allowing users to
Because this is often difficult, many application developers tend to skip this step, preferring
just to build a system and improve it over time to get better results. People may also skip
this step because they assume that ideal results are unattainable. While that is often true,
it’s not always the case. If you don’t at least start with what you’d ideally like to find, you
may have already made an unnecessary compromise, and you can’t really design for
effective relevancy ranking if you don’t have a goal.
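One way to turn a set of ideal results into a working goal is to score a prototype against it. The sketch below uses precision at k, a common measure that the text above does not prescribe. The queries, document IDs and function names are hypothetical.

```python
def precision_at_k(returned, ideal, k):
    """Fraction of the top-k returned document IDs that appear in the ideal set."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = returned[:k]
    hits = sum(1 for doc_id in top_k if doc_id in ideal)
    return hits / k

def evaluate(results_by_query, ideal_by_query, k):
    """Score every query that has a hand-built ideal result set."""
    return {
        query: precision_at_k(results_by_query.get(query, []), ideal, k)
        for query, ideal in ideal_by_query.items()
    }

# Hypothetical ideal results, written down before building anything.
ideal_results = {
    "solr faceting": {"doc7", "doc2", "doc9"},
    "index tuning": {"doc4", "doc1"},
}

# Hypothetical output of a first prototype for the same queries.
prototype_results = {
    "solr faceting": ["doc7", "doc3", "doc2", "doc8"],
    "index tuning": ["doc5", "doc4", "doc1", "doc6"],
}

scores = evaluate(prototype_results, ideal_results, k=2)
```

With k=2, each query scores 0.5: one of the top two results is in the ideal set. Re-running the same evaluation after each ranking change shows whether the system is moving toward the goal.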
Determining what you most want to get in light of the user’s query will also give you
important clues about what technology you need and how to build the system. Matching
results to a query is usually a combination of finding the results most similar to what the
user is asking while also favoring results which may be inherently “better” by some
measure in a particular context, for example, more authoritative, more recent, longer (or
shorter).
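The combination described above can be sketched as a weighted blend of a similarity score and query-independent quality signals. The weights, the one-year recency half-life and the field names below are assumptions chosen for illustration, not recommended values. Real engines expose their own boosting mechanisms.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Result:
    doc_id: str
    similarity: float  # assumed 0..1 text-match score from the search engine
    authority: float   # assumed 0..1 measure of how authoritative the source is
    published: date

def recency(published, today, half_life_days=365.0):
    """Decay from 1.0 (published today) toward 0, halving every half_life_days."""
    age_days = max((today - published).days, 0)
    return 0.5 ** (age_days / half_life_days)

def blended_score(result, today, w_sim=0.7, w_auth=0.2, w_recent=0.1):
    """Weighted sum of similarity to the query and inherent quality signals."""
    return (
        w_sim * result.similarity
        + w_auth * result.authority
        + w_recent * recency(result.published, today)
    )

def rank(results, today):
    """Order results from highest to lowest blended score."""
    return sorted(results, key=lambda r: blended_score(r, today), reverse=True)

today = date(2010, 1, 1)
results = [
    Result("a", similarity=0.80, authority=0.2, published=date(2005, 1, 1)),
    Result("b", similarity=0.75, authority=0.9, published=date(2009, 12, 1)),
]
ranked_ids = [r.doc_id for r in rank(results, today)]
```

In this example, document b outranks document a despite a slightly lower similarity score, because it is both more authoritative and more recent. Which signals matter, and how heavily, depends on the ideal results you identified for your own queries.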
To determine the best results and give you some insight into the system you’ll need to
build, you may want to consider some or all of the following:
Clustering Results
There are many ways in which a basic search application can be supplemented beyond
simply providing an ordered list of results. One common extension is the clustering of
search results into categories of related results. This is useful when even the best
relevancy-ranked list of documents will not suffice to give users what they want. This may be because
the query is too general to identify what the user is really interested in or because the user
is not interested in something specific but is just exploring an area of interest. In other
cases you may be interested in discovering certain common themes in the results, or that a
particular portion of the content has relevance.
In all these cases, supplementing a best-effort relevancy ranking by also presenting groups
or clusters of documents organized around a specific topic or common characteristic can
give a user greater visibility and ease of navigation over the full set of results. In these cases,
choosing from a set of categories organized by a common subject, date, price range, or
other attribute will often be faster for getting the user to what they want than walking a
long results list from beginning to end.
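A minimal sketch of grouping a result list by a shared attribute follows. It covers only the simplest case, grouping on fields the documents already carry (facet-style), and does not discover topics from the text. The result records, the price buckets and the function names are hypothetical.

```python
from collections import Counter, defaultdict

def price_range(price):
    """Assign a price to one of three illustrative buckets."""
    if price < 50:
        return "under 50"
    if price < 200:
        return "50 to 199"
    return "200 and up"

def group_results(results, key_fn):
    """Group result IDs by the category key_fn assigns, keeping ranked order within each group."""
    groups = defaultdict(list)
    for result in results:
        groups[key_fn(result)].append(result["id"])
    return dict(groups)

def category_counts(results, key_fn):
    """Count how many results fall into each category."""
    return Counter(key_fn(result) for result in results)

# Hypothetical ranked results for a single query.
results = [
    {"id": "d1", "subject": "cameras", "price": 120},
    {"id": "d2", "subject": "cameras", "price": 35},
    {"id": "d3", "subject": "lenses", "price": 450},
    {"id": "d4", "subject": "cameras", "price": 210},
]

by_subject = group_results(results, lambda r: r["subject"])
by_price = group_results(results, lambda r: price_range(r["price"]))
subject_counts = category_counts(results, lambda r: r["subject"])
```

Showing the counts next to each category (here, 3 results under cameras and 1 under lenses) lets a user jump straight to a group instead of scanning the whole list.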
All of these are points to include in your review of technologies, in an RFP, and/or in
discussion with technology providers. Things like cost, support, performance, scalability,
• “Faceted search with Solr,” by Yonik Seeley. Discusses the faceting (a form of
clustering) available in Solr:
http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Faceted-Search-Solr