
Good Information Is Hard to Find:
Guidelines for Managers Considering Open Source Enterprise Search

A Lucid Imagination White Paper


Abstract
Enterprise search helps employees, customers, and partners find the most relevant and
timely information, enabling them to make smart, efficient decisions about doing business
with your company. Open source, which has provided strong benefits in enterprise software such
as operating systems, databases, and middleware, now unleashes value in enterprise
search. Lucid Imagination brings market-leading expertise to open source enterprise
search, and can help any organization quickly design and optimize search solutions based
on Lucene and Solr.

Good Information Is Hard to Find: Considering Open Source for Enterprise Search
A Lucid Imagination White Paper • April 2009 Page 1
Table of Contents
Abstract
Introduction and Overview
The Advantages of Open Source
    Lower Costs
    Pay at the Point of Value
    Transparent Development
    Re-tool the employees, retire the software
    Lower Overall Risk
About Lucid Imagination
Engagement Scenarios
    Considering Alternatives to Legacy Packaged Search Applications
    Building on In-house Lucene/Solr Expertise
Next Steps
Appendix: About Apache Lucene and Solr

Introduction and Overview
Raising the collective intelligence of company employees can make them smarter and more
efficient—but how do you enable them to keep up with the vast, ever-changing amount of
data your organization produces? Many operations seem to be better at creating data than
using it to operate more productively. Using search tools designed for the Web can make it
difficult to find relevant, timely corporate information, mostly because corporate data is
not much like Web data:
• Corporate data can be stored in a variety of different and unstructured formats,
including documents and database records.
• A document’s popularity is not necessarily what makes it useful to a specific search.
• Information may require controlled access, yet still be discoverable to those users
with the appropriate permissions.

Two state-of-the-art, open source search technologies—Lucene and Solr—are available for
free from the Apache Software Foundation. Lucene is a powerful search engine and library;
Solr provides a platform built on top of Lucene that makes it easy to build Lucene-based
applications.1 Rich, flexible text query tools and sophisticated ranking capabilities of
Lucene/Solr enable users to quickly find the most useful documents or records.
Either of these full-featured technologies delivers excellent performance, relevancy
ranking, and scalability. They are used today by thousands of organizations, powering
substantial and diverse search applications for AOL, CNET, Comcast Interactive Media, IBM,
Netflix, LinkedIn, MySpace, and many others. For these companies, Lucene/Solr solutions
regularly index and search hundreds of millions of documents with subsecond response
time, all without incurring any licensing fees.
These solutions excel at quickly and effectively searching large volumes of unstructured
text (documents or other records containing freeform text) and returning results based
on how well they match the user’s query. At most companies, this means digesting and
searching through dozens of different file formats (including documents, spreadsheets,
presentations, e-mail, and records stored in databases, to name just a few) and delivering
relevant results to authorized users. Incremental update capabilities mean that
Lucene/Solr searches can track document collections easily as they grow and change,
finding information nearly as fast as it is created.

1
Most organizations use Solr today as their search development platform. Because Lucene serves as the core of
Solr’s search capabilities, this paper refers to them as Lucene/Solr. For more information about these technologies,
see the Appendix.

Solr can speedily facet, or categorize, data and search results based on specific field values.
An excellent example of this function is Zappos.com, the popular shoe e-tailer, where users
can quickly refine searches based on product criteria such as price or features.
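As a rough illustration of what faceting computes, the following Python sketch counts how many records share each value of a chosen field and then narrows the result set to one value. The product records and field names are invented for this example, and the code does not reflect how Solr implements faceting internally; it only shows the idea of grouping results by field value.

```python
from collections import Counter

# Hypothetical product records; the field names are illustrative only.
PRODUCTS = [
    {"name": "Trail Runner", "brand": "Acme", "price_band": "under $50"},
    {"name": "City Loafer", "brand": "Acme", "price_band": "$50 to $100"},
    {"name": "Road Racer", "brand": "Zenith", "price_band": "under $50"},
]


def facet_counts(records, field):
    """Count how many records carry each value of the given field."""
    return Counter(r[field] for r in records if field in r)


def refine(records, field, value):
    """Keep only the records whose field equals the selected facet value."""
    return [r for r in records if r.get(field) == value]


if __name__ == "__main__":
    print(facet_counts(PRODUCTS, "brand"))
    print([r["name"] for r in refine(PRODUCTS, "price_band", "under $50")])
```

A shopper who sees the counts for each brand or price band can click one value, which corresponds to the `refine` step here.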
For most application development teams, building a search application is not an everyday
project. By definition, enterprise search technology processes unstructured data, which can
change frequently. Expert guidance on architectural considerations, such as index
optimization, result relevance, deployment configuration, and retrieval performance can
make a tremendous difference in deploying a successful solution. By taking advantage of
expert, experienced personnel to assist with application design, development, and
deployment, organizations can leverage the full benefit of Lucene/Solr search technologies
without the cost of licensing proprietary software.
For these reasons, Lucid Imagination provides commercial-grade support, training, and
professional consulting services that are essential to designing and installing successful
enterprise applications.
This paper is intended for business decision makers who are considering options for
powerful, flexible enterprise search solutions. It provides guidelines for understanding:
• Advantages of open source software, including ways it can lower costs and risks,
• Why Lucid Imagination’s service and support is a key ingredient in achieving successful
Lucene/Solr solutions,
• Engagement scenarios—the types of situations where Lucid Imagination can help, and
• The capabilities of Lucene/Solr, which are provided in an appendix.

The Advantages of Open Source
Open source has changed the IT landscape. Gartner says 85 percent of polled companies
are already using open source software, calling the use of open source software
“pervasive.”2 Most organizations are now familiar with free and open source products such
as Linux, MySQL, Apache, and SugarCRM, because of the many benefits, including:
• Lower costs
• Pay at the point of value
• Transparent development
• Control and flexibility – investing in people instead of licenses
• Lower overall risk

With Lucene/Solr’s broad, successful adoption across markets and deployments, these
advantages are now available for enterprise search applications. Let’s take a closer look at
how open source pays off.

Lower Costs
While proprietary software vendors must try to recover their development costs, this is not
the case with open source software, because it does not have capital costs associated with
source code IP. The cost of talent is less, too. Community development, adherence to
standards, and lower barriers to adoption all help increase the number of developers who
become proficient in the use of a product or technology. Together, these factors combine to
reduce upward pricing pressure.
The high license fees associated with proprietary and closed source development can
discourage developers and customers from adopting a product or technology. In contrast,
open source communities help lower costs by encouraging participation and allowing
anyone to download the source code and try it out. Most open source communities release
updated binaries on a periodic basis, so users can easily try the software on their own
timetables.
Many commercial solutions combine proprietary software with service and support, and
customers may believe that buying a software license is sufficient to get a search
application up and running. In most cases, however, the technology’s purchase price makes
up less than half of the implementation cost, with the balance going to services. Both open
source and proprietary software usually require a significant amount of customization,
which means some service and support costs are inevitable.

2
http://www.theregister.co.uk/2008/11/18/gartner_open_source/

Pay at the Point of Value


Open source project code is freely available for any use. If a company can become proficient
with the code, it can make productive use of the code at any phase from evaluation to
production. Only in those areas where an open source customer sees value—for support
and integration services, or for additional functionality or expertise—does money need to
be spent. There are no restrictions on when open source software can be used.
In contrast, proprietary products typically must be purchased before they can be used, or
in some cases, even evaluated. Some vendors offer evaluation or trial versions, but these
often have reduced functionality or restrictive licenses. Because the software must be
purchased before the customer can see any value from the product, return on investment is
delayed.

Transparent Development
Community-developed software enables everyone to see what is being built and which
features are included as early as possible. Developers and customers do not need to wait
for a vendor to publish a roadmap or product launch to know what is being readied for
release. As a result, prospective users can make better, faster, and more informed decisions
relating to their software infrastructure.
Compare this to proprietary software, where customers have little if any insight into
upcoming products until very late in the product life cycle. This is typically no sooner than
the software’s beta release, when it is too late to provide input on features and
functionality. This delays assessment and adoption of innovations.

Re-tool the employees, retire the software
In this tough economic climate, managers who own budgets need to review every expense
with a critical eye. Many software applications that made sense a few years back may have
out-lived their intended fit to business needs.
Any application development effort generates significant learning. The work of
development imbues in-house developers with deep knowledge
and understanding of the company, its IT infrastructure, culture, and usage requirements.
Given that software applications must keep up with an organization’s changing goals and
requirements as the needs of its market and constituents evolve, the expertise that the
technical staff develops becomes a vital competitive asset.
This is a key corollary benefit of the open source model: by retiring old software packages
and investing in staff expertise, companies combine innovative technology with their most
valuable asset – their people, establishing vital competitive advantage.
Companies that leverage savings from not purchasing software licenses to build
development talent in-house reduce the cost of addressing inevitable change. What’s more,
increasing a technical team’s ability to translate company business objectives into
technology solutions increases the likelihood that the software they build will continue to
fit that inevitable change. This is particularly true for an enterprise search solution. In
addition, compared to closed source implementations, in-house developers can work with
open source code and supplement additional functions or expertise by relying on the
community and marketplace of readily available resources, again capturing unique
competitive advantage.
Supplementing open source development with training, consulting, and reliable support
from established industry experts reinforces a company’s competitive advantage – with the
control and flexibility needed to survive and thrive.

Lower Overall Risk


Vendors use proprietary interfaces and components to lock in customers. However, the
source code for open source software is freely available and widely supported by the
community, based on standardized, free public interfaces. If a commercial vendor goes out
of business (or is purchased by another), or tries to increase fees for a commercial product,
open source vendors may be able to step in to meet the needs of customers at
market-competitive prices.
Open source software can reduce security and operational risks, too. Widely used open
source software is essentially under constant peer review. Technical or security issues,
once exposed in the community, are readily addressed, resulting in a safer and more
reliable product.

About Lucid Imagination


The benefits of open source have unlocked tremendous value in many software categories:
Red Hat’s Enterprise Linux in operating systems, MySQL in database software, Sugar in
CRM software. All have benefited from matching the efficiencies of open source with deep,
robust commercial resources to ensure successful applications. Today, Lucid Imagination’s
capabilities and expertise bring that same approach to unlocking enterprise search with
Lucene and Solr.
Lucid Imagination’s mission is to enable customers to achieve business objectives for
optimal search performance and accuracy, with lower total cost of ownership and faster
time to market. The company’s founding team consists of many key contributors and
committers to the Lucene/Solr project, as well as other experts in enterprise search
application development. Our skills, acquired across hundreds of deployments, including
best practices and technical know-how, can enhance and optimize any phase of an open
source search implementation.
Lucid Imagination’s team has a deep understanding of indexing, which is the foundation of
any search solution; it captures all the content and location of searched documents for
quick lookup, much as a book index does. We have broad experience indexing:
• Documents of widely varying sizes and formats within a very large collection,
• Documents with diverse metadata requirements, and
• Multilingual documents.
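The book-index comparison above can be made concrete with a toy inverted index. This is a minimal sketch under simplifying assumptions: the documents are invented, terms are split on whitespace, and a query matches only documents containing every term. Production engines such as Lucene store far more per term and apply text analysis and ranking, none of which is shown here.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercase term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every term in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


if __name__ == "__main__":
    # Hypothetical documents, keyed by an arbitrary id.
    docs = {
        1: "open source search",
        2: "enterprise search platform",
        3: "open source database",
    }
    index = build_index(docs)
    print(search(index, "open source"))
```

Looking up a term is a single dictionary access regardless of how many documents exist, which is why a lookup against an index is much faster than scanning every document.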

The team is also skilled at applying business rules such as boosting documents and fields,
indexing dates, or other attributes of terms and data. Lucid Imagination has developed best
practices for indexing and metadata management, and can help establish and refine
policies to meet business and technical search requirements, such as:
• How and when to add documents to an index,
• Removing documents from an index,
• Results relevancy and document/data findability,
• Undeleting documents, and
• Batch and real-time updates.

The Lucid Imagination team has extensive experience with large-scale search applications,
including engagements with:
• Large collections—more than one billion documents,
• High query volumes and large user populations,
• High document growth rates,
• Distributed indexing and searching,
• Replication and high availability, and
• Cloud environments.

In addition to fine-tuning search technology machinery, the Lucid Imagination team has
significant expertise in natural language processing, which optimizes the interaction of
compute resources with human-created content. Key considerations include:

• Developing structured methods for characterizing how well a set of results meets user
needs,
• Establishing a tradeoff between overall net gain in the quality of results across the whole
application, versus a single improvement for one query or user, and
• Improving the ability to find accurate answers by leveraging a balanced mix of content
analysis and query interpretation algorithms.

The breadth of expertise offered by Lucid is available in a variety of forms suited to a range
of different business needs and deployment requirements. This enables customers to
create even more powerful and successful search applications.

Engagement Scenarios
Virtually every company and organization uses some form of enterprise search, to help
customers, employees, and partners find the information they need. Many companies use
packaged commercial software applications; but, over time, their requirements evolve
beyond the original platform’s limitations. Also, licensing or customization costs may grow
too high, or the number and type of documents may expand beyond the original design’s
capacity. As companies evaluate the ongoing fit of their current search applications to an
ever changing market and organizational landscape, they naturally ask “Is there a faster,
cheaper, more effective way to do this?”
Today, thousands of companies and organizations, each with unique search and retrieval
requirements, have answered this question with Lucene/Solr. The essential value of combining
Lucid Imagination with open source Lucene/Solr technology is commercial
support that adapts to specific requirements. Whether a company is evaluating
Lucene/Solr for a new implementation, considering replacement of a commercial search
product, or enhancing an existing Lucene/Solr implementation, Lucid Imagination offers
skills and resources to help at every phase of the project life cycle.

Considering Alternatives to Legacy Packaged Search Applications


Change happens quickly, but taking advantage of new opportunities can be limited by
existing applications and traditional ways of doing things. Organizations with legacy search
applications often realize that they are paying too much to align packaged enterprise
search applications with evolving business requirements. In other cases, they discover it is
too difficult to integrate existing software with new services, or it takes too long to meet
new corporate goals. With the power of Lucene/Solr, Lucid Imagination supplies the
expertise organizations need to deliver successful search solutions more quickly
and less expensively than other solutions, now and going forward.
• Consulting services are highly customized and able to engage quickly to shorten
cycles and ramp times, minimize errors and design pitfalls, and improve production
results. Lucid Imagination’s consulting team consists of senior search technologists
who are intimately familiar with Lucene/Solr technologies and have extensive
experience in field-tested search solutions for diverse deployment scenarios.
Open source software is ideally suited to low-cost prototyping, because it can
reduce time to deployment and refine the user experience. For customers striving to
integrate a highly diverse base of data and documents, Lucid Imagination offers
prototyping services to assist with the process.
• Technical training can bring everyone in the IT department up to speed on best
practices and the elements of good search design—establishing a solid base of skills
before coding begins. This can greatly reduce downstream problems and reduce
overall costs. Lucid Imagination works with in-house application and system
administration teams to provide the knowledge transfer, guidance, training, and
support required to implement an enterprise search solution that fits the
organization’s specific needs.
• When dependable, predictable support is required to accompany an organization’s
efforts on a regular basis over time, Lucid Imagination’s support subscriptions
provide reliable access to domain experts during the entire application life cycle
process.
  ◦ Technical Support features the latest tested versions and timely,
    predictable support turnaround times.
  ◦ Advanced Development Support provides expert architectural design,
    development, and testing guidance for building search applications using
    Lucene and Solr.
  ◦ Advanced Production Support provides expert advice on configuration,
    performance tuning, and optimization for applications deployed to a
    production operation environment with live users and service-level
    attainment regimes.
  ◦ Search Health Check, included with Advanced Support, is a comprehensive
    set of services that ensures applications are designed to meet recommended
    best practices for search configuration, optimization, and effectiveness.
  ◦ Custom Support packages are also available for unique situations.

• Lucid Imagination’s free 30-Day Get Started Program is available with downloads of
LucidWorks, our certified distributions of Lucene and Solr. The Get Started Program
complements LucidWorks with added guidance for questions on first-time
installation, configuration, and basic usage, as well as evaluation of Lucene/Solr and
included utilities. LucidWorks for Solr is the logical starting point for most
developers building search applications with Lucene/Solr technology for websites,
products, or internal organizational use, because it bundles the most recent and
stable Apache Solr capabilities, along with other tools and utilities.

Building on In-house Lucene/Solr Expertise


Many organizations with in-house Lucene/Solr expertise have achieved considerable
sophistication in their deployments. Still, they may reach a point where it is difficult to
move the architecture or implementation past a particular design, deployment, or
optimization constraint. There can be many reasons for this, such as limitations on staff
expertise, design, or architecture. Configurations and policies may not have kept pace with
current best practices. A dependent part of the IT environment may have changed—
anything from upgraded complementary applications to new middleware, or expanded
data volume and variety.
For organizations that are ready to gain the required knowledge to move ahead, address
the current situation, and make sure that a deployment stays at peak performance, Lucid
Imagination recommends an in-depth engagement. Typically in a consultative format,
engagement begins with an in-depth assessment and review followed by best practices
design recommendations, and ends with a strategy proposal for achieving long-term,
sustainable innovation for search solutions.
Another key area where Lucid Imagination stands ready to help is in optimizing
performance—both in application response time and its utilization of hardware/software
resources. Lucid Imagination experts work with in-house teams to diagnose and improve
search application efficiencies.
As mentioned earlier, a significant benefit of open source software is its ability to provide
fast, low-cost prototyping as a means to reduce time to deployment and refine the user
experience. For customers that seek to integrate highly diverse bases of data and
documents, or accelerate evaluations of open source search solutions, Lucid Imagination
offers prototyping services.
While community support has always been a significant benefit of open source projects,
tough issues may not always be answered in timely fashion or with the discretion
necessary to prevent exposure of confidential organizational knowledge. That’s when Lucid
Imagination’s expert teams can help.
Some companies are already skilled in open source technologies in general and
Lucene/Solr in particular. For these, Lucid Imagination offers Technical Support and
Advanced Support. Technical Support can provide answers within defined response times
for users encountering problems with Lucene/Solr projects or production
implementations.
Different levels of support address most situations. For example, an e-commerce startup
may find that community forums provide suitable answers, but not always as quickly as
needed. Basic Technical Support provides Web-based and e-mail support at competitive
rates for customers that do not require same-day response or direct telephone support.
Lucid Imagination also offers various levels of Technical Support for larger or
mission-critical installations, including fast turnaround, diagnosis, and bug fixes. Finally, Enterprise
Technical Support includes Search Health Checks by Lucid Imagination domain experts to
help ensure optimal runtime effectiveness.

Next Steps
For more information on how Lucid Imagination can help employees, customers, and
partners find the information they need, please visit http://www.lucidimagination.com to
access blog posts, articles, and reviews of dozens of successful implementations. Please
e-mail specific questions to:
Support and Service: support@lucidimagination.com
Sales and Commercial: sales@lucidimagination.com
Consulting: consulting@lucidimagination.com
Or call: 1.650.353.4057

Appendix: About Apache Lucene and Solr
Apache Lucene/Solr offers an attractive alternative to proprietary search and discovery
software vendors. Lucene is a Java technology-based search library and Solr is a platform
built atop Lucene that provides application builders with a ready-to-use search platform.
Both Lucene and Solr are free and open source. They are available under the Apache
Software License, which allows users to modify or embed the technology as they see fit, and
to keep, sell, and/or redistribute any resulting product.
Solr is the logical starting point for most developers building search applications with
Lucene/Solr technology for websites, products, or internal organizational use. Most users
building Lucene-based search applications will find it is quicker to start with Solr, since it
contains many of the capabilities needed to turn a core search capability into a full-fledged
search application.

The full-featured core Lucene search engine library offers:

• Speed: Sub-second performance for most queries.
• Relevancy ranking: Out-of-the-box rankings are as good as or better than the best
commercial competitors.
• Complete query capabilities: Keyword, Boolean and +/- queries, proximity operators,
wildcards, fielded searching, term/field/document weights, find-similar, spell checking,
multilingual search, and much more.
• Full results processing: Sorting by relevancy, date or any field, dynamic summaries, hit
highlighting, and more.
• Portability: Runs on any platform supporting Java and indexes are portable across
platforms. Indexes built on Linux can be copied to a Microsoft Windows machine where
they can be searched. Lucene and Solr are written entirely in Java; .NET and other
versions are also available.
• Scalability: Production applications index and search hundreds of millions to
billions of documents/records.
• Low-overhead indexes and rapid incremental indexing.

The Solr platform adds the following capabilities:

• Web services: Solr places Lucene over HTTP, allowing programs written in any
language to invoke Lucene.
• Faceting: The dynamic clustering of items or search results into categories enables users
to drill into search results (or even skip searching entirely) by any value in any field, as
seen on popular e-commerce sites such as Amazon or Zappos.
• XML-based schema: Manages indexed fields and their characteristics.
• Admin tools: Configuration, data loading, index replication, statistics, logging and cache
management, and more.
• Scalable: Distributed architecture enables large-scale distributed search.
• Configurable: Fixed/paid result list placement.
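To give a sense of what "Lucene over HTTP" looks like in practice, the sketch below builds the URL for a Solr search request with faceting turned on. The parameters `q`, `rows`, `wt`, `facet`, and `facet.field` are standard Solr request parameters; the host, port, path, and the `brand` field are assumptions made for this example and will differ between installations and Solr versions. The code only constructs the URL and does not contact a server.

```python
from urllib.parse import urlencode


def build_select_url(base_url, query, facet_fields=(), rows=10):
    """Build a Solr select URL; base_url is supplied by the caller."""
    params = [("q", query), ("rows", rows), ("wt", "json")]
    if facet_fields:
        # Ask Solr to return value counts for each requested field.
        params.append(("facet", "true"))
        params.extend(("facet.field", name) for name in facet_fields)
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    # Assumed local endpoint and field name, for illustration only.
    url = build_select_url(
        "http://localhost:8983/solr/select", "running shoes", ["brand"]
    )
    print(url)
```

Because the request is an ordinary HTTP GET, any language with an HTTP client can issue it and parse the response.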
