Enterprise Search

Enterprise search
From Wikipedia, the free encyclopedia

Enterprise search is the practice of making content from multiple enterprise-type sources, such as
databases and intranets, searchable to a defined audience.
Contents1 Enterprise search summary2 Components of an enterprise search system2.1 Content
awareness2.2 Content processing and analysis2.3 Indexing2.4 Query Processing2.5 Matching3
Differences from web search4 Relevance factors for enterprise search5 Access Control - early
binding vs late binding5.1 Late binding5.2 Early binding6 Search Relevance Testing options7 See
also8 ReferencesEnterprise search summary
"Enterprise Search" is used to describe the software of search information within an enterprise
(though the search function and its results may still be public).[1] Enterprise search can be
contrasted with web search, which applies search technology to documents on the open web, and
desktop search, which applies search technology to the content on a single computer.
Enterprise search systems index data and documents from a variety of sources such as: file systems,
intranets, document management systems, e-mail, and databases. Many enterprise search systems
integrate structured and unstructured data in their collections.[2] Enterprise search systems also
use access controls to enforce a security policy on their users.[3]
Enterprise search can be seen as a type of vertical search of an enterprise.
Components of an enterprise search system
In an enterprise search system, content goes through various phases from source repository to
search results:
Content awareness
Content awareness (or "content collection") is usually either a push or pull model. In the push model,
a source system is integrated with the search engine in such a way that it connects to it and pushes
new content directly to its APIs. This model is used when realtime indexing is important. In the pull
model, the software gathers content from sources using a connector such as a web crawler or a
database connector. The connector typically polls the source with certain intervals to look for new,
updated or deleted content.[4]
Content processing and analysis
Content from different sources may have many different formats or document types, such as XML,
HTML, Office document formats or plain text. The content processing phase processes the incoming
documents to plain text using document filters. It is also often necessary to normalize content in
various ways to improve recall or precision. These may include stemming, lemmatization, synonym
expansion, entity extraction, part of speech tagging.
As part of processing and analysis, tokenization is applied to split the content into tokens which is
the basic matching unit. It is also common to normalize tokens to lower case to provide case-
insensitive search, as well as to normalize accents to provide better recall.[5]

Indexing
The resulting text is stored in an index, which is optimized for quick lookups without storing the full
text of the document. The index may contain the dictionary of all unique words in the corpus as well
as information about ranking and term frequency.
Query Processing
Using a web page, the user issues a query to the system. The query consists of any terms the user
enters as well as navigational actions such as faceting and paging information.
Matching
The processed query is then compared to the stored index, and the search system returns results (or
"hits") referencing source documents that match. Some systems are able to present the document as
it was indexed.
Differences from web search
Beyond the difference in the kinds of materials being indexed, enterprise search systems also
typically include functionality that is not associated with the mainstream web search engines. These
include:
Adapters to index content from a variety of repositories, such as databases and content management
systems.Federated search, which consists oftransforming a query and broadcasting it to a group of
disparate databases or external content sources with the appropriate syntax,merging the results
collected from the databases,presenting them in a succinct and unified format with minimal
duplication, andproviding a means, performed either automatically or by the portal user, to sort the
merged result set.Enterprise bookmarking, collaborative tagging systems for capturing knowledge
about structured and semi-structured enterprise data.Entity extraction that seeks to locate and
classify elements in text into predefined categories such as the names of persons, organizations,
locations, expressions of times, quantities, monetary values, percentages, etc.Faceted search, a
technique for accessing a collection of information represented using a faceted classification,
allowing users to explore by filtering available information.Access control, usually in the form of an
Access control list (ACL), is often required to restrict access to documents based on individual user
identities. There are many types of access control mechanisms for different content sources making
this a complex task to address comprehensively in an enterprise search environment (see
below).Text clustering, which groups the top several hundred search results into topics that are
computed on the fly from the search-results descriptions, typically titles, excerpts (snippets), and
meta-data. This technique lets users navigate the content by topic rather than by the meta-data that
is used in faceting. Clustering compensates for the problem of incompatible meta-data across
multiple enterprise repositories, which hinders the usefulness of faceting.User interfaces, which in
web search are deliberately kept simple in order not to distract the user from clicking on ads, which
generates the revenue. Although the business model for enterprise search could include showing
ads, in practice this is not done. To enhance end user productivity, enterprise vendors continually
experiment with rich UI functionality which occupies significant screen space, which would be
problematic for web search.Relevance factors for enterprise search
The factors that determine the relevance of search results within the context of an enterprise
overlap with but are different from those that apply to web search. In general, enterprise search
engines cannot take advantage of the rich link Data Management structure as is found on the web's
hypertext content, however, a new breed of Enterprise search engines based on a bottom-up Web
2.0 technology are providing both a contributory approach and hyperlinking within the enterprise.
Algorithms like PageRank exploit hyperlink structure to assign authority to documents, and then use
that authority as a query-independent relevance factor. In contrast, enterprises typically have to use
other query-independent factors, such as a document's
http://social.technet.microsoft.com/profile/maginfo/ recency or popularity, along with querydependent factors traditionally associated with information retrieval algorithms. Also, the rich
functionality of enterprise search UIs, such as clustering and faceting, diminish reliance on ranking
as the means to direct the user's attention.
Access Control - early binding vs late binding
Security and restricted access to documents is an important matter in Enterprise Search. There are
two main approaches to apply restricted access: early binding vs late binding.[6]
Late binding
Permissions are analyzed and assigned to documents at query stage. Query engine generates a
document set and before returning it to a user this set is filtered based on user access rights. It is
costly process but accurate (based on user permissions at the moment of query).
Early binding
Permissions are analyzed and assigned to documents at indexing stage. It is much more effective
than late binding, but could be inaccurate (user might be granted or revoked permissions between in
the period between indexing and querying).
Search Relevance Testing options
Search application relevance can be determined by following relevance testing options like[7]
Focus groupsReference evaluation protocol (based on relevance judgements of results from agreedupon queries performed against common document corpuses)Empirical testingA/B testingLog
analysis on a Beta production siteOnline ratingsSee alsoComparison of enterprise search
softwareList of enterprise search vendorsList of Search EnginesCollaborative search engineData
Defined StorageEnterprise bookmarkingEnterprise information accessKnowledge managementText
miningFaceted searchInformation ExtractionVertical SearchReferences^ What is Enterprise
Search?^ The New Face of Enterprise Search: Bridging Structured and Unstructured Information^
Mapping Security Requirements to Enterprise Search - Part 1: Defining Specific Security
Requirements^ Understanding Content Collection and Indexing^ Stemming, Variations, and Accent
Folding^ Enterprise Search: document access control^ Debugging Search Application Relevance
IssuesRetrieved from
"https://en.wikipedia.org/w/index.php?title=Enterprise_search&oldid=666716374"
https://en.wikipedia.org/wiki/Enterprise_search

Enterprise Search

Загружено:

Сведения о документе

Оригинальное название

Авторское право

Доступные форматы

Поделиться этим документом

Поделиться или встроить документ

Параметры публикации

Этот документ был вам полезен?

Это неприемлемый материал?

Авторское право:

Доступные форматы

Enterprise Search

Загружено:

Авторское право:

Доступные форматы

Enterprise search

From Wikipedia, the free encyclopedia

insensitive search, as well as to normalize accents to provide better recall.[5]

Вам также может понравиться