Technical Brief Autonomy IDOL server 5

Autonomy's software infrastructure uses sophisticated pattern-matching techniques to enable computers to understand information in context. For the first time, a computer can go beyond keywords and metadata to identify concepts within text itself, determine the concepts' importance and automate the processing of this content, regardless of its format, location, language and source application. Using Autonomy Connectors, Autonomy's unique Intelligent Data Operating Layer (IDOL) integrates unstructured, semi-structured and structured information from multiple repositories through an understanding of the content, delivering a real-time environment in which operations across applications and content are automated, removing all the manual processes involved in getting the right information to the right people at the right time.

IDOL server
At the heart of Autonomy's software infrastructure lies IDOL server, a scalable, multithreaded process based on advanced pattern-matching technology that exploits high-performance probabilistic modeling techniques.

Selected IDOL server operations


The intelligent operations that IDOL server performs across structured, semi-structured and unstructured data are highly customizable, offering a wide range of configuration combinations that enable you to perform over 250 data operations.

IDOL server provides the following core information operations:

1. Automatic Query Guidance
2. Dynamic Clustering
3. Hyperlinking
4. Summarization
5. Taxonomy Generation
6. Categorization
7. Channels
8. Channel Recommendation
9. Clustering
10. CEN Clustering
11. Eduction
12. Agents
13. Profiling
14. Expertise Location
15. Collaboration
16. Alerting
17. Mailing
18. Spelling Correction
19. Dynamic Thesaurus
20. Retrieval - Lite
21. Retrieval - Concept
22. Retrieval - Parametric
23. Retrieval - Federated

1. Automatic Query Guidance
IDOL server's Automatic Query Guidance feature provides an easy navigation facility that directs users to the results they require, based on a conceptual and contextual understanding of their query. Instead of page ranking, an approach that has proven ineffective in the link-free enterprise, Automatic Query Guidance uses conceptual clustering to determine the context of a user's search and presents the most appropriate results along with other suggestions, even for queries of only one or a few words.

2. Dynamic Clustering
Query results are clustered on the fly to avoid information overload and provide an overview of the different conceptual aspects that results can be grouped into. The clustered results are presented in an easily navigable hierarchy, providing users with speedy access to the right information.

3. Hyperlinking
Hyperlinks can be automatically generated in real time. These link to contextually similar content and can be used to recommend related articles, documents, affinity products or services, or media content that relates to textual content. Because links are automatically inserted at the time a document is retrieved, they can include references to documents and articles written long before. Hyperlinks from archived material can link to the latest news or material on that subject.
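To make the idea of linking to contextually similar content more concrete, the following is a minimal sketch in Python. It ranks candidate documents by the cosine similarity of simple term-frequency vectors and returns the closest matches as link targets. This bag-of-words approach is a stand-in chosen for illustration and is not IDOL server's pattern-matching technology; the function names (`term_vector`, `cosine`, `related_links`) and the sample archive are hypothetical.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Lower-case the text and count its word tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity of two term-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def related_links(current, archive, limit=2):
    """Return the ids of the archive documents most similar to `current`."""
    source = term_vector(current)
    scored = [(cosine(source, term_vector(text)), doc_id)
              for doc_id, text in archive.items()]
    # Drop unrelated documents, then sort by descending score (id breaks ties).
    scored = [(score, doc_id) for score, doc_id in scored if score > 0.0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:limit]]


# Hypothetical archive used only for illustration.
ARCHIVE = {
    "doc-1": "Central bank raises interest rates to curb inflation",
    "doc-2": "New striker signs for the football club",
    "doc-3": "Inflation slows as interest rates stay high",
}

links = related_links("Interest rates and inflation outlook", ARCHIVE)
print(links)
```

Because the links are computed when a document is retrieved rather than when it is written, re-running `related_links` against a growing archive would pick up newer material without editing the original document.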

4. Summarization
IDOL server accepts a piece of content and returns a summary of the information. IDOL server can generate different types of summary:

Conceptual summaries
Summaries that contain the most salient concepts of the content.

Contextual summaries
Summaries that relate to the context of the original inquiry, allowing the most applicable dynamic summary to be provided in the results of a given inquiry.

Quick summaries
Summaries that comprise a few sentences of the result documents.

5. Taxonomy Generation
IDOL server's automatic Taxonomy Generation feature can automatically understand and create deep hierarchical contextual taxonomies of information. Clustering or any other conceptual operation can be used as a seed for the process. The resulting taxonomy can be used to provide insight into specific areas of the information, to provide an overall information landscape, or as training material for automatic categorization, which then allows information to be placed into a formally dictated and controlled category hierarchy.

Automatic Taxonomy Based on Cluster Result
Based on cluster results, IDOL server can build taxonomies automatically and in real time.

Automatic Taxonomy to Category Generation
Once the Automatic Taxonomy Generation process has run, IDOL server has a contextual understanding of the type of data it is dealing with. From this it generates a deep hierarchical contextual taxonomy, also known as an information landscape. Much like Automatic Cluster to Category Generation, this feature takes the taxonomy results and uses them to create categories, which the Categorization operation can then use to categorize information.

6. Categorization
IDOL server can automatically categorize data with no requirement for manual input whatsoever. The flexibility of Autonomy's Categorization feature allows you to precisely derive categories using concepts found within unstructured text. This ensures that all data is classified in the correct context with the utmost accuracy. Autonomy's Categorization feature is a completely scalable solution capable of handling high volumes of information with extreme accuracy and total consistency.

Rather than relying on rigid rule-based category definitions such as Legacy Keyword and Boolean operators, Autonomy's infrastructure relies on an elegant pattern-matching process based on concepts to categorize documents and automatically insert tag data sets, route content or alert users to highly relevant information pertinent to the user's profile. This highly efficient process means that Autonomy is able to categorize upwards of four million documents in 24 hours per CPU instance. That is roughly one document every 22 milliseconds (86,400 seconds / 4,000,000 documents = 21.6 ms). Autonomy hooks into virtually all repositories and data formats, respecting all security and access entitlements, delivering complete reliability.

Category Matching
IDOL server accepts a category or piece of content and returns categories ranked by conceptual similarity. This determines for which categories the piece of content is most appropriate, so that the piece of content can subsequently be tagged, routed or filed accordingly.

7. Channels
IDOL server can automatically provide users with a set of hierarchical channels with highly relevant information pertinent to the respective channel. Eliminating the requirement for manual intervention or pre-tagging, real-time information is dynamically updated into the channels automatically, minimizing the maintenance effort required. Moreover, the administrator can add and remove channels on the fly, without having to re-categorize all of the data.

8. Channel Recommendation
IDOL server's Channel Recommendation feature automatically recommends conceptually matching channels when a query is submitted to IDOL server, thus providing users with instant access to relevant information in the hierarchical channels.

9. Clustering
IDOL server delivers the ability to automatically cluster information. Clustering is the process of taking a large repository of unstructured data, agents or profiles and automatically partitioning the data so that similar information is clustered together. Each cluster represents a concept area within the knowledge base and contains a set of items with common properties.

Features:
- Automatic clustering of information
- Configurable sub-headings
- Automatic title generation
- Configurable results layout
- Identify key areas of expertise
- Complete overview of knowledge base
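The Clustering operation (section 9) partitions a repository so that similar items end up together. As a rough illustration only, the Python sketch below performs a greedy single-pass clustering using the Jaccard overlap of word sets. The algorithm, the `threshold` value and the function names (`tokens`, `jaccard`, `cluster`) are assumptions made for this example; they do not describe how IDOL server forms its clusters.

```python
import re


def tokens(text):
    """Return the set of lower-cased word tokens in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard overlap of two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(documents, threshold=0.2):
    """Greedy single-pass clustering.

    Each document joins the first cluster whose seed document is similar
    enough to it; otherwise it becomes the seed of a new cluster.
    Returns a list of clusters, each a list of document indices.
    """
    clusters = []  # list of (seed_tokens, [document indices])
    for index, text in enumerate(documents):
        doc_tokens = tokens(text)
        for seed_tokens, members in clusters:
            if jaccard(seed_tokens, doc_tokens) >= threshold:
                members.append(index)
                break
        else:
            # No existing cluster was close enough.
            clusters.append((doc_tokens, [index]))
    return [members for _, members in clusters]


# Hypothetical documents used only for illustration.
DOCUMENTS = [
    "interest rates rise again",
    "football cup final tonight",
    "bank holds interest rates",
    "cup final ends in penalties",
]

groups = cluster(DOCUMENTS)
print(groups)
```

Each resulting group stands for one concept area; in this toy data set the two finance items and the two football items fall into separate groups.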

10. CEN Clustering
IDOL server provides Collaboration and Enterprise Network (CEN) Clustering to automatically match clustered data against user agents and profiles in order to identify data that matches people's interests. User interfaces that integrate with IDOL server (for example, Retina, Portal-in-a-Box or third-party portals) highlight matching data in a spectrograph and enable on-the-fly display of community users who own matching agents or profiles, providing an instant overview of the community users' details and instant email contactability.

Features:
- Automatic clustering of information
- Automatic matching cluster / interests matching
- Automatic highlighting of popular clusters
- Identify key areas of expertise
- Display community user details
- Email community users
- Encourage collaboration

11. Eduction
Eduction identifies concepts in the document in order to add tags to the kind of content you specify.

Features:
- Tag training
- Plain Tagging
- ConceptValue Tagging
- Negative Name training
- Default User definable phrase tags
- Case-sensitive user defined phrase tags

12. Agents
Agents provide the facilities to find and monitor information from a configurable list of Internet and Intranet sites, News Feeds, Chat Streams and internal repositories highly relevant to the explicit interests of a user. Agents are created in a very user-friendly way using the following options:
- Natural language descriptions
- Example content (point and click)
- Legacy Keyword or Boolean expressions

IDOL server provides the conceptual information that is needed to create agents. The server accepts a piece of content (training text, a document or a set of documents) or reference (identifier) and returns an encoded representation of the concepts, including each concept's specific underlying patterns of terms and associated probabilistic ratings.

Agent Retraining
The server accepts an agent and a piece of content (training text, a document or a set of documents) and adapts the agent using the content.

Agent Alerting
The server accepts a piece of content (a sentence, paragraph or page of text, the body of an email, a record containing human-readable information, or the derived contextual information of an audio or speech snippet) and returns similar agents ranked by conceptual similarity. This is used to discover users who are interested in the content, or to find experts in a field.

Agent Matching
The high-performance agent matching solution enables documents to be dynamically matched against Boolean agents at any scale. As content is indexed into IDOL server, it is matched against all agent rules simultaneously, allowing targeted information to be delivered to the user in real time.

13. Profiling
IDOL server tracks the content with which a user interacts, extracts a conceptual understanding of the content and uses this understanding to maintain a profile of the user's interests. This profile is typically used to target information at particular users, recommend content to users and alert users to the existence of content.

14. Expertise Location
IDOL server facilitates the automatic recognition of highly focused experts and reduces the duplication of effort through teamwork and the engagement of proactive collaboration ventures.

15. Collaboration
IDOL server automatically matches users with common explicit interest agents or similar implicit profile agents. This information can be used to create virtual expert knowledge groups.

16. Alerting
IDOL server analyzes data in new documents and compares the concepts the documents contain with agents that users have already set up. It then automatically sends an email notification to users whose interests are similar to a new document's content.

17. Mailing
IDOL server regularly emails users to notify them of content that matches their agents and the channels to which they are subscribed.

Features:
- Configurable email format through XSS templates
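The Alerting operation (section 16) compares each new document with the agents that users have set up and notifies the users whose interests match. The Python sketch below illustrates that flow under strong simplifying assumptions: an agent is modelled as a dictionary of weighted terms, the match score is the share of the agent's weight found in the document, and a fixed `threshold` decides who is notified. None of this reflects IDOL server's conceptual matching; the names `match_score`, `users_to_alert` and the sample agents are hypothetical, and actually sending the email is left out.

```python
import re


def tokens(text):
    """Return the set of lower-cased word tokens in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def match_score(agent_terms, document):
    """Fraction of the agent's total term weight that the document covers."""
    total = sum(agent_terms.values())
    if total == 0:
        return 0.0
    present = tokens(document)
    hit = sum(weight for term, weight in agent_terms.items() if term in present)
    return hit / total


def users_to_alert(agents, document, threshold=0.5):
    """Return the sorted addresses of users whose agent matches the document."""
    return sorted(
        email for email, terms in agents.items()
        if match_score(terms, document) >= threshold
    )


# Hypothetical agents: one weighted term profile per user.
AGENTS = {
    "ann@example.com": {"merger": 3.0, "acquisition": 2.0, "bank": 1.0},
    "bob@example.com": {"football": 2.0, "transfer": 2.0},
}

recipients = users_to_alert(AGENTS, "Regional bank announces merger talks")
print(recipients)
```

Running the check at indexing time, once per incoming document, is what would let notifications go out as soon as matching content arrives.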

18. Spelling Correction
IDOL server can automatically spell-check the query text that it receives and suggest correct spellings for terms that it does not contain. If a query contains several words that IDOL server does not recognize, it offers a spelling suggestion for each of these words.

19. Dynamic Thesaurus
IDOL server includes a sophisticated conceptual Thesaurus which uses the most salient terms and phrases in the result documents that a query produces in order to offer a selection of alternative query strings. These strings allow a user to quickly execute alternative queries in order to produce a variety of relevant result sets.

20. Retrieval - Lite
IDOL server offers the following basic legacy search methods:

Legacy Keyword
IDOL server accepts a keyword and returns a list of documents containing the term, ordered by contextual relevance to the query.

Boolean / bracketed Boolean
IDOL server accepts simple or complex Boolean and bracketed Boolean expressions and returns a list of matching documents. Boolean expressions can be formed using a range of Boolean and proximity operators: AND, NOT, OR, XOR / EOR, NEAR, DNEAR, WNEAR, BEFORE, AFTER.

Exact Phrase
Provides the ability to search for exact phrases by putting quotation marks around a string of words. For example, "world market".

Fuzzy Queries
If a search string is not quite accurate (for example, if it contains spelling mistakes), a fuzzy query returns results that contain words that are similar to the entered string. (Note that you need to enable fuzzy queries before you can use them.)

Proximity Search
IDOL server gives a higher weighting to documents in which specific terms occur within a given proximity of each other.

Soundex Keyword Search
If the spelling of a keyword is not quite accurate but phonetically correct, a Soundex keyword search returns results that contain the keyword and phonetically similar keywords (using a configurable Soundex algorithm).

21. Retrieval - Concept
IDOL server provides the following sophisticated conceptual retrieval operations:

Conceptual Matching
IDOL server accepts a piece of content (a sentence, paragraph or page of text, the body of an email, a record containing human-readable information, or the derived contextual information of an audio or speech snippet) or reference (identifier) as input, and returns references to conceptually related documents ranked by relevance or contextual distance. This is used to generate automatic hyperlinks between pieces of content.

Proper Names
IDOL server recognizes names and treats them as a unit.

Active Matching
IDOL server accepts textual information describing the current user task and returns a list of documents ordered by contextual relevance to the active task.

Native XML Indexing
IDOL server can natively index plain, well-formed XML directly. This feature involves minimal configuration: only the document-level and field indexing specifications are required.

Native XML Output
Users can specify the output format in which they require information. If they do not specify XML output, the default template is used.

Multiple XML Schema Support
This feature provides multiple simultaneous schema support: you can index multiple XML sources with varying XML schemas (tag names and hierarchies) into IDOL server. IDOL server's intelligence performs conceptual analysis across all the different schemas. Users have the option to specify the output format of information.

Automatic XML Tagging
IDOL server can automatically XML tag any form of unstructured information based on the same process used for tag reconciliation.

22. Retrieval - Parametric
Advanced Parametric Refinement is used to provide an improved user experience coupled with increased productivity via an advanced real-time information discovery process. Real-time navigation across multiple taxonomies is supported with no additional manual configuration necessary, including full access to intersections of diverse taxonomy definitions.

From the complete set of field names present within the corpus, a subset of fields can be defined in the server's configuration as being of type 'Parametric'. These fields are known as parametric fields. When documents are indexed, IDOL server creates and stores a structure containing information about all tag-value pairs that occur within the defined parametric fields (a tag-value pair is formed where a field contains a textual or numerical value, and the field name is considered paired with that value). The user may then query IDOL server with the name of one or more parametric fields. IDOL server returns a list of all values that appear within the given field or fields in the documents stored in the server. This underlying operation can be used to power a user interface that enables a user to gradually refine the scope of a query from the complete corpus to the subset of documents that contain information pertinent to the user's current enquiry.
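The tag-value structure and the refinement loop described above can be pictured with a small in-memory model. The Python sketch below is an illustration under stated assumptions, not IDOL server's storage format or query API: the class name `ParametricIndex`, its methods and the sample fields (`make`, `colour`, `title`) are all hypothetical.

```python
from collections import defaultdict


class ParametricIndex:
    """Stores tag-value pairs for the fields configured as parametric."""

    def __init__(self, parametric_fields):
        self.parametric_fields = set(parametric_fields)
        # (field, value) -> set of ids of the documents containing that pair
        self.pairs = defaultdict(set)

    def index(self, doc_id, fields):
        """Record one document's tag-value pairs; other fields are ignored."""
        for field, value in fields.items():
            if field in self.parametric_fields:
                self.pairs[(field, value)].add(doc_id)

    def matching_docs(self, filters=None):
        """Documents that satisfy every field=value restriction in `filters`."""
        docs = set().union(*self.pairs.values())
        for pair in (filters or {}).items():
            docs &= self.pairs.get(pair, set())
        return docs

    def values(self, field, filters=None):
        """Values of `field` within the current scope, with document counts."""
        scope = self.matching_docs(filters)
        counts = {}
        for (name, value), docs in self.pairs.items():
            hits = len(docs & scope)
            if name == field and hits:
                counts[value] = hits
        return counts


index = ParametricIndex(["make", "colour"])
index.index("d1", {"make": "Ford", "colour": "red", "title": "Used car"})
index.index("d2", {"make": "Ford", "colour": "blue"})
index.index("d3", {"make": "Fiat", "colour": "red"})

makes = index.values("make")                          # whole corpus
red_makes = index.values("make", {"colour": "red"})   # after one refinement
print(makes, red_makes)
```

A user interface built on such a structure would call `values` for each parametric field, show the returned values as choices, and add the user's selection to `filters` before asking again, which narrows the scope step by step.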

23. Retrieval - Federated
Submit queries to a selection of third-party search engines in addition to IDOL server.

Additional functionality

Sentient Architecture
IDOL server's sentient architecture delivers on the concept of autonomic computing for companies worldwide. Global predictive self-management removes the need for an administrator, for example by dynamically throttling IDOL's connector layer according to the available bandwidth and a target site's responsiveness, and by predicting windows of opportunity for faster collection based on prior usage patterns. This ability to support distributed architectures, identify potential problems and prompt a real-time, dynamic substitution enables companies to keep systems entirely operational for users at all times. IDOL's sentient architecture presents a robust solution for large, geographically dispersed, multinational enterprises that seek to make all their information assets readily available.

Failover / Distribution
Uninterrupted service is ensured through Failover. If IDOL server should fail at any point, it is automatically restarted, ensuring a stable system.

Automatic Language Detection
IDOL server can automatically detect the language and encoding of the documents that it processes. This allows you to set up processes that are automatically applied to documents or document metadata if they are in a specific language. For example, if a document is identified as Chinese, the appropriate preliminary linguistic tools are automatically applied to it.

DiSH / Dashboard
The Autonomy Service Dashboard is an intuitive stand-alone front-end web interface that allows administrators to manage all Autonomy modules and services running locally or remotely. The Dashboard communicates with one or more Autonomy Distributed Service Handler (DiSH) modules that provide the back-end process for monitoring and controlling all the Autonomy child services.

The Autonomy Service Dashboard provides central control.

DiSH servers administration
- View the DiSH servers in the enterprise
- Display DiSH server information (version, ports, status, start time etc.)
- Add and remove DiSH servers to / from the dashboard
- Edit the DiSH servers
- View DiSH servers' configuration, license information and logs

Services administration
- View child services
- Display child service information (version, ports, status, start time etc.)
- Add and remove child services
- Edit child services
- Configure child services
- View child services' logs

Control of services
- Start, stop, pause or restart a child service
- Set up KeepAlives to ensure continuous service

Monitoring services
- Track service processing of documents
- Automatically audit child services
- Generate graphs for a child service's audit data

Alerting
- Set up an email alert triggered by any statistic
- Configure an alert triggered when certain statistics' values move outside a predefined range
- Configure a periodic email alert containing status summary reports
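The range-based alert mentioned above amounts to comparing each monitored statistic with a configured lower and upper bound. The Python sketch below shows one way such a rule could be expressed (it needs Python 3.8 or later). It is not the Dashboard's configuration format: the `RangeAlert` class, the `evaluate` helper and the statistic names are hypothetical, and delivering the message by email is left out.

```python
from dataclasses import dataclass


@dataclass
class RangeAlert:
    """Alert rule: fire when a named statistic leaves the range [low, high]."""
    statistic: str
    low: float
    high: float

    def check(self, stats):
        """Return an alert message, or None if the value is in range or absent."""
        value = stats.get(self.statistic)
        if value is None or self.low <= value <= self.high:
            return None
        return f"{self.statistic}={value} outside [{self.low}, {self.high}]"


def evaluate(rules, stats):
    """Collect the messages of all rules that fire for these statistics."""
    return [msg for rule in rules if (msg := rule.check(stats)) is not None]


# Hypothetical rules and statistic names, for illustration only.
RULES = [
    RangeAlert("docs_per_second", low=10, high=1000),
    RangeAlert("queue_length", low=0, high=500),
]

alerts = evaluate(RULES, {"docs_per_second": 4, "queue_length": 120})
print(alerts)
```

A periodic status report could reuse the same statistics dictionary and simply list every value, whether or not a rule fired.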

User Interfaces

Retina
IDOL server 5 includes Autonomy Retina, a web interface application that provides a full spectrum of retrieval methods, from simple keyword search to sophisticated conceptual matching. Adjusting to the user's experience and proficiency, Retina not only offers basic legacy search methods but also leverages them through Autonomy's unique pattern-recognition technology. Please refer to the Retina Technical Brief for further details.

Portlets
Autonomy provides a wide range of Portlets that offer user-friendly platforms from which IDOL server operations can be intuitively executed. Autonomy Portlets are available as part of the Autonomy Portal-in-a-Box solution or for integration with a number of market-leading third-party Portals. Please refer to the Portlets Technical Brief for further details.

Requirements

Platforms Supported:
- Microsoft Windows NT4, 2000, XP and 2003
- Linux (all versions) kernel 2.2, 2.4 and 2.6
- Sun Solaris for SPARC version 5 - 9
- Sun Solaris for Intel version 9
- AIX version 4.3, 5 and 5.1
- HP-UX for PA-RISC version 10, 11 and 11i
- HP-UX for Itanium version 11i
- Tru64 version 5.1

Other POSIX compliant UNIX versions are available on request.

Minimum Server Specifications:
- Dual Intel Xeon 1.8 GHz
- 1 GB RAM
- 30 GB hard disk recommended

For specific sizing requirements, please consult the Autonomy Sizing Service.

Architecture

Autonomy Inc. One Market Plaza, 19th Floor, Spear Tower, San Francisco, CA 94105 Tel: 415 243 9955 Fax: 415 243 9984 Email: info@us.autonomy.com

Autonomy Systems Ltd Cambridge Business Park Cowley Road Cambridge CB4 0WZ Tel: +44 (0) 1223 448 000 Fax: +44 (0) 1223 448 001 Email: autonomy@autonomy.com

Other Offices
Autonomy has additional offices in Boston, Dallas, Chicago, Washington and New York, as well as in Amsterdam, Beijing, Diegem, Hamburg, Madrid, Milan, Munich, Oslo, Paris, Rome, Singapore, Stockholm and Sydney.

Copyright 2005 Autonomy Corp. All rights reserved. Other trademarks are registered trademarks and the properties of their respective owners. Product specifications and features are subject to change without notice. Use of Autonomy software is under license.

www.autonomy.com