
File System Fetch

version 4.2.x

Administrators Guide

Information in this document is subject to change without notice. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, for any purpose, without the express permission of Autonomy Systems Ltd. Windows is a trademark of Microsoft Corp., UNIX is a trademark of X/OPEN Ltd.

Copyright 2004 Autonomy. All rights reserved.

IDOL server and File System Fetch are trademarks of Autonomy Systems Ltd.

Table of Contents
Preface ... i
    Autonomy ... i
    Contact ... ii
    Downloading manual updates from Automater ... iii
    Typographical conventions ... iii
    Related documentation ... iv
1. Autonomy infrastructure ... 1
    IDOL server ... 3
    Connectors ... 3
    Interfaces ... 3
    Distributed systems ... 3
    Administration ... 4
    PODS ... 4
    Data flow and security ... 5
2. Introduction ... 7
    System architecture ... 8
    Controlling internal file import ... 9
3. Installation ... 11
    System requirements ... 11
    Implementation procedure ... 12
    Installing File System Fetch on Windows ... 13
        Directory structure: Windows ... 15
    Installing File System Fetch on UNIX ... 17
        Directory structure: UNIX ... 19
4. Configuring File System Fetch ... 21
    Displaying help on configuration settings ... 21
    Modifying configuration parameter values ... 22
    Configuration file sections ... 23
        [License] section ... 23
        [Service] section ... 24
        [Server] section ... 24
        [Default] section ... 24
        [Configuration] section ... 25
        [<AFetchJob>] section ... 26
        Example configuration file ... 27
5. Importing PST files ... 29
6. Importing individual files ... 31
    Displaying online help ... 31
    Action command syntax ... 32
7. Starting and stopping File System Fetch ... 33
    Starting File System Fetch ... 33
    Stopping File System Fetch ... 34
Appendix A: Service port commands ... 35
    GetConfig ... 36
    GetLogStream ... 36
    GetLogStreamNames ... 37
    GetStatistics ... 37
    GetStatus ... 38
    GetStatusInfo ... 38
    MergeConfig ... 39
    SetConfig ... 41
    Stop ... 41
Glossary ... 43
Index ... 45

Preface
Autonomy
Autonomy employs a fundamentally different and unique combination of technologies to enable computers to form an understanding of a page of text, web pages, emails, voice, documents and people. Autonomy's solution is therefore able to power any application dependent upon unstructured information within every market sector, including: e-commerce, customer relationship management, knowledge management, enterprise information portals and online publishing applications. This is evidenced by the significant penetration of the technology in a diversity of vertical markets and has been achieved principally because every market sector needs to manage and leverage the benefits of unstructured information.

Autonomy was founded in 1996 and has offices in Boston, Chicago, Dallas, San Francisco, New York, and Washington, D.C. in the United States, as well as offices throughout EMEA, including Amsterdam, Brussels, Cambridge, Frankfurt, Milan, Paris, Oslo, and Sydney. In July 1998, the company went public on the EASDAQ exchange (EASDAQ:AUTN). Autonomy floated on The NASDAQ National Market (NASDAQ: AUTN) in May 2000, and on the London Stock Exchange (LSE: AU.) in November 2000.

Contact
To contact Autonomy, please get in touch with your nearest location listed below.

Europe and South Pacific
Autonomy Systems Ltd.
Cambridge Business Park
Cowley Road
Cambridge CB4 0WZ

Help Desk: +44 (0) 800 0 282 858
Switchboard: +44 (0) 1223 448 000
Fax: +44 (0) 1223 448 001
Email for information: autonomy@autonomy.com
Email for support: uksupport@autonomy.com
Website: www.autonomy.com

The Help Desk operates from 9.30 am to 6.00 pm (GMT) Monday to Friday.

USA
Autonomy Inc.
One Market Spear Street Tower
San Francisco CA 94105

Help Desk: +1 877 333 7744
Switchboard: +1 415 243 9955
Fax: +1 415 243 9984
Email for information: info@us.autonomy.com
Email for support: support@us.autonomy.com
Website: www.autonomy.com

The Help Desk operates from 9.30 am to 6.00 pm (CST) Monday to Friday, toll-free.


Downloading manual updates from Automater


To assist you in utilizing the benefits that Autonomy's solutions offer you, Autonomy provides free downloads of the latest available documentation. To download documentation updates:

1. Enter the following URL in your web browser's Address field: http://automater.autonomy.com
2. Enter your Username and Password, and click on the Login button.
3. Click on the Download menu option.
4. Under the Documentation and Release Notes heading, click on the Click here link, then click on the Manuals folder to display the latest available manual versions.

You can display any of the manuals in your browser and download them.

Note: the manual's version number (for example, version 4.1.x) corresponds to the product version. The last number of the product version has been replaced with an x for all manuals as this number relates to minor product releases that have no effect on the documentation. If a manual has a revision number (for example revision 5), it indicates that this manual has been revised since it was first released. Automater always contains the latest available revision of all manuals.

Typographical conventions
Autonomy documentation uses the following typographical conventions.

Formatting convention    Type of information
Bold type                References to any of the following: interface options (for example, menus or buttons), actions, parameters.
Courier font             Configuration examples.
<text>                   A string that needs to be replaced with a personal setting. For example, <port> indicates that you have to specify a port number, [<MySection>] indicates that you have to specify a section name, and so on.

Note that the <text> convention applies only where the text does not explicitly refer to XML. Another exception is the instructions for writing ACI templates (an appendix to product manuals where this is applicable), where personal settings are indicated by italic type.


Related documentation
You should use the File System Fetch manual in connection with the following:

Import Module manual


Autonomy's Import Module is an integral part of any Autonomy connector. The Import Module manual provides information on how you can configure the settings that determine how content is treated during the importing process (before it is passed to IDOL server).

IDOL server manual


IDOL server lies at the center of any Autonomy infrastructure, storing and processing the data that connectors index into it. The IDOL server manual describes the operations that IDOL server can perform with detailed descriptions of how to set them up.

DIH manual
The DIH (Distributed Index Handler) manual contains details on how you can use a DIH to distribute aggregated documents across multiple IDOL servers.

Best Practices Guide


The Best Practices Guide provides useful hints and tips on setting up and configuring Autonomy solutions, as well as examples of how to combine multiple products effectively.

IAS manual
The IAS manual contains details on how you can use Autonomy's Intellectual Asset Protection System (IAS) to ensure secure access through authentication and role permissions.

DiSH manual
The DiSH (Distributed Service Handler) manual contains details on how you can use a DiSH server to administer and control multiple Autonomy services.

Online help
The online help details the actions and configuration settings that are available for File System Fetch. Please refer to Displaying help on configuration settings on page 21 and to Displaying online help on page 31 for details on how to display help.


1. Autonomy infrastructure
"Today, 80% of business is conducted on unstructured information." Gartner Group

"85 per cent of all data stored is held in an unstructured format." Butler Group

"Unstructured data doubles every three months." Gartner Group

Information that you need in order to conduct business successfully comprises the following types:

In the past, companies could make use of only 20% of the information that was relevant to them. To deal with this information they used keyword search engines, tagging schemes, collaborative filtering or linguistic methods. These methods were not only costly and time-consuming but also non-scalable and inaccurate, and they took the focus away from core business. The remaining 80% of relevant information could not be utilized.


Autonomy's software infrastructure allows you to utilize 100% of the information that is relevant to you. It automates all the business processes that formerly had to be dealt with manually. By developing a patented combination of Bayesian Inference, Shannon's information theory and pattern matching, Autonomy has enabled computers to understand unstructured, structured and semistructured information. This means that Autonomy's software infrastructure solves a fundamental problem that affects every industry, and can be used in virtually any application that handles unstructured information:

- E-Commerce
- CRM
- Knowledge Management
- Business Intelligence
- Enterprise Information Portals
- Online Publishing

Autonomy's software infrastructure is fully scalable and allows you to process information:

- automatically
- in real time
- in any language


IDOL server
Using Autonomy connectors, Autonomy's Intelligent Data Operating Layer (IDOL) server integrates unstructured, semi-structured and structured information from multiple repositories through an understanding of the content, delivering a real time environment in which operations across applications and content are automated, removing all the manual processes involved in getting the right information to the right people at the right time.

Connectors
Connectors enable automatic content aggregation from any type of local or remote repository (for example, a database, a web site, a real-time telephone conversation etc.), forming a unified solution across all information assets within the organization.

Interfaces
Portlets are windows that can be set up in Autonomy's Portal-in-a-Box or third-party portals. Each portlet contains an application that allows the portal's end users to benefit from a variety of IDOL server functionality.

Retina is an easy-to-use web interface application that provides a full range of retrieval methods that adjust to the individual user's proficiency.

Autonomy Desktop Suite brings the power of Autonomy to every desktop. Conducting a real-time analysis of the ideas involved in the content of any opened desktop application, Desktop Suite's ActiveKnowledge or Active Windows Extensions module provides real-time links to relevant internal and external information, so that users are not diverted from their work in progress to perform a separate search or retrieval operation.

Distributed systems
Autonomy's distribution solutions facilitate linear scaling of systems through faster command execution and reduction of processing time.

DAH (Distributed Action Handler) enables the distribution of ACI (Autonomy Content Infrastructure) action commands to multiple Autonomy IDOL servers, providing failover and load balancing.

DIH (Distributed Index Handler) enables distributed indexing of documents into multiple Autonomy IDOL servers, providing failover and load balancing.


Administration
DiSH (Distributed Service Handler) provides crucial maintenance, administration, control and monitoring functionality for the Autonomy infrastructure. DiSH delivers a unified way to communicate with all Autonomy services (such as connectors, DIH, DAH and so on) from a centralized location.

Autonomy Service Dashboard is a stand-alone web application that allows administrators to manage all Autonomy modules and services running locally or remotely. The Dashboard communicates with the Distributed Service Handler (DiSH) module, which is the back-end process for monitoring and controlling all the Autonomy child services. Autonomy Service Dashboard provides the administrator with a list of all child services that DiSH is monitoring, together with control buttons and status information.

PODS
Autonomy's Product Orientated Drop-in Solutions (PODS) allow Autonomy solutions to be easily integrated with third-party applications and solution providers. PODS enable organizations to make their existing applications compatible with IDOL with minimal configuration and administration requirements. Making IDOL server a part of any solution delivers the direct benefits of content automation and the ability to perform a vast range of IDOL server operations, irrespective of file format or location.


Data flow and security


Aggregation & Distribution
Connectors aggregate content from various repositories and index it into IDOL server or, if the content needs to be distributed across multiple IDOL servers, a DIH (Distributed Index Handler).

Querying & Distribution
User queries are sent from a front end directly to IDOL server or distributed to multiple IDOL servers using the DAH (Distributed Action Handler).

Distributed Administration
The DiSH (Distributed Service Handler) enables administrators to maintain, configure and control multiple Autonomy services via the Autonomy Service Dashboard, a front-end web interface.

Security
The Autonomy IAS (Intellectual Asset Protection System) ensures secure access through authentication and role permissions.

When a user logs on to a front end (for example, Retina or a third-party portal), the user's authentication details are sent to IDOL server, which returns the user's security details to the front end, where they are stored until the user logs off or the session times out. Every time the user issues a query, these security details are attached to the query string that is sent to IDOL server.

The group servers store the user group information of repositories that store users in groups. This allows the front end to quickly retrieve user security information from the group servers, and send the query and the user's security information to IDOL server in order to check whether the user is permitted to view result documents before they are displayed.

When a user queries IDOL server through the front end, the user's security information is retrieved from the appropriate group server and sent with the query to IDOL server. IDOL server passes the user's security details to the security libraries for the data repositories that contain result documents for the query. The security libraries then check the user's security details against the ACLs for the documents that match the query. If the user is entitled to view a document, it is returned as a result to the front end.
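The final check described above, in which a user's security details are compared with each matching document's ACL, can be pictured with a small sketch. It is illustrative only: the Document type, the acl_groups field and the visible_results function are hypothetical names invented for this example, and the real check is carried out inside IDOL server and its security libraries, whose internals this guide does not describe.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """A result document with a hypothetical ACL: the groups allowed to view it."""
    reference: str
    acl_groups: set = field(default_factory=set)


def visible_results(user_groups, results):
    """Return only the documents whose ACL shares at least one group with the user."""
    user_groups = set(user_groups)
    return [doc for doc in results if doc.acl_groups & user_groups]


# Hypothetical query results, each carrying the groups permitted to view it.
results = [
    Document("report.doc", {"finance"}),
    Document("handbook.doc", {"finance", "staff"}),
    Document("board-minutes.doc", {"directors"}),
]

permitted = visible_results({"staff"}, results)
print([doc.reference for doc in permitted])
```

A user who belongs only to the staff group sees handbook.doc and nothing else; documents whose ACLs do not overlap with the user's groups are filtered out before anything is returned to the front end.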


2. Introduction
File System Fetch is an Autonomy connector that automatically aggregates documents from file systems on local or network machines, imports them into IDX or XML file format (only IDX or XML files can be indexed in IDOL server) and indexes them into an Autonomy IDOL server. Once IDOL server receives the documents, it automatically processes them, performing a number of intelligent operations in real time, for example:

- Agents
- Alerting
- Categorization
- Channels
- Clustering
- Collaboration
- Dynamic Thesaurus
- Expertise
- Hyperlinking
- Mailing
- Profiling
- Retrieval
- Spelling Correction
- Summarization
- Taxonomy Generation

Please refer to your IDOL server manual for further details.


System architecture
File System Fetch aggregates documents from any type of local or remote repository and indexes them into an IDOL server:

If you want to distribute the documents that File System Fetch aggregates across multiple IDOL servers, you need a DIH (Distributed Index Handler) installation. In this case File System Fetch aggregates documents from any type of local or remote repository and indexes them into the DIH which then distributes the documents between the IDOL servers it connects to, providing Load Balancing and Failover:


Controlling internal file import


You can use the PollingMethod configuration setting to control how File System Fetch imports files (see Displaying help on configuration settings on page 21):

File Polling
File System Fetch reads a specified text file that lists a set of documents with their file paths. It then finds, imports and indexes these documents.

For each text file that it reads, File System Fetch creates a .pos file, which stores the current position in the queue of files that File System Fetch is processing. The .pos file is located alongside the text file.

If you want to stop File System Fetch and restart its process from scratch, delete all .pos files and .pos.bak files from the File System Fetch directory. If you don't, File System Fetch refers to them when it is restarted and carries on from where it stopped.

If you want to reprocess the last file that File System Fetch dealt with, replace the contents of the .pos file with the contents of the .pos.bak file, which is a copy of the .pos file before the last file was processed.

Directory Polling
File System Fetch imports and indexes any file that is contained in a specified directory, provided the file meets the criteria that you have specified.

File System Fetch creates a <InstallationName>.dirstatn file (for example, <InstallationName>.dirstat0) for each of the jobs that it carries out. This file contains a list of all the files that have been processed and is stored in the File System Fetch installation directory.

If you want to stop File System Fetch and restart its process from scratch, delete the <InstallationName>.dirstatn and <InstallationName>.dirstatn.bak files. If you don't, File System Fetch refers to them when it is restarted and carries on from where it stopped.

If you want to reprocess the last file that File System Fetch dealt with, replace the contents of the <InstallationName>.dirstatn file with the contents of the <InstallationName>.dirstatn.bak file, which is a copy of the <InstallationName>.dirstatn file before the last file was processed.

File System Fetch automatically processes any new files that appear in the specified DirectoryPathCSVs directory. You should therefore ensure that no application creates temporary files in this directory.

Every time new files are added to the list file or the directory from which File System Fetch is reading, it processes them automatically. Use the <InstallationName>.log file (located in the File System Fetch installation directory) to keep track of all actions that File System Fetch performs.
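The two maintenance tasks described above (restarting from scratch, and reprocessing the last file) come down to deleting or restoring a few state files. The sketch below shows one way to script them. It is not an Autonomy tool: the function names are hypothetical, and the only facts taken from this guide are the file name patterns (.pos, .pos.bak, .dirstatn and .dirstatn.bak). Stop File System Fetch before running anything like this.

```python
import re
import shutil
from pathlib import Path

# File names documented above: *.pos, *.pos.bak, *.dirstat<n>, *.dirstat<n>.bak
STATE_FILE = re.compile(r"\.(pos|dirstat\d+)(\.bak)?$")


def find_state_files(directory):
    """Return the polling state files found directly inside `directory`."""
    return sorted(
        path for path in Path(directory).iterdir()
        if path.is_file() and STATE_FILE.search(path.name)
    )


def reset_polling_state(directory, dry_run=True):
    """Delete the state files so that the next run starts from scratch.

    With dry_run=True (the default) nothing is deleted; the files that
    would be removed are only returned.
    """
    files = find_state_files(directory)
    if not dry_run:
        for path in files:
            path.unlink()
    return files


def rewind_last_file(state_file):
    """Copy <state_file>.bak over <state_file> so the last file is reprocessed."""
    state_file = Path(state_file)
    backup = state_file.with_name(state_file.name + ".bak")
    shutil.copyfile(backup, state_file)
```

Calling reset_polling_state on the installation directory (and on any directory that holds a polled list file) with the default dry_run=True only reports what would be removed, which is a cautious first step before deleting anything.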


3. Installation
System requirements
File System Fetch should be installed by the system administrator as part of a larger Autonomy system (that is, a system that includes Autonomy IDOL server and an interface for the information stored in IDOL server).

Supported platforms
- Microsoft Windows NT4, 2000 and XP
- Linux
- Solaris

Note: File System Fetch also supports other POSIX UNIX versions on request.

Minimum server specification


Windows
- 200 MHz Pentium processor
- 128 MB RAM
- 200 MB hard disk

UNIX
- 128 MB RAM
- 200 MB hard disk

Note: this specification is dependent on the amount of data to be fetched. Due to substantially different disk usage patterns it is beneficial to run fetch and IDOL server processes on separate drives or partitions.


Implementation procedure
You can use the following implementation procedure to test-run your File System Fetch installation:

1. Install File System Fetch:
   - Run the installer (see Installing File System Fetch on Windows on page 13).
   - When the IDOL server Details dialog is displayed, enter xxx in the Host field. This stops File System Fetch from indexing files into IDOL server (after they have been imported) and forces it to store them in the main installation directory instead.
   - When the File System Fetch Services dialog is displayed, uncheck the box to ensure that File System Fetch does not start immediately.
2. Open the File System Fetch configuration file in a text editor, and set the PollingPeriod parameter to 0 to ensure that File System Fetch cycles only once.
3. Navigate to the data directory in your File System Fetch installation, and place a Word document that you want to index into IDOL server in this directory.
4. Display the Windows Services dialog and start File System Fetch. File System Fetch cycles only once. Wait until it has completed its cycle. You can check the <InstallationName>.log file in the File System Fetch installation directory to see when the cycle is finished. (Note that because you have set IP Address to xxx, the <InstallationName>.log file will state that the indexing command failed.)
5. Display the Windows Services dialog, and stop File System Fetch.
6. In the installation directory, open the <MyJob>.tmp.queued.idx file in a text editor and check that it contains all the content that you want to index into IDOL server. If it doesn't, you need to configure File System Fetch to aggregate the content you want. You can do this using specialized File System Fetch and Import Module settings (please refer to your online help for details on available settings).
7. Once you have made changes to the File System Fetch configuration file, delete the <InstallationName>.dirstat0 and <InstallationName>.dirstat0.check files that File System Fetch has created in its installation directory (this allows File System Fetch to repeat the cycle), start File System Fetch and repeat steps 2-6 until you are happy with the content of the <MyJob>.temp.queued.idx file.
8. Finalize your File System Fetch configuration:
   - Open the File System Fetch configuration file in a text editor.
   - Set the PollingPeriod parameter to an appropriate number (for example, 86400000 if you want File System Fetch to run every 24 hours).
   - Set the DREHost parameter to the IP address (or name) of the machine that hosts your IDOL server.
9. Set up the Fetch jobs that you want File System Fetch to execute (see [Configuration] section on page 25).
10. You can now run File System Fetch.
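The example value 86400000 for a 24-hour cycle suggests that PollingPeriod is expressed in milliseconds (24 x 60 x 60 x 1000 = 86,400,000). Taking that example at face value, which is an assumption since this procedure does not state the unit explicitly, a small helper makes the conversion for other intervals. The function name is hypothetical; check the online help for the PollingPeriod setting before relying on the result.

```python
def polling_period_for_hours(hours):
    """Convert an interval in hours to a PollingPeriod value.

    Assumes the parameter is in milliseconds, as implied by the
    24-hour example value 86400000 quoted in the procedure above.
    """
    if hours < 0:
        raise ValueError("hours must not be negative")
    return int(hours * 60 * 60 * 1000)


print(polling_period_for_hours(24))   # 86400000, the value quoted for 24 hours
print(polling_period_for_hours(0.5))  # 1800000, every 30 minutes
```

Note that a value of 0 has the special meaning given in step 2 of the procedure: File System Fetch cycles only once.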


Installing File System Fetch on Windows


To install under Windows insert the File System Fetch CD-ROM into your CD-ROM Drive. If your Windows installation is configured to support it, inserting the CD-ROM will automatically start the File System Fetch installation program. Otherwise you can start the installation by double-clicking on the File System Fetch-4.2.x_NT.EXE program in the root directory of the CD-ROM through Windows Explorer. Read and follow all installation instructions on the screen carefully. Before the installation program can start to copy files onto your PC, you need to provide it with some information:

1. The installation opens with the Welcome dialog. Read the text, and click on Next.
2. The License agreement dialog is displayed. Read the license agreement and click on Next to accept it.
3. The Installation Name dialog is displayed. Enter a unique name for the File System Fetch installation, and click on Next. Note that the unique name must not contain any spaces.
4. The Choose Destination Location dialog is displayed. Select the directory in which you want to install File System Fetch, and click on Next. By default this is C:\Autonomy\FileSystemFetch, but you can use the Browse button to navigate to another location.
5. The Select Program Manager Group dialog is displayed. Select the Program Manager group to which you want to add icons for File System Fetch, and click on Next.
6. The IDOL server Details dialog is displayed. Enter the following information for the IDOL server you want File System Fetch to index into, and click on Next:
   IP Address: The IP address (or name) of the machine on which IDOL server is running.
   Index Port: The port that is used to index documents into IDOL server (this must be the IndexPort or the ExtendedIndexPort that you have specified in the IDOL server configuration file's [Server] section).
   Database: The name of the IDOL server database in which you want to store the documents that File System Fetch aggregates.


7. The File System Fetch Details dialog is displayed. Enter the following for File System Fetch, and click on Next:
   ACI Port: The port File System Fetch listens on for action commands (this sets the Port parameter in the File System Fetch configuration file's [Server] section).
   Service Port: The port File System Fetch uses for service commands (see Appendix A: Service port commands on page 35).
8. The File System Fetch Services dialog is displayed. Leave the box checked if you want to start the File System Fetch service immediately after the installation, and click on Next. Otherwise, uncheck the box to complete the installation without immediately starting the File System Fetch service.
9. The MS Outlook (PST file) processing dialog is displayed. Check the PST file processing box if you want File System Fetch to be able to aggregate Outlook items (appointments, contacts, notes, tasks, messages and attachments) that are contained in PST files, and click on Next.
10. The Start Installation dialog is displayed. Click on Next to confirm the settings you have made and start the installation of File System Fetch. Alternatively, click on Back to return to previous dialogs if you want to make any changes.
11. The Installing dialog is displayed. The progress of the installation process is indicated. If you want to abort the installation process, click on Cancel.
12. The Add Shortcuts dialog is displayed. Select Yes or No to indicate whether you want to add shortcuts to the File System Fetch service to your Start menu, and click on Next.
13. The Installation Complete dialog is displayed. File System Fetch has been installed successfully. Click on Finish to exit the installation. If you chose to start the File System Fetch service immediately after the installation, it will now launch.
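Before you enter the ACI Port and Service Port in step 7, it can help to confirm that nothing else on the machine is already using the ports you have in mind. The following sketch is not part of the installer. It uses only Python's standard socket module, and the port numbers in it are arbitrary placeholders rather than File System Fetch defaults.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can be bound to host:port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically "address already in use" or a permissions error.
            return False
    return True


if __name__ == "__main__":
    # Placeholder port numbers; substitute the values you plan to enter.
    for name, port in [("ACI Port", 7000), ("Service Port", 7002)]:
        state = "free" if port_is_free(port) else "already in use"
        print(f"{name} {port}: {state}")
```

The check only covers the local loopback address at the moment it runs, so treat a "free" result as a hint rather than a guarantee.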


Directory structure: Windows


The following files and folders are created in the installation directory when you install File System Fetch (folders are identified as such in their descriptions):

convtables                Folder that contains various text files that are used for language conversion.
data                      Folder from which File System Fetch aggregates data by default.
filters                   Folder that contains executables that are used during the importing process.
  binslave.cfg            Configuration file that contains settings for binslave.
  binslave.exe            Binslave executable (used during the importing process to extract text from binary files).
  importslave.exe         Executable that generates IDX files for IDOL server.
  omnislave.cfg           Configuration file that contains settings for omnislave.
  omnislave.exe           Omnislave executable that parses files that are not in HTML or PDF format to IDX files.
  pdfslave.cfg            Configuration file that contains settings for pdfslave.
  pdfslave.exe            Executable that parses PDF files to IDX files.
  various DAT files       Files used by binslave.
  various DLL files       Filters used by omnislave.
importTemp                Folder for temporary import data.
pstslave                  Folder that contains pstslave files.
  redemption.dll          Library file that is used in the processing of PST files.
  pstslave.cfg            Configuration file that contains settings for pstslave.
  pstslave.exe            Executable that parses PST files to IDX files.
<InstallationName>.cfg    File System Fetch configuration file.
<InstallationName>.exe    File System Fetch executable.
INSTALL.LOG               Installation log file.
Uninstall.exe             Executable to uninstall File System Fetch from your computer.

In addition, the following folders and files are created when you start the File System Fetch service:

queue                             Folder that stores queued action commands and the results of queued actions (if you have set the results to be stored).
uid                               Folder that contains document tracking files.
<installation_name>.dirstat0      Store of which files from the file system have been indexed by File System Fetch. A DIRSTAT file and backup are created for each File System Fetch job.
<installation_name>.dirstat0.bak  Backup of the DIRSTAT file.
<installation_name>.lck           Internally used lock file for File System Fetch.
<installation_name>.log           File System Fetch log file.
<installation_name>.str           File System Fetch structured configuration file.
<installation_name>cfg.log        File System Fetch configuration log file.
license.log                       License log file.
service.log                       Service commands log file.


Installing File System Fetch on UNIX


1. Copy the File System Fetch installer from the CD to your local disk.

2. Uncompress the installer using the command:

   uncompress <Installer>.tar.Z

3. Un-tar the resulting file using the command:

   tar -xvf <Installer>.tar

   This creates a subdirectory called FileSystemFetch-4.2.x, which contains the files LICENSE.TXT and Setup.sh, and the subdirectory File System Fetch.

4. Enter the command cd FileSystemFetch-4.2.x to move to the subdirectory.

5. Run the installer script, ./Setup.sh. The Welcome text is displayed. Press v to read the license agreement. When you have finished, press y to accept the agreement and continue with the installation.

6. The Installation Actions dialog is displayed. Enter 1 to continue the File System Fetch installation. Enter 2 to cancel the installation.

7. Enter a name for your File System Fetch installation and press Enter. By default this is File System Fetch.

8. Enter the full path for the location in which you want to install the File System Fetch files, and press Enter. By default this is Autonomy/<installation_name>.

9. Enter the following values for your File System Fetch installation:

   ACI port   The port File System Fetch listens on for action commands (this sets the Port parameter in the [Server] section of the File System Fetch configuration file).
   Directory  The directory from which you want File System Fetch to import content. The default is the data directory that the installer creates in your File System Fetch installation directory.

10. Enter the following details for the IDOL server that you want File System Fetch to index content into:

    IP address  The IP address (or name) of the machine on which IDOL server is running.
    Query port  The port by which queries are sent to IDOL server.
    Index port  The port number used to index content into IDOL server.

11. The Autonomy File System Fetch Installation text is displayed. Check that your settings are correct, and press Enter to confirm them and install File System Fetch. If you want to change a setting, enter the corresponding number, press Enter and then enter a new value for the setting. Alternatively, type X or press Ctrl+C to cancel the installation.

12. The Installation complete dialog is displayed. You have successfully installed File System Fetch. Press Enter to finish.


Directory structure: UNIX


The following files and folders are created in the installation directory when you install File System Fetch (folders are identified as such in their descriptions):

convtables              Folder that contains various text files that are used for language conversion.
data                    Folder from which File System Fetch aggregates data by default.
filters                 Folder that contains executables that are used during the importing process.
binslave.cfg            Configuration file that contains settings for binslave.
binslave.exe            Binslave executable (used during the importing process to extract text from binary files).
importslave.exe         Executable that generates IDX files for importing into IDOL server.
omnislave.cfg           Configuration file that contains settings for omnislave.
omnislave.exe           Omnislave executable that parses files that are not in HTML or PDF format to IDX files.
pdfslave.cfg            Configuration file that contains settings for pdfslave.
pdfslave.exe            Executable that parses PDF files to IDX files.
various DAT files       Files used by binslave.
various SO files        Filters used by omnislave.
importTemp              Folder for temporary import data.
<InstallationName>.cfg  File System Fetch configuration file.
<InstallationName>.exe  File System Fetch executable.
Start.sh                Start script for File System Fetch.
Stop.sh                 Stop script for File System Fetch.
Uninstall.sh            Script to uninstall File System Fetch.

In addition, the following folders and files are created when you start the File System Fetch service:

queue                             Folder that stores queued action commands and the results of queued actions (if you have set the results to be stored).
uid                               Folder that contains document tracking files.
<installation_name>.dirstat0      Store of which files from the file system have been indexed by File System Fetch. A DIRSTAT file and backup are created for each File System Fetch job.
<installation_name>.dirstat0.bak  Backup of the DIRSTAT file.
<installation_name>.lck           Internally used lock file for File System Fetch.
<installation_name>.log           File System Fetch log file.
<installation_name>.str           File System Fetch structured configuration file.
<installation_name>.log           File System Fetch configuration file.


4. Configuring File System Fetch


The settings that determine how File System Fetch operates are contained in the <InstallationName> configuration file, which is located in your installation directory. You can modify these settings in order to customize File System Fetch according to your requirements.

Displaying help on configuration settings


For details on the settings that the individual configuration file sections can contain and on how you can configure them, please refer to the File System Fetch online help.

To display the online help

1. Issue the following command from your web browser:

   http://<host>:<port>/action=Help

   <host>  Enter the IP address (or name) of the machine on which File System Fetch is installed.
   <port>  Enter the port number that client machines use to communicate with File System Fetch (this is specified by the Port setting in the File System Fetch configuration file's [Server] section).

2. Click on the config help link in the top right-hand corner to display the configuration parameter help (by default the action command help is displayed).

   Note: the configuration file sections that each configuration parameter can be used in are listed under Allowed in Sections.

Note: You can also generate configuration help without starting File System Fetch. Issue the following command from the command line to generate HTML files in your installation directory:

<FileSystemFetch_installation_directory_path><InstallationName>.exe -help


Modifying configuration parameter values


Entering Boolean values

For parameters that require Boolean settings, the following settings are interchangeable:

TRUE = true = ON = on = Y = y = 1
FALSE = false = OFF = off = N = n = 0
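The equivalence above can be illustrated with a small sketch. The helper name parse_config_bool is hypothetical and is not part of File System Fetch; it accepts only the spellings listed in this guide, so whether File System Fetch itself also accepts other capitalizations (for example, True) is not assumed here.

```python
# Hypothetical helper (not part of File System Fetch): maps the
# interchangeable Boolean spellings listed in this guide to Python bools.
TRUE_VALUES = {"TRUE", "true", "ON", "on", "Y", "y", "1"}
FALSE_VALUES = {"FALSE", "false", "OFF", "off", "N", "n", "0"}


def parse_config_bool(raw: str) -> bool:
    """Return True or False for a documented Boolean spelling."""
    value = raw.strip()
    if value in TRUE_VALUES:
        return True
    if value in FALSE_VALUES:
        return False
    # Anything else is rejected rather than guessed at.
    raise ValueError(f"not a recognized Boolean setting: {raw!r}")
```

For example, parse_config_bool("ON") and parse_config_bool("1") both yield True, so DirectoryRecurse=on and DirectoryRecurse=1 would be read the same way under this sketch.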

Entering string values

If the value that you want to enter for a parameter that requires a string contains quotation marks, you must put the value into quotation marks and escape each quotation mark that the string contains by putting a backslash in front of it. For example:

FIELDSTART0="<font face=\"arial\"size=\"+1\"><b>"

Here the beginning and end of the string are indicated by quotation marks, while all quotation marks that are contained in the string are escaped.

If you want to enter a comma-separated list of strings for a parameter, and one of the strings contains a comma, you must indicate the start and the end of this string with quotation marks. For example:

ParameterName=cat,dog,bird,"wing,beak",turtle

If any string within a comma-separated list contains quotation marks, you must put this string into quotation marks and escape the quotation marks in the string by putting a backslash in front of them. For example:

ParameterName="<font face=\"arial\"size=\"+1\"><b>",dog,bird,"wing,beak",turtle
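The quoting rules above can be sketched as follows. The function names quote_value and join_csv are hypothetical; the sketch quotes a string only when it contains a comma or a quotation mark, and escapes embedded quotation marks with a backslash, as in the examples in this section.

```python
# Hypothetical helpers (not part of File System Fetch) that apply the
# quoting rules described in this section.

def quote_value(value: str) -> str:
    """Quote one string if it contains a comma or a quotation mark."""
    if '"' in value or "," in value:
        # Escape each embedded quotation mark, then wrap the whole string.
        return '"' + value.replace('"', '\\"') + '"'
    return value


def join_csv(values) -> str:
    """Build a comma-separated parameter value from individual strings."""
    return ",".join(quote_value(v) for v in values)
```

Under these assumptions, join_csv(["cat", "dog", "bird", "wing,beak", "turtle"]) produces the list value shown in the ParameterName example.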

Applying modifications to File System Fetch's operation

New configuration settings only take effect once the File System Fetch service is stopped and restarted.


Configuration file sections


File System Fetch's configuration file comprises a number of sections, which represent different areas that can be configured. You can configure each area by setting configuration parameters for it. In the online help, the configuration file sections that each configuration parameter can be used in are listed under Allowed in Sections. The configuration file can contain the following sections:

[License]
[Service]
[Server]
[Default]
[Configuration]
[<MyJob>]

Note: for import parameters that you can specify in the configuration file's [Default] and [<MyJob>] sections, please refer to the Import module manual.

[License] section
The [License] section contains licensing details. You should not edit this section, as that may cause File System Fetch to stop working. For example:

[License]
Holder=My Company
Key=01234567890


[Service] section
The [Service] section contains the details that File System Fetch requires when it is run as a service under Autonomy's Distributed Service Handler (DiSH). For example:

[Service]
ServicePort=10023
ServiceControlClients=127.0.0.1
ServiceStatusClients=127.0.0.1

[Server] section
This section contains general settings for indexing and querying. For example:

[Server]
Port=7000
QueryClients=10.1.1.*,127.0.0.1
AdminClients=10.1.1.10,127.0.0.1
Threads=2

[Default] section
The [Default] section contains default settings that apply to each Fetch job that is set up in the configuration file (in the individual Fetch job sections). If you configure settings in an individual Fetch job's section, they override the default settings for this job.

Note: in addition to File System Fetch configuration settings, you can also specify Import module settings in this section (or in individual Fetch job sections). Please refer to your Import module manual for details on the Import module.

For example:

[Default]
PollingPostAction=0
PollingAction=7
PollingMaxNumber=1000
DreHost=127.0.0.1
QueryPort=9000
IndexPort=9001
Database=database0
PollingMethod=2
PollingPeriod=10000
RemoveLogFileOnStart=on
ImportIDXFilesAction=0
ImportStoreContent=on
ImportTempDir=./importTemp
ImportSummary=on
ImportBreaking=ON
ImportBreakingMinParagraphWords=300
ImportBreakingMaxParagraphWords=500
ImportBreakingMinDocWords=500
ImportIntelligentTitleSummary=0
ImportDefaultSlaveDirectory=./filters
ImportCharsetConvTablesDirectory=./ConvTables
ImportExtractDateFrom=8
ImportExtractDateToField=DREDATE
ImportExtractDateToFormat=EPOCHSECONDS

[Configuration] section
The [Configuration] section lists all individual fetch jobs that you want File System Fetch to carry out. Note that you must list the fetch jobs in consecutive order, starting from 0. For example:

[Configuration]
Number=2
0=MyFirstJob
1=MySecondJob
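The numbering rule can be checked before you restart the service. The following is a minimal sketch, assuming the [Configuration] section has already been read into a dictionary of strings; the function name list_fetch_jobs is hypothetical and not part of File System Fetch.

```python
# Hypothetical check (not part of File System Fetch): returns the job
# names in order and fails if the numbering has a gap.

def list_fetch_jobs(configuration: dict) -> list:
    """Read jobs 0..Number-1 from a [Configuration] section."""
    number = int(configuration["Number"])
    jobs = []
    for index in range(number):
        key = str(index)
        if key not in configuration:
            raise ValueError(
                f"job {index} is missing; jobs must be numbered "
                "consecutively, starting from 0"
            )
        jobs.append(configuration[key])
    return jobs
```

With the example section above, the sketch returns MyFirstJob followed by MySecondJob; a section that sets Number=2 but lists only jobs 0 and 2 is rejected.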


[<AFetchJob>] section
An individual fetch job's section contains settings that only apply to this job. The settings that are set for an individual job override the default settings (set in the [Default] section) for this job.

Note: in addition to File System Fetch configuration settings, you can also specify Import module settings in this section (or in the [Default] section). Please refer to your Import module manual for details on the Import module.

For example:

[MyFirstJob]
DirectoryPathCSVs=
DirectoryFileMatch=*.txt,*.htm*
DirectoryRecurse=on

[MySecondJob]
DirectoryPathCSVs=./data
DirectoryFileMatch=*.*
DirectoryRecurse=off
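The override behavior can be pictured as a dictionary merge. This is only an illustration of the rule stated above, not a description of how File System Fetch is implemented; the function name effective_settings is hypothetical, and the sketch matches parameter names exactly as written.

```python
# Hypothetical illustration (not part of File System Fetch): a job's own
# settings take precedence over the [Default] section's settings.

def effective_settings(default: dict, job: dict) -> dict:
    """Combine [Default] settings with one job section's settings."""
    merged = dict(default)   # start from the defaults
    merged.update(job)       # job-specific values win
    return merged
```

For example, if [Default] sets PollingAction=7 and Database=database0, and a job section sets PollingAction=18, the job runs with PollingAction=18 and still uses database0.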


Example configuration file


[License]
Holder=My Company
Key=01234567890

[Server]
Port=7000
QueryClients=*
AdminClients=*
Threads=2

[Service]
ServicePort=10023
ServiceControlClients=*
ServiceStatusClients=*

[Default]
PollingPostAction=0
PollingAction=7
PollingMaxNumber=1000
// IDOL server settings
DreHost=127.0.0.1
QueryPort=9000
IndexPort=9001
Database=database0
PollingMethod=2
PollingPeriod=10000
RemoveLogFileOnStart=on
ImportIDXFilesAction=0
ImportStoreContent=on
ImportTempDir=./importTemp
ImportSummary=on
ImportBreaking=ON
ImportBreakingMinParagraphWords=300
ImportBreakingMaxParagraphWords=500
ImportBreakingMinDocWords=500
ImportIntelligentTitleSummary=0
ImportDefaultSlaveDirectory=./filters
ImportCharsetConvTablesDirectory=./ConvTables
ImportExtractDateFrom=8
ImportExtractDateToField=DREDATE
ImportExtractDateToFormat=EPOCHSECONDS

[Configuration]
Number=1
0=Import

[Import]
DirectoryPathCSVs=
DirectoryFileMatch=*.txt,*.htm?,*.pdf,*.doc,*.xls,*.ppt
DirectoryRecurse=on


5. Importing PST files


If you are running File System Fetch on Windows, you can use it to import Outlook items (appointments, contacts, notes, tasks, messages and attachments) that are contained in PST files. You can enable this when you install File System Fetch. Alternatively, you can manually configure File System Fetch to import PST files.

To configure File System Fetch to import PST files:

1. Open the pstslave.cfg file in a text editor, and configure appropriate settings in the [Default] section to determine how the pstslave operates.

   Note: the settings that you can configure are detailed in the File System Fetch online help (see Displaying help on configuration settings on page 21).

2. Save the changes you have made, and close the configuration file.

3. Find the [Configuration] section of the File System Fetch configuration file, increase the Number setting by 1, and list a new fetch job for PST file importing. For example:

   [Configuration]
   Number=3
   0=MyFirstJob
   1=MySecondJob
   2=MyPstFileImportingJob

4. Create a new configuration file section for the new PST file importing fetch job you have listed. For example:

   [MyPstFileImportingJob]

5. Add appropriate settings for this fetch job to the new section. You must at least specify the following settings:

   PollingAction      Enter 18 to pass the files that this fetch job aggregates to File System Fetch's pstslave for processing.
   PstSlaveName       Enter the name of the pstslave executable that you want to use to process Outlook items (appointments, contacts, notes, tasks, messages and attachments) contained in PST files. By default this is pstslave.
   PstSlaveDirectory  Enter the full path to the directory that contains the pstslave executable that you want to use to process Outlook items contained in PST files. By default this is the current working directory.

   In addition, you can specify the following settings, as well as any other appropriate File System Fetch settings:

   PstBatchSize
   PstKeepExtractedFiles
   PstRootOutputDir

   Please refer to the online help for details on all available parameters (see Displaying help on configuration settings on page 21).

   For example:

   [MyPstFileImportingJob]
   PollingAction=18
   DirectoryPathCSVs=C:\Autonomy\FileSystemFetch\data
   DirectoryRecurse=off
   DirectoryFileMatch=*.pst
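Step 3 of the procedure above (increasing Number and listing the new job under the next free index) can be sketched as follows. The function name add_fetch_job is hypothetical, and the sketch assumes the [Configuration] section is held as a dictionary of strings; it does not read or write the configuration file itself.

```python
# Hypothetical helper (not part of File System Fetch): registers one more
# fetch job in a [Configuration] section held as a dictionary.

def add_fetch_job(configuration: dict, job_name: str) -> dict:
    """Return a copy of the section with job_name appended."""
    updated = dict(configuration)
    next_index = int(updated["Number"])   # jobs are numbered from 0
    updated[str(next_index)] = job_name
    updated["Number"] = str(next_index + 1)
    return updated
```

Applied to a section with Number=2, MyFirstJob and MySecondJob, adding MyPstFileImportingJob gives Number=3 and 2=MyPstFileImportingJob, which matches the example in step 3.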


6. Importing individual files


You can import individual files into IDX file format by sending an Import action to File System Fetch. If you have configured appropriate settings in File System Fetch's configuration file, the IDX files that this action produces are also indexed into IDOL server. You can call the Import action from your browser or via the ACI API. Please refer to the File System Fetch online help for details on the Import action.

Displaying online help


Enter the following command to display help on File System Fetch action commands:

http://<host>:<port>/action=Help

<host> Enter the IP address (or name) of the machine on which File System Fetch is installed.

<port> Enter the ACI port by which commands are sent to File System Fetch (this is specified by the Port setting in the File System Fetch configuration file's [Server] section).

Example:

http://12.3.4.56:4000/action=Help

This command uses port 4000 to request Help on action commands from File System Fetch which is located on a machine with the IP address 12.3.4.56.

Note: to display help on configuration settings, click on the config help link in the top right-hand corner (see Displaying help on configuration settings on page 21).


Action command syntax


File System Fetch can be operated via action commands which you can send from your web browser. The general syntax of these commands is as follows:

http://<host>:<port>/action=<action>&<mandatory_parameters>&<optional_parameters>

<host> Enter the IP address (or name) of the machine on which File System Fetch is installed.

<port> Enter the ACI port by which commands are sent to File System Fetch (this is set by the Port parameter in the File System Fetch configuration file's [Server] section).

<action> Enter the name of the action that you want File System Fetch to execute (for example, Import).

<mandatory_parameters> Enter the parameters that the action that you have specified requires (not all actions require parameters).

<optional_parameters> You can enter optional parameters for the action that you have specified (optional parameters are not available for all actions).

Note: you must separate individual parameters with an ampersand.
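The general syntax above can be assembled programmatically. The following is a minimal sketch; the function name build_action_url and the parameter name ExampleParam are hypothetical, and percent-encoding the values is an assumption based on ordinary URL practice rather than on anything stated in this guide.

```python
from urllib.parse import quote

# Hypothetical helper (not part of File System Fetch): builds an action
# command URL with parameters separated by ampersands.

def build_action_url(host, port, action, params=None):
    """Return http://<host>:<port>/action=<action>&<name>=<value>..."""
    parts = ["action=" + quote(action, safe="")]
    for name, value in (params or {}).items():
        # Assumption: values are percent-encoded so that spaces and
        # ampersands inside a value cannot break the command.
        parts.append(quote(name, safe="") + "=" + quote(str(value), safe=""))
    return f"http://{host}:{port}/" + "&".join(parts)
```

For example, build_action_url("12.3.4.56", 4000, "Help") gives http://12.3.4.56:4000/action=Help. Consult the online help for the parameters that a given action actually accepts.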


7. Starting and stopping File System Fetch


Starting File System Fetch
Once you have installed File System Fetch, you are ready to run it. You can start it in either of the following ways:

By double-clicking on the <InstallationName>.exe file in your installation directory.

Using services:

1. Display the Windows Services dialog.
2. Select the <File System Fetch installation name> service, and click on the Start button to start File System Fetch.
3. Click on the Close button to close the Services dialog.


Stopping File System Fetch


You can stop File System Fetch in either of the following ways:

Using services:

1. Display the Windows Services dialog.
2. Select the <File System Fetch installation name> service, and click on the Stop button to stop File System Fetch.
3. Click on the Close button to close the Services dialog.

Using the service port:

Send the following command to File System Fetch's service port (you need to have specified a service port in the File System Fetch configuration file):

http://<host>:<Service_Port>/action=stop

<host> The IP address (or name) of the machine on which File System Fetch is running.

<Service_Port> File System Fetch's service port (which is specified in the [Service] section of the File System Fetch configuration file).


Appendix A: Service port commands


File System Fetch behaves as a standard Autonomy service. If the ServicePort, ServiceStatusClients and ServiceControlClients settings are added to the [Service] section of the File System Fetch configuration file, the service port is enabled and accepts the following standard status and control commands:

GetConfig          Returns the service's configuration file settings.
GetLogStream       Returns a specific log stream for the service.
GetLogStreamNames  Returns the names of the log streams that have been set up for the service.
GetStatistics      Returns statistics for the service.
GetStatus          Returns the service's status (running or stopped).
GetStatusInfo      Returns status information for the service (for example, the service's product name, version number and so on).
MergeConfig        Merges a configuration file fragment with the service's configuration file. This command requires a POST request method.
SetConfig          Sets the service's configuration file. This command requires a POST request method.
Stop               Stops the service.


GetConfig
The GetConfig command returns the service's configuration file settings.

http://<host>:<port>/action=GetConfig

<host>  The IP address (or name) of the machine that hosts the service.
<port>  Enter the ServicePort that you have specified in the [Service] section of the File System Fetch configuration file.

GetLogStream
The GetLogStream command returns a specific log stream for the service.

http://<host>:<port>/action=GetLogStream&Name=<name>&FromDisk=<true/false>&Tail=<number>

<host>        The IP address (or name) of the machine that hosts the service.
<port>        Enter the ServicePort that you have specified in the [Service] section of the File System Fetch configuration file.
<name>        Enter the name of the log stream that you want to return.
<true/false>  Enter true if you want the log stream to be read from disk rather than from memory. By default this is false.
<number>      Enter the number of lines that you want to return from the log stream. The lines are read from the top (that is, the most recent lines are returned). Enter -1 to return all entries (this is the default).


GetLogStreamNames
The GetLogStreamNames command returns the names of the log streams that have been set up for the service.

http://<host>:<port>/action=GetLogStreamNames

<host>  The IP address (or name) of the machine that hosts the service.
<port>  Enter the ServicePort that you have specified in the [Service] section of the File System Fetch configuration file.

GetStatistics
The GetStatistics command returns statistics for the service.

http://<host>:<port>/action=GetStatistics

<host>  The IP address (or name) of the machine that hosts the service.
<port>  Enter the ServicePort that you have specified in the [Service] section of the File System Fetch configuration file.


GetStatus
The GetStatus command returns the service's status (running or stopped).

http://<host>:<port>/action=GetStatus

<host>  The IP address (or name) of the machine that hosts the service.
<port>  Enter the ServicePort that you have specified in the [Service] section of the File System Fetch configuration file.

GetStatusInfo
The GetStatusInfo command returns status information for the service (for example, the service's product name, version number and so on).

http://<host>:<port>/action=GetStatusInfo

<host>  The IP address (or name) of the machine that hosts the service.
<port>  Enter the ServicePort that you have specified in the [Service] section of the File System Fetch configuration file.


MergeConfig
The MergeConfig command allows you to merge the File System Fetch configuration file with one or more configuration file sections. Alternatively, you can use it to set or delete individual configuration parameters.

Using MergeConfig to merge a configuration file with one or more configuration file sections

If the File System Fetch configuration file already contains a section that has the same name as the section with which it is going to be merged, any settings that only the new section contains are added to the existing section. If the new section contains settings that are already present in the existing section, the new section's settings overwrite the settings of the old section.

Note: this command requires a POST request method.

action=MergeConfig&Config=<configuration_file_content>

<configuration_file_content>  Enter the configuration file content that you want to merge with the content of the File System Fetch configuration file. Note that you must escape the configuration file content.
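The merge rule described above can be modelled with nested dictionaries. This is a sketch of the documented behavior only, not of the service's implementation; the function name merge_config is hypothetical, and the handling of a section that does not yet exist (it is added as a new section) is an assumption that this guide does not state.

```python
# Hypothetical model (not part of File System Fetch) of the MergeConfig
# rule: new settings are added, existing settings are overwritten.

def merge_config(existing: dict, fragment: dict) -> dict:
    """Merge {section: {parameter: value}} mappings without mutating inputs."""
    merged = {section: dict(settings) for section, settings in existing.items()}
    for section, settings in fragment.items():
        # Assumption: a section that is not present yet is simply added.
        merged.setdefault(section, {}).update(settings)
    return merged
```

For example, merging a [Server] fragment that sets Threads=4 and QueueCleanSeconds=30 into a [Server] section that contains Port=7000 and Threads=2 leaves Port=7000, changes Threads to 4 and adds QueueCleanSeconds=30.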

Using MergeConfig to set individual configuration parameters

The MergeConfig command allows you to set one or more configuration parameters.

http://<host>:<port>/action=MergeConfig&Key<n>=<param>&Value<n>=<value>

<host>   The IP address (or name) of the machine that hosts the service.
<port>   Enter the ServicePort that you have specified in the [Service] section of the File System Fetch configuration file.
<n>      A unique number that identifies which <param> belongs to which <value>.
<param>  The configuration file section that contains the parameter you want to set, and the parameter whose value you want to set. Note that you need to specify this using the format: <config_file_section>/<parameter_name>
<value>  The value that you want to set for the corresponding <param>.

For example:

http://1.23.45.6:10000/action=MergeConfig&Key0=Server/QueueCleanSeconds&Value0=30&Key1=Default/DirectoryRecurse&Value1=true

In this example, the MergeConfig command is used to set the value of the QueueCleanSeconds parameter in the configuration file's [Server] section to 30, and to set the value of the DirectoryRecurse parameter in the configuration file's [Default] section to true.
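The pairing of Key<n> and Value<n> can be generated from a mapping of settings. The following is a minimal sketch; the function name merge_config_url is hypothetical, and the values are inserted verbatim, so any value that contains characters such as an ampersand or a space would need escaping first.

```python
# Hypothetical helper (not part of File System Fetch): numbers each
# <section>/<parameter> key and its value with the same index.

def merge_config_url(host, port, settings: dict) -> str:
    """Build a MergeConfig URL from {"Section/Parameter": value}."""
    pairs = []
    for n, (param, value) in enumerate(settings.items()):
        pairs.append(f"Key{n}={param}&Value{n}={value}")
    return f"http://{host}:{port}/action=MergeConfig&" + "&".join(pairs)
```

Called with the host 1.23.45.6, the port 10000 and the two settings Server/QueueCleanSeconds=30 and Default/DirectoryRecurse=true, the sketch reproduces the example URL in this section.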

Using MergeConfig to delete individual configuration parameters

The MergeConfig command allows you to delete one or more configuration parameters.

http://<host>:<port>/action=MergeConfig&DeleteKey<n>=<param>

<host>   The IP address (or name) of the machine that hosts the service.
<port>   Enter the ServicePort that you have specified in the [Service] section of the File System Fetch configuration file.
<n>      A unique number for each <param> you want to delete.
<param>  The configuration file section that contains the parameter you want to delete, and the parameter you want to delete. Note that you need to specify this using the format: <config_file_section>/<parameter_name>

For example:

http://1.23.45.6:10000/action=MergeConfig&DeleteKey0=Default/StableCheckMinWaitTime&DeleteKey1=UserEmail/RunMailer

In this example, the MergeConfig command is used to delete the StableCheckMinWaitTime parameter from the configuration file's [Default] section and the RunMailer parameter from the [UserEmail] section.


SetConfig
The SetConfig command allows you to set the File System Fetch configuration file.

Note: this command requires a POST request method.

action=SetConfig&Config=<configuration_file_content>

<configuration_file_content>  Enter the configuration file content with which you want to overwrite the current content of the File System Fetch configuration file. Note that you must escape the configuration file content.
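The following sketch shows one way to prepare the POST body. It assumes that "escape" means standard URL form encoding, which this guide does not confirm; the function name set_config_body is hypothetical. The same approach would apply to the Config parameter of MergeConfig.

```python
from urllib.parse import urlencode

# Hypothetical helper (not part of File System Fetch). Assumption: the
# configuration file content is escaped with URL form encoding.

def set_config_body(config_text: str) -> bytes:
    """Return the encoded body for action=SetConfig&Config=<content>."""
    fields = {"action": "SetConfig", "Config": config_text}
    return urlencode(fields).encode("ascii")
```

The resulting bytes can be passed as the data argument of urllib.request.urlopen, which then issues a POST request. Because SetConfig overwrites the whole configuration file, it is worth retrieving the current settings with GetConfig first.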

Stop
The Stop command stops the service.

http://<host>:<port>/action=Stop

<host>  The IP address (or name) of the machine that hosts the service.
<port>  Enter the ServicePort that you have specified in the [Service] section of the File System Fetch configuration file.


Glossary
Connector
A Connector is an Autonomy fetching solution (for example HTTPFetch, File System Fetch and so on) that allows you to retrieve information from any type of local or remote repository (for example, a database or a web site). It imports the fetched documents into IDX or XML file format and indexes them into IDOL server from where you can retrieve them (for example by sending queries to IDOL server).

Database
An Autonomy database is a data pool that is contained within IDOL server. You can retrieve information that has been indexed into IDOL server from the database, for example, through submitting a query to IDOL server.

DiSH (Distributed Service Handler)


The Distributed Service Handler provides a unified way to communicate with all Autonomy services from a centralized location. It also facilitates the licensing that enables you to run Autonomy solutions. You must have a running Autonomy DiSH server that resides on a machine with a static known IP address.

Fetching
The process of downloading documents from the location they are stored in (for example a local folder, a website, a database, a Lotus Domino server and so on), importing them to IDX format and indexing them into IDOL server.

IDOL server
Using Autonomy Connectors, Autonomy's Intelligent Data Operating Layer (IDOL) server integrates unstructured, semi-structured and structured information from multiple repositories through an understanding of the content, delivering a real time environment in which operations across applications and content are automated, removing all the manual processes involved in getting the right information, to the right people at the right time.

Importing
After a document has been downloaded from the location it is stored in, it is imported to an IDX file format. This process is called "importing".


Indexing
After documents have been imported to IDX file format, their content is stored in IDOL server. This process is called "indexing".

Query
You can submit a natural language query to IDOL server which analyzes the concept of the query and returns documents that are conceptually similar to the query. You can also submit Boolean, bracketed Boolean and keyword searches to IDOL server.


Index
A Action commands Help 21, 31 Syntax 32 Administration 4 [<AFetchJob>] section (configuration file) 26 Automater iii Autonomy Data flow and security 5 Infrastructure 1 B Boolean values 22 C Configuration 21 Entering Boolean values 22 Entering string values 22 Example configuration file 27 File sections 23 Modifying configuration parameter values 22 Configuration file [<AFetchJob>] section 26 [Configuration] section 25 [Default] section 24 Example 27 [License] section 23 [Server] section 24 [Service] section 24 [Configuration] section (configuration file) 25 Connector 3, 43 Controlling internal file import 9 D Database 43 [Default] section (configuration file) 24 Directory Polling 9 Directory structure UNIX 19 Windows 15 DiSH (Distributed Service Handler) 43 Displaying Help on configuration settings 21 Online help 31 Distributed systems 3 E Example configuration file 27 F Fetching 43 File Polling 9 File System Fetch Configuration 21 Directory structure 15, 19 Implementation procedure 12 Importing files 31 Installation 11, 13, 15, 17, 19 Introduction 7 Starting and stopping 33 System architecture 8 System requirements 11 G GetConfig (service port command) 36 GetLogStream (service port command) 36 GetLogStreamNames (service port command) 37 GetStatistics (service port command) 37 GetStatus (service port command) 38 GetStatusInfo (service port command) 38 H Help action 21, 31 I IDOL server 3, 43 Action command syntax 32 Data flow and security 5 Online help 31 System architecture 5 Implementation procedure 12


Index Importing 43 Import action 31 Individual files 31 Outlook items 29 PST files 29 Indexing 44 Installation 11 On UNIX 17 On Windows 13 Interfaces 3 Introduction 7 L [License] section (configuration file) 23 M MergeConfig (service port command) 39 Modifying configuration parameter values 22 O Online help 21, 31 Outlook items Importing 29 P PODS 4 pos files 9 pos.bak files 9 PST files Importing 29 Q Query 6, 44 S [Server] section (configuration file) 24 Service port commands GetConfig 36 GetLogStream 36 GetLogStreamNames 37 GetStatistics 37 GetStatus 38 GetStatusInfo 38 MergeConfig 39 SetConfig 41 Stop 41 [Service] section (configuration file) 24 SetConfig (service port command) 41 Starting and stopping File System Fetch 33 Stop (service port command) 41 String values 22 System Architecture 8 Requirements 11 T Typographical conventions iii
