
A Guide to AutoWeb

Release 2.0




















Memex Technology Limited
2 Redwood Court
Peel Park
East Kilbride G74 5PF
Scotland UK
Tel: +44 (0) 1355 233 804
Fax: +44 (0) 1355 239 676
Web: http://www.memex.com



Copyright 2007 Memex Technology Limited. All rights reserved.

This manual and the software described herein are the copyright of Memex Technology Limited and may not be
copied or disclosed to a third party without the prior written permission of Memex. Whilst all possible care is taken in
the preparation of this manual, Memex assumes no responsibility or liability for any errors or inaccuracies that may
appear in this document. Memex reserves the right to make changes without notice both to this manual and to the
software and hardware it describes.
The software described in this document is furnished under licence and may only be used in accordance with the
terms of such licence.
The people, places, organisations, telephone numbers, vehicle identification numbers and other details referred to in
the sample record data in this publication are entirely fictitious. These details have been created for demonstration
purposes only and do not refer to any actual organisation, telephone number, vehicle, etc., or to any actual person,
living or dead.
The text of this document may include references to previous releases of the product, for example in screenshots
and procedural examples. Regardless of any versions that may be mentioned, this manual describes the current
functionality provided by the release of the software identified on the title page.

Trademarks
Memex, Textract and Total Content Access are registered trademarks of Memex Technology Limited. Microsoft,
PowerPoint and Windows are registered trademarks of Microsoft Corporation. Other product, brand and company
names mentioned herein are trademarks or registered trademarks of their respective owners and should be treated
as such.
















2.0a-5-IJ -AC-20070912-1.6

Contents
Scope
  Related documents
  Product names
Introduction
  AutoWeb toolbar
  AutoWeb server
Chapter 1 Installing the AutoWeb server
  Server components
  Installation prerequisites
    SFU requirements
  Installing the server components
    Installing using the auto-installer
    Using the auto-installer on Windows
    Using the auto-installer on Solaris or Linux
    Installing using the tar file
    Creating extra databases
  Setting up the AutoWeb configuration file
    The default spider.cfg file
    HTTrack options and robots.txt
  Upgrading to AutoWeb 2.0
    Unpack the installation package
    Updating the configuration database
    Run the upgrade scripts
Chapter 2 Installing the AutoWeb client
  Installing the toolbar
    Configuring the toolbar
    Configuring the toolbar from the Windows registry
    How the toolbar works
  Memex Analyst forms
  Installation tasks
    Memex Intelligence Engine
    Memex Patriarch
    AutoWeb databases for Memex Patriarch
  Configuration tasks
    Modifying the spider.cfg file
    Linking to the WebConfig database
    Linking to the WebArchive database
  Setting up picklists
  Adding additional web archives
Chapter 4 Using AutoWeb
  Selecting a Memex database
  Specifying keywords
  Indexing Web page text
  Indexing a Web page
  Viewing indexed pages
  Monitoring Web sites
  Specifying the sites you want to monitor
    Specifying sites - Memex Patriarch
    Specifying sites - Memex Analyst
    Fields on the configuration form
  How Web site monitoring works
  Stopping getsite.pl
    Extracting the Web page text
Appendix A Known limitations
Appendix B Troubleshooting
Appendix C HTTrack options
Appendix D Upgrading to AutoWeb 1.3
  Backing up your previous AutoWeb setup
  Installing AutoWeb 1.3
  Converting your AutoWeb data
    Setting up the conversion script
    Running the conversion script


Scope
This guide provides detailed installation and user instructions for release 2.0 of AutoWeb.
The document contains:
- An overview of the AutoWeb application
- Installation and configuration instructions for the client and server components
- Detailed user instructions
- Information on known limitations
- Instructions on how to upgrade from a previous release

If you have any comments about this guide, please contact Memex Customer Support:
support@memex.com

Related documents
For further information about this release of AutoWeb, please read the AutoWeb Release Notes.

Product names
This manual contains references to other Memex products. The names of some of these
products were changed recently for new releases of the software. The name changes are
shown in the following table.

Current name | Previous name | Notes
Memex Patriarch | Intelligence Manager | Memex Patriarch is a desktop client application, whereas Intelligence Manager comprises a desktop application plus various server components.
Memex Analyst | Intelligence Analyst |
Memex Series VI | The Intelligence Manager bundle | Memex Series VI and the Intelligence Manager bundle are sets of compatible products.
Memex Series VI Server | The Intelligence Manager server components plus the Memex Intelligence Engine | The Memex Series VI Server comprises the Memex Intelligence Engine plus various server components that support the client applications.


This manual uses the name of the current release of the software unless specifically referring
to an older release. Unless stated otherwise, details referring to a product by its current name
also apply to releases of the products that used the previous name.

Introduction
AutoWeb provides an easy way to extract text from a Web site and transfer it to a Memex
database.
AutoWeb has two main components:
- A toolbar that integrates into Internet Explorer and allows you to index individual pages
  directly from the browser.
- A server-side process that you can run either manually or as part of a cron job.

AutoWeb toolbar
When you use the AutoWeb toolbar, you can choose to index all the text from a Web page or
just index selected text. The toolbar also allows you to specify the Memex database where you
want to index the Web page, and to enter keywords associated with the page.

AutoWeb server
The server-side process reads the contents of a configuration database containing information
on which pages should be indexed. The process then mirrors (that is, stores a local copy of)
each Web page and creates a record in a Memex database. The mirrored files are used for
displaying the Web page in a browser. The database is used for retrieving a Web page based
on a search query entered in Memex Patriarch or Memex Analyst.
Whenever a page is indexed, either from the toolbar or from the server process, AutoWeb
makes a copy of the page. This enables you to access historical copies of the pages you have
indexed.
Note AutoWeb is designed to be integrated with Memex Patriarch and Memex Analyst (or
Intelligence Manager and Intelligence Analyst, if you are using older versions of these
applications). You can use either application to view the configuration and index
records and access the indexed Web pages.

Chapter 1
Installing the AutoWeb server
Server components
This table lists the components that the AutoWeb server installation process installs.

Name | Details
bin/HTTrack | HTTrack is a utility that is used to mirror Web pages.
bin/libhttrack.so.1 | Shared library for HTTrack (for Solaris).
bin/lynx | Lynx is a text-based browser utility that is used to extract the text from Web pages.
bin/lynx.cfg | Configuration file for the Lynx utility.
bin/getsite.pl | This Perl script is run as a cron job. It looks at the contents of the config.db database and indexes any sites that have been set up.
bin/addtomemex.pl | This Perl script is called by any HTTrack process that is launched from getsite.pl. HTTrack calls this script every time it downloads a file. The script then decides what to do with the file and adds a record to a database if necessary.
bin/addpagefile.pl | This Perl script is called by any HTTrack process that is launched from the file IndexPage.pl. HTTrack calls this script every time it downloads a file. The script then decides what to do with the file.
cgi-bin/Bar.pl | This is a CGI script for backwards compatibility with the original Memex toolbar (Version 1.0a). This controls what appears on that version of the toolbar and the actions that the toolbar buttons perform.
cgi-bin/Databases.pl | This is a CGI script that is used by the new Memex toolbar (Version 1.0b) to determine the list of databases.
cgi-bin/IndexPage.pl | This is a CGI script that is called whenever a user selects Index Selected Text or Index Page.
config.db | The database that contains information on what sites getsite.pl should index.
databases | This directory contains all the databases where the indexed pages are stored.
dbconfigs | This directory contains the database configs.
images/memexbar.bmp | This bitmap is an image list for the toolbar.
install | The install script for the server installation.
mirror | This directory contains the mirrored Web pages.
spider.cfg | This is the config file for AutoWeb.
locales/EN.loc | English locale file.
perlmodules/Config/General.pm | Required Perl module.
perlmodules/Config/General/Extended.pm | Required Perl module.
perlmodules/Config/General/Interpolated.pm | Required Perl module.
perlmodules/File/Basename.pm | Required Perl module.
perlmodules/File/CheckTree.pm | Required Perl module.
perlmodules/File/Compare.pm | Required Perl module.
perlmodules/File/Copy.pm | Required Perl module.
perlmodules/File/DosGlob.pm | Required Perl module.
perlmodules/File/Find.pm | Required Perl module.
perlmodules/File/Path.pm | Required Perl module.
perlmodules/File/Spec.pm | Required Perl module.
perlmodules/File/stat.pm | Required Perl module.
perlmodules/File/Spec/Functions.pm | Required Perl module.
perlmodules/File/Spec/Mac.pm | Required Perl module.
perlmodules/File/Spec/OS2.pm | Required Perl module.
perlmodules/File/Spec/Unix.pm | Required Perl module.
perlmodules/File/Spec/VMS.pm | Required Perl module.
perlmodules/File/Spec/Win32.pm | Required Perl module.

Installation prerequisites
Before you can install the AutoWeb server, your system must contain:
- One of the following operating systems:
  - Sun Solaris 10
  - Red Hat Enterprise Linux 4
  - Microsoft Windows Services for UNIX 3.5
- Perl 5.0 or greater
- Memex Intelligence Engine (MIE) 6.0
- Apache 2 HTTP server.
  Apache must be configured to run as the Memex administrator user.
  To configure Apache 2 to run as the Memex administrator user:
  - Change to the directory where Apache's httpd.conf file is located. For example:
        cd /usr/local/apache2/conf
  - Edit the httpd.conf file with a plain text editor, such as vi.
  - Locate the section of the configuration file that specifies the user as whom the httpd
    service will run. For example, to force Apache 2 to run as the user mxadmin in the
    group mxadmins, add or change the User and Group lines:
        User mxadmin
        Group mxadmins
  Apache's log files must be writable by the Memex administrator user (typically mxadmin
  or mxroot).
  To do this on Solaris or Linux:
  - su as root.
  - Change the ownership of the directory where Apache's log files reside. The location
    of the log files is specified in Apache's httpd.conf file. The directory and its contents
    should be owned by the Memex administrator user. For example:
        chown -R mxadmin:mxadmins /var/apache2/logs
  To do this on Windows SFU:
  - From an SFU command console, su as Administrator.
  - Change the ownership of the directory where Apache's log files reside. The location
    of the log files is specified in Apache's httpd.conf file. The directory and its contents
    should be owned by the Memex administrator user. For example:
        chown -R SERVERNAME+mxadmin:SERVERNAME+mxadmins /usr/local/apache2/logs
To run the getsite.pl script as a cron job (see Monitoring Web sites), the Memex
administrator account (usually mxadmin or mxroot) must have a home directory.
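
The Apache user and group requirement can be checked from a shell before you continue. The following is a minimal sketch only: check_apache_user is a hypothetical helper, not part of AutoWeb, and it assumes the directives are written exactly as User and Group in the main httpd.conf file (it does not follow Include statements).

```shell
#!/bin/sh
# Sketch: verify that an Apache httpd.conf sets the expected User and Group.
# check_apache_user is a hypothetical helper, not part of the AutoWeb package.
check_apache_user() {
    conf=$1
    want_user=$2
    want_group=$3
    # Use the last uncommented User/Group directive found in the file.
    user=$(awk '$1 == "User" { u = $2 } END { print u }' "$conf")
    group=$(awk '$1 == "Group" { g = $2 } END { print g }' "$conf")
    if [ "$user" = "$want_user" ] && [ "$group" = "$want_group" ]; then
        echo "OK: Apache runs as $user:$group"
    else
        echo "MISMATCH: found ${user:-none}:${group:-none}, expected $want_user:$want_group"
        return 1
    fi
}
```

For example, check_apache_user /usr/local/apache2/conf/httpd.conf mxadmin mxadmins reports whether the file matches the account names used in this section.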

SFU requirements
If you are installing on SFU, you must first install the following software packages:

Package name | Description
httpd | Apache 2 HTTP Server
lynx | Lynx Web browser for terminals
zlib | Zlib data compression library

These packages are available from the SFU Tools Warehouse Web site:
http://www.interopsystems.com/tools/warehouse.htm
To install these packages, first download and install the package installer that is available as a
shell script from the same Web site. You can then issue simple commands from a shell
console window that use the package installer to download and install the software packages
and all their dependencies. For example, to install Apache 2, run the command:
    pkg_update -L httpd
For more information, see the SFU Tools Warehouse Web site.
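
If you prefer to install all three packages in one pass, the command above can be wrapped in a loop. This is a sketch under assumptions: install_sfu_packages and the DRY_RUN variable are hypothetical names introduced here, and the loop simply repeats the pkg_update form shown in the example for each package name in the table.

```shell
#!/bin/sh
# Sketch: install the three SFU prerequisite packages listed in the table.
# Set DRY_RUN=1 to print the commands instead of running them.
install_sfu_packages() {
    for pkg in httpd lynx zlib; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "pkg_update -L $pkg"
        else
            # Stop at the first package that fails to install.
            pkg_update -L "$pkg" || { echo "failed to install $pkg" >&2; return 1; }
        fi
    done
}
```

Running the function with DRY_RUN=1 first lets you review the commands before anything is downloaded.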

Installing the server components
The method of installing the AutoWeb server components varies depending on whether your
MIE was installed as part of a Memex Series VI Server installation. If you are adding
AutoWeb to a Memex Series VI system, use the auto-installer method described here.
Otherwise, use the tar file method described in Installing using the tar file.
Installing using the auto-installer
The auto-installer is available for Windows, Linux and Solaris. You must have a Memex
Series VI Server set up to be able to use the AutoWeb auto-installer.
Using the auto-installer on Windows
1. Locate the autoweb_windows.exe file in Windows Explorer.
2. Right-click this file and choose Run As.
3. Select The following user and enter <COMPUTERNAME>\Administrator.
4. Enter the password for Administrator and click OK.
5. Follow the setup instructions on screen:
   - Memex recommends leaving the destination directory as C:\SFU\opt\memex
   - In most cases you can leave the host name and port settings at their default values:
     Host name: localhost
     Port: 9001
   - Enter the name and password of an MIE superuser. To check the names of current MIE
     superusers, look at the values of the superusers element in the memexsvr.xml file
     (usually located in /opt/memex/etc).
6. As instructed at the end of the auto-installation process, add an Include statement to
   Apache's httpd.conf file.
   For example, from an SFU shell, run the command:
       echo "Include /opt/memex/autoweb/config/apache2.conf" >> /usr/local/apache2/conf/httpd.conf
7. Start, or restart, the Apache Web server:
       /usr/local/apache2/bin/apachectl restart

Using the auto-installer on Solaris or Linux


1. Log on to the server as the local root user.
2. Locate the autoweb_linux.sh install script and run it by typing the command:
       sh autoweb_linux.sh
3. Follow through the setup instructions on screen. (The default values are usually correct
   for each):
   - Memex recommends leaving the destination directory as /opt/memex
   - Enter the host name and port number of your Memex Series VI Server. The default
     values of localhost and 9001 are usually correct, but you can modify them. If you
     are installing AutoWeb on a server other than the one that hosts your Memex Series
     VI setup, you must also provide the port number for that server's MIE. Otherwise,
     enter the same value as you entered for the previous port number.
   - Enter the name and password of an MIE superuser. To check the names of current
     MIE superusers, look at the values of the superusers element in the memexsvr.xml
     file (usually located in /opt/memex/etc).
   Note If any of the values you enter for the previous two steps are incorrect, the installer
   will display an error and prompt you to re-enter the correct values.
4. As instructed at the end of the auto-installation process, add an Include statement to
   Apache's httpd.conf file. For example, run the command:
       echo "Include /opt/memex/autoweb/config/apache2.conf" >> <APACHE_CONF>/httpd.conf
   Where <APACHE_CONF> is a path such as /etc/apache2.
5. Start, or restart, the Apache Web server:
       <APACHE_HOME>/bin/apachectl restart
   Where <APACHE_HOME> is a path such as /usr/apache2.
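
Because the echo command in the Include step appends a line every time it is run, repeating it adds duplicate Include statements. The sketch below shows one way to make the step safe to repeat. It is an illustration only: add_autoweb_include is a hypothetical helper, and the Include path is the one used in the examples above.

```shell
#!/bin/sh
# Sketch: append the AutoWeb Include line to httpd.conf only if it is absent.
# add_autoweb_include is a hypothetical helper, not part of the auto-installer.
add_autoweb_include() {
    conf=$1
    line='Include /opt/memex/autoweb/config/apache2.conf'
    # -F: fixed string, -x: match the whole line, -q: no output
    if grep -Fxq "$line" "$conf"; then
        echo "already present"
    else
        cp "$conf" "$conf.bak"    # keep a backup before editing
        echo "$line" >> "$conf"
        echo "added"
    fi
}
```

Pass the full path of your httpd.conf file as the only argument, then restart Apache as described in the final step.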

Installing using the tar file


This method of installation should only be used if your Memex server was set up manually
and not with the Memex Series VI auto-installer. If you are unsure which type of Memex
setup you have, contact support@memex.com.
Note You must install the AutoWeb server components as the Memex administrator
account. For example, mxadmin or mxroot.


To install the AutoWeb server:
1. Create an autoweb directory on the server where you want to install the AutoWeb
   server.
   For example: mkdir /opt/memex/autoweb
2. Run the following command to give write access to the autoweb directory:
       chmod g+w /opt/memex/autoweb
3. Move to the autoweb directory.
4. Copy the supplied installation tar file to this directory. The file is called
   mxwasvr-platform-version.tar. For example: mxwasvr-solaris-2_0a3.tar
5. Run the following command to unpack the tar file:
       tar xvf mxwasvr-platform-version.tar
6. If you are installing on Solaris, you must make sure that you have the path to the
   MIE's bin directory in the PATH environment variable for the user.
   To find what is in your path, enter: echo $PATH
   To add the bin path, if necessary, enter: PATH=$PATH:<path_to_mie_bin>
   To make your changes visible, enter: export PATH
7. Using a suitable text editor, check that the /etc/services file for the MIE contains the
   following entry:
       mx-ailsvr <port>/tcp
   The port number of the MIE is usually 9001.
8. Run the install shell script.
   This script takes one parameter: the full domain name for the Web server. For
   example:
       sh install server.domain.com
9. Specify the language for the server by editing the locale entry within the spider.cfg
   configuration file.
   The default is EN (for English), but you can change this to load a different locale if you
   have another locale file installed in the locales directory.
   To load a different locale, enter the name of the file without the .loc extension. For
   example: locale EN
10. Configure your web server so that the mirror subdirectory is visible as a Web
    subdirectory.
    To do this, add a line to the httpd.conf file, such as:
        Alias /autoweb-mirror/ /opt/memex/autoweb/mirror/
    Notes:
    - The name that you give to this alias will have an impact on the mirrorurl entry
      within the spider.cfg file.
    - It is good practice to group Alias entries together within the httpd.conf file. If
      you are unsure where this Alias entry should go, or you do not have any other
      Alias entries, add it to the end of the file.

11. Configure your web server so that the images subdirectory is visible as a Web
    subdirectory.
    To do this, add a line to Apache's httpd.conf file, such as:
        Alias /autoweb-images/ /opt/memex/autoweb/images/
    Note:
    The name that you give to this alias will have an impact on the imglst entry within
    the spider.cfg file.
12. Add a cgi-bin directory to your Web server called /autoweb-bin/. This directory
    must be aliased to the cgi-bin subdirectory within the autoweb directory.
    To do this, add a line to Apache's httpd.conf file, such as:
        ScriptAlias /autoweb-bin/ /opt/memex/autoweb/cgi-bin/
13. Make a note of the full URL location of this ScriptAlias.
    You enter this URL when configuring the AutoWeb client toolbar.
14. Add a directory to your Web server that points to the cgi-bin subdirectory within the
    autoweb directory.
    To do this, add the following lines to Apache's httpd.conf file:

        <Directory "<autoweb_install_path>/cgi-bin">
            AllowOverride None
            Options None
            Order allow,deny
            Allow from all
        </Directory>

    Where <autoweb_install_path> is the location of your AutoWeb installation,
    typically /opt/memex/autoweb.
    You must also add another directory to your Web server for each of the mirror and
    images directories, similar to the one shown above for the cgi-bin directory. For
    example:

        <Directory "<autoweb_install_path>/mirror">
            AllowOverride None
            Options None
            Order allow,deny
            Allow from all
        </Directory>

    and

        <Directory "<autoweb_install_path>/images">
            AllowOverride None
            Options None
            Order allow,deny
            Allow from all
        </Directory>
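
The httpd.conf entries from steps 10 to 14 all derive from the AutoWeb installation path, so they can be generated together and reviewed before you paste them into the file. The following is a sketch only: print_autoweb_apache_conf is a hypothetical helper, and the alias names are the example names used in the steps above. The Order and Allow directives are reproduced as given in this guide; Apache 2.4 and later use Require directives instead, so adjust the output if you run a newer server.

```shell
#!/bin/sh
# Sketch: print the Alias, ScriptAlias and Directory entries from steps 10-14
# for a given AutoWeb installation path (default: /opt/memex/autoweb).
# print_autoweb_apache_conf is a hypothetical helper, not part of AutoWeb.
print_autoweb_apache_conf() {
    base=${1:-/opt/memex/autoweb}
    echo "Alias /autoweb-mirror/ $base/mirror/"
    echo "Alias /autoweb-images/ $base/images/"
    echo "ScriptAlias /autoweb-bin/ $base/cgi-bin/"
    # One Directory block per exported subdirectory.
    for dir in cgi-bin mirror images; do
        cat <<EOF
<Directory "$base/$dir">
    AllowOverride None
    Options None
    Order allow,deny
    Allow from all
</Directory>
EOF
    done
}
```

For example, print_autoweb_apache_conf /opt/memex/autoweb writes the fragment to standard output, where it can be redirected to a file and checked before it is added to httpd.conf.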



Creating extra databases
One sample database is created as part of the installation process. The sample database is
called webarchive. The directory for AutoWeb databases is: /opt/memex/autoweb/databases.
You can create extra databases by using the ns_create command followed by the mkphonetic
command. For example:
    ns_create -c /opt/memex/autoweb/dbconfigs/config.archive -n 8192 /opt/memex/autoweb/databases/mynewdb

    mkphonetic /opt/memex/autoweb/databases/mynewdb

See the Memex Intelligence Engine Administrator's Guide for more information on the ns_create
and mkphonetic utilities.
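
If you create several databases, the two commands can be wrapped so that only the database name changes. This is a sketch under assumptions: create_autoweb_db, run_or_print and DRY_RUN are hypothetical names introduced here, and the ns_create arguments are copied unchanged from the example above rather than taken from the utility's documentation.

```shell
#!/bin/sh
# Sketch: create an extra AutoWeb database using the two commands shown above.
# Set DRY_RUN=1 to print the commands instead of running them.
run_or_print() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

create_autoweb_db() {
    name=$1
    base=${2:-/opt/memex/autoweb}
    db="$base/databases/$name"
    # mkphonetic runs only if ns_create succeeds.
    run_or_print ns_create -c "$base/dbconfigs/config.archive" -n 8192 "$db" &&
        run_or_print mkphonetic "$db"
}
```

For example, create_autoweb_db mynewdb reproduces the commands from the example. Run it as the Memex administrator account.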

Setting up the AutoWeb configuration file
spider.cfg is the configuration file for AutoWeb. This table lists the entries that the
configuration file must contain. The default file is shown in the section The default spider.cfg file.

Name
    Details
installpath
    The installation directory of the AutoWeb server. This is set automatically
    by the install script.
    For example: /opt/memex/autoweb
locale
    The language locale to use for the server responses to the Memex toolbar.
    This must be set to match one of the files in the locales directory in the
    installation path.
    For example: EN
mirrorurl
    The URL for the mirror directory. This must contain the full domain name
    and the alias that you gave for the mirror directory.
    For example: http://server.domain.com/autoweb-mirror
httracklib
    The path to the lib file for HTTrack.
    For example: /opt/memex/autoweb/bin
httrack
    The path to the HTTrack executable.
    For example: /opt/memex/autoweb/bin/httrack
opts
    The options that getsite.pl uses to call HTTrack (see HTTrack options and
    robots.txt).
    For example: -n -%e0
stdopts
    More options that getsite.pl uses to call HTTrack.
    For example: -I0 -Qq --assume cfm=text/html,php=text/html -X0 -%F ""

append
    The path to the ns_append utility.
    For example: /opt/memex/mie/bin/ns_append
decode
    The path to the decode utility.
    For example: /opt/memex/mie/bin/decode
configdb
    The path to the config database for getsite.pl.
    For example: /opt/memex/autoweb/config.db
lynx
    The path to the lynx utility and the parameters that must be passed.
    For example: /opt/memex/autoweb/bin/lynx -cfg="/opt/memex/autoweb/bin/lynx.cfg"
domain
    The web server domain.
    For example: server.domain.com
imglst
    The path that will be added to the domain to retrieve the image list for the
    toolbar. The first part of this must be the name that you gave to the alias
    for the /images directory.
    For example: /autoweb-images/memexbar.bmp
cgi-bin
    The path that will be added to the domain to access the cgi-bin for
    AutoWeb. This must be the name of the alias that you gave for the cgi-bin
    directory.
    For example: /autoweb-bin/
pageopts
    The options used in the call from IndexPage.pl to HTTrack.
    For example: -%P0 -C0 -I0 -%Q -n -Qq -d --assume cfm=text/html,php=text/html -X0 -%F ""
logfile
    The location of the log file for AutoWeb. If this entry does not exist, no log
    file is created.
    For example: /opt/memex/logs/crawlerlog.txt
filtertypes
    A list of the file types that AutoWeb will not write a record for.
    For example: ra|ram|jpg|gif|pbm|mov|avi|wmv|css|pdf|ps|js|xml|rdf
lockfile
    The lock file that is used to prevent getsite.pl from running more than
    once.
    For example: /tmp/autoweblock
notrenamed
    A list of the file types that HTTrack does not rename as html.
    For example: html|htm|txt
imbase
    The installation directory of the Memex Patriarch software on the server.
    This entry is optional and is only necessary if you want to use AutoWeb
    from within Memex Patriarch.
    This parameter should usually be set to: /opt/memex/im

rollover
    The number of days before the mirror directory is rolled over.
    Rolling over the mirror directory involves creating a new subdirectory in
    the location specified by the mirrorurl setting. If you leave this at the
    default of 7, a new mirror subdirectory is created every 7 days for storing
    Web pages in (2007001, 2007002 and so on).
    To turn off this process, set the value to 0, although this is not
    recommended. The default and recommended value in the provided file
    is 7.
Note You use different configuration file variables to specify the HTTrack options,
depending on how you are running AutoWeb:
- If you are running the AutoWeb toolbar, use the pageopts variable to specify the
  HTTrack options.
- If you are running AutoWeb as a cron job via getsite.pl, use the stdopts variable
  to specify the HTTrack options.
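
When you are checking which HTTrack options are in effect, it can help to print a single entry from spider.cfg. The following is a rough sketch, not the parser that AutoWeb itself uses: spider_cfg_get is a hypothetical helper that assumes each entry is a name followed by whitespace and its value on one line, as in the default file, and that comment lines start with #.

```shell
#!/bin/sh
# Sketch: print the value of one spider.cfg entry.
# spider_cfg_get is a hypothetical helper; it handles only simple
# "name value" lines and prints nothing if the entry is not found.
spider_cfg_get() {
    key=$1
    file=${2:-/opt/memex/autoweb/spider.cfg}
    awk -v k="$key" '
        $1 ~ /^#/ { next }                    # skip comment lines
        $1 == k {
            sub(/^[ \t]*[^ \t]+[ \t]+/, "")   # drop the name and the blanks after it
            print
            exit
        }' "$file"
}
```

For example, spider_cfg_get pageopts prints the options used for the toolbar, and spider_cfg_get stdopts prints the options used by the cron job.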

The default spider.cfg file

#Config file for Intelligence Mirror
installpath /opt/memex/autoweb
mirrorurl http://localhost/autoweb-mirror
httracklib /opt/memex/autoweb/bin
httrack /opt/memex/autoweb/bin/httrack
opts -n -%e0 -A32000
stdopts -I0 -Qq --assume cfm=text/html,php=text/html -X0 -%F ""
append /opt/memex/mie/bin/ns_append
decode /opt/memex/mie/bin/decode
configdb /opt/memex/autoweb/config.db
lynx /opt/memex/autoweb/bin/lynx -cfg="/opt/memex/autoweb/bin/lynx.cfg"
domain localhost
imglst /autoweb-images/memexbar.bmp
cgi-bin /autoweb-bin/
pageopts -%P0 -C0 -I0 -%Q -n -Qq -d --assume cfm=text/html,php=text/html -X0 -%F ""
logfile /opt/memex/autoweb/crawlerlog.txt
filtertypes ra|ram|jpg|gif|pbm|mov|avi|wmv|css|pdf|ps|js|xml|rdf
lockfile /tmp/spiderlock
notrenamed html|htm|txt
locale EN
imbase /opt/memex/im
rollover 7

HTTrack options and robots.txt
A robots.txt file is stored in the root of most Web servers. This file alerts crawlers and web
spiders, such as AutoWeb, as to which pages they should ignore when retrieving pages from
the remote Web server.
The original specification of this standard and the IETF draft are available from the following
sites:
http://www.robotstxt.org/wc/norobots.html
http://www.robotstxt.org/wc/norobots-rfc.html

Because robots.txt restricts the files that can be downloaded by web spiders, it has an impact
on the AutoWeb server software and its ability to track and store Web pages.
AutoWeb uses HTTrack software to retrieve remote Web pages. If required, you can
configure HTTrack to either follow or ignore the directives in the robots.txt file. You do this
by changing the opts setting in the spider.cfg file. For more information, see Appendix C
HTTrack options.

Upgrading to AutoWeb 2.0


Follow these instructions to upgrade an AutoWeb 1.3 installation to AutoWeb 2.0. If you are
upgrading from AutoWeb 1.0 or 1.1, you must first upgrade the configuration to AutoWeb 1.3
before you can apply the following steps. Instructions for upgrading to version 1.3 are given
in Appendix D, Upgrading to AutoWeb 1.3.

Unpack the installation package


Unpack the AutoWeb 2.0 installation package in a temporary location. For example:
    tar -xvf mxwasvr-<platform>-<version>.tar

Updating the configuration database
Memex Analyst config.db database
If you are using Memex Analyst for adding or editing configuration records for AutoWeb, two
new fields must be added to the config file for the config.db database. The path to this file is
typically /opt/memex/autoweb/config.db/config. Use a plain text editor, such as vi, to edit
this file, adding the following two lines to the end of the file:
    field:6 index xxindex ""
    field:7 priority xxpriority ""
Note If the field numbers 6 and 7 are currently used by other fields, use the next available
highest numbers that are not currently in use.
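
To find the next free field number before you edit the file, you can scan the existing definitions. This is a sketch under a stated assumption: it expects every field definition to start at the beginning of a line in the form field:<number>, as in the two lines above, and next_field_number is a hypothetical helper rather than a Memex utility.

```shell
#!/bin/sh
# Sketch: report the next unused field number in a config.db config file.
# Assumes field definitions look like: field:<number> <name> ...
next_field_number() {
    awk '
        /^field:[0-9]+/ {
            n = $1
            sub(/^field:/, "", n)          # keep only the number
            if (n + 0 > max) max = n + 0   # remember the highest number seen
        }
        END { print max + 1 }' "$1"
}
```

For example, next_field_number /opt/memex/autoweb/config.db/config prints the number to use for the index field; the priority field then takes the number after it.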
Memex Patriarch WebConfig database
If you use Memex Patriarch for adding or editing configuration records for AutoWeb (that is, if
config.db is a symbolic link to the Memex Patriarch WebConfig database), you must add
index and priority fields to the WebConfig database definition. Do this within Memex
Patriarch, using Entity Manager. See the Memex Patriarch online help for details of how to
add new fields.
The Memex Patriarch form for WebConfig records (and, optionally, the form for
WebArchive records) should be replaced by the forms supplied in the im13autoweb/forms
directory of the distribution. For example:

    cp im13autoweb/forms/WebConfig.form /opt/memex/im/CS/files/forms
    cp im13autoweb/forms/WebArchive.form /opt/memex/im/CS/files/forms
Two new picklists should be added within List Management to the WebConfig database
definition for the index and priority fields. Index should have the values YES and NO. Priority
should have the values HIGH, MEDIUM and LOW.
See the Memex Patriarch online help for details on creating picklists.
Note These picklist files are supplied with the AutoWeb distribution in
im13autoweb/picklists.
Run the upgrade scripts
In the directory where the AutoWeb 2.0 installation package was unpacked, enter the
following command:
sh upgrade-scripts <autoweb installation directory>
Where <autoweb installation directory> is the installation directory of the existing
AutoWeb 1.3 software. This is normally /opt/memex/autoweb.




Chapter 2
Installing the AutoWeb client
Installing the toolbar
To install the AutoWeb toolbar:
1. In Windows Explorer, browse to the location of the supplied AutoWeb.exe file for the
   client application.
2. Double-click AutoWeb.exe.
   This launches the AutoWeb InstallShield program.
3. Click Yes to accept the license agreement.
   This displays the Choose Destination Location page.
4. Browse to the location where you want to install the files, and click Next.
   The InstallShield program installs the AutoWeb files and displays a confirmation
   message when the installation is complete.
5. Click Finish to acknowledge the message.

Configuring the toolbar
After installing the AutoWeb toolbar, you need to open Internet Explorer and make sure that
the toolbar is now available.
If the toolbar is not visible, choose View > Toolbars > AutoWeb. This adds the AutoWeb
toolbar to Internet Explorer.
The toolbar should look like this:

To configure the AutoWeb toolbar:
1. Click the arrow beside the AutoWeb button and choose Configuration from the
   drop-down list.
   This displays the Configuration dialog box.

2. Enter the URL of the cgi-bin directory on the web server where the AutoWeb server
   software is installed. Typically, this is: http://server.domain/autoweb-bin/
   For example: http://achilles.memex.com/autoweb-bin/
   You can check this value by looking for the relevant ScriptAlias entry in Apache's
   httpd.conf file (or in the /opt/memex/autoweb/config/apache2.conf file for an
   installation with Memex Series VI Server).
3. Click OK.
   This enables the AutoWeb toolbar. All the toolbar options will now be available.
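
One possible way to look up the ScriptAlias entry mentioned in step 2 is sketched below. The
function name find_script_alias is hypothetical; the sketch simply lists the uncommented
ScriptAlias lines in whichever Apache configuration file you pass to it.

```shell
# Sketch: list the ScriptAlias directives in an Apache configuration file.
# $1 = path of the configuration file to inspect.
find_script_alias() {
    # Lines that start with "#" are comments and are not matched.
    grep -i '^[[:space:]]*ScriptAlias' "$1"
}
```

For example, on an installation with Memex Series VI Server you might run
find_script_alias /opt/memex/autoweb/config/apache2.conf and compare the alias shown with
the last part of the URL entered in the Configuration dialog box.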

Configuring the toolbar from the Windows registry
If you are installing the AutoWeb toolbar on a significant number of machines, or if you want
to restrict user access to the Configuration option, you can configure the toolbar via a specific
registry file, autoweb.reg. This file is supplied by Memex alongside the client installation
file.
You specify the following settings in the autoweb.reg file:
URL
   The full URL of the cgi-bin directory on the web server where the AutoWeb server
   software is installed.
ConfigDisabled
   A DWORD value in the registry. Set this to 1 (or any non-zero value) to disable the
   AutoWeb toolbar's Configuration menu option.
For example, a typical autoweb.reg file looks like this:

REGEDIT4

[HKEY_LOCAL_MACHINE\SOFTWARE\Memex Technology Ltd\AutoWeb]
"URL"="http://server.domain/autoweb-bin/"
"ConfigDisabled"=dword:00000000

Double-click this file to apply the changes to the Windows registry of the local computer.

Note These settings apply to all user accounts on the computer. The changes are applied
to Internet Explorer the next time it is started.
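
If you need to prepare registry files for several servers, a file in the format shown above
could be generated rather than edited by hand. The following shell sketch is one way to do
this; the function name write_autoweb_reg is hypothetical, and the use of Windows-style
(CRLF) line endings is an assumption made as a precaution because the file is imported on
Windows.

```shell
# Sketch: print an autoweb.reg file for a given server URL.
# $1 = full URL of the cgi-bin directory
# $2 = 1 to disable the toolbar's Configuration option (defaults to 0)
write_autoweb_reg() {
    url=$1
    disabled=${2:-0}
    printf 'REGEDIT4\r\n\r\n'
    printf '[HKEY_LOCAL_MACHINE\\SOFTWARE\\Memex Technology Ltd\\AutoWeb]\r\n'
    printf '"URL"="%s"\r\n' "$url"
    printf '"ConfigDisabled"=dword:%08x\r\n' "$disabled"
}
```

For example, write_autoweb_reg "http://server.domain/autoweb-bin/" 1 > autoweb.reg writes a
file that sets the URL and disables the Configuration menu option.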


To add a further level of security, you can place security permissions on these registry keys to
prevent them being changed. This stops users from reconfiguring the toolbar themselves. For
more information on setting permissions for registry keys, see your Microsoft Windows
documentation.
How the toolbar works
Implementation
The AutoWeb toolbar is implemented as a native DeskBand component for Internet Explorer
using Visual C++. This requires the MXAutoWeb.dll file to be registered on each client
machine. After the library is registered, users can display the toolbar by accessing Internet
Explorer and selecting View > Toolbars > Memex AutoWeb Toolbar.
Configuration
The toolbar configuration is controlled by the following registry key:
HKEY_LOCAL_MACHINE\SOFTWARE\Memex Technology Ltd\AutoWeb

This key holds the string value URL, which contains the base URL of the cgi-bin
directory on the web server containing the CGI scripts.
Processing index requests
When a user clicks Index Page or Index Selected Text on the toolbar, AutoWeb sends an
HTTP request to the IndexPage.pl Perl CGI script, located within the cgi-bin directory on the
server.
This request contains the following parameters:
• The Memex database where the indexed text will be stored
• The keywords to add to the database record
• The (selected) text from the page
• An indication as to whether the user is indexing the entire page or just selected text
• The Web page's URL

IndexPage.pl then calls HTTrack for the URL (this call is run in the background). HTTrack
attempts to create a mirror of that page.
This call to HTTrack contains a parameter specifying whether each indexed file will contain a
timestamp in the filename. HTTrack in turn calls addpagefile.pl, which compares the new
indexed file with the most recent version on the local server.
• If the files are the same, the new version is deleted and replaced with a symbolic link to
  the most recent file.
• If the files are different, the new file becomes the most recent version and is used for
  any subsequent comparisons.
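
The comparison described above can be sketched in shell as follows. This is an illustration
only, not the code of addpagefile.pl; the function name keep_or_link and its two arguments
are assumptions, and both paths are assumed to be absolute so that the symbolic link
resolves correctly.

```shell
# Sketch of the "same or different" decision (not the actual addpagefile.pl logic).
# $1 = path of the file that has just been mirrored
# $2 = path of the most recent stored version of the same page
# Prints the path to use as the reference copy for the next comparison.
keep_or_link() {
    new=$1
    latest=$2
    if [ -e "$latest" ] && cmp -s "$new" "$latest"; then
        # Identical content: drop the new copy and point its name at the stored one.
        rm -f "$new"
        ln -s "$latest" "$new"
        echo "$latest"
    else
        # Different content (or no earlier version): the new file becomes the reference.
        echo "$new"
    fi
}
```

Storing a link instead of a second identical copy keeps one timestamped name per indexing
run without using extra disk space for pages that have not changed.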

After completing the call to HTTrack, IndexPage.pl writes a record into the specified
database containing:
• The original URL
• The URL of the mirror
• The keywords
• The (selected) text from the page
• The date and time

Note All the responses that IndexPage.pl returns to the user come from the selected
locale file within the locales directory. If no locale is set, the default English locale
(stored in EN.loc) is used.

Memex Analyst forms
Two new Memex Analyst forms are installed as part of the AutoWeb client installation:
WebArchive.mfm
   Use this form to view any records in databases that store information on indexed web
   pages.
Control.mfm
   Use this form to view the records in the config database on the server.

To use these forms:
1. Go to the Properties dialog box for the data source.
2. In the Form section, choose the Use form radio button.
3. Click the button to browse to the location of the form files on the local computer,
   typically:
   C:\Program Files\Memex Technology Ltd\AutoWeb\
4. Select the file and click Open.
   The screenshot below gives an indication of how the Properties dialog should look once
   you have selected your form.

5. Click OK in the Properties dialog box.



Chapter 3
Using AutoWeb with Memex Patriarch
Note This chapter contains information on configuring AutoWeb to be used with Memex
Patriarch on a Memex system that was manually installed. If your system is a
Memex Series VI Server that was installed using the provided auto-installer (i.e. you
use Memex Patriarch to administer your system), you can skip this chapter and
continue reading Chapter 4, Using AutoWeb, on page 31.

AutoWeb is designed to integrate with Memex Patriarch and Memex Analyst. However, you
must perform some extra installation and setup tasks to use AutoWeb within Memex
Patriarch.
Important You can use either Memex Patriarch or Memex Analyst for choosing the Web
sites you want AutoWeb to monitor. However, you cannot configure AutoWeb
from both applications. The steps described in this section enable configuration
from within Memex Patriarch. This will disable configuration from within Memex
Analyst. You will still be able to view the configuration records in Memex
Analyst, but you will only be able to add or edit configuration records from
Memex Patriarch.

Installation tasks
Memex Intelligence Engine
For Memex Patriarch and AutoWeb to work together, MIE 6.0 must be installed on all the
servers that will be used to host both Memex Patriarch and AutoWeb.
Notes You do not need to place Memex Patriarch and AutoWeb on completely
separate physical machines. A single MIE instance can host both the Memex
Patriarch and AutoWeb databases.
If your system uses multiple physical servers, all the physical machines must
share the same secret file to allow for certificate authentication.

For details on how to set up the MIE on your servers, read the MIE 6.0 Installation Guide.
Memex Patriarch
The Memex Patriarch server-side components can be installed in two ways:
1. Using the Memex Series VI Server auto-installer
2. Using the Perl-based installer
The Perl-based installer provides a way to specify many of the configuration options during
the installation process, whereas the auto-installer provides a quick way to install a pre-built
installation.
This section relates to Memex server installations done using the Perl-based installer. This
installer will also be used to install the two AutoWeb databases for Memex Patriarch.
For more information on the Perl-based installer see the Memex Series VI Server Installation
Guide: Part II Patriarch Components.
AutoWeb databases for Memex Patriarch
AutoWeb contains two Memex database definitions that you can use to install AutoWeb
databases for Memex Patriarch. These databases allow you to search and control AutoWeb
from inside Memex Patriarch.
To enable these database definitions, copy the im13autoweb directory into the im-install
directory (which was created when the Perl-based installer was used to install the Memex
Patriarch server components). For example:
cp -R /opt/memex/autoweb/im13autoweb /opt/memex/im/im-2.0a-105-vanilla-interix/im-install
Important If you deleted the im-install directory after installing the Memex Series VI
Server, you will no longer have the Perl-based installer. You need this to
proceed with this installation procedure. Contact Memex Customer Services
and request a copy of the tar file containing the Perl-based installer for the
Memex Patriarch server components.
The installer for the Memex Patriarch server components must be run on the
physical machine that hosts the Memex configuration server. If AutoWeb is
installed on a machine that is not the configuration server, you must copy
the AutoWeb database definitions to the configuration server, by transferring
the im13autoweb directory across the network to the physical machine that
is hosting the configuration server.

Before you can install the AutoWeb database definitions, you need the following information
about your Memex Series VI Server setup:
• The hostname and port number for the Memex Intelligence Engine that you will use to
  access the AutoWeb databases
• The prefix and name of the logical server that will host the AutoWeb databases

You will add this information to the installer's setup.xml file to specify where the AutoWeb
databases will be created.

Editing the setup.xml file
When you have copied the im13autoweb directory to the im-install directory, you must
modify the setup.xml file within the im-install/im13autoweb directory. This file contains the
database definitions for the two new AutoWeb databases: WebConfig and WebArchive. It
also defines a new logical server named AutoWeb (prefix AW).

If you want to create the AutoWeb databases on a remote server, you must edit the attributes
for the host element, specifying the server where the new AutoWeb databases will be
created. To do this, change the attributes to: hostname="hostname" port="number".
For example: <host hostname="cutlass" port="9001">
Alternatively, to create the AutoWeb databases on the same physical machine as the Memex
Patriarch configuration server, leave the host attribute as:
<host local="y">

Installing the AutoWeb databases
After editing the setup.xml file, you must run the Perl-based installer for Memex Patriarch
to install the new AutoWeb databases and logical server.
To install the AutoWeb databases and server:

1. Change to the im-install directory on the configuration server. For example:
cd /opt/memex/im/im-2.0a-105-vanilla-interix/im-install
2. Run the following command:
perl install.pl -c <CS_Prefix> -i <Patriarch_Install> -m <MIE_Install> \
-x <MIE_Config> -p <Local_MIE_Port> -f autoweb/im13autoweb

Where:
• <CS_Prefix> is the prefix of the logical server used as the configuration server (usually
  CS).
• <Patriarch_Install> is the directory where the Memex Patriarch server-side
  components are installed (usually /opt/memex/im).
• <MIE_Install> is the directory where the MIE is installed (usually /opt/memex/mie).
• <MIE_Config> is the path to the MIE configuration file (usually
  /opt/memex/etc/memexsvr.xml).
• <Local_MIE_Port> is the TCP port on which the local MIE listens for connections.
For example:
perl install.pl -c CS -i /opt/memex/im -m /opt/memex/mie \
-x /opt/memex/etc/memexsvr.xml -p 9001 -f autoweb/im13autoweb
3. When the details of the installation are displayed, enter y to confirm that you want to
   continue with the installation.
4. Enter the username and password of the Memex Patriarch superuser.
   The script completes the installation.

Configuration tasks
To configure AutoWeb to work with Memex Patriarch, you must update AutoWeb to use the
new entities that have been created.

Modifying the spider.cfg file
This task is mandatory if you want to use AutoWeb with Memex Patriarch.
The spider.cfg file is located in the autoweb directory. The file contains the setting imbase.
You must edit this setting to point to the directory where Memex Patriarch is installed.
For example, if Memex Patriarch is installed in /opt/memex/im, you would change the
spider.cfg file setting to:
imbase /opt/memex/im

Important You must modify spider.cfg before you make any of the other changes
described in this section. If you do not make this change, AutoWeb will not be
able to detect that it is inserting data into a Memex Patriarch database, and
the resulting records will be inaccessible from the client software.

Linking to the WebConfig database


The getsite.pl script, which is used to index Web pages automatically, is configured using
records in a legacy-format database called config.db. You can add and edit records in this
database using Memex Analyst. However, to add or edit configuration records from within
Memex Patriarch you must use the new Memex Patriarch WebConfig database that was
created when you installed the im13autoweb setup.
The config.db database is stored in the AutoWeb installation directory. For example, if
AutoWeb has been installed in /opt/memex/autoweb, the path to this database is
/opt/memex/autoweb/config.db.
Important You can only configure AutoWeb from either Memex Patriarch or Memex
Analyst. You cannot configure AutoWeb from both applications.

Creating the symbolic link
Creating a symbolic link from the config.db database to the new WebConfig database forces
AutoWeb to use Memex Patriarch's WebConfig database.
To create the symbolic link:
1. As the Memex administrative user, move to the AutoWeb installation directory. For
   example:

cd /opt/memex/autoweb

2. Move the config.db aside by entering the following command:

mv config.db config.db.old

3. Create a link to the WebConfig database by entering the following command:

ln -s <path_to_im>/<autoweb_database_prefix>/databases/WebConfig config.db

For example:

ln -s /opt/memex/im/AW/databases/WebConfig config.db

AutoWeb will now use the WebConfig database rather than the config.db database.

Note If you are upgrading your AutoWeb setup from a previous version, you must make
sure that a uniq_id file is stored in the WebConfig database's directory. You can do
this manually, or by adding a record to the database in Memex Patriarch.
For more information, consult the MIE Administrator's Guide.

Reverting to the legacy database


If, at a later date, you decide that you would prefer to use Memex Analyst for configuring
Web site monitoring, you can reverse the above process, deleting the symbolic link and
renaming the config.db.old file as config.db. However, after doing this, the configuration
database will be empty and you will be left with a WebConfig database in Memex Patriarch
that is no longer connected to AutoWeb.
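
The reversal described above could be scripted along the following lines. This is a sketch
under stated assumptions: the function name revert_to_legacy_config is hypothetical, and it
expects config.db to be the symbolic link and config.db.old to be the copy that was moved
aside when the link was created.

```shell
# Sketch: undo the WebConfig link and restore the legacy config.db database.
# $1 = AutoWeb installation directory (for example /opt/memex/autoweb)
revert_to_legacy_config() {
    dir=$1
    if [ ! -L "$dir/config.db" ]; then
        echo "$dir/config.db is not a symbolic link; nothing to revert" >&2
        return 1
    fi
    if [ ! -e "$dir/config.db.old" ]; then
        echo "$dir/config.db.old not found; cannot restore" >&2
        return 1
    fi
    rm "$dir/config.db"                       # removes the link only, not the WebConfig database
    mv "$dir/config.db.old" "$dir/config.db"
}
```

The two checks are there so that the function refuses to remove a real config.db database
by mistake.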

Linking to the WebArchive database


By default, AutoWeb uses a legacy-format database called webarchive for indexing pages.
This database is located in the /opt/memex/autoweb/databases directory. To configure
AutoWeb from Memex Patriarch you must use the Memex Patriarch WebArchive database
that was created when you installed the im13autoweb setup.
To use the WebArchive database, you must create a symbolic link to force AutoWeb to use
this database rather than the /opt/memex/autoweb/databases/webarchive database.

Creating the symbolic link
You create the symbolic link to the WebArchive database in the same way as you created the
symbolic link to the WebConfig database.
To create the symbolic link:

1. Move to the databases subdirectory of the AutoWeb installation directory. For
   example:

cd /opt/memex/autoweb/databases

2. Create a link to the WebArchive database by entering the following command:

ln -s <path_to_im>/<autoweb_database_prefix>/databases/WebArchive <name_of_archive_database>

For example:
ln -s /opt/memex/im/AW/databases/WebArchive webarchive

Notes
The AutoWeb toolbar will list the WebArchive database by the name of the
symbolic link, usually webarchive.
If you are upgrading your AutoWeb setup from a previous version, you must
make sure that a uniq_id file is stored in the WebArchive database's directory.
You can do this manually, or by adding a record to the database in Memex
Patriarch.
For more information on the uniq_id file, see the Memex Intelligence Engine
Administrator's Guide.

You will be able to use Memex Patriarch to view Web pages indexed from the AutoWeb
toolbar by searching the WebArchive database on the AutoWeb logical server within Memex
Patriarch.

Setting up picklists
This is an optional task.
In Memex Patriarch, the WebConfig entity contains a single picklist field, Database, which
holds a list of all the databases in AutoWeb. This list is not automatically populated. You
should update this list whenever you add a database to AutoWeb.
For information on modifying picklists in Memex Patriarch, refer to the Memex Patriarch
Online Help.

Adding additional web archives


The instructions in this chapter describe how to create a single AutoWeb archive database
that is accessible from Memex Patriarch. However, you can use multiple databases to store
Web pages indexed by AutoWeb.
For each new database you want to use from AutoWeb, you must create a new logical server
in Memex Patriarch, or use an existing logical server that does not contain an AutoWeb
database.
Create the new database (and logical server, if required) by using the Perl-based installer for
Memex Patriarch server components. This example shows how to create a new WebArchive
database on a new logical server called AutoWeb2, with the server prefix ZW:
1. As the Memex administrative user, copy the supplied im13autoweb directory:
cd /opt/memex/autoweb
cp -R im13autoweb im13autoweb2
2. Edit the setup.xml file within the new im13autoweb2 directory, removing the two
   include statements and changing the name and prefix attributes for the server element,
   ensuring you use a prefix that is not already used by an existing logical server.
Note For more information on server prefixes see the topic Use the installer to add a
logical server in the Memex Patriarch online help.
For example:
<?xml version="1.0"?>
<setup>
<host local="y">
<server name="AutoWeb2" prefix="ZW">
<database entity="WebArchive"/>
</server>
</host>
</setup>

3. Change to the im-install directory and run the installer with the im13autoweb2 setup.
   For example:
cd /opt/memex/im/im-2.0a-105-vanilla-interix/im-install
perl install.pl -c CS -i /opt/memex/im -m /opt/memex/mie \
-x /opt/memex/etc/memexsvr.xml -p 9001 \
-f /opt/memex/autoweb/im13autoweb2
4. Change to the databases subdirectory of the AutoWeb installation directory.
5. Create a symbolic link to the new WebArchive database by entering the following
   command:
ln -s /opt/memex/im/ZW/databases/WebArchive webarchive2
Note Each archive database (or symbolic link) in the databases directory must have a
unique name. For example: webarchive1, webarchive2, and so on.


Chapter 4
Using AutoWeb
AutoWeb is a utility that allows you to easily add the text of a Web page to a Memex
database. In addition to this, when you extract text from a Web page, AutoWeb creates a
mirror of the Web page on a local server. You can then use Memex Analyst to view the
records created from the Web page text and to view the mirrored copy of the Web page.

Selecting a Memex database
To specify where the Web page text will be stored, choose a database from the Select
Database drop-down list.

Specifying keywords
To associate keywords with an indexed Web page, type the keywords into the Enter
Keywords text box.

Indexing Web page text
To extract specific text from a Web page, highlight the text and then click the Index Selected
Text button.

When you click this button, AutoWeb also mirrors the entire Web page to the local server.

Indexing a Web page
To extract the text of an entire Web page, click the Index Page button.

When you click this button, AutoWeb also mirrors the entire Web page to the local server.

Viewing indexed pages
You can use Memex Patriarch or Memex Analyst to retrieve the indexed records.
The indexed record for each Web page contains:
• The URL of the original page
• The URL of the mirrored copy of the page
• The date and time that the page was indexed
• The text (or the selected text) from the page
• The keywords that are associated with the page

If Memex Analyst has been set up to use the forms distributed with the AutoWeb toolbar, the
result form displays the mirrored copy of the page when you view one of the records. The
screenshot below shows an example of this.


Monitoring Web sites
The getsite.pl script indexes all the sites that are listed as records in the AutoWeb
configuration database. This database is called WebConfig within the AutoWeb logical server
on a Memex Series VI installation, or comprises the file /opt/memex/autoweb/config.db on
installations completed using the tar file method.
After you have specified the sites you want to index you can run getsite.pl manually, or you
can configure it to run on a regular basis as a cron job. For example, to run getsite.pl as a cron
job once an hour, enter the following:
1 * * * * /opt/memex/autoweb/bin/getsite.pl
If you used the auto-installer to install AutoWeb, the following entries are added to the cron
tab of the Memex administrator user:
# AutoWeb Run HIGH priority sites every hour
0 * * * * /opt/memex/autoweb/bin/getsite.pl HIGH
# AutoWeb Run MEDIUM priority sites every day
0 0 * * * /opt/memex/autoweb/bin/getsite.pl MEDIUM
# AutoWeb Run LOW priority sites every week
0 0 * * 1 /opt/memex/autoweb/bin/getsite.pl LOW
The combination of configuration records and the getsite.pl script, run as a cron job, allows
you to monitor the specified Web sites.
Note The auto-installer for AutoWeb adds the above cron jobs to the cron tab of the
Memex administrator user. In the unlikely event that you run the auto-installer more
than once (for example, if you delete installed files and then run the auto-installer
again), a duplicate set of cron jobs will be added to the cron tab. So, if you run the
auto-installer more than once, you must edit the cron tab and remove the duplicate
entries.
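
A quick way to check for such duplicates is sketched below. The function name
duplicate_getsite_jobs is hypothetical; it reads cron tab lines on standard input and prints
each getsite.pl entry that appears more than once, ignoring comment lines.

```shell
# Sketch: report getsite.pl cron entries that appear more than once.
# Reads cron tab lines on standard input.
duplicate_getsite_jobs() {
    grep 'getsite\.pl' | grep -v '^[[:space:]]*#' | sort | uniq -d
}
```

For example, as the Memex administrator user, run crontab -l | duplicate_getsite_jobs. If
anything is printed, edit the cron tab with crontab -e and remove the repeated lines.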

Specifying the sites you want to monitor


You specify the Web sites you want to monitor by creating AutoWeb configuration records,
one record per Web site. Depending on the way AutoWeb was installed, you can use either
Memex Patriarch or Memex Analyst to create and edit records in the AutoWeb configuration
database.
Note Your system must be set up to use either Memex Patriarch or Memex Analyst for
editing configuration records. You cannot use both applications. However, if you use
Memex Patriarch, you will be able to view configuration records in Memex Analyst.

Specifying sites - Memex Patriarch


You set up AutoWeb to monitor a Web site by creating a new record in the Memex Patriarch
WebConfig database.
The following screenshot shows an example of creating a configuration record for the Memex
Web site using Memex Patriarch.

Enter values for the Name, URL and Database fields to specify what you want to index and
where you want to store the indexed Web page data.
Enter values for the other fields, as required. These fields are described in the table on page 35.
Click Append to save the new record.

Specifying sites - Memex Analyst


The following screenshot shows an example of creating an index record for the Memex Web
site in Memex Analyst.

Enter values for the Keywords, Site To Index and Database fields to specify what you want
to index and where you want to store the indexed Web page data.
Enter values for the other fields, as required. These fields are described in the table below.
Save the new record.

Fields on the configuration form


The following table explains the fields on the configuration forms used within Memex
Patriarch and Memex Analyst.
Note The default configuration forms have the heading Index Request. This is part of the
form design and can be changed, if required. The labelling of fields on the forms can
also be changed as part of the form design. The first two columns in the following
table show the labels as they appear in the default forms supplied for Memex
Patriarch (Field MP) and Memex Analyst (Field MA).

Field MP    Field MA          Details

URN                           This field is populated by Memex Patriarch when you save the
                              record. The field is not included on the default form for
                              Memex Analyst.

Name        Keywords          Enter the name of the Web site you want to index. The name
                              should be relevant to the site you want to index, as text
                              entered here can be used as keywords when searching for it
                              later.

URL         Site To Index     Enter the full URL of the Web site you want to index. If you
                              enter the URL of a Web site without specifying a particular
                              Web page (for example, http://www.yourcompany.com), AutoWeb
                              uses the home page of the site as the start page from which
                              to index. You can index an area within a Web site by
                              specifying a particular page on a site (for example,
                              http://www.yourcompany.com/personnel/vacancies.html).

Indexed     Index             This field allows indexing to be temporarily turned off by
                              setting the field value to NO. To resume indexing set the
                              value to YES. The default value is YES, so Web sites for
                              records with no value in this field (such as records from
                              upgraded versions of AutoWeb) are indexed.

Database    Database          This is the name of the database to which index records for
                              the Web site are saved. The value is the name of the database
                              as it appears on the file system, within the
                              autoweb/databases directory. The webarchive database is the
                              default available database created for saving new index
                              records to.

Priority    Priority          The value in this field allows indexing to be performed at
                              different frequencies. This is achieved by running the
                              getsite.pl script against a subset of records, based on the
                              value of this field (as shown in the cron tab listing on
                              page 33). The AutoWeb auto-installer creates three Priority
                              options to choose from. Choose your priority depending on how
                              often you want the site to be indexed and updated.
                              The frequency of updates is defined as follows:
                              • HIGH priority sites are indexed every hour
                              • MEDIUM priority sites are indexed every day
                              • LOW priority sites are indexed every week
                              Note: These frequencies are defined in the Memex
                              administrator user's cron tab. See Monitoring Web sites on
                              page 33 for more details.

Options     Crawler Options   Use this field to pass specific options to the HTTrack
                              Website Copier software. HTTrack is a third-party tool used
                              by AutoWeb to copy Web pages. By specifying options you can
                              overrule many aspects of AutoWeb's default behaviour.
                              For full details of the many options for HTTrack see the
                              online User's Guide at:
                              http://www.httrack.com/html/fcguide.html
                              The option that you are most likely to want to specify is the
                              link depth. AutoWeb's default link depth is 2. This means
                              that you will index all the pages that are linked to from the
                              specified start page (e.g. the home page of a Web site) plus
                              all the pages that are linked to from those primary-link
                              pages. On a large Web site, with pages that each contain many
                              links, a link depth of 2 could result in hundreds of pages
                              being indexed, and you may, therefore, want to reduce the
                              link depth. On a small Web site, however, you might want to
                              increase the link depth to 3 or 4.
                              The option for setting link depth is:
                              -%eN
                              Where N is an integer typically between 0 and 4.
                              Notes:
                              • You must be extremely careful when specifying options. If
                                you enter invalid options, or the wrong option for the
                                behaviour you intended, it can result in nothing being
                                indexed, unexpected indexing results, or everything on the
                                entire domain being indexed.
                              • If you do not set a value here, the link depth defaults
                                to 2.
                              • Setting a high link depth value for a large Web site can
                                quickly result in you using up a great deal of available
                                disk space.
                              • By default, AutoWeb does not index pages that are located
                                outside the domain on which the start page is located. This
                                helps to restrict indexing to a single Web site. You can
                                bypass this restriction by using the -e option. However,
                                you should use this option with extreme caution as it can
                                easily result in you indexing a vast number of pages from
                                the internet at large.
                              • Link depth, by default, only extends to pages on or below
                                the current directory level. For example, if you index
                                http://www.memex.co.uk/AboutMemex/index.php with a link
                                depth of 2, AutoWeb will index pages such as
                                http://www.memex.co.uk/AboutMemex/Awards/index.php, as this
                                page is located in a directory below the start page, but it
                                will not index http://www.memex.co.uk/index.php, which is
                                in a directory above the start page. You can use the -B
                                option to allow AutoWeb to index up the directory structure
                                as well as down it.
                              • HTTrack Website Copier is open source, third-party
                                software. Memex is not responsible for any of the content
                                on the www.httrack.com Web site.

Notes       Notes             You can enter any text about the site or this particular
                              record here for your own reference.

How Web site monitoring works


Web site monitoring is accomplished by running a Perl script called getsite.pl at regular
intervals. This script performs the following actions:
1. Decodes the configuration database.
2. Parses the output to determine which Web sites to mirror and index.
3. Calls HTTrack for each site that should be indexed.
   Note: If getsite.pl was run with a specific priority setting (e.g. HIGH), only a subset of
   the configuration records may produce calls to HTTrack.
The HTTrack Website Copier program then creates a mirror of the site in the mirror
directory of the server installation.
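
The selection in step 2 can be pictured with the following shell sketch. It is not the
getsite.pl code. The input format (one record per line, with tab-separated URL, index flag
and priority) is purely hypothetical and is used only for illustration, as is the function
name select_sites. The sketch also assumes that a record with no priority value is skipped
when a priority is requested; the behaviour of getsite.pl in that case is not documented
here.

```shell
# Sketch: choose which configuration records would be passed to HTTrack.
# Input (hypothetical format): URL <tab> index flag <tab> priority, one record per line.
# $1 = priority to run (HIGH, MEDIUM or LOW); leave empty to select every enabled site.
select_sites() {
    wanted=$1
    awk -F '\t' -v want="$wanted" '
        $2 == "NO" { next }                      # indexing switched off for this record
        want == "" || $3 == want { print $1 }    # an empty index flag counts as YES
    '
}
```

With records for three sites, only those whose priority is HIGH and whose index flag is not
NO would be printed by select_sites HIGH.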

Stopping getsite.pl
If you have started getsite.pl and want to stop it, you must do so manually by killing its
process and any httrack processes.
To kill any getsite.pl and httrack processes:
1. As root or the Memex administrator user, open a shell console.
2. Type the following command:
ps -eo pid,args|grep autoweb
This lists the currently running processes whose details mention autoweb.
For example:
1545 grep autoweb
3197 /opt/memex/autoweb/bin/httrack -V /opt/memex/autoweb/bin/addtomemex
5371 /usr/contrib/perl -I/opt/memex/autoweb/perlmodules /opt/memex/autow
5513 sh -c /opt/memex/autoweb/bin/httrack -V '/opt/memex/autoweb/bin/add
3. Use the kill command with the relevant process ID number to stop each of the listed
   processes, apart from the one mentioning grep, which simply reports the search you
   ran.
For example:
kill 3197
kill 5371
kill 5513
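
To avoid copying process IDs by hand, the listing can be filtered as in the following
sketch. The function name autoweb_pids is hypothetical. It reads the output of
ps -eo pid,args on standard input and prints the process ID of every line that mentions
autoweb, leaving out the grep and awk commands used for the search itself.

```shell
# Sketch: print the IDs of processes whose command line mentions autoweb.
# Reads "pid args" lines, as printed by: ps -eo pid,args
autoweb_pids() {
    awk '/autoweb/ && $2 != "grep" && $2 != "awk" { print $1 }'
}
```

For example, ps -eo pid,args | autoweb_pids prints one process ID per line. Check the list
against the full ps output before passing any of the IDs to kill, because any other
process that happens to mention autoweb (such as an editor with a file open in the AutoWeb
directory) is also listed.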

Extracting the Web page text


For each page that HTTrack downloads, it calls the addtomemex.pl script.
addtomemex.pl checks what type of file has been downloaded and whether the text can be
extracted from the file. It then uses the Lynx text-based Web page browser to output a
text-only version of the page, from which it extracts the text.
When the text has been extracted successfully, addtomemex.pl writes a record to the
specified Web archive database containing the following information:
• The keywords from the config record
• The original URL of the file
• The mirrored URL of the file
• The text from the page
• The date and time the page was mirrored


Appendix A
Known limitations
AutoWeb contains the following limitations:
• If a page contains any cross-domain frames, the index selection and index page buttons
  will not work. For more information, see the Microsoft web site:
  http://msdn.microsoft.com/library/default.asp?url=/workshop/author/om/xframe_scripting_security.asp
• AutoWeb will not index URLs that are redirected. For example, if you are in the UK
  and you browse to www.memex.com you are redirected to www.memex.co.uk. As a
  result you cannot use AutoWeb to index http://www.memex.com. The workaround
  is to index a specific page below the redirected domain, for example,
  http://www.memex.com/AboutMemex/
• If a user attempts to index a page that has cross-domain frames, the following error
  message is displayed:
  Browser security restrictions prevent you from indexing this page
• When AutoWeb mirrors a Web page it does not automatically mirror documents linked
  to from that page. The depth of mirroring depends on the options specified in the
  configuration record. As a consequence, style sheets used by the page, or images that
  appear on the page, may not be mirrored.
• Memex strongly advises that you change the Internet security zone of the mirror to
  disable scripting. As files copied to the local mirror are on your local Intranet, they may
  have more security rights than is healthy. Select Tools > Internet Options > Security >
  Restricted Sites, click Sites, and add your mirror domain to the list.
• When indexing Web sites using getsite.pl, images are not timestamped. This
  means that if a Web page contains an image that changes (but keeps the same name),
  the old copy of the image will be overwritten. As a result, the earlier version of the page
  will reference the newer version of the image.


Appendix B
Troubleshooting
IfaWebsiteisnotindexedorisnotindexedinthewayyouexpected:
ChecktheknownlimitationslistedinAppendixA.
MakesureyouareawareofthedefaultindexingbehaviourofAutoWebandthe
variousHTTrackoptions.
Seepage41foralistofthedefaultoptionsandtheonlineUserGuideforHTTrack
WebsiteCopierathttp://www.httrack.com/html/fcguide.htmlforacompletelistof
availableoptions.
Checkthemessagesinthelogfile.Thepathandnameofthisfilearegivenasthevalue
ofthel ogf i l eparameterinthespider.cfgconfigurationfile
(/opt/memex/autoweb/spider.cfg).
Forexample:/opt/memex/logs/crawlerlog.txt
If you get the message "Already Running Resource temporarily unavailable" when you
run the getsite.pl script, it indicates that a previous run of the script has not finished
indexing pages. This may be because the configuration records are causing it to index
more pages than you had expected, or require. If this happens you should either wait
for the script to complete, or kill the process (as described on page 37), and then check
the configuration records before running getsite.pl again.
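"Resource temporarily unavailable" is the standard operating system text for a non-blocking operation that could not proceed, which is what a second process sees when it asks for a lock that another process already holds. The sketch below shows that general pattern in Python on a Unix system. It is an assumption that getsite.pl guards itself in a similar way, and the lock file name is hypothetical.

```python
import fcntl
import os
import tempfile


def try_lock(path):
    """Open `path` and take an exclusive, non-blocking lock on it.

    Returns the open file object on success (keep it open to hold the
    lock) or None if another holder already has the lock.
    """
    handle = open(path, "a")
    try:
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        # errno EAGAIN/EWOULDBLOCK: "Resource temporarily unavailable"
        handle.close()
        return None
    return handle


lock_path = os.path.join(tempfile.mkdtemp(), "getsite.lock")  # hypothetical name
first = try_lock(lock_path)    # first run acquires the lock
second = try_lock(lock_path)   # overlapping run is refused
first.close()                  # closing the file releases the lock
third = try_lock(lock_path)    # a later run succeeds again
```

The point of the pattern is that a refused lock means an earlier run is still active, so the correct response is to wait for it or stop it, not to start another run.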
If the getsite.pl script runs more frequently than expected, check the entries in the
crontab for the Memex administrator. The auto-installer for AutoWeb adds cron jobs for
getsite.pl to the crontab of the Memex administrator user. If the auto-installer was run
more than once, the crontab will contain duplicate cron jobs, which must be removed
by editing the crontab.
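Finding the duplicates by eye is easy to get wrong in a long crontab. The following is a minimal sketch of one way to filter them, assuming the crontab text has been saved to a string (for example from the output of `crontab -l`). The schedule and script path in the sample are hypothetical; the entries written by the auto-installer may look different.

```python
def remove_duplicate_jobs(crontab_text):
    """Return crontab text with repeated job lines removed.

    The first occurrence of each job is kept; blank lines and comments
    are passed through unchanged.
    """
    seen = set()
    kept = []
    for line in crontab_text.splitlines():
        key = " ".join(line.split())  # normalise whitespace for comparison
        if key and not key.startswith("#"):
            if key in seen:
                continue  # same job already kept, drop this copy
            seen.add(key)
        kept.append(line)
    return "\n".join(kept) + "\n"


# Hypothetical schedule and path, for illustration only.
before = (
    "0 * * * * /opt/memex/autoweb/bin/getsite.pl\n"
    "# nightly tidy-up\n"
    "0 * * * *   /opt/memex/autoweb/bin/getsite.pl\n"
)
after = remove_duplicate_jobs(before)
```

Review the filtered text before installing it; editing the crontab by hand with `crontab -e` remains the procedure this guide describes.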


Appendix C
HTTrack options
HTTrack Website Copier is open-source software that is used to mirror Web pages. Memex
has altered the software slightly for use with AutoWeb.
Note For more information about HTTrack, visit: http://www.httrack.com/ and
http://www.httrack.com/html/fcguide.html.

This table lists the options that AutoWeb uses by default.
Option                                Description
-n                                    Get non-HTML files near an HTML file
-%e2                                  Sets the external link depth to 2
-A32000                               Sets the maximum transfer rate to 32000 bytes/second
-I0                                   Don't make an index page
-Qq                                   No log and no questions
--assume cfm=text/html,php=text/html  Assume that a type (cfm, php) is always linked with a MIME type
-X0                                   Do not purge old files after update
-%F ""                                Do not put a footer into the HTML pages
-%P0                                  Do not do extended parsing
-C0                                   Do not use a cache
-%Q                                   Do not follow any hyperlinks from the page. This option has been added to HTTrack by Memex
-d                                    Stay on the same principal domain
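To see how these defaults combine on a command line, the sketch below assembles an argument list in the usual `httrack URL -O directory [options]` form. It is an illustration only: this guide does not show how AutoWeb actually invokes HTTrack, the URL and output directory are hypothetical, and the -%Q option exists only in the Memex-modified build.

```python
import shlex

# Default options as listed in the table above.
DEFAULT_OPTIONS = [
    "-n", "-%e2", "-A32000", "-I0", "-Qq",
    "--assume", "cfm=text/html,php=text/html",
    "-X0", "-%F", "", "-%P0", "-C0", "-%Q", "-d",
]


def build_httrack_command(url, mirror_dir, extra_options=()):
    """Return an argument list: httrack URL -O DIR, defaults, then extras."""
    return ["httrack", url, "-O", mirror_dir, *DEFAULT_OPTIONS, *extra_options]


command = build_httrack_command(
    "http://www.example.com/",   # hypothetical site
    "/tmp/mirror",               # hypothetical output directory
    extra_options=["-s2"],
)
print(shlex.join(command))       # shell-quoted form, for inspection only
```

Keeping the options as a list, not a single string, avoids quoting problems with the empty footer argument to -%F.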

This table lists other options that you can use, if necessary. To use either option, add it to the
opts parameter in the spider.cfg file. If no option is set, the default behaviour is to follow the
rules in robots.txt. See Setting up the AutoWeb configuration file on page 15.
Option  Description
-s0     When retrieving Web pages, do not follow the rules specified in robots.txt
        on the remote web server.
-s2     Follow all of the robots.txt rules with the exception of Disallow: / as this
        will prevent the software from retrieving any pages from a Web site.
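The effect described for -s2 can be illustrated with Python's standard robots.txt parser. This sketch only models the rule as worded in the table above (honour every rule except a bare `Disallow: /`); it is not HTTrack's own code, and the site and paths are hypothetical.

```python
import re
from urllib.robotparser import RobotFileParser

# Matches a line that disallows the whole site and nothing more specific.
BLANKET_DISALLOW = re.compile(r"^\s*disallow\s*:\s*/\s*$", re.IGNORECASE)


def parser_ignoring_blanket_disallow(robots_txt):
    """Parse robots.txt text, skipping any bare `Disallow: /` line."""
    lines = [
        line for line in robots_txt.splitlines()
        if not BLANKET_DISALLOW.match(line)
    ]
    parser = RobotFileParser()
    parser.parse(lines)
    return parser


robots = "User-agent: *\nDisallow: /\nDisallow: /private/\n"
rules = parser_ignoring_blanket_disallow(robots)
home_allowed = rules.can_fetch("*", "http://www.example.com/index.html")
private_allowed = rules.can_fetch("*", "http://www.example.com/private/report.html")
```

With the blanket rule skipped, the home page becomes retrievable while the more specific /private/ rule is still respected.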



Appendix D
Upgrading to AutoWeb 1.3
If you are currently using AutoWeb 1.0 or 1.1 you must upgrade to version 1.3 before you can
upgrade to version 2.0. Once you have a 1.3 system you can upgrade to 2.0 by following the
instructions on page 18.
Upgrading from AutoWeb 1.0 or 1.1 to AutoWeb 1.3 is a two-stage process. First, you must
back up your previous AutoWeb setup; then you need to install AutoWeb 1.3.
Important You will need the installation package for version 1.3 of AutoWeb to complete
this procedure.


Backing up your previous AutoWeb setup
Before beginning the upgrade, you should back up your existing AutoWeb configuration and
databases. If the upgrade process encounters any problems, you can then revert to your
known, valid setup.
After the backup is complete, shut down the existing MIE and move the AutoWeb directories
aside. For example, if you installed your previous version of AutoWeb in
/opt/memex/autoweb you should move this whole directory to /opt/memex/autowebold.
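The move itself is a single rename, sketched below in Python. The sketch covers only that step: it assumes the backup has already been taken and the MIE has been shut down, and it runs against a scratch directory so that nothing under /opt/memex is touched. The function name and the overwrite check are illustrative additions.

```python
import os
import shutil
import tempfile


def move_aside(install_dir, suffix="old"):
    """Rename install_dir to install_dir + suffix and return the new path.

    Refuses to overwrite an existing directory of that name.
    """
    install_dir = install_dir.rstrip(os.sep)
    target = install_dir + suffix
    if os.path.exists(target):
        raise FileExistsError(f"{target} already exists")
    shutil.move(install_dir, target)
    return target


# Demonstration in a scratch directory, not a real installation.
root = tempfile.mkdtemp()
demo = os.path.join(root, "autoweb")
os.makedirs(demo)
with open(os.path.join(demo, "spider.cfg"), "w") as handle:
    handle.write("# placeholder\n")
moved_to = move_aside(demo)
```

Refusing to overwrite an existing target protects an earlier set-aside copy if the procedure is repeated.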

Installing AutoWeb 1.3
After backing up your existing AutoWeb setup, you must perform a new, clean installation of
AutoWeb 1.3.
Note You must install AutoWeb into the same directory as your previous version. For
example: /opt/memex/autoweb.

If you are using this product with Memex Patriarch, it is essential that you read Chapter 3,
Using AutoWeb with Memex Patriarch, on page 24. You must perform all the steps detailed there
before you proceed with the conversion.

Converting your AutoWeb data
After installing AutoWeb 1.3, you must run a conversion script to convert the data from your
previous setup and create any new databases that may be required.

Setting up the conversion script
The conversion script reads a configuration file, convert.conf, which is stored in the bin
directory of the new AutoWeb installation. This file specifies the details of the AutoWeb
databases that will be converted.
Before running the conversion script, you must set the following options to reflect your
AutoWeb setup:
Option        Details
MIEDecodeDir  The path to the MIE installation used by the previous version of
              AutoWeb
MIEDir        The path to the new MIE installation
MIEPort       The network port that the new MIE is listening on
OldAutoWeb    The path to the previous AutoWeb setup
NewAutoWeb    The path to the new AutoWeb installation
IMBase        The installation directory for Memex Patriarch (if installed)
TempDir       A directory to use for storing temporary files
Verbosity     How detailed the AutoWeb output will be:
              0 basic output
              1 tracks processed databases
              2 detailed output
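A quick check of the values before running the conversion can catch a missing or mistyped option. The sketch below validates a dictionary of option values against the table above. It does not read convert.conf, whose syntax is not shown in this guide, and the paths and port in the sample are hypothetical.

```python
# Options from the table above; IMBase is optional (only if Patriarch is installed).
REQUIRED = ["MIEDecodeDir", "MIEDir", "MIEPort", "OldAutoWeb",
            "NewAutoWeb", "TempDir", "Verbosity"]


def check_conversion_options(options):
    """Return a list of problems found in a dict of convert.conf options."""
    problems = [f"{name} is not set" for name in REQUIRED
                if not str(options.get(name, "")).strip()]
    port = str(options.get("MIEPort", ""))
    if port and not (port.isdigit() and 0 < int(port) < 65536):
        problems.append("MIEPort must be a port number between 1 and 65535")
    level = str(options.get("Verbosity", ""))
    if level and level not in {"0", "1", "2"}:
        problems.append("Verbosity must be 0, 1 or 2")
    return problems


sample = {
    "MIEDecodeDir": "/opt/memex/mie-old",   # hypothetical paths and port
    "MIEDir": "/opt/memex/mie",
    "MIEPort": "9000",
    "OldAutoWeb": "/opt/memex/autowebold",
    "NewAutoWeb": "/opt/memex/autoweb",
    "TempDir": "/tmp",
    "Verbosity": "3",
}
problems = check_conversion_options(sample)
```

Here the only problem reported is the Verbosity value, which must be 0, 1 or 2.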

Running the conversion script


After specifying the conversion options, you can run the conversion script.
To run the conversion script:
1. Move to the bin directory of your new AutoWeb installation.
2. Run the following command:
perl aw-convert.pl

The script converts all the data from your previous AutoWeb setup and creates any new
databases that are needed to match your previous setup.

Note After running the conversion script you still need to open the spider.cfg file in the
new AutoWeb installation directory and make sure the options are configured
correctly.

