
MatchMaker

Features, Specifications, and Requirements


MatchMaker is a high performance fault-tolerant data matching and search
platform. Its unique focus on structured data allows very advanced data handling
capabilities at very high speeds ideal for directories, e-commerce applications,
enterprise security and data quality applications, and much more. MatchMaker is
also a full-featured server application with advanced configuration GUIs, data
connectivity support, connectivity for distributed architectures, benchmarking and
monitoring tools, and advanced APIs.
MatchMaker offers a range of options and extensions. There are also
prepackaged applications developed around its powerful data handling
capabilities. This document details all its features, options,
extensions, and system requirements.

System Requirements
Hardware requirements

MatchMaker is a memory-intensive application and should run on a
DDR RAM system with a high FSB clock rate. CPU power is less
important, but should be around 1 GHz on such systems in order
to make the best use of the DDR RAM.

Operating system requirements

Binary support is currently provided for:
Windows NT family: Windows 2000, Windows XP, Windows Server 2003
Linux: gcc 3.2 on Intel
Solaris: Solaris 8 and 9 on SPARC
Customer-specific executables and libraries are available on
demand for all major systems or platforms.

Data compatibility

MatchMaker can be paired with any database or structured data
format. Its proprietary in-memory index and incremental update
capabilities provide complete flexibility.
Indexing features also allow full-text data and unstructured
documents to be indexed for lookups from MatchMaker.

Exorbyte Inc.
7400 SW Barnes rd, Ste 743
Portland OR 97225 - USA

www.exorbyte.com
T +1 503 616 4007
F +1 503 914 5937

9/12/2008

Unicode and languages

MatchMaker is Unicode compliant and language independent.
The data matching techniques available in MatchMaker rely
primarily on mathematical algorithms that work in all languages.

Transliteration

MatchMaker offers standard support for transliteration functions
from Hangul and from Japanese Kanji, Katakana and Hiragana to Roman
character sets. This allows building indexes that can be accessed
alternatively through Katakana, Hiragana, or Romaji (Roman) input,
even when the input contains errors. Additional languages can easily
be added on request, so MatchMaker can be used with Asian-language
interfaces with no changes to existing systems.

Architecture
Scalability of system

The recall engine can be clustered to increase throughput by using
multiple processors on a single machine or multiple machines. The
system scales well for latency in large datasets, with an
approximately linear increase with the data size. Support is included
for splitting very large datasets to reduce memory requirements and
latency.

Availability

The system remains available during data updates. A drop in
available bandwidth during the update, equal to one over the
number of recall engine services deployed, is the only measurable
consequence of an index update.
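
The arithmetic behind this figure can be sketched as follows. This is a minimal illustration, assuming that every recall engine service contributes an equal share of capacity; the function name and the numbers are hypothetical and not taken from MatchMaker.

```python
def capacity_during_update(total_qps: float, n_services: int) -> float:
    """Remaining throughput while an index update is in progress."""
    if n_services < 1:
        raise ValueError("at least one recall engine service is required")
    # The drop is one over the number of deployed services (assumed equal shares).
    drop = total_qps / n_services
    return total_qps - drop


# Hypothetical deployment: 4 services handling 1000 queries per second in total.
print(capacity_during_update(1000.0, 4))  # 750.0
```

Under this assumption, deploying more recall engine services makes the temporary drop proportionally smaller.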

Error-recovery

There is support for recovery from network errors and any forced
program termination within the run-time system components.

File history

New dataset builds can be reverted to one backup of the
compiled files and runtime configuration.
The files have a time-stamp ID, and only the most recent files and
runtime configuration are kept available.

Context storage

For constructing powerful interactive multi-field servers (patents
pending).

Dictionary size

Any size of database can be searched. In MatchMaker there is
virtually no limit to the size of the dictionary (your data indexed
by MatchMaker).

Data updates

For data without time-sensitive changes, the complete data can be
fetched and recompiled. Incremental update is supported through
a difference file mechanism and up to two levels of delta servers
that index the changes while the main index recompiles.
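
The idea of layering delta indexes over a main index can be sketched as below. This is only an illustration under stated assumptions: the source does not describe the internal design of the delta servers, so the class name `LayeredIndex`, the use of `None` as a deletion marker, and the dictionary-based storage are all hypothetical.

```python
class LayeredIndex:
    """Main index plus up to two delta layers holding recent changes."""

    def __init__(self, main: dict, max_deltas: int = 2):
        self.main = main
        self.deltas = []          # oldest delta first
        self.max_deltas = max_deltas

    def add_delta(self, changes: dict) -> None:
        """Register a difference file; a value of None marks a deleted record."""
        if len(self.deltas) >= self.max_deltas:
            raise RuntimeError("recompile the main index before adding more deltas")
        self.deltas.append(changes)

    def lookup(self, key):
        """Newest delta wins; fall back to the main index."""
        for delta in reversed(self.deltas):
            if key in delta:
                return delta[key]
        return self.main.get(key)

    def recompile(self) -> None:
        """Fold all deltas into a fresh main index and drop the layers."""
        merged = dict(self.main)
        for delta in self.deltas:
            merged.update(delta)
        self.main = {k: v for k, v in merged.items() if v is not None}
        self.deltas = []


index = LayeredIndex({"1": "Smith"})
index.add_delta({"1": "Smyth", "2": "Jones"})
print(index.lookup("1"))  # Smyth
```

Lookups stay correct while changes accumulate, and the main index only needs to be rebuilt occasionally.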

ODBC and CSV data import

MatchMaker has the ability to import data via ODBC and CSV
sources. Data can thus be imported directly into MatchMaker
either from running databases or CSV files, without intermediate
extraction steps.

File mapping

File mapping on Windows and Unix systems allows faster loading,
sharing of memory, and use by multiple processes.

Thread safety

Thread safety is offered after initialization (optional).

Release Policy

Releases are numbered as major.minor.build. Minor numbers
indicate feature changes, and major numbers indicate the
introduction of major new features. A change of build number
indicates bug fixes only. The minor release interval is between 5
and 8 months.
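
As a small illustration of this numbering scheme, the following sketch classifies the difference between two release numbers. The function name and the version strings are hypothetical examples, not actual MatchMaker releases.

```python
def classify_release(old: str, new: str) -> str:
    """Describe what a change from release `old` to `new` signals."""
    old_parts = tuple(int(p) for p in old.split("."))
    new_parts = tuple(int(p) for p in new.split("."))
    if len(old_parts) != 3 or len(new_parts) != 3:
        raise ValueError("expected major.minor.build")
    if new_parts[0] != old_parts[0]:
        return "major new features"
    if new_parts[1] != old_parts[1]:
        return "feature changes"
    if new_parts[2] != old_parts[2]:
        return "bug fixes only"
    return "same release"


print(classify_release("3.1.7", "3.1.9"))  # bug fixes only
print(classify_release("3.1.9", "3.2.0"))  # feature changes
```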

MatchMaker Components
exMinistrator

MatchMaker's configuration manager includes wizards, graphical
support, a logical arrangement of parameters in individual
configuration panels, and links to online help that describes all
parameters.
It allows you to:
Create instances of exTractor (see below) components and
schedule the build process:
o exTractor compiles data
o exTractor signals exTributor
o exTributor signals exSight
o exSight fetches data
Install and configure services for exStream, exTributor and
exSight
Stop and start all active services, to allow installation of
software updates

exTractor

exTractor is used to configure the build and recall settings for a
given data source. It produces a compiled database storage that
combines all functionality into one consistent data structure. It
handles more than 30 fields, each of which can use a different
approximate matching technique.
There are four main branches in the tree view:
Project (etp): contains the links to the other files
Extract (mbc): contains the data source and optional data
extraction settings
Build (mbc): contains the settings needed to compile the data
Session (ses): contains only runtime settings

exTributor

exTributor is a query receptor and dispatcher required for each
MatchMaker index. It can communicate with a web server
through the MatchMaker API.

exSight

The SESSION exSight engine is the query processor that handles
interactive multi-field search sessions.
The server stores the state of each open search session, up to a
licensed limit.
The session has the following parts:
o Input fields with status, modes, etc.
o Field candidates
o Record suggestions with selectable columns
o Context selection
All field results have a status with respect to the context (in,
out, unknown).

exSampler

exTractor links to exSampler for quickly testing the settings.
Standalone process
Run locally for testing configuration

exSpector

Interactive test client per data source (browse the Build or Runtime
system, simulate API calls).

exSpeed and Monitor

Additional benchmarking tool and server performance monitor.
Dynamic analysis and visualization of queries in real time. An
engineer can thus examine search latency. Storage requirements
for the index file and queries can also be analyzed graphically.
This enables developers to easily find bugs such as memory leaks,
and much more.

Optional MatchMaker Components


exStream

Scalability-enabling system that broadcasts between different
portions of data, served by several instances of exTributor, and a
remote client through the MatchMaker API.

MatchMaker Extensions
MatchMaker Search

A specific implementation and configuration of MatchMaker
offering:
Web-based user GUIs allowing searching of data, display of
results, and search engine-style implementations with the
world-class approximate result quality and relevance of
MatchMaker.

SearchNavigator

A specific implementation and configuration of MatchMaker
offering:
Web-based user GUI allowing interactive Web 2.0-style
suggestions. The user gets a number of suggestions in an
interactive JavaScript layer while typing each character of the
query, straight from the full index. This can be used as a search
interface enhancement or as a tool for decision support.

FlexForm

A specific implementation and configuration of MatchMaker
offering:
Web-based user GUI allowing interactive Web 2.0-style
advanced query configuration. The user gets a number of
suggestions in an interactive JavaScript layer while typing each
character of the query. Once the user chooses a suggestion, the
query term is stored under a given query field and suggestions for
the next logical field are offered, again working straight from the
full index. This can be used as a search interface
enhancement, a tool for decision support, a product
configurator, or a simplified combination of advanced and
simple query interfaces.

MatchMaker OCR

A specific implementation and configuration of MatchMaker
offering:
automated matching of optical character recognition (OCR)
processed content against a database of normalized content
(addresses, words, forms, administrative processing, etc.),
queuing of queries falling under a relevance metrics threshold,
Web-based GUIs allowing the manual processing of queued
queries by human operators with interactive multi-stage
decision support through MatchMaker Search and
SearchNavigator.

MatchMaker-based Applications
MatchMaker Data Quality Server (Q1 2009)

A specific implementation and configuration of MatchMaker
offering:
automated matching of queries against a database of
structured content,
queuing of queries falling under a relevance metrics threshold,
sophisticated Web-based GUIs allowing the manual
processing of queued queries by human operators with
interactive multi-stage decision support through MatchMaker
Search and SearchNavigator,
a template language allowing the creation of complex GUIs
supporting all the search methods available in this
configuration.

MatchMaker Directory Platform (Q2 2009)

A specific implementation and configuration of MatchMaker for
the online directory industry (Yellow Pages, local search, vertical
directories, etc.), usable as a white label directory platform
offering:
a preconfigured MatchMaker implementation specially geared
to searching name and address data,
a SearchNavigator-enhanced search interface for optimal
usability,
a fully functional white label directory web site including
search results pages featuring maps, targeted ads, premium
listings, and categorical browsing,
a unique set of partner applications for content management,
data sales management, ad serving, etc.

Recall Engine Features


String Recall

MatchMaker contains several string matching algorithms, such as
the Levenshtein edit-distance algorithm for unlimited fault
tolerance, the finite automaton structure (even faster recall, limited
to 3 edit operations), and the longest common subsequence (LCS)
algorithm. The Levenshtein edit distance is scaled by weighted
query and entry length and is calculated with one of three
structures: edit, for unlimited fault tolerance; auto, for faster
recall limited to 3 edit operations; or words, for word-order
independent recall.
Special algorithms such as the SUBSET and SUPERSET algorithms
match word flipping such as breakfast -> fast-break.
Four modes are available: exact, for exact recall; approx, for
whole-word approximate access; complete, for approximate prefix
matching; and detect, for in-word matching. All string recall is
performed in 8-bit character sets. The edit method also supports
wildcard search with the glob characters '*' and '?' in exact mode.
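
For readers unfamiliar with these techniques, here is a minimal sketch of a plain Levenshtein edit distance, a length-scaled similarity, and glob matching. It is not MatchMaker's implementation: the source does not give the scaling formula, so dividing by the longer of the two strings is an assumption, and Python's standard `fnmatch` module stands in for the wildcard matching.

```python
from fnmatch import fnmatchcase


def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions from a to b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def scaled_similarity(query: str, entry: str) -> float:
    """Edit distance scaled to [0, 1] by the longer length (assumed scaling)."""
    longest = max(len(query), len(entry))
    return 1.0 if longest == 0 else 1.0 - levenshtein(query, entry) / longest


print(levenshtein("kitten", "sitting"))  # 3
print(fnmatchcase("portland", "port*"))  # True
print(fnmatchcase("portland", "p?rtland"))  # True
```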

Error correction

The following search term or error correction operations are
supported:
Insertion of single characters.
Deletion of single characters.
Substitution of single characters.
Transposition of adjacent characters.
Global transposition of characters and also parts of words.
Global character mapping, umlaut expansion, de-accentuation,
etc.
Local character alternatives ([g9]).
Wildcards and glob-style matching ('?', '*').
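
The first four operations in this list correspond to what is commonly called the optimal string alignment distance. The sketch below shows that textbook technique for illustration; it is an assumption that it resembles what MatchMaker does internally, and the function name is hypothetical.

```python
def osa_distance(a: str, b: str) -> int:
    """Edit distance counting insertion, deletion, substitution and
    transposition of adjacent characters as one operation each."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[-1][-1]


print(osa_distance("recieve", "receive"))  # 1
```

A plain Levenshtein distance would count the swapped letters in "recieve" as two operations; treating the swap as one operation ranks such typos closer to the intended word.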


Multi-field Access

Weighted multi-field result intersection with re-evaluation and
support for local aliases. In addition to field weighting, fields
can be marked as mandatory or optional. Three index-to-base
reference modes are available: normal; interval, for indexes with
many repeating values; and group, for multi-word keyword-type
indexes. Multiple fields are combined using conjunction, with the
option of disjunction one level below. One field may be used as a
bias field, meaning scores from this field are used to push results
up rather than contributing to the conjunction.
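
One way to picture weighted fields, mandatory fields and a bias field is the following sketch. The scoring formula, the threshold, the field names and the `FieldSpec` class are all assumptions made for illustration; the source does not specify how MatchMaker combines field scores.

```python
from dataclasses import dataclass


@dataclass
class FieldSpec:
    weight: float
    mandatory: bool = False
    bias: bool = False  # bias fields push results up instead of being combined


def record_score(field_scores: dict, specs: dict, threshold: float = 0.5) -> float:
    """Combine per-field similarities (each in [0, 1]) into one record score."""
    total, weight_sum, boost = 0.0, 0.0, 0.0
    for name, spec in specs.items():
        score = field_scores.get(name, 0.0)
        if spec.bias:
            boost += spec.weight * score
            continue
        if spec.mandatory and score < threshold:
            return 0.0  # a mandatory field failed to match
        total += spec.weight * score
        weight_sum += spec.weight
    base = total / weight_sum if weight_sum else 0.0
    return base + boost


specs = {
    "name": FieldSpec(weight=2.0, mandatory=True),
    "city": FieldSpec(weight=1.0),
    "popularity": FieldSpec(weight=0.1, bias=True),
}
print(record_score({"name": 0.9, "city": 0.6, "popularity": 1.0}, specs))
```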

Cross Field Search

MatchMaker offers a fast Cross Field Search Module which parses
the query string and assigns its parts to the correct fields. The
result set contains all similar results from each field. This
splitting approach makes it easy to search with a single query
string. The queries are interpreted as follows: "Is there any field
whose contents are approximately equal to the Nth part of the
query?"

Combined In-Field Methods

MatchMaker enables the processing of fields with mixed contents.
One field may, for instance, sometimes contain a date value and,
in other cases, normal strings. MatchMaker can be configured to
automatically call the appropriate method based on the content
type. This means MatchMaker activates the appropriate
comparison method depending on the data, not on the query (i.e.
date comparison is applied to date entries, while string
comparison is applied to all other entries).

Words2 Methods

Several multi-word lookup methods (with word swaps, free factor
search and a special compression scheme) handle multi-word
search terms that contain interchangeable words. There is
automatic relevance ranking, single word aliasing, single word
biasing and other approaches. For fast and fault-tolerant
extraction of such keywords from long text, MatchMaker also
supports a quick scanning function with approximate lookup. This
allows labeling text entries containing relevant keywords, which
can then be used to support SearchNavigator.


Approximate grep

Function for approximate scanning of files.

Phonetic matching

Goes beyond SOUNDEX (SOUNDEX, METAPHONE and Exorbyte's
own phonetics are all available) by using edit-distance features in
addition.
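
For reference, the classic American Soundex code mentioned here can be sketched as follows. This is the standard public algorithm, not Exorbyte's own phonetics, and the function name is illustrative.

```python
SOUNDEX_CODES = {
    **dict.fromkeys("bfpv", "1"),
    **dict.fromkeys("cgjkqsxz", "2"),
    **dict.fromkeys("dt", "3"),
    "l": "4",
    **dict.fromkeys("mn", "5"),
    "r": "6",
}


def soundex(name: str) -> str:
    """American Soundex: first letter plus three digits, padded with zeros."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = SOUNDEX_CODES.get(ch, "")
        if code and code != prev:
            result += code
        if ch not in "hw":  # h and w do not separate equal codes
            prev = code
    return (result + "000")[:4]


print(soundex("Robert"), soundex("Rupert"))  # R163 R163
```

Names that share a code can then be compared further with an edit distance, which is the kind of combination the text describes.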

Flag Attributes or Option Attributes

Highly structured data often contain yes/no attributes (flag
attributes) and numerical attributes with limited value sets (option
attributes). Such data attributes can now be efficiently
compressed by MatchMaker, so that hundreds of these attributes
can be combined and queried approximately. This enables a single
query to return all entries whose attribute sets most closely match
those in the query, without having to test each attribute set
separately.
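
A common way to compress yes/no attributes and compare them approximately is to pack them into a bitmask and count differing bits. The sketch below illustrates that general idea only; the source does not say how MatchMaker compresses attributes, and all names are hypothetical.

```python
def pack_flags(flags: list) -> int:
    """Pack a list of booleans into one integer, one bit per attribute."""
    bits = 0
    for position, flag in enumerate(flags):
        if flag:
            bits |= 1 << position
    return bits


def flag_distance(a: int, b: int) -> int:
    """Number of attributes on which two packed flag sets differ."""
    return bin(a ^ b).count("1")


def closest(query: int, entries: dict) -> list:
    """Rank entries by how closely their flag set matches the query."""
    scored = ((name, flag_distance(query, bits)) for name, bits in entries.items())
    return sorted(scored, key=lambda pair: pair[1])


entries = {"a": 0b111, "b": 0b101, "c": 0b000}
print(closest(pack_flags([True, False, True]), entries))
# [('b', 0), ('a', 1), ('c', 2)]
```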

Automatic re-evaluation

For all query modes, for building ordinary SQL data servers.

Geographic recall (GPS)

A geographic recall method is available for two-dimensional
approximate geometric recall. The radius and the sharpness of such
a lookup can be configured. Results can be scored
and ranked as for any other string distance recall.
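
To illustrate what a radius and a sharpness parameter could mean, here is a sketch that scores a candidate by its great-circle distance from the query point. The haversine formula is standard; the scoring function, its parameters and the coordinates are assumptions for illustration and are not taken from MatchMaker.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def geo_score(distance_km: float, radius_km: float, sharpness: float = 2.0) -> float:
    """1.0 at the query point, 0.5 at the radius; higher sharpness falls off faster."""
    return 1.0 / (1.0 + (distance_km / radius_km) ** sharpness)


# Approximate coordinates for Portland, OR and Seattle, WA.
d = haversine_km(45.52, -122.68, 47.61, -122.33)
print(round(geo_score(d, radius_km=300.0), 2))
```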

Date Comparison Module

Data fields with calendar date information can be compared using
many different standard formats. Comparison is approximate with
respect to spelling AND time distance, where the sharpness of
time retrieval is configurable.
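
The time-distance part of such a comparison can be sketched as follows. This covers only time distance, not approximate spelling; the list of accepted formats, the scoring formula and the `sharpness_days` parameter are assumptions for illustration.

```python
from datetime import date, datetime

# A few common formats, tried in order (assumed list).
FORMATS = ("%Y-%m-%d", "%d.%m.%Y", "%m/%d/%Y")


def parse_date(text: str) -> date:
    """Parse a date written in any of the accepted formats."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {text!r}")


def date_score(query: str, entry: str, sharpness_days: float = 30.0) -> float:
    """1.0 for the same day, 0.5 when the gap equals sharpness_days."""
    gap = abs((parse_date(query) - parse_date(entry)).days)
    return 1.0 / (1.0 + gap / sharpness_days)


print(date_score("2008-09-12", "12.09.2008"))  # 1.0
print(date_score("2008-09-12", "2008-10-12"))  # 0.5
```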

Number Range

Data fields with simple numbers can be used with range
comparison functions that allow for approximate queries such as
"value is approximately greater than", where the accuracy can be
configured.
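
An "approximately greater than" comparison can be pictured as a threshold with a soft edge. The linear fall-off and the `tolerance` parameter below are assumptions chosen for illustration, not the method MatchMaker uses.

```python
def approx_greater_than(value: float, threshold: float, tolerance: float) -> float:
    """Score 1.0 at or above the threshold, falling linearly to 0.0
    once the value is `tolerance` or more below it."""
    if value >= threshold:
        return 1.0
    if tolerance <= 0:
        return 0.0
    return max(0.0, 1.0 - (threshold - value) / tolerance)


print(approx_greater_than(120, 100, 10))  # 1.0
print(approx_greater_than(95, 100, 10))   # 0.5
print(approx_greater_than(80, 100, 10))   # 0.0
```

A smaller tolerance makes the comparison behave more like a strict range filter.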

Other recall methods

Counter, for trivial unique-key handling; bias, for including
popularity or other additional weighting information in record
scoring; premise, a method for evaluating premise ranges; and a
configurable Tcl method for custom recall.


Fast combination of results

From different modules, using weights and relevance from each
single query result.

Very fast word-by-word comparison

Routines using techniques similar to the indexing version for
almost all modes, including localization of the actual match.

Plausibility evaluation

For a priori plausibility and a posteriori evaluation of
corrections made.

Aliases

MatchMaker supports alias (synonyms, acronyms, abbreviations,
fixed skip words, white lists, black lists, etc.) handling in different
ways: local aliases that apply only to a single record, or global
aliases that are valid for the whole data set. Systematic aliases are
also supported through alternative fields, for instance a field in a
different language; that relationship is treated as a logical OR.

Search Profiles

MatchMaker allows custom search profiles. Different clients,
depending on their roles, may use different interfaces to connect
to the same database (weighting differences, connection logic,
rescaling, thresholds, extra fields, etc.). All of these settings can
now be configured separately for each client type and saved
under a given profile name.

Access Rights Management / View Module

MatchMaker offers a very fast filtering method that provides
filtering of "allowed" candidates, whether they are individual
entries, whole branches of a category tree, or other subsets. The
method allows for easy implementation of view or role
concepts that are already taken into account during the search on
the server.

Character Encodings

Queries and data can use any standard encoding scheme (e.g. ISO
8859 or Unicode UTF-8). Internally, Unicode data is handled by
mapping UTF-8 strings into an 8-bit character set. Configurable
character mappings are applied before all comparisons so that sets
of similar characters are unified before comparison. The
mappings also allow single characters to be mapped to
character strings of up to 4 characters.
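
A character mapping of this kind can be sketched as below. The mapping table is a made-up example, and using Unicode normalization from Python's standard `unicodedata` module to strip accents is an assumption; MatchMaker's own configurable mappings are not described in the source.

```python
import unicodedata

# Hypothetical mapping table: single characters to strings of up to 4 characters.
CHAR_MAP = {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss", "æ": "ae"}
MAX_EXPANSION = 4
assert all(len(target) <= MAX_EXPANSION for target in CHAR_MAP.values())


def unify(text: str) -> str:
    """Lowercase, expand mapped characters, then strip remaining accents."""
    text = unicodedata.normalize("NFC", text).lower()
    text = "".join(CHAR_MAP.get(ch, ch) for ch in text)
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(unify("Müller"), unify("MUELLER"), unify("Café"))  # mueller mueller cafe
```

Applying the same function to both the indexed data and the query means "Müller" and "Mueller" compare as equal before any approximate matching is needed.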


Full Text Search Module

MatchMaker has a full text indexing engine for documents on a
file-by-file basis. For each word in the document collection,
MatchMaker tracks the position of the word within each
document, the frequency of the word in each document, the overall
frequency, and other metrics. These measurements all become
available as ranking criteria for subsequent query processing.
MatchMaker's full text module supports phrase detection,
exclusion words, inclusion words, approximate or exact recall on a
word basis, word combination and word splits on demand, single
word aliasing, single word biasing, custom skip words, automatic
skip words, wildcard search, suffix and prefix search, and more.
The full text module can also generate teasers (text
surrounding the matched words in the original document) on
exact or approximate matches.
The full text module can be integrated with standard web
crawlers. Crawlers can be controlled from MatchMaker to directly
import documents of different types.
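
The per-word bookkeeping described above resembles a positional inverted index. The following sketch shows that general data structure and a simple teaser function; the tokenizer, the function names and the sample documents are assumptions, and only exact matches are handled.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list:
    """Split text into lowercase words."""
    return re.findall(r"\w+", text.lower())


def build_index(docs: dict) -> dict:
    """Map word -> document id -> list of word positions in that document."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for position, word in enumerate(tokenize(text)):
            index[word][doc_id].append(position)
    return index


def teaser(text: str, word: str, width: int = 3) -> str:
    """Return the words surrounding the first exact match of `word`."""
    words = text.split()
    for i, w in enumerate(words):
        if re.sub(r"\W", "", w).lower() == word:
            return " ".join(words[max(0, i - width): i + width + 1])
    return ""


docs = {"a": "Fast data matching and fast search", "b": "Search platform"}
index = build_index(docs)
print(index["fast"]["a"])                  # [0, 4]
print(teaser(docs["a"], "matching", 1))    # data matching and
```

The length of each position list gives the per-document frequency, and the number of documents listed under a word gives its spread across the collection, which are the kinds of metrics the text mentions as ranking criteria.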

Integration and API


OCR-Extensions

The OCR version of MatchMaker has special recall functionality for
handling the character guesses of OCR recognition engines. It also
has additional logic for labeling the status of a given result with
respect to its suitability for automatic update and/or
display to a human operator for manual verification.

Server Side Scripting

MatchMaker supports Tcl server-side scripting to replace standard
search functionality with custom logic that can read all
input strings and modes, manipulate the recall session, and
construct results by merging several query results and/or
post-processing the results. Within scripts, ten extra DETECT recall
modes are available for selecting special algorithmic modes and
re-evaluation models.
Scripting also allows custom filters on single fields,
manipulation of complex queries, modification of results by merging
several query results, and post-processing of results. Additional
resources, such as special comparison methods (string matching
algorithms), can be used in scripts. Script writing
is supported by template generators for each function type.


Application Programming Interface (API)

Communication with exTributor or exStream is done via an API
(called MMI).
This client-side programming interface for building powerful
applications can be used from almost any programming language.
The underlying unified client interface is plain text over TCP/IP
sockets, allowing access from any language with support for
sockets. Native APIs and example code are available for C++,
COM, Java, PHP, Tcl and Python.
A new port can optionally be opened for each query.
All data is ASCII, with tab separators for keys and value pairs.
Full interface specifications, code examples, and documentation
are available to support developers who write their own client
libraries.
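
To give a feel for a plain-text, tab-separated key/value exchange, here is a hedged sketch of message encoding and decoding. The actual MMI wire format is defined in the interface specifications and is not reproduced in this document: the one-pair-per-line layout and the key names below are purely hypothetical.

```python
def encode_message(pairs: dict) -> bytes:
    """Encode key/value pairs as ASCII lines of the form key<TAB>value."""
    for key, value in pairs.items():
        if any(sep in key or sep in value for sep in ("\t", "\n")):
            raise ValueError("keys and values must not contain tabs or newlines")
    lines = [f"{key}\t{value}" for key, value in pairs.items()]
    return ("\n".join(lines) + "\n").encode("ascii")


def decode_message(data: bytes) -> dict:
    """Decode ASCII key<TAB>value lines back into a dictionary."""
    pairs = {}
    for line in data.decode("ascii").splitlines():
        if line:
            key, _, value = line.partition("\t")
            pairs[key] = value
    return pairs


# Hypothetical query; the key names are illustrative, not MMI keywords.
request = encode_message({"field": "name", "query": "smyth", "mode": "approx"})
print(decode_message(request))
```

In a client, bytes like these could be sent over a connection opened with Python's `socket.create_connection((host, port))`, with host and port taken from the deployment's configuration.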
