ment’s intelligence relevance and obtain actionable information from it to support the war fighter.

An emerging solution to the challenge of battlefield document exploitation employs field-based systems that integrate advanced forms of document image capture, multilingual Optical Character Recognition (OCR), multilingual machine translation, and multilingual word or phrase spotting. Of particular interest are the recent developments in multilingual OCR and multilingual word spotting that make field-based exploitation systems practical.

The current military operations in Iraq and Afghanistan underscore the need for OCR systems that effectively transcribe Middle Eastern and Asian languages. While OCR software for Latin languages has long existed, systems that can recognize languages such as Arabic, Persian, Pashto, and Urdu as well as Chinese, Japanese, and Korean are just emerging. Recently developed multilingual OCR systems address these military-significant languages and have a number of unusual capabilities that directly fulfill the needs of battlefield systems. Promi-

FIGURE 1 Before – After: NovoDynamics software extracts information from degraded documents, to help military analysts pull valuable information from more sources.

High word-spotting accuracy is obtained through the use of query-time OCR. A typical general-purpose OCR lexicon is designed to cover the most frequently used words in the target language to maximize recognition performance without making any assumptions about document content. While this strategy provides the best generic recognition, it is not ideal for word spotting, or equivalently, search queries, because queries against a document (or a corpus) are almost always concerned with less frequent words representing entities such as people, places, or things. Since these types of words occur only in specialized contexts, they are not usually included in a general-purpose lexicon. Consequently, they are more likely to be incorrectly recognized by a generic OCR engine, particularly in the case of low-quality document imagery where word-spotting accuracies are significantly decreased.

Query-time OCR is implemented by constructing a supplemental lexicon from the keywords of each query and providing it to the OCR when word spotting is performed. Though not obvious, query-time OCR turns out to be a very practical approach to word spotting that results in accuracy improvements of up to 15% compared to conventional methods.
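
The idea of a query-specific supplemental lexicon can be illustrated with a minimal Python sketch. It is a toy stand-in, not the NovoDynamics implementation: the OCR engine is simulated by snapping noisy tokens to the closest lexicon entry using the standard-library `difflib`, and the names `GENERAL_LEXICON`, `build_supplemental_lexicon`, `snap_to_lexicon`, and `spot_keywords`, as well as the 0.7 similarity cutoff, are hypothetical choices made for illustration only.

```python
from __future__ import annotations

from difflib import get_close_matches

# Hypothetical general-purpose lexicon: frequent words only, no rare entities.
GENERAL_LEXICON = {"the", "meeting", "will", "be", "held", "in", "near", "market"}


def build_supplemental_lexicon(query: str) -> set[str]:
    """Collect the query keywords as a small, query-specific lexicon."""
    return {word.lower() for word in query.split()}


def snap_to_lexicon(raw_tokens: list[str], lexicon: set[str],
                    cutoff: float = 0.7) -> list[str]:
    """Simulated lexicon-constrained recognition (an assumption, not a real
    OCR engine): replace each noisy token with its closest lexicon entry,
    or keep it unchanged when nothing is similar enough."""
    candidates = sorted(lexicon)
    recognized = []
    for token in raw_tokens:
        token = token.lower()
        match = get_close_matches(token, candidates, n=1, cutoff=cutoff)
        recognized.append(match[0] if match else token)
    return recognized


def spot_keywords(query: str, raw_tokens: list[str], general_lexicon: set[str],
                  use_query_lexicon: bool = True) -> list[int]:
    """Return the positions of tokens recognized as one of the query keywords."""
    keywords = build_supplemental_lexicon(query)
    lexicon = general_lexicon | keywords if use_query_lexicon else general_lexicon
    recognized = snap_to_lexicon(raw_tokens, lexicon)
    return [i for i, word in enumerate(recognized) if word in keywords]


# A degraded scan misreads the place name "Kandahar" as "Kandahnr".
noisy_tokens = ["the", "meeting", "in", "Kandahnr"]

# The general lexicon alone cannot recover the rare word, so the keyword is missed.
print(spot_keywords("Kandahar", noisy_tokens, GENERAL_LEXICON, use_query_lexicon=False))

# Adding the query keywords at query time lets recognition recover it.
print(spot_keywords("Kandahar", noisy_tokens, GENERAL_LEXICON))
```

In this sketch the rare place name is found only when the query keywords are added to the lexicon, which mirrors the motivation given above: entity names fall outside a frequency-based lexicon, so supplying them at query time gives the recognizer a chance to resolve them in degraded imagery.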
NovoDynamics, Inc.
Tel: 734.205.9126
www.novodynamics.com