
Document Recognition

a technology overview

Presented by:
Chris Riley of Artsyl Technologies, Inc.
But First
„ Your new AIIM Board!

„ Exciting new events


… Golf
… Networking
… More Education Sessions
What we will cover:
„ Why Chris?

„ What Are the Document Recognition Technologies

„ Who Makes Them

„ Buyer Beware

„ The future

„ Q&A

„ Free Stuff!
Why Chris?
„ Who is Artsyl?

„ What qualifies Chris to talk to me?


…
… When a developer turns to sales
Who knows what OCR is?
The Technologies
„ OCR – Optical Character Recognition
„ ICR – Intelligent Character Recognition
„ OMR – Optical Mark Recognition
„ Barcode
„ Handwriting
„ All the other ones made up for marketing purposes

„ CAR/LAR ( Check21 ) – Courtesy and Legal Amount Recognition


„ Assisted Capture
„ Fixed Form Processing
„ Semi-Structured Forms Processing
„ Unstructured Document Processing
The Technologies: OCR
„ Sample: Ship To:
The Technologies: ICR
„ Sample: Ilya
The Technologies: OMR
„ Sample: Card Account
The Technologies: Barcode
„ Sample: 1889094476620
The Technologies: Handwriting
„ Sample: * Critical *
The Technologies: Acronym Heaven
„ All the other ones made up for marketing purposes
The Technologies: CAR/LAR
„ Sample: 2 hundred dollars & no cents
The Technologies: Assisted Capture
The Technologies: Fixed Form Processing

Name: Ilya
Date: 12/21/2982
80% of business end-user documents
are semi-structured
The Technologies: Semi-Structured Forms

Invoice No: 99044


Date: 06/09/04

Invoice No: 24567


Date: 06/09/04 (06/09/2004)
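
The two invoices above carry the same fields in different places, and the date is shown normalized from 06/09/04 to 06/09/2004. A minimal Python sketch of that idea follows: find each field by its label rather than by its position, then normalize the date. The regular expressions, the function names, the filler lines in the sample pages, and the assumption that dates are MM/DD/YY are all illustrative and do not come from any particular product.

```python
import re
from datetime import datetime

# Hypothetical label patterns; a real product would use trained layouts or
# richer rules rather than two regular expressions.
FIELD_PATTERNS = {
    "invoice_no": re.compile(r"Invoice\s+No:\s*(\d+)"),
    "date": re.compile(r"Date:\s*(\d{2}/\d{2}/\d{2,4})"),
}

def normalize_date(raw):
    """Expand a two-digit year to four digits, assuming MM/DD/YY input."""
    fmt = "%m/%d/%y" if len(raw.split("/")[-1]) == 2 else "%m/%d/%Y"
    return datetime.strptime(raw, fmt).strftime("%m/%d/%Y")

def extract_fields(page_text):
    """Find each field by its label, wherever it sits on the page."""
    found = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(page_text)
        if match:
            found[name] = match.group(1)
    if "date" in found:
        found["date"] = normalize_date(found["date"])
    return found

# Two layouts with the same fields in different places
# (the lines between the fields are made-up filler).
invoice_a = "Invoice No: 99044\nACME Freight\nDate: 06/09/04\n"
invoice_b = "Date: 06/09/04\nBill To: ...\nInvoice No: 24567\n"

print(extract_fields(invoice_a))
print(extract_fields(invoice_b))
```

Both pages yield the same field names even though the labels appear in a different order, which is the property that separates semi-structured processing from fixed-form zoning.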
The Technologies: Semi-Structured Forms
„ Sample field labels: Consignee, Consignor, Date, Term
The Technologies: Common Processes

„ Full page conversion


„ Classification
„ Index level extraction

„ Redaction
„ Routing
„ Auto Filing
„ Re-Purposing
„ Image Rotation
The Technologies: Full page conversion

„ Image file to electronic data file


„ ALL text on the page
„ Includes:
… Image Pre-processing
… Document Analysis/Zoning
… Extraction
… Export ( Commonly PDF, DOC )
The Technologies: Classification

„ Software tells you the document type


„ Scan batches of mixed documents

… Bill of Lading
… Invoice
… Check
… PO
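
As a rough illustration of classification, the sketch below sorts a mixed batch into the four document types named on this slide (Bill of Lading, Invoice, Check, PO) by counting keywords in the recognized text. The keyword lists, the `classify` function, and the sample pages are assumptions made up for this example; commercial classifiers typically also use layout and image features.

```python
# Hypothetical keyword lists for the four document types on the slide.
KEYWORDS = {
    "Bill of Lading": ["bill of lading", "consignee", "consignor", "carrier"],
    "Invoice": ["invoice no", "total amt due", "invoice date"],
    "Check": ["pay to the order of", "dollars"],
    "PO": ["purchase order", "po number"],
}

def classify(page_text):
    """Return the best-scoring document type, or None if nothing matches."""
    text = page_text.lower()
    scores = {
        doc_type: sum(text.count(k) for k in words)
        for doc_type, words in KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

# A made-up mixed batch, one string of recognized text per page.
batch = [
    "Invoice No: 99044 Date: 06/09/04 Total Amt Due: 120.00",
    "Pay to the order of Ilya two hundred dollars and no cents",
    "BILL OF LADING Consignee: ... Consignor: ...",
    "Purchase Order PO Number 7731",
]
for page in batch:
    print(classify(page))
```

Returning `None` for a page with no keyword hits gives a natural place to route unknown documents to a person instead of guessing.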
The Technologies: Index Level Extraction

„ Just certain required fields extracted


„ Normalization of data
„ Export usually to a database

Invoice Number
Invoice Date
Total Amt Due
Term
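
The sketch below walks through the last two bullets for the four fields listed: normalize a recognized value, then export the record to a database. It uses Python's built-in `sqlite3` module with an in-memory database; the table layout, the `normalize_amount` helper, and the sample values are assumptions for illustration only.

```python
import sqlite3
from decimal import Decimal

def normalize_amount(raw):
    """Strip the currency symbol and thousands separators from an amount."""
    return Decimal(raw.replace("$", "").replace(",", "").strip())

# Values as they might come out of recognition (made up for illustration).
raw_fields = {
    "invoice_number": "99044",
    "invoice_date": "06/09/2004",
    "total_amt_due": "$1,250.00",
    "term": "Net 30",
}

conn = sqlite3.connect(":memory:")  # in-memory database, nothing written to disk
conn.execute(
    "CREATE TABLE invoices ("
    "invoice_number TEXT PRIMARY KEY, invoice_date TEXT, "
    "total_amt_due TEXT, term TEXT)"
)
conn.execute(
    "INSERT INTO invoices VALUES (?, ?, ?, ?)",
    (
        raw_fields["invoice_number"],
        raw_fields["invoice_date"],
        str(normalize_amount(raw_fields["total_amt_due"])),
        raw_fields["term"],
    ),
)
conn.commit()

row = conn.execute("SELECT * FROM invoices").fetchone()
print(row)
```

Only the four index fields are stored; the rest of the page text is never extracted, which is what distinguishes this from full page conversion.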
The Technologies: How Accurate

„ The better question is: how do you determine accuracy?

„ Document Type Accuracy


„ Field/Zone Location Accuracy
„ Data Type Accuracy
„ Character Accuracy
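
To show why the level of measurement matters, here is a minimal sketch that scores the same result two ways: character accuracy (based on edit distance against a known-correct value) and an exact-match score per field. The formulas and function names are one common way to compute such numbers, not a vendor standard, and the sample values are made up.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]

def character_accuracy(truth, recognized):
    """1 - (edit distance / length of the ground truth), floored at 0."""
    if not truth:
        return 1.0 if not recognized else 0.0
    return max(0.0, 1 - edit_distance(truth, recognized) / len(truth))

def field_accuracy(truth_fields, recognized_fields):
    """Share of fields whose recognized value matches the truth exactly."""
    hits = sum(recognized_fields.get(k) == v for k, v in truth_fields.items())
    return hits / len(truth_fields)

truth = {"invoice_no": "99044", "date": "06/09/2004"}
recognized = {"invoice_no": "99O44", "date": "06/09/2004"}  # letter O for zero

print(character_accuracy(truth["invoice_no"], recognized["invoice_no"]))
print(field_accuracy(truth, recognized))
```

One wrong character out of five still leaves the invoice number at 80% character accuracy, yet the field as a whole is unusable, so only one of the two fields counts as correct. A quoted accuracy figure means little until you know which of these levels it refers to.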
The Technologies: Common usage scenarios

„ Document Conversion

„ Document Archival / Retrieval

„ Invoice Processing

„ Insurance Processing (medical, mortgage)

„ Waybill processing

„ Survey processing
There really are only 3 core technology providers

It takes 50 man-years to develop OCR using current computing abilities
Who Makes Them: Core Engines
„ ABBYY
„ Nuance (formerly ScanSoft)
„ ReadI.R.I.S

„ Océ
„ CharacTell
„ ParaScript
„ A2iA

„ Handful of Open Source


„ Handful of Other Vendors
„ Two handfuls of OLD engines
Who Makes Them: Who Licenses Them
„ EVERYONE ELSE!
„ AnaComp
„ Anydoc
„ BancTec
„ BrainWare
„ Captaris
„ Captivation
„ Cardiff
„ CVision
„ DataCap
„ DigiTech
„ eCopy
„ EMC Documentum
„ Kofax
„ LaserFiche
„ LeadTools
„ Microsoft
„ NSi AutoStore
„ OnBase
„ Perceptive Imaging
„ ReadSoft
„ SER
„ Top Image Systems
„ Tower
„ Westbrook
„ Xerox

„ Hundreds More
30% of organizations that purchase, purchase the wrong thing

Over 50% of organizations that purchase never use it properly
Buyer Beware
„ If OCR is the reason for buying a solution, know which engine it uses!

„ Talk about the WHOLE solution not the pieces

„ Get past marketing gimmicks

„ Trust, Love, Be Certain of your reseller / vendor


Buyer Beware: Know your engine

„ What version?
„ Will they upgrade?
Buyer Beware: Talk about Whole Solution

„ Scanner / Input
„ Capture
„ Storage

„ Have a requirements list beforehand


Buyer Beware: Get past Gimmicks

„ NOTHING is 100%!

„ All canned demos work perfectly

„ Always see a test on your own documents

„ Version numbers are really arbitrary


Buyer Beware: Trust your vendor / reseller

„ Support after sale ( test them )

„ Where to get professional services

„ Do they understand the solution and not just the pieces?
The Future
„ Full-page OCR will be a commodity

„ Advanced Document Processing will become mainstream but less required

„ Think about what to do now that you will be gathering data rapidly

„ There will be a new approach to OCR


Questions and Answers
„ Before you ask
Free Stuff
„ Copy of ABBYY FineReader Pro 9.0
„ Copy of Nuance OmniPage 16
„ Copy of ReadI.R.I.S Pro 11

„ 4 Hour Consulting Session with ME!
