Version 1.120613
Objective
Agenda
PDF Metadata
Dictionary, XMP and Entity Extraction
Configuration
SharePoint 2010, 2013
Summary
Microsoft SharePoint Server: 125 million licenses sold
SharePoint is a natural target for PDF storage
What is SharePoint?
On-premises and cloud-based collaboration and document management platform
SharePoint Overview
List or library data in a site collection is stored in a SQL Server database table, which uses queries, indexes and locks to maintain overall performance, sharing, and accuracy.
Filtered views with column indexes (and other operations) create database queries that identify a subset of columns and rows and return this subset to your computer.
Thresholds and limits help throttle operations and balance resources for many simultaneous users.
Privileged developers can use object model overrides to temporarily increase thresholds and limits for custom applications.
Administrators can specify dedicated time windows for all users to do unlimited operations during off-peak hours.
Information workers can use appropriate views, styles, and page limits to speed up the display of data on the page.
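The points above about indexed columns, filtered views, and page limits can be sketched as a paged CAML query. This is a minimal illustration, not code from the deck: the column name `DocType` and the page size of 2,000 rows are assumptions, chosen to stay under the default list view threshold of 5,000 items. The sketch only builds the view XML (the kind of string that can be assigned to `CamlQuery.ViewXml` in the client object model); it does not call SharePoint.

```csharp
using System;
using System.Xml.Linq;

public static class PagedViewSketch
{
    // Builds CAML view XML that filters on one column and caps the rows per page.
    // "fieldName" is assumed to be an indexed column; the name used in Main is hypothetical.
    public static string BuildViewXml(string fieldName, string value, int rowLimit)
    {
        if (rowLimit <= 0) throw new ArgumentOutOfRangeException("rowLimit");

        XElement view =
            new XElement("View",
                new XElement("Query",
                    new XElement("Where",
                        new XElement("Eq",
                            new XElement("FieldRef", new XAttribute("Name", fieldName)),
                            new XElement("Value", new XAttribute("Type", "Text"), value)))),
                // Paged="TRUE" asks for results one page of rowLimit items at a time.
                new XElement("RowLimit", new XAttribute("Paged", "TRUE"), rowLimit));

        return view.ToString(SaveOptions.DisableFormatting);
    }

    public static void Main()
    {
        Console.WriteLine(BuildViewXml("DocType", "PDF", 2000));
    }
}
```

Filtering on an indexed column and requesting a bounded page keeps each query small, which is the same idea the slide describes for views and page limits.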
Windows Server 2008/12
Internet Information Services (IIS)
.NET Framework
SQL Server
MS Office
Options
SharePoint UI
Acrobat XI
Load Tools
Custom Code
Workflow & Event Receivers
    // Upload the file to the document library with an HTTP PUT.
    WebRequest request = WebRequest.Create(destUrl);
    request.Credentials = CredentialCache.DefaultCredentials;
    request.Method = "PUT";

    // Copy the file bytes to the request stream in 1 KB chunks.
    byte[] buffer = new byte[1024];
    using (Stream stream = request.GetRequestStream())
    using (MemoryStream ms = new MemoryStream(fileBytes))
    {
        for (int i = ms.Read(buffer, 0, buffer.Length); i > 0; i = ms.Read(buffer, 0, buffer.Length))
        {
            stream.Write(buffer, 0, i);
        }
    }

    // GetResponse throws a WebException if the server rejects the upload.
    WebResponse response = request.GetResponse();
    response.Close();
    Logging.Log("Upload successful");
http://www.adobe.com/uk/products/acrobat/pdf-version-control-sharepoint-integration.html
iFilters scan documents for text and attributes primarily in support of Microsoft Search technologies.
iFilter Architecture
iFilter Configuration
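As a rough illustration of how Windows associates a filter with a file type, the sketch below follows the standard registry chain: the extension's `PersistentHandler` value, then the `PersistentAddinsRegistered` entry for the IFilter interface ID under that handler's CLSID. The registry reader is passed in as a delegate so the sketch runs anywhere; the two GUIDs in the stand-in data are made up, and search products may keep additional settings of their own, so treat this as an assumption-laden sketch rather than the configuration procedure.

```csharp
using System;
using System.Collections.Generic;

public static class FilterLookupSketch
{
    // Interface ID of IFilter, used as a subkey name under PersistentAddinsRegistered.
    public const string IFilterIid = "{89BCB740-6119-101A-BCB7-00DD010655AF}";

    // Resolves the CLSID of the filter registered for a file extension.
    // readDefault returns the default value of a key under HKEY_CLASSES_ROOT, or null if missing.
    public static string ResolveFilterClsid(string extension, Func<string, string> readDefault)
    {
        string handler = readDefault(extension + @"\PersistentHandler");
        if (string.IsNullOrEmpty(handler)) return null;

        return readDefault(@"CLSID\" + handler + @"\PersistentAddinsRegistered\" + IFilterIid);
    }

    public static void Main()
    {
        // Stand-in for the registry; both GUID values are hypothetical.
        var fake = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            { @".pdf\PersistentHandler", "{00000000-0000-0000-0000-0000000000A1}" },
            { @"CLSID\{00000000-0000-0000-0000-0000000000A1}\PersistentAddinsRegistered\" + IFilterIid,
              "{00000000-0000-0000-0000-0000000000B2}" }
        };
        Func<string, string> read = key =>
        {
            string value;
            return fake.TryGetValue(key, out value) ? value : null;
        };

        Console.WriteLine(ResolveFilterClsid(".pdf", read) ?? "(no filter registered)");
    }
}
```

On a Windows machine the delegate could instead read from `Microsoft.Win32.Registry.ClassesRoot`, returning the default value of each subkey.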
iFilter Explorer
https://gist.github.com/jimschubert/1473904
    StringBuilder Buffer = new StringBuilder();
    string PDFFile = @"C:\dev\PDF Conference\s.pdf";
    FilterCode f = new FilterCode();
    f.GetTextFromDocument(PDFFile, ref Buffer);
    Console.WriteLine(Buffer);
    public void GetTextFromDocument(string Path, ref StringBuilder Buffer)
    {
        IFilter filter = null;
        int hresult;
        IFilterReturnCodes rtn;

        // Initialize the return buffer to 64K.
        Buffer = new StringBuilder(64 * 1024);

        // Try to load the filter for the path given.
        hresult = LoadIFilter(Path, new IntPtr(0), ref filter);
        if (hresult == 0)
        {
            IFILTER_FLAGS uflags;

            // Init the filter provider.
            rtn = filter.Init(
                IFILTER_INIT.IFILTER_INIT_CANON_PARAGRAPHS |
                IFILTER_INIT.IFILTER_INIT_CANON_HYPHENS |
                IFILTER_INIT.IFILTER_INIT_CANON_SPACES |
                IFILTER_INIT.IFILTER_INIT_APPLY_INDEX_ATTRIBUTES |
                IFILTER_INIT.IFILTER_INIT_INDEXING_ONLY,
                0, new IntPtr(0), out uflags);
            if (rtn == IFilterReturnCodes.S_OK)
            {
                STAT_CHUNK statChunk;
    [DllImport("query.dll", SetLastError = true, CharSet = CharSet.Unicode)]
    static extern int LoadIFilter(
        string pwcsPath,
        [MarshalAs(UnmanagedType.IUnknown)] object pUnkOuter,
        ref IFilter ppIUnk);
iFilter Test
Bookmark Text
XMP Metadata
PDF Attachment
Annotation
PDFLib iFilter
FoxIt iFilter
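For the XMP Metadata item in the test above, it can help to see what an iFilter is expected to find. The sketch below pulls the first XMP packet out of a PDF's raw bytes by looking for the `<?xpacket begin` and `<?xpacket end` markers. It assumes the metadata stream is stored unencrypted and uncompressed and that the packet is UTF-8; the sample bytes in `Main` are a made-up fragment, not a complete PDF.

```csharp
using System;
using System.Text;

public static class XmpPacketSketch
{
    // Returns the first XMP packet found in the bytes, or null if none is present.
    public static string ExtractFirstPacket(byte[] pdfBytes)
    {
        // Latin-1 maps each byte to exactly one char, so string offsets equal byte offsets.
        string raw = Encoding.GetEncoding("ISO-8859-1").GetString(pdfBytes);

        int start = raw.IndexOf("<?xpacket begin", StringComparison.Ordinal);
        if (start < 0) return null;

        int endMarker = raw.IndexOf("<?xpacket end", start, StringComparison.Ordinal);
        if (endMarker < 0) return null;

        int end = raw.IndexOf("?>", endMarker, StringComparison.Ordinal);
        if (end < 0) return null;
        end += 2; // include the closing "?>"

        // Decode only the packet's byte range, this time as UTF-8.
        return Encoding.UTF8.GetString(pdfBytes, start, end - start);
    }

    public static void Main()
    {
        // Made-up fragment standing in for a PDF metadata stream.
        string sample =
            "%PDF-1.4\n1 0 obj\n<< /Type /Metadata /Subtype /XML >>\nstream\n" +
            "<?xpacket begin=\"\" id=\"W5M0MpCehiHzreSzNTczkc9d\"?>" +
            "<x:xmpmeta xmlns:x=\"adobe:ns:meta/\"></x:xmpmeta>" +
            "<?xpacket end=\"w\"?>\nendstream\nendobj\n";

        Console.WriteLine(ExtractFirstPacket(Encoding.UTF8.GetBytes(sample)));
    }
}
```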
Classify:
Image-Only
Born-Digital
Part Image-Only, Part Born-Digital
Previously OCRed
Objectives:
Ensure Full Searchability
Avoid Text to Image Processing
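The classification and objectives above could be automated along the following lines. This is a sketch under stated assumptions: `PageInfo` and its two flags are hypothetical, and how they are obtained (which needs a PDF library) is out of scope. A page with a full-page image and no text is treated as image-only, one with both as previously OCRed, and one with text and no full-page image as born-digital. Only documents with at least one image-only page are sent to OCR, so pages that already carry text are never rasterized.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public enum PdfClass { ImageOnly, BornDigital, PartImagePartBornDigital, PreviouslyOcred }

// Hypothetical per-page summary; a PDF library would have to supply these flags.
public sealed class PageInfo
{
    public readonly bool HasText;
    public readonly bool HasFullPageImage;

    public PageInfo(bool hasText, bool hasFullPageImage)
    {
        HasText = hasText;
        HasFullPageImage = hasFullPageImage;
    }
}

public static class PdfTriageSketch
{
    public static PdfClass Classify(IList<PageInfo> pages)
    {
        if (pages == null || pages.Count == 0)
            throw new ArgumentException("At least one page is required.", "pages");

        if (pages.All(p => p.HasFullPageImage && !p.HasText)) return PdfClass.ImageOnly;
        if (pages.All(p => p.HasFullPageImage && p.HasText)) return PdfClass.PreviouslyOcred;
        if (pages.All(p => !p.HasFullPageImage)) return PdfClass.BornDigital;
        return PdfClass.PartImagePartBornDigital;
    }

    // OCR is needed only when some page is a scan without any text.
    public static bool NeedsOcr(IList<PageInfo> pages)
    {
        return pages.Any(p => p.HasFullPageImage && !p.HasText);
    }

    public static void Main()
    {
        var mixed = new[] { new PageInfo(true, false), new PageInfo(false, true) };
        Console.WriteLine(Classify(mixed) + ", OCR needed: " + NeedsOcr(mixed));
    }
}
```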
Process:
Consider Automation
Entity Extraction
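One simple form of entity extraction is pattern matching over the text an iFilter or OCR step has produced, with the matches then written to metadata columns. The sketch below is only an illustration of that idea: the two patterns (e-mail addresses and an `INV-` plus six digits invoice number) and the entity names are assumptions, and real extraction would likely need dictionaries or more robust rules.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class EntityExtractionSketch
{
    // Hypothetical patterns; adjust to the entities that matter for the document set.
    private static readonly Dictionary<string, Regex> Patterns = new Dictionary<string, Regex>
    {
        { "Email", new Regex(@"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}") },
        { "InvoiceNumber", new Regex(@"\bINV-\d{6}\b") }
    };

    // Returns the distinct matches for each entity type found in the text.
    public static Dictionary<string, List<string>> Extract(string text)
    {
        var result = new Dictionary<string, List<string>>();
        foreach (KeyValuePair<string, Regex> entry in Patterns)
        {
            result[entry.Key] = entry.Value.Matches(text ?? string.Empty)
                .Cast<Match>()
                .Select(m => m.Value)
                .Distinct()
                .ToList();
        }
        return result;
    }

    public static void Main()
    {
        var found = Extract("Please send INV-004217 to jane.doe@example.com by Friday.");
        foreach (KeyValuePair<string, List<string>> entry in found)
        {
            Console.WriteLine(entry.Key + ": " + string.Join(", ", entry.Value));
        }
    }
}
```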
Configuration
http://www.adobe.com/devnet-docs/acrobatetk/tools/AdminGuide/Acrobat_Reader_IFilter_configuration.pdf
PDF Format Handler Support
Currently no iFilter support for PDF!?
http://stevemannspath.blogspot.co.uk/2012/10/sharepoint-2013-pdf-preview-in-search.html
http://stevemannspath.blogspot.co.uk/2013/04/sharepoint-2013-pdf-support-and.html
Microsoft SharePoint Server: 125 million licenses sold
SharePoint is a natural target for PDF storage
PDF as a SharePoint First-Class Citizen
Summary
Contact: neil.pitman@aquaforest.com