
Enterprise Content Management Systems for Publishing Solutions at State and Federal Government Agencies

A Whitepaper

Quantilus Inc.
115 Broadway, 12th Fl
New York, NY 10006
Ph: (212) 768-8900
Email: info@quantilus.com
http://www.quantilus.com

Use of this document is restricted by Creative Commons License Attribution-No Derivatives (CC BY-ND 4.0).
Table of Contents
1 Understanding a Government Publishing Agency’s Needs
1.1 Approach
1.2 Future State Workflow
1.3 Future State – Publication and Document Lifecycles
1.4 Future State – Selected Annotated Wireframes

2 System Architecture
2.1 Repository Layer
2.1.1 Database Layer
2.2 Security Services
2.2.1 Permissions Service
2.2.2 Users and Groups Service
2.2.3 Encryption Service
2.2.4 Logging Service
2.2.5 Authentication Service
2.3 Content Services
2.3.1 Content Model
2.3.2 Check-in/Check-out
2.3.3 Search and Query
2.3.4 Versioning
2.3.5 Lifecycle
2.3.6 Records Management
2.4 Process Services
2.4.1 Collaboration
2.4.2 Workflows
2.5 System Integration and Content Distribution
2.6 Financial Ledger
2.7 Reporting
2.8 User Interface
2.9 Choice of CMS: Alfresco vs Documentum vs SharePoint vs Content Manager vs Nuxeo

3 Editing and Formatting Tool
3.1 Core Word Features
3.2 Templates
3.3 Extensions and Plugins
3.4 Ribbon Buttons and Menu Items
3.5 Browser-based Online Authoring Tool
3.6 BluePencil Content QA Tool
3.7 On-Demand Preview Service

4 Migration, Deployment and Security
4.1 Content Migration
4.2 Content Conversion
4.3 Infrastructure and Deployment
4.4 Operations and Maintenance
4.5 Information Assurance Support

5 Management, Training and Support
5.1 Quantilus Publishing Center of Excellence
5.2 Software Development Approach
5.2.1 Tools Used
5.2.2 Program/Project Management Support
5.3 Quality Control
5.4 Help Desk Support
5.5 Training


1 Understanding a Government Publishing Agency’s Needs

Federal and state government agencies looking for publishing and ePublishing solutions
frequently face similar issues. Our understanding of their requirements is based on many
full-lifecycle implementations, responses to multiple RFPs, presentations made at pre-solicitation
conferences, and our discussions and demonstrations with key stakeholders during the
RFI stage of multiple solicitations. The high-level goals for most projects of this nature are:

• To create a robust content repository with a customized, easy-to-use interface for content
storage, access, search, and retrieval.
• To create a streamlined and automated workflow and notification system for increased
efficiency in the publishing process.
• To create a comprehensive content model that extends current XML schemas and
metadata models.
• To create a system that can manage content in different lifecycle states, and apply business
logic based on those states.
• To extend the capabilities of MS-Word as an authoring tool by providing the rich feature set
desired by the agency’s authors and editors.
• To create modular, real-time Content QA services that can validate authored content
against styles, guidelines, XML schemas, and business rules.
• To create a lightweight browser-based authoring tool for in-situ editing of content, with access
to the full suite of Content QA services.
• To streamline delivery of publish-ready content to multiple channels: print, web, and mobile.
• To provide useful reporting on the state of the content, and the state of the workflows for
various publishing products.
• To generate notifications for workflow changes, deviations from schedule, lifecycle state
updates, and other pre-set triggers.
• To provide a platform with enhanced security and audit trails to keep sensitive content secure
and provide traceability for any changes.

These projects are intended to modernize the publishing process, moving it from a print-focused
process to a modern, single-source, multi-channel delivery platform.

1.1 Approach
Our approach to this proposal is to present an ideal Future State workflow, describing what the
process will look like after the ECM solution is implemented. We then describe the components
of the solution itself – what it will take to achieve the future state described. Subsequently we
will go over the management practices, development methodology, the process controls and the
logistics – how this solution will be implemented.


1.2 Future State Workflow

Figure 1 - Example Future State Workflow


Note: This is a representative process diagram based on our past experience with similar
products; it does not necessarily represent all the steps in every agency’s workflow.


1.3 Future State – Publication and Document Lifecycles


The following state diagram describes the representative lifecycle states and transitions for
Publications and documents associated with the publications.

Figure 2 - Publication Lifecycle


Lifecycle states are changed by workflow events (either manual or automated) or by rule-based
metadata updates.
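As a rough illustration of how workflow events and rule-based metadata updates can both drive lifecycle changes, the following Python sketch models a small state machine. The state names, event names, the `superseded_by` field, and the `Publication` class are hypothetical placeholders; they are not taken from Figure 2 or from any specific ECM product.

```python
# Hypothetical lifecycle states and events, for illustration only.
TRANSITIONS = {
    ("Draft", "submit_for_review"): "In Review",
    ("In Review", "approve"): "Approved",
    ("In Review", "reject"): "Draft",
    ("Approved", "publish"): "Published",
    ("Published", "supersede"): "Archived",
}


class Publication:
    def __init__(self, title):
        self.title = title
        self.state = "Draft"
        self.metadata = {}
        self.history = ["Draft"]

    def fire(self, event):
        """Apply a workflow event (manual or automated) if it is allowed."""
        key = (self.state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"event {event!r} not allowed in state {self.state!r}")
        self.state = TRANSITIONS[key]
        self.history.append(self.state)

    def update_metadata(self, **changes):
        """Rule-based update: a published item with a (hypothetical)
        superseded_by value is moved to Archived."""
        self.metadata.update(changes)
        if self.state == "Published" and self.metadata.get("superseded_by"):
            self.fire("supersede")
```

Keeping the allowed transitions in a single table makes it straightforward to reject events that are not valid for the current state.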

1.4 Future State – Selected Annotated Wireframes


The Dashboard view below presents a complete summary of activity for a user and an
easy way to access the tasks and content most relevant to that user. A publication is initiated
and then moved through the steps of the workflow by user decisions, automated processes,
or elapsed time. During the publication lifecycle, each user is presented with tasks to perform to
advance the publication (My Tasks), and with alerts (Messages/Notifications) if
milestones are not being met or if actions and approvals are required.


Figure 3 - Example Dashboard

The Dashboard view is the central security and access point for all information relating to the
publication, giving users easy access to the files, reports, and tasks that they are most likely to
work on at any given time. Subsequent screens, such as the Publications Page shown
below, provide access to the list of all publications that a user has access to, and to
the files contained in each publication, grouped by file type. This page also contains controls to
perform operations on individual files (check-in/out, versions, run QA tests, etc.) or on the
publication (manage metadata, publish content, view reports, etc.).

Figure 4 - Operations Context Menu



When initializing a publication, a set of metadata is specified for the publication. This
information includes the security settings, workflow paths, business rules, reporting categories,
and inquiry keywords for the publication. This view is fully configurable by authorized users.

Figure 5 – Metadata Definition


During the review process, several forms of assistance are available to ensure consistent and
compliant publications. The following view shows the QA report generated at the document
level. This tool runs multiple checks (style, XML schema, business rules, and links) on the
document, and combines errors and warnings in one easy-to-use report. The errors can be fixed
inline by the author and/or reviewer.
This view is a prime example of what is possible with the BluePencil Content QA
solution accelerator: a single service that generates and consolidates multiple QA reports.

Figure 6 - Report Inline Editor



2 System Architecture

Figure 7 - System Architecture Diagram


2.1 Repository Layer
The content repository comprises a robust file system to store binary streams of content; a
database to store metadata and associations among content items, classifications, and the
folder/file structure; and full-text indexes.

2.1.1 Database Layer
Most ECM products support multiple databases, including MS-SQL Server, Oracle, and
MySQL. We recommend using Microsoft’s SQL Server 2014 Enterprise Edition, a robust
enterprise database solution. SQL Server is an industry leader with a long list of successful
implementations. We have implemented SQL Server at many large commercial clients and
can take advantage of its robust feature set, including backup and recovery, scalability and
performance, and high availability.
If the agency prefers another database solution that is supported by standard ECM
products (e.g. Oracle or MySQL), we have the experience to work with that as well.

2.2 Security Services


2.2.1 Permissions Service
The Permissions service provides methods to read, set, and delete permissions on objects,
to query permissions, and to evaluate a user’s permissions against objects.
Content is secured using granular permissions and role-based security to ensure that authenticated
users access only the content they are authorized to access.
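To make the read, set, delete, and evaluate operations concrete, here is a minimal in-memory sketch in Python. The class name, method names, and permission names are assumptions made for illustration and do not correspond to the API of any particular ECM product; a real system would persist these entries and support inheritance along the folder structure.

```python
from collections import defaultdict


class PermissionService:
    """In-memory sketch; a real ECM would persist these entries."""

    def __init__(self):
        # object id -> principal (user or group name) -> set of permission names
        self._acl = defaultdict(lambda: defaultdict(set))
        # user name -> set of group names
        self._groups = defaultdict(set)

    def add_to_group(self, user, group):
        self._groups[user].add(group)

    def set_permission(self, obj_id, principal, permission):
        self._acl[obj_id][principal].add(permission)

    def delete_permission(self, obj_id, principal, permission):
        self._acl[obj_id][principal].discard(permission)

    def read_permissions(self, obj_id):
        """Return the non-empty permission sets recorded for an object."""
        acl = self._acl.get(obj_id, {})
        return {p: set(perms) for p, perms in acl.items() if perms}

    def has_permission(self, user, obj_id, permission):
        """Evaluate a user against an object, directly or via group membership."""
        acl = self._acl.get(obj_id, {})
        principals = {user} | self._groups.get(user, set())
        return any(permission in acl.get(p, ()) for p in principals)
```

Granting permissions to groups rather than to individual users is what keeps role-based security manageable as the number of users grows.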
2.2.2 Users and Groups Service
The Users and Groups service allows system administrators to create and delete users and
groups, add users to and remove users from groups, and assign permissions to users and groups.
2.2.3 Encryption Service
The Encryption service supports full encryption of content in the repository if required. All data
uploaded to and downloaded from the CMS is SSL-encrypted.
2.2.4 Logging Service
The Logging service provides a configurable record of actions and events. It collects information
and stores it in a simple database form, providing a full audit trail of system
events, user actions, and metadata changes.
2.2.5 Authentication Service
The Authentication service supports multiple types of authentication, including
simple password-based authentication and LDAP-based SSO.

2.3 Content Services


2.3.1 Content Model
The Content model defines the metadata for different content types, associations between
objects, and classifications of content. The content model is the controlling element for almost all
rule-based activity in the system.
Client Story: We built a taxonomy system for one of the largest travel content sites in the world.
The system was built using TMCore, a .Net and SQL Server based taxonomy system. We
implemented a comprehensive classification system for Points of Interest, based on geography,
demographics, and user interests. For example, a restaurant could now be classified by granular
region (Logan Circle – Washington – D.C. – USA – North America), demographics (kid
friendly, good for professionals, etc.), and interests (music, baroque architecture, etc.). This
dramatically increased the efficiency and agility of authors and editors while creating content.

2.3.2 Check-in/Check-out
Check-out and check-in services control updates to documents and prevent unwanted
overwrites. Checking out a document locks it, preventing other users from writing changes to it.
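The locking behaviour can be sketched in a few lines of Python. The `Document` class and `CheckoutError` exception below are hypothetical names used only for illustration; production ECM systems typically also support lock timeouts and administrative overrides, which this sketch omits.

```python
class CheckoutError(Exception):
    """Raised when a check-out or check-in violates the lock."""


class Document:
    def __init__(self, name, content=""):
        self.name = name
        self.content = content
        self.locked_by = None  # user currently holding the check-out lock

    def check_out(self, user):
        """Lock the document for one user and return a working copy."""
        if self.locked_by not in (None, user):
            raise CheckoutError(f"{self.name} is checked out by {self.locked_by}")
        self.locked_by = user
        return self.content

    def check_in(self, user, new_content):
        """Write changes back and release the lock; only the lock holder may do so."""
        if self.locked_by != user:
            raise CheckoutError(f"{user} does not hold the lock on {self.name}")
        self.content = new_content
        self.locked_by = None
```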

2.3.3 Search and Query
The Search service supports full-text search, multi-faceted search, and XPath searches. The
Query engine supports standard content query languages (CMIS QL).
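As an example of what a CMIS QL query looks like, the Python sketch below assembles a query string that combines a full-text predicate (`CONTAINS`) with a name filter on `cmis:document`. The helper names are hypothetical, and the escaping shown covers only simple search terms; the full-text expression inside `CONTAINS` has its own grammar that a production implementation would need to handle.

```python
def cmis_string(value):
    """Quote a Python string as a CMIS QL string literal,
    escaping backslashes and single quotes with a backslash."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def build_document_query(text=None, name_pattern=None):
    """Build a CMIS QL query over cmis:document with optional filters."""
    clauses = []
    if text:
        clauses.append(f"CONTAINS({cmis_string(text)})")
    if name_pattern:
        clauses.append(f"cmis:name LIKE {cmis_string(name_pattern)}")
    query = "SELECT cmis:objectId, cmis:name FROM cmis:document"
    if clauses:
        query += " WHERE " + " AND ".join(clauses)
    return query
```

The resulting string would be submitted to the repository through whatever CMIS binding the chosen ECM product exposes.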


2.3.4 Versioning
The Versioning service manages versions of individual content objects. Each version has a
version number that is allocated sequentially, following a strategy similar to Concurrent
Versions System (CVS) version numbering. The Versioning service provides the following
methods: Create Version, Version History, Get Current Version, Revert, Restore
Version, and Delete Version History.
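The following Python sketch illustrates sequential, CVS-like numbering (1.1, 1.2, ...) together with a few of the listed methods. The class and method names are hypothetical, and the semantics chosen for `revert` (copying an older version forward as a new version, so that history is never rewritten) is an assumption; individual ECM products may define Revert and Restore differently.

```python
class VersionedObject:
    """Keeps every version of a content object, numbered 1.1, 1.2, ..."""

    def __init__(self, content):
        self._versions = []  # list of (label, content) tuples, oldest first
        self.create_version(content)

    def create_version(self, content):
        label = f"1.{len(self._versions) + 1}"
        self._versions.append((label, content))
        return label

    def version_history(self):
        return [label for label, _ in self._versions]

    def current_version(self):
        return self._versions[-1]

    def revert(self, label):
        """Make an older version current again by copying it to a new version."""
        for old_label, content in self._versions:
            if old_label == label:
                return self.create_version(content)
        raise KeyError(label)
```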
2.3.5 Lifecycle
The Lifecycle service defines and manages the lifecycle states of content objects. Content objects
can be assigned rules and permission sets based on Lifecycle states.

2.3.6 Records Management
The Records Management service controls important information that must be retained over time. The
system is certified against the US Government (DoD) 5015.2 records standard and is useful for controlling
retention and review periods, providing specialized security, and determining whether
records are archived or destroyed after a specified period of time.
2.4 Process Services
2.4.1 Collaboration
Collaboration services integrate the production and publishing of content into user networks.
These comprise:
• Activities: continuous personalized feed of activities performed by others
• Wikis: easy creation and editing of interlinked web pages
• Blogs: log of regularly maintained entries of commentary, events, and other material,
such as documents and videos
• Discussions: threaded conversations on content objects or collections

2.4.2 Workflows
The content workflow is a sequence of connected tasks applied to a content object (folder,
document, graphic, form, etc.). Each task can be performed by a person, a group, or
automatically by the system.
The system will provide a graphical workflow modeler for power users and a powerful workflow
scripting tool for developers. Administrators and power users will be able to change or re-route
workflows based on their privileges.

Client Story: We implemented a full workflow system for a major Educational Publishing
company using Alfresco’s Workflow engine. The complex workflow involved multiple content
types (text, graphics, audio, video, and interactive applications), and multiple locations – text
and graphics content in San Francisco, audio/video content in Los Angeles, and interactive
applications in the Philippines, India, and Costa Rica. The workflow system allowed the
geographically disparate teams to work efficiently and effectively.


2.5 System Integration and Content Distribution
Content will be distributed to multiple channels, packaged as XML files with all referenced
assets. Data and metadata will be packaged as XML or JSON, as accepted by the downstream
system. The solution will package published content for easy accessibility and delivery to the
agency’s website and any other digital delivery platforms.
Our preferred architectural approach for content distribution and integration is to use RESTful
APIs and services. The benefits of a RESTful architecture for integration are increased
performance and network efficiency, scalability, simplicity, modifiability of components,
visibility of communication, and reliability through resistance to failure at the system level.
All major ECM products, including SharePoint, Alfresco, and EMC Documentum, provide REST
APIs that allow us to communicate with the content application server using HTTP/HTTPS-based
resource-oriented interfaces. These APIs expose multiple resources, for instance: queries;
workflow operations (start/cancel/resume, set states); CRUD (Create, Read, Update, Delete)
operations on documents, users, groups, directories, and metadata; and posting content to
publishing channel queues.
We will write custom REST services in Java or .Net for any integration requirements that are not
covered by the default REST APIs of the ECM system. A list of custom services will be
provided as part of the ePublishing Solution System Integrations Document.

[Diagram: the ECM Repository Layer (document/file system, database) exposes REST APIs/services to a Service Bus/Orchestration Layer (queueing, messaging, monitoring, security, transformation), which exchanges XML/JSON over HTTP/HTTPS with the REST/SOAP/WSDL services of Systems 1 to 4.]

Figure 8 – System Integration Architecture


We will use a lightweight Enterprise Service Bus (ESB) to manage all information exchanges,
ensuring complete logical business transactions across systems, even accounting for periods of
system unavailability. Not all components of the ESB may be required if the integration points
are not complex and their number is not expected to increase over time. Using
an ESB allows us to provide high availability, monitoring, and robustness for the integration
solution. Adding integration points in the future also becomes significantly simpler.
Our plan of approach is as follows:
1. Create the ePublishing Solution System Integration Document, listing the integration
points, data/content to be transmitted, available web services for each integration point,
requirements for new web services/APIs, and technical limitations/constraints, if any.
This document will also include any authentication data or protocols needed to access the
systems via a web service.


2. Construct a test plan for communication and systems integration testing.
3. Configure or write new RESTful services for our ECM solution, based on the
documented requirements. Integrate a transaction queue to hold pending transactions.
4. For the downstream systems, use any services that already exist, whether REST,
SOAP, or other web services. If they do not exist, we will write services for these systems,
provided we have the required access.
5. Implement and configure an open-source ESB solution such as Mule to the required
specifications. We prefer Mule ESB for most scenarios, but can use any other ESB
solution preferred by the government agency.
6. Provide monitoring of successful and failed transactions exchanged between the systems.
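The transaction queue and monitoring steps in the plan above can be illustrated with a small Python sketch. It holds pending transactions, retries them when a downstream system is unavailable, and records successes and failures for monitoring. The class name, the `send` callable, and the retry limit are assumptions for illustration; an ESB such as Mule would provide equivalent behaviour through its own configuration rather than through code like this.

```python
from collections import deque


class TransactionQueue:
    """Holds pending transactions and retries them after delivery failures."""

    def __init__(self, send, max_attempts=3):
        self._send = send  # callable(payload); raises ConnectionError on failure
        self._max_attempts = max_attempts
        self._pending = deque()  # (payload, attempts so far)
        self.succeeded = []
        self.failed = []

    def submit(self, payload):
        self._pending.append((payload, 0))

    def process(self):
        """Try each pending transaction once; requeue failures until attempts run out."""
        for _ in range(len(self._pending)):
            payload, attempts = self._pending.popleft()
            try:
                self._send(payload)
                self.succeeded.append(payload)
            except ConnectionError:
                if attempts + 1 >= self._max_attempts:
                    self.failed.append(payload)
                else:
                    self._pending.append((payload, attempts + 1))

    def pending_count(self):
        return len(self._pending)
```

Calling `process()` on a schedule lets transactions survive periods of downstream unavailability, while the `succeeded` and `failed` lists give the monitoring step something to report on.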
Client Story: Our ECM solution for a major publisher includes multiple integration points:
(1) product data from the financial system (JD Edwards) is pulled in when a new product is created
in their Content Management System; (2) on publish, a transaction is sent to the legacy print
repository, summarizing attributes of the document, including linkage information to the
document’s physical location; (3) XML content is sent to a staging server for consumption by the
public-facing subscription website.

2.6 Financial Ledger
A financial ledger is not standard functionality in most ECM products. This functionality is
custom-built as an extension to the core ECM solution. The financial ledger capability allows
users to:
• load initial balances and incremental funding for proponent accounts, manually or by
importing an MS Excel spreadsheet,
• adjust balances at the beginning of each fiscal year,
• check available balances in proponent accounts when print jobs are estimated,
• load transactions from an MS Excel spreadsheet, and identify any errors or data mismatches.
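A minimal Python sketch of the ledger operations listed above follows. The class name, account identifiers, and row format are hypothetical; rows are assumed to have already been read from a spreadsheet into `(account, amount)` pairs, with amounts passed as strings so that `Decimal` can represent them exactly.

```python
from decimal import Decimal


class ProponentLedger:
    """Tracks balances per proponent account using exact decimal arithmetic."""

    def __init__(self):
        self._balances = {}

    def load_funding(self, account, amount):
        """Load an initial balance or incremental funding."""
        self._balances[account] = self._balances.get(account, Decimal("0")) + Decimal(amount)

    def available(self, account):
        return self._balances.get(account, Decimal("0"))

    def can_cover(self, account, estimate):
        """Check the available balance against a print-job estimate."""
        return self.available(account) >= Decimal(estimate)

    def load_rows(self, rows):
        """Load (account, amount) rows; return (row number, message) for rows that failed."""
        errors = []
        for line_no, (account, amount) in enumerate(rows, start=1):
            try:
                if not account:
                    raise ValueError("missing account")
                self.load_funding(account, amount)
            except (ValueError, TypeError, ArithmeticError) as exc:
                errors.append((line_no, str(exc)))
        return errors
```

Rows that fail are reported rather than silently skipped, which matches the requirement to identify errors and data mismatches during import.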

2.7 Reporting
• Content Reports: reports on content lifecycle states, repository volume,
incomplete content, ‘orphaned’ content, etc.
• Workflow Reports: status of publication workflows, actions performed by users,
delayed tasks, tasks pending for longer than pre-set thresholds, overloaded users, etc.
• Financial Reports: daily transactions listing all funds coming in to the agency or
going out for each proponent account, an accounting of all activity for the fiscal year for
all proponent accounts (or an individual proponent), and transactions by funding type (direct,
postal, reimbursable).
• Custom Reports: query engine, graphs, formats, and transmission media.
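Two of the workflow reports listed above, tasks pending longer than a threshold and overloaded users, can be sketched in Python as follows. The task dictionary fields (`assignee`, `assigned`, `completed`) and the five-day default threshold are assumptions for illustration; a real report would query the ECM system's workflow tables or APIs.

```python
from datetime import datetime, timedelta


def overdue_tasks(tasks, now, threshold=timedelta(days=5)):
    """Return open tasks pending longer than the threshold, oldest first."""
    late = [
        t for t in tasks
        if t["completed"] is None and now - t["assigned"] > threshold
    ]
    return sorted(late, key=lambda t: t["assigned"])


def open_tasks_per_user(tasks):
    """Count open tasks per assignee, to help spot overloaded users."""
    counts = {}
    for t in tasks:
        if t["completed"] is None:
            counts[t["assignee"]] = counts.get(t["assignee"], 0) + 1
    return counts
```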

Client Story: A major Education technology customer needed a real-time Assessment Reporting
system that would allow teachers and school administrators to get reports on the relative
performance of students in their classrooms. Student actions are tracked at a granular level, as
they use electronic content provided by the company. This data is growing exponentially, and
currently stands at multiple TBs. We used advanced Big Data technologies to provide useful
reports that teachers can use to identify weaknesses and strengths for individual students and
groups, and tailor their pedagogy accordingly.


2.8 User Interface
The User Interface is a key element of a successful CMS and authoring tool system. Our solution
contains the following services to support this architecture element:
• Personal Workspace: The personal workspace is the section of the UI where users can
access content objects that they are working on or plan to work on. It also contains
content objects that have not been committed to the repository yet, i.e. content that is
private.
• Shared Workspaces: A user can have access to multiple shared workspaces. These can
be shared between user groups, ad-hoc collaborative units, or corporate subdivisions
(departments, etc.). Shared workspaces can be set up with varying permission sets for
users.
• Assigned Tasks: This section of the interface lists all workflow tasks assigned to the user,
and associated data such as due date and completion level. This is like an enhanced inbox for
the user.
• Dashboard: The Dashboard provides the user quick and easy access to reports, updates,
tasks, activities, and notifications. Dashboards can be personalized to present the
information most useful for the user.
• Notifications: Notifications can be triggered by new task assignments, due date
warnings, or lifecycle changes. Notifications can be displayed in the UI (Inbox) or can be
automatically emailed to the user.
• Administrative Tools: The user interface provides access to administrative tools based on
a user’s access level.

2.9 Choice of CMS: Alfresco vs Documentum vs SharePoint vs Content Manager vs Nuxeo
There is a range of enterprise CMS products on the market, and while we have evaluated almost
all of them, there are five that we have worked with extensively for our clients: Alfresco, EMC
Documentum, SharePoint, IBM Content Manager, and Nuxeo. Based on our hands-on experience with these
five products, we have come up with the following comparison chart:

Criterion                         Alfresco One  EMC Documentum  SharePoint  IBM Content Manager  Nuxeo
Ease of Use                       5             4               4           3                    5
Granular Security Model           5             5               5           4                    4
Personalization                   5             3               3           3                    5
Configurability and Flexibility   5             4               3           2                    5
Cross-platform capabilities       5             4               3           1                    5
SSO Support                       5             5               5           5                    4
Workflow/Lifecycle Management     5             5               5           3                    4
Management/Admin Tools            5             5               5           4                    4
Functional Coverage               5             5               4           3                    4
Scalability                       5             5               5           5                    3
Integration with other systems    3             4               4           3                    3
Enterprise volume management      4             5               5           4                    3
Total Cost of Ownership           4             3               3           2                    5
Documentation and Support         5             5               5           4                    3
Reporting                         5             5               5           4                    4
Collaboration Tools               5             4               4           2                    5
5015.2 Compliance                 5             5               5           5                    1
Cloud Migration                   4             4               5           3                    4
Figure 9 - CMS Product Comparison
Note: The products are rated on a scale of 1 (worst) to 5 (best).

Nuxeo is the most flexible solution. However, it is relatively immature and not sufficiently
robust for most government requirements. It is also not 5015.2 compliant.
IBM Content Manager is an expensive and relatively inflexible solution that ties the customer to
IBM’s ecosystem. The core technology is also largely legacy and not easily supported or extended.

EMC Documentum is the most scalable and robust solution. It is one of the leading
best-of-breed solutions according to Gartner’s Magic Quadrant. However, it compares relatively poorly to
Alfresco on ease of use and cost of ownership.
Alfresco has a functionally complete ECM solution, and is cheaper and more flexible than
Documentum. However, it compares relatively poorly in terms of total installed base and
resource availability.
SharePoint is a mature solution and offers the best integration with Microsoft’s Office Suite. It
is also easily set up on Azure cloud infrastructure. However, it is restrictively tied to
Microsoft’s ecosystem, and its cost of ownership is higher than that of Alfresco and
Nuxeo.

From our experience, the choice of ECM solution for the government’s needs narrows down to
one of Documentum, Alfresco, or SharePoint. A number of our publishing industry clients
(Pearson Education, John Wiley and Sons, Wolters Kluwer Law and Business) use one of these
three products for different applications. We have successfully implemented robust, user-friendly
solutions using all three technologies.
1. Alfresco, Documentum, and SharePoint have a Java or .Net code base that can be
extended. However, there is a higher degree of customization flexibility available with
Alfresco.
2. Documentum and SharePoint are the more mature solutions with larger user bases.
3. All three ECM products have DoD 5015.2 certification.
4. Documentum and Microsoft are two of the Leaders, and Alfresco is one of the Visionaries,
among ECM products in Gartner’s Magic Quadrant.

The final choice of ECM product will come down to the specific requirements of the
agency. We can provide further insight into the relative capabilities of these technologies and
share some of our experiences with all of them; contact us for a free consultation.


3 Editing and Formatting Tool

Figure 10 - Editing and Formatting Architecture


3.1 Core Word Features
Our solution is based on customizing the robust features of Microsoft Word. Core MS-Word
functionality can satisfy a number of the requirements specified in the RFP.

3.2 Templates
In addition, a set of pre-configured MS-Word templates will be available for the
different publication types. These templates will impose a content model on the documents,
resulting in the generation of valid XML.

3.3 Extensions and Plugins


A number of Word extensions and plugins can be developed using VBA scripts. These are
intended to add features to core MS-Word functionality, making the authoring, editing and
formatting of documents faster, more efficient, and more consistent. Each extension and plugin
will also have a corresponding button on the Ribbon or a new custom menu item.

3.4 Ribbon Buttons and Menu Items


Extensions and plugins built for the users will be exposed to the users through custom buttons on
the MS-Word ribbon, and/or custom menu items – through use of the RibbonX API.

Client Story: We built a question authoring tool for the Higher Education group at a major
publisher. This tool is a hardened template-based authoring tool for creating Questions and
Answers that adhere to the QTI standard. The tool allows authors to effectively create QTI-
compliant Q&As that are used in their textbooks and tests.


3.5 Browser-based Online Authoring Tool
The browser-based authoring tool allows for content capture, workflow status maintenance,
routing, and collaboration, all within a web browser. The key components of
the authoring tool are:
• XML-First Editor: The browser-based authoring tool is essentially a user-friendly XML
editor. It allows users to edit content within the constraints of the XML
schema. Quantilus has built a similar tool for Frommers.com (since acquired by Google)
that we will re-purpose for this project.
• Change of Workflow: When content is edited in the browser-based tool, the user can
choose to automatically update the associated MS-Word document, so that they are essentially
editing the same file. They can also choose not to update the base Word document and
make a copy instead. Any content updated in this way, where the source Word document no
longer matches the edited content, will be flagged in the ECM system. This is useful in
cases where users want slightly different content (say, for print and web) but
want to maintain a link between the documents.
• Real-time Collaboration: The browser-based authoring tool allows multiple users to
simultaneously edit and format a given document.
Client Story: We built an online authoring tool for a company that prepares SEC EDGAR filings for
public companies. EDGAR filings are very large (200+ page) documents, and their size posed a challenge
when loading them in a browser for editing. We solved the problem by loading the
documents as a component tree, where sections or paragraphs could be selected and edited
individually. The system currently allows up to 5 users to work simultaneously on different parts
of the same document, greatly increasing the efficiency of the process.

3.6 BluePencil Content QA Tool
BluePencil is Quantilus’ proprietary Content QA tool, built as a service that runs
multiple types of validation rules against a document and generates an error report. BluePencil
has the following components:
• Style Checker: uses Natural Language Processing to check
documents for stylistic errors (as opposed to grammatical errors, which are easily
identified in MS-Word). For example, it can point out sentences written in passive voice,
the use of double negatives, etc. The stylistic rules are completely configurable.
BluePencil will be configured to check documents against AR 25-30 and DA PAM 25-40.
• XML Schema Checker: validates XML documents against the defined schema and
Schematron rules.
• Business Rules Checker: checks documents against business rules. For example, it will
generate errors if a pre-specified word count or page count is exceeded, or if the full
name of the author is not included. The business rules are completely configurable.
• Link Checker: checks all links in the document for validity and existence, and generates
an error report for broken links.
BluePencil will be called as a service from within MS-Word, through a custom button on the
Ribbon, or from the browser-based editing/formatting tool as a menu item.
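The idea of a single service that runs several checkers and consolidates their findings into one report can be sketched in Python as below. This is not BluePencil's implementation: the function names, the word-count limit, and the regular-expression stand-in for a style rule are all hypothetical, and a real style checker would rely on proper NLP rather than pattern matching.

```python
import re


def check_word_count(text, max_words=500):
    """Business rule (hypothetical limit): flag documents over a word count."""
    words = len(text.split())
    if words > max_words:
        return [("error", f"word count {words} exceeds limit {max_words}")]
    return []


def check_double_negative(text):
    """Very rough style rule: flag 'not un...' phrases."""
    return [
        ("warning", f"possible double negative: {m.group(0)!r}")
        for m in re.finditer(r"\bnot un\w+", text, flags=re.IGNORECASE)
    ]


def run_qa(text, checkers):
    """Run every checker and merge the findings into one report."""
    report = []
    for checker in checkers:
        for severity, message in checker(text):
            report.append(
                {"check": checker.__name__, "severity": severity, "message": message}
            )
    return report
```

Because each checker is just a function that returns `(severity, message)` pairs, new rule types can be added without changing the code that assembles the report.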
At a high level, content QA comprises the following tasks:

• XML Integrity Check: ensures that the XML submitted to the CMS is well-formed
(this is not the same as validating it against a schema or a DTD).
• XML Rules Validation (XRV): ensures that the XML submitted to the CMS conforms to the
business and grammar rules defined for the DTD or schema that it is based on (viz. WML
2.1, WML 3G).
• Link Validation: ensures that if part of the content points (is hyperlinked) to another
part of the same publication, the target XML exists.
• Image Validation: ensures that if XML content points to external binary assets (images,
Flash objects, etc.), the referenced objects exist and can be found at
the specified location.
• Rendering: a rendering is the XHTML rendition of the XML content after it
is transformed using an XSL template.
• Pattern Search: an interactive way of testing the contents of the XML.
Although the XRV service implements the business and grammar rules, certain
rules are not (or should not be) coded into the XRV service but still need to be
tested. Pattern search is the ability to search for certain values in certain tags in
the content XML. It provides a kind of name-value pair search over the XML content in a
given publication.
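
Three of the tasks above (the integrity check, link validation, and pattern search) can be sketched with Python's standard xml.etree.ElementTree module. This is an illustrative sketch rather than the services' actual code: the function names are hypothetical, and the element and attribute names in the sample (publication, section, link, id, target) are invented for the example and are not taken from the WML schemas.

```python
import xml.etree.ElementTree as ET


def parse_if_well_formed(xml_text):
    """XML integrity check: return the root element, or None if the text is malformed."""
    try:
        return ET.fromstring(xml_text)
    except ET.ParseError:
        return None


def broken_internal_links(root):
    """Link validation: list link targets that match no id in the same publication."""
    known_ids = {el.get("id") for el in root.iter() if el.get("id") is not None}
    return [el.get("target") for el in root.iter("link")
            if el.get("target") not in known_ids]


def pattern_search(root, tag, value):
    """Pattern search: return elements with the given tag whose text contains value."""
    return [el for el in root.iter(tag) if value in (el.text or "")]


# Hypothetical sample publication; the second link points at a missing section.
SAMPLE = """<publication>
  <section id="s1"><title>Scope</title><para>See <link target="s2"/>.</para></section>
  <section id="s2"><title>Policy</title><para>See <link target="s9"/>.</para></section>
</publication>"""

root = parse_if_well_formed(SAMPLE)
print(broken_internal_links(root))                    # prints ['s9']
print(len(pattern_search(root, "title", "Policy")))  # prints 1
```

Note that the integrity check only tests well-formedness; validation against a DTD, schema, or Schematron rules is a separate step that the standard library does not provide.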

Client Story: We initially built BluePencil for a major journal publishing customer. The stylistic
rules varied by publication type – Architecture, Accounting, Culinary, Scientific and
Technical Journals, etc. BluePencil uses advanced Natural Language Processing technologies to
process the content against the rules. Like most NLP-based algorithms, the system “learns” or
improves at pattern recognition as more content is processed by it.

3.7 On-Demand Preview Service


The preview can be initiated from either MS-Word or the online authoring tool. The service will generate
a PDF for Print previews and an HTML viewer for Web previews of the content.


4 Migration, Deployment and Security

4.1 Content Migration
Most government agencies have large existing content repositories – typically multiple
terabytes in size. The files are in multiple formats: PDF, XML, MS Word, Text, Excel, Lotus, and
graphics files, to name a few. The files may also have metadata elements stored in multiple
databases. Both the metadata and the content have to be migrated to the ECM solution as part of every
project.
Both Alfresco and Documentum have robust content migration APIs. However, some
custom scripting is required for specific use cases. We use custom migration scripts combined
with extensive automated and manual testing to migrate content to the new platform.
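
One common form of automated migration testing is a checksum comparison between the source files and the files read back from the new repository. The Python sketch below illustrates the idea under stated assumptions: it assumes both sides are available as directory trees (for example, a repository export), and the function names sha256_of, build_manifest and compare_manifests are hypothetical. It does not use the Alfresco or Documentum APIs.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Hash a file in 1 MiB chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root_dir):
    """Map each file's path (relative to root_dir) to its SHA-256 digest."""
    root = Path(root_dir)
    return {p.relative_to(root).as_posix(): sha256_of(p)
            for p in sorted(root.rglob("*")) if p.is_file()}


def compare_manifests(source, target):
    """Return (missing, changed): files absent from target, and files whose bytes differ."""
    missing = sorted(set(source) - set(target))
    changed = sorted(k for k in source if k in target and source[k] != target[k])
    return missing, changed


# Illustrative manifests with made-up digests; real ones would come from build_manifest().
source = {"a.xml": "111", "b.pdf": "222", "c.doc": "333"}
target = {"a.xml": "111", "b.pdf": "999"}
print(compare_manifests(source, target))
```

A byte-level comparison like this only applies to files that are migrated unchanged; content that is converted during migration needs content-level checks instead.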

Client Story: We implemented Documentum for a major Legal Publishing group. As part of
the project we migrated their entire content repository from the disparate systems of multiple
editorial pagination vendors in Israel and India. Even though the Publisher owned the content,
the definitive and final content (as XML packages and PDF files) was actually held by the last
vendor to touch it. This was aggregated and centralized into the Documentum repository.

4.2 Content Conversion


Government agency document repositories have documents in various formats created at
different points in their evolution – formats include XFDL, SGML, Word, XML, etc. We
generally convert XFDL documents to PDF using automated batch processes. We also convert
SGML documents to XML. Also, if the XML DTD/Schema has been modified, we can convert
the content to reflect the updates.
We have extensive experience with converting content through automated scripts. In most
cases, we are also able to automate the testing of converted content. However, in certain
situations, extensive manual testing is essential to verify the accuracy and completeness of
the converted content.
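
As an example of what automated testing of converted content can look like, the Python sketch below checks that an XML document carries the same text before and after a schema change, regardless of how the markup was renamed or restructured. It is a simplified illustration, not our production test suite; the function names and the sample element names are hypothetical.

```python
import xml.etree.ElementTree as ET


def normalized_text(xml_text):
    """Concatenate all text nodes and collapse whitespace, ignoring the markup itself."""
    root = ET.fromstring(xml_text)
    return " ".join("".join(root.itertext()).split())


def text_preserved(before_xml, after_xml):
    """True if both documents carry the same text once markup is stripped."""
    return normalized_text(before_xml) == normalized_text(after_xml)


# Hypothetical documents: same sentence, different element names and nesting.
old = "<doc><p>Forms are   filed annually.</p></doc>"
new = "<document><section><para>Forms are filed annually.</para></section></document>"
print(text_preserved(old, new))  # prints True
```

A check of this kind catches dropped or altered text but not structural mistakes (for example, a heading converted to a paragraph), which is one reason manual review remains necessary in some cases.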

4.3 Infrastructure and Deployment


The infrastructure setup and deployment process is a critical aspect of the project, especially
considering the multiple environments across which we will need to securely manage the software builds.
It is possible that the server configurations and policies in the government domain will be
restrictive for some aspects of our code base. To minimize this risk, it is ideal for the
government to provide us with a Virtual Machine with identical server configurations and
policies.
The following image illustrates the infrastructure setup we propose for government projects.


Figure 11 - Deployment Strategy


The primary development environment is hosted in the Quantilus offices. The development
and QA teams for the core solution will be located at the Quantilus offices. The teams will use
GitHub for source control, and will set up continuous integration on the Development server.
The Quantilus Test environment will be updated from the main code branch once it has passed unit testing.
At the end of each 2-week Sprint, the tested and approved code base from the Quantilus Test environment
will be deployed to the Test environment set up in the government domain. The Quantilus
environment and the government environment will be kept completely firewalled from each
other.
Other activities that will happen within the government domain will include:
1. Data Migration: The developer and QA resource working on the migration will work
onsite.
2. QA in the government domain: Code deployed on the Test environment will be tested by
our QA resource, as well as by government personnel as required.

4.4 Operations and Maintenance


Content Model Change Committee – A mature publishing organization needs to have a
living content model. Each new publication or revision may raise the need for new XML
elements, rule changes, and metadata updates. However, changes to the content model must be
tightly controlled for the following reasons:
• Backward compatibility is essential to support all previous publications.
• The downstream impact to systems and processes must be carefully evaluated before
each change is authorized.
• The content model should be kept manageable as far as possible.

As such, a Content Model Change Committee must be set up to evaluate each proposed
change, and to oversee the implementation of the approved changes. The committee will
comprise Subject Matter Experts and a rolling set of Editors and Authors. It is essential that
respect for the potential impact and power of the content model be institutionalized.

Software Upgrades – For periodic maintenance releases for custom code, a Build/Release
process will be established. A Change Board will be created to approve change requests (tickets)
for development, and to approve test results for completed tickets (batched in builds) prior to
deployment to Production.

Figure 12 - Change Process Workflow


All COTS vendors release periodic updates to their software products. For each upgrade, a
careful evaluation must be done upfront to analyze the potential impact on all customizations
made as part of the project. A full regression test cycle must be performed on a test environment
prior to deploying the upgrade on the Production environment.

Client Story: We set up the build-release process for a major Homebuilder as part of their
Sarbanes-Oxley compliance project. We helped the client document and implement process
controls for compliance. The build-release process for their Financial system was a key part of
their controls.


Hardware Updates – Our Operations Support personnel will set up monitoring on the
Production environment to constantly evaluate the status and performance of the servers.
Hardware may be updated in the following scenarios:
• Degradation of performance over time.
• Proposed software updates necessitating hardware changes.
• Deprecation of support for current hardware stack necessitating replacement.
Our Operations Support personnel will work closely with government infrastructure personnel if
hardware needs to be replaced or upgraded. We will involve our IA personnel to ensure that all
security policies are maintained.

4.5 Information Assurance Support


Information Assurance (IA) is a key part of any ePublishing Solution we provide to the
government. Our approach for Federal and DoD clients is based on the Risk Management
Framework (RMF) for DoD Information Technology (IT), DoDI 8510.01 and the DoD
Information Assurance Workforce Improvement Program, DoD 8570.01-M.
As such, we will provide the expertise necessary to perform the following functions:
1) Identify the type of incident when one occurs;
2) Develop and execute an effective incident response and plan of action;
3) Provide trouble ticketing and trouble tracking support;
4) Provide security “best practice” and “good housekeeping” support to include the
performance of system assessment and vulnerability scanning, testing and mitigation;
5) Provide system accreditation and certification planning and implementation support
to include program protection plan development and evaluation; and
6) Develop procedures and capabilities to guard against Distributed Denial of Service
(DDoS) attacks and to address other cyber-related activities and mandates.

Risk Management Framework

[Figure: RMF cycle. Inputs: Architecture Description; Organizational Inputs.]
Step 1: Categorize Information System
Step 2: Select Security Controls
Step 3: Implement Security Controls
Step 4: Assess Security Controls
Step 5: Authorize Information System
Step 6: Monitor Security Controls

Figure 13 - DoDI 8510.01 Risk Management Framework (RMF)


Our team applies the DoDI 8510.01 six-step RMF process to develop and support security
systems. Regular activities in this process include:
• Production system vulnerability assessment;
• Network and operating system hardening;
• External and internal threat assessment;
• Repository level security analysis;
• Backup and disaster recovery analysis;
• Mobile/edge device security support analysis;
• Applicable standards and procedures compliance analysis
(e.g. DoDI 8510.01, AR 25-1, AR 25-2, NIST SP 800-53, etc.).


5 Management, Training and Support

5.1 Quantilus Publishing Center of Excellence
Our ePublishing solutions are designed to take advantage of the fundamental building blocks and
services that support all the major publishing houses in the US, including Pearson, McGraw-Hill,
Wolters Kluwer, John Wiley and Sons, and others. These companies know that the lifeblood of their
business is content management and authoring tools. To achieve maximum efficiency and
optimal products, these firms have utilized over 80 software professionals from Quantilus over
the past several years.

The government can take advantage of our team’s core CMS knowledge and proven software
solutions. Our proposal includes several software components that emanate from Quantilus’
Center of Excellence (COE). The COE represents a large cadre of software professionals who
have decades of experience with Publishing systems and processes. Quantilus has built up the
largest and best collection of combined knowledge in this field. Our teams are frequently spread
out across our clients, but we have established a process for centralizing the aggregate domain
knowledge and experience so all of our teams can benefit.

Quantilus’ Publishing Center of Excellence is a team of experts with specialized experience in the
Publishing industry who work together to develop and promote best practices in Publishing
technology. Our COE provides subject matter and technical guidance to all of our
implementation teams in the field.

Figure 14 - Quantilus Center of Excellence Overview

The Quantilus Center of Excellence is available to all our project teams for the entire duration of
the project at no additional cost for the client. The government will be able to derive all of the
benefits of our combined experience through this model. The COE produces “solution
accelerators” which are software modules ready to be utilized within an overall technical
architecture. One solution accelerator from the COE is BluePencil. This accelerator is Quantilus’
proprietary Content QA tool that was built as a service to run multiple types of validation rules
against a document, and generate an error report. BluePencil has the following fully configurable
components:
• Style Checker – uses advanced concepts of Natural Language Processing to check
documents for stylistic errors (as opposed to grammatical errors, which are easily
identified in MS-Word).
• XML Schema Checker – validates XML documents against the defined schema and
Schematron rules.
• Business Rules Checker – checks documents against business rules.
• Link Checking – checks all links in the document for validity and existence. Generates an
error report for broken links.

5.2 Software Development Approach


Quantilus projects use an Agile Scrum development methodology. The agile approach emphasizes
iterations with constant end-user involvement and review, and deliverables focused on
specific project tasks and requirements.

Multiple agile sprints are required to release a deliverable for a project. Sprint lengths typically
range from 2 to 4 weeks. For most projects we propose a sprint length of 2 weeks.
The basic steps of a sprint are:
• User Story Estimation – A meeting at the beginning of each sprint where the team
commits to a sprint goal. The team also identifies the requirements that support this goal and
will be part of the sprint, and the individual tasks needed to complete each
requirement.
• User Story Assignment – Each person on the team is given a specific set of tasks and
the associated work products to be completed for each task.
• Daily Stand Up – A 15-minute meeting held each day of a sprint, where development
team members state what they completed the day before, what they will complete on the
current day, and whether they have any roadblocks.
• End of Sprint Review – A meeting at the end of each sprint, introduced by the product
owner, where the development team demonstrates the working product functionality it
completed during the sprint.
• Sprint Retrospective – A meeting at the end of each sprint where the scrum team discusses
what went well, what could change, and how to make any changes.
The following diagram shows the relationship of these steps:

Figure 15 - Agile Development Lifecycle


Each sprint includes constant coordination, collaboration and review with the client, as shown by
the different colors before the green arrow. After a sprint is completed, there is a feedback cycle
to ensure any issues do not appear again in future sprints.

5.2.1 Tools Used


Our team will employ standard industry tools such as Eclipse for CMS (Java) development and Visual Studio for
authoring tool extensions.
Source Control: BitBucket
Task and Issue Management: JIRA
If the Federal government agency has other standard tools that it would like the
project team to use for source control or task management, we will be glad to comply. Other
source control tools that our teams have used include (but are not limited to): GitHub, CVS,
SVN, ClearCase. Other task management tools that our development teams have used include
(but are not limited to): Wrike, Mantis, Rally, Trello, Remedy, ClearQuest, BugZilla.

5.2.2 Program/Project Management Support


Our project management expertise runs throughout the lifecycle of any project. It begins with
the building and mobilization of the project team - ensuring that the project vision and goals are
clearly articulated and boundaries clearly defined. Delivery is assured via detailed planning, cost
control and proactive management of risks and issues. Our PMs establish clear and unambiguous
communication channels to create a smoother route to success.
Our PMs drive and manage the project through the following lifecycle stages:

• Project start-up
• Planning
• Cost control
• Change control
• Risk, interface and issue management
• Resource Scheduling
• Document management
• Communications and assurance
• Benefits and performance measurement

Our Project Managers benefit from ongoing training and knowledge sharing through our Center
of Excellence and ongoing collaboration sessions with our teams.
Our Project Managers will provide the government visibility on all ongoing tasks and upcoming
milestones, helping make informed decisions and reduce risks in order to increase the likelihood
of success. Our PM reports offer the following benefits:

• Enable informed budget decision making
• Simplify access to key risks and opportunities
• Allow effective project prioritization
• Facilitate resource management
• Provide clear and concise visual progress tracking

Our Project Managers are credentialed: all of them hold PMP or Scrum Master
certifications.

5.3 Quality Control


The Agile Scrum methodology followed by our project teams emphasizes that code must be
tested on an ongoing basis. The goal of each Sprint is to deliver a “shippable” product – and that
means that the output of each Sprint must be thoroughly tested. The aim is to identify and fix
bugs when they are introduced, rather than waiting until they are caught at a later testing stage.
To facilitate this, Testing specialists are embedded in the Sprint teams. They work
closely with the developers to unit test every new piece of functionality. Since the code base grows with
every Sprint, the Testing specialists will also automate baseline test cases as they go along. This
ensures that the incremental amount of testing required per Sprint does not become excessively
large.

Figure 16 - Quality Control Workflow


This diagram lays out the testing process for User stories within a Sprint. Each user story is
tested and passed before it is merged into the main develop branch of the code base. The develop
branch is then tested for Sprint release.

5.4 Help Desk Support


If required by the government agency, we can provide Tier I, Tier II and Tier III Help Desk
support for the project. We provide a user interface for Tier I Help Desk to accomplish basic user
administration tasks including (but not limited to) new user set up; establishing and modifying
user roles and access privileges; and resetting user passwords.

For Tier II and Tier III support, we will deliver all necessary services, staff, and expertise to
operate and maintain Help Desk functions including ticketing, system and operational support
and troubleshooting, onsite or remotely as required by the technicians. Our team will support
Incident, Problem, and Request Management processes using business best practices.

5.5 Training
Quantilus has extensive experience in the training domain. We have built MOOC platforms for
delivering training simultaneously to thousands of users globally. We have also built innovative
mobile solutions for K-12 education.
For our ECM projects, we set up train-the-trainer sessions to empower the agency to take control
of its training processes. We also set up an online training platform for ongoing training, as
required by the customer. The ongoing training can be live or pre-packaged, as desired.

Client Story: As part of our project for a major Comic book publisher, we provided Train-the-
trainer support for teams working on meta-tagging their extensive corpus of comic books. The
project involved writing synopses for, and tagging characters and situations in over 30,000
comic books. A large team had to be spun up onsite and offshore to handle this, and quality was
ensured through extensive training and re-training. The teams were trained on the workflow
system and on the processes and nuances of tagging the books.
