
SECTION 1

Enterprise Search


In This Section...

1. Search Components Overview
2. Infrastructure Search Techniques
3. Functional Search Techniques
4. Search Engine Optimization
5. Information Correlation and Comparison

Search Components Overview

Search is the art and science of making content easy to find; thus, it may be better termed "findability," a word popularized by Peter Morville in his book Ambient Findability.

The art refers to Language Arts -- specifically, the leveraging of software that can parse, diagram, and/or infer meaning from captured content. The art also refers to the development of a user interface that makes the retrieval process intuitive and responsive.

The science is Library Science, including techniques such as metatagging, categorization, and taxonomies, all of which have to do with information organization and are key to efficiently getting content back out of an information management system once it has been put in.

Findability is far more than just typing something into a search box and getting a result. It's also about discovering things about a topic that you didn't necessarily know you were looking for, and thus it includes elements of browsing and discovery as well.

Enterprise search is how your organization helps people seek the information they need in any format, from anywhere inside their company -- in databases, document management systems, on paper, wherever.

Now, a lot of people think that because the tools for this are so good, they don't have to organize their information at all. The normal reaction is "we'll just use Google on everything and find stuff that way."

The problem is that this approach does nothing to minimize the risks associated with losing things, not knowing which of multiple copies are the "real" ones, and so forth. So as highly efficient and effective as Google and its brethren are, they are not magic bullets to be fired into repositories in the hope they leave order in their wake.

Enterprise search is made up of several subsystems. Here is how the Real Story Group describes these subsystems:

What happens first is that a "crawler" crawls directories and websites, extracts content from databases and other repositories, and arranges for content to be transferred to it on a regular basis so it can notify the search engine that new information is available.

Next, a searchable index is created, and other value-added processing, such as metadata extraction and auto-summarization, may take place. These functions group information into logical categories that in turn can be searched and return results to users based on how the particular search engine has categorized them.

Once this index is created, queries can then be accepted. Queries aren't necessarily questions; they can also be just terms or phrases that represent whatever you're looking for, typed into the search box.

At this point, the search engine processes the query by passing over the index, finding the information that matches the particular term or subject entered, and sending that information to some sort of processor, which then sorts the information by relevancy or other measure, clusters it based on the categorization, and applies some other logic (such as "best bets" or "recommended best").

Last comes the formatting, which presents the results page that you're used to seeing, in whatever format you've chosen.

What's especially powerful about all this is that you can tweak every step in order to accommodate your particular organization's needs, from how the information is indexed, to what kinds of queries you want to accept, to what kinds of documents you want to return based on those queries. So understanding how search works -- and how to prepare information so it can be found -- is central to your information management success.


As noted previously, the crawler is usually the first major component to make its appearance. A program or script that periodically and methodically roams through your content -- or any outside content, for that matter, including on the Web -- it provides up-to-date data to the search engine being used.

Other terms for "crawler" are "spider" and "automatic indexer," and the task boils down to identifying a set of repositories, called seeds, that need to be crawled, then crawling them to identify all the links they contain, and then adding those linked pages to the list of places to crawl next. Relevant text and metadata are stored along the way to facilitate reuse and speed subsequent searches.
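
To make that seed-and-link loop concrete, here is a minimal Python sketch (not any particular product's crawler); fetch_page and extract_links are hypothetical stand-ins for the connectors a real crawler would provide.

```python
from collections import deque

def crawl(seeds, fetch_page, extract_links, max_pages=100):
    """Breadth-first crawl starting from a set of seed locations.

    fetch_page(url) -> page text; extract_links(text) -> iterable of URLs.
    Both are placeholders for whatever connectors a real crawler uses.
    """
    to_visit = deque(seeds)   # places still to crawl
    seen = set(seeds)         # avoid crawling the same page twice
    captured = {}             # url -> text stored for later indexing

    while to_visit and len(captured) < max_pages:
        url = to_visit.popleft()
        text = fetch_page(url)
        captured[url] = text  # keep relevant text for the index
        for link in extract_links(text):
            if link not in seen:   # newly discovered pages join the list
                seen.add(link)
                to_visit.append(link)
    return captured
```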

A search engine index is used much as a book or map index is: to tell users where what they seek can be found. Functionally, in our context, it collects, parses, and stores information about the pages the crawler finds so future searches can be performed more quickly. Without one, the search engine would have to search not only every bit of content that has to do with the particular keyword used, but every other piece of information it has access to as well, to ensure that it doesn't miss anything. Any number of specific disciplines can be used here in order to ensure effectiveness, including computer science, mathematics, informatics, linguistics, and even cognitive psychology, to get inside why people ask what they do and what they might really mean to find.
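
At its core, such an index is an inverted index -- a map from each term to the documents containing it. A minimal sketch, assuming the documents are plain strings keyed by an ID:

```python
from collections import defaultdict

def build_index(documents):
    """documents: dict of doc_id -> text. Returns term -> set of doc_ids."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

docs = {
    "d1": "red shoes with three inch heel",
    "d2": "fast red sports car",
}
index = build_index(docs)
print(index["red"])  # {'d1', 'd2'} -- found without rescanning every document
```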

A query engine is the computer program that actually searches
documents for a specified word or words and provides a list of
documents in which they are found.

Structured queries are those using Boolean operators like "and,"
"or," and "not." Unstructured queries are more ambiguous, such as
the plain language terms we generally use when we search
the Web. Query engines typically must be able to handle both.
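
Against an inverted index like the one sketched above, structured Boolean queries reduce to simple set operations, as this illustrative sketch shows (the tiny index is hard-coded for the example); unstructured, plain-language queries layer ranking and language processing on top of the same lookups.

```python
# A tiny index of the form produced by the previous sketch: term -> doc ids.
index = {
    "red":   {"d1", "d2"},
    "fast":  {"d2"},
    "shoes": {"d1"},
}
all_docs = {"d1", "d2"}

def boolean_and(*terms):
    """Documents containing every term: 'red AND shoes'."""
    sets = [index.get(t, set()) for t in terms]
    return set.intersection(*sets) if sets else set()

def boolean_or(*terms):
    """Documents containing any term: 'red OR fast'."""
    hits = set()
    for t in terms:
        hits |= index.get(t, set())
    return hits

def boolean_not(term):
    """Documents that do not contain the term: 'NOT fast'."""
    return all_docs - index.get(term, set())

print(boolean_and("red", "shoes"))  # {'d1'}
print(boolean_not("fast"))          # {'d1'}
```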

Human-powered directories are search listings that are compiled by human beings. Users can submit short descriptions to the directory of the sites they like, and/or editors can write them for the sites they review, as well as organize the search results. The search engine then looks for matches only in the descriptions. This can greatly speed the process, but the human element can make the results highly subjective until/unless enough people are involved to balance the perspectives out.

Hybrids, as the term suggests, combine crawler-based results with human-powered listings, thereby providing the best -- or perhaps the worst, if care isn't taken -- of both worlds.

Infrastructure Search Techniques

A homogeneous search engine uses the same search technology on many separate information sets. For example, separate instances of "search tool x" are embedded in three document management applications, your ERP system, and your corporate intranet. In this scenario, you'd be creating separate indexes for each application. The enterprise search single query interface would accept the query, broker it out to the separate application indices, and then merge the results set.

Federated search orchestrates search not only across multiple repositories, but involves separate and distinctively different search engines. For example, "search tool x" is used to index your document management system, "search tool y" indexes your e-mail, and "search tool z" indexes your web-based content. In the federated search model, one search engine is selected as the "master" or orchestrator. It accepts the query and passes it on to the heterogeneous search engines running within each repository. It then synthesizes and rationalizes the multiple results sets. A publicly available example of federated search is the Web site merlot.org, which provides a central facility for finding educational resources across multiple sites.
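
As a rough sketch of what the orchestrator does (the per-repository search functions and scores below are invented stand-ins, and a real federated engine would also normalize relevancy scores before merging):

```python
def federated_search(query, engines):
    """engines: dict of repository name -> search function.

    Each search function is assumed to return a list of
    (document_id, relevance_score) pairs from its own index.
    """
    merged = []
    for repo, search in engines.items():
        for doc_id, score in search(query):
            merged.append((repo, doc_id, score))
    # Synthesize one results set, best matches first.
    merged.sort(key=lambda hit: hit[2], reverse=True)
    return merged

# Hypothetical per-repository engines with canned results.
engines = {
    "documents": lambda q: [("dm-17", 0.91), ("dm-03", 0.40)],
    "email":     lambda q: [("msg-88", 0.77)],
    "web":       lambda q: [("page-5", 0.85)],
}
print(federated_search("expense policy", engines))
```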

Universal search uses a single search tool to create a single
index across any and all enterprise repositories, regardless of
whether these individual repositories have their own application
search. In essence, it disregards any other search tools.

In each of the aforementioned cases, it is important to realize that
users typically are able to specify how many, or which, repositories
they wish their query to be executed against. The default is usually
ALL, but it does not have to be that way.

Functional Search Techniques

Application search refers to the search function that comes built into particular applications (like email), repositories (like records management systems), or systems (like desktop search). When you search within an application, you're only searching the information that's managed by that particular tool. Among the advantages of this kind of search is that it is made available directly in the context of the business application, as opposed to being offered as a separate capability. The interface thus is inherently more intuitive as it behaves like a feature of the business application itself. It also may understand and respect the application's security and access model, meaning that a search within a financial application would allow you to access only those documents that you are allowed to view, which it would know from your system login or authentication. On the down side, application search works only on the information contained within the application, and it may not be as feature-rich as best-of-breed alternatives.

Parametric search is a primary and fundamental example of rules-based search. Also called fielded search because it refers to metadata fields within databases or content repositories, it operates on attributes that have been predefined in the given data sources. For example, imagine that you are looking for a women's size eight red shoe with a three-inch heel. These are the specific parameters that you are querying on, and particular elements in your repository have been populated with values that you can then search: like gender (which is women's), article (which is shoe), size (which is eight), color (which is red), and heel size (which is three inches). Parametric search provides the highest level of precision on a query. The drawback, however, is that the query capability is strictly limited to the fields that are declared. And you have to be conscious not only of which fields you want to be searchable, but also whether those parameters align with what users will need or want to query on.
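
In code, parametric search amounts to exact filtering on predeclared fields, as in this sketch built around the shoe example (the catalog records and field names are illustrative):

```python
catalog = [
    {"gender": "womens", "article": "shoe", "size": 8,  "color": "red",  "heel_in": 3},
    {"gender": "womens", "article": "shoe", "size": 8,  "color": "blue", "heel_in": 2},
    {"gender": "mens",   "article": "boot", "size": 10, "color": "red",  "heel_in": 1},
]

def parametric_search(records, **criteria):
    """Return records whose declared fields match every criterion exactly."""
    return [r for r in records
            if all(r.get(field) == value for field, value in criteria.items())]

hits = parametric_search(catalog, gender="womens", article="shoe",
                         size=8, color="red", heel_in=3)
print(hits)  # only the first record matches: high precision,
             # but only declared fields can be queried
```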

Keyword search is a specific form of parametric search, as it is based on one or more fields. These fields, though, contain user-declared words or phrases that represent concepts within the content. We use them all the time, and even set them up ourselves in such simple examples as the keywords we attach to the bookmarks we create when browsing the Web. Under this approach, database fields are populated with words and phrases that have been associated with the targeted content. These words and phrases are used to represent the overall meaning or value of the content. Thus -- and this is important -- these keywords do not necessarily have to be represented as words within the content.

For example, a document reviewer may assign the keyword or phrase "World War II" to a document that discusses "the alliance between Stalin, Churchill and FDR in their quest to beat Hitler, Mussolini and Tojo." Nowhere in the document is the phrase "World War II" actually mentioned, but the human reviewer made that deduction and imposed this concept onto the document via the keyword. One value of keyword searching is the ability to impose human-based knowledge onto the system, providing context and insight about the bodies of content that are NOT overtly present in the content itself, and making that available to the searcher. Another value of keyword search is its ability to be applied to any form of content: not just text-based documents, but photos, images, videos, audio, virtually anything. There are drawbacks, however, starting with being reliant on human beings to analyze each body of content and determine the keywords. This leads to inflexibility because it depends on the words and phrases chosen by the indexers, and items can be missed if there are different perspectives and vocabularies.
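
A minimal sketch of that idea: the keyword need not appear anywhere in the document text, only in the human-assigned tag field (the documents and tags below are invented):

```python
# Keywords assigned by human reviewers; "World War II" never appears
# in doc-2's text, but the reviewer imposed the concept on it.
keywords = {
    "doc-1": {"budget", "travel policy"},
    "doc-2": {"World War II", "alliances"},
}

def keyword_search(term):
    """Return documents whose assigned keyword field contains the term."""
    return [doc_id for doc_id, tags in keywords.items() if term in tags]

print(keyword_search("World War II"))  # ['doc-2']
```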

Rather than using ranking algorithms to predict relevancy, semantic search identifies content on the basis of what words mean, not merely their existence in a document. It is also known as natural language search. In most cases, the goal is to deliver the information queried by a user rather than have a user sort through a list of loosely related keyword results.

A related discipline is known as pattern search, which looks at things like how often certain words are used, in what proximity to each other, and in what order, to determine which results are relevant. This analysis is based either on prior knowledge or on statistical information extracted from the patterns.

Statistical search uses mathematical algorithms to determine the overall context of the meanings contained within information. Encompassing many techniques, such as Bayesian probability, it uses many different algorithms that are typically proprietary and not modifiable. Because statistic-based approaches are NOT based on words, they are more flexible than approaches based on linguistic rules, and they typically can be applied to many different languages and forms of content. However, it is not an EXACT science, and the results are not necessarily predictable. So it is important to determine whether it "mirrors" human judgment well enough to fit the business setting it is to be used in. Statistical approaches are now embedded in almost every search engine, although some rely on them more than others. They often make their effect known in determining the relevancy of retrieved results, in a process known as relevancy ranking.

Concept and fuzzy search take the user's inputs and broaden them outward to include other terms that relate in terms of meaning, spelling, phonetics, and more. They therefore are more forgiving than more literal forms of search like keyword search, and can be more flexible as well. But they also can generate longer lists of results, so a balance must be struck before "information overload" occurs.

For example, a query for the word "fast" would automatically lo-
cate documents containing such related concepts as "quick,"
"speedy" and "rapid." Another example is one in which a query on
the work "walk" would include variations such as "walks," "walked,"
"walking," and "walker." The highest level of concept searching is
concept clustering. This functionality is provided through many
different proprietary algorithms and approaches that share the
ability to holistically analyze each document in the corpus. The
result is that each document is profiled according to the topics
and/or concepts it addresses, and then compared to all the other
documents in the collection. In the end, the collection is organized
into a series of separate but overlapping concepts. Users queries
are analyzed in a similar manner. Retrieval then is based on how
similar the profile of the users query is to the document profiles.
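
The query-broadening step can be sketched as simple term expansion; the synonym table below is illustrative, whereas real engines draw on linguistic resources and proprietary algorithms:

```python
# Illustrative synonym/variant table; a real engine would use a thesaurus
# and stemming rules rather than a hand-built dictionary.
related_terms = {
    "fast": {"quick", "speedy", "rapid"},
    "walk": {"walks", "walked", "walking", "walker"},
}

def expand_query(terms):
    """Broaden each user term to include related terms before searching."""
    expanded = set()
    for term in terms:
        expanded.add(term)
        expanded |= related_terms.get(term, set())
    return expanded

print(expand_query(["fast"]))
# {'fast', 'quick', 'speedy', 'rapid'} -- more forgiving, but longer result lists
```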


Rounding out this litany of types of search is social search, which as the name suggests takes users' social business activities into account and looks at things like blog tags applied, social rankings, newsfeeds, podcasts, knowledge sharing, and colleagues' search activities when executing a search. Google+ is now making use of this kind of search utility. According to the Web site, anytime you share a link on Google+, any of your friends finding that link using Google will see that it was identified by you. The idea is that "content from your friends and acquaintances is sometimes more relevant and meaningful to you than content from any random person." This is how Google put it on its Social Search site, and social search of all stripes takes this into account by evaluating not just relevancy, but relevancy to you.

Search Engine Optimization

Improving search results can be accomplished by working in several directions, perhaps one of the most well known of which is search engine optimization. SEO actually operates on the content rather than the search engine by ensuring it is as findable as possible. Information that is not useful, or is locked tightly away, obviously takes away from the value of the search result.

Making effective use of keywords is the next critical component -- not only in the text of the information, but also in the <title>, <meta>, and <header> tags of HTML and XML documents, all of which are crawlable.

Including links to related internal information -- and expanding the list over time -- is another way to boost the effectiveness of your search capability by exposing more quality content as you go. Empowering your users to contribute their findings can also be a good idea, in a controlled way, to guard against the search equivalent of spam creeping in.

Social media techniques also have a home here by expanding the number of places good content is referenced, thus giving the search engine more opportunities to discover it. Never mind that using organizational blogs, wikis, and tweets to alert users to new information is a good idea anyway.

Here are some additional techniques in use that also can help
a search engine do its job:




Constructing a site map to lay it all out for the engine (and the people!) to navigate

Using plain-language URLs to promote understanding: company.com/cooking/desserts is far more intuitive and keyword-laden than company.com/1234/r45.html

Surrounding images with relevant text, and using keywords in the <alt> tags, the image filenames, and captions

Information Correlation and Comparison

A big part of enabling access and use is reconciling the different vocabularies used by different repositories so any given search returns all the salient results, not just those that happen to use the same term the searcher did.

Two common tools for achieving this correlation are the thesaurus and the semantic network. While relatively straightforward in theory, they likely will cause you consternation in practice because the choices you'll have to make seemingly will be without end.

A thesaurus is a file that manages and tracks the definition of words and phrases and their relationships to one another, in a hierarchical fashion. Its construction is governed by NISO standard Z39.19.

A thesaurus ranges far beyond the simple antonyms and synonyms that we learned about in school (although that certainly is part of it). Also included are comparisons like "equal to," "related to," and "opposite of," and they are critical to ensuring a correlation can be made between the taxonomies and metadata of every repository, business unit, or functional group touched by the information solution.

Take, for example, the word "lettuce," which also could be called "greens" -- or "coriander," which is related to "cilantro" (coriander being the seed while cilantro is the leaf). Though not necessarily direct synonyms, they do have a relationship that could be important to facilitating access to and use of that information, so they therefore must be mapped one to the other.

Semantic networks are functionally similar to thesauri but operate on a higher conceptual plane. Continuing with our salad example, a semantic-network-based system would understand that content about mesclun greens, endive, and radicchio has something in common with content about lettuce, and it would use a metadata-based infrastructure to unlock these particular secrets. Semantics are also used as one way -- along with structural comparison -- to identify and separate identical documents from unique ones so the dupes can be weeded out of search results lists and, ultimately, from the enterprise's information infrastructure. Semantic feature extraction and comparison is based on the notion that objects consist of certain features that we use semantics to describe (e.g., a robin has wings, feathers, a beak, and a red breast). The existence of a sufficient number of these terms allows us to decide what the object is. If two or more objects share a sufficient number of these terms, they are deemed to be the same thing. In the context of documents and information, success here presupposes an agreed-upon taxonomy and mapping structure to support the comparison.
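
That feature-comparison idea can be sketched as set overlap: if two objects share a sufficient proportion of their described features, treat them as the same thing. The feature lists and the threshold below are illustrative, not taken from any standard:

```python
def feature_similarity(a, b):
    """Jaccard overlap between two sets of semantic features."""
    return len(a & b) / len(a | b)

robin   = {"wings", "feathers", "beak", "red breast", "lays eggs"}
sparrow = {"wings", "feathers", "beak", "brown breast", "lays eggs"}

THRESHOLD = 0.6  # illustrative cutoff for "a sufficient number" of shared features
if feature_similarity(robin, sparrow) >= THRESHOLD:
    print("treated as the same kind of object")
```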

SECTION 2

Business Intelligence


In This Section...

1. Business Intelligence Tools and Processes
2. BI, BPM, and Reporting

Business Intelligence Tools and Processes

Business intelligence is the process of collecting, analyzing, and presenting business and operational information in historical, current, and predictive views so better decisions can be made. Traditionally focused on structured "data" -- as from sales and accounting databases -- it now increasingly incorporates unstructured "content" as well. As The Data Warehousing Institute sees it, BI as we know it today is a waypoint on the path to what it calls "performance management" -- the logical endpoint on an analytical spectrum that began with the use of inflexible client/server-based data warehousing systems for historical reporting purposes and finishes with future-looking insight engines that use flexible Web services to gather and assess information.

The key elements of BI are:

Information collection: gathering all there is to find

Information integration: aggregating the results

Analysis and synthesis of collected information: separating chaff from wheat

Reporting and presenting information at various levels of granularity: sharing the result in detail

One key component of BI is data mining, the process of finding and extracting forward-looking information -- your organizational gold! -- from within large databases. Also known as Knowledge Discovery in Databases, or KDD, it is tough but necessary work since organizations generally have most of the intelligence they need in-house but either don't know it or can't get at it because it's buried so deeply.

Data cleansing is another critical task since it is focused on removing corrupt or erroneous data from the mix so the analysis can be as pure and on-target as possible. It is also used to apply consistency across data sets so they can be accurately matched up.

Steps here include the following (a small code sketch of two of them appears after the list):

Normalization: e.g., making all phone numbers look the same, ensuring addresses share a common format

Replacing missing values: e.g., looking up Dun & Bradstreet numbers or ZIP codes, adding a zero in front of ZIP codes that need one

Standardizing values: e.g., converting all measurements to metric, prices to a common currency, part numbers to an industry standard

Mapping attributes: e.g., parsing the first and last name out of a contact-name field, moving Part# and partno to the PartNumber field
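
A minimal sketch of two of those steps -- phone-number normalization and restoring the leading zeros that spreadsheets often strip from ZIP codes (the target formats are assumptions):

```python
import re

def normalize_phone(raw):
    """Reduce any US-style phone number to the same 'NNN-NNN-NNNN' shape."""
    digits = re.sub(r"\D", "", raw)[-10:]   # keep the last ten digits
    return f"{digits[0:3]}-{digits[3:6]}-{digits[6:10]}"

def normalize_zip(raw):
    """Restore the leading zero that tools often strip from ZIP codes."""
    return str(raw).zfill(5)

print(normalize_phone("(301) 555-0123"))  # 301-555-0123
print(normalize_zip(2138))                # 02138
```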

With the data now pulled out and cleaned up, a multi-dimensional model can be built. This big-sounding term simply means organizing data by relating "facts" and "dimensions" -- in other words, correlating numeric measures like sales figures and budget information to categories like geography, time, and products.

If you've ever built or used a simple Excel spreadsheet, you're already well familiar with what essentially is a two-dimensional model that relates data in rows to data in columns. A multi-dimensional model is basically the same idea but takes it further in order to support more complex analyses.

One popular analytical technique used is OLAP, or Online Analytical Processing. It is designed for data analysis, which sets it apart from OLTP -- Online Transaction Processing -- another common technique, but one used in the context of data capture. An OLAP cube is a type of multi-dimensional model that aggregates the "facts" in each level of each "dimension" so users can roll data up into higher-level groupings, drill down into greater levels of detail, and slice and dice data to examine specific datapoints from multiple perspectives.
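
As a hedged illustration of the roll-up idea (not any particular OLAP product), a small fact table can be aggregated by its dimensions with a pandas pivot table; the figures are invented:

```python
import pandas as pd

# A tiny fact table: the "facts" are sales amounts; the "dimensions"
# are geography, time, and product.
facts = pd.DataFrame({
    "region":  ["East", "East", "West", "West"],
    "quarter": ["Q1", "Q2", "Q1", "Q2"],
    "product": ["shoes", "shoes", "boots", "shoes"],
    "sales":   [100, 120, 90, 150],
})

# Roll sales up by region and quarter; drilling down would simply
# add "product" back into the index.
cube = pd.pivot_table(facts, values="sales",
                      index="region", columns="quarter", aggfunc="sum")
print(cube)
```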

Data marts live on the other end of the simplicity/complexity scale. Simple data warehouses that focus on single subjects or functional areas, they are generally built and controlled by a single department (sales, finance, marketing), and they draw on only a few sources (internal operational systems, a central data warehouse, external data).

BI, BPM, and Reporting

The term business process management covers how we study, identify, change, and monitor business processes to ensure they run smoothly and can be improved over time. Often framed in terms of the daily flow of work -- and yes, "workflow" generally does fit under the BPM umbrella -- it is an important piece of the access and use puzzle, since a missing or poor process really degrades your ability to get at and leverage information.

BPM is best thought of as a business practice, encompassing techniques and structured methods. It is not a technology, though there are technologies on the market that carry the descriptor because of what they enable: namely, identifying and modifying existing processes so they align with a desired, presumably improved, future state of affairs.

Put more simply, it's about formalizing and institutionalizing better ways for work to get done.

Successfully employing BPM usually involves the following:

Organizing around outcomes -- issuing a firearms certificate, for example -- not tasks, to ensure the proper focus is maintained

Correcting and improving processes before (potentially) automating them; otherwise all you've done is make the mess run faster

Establishing processes and assigning ownership lest the work and improvements simply drift away -- and they will, as human nature takes over and the momentum peters out

Standardizing processes across the enterprise so they can be more readily understood and managed, errors reduced, and risks mitigated

Enabling continuous change so the improvements can be extended and propagated over time

Improving existing processes, rather than building radically new or "perfect" ones, because that can take so long as to erode or negate any gains achieved

Getting information to where it needs to go, when it needs to go
there, is only part of the solution -- much of the rest involves first
requesting the insights you need, and then having those insights
communicated to you in an immediately usable format. This is
what reporting and querying software is all about.

Most database and repository solutions come with some sort of built-in reporting and query capability, and a thriving market for third-party products from Actuate to Zoho puts all sorts of features and functions at your fingertips. Your challenge is to understand how your users want to ask for and then see the information they seek, from wherever it lives, so they can make the best and quickest decisions they can.

Success in this regard depends in large measure on how well you label the data in your repositories so it can be identified and included when an appropriate query comes along. A major boost toward accomplishing this goal exists in the form of the Common Warehouse Metamodel (CWM), a complete specification of syntax and semantics that data warehousing and business intelligence tools can leverage to successfully interchange shared metadata. Released and owned by the Object Management Group, the CWM specifies interfaces that can be used to enable the interchange of warehouse and business intelligence metadata between warehouse tools, warehouse platforms, and warehouse metadata repositories in distributed heterogeneous environments. It is based on three standards:

UML - Unified Modeling Language, an OMG modeling standard

MOF - Meta Object Facility, an OMG metamodeling and metadata repository standard

XMI - XML Metadata Interchange, an OMG metadata interchange standard

CWM models further enable users to trace the lineage of data by providing objects that describe where the data came from and when and how it was created. Instances of the metamodel are exchanged via XML Metadata Interchange (XMI) documents.

Beyond the metadata aspects, other capabilities present in good reporting tools include the following:

Data source connectivity, so nothing important is left out

Scheduling and distribution functionality, so the intelligence can be automatically generated and circulated

Security, so only authorized personnel find themselves on the distribution list, and no unduly sensitive information is released

Customization, so corporate standards can be applied in terms of presentation and layout

Export capabilities, so the intelligence can be readily poured into other applications (like Excel, say) for analysis

SECTION 3

Master Data Management


In This Section...

1. MDM Tools and Tasks
2. MDM Data and Information Types
3. MDM Data Quality and Information Governance


MDM Tools and Tasks

Master data management is the practice of creating and maintaining a consistent and accurate list of information that is regarded as THE source of authority in your organization.

Encompassing technology, tools, and processes, it is what allows different applications and users to leverage a single source of trusted data that can be centrally -- and thus efficiently -- managed and controlled.

Agreeing on and establishing that single source of trusted data is no easy task, not least because success in that regard is more often choked by internal politics than it is by technology. The goal is for master data management to become an ongoing process, and the best way to do that is to develop and enforce policies and processes about what data is collected and how.

It is absolutely essential that this be done, and done early in your strategy process, to guard against the result becoming diffuse, or your initiative to establish an unequivocal touchstone becoming ineffective.

Master data management is not for the faint of heart because it involves such intense scrutiny of so many data and document types and the repositories in which they reside.

For example, MDM guru David Loshin once offered up this example. Consider this sentence...

"David Loshin purchased seat 15B on US Airways flight 238 from
Baltimore to San Francisco on July 20, 2006."

Embedded within this are several data objects:

Customer: David Loshin

Product: Seat 15B

Flight: 238

Location: Baltimore

Location: San Francisco

Imagine, though, that the airline he flew on got bought by another,
but their data object equivalents were:

Not customer, but passenger
Not product, but seat

Not locations, but departure and arrival cities

The challenge is clear: you have the same essential data objects but different terminology, and only one set of terms can be blessed as being "right."
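
A common first step is an explicit mapping from each source system's vocabulary to the blessed master terms. This sketch uses hypothetical field names, and the split of "location" into origin and destination roles is an assumption, not part of Loshin's example:

```python
# Hypothetical mapping from one acquired system's vocabulary to the
# master data object names.
TERM_MAP = {
    "passenger": "customer",
    "seat": "product",
    "departure_city": "origin_location",
    "arrival_city": "destination_location",
}

def to_master(record):
    """Relabel a source record's fields with the blessed master terms."""
    return {TERM_MAP.get(field, field): value for field, value in record.items()}

print(to_master({"passenger": "David Loshin", "seat": "15B",
                 "departure_city": "Baltimore", "arrival_city": "San Francisco"}))
# {'customer': 'David Loshin', 'product': '15B',
#  'origin_location': 'Baltimore', 'destination_location': 'San Francisco'}
```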

The Data Warehousing Institute (TDWI) confirms this complexity by reporting that "most MDM solutions are homegrown and require five tools or platforms, usually from multiple vendors."

These tools generally distill down into these:

Database management systems, for maintaining the data itself

Data integration platforms, to foster communication among databases

Metadata management repositories, for codifying and reconciling terms used to describe the data elements

Data modeling tools, to facilitate analysis

Tools for integrating MDM with operational applications, to make the data accessible to and usable by line-of-business software

Peel back a layer, and you'll find that a number of specific utilities are nearly always required to do the MDM dance. Perhaps chief among these is the one labeled "ETL," which stands for extract, transform, load and is used to clean, convert, and merge source data.
Extraction is the removal of information from one or more sources of any kind (e.g., databases, unstructured data, XML documents, packaged applications like SAP or PeopleSoft).

Transformation refers to the cleansing, reformatting, and standardization of that information according to business rules and organizational consensus. (These functions are covered in the Business Intelligence module of this course.)

Loading is the insertion of the resulting data set into specified target systems or file formats.

Hardly any kind of data management initiative worthy of the name
takes place without ETL, and choices abound in terms of the tools
needed for its execution.
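
A minimal, hedged ETL sketch in plain Python (a real implementation would use dedicated ETL tooling and database connectors; the source rows, business rules, and target file here are stand-ins):

```python
import csv

def extract(source_rows):
    """Extract: pull raw records from a source (here, an in-memory stand-in)."""
    return list(source_rows)

def transform(rows):
    """Transform: cleanse and standardize according to simple business rules."""
    cleaned = []
    for row in rows:
        cleaned.append({
            "customer": row["customer"].strip().title(),
            "amount_usd": round(float(row["amount"]), 2),
        })
    return cleaned

def load(rows, path):
    """Load: write the result set to the target (here, a CSV file)."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["customer", "amount_usd"])
        writer.writeheader()
        writer.writerows(rows)

source = [{"customer": "  david loshin ", "amount": "199.999"}]
load(transform(extract(source)), "customers_clean.csv")
```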

Versioning is also commonly used because of the need to track
changes made to data, to revert to previous versions if new ones
are released prematurely or mistakes are made, or to satisfy
audit requirements.

For instance, a life insurance carrier may wish to know not only whom a customer named as beneficiary, but also who was named before that, and when the change was made. Researching this requires that historical information has remained available -- as do principles of good data stewardship, governance, and compliance.

Data migration is the process of transferring data between storage types, formats, or computer systems. It usually is performed programmatically so it can be automated, but success here depends on the quality of the ETL, versioning, and other work that has come before, and it nearly always requires some level of human involvement.

Migration usually takes place as part of a system change or upgrade, or during consolidation after an acquisition is made. Issues here include matching storage types, formats, and/or systems, and the results are generally subjected to data verification to ensure the data came over accurately and completely, and that the business processes that rely on it work properly.

Data fusion distills data from multiple sources so higher-quality inferences can be drawn than when using a single source. As a matter of practical fact, this boils down to data integration, which simply merges data, followed by data reduction or replacement, which results in a smaller data set -- albeit one in which users have a great deal more confidence.

MDM Data and Information Types

Structured content, often referred to as "data," is stored in
database tables, or increasingly as XML. Examples include
customer data, sales records, employee information, etc.

Unstructured content is, well, not stored in databases and includes
things like word processing documents and e-mail messages, files
in shared directories and on users hard drives, images, media
files, CAD drawings, and -- oh yes! -- paper in file cabinets!

And then there is semi-structured content such as HTML files,
which are free form but contain embedded tags that can give
the content context and make it searchable.

Transactional data -- also known as dynamic data -- is information that is asynchronously changed as further updates to the information become available. The opposite of this is persistent data, which is data that is infrequently accessed and not likely to be modified. The term is a familiar one in the world of ERP, where transactional data is generated as part of a single business event, like buying or paying for something, and then is recorded in SAP or another back-office application. As an information management professional, your challenge is keeping tabs on these changes as they are made, ensuring they are reflected in whatever other systems utilize that data, and thus providing a foundation to support the making of sound business decisions.

Metadata is data about data, namely information that is used to relate information to other pieces of information and to their real-world counterparts. Think of it this way: metadata is data that labels information for the purpose of organizing it, identifying it, and finding it again. It defines what something is, what it's about, and what characteristics it possesses, and, properly done, it allows you to find other pieces of information, and objects, that exhibit similar traits.

Swedish man of science Carl Linnaeus, considered to be one of the fathers of taxonomy, developed a classification system to categorize all living creatures. For example, what makes a bird a bird? You can say a bird flies, but there are birds that don't (like ostriches and penguins, for example). So there has to be something more to it. In this case, the "more" includes feathers, hollow bones, and the laying of hard-shelled eggs. Anything that doesn't have these characteristics is not, in fact, a bird. So this metadata is critical to determining what's in and what's out -- a capability that is crucial to searching, finding, and leveraging information.

A variation on the categorization theme, master data groupings are "buckets" into which your master data can be put to facilitate searching and retrieving. Typical breakdowns include grouping data by person, product, place, time, etc. -- in other words, segmentations that are not dissimilar to those used in constructing a multi-dimensional data model as described in the Business Intelligence section of this Certification prep course.

All the data slicing, dicing, cleansing, fusion, folding, and spindling that we've discussed here and in the section on Business Intelligence may leave you wondering if there isn't a risk of losing sight of which piece of information is the "official" one in the case of a mismatch or disagreement -- and indeed, you'd be correct!

That's why the concept of authoritative sources is so important -- and potentially so dicey, too. These contain common reference data for use throughout the organization to reconcile differences as they occur -- and they likely will occur even despite your best efforts. Authoritative sources are needed when data will be accessed by many applications, and they require consistency to ensure reliability. The good news, given the work required to set them up, is that they can be reused when developing new applications, so the work doesn't have to be redone every time.

In many ways, records management systems have played this role for years for critical business documents. But the issue is much larger in the overall enterprise context, for it can encompass information of all kinds -- a statement that brings us full circle in terms of the lessons contained in this module.

A solid master data model can boost the efficacy of all the data and text in your charge by relating them to each other where appropriate, and providing a touchstone to ensure quality. For all the work involved in building it, though, the real challenge is keeping it fresh, and there are several common methods for doing so:

The single-copy approach calls for maintaining only one master copy of the master data, and applying all additions and changes directly to it. In this scenario, all applications that use the master data are rewritten so they use the new data instead of what they're using now. While this guarantees consistency of the master data, it's not terribly practical, because modifying all your applications to use a new data source with a different schema and different data can be very expensive and even impossible in the case of some purchased applications.

Another strategy calls for maintaining multiple copies but only one
point of maintenance. In this instance, data can be added to or
changed in the single master, and copies of those changes are
sent out to the source systems for local storage. Each application
then can only change or add to information that is not part of the
master data. The net effect is a reduction in the number of
application changes, but also the need to disable functions that
add or update master data.

A third approach deals in continuous merge, which allows each application to change its copy of the master data. These changes are sent to the master, where they are merged into the master list, and then forwarded on to be applied to the local copies. This requires few changes to the source systems because, if necessary, the change propagation can be handled in the database, so no application code is changed. However, it does leave the door open to conflicts, as when two source systems change a customer's address to different values, for there is no mechanism to reconcile the inputs. It also requires additions to be remerged since there is a chance that multiple systems can add the same data (like a new customer, say).

None of these is perfect, as you can see, but thinking about them at the start of your project can help you make an intelligent decision about the trade-offs involved. And believe me, a little intelligence can go a long way when the stakes are as high -- and the work is as detailed -- as is the case here.

MDM Data Quality and Information Governance

Any discussion of data quality could begin with this quote from Joseph Juran, a management consultant best known for his work in quality and quality management, who defined it in such practical terms that it still resonates despite having been first espoused way back in 1951.

High-quality data is "fit for their intended uses in operations, deci-
sion making, and planning," he wrote, and it is still true today be-
cause whats important is how well that data supports what youre
trying to do with it, not some esoteric algorithm used to calculate
error rates, or some fancy new piece of technology that will im-
prove your information and walk your dog.

Juran, by the way, was the brother of Academy Award winner Nathan Juran, who won Best Art Direction for How Green Was My Valley and directed such science fiction and fantasy films as Attack of the 50 Foot Woman.

From the standpoint of international best-practice standards, the ISO's workgroup on industrial data quality breaks the key properties of data quality into two parts:

Information definition, including relevance, clarity, accessibility and security, and consistency, and

Information values, including cost/benefit, accuracy, timeliness, and completeness

These provide excellent signposts on the way to resolving issues related to poor data quality -- the very focus of an ISO 8000 initiative that is now in the works, and of which the working group is a part.

NOT resolving these issues is problematic, and considering the size of some of the problems, it is surprising that the topic doesn't occupy a lot more overt mindshare than it does.

Research by database technology expert Jack E. Olson -- a significant contributor in his career at such minor firms as IBM, BMC, Evoke, and NEON Enterprise Software -- suggests that many organizations seem to accept lesser data quality as a cost of doing business despite the fact that he estimates fixing the problem could add 15-25% to their operating profit.

The ultimate irony here is that organizations, as reported by Olson, are aware of the problem but underestimate the consequences and have no idea what it's costing them. The end result, then, is that they just let it go.

But let's say you do understand all this. How do you go about applying that fix?

There are several conclusions to be drawn -- all of which will sound suspiciously familiar to anyone who has spent any time in content or records management, or business intelligence.

Ensure the data is relevant, well defined, consistent, available yet protected from unauthorized viewing or changing, and auditable so its history can be documented.

At the same time, understand what the data is worth to your organization and your users, balancing the expense of maintaining, protecting, and distributing it with the cost associated with it being incorrect, or incomplete, or slow to arrive, or of questionable authority.

It's all about the groupings, comparisons, cleansing, and fusion covered elsewhere in this section of this program, and ensuring they are properly understood and executed, on an ongoing basis.

The preceding discussion of data quality focused on but one part of the broader issue known as information governance, which is the application of formal and informal controls to ensure information is managed according to the organization's legal and operational requirements.

More than policies and procedures, governance is a culture of accountability to which employees at all levels -- senior executives, business unit managers, end users, and IT, records, and legal staff -- must be committed. Otherwise, the best technology and the most well-considered guidelines will mean little, and operational standardization and compliance both will go out the window.

In a recent study on information governance, the Economist Intelligence Unit found that the single biggest worldwide challenge to successful adoption of information governance is difficulty in making the case for it.

Consultant Barclay Blair has distilled the major factors into these
eight:

1. We Can't Keep Everything Forever -- Having unnecessary information around only makes it more difficult and expensive to harness information that has value. Plus, it can be a legal liability.

2. We Can't Throw Everything Away -- Only information governance provides the framework to make good decisions about what information to keep and what and when to trash some.

3. E-Discovery -- Proactively managing information reduces the volume that is exposed to e-discovery and simplifies the task of finding and producing the right stuff in the right timeframe. Plus, not supporting it, or supporting it well, can have legal ramifications.

4. Employees Are Screaming for It -- It helps knowledge workers separate "signal" from "noise" and improves information delivery and productivity.

5. It Ain't Gonna Get Any Easier -- New laws and technologies that create new requirements and challenges will make the governance issue harder over time.

6. The Courts Will Come Looking for It -- Coming up short can lead to fines, sanctions, loss of cases, and other outcomes that have negative business and financial consequences.

7. Manage Risk, and IG Is a Big One -- Poor information governance can lead to poor data quality, poor decisions, poor service, and ultimately poor financials.

8. Email -- Reason Enough. It's the last bastion of uncategorized business-critical information.

Any one of these things presents good reason to travel the governance path. Taken together, they make a compelling argument for doing so that smart organizations ignore at their own peril.

The critical first step to achieving governance is the establishment
of an organizational structure to guide, oversee, and arbitrate the
process.

Populated with representatives from all walks of organizational life, this body has a long list of responsibilities that generally includes:

Establishing policies and standards, including implementation methodologies, development platforms, and integration protocols, so everything works together the way it is supposed to

Prioritizing projects, starting with the most achievable as defined by feasibility, impact, or sponsorship (in other words, who wants it)

Enforcing rules and providing a conduit to executive authority for final judgment

Maintaining best practices through shared vocabularies and standard operating procedures

Establishing a measure-and-improve mindset by capturing metrics and analyzing query logs and click trails to identify areas needing enlargement




Integrating the handling of taxonomy, metadata, user interfaces, and search to ensure they all work together for usability, compliance, and proper tagging to facilitate automation

Good governance requires that all of these tasks be undertaken, and in an organized way. It won't all happen overnight, though, so breaking it into smaller pieces -- and perhaps assigning those pieces to smaller subcommittees -- is not a bad way to go.


SECTION 4

Text Analytics


In This Section...

1. Text Analytics


Text Analytics

Also known as text mining, text analytics combines a number of kinds of artificial intelligence to infer meaning from bodies of textual content -- these include semantic analysis, linguistics, entity extraction, tagging, pattern recognition, and lexical analysis -- and functionality ranges from auto-classification to clustering content around specific business targets and determining whether that content has to do with that business focus.

Text analytics is a major driver of modern business intelligence because it applies the traditional focus on "data" to the world of unstructured content. But remember that before this can happen, you have to organize and manage your content first so it can be most efficiently and accurately found.

A major piece of information organization centers on developing
ontologies, which basically apply rules that specify terms, what
they mean, and what the relationships are between and among
them. As such, they represent domains of knowledge rather than
ways to structure a vocabulary, like their taxonomy cousins.

For example, an ontology for salad would specifically contain the structure for how it relates to all its parts -- from the ingredients to the growers to maybe even the rodents that might eat its components in the field, and how a salad is different in Japan versus Italy. So as you can see, an ontology about a particular topic should enable you to derive all the knowledge that exists about that particular topic.

Not surprisingly, given the enormous influence of the Internet on information management, the ontology concept has been formally extended to the Web in the form of the W3C's Web Ontology Language, known as OWL 2. The W3C, of course, is the World Wide Web Consortium, an international community dedicated to developing Web standards. OWL 2 is one of these, and is designed to facilitate ontology development and sharing via the Web, with the ultimate goal being to make Web content more accessible to machines.

Clustering is the grouping of entities and relationships. Through clustering, a person searching a particular repository might be able to discover not only a lot of information specifically about the topic of interest, but a lot of related information as well that improves the whole discoverability process around the topic. And that is exactly what clustering is all about.

Content analytics takes this same principle and broadens it to cover not just text documents, but rich media files and other unstructured content as well. Supporting trend analysis, content assessment, pattern recognition, and exception detection, content analytics tools provide business intelligence and strategic value across unstructured data at levels similar to those conventionally associated with structured data reporting. Content analytics can be put to many uses besides its "regular" BI context, including fraud detection, asset protection, healthcare research, market monitoring, and perhaps the most famous of all: the powering of the Watson computer from IBM that was declared champion on a recent episode of the TV quiz show Jeopardy. Analyzing the content presented in the questions allowed Watson to return high-probability answers (phrased in the form of a question, of course!) at a rate that secured it the victory.

In the same way content analytics takes text analytics to another level, so does content aggregation do the same with data aggregation. The idea involves collecting content from internal and external resources, but instead of just data, it encompasses information of all sorts.

Continuing to peel back the layers, the next layer is content entity extraction, which is the process of automatically pulling metadata out of unstructured documents so aggregation and other analytics techniques can be applied. Examples include person names, locations, dates, and any terms specific to the context.


This can get fairly sophisticated and fairly well automated. For instance, a person entity extractor might know about first name aliases, so it could know that "Bob" is the same thing as "Robert" and could check the staff directory and fill the correct full name into the person attribute. In the same way, a date extractor might know about many different date formats, and a product name extractor might be able to associate names with a product database and insert the correct ID.

Having all this happen in the background can take much of the burden off of human operators, and can greatly speed the process along. However, prudence dictates that human beings do get involved at some point for quality-control purposes and, of course, for exception handling.

Under the covers, most systems employ what's known as named entity recognition (NER), which takes the form of translating an unannotated block of text such as, "Jim bought 300 shares of Acme Corporation in 2006" into an annotated block of text like the following:

<ENAMEX TYPE="PERSON">Jim</ENAMEX> bought <NUMEX TYPE="QUANTITY">300</NUMEX> shares of <ENAMEX TYPE="ORGANIZATION">Acme Corp.</ENAMEX> in <TIMEX TYPE="DATE">2006</TIMEX>.

In this example, the annotations have been done using so-called ENAMEX tags that were developed for the Message Understanding Conference in the 1990s.

NER systems have been created that use grammar-based techniques as well as statistical models. While the former typically obtain better precision, they involve months of work by experienced computational linguists. The latter, on the other hand, typically require a large amount of manual annotation to train them -- but once they're trained, they can perform at levels within shouting distance of what human beings can achieve (having been measured at roughly 93% accuracy vs. 97%). So there are tradeoffs to be had either way.
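
As a hedged illustration of the statistical approach (not the ENAMEX tooling itself), an off-the-shelf NER library such as spaCy can produce comparable annotations, assuming the library and its small English model are installed:

```python
import spacy

# Assumes: pip install spacy && python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

doc = nlp("Jim bought 300 shares of Acme Corporation in 2006.")
for ent in doc.ents:
    # Typical output: Jim PERSON / 300 CARDINAL / Acme Corporation ORG / 2006 DATE
    print(ent.text, ent.label_)
```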

With the content now tagged, comparisons and aggregations
can be performed, and associations then codified to boost
search and business intelligence outcomes -- tasks that are
coming to be known generally as "content curation."

Rohit Bhargava is Senior Vice President of Global Strategy and Marketing for Ogilvy, and an Adjunct Professor of Global Marketing at Georgetown University. Author of a 2009 blog post called "Manifesto For The Content Curator," he predicted that this role would be one of the fastest growing and most important jobs of the future, and he defined it as someone who finds, groups, organizes, or shares the best and most relevant content on a specific issue. What the curator does NOT do, he wrote more recently, is add more content or noise to "the chaotic information overload of social media"; instead the curator "focuses on helping any one of us to make sense of this information by bringing together what is most important."


Capture and Manage

SECTION 1

Information Capture


In This Section...

1. Types of Capture
2. Indexing Strategies
3. Capture Planning and Preparation
4. Integration Techniques
5. Compression Techniques and Formats


Types of Capture

Information capture is getting information from the original source into the information management system. It can include the process of transforming content into a format that can be reliably searched, retrieved, and used. Sources include hard copy, emails, faxes, Web content, and information contained on mobile devices.

One of the earliest and most venerated forms of capture was called COLD, for Computer Output to Laser Disc. Now mostly referred to as ERM, for Enterprise Report Management, it was and is used to capture, archive, store, and retrieve large-volume data such as accounting reports, loan records, inventories, shipping and receiving documents, and customer bills. These systems were typically implemented to replace paper creation and microfiche solutions, and they usually work by capturing data from print streams and storing it for subsequent retrieval through a Web browser or fat piece of client software.

Document imaging is another old-school-with-a-new-twist technique that is used to take hard-copy information and digitize it for use in the information management system. Though paper typically is what comes to mind for most people, microfilm, microfiche, blueprints, and other physical content can be and are imaged, either by scanning or photography.

A generic process model for a centralized scanning operation might contain the following steps. The overall optimization of the end-to-end business process should be measured for time efficiency, costs involved, and other factors.

Sorting and preparation have to do with sorting the originals into content types -- claim forms vs. photos vs. policy documents, perhaps, in an insurance context -- and removing staples and paper clips so the documents can be fed into the scanner quickly and smoothly.

Scanning is the physical process of using the scanner hardware to create an image.

Image enhancement tools are then typically used to clean up any resulting images that are found to need it.

The captured document then needs to be indexed so it can be efficiently found and retrieved.

It is then stored in the information management system prior to releasing it to users or taking the initial steps of an automated process.

Straightforward as this sounds -- and is -- there are a number of critical factors to consider before embarking down this path, among them:

Whether documents are being scanned as they enter the organization starting today, or past documents will be scanned in bulk

The number of locations in which scanning is to take place

The volumes and types of paper documents to be scanned

The number and types of scanning equipment required -- for example, large scanners may need to be housed in a soundproofed room

So there are many moving parts to be thought about before embarking down this path.

Here's another big decision to make: whether you need, want, or should scan in full color, grayscale, or black-and-white. The temptation may be to scan and store content in the highest resolution, and in full color, to capture maximum amounts of information. But the processing load and storage overhead associated with doing this may not be worth the tradeoff, especially if that extra resolution and color doesn't add any extra value to the document -- as would be the case, for instance, with an insurance policy document vs. a photo of an industrial accident scene. More visibly, the nature of the image can change dramatically depending upon the color depth chosen.

If imaging and scanning spring to mind when hearing the word capture, then my guess is that email doesn't -- and yet, messages and their attachments together typically represent 70% of what enterprise content management systems store. Now, before you say "We can find things just fine using Outlook's search function," keep in mind that Outlook and the many other email engines out there weren't designed to be information repositories. Even Exchange and other back-end systems with additional baked-in capabilities can take fairly serious performance hits if they are asked to perform this function -- imagine the strain in one real-life example, which showed more than 21 million items in a single inbox. How efficiently do you suppose a search runs in that particular case?

Fax is another information type that often doesn't get the respect it probably deserves, likely as not because we think of it as involving archaic dial-up devices that we feed paper into and that produce paper on the other end. But a lot of fax information is computer-based and uses the fax format as a transport standard of sorts, in much the same way PDF attachments are emailed around. And it sits at the center of a relatively large market, as figures from Research and Markets show. So fax is not going away, and the information it represents needs to be captured as much as anything else does.

Another information type that isn't going away is the form. Like the fax, the form is still in common use -- perhaps even more so -- but it less often manifests itself on paper than it used to, and so the perception is that its day is done.

The truth of the matter, though, is that most business processes are based on forms, even if they look like Web site or iPad screens rather than boxed-in pieces of paper. The function is very much alive and well, and all that information that gets filled in has to be captured if it is going to have any business value at all. Technologically, the news is all good, as software now exists to automatically (a) determine whether a document is a form or not, (b) determine what kind of form it is, and (c) extract the information from the form and populate or update a database with the latest values. If we know what the form is before the system gets it, or it's so highly structured as to be readily identifiable according to pre-entered parameters, so much the better. But even wholly unknown and unstructured forms can be dealt with quite readily, thus providing a great leg up on the old manual rekeying processes.
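To make those three automated steps concrete, here is a minimal sketch -- an assumed structure, not any particular vendor's product -- that classifies an OCR'd page by keyword signature and pulls one field out of it with a regular expression. The form types, keywords, and field names are all invented for the example.

    # Hypothetical forms-processing sketch: classify an OCR'd page, then extract a field.
    import re

    FORM_SIGNATURES = {                      # assumed keyword signatures per form type
        "claim_form": ["claim number", "date of loss"],
        "policy_document": ["policy number", "effective date"],
    }

    def classify(page_text):
        """Steps (a)/(b): decide whether the page is a known form, and which one."""
        lowered = page_text.lower()
        for form_type, keywords in FORM_SIGNATURES.items():
            if all(keyword in lowered for keyword in keywords):
                return form_type
        return "unknown"                     # unstructured page; route to manual review

    def extract_claim_number(page_text):
        """Step (c): pull a field value out of the recognized text."""
        match = re.search(r"claim number[:\s]+([A-Z0-9-]+)", page_text, re.IGNORECASE)
        return match.group(1) if match else None

    page = "CLAIM FORM\nClaim Number: CL-2024-0017\nDate of Loss: 03/14/2024"
    print(classify(page), extract_claim_number(page))   # -> claim_form CL-2024-0017

Real forms products use image templates, machine learning, and confidence scoring rather than hard-coded keywords, but the shape of the work is the same.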



AIIM provides deep dive training - both

in person and online - in such areas as ECM,
ERM, SharePoint, Taxonomy and Metadata,
Social Media Governance and others.



This is probably a good time to talk about the major kinds of recognition software in use today, because they all come into play during forms processing -- though their use is far from limited to forms. In no particular order:

Intelligent character recognition (ICR) uses neural recognition techniques to detect shapes -- for example, a capital A contains a forward slash, a backward slash, and a horizontal dash, which ICR recognizes as the letter "A." ICR is effective on both typed and hand-printed text (but not joined-up handwriting), and can still be effective even when encountering a previously unseen font. Users also can "train" the system to recognize new patterns, which it then incorporates into its neural network.

Optical character recognition (OCR) is used to recognize text characters, usually by comparing the scanned bitmap of a character against stored character sets, repeating the process until a match is found. After so many years of practice and refinement, OCR typically achieves accuracy of over 98% on typed text -- a figure that still represents a misread of one character in 50. In a credit card processing application (where the average credit card number is 16-18 digits long), this means that only about two transactions in three will be correctly processed. A number of techniques are used to improve accuracy, including check digit verification, cross-matching of address against zip code, totals balancing, spell checking, and other techniques too numerous to mention.

Bar code recognition (BCR) provides a simple and highly accurate technique for capturing information -- pre-printed barcodes are often applied to forms which will then be hand-completed and returned, avoiding the need to index such documents when scanned. Barcode labels are often used in applications where documents are indexed prior to scanning, for example if indexing is carried out in-house but scanning is outsourced.

Optical mark recognition (OMR) is a technique for recognizing predefined shapes in predetermined positions -- for example, a tick or a cross in a box. Software verification can be used to detect errors (such as ticking too many boxes), or even to distinguish between a ticked box and one that has been scrubbed out. It can also be used for signature detection -- not to recognize the signature, but to check that something has been entered into a box labeled "signature."

Hardly a day goes by in which we don't pull some kind of information off the Web. Whether we're cutting and pasting text into a Word document, or saving a page or image to our computers for use in a presentation later on, or downloading files for editing and emailing, we barely give a thought to what we're doing from an information management perspective. But we should, because all we're doing is adding to the overall "noise" that is present in our computing infrastructures. When was the last time you added a metadata tag to something you got from the Web?



The logical current extension to the above scenario is the one that has all the same characteristics but takes place in our hands and on our laps, rather than on our desks. Specifically, the exploding popularity of smartphones, pads, tablets, and netbooks and notebooks is putting us in touch with more content, in more places, in more formats than ever before -- and to be useful and reusable, the most relevant of it all really ought to be captured in one way or another.

After texting, email is perhaps the most convenient of the alternatives, as it is becoming ever more routine to click the little icon on the screen and send a link to the Web page being viewed to whomever you choose. Apps like "Documents to Go" also facilitate the transfer of living, editable documents from the smaller screen to the larger.



But the more clever and sophisticated technique may well be the use of the now-nearly-ubiquitous cellphone camera to take a picture of a physical page or object and send that back to a server somewhere for processing. From that point, the system works the same way it would if you had simply scanned a piece of paper or a photograph. But the capture mechanism and interface is a lot more intuitive and available than a scanner usually is, and we can expect to see more and more applications arise to take advantage of the capability.

Indexing Strategies

Indexing involves identifying content and then applying metatags, or labels, so it can be subsequently tracked and retrieved. As a process, it is always critical and is generally time-consuming.

How critical is this? Imagine converting a warehouse full of contracts, and indexing them simply as "contract 1," "contract 2," "contract 3," etc. The only way to retrieve any one in particular would be to know which number referred to the one you want. This is the problem indexing solves, by allowing you to associate real-world descriptors with particular pieces of content and to cross-reference them, using those labels, with other pieces of related information (such as the parties to the contract, the date it was executed, etc.).

The quality of your search results, therefore, correlates directly with how good your index is, in the same way that the old computing adage "garbage in/garbage out" speaks to the nature of database information. If your index is poor, you're not going to get good results when you search on it.

So depending upon how accurately your content is mined and classified, your index probably is going to require tuning by both a subject matter expert and an IT professional. The most successful implementations involve quite a bit of tuning, in fact, in which a subject matter expert will test a whole series of terms, look at the results, and work with IT to get more optimal results back.

This effort sounds easy enough, but don't underestimate the amount of work it actually requires. Tuning an index both during the initial configuration and on an ongoing basis as new content is added isn't going to take care of itself -- and the horsepower needed to rebuild an index after that new content is added is such that you may not want to do it in the middle of the day, lest the people needing to search start complaining about the performance hit they are taking.

It is because of this load that you are probably best served by not designing and testing the indexing system based solely on a prototype or small sampling of content -- a job that runs quickly and well when executed on 10 documents may fail catastrophically when targeting 100,000 -- every day! Do use a prototype to try out your basic process and vocabularies, but don't put too much stock in how smoothly it performs -- unless it doesn't perform well, in which case you know you have to go back to the drawing board.

And speaking of process and vocabulary, one of the basic indexing truths is that you really want to streamline how you do it to the greatest degree you can. Using standard procedures, forms, and controlled vocabularies right up front thus is critical, for it is the first step toward driving quality up and costs down, regardless of whether the indexing will be done automatically or manually.

The best idea, of course, is to automate as much as possible. Even a simple approach that captures document-specific metadata (for example, the author and title) can make indexing, and thus retrieval, not only more efficient, but also more reliable and consistent.

If your IT infrastructure includes such databases of user details as LDAP or email, then it may further be possible to automate the look-up and entry of more information about the user -- like his or her job title and/or department -- thus completing additional metadata entries automatically and appropriately.
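As a rough illustration of that kind of directory look-up, here is a minimal sketch using the open-source ldap3 library; the library, host name, base DN, account, and attribute names are all assumptions for the example, not prescriptions from this Guide.

    # Hypothetical directory look-up to auto-populate metadata from LDAP / Active Directory.
    from ldap3 import Server, Connection, ALL

    server = Server("ldap.example.com", get_info=ALL)          # assumed directory host
    conn = Connection(server, user="cn=svc_index,dc=example,dc=com",
                      password="secret", auto_bind=True)       # assumed service account

    # Look up the document author's title and department by account name.
    conn.search("dc=example,dc=com", "(sAMAccountName=jdoe)",
                attributes=["title", "department"])

    if conn.entries:
        entry = conn.entries[0]
        metadata = {"author_title": entry.title.value,
                    "author_department": entry.department.value}
        print(metadata)   # values like these can be written into the repository's index fields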

Experience shows that most users are unwilling to devote much time and effort to making metadata entries -- though they do appreciate the benefits of a metadata-rich repository. The implication here is that an information system should automate the capture of metadata values as much as possible, including as many optional entries as is feasible.

So a number of classes of software have arisen to help.

One example is the Document Information Panel available in the later versions of Microsoft Office. This may not automate the capture of index information, but it can at least put some of the proper fields in front of the user -- and that's a good first step.

Another example is the technical metadata that is captured automatically by a digital camera.

Additional examples include OCR, ICR, OMR, bar coding, and forms processing, which are explored in the training module on Capture but are included here because it is in indexing that they can make some of their most significant contributions. By plucking key words and descriptors directly from the information itself, they can do a pretty good job of providing what you need to know.

Finally, there is auto-classification software, which offers a variation on this theme by suggesting values to associate with any sort of document. A simple sketch of that idea follows.
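The sketch below is a deliberately simple, assumed illustration of auto-classification by keyword scoring -- real products use statistical and linguistic models -- but the shape of the task is the same: read the text, suggest an index value, let a person confirm it. The categories and terms are invented.

    # Hypothetical auto-classification sketch: suggest an index value from document text.
    from collections import Counter

    CATEGORY_TERMS = {                      # assumed controlled vocabulary
        "Invoice":  {"invoice", "amount due", "remit"},
        "Contract": {"agreement", "party", "term", "signature"},
        "Resume":   {"experience", "education", "skills"},
    }

    def suggest_category(text):
        words = text.lower()
        scores = Counter({cat: sum(term in words for term in terms)
                          for cat, terms in CATEGORY_TERMS.items()})
        best, hits = scores.most_common(1)[0]
        return best if hits else "Unclassified"   # suggest, never silently force, a value

    print(suggest_category("This Agreement is made between Party A and Party B ..."))
    # -> Contract (a suggestion the user can confirm or override)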

Capture Planning and Preparation

Among the first considerations of information capture has to be the original sources of your information. Sources can include the likes of hard copy, emails, faxes, Web content, and information contained on mobile devices -- a broad array of media types and formats that must be addressed, each in its turn.

Another major consideration has to do with how you will balance your budget, resources, and available time to get the job done. This is true of every project, of course. But in the context of capture, a big part of the answer depends upon the point from which you want to start capturing information:

Everything that has come before, or just part? (known as complete or partial backfile conversion)

Day-forward, meaning everything from today onward or from a certain date in the future?

Day-forward with on-demand, meaning everything from a certain date plus older material as it is needed?

One good starting point is to investigate which documents users need to access often or simultaneously, and which they require quick access to. Also think about whether you want to capture just the documents or need to take some or all of any existing metadata as well, and don't forget to factor in whether or how much of the content is handwritten, printed text, forms-based, or barcoded -- these all will have an effect on the nature and size of your project.

Also high on the list is developing a strategy for migrating information. Migration has to do with moving information between storage types, formats, or computer systems. But sometimes it may make sense to leave some of it where it lives already, and that decision has to be made as part of your planning process.

For instance, your organization may not possess the skills required to prepare data in one database for reading, importing, or processing by another -- or it may be just too expensive to outsource or too time-consuming even to think about doing. So in a case like that, you may well choose just to leave the data where it is, and use links or other means of integration to connect them up for search and analysis purposes.

Another reason to leave them in place is that the information in question may be too active to take offline for the time needed to perform the necessary Extract, Transform, and Load functions that are part-and-parcel of this activity. Or it may be that the system responsible for certain data is so well optimized for the job that it doesn't make any sense to subtract it from the equation. This is often the case with specialized insurance or medical solutions, to name just two.

How to ensure a certain level of content quality -- and handle the
instances in which something goes awry, as will be inevitable -- is
another piece of the puzzle that organizations are well advised to
think about before problems arise!
One of the hardest parts of this is deciding what level of performance will be deemed acceptable: 80% trouble-free? 90%? 95%? One thing is for sure: you won't achieve 100% perfection no matter what you do! But the ultra-high 90s are within reach if you take the time to clean and tag your content properly.

The other side of the quality question relates to scanning, which involves the same complexities as data migration in terms of properly handling the data after it has been extracted from the image.

But before that, you have to take care to optimize the system's opportunity to produce clean information by allowing for image enhancement -- rotating pages if need be, deskewing them if they're crooked, despeckling them to remove stray spots or any "shadows" left by dust so they don't interfere with the OCR or other extraction technique used, and so forth.
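For a sense of what that clean-up looks like in practice, here is a minimal sketch using the Pillow imaging library -- an assumed choice, with a hard-coded skew angle; production capture software detects the skew and speckle characteristics automatically.

    # Hypothetical image clean-up pass before OCR: grayscale, deskew, despeckle.
    from PIL import Image, ImageFilter

    img = Image.open("scanned_page.png")                 # assumed input file
    img = img.convert("L")                               # grayscale often OCRs better than color
    img = img.rotate(-1.5, expand=True, fillcolor=255)   # deskew; the angle would normally be detected
    img = img.filter(ImageFilter.MedianFilter(size=3))   # despeckle stray dots and dust shadows
    img.save("scanned_page_clean.png")                   # hand the cleaned image to the OCR engine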

A lot of this can be readily taken care of by the systems themselves according to parameters you set. But at the end of the day, someone has to sit at a station somewhere and look at at least a random sample of the images and data coming through to ensure the output is what was expected -- and a workflow must be established to route any exceptions the system finds by itself to whomever will be fixing them.

And then there are physical preparations to be made, particularly in an imaging environment.

The documents themselves have to be rounded up, of course, and a note to that effect left in the storage boxes or file cabinets from whence they were retrieved so other people know what happened to them.



They then should be sorted into logical batches to help the identification and tagging process by minimizing the chances that mixing and matching will occur.

Any envelopes need to be opened, and their contents -- as
well as any other material -- must be unfolded before being
sent through the scanner.

And don't forget to remove any staples or paper clips along the way so they don't jam up or break the hardware!

Integration Techniques

The goal is to transfer captured content into the repository in which it
will be stored -- or in those instances in which the decision has been
made to continue to manage sections of content in their original
places, integrating those "places" into the overall flow of information
so the solution appears and operates as a single entity.

Most of the major entries in the market make the importing of individual documents fairly straightforward, using pulldown menus or something similar, and simple spaces in which you can ensure the metadata and database fields match up. This only works, remember, if you've already done the heavy master data management work needed to clean and reconcile them.

An "ancient" way to facilitate data sharing, ODBC (Open Database Connectivity) dates to 1992 and accomplishes platform and language independence by using a special driver as a translation layer between the application and the DBMS. The application thus only needs to know ODBC syntax, and the driver can then pass the query to the DBMS in its native format, returning the data in a format the application can understand. This is the database equivalent of what Windows did for office printing years ago, eliminating the need for software vendors to develop different drivers for each and every printer, and instead just develop one, for Windows.
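To show how little the application needs to know about the back end, here is a minimal sketch using the pyodbc module; the driver name, server, table, and credentials are assumptions, and swapping databases largely means swapping the connection string.

    # Hypothetical ODBC query: the application speaks ODBC, the driver speaks the DBMS's dialect.
    import pyodbc

    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};"   # assumed driver; another DBMS = another driver
        "SERVER=db.example.com;DATABASE=records;"
        "UID=report_user;PWD=secret"
    )
    cursor = conn.cursor()
    cursor.execute("SELECT doc_id, title FROM documents WHERE doc_type = ?", "contract")
    for doc_id, title in cursor.fetchall():
        print(doc_id, title)                        # the same code runs against any ODBC-backed database
    conn.close()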

XML has a home here as well. Short for Extensible Markup Language, it is a method of encoding documents, and parts of documents, so they can be more easily searched and parsed.

Although XML focuses mainly on documents, it is also widely used to represent data structures, and many application programming interfaces (APIs) have been developed so software can process XML data. Web services, too, can rely heavily on it.
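A small example of what "easily searched and parsed" means in practice: Python's standard xml.etree.ElementTree module walking a snippet of invented document metadata.

    # Parsing a small (invented) XML metadata record with the standard library.
    import xml.etree.ElementTree as ET

    record = """
    <document id="D-1001">
      <title>Purchase Agreement</title>
      <author>J. Smith</author>
      <date>2012-05-14</date>
    </document>
    """

    doc = ET.fromstring(record.strip())
    print(doc.get("id"))                 # attribute access: D-1001
    print(doc.find("title").text)        # element access: Purchase Agreement
    for child in doc:                    # the structure is walkable, hence searchable
        print(child.tag, "=", child.text)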

Web services are defined by the W3C -- the people who brought you the World Wide Web -- as "a software system designed to support interoperable machine-to-machine interaction over a network." Or, put into human terms, they represent a standard way to get computing systems to talk to one another.

So-called "big Web services" communicate using XML messages
that adhere to a popular standard called SOAP, which stands for
Simple Object Access Protocol and is used to exchange structured
information between systems.

"Web APIs," on the other hand, are moving away from SOAP-based
communications and toward REST -- Representational State
Transfer. These allow the combination of multiple Web services into
new Web 2.0 applications known as mashups, and they do not
require XML either. So clearly this is a moving and advancing tar-
get, and much of it comes home to roost where software- and



infrastructure-as-a-service -- the "cloud," to you and me --
meets the enterprise.
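As a flavor of the REST style, here is a minimal sketch using the requests library against an invented repository endpoint -- the URL, parameters, and response fields are illustrative assumptions only.

    # Hypothetical REST call: fetch document metadata as JSON over plain HTTP.
    import requests

    response = requests.get(
        "https://repository.example.com/api/documents",   # invented endpoint
        params={"type": "invoice", "limit": 5},
        timeout=10,
    )
    response.raise_for_status()
    for doc in response.json():                            # assumed JSON list of documents
        print(doc.get("id"), doc.get("title"))
    # No XML envelope, no SOAP toolkit -- just a URL, a verb, and a structured response.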

There is one subset of Web services that is aimed specifically at improving interoperability between content management solutions. Leveraging both SOAP and REST, CMIS (Content Management Interoperability Services) uses Web services and Web 2.0 interfaces to enable rich information to be shared across Internet protocols in vendor-neutral formats, among document systems, publishers, and repositories, within one enterprise and between companies. Not yet in universal use, it has advanced the cause to a significant degree and is expected to continue to gain traction.

Application programming interfaces (APIs) are bits of programmatic code that facilitate interactions between software programs in the same way that graphical user interfaces do so between people and computers. Not all vendors open theirs up for customer use, desiring instead to be paid for these sorts of integration services. But this has changed some over the past bunch of years, and it's definitely something worth asking about if your organization has the skills in house to take advantage of them.

It is especially worth inquiring about if the vendor offers a software development kit (SDK), a set of development tools built specifically to ease the work surrounding a particular application. These often are offered for free as a way to encourage organizations to buy the software with which they are associated, and besides an API, they may include debugging aids, sample code, and supporting technical notes or documentation.

Enterprise Application Integration (EAI) takes this programming option one step further by creating a single interface point for multiple back-end data sources or applications. Focused on system-to-system integration, it includes plans, methods, and tools for consolidating and coordinating solutions so they can act as one.

Some early EAI solutions were simply preconfigured connectors between specific applications that saved the work of doing the development yourself. As things progressed, though, EAI really became a platform into which business applications could plug directly using specific pre-built adapters, thus making data transformation a lot more straightforward, and putting it squarely in the realm of information capture in more complex situations.



Compression Techniques and Formats

Compression is a way to reduce content storage and bandwidth requirements, thereby reducing the need for storage space, enabling faster transmission, and even creating a whole new market, as the MP3 format did for portable music players. The process entails using specific encoding schemes that use fewer bits (or other information-bearing units) to encode information than are used by unencoded representations.

Some schemes are reversible so that the original data can be reconstructed; this is called lossless compression. This is opposed to other schemes that accept some loss of data in order to achieve higher compression; these, perhaps unsurprisingly, are called lossy.
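The lossless case is easy to demonstrate: with Python's built-in zlib module, the decompressed bytes come back identical to the original -- exactly the property a lossy codec gives up in exchange for a smaller file.

    # Lossless compression round-trip: what goes in comes back out, bit for bit.
    import zlib

    original = b"The quick brown fox jumps over the lazy dog. " * 100
    compressed = zlib.compress(original, 9)
    restored = zlib.decompress(compressed)

    print(len(original), "->", len(compressed), "bytes")   # substantial size reduction
    print(restored == original)                            # True: nothing was lost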

In an age in which sending email attachments is as natural as posting a letter used to be, the idea of "zipping" a file to make it smaller is fairly well understood. But it's important to remember what is happening to your files, because compression can reduce the quality of your content, introduce errors that could prove to be catastrophic, or open the door to legal challenges if it raises questions regarding integrity or ownership. And especially at an enterprise level, none of these promises to have a happy outcome.

Compression formats come in many varieties. In the document world, the most commonly used compression file format is known as "ZIP." In use and widely available since the late 1980s, it gets its name, as most of the formats do, from the file extension it carries -- in this case, "dot-zip."

Another prominent type is the RAR file, which differs from ZIP primarily because it compresses better and can break a single RAR archive up into multiple files to facilitate compliance with email or uploading filesize constraints. It is much less common than ZIP, however, so you want to take care in terms of under which circumstances and for whom you want to use it.

MP3 plays the same role but for audio files -- a "document" of a different type! -- and is noted for its ability to maintain the quality of sound despite being compressed. For video, the likes of MP4, MPG, WMV, and AVI are among the more popular.

From an organizational standpoint, PDF is perhaps the most widely used standard document format around -- which is ironic, because it was developed as a proprietary product by Adobe and was only adopted as an official standard in 2008 (ISO 32000-1:2008, to be specific). This so-called "Portable Document Format" is notable for its ability to faithfully represent documents created in pretty much any application, but to do so independent of that original application, any hardware, or operating system. Because it maintains the same page arrangement, fonts, colors, and pretty much all other characteristics of the original -- and because most browsers either bake in or make easy the ability to read PDF files, with or without the official Adobe Reader -- PDF is frequently the format of choice for capturing documents that will be presented on the Web.

XPS is Microsoft's entry into the same sweepstakes. A functional quasi-competitor to PDF, it stands for "XML Paper Specification," utilizes ZIP compression, and can support XML versioning and extensibility. It does not, however, support dynamic content, such as content contained in a drop-down menu on a form. Still, it has been accepted as a standard as well, in the form of ECMA 388.

On the image side of the house, TIFF -- the Tagged Image File Format -- is a popular lossless format that is good for archiving because files may be edited and saved without any loss of image quality. Tags may also be used to handle multiple images and data within a single file. File sizes can be large, however, especially for color images.

Anyone who has spent any time with a digital camera will recognize the next acronym on our list: JPG. It actually stands for something too -- "Joint Photographic Experts Group" -- but because it is a lossy format, it is better suited for photographs than for text or images that have to be lossless. Newer versions of JPG have addressed this, however. JPEG2000 is based on wavelet compression and is applicable to modern digital imaging cases like digital cameras, but also pre-press and medical imaging. JPEG2000 Part 1 (ISO 15444-1) offers lossless and lossy compression and better image quality at smaller file sizes. JPEG2000 Part 6 (ISO 15444-6, for those keeping score at home) compresses scanned color documents with bitonal elements as well as images.


PNG -- short for Portable Network Graphics -- is a bitmapped image format that employs lossless data compression. Created to improve upon and replace GIF (Graphics Interchange Format) as an image-file format, it was designed for transferring images on the Internet, not for professional-quality print graphics. As such, it does not support non-RGB color spaces such as CMYK, but it can be used instead of JPG for line drawings and text not requiring higher resolutions.

GIF is also a lossless bitmap image format, but a very old one in the scheme of things. Introduced by CompuServe in 1987, its support and portability made it a fixture on the early Web, but its technical color limitations make it unsuitable for reproducing photographs and other images with continuous color. It is still well suited for simpler images like graphics or logos with solid areas of color, though.

Capture Process Mapping and Shared Drive Cleanup

"Capture" is a lot broader than just "imaging." Some of the major
steps to consider involve the following:

Identify

The information you want to capture

Doc type, line of business, geographic region, etc.

The format it is in/medium is it on (paper, PDF, database, etc.)
Where the information can be found, how it can be accessed
and/or delivered to you

Sort

Separate -- either physically or virtually -- into logical batches for efficiency and "purity" (e.g., claim forms vs. policy documents vs. customer contact and beneficiary information, perhaps, in an insurance context)

Insert separator sheets -- either physically or virtually -- with patch codes or barcodes to distinguish the start of a new batch of documents

This represents a lot of extra work and cost (for people and supplies) for organizations manually sorting physical paper, and it can be mitigated by electronically classifying documents prior to, rather than after, scanning them.

Prepare

For data: Cleanse, extract, transform

For hard copy: open, unfold, remove paper clips/staples; insert
separator sheets

Capture

For data: load

For hard copy: scan
Validate

Test/inspect for accuracy and integrity




Correct as necessary
Classify/Index

Import/apply metatags to enable search and retrieval
Store and Distribute

Make the information available to users

Simply identifying these steps isn't enough, however -- in addition, it is critical that you assign resources to each one, and set expectations for how long it should take. Then the whole thing can be drawn out as a process map, and tracked and amended as circumstances dictate. A simple sketch of such a map in code form follows.
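The data structure below is a minimal, assumed way of recording that process map -- each step with an owner and a time expectation -- so it can be tracked and amended; a real BPM or capture tool would hold the same information in its own form. The owners and hour figures are invented.

    # Hypothetical capture process map: each step gets an owner and a time expectation.
    from dataclasses import dataclass

    @dataclass
    class Step:
        name: str
        owner: str          # the resource assigned to the step
        target_hours: float # how long the step should take

    process_map = [
        Step("Identify",         owner="Records analyst",    target_hours=4),
        Step("Sort",             owner="Mailroom team",      target_hours=2),
        Step("Prepare",          owner="Mailroom team",      target_hours=2),
        Step("Capture",          owner="Scanning operator",  target_hours=3),
        Step("Validate",         owner="QA reviewer",        target_hours=1),
        Step("Classify/Index",   owner="Records analyst",    target_hours=2),
        Step("Store/Distribute", owner="System (automated)", target_hours=0.5),
    ]

    for step in process_map:                       # the map can be reviewed and amended over time
        print(f"{step.name:<18} {step.owner:<20} {step.target_hours}h")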

Let's put some of this into a practical context: specifically, the cleaning up of your shared directories. Practically every organization has them -- those network folders that everybody can get to, and into which everybody puts documents so everybody else can get to them. In the beginning, there probably was some rhyme and reason behind the way they were set up and labeled. But now, finding particular documents is well nigh impossible, to the point where it's easier and faster to recreate one than to look for it. So the problem propagates as these new recreations also get socked away, and it's far worse than needles in a haystack, because needles at least look different than haystacks do.

The solution is to throw some serious process at the problem by mapping out and following particular steps, in order, and with conviction. It's neither fun nor quick, but it does work, and it boils down to this handful of tasks, each specifically defined in terms of what, who, and by when.

Assign responsibility for every main folder -- with responsibility comes accountability, one key to getting work done

Conduct a document inventory to understand: 1) what kinds of documents they are; 2) what projects they relate to; 3) what departments they come from; 4) who created them and is most likely to need them; 5) how long it's been since they were last accessed or updated; and thus 6) what should be saved and what may be a candidate for deletion

Organize folders by functions or activities, not people, and tag them accordingly. Why? Because while people leave or change roles, functions remain the same or similar, and organizing and tagging by function sets the stage for interoperability with, or injection into, an information management system

Identify and apply proper security controls

Identify and apply proper retention schedules

Key Links:

Background information on the CIP.
Practice Test -- Do a Self-Assessment.
Free videos to prepare for the test.
White paper on the CIP.
Register for the test.
Contact for more information: jwilkins [at] aiim.org



SECTION 2

BPM


In This Section...

1. Process Improvement Technologies
2. Relating Process and Information Management
3. Analysis Techniques
4. Process Auditing
5. Routing by Roles and Responsibilities
6. BPM and BPR














Process Improvement Technologies

The simplest process improvement technology is cleverly known as routing, or simple workflow. It moves content -- very often in the form of conventional documents -- from one place or person to another, and when task A is complete, it allows for task B to begin. Routing tends to be ad hoc, without any automated rules processing, and with little or no integration between the process management and the affected applications. Instead, it is pretty much person-to-person.

Workflow is more than just simply moving things from A to B to C to D, because it allows tasks to be carried out in parallel, saving time and increasing productivity. Able to manage multiple processes taking place at the same time, it accommodates exceptions and conditions by applying user-defined rules.

Workflow also generally includes a graphical process designer with which users can chart and refine the way they want their processes to flow, to whom, and under what time constraints or other conditions.

BPM itself is perhaps the "ultra" process improvement technique because it explicitly addresses the complexity of inter-application and cross-repository processes, and incorporates data-driven as well as content-driven processes -- all on an ongoing basis. Usually driven by business rules, it involves a lot of operational analysis and flow charting, and the more sophisticated offerings in the space include not only process designers, but also simulation tools so processes can be run virtually to identify bottlenecks or other issues related to either people or underlying infrastructure.

The trick, of course, is to figure out which of these major process
automation tools makes the most sense for you. Imagining them
as existing along a continuum of sophistication can help this
thinking along, for it helps to illustrate how routing would be best
suited for straightforward document approval processes, say,
especially in smaller organizations.

On the other hand, it would be wholly unsuitable for something like a major insurance claims process that involves multiple documents, document types, people, departments, and systems, all working in parallel and yet relying on each other for information so a rapid conclusion can be reached. Here, BPM would be the right call -- though just to make life more interesting, a smaller operation might do just fine with a high-end workflow solution. At the end of the day, the separations between the categories are not hard and fast. So as it always does, picking the right one means truly understanding your own situation first.

Business activity monitoring is closely related to business intelligence because it involves analyzing and displaying data. However, the data it analyzes and displays relates not to what is housed in the organization's information repositories, but rather to the business process being monitored. This data is typically displayed on a dashboard or in reports. This is an important element of process improvement technology because it's vital to know what a process's status is, who has what information, when they got it, and what's supposed to happen to it next, and by when. In each case "it" is the process or an element of the process.

Transactional content management targets processes that focus on enacting business or bringing about a decision or end result. These processes are not focused on creating content, but on using content to help drive actions and decisions. Examples include invoice processing, application processing, employee onboarding, accounts payable, insurance claims, patient charts, and the processing of permits and loans.

We talk about transactional content management as a process technology because it usually requires a lot of workflow configuration and integration with other systems. Being content-oriented, though, it just as easily could have been listed under content management, as the name suggests. For charting purposes, think of it as occupying a space right along the border between the process and content disciplines -- which are fairly tightly tied together anyway.


Sometimes known as adaptive or dynamic case management, advanced case management endeavors to improve the performance of an organization by putting case information front-and-center rather than considering the process as primary, the way workflow and BPM do. Such information will be accessed over the entire length of time the case is open, and in many instances, it will become the official record for that work.

A "case," of course, is a compendium of information, processes, advanced analytics, business rules, collaboration, and sometimes social computing that relates to a particular interaction with or issue involving a particular party like a customer, supplier, patient, student, or defendant. Case management solutions are designed to manage all this to help drive more successful, optimized outcomes.

Relating Process and Information Management

Business reality is that process and information management are very closely related. Most processes exist to facilitate the transport of some form of content from Point A to Point B, and content without a means to get there is pretty well devoid of business value. So when considering changing one, it is important to consider the likely effects on the other at the same time. Let's look at a real-life example to bring this into focus.

A long time ago, the way banks processed cashed checks was as follows:

A courier would ride around and collect the paper checks received by tellers all over town.

These checks would then be brought to the proof department, where they would be run through the MICR machine to read the bank's routing and account numbers.

An operator would then key the dollar amount into the bank's mainframe computer, which in turn communicated with the issuing bank to verify the transaction.

From here, the checks would be microfilmed, and the films would be archived. The paper checks themselves would go to the Federal Reserve Bank for clearing and return.

Today's emerging model is much smoother and quicker:

A specialty scanner at the teller's window is used to image the check right there at the first point of customer contact, and the image is immediately stored away.

The scanner reads the MICR code and the amount, and electronically communicates with the issuing bank for verification.

No more courier, no more proof department, no more microfilm. Thanks to Check 21 (the Check Clearing for the 21st Century Act), the paper check itself is no longer even needed -- not even by banks, which increasingly are letting customers use their cellphone cameras to image and submit the checks they're depositing. The point of this story is that moving the scanner to the start of the process shortens the process itself and changes the way the information is being managed -- an effect that can be seen in pretty much any process that involves almost any kind of information.

Now, not every example is as dramatic as this one is, but the dynamic it illustrates really shows how interconnected process and information really are. It also highlights another interesting truism, which is that an organization's technology can shed light on what its processes are -- an especially important fact for those times when no one really knows how things work, only that they just do!

Let's see what we can discern from our old-time banking example. Looking purely at the technologies involved and the purposes they serve, we see:

A courier used for transport

MICR readers for data extraction

Key from paper for information capture, and

Microfilm for archiving purposes

Today, we might find:

A scanner for data extraction and information capture

The Internet for transport, and

An image repository for archiving

From a process standpoint, both scenarios indicate that information is being sent from the branches to a central location, critical account metadata is being automatically identified and entered, critical transaction information is being captured, and the source document is being preserved.

Analysis Techniques

Business process analysis involves decomposing a process to understand its basic tasks, routes, and rules, and to chart its performance in terms of resources consumed, time taken, etc. The idea is to uncover the inevitable inefficiencies that creep in so bottlenecks, loops, and any unnecessary work can be eliminated. The analytical process starts, of course, with collecting information about the business process you're studying, and it ought to involve a series of different activities. One of the more important ones is to review any existing documentation that directly or indirectly illuminates how work gets done in the business unit or organization you're focusing on. Even where particular activities appear chaotic and no procedure manuals are available, individuals often keep notes to remind them how to do things, and this can be a great soup-starter for your analysis task.

Personal interviews are also proven effective, either one-on-one or in small group settings -- or both! Organizational culture is a major contributor to the effectiveness of one vs. the other, as people who feel free to talk obviously will do so much more readily than those who wonder whether their input will be viewed as criticism of senior management.

Another very good method is to shadow and observe a process in action. It feels strange at first because you literally sit and watch how people work, and make notes about what you see. In the beginning, some people will do things the way they think you want to see them done. But after a while they revert to their usual habits, and it can be interesting to see how what they tell you in an interview differs from what actually takes place. At the end of the day, the reason you're collecting all that good insight is to write it all down so you can analyze and improve upon it.

One of the best ways of doing this is to draw a map of how you understand the process flows, and to confirm your view with some of the people who take part in it.

Flowcharting is the more basic of the two major varieties of maps in use. A simple technique, it calls for sketching out the order and flow of activities within an organization, and creating a graphic of the sequence and key elements.

Process models are more advanced, as they take a flowchart's information and add data from other sources to flesh out the diagram. The end result is a detailed construction of what occurs in each step of a process and how different processes link together. Capturing this kind of detail means process models also can be used to support the simulation of flows to check for efficiencies and bottlenecks, and to support the future monitoring of improved processes.

Properly drawing process maps is not simply a matter of drawing boxes on a screen and connecting them up with little arrows -- in fact, the activity is laden with notational shorthands to make very clear just what is happening where in the process. The standard used for this is called Business Process Model and Notation (BPMN), a graphical representation for specifying business processes in a business process model. Developed by the Business Process Management Initiative (BPMI), it has been maintained by the Object Management Group since the two organizations merged in 2005. BPMN uses specific symbols to represent specific process elements:

Flow Objects, including Events (start, end), Activities (tasks), and Gateways (process forks and merges)

Connecting Objects, including Sequence Flows (the flow order), Message Flows (communications across organizational boundaries), and Associations (inputs and results)

Swimlanes, including Pools (major organizational participants in a process) and Lanes (indicating activities within a pool according to function or role), and

Artifacts, including Data Objects (showing the reader which data is required or produced in an activity), Groups (of activities), and Annotations (for clarity)

BPMN can be used as a graphical front-end to something called BPEL, or Business Process Execution Language, which is an OASIS-standard XML-based language for actually executing business process actions via Web services. Because both are still emerging in terms of practical implementation, this tight coupling is not yet a given, though it is representative of where the future of process analysis and automation lies.

Process Auditing

In April of 2007, management guru Michael Hammer wrote in the Harvard Business Review that a "revamped business process needs employees to focus on a broad, common outcome; if the organization measures performance as it has always done, it will reward people for focusing on narrow, functional goals. How can the process live up to its potential under those circumstances?" With that, Hammer introduced the notion of the formal Process Audit, which picks up and runs with the concept by focusing on outcomes, rather than procedures, which are very task-oriented.

Taking it to the next step, Hammer then codified his thinking into what he called the Process and Enterprise Maturity Model (PEMM), a framework for evaluating an organization's position on the process-improvement scale. This model encompasses five process enablers, which pertain to individual processes, and four enterprise capabilities, which apply to entire organizations.

The process enablers are:

Design: How comprehensively the way the process is to be executed is specified

Performers: The people who execute the process, particularly in terms of their skills and knowledge

Owner: A senior executive who has responsibility for the process and its results

Infrastructure: The information and management systems that support the process

Metrics: The measures the company uses to track the process's performance

The enterprise capabilities are:

Leadership: The senior executives who support the creation of
processes
Culture: The values of customer focus, teamwork, personal ac-
countability, and a willingness to change
Expertise: Skills in, and methodology for, process redesign
Governance: Mechanisms for managing complex projects and
change initiatives

These items are notable not just for their effect on process outcomes, but also, more directly in our context, for how well they map to long-accepted best practices for information management, like:

Securing an executive sponsor

Focusing on governance

Measuring wherever possible

Paying heed to the people and not just the technology

Changing the organizational culture

All of these are accepted as givens by most thought leaders --
even though they are not always adhered to.

Hammer used a spreadsheet to capture his audit information. Basically, for each cell outlining the maturity level of a particular characteristic, the analyst conducting the audit applied a color code in the corresponding box to the right, according to the degree to which he believed the characteristic to be true. The placement and number of red marks made it immediately apparent where the major points of focus ought to be.

Hammer's isn't the only model in existence, of course. BearingPoint developed one a few years before Hammer's; it takes a somewhat more infrastructural approach but asks the same basic questions: how ready is your organization to take process improvement to heart, and where do you go from here?


The gurus at consulting firm Transition Support offer yet another variation on the theme. Directed by David Hoyle and John Thompson, the company offers a number of key points to capture while reviewing, improving, and measuring processes.

Identify the process objectives and the factors affecting success

Establish how the objectives will be achieved and verify that appropriate controls are in place

Establish the competences and capabilities required to deliver the process outputs, and verify that they are being assessed effectively

Establish what results are being achieved and how they are being measured, and verify their integrity

Establish that performance, efficiency, and effectiveness are being reviewed and pursued

All of these incorporate the same themes -- measurement, technology, people -- but in a slightly different style.

Routing by Roles and Responsibilities

Roles, of course, define the function people perform for their employers: marketing, accounting, technology management, etc. Generally speaking, they align with the organizational structure -- and when they do, information managers say a word of thanks (or at least, they should), because the enterprise's telephone and systems directories often are managed the same way, through functions like Active Directory and LDAP (the Lightweight Directory Access Protocol). And where they exist, they are ready and waiting to be leveraged by any workflow or BPM system worth its salt.
Responsibilities, on the other hand, involve the things people are accountable for getting done: writing press releases, preparing the quarterly statements, supporting users, etc. Here, the delineations may not follow the org chart as closely since, for example, product marketers can live in the lines of business while corporate marketers occupy staff positions at HQ, both at the same time.

Understanding and applying roles and responsibilities as a routing tool is important because these attributes provide ready "handles" to use to "steer" processes at a more macro level than a specific person.

For example, even a fairly simple workflow system can be set up to track activities by due date, and to send alerts to process participants and supervisors when a deadline is missed. But as you move up the spectrum of sophistication, the system can also automatically reroute the work to someone else who plays the same role or has the same responsibility.

This same capability comes into play when the system is notified that someone is out on vacation, or has left the organization -- either directly by a user or the HR department, or automatically from changes made to the enterprise directory (remember LDAP?). Thus, processes don't have to be "reprogrammed" every time a person comes or goes; they can simply incorporate or exclude people as necessary according to the rules that have been established. A minimal sketch of that kind of role-based routing follows.

This automation can be extended by using timestamps as triggers of "next steps," just as the deadline information is used in the example we just discussed. In this case, the logging of a completed step -- say, the approval of a brochure, or the uploading of a new contract -- would kick off the next round of activity without need for any human intervention.

Where this gets interesting is when processes kick off other processes -- as when receipt of an order from a catalog sends an automated "thank you for your order" message and sets the inventory picking work in motion. In most modern organizations, all of this is required to happen within a very short period of time, and the ability to track to the second when different processes begin and end is an effective management tool indeed. Never mind that it plays directly into making life just a little easier for those in the affected roles and/or with the involved responsibilities.

This raises yet another interesting benefit of routing via roles and responsibilities: namely, the ability to foster and manage several processes at once, rather than having to tend them one at a time as human beings necessarily do. Being able to split the work onto parallel tracks -- such as sending the automated "thank you" note while the pick list is generated and perhaps the shipping label prepared as well -- clearly can slash a process timetable compared to having it be performed in sequence. This is one of the most dramatic (if obvious) advantages of using workflow and BPM, and the key to making it work is basing it on roles and responsibilities, not individuals.

BPM and BPR

Earlier in this domain, we called business process management (BPM) the "ultra" process improvement technique because it explicitly addresses the complexity of inter-application and cross-repository processes, and incorporates data as well as content -- all on an ongoing basis. Well, business process reengineering (BPR) is much the same thing. Also known as business process redesign, business transformation, and business process change management, it differs mostly because it places somewhat more emphasis on the "R" part -- the reengineering -- than the "M" (the management).

Strictly speaking, what this means is that BPR is about making big one-time changes in how work gets done and decisions get made, while BPM is about making more, smaller changes over time. In practical fact, the difference has come to be much less stark than it once was -- perhaps because at one point "BPR" had come to mean "downsizing" to many -- but the notion of "permanent fix" vs. "iterative improvement" persists in describing these two ways to pursue better future states.

BPR is rooted in the early 1990s, when management gurus Michael Hammer and James Champy published their seminal book Reengineering the Corporation, which basically said that organizations must radically remake themselves and make better use of information technology in order to become more competitive. As a management "manifesto," it strikes many of the same themes that we see in present-day BPM philosophy. Where there's a difference, it's that -- in the authors' own words -- reengineering is "an intensive, top-down vision-driven effort that requires non-stop senior management participation and support." BPM, on the other hand, can be driven from the middle or the bottom of the organization, though having a senior sponsor at some point does become more than just a little useful.

BPM and BPR also share a focus on the participants in an organization's business processes -- the people we'd call users in technology terms. In both cases, much attention is paid to how they actually do their work -- not merely how they think they do -- and in what ways changes can be made to make it better. BPR, though, is predicated on taking this one step further than BPM, as it has within its philosophical charter the potential to actually reorganize the organization itself. BPM activities can lead to this, but when it happens, it tends to be more of a byproduct than an articulated potential outcome.

Another manifestation of this heavy-handedness, if you will, is a reflection of BPR's top-down character, which tends to leave users feeling it is happening TO them rather than FOR them. Either condition can, of course, exist in either BPR or BPM, so the distinction may be of little practical import. But it is something of a definitional difference between them.

And then there's the focus on the flow of the work and information those users engage with every day, which has to be captured and studied before it can be improved. This may be where BPM and BPR align most closely, as both are process-oriented practices at base. They can make excellent use of the same mapping and analysis tools, and they share the goal of boosting efficiency and effectiveness.

If there's a formal difference, it likely lies in the realm of
information technology, which is called out fairly specifically in
the BPR dogma but is more ancillary to BPM. Again, as a matter of
practical fact, good programs of both kinds will place IT
front-and-center, so the impact on best practices is nil.

Perhaps the most controversial area the two approaches touch upon
is staffing. At the end of the day, each is aiming to install
processes that are as streamlined as possible -- a goal that, if
taken to its extreme, suggests an inevitable reduction in force. In
fact, it is this very brush that tainted BPR for much of the 1990s
and early 2000s, when people would say they'd been "reengineered"
out of a job.

The truth is, though, that neither BPR nor BPM has to lead to this
outcome. Much depends on how poorly orchestrated an organization is
to begin with, but many times the result is greater productivity, a
redeployment of existing staff, the ability to delay new hiring
following a departure, or some combination of the three.

Key Links:

Background information on the CIP.
Practice Test -- Do a Self-Assessment.
Free videos to prepare for the test.
White paper on the CIP.
Register for the test.
Contact for more information: jwilkins [at] aiim.org


SECTION 3

Knowledge Management


In This Section...

1. Knowledge Management































Knowledge Management

Like many of the practices covered by this Guide, knowledge
management is an old story that is becoming new again, thanks in no
small part to advances in business process and collaborative
computing technology. In a nutshell, it is the practice of
systematically -- and continually, for it's a practice, not a
project -- capturing, controlling, and disseminating organizational
intelligence among its workers; in other words, what the
organization knows by virtue of its people. While this sounds
eminently logical and straightforward, the practicality is quite
challenging because most of this knowledge is locked in people's
heads -- so-called tacit knowledge, vs. explicit knowledge, which is
written down in the form of procedure manuals and the like.

Managing knowledge often begins with what is known as a knowledge
audit, which is an investigation into where knowledge is produced,
where there may be need for further input, and where knowledge
transfer is required. Expressed less formally, it's a dedicated
exercise aimed at codifying who knows what about what, what nobody
seems to know enough about, and where it would be valuable for
people to share what they know.

Now, because people often don't even know what they know, unlocking
and codifying it can be, and is, difficult indeed. So myriad
methodologies have arisen over the years to provide guidance in this
regard. We've found the Knowledge Acquisition Unified Framework by
Dr. Tony Rhem to be particularly illuminating, so let's take a
moment here to run down the steps.

Define domain knowledge: which department/division/location is
involved, and what knowledge is pervasive there? Use a knowledge
map, also known as expertise location (see the sketch after this
list), and include customers/clients, who may have wholly different
sets of knowledge than internal people do.

Decompose the knowledge domain: parse the task among subject matter
experts, managers, etc., and use a taxonomy as a guide to how to
organize the collected knowledge.

Determine interdependencies: identify which information and which
people rely on input from others, and reconcile any inconsistencies.

Recognize knowledge patterns: analyze with an eye toward a process
of Connect, Collect, Catalogue, and Reuse, and pay attention to
patterns that emerge to make your job more efficient.

Determine judgments in knowledge: i.e., separate fact from opinion,
objective from subjective.

Perform conflict resolution: where findings are fuzzy, determine how
to achieve clarity and/or when to eliminate an element from
consideration.

Capture/catalog the knowledge: via interviews, wikis, online forms,
and other means (also known as knowledge sharing).
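To make the "knowledge map" idea in the first step concrete, here is
a minimal sketch in Python of how who-knows-what findings from an
audit might be captured so gaps show up quickly. The names and
topics are invented placeholders, not real audit data.

    # Minimal knowledge-map sketch: map topics to the people who know them.
    # Names and topics below are illustrative placeholders only.
    knowledge_map = {
        "records retention schedules": ["A. Rivera"],
        "invoice processing workflow": ["B. Chen", "C. Okafor"],
        "legacy archive system":       [],   # nobody identified -- a transfer risk
    }

    # Flag topics with no identified expert, or only one (a single point of failure).
    for topic, experts in knowledge_map.items():
        if not experts:
            print(f"GAP: no one identified for '{topic}'")
        elif len(experts) == 1:
            print(f"RISK: only {experts[0]} knows about '{topic}'")

Even a structure this simple makes the "what nobody seems to know
enough about" question answerable at a glance.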

The preceding serves as an effective guide to identifying and
gathering knowledge from around the organization. Actually
disseminating it, however, requires not one but a collection of
technologies, the precise makeup of which depends upon the nature of
your business, your staff, and your existing infrastructure. There
are a few essentials, though, that you'll want to look at and
install in some combination or another, including:

Social computing applications like wikis, blogs, and shared
bookmarks: YouTube, Digg

Online presence tools for instant communication: Yammer (corporate
Twitter, basically), Skype, Digsby (instant messaging)

Collaborative workspaces for project participation: eRoom from
Documentum, Box.net to a degree

Workflow or BPM systems that can automatically link you, perhaps
through LDAP or other directories, to the right people even if you
don't know who they are

One of the larger challenges is finding ways to enable people to
efficiently share their tacit knowledge without forcing them to
endure the time and pain of making it explicit -- in other words, to
expose them to mechanisms through which their knowledge can simply
"come out" during the natural course of events.

This has the added benefit of leaving them feeling empowered as
valued contributors, and wanting to contribute more, rather than
feeling like they are being forced to give up what they know and
thus relinquish their organizational power.

Web conferencing is one way this can occur, as the electronic coming
together for a common purpose allows for all contributions to be
automatically captured if it is set up that way. Thus, anything
anyone says, types, or otherwise communicates can be stored and
mined for knowledge later on -- without having anyone feel put upon,
or simply silly, as they fumble around in an effort to articulate
what they know they know but can't easily explain.

Simply observing people in action is another way to bring their
tacit knowledge to the surface, as they almost certainly tap into
information reserves while they work that they are not even
consciously aware of. Watching them perform and taking note of their
thought processes is a great way to get them to "share" -- even if
they never speak a word!

These last comments bring us to something I'm sure you already know:
all the best intentions and all the best technology will mean little
unless your organizational culture actually supports -- nay,
encourages -- people to share what they know! So making knowledge
management a regular way of life often requires that you make change
management a big part of what you do. Simply installing new systems
for wiki creation, online presence, and collaboration isn't enough
-- and truth be told, if you had to choose, you'd be better off
changing the culture than changing the technology.

Key Links:

Background information on the CIP.
Practice Test -- Do a Self-Assessment.
Free videos to prepare for the test.
White paper on the CIP.
Register for the test.
Contact for more information: jwilkins [at] aiim.org


SECTION 4

Email Management


In This Section...

1. Email Management
Concepts and Issues
2. Email Architecture - Backup
and Archiving
3. What and When to Manage
Email




















Email Management Concepts and Issues

Email management involves the systematic control of both the quality
and quantity of electronic messages that are sent from within, and
received by, an organization. For many, this devolves into simply
removing emails from a server and saving them to a repository. But
this is not enough because, to do it right, each one must be
classified, stored, and perhaps destroyed in a manner consistent
with established business policies and standards -- just as should
be done for all other kinds of documents and records.

As you may have surmised from the last statement, email as an
information type is often treated differently than others even
though it shouldn't be. The reasons are many and reflect:

The sheer volume of messages involved, which dwarfs the number of
other document types in play on a daily basis
The informality with which they can be created and forwarded:
everybody does it, all the time, with the simple click of a mouse
The ease with which documents can be attached, even if they're not
supposed to be

These factors make it very difficult to grab hold of email
management as a discipline and wrestle it to the ground. So what
happens is that organizations often -- too often -- default to one
or more of a number of unacceptable options that they think
constitute management but really don't. Things like:

Saving all email messages forever

Saving all email messages in the messaging application

Setting arbitrary time limits for all messages

Setting arbitrary mailbox sizes for all users

Declaring "email" as a record series

And, of course, the ever-popular "doing nothing."

The best answer is to ensure your organization has strong policies
in place to govern such things. And yet, we know full well that not
every organization does. According to a 2009 AIIM study on the
subject:
Only 10% of organizations have completed an enterprise-wide
email management initiative,
Only 20% currently are rolling out a project, and

Even in larger organizations, 17% have no plans to do so.

What we're talking about, of course, are such things as deciding
when to declare a copy of a message -- or an entire thread -- as a
record, based on the sender, receiver, type of content, attachments,
text within the message, etc. All of this is fundamental to
record-keeping in other parts of the business, but all too
infrequently applied to email.

Records-keeping aside, another major policy issue has to do with
acceptable use, or the way in which an organization's email system
-- or any other information solution, for that matter -- can be
used. Such policies are becoming ever more commonplace for new
hires, students from grade school on through university, and members
of other groups to sign before they can fully participate in the
activities of the day, and they often are dominated by things NOT to
do, like use the company system to send messages containing:

Obscene language or otherwise inappropriate content

Jokes or chain letters

Racial, ethnic, religious, or other slurs

And sometimes non-standard signature blocks and confidentiality
statements

The reason acceptable use documents exist is to reduce the potential
for legal action against an organization by providing a mechanism to
clearly articulate what's sanctioned and what isn't and who's
responsible if a problem arises, and/or to justify disciplinary
action, including termination, should an issue arise. The good ones
also are integral parts of broader information security frameworks;
they are concise and clear, and they spell out the ramifications of
non-compliance.

What we've been talking about is the kind of security designed to
save the organization from its people. The flip side of the coin
involves saving the organization from other people, and from
insiders who may have nefarious deeds in mind rather than a desire
to share the latest off-color cartoon with colleagues. The email
management capabilities I'm talking about here can automatically
encrypt messages based on rules such as those described in a policy
document to ensure that private financial information, health
information, legal agreements, or any other forms of sensitive
content are handled appropriately: either entirely automatically,
entirely manually, or with prompting via a semi-automated
capability.

How to handle attachments is a big part of this, since that little
graphical paper clip on most email client toolbars can be an
effective exit pass for information of all kinds. Solid email
security policies will address this head-on by determining whether
attachments can be sent at all, and if so, by whom, of what file
types, up to what file sizes, to which pre-authorized addresses,
etc.

A handy restriction that doesn't glue the door shut is to forbid the
attaching of actual files but permit the sending of links to those
files -- links that themselves can be secured by controlling access
to them through passwords or other devices. Another in-between is to
apply content filtering, which checks for certain keywords,
addressees, and so forth to decide whether or not to allow the
attachment to pass out of the organization. All of this quickly
bumps into the important issue of ownership, because it involves the
concept of "who says I can and I can't!"

Particularly in the U.S., email is often considered to be owned by
the organization and not by the individual, even if the email
account accessed is a private one hosted by the likes of Google or
AOL. According to many policies, as long as a piece of corporate
infrastructure is used to make the connection -- be it an email
client, a computer, or a wired or wireless network -- "owned" means
subject to audit, records-keeping, and, perhaps most potentially
worrying, subpoena. Anything sent or received may be monitored, and
you can forget about citing privacy as a defense in these
situations. This is why corporate counsel attuned to email urge
their clients to adopt policies as quickly as possible if they
haven't already.

Perhaps less threatening is the issue of archiving, which is one of
the most common applications for email management. At their most
basic, these solutions either copy or remove messages from the
messaging application to store in some other location. This may be a
manual process or handled by various types of automation.

For example, a rule could be set to archive all messages older than
30 days, or to move messages from the system once the mailbox
reaches 90% of capacity, stopping only when the mailbox dips below
50% of capacity. The problem here is that the system doesn't know
which messages are more important than others, and arbitrary date-
and capacity-based triggers may shunt critical messages off to the
side before the user is done with them. So intensive work must take
place up front to ensure this doesn't happen or other unhappy
outcomes ensue.
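A minimal sketch of the age- and capacity-based rule just described
might look like the following. The mailbox structure, archive store,
and thresholds are assumptions for illustration, not any particular
archiving product's interface.

    from datetime import datetime, timedelta

    # Illustrative thresholds from the example above.
    MAX_AGE_DAYS = 30
    HIGH_WATER = 0.90   # start archiving when the mailbox is 90% full
    LOW_WATER = 0.50    # stop once usage drops below 50%

    def archive_mailbox(mailbox, archive):
        """`mailbox` is assumed to be a dict with 'messages' (each having
        'received' and 'size') and 'capacity'; `archive` is any store with append()."""
        cutoff = datetime.now() - timedelta(days=MAX_AGE_DAYS)

        # Rule 1: archive everything older than the cutoff.
        for msg in [m for m in mailbox["messages"] if m["received"] < cutoff]:
            archive.append(msg)
            mailbox["messages"].remove(msg)

        # Rule 2: if still over the high-water mark, archive oldest-first
        # until usage dips below the low-water mark.
        def usage():
            return sum(m["size"] for m in mailbox["messages"]) / mailbox["capacity"]

        if usage() >= HIGH_WATER:
            for msg in sorted(mailbox["messages"], key=lambda m: m["received"]):
                archive.append(msg)
                mailbox["messages"].remove(msg)
                if usage() < LOW_WATER:
                    break

Notice that neither rule knows anything about importance, which is
exactly the weakness described above; a sound policy would classify
messages before age or capacity ever comes into play.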

Email Architecture - Backup and Archiving

Architecturally, email boils down to:

messaging-enabled applications within the organization, encompassing
collaborative apps, workflow, and the like, as well as email itself;

a backbone for routing and policy enforcement; and

a secure gateway that communicates with the Internet.

Depending upon the situation, all this may be arranged more or less
rigidly as dictated by any preferences for a more closed
client/server computing setup, a more open connection to a Web-based
engine (with either a formal email client application or simply a
browser), or other considerations. But they all pretty much work the
same way, with receiving, of course, working in the opposite
direction from sending, and with both functions typically including
connections to a file store someplace to support the sending and
saving of attachments -- plus backup and archiving systems sitting
alongside the applications.

The way messages are stored, backups are made, and archives are kept
is important because they all have an impact on how findable and
shareable the information they contain ends up being. Remember, it's
not that long ago that emails were simply "out there" and not
thought of as things that needed to be managed in the same way as
other information resources. After all, the thinking went, storage
is cheap, so we can just keep everything. And if we need to find it,
well, Outlook and Exchange have search functions. But what about
emails that have been archived elsewhere, or simply saved to
someone's desktop? And what about the people searching from within
other applications? And what of the attachments?

Consider, if you will, the real-life example of a user seeking ways
to search across multiple Outlook .PST files. The fact that
responders posted links and references to tools and techniques to
help this person is beside the point, though possibly helpful. It's
beside the point because it's just so obvious this organization's
emails were afterthoughts at best. And now that a judge has asked to
see something buried within them, the issue is front-and-center, and
potentially costly in terms of dollars and time.

You see, how messages are kept makes a big difference in terms of
how much value they provide. And .PST files are just the beginning
of the chase, since there are other email storage formats out there
(such as Mozilla Thunderbird's, to name one), and plenty of
locations to cover, like shared files, mobile devices, Webmail
servers -- which often aren't open to broad queries -- and backup
tapes, which likely aren't even online and available for searching
without special request.

If you're paying any attention at all, you have noticed that these
issues sound an awful lot like the ones you face elsewhere in your
information management strategy -- and that's precisely the point.
These considerations must be taken into account if your emails are
to be backed up -- and archived -- as they should be, as just
another information type. Why? Because that's exactly what they are.

The more emails come to be viewed as records-like creatures, the
more the established principles of archiving come into play.
Offering more than just long-term storage, email archiving
applications index the messages they operate upon and provide quick,
searchable access to them independent of the original users of the
system. Every case is different, but here are some good best
practices for the function:

Manage the process centrally rather than hold individual users
responsible for it -- they'll never do it, or will do it but without
enough consistency to be useful. Centralization also eliminates
unnecessary duplication, identifies and links threads, provides
access to more than one user, and goes far to assure legal
compliance.

Support the ability to search and retrieve quickly, especially
specific messages or attachments. This is one big difference between
"backup" and "archiving," the former being great for wholesale
restores but not granular search.

Ensure preservation first, then focus on deletion. This requires
policies that go beyond retention, like legal hold, which creates
exceptions for messages and documents that are under the litigation
microscope. It may also require storage media that can last, or be
readily refreshed, after many years to accommodate circumstances
like HIPAA's requirement that patient information be kept for the
life of the patient plus 9 years -- a span that likely will outlast
even the most modern of today's archiving media.

As a corollary, separate your data from your applications so files
can be opened without the native app; many adhere to standards like
XML and PDF to help make this a reality.

And finally, include email management as a core part of your
compliance and records management programs in order to foster
cooperation, coordination, and support of your email management
initiatives.
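To show how "preservation first, then deletion" can be expressed in
practice, here is a minimal sketch. The retention periods, hold
names, and message fields are hypothetical and purely illustrative.

    from datetime import datetime, timedelta

    # Illustrative retention periods by record type (not legal advice).
    RETENTION_DAYS = {"contract": 365 * 7, "invoice": 365 * 7, "general": 365 * 2}
    LEGAL_HOLDS = {"smith-v-acme"}   # matters with active holds

    def may_delete(message, today=None):
        """A message may be deleted only if its retention period has expired
        AND it is not flagged for any active legal hold."""
        today = today or datetime.now()
        if LEGAL_HOLDS & set(message.get("holds", [])):
            return False   # preservation wins, regardless of age
        keep_for = timedelta(days=RETENTION_DAYS.get(message["record_type"],
                                                     RETENTION_DAYS["general"]))
        return message["received"] + keep_for < today

The important design point is the order of the checks: the hold test
comes before any retention math, so deletion can never outrun
preservation.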

What and When to Manage Email

One major task is figuring out which messages are important and
which are not! Emails associated with a contract negotiation or an
invoice, or serving as a receipt for a transaction, clearly have
standing as business records and should be managed under your
records retention guidelines. (Remember, email is not a record
series, but an email's content should be treated as such and
classified appropriately.)

But what of personal messages, drafts, meeting requests, and
announcements about the corporate picnic? In most cases, these do
not represent critical information that should be retained
permanently or even beyond the date of the event. You may want to
hold on to them temporarily, but there is likely no business value
to this content. And how far back do you go? Do you include emails
that have been moved to offline storage or out into the archives?
How about those that have been marked as having been deleted but
still exist on a server or backup drive somewhere?

The answers to these questions are very organization-specific, but
the questions have to be asked and the answers acted upon.
Regulatory guidelines provided by the government or industry
sometimes dictate how email content should be managed and retained,
and the process of looking these up can be enlightening even if some
of those regulations aren't directly applicable.

Working from the other direction, you may notice certain operational
characteristics that suggest email management and archiving is just
what the doctor ordered. Broadly speaking, there are three primary
telltale signs:

Performance -- When the email system slows down because of the large
number of messages it is being asked to store and search, lightening
the load by taking the older and less important ones off can only
help! This move strikes squarely at your bottom line because money
that would have gone to additional servers and storage can be spent
on other things instead.

Efficiency -- When users spend measurable time dragging messages
into personal or shared folders to unclutter their inboxes, that
much manual filing is a sign that not enough active management is
taking place. The problem here is pretty serious, as unmanaged
"dragging and dropping" invites questions regarding the integrity
and security of the information in the affected emails -- plus, you
don't want to pay people to spend time doing this instead of their
actual work.

Compliance -- Such as when requests come in for regulatory audits or
e-discovery support -- at which point, it may be too late. Archiving
supports data permanence, the notion that data must be retained in
its original state without being altered or deleted, and data
security, safeguarding it from unauthorized access and/or physical
damage. These are both central to the concept of compliance, and in
many cases are the drivers of email management and archiving in the
first place.

Before getting to the point of archiving, though, emails continue to
pour in, and something really ought to be done about them --
especially those that relate to work in progress that others might
like to see, plus their attachments, whether or not those others
were copied on the CC: line in the message. Here, utilizing the
capabilities of a content or records repository is what's called
for. From this standpoint, emails represent just another type of
document to be shared, leveraged, subjected to retention policy,
and/or audited, and the technology for doing these sorts of things
is well established and available from a large number of venerable
vendors.

Now, just because we've talked mostly about managing emails in the
context of archives and repositories so far, don't go thinking that
this is all email management is about -- it isn't. A big part of it
also has to do with, well, actually managing the messages as they
come and go, not just finding a place to put them after they come
home to roost.

Smart systems will set things up so emails being received and sent
are filtered so any bad stuff can be screened out and anything
that's not supposed to leave stays in. Spam blockers are the obvious
examples of this, even though we don't think of them as email
management tools per se. But they are, really, and we've been using
them for so long now that we don't even think twice about them.

Imagine applying the same principle in reverse to compliance
documents. In this scenario, the system would examine an outgoing
message to look for attachments or keywords that may indicate it
shouldn't be transmitted over such an insecure channel -- or maybe
not sent at all. Or perhaps it matches the path of the attached
document to see if it originated in a directory the sender or
receiver shouldn't have access to. In these cases, the system would
prevent the file from leaving the organization and maybe notify the
IT security folks and powers that be that the event occurred.


There is also the ability to automatically route and file incoming
messages. This is a management tool and yet another example of
something we've been doing forever without really thinking about it,
in the form of creating rules in our email clients and
auto-forwarders on our email servers.

Operating at the server level means rules can be set globally if
desired, and can operate with keywords, attachments, senders,
receivers, time-of-day, out-of-office periods, and metadata of all
kinds to ensure the right people are copied, the wrong ones
excluded, content is properly stored, and so forth. One major
advantage of applying automation at the server level is that it can
run out of sight of the users, who can manage the email they receive
and send pretty much as they always have. This is important because
users tend to resist any information management initiative that
strikes them as being especially "in their face" -- especially when
it strikes at something that feels as intensely personal as email.
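As a rough illustration of such server-level routing, here is a
minimal sketch. The rule table, folder names, and the mailstore
operations (file and copy) are assumptions invented for the example,
not a real server's API.

    # Hypothetical server-side routing rules, evaluated top to bottom.
    RULES = [
        {"match": lambda m: m["sender"].endswith("@invoices.example.com"),
         "folder": "Accounts Payable", "copy_to": ["ap-team"]},
        {"match": lambda m: "purchase order" in m["subject"].lower(),
         "folder": "Procurement",      "copy_to": []},
        {"match": lambda m: True,      # default rule: leave it where it landed
         "folder": "Inbox",            "copy_to": []},
    ]

    def route(message, mailstore):
        """File the message and copy the right people, invisibly to the user."""
        for rule in RULES:
            if rule["match"](message):
                mailstore.file(message, rule["folder"])
                for group in rule["copy_to"]:
                    mailstore.copy(message, group)
                return rule["folder"]

Because the rules live on the server, they can be changed centrally
without any user noticing anything other than better-organized mail.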

So while email management may be the right thing to do, care must be
taken that it not get in the way of the usual business processes. If
it does, it won't be applied consistently or even at all, and all
your efforts will have gone for naught.

Finally, training your users on the "whys" and "wherefores" of email
management can go a long way toward easing resistance and fostering
compliance -- which then needs to be monitored to ensure it
sustains. As you develop your policies, roll them out, and
continually track and police users' follow-through, it is important
to involve anyone who will be affected -- and in the case of email,
that's everyone! This way they have a chance to provide input,
understand the reasons, and get used to the idea. Organizations
without policies have a hard time managing anything, let alone
email, and policies without user training and compliance auditing
may as well not exist, for they will not be heeded even if people
know they are present.

Key Links:

Background information on the CIP.
Practice Test -- Do a Self-Assessment.
Free videos to prepare for the test.
White paper on the CIP.
Register for the test.
Contact for more information: jwilkins [at] aiim.org



SECTION 5

Content Management


In This Section...

1. Principles and Lifecycle
Considerations
2. Content Lifecycle
Considerations
3. Format Considerations
4. Workgroup and Public
Access Considerations
5. Content Inventory and
Metrics and Interactions
6. Digital Asset and Case
Management









Principles and Lifecycle Considerations

Content management is the systematic collection and organizing of
information that is to be used by a designated audience -- business
executives, customers, etc. Neither a single technology nor a
methodology nor a process, it is a dynamic combination of
strategies, methods, and tools used to capture, manage, store,
preserve, and deliver information supporting key organizational
processes through its entire lifecycle.

Capture, which is covered elsewhere in this Course, boils down to
entering content into the system.

Manage is what you do next to it, so it can be found and used by
whomever it is intended for.

Store means finding it an appropriate home in your infrastructure,
be it a formal content management system or other information
solution.

Preserve refers to long-term care -- archiving, if you will -- the
practice of protecting it so it can be utilized however far into the
future the organization needs it to be available.

And deliver is all about putting the information in the right
people's hands right when they need it to be there.

The length of this list provides a clue as to how "content
management" differs from "document management," which it
incorporates in many important ways.

Document management is one of the precursor technologies to content
management, and not all that long ago was available solely on a
stand-alone basis, like its imaging, workflow, and archiving
brethren. It provides some of the most basic functionality to
content management, imposing controls and management capabilities
onto otherwise "dumb" documents. Key features, illustrated in the
sketch that follows the list, include:

Check-in/check-out and locking, to coordinate the simultaneous
editing of a document so one person's changes don't overwrite
another's

Version control, so tabs can be kept on how the current document
came to be, and how it differs from the versions that came before

Roll-back, to "activate" a prior version in case of an error or
premature release

Audit trail, to permit the reconstruction of who did what to a
document during the course of its life in the system
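The four features just listed fit together naturally, as this
minimal sketch of a hypothetical document store suggests. It is not
any vendor's API, just an illustration of locking, versioning,
roll-back, and an audit trail working in concert.

    from datetime import datetime

    class Document:
        def __init__(self, name, content):
            self.name = name
            self.versions = [content]      # version control: history of content
            self.checked_out_by = None     # check-out lock
            self.audit = []                # audit trail of who did what, when

        def _log(self, user, action):
            self.audit.append((datetime.now(), user, action))

        def check_out(self, user):
            if self.checked_out_by:
                raise RuntimeError(f"locked by {self.checked_out_by}")
            self.checked_out_by = user
            self._log(user, "check-out")

        def check_in(self, user, new_content):
            if self.checked_out_by != user:
                raise RuntimeError("check the document out before editing")
            self.versions.append(new_content)   # new version; old ones are kept
            self.checked_out_by = None
            self._log(user, f"check-in (v{len(self.versions)})")

        def roll_back(self, user, version_number):
            # "Activate" a prior version (1-indexed) by making it the newest one.
            self.versions.append(self.versions[version_number - 1])
            self._log(user, f"roll-back to v{version_number}")

A typical sequence would be check_out("pat"), check_in("pat", new
text), and, if that release turns out to be premature, roll_back to
the prior version -- with every step recorded in the audit trail.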

Document management eventually was subsumed into content management
in no small measure because there is more information available to
us today than ever before, and most of it is not being created by
us. Thanks to the mainstreaming of a whole range of sources like the
Web, thumb drives, smartphones, etc., the need has accelerated to
deal with information of all kinds: not just in terms of more media
types like text vs. images vs. voice files, but also in terms of how
structured -- and thus how readily managed -- it all is.

Structured information is information that is highly defined and not
only is intended to be processed by a computer program, but readily
can be -- like most of the information held in relational databases
and acted upon by line-of-business solutions.

Unstructured information is, well, information that does not have a
fully defined structure, and most likely will be read and used by
humans. As examples, think of most of the information produced by
common office applications (word processors, presentation programs).

Semi-structured information is information that lies somewhere in
between, like invoices, purchase orders, and receipts, which contain
data to be computer-processed but which come in formats and layouts
that first need to be identified and classified -- a task that often
is handled by humans but increasingly is being automated as the
tools improve.
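One way to make the distinction concrete is to look at the same
purchase sitting in three forms; the records below are invented for
illustration only.

    # Structured: fixed, fully defined fields, ready for a database table.
    structured = {"invoice_id": 1047, "customer_id": 88, "amount": 249.00,
                  "currency": "USD", "issued": "2012-03-01"}

    # Semi-structured: the data is there, but field names and layout vary by
    # sender and must be identified/classified before a program can use them.
    semi_structured = {"Invoice No.": "1047", "Bill To": "Acme Corp",
                       "Total Due": "$249.00", "Notes": "Net 30, thanks!"}

    # Unstructured: meant for human readers; a program sees only free text.
    unstructured = ("Hi -- attached is our invoice for the March engagement. "
                    "The total comes to $249; let me know if you have questions.")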

Content Lifecycle Considerations

This all becomes important when you consider the effect on your
business that not managing these elements can have! Diminished
utility, loss of time, loss of productivity, possible non-compliance
with regulations or corporate policies, the risk of serious business
interruption if key repositories die or natural disasters strike --
none of them happy outcomes. Effectiveness, efficiency, compliance,
and continuity all combine, in different proportions, to drive the
business case for content management in most organizations.

A big part of the challenge is that much of your information needs
to be managed from the time it is created until the time it can be
disposed of, according to the rules and policies of the
organization. Content management is not "just" about creation, or
retrieval, or maintaining an archive for a year or a decade or a
millennium. It encompasses the entire lifecycle of content, from the
beginning to the end, according to the business uses and objectives
to which that content is to contribute. How this gets done is
something of a moving target because information itself can change
forms during the course of its life. For example, a credit card
application filled in and returned on paper may be scanned
immediately upon its return to the issuing organization, thus
changing it from hard copy to electronic -- and then the choice as
to which electronic format has to be made according to the dictates
of the business process that then ensues, and whether it needs to be
annotated, approved, signed, etc. -- and for how long it needs to be
stored.
One variation on this theme has to do with the notion of reuse --
which is to say, the delivery of the same content in different forms
and formats according to context, viewing device, security, etc.
Imagine, for example, the common case of a commuter reading The New
York Times on an iPad or other tablet while on the train, and in
printed form upon arriving home. The content is largely the same,
but the experience is wholly different -- and the technology behind
the scenes is not for the faint of heart, as the same stories and
images are assembled and reassembled to maximize the look, feel,
features, and usability of the medium being used. Add to this the
ability to customize the content according to geography, say, to
feature stories deemed more pertinent to the commuter's location (as
determined by the device's geolocation capability or the
subscription address for the print edition), and the ability to
present premium content -- or sensitive corporate information, in a
workplace setting -- on the fly according to the viewer's access
credentials, and you have a sense of some of the complexities
involved. There's great utility associated with reuse, and content
management tools can help make it a reality.
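A minimal sketch of the reuse idea might look like the following:
one stored content object, assembled differently per device and per
viewer's entitlements. The field names and rendering choices are
made up for the example.

    story = {
        "headline": "Transit Fares to Rise",
        "body": "Full article text...",
        "premium_analysis": "Subscriber-only commentary...",
        "region": "NY",
    }

    def render(content, device, subscriber, reader_region):
        """Assemble the same content differently by device, entitlement, region."""
        parts = [content["headline"]]
        if content["region"] == reader_region:
            parts.append("(Local story)")
        # Tablets get the full body; phones get a teaser for faster loading.
        parts.append(content["body"] if device == "tablet"
                     else content["body"][:80] + "...")
        if subscriber:
            parts.append(content["premium_analysis"])   # premium content on the fly
        return "\n".join(parts)

    print(render(story, device="tablet", subscriber=True, reader_region="NY"))

The content is authored once; everything that varies lives in the
rendering logic, which is the whole point of reuse.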

Format Considerations

Many organizations capture and keep their documents in the native
format in which they were created, be it Microsoft Word or Excel, or
one of the OpenOffice formats. These are usable by a great many
people and support metadata tagging as well, but backwards
compatibility can be an issue without the appropriate converters.

So a lot of companies turn to PDF, which is one of the most widely
used formats around and, for better or for worse, is often
considered a "default" content format. This isn't necessarily a bad
thing -- especially since this original Adobe creation was
sanctioned by the ISO as an official standard in 2008 (ISO
32000-1:2008, to be specific) -- but, depending upon your situation
and the nature of your content, a native or other image format may
better serve your purpose.

PDF -- for "Portable Document Format" -- is notable for its ability
to faithfully represent documents created in pretty much any
application, but do so independent of that original application, any
hardware, or operating system. The latest versions routinely handle
electronic forms and digital signatures, and have the ability to
authenticate and secure sections of a document as well as the whole
thing. So it does indeed have a firm place in the information
management world.

XPS is Microsoft's entry into the same sweepstakes, but it doesn't
have nearly the footprint PDF does. A functional quasi-competitor to
PDF, it stands for "XML Paper Specification." Though it supports XML
versioning and extensibility, it does not support dynamic content,
such as content contained in a drop-down menu on a form.

Still, it has been accepted as a standard as well, in the form of
ECMA-388.

TIFF is the Tagged Image File Format, and it is good for archiving
because its compression typically is lossless, so files can be
edited and saved without degrading image quality. Tags may also be
used to handle multiple images and data within a single file. Many
organizations thus choose TIFF for their scanning applications,
though file sizes can be large, especially for color images. So
there's a tradeoff there that has to be considered.

JPG, PNG, and GIF are other common image formats that you may need
to manage. JPGs are used most often in digital photography, and
feature some variability in terms of balancing quality and file
size: the higher the compression ratio, the smaller the file, but
the more visible the loss of image quality.

PNG and GIF were designed for transferring images on the Internet,
not for professional-quality print graphics. GIF in particular is
limited in the number of colors it can handle, so these formats are
best used for line drawings, logos, text, and other graphics with
solid areas of color rather than for photographs (PNG supports full
color but compresses photographs less efficiently than JPG). GIF, by
the way, dates back to the days before the Web, having been
introduced by CompuServe in 1987, and it still can be used for
simpler images like graphics or logos that have solid areas of
color.

Workgroup and Public Access Considerations

Workgroup content management is perhaps more steeped in
collaborative technology than the "regular" variety because
workgroups usually are made up of a bunch of people all working on
the same task or project.

For sure, all content management systems trade in basic
collaborative functions like check-in/check-out, version control,
etc. But these are dramatically boosted by the ability to share
edits, leave public comments, and add capabilities like
knowledge-sharing, online presence trackers, and means of instant
communication like Skype and its ability to share a personal
desktop.

Not every workgroup needs every one of these functions at all times.
But most will use at least some of them at one time or another, and
not every solution supports the sharing, tracking, and safeguarding
of that knowledge, those comments, and the like in the same way --
and sometimes they don't do it at all.

In another realm altogether, public access content management boils
down to "regular" content management with the ability to allow
people access without having to log in. Government web sites are
obvious examples of this, but private companies also use it to open
their stores of marketing materials and instruction manuals, say, to
prospects and customers.

Under the covers, it is the security element that gets the brunt of
the workout needed to make this a reality by extending the usual
internal malware and intrusion protections outward while not
restricting the ability to get at the information that has been
designated for public use. Never an easy balancing act, it means
first identifying what information that should be, and then staging
it out accordingly. The flip side of this has to do with determining
how and where to host it so it can be most available and with the
least constraints. Public access systems can get orders of magnitude
more hits than internal ones, and this load has to be handled well
in order to assure uptime.

Another area of particulars to be managed is that of the user
interface. Dealing with the public means an organization has no
control over what users use to access the system, or how familiar
they are with a particular screen layout. So attention must be paid
to ensure consistency and intuitiveness are achieved regardless of
the browser being used or the experience level of the user.


Content Inventory and Metrics and Interactions

Just as shopkeepers take inventory of the merchandise they have
to sell, so should information managers take inventory of the
content they control. In many ways a thankless, excruciating task
for the painstaking detail required, it really is the best way to
determine what information you actually have. It is also useful for
identifying the relationships that exist in that content -- cross-links,
common pointers to repositories, overlapping authors, etc. -- and
thus for shedding light on your processes themselves.

Taking a content inventory is imperative if you are migrating your
information from one system to another, for it will help ensure none
of it gets lost. It is also critical when you are either first
implementing or updating your information management practices, so
everything can get properly tagged -- or simply be left behind. The
job itself involves several well-established steps that generate so
much information about your information that you may wonder at times
how helpful it really is. But having this information will pay
dividends as your work progresses, so it's definitely worth seeing
the program through.

The first step is to identify the major categories of content that
you have, separating them into logical chunks, as by business unit,
geographic location, subject matter, and so forth.
In a Web content management context, the task often begins with
a simple accounting of the pages on the site, and the capturing of
their page ID numbers and descriptions so they can be properly
tied to their owners, departments, etc.

Document type usually is the next most immediate identifier to be
captured, and it's one of the most important because it provides
clues as to where in the organization the information lives, and
whose responsibility it is. Invoice, purchase order, order form,
research report, digital photo, video feed, and brochure are all
valid identifiers, and you'll undoubtedly come up with scores more
before you're finished.

Topics and keywords will become the metadata that in turn will
facilitate content search and retrieval. Examples include customer
care, shipping and receiving, vacation request, computer
requisition, employee name, and so on. Here, too, the choices can be
endless, so it's important to use a controlled vocabulary to ensure
consistency and to keep the volume in check.
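A controlled vocabulary can be enforced with something as small as
the following sketch; the approved terms and synonym mappings are
invented for the example.

    # Approved topic terms and a few synonym mappings back to them.
    CONTROLLED_VOCABULARY = {"customer care", "shipping and receiving",
                             "vacation request", "computer requisition"}
    SYNONYMS = {"customer service": "customer care",
                "pto request": "vacation request"}

    def normalize_topic(raw):
        term = raw.strip().lower()
        term = SYNONYMS.get(term, term)
        if term not in CONTROLLED_VOCABULARY:
            raise ValueError(f"'{raw}' is not an approved topic -- add it to the "
                             "vocabulary deliberately or map it to an existing term")
        return term

    print(normalize_topic("Customer Service"))   # -> customer care

Forcing every new tag through a gate like this is what keeps the
keyword list consistent and the volume in check.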

Recording who owns and maintains the information is another
important piece of the puzzle because it tells you not only who
created it in the first place, but who is in charge of updating and
maintaining it as well. This tells you who the most active users may
be and sets the stage for instituting or revising policies regarding
access authorization and editing rights, both of which are
fundamental to sound content management practices.

And then there's ROT, which stands for Redundant, Outdated, or
Trivial -- three useful classifications that can help you set action
items and priorities. For example, if you find multiple copies of
the same information -- and you will -- which one is best considered
the original? If you find something old that hasn't been accessed in
a long time, perhaps it should be put on the road to being disposed
of. And if you find something kind of "light," like announcements
regarding holiday parties of the past, maybe you don't want to get
rid of them because of their sentimental value. But you probably
should label them as something that can be handled later rather than
sooner.

Besides all this labeling, you also want to record where the content
you are inventorying goes when it is in use, what and who interacts
with it, and what and who interacts around it afterwards.

An example may help to make this more clear.

Imagine a sales manager's end-of-quarter report. Working with her
team, she consolidates her group's results and forecasts into a
single document, and sends it to her boss -- who then consolidates
it with his other financials to pass further upstairs. That's the
easy part to discern.

What's harder is identifying everything and everyone the document
touches, and is touched by, before it eventually comes to rest in a
directory somewhere (hopefully with good metadata attached so it can
be found again later if need be). These may include:

Contributors, editors, and approvers of the document itself

Emails or other communications that it references, or that reference
it

Other documents, by virtue of links it may contain to contracts,
let's say, or links to it in other summary materials, and

Other information systems, which may extract and leverage the
content for other forms of analysis and decision support

Reconstructing how, where, and in what order information has flowed,
and what happened to it along the way, is akin to charting ice floes
in the Arctic, because you're dealing with chunks of material moving
in seemingly random directions, encountering seemingly random
objects, but actually traveling according to some unseen plan.
Understanding that plan is central to your ability to manage it.

Metrics are also key, for they provide the specifics you need to
make sound decisions about your information management. System logs,
email timestamps, and file properties are simple and readily
accessible sources of usage data that can tell you much about what
happened when, and who was involved, as it relates to information
flow (a small sketch after the following list shows the idea). Other
metrics worth capturing include:

The number of information assets in use in each of the categories
you identified at the outset

The number of information types you are dealing with

How often they are accessed, and how many times the same people or
systems access them

The number of people or systems that touch them overall

The time it takes for a document to move from Point A to Point B,
and how long, and where, it may sit while waiting for attention
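File properties alone can yield several of these numbers, as this
small sketch shows. The directory path and the assumption that each
top-level folder represents a category are hypothetical choices made
for the example.

    import os, time
    from collections import Counter

    ROOT = "/shares/departments"        # hypothetical content location
    counts_by_category = Counter()      # assets per top-level folder ("category")
    counts_by_type = Counter()          # assets per file extension
    stale = []                          # not accessed in over a year

    for dirpath, _dirs, files in os.walk(ROOT):
        category = os.path.relpath(dirpath, ROOT).split(os.sep)[0]
        for name in files:
            path = os.path.join(dirpath, name)
            counts_by_category[category] += 1
            counts_by_type[os.path.splitext(name)[1].lower()] += 1
            if time.time() - os.stat(path).st_atime > 365 * 24 * 3600:
                stale.append(path)

    print(counts_by_category.most_common(5))
    print(counts_by_type.most_common(5))
    print(f"{len(stale)} assets untouched for a year -- candidates for ROT review")

Timing a document's path from Point A to Point B takes richer
sources, such as workflow or email logs, but even this level of
counting is enough to start prioritizing.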




Digital Asset and Case Management

Digital asset management encompasses all non-text-based content,
also referred to as rich media and including audio, video, graphics,
digital photos, etc. As such, its variation on the theme is steeped
in intellectual property management as much as anything else, as a
big part of it has to do with permissions to use the assets, under
what conditions, for how long, with what royalties, etc. There's a
heavier infrastructure component as well, for digital asset files
tend to be much bigger than conventional content. Just think about
how many more bytes a movie takes up vs. a Word document containing
the script.

It is important to note that a digital asset includes not only the
core content but also the associated metadata, like date, time,
author, rights and permissions, format type, time last accessed, and
so forth. That metadata usually is managed in the same way as the
rest of the metadata in the system, and not only is that where the
intersection with "regular" content management takes place, but it
actually makes it possible to search rich media at all, since the
absence of text means the content itself is invisible to other
tools.
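Since the metadata is what makes rich media findable at all, here is
an illustrative (entirely made-up) asset record of the kind such a
system might keep alongside the binary file.

    # Hypothetical metadata record for one video asset; the binary itself
    # lives elsewhere -- this record is what search and rights checks operate on.
    asset = {
        "asset_id": "vid-000172",
        "title": "Plant tour b-roll",
        "media_type": "video/mp4",
        "duration_seconds": 312,
        "created": "2011-06-14T09:30:00",
        "author": "Media Services",
        "rights": {"license": "internal use only",
                   "expires": "2014-06-14", "royalty": None},
        "keywords": ["manufacturing", "plant tour", "b-roll"],
        "last_accessed": "2012-02-02T16:05:00",
    }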

Case management may be described as operating on bundles of content
rather than individual documents or images. A "case," of course, is
a compendium of information, processes, advanced analytics, business
rules, collaboration, and sometimes social computing that relates to
a particular interaction with, or issue involving, a particular
party like a customer, supplier, patient, student, or defendant.
Case management solutions are designed to manage all this to help
drive more successful, optimized outcomes -- even as they also
attend to and secure the individual bits of material contained
therein.

Now, if this sounds a lot like "regular" content management to you,
you're right! But there are differences, perhaps most notably case
management's inclusion of functions like incident reporting and
investigation management. These capabilities involve entire
processes unto themselves and can encompass living documents of all
kinds that ultimately will need to have content management
principles applied to them. As such, they require specialty
care-and-feeding not only up front as information is captured and
analyzed, but after the fact as well, when remediation steps are
taken. So they represent a distinctly different, if related, issue.

Key Links:

Background information on the CIP.
Practice Test -- Do a Self-Assessment.
Free videos to prepare for the test.
White paper on the CIP.
Register for the test.
Contact for more information: jwilkins [at] aiim.org






Collaborate and Deliver

























































SECTION 1

Collaboration


In This Section...

1. Enabling Technologies

2. Core Functionality
3. Required Social
Functionality
4. Virtual Teams
5. Leveraging Consumer IT
and Commercial Sites
6. Management by Roles and
Responsibilities
7. Governance by Roles
and Responsibilities








Enabling Technologies

Collaboration is the practice of working together to achieve a
defined and common business purpose. It exists in two forms:

Synchronous, where everyone interacts in real time, as in online
meetings, through instant messaging, or via Skype, and

Asynchronous, where the interaction can be time-shifted, as when
uploading documents or annotations to shared workspaces, or making
contributions to a wiki.

Shared workspaces are among the most visible entries in the
collaboration space. Aimed at rolling document and application
sharing up with chat and perhaps versioning and other auditing
capabilities, they may have more or fewer features, and may be
available either for license or on a syndicated basis "in the
cloud," as they say. Google Docs is a notable example of the latter,
Microsoft SharePoint and EMC Documentum eRoom of the former.

Perhaps best thought of as online encyclopedias or "how-to" manuals,
wikis are applications that let users freely create, edit, and
reorganize content using a Web browser. Perhaps the most visible
example of the breed is Wikipedia, and variants exist throughout
enterprises of all kinds and sizes. The plus and the minus of wikis
are that more or less anyone can enter more or less anything into
the resource -- so while they're a great way to capture and share
what people know, they also must be vetted to ensure nothing
erroneous gets planted within (intentionally or otherwise). The good
news is that, over time, active wikis tend to be of fairly high
quality due to the self-policing nature of an engaged user base.

Virtual conferencing is a broad term that encompasses numerous ways
to allow people to participate in meetings from separate locations.
The tools for doing so range from full-blown video conferencing
suites to simpler tools like Skype, desktop sharing via instant
messaging, and even conference calling on the telephone.

On the Web, this functionality comes in two essential varieties:

Open participation, when the entire group can edit what they
see on the screen
Mediated participation, when the attendees can only read and
comment on what they see, while the organizer makes the
edits and shares them for all to enjoy

Today's ability to easily incorporate video and voice -- and often
for free, to boot -- has taken these concepts to a level whereby
people now actually forego business travel in order to have "face to
face" meetings! Especially with a fast-enough connection, the
experience is remarkably "real," and thanks to inexpensive consumer
enablers like iPhones and iPads, it is charging hard to the center
of the collaboration landscape.

One major driver of this phenomenon is the mainstreaming of Voice
over IP technology, which is essentially the use of a computing
infrastructure for telephony purposes. Services like Skype, Google
Voice, and Vonage owe their very existence to it, and it's a major
step along the way toward treating voice in the same way as our
other favorite content types: text, images, and video -- a trend
that has huge ramifications on customer service and compliance given
how many business-critical interactions take place vocally.

Social networking has given a new face to collaboration by making it
possible to simultaneously engage huge numbers of people with shared
interests or activities. An asynchronous medium, it sets people up
as "broadcasters" of a sort, communicating outward as thoughts or
experiences occur in anticipation and expectation of responses to
follow.

High-profile public examples include Facebook, LinkedIn, YouTube,
and Yelp, with corporate versions like Yammer rounding out the
field. The widespread popularity of services like these is causing
growing consternation on the part of those information managers who
are paying attention, because their use by employees or other
organizational representatives is often uncontrolled, and thus
potentially a legal or compliance risk.

As if anyone doesn't know at this point, a blog is a type of Web
site that contains regular entries of commentary, event
descriptions, or other material (such as graphics or video) as
provided by an individual or an organization. Originally a blend of
the term "Web log," the medium is collaborative in that visitors can
leave comments for each other, and the author, to read, and it is
this asynchronous interactivity that distinguishes it from a
"regular" static Web site.

A microblog is a little different in that its content is typically
smaller in both actual and aggregate file size, and may contain
nothing more than short sentences, individual images, or video
links. Another word for it that may be more familiar from other
contexts is "status update" such as is characteristic of services
like Twitter, Foursquare, and Tumblr.

Social sharing is the means by which users can identify and
publicize sources of information they find particularly interesting
or valuable. The Facebook "Like" button is a simple example that has
become part of the popular vernacular, and similar mechanisms exist
-- sometimes several on the same page -- on Web sites and blogs of
all kinds.

Core Functionality

Portals are frameworks for integrating information, people, and
processes across organizational boundaries. Providing a single
secure unified access point -- often via a Web browser -- they
present a personalized view of information through
application-specific "portlets," or windows on the main screen
through which the different applications, or their content, can be
viewed and accessed.

Instant messaging is a real-time text-based online chatting
mechanism, like instantaneous email -- though the "text based" part
is routinely now complemented by other means of communication like
voice or video, the ability to link to outside resources, and the
ability to transfer files.

These enhanced capabilities put it firmly on the road to application
sharing, which lets the people you're chatting with actually see and
interact with the screen at which you're looking.





One person usually controls the application at a time while the
others view it, and in some systems, control can be passed around.
There are clear trust and security issues associated with this, of
course, so most tools enable the sharing of only designated
applications, rather than all the ones that are running. For larger
groups, and for groups involving participants from outside the
organization, centralized server-based spaces are provided either by
an in-house solution or an outside service to allow this to happen
in a controlled way.




This latter alternative puts us squarely on the road to shared work-
spaces, which formalize the model on a grander scale and are
aimed at rolling document and application sharing up with chat and
other means of collaboration, including versioning and other
auditing capabilities. Depending on the offering, they may have
more or fewer of these, and may be available both for license and
on a syndicated basis "in the cloud," as they say. Google Docs is a
notable example of the latter, EMC Documentum eRoom of the
for-mer.

Required Social Functionality

A useful framework to consider when thinking about social
functionality is the one created by Harvard professor Andrew McAfee
in 2006 to describe core capabilities or requirements for systems in
this Internet age. Called SLATES, it calls for the inclusion of:

Search, or the discoverability of information via search, browsing,
metadata, and taxonomies. Also known as "findability."

Links, which is to say, hyperlinks, the very hallmark of information
management today.

Authorship, or the ability of any and all users to create, comment
on, or edit content. Also known as "user-driven" content creation.

Tags, which refer to folksonomies and social bookmarking, or any
free-form "tagging" of information to make it easier to identify
and find.

Extensions, which enable automated intelligence around content to
point out related information to users, reveal usage patterns, and
lead to valuable insights.

And lastly, Signals, which provide a variety of proactive
notifications to users, a capability also known as "alerts."

One of the most fundamental bits of social functionality is the wall,
the place on a social networking page where people can leave
comments, links, and other tidbits thought to be of interest to the
online community involved. It maps directly to the Authorship point,
and often to the Links point, in the SLATES framework just outlined.

Ratings are also fundamental as they invite participants to evaluate
the physical and virtual experiences they have and share them
with their immediate world. Restaurant reviews, movie critiques,
and SlideShare presentations are all fodder for a ratings engine --
as is almost anything else you can think of.

The real power of the medium lies in its ability to indicate relative
value as determined by user consensus. On eBay, for instance, this
is measured as a "score" and depicted as a star that is colored to
illustrate how high the score is. Amazon also uses stars, while
Facebook uses "likes" -- the more, the better.

Status updates answer the question: what's happening? Which, not
coincidentally, is the phrase that greets you when you log into
Twitter (having replaced "What are you doing?"). Their ability to
capture thoughts and ideas as they occur is an integral part of the
immediacy that can make social media so compelling, and they've
moved far beyond the "I'm brushing my teeth" character they often
had when they first burst on the scene.

In collaboration terms, a forum is an online discussion site where
people can hold conversations in the form of posted messages.
Once known as a message board, it is a common piece of functionality
in Web communities of all stripes, as it is an easy-to-use and
easy-to-administer means of sharing and discussing opinions, and
asking and answering questions.

Virtual Teams

As the term suggests, a virtual team is a group of people who are
working together toward a common goal but may be spread across
time, geography, and organizational boundaries, and are using the
Web and other communications technologies to connect and
collaborate. Writing for the ACM Special Interest Group for
Management Information Systems back in 2004, Anne Powell, Gabriele
Piccoli, and Blake Ives identified four main areas of related focus:

Inputs, of the design, culture, technical, and training variety

Task processes, encompassing communication, coordination, and the
fit between tasks, technology, and structure

Socio-emotional processes, including relationship-building,
cohesion, and trust, and

Outputs, centering on performance and satisfaction

What's striking about this short list -- which is still quite relevant
today -- is that every item has an element (or more) of human
behavior embedded in it: culture, cohesion, coordination,
satisfaction. This is perhaps the single most important takeaway of
any discussion about virtual teams, for as difficult as it can be to
build effective working groups in person, it is doubly so in a
virtual environment where physical interaction and first-hand
observations of mood and body language are rare or lacking altogether.

The fact that virtual team members may be in different time zones
-- or simply that people can be very busy and thus tough to match
schedules with -- means that accommodation must be made to allow
them to work asynchronously, or in a time-shifted manner, as when
uploading documents or annotations to shared workspaces, or making
contributions to a wiki.

The reverse, of course, is working synchronously, where everyone
interacts in real time, as in online meetings, through instant
messaging, or via Skype.

Since it is logical to assume that the virtual team is made up of
people selected for their expertise and experience in the area being
worked on, it is important to give them mechanisms for capturing
and tagging what they know so it can be shared -- especially in
light of the asynchronicity just described, which can make it tough
for them to connect in real time to do a more conventional "brain
dump."

Wikis, forums, and social tools like Jive and Yammer are often used
for this purpose, and the most effective installations make
excellent use of the directory search functions baked into them so
users can identify colleagues who can help if they can't think of
anyone right off the bat.

Presence capabilities are often part and parcel of many
collaboration tools, "lighting up" when enrolled individuals are
online and displaying their state of availability (in a meeting,
etc.). Instant messaging applications have been doing this for
years, of course, and it's incredibly useful when users are looking
for a quick answer or response, since they can focus on colleagues
who are showing they're "in" rather than send an email and have to
sit and wonder.


Of course, virtual teams need to get together once in a while to
catch up on all the individual activity and to make decisions.
Online meeting tools are the answer here, and they run the gamut
from enabling simple desktop sharing and presentations, to producing
formal conferences, Q&A sessions, and voting -- and sometimes the
ability to archive and index the proceedings so they can be
referenced and leveraged later on. Which way to go obviously
depends on the size of the team and how often it may need to
include numbers of outsiders.

Another variable has to do with the volume and nature of the
documents the team is working with and working on. Shared
workspaces are common in this regard, as they accommodate the
sharing of materials team members may need to reference, and the
ability to collaborate on the creation of new ones, be they meeting
notes, reports, or what have you.

Shared workspaces can feature many of the elements of document
and content management, including check-in/check-out, version
control, metatagging, and workflow, as well as the other
capabilities covered in this module. Definitionally and
functionally, the differences between them are probably not worth
haggling over; suffice it to say that "shared workspace technology"
per se is designed specifically with collaboration in mind, while
in the other related stacks collaboration may be a byproduct.

Perhaps the single most important consideration related to virtual
teams is that of bandwidth, which I've left for last for emphasis
and because it ties all the others together.

Bandwidth, of course, is the capacity of the network to carry all
the traffic generated by the team. In most of today's enterprises,
this isn't even a thought anymore, especially since much of that
traffic is largely text- and image-based (think messaging and
document transfers, for the most part).

But add audio and video to the mix, as many are, and it suddenly
can become an issue -- especially if there are participants who must
connect remotely, say, by smartphone over the cellular network, or
DSL, or (perish the thought!) dialup. These folks likely will not
enjoy the speed and seamlessness of their office-bound mates plugged
into a high-speed network, and the experience can be quite
frustrating -- or even project-killing.

This same line of logic, by the way, also applies to server capacity,
either internally or in the cloud, for virtual team processing loads
can be significant, but probably only at times. Managing the load
therefore can be something of a challenge.

Leveraging Consumer IT and Commercial Sites

Wikipedia describes itself as "the free encyclopedia that anyone
can edit," and that pretty well sums it up. Long criticized as a
false authority because of its populist roots, it is coming to be
regarded as a legitimate resource because its user base is so large
and active that inaccuracies tend to be found and fixed fairly
quickly. Still, it probably does make sense to validate its contents
before incorporating any into any kind of business document.

From an organizational perspective, Wikipedia's ready availability
and lack of usage cost mean it can be leveraged by anyone wanting
to use it as a reference tool. This, of course, saves the time and
cost of having to build a wiki yourself, but that still may be
necessary to house those bits of knowledge that are more specific
to your activities than a general-purpose public service ever could be.

Tungle.me is another general utility that facilitates scheduling for
groups of people by overlaying individuals' own calendars on top of
each other and the dates and times proposed by the meeting
organizer. Common slots of availability are then graphically
highlighted to indicate the best times to meet. The beauty of the
service is that it eliminates the need to send, track, and remember
the usual chain of emails involved with doing this manually -- and
because it is free as well, it can be used at any time by anybody
wanting to automate the emailing of the initial invitation and any
follow-up reminders.

FreeConferencePro is a free, full-featured audio conferencing
service that permits the making of unlimited audio conference calls
without having to pay for any bridging time. Perfect for teams or
smaller organizations not having access to an enterprise capability
of this sort, it gives the conference call host a lot of control
over the process, including the ability to record a customized
greeting, view a real-time list of call participants, mute or
disconnect people if desired, and generate a report after the call
is completed.

Skype can also enable conference calling, but it requires the use of
dedicated software for either your PC or smartphone, and access to
the Internet to enable connectivity. And that's OK, because it also
does a whole lot of things regular conference calling can't,
including instant messaging, file transfer, video calling, screen
sharing, and more. Skype also can serve as a low-cost alternative to
commercial telephone service, for its basis in voice-over-IP
technology means it can ring landline and cell phones worldwide for
much less money than conventional carriers can. Again, teams and
small organizations may find this especially attractive, as larger
companies usually have telephony systems of their own to handle
these and other functions.

Google Maps, like its counterparts from Yahoo and MapQuest, serves
as an online atlas and route-finder for direction-challenged
travelers such as collaborators from different offices meeting in
person with team members for the first time. It also can provide
other forms of useful information like traffic conditions and
estimated time to arrival, and when viewed on a location-aware
smartphone, it is a fairly serviceable GPS alternative. Not
insignificantly, it also presents a variety of views to the user,
including an option to see the map as photographed by a satellite,
and to see specific locations -- and travel from one to another --
as photographed at the street level. Incredibly useful for someone
visiting an area for the first time, this technology also raises
some pretty big questions about privacy (as in, why was your truck
parked at Fenway Park when you were supposed to be in our meeting?)
that are only now beginning to be addressed.

Management by Roles and Responsibilities

Roles, of course, define the function people perform for their
employers: marketing, accounting, technology management, etc.
Generally speaking, they align with the organizational structure --
and when they do, information managers say a word of thanks (or at
least, they should) because the enterprise's telephone and systems
directories often are managed the same way through functions like
Active Directory and LDAP (the Lightweight Directory Access
Protocol). And where they exist, they are ready and waiting to be
included in the routing mechanism.
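
As a concrete illustration of how an enterprise directory can expose roles to a collaboration or routing system, here is a minimal sketch using the open-source ldap3 library for Python. The server address, base DN, service account, and the department attribute are assumptions for illustration only; real schemas and credentials will differ.

```python
# Minimal sketch: look up everyone in a given department (role) via LDAP.
# Server name, base DN, credentials, and attribute names are hypothetical.
from ldap3 import Server, Connection, ALL

def members_of_department(department: str) -> list[dict]:
    server = Server("ldap.example.com", get_info=ALL)           # assumed host
    conn = Connection(server,
                      user="cn=svc-routing,dc=example,dc=com",  # assumed account
                      password="change-me",
                      auto_bind=True)
    # Filter on a department attribute; many directories use 'department' or 'ou'.
    conn.search(search_base="dc=example,dc=com",
                search_filter=f"(&(objectClass=person)(department={department}))",
                attributes=["cn", "mail"])
    people = [{"name": str(e.cn), "mail": str(e.mail)} for e in conn.entries]
    conn.unbind()
    return people

# Example: route a task to everyone in Marketing.
# print(members_of_department("Marketing"))
```

Because the directory is the system of record for who holds which role, a routing engine that queries it this way never needs to hard-code individual names.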




Responsibilities, on the other hand, involve the things they're
accountable for getting done: writing press releases, preparing the
quarterly statements, supporting users, etc. Here, the delineations
may not follow the org chart as closely since, for example, product
marketers can live in the lines of business while corporate
marketers can occupy staff positions at HQ, both at the same time.

Understanding and applying roles and responsibilities as a routing
tool is important because these attributes provide ready "handles"
for steering activities at a more macro level than that of a
specific person.

For example, some collaboration systems can be set up to track
activities by due date, and to send alerts to process participants
and supervisors when a deadline is missed. But as you move up the
spectrum of sophistication, systems can also automatically include
other people who play the same role or have the same
responsibility, as sketched below. This is the real strength of the
strategy in a collaborative work context.
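
Here is a minimal sketch, in plain Python with made-up role and task names, of what that kind of role-based escalation might look like. It is not drawn from any particular product; it simply shows the principle of notifying everyone who shares a role rather than a single named individual.

```python
# Sketch: escalate an overdue task to everyone holding the responsible role.
# Roles, people, and the notify() behavior are illustrative placeholders.
from datetime import date

ROLE_DIRECTORY = {
    "corporate-marketing": ["alice@example.com", "bob@example.com"],
    "product-marketing": ["carol@example.com"],
}

def notify(address: str, message: str) -> None:
    # Stand-in for email, IM, or a social "signal"; here we just print.
    print(f"to {address}: {message}")

def escalate_if_overdue(task: str, due: date, role: str, today: date) -> None:
    """Alert every member of the responsible role once the due date has passed."""
    if today > due:
        for person in ROLE_DIRECTORY.get(role, []):
            notify(person, f"Task '{task}' (due {due}) is overdue.")

escalate_if_overdue("Approve brochure", date(2012, 5, 1),
                    "corporate-marketing", today=date(2012, 5, 3))
```

Because the lookup is keyed on the role rather than a person, vacations and departures only require updating the role directory, not the process itself -- which is exactly the point made below.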

This same capability comes into play when the system is notified
that someone is out on vacation, or has left the organization --
either directly by a user or the HR department, or automatically
from changes made to the enterprise directory (remember LDAP?).
Thus, processes don't have to be "reprogrammed" every time a person
comes or goes; they can simply incorporate or exclude as necessary
according to the rules that have been established.

This automation can be extended by using timestamps as triggers of
"next steps," just as the deadline information is used in the
example we just discussed. In this case, the logging of a completed
step -- say, the approval of a brochure, or the uploading of a new
contract -- would kick off the next round of activity without need
for any human intervention.

Where this gets interesting is when processes kick off other
processes -- as when receipt of a signed consulting contract
triggers the sending of invitations to qualified professionals to
join the project team. This is something that generally is required
to happen quickly, and automating the process makes life just a
little easier for those in the affected roles and/or with the
involved responsibility.

This raises yet another interesting benefit of routing via roles and
responsibilities: namely, the ability to foster and manage several
activities at once, rather than having to tend them one at a time as
human beings necessarily do. Being able to split the work onto
parallel tracks -- such as sending the automated invitations while
repurposing sections of the contract for use as a client
backgrounder -- clearly can slash a process timetable compared to
having it performed in sequence. This is one of the most dramatic
(if obvious) advantages of combining collaboration with workflow and
BPM tools, and the key to making it work is basing it on roles and
responsibilities, not individuals.

Governance by Roles and Responsibilities

On its surface, information governance is a set of formal and
documented policies, procedures, and rules that control how
information will be managed across its entire lifecycle, from
creation to destruction. But it is so much more than that, too, as
success in this regard requires a culture of accountability to which
employees at all levels -- senior executives, business unit
managers, end users, and IT, records, and legal staff -- must be
committed. Otherwise, the best technology and the most
well-considered guidelines will mean little, and operational
standardization and compliance both will go out the window.

The critical first step to achieving governance is the establishment
of an organizational structure to guide, oversee, and arbitrate the
process. Populated with representatives from all walks of
organizational life, this body has a long list of responsibilities
that generally includes:

Establishing policies and standards, including implementation
methodologies, development platforms, and integration protocols, so
everything works together the way it is supposed to

Prioritizing projects, starting with the most achievable as defined
by feasibility, impact, or sponsorship (in other words, who wants it)

Enforcing rules and providing a conduit to executive authority for
final judgment

Maintaining best practices through shared vocabularies and standard
operating procedures

Establishing a measure-and-improve mindset by capturing metrics and
analyzing query logs and click trails to identify areas needing
enlargement

Integrating the handling of taxonomy, metadata, user interfaces, and
search to ensure they all work together for usability, compliance,
and proper tagging to facilitate automation

Good governance requires that all of these tasks be undertaken, and
in an organized way. It won't all happen overnight, though, so
breaking the work into smaller pieces -- and perhaps assigning those
pieces to smaller subcommittees -- is not a bad way to go.

There are a number of specific roles that are critical to involve in
any governance initiative. However, the individuals who fill those
roles can vary from organization to organization, and it is not
uncommon for one person to wear more than one hat. These roles
distill down into:

Executive sponsor -- The executive sponsor is the person who sets
the initial direction and goals for the initiative, tracks its
progress, and approves the policies that emerge. Many times, this is
the same individual who is driving the collaboration or information
management project itself, though broader visibility and scope are
not unheard of.

Information strategist and queue management lead -- The information
strategist is responsible for figuring out how to assure and
maintain the quality and integrity of the information being managed,
and how to propagate awareness and enforcement of the governance
policies being developed. Sometimes this involves making
improvements to operational business processes, and it almost always
entails a communications program to ensure users are fully aware of
the existence, purpose, and expectations of information governance
within the group and the organization.

Information quality leader -- The information quality leader (IQL)
is responsible for the day-to-day oversight and management of
information quality. This person should have significant experience
with Information Quality Management (IQM), as he or she will be
deeply involved in all aspects of the program. He or she also is
charged with actively managing business process improvement activity
per the direction set by the information strategist.

Content stewards -- The fourth important governance role is that of
content steward, which may already exist -- perhaps with a different
name -- within medium-sized or large organizations. Content stewards
serve as the conduit or bridge between IT and the line of business
operation, helping to translate business needs into technical
requirements, explaining technical functionality to business users,
and actually enacting and enforcing governance mandates. They are
accountable for defining content, content-specific processes, and
information quality levels for specific information subject areas
within a limited organizational scope or across the enterprise, as
specified.

Key Links:

Background information on the CIP.
Practice Test -- Do a Self-Assessment.
Free videos to prepare for the test.
White paper on the CIP.
Register for the test.
Contact for more information: jwilkins [at] aiim.org

SECTION 2

Social Media


In This Section...

1. Social Media Value Proposition
2. Primary Varieties of Social Media
3. Mobile, Local, and Social as Business Process Enablers
4. Social Content Management
5. Integrating Social Technologies

Social Media Value Proposition

One of the most immediately appealing aspects of social media is its
sheer reach, an advertising term that refers to the total number of
different people exposed to a medium during a given period of time.
According to The Fiscal Times, there are some 750 million Facebook
users and 100 million Twitter users worldwide. For sure, a high
percentage of them likely use both -- but if we say that figure is
80%, that still leaves us with a core of some 170 million people,
which is the rough equivalent of somewhat more than half of the
population of the United States.
Anyone with a message to communicate -- or any company wanting to --
therefore has a significant audience to play to, assuming it is
possible to stand out from the noise. That is a different problem,
of course, but the potential for immediate and huge visibility
certainly exists, and that makes social media a valuable medium to
be pursued.

This audience has grown very large, relatively fast -- MySpace, for
example, one of the earlier social media outlets of the sort we
think of today, was founded only in 2003. One of the reasons is that
social media tools are so easy to use.

For the most part, their screens are uncomplicated and their
functions well labeled, so even the most computer-illiterate of
users can find their way around all right. And they require only a
Web browser to get going, so no special training -- as a user or as
a content creator -- is needed to get up and running.

This browser-based foundation provides real comfort because it is so
familiar to so many at this point. Email, e-banking, eBay, and the
like all use this same means of access, and new services can be
picked up fairly readily as long as they utilize the same essential
interface. And even programmatically, the back-end servers and
infrastructure needed to provide a social service are well known to
IT professionals. So the technology barrier to entry for
organizations wanting to create or install something new is
relatively low.

The cost barrier is relatively low as well, as there's nothing
terribly magic about the skills and systems needed to offer a social
capability -- as long as you have, or have access to, them! Not that
they're free, but they are well established and well proven, and the
big tasks boil down to matching the intended capabilities to your
server, storage, and communications capacities.

The news is even better for users, as there generally is no cost
barrier at all! It's quite amazing, actually, what you can do for
free: everything from sending and receiving emails to using your PC
to hold videoconferences to editing and uploading high-definition
videos of yourself for all the world to see. It's not that long ago
that all this was the purview of only the most well-heeled of major
corporations, so there's no reason at all not to become engaged, at
least at some level.

Primary Varieties of Social Media

Social media is a broad term for applications that allow users to
generate content. It encompasses a wide variety of capabilities,
including writing in new media like blogs and wikis, posting
comments on other people's blogs, submitting ratings of content, and
providing status updates and brief commentary via "microblogs."

Commercial services in this space include Facebook, LinkedIn,
SlideShare, Twitter, Wikipedia, and YouTube. Though you certainly
can use them for free, they often offer additional services for pay,
and can earn significant money by selling advertising or offering
online community games that feature opportunities to buy -- with
real money -- items to be used in their virtual world. Farmville is
one example of this.

The services just mentioned are so-called "branded services" because
they maintain a single identity everywhere they are used. However,
other offerings exist that let you hide the brand if you so desire:
so-called white-label technologies, they include the likes of
Groupsite and Ning, and let you establish your own social networks,
under any names you desire, if that is what you're looking to do.

A third variation on the theme is social technologies in the
enterprise, which represent new ways to connect employees rather
than customers or other outside audiences. Yammer is a great example
of this, as it is being specifically pitched for internal use, and
has the added benefit of being free to get started with! Google Apps
is another offering being positioned this way, and others like
WordPress fit the mold as well. On the other side of the aisle lies
the likes of Atlassian, whose Confluence product is constructed as a
paid service all the way but is oriented in the same direction.

No discussion of social media varieties would be complete without
taking a cut across the different types in terms of the basis on
which you would use them: implementing them yourself, or having them
hosted by someone else.

Fundamentally a "make" vs. "buy" decision, the tradeoffs generally
are measured in terms of time, skills, cash flow, and overall
control. For example, because an implemented solution lives inside
an organization's firewall, you should be able to more directly and
precisely control its security -- including by hooking into Active
Directory, LDAP, or another directory service you may already be
maintaining. Plus, you may be better able to impose the look and
feel you want -- though often at what can be a sizable expense, with
the exception, just to complicate matters, of cases using
open-source software like Mediawiki! Plus, there's the need to
handle system upgrades and maintenance, so it's an individual
balancing act to be sure.

Picking up the earlier example, both Yammer and Atlassian are
enterprise offerings, but the former is hosted while the latter can
be installed inside the firewall. So even within the categories,
there are choices to be made.

Mobile, Local, and Social as Business Process Enablers

Phones have gotten significantly cheaper, smaller, and smarter over
the past decade, and today they literally put the whole Internet,
and all its capabilities, at our fingertips. Most recently joined by
pads and tablets, they have taken the notion of the mobile worker
and blown it out to include much of the general public -- a fact
that has serious ramifications for how information is being
generated and consumed. The thing is, it is now possible -- nay,
routine -- for people to share their thoughts and opinions not just
about the world around them, but about the specific neighborhood
they are in -- and for the recipients of that wisdom to put it to
use as they shape their own itineraries and activities, and
increasingly do so from mobile devices of their own.

For you see, most mobile devices today include location awareness
features that pinpoint their geographic position and inform the
suitably equipped apps they contain as to where they are. This
allows any social media channel the user accesses to deliver
information keyed to that place and its environs -- taking
personalization to the next level by presenting not only what, but
where, and providing the ability to roll group-think (i.e., informal
public consensus) into the equation.

This trend is being driven hard by the likes of Twitter, Foursquare,
and Yelp, all of which have embraced localization and made it a
primary focus. Their existence, and that of others, is in turn
adding a new dimension to many business processes that can make good
use of the ever-tighter coupling of the three capabilities.

Imagine, for example, a delivery route driver, or your airport limo
man, who nowadays carries some sort of mobile device that contains
locator software. The company's system thus can track his progress
and communicate updates regarding new stops to make or changes to
the route, and social services can provide traffic updates to help
speed him on his way. Meanwhile, the same system can notify you as
to when your pickup or delivery can be expected.

Pretty much the same thing can happen, by the way, with, say, an
important contract making its way through a workflow and using
internal social channels to keep employees abreast of its status. Or
an executive traveling in an unfamiliar city wanting to find a place
to eat. Because her smartphone knows where she is -- or at least,
where it is -- it brings up local responses when she enters her Web
query, and social media can offer up reviews and recommendations.

Social Content Management

Content management is the systematic collection and organizing of
information that is to be used by a designated audience. Neither a
single technology nor a methodology nor a process, it is a dynamic
combination of strategies, methods, and tools used to capture,
manage, store, preserve, and deliver information throughout its
lifecycle.

Social content management is the application of traditional content
management strategies to content generated using social business
technologies.

Why is this distinction important? Because social media content
represents uncharted waters in many cases by virtue of its populist,
and thus uncontrolled, roots -- and that can be a very uncomfortable
place to be.

Let's start with the capture piece, since every Tweet, blog post,
blog comment, and wall entry you and your organization mates upload
-- and every one you receive -- is a piece of content that
theoretically should be ingested and managed to ensure control,
decorum, and perhaps regulatory and records compliance.

Daunting though this sounds, the great mitigator is that most social
media applications utilize databases and templates under the covers.
This means that enterprise solutions installed inside your firewall
can be managed pretty much as any of your other databases are. For
other solutions, whether commercial or hosted, the first step is to
bring their content inside using any of a number of mechanisms:
Facebook Connect, RSS, third-party services, gateways, and so forth.
Once local, the content then can be brought to heel.
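
As a small illustration of the RSS route just mentioned, the following sketch uses the open-source feedparser library for Python to pull entries from a feed and stage them as records for downstream tagging and retention. The feed URL and the output fields are assumptions for the example; an actual capture pipeline would write into whatever repository you already manage.

```python
# Sketch: capture social content exposed as RSS/Atom for downstream management.
# The feed URL is a placeholder; swap in the blog, wiki, or community feed you monitor.
import feedparser

def capture_feed(feed_url: str) -> list[dict]:
    """Return each entry as a simple record ready for tagging and archiving."""
    feed = feedparser.parse(feed_url)
    records = []
    for entry in feed.entries:
        records.append({
            "source": feed_url,
            "title": entry.get("title", ""),
            "link": entry.get("link", ""),
            "published": entry.get("published", ""),  # not every feed supplies this
        })
    return records

# Example usage (hypothetical corporate blog feed):
# for record in capture_feed("https://blog.example.com/feed"):
#     print(record["published"], record["title"])
```

The point is less the specific library than the pattern: pull the content inside the firewall on a schedule, normalize it into records, and from there treat it like any other managed content.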

Capturing the content is one thing; finding it again for tracking or
audit purposes is quite another! The key to this, of course, is
tagging, a critical piece of content management that begs to be
applied to the amorphous mass that is social media. A big part of
the challenge is that social media content tends not to be as
structured as, say, images are. There are some handles, though, that
you can latch onto as a starting point; Twitter, for instance, has
these:

Sender

Mentions (the @ or DM it is addressed to)

A unique Twitter ID

A ReTweet ID if it was ReTweeted

Date and time sent, and perhaps

A hashtag, which may or may not represent the Tweet's subject

Still, the "usual suspects" frequently are not present, as there
often is no subject line or topic, no mechanism for filing it, and
no real keywords (except maybe the hashtag). So you may have to
develop a thesaurus or other mapping tool to piece meaningful
metadata together, such as by matching the @usernames in a Tweet to
usernames in the company directory.
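
Here is a minimal sketch, in plain Python with made-up handles, of that kind of mapping: extracting the handles a tweet actually carries and resolving @usernames against a company directory to build metadata. The directory contents and field names are purely illustrative.

```python
# Sketch: derive basic metadata from a tweet-like message by matching
# @mentions and #hashtags against a (hypothetical) company directory.
import re

DIRECTORY = {  # illustrative mapping of Twitter handle -> employee record
    "jdoe": {"name": "Jane Doe", "department": "Marketing"},
    "rroe": {"name": "Richard Roe", "department": "Legal"},
}

def tag_tweet(tweet_id: str, sender: str, text: str, sent_at: str) -> dict:
    """Assemble searchable metadata from the handles present in the message."""
    mentions = re.findall(r"@(\w+)", text)
    hashtags = re.findall(r"#(\w+)", text)
    return {
        "tweet_id": tweet_id,
        "sender": sender,
        "sent_at": sent_at,
        "mentions": [DIRECTORY.get(m, {"name": m}) for m in mentions],
        "subjects": hashtags,  # may or may not reflect the real topic
    }

record = tag_tweet("123456", "acme_support",
                   "@jdoe the #contract draft is ready for review", "2012-05-03T14:10Z")
print(record["mentions"][0]["name"], record["subjects"])
```

Even this much structure -- who sent it, who it names, and a candidate subject -- is often enough to file the item and find it again during an audit.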

Throughout all this is the need to control access to the social
platform -- not only to manage who gets to produce and participate
in social content creation on your organization's behalf, but who's
allowed to set up accounts; approve content, look, and feel; and
moderate and respond to comments. These tasks are no different here
than anywhere else in the world of application administration when
it comes to securing social media you are hosting yourself, or
managing on a third-party server. Many social technologies support
setting up groups, changing privacy settings, or otherwise
restricting the ability of some users to do some things. And since
many of the enterprise solutions hook into Active Directory,
managing this on a macro level in the enterprise is eminently
practical.

A 2011 Cerulli Associates survey found regulatory recordkeeping to
be the biggest challenge asset managers face when dealing with
social media use. And practical experience tells us that the same
dynamic is bedeviling executives at a wide range of other kinds of
organizations as well.

The challenge, of course, is that social media is appealing
precisely because it is so unregulated! Anyone, after all, can write
a Tweet or click a "Like" button -- but compliance officers and
legal counsel sometimes might prefer that the subject go unremarked
upon. AIIM President John Mancini put it well when he blogged that,
"Social media creates a vast new pool of informal and ad hoc content
available in forms and on devices that were unimaginable only a few
years ago. This content is certainly not usually a record in the
traditional sense but all this social information and content is
something that needs management and governance."

Good governance starts with good policies, and best practice says
having a policy governing usage is the first line of defense should
things get litigious -- followed oh-so-closely by applying and
enforcing it consistently. Any good policy will describe what is
permissible and what is not regarding such issues as privacy,
acceptable use, and confidentiality, and if there are regulations to
be heeded in your business, they'll be reflected there as well.
Other smart guidelines include these:

Social content is just another form of content. Period.

Your social content policy should apply to most or all social media
tools -- and to other content/communication-related technologies as
well.

DON'T write separate policies for Facebook and Twitter and LinkedIn
and for every other social medium that might be encountered by
employees. Technology changes fast, so develop something
comprehensive enough to cover new technologies as they appear.



Applying and enforcing your policies requires a formal governance
framework to be most effective. Besides the policy and procedures
just outlined, your framework also should include a team of
representatives from senior management and the organization itself.

It is especially important that higher-level executives be actively
engaged so they can determine overall strategic goals and support
the need for social media initiative(s), policy guidance, technology
solutions, and organizational transformation as needed. At the same
time, organizational participants can and should be drawn from the
different functional, geographic, or other groups that manage
information, and there should be new roles present -- social media
strategists, community managers, and moderators -- who are charged
with maintaining the quality of, and control over, the information
being posted.

And so now we return to where we started: with the need to keep
records -- a requirement that obviously first requires knowing
exactly what a record is, and thus what must be retained, in the
world of social content. This likely will vary not just by content,
but also by the nature of the tool. For example, an individual
social network status update or Tweet may not rise to the level of a
record, but a protracted discussion on a particular topic or over a
given period on someone's wall or via Twitter might qualify.

Integrating Social Technologies

The phrase integrated social technologies can mean different things
to different people, not least because the phrase "social
technologies" does as well! As used here, it refers to the merging
together -- programmatically or merely experientially -- of multiple
community-building capabilities, with themselves or with a "regular"
Web site, be they internal or external, or some combination thereof.
The first baby step on this road was taken via simple linking, but
we've moved so far beyond that, both technically and culturally,
that it is difficult to recall how exciting it was to be able to
surf at all!

Facebook Connect is the mechanism that allows you to create and sign
into Web accounts using your Facebook credentials, rather than
having to establish new identities each and every time. Twitter and
Google support the same sort of capability, and for the user, the
result is much greater convenience and usability. For the social
media providers, it initiates a two-way flow of information about
the user -- which is either a good thing or a bad thing depending
upon which side of the equation you stand.
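
Under the hood, sign-in mechanisms of this kind are generally built on OAuth-style authorization: the Web site redirects the visitor to the identity provider, which hands back a token the site can exchange for basic profile information. The following sketch shows only the generic first step -- constructing the authorization redirect -- with every URL and parameter value a placeholder rather than the actual endpoints of Facebook, Twitter, or Google.

```python
# Sketch: build a generic OAuth 2.0 authorization redirect for "log in with X".
# The endpoint, client_id, redirect_uri, and scope are illustrative placeholders.
from urllib.parse import urlencode
import secrets

def build_authorization_url(authorize_endpoint: str, client_id: str,
                            redirect_uri: str, scope: str) -> tuple[str, str]:
    """Return the URL to send the user to, plus the anti-forgery state value."""
    state = secrets.token_urlsafe(16)  # stored server-side and checked on return
    params = {
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": scope,
        "state": state,
    }
    return f"{authorize_endpoint}?{urlencode(params)}", state

url, state = build_authorization_url(
    "https://identity.example.com/oauth/authorize",  # hypothetical provider
    client_id="my-app-id",
    redirect_uri="https://myapp.example.com/callback",
    scope="basic_profile")
print(url)
```

After the provider redirects back with an authorization code, the site exchanges it for a token and, with the user's consent, a slice of profile data -- which is precisely the two-way flow of information noted above.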

Another manifestation of the trend toward integration is seen in
platforms like tibbr, which extends social technology's people-based
functionality to information systems as well. For example, an
"application event stream" essentially "tweets" things that happen
in your key applications, which you follow more or less like you do
your friends. The same can be done with subject matter that you care
about, and when items are received that you want to move on
immediately, the product supports unified communications that allow
you to choose or mix-and-match from among video conferencing,
desktop sharing, and voice.

The prior example illustrates how social media can be leveraged and
integrated by more "conventional" Web sites and information systems.
By using Web services, they can also be mashed together in what are
known, appropriately enough, as mashups: Web sites that use
information from more than one site to present their information.
Zillow.com is a simple but excellent example of this, as it combines
real estate information with aerial maps. But not content to stop
there, Zillow has recognized the power embedded within and has taken
the concept one step further by opening up its APIs so other sites
can call it and build additional value on top.
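
A mashup is, at bottom, just a join across two Web services' responses. The sketch below uses hard-coded sample dictionaries standing in for what would normally be JSON returned by two different APIs (the field names and services are invented for illustration); the merging logic is the part that matters.

```python
# Sketch: a minimal mashup -- combine "listing" data from one (imaginary)
# service with "neighborhood" data from another, keyed on postal code.
# In practice each structure would come from an HTTP call returning JSON.

listings_api_response = [
    {"address": "12 Elm St", "price": 350000, "postal_code": "02115"},
    {"address": "7 Oak Ave", "price": 425000, "postal_code": "02139"},
]

neighborhood_api_response = {
    "02115": {"walk_score": 88, "transit": "Green Line"},
    "02139": {"walk_score": 92, "transit": "Red Line"},
}

def mash_up(listings, neighborhoods):
    """Enrich each listing with neighborhood data from the second source."""
    combined = []
    for listing in listings:
        extra = neighborhoods.get(listing["postal_code"], {})
        combined.append({**listing, **extra})
    return combined

for row in mash_up(listings_api_response, neighborhood_api_response):
    print(row["address"], row["price"], row.get("walk_score"))
```

Opening up an API, as Zillow did, simply lets other people's code sit on one side of that join.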

Key Links:

Background information on the CIP.
Practice Test -- Do a Self-Assessment.
Free videos to prepare for the test.
White paper on the CIP.
Register for the test.
Contact for more information: jwilkins [at] aiim.org

SECTION 3

Information Workplace


In This Section...

1. Enabling Tools

2. Social Computing and Web 2.0

Enabling Tools

Dating back to the mid-2000s and gaining daily relevance today, the
phrase "information workplace" refers to the use of new Web
capabilities, in combination with collaboration tools, office
productivity applications, and content management, to get work done.
This essentially means creating a new environment in which a focus
can be maintained on the information rather than the tools, which
are coming to be so well connected that they interoperate seamlessly
and allow workers to do what they do -- even from remote locations!
Among other benefits -- like greater employee satisfaction and
lesser demands on office space and energy -- the ability to enable
work to be performed from anywhere means organizations can farm out
those functions that are not core to their businesses, like
accounting, for instance, or legal. This frees them to focus on
their core competencies and staff up accordingly, providing greater
flexibility and economy than if everything had to be in-house.

What makes this possible, in large measure, are a couple of
especially relevant technical developments that have flung the doors
open to a smooth, "you don't work here but we'd never know it" kind
of interoperability.

One is the emergence and adoption of Web services, which are defined
by the W3C -- the same organization that gave us the World Wide Web
-- as "a software system designed to support interoperable
machine-to-machine interaction over a network." Or in other words, a
standard way to get computing systems to talk to one another.
Practically speaking, it is especially evident where software- and
infrastructure-as-a-service come into play.

Another is XML, which is short for Extensible Markup Language and is
a way to encode documents, and parts of documents, so they can be
more easily searched and parsed. Though it focuses mainly on
documents, it is also widely used to represent data structures, and
many application programming interfaces (APIs) have been developed
so software can process XML data. Web services, too, can rely
heavily on it.
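
To make the idea tangible, here is a tiny, self-contained sketch using Python's standard xml.etree.ElementTree module to parse a made-up purchase order document and pull out the data a downstream system might need. The element and attribute names are invented for the example, not part of any standard schema.

```python
# Sketch: parse a small (invented) XML document into plain data structures.
import xml.etree.ElementTree as ET

PURCHASE_ORDER = """
<purchaseOrder number="PO-1017" date="2012-05-03">
  <customer>Acme Corp</customer>
  <item sku="A-100" qty="3" price="19.95"/>
  <item sku="B-220" qty="1" price="249.00"/>
</purchaseOrder>
"""

root = ET.fromstring(PURCHASE_ORDER)
order = {
    "number": root.get("number"),
    "customer": root.findtext("customer"),
    "items": [
        {"sku": item.get("sku"),
         "qty": int(item.get("qty")),
         "price": float(item.get("price"))}
        for item in root.findall("item")
    ],
}
total = sum(line["qty"] * line["price"] for line in order["items"])
print(order["number"], order["customer"], f"total: {total:.2f}")
```

The same encoding that lets this script pick the document apart is what lets a search engine index the pieces, or a Web service pass them to another system unambiguously.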

Both of these are endemic to a Service-Oriented Architecture, or
SOA, which is built atop a strategy calling for applications to be
self-declaring, enabling their discovery by other applications in a
dynamic manner. This provides for the "on the fly" development of
potentially complex, integrated solutions. The downside is that it
also requires a considerable amount of up-front planning and
possibly a good deal of subsequent coding and execution. However,
the advent of XML and Web services is mitigating this somewhat, and
since the alternative is the more traditional and expensive world of
"one at a time" integration between components, it's an option that
is definitely worth looking into.

From an applications standpoint, portals represent an obvious focal
point because they do so much to put information in front of people
doing work. Frameworks for integrating information, people, and
processes across organizational boundaries, they provide a single
secure unified access point -- often via a Web browser -- to
different applications or content repositories. As such, they open
an easy door (hence the word "portal") to a lot of data and
functionality all at once -- as long as all the back-end
integrations are in place and exposed through the
application-specific "portlets" on the main screen.

Content management is the systematic collection and organizing of
information that is to be used by a designated audience -- business
executives, customers, etc. Neither a single technology nor a
methodology nor a process, it is a dynamic combination of
strategies, methods, and tools used to capture, manage, store,
preserve, and deliver information that supports key organizational
processes. Often, that information is viewed and accessed through a
portal.

Say the words "office productivity tools" to most people, and
they'll conjure up a list of software packages like Microsoft Word,
Excel, and PowerPoint, email, and maybe a shared calendar or contact
database. In this day and age, however, these products are starting
to be replaced by services like Google Apps, Microsoft Office 365,
and Zoho, which put many of these same capabilities, and more, in
people's hands -- but on an online basis, not through the use of
locally installed software.

What this means, of course, is that not only can people
theoretically gain access to published information from anywhere, at
any time, but they can do the same for information they're working
on, working with, and creating anew. And thanks to the latest in
smartphone and tablet technology, they don't even have to be at a
computer -- never mind their own -- to do so. It's a real potential
game-changer, as it even changes the definition of "workplace."

Social Computing and Web 2.0

The term social media refers to the use of Web-based and mobile
technologies to turn communication into an interactive dialogue.
Social media takes on many different forms, including Internet
forums, blogs and microblogs, wikis, podcasts, and social
bookmarking. It differs from traditional or "industrial" media in
that it is relatively inexpensive and accessible to anyone (even
private individuals) wanting to publish or access information -- and
as such, it is a great equalizer even as it leads to eruptions of
information that may or may not be vetted or even true. Information
workers are wise to consider both sides of this equation as they
search, find, and make use of the gobs of content that are now
available.

Social media is part of what has come to be known as Web 2.0, a term
that is associated with Web applications that facilitate
participatory information sharing, interoperability, user-centered
design, and collaboration. This is as opposed to traditional Web
sites -- representative of Web 1.0 -- that limit users to the
passive viewing of content that was created for them. In a 2006
interview, World Wide Web creator Tim Berners-Lee called the term
Web 2.0 a "piece of jargon" because his vision for the Web was for
it to be "a collaborative medium, a place where we [could] all meet
and read and write." So he calls it the "Read/Write Web" instead.

Whatever we call it, the fact is that today's Web allows us to
interact with information as never before, and the phrase Web 3.0 is
already being bandied about to encompass the mainstreaming of a
great many new Web-based capabilities like semantic search,
anticipatory retrieval (in which the "Web" somehow knows what you
want to see next), and production-quality audio and video. Bringing
all this to the desktop -- or laptop, or smartphone -- requires
further advances in bandwidth and technology, of course. But it also
will kick the notion of the information workplace into a higher gear.

Key Links:

Background information on the CIP.
Practice Test -- Do a Self-Assessment.
Free videos to prepare for the test.
White paper on the CIP.
Register for the test.
Contact for more information: jwilkins [at] aiim.org



SECTION 4

Instant Messaging


In This Section...

1. Instant Messaging Basics

2. Instant Messaging Risks and Responses
3. Instant Messaging Architectures

Instant Messaging Basics

Instant messaging is a system for electronic synchronous one-to-one
or one-to-many communications. In the real world, it is more or less
the typed equivalent of making a phone call in that it is a
real-time medium -- hence the word "synchronous." And contrary to
popular belief, it has been around since way before there was an
Internet or a World Wide Web, though those two developments
certainly made it more accessible and usable than ever before. The
"live and in person" nature of the medium allows for much more than
chatting, as the typing of messages back and forth is known. Once
the connection is established, many other valuable interactions can
take place as well.

The public face of the medium is probably the contact or buddy list,
which serves as the directory to all the folks a user has identified
as someone with whom he or she would like to chat, or already has.
This list is more than just a plain roster, however, as it can also
display other possibly useful information like a contact's status
(available, away, etc.) and the instant messaging service he or she
uses.

Sit and watch a contact list long enough and you'll see people's
icons blink on and off as their owners log on and off their instant
messaging service. As you do, be sure to appreciate what's going on,
specifically the ability to note someone's presence online without
having to call or otherwise reach out to them first. Also known as
"awareness," this function can be quite useful, especially in a
larger organization, when you have a pressing question but are not
sure who's around to answer it or point you toward a suitable
solution. You really can't beat it when time is of the essence, as
it so often is, and it is a standard part of most collaborative
services today.

Once connected, participants in an instant messaging session can
engage in two especially valuable activities: file and link sharing.
At the time the technology first became popular with ordinary people
(as opposed to computer scientists), the ability to send documents
back and forth was generally based on either email or Federal
Express -- neither of which was (or is) especially real-time. As
instant messaging matured, however, the capability quickly became a
mere matter of clicking a button, indicating the file that was to be
sent to the recipient, and off it would go! Links to Web sites could
be shared with equal ease, and ultimately the medium was expanded to
include voice and video as well. In this way, the technology was an
important precursor to services like Skype, which took the concept
to a new level.

Instant Messaging Risks and Responses

Like so many other highly accessible communications media -- two
prominent ones being email and social networks -- instant messaging
is fraught with risk if left ungoverned. It's just so easy to log on
and type whatever springs to mind that savvy organizations are
seeking ways to exert some measure of control over how it gets used.

Policies in this regard are known as acceptable use policies, and
they are intended to govern the way people leverage not only instant
messaging, but all the other communications tools available to them
as well. Chances are that if you are, or know of, a recent hire,
student, or online community member, you've been acquainted with the
need to agree to certain terms before fully participating in the
activities of the day. Acceptable use policies tend to be dominated
by things NOT to do, like send messages containing:

Obscene language or otherwise inappropriate content

Jokes or chain letters

Racial, ethnic, religious, or other slurs

And sometimes non-standard signature blocks and confidentiality
statements

The reason acceptable use documents exist is to reduce the potential
for legal action against an organization by clearly articulating
what's sanctioned and what's not, who's responsible if a problem
arises, and what disciplinary actions can be taken if an issue
arises. The good ones also are integral parts of broader information
security frameworks, and are both concise and clear.

One way to boost the odds of compliance is to ensure your people are
trained in the particulars of acceptable use and the potential
perils of casual chat and screen sharing. Part of this must also
include awareness of the latest developments in malware, spyware,
key logging, phishing, and other illegal invasions of privacy, as
well as ongoing innovations in protective technology in these and
related areas like information confidentiality. As such, this
training must be constant and evergreen -- especially for
administrators! -- and perhaps baked into the "new hire" process so
new employees are immediately indoctrinated into how things are to
be done.

The aforementioned factors are in play whether we're talking about
consumer instant messaging systems such as those from AOL, Yahoo,
and Microsoft, or enterprise instant messaging systems like those
from IBM (Sametime), Symantec (Enterprise Instant Messenger.cloud),
and, yes, Microsoft again (Lync 2010).

Given the ubiquity and functionality of public services, it may be
logical to wonder why organizations choose to implement their own.
The answers are many, but most often boil down to these:

It provides much more sophisticated means to manage and ensure the
authenticity of network users.

It more reliably keeps IM-transmitted information secure and
confidential.

It helps limit use to business productivity purposes rather than
casual communication.

Or in other words, the same security and productivity considerations
that are at work whenever inside, outside, or hosted information
solution decisions are to be made.

To mitigate risks and satisfy the need to reduce liability and
adhere to regulation, some organizations are installing compliance
solutions that are designed specifically for the task.

One example is Actiance (formerly FaceTime Communications), whose
Unified Security Gateway lets applications like instant messaging be
safely accessed via URL filtering, anti-malware, and Web anti-virus
solutions. Another company in the space is Global Relay, which
offers an Archive for Public Instant Messaging that captures and
archives messages from public IM services ranging from AOL Instant
Messenger to YellowJacket.

Gateways are another option for mitigating the risks associated with
instant messaging. Basically tools for interconnecting multiple IM
networks, they sit at the perfect point to apply protections and
provide the kind of capture and archiving capabilities as the
enterprise products just mentioned -- many of which, not
coincidentally, include gateways as part of their core functions.
Another company active in this arena is Symantec, which has a
product that is cleverly called Messaging Gateway.

Instant Messaging Architectures

On a client/server network, every computer is either a client or a
server. A server, of course, shares its resources among the client
computers on the network, and is usually located in a secured area,
such as a locked closet or data center, because of the value of the
information it typically contains. Any other computer on the network
is a client.

Primary advantages of this kind of architecture revolve around the
ability of administrators to centrally manage and secure the
servers, and to control access to them by linking to an enterprise
directory like Active Directory or LDAP. In most enterprise
contexts, client/server-based instant messaging systems include the
following components:

A directory server for authentication and lookup

An IM server and one or more multiplexors to aggregate the client
signals

A Web server to serve up the pages

One alternative to this is the peer-to-peer construct, in which each
client can communicate with any other client on the network to
which it has been granted access rights. As such, each one
basically can act as a server as well -- a practical fact that makes
equals of them all, and thus gives the architecture its name.

Instant messaging can be implemented on a peer-to-peer basis, an
arrangement that does not require an Internet connection to operate.
This adds a high level of privacy to this type of instant
communication, ensuring that no one else but the intended recipient
will be able to read the contents of your instant message. But it
can also involve tradeoffs in terms of performance and security
since, by design, it involves the pooling of individual and
non-uniformly capable computing resources rather than more easily
managed and controlled central servers. PopNote and the LanToucher
products are two examples of these kinds of offerings.
Store-and-forward architectures are those in which sent messages are
stored if the recipients' applications are not running and
available, and simply forwarded when they come back online later on.
Email, text messaging, and voice mail are perhaps the three
best-known technologies that feature this capability, and while
instant messaging typically does not feature this kind of
capability, the potential is there for it to occur -- especially in
an "always on" server-based environment. One place it can be seen is
the chat mechanism in Skype, which will display messages typed to
you while you were offline.
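
The mechanics are easy to picture with a short sketch. The following plain-Python server stub, with invented user names, simply queues messages for anyone who is offline and flushes the queue when they reconnect; a production system would of course persist the queue and secure the transport.

```python
# Sketch: the core of a store-and-forward message server.
# Users, messages, and the deliver() behavior are illustrative placeholders.
from collections import defaultdict, deque

online = set()                      # users currently connected
mailbox = defaultdict(deque)        # per-user queue of undelivered messages

def deliver(user: str, message: str) -> None:
    print(f"-> {user}: {message}")  # stand-in for pushing to the user's client

def send(sender: str, recipient: str, text: str) -> None:
    message = f"{sender}: {text}"
    if recipient in online:
        deliver(recipient, message)         # forward immediately
    else:
        mailbox[recipient].append(message)  # store for later

def log_on(user: str) -> None:
    online.add(user)
    while mailbox[user]:                    # flush anything stored while offline
        deliver(user, mailbox[user].popleft())

send("alice", "bob", "Ping me when you're back.")  # bob is offline: stored
log_on("bob")                                      # stored message is forwarded
send("alice", "bob", "There you are!")             # bob is online: forwarded at once
```

Email and voice mail apply the same store-then-forward pattern at a much larger scale; the difference with instant messaging is simply that immediate delivery is the expected case.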

The proliferation of wi-fi and Internet-enabled mobile devices is
putting instant messaging in the hands of those who are not in front
of their computers, as there are dozens of IM client applications,
Web messengers, and text messaging-based options available for those
kinds of products.

The technology that allows instant messaging services to be accessed
this way is called, cleverly, Mobile Instant Messaging, or MIM.
Making it available is more than merely creating an interface that
runs on the mobile platform, however, as there are major factors to
be considered along the way, including:

Radio bandwidth

Device memory

Availability of media formats

Virtual or physical keyboard

Display resolution

CPU performance, and


Battery power

MIM comes in two varieties:

With embedded clients, involving a tailored IM client for every
specific device with a special back-end server installed at the
cellular carrier, or

As a client-less platform, utilizing a browser-based application
that requires no software to be downloaded to the handset or
any technical support by the cellular operator.

Key Links:

Background information on the CIP.
Practice Test -- Do a Self-Assessment.
Free videos to prepare for the test.
White paper on the CIP.
Register for the test.
Contact for more information: jwilkins [at] aiim.org






























SECTION 5

Telecommuting Support


In This Section...

1. Telecommuting Access
and Use
2. Telecommuting Device and
Network Issues
























Telecommuting Access and Use

Telecommuting is the practice of performing work without having to go into the office. Thanks to today's collaboration and communication tools, people can do more, more flexibly, from more locations than ever before. It's a terrific option for organizations wishing to accommodate people who live far away, have disabilities, or are looking to "go green," but the out-of-sight nature of the model may make it unsuitable for some roles and for some people -- bosses as well as underlings who may require a more conventional setup for comfort.



Remote users have the same information needs as their in-house
counterparts, and the fact that they live beyond the bounds of the
internal infrastructure only makes the usual concerns about con-
trolled access to that information that much more acute. At a high
level, there are three primary techniques used to provide the anti-
trespassing, gating capabilities needed to allow only those users
with permission to log in: password-protection, directory services,
and access control lists.

John Daintith's 2004 Dictionary of Computing defined a password as a unique character string held by each user, a copy of which is stored within the system. If a password entered by the user corresponds with the value of the one stored, the user is accepted by the system and is on his or her way.

A directory service identifies all resources on a network and
makes them accessible to users and applications. Two of the
most widely-used are LDAP (the Lightweight Directory Access
Protocol), which is used primarily for e-mail addresses, and Net-
ware Directory Service (NDS), which is used on Novell Netware
networks. Virtually all directory services are based on the X.500
ITU standard, although it is so large and complex that no vendor
complies with it fully.

An access control list (ACL) is a list of permissions attached to an object in a computer file system that specifies which users or system processes are allowed to access it, as well as what operations are permitted or prohibited. For instance, if a file has an ACL that contains (Alice, delete), this would give Alice permission to delete the file.

Appropriate access thus assured, the next question has to do with the applications that will need to be accessed. On the desktop, a certain minimum level of functionality and standardization typically is required, and often subsidized. But from a collaboration perspective, there are a number of applications telecommuters need to be equipped to leverage. These may include the following:

Portals, for ready access to organizational news and key applica-
tions and documents
Shared file services, like Google Docs and Dropbox, which allow
multiple editors to view and update documents simultaneously,
from remote locations

Shared workspaces, such as Documentum eRoom to house
and manage materials being collaborated upon
Web conferencing, for even more meaningful interactivity along
a spectrum including instant messaging, Skype, and other real-
time tools, and

Social media, to help remote employees connect with each other
and benefit from the rankings and alerts that are part and parcel
of belonging to an online community

However protected the means of access, and however rich the collaborative toolset, no telecommuting program can be successful without being surrounded by good business practices, and subjected to their enforcement. In this, it is little different than any other kind of information management initiative. For instance,





Policies regarding system access, information governance, and
records management must be just as rigorous for
telecommuters as they are for in-house workers.

Requirements regarding the work environment must be codified so there is no misunderstanding about the need to present professionally to the world, in written and spoken demeanor, in dress and background when videoing, and so forth -- just as is the case for employees in the office.

Schedules should be constructed with mandated regular check-
ins with co-workers to ensure tasks and expectations remain in
sync across the organization.

Telecommuting Device and Network Issues

The nature of the access device telecommuters use is a big part of making this model work, as PCs, tablets, and smartphones now are all mainstream options. Relatively speaking, they are inexpensive, flexible, and, of course, highly mobile -- attributes that recommend them well for telecommuting but add to the information manager's woes, since they also are prone to loss, theft, breakage, and unauthorized use. Capabilities, therefore, are needed to aid recovery, repair, and insulation from "evil-doing." Here are just a few ideas to aid your efforts in this regard:

Security, physical and electronic, including everything from ca-
ble locks to screen locks and self-destruct modes to keep sensi-
tive information out of hackers hands

Malware protection, to keep viruses, keyloggers, and the like
from infiltrating first the device, and then your network
Maintenance and support programs, through which to manage
hardware and software updates, as well as training, repairs, and
replacements

Another critical factor to consider has to do with the network over which people are "phoning home," as speeds and bandwidths obviously vary widely depending upon which combination of devices and wired, wifi, or cellular connections is being used.

This in turn can have a dramatic effect on how your data is stored and distributed: for example, centrally or distributed, on-demand or at replicated intervals, etc. And THIS can significantly affect the state of your budget, since it will determine how many servers and communications links, and of what capacities, you'll need to procure.

No discussion of telecommuting-related device and network issues would be complete without mention of the VPN -- the Virtual Private Network. A VPN is a network that uses a public telecommunication infrastructure -- like the Internet -- to provide remote users access to a central organizational network. For the user, the experience is just like being connected directly to the central network, especially when there is good device processor power and network bandwidth.

Secure VPNs use cryptographic tunneling protocols to block intercepts, packet sniffing, identity spoofing, and attempts to alter messages. Among the more common of the many protocols used are the following (a brief illustrative sketch follows the list):

IPsec (Internet Protocol Security)

Transport Layer Security (SSL/TLS)




Secure Shell (SSH)
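
To make the transport-layer piece of this concrete, here is a minimal sketch -- an illustration only, using Python's standard ssl and socket modules and a hypothetical host name -- of a client opening a TLS-protected connection. A VPN or SSH tunnel wraps traffic in a conceptually similar way, just at a different layer of the stack.

    import socket
    import ssl

    HOST = "intranet.example.com"   # hypothetical remote gateway
    PORT = 443

    # Default client context: certificate validation and hostname checking
    # are enabled, so spoofed or intercepted endpoints are rejected.
    context = ssl.create_default_context()

    with socket.create_connection((HOST, PORT)) as raw_sock:
        # Wrap the plain TCP socket in TLS; traffic is now encrypted in transit.
        with context.wrap_socket(raw_sock, server_hostname=HOST) as tls_sock:
            print("Negotiated protocol:", tls_sock.version())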

Key Links:

Background information on the CIP.
Practice Test -- Do a Self-Assessment.
Free videos to prepare for the test.
White paper on the CIP.
Register for the test.
Contact for more information: jwilkins [at] aiim.org
















































SECTION 6

Web Conferencing


In This Section...

1. Web Conferencing































Web Conferencing

The term Web conferencing encompasses methods and tools for conducting meetings remotely via the Internet. A type of synchronous collaboration, since the proceedings take place in real time, it counts among its capabilities the likes of desktop sharing and presentations, Q&A sessions,































polling, voice and videoconferencing, and the
ability to archive events for review or audit later
on. This functionality comes in two essential va-
rieties:

Open participation, when the entire group can
edit what they see on the screen



Mediated participation, when the attendees can only read and
comment on what they see, while the organizer makes the
edits and shares them for all to enjoy

Among its most compelling business benefits are its abilities to:

Reduce travel costs and enable communication among partici-
pants who are dispersed organizationally or geographically
Eliminate time lost simply getting to and back from physical
meetings
Record and reuse Web conference content

Support "green" initiatives

Security is one of the most significant needs associated with Web conferencing, for unless you're running a public-facing online event of some kind, you don't want just anyone to be able to log on. There are several techniques that have arisen to take care of this.

The obvious one is to require a password before allowing people
to log on.
Even better is to hide the session from the public listing (even internally) so no one but invited guests even knows it's taking place.

Thirdly, disabling desktop sharing will guard against people wit-
tingly or not revealing things better left invisible.
And where you have folks taking part from multiple locations,
using secure connections and possibly even VPNs will further
protect your information from prying eyes.

Other keys to the kingdom boil down to ease of use, since the idea is to mimic the character of a face-to-face meeting to the greatest degree possible, and a face-to-face meeting presents few obstacles beyond the need to be physically able to attend. Chief among the factors here are:

Ease of connection to the conferencing system via simple menus
or, better, automatic login
Ease of access to organizational information, including single sign-on so once the user has dialed in, he can get to the repositories he needs without further interference

Availability of archives and indexes of the online proceedings so
they can be referenced and leveraged later on, or seen for the
first time by anyone missing the original meeting

From an efficiency standpoint, most systems support a healthy
measure of "do it yourself" -- and in a good way, to promote agility
in management and usage. This generally manifests itself in one
of two ways:

Self-provisioning of the conferencing application so participants can acquire and configure the requisite software without need of any (or much) human intervention. This is something well familiar to users of services like GoToMeeting.








Self service, so no operator or IT intervention is required to in-
vite users, share desktops, take a vote, or perform other func-
tions.

Key Links:

Background information on the CIP.
Practice Test -- Do a Self-Assessment.
Free videos to prepare for the test.
White paper on the CIP.
Register for the test.
Contact for more information: jwilkins [at] aiim.org












































Secure and Preserve

























































SECTION 1

Security


In This Section...

1. Security Concepts and
Principles
2. Security Types



























Security Concepts and Principles

An access permission schema is primarily an anti-trespassing, gating device designed to keep out every potential user of the system except those the owner has granted permission to. In the past, this purpose was served on an individual PC basis by material devices like hardware keys






































and dongles, but the prevalence of large-scale network-based computing infrastructures has rendered those devices cost-prohibitive. So nowadays, access permission is granted by software programs that include password-protected login procedures, directory services, and access control lists.

According to John Daintith's 2004 Dictionary of Computing, a password is a unique character string held by each user, a copy of which is stored within the system. During login, an authentication process takes place; if the password entered by the user corresponds with the stored value, the user is accepted by the system and is on his or her way.

A good password should contain at least six to eight apparently random characters. Personal details, such as vehicle license numbers or relatives' names, are too easily guessed to be secure, and even dictionary words are susceptible to the automated, exhaustive search procedures used by hackers. It goes without saying that you should never share a password, because doing so destroys the principle in which password protection is grounded: secrecy. In the case of a forgotten password, the most secure method of restoring access is to have the password reset, a practice that is an essential part of password management policy.

A directory service identifies all resources on a network and makes
them accessible to users and applications. Resources include e-
mail addresses, computers, and peripheral devices such as
printers and scanners. Ideally, the directory service should make
the physical network topology and protocols transparent so a user
on a net-work can access any resource without knowing where it is
or how it is physically connected.

Two of the most widely-used are LDAP (the Lightweight Directory
Access Protocol), which is used primarily for e-mail addresses,
and Netware Directory Service (NDS), which is used on Novell
Net-ware networks. Virtually all directory services are based on the
X.500 ITU standard, although it is so large and complex that
no vendor complies with it fully.

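As a hedged illustration of what a directory lookup looks like in practice, the sketch below uses the third-party ldap3 Python package (an assumption on my part; it is not part of the source material), with an invented server name, service account, and directory tree. The point is that the caller asks for a resource by attribute -- here, an e-mail address -- without knowing where in the network the entry physically lives.

    from ldap3 import Server, Connection   # pip install ldap3

    server = Server("ldap.example.com")                        # hypothetical host
    conn = Connection(server,
                      user="cn=svc-lookup,dc=example,dc=com",  # hypothetical account
                      password="s3cret",
                      auto_bind=True)

    # Find the person behind an e-mail address anywhere under the base DN.
    conn.search("dc=example,dc=com",
                "(mail=alice@example.com)",
                attributes=["cn", "telephoneNumber"])

    for entry in conn.entries:
        print(entry.entry_dn, entry.cn, entry.telephoneNumber)
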
An access control list (ACL) is a list of permissions attached to an object in a computer file system that specifies which users or system processes are allowed to access it, as well as what operations are permitted or prohibited. Each entry in a typical ACL specifies a subject and an operation. For instance, if a file has an ACL that contains (Alice, delete), this would give Alice permission to delete the file. Note that an ACL is a form of authorization, which defines what you can do -- a concept that differs from authentication, which validates who you are.

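A minimal sketch of that (subject, operation) idea, written for illustration only:

    # Each object carries a set of (subject, operation) permissions.
    acl = {
        "budget.xlsx": {("alice", "read"), ("alice", "delete"), ("bob", "read")},
    }

    def is_authorized(user, operation, obj):
        """Authorization: is this (subject, operation) pair on the object's ACL?"""
        return (user, operation) in acl.get(obj, set())

    print(is_authorized("alice", "delete", "budget.xlsx"))  # True
    print(is_authorized("bob", "delete", "budget.xlsx"))    # False -- not on the list
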
Role-based security, as provided by LDAP and NDS directories and ACLs, has several advantages over user-based security systems, which are those built on usernames and passwords. Its primary advantage is ease of implementation and management, because the number of roles will almost always be smaller than the number of users -- and if everyone in a role needs to be changed, the change can be made once to the role permissions in the ACL, rather than separately for each user.

A classification schema arranges or divides objects into groups based on characteristics that the objects have in common. This allows users to find an object more quickly than if it had been left in an undifferentiated mass, makes it easier to detect duplicate objects, and conveys meanings that may not be conveyed by an object's name or its spelling.

A well-considered schema is valuable in a security context because it allows groups of objects to be managed and secured as a unit, rather than as individual elements -- a much more efficient way to go.

Encryption refers to the coding of sensitive data to protect it while
stored or traveling over a network. Credit card numbers, banking
information, and system passwords are but a few examples of
data that is commonly encrypted, but, really, any information in
any file or any directory can be as well.

There are two major types of algorithms used in encryption implementations: symmetric and asymmetric. In symmetric encryption, data is scrambled using one key on the transmitter and unscrambled with the same key on the receiver, so the two keys have to be the same for the data to be readable. Asymmetric encryption, on the other hand, uses two different keys: one is called the public key and the other is called the private key. The public key can be known to everyone who wants to communicate with the person owning that key, but the private key must be kept secret by its owner for the encryption to be successful.
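
Here is a minimal sketch of the symmetric case, for illustration only; it assumes the third-party cryptography package is available. The same key both scrambles and unscrambles the data, which is why that key must be shared -- securely -- between sender and receiver.

    from cryptography.fernet import Fernet   # pip install cryptography

    # One shared secret key, held by both transmitter and receiver.
    key = Fernet.generate_key()
    cipher = Fernet(key)

    token = cipher.encrypt(b"Card number: 4111 1111 1111 1111")
    print(token)                  # ciphertext, safe to store or transmit
    print(cipher.decrypt(token))  # only a holder of the same key recovers the plaintext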

Redaction involves expunging confidential, personal, and otherwise sensitive data from documents before they are released to readers. A common method of redacting an imaged document is to manually overlay a black layer on the sensitive portion and then "burn" the redaction into the image by recasting it as a flattened TIFF or other image format. (Not recasting it means the redactions may be "lifted" using imaging tools.)

The redaction process need not be manual, as it can be intelligently
programmed and automated to eliminate sensitive data from pre-
determined categories of documents. Today's heightened aware-
ness of the legal implications of exposing information is leading
some companies -- particularly law firms -- to automatically re-
move sensitive material from all email messages before sending
them.
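
For text-based content, the rule-driven part of that automation can be as simple as pattern matching. The sketch below is illustrative only -- the two patterns are assumptions, not a complete rule set -- and blanks out Social Security-style and credit-card-like numbers before a message is released:

    import re

    PATTERNS = [
        re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),     # SSN-style numbers
        re.compile(r"\b(?:\d[ -]?){13,16}\b"),    # credit-card-like digit runs
    ]

    def redact(text, mask="[REDACTED]"):
        """Replace every match of the configured patterns with a mask string."""
        for pattern in PATTERNS:
            text = pattern.sub(mask, text)
        return text

    print(redact("SSN 123-45-6789, card 4111 1111 1111 1111, meeting at 3pm."))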

Security Types

If you had to put a management headline on security, a good one to consider would be risk management, because protecting your organization's assets -- and its information certainly is one of those -- ultimately comes down to safeguarding its ability to work on behalf of its customers, employees, and stakeholders.

Done right, risk management is an ongoing process, not a silver
bullet, and follows the same kind of circular lifecycle track most
worthy initiatives do, as illustrated here. In computing security
terms, it operates on several different levels:

Your physical computing devices, to be sure

Your applications

Your network, and

Your information itself

Let's now go down the list.

Computer security encompasses the physical protection of comput-
ing devices and systems, preventing unauthorized access to them
and protecting the information they store from theft, malware, and
viruses.

Application security, or application data security, encompasses measures taken throughout an application's lifecycle to protect it from security gaps that stem from flaws in the design, development, deployment, upgrade, or maintenance of the application. As shown here, techniques include assessing the current state, setting policies and controls, monitoring and enforcing compliance with those policies, and measuring the results achieved so areas of needed improvement can be identified and targeted in the next round.

It is important to note that application security does not extend any further than the application itself, which only controls the use of resources granted to it, not which resources those are.

Network security consists of the provisions and policies adopted by the network administrator to prevent and monitor unauthorized access, misuse, modification, or denial of the network and network-accessible resources. In this era of cloud computing, network security is of paramount importance, since an organization's "network" can now include pieces of the Internet itself. Security tools to use include multiple firewalls, individual passwords, data encryption, and software keys.

Not everyone in an organization needs access to every database,
every record, or every document. Information security protects the
information itself and the underlying systems by applying three
core principles that are conveniently summarized in the acronym
"CIA":

The first is Confidentiality -- This ensures that information is not
accessed by unauthorized persons. For example, a credit card trans-
action on the Internet requires the credit card number to be trans-
mitted from the buyer to the merchant and from the merchant to a
transaction processing network. The system attempts to enforce
confidentiality by encrypting the card number during transmis-
sion, by limiting the places where it might appear (in databases,
log files, backups, printed receipts, and so on), and by restricting
access to the places where it is stored.

The "I" is Integrity -- which ensures that information is not altered
by unauthorized persons in a way that is not detectable by author-
ized users. Integrity is violated when a message is actively modi-
fied in transit. Information security systems typically guard against
this while providing data confidentiality.






Then there's Authentication -- which ensures that the data, transactions, communications, and documents (electronic or physical) are genuine, and that users are whom they claim to be. The most commonly used tool for facilitating authentication is password protection. Note that authentication is about who you are, a concept that differs from authorization, which is about what you can do; the latter is governed by access control lists.

WORM (or write once, read many) technology is a popular and reli-
able way to secure the integrity of data written to a file. It allows
information to be written to storage media a single time, at which
point the data is "cast in stone" or write-protected, which prevents
anyone from accidentally or intentionally altering or erasing the
data, thus permanently ensuring the integrity of the file. WORM


storage helps security administrators take all reasonable steps to
protect electronic records, to certify only authorized access, and
to ensure that audit procedures are available for inspection by the
relevant authorities.

These different types of security are getting more and more airtime
as regulatory activity has intensified and compliance has become
the watchword, especially in the realms of national security, per-
sonal healthcare privacy, and financial information. In the United
States, the "biggies" in each of these categories are:

The USA Patriot Act of 2001, which among other things reduced restrictions on law enforcement agencies' ability to search telephone and e-mail communications, medical, financial, and other records

HIPAA, the Health Insurance Portability and Accountability Act of 1996, which required the establishment of national standards for electronic health care transactions and addressed the security and privacy of health data to encourage the widespread use of electronic data interchange in the U.S. health care system, and

The Sarbanes-Oxley Act of 2002, which set new or enhanced accounting and reporting standards for all U.S. public company boards, management, and public accounting firms

Key Links:

Background information on the CIP.
Practice Test -- Do a Self-Assessment.
Free videos to prepare for the test.
White paper on the CIP.
Register for the test.
Contact for more information: jwilkins [at] aiim.org

















SECTION 2

Records Management


In This Section...

1. Records Management
Principles and Standards
2. The Records
Management Lifecycle
3. Regulatory and
Jurisdictional Factors




















Records Management Principles and Standards

Records management refers to a set of activities
required for systematically controlling the crea-
tion, distribution, use, maintenance, and disposi-
tion of recorded information maintained as evi-
dence of business activities and transactions.




















The key word in this definition is "evidence." Put
simply, a record can be defined as evidence that
a particular event took place: a birth, an X-ray, a
purchase, a contract approval, the sending and
receipt of an email. Records management is primarily concerned with the evidence of an organization's activities, and is usually applied according to the value of the records rather than their physical format.

Essential records management capabilities include assigning unique identifiers to individual records, providing safeguards against unauthorized changes being made to those records, and creating an unbreakable audit trail for reasons of accountability and e-discovery. (A brief sketch of these three ideas follows the list below.)

Unique identifiers are usually generated within a database for
systems administration and tracking purposes, and should not
be confused with reference codes, which may be composed of
more than one part.

Unauthorized changes are prevented by implementing airtight manual procedures or using software applications (such as encryption or digital signatures) to keep a document from being modified after it has been declared as a record.

Audit trails guarantee an enforceable chain of custody by mak-
ing it possible to know what a record said at a particular point in
time, how its content evolved to that point, and who was in-
volved with it. This is key to preserving the link between the re-
cord and the process or event it describes, and for being able to
demonstrate exactly who made what changes and when.
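
As a minimal sketch of those three capabilities -- a unique identifier, a safeguard against silent changes, and an audit trail -- consider the following, which is purely illustrative and far simpler than a real records management system:

    import hashlib
    import uuid
    from datetime import datetime, timezone

    audit_trail = []   # append-only chain of custody

    def declare_record(content, declared_by):
        """Declare a record: assign a unique identifier and fingerprint the content."""
        record = {
            "id": str(uuid.uuid4()),                                # unique identifier
            "content": content,
            "digest": hashlib.sha256(content.encode()).hexdigest(), # change detector
        }
        audit_trail.append({"record_id": record["id"], "action": "declared",
                            "who": declared_by,
                            "when": datetime.now(timezone.utc).isoformat()})
        return record

    def verify_record(record):
        """Detect any change made after the record was declared."""
        return hashlib.sha256(record["content"].encode()).hexdigest() == record["digest"]

    rec = declare_record("Invoice 4711 approved", declared_by="alice")
    print(verify_record(rec))     # True  -- content matches its declared fingerprint
    rec["content"] += " (edited)"
    print(verify_record(rec))     # False -- the unauthorized change is detectable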

It is important to note here that, as important as these capabilities are, and as critical as it is to find a records management solution that supports them as well as the tasks illustrated on this slide, it is even more vital that you take a long-term view of the process, since some records -- most notably in healthcare and government -- need to be managed literally for decades, and digital technology tends to change frequently and degrade quickly -- certainly faster than paper does. So it is imperative that you periodically refresh and migrate your electronic records in order to ensure their long-term accessibility.

In the past, the term "records management" was sometimes used
to refer only to the records that were no longer in everyday use but
still needed to be kept -- "semi-current" or "inactive" records, often
stored in basements or off site. Today, though, it tends to refer to
managing the 'lifecycle' of electronic records, from the point of crea-
tion through their eventual disposal, as illustrated here.

It's important to note that a record can be either a tangible object or digital information, but for all practical purposes, regulatory compliance necessitates that business records be managed electronically. Hence the term electronic document and records management (EDRM).

In order to be viable, an electronic record must be capable of being digitally created or captured, and then copied, distributed, used, maintained, stored, and ultimately disposed of, with ease. It must also be usable with and by other EDRM systems in order to facilitate commerce and any other activities that require records to be exchanged with other organizations. To facilitate this, standards of information description and records management naturally arose, perhaps the most immediately notable of which are described on the following slides.

ISO 15489 provides an international standard for defining and exe-
cuting records management, putting a number of critical practices
under this umbrella:



Setting policies and standards

Assigning responsibilities and authorities

Establishing and promulgating procedures and guidelines that
ensure the key characteristics of a record: authenticity,
reliability, integrity, and usability

Providing a range of services relating to the management and
use of records
Designing, implementing, and administering specialized sys-
tems for managing records and
Integrating records management into business systems and proc-
esses

ISO 23081 complements 15489 by specifically covering the meta-
data principles and practices relevant to electronic information han-
dling and record-keeping. A two-part technical specification, it de-
fines metadata needed to manage records. Part 1 addresses princi-
ples; Part 2 addresses conceptual and implementation issues.

And then there's the Dublin Core, which many of the metadata-minded know of even if they can't cite the ISO numbers from memory. Named after the core set of metadata elements developed by metadata and web specialists in Dublin, Ohio back in 1995, it has since been adopted by the ISO as Standard 15836.

This organization's work has been ongoing ever since through the auspices of the Dublin Core Metadata Initiative, or "DCMI," an open organization engaged in the development of interoperable metadata standards that support a broad range of purposes and business models. Today it is dedicated to providing simple standards to facilitate the finding, sharing, and management of information.

CMIS (Content Management Interoperability Services) is aimed
specifically at improving interoperability between content manage-
ment solutions -- and thus, by extension, it embraces records man-
agement solutions as well. Initiated by AIIM, CMIS is now being
administered by the OASIS standards body.

MoReq 2010 is an important specification for validating records management applications' ability to manage records throughout their lifecycle. Short for "Modular Requirements for Records Systems," it was commissioned by the European Union and includes a metadata model that is predominately a superset of the Dublin Core model.

And if we're talking MoReq 2010, then we have to mention DoD 5015.2, another metadata validation specification, published in the U.S. Department of Defense's Design Criteria Standard for Electronic Records Management software applications. Originally for use in defense, it has many metadata elements associated with the security classification and the access controls applicable to military information. But it is now applied more widely and serves as a quality benchmark even where formal compliance isn't required.

Both MoReq 2010 and DoD 5015.2 are specifications that allow ven-
dors to get tested and certify that their solutions meet particular re-
cord management standard requirements.

The Records Management Lifecycle

All records have a lifecycle that subjects them to a number of differ-
ent activities from the time they are declared until the time they are

disposed of. This time may be as short as a few hours, as is the
case with some transient records, or as long as forever, as is the
case with records of enduring historical value.

Knowing how long something must be kept is fundamental to de-
veloping a records retention schedule, a control document like the
one shown here that exists for all records irrespective of the
format in which they are maintained or the media upon which they
are stored.

Records can be, and generally are, used to demonstrate compliance with applicable laws and regulations to law enforcement agencies, government regulators, and other parties seeking confirmation of compliance (e.g., prospective investors or other business partners, or parties seeking to bring legal action against the company). Such compliance is the major rationale for records retention, since failure to establish a formal program may lead to inappropriate destruction or retention of records that may cause significant legal and business issues later on.

Cost savings and litigation prevention also are at work here, since the proper management of records can trim storage requirements and save time when the need arises to review information included in documents created in the distant past. Time can also be saved by disposing of records that no longer need to be retained, since such documents would no longer be relevant for discovery purposes in a litigation context. Further, valuable business information about customers and other business partners can be more quickly and easily shared if there are fewer places for employees to hunt for it.
Note that multiple copies of records may exist and these need to be
addressed as well, generally by identifying the original or designat-
ing one version as the copy of record and retaining other copies for
shorter periods.

For the purpose of this discussion of the records lifecycle, let's use as a foundation an itemization called "7 Elements of an Effective Records Management Program" from THE Ohio State University. These elements are controlled and driven by policies and procedures, which differ from industry to industry and company to company. Ultimately, they should be rolled together into a written, adopted, implemented, and regularly updated manual of policies and procedures, the existence of which encourages and promotes consistency in how records are handled.






Let's now take them one by one.

Any good records management program begins with a records inventory: a complete and accurate listing of records, whether paper-, microform-, or electronically-based, that indicates: 1) how and where the records are stored; 2) the volume of storage; 3) how the records are classified for future use and retrieval; 4) the sensitivity of the information and access to it; and 5) what their retention period is, if known, OR their legal, fiscal, and/or administrative value if not.

Once these factors are determined, and the record is classified
and saved as such, it is said to be "declared."

Next, a records retention schedule is developed. Besides noting at
a minimum how long records must be retained and what their ulti-
mate disposition is to be, a retention schedule may indicate:

a legal or regulatory citation that mandates a specific retention
period;
how long the records should be maintained in active on-site files;

how long they may need to be retained in inactive off-site stor-
age; and
whether they are vital records.

Each record's lifecycle is determined by analyzing three primary needs: (1) legal, (2) fiscal, and (3) administrative. There are three secondary needs: (1) evidential, (2) historical, and (3) informational.

The next step is to determine whether your existing filing and storage strategy is adequate for the task you are evaluating, or whether you need to develop a new one. Questions to ask here include:

Where and how do you store your active records?

Where and how do you store your inactive records?

Do you have a "records hold" procedure in event of litigation?
What are the access procedures for sensitive records?

What are your procedures for transferring records of enduring
historical value to the archives?
How are you storing your electronic records?

What are the environmental conditions of your storage facilities?

At some point in a record's life, it may be converted to a digital image, to microfilm, or both to enhance access, reduce physical storage, or to provide disaster recovery and preservation tools. So you may need to ask these questions in a couple of different contexts -- including what records management applications may already exist and how they are approaching these questions.

Determining which are your vital records is a critical part of this
process. Vital records are those essential organizational records
needed to meet operational responsibilities under emergency or
disaster conditions. A good way to begin figuring out which those
are is to ask, "Which records would need to be recreated from
backup copies if the originals are lost or rendered inaccessible in
a disaster?"

In a private corporation, these typically are shorter-term records that have legal and fiscal implications and amount to 1% to 7% of an organization's records. In government agencies, these often are long-term records like birth certificates, titles and deeds, and other personal and realty data that must be preserved for historical purposes. Vital records should be identified as an integral part of a disaster recovery plan for business continuity.





Disaster prevention safeguards are included in records management procedures and applications that can protect records by: 1) adding a unique identifier as metadata; 2) safeguarding against unauthorized editing or deletion; and 3) providing an audit trail of any authorized changes to records, metadata, or system settings.

Properly executed disaster prevention policies and procedures can
forestall or eliminate altogether the events that necessitate disaster
recovery.

Speaking of disaster recovery plans, these "must-haves" are written, approved, and implemented procedures for the prevention, mitigation, and recovery from records loss in an emergency or disaster. They should include at least the following components:

Chain of command with contact information

Decision tree for appropriate actions

Listing of emergency management officials with contact informa-
tion
Listing of records reclamation vendors

Listing of vendors (supplies, computer equipment, records stor-
age, etc.)
List of supplies needed to help mitigate loss

Identification of an alternative operational site that is either: 1) a
"hot site" containing all the computing equipment and software
necessary to put yourself "back in business" or 2) a cold site to
which you have to bring your back-ups and all your computing
equipment, software, furniture, fixtures, etc.
Backup policy and procedures for electronic files, with backups preferably stored offsite, at least 5 miles from the operating site

System restoration procedures

Remember that the biggest threat to records is water, whether from a flood, a leaky pipe, or even the water used to put out a fire. Remember, too, to review and test your plan on a regular basis to make sure it still is appropriate to your needs. Another major threat to records is the fact that hardware/software incompatibilities and media instabilities, among other things, can complicate the maintaining and restoring of electronic records over extended periods of time.

The final stage of records management is disposition, which is when a record is either destroyed or permanently retained. These actions typically fall into one of two categories: 1) destruction via disposal in trash or recycling, shredding, macerating, incinerating, pulping, and deleting or other electronic obliteration, or 2) transfer to an archives for permanent preservation.

Regularly disposing of records lightens the load of what you have
to retain and manage, and often makes legal counsel and compli-
ance executives happy as well since there are fewer places for po-
tential "smoking guns" to be found.

Regulatory and Jurisdictional Factors

A phrase that is much bandied about, regulatory compliance refers
to the effort to ensure organizational personnel are aware of and
take steps to abide by laws and regulations relevant to their use of
information and information technology.


While this sounds straightforward enough, there usually are multi-
ple regulatory authorities to deal with, different geographic juris-
dictions to navigate (including international ones), and industry
oversight bodies as well. These all put their own spin on how and
how long records are to be created, managed, and destroyed --
and especially in cases where repositories include information
about citizens of more than one country, the impact can be
dramatic, not only on records policies, but on the supporting
infrastructure as well, which may need to incorporate separate
databases in each country in order to be compliant.

In the financial sector, regulations subject financial institutions to
certain requirements, restrictions, and guidelines in order to main-
tain the integrity of the financial system. These may be handled by
either governmental or non-governmental organizations including
in the U.S.:

Securities and Exchange Commission (SEC)

Financial Industry Regulatory Authority (FINRA)

Federal Reserve System (the "Fed")

Office of the Comptroller of the Currency (OCC)

National Credit Union Administration

In most cases, financial regulatory authorities regulate all financial
activities. But in some cases, there are specific authorities to regu-
late each sector of the finance industry, mainly banking, securities,
insurance, and pensions markets, and in some cases also commodities, futures, forwards, etc. For example, in Australia, the Australian Prudential Regulation Authority (APRA) supervises banks and insurers, while the Australian Securities and Investments Commission (ASIC) is responsible for enforcing financial services and corporations laws.

Adding to the fluid nature of this environment is the fact that the
structure of financial regulation has changed significantly in the past
two decades, leading to the blurring and globalization of the legal
and geographic boundaries between markets in banking, secu-
rities, and insurance.

The most significant recent piece of U.S. financial regulation may
well be the Sarbanes-Oxley Act of 2002, which holds corporate top
management significantly more personally responsible for the accu-
racy of reported financial statements than ever before.

Named for its Congressional sponsors -- Senator Paul Sarbanes and Representative Michael Oxley -- the bill was enacted in the wake of a number of major corporate and accounting scandals, including those involving Enron and Tyco. These scandals, which cost investors billions of dollars when the share prices of affected companies collapsed, shook public confidence in the nation's securities markets. The act, by the way, does not apply to privately-held companies.

SarbOx contains 11 titles, or sections, and requires the SEC to implement rulings on requirements to comply with the new law. It also created a new, quasi-public agency, the Public Company Accounting Oversight Board, or PCAOB, and charged it with overseeing, regulating, inspecting, and disciplining accounting firms in their roles as auditors of public companies.


The Act also covers issues such as auditor independence, corporate
governance, internal control assessment, and enhanced financial
disclosure. Depending upon the number of victims involved, a CEO
can receive up to 20 years in jail if convicted of a violation, and things
can get worse if destruction of records is involved.

And THAT, of course, is why it is most important to us here, today.

In healthcare, the honor of Most Influential in the U.S. likely goes to the Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule, which establishes national standards to protect individuals' medical records and other personal health information, and applies to health plans, health care clearinghouses, and those health care providers that conduct certain health care transactions electronically.

Specifically, it requires appropriate safeguards be maintained to
protect the privacy of personal health information, and sets limits
and conditions on the uses and disclosures that may be made of
such information without patient authorization. The Rule also
gives patients rights over their health information, including the
right to obtain and examine a copy of their health records, and to
request corrections be made. The HIPAA Enforcement Rule con-
tains provisions relating to compliance and investigations, the im-
position of civil money penalties for violations of the HIPAA Ad-
ministrative Simplification Rules, and procedures for hearings.

Pharmaceutical companies worldwide also are subject to a variety
of laws and regulations regarding the patenting, testing, safety, effi-
cacy, and marketing of drugs, controlled on a country-by-country
basis. In the United States, new pharmaceutical products must be
approved by the Food and Drug Administration (FDA) as being
both safe and effective. Post-approval surveillance ensures that
during its marketing following approval, the safety of a drug is monitored closely, due to the fact that even the largest clinical trials cannot effectively predict the prevalence of rare side-effects.

In the realm of national security, the USA PATRIOT Act serves as a
prime example of a key regulation to comply with. Signed into law in
2001 in response to the terrorist attacks of September 11th, it was
extended in parts in 2011 and actually stands for something: "The
Uniting (and) Strengthening America (by) Providing Appropriate
Tools Required (to) Intercept (and) Obstruct Terrorism Act".

Among its many provisions are several that dramatically reduce restrictions on law enforcement agencies' ability to do a number of things that relate to information management, perhaps most notably to (1) search telephone, e-mail, medical, financial, and other records, and (2) regulate financial transactions, particularly those involving foreign individuals and entities.

Among the provisions extended was one permitting searches of business records (the "library records provision"), which will continue to be a part of the information manager's duties for at least another four years.

Non-governmentally, broad industry regulations exist to safeguard privacy. To date, the U.S. has no single data protection law comparable to the E.U.'s Data Protection Directive; protections tend instead to be adopted on an ad hoc basis, with legislation arising when certain sectors and circumstances require it (e.g., the Video Privacy Protection Act of 1988, the Cable Television Protection and Competition Act of 1992, the 2010 Massachusetts Data Privacy Regulations, etc.).


Generally speaking, the increased ability to gather and send information has had negative implications for regulating the dissemination of personal information. Many organizations are attempting to calm customers' and users' fears by updating and publicizing new and newer policies, but breaches are reported with disturbing regularity, and to this observer, at least, the risk of ever more stringent regulation is on the rise, not only in the States, but especially in Europe, where such sensitivities seem to be more acute.

As alluded to earlier, the specifics of an organization's approach to records management must reflect the particular requirements of the industries and geographies in which it is active.

For instance, jurisdictions and retention requirements vary from country to country, depending upon their individual regulatory requirements, the existence of reciprocal agreements with other countries, and the jurisdictional authority each country accepts. In the U.S., the domestic laws governing records retention are largely federal and not state, so jurisdiction is determined mainly by the regulatory authority of the federal agency involved rather than a department of the city or state in which the records are stored. Often, however, more than one institution regulates and supervises a given industry, and that's where the challenges really begin.

Key Links:

Background information on the CIP.
Practice Test -- Do a Self-Assessment.
Free videos to prepare for the test.
White paper on the CIP.
Register for the test.
Contact for more information: jwilkins [at] aiim.org




SECTION 3

Data Privacy


In This Section...

1. Data Privacy Fundamentals

2. Administration Rights and
Personal Information
Management
























Data Privacy Fundamentals

Data privacy centers on the ability to share data
while protecting personally identifiable informa-
tion, and concerns about it exist wherever that
kind of information is collected and stored -- in
digital form or otherwise. Improper or non-
existent disclosure control can be the root cause






































for problems here, which can arise in the
context of information of all different kinds:

Healthcare records

Criminal justice proceedings

Financial transactions



Biological traits

Residence records

Privacy breaches

Personal Internet data

By far, the areas most focused upon -- and thus subject to the most regulatory attention -- are financial data privacy, healthcare privacy, and personal data privacy, especially in this online age.

Information about a person's financial transactions, including the amount of assets, positions held in stocks or funds, outstanding debts, and purchases, can be quite sensitive. And of course, if criminals gain access to the likes of credit card and bank account numbers, fraud and identity theft are very real possibilities.

Less directly invasive but no less scary is the fact that information about a person's purchases can reveal a great deal about his or her history, such as places visited, people contacted, products used, activities engaged in, or medications used. Many corporations no doubt ache to use this information to target individuals with customized marketing campaigns -- and some (like Google, with its ability to tailor ads based on an analysis of a Gmail message's contents) are known to be proceeding down this path already.

The U.S. Right to Financial Privacy Act (RFPA) is a federal law that
gives the customers of financial institutions the right to some level of
privacy from government searches. Before the Act was passed, the
government did not have to tell customers that it was accessing their
records, and customers did not have the right to prevent it. It was
passed after the Supreme Court held that financial records are
the property of the financial institution with which they are held,
rather than the property of the customer.

The Health Insurance Portability and Accountability Act of 1996,
known as HIPAA, among other provisions regulates the use and
disclosure of certain information held by "covered entities" (gener-
ally, health care clearinghouses, employer sponsored health
plans, health insurers, and medical service providers that engage
in cer-tain transactions).

It establishes regulations for the use and disclosure of Protected
Health Information, which is any information held by a covered
entity which concerns health status, provision of health care, or
payment for health care that can be linked to an individual. This is
interpreted rather broadly and includes any part of an individ-ual's
medical record or payment history.

HIPAA further requires covered entities to take reasonable steps to ensure the confidentiality of communications with individuals, to keep track of disclosures, and to notify individuals of uses of their protected information.

Covered entities must also document privacy policies and proce-
dures, and must appoint a Privacy Official and a contact person
responsible for receiving complaints, and train all members of
their workforce in procedures regarding protected information.

So as you see, the arm of this law is fairly long. BUT the agency chartered with handling the complaints -- the Department of Health and Human Services' Office for Civil Rights -- has a long backlog and, says the Wall Street Journal, ignores most complaints.


The right to personal privacy, though guaranteed by the Constitution as interpreted by the U.S. Supreme Court, is an implicit right in the United States. (This is not the case in Europe, where it is an explicit right.) This means that the country takes what it calls a "sectoral" approach to data protection legislation, relying on a combination of legislation, regulation, and self-regulation, rather than governmental regulation alone.

While the European Union has the Data Protection Directive on its
books, the U.S. has no single data protection law that is compara-
ble. Privacy legislation in the States tends to be adopted on an ad
hoc basis, arising only when certain sectors and circumstances re-
quire (e.g., the Video Privacy Protection Act of 1988, the Fair
Credit Reporting Act, and the 2010 Massachusetts Data Privacy
Regula-tions).

Though not legislative, one set of U.S. guidelines serving to govern fair information practices in an electronic marketplace is the Federal Trade Commission's collection of Fair Information Practice Principles, the core of which address these elements:

Notice/Awareness, under which consumers should be given no-
tice of an entity's information practices before any personal infor-
mation is collected

Choice/Consent, or giving consumers options to control
whether and how their data is used. The two most typical types
of choice models are opt-in and opt-out.

Access/Participation, addressing not only a consumer's ability to
view the data collected, but also to verify and contest its accu-
racy in an inexpensive and timely manner
Integrity/Security to ensure that the data collected is accurate
and protected
Enforcement/Redress, in the form of (1) self-regulation by the
information collectors or an appointed regulatory body, (2) pri-
vate remedies that give civil causes of action for individuals
whose information has been misused to sue violators, and (3)
government enforcement, which can include civil and criminal
penalties.

The existence of privacy laws and guidelines means organizations must safeguard the personal data with which they are entrusted. This is accomplished largely through the use of high-technology security protocols -- a term that generally refers to a suite of components that work in tandem and, when used with a communications protocol, provide secure delivery of data between two parties.

For example, the 802.11i standard provides these functions for wireless LANs. For the Web, SSL is widely used to provide authentication and encryption in order to send sensitive data such as credit card numbers to a vendor.

The primary components of a security protocol are as follows:

Access control authenticates user identity and grants entry to
specific resources based on permissions and policies
Encryption algorithms are formulas that turn ordinary data, or
"plaintext," into a secret code known as "ciphertext." Each algo-
rithm uses a string of bits known as a "key" to perform the calcu-
lations. The larger the key (the more bits), the greater the num-
ber of potential patterns that can be created, and the harder it is
to break the code and descramble the contents.


Key management involves the creation, distribution, and mainte-
nance of a secret cryptographic key or keys, and makes them
available to all parties to the information exchange.

Message integrity ensures that a transmitted message is valid and that an encrypted message has not been tampered with. The most common approach is to use a one-way hash function that combines all the bytes in the message with a secret key and produces a message digest that is computationally infeasible to reverse.

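That "all the bytes of the message combined with a secret key" construction is what an HMAC provides. Here is a minimal sketch using only Python's standard library, for illustration and not tied to any particular protocol:

    import hashlib
    import hmac

    SECRET_KEY = b"shared-secret-from-key-management"

    def sign(message):
        """Produce a digest that mixes every byte of the message with the key."""
        return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()

    def verify(message, digest):
        """Recompute the digest and compare in constant time to detect tampering."""
        return hmac.compare_digest(sign(message), digest)

    msg = b"Transfer $100 to account 12345"
    tag = sign(msg)
    print(verify(msg, tag))                                 # True  -- intact
    print(verify(b"Transfer $900 to account 12345", tag))   # False -- altered in transit
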
Administration Rights and Personal Information Management

Administrative rights are those that permit access to and control over system data in four major areas:
Authentication: ensuring that all access to data is genuine and
verified and trusted
Authorization: granting or denying permission for different
types of access activity
Confidentiality: making sure that sensitive information is not dis-
closed to unauthorized individuals, entities or processes, and
Accountability: enabling activities on the data to be traced to in-
dividuals or entities that may be held responsible for their ac-
tions -- the key to enabling auditability

Without these rights, a user cannot perform many such system modifications, including installing software or changing network settings. More specific to data privacy, only the local or domain administrator can access all users' files on a computer.
In order to gain administrative rights, a user first has to be de-
clared an administrator, or someone who has a local account or is
part of a local security group that has complete and unrestricted
access to create, delete, and modify files, folders, and settings on
a particular computer or multiple PCs. This stands in contrast to
other types of user accounts that have been granted only specific
permissions and levels of access.

Now, people love to have control over their own computers, mostly because they think of them as THEIR computers, not the company's. But it is not always desirable for them to have administrative privileges, because optimal security is attained by giving each user in the system the least number of privileges necessary to do their jobs -- that way they can't do anything (intentionally or otherwise) to compromise the security and integrity of the system. This is called the principle of least privilege, and it ultimately reduces the "attack surface" by eliminating unnecessary privileges that could be exploited to probe network vulnerabilities and otherwise compromise the security of the system, and the privacy of the information stored therein.

This isn't to say that people don't have information that is used mostly by them alone, and the practice of protecting and managing that is known as personal information management (PIM). It centers on the activities people perform in order to acquire, organize, maintain, retrieve, and use information like documents (paper-based and digital), Web pages, and email messages for everyday use, to complete tasks (work-related or not), and to fulfill a person's various roles (as employee, customer, parent, friend, member of a community, etc.).




One ideal of PIM is to always have the right information in the right place, in the right form, and of sufficient completeness and quality to meet the current need. Technologies and tools known as personal information managers help create more time to make creative, intelligent use of the information at hand in order to get things done. But because they exist in so many different forms and formats -- Outlook and Google calendars, smartphones and tablets, paper desk calendars, etc. -- they also can lead to fragmentation and a diffusion of energies spent, and can present more opportunities for privacy breaches and out-and-out data loss if care isn't taken.

While not a breach, another opportunity for information exposure arises in the form of Open Records or Freedom of Information legislation. These refer to laws that guarantee access to data held by the state and establish a "right-to-know" legal process by which requests may be made for government-held information, to be received freely or at minimal cost, barring standard exceptions.

Also variously referred to as "sunshine laws" (especially in the United States), these regulations work with increasingly common government commitments to publish and promote openness. In many countries, there are constitutional guarantees for the right of access to information, but these usually go unused unless specific supporting legislation exists. Many states expand government transparency through open meeting laws, which require government meetings to be announced in advance and held publicly.

One way to mitigate these possibilities is to ensure your people are trained in how best to use and protect their personal information. Part of this must also include awareness of the latest developments in malware, spyware, key logging, phishing, and other illegal invasions of privacy, as well as ongoing innovations in protective technology in these areas. As such, this training must be constant and evergreen -- especially for administrators! -- and perhaps baked into the "new hire" process so new employees are immediately indoctrinated into how things are to be done.

The fact that you are providing training will keep you in good standing with any auditors -- internal or external -- who may come calling, because it shows you have policies concerning, and are actively supporting, the protection of the data in your care. As noted at the start of this module, accountability is a big part of this, and the existence of training should make clear that you are taking it seriously.

An information security audit, of course, is a formal check on the level of that protection. Most commonly, the controls being audited can be categorized as technical, physical, and administrative, as they dig into the hardware and software, non-virtual, and permissions-based strategies in use. As depicted on this slide, good auditing procedures should include a whole cycle of activities, beginning with identifying the issue (in this case, data privacy), defining the benchmarks of acceptable performance, monitoring what is actually occurring and comparing the result with the goal, and then making any improvements deemed necessary.



Key Links:

Background information on the CIP.
Practice Test -- Do a Self-Assessment.
Free videos to prepare for the test.
White paper on the CIP.
Register for the test.
Contact for more information: jwilkins [at] aiim.org

SECTION 4

Digital Rights Management


In This Section...

1. Principles and Compliance
Support
2. Protection and Security
Tools
























Principles and Compliance Support

Digital Rights Management, or DRM, refers to the access control technologies used by hardware manufacturers, publishers, copyright holders, and individuals to limit the use of digital content (and devices) in ways that are neither desired nor intended by the content provider. Such content includes Internet-distributed music, movies, e-books, video games, and software applications.

DRM is rooted in two principles:

Protecting intellectual property -- namely copyright, which allows artists to optimize royalties, as required by the Digital Millennium Copyright Act, and

Doing so without infringing upon the "fair use" permissions of a copyrighted object.

These principles drive the following objectives: 1) They fight copyright infringement online; 2) They keep users safe from computer viruses; and 3) They help the copyright holder maintain artistic control over his or her work and ensure continued revenue streams.

The just-mentioned Digital Millennium Copyright Act (DMCA) is one of DRM's major legislative underpinnings. An amendment to United States copyright law passed unanimously on May 14, 1998, it criminalizes the production and dissemination of technology that enables users to circumvent technical copy-restriction methods.

Under the Act, doing so is illegal if done with the primary intent of
violating the rights of copyright holders. As such, the Act gives
prosecutors the big stick they need to heavily motivate potential
violators to obey copyright compliance laws.

Circumventing copyrights and hijacking revenue streams are two of the primary outcomes -- if not the original objectives -- of digital piracy, and DRM seeks to prevent access, copying, or conversion to other formats by end users by controlling the use of digital media. It is also the means by which compliance with laws such as the DMCA and the U.S. Copyright Act can be facilitated.
In a business context, key benefits of DRM include the ability to control IP beyond the organization's boundaries and to limit or prohibit redistribution of content beyond its intended use.

DRM's success in that regard has led to the creation of new business models along the way. For example, buying DRM-protected music from iTunes locks you into using iTunes and iPods to play those files, since they won't play on Windows Media Player or a SanDisk MP3 player. So besides enforcing copyright protection, the use of DRM has helped to enforce the iTunes pricing model as well.

This hasn't been a constant, however, as in 2009, Apple chief Steve Jobs made iTunes music available DRM-free. Some chose to interpret it as Jobs' admission of the ineffectiveness of DRM, while others praised him for smartly appeasing existing consumers and attracting new ones by relaxing his stance. Either way, a new pricing model was the giveback to the music industry for allowing the change.

Protection and Security Tools

DRM systems can exhibit and combine a number of protection capabilities to assure the security of media content and their delivery systems. These include authentication, digital watermarks, digital fingerprints, digital certificates, digital signatures, conditional access systems, and product activation codes, a variation on which is shown on this slide.

Of all the choices available, perhaps the most robust is encryption,
which establishes and maintains security associations between two
network elements, and ensures that the traffic passing through the
interface between them is cryptographically secure.


There are two major types of algorithms for implementing encryption: symmetric encryption and asymmetric encryption.

In symmetric encryption, data is scrambled using one key on the transmitter and is processed at the receiver using the same key. Thus the two keys must be the same for the encrypted data to be readable.

Asymmetric encryption, on the other hand, uses two different keys at the transmitter and the receiver to encrypt the data. One is called the public key, and the other the private key. The public key must be known to everyone who wants to communicate, but the private key must not be known to anyone but its owner for the encryption to be successful.
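To illustrate the difference, here is a minimal sketch in Python using the widely used third-party cryptography package; the package choice, the 2048-bit key size, and the sample message are assumptions made for the example, not requirements drawn from this course.

from cryptography.fernet import Fernet
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding, rsa

# Symmetric: one shared key both encrypts and decrypts.
shared_key = Fernet.generate_key()
cipher = Fernet(shared_key)
token = cipher.encrypt(b"quarterly results")
assert cipher.decrypt(token) == b"quarterly results"

# Asymmetric: the public key encrypts; only the matching private key decrypts.
private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
public_key = private_key.public_key()
oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                    algorithm=hashes.SHA256(), label=None)
ciphertext = public_key.encrypt(b"quarterly results", oaep)
assert private_key.decrypt(ciphertext, oaep) == b"quarterly results"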

Public Key Infrastructure (PKI) is a good example of the way DRM uses asymmetric encryption. A set of hardware, software, people, policies, and procedures needed to create, manage, distribute, use, store, and revoke digital certificates, a PKI binds public keys with respective user identities by means of a certificate authority (CA). The user identity must be unique within each CA domain.

This binding is established through a registration and issuance process that, depending on the level of assurance the binding has, may be carried out by software at a CA, or under human supervision. The PKI role that assures this binding is called the Registration Authority (RA), which ensures that the public key is bound to the individual to which it is assigned in a way that ensures non-repudiation.

Like most conversations about information management tools, we've spent our time so far talking about how to apply technology to the problem of digital rights management. But another effective approach is contractual, as in the case of developing restrictive licensing agreements.

Here, access to digital materials, copyright, and the public domain are controlled contractually. Some restrictive licenses are imposed on consumers as a condition of entering a Web site or when downloading software, and most are aimed at controlling access to and the reproduction of online information, including the making of backup copies for personal use.

You won't be surprised to know that technology has a part to play here too, as the scrambling of expressive material and the embedding of tags can also be used to reinforce the contractual agreement.

Key Links:

Background information on the CIP.
Practice Test -- Do a Self-Assessment.
Free videos to prepare for the test.
White paper on the CIP.
Register for the test.
Contact for more information: jwilkins [at] aiim.org


SECTION 5

Archiving


In This Section...

1. Archiving and Storage
Concepts
2. Alternative Approaches
3. Standards and Their Effect
on Archiving
4. Long Term Access




















Archiving and Storage Concepts

Archiving is the practice of long-term preservation of records with enduring historical value, and the archive itself is either a collection of historical records or the physical place where they are located. According to Wikipedia, archivists prefer the term "archives" (with an S) as the correct terminology to serve as both the singular and plural, since "archive," as a noun or a verb, has acquired meanings related to computer science.

An information archive, on the other hand, is defined in a computing environment by PCmag.com as a file that contains one or more compressed files. Although archived files may remain on the same computer as the originals, the word "archive" implies data retention policies are at work, and archived data typically is stored somewhere else for backup and historical purposes. In fact, today almost all archiving is done to removable media because this also allows the data to be sent to offsite storage far away from the original location, safeguarding it should there be an issue at the physical location.

Deciding what should be archived and what can be discarded should be based on operational need and the retention period assigned to the content in question -- and those decisions should drive which formats are used for the purpose. Most archive formats are capable of storing folder structures in order to reconstruct the file/folder relationship when decompressed. Open standards governing file format and media types -- like PDF/A (ISO 19005), for example -- are better for long-term access than proprietary ones, since they are more likely to be supported by many organizations over time, rather than just the one that owns the technology.
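As a small illustration of packaging content so the folder structure survives, here is a minimal sketch in Python using the standard library's shutil module and the open ZIP container; the directory names are illustrative placeholders.

import shutil

# Pack the folder tree under ./project_records into project_records.zip,
# preserving the file/folder relationships inside the archive.
archive_path = shutil.make_archive("project_records", "zip", root_dir="project_records")

# Unpacking later reconstructs the original structure in a new location.
shutil.unpack_archive(archive_path, extract_dir="restored_records")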

Storage differs from archives in the same way a dresser drawer differs from a safety-deposit box: put-away-yet-readily-retrievable vs. socked-away-for-protection.

Technologically, computer storage is the place where data is held in an electromagnetic or optical form for access by a computer processor (CPU). It comes in three basic forms: 1) online, including hard drives and mounted removable media; 2) nearline, including unmounted removable media; and 3) offline, including media stored somewhere "else," like in a data center or off site.

RAID is a storage technology that provides increased reliability
and functions through redundancy. It is short for Redundant Array
of Independent Disks, and it achieves its objectives by combining
multiple disk drive components into a logical unit, in which data is
distributed across the drives in one of several ways called "RAID
levels." The different schemes or architectures are named by the
word RAID followed by a number (e.g., RAID 0, RAID 1), and each
one provides a different balance between two key goals: increase
data reliability and increase input/output performance.

RAID is now used as an umbrella term for computer data storage schemes that can divide and replicate data among multiple physical drives. The physical drives are said to be in a RAID, which is accessed by the operating system as one single drive, a concept that is an example of storage virtualization.
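As a back-of-the-envelope illustration of how the common levels trade capacity for redundancy, here is a small Python sketch; the four-drive, 2 TB configuration is an arbitrary example, and the formulas cover only the simple single-array cases.

def usable_capacity_tb(level: int, drives: int, size_tb: float) -> float:
    """Approximate usable capacity for a few common single-array RAID levels."""
    if level == 0:       # striping only: all capacity, no redundancy
        return drives * size_tb
    if level == 1:       # mirroring: the array holds one drive's worth of data
        return size_tb
    if level == 5:       # striping plus one drive's worth of parity
        return (drives - 1) * size_tb
    if level == 6:       # striping plus two drives' worth of parity
        return (drives - 2) * size_tb
    raise ValueError("level not covered by this sketch")

for level in (0, 1, 5, 6):
    print(f"RAID {level}: {usable_capacity_tb(level, drives=4, size_tb=2.0)} TB usable")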

Multi-tier storage strategies involve the use of virtual or physical storage devices with different input/output performance specs, data availability, and relative cost characteristics to provide differentiated online storage for computer systems. It is common data management practice to create multiple file systems on storage devices of different types and move files between them to meet business needs.

For example, an active database application might keep current transactions in a file system on top-tier storage, and move 30-day-old transactions to another file system on a second-tier device. As production storage devices grow dangerously close to full, large, inactive files might be relocated to larger, slower, less expensive devices.

A Hierarchical Storage Management (HSM) system involves using software that scans a file system periodically and migrates files that meet certain criteria (usually based on data activity and inactivity) to alternate storage devices (such as RAID systems and tape). HSM leaves stubs in the file system to indicate the locations of migrated files so that they can be restored to file system storage automatically when applications or users access them.
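The following Python sketch shows that HSM idea in miniature: scan a primary store, move files that have been inactive for a set period to a cheaper tier, and leave a stub pointing to the new location. The directory names and the 90-day threshold are illustrative assumptions, and a real HSM product would handle recall, metadata, and error cases far more carefully.

import os
import shutil
import time

PRIMARY = "primary_storage"
ARCHIVE_TIER = "archive_tier"
INACTIVITY_SECONDS = 90 * 24 * 3600  # roughly 90 days without access

def migrate_inactive_files() -> None:
    now = time.time()
    os.makedirs(ARCHIVE_TIER, exist_ok=True)
    for name in os.listdir(PRIMARY):
        path = os.path.join(PRIMARY, name)
        if not os.path.isfile(path) or path.endswith(".stub"):
            continue
        if now - os.path.getatime(path) > INACTIVITY_SECONDS:
            target = os.path.join(ARCHIVE_TIER, name)
            shutil.move(path, target)
            # The stub lets an application (or a recall routine) find
            # and restore the migrated file on demand.
            with open(path + ".stub", "w") as stub:
                stub.write(target)

if __name__ == "__main__":
    migrate_inactive_files()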

Alternative Approaches

Centralized storage, also referred to as storage consolidation or storage convergence, is a method of centralizing data storage among multiple servers. The objective is to facilitate data backup and archiving for all subscribers in an enterprise, while minimizing the time required to access and store data. Other desirable features include simplification of the storage infrastructure, centralized and efficient management, optimized resource utilization, and low operating cost.

If users' requests for information only require data from within one functional area to answer -- say, finance people only ask for information in the finance domain, human resources people only ask for HR information, etc. -- then managing it in place, in a silo (or book of record), may be just fine.

But if users consistently want information that can only be produced using data from multiple areas, then you essentially have to build a centralized storage facility. In other words, if an organization's storage needs are mainly driven by a business need to consolidate and coordinate the information that will be used by the enterprise as a whole, then centralize. If not, and you are sure there are no long-term implications, then silos are fine.

Centralized storage comes in three different varieties: Network-attached storage (NAS), the Redundant Array of Independent Disks (RAID), and the Storage Area Network (SAN). In NAS, the hard drive that stores the data has its own network address. Files can be stored and retrieved rapidly because they do not compete with other computers for processor resources. In a RAID setup, the data is located on multiple disks, and the array appears as a single logical hard drive. This facilitates balanced overlapping of input/output operations and provides fault tolerance, minimizing downtime and the risk of catastrophic data loss. The SAN is the most sophisticated architecture, and usually employs Fibre Channel technology. SANs are noted for high throughput and the ability to provide centralized storage for numerous subscribers over a large geographic area. SANs also support data sharing and data migration among servers.

Web archiving is the process of collecting portions of the World Wide Web and ensuring the collection is preserved in an archive, such as an archive site, for future researchers, historians, and the public to utilize. Due to the Web's massive size, Web archivists typically employ Web crawlers for automated collection.

The largest Web archiving organization based on a crawling approach is the Internet Archive, which strives to maintain an archive of the entire Web. National libraries, national archives, and various consortia of organizations are also involved in archiving culturally important Web content, and there are commercial Web archiving software and services available to organizations that need to archive their own Web content for corporate heritage, regulatory, or legal purposes. Reed Technology and Information Services is one of these.

Though it sounds similar to Web archiving, Web storage in fact is something different, as it refers to storing or backing up data over the Internet. Many third-party storage providers are in business today that let users upload and store all types of computer files, which typically can be shared via password or made public to anyone with Internet access and a Web browser. Many services -- Box.net and Dropbox.com to name two -- offer a limited amount of disk space for free, with monthly fees for higher capacities.

Cloud storage is a variant on both of these themes, as it uses a model of networked online storage whereby data is stored on multiple virtual (or physical!) servers, generally hosted by third parties, rather than being hosted on dedicated servers. Hosting companies operate large data centers, and people who require their data to be hosted buy or lease storage capacity from them and use it for their storage needs.

The data center operators, in the background, virtualize the resources according to the requirements of the customer and expose them as storage pools, which the customers can themselves use to store files or data objects.

The problem is that the term "cloud" today has come to be used as a metaphor for the Internet itself, a usage rooted in the early cloud-shaped drawings used originally to represent the telephone network, and later to depict the Internet in computer network diagrams as an abstraction of the underlying infrastructure it represents.

A legacy system is an old method, technology, computer system, or application program that continues to be used, typically because it still functions for the users' needs, even though newer technology or more efficient methods of performing a task are now available. A legacy system may include procedures or terminology that are no longer relevant in the current context, and may hinder or confuse understanding of the methods or technologies used. In order for a legacy system or line of business (LOB) system archive to be used in modern business processes by departments outside of its resident information silo, it must be integrated with other applications using middleware and message brokering applications or other apps that allow for enterprise application integration.

To this end, SOAP (the Simple Object Access Protocol) is the protocol most used to enable the integration of Web services with legacy and LOB systems. SOAP messages are encoded in XML and typically carried over HTTP; the protocol is a message-passing channel and serves as a platform-independent, uniform invocation mechanism between Web services. Using the Web as a medium, developers can build a framework that allows for several important things:

the identification of reusable business logic in large legacy systems in the form of major legacy components,
the identification of interfaces between the legacy components and the rest of the legacy system,
the automatic generation of coded "wrappers" such as CORBA to enable remote access, and finally,
the seamless interoperation with Web services via HTTP based on the SOAP messaging mechanism.
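For a feel of what such an invocation looks like on the wire, here is a minimal sketch in Python that posts a hand-built SOAP envelope over HTTP using only the standard library; the endpoint, namespace, operation name, and SOAPAction header are hypothetical stand-ins for whatever a legacy wrapper actually exposes.

import urllib.request

ENDPOINT = "http://legacy.example.com/OrderService"   # hypothetical endpoint
envelope = """<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <GetOrderStatus xmlns="http://example.com/orders">
      <OrderId>12345</OrderId>
    </GetOrderStatus>
  </soap:Body>
</soap:Envelope>"""

request = urllib.request.Request(
    ENDPOINT,
    data=envelope.encode("utf-8"),
    headers={"Content-Type": "text/xml; charset=utf-8",
             "SOAPAction": "http://example.com/orders/GetOrderStatus"},
)
with urllib.request.urlopen(request) as response:
    print(response.read().decode("utf-8"))  # XML response from the service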

The so-called "Service Web" has emerged as a promising frame-
work to address the issues related to process integration and
access-ing a legacy archive, or creating a new Internet-based
archive from legacy and LOB archives, via the Web.


Standards and Their Effect on Archiving

Contemporary archival standards reflect the broad definition of archiving that has been adopted by today's major archival associations, which is: "The process of capturing, collating, analyzing, and organizing any information that serves to identify, manage, locate, and interpret the holdings of archival institutions and explain the contexts and records systems from which those holdings were selected."

These standards apply to every process during which information
about records, repositories, staff, or users is captured, processed,
or retrieved. These include not just the production of finding aids,
but also accessions documentation, compilation of statistics, and
other activities throughout the records lifecycle.

The most fundamental characteristic shared by contemporary archival standards -- whatever their strength -- is that each is the product of consensus. The standards listed in this slide are the result of some kind of group effort and received broad review within their organizations or by potential users before being adopted or published. For products of official standards organizations, like the U.S. National Information Standards Organization (NISO), there are explicit procedures, including formal review periods and balloting by voting members of the organization. Guidelines are more often compiled and adopted by committees of a professional organization.

Archival objectives typically include one of two outcomes, and it is important to be clear which one you are aiming toward when choosing a standard to adhere to:

Conservation: the restoration of an object to its original condition -- the book, the manuscript, the original, thumbable object
Preservation: any act that propagates in any form (page, movie, file, etc.) or medium (paper, tape, film, CD, DVD, hard drive, etc.) the words or images of the original object

While these terms appear to be similar, over time they have become antonyms in the world of the archivist. Case in point: the disbinding of every volume in order to enable and speed up microphotography and avoid "gutter shadow" at the U.S. Library of Congress, where a former department head described the microfilm department as being "like having a sausage factory, in a way. You've got to feed the beast."

This issue notwithstanding, technology advances are enabling records managers and archivists to improve their methods of replacing paper-based storage. Although micrographics have long been the conversion tool of choice for a great many librarians and records managers, enterprise content management (ECM)-based alternatives, such as disk storage, are fast overtaking them. They have the following advantages over microfilm and microfiche:

Random access versus linear access

Simultaneous access versus serial access

Better speed, greater reliability, and easier modification

These are important characteristics to consider because the sheer length of time some archival material needs to be saved means multiple media eventually will come into play and exert an influence over how the stored records are accessed and used. As depicted in this slide, documents from well within many of our lifetimes began on paper, eventually likely were filmed, and now frequently are being converted to disk -- and since each of these has its own lifecycle issues to contend with (durability, readability, etc.), it's like taking three bites at the archival apple rather than only one.

Further complicating the picture is the fact that as documents become more complex, it is becoming increasingly meaningless to think of them as existing at all -- at least beyond their interpretation by the software that created them, since the bits and bytes in each document file are meaningful only to the program that created them. A document file is not a document in its own right: it merely describes a document that comes into existence only when the file is "run" by the program that created it. Without this authoring program -- or some equivalent viewing software -- the document is held cryptic hostage to its own encoding.

"Unplanned obsolescence" thus is created by high-tech advances
and proprietary file and storage formats that go out of vogue, forc-
ing users to adopt a data migration program that is sporadic and
largely unplanned. So eternal vigilance is the price that archivists
must pay for digital conservation, even after they think a docu-
ments lifecycle has run its course.

Long Term Access

Digital longevity is the notion of the length of time archived information can survive -- an oft-overlooked but critical issue that stems from the twin facts that digital storage media deteriorate relatively quickly and storage devices and file formats change nearly as fast. Ask anyone who recently has tried to pull information off of a floppy disk or ZIP drive and you'll know exactly what I mean, and you'll understand why digital information is widely considered to require more constant and ongoing attention than other media in this regard.

Digital preservation is the business practice that addresses this need by actively managing digital information over time to ensure its accessibility. The constant input of effort, time, and money to handle rapid technological and organizational advance is considered a major stumbling block, as the challenge is compounded by the multiplicity of complex factors -- hardware, software, OS, drivers, media, firmware, etc. -- that contribute to the problem.

How ironic it is that we can still read written documents from several thousand years ago, but digital information created in the 1990s is in serious danger of being lost! In fact, the digital storage media shown on this slide have already failed to remain readable for 1/100th as long as the Rosetta Stone, which is shown in the center. The thing is, the unique characteristics of digital manifestations that make it easy to create content and keep it up to date also present difficulties for its long-term preservation.

Most people who think that digital information will last forever fail
to realize how prone to degradation digital media actually is, a fact
of life that has caused the loss of many large bodies of digital
information including significant parts of the Viking Mars mission.

In the analog world, our efforts to preserve a work focus on that work as an artifact, as a physical object. However, magnetic media deteriorate from the time they leave the factory, and over time, archivists had to abandon this perspective and instead take a conceptual leap from preserving the material object to preserving its informational content.


Another part of the challenge is that digital media often becomes
obsolete or unusable mere years after introduction, superseded
by new media or new, incompatible formats. The past few
decades have witnessed the demise of numerous forms of digital
storage -- the 5-and-a-quarter-inch floppy, the 3-and-a-half-inch
floppy -- prompting the observation that digital information lasts
forever or five years, whichever comes first.

Sometimes the issue isn't hardware- or medium-related.

Because digital documents must be viewed by using the appropriate software, they can be rendered useless when new and non-backwards-compatible software is introduced. When this happens, a bewildering and ever-changing collection of incompatible document file formats must be translated back and forth, often with annoying losses of format, structure, and even content -- or else a standard format must be settled upon (like PDF in many cases) to keep this from happening.

Simply put, analog-to-digital conversion means capturing an analog signal in digital form. A bit more technically, it's an electronic process in which a continuously variable (analog) signal is changed, without altering its essential content, into a multi-level (digital) signal.

One common and well-known example of this kind of conversion for storage and preservation purposes is scanning a paper document and converting it into a TIFF or JPG or PDF file that can be stored on a hard drive, CD, or other digital media. Intelligent character recognition (ICR) is another, used if the image involves text. The reasoning behind this task is that once digitized, a file is relatively easily converted from one digital medium to another, with no loss of quality, in a process known as refreshing.

Refreshing involves periodically moving a file from one physical storage medium to another to get out from under the physical decay or the obsolescence of the original medium -- a process likely to be necessary many times over as the introduction of new technologies continues.

Along the same lines, digital migration involves periodically moving files from one file encoding format to another that is useable in a more modern computing environment. A good example is the moving of a WordStar file to WordPerfect, then to Word 3.0, then to Word 5.0, and so on, perhaps all the way to the latest, Word 2010.

Migration seeks to mitigate the readability issues associated with having files encoded in a wide variety of older file formats by gradually bringing them all into a limited number of contemporary formats.

Digital emulation seeks to solve a similar problem, but its approach is to focus on the application software rather than on the files. Emulation backers want to build software that mimics every type of application that has ever been written for every type of file format, and make them run on whatever the current computing environment is. So, for example, with the proper emulators, applications like WordStar and Word 3.0 could be effectively run on today's machines, and those older files thus read and reused.




Key Links:

Background information on the CIP.
Practice Test -- Do a Self-Assessment.
Free videos to prepare for the test.
White paper on the CIP.
Register for the test.
Contact for more information: jwilkins [at] aiim.org

SECTION 6

E-Discovery


In This Section...

1. Practices and Processes

2. E-Discovery Web Capture,
Authentication, and Costs
3. ESI Asset Production and
Compilation





















Practices and Processes

eDiscovery is short for electronic discovery, which is defined as the process of discovery in civil litigation that is carried out in electronic formats. It encompasses what most often is referred to as electronically stored information, or ESI.





















Examples of the types of ESI included are e-mails, instant messaging chats, documents, accounting databases, CAD/CAM files, Web sites, and any other electronic information that could be relevant evidence in a lawsuit. Also included in e-discovery are "raw data" and "metadata," which forensic investigators can review for hidden evidence.

As a practice, eDiscovery runs from the time a lawsuit is foreseeable to the time the digital evidence is presented in court. At a high level, the process is as follows:

Data is identified as relevant by attorneys and placed on legal
hold.
Attorneys from both sides determine the scope of discovery, identify the relevant ESI, and make eDiscovery requests and challenges. Search parameters can be negotiated with opposing counsel or an auditor to identify what is being searched and to ensure needed evidence is identified and non-evidence is screened out, thereby reducing the overall effort required to search, review, and produce it.

Evidence is then extracted and analyzed using digital forensic
procedures, and is usually converted into PDF or TIFF form for
use in court. It often can be advantageous to use pattern and
trend identification and other analytical search techniques here
so these tasks can be performed more efficiently and make
less use of expensive human resources.

The first stages are Information Management and Identification. These involve getting your electronic house in order so you can mitigate risk and expense should e-discovery become an issue, anywhere from the initial creation of electronically stored information through its final disposition. For example, accurate, well-managed metadata means faster searching, as well as defensible audit trails and improved security and access controls. The better the information management practices, the less data there will be to sift through, including records past their retention date, untracked copies of documents, and misclassified documents.

Next come Preservation and Collection, to ensure that ESI is protected against inappropriate alteration or destruction, and can be gathered for further use in the e-discovery process (processing, review, etc.).

The next segment involves Processing, Review, and Analysis. These steps are aimed at reducing the volume of ESI and converting it, if necessary, to forms more suitable for review and analysis; evaluating it for relevance and privilege; and evaluating it for content and context, including key patterns, topics, people, and discussions.

Last but by no means least come Production and Presentation. These are engaged in delivering ESI to others in appropriate forms and using appropriate delivery mechanisms, as well as displaying ESI before audiences (at depositions, hearings, trials, etc.), especially in native and near-native forms, to elicit further information, validate existing facts or positions, or persuade an audience.

Rounding out the picture, here is an eDiscovery Maturity Model, which documents the evolution of the organizational eDiscovery strategy used to respond to litigation or regulatory demands. It has the standard five levels that range from "ad hoc and chaotic" at the early stages to degrees of "optimizing" at the more mature stages. Besides gauging the different levels of process maturity, movement through the levels also represents the acceptance and incorporation of eDiscovery as a necessary business process.



E-Discovery Web Capture, Authentication, and Costs

In the course of analyzing Web sites and pages during eDiscovery, it often becomes necessary to preserve important Web-based evidence, ranging from site text that manifests trademark infringement to site photos related to copyright infringement to site files that demonstrate security breaches have occurred. It may also become necessary to analyze, after the fact, the way certain Web pages and sites appeared on a certain date and time, as litigation procedures may require that certain Web site manifestations be presented as evidence in court.

Web spider and crawler software programs are used in eDiscovery
to "mirror" Web pages and reproduce whole sites for a given date
and time. Most spider and crawler "capture" programs work the
same way: you provide a starting URL, presumably either the front
page of the site or a deep link for the main offending page, and
then you tell the program how many levels deep of the site or third-
party sites you want to spider and capture.

Be aware, though, that when you input your capture criteria and start the spidering software, the required amount of memory grows exponentially based on decisions regarding levels to crawl, linked pages to store, and whether to follow links to third-party sites and servers.
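The sketch below shows the basic shape of such a depth-limited capture in Python, using only the standard library: it saves each page it fetches, follows links only to the configured depth, and skips third-party hosts. The starting URL and depth are illustrative, and a real capture tool would also preserve directory structure, timestamps, and non-HTML assets.

import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

class LinkParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def capture(url: str, depth: int, seen: set, host: str) -> None:
    if depth < 0 or url in seen:
        return
    seen.add(url)
    with urllib.request.urlopen(url) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    with open(f"capture_{len(seen):04d}.html", "w", encoding="utf-8") as f:
        f.write(html)                               # preserve the page as retrieved
    parser = LinkParser()
    parser.feed(html)
    for link in parser.links:
        absolute = urljoin(url, link)
        if urlparse(absolute).netloc == host:       # skip third-party sites
            capture(absolute, depth - 1, seen, host)

start = "http://www.example.com/"                   # hypothetical starting URL
capture(start, depth=2, seen=set(), host=urlparse(start).netloc)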

Spidering tools, like so many, come in different flavors. At their simplest, access to historical manifestations of Web pages and Web sites can be had by searching the Google cache or using the Wayback Machine. The more complex of them, though, allow for much more accurate page and site capture and are particularly useful for copying a large number of Web pages and sites over a long period of time, and for fully automating the process -- up to and including the periodic programmatic review of the target Web sites for changes and subsequent copies.

These programs use specialized methods of Web spidering and capture that preserve the integrity of the original site and thus may reduce or mitigate attacks on evidence admissibility. Some of these methods include maintaining actual filenames, server directory structure, and Unix compatibility. These tools cannot mine source code, however; obtaining the original Web source code may require the use of litigation methods like subpoenas and document requests.

The goal in all of this, of course, as it is with any evidence, is to lay a proper foundation to make a case. Early court decisions required that authentication of digital documents call "for a more comprehensive foundation." [US v. Scholle, 553 F.2d 1109 (8th Cir. 1976)] But as courts became more familiar with these sorts of materials, they backed away from the higher standard and have since held that "computer data compilations should be treated as any other record." [US v. Vela, 673 F.2d 86, 90 (5th Cir. 1982)]

A common attack on digital evidence is that digital media can be easily altered. However, in 1988, a U.S. court ruled that "the fact that it is possible to alter data contained in a computer is plainly insufficient to establish untrustworthiness." [US v. Bonallo, 858 F.2d 1427 (9th Cir. 1988)] Nevertheless, the "more comprehensive" foundation remains good practice, and the American Law Reports lists a number of ways to establish this basis. Specifically, it suggests that the proponent demonstrate:

the reliability of the computer equipment,



the manner in which the basic data was initially entered,

the measures taken to insure the accuracy of the data as entered,

the method of storing the data and the precautions taken to pre-
vent its loss,
the reliability of the computer programs used to process the
data, and
the measures taken to verify the accuracy of the program.

At the end of the day, the operating principle in force is that of the Best Evidence Rule, which pertains only to documents and states that to prove the contents of a writing, recording, or photograph, the original document must be produced, provided its unavailability is not the fault of the party trying to authenticate the contents.

If the original cannot be located, the author can validate its contents through sworn testimony. Either that, or the person who read the writing, listened to the recording, or viewed the photograph may testify as to its content. However, if the best evidence isn't available, then federal and state rules of evidence usually permit the use of a mechanical, electronic, or other similarly produced facsimile instead of the original.

Now, here's the tricky part: a "digital document" is, in actuality, a set of code-based document descriptors that materially exist as magnetic impulses on a hard drive which, when viewed with the right document authoring software, can qualify as legal evidence. Absent the authoring software, those impulses cannot qualify as a document. Likewise, they don't qualify as evidence in the courtroom -- unless a surviving paper copy, or a digital copy in "living" media format, can be discovered and authenticated.

The lesson? Always be sure you have software available to read the document -- and/or, as counterintuitive as it sounds, a hard copy backup. Because all this work comes at a cost, it's important to understand what you may be getting into before you get into it (which is not a bad way to justify an eDiscovery solution).

Collectively, the task of electronic data discovery (EDD) includes loading the data onto an EDD platform, extracting and "flattening" attachments and embedded files, converting the data to a readable format, reducing the volume of the data by culling it down based upon keyword filters, de-duping it, and then creating a load file to enable the data to be loaded into electronic review software.

In 2007, EDD processing costs were $1,500 - $2,000 per gigabyte (GB); today, they have dropped to less than $1,500 per GB, depending upon whether or not the client requests keyword filtering, de-duplication, or other specialized processing. Some service providers and vendors are even charging less than $1,000 per GB for what they are calling "quick peek" EDD processing, which is nothing more than flattening and converting the Electronically Stored Information (ESI) into a format that can be read by one of the electronic document review platforms. Hourly rates for manual document review range from $250 to $500 per hour or higher, but if the labor is outsourced, rates drop to as low as $35 per hour.
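To see how those figures combine, here is a rough cost-estimation sketch in Python; the collected volume, culling rate, documents-per-gigabyte figure, and review speed are illustrative assumptions you would replace with your own numbers, while the per-gigabyte and hourly rates echo the ranges quoted above.

collected_gb = 200                     # illustrative collection size
processing_rate_per_gb = 1500          # upper end of current EDD processing cost
culled_fraction = 0.7                  # removed by keyword filtering and de-duping
docs_per_gb = 5000                     # rough density assumption
review_docs_per_hour = 50              # rough reviewer throughput assumption
review_rate_per_hour = 250             # low end of quoted manual review rates

processing_cost = collected_gb * processing_rate_per_gb
review_gb = collected_gb * (1 - culled_fraction)
review_hours = (review_gb * docs_per_gb) / review_docs_per_hour
review_cost = review_hours * review_rate_per_hour

print(f"Processing: ${processing_cost:,.0f}")
print(f"Review:     ${review_cost:,.0f} over {review_hours:,.0f} hours")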

ESI Asset Production and Compilation

The Federal Rules of Civil Procedure (FRCP) allow for the discovery of "electronically stored information" (ESI), which is meant to cover any data that is stored in electronic form, including databases, emails, and information on mobile devices, as well as traditional categories of "documents" and "things."

As the law is written, the default ESI forms are those (1) "in which [the electronically stored information] is ordinarily maintained" or (2) "that are reasonably usable." [Source: Federal Rules of Civil Procedure]

ESI production requirements for audits are governed by the records retention policies that control the requirements of the business at hand. For example, financial audit ESI production for a public corporation would be governed by IRS rules, USA PATRIOT Act rules, Sarbanes-Oxley rules, and then the retention requirements of the industry involved, such as banking, healthcare, or stock brokerage.

Information subject to legal hold is also on the list to produce, legal hold being the process by which an organization preserves all forms of relevant information when litigation is reasonably anticipated.

Meeting all the preceding criteria requires knowing what ESI assets you have and where they are located. Any such inventory should encompass servers, applications, backup/archival holdings, workstation hard drives, external hard drives, CD-ROMs, DVDs, floppy disks, flash drives, backup tapes, backup hard drives, cellular and smart phones, social networking sites, voice mailboxes, and other types of digital media, plus assets stored offsite and in the "cloud." Since ESI asset requirements may vary by country, by industry, and by company, creating a list of ESI assets depends upon pursuing a methodology suited for the situation. The following general procedure can be used as a starting point.
1 -- Obtain a simple overview as to how data is managed within the corporate structure, and ask questions like:

How is the collection of data managed internally?

Who is collecting the data?

Is it self-collection or is it managed by an outside partner?

What types of reporting are available regarding the data?

2 -- Identify data mapping and chain of custody procedures
within the company.

3 -- Review how data is handled on a day to day basis by the
business unit and managed by the IT services organization.

4 -- Determine from the records manager how the records management program is handled and how the electronic document records management process is affected by a litigation hold. Ask:

What are the regulatory compliance requirements in force, and
how are they met?
What type of audit trail or chain of custody is in place as part of
the day-to-day business activities?

5 -- Finally, find out how data is managed in overseas subsidiaries.

What safeguards are in place to collect data from these locations?

Can data be transferred across borders pursuant to U.S. Department of Commerce Safe Harbor or other criteria?

This is a particularly thorny issue, since each country has its own set of rules and regulations to abide by.


Key Links:

Background information on the CIP.
Practice Test -- Do a Self-Assessment.
Free videos to prepare for the test.
White paper on the CIP.
Register for the test.
Contact for more information: jwilkins [at] aiim.org

Architecture and Systems

SECTION 1

Information Architecture


In This Section...

1. User Experience
and Personalization
2. Information Architecture
Fundamentals
3. Content Organization and
Classification
4. Information Relationship
Building
5. Conducting a Content and
Metadata Audit
6. Web Site and
Social Navigation







User Experience and Personalization

The Information Architecture Institute defines information architecture (IA) as "the structural design of shared information environments," a succinct phrase that encompasses the organizing and labeling of Web sites, intranets, online communities, and software to support usability and findability.
Among the elements involved are:

Classification schemas

Metadata management

Navigation systems, and

Labeling to support the user experience

IA consultant and guru Louis Rosenfeld distills IA's primary objective as "balancing the characteristics and needs of users, content, and context," explaining that, among other things, it can "make work easier and save money for individual business units; and improve the user experience and build brand loyalty among customers, and organizational loyalty among employees."

Achieving these goals boils down to maximizing the comfort and usability of information systems design, accessibility, and presentation -- or in more task-oriented terms,

Designing or significantly redesigning user interfaces to make navigation more intuitive

Removing obstacles so users can get at the information they need to do their jobs (while still controlling access to it), reducing the time spent and costs associated with asking IT for help, and

Facilitating knowledge-sharing by improving information search, retrieval, and display, as well as repository interconnectivity

A big part of this is personalizing the experience for each user, either by role, by device, by location, by behavior -- or all of the above.

Who someone is, what responsibilities she has, what she is using as an access device, from where she is using it, and what she does with the information she consumes all contribute to how ultimately useful and satisfying the system is to her. A good architecture will support variations in each of these areas, so, for instance, the same person can come away equally fulfilled whether using a PC or a smartphone, in the office or on the road, and utilizing interfaces of her own configuration to support her own particular needs.

Information Architecture Fundamentals

One of the most critical components of information architecture is metadata, which is "data about data" -- labels, or tags, stored either within or externally from content, that describe that content for the purpose of identifying and organizing it. The idea behind it is to facilitate search and retrieval by providing relevant "hooks" to latch onto when looking for particular pieces of information, and no useful information management system can exist without it.

Another critical component is the access permission schema, which primarily is an anti-trespassing device designed to keep out every potential user of the system except those allowed to come in. Most often, access permission these days involves one or more of password protection, directory services, and access control lists.

Workflow is yet another fundamental building block, as it automates the movement of information through an organization. More often than not, it allows tasks to be carried out in parallel and manages multiple concurrent processes, saving time and increasing productivity. Exceptions and conditions are accommodated by applying user-defined rules. Workflow also generally includes a graphical process designer with which users can chart and refine the way they want their processes to flow, to whom, and according to what parameters.

For maximum effectiveness, each of the preceding should be constructed with a certain logical organization in mind, be it:
Alphabetical

Hierarchical

Chronological

Geographical

Or some combination thereof.

Each in its own way imposes an essential underlying organization on the content and tasks in question, and provides a means for reflecting any issues that are especially important to a particular group.

Content Organization and Classification

Virtually everything associated with information management begins with organizing and classifying content so it can be found and leveraged -- hard work that is absolutely vital to solution success because without it, all you have built is an enormous "bit bucket" into which everything is thrown and nothing readily emerges.

This work begins with the development or refinement of a taxonomy: a hierarchical representation of a body of information based on identified categories or labels. According to Patrick Lambe's 2007 work "Organizing Knowledge," there are seven different options available for creating one: lists, trees, hierarchies, polyhierarchies, matrices, facets, and system maps.

A list is the simplest representation of a taxonomy. It is particularly useful when the domain is simple and the amount of content is small. Organizing telephone numbers by country code is an example of a list.
Trees provide an implied relationship between categories and sub-categories -- the "branches" of the tree structure -- and are useful when a list gets to be too long and can be broken into natural sections. The listings and categories of a yellow pages directory are an example of this.

Hierarchies are more formal and less flexible than trees in that elements can appear in only one place. Examples include military rankings and org charts.

Polyhierarchies are used when an item belongs in more than one place in the real world, and multiple organizing principles are required. This provides "virtual linking" between hierarchies. An example is a single collection of content concerning diseases, which can be organized by affected body part and cause.



Matrices are used to organize multiple individual taxonomies that intersect. For instance, IT project management includes information about the project staff, supporting documents, and stages of implementation. A matrix taxonomy would allow users to navigate by "jumping" from one taxonomy to another and expose different kinds of information related to the original query.



Facets are multi-dimensional taxonomies comprised of multiple tags that each represent an individual taxonomy. Thus the content is categorized in multiple ways, within a single interface. One good example is the ability to select wines based on characteristics such as type, price, varietal, and region.
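A minimal Python sketch of that faceted approach follows: each item carries several independent tags, and a query is simply the intersection of whichever facet values the user selects. The miniature wine catalog is made up for illustration.

# Each entry carries one value per facet (type, varietal, region, price).
wines = [
    {"name": "Wine A", "type": "red",   "varietal": "merlot",     "region": "France", "price": 18},
    {"name": "Wine B", "type": "white", "varietal": "chardonnay", "region": "USA",    "price": 12},
    {"name": "Wine C", "type": "red",   "varietal": "pinot noir", "region": "USA",    "price": 25},
]

def facet_search(items, **facets):
    # Keep only items matching every facet value the user selected.
    return [item for item in items
            if all(item.get(facet) == value for facet, value in facets.items())]

print(facet_search(wines, type="red", region="USA"))   # -> the Wine C record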

System maps are visual representations of a domain of knowledge that are labeled with relevant categories -- like a diagram of the human body that points the way to medical content about the human nervous system.

Which will prove most effective depends on a number of factors, including organization size, breadth of markets served, degree of regulation, etc. For example, large or complex enterprises with many operations may benefit from more structure rather than less in order to manage and reconcile the various terminologies. But even if you have a taxonomy you already like, changes in your organization can necessitate a "rethink" at any time -- such as after setting up a new division, buying another company, or being bought by one.

Whatever your particular case, bear in mind that the goal is to strike a workable balance between how strictly your terms are defined and locked into place -- a controlled vocabulary -- and how well your search and retrieval works. For instance, unregimented types of taxonomies like folksonomies and social tagging may be highly populist but can make it hard to develop the standard definitions needed to support effective findability -- whereas with a controlled vocabulary you can use standard terminology to force users to select from vocabulary values that you've chosen for them, presented perhaps in a drop-down menu.
Definitionally, you should know that:

Folksonomies are combinations of individuals' views of how things should be organized. The result of collaborative and personalized approaches to tagging content, they are largely unregulated -- but curiously enough, they often produce clusters of tags that communities can then rally around.

Social tagging is end-user tagging that's usually Web-based. Generally enabled by very simple, free-form interfaces, it usually supports the leaving of feedback by other users, and lets people tag and retag content as they desire. Over time, this can lead to the emergence of new categories that were built by consensus.

Ontologies and topic maps occupy the other extreme because they are highly structured, not just to minimize opportunities for semantic variation, but also to help analyze domain knowledge, as you'll see in a second. They are more common in the academic and scientific communities than in general business, mostly because their reliance on definition and structure makes the rigor of development more justifiable.

An ontology is the explicit specification of the relationships between concepts. More even than an ultra-controlled vocabulary, it represents an entire domain of knowledge, applying rules that not only define terms, but also the relationships between them. The classic example is that of a salad, the ontology of which would include everything up to and including the growers, the ingredients, the rodents that eat them in the field, and how a salad is different in Japan versus Italy.



A topic map is a visual representation of a knowledge domain and thus is a type of ontology. In the diagram shown, topics are depicted as peach-colored ovals, the relationships between them as purple lines, and specific occurrences as green circles. As you can see, while a single topic map can contain a large number of items, it also can show you how many there are, and how the pieces fit together.

One thing all these alternatives have in common is a basis, in one way or another, in metadata, which is "data about data" -- labels, or tags, stored either within or externally from content, that describe that content for the purpose of identifying, organizing, and retrieving it.

Structurally, metadata's basic unit is the statement, which consists of a property -- like "color" -- and a value -- say, "blue." A statement also describes resources in a way that can be used by content technologies such as a search engine, which could be instructed to find all products with the property "color" equal to the value "blue."

How effective your metadata is depends upon how unified a vocabulary you use -- and since developing and agreeing upon that vocabulary can be highly subjective and enormously challenging, a number of formal standards have been promulgated to use as contextual touchstones. A few of the more prominent are shown here, and are also discussed elsewhere in this Course.

Information Relationship Building

A thesaurus is a file that manages and tracks the definition of words and phrases and their relationships to one another, in a hierarchical fashion. Ranging far beyond simple antonyms and synonyms, it also includes comparisons like "equal to," "related to," and "opposite of," and it is critical to ensuring a correlation can be made between the taxonomies and metadata of every repository, business unit, or functional group touched by an information solution.

Semantic networks are functionally similar to thesauri but operate on a higher conceptual plane. For example, in the context of a "salad," a semantic-network-based system would understand that content about mesclun greens, endive, and radicchio has something in common with content about lettuce, and it would use a metadata-based infrastructure to unlock these particular secrets.

Relational knowledge representation essentially is nothing but a
fancy phrase that means "presenting comparisons." From an infor-
mation management perspective, database tables often do the trick
by systematically setting out in columns each fact about a set of ob-
jects. A simple example is shown here, and paves the way people --
or knowledge systems -- to answer such questions as Who is still
alive? Who plays jazz? Who plays the trumpet? and so forth.
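
A minimal sketch of that table-and-query pattern in Python follows; the rows are hypothetical and do not reproduce the example figure from the original.

# A relational "facts table" about a set of musicians, held as rows of column values.
# The musicians and their attributes below are invented for illustration.
musicians = [
    {"name": "Ada Quinn", "instrument": "trumpet", "genre": "jazz", "alive": True},
    {"name": "Ben Ortiz", "instrument": "piano", "genre": "classical", "alive": False},
    {"name": "Cleo Hart", "instrument": "trumpet", "genre": "jazz", "alive": False},
    {"name": "Dev Malhotra", "instrument": "guitar", "genre": "jazz", "alive": True},
]

def select(rows, **criteria):
    """Return the rows that match every column=value criterion supplied."""
    return [r for r in rows if all(r.get(col) == val for col, val in criteria.items())]

if __name__ == "__main__":
    print("Still alive:  ", [r["name"] for r in select(musicians, alive=True)])
    print("Plays jazz:   ", [r["name"] for r in select(musicians, genre="jazz")])
    print("Plays trumpet:", [r["name"] for r in select(musicians, instrument="trumpet")])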

Once information relationships have been established, technology can be effectively utilized to automate content extraction, description, and classification, and thereby make indexing, and thus retrieval, not only more efficient, but also more reliable and consistent. For example,

Auto-classification software identifies documents by matching their observed (or calculated) characteristics against a predefined list of descriptors (a minimal sketch of the idea appears after this list).

Auto-categorization sounds a similar theme but differs in that its list of descriptors can be built from the documents being analyzed, rather than having to use a predetermined set of values. The two often work together -- and in fact live within the same engines -- and thus are frequently treated as synonyms.

Operating at a level below the document, entity extraction involves plucking key words and descriptors directly from the information itself, not only improving keyword searching but also opening the door to semantic networking. Examples include OCR, ICR, OMR, bar coding, and forms processing -- all of which are explored in this Course's module on Capture.

And then there's summarization, which can be used to quickly identify the key topics of a document while eliminating redundant information.
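
The sketch below illustrates the descriptor-matching idea behind auto-classification in Python; the categories and descriptor lists are invented for illustration, and real engines use far richer statistical and linguistic techniques.

# Keyword-based auto-classification: score a document against predefined descriptor lists.
# The categories and descriptors below are hypothetical.
DESCRIPTORS = {
    "finance": {"invoice", "payment", "budget", "quarterly"},
    "hr": {"vacation", "benefits", "payroll", "onboarding"},
    "legal": {"contract", "liability", "compliance", "clause"},
}

def classify(text, descriptors=DESCRIPTORS):
    """Return the categories ranked by how many of their descriptors appear in the text."""
    words = set(text.lower().split())
    scores = {cat: len(words & terms) for cat, terms in descriptors.items()}
    return sorted((cat for cat, score in scores.items() if score > 0),
                  key=lambda cat: scores[cat], reverse=True)

if __name__ == "__main__":
    doc = "Please review the attached invoice before the quarterly budget meeting."
    print(classify(doc))  # expected: ['finance']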

Conducting a Content and Metadata Audit

A big part of developing or improving an information architecture is understanding the breadth and depth of the organization's existing content and metadata. This helps to determine the size and scope of the effort and is achieved by conducting an audit of the current state.

Any audit worth its salt will touch all the stakeholder communities associated with the solution. These are people who have a valid interest in the process, whether or not they are affected directly by it or even work for the organization itself. They may include the likes of:

Senior management

Business unit managers

Legal staff
Records managers

IT personnel

End users

Business partners

Investors

Identifying who these stakeholders are frequently involves a study of the organizational chart, from which much often can be derived about the kinds of content that may be in play. For example, departments called "finance" or "payables" -- and executives like "CFO" -- clearly will deal in lots of financial data, and in information about suppliers, clients, and perhaps investors. So even a quick review can provide pointers in the right direction.

Charting how information flows throughout the organization is also important, both to guard against the development of bottlenecks or extraneous flows and to facilitate improvements in efficiency -- issues and opportunities that can be highlighted by using the content and its metadata as navigational aids.

Take, for example, a sales manager's end-of-quarter report. Working with her team, she consolidates her group's results and forecasts into a single document, and sends it to her boss -- who then consolidates it with his other financials to pass further upstairs. At each stage, content is added or amended, and metadata about who did what, when, and what content was involved can be instrumental in determining new ways to stage, process, and track information going forward.




Policies are yet another source of insight into how information is or should be architected because they generally address how that information is supposed to be used. Records managers are prime candidates for research in this regard because many information policies begin and end with them.

Understanding what the rules are makes it clearer when they aren't followed, and easier to identify the types of information that may be more prone to misuse, and under what circumstances. This in turn enables the embedding of appropriate governance safeguards in any new architecture.

Web Site and Social Navigation

Navigation means moving through a corpus of information by traversing directories or following hypertext links. Typically utilizing a graphical user interface (GUI), it puts in the user's hands the metadata or taxonomy you developed to organize the body of content.

Jared Spool, a noted interface expert at UIE.com, has emphasized the importance of the "scent of information" in the navigation paradigm, explaining that users seeking information essentially are on the hunt. They pick up a scent, follow it, and keep pursuing it until they get what they're after.

Good navigation enhances this information scent, and there are
a number of principles to adhere to in order to make that scent
as strong as possible.

In fact, how best to design a Website navigation scheme is one of those subjects about which everyone seems to have an opinion. However, here are a few best practices that seem to be fairly universal, presented here by way of Chicago-based Jessica Rosengard Designs, on whose advice they are based.

Naming: every nav button or link should have a short name, preferably one word only.

Organization: sites with many pages (say, more than 7 or 8) may be well advised to group those pages into categories and navigate them via drop-down menus. Remember: cleaner interfaces make for easier finding and better user experiences.

Drop Down Menus: Less is more. Someone rolling over four buttons and five levels deep into your navigation inevitably will slip or misfire with the mouse, and watch with aggravation as everything disappears. Unless they're truly committed, chances are they'll abandon ship and surf off to someplace else.

Functionality: Navigation should be somewhat interactive in that mouse rollovers and button-clicks should cause something to happen on the screen, like a text color change or graphical indentation, to keep the user engaged and moving forward.

Simplicity: Flashing stars, sounds, and dancing bears are completely unnecessary. Stick instead to simple state changes (of color, size, text decoration, etc.) so as not to distract from the business at hand.

Location: Website navigation is generally expected to be in one of three places: at the top of the page, or on either the left or right sidebar (as many blogs do). Placing it anywhere else risks having it go unseen, and including it in more than one place is a redundancy that should be avoided because it results in confusion, not clarity. Also, the primary/top-level nav information should be placed consistently throughout the site for the same reason.

Size: navigation buttons should stand out from the rest of the page content so they don't visually get lost.

Structure: Ideally, users should be able to get to any page in the site from any page in the site. This may not be strictly practical with the very largest of sites, but the principle is worthy of constant consideration nevertheless.

Social navigation is a bit different, since it has little to do with moving through a Website and lots to do with moving through and participating in communities on the Web itself.

In 1994, Paul Dourish and Matthew Chalmers of the Rank Xerox Research Centre in Cambridge, England delivered a paper at the HCI conference in which they wrote that "in social navigation, movement from one item to another is provoked as an artifact of the activity of another or a group of others. So, moving towards a cluster of other people, or selecting objects because others have been examining them would both be examples of social navigation."

Social navigation is the whole point behind such services as Google+, whose "+1" button is all about alerting, attracting, or finding like-minded people on the Web when noteworthy pieces of information are identified. Facebook's "Like" button serves a similar purpose, as do the scores and reviews on sites like Amazon.com and Foursquare.

Key Links:

Background information on the CIP.
Practice Test -- Do a Self-Assessment.
Free videos to prepare for the test.
White paper on the CIP.
Register for the test.
Contact for more information: jwilkins [at] aiim.org







SECTION 2

Technical Architecture


In This Section...

1. Implementation Models and
Backup
2. Pilots, System Audits and
Server Responsibility
3. Virtualization
4. Consumer Technology
and Organizational
Architectures
















Implementation Models and Backup

Technical architecture is a broad term that applies to the analyzing and modeling of a technology infrastructure. Also known as enterprise architecture, it can denote coverage of both an entire enterprise, including all of its information systems, and/or a specific domain within the enterprise, encompassing the domains of business, applications, and information as well as infrastructure.

Together, these Pillars of Architecture Terminology -- so called by Forrester Research's Randy Heffner -- embody a logical progression of architectural thought from designing the business to delivering business technology solutions.

Business architecture relates to the structure and behavior of a business system (not necessarily related to computers), and covers business goals, business functions or capabilities, business processes and roles, etc. Business functions and processes are often mapped to the applications and data they need.

Data architecture relates to the data structures used by a business and/or its applications. It includes descriptions of data in storage and in motion, as well as data stores, data groups, and data items, and maps those data artifacts to data qualities, applications, locations, and the like.

Applications architecture relates to the structure and behavior of the applications used in a business, focusing on how they interact with each other and with users. Centered on the data consumed and produced by applications, rather than their internal structure, it usually maps applications to business functions and to application platform technologies. This is software architecture at the lowest level of granularity.

Infrastructure architecture relates to the structure and behavior of the technology infrastructure itself, covering the client and server nodes of the hardware configuration, the infrastructure applications that run on them, the infrastructure services they offer to applications, and the protocols and networks that connect applications and nodes.

Technical architecture embraces multiple systems and multiple functional groups within the enterprise, and can be viewed in two ways:

As a formal description of an enterprise system, with a detailed plan at component level to guide its implementation, and

As the structure of enterprise components, their interrelationships, and the principles and guidelines governing their design and evolution over time

Multiple models may exist within a single infrastructure, and may include the following:

Desktop/standalone systems, which encompass desktop or laptop computers that are used on their own and have no need of a network connection. These are considered stand-alone even if they are actually connected, as long as the connection is not mandatory for general use.

A plug-in is a hardware or software module that adds a specific feature or service to a larger system. The idea is that the new component simply plugs in without much fuss, muss, programming, or configuration.

A computer appliance is a separate and discrete hardware device designed to provide a specific computing resource. Containing integrated software (called firmware), it stands as a "turn-key" solution to a particular problem, and is called an "appliance" because its "closed and sealed" nature, with no user-serviceable parts, evokes the character of most household appliances.






Client/server computing involves a relationship between two
computers, in which one, the client, requests a service from the
other, the server, which fulfills the request. This model provides
an efficient way to provide distributed users access to centrally
hosted and managed applications.

N-tier computing is similar to client/server but involves three or more separate computers in a distributed network. Meaning "some number of tiers," the most common form is the 3-tier application, in which user interface programming is on the user's computer, business logic is in a more centralized server, and the needed data lives in a database. N-tier computing is at the center of most enterprise architecture projects.
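
As a rough illustration of the 3-tier split, here is a minimal, self-contained Python sketch; the order data and layer boundaries are invented, and a real deployment would place each tier on separate machines.

# A toy 3-tier layering: data tier, business-logic tier, and presentation tier.
# Uses an in-memory SQLite database purely for illustration.
import sqlite3

# --- Data tier: the only layer that talks to the database -------------------
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)")
conn.executemany("INSERT INTO orders (customer, total) VALUES (?, ?)",
                 [("Acme Ltd", 120.0), ("Acme Ltd", 75.5), ("Globex", 300.0)])

def fetch_order_totals(customer):
    rows = conn.execute("SELECT total FROM orders WHERE customer = ?", (customer,))
    return [total for (total,) in rows]

# --- Business-logic tier: applies rules to the raw data ---------------------
def customer_summary(customer):
    totals = fetch_order_totals(customer)
    return {"customer": customer, "orders": len(totals), "spend": sum(totals)}

# --- Presentation tier: formats the result for the user interface -----------
def render(summary):
    return f"{summary['customer']}: {summary['orders']} orders, {summary['spend']:.2f} total"

if __name__ == "__main__":
    print(render(customer_summary("Acme Ltd")))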

Widgets are generic software applications comprising portable code intended for one or more different platforms. The term often implies that either the application, user interface, or both, are light, meaning relatively simple and easy to use. Such implementations are exemplified by desk accessories or applets, as opposed to more complete software packages like spreadsheets or word processors.

Software as a Service, or SaaS, is a new model for software consumption and distribution of IT services over the Internet. In it, applications are hosted by a vendor or service provider and made available remotely to customers over a network, typically the Internet.

The most popular form today is known as cloud computing, which has come to mean nearly any remote, Internet-based computing model in which shared resources, software, and information are provided online to PCs and other client devices on demand. It is especially attractive to companies that want to control their hardware, software, development, and maintenance expenses -- but it is worrying to those concerned about housing their data behind someone else's firewall, and less than ideal for organizations needing something a bit different than everybody else using the same service.

Systems integration is the process of tying solutions together. Rarely a simple chore, it may involve hardware and software, and may require the coupling of existing or legacy systems with significant new applications or functionality. Systems integrators are individuals or businesses that do this for a living, especially where combining hardware and software products from multiple vendors is concerned.






A sandbox is a solution-testing environment that isolates untested code changes and outright experimentation from the production environment or repository. This protects "live" servers and their data, vetted source code distributions, and other collections of code, data and/or content, from possibly damaging and irreversible changes.

In a Web development environment, sandboxing tends to concentrate on ensuring that changes appear and function as intended before being merged into the master copy of the pages, scripts, text, and other components that are actually being served to the real, public user base. Here, sandboxes are more commonly known as "test servers" or "development servers," on which each developer typically has an instance of the site and can alter and test it at a particular hostname, directory path, or data port.

Backup must be part of any technical architecture to ensure copies of data are made should it become necessary to restore the original information after some kind of corruption or disaster. This is the practice's primary purpose, in fact. Backup is also used, however, to recover data from an earlier time, according to your data retention policy. Since a backup system contains at least one copy of all data worth saving, the storage requirements can be considerable.

Many different techniques have been developed to optimize the
backup procedure, including optimizations for dealing with open
files and live data sources, compression, encryption, and de-
duplication. Many organizations and individuals work to define
measurements, validation techniques, and testing procedures for
all this, which builds confidence in the process.

To minimize the specter of total data loss, there should be more than one backup system in operation, preferably situated in a different location, thus combining safety with redundancy. [Wikipedia]
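
A minimal sketch of the basic backup idea follows in Python; the paths are placeholders, and a production routine would add scheduling, retention, encryption, and restore testing.

# Copy a source tree into a timestamped backup folder and write a checksum
# manifest so a later restore can be verified. Paths below are hypothetical.
import hashlib
import shutil
import time
from pathlib import Path

def backup(source: str, backup_root: str) -> Path:
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = Path(backup_root) / stamp
    shutil.copytree(source, target)          # full copy of the source tree
    manifest = {}
    for f in target.rglob("*"):
        if f.is_file():
            digest = hashlib.sha256(f.read_bytes()).hexdigest()
            manifest[str(f.relative_to(target))] = digest
    lines = (f"{digest}  {name}" for name, digest in sorted(manifest.items()))
    (target / "MANIFEST.txt").write_text("\n".join(lines))
    return target

# Example (hypothetical paths):
# backup("/data/projects", "/mnt/backups/projects")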

Pilots, System Audits and Server Responsibility

Piloting is a systems development and implementation strategy under which a new solution is fielded on a test basis so you can see how it works and identify needed changes well prior to full-scale deployment. In most cases, the new system is rolled out to only a subset of the organization and run for a meaningful period of time in order to:

Evaluate the usefulness and usability of the system

Improve the system's design based on user feedback, practical use experience, and observed results (using so-called "objective measures" like productivity or quality data)

Identify necessary or desirable changes to the organization or the processes in which the system will be embedded

Provide input to implementation strategies and plans, based on users' reactions to the pilot

Before the pilot implementation is fielded, an information technology audit, or information systems audit, should be conducted to examine the management controls within the IT infrastructure for their effectiveness in safeguarding assets, maintaining data integrity, and operating effectively to achieve the organization's goals or objectives. These reviews may be performed in conjunction with a financial statement audit, internal audit, or other form of attestation engagement.

For such an audit to be successful, it is critical to understand who is responsible for the servers in your organization. This sounds obvious, for clearly there exist IT qualifications for server management, but it can be surprisingly difficult to figure this out since line-of-business units increasingly are purchasing and managing their own servers without IT involvement.

One way to go about mapping the landscape is to take the responsibilities a server manager is expected to have and use them as the start of a logic trail that will lead you to the right people. Ask yourself who is qualified to do things like:

Install any software on the server

Configure the server to suit the Website's needs

Monitor server performance continuously

Monitor the databases and update them regularly

Constantly update the operating system

Constantly update anti-virus software and the firewall

Periodically back up the data on the server

Regularly update server applications

Regularly check server applications for any failures

If a problem arises, provide effective and timely troubleshooting

Install and maintain security measures that prevent any intrusion

Perform periodic audits to monitor server security

Virtualization

Virtualization is the creation of a virtual (rather than actual) ver-
sion of a computing resource or environment, such as a hardware
platform, operating system, machine, storage device, or network
resource. Forms of virtualization include:
Presentation Virtualization, which isolates processing from the
graphics and input/output, making it possible for applications to
project their user interfaces remotely for user sessions.

User State Virtualization, a term invented by Microsoft to mean a condition in which the "user state" is separated from the underlying Windows OS through the use of roaming profiles, plus folder redirection with offline folder support.

Server Virtualization, which involves running applications in
separate, isolated partitions (separate "virtual machines") within
a single server.

Application Virtualization, which occurs where an application is installed on a server and delivered to each user's PC as needed. [encyclopedia at PCmag.com]

Desktop virtualization, which allows users to remotely access their desktop from any location and use it as if they were in front of their actual computer. Virtualization offers advantages over operating individual physical units because each virtual instance does not require its own dedicated hardware, operating system, and software, and it can lower the cost of deploying applications.

On the flip side, virtualization does have some drawbacks associated with it. Among them are these:

High risk of physical faults, because if the physical server hosting several virtual servers goes down, the failure will take all of those virtual servers offline. The remedy, which includes hardware redundancy, can increase costs significantly. Plus, it is difficult to find a disaster recovery solution that supports all the various virtualization solutions out there.


Risk of performance loss, because the virtual environment layer may not support a virtual operating system and virtual applications as efficiently as it does the real things.

Installing and running virtual objects requires specialized knowledge. An expert in server hardware and software setup and configuration may not be as capable when it comes to doing the same things in a virtual environment because of the special knowledge required to use the likes of VMware, Hyper-V, or Xen server virtualization software.

Virtualization is not supported by all applications. Some core applications, including a few database applications, are not yet ready, or are not 100% certified, for virtualization, and may not behave properly when run that way.

Virtualization sprawl. Because virtual computing objects are easy to clone and install, the number of virtual servers can easily grow faster than the number of staff who are supposed to manage them, taxing both personnel and performance.

The deployment of new applications for use across the enterprise is easily performed through varied combinations of application, operating system, and server virtualization. Through virtualization-induced "containers," applications can be isolated from both the hardware and one another, preventing configuration conflicts that often complicate their introduction into IT systems.

Virtualization has transitioned into a mainstream technology in today's datacenters and is widely used to increase hardware utilization as well as lower server operational costs in the datacenter. On the server side, virtualized infrastructure is a direct on-ramp into cloud computing and often will be deployed as a private cloud.

Fundamentally, you see, if an array of virtualized computing objects with these properties is centralized on a server in a physical location that is separate from the enterprise, and remote users have access to that server, you have cloud computing! This topic is covered elsewhere in this Course, but while we're here, it may be useful to know that there are at least five ways that virtualization unlocks the door to cloud computing:

It enables economies of scale

It decouples users from implementation

It provides speed, flexibility, and agility

It breaks software pricing and licensing models, and

It enables and motivates departmental chargeback

Consumer Technology and Organizational Architectures

Consumer technology today is exerting powerful pressure on organizational IT to extend the enterprise beyond its walls and into its executives' hip pockets. The reason is that an increasingly mobile and tech-savvy work force is demanding support for portable data applications to run on the smartphones and tablets they use in their personal lives -- a practice known as Bring Your Own Device (BYOD) -- so they can enjoy more flexibility in when, where, and how they work. Consequently, more consumers and employees are accessing applications and services remotely using those devices, and more business transactions are being conducted that way than ever before.


As this "consumerization" of IT accelerates, issues around mobile
process control and security are bubbling to the fore as businesses
realize they must figure out how to integrate mobile applications
into their computing architectures. Here are but a few of the factors
to consider:

Security: Because the encroachment of consumer devices into the workplace is being driven by the market and not by the organization, new security problems have been introduced. Lost phones and tablets, for instance, represent physical risks of data breaches, and data encryption, if it exists at all, is often an afterthought.

Compatibility and interoperability: Standards must be adopted, independent of carrier and operating system, with respect to a number of elements including data input/output, display ability, and power consumption. Such standards would allow the continued use of older devices with newer solutions, and this, in turn, would lessen the need for costly replacements and migrations. This pain, in fact, coupled with the potential need to support dozens of different devices and operating systems, is why so many shops standardize on one vendor or configuration -- a strategy that has the added benefit of being able to predict and schedule less frequent, vendor-confirmed release cycles.

Optimization of Web services: Enterprises must rapidly transform business service Web interfaces as mobile devices are used to access organizational Websites, line-of-business systems, and perhaps even legacy applications. The use of a Web-based application architecture facilitates standardization that enables device-independence for enterprise users. But at the same time, IT also must step up its identity and access control strategies and figure out ways to accommodate unpredictable Internet and cell phone connection quality.

Workers who bring in their own devices to the office are both savvier about how their devices work -- or at least they think they are -- and have more psychological ownership of them. So besides demanding BYOD rights, they also want to self-provision their mobile apps based on recommendations gleaned from their social tools and personal contacts.

This has led to the "IT-ization of the consumer," a counter-trend in which BYOD employees increasingly are trying to solve technical problems on their own instead of calling IT for help. In response, corporations are starting to create dedicated teams within IT to implement a "BYOD policy" that helps employees better manage their devices -- thereby lowering IT costs while ensuring network security.

Along these same lines, app stores are now making their way into the enterprise the way they already have infiltrated our homes. Just as the Apple App Store has enabled consumers to download and update the software on their iPhones and iPads, so will the likes of the new Windows 8 store do the same for PCs in the enterprise.

One of the keys here is that any such resource must have a "discoverability" feature built in that will allow self-provisioned employee devices to automatically find and secure the apps in their sights, and thus take some of the burden off of IT. But IT still must be involved to limit the choices available to BYOD users, as well as to ensure that work done on a BYOD device is subjected to the same ownership/records/ediscovery controls as work done on an organization-issued machine.

There's little doubt that portable data applications have only just begun to transform the way work gets done and information gets managed by putting systems literally in people's pockets if they so desire. Thanks to the tremendous push of consumer technology, business users have figured out that they can operate as effectively, or nearly so, away from their desks as they can when parked behind them.

Even the need for network connectivity isn't the requirement it once was, as apps are being developed to run natively on pads and tablets even when offline. Coupled with the growing ability to add and manage apps without IT involvement, users are feeling more free and empowered by the day, suggesting that mobile devices are bringing back everything that was personal about the PC. In some ways just another computing client, their portability and capabilities clearly set them apart and beg extra and specialized attention.

Key Links:

Background information on the CIP.
Practice Test -- Do a Self-Assessment.
Free videos to prepare for the test.
White paper on the CIP.
Register for the test.
Contact for more information: jwilkins [at] aiim.org












SECTION 3

Cloud Computing


In This Section...

1. Cloud Computing































Cloud Computing

The term "cloud computing" derives its name
from the cloud-like shapes used in old diagrams
to represent the telephone network, and later to
depict the Internet.

Often used as a metaphor for the entire Internet, it is defined generally by the U.S. National Institute of Standards and Technology (NIST) as "a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources -- networks, servers, storage, applications, and services -- that can be rapidly provisioned and released with minimal management effort or service provider interaction." As such, it is a major shift away from the mainframe and client-server computing models it replaces.

As a matter of practical fact, the cloud is very much akin to the time-sharing, service bureau, and Application Service Provider offerings of yore. What's different is the new IT services model that it embodies. Fundamentally, there are four variations on the theme, as follows:

A private cloud is a cloud infrastructure provisioned for exclusive use by a single organization comprising multiple consumers (e.g., business units). It may be owned, managed, and operated by the organization, a third party, or some combination, and it may exist on- or off-premises.

A community cloud involves a cloud infrastructure provisioned for exclusive use by a specific community of consumers from organizations that have shared concerns (e.g., mission, security requirements, policy, and compliance considerations). It may be owned, managed, and operated by one or more of the organizations in the community, a third party, or some combination, and it, too, may exist on- or off-premises.

A public cloud infrastructure is provisioned for open use by the general public. It may be owned, managed, and operated by a business, academic, or government organization, or some combination. It exists on the premises of the cloud provider.

A hybrid cloud is a composition of two or more distinct cloud infrastructures (private, community, or public) that remain unique entities, but are bound together by standardized or proprietary technology that enables data and application portability (e.g., cloud bursting for load balancing between clouds).

Cloud computing services come in three essential varieties, all of which are aimed at relieving organizations of many of the technical and financial burdens associated with IT development, management, and use.

Cloud application services, or "Software as a Service (SaaS)" offerings, deliver applications, well, as a service over the Internet, eliminating the need to install and run them on an organization's own computers, thereby simplifying maintenance and support.

Cloud infrastructure services, or "Infrastructure as a Service" (IaaS), deliver computing infrastructures -- typically platform virtualization environments -- as a fully outsourced service, along with raw (block) storage and networking.

Cloud platform services, or "Platform as a Service (PaaS)," deliver computing platforms or solution stacks as a service, often consuming cloud infrastructure and sustaining cloud applications. These facilitate deployment of applications without the cost and complexity of buying and managing the underlying hardware and software layers.

These models are all built atop several value propositions that potentially conspire to make an IT manager's life much easier -- even if they come at the expense of having to use generally standard and non-customizable services. In a nutshell, they boil down to these, as articulated by NIST and advisory firm MWD Advisors:

Third-party ownership. As a new form of outsourcing, cloud computing lets customers who are trying to focus the allocation of scarce capital resources on their core businesses move IT infrastructure off their balance sheets. The cloud provider owns not only the IT infrastructure, but IT management responsibilities as well. Software upgrades, data backups, and the countless other tasks required to manage mission-critical business applications on a day-to-day basis are the third party's responsibility, and are governed by a well-defined Service Level Agreement.

Measured service. Cloud computing systems automatically control and optimize resource use by leveraging a metering capability at some level of abstraction appropriate to the type of service (storage, processing, bandwidth, active user accounts, etc.). Like a utility, customers consume computing and storage services on demand and pay for them as operating expenses, instead of paying for infrastructure resources up-front as capital expenditures. (A toy illustration of this metering idea appears after this list.)

On-demand self-service. With cloud computing, a consumer can unilaterally provision computing capabilities, such as server time and network storage, as needed and automatically, without requiring human interaction with each service provider. Business end users can self-provision applications and user accounts with a few mouse clicks, knowing in advance what the additional per-user cost is and enjoying nearly instant availability.

Rapid elasticity. Cloud capabilities can be elastically provisioned and released, in some cases automatically, to scale rapidly outward and inward according to demand. So, rather than tap into a fixed set of resources, users can add or remove capacity at will, and only pay for what they actually use.

Resource pooling and virtualization. Cloud computing resources are pooled to serve multiple consumers, with different physical and virtual resources dynamically assigned and scaled elastically according to consumer demand. Virtual slices of resources are created from clusters of servers and storage devices in the cloud, perfectly sized to fit the specific needs of multiple users. Examples of resources include storage, processing, memory, and network bandwidth.

Broad network access. Cloud capabilities are available over the network and accessed through standard mechanisms that promote use by heterogeneous thin or thick client platforms (e.g., mobile phones, tablets, laptops, and workstations).

Location independence. The customer generally has no control or knowledge over the exact location of the provided resources but may be able to specify location at a higher level of abstraction (e.g., country, state, or datacenter).
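
Here is the toy metering illustration promised above, in Python; the resource names, per-unit rates, and usage records are all invented and do not reflect any provider's actual pricing.

# "Measured service" in miniature: sum metered usage multiplied by a per-unit rate.
# Rates and usage records below are hypothetical.
RATES = {"storage_gb_month": 0.02, "compute_hours": 0.09, "bandwidth_gb": 0.05}

usage = [  # (resource, quantity) records gathered by the provider's metering layer
    ("storage_gb_month", 500),
    ("compute_hours", 720),
    ("bandwidth_gb", 150),
]

def monthly_bill(records, rates=RATES):
    """Turn metered consumption into an operating expense for the month."""
    return sum(quantity * rates[resource] for resource, quantity in records)

if __name__ == "__main__":
    print(f"Amount due: ${monthly_bill(usage):.2f}")  # 10.00 + 64.80 + 7.50 = 82.30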

Interestingly, the cloud is as much a consumer trend as it is a business phenomenon, as the likes of Google Calendar, Apple's iTunes Match, and banks' online bill-pay services -- not to mention AOL, Gmail, Yahoo, LinkedIn, and Facebook -- all utilize the cloud model and are being leveraged by "just plain folk" who don't give their hosting a single thought.

Examples on the business side include SalesForce.com for sales and contact management, Microsoft's Windows Azure for running Windows applications and storing files and data using Microsoft's datacenters, and sites like Wordpress for blogs -- never mind every Web hosting service in the world, all of which are provided on a cloud basis! Online forums, shared file storage spaces, and event registration offerings are other solutions that are commonly moved out-of-house and into the cloud as well.

The offloading of IT management and cost into the cloud notwithstanding, governance is something that organizations should -- and must -- retain ownership of. After all, if a judicial finger is going to be pointed at anyone because of an infrastructure-related issue, it's going to be aimed directly at you.

Here is a quick roundup of key points to consider when thinking about cloud solutions:

Compliance: can your cloud provider submit to audits and security certifications?

Data location: is your provider willing to contractually commit that it is obeying the privacy laws of the local, regional, and national jurisdictions in which it is storing your data?

Data segregation: is your data properly segregated from everybody else's on the shared server so it can't be read by just anybody?

Availability: have you defined your service-level requirements,
and are there penalty clauses that can be invoked should they
not be met?

Recovery: what happens if your cloud provider suffers a natural
disaster, or even simply a crash that results in total data loss?
Can it do a complete restoration, in a timely fashion?

Viability: what happens to your data if your provider gets acquired or goes bankrupt?

Key Links:

Background information on the CIP.
Practice Test -- Do a Self-Assessment.
Free videos to prepare for the test.
White paper on the CIP.
Register for the test.
Contact for more information: jwilkins [at] aiim.org








SECTION 4

Mobile Applications


In This Section...

1. Mobile Device Capture and
Access
2. Notification Techniques and
Location-based Services
3. Impacts on E-Commerce,
Information Architecture
and Usability


















Mobile Device Capture and Access

Mobile applications are those used on a portable computing device to access information and resources. As a term, it encompasses mobile communication, hardware, and software, and the ability to both gather and display information.


















Mobile devices can be used to collect information in a number of ways, perhaps the most popular and obvious of which is by taking digital pictures and recording digital video. From an information management standpoint, what's notable about this has less to do with the taking of a picture -- which is valuable unto itself in, say, an insurance claims context -- than with the kinds of things a person can take a picture of: like a dinner receipt, which can be instantly tagged and transmitted over a wireless or cellular network to the organization's expense accounting system for processing before you even leave the restaurant.

Other capabilities are adding to the list of intriguing possibilities on a daily basis, such as the ability to use the mobile device's geotagging function to verify the location of that dinner, its document reading and editing utility to show and amend the materials that were discussed over coffee, a barcode app to manage inventory or for mobile marketing, or an electronic form to capture polling or rating information.

The flip side of this coin is mobile information access, or the ability to receive information on mobile devices -- which until recently had relatively small screens and keyboards, limited memory, and relatively weak processors compared to laptop or desktop computers.

Even though today's newest smartphones and tablets belie these once-established facts, the experience -- and the technology -- is different enough that accommodations must be made to ensure information is displayed properly and applications run well.

One common technique is to optimize your intranet, extranet, or Internet site for mobile access: which is to say, to enable the site to recognize the device as a mobile platform, and to serve up its pages in a format specific to mobile viewing: smaller pictures requiring less bandwidth and horsepower to transmit and render; different, more readable fonts; fewer or no scripts, etc.
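
A minimal sketch of that idea in Python follows; the device tokens and page variants are simplified placeholders, and real sites more often rely on responsive design or a dedicated device-detection library.

# Choose a mobile-optimized page variant based on the User-Agent header.
# Tokens and templates below are hypothetical.
MOBILE_TOKENS = ("iphone", "android", "ipad", "mobile")

DESKTOP_PAGE = "<html><body><img src='hero_large.jpg'><p>Full site</p></body></html>"
MOBILE_PAGE = "<html><body><img src='hero_small.jpg'><p>Mobile site</p></body></html>"

def render_page(user_agent: str) -> str:
    """Return the lighter mobile variant when the User-Agent looks like a handheld device."""
    ua = (user_agent or "").lower()
    return MOBILE_PAGE if any(token in ua for token in MOBILE_TOKENS) else DESKTOP_PAGE

if __name__ == "__main__":
    print(render_page("Mozilla/5.0 (iPhone; CPU iPhone OS 15_0 like Mac OS X)")[:45])
    print(render_page("Mozilla/5.0 (Windows NT 10.0; Win64; x64)")[:45])
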
Other, more mundane ways of pushing information out to mobile devices are by text, email, and -- let's not forget! -- voice, all of which are perfectly serviceable as conduits for receiving information and are commonplace in even the dumbest of smart devices.

On the other hand, the trend is accelerating toward developing native apps to run locally on mobile devices. This not only takes some of the load off the traditional server and network by putting the computing and communications engine right in your pocket, but it positions such apps for use in even the most remote environments -- including subways and airplanes, where an Internet connection is not a given.

For what is a mobile device these days if not the ultimate distributed, occasionally-disconnected computing client? Architecturally and philosophically, it is the logical extension of today's conventional wisdom. Technically and financially, though, the story is somewhat different, as the different platforms -- iGadget, Android, Windows Mobile -- are incompatible and thus require their own app versions! In some cases they are even specific to particular products in the same family (e.g., iPad vs. iPhone) or to particular versions of the same operating system (iOS 4 vs. 5).

There is an alternative, of course, which is to develop the app in question as a Web app, which like as not will work across all platforms. And eventually, the situation probably will resolve itself in the same way it did back when PC/Macintosh and Java/.NET were either/or considerations. But for now, it is leading many organizations to standardize on a single platform in order to exert some measure of control over the time and cost of acquiring, supporting, and developing software for the devices -- never mind the security and privacy of the information they carry.

Notification Techniques and Location-based Services

Mobile notification techniques involve applications driven by push
technology, or server push, which describes a style of Internet-
based communication where the request for a given transaction is
initiated by the publisher or central server. Email is probably the
most widely-used example of a push-enabled mobile notification
application. Push technology is contrasted with pull technology,
where the request for the transmission of information is initiated by
the receiver or client.

Mobile notification services are often based on information preferences expressed in advance. This technique is known as a publish/subscribe model; in it, a client subscribes to various information channels that broadcast information such as news and sports scores. Whenever new content is available on one of those channels, the server pushes that information out to the user.

Other push-enabled notification applications include market data distribution online (stock ticker information), chat/messaging systems (Webchats), auctions, online betting and gaming, monitoring consoles, and sensor network monitoring.

A Location-Based Service is one that makes use of a mobile device's geographical position to provide information relevant to the area in which the phone -- and presumably the user! -- is located. Examples include services that serve up a map and directions from the current point to a desired destination, or provide the names and addresses of nearby points of interest (museums, restaurants, hotels, historical houses, etc.), or utilize social media apps to inform the user which of their friends are nearby, and where they are -- and vice versa! They may also direct advertising to local customers and provide personalized weather services, or even allow participation in location-based scavenger hunts and other games.
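
The core of such a lookup is just a distance calculation over the device's reported coordinates; here is a minimal sketch in Python, with the venues and coordinates invented for illustration.

# Rank points of interest by great-circle (haversine) distance from the device.
# The coordinates and venue names below are hypothetical.
from math import asin, cos, radians, sin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Approximate distance in kilometres between two latitude/longitude points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))

POINTS_OF_INTEREST = [
    {"name": "City Museum", "lat": 41.8920, "lon": -87.6230},
    {"name": "Harbor Bistro", "lat": 41.8885, "lon": -87.6170},
    {"name": "Old Mill Hotel", "lat": 41.9015, "lon": -87.6340},
]

def nearby(lat, lon, pois=POINTS_OF_INTEREST, limit=2):
    return sorted(pois, key=lambda p: haversine_km(lat, lon, p["lat"], p["lon"]))[:limit]

if __name__ == "__main__":
    for poi in nearby(41.8900, -87.6200):
        print(poi["name"])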

Augmented reality is a term for a live direct or indirect view of a
physical, real-world environment whose elements are enhanced by
computer-generated sensory input such as sound, video, graphics,
or GPS data. It is related to a more general concept called
mediated reality, in which a view of reality is modified by a
computer. These are different than virtual reality, which replaces
the real world with a simulated one.

One familiar example is the use of the yellow "first down" line superimposed on the field in television broadcasts of American football games, which is applied to make clear how far the team on offense must advance the ball to receive a first down. Similarly, TV networks often display virtual advertising messages on the real walls of baseball parks.

Impacts on E-Commerce, Information Architecture and Usability

Mobile applications are exerting a significant influence on virtually every aspect of information management. Nowhere is this more apparent than in the realm of eCommerce, which is feeling the effect in several notable ways.

Location awareness: knowing where a person is by virtue of the GPS capabilities baked into many mobile devices means merchants can push highly-localized messages and special offers to customers, or develop campaigns that take local weather conditions, say, into account (as for an outerwear retail outlet).

Price and satisfaction transparency: the ability to conduct instant price comparisons and identify a local store offering a better deal than the one in which the user is standing is bringing an unprecedented transparency to pricing models and is empowering consumers as never before. The ability to post reviews of products and store experiences is doing the same for customer satisfaction.

Tap-and-pay capabilities: apps are now becoming available that enable customers to pay for things without having to carry cash or even credit or debit cards. Instead, the mobile device sends encrypted credit or debit card information to the given establishment's electronic payment system or the one embedded in a suitably-equipped vending machine.

Information architectures also are being forced to accommodate mobile devices, and much of this work has to do with developing requirements for connecting to and with a variety of models, operating systems, and user interfaces.

One way to minimize the effort is to create an application layer that provides the same data, security, and functionality to the user whether it's a native mobile or a Web app -- in other words, make the UI dependent on the device and mode of access, but keep the underlying data layer independent.

Mobile usability demands that applications -- be they native or Web apps -- take the characteristics of mobile devices into account. Some of these are limitations -- like the relatively small size of most smartphone screens -- but others represent opportunities for innovation, such as applications that take advantage of the touch screens and gesture-navigation these phones and their tablet cousins support so well.

Specific considerations include:

Page length and width, perhaps showing less information per
mobile device page
Image and text scaling, to ensure readability

Navigation controls and menus, especially as they relate to position, color, clickability, etc., and
Battery life

Key Links:

Background information on the CIP.
Practice Test -- Do a Self-Assessment.
Free videos to prepare for the test.
White paper on the CIP.
Register for the test.
Contact for more information: jwilkins [at] aiim.org




SECTION 5

Websites and Portals


In This Section...

1. Internet and Web Properties
and Principles
2. Web Content
Management (WCM)
Principles and Standards
3. WCM Tools
4. Website Usability
5. Means of Internet Access
















Internet and Web Properties and Principles

In case you somehow haven't already heard of it, the Internet is a global system of interconnected computer networks that use the standard Internet Protocol Suite (TCP/IP) to serve billions of users worldwide. It is a network of networks that consists of millions of private, public, academic, business, and government networks, of local to global scope, that are linked by a broad array of electronic, wireless, and optical networking technologies.

The Internet carries a vast range of information resources and services, including inter-linked hypertext documents, data files, and electronic mail messages. It is important to note that the Internet is not the World Wide Web, which can be thought of as the graphical front end to the original text-command-based medium. But because the Web is the public face of the Internet, it is commonly thought of as being the same thing, and protocols such as HTTP (hypertext transfer protocol), FTP (file transfer protocol), and Telnet (for remote computer access and control) have, perhaps invisibly, become part of our daily usage.

The Web is populated by Websites that are collections of text, images, videos, or other digital assets, organized into "pages," that are hosted on at least one Web server that is accessible via a network such as the Internet or a private local area network. The mechanism used to find one vs. another is known as a Uniform Resource Locator (URL). We know these from the "http" part of most Web addresses -- and now we know that "http" is the protocol and "www" actually stands for "world wide web."

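To see those pieces concretely, here is a minimal sketch using Python's standard library to pull a URL apart (the URL itself is a placeholder using the reserved example.com domain).

# Split a URL into the parts discussed above: protocol (scheme), host, path, query.
from urllib.parse import urlparse

parts = urlparse("http://www.example.com/products/widgets?color=blue")
print(parts.scheme)  # 'http' -- the protocol
print(parts.netloc)  # 'www.example.com' -- the host, including the "www" name
print(parts.path)    # '/products/widgets'
print(parts.query)   # 'color=blue'
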
Web portals are types of Websites but are more tightly focused in
that they function as a point of access to lots of information on the
World Wide Web by presenting material from diverse sources in a
unified way. Apart from the standard search engine feature, Web
portals can offer other services such as e-mail, news, stock prices, sports updates, and entertainment.

In a corporate environment, they enable enterprises to apply a consistent look and feel, and controlled access, to multiple applications and databases that otherwise would behave as different entities altogether. Examples of public web portals that can be customized and personalized include AOL, Excite, iGoogle, MSN, Netvibes, and Yahoo!

Intranets are computer networks that organizations use to securely share any part of their internal information. Typically utilizing the same Internet Protocol technology as the Internet itself, they may host multiple private Websites and constitute an important component and focal point of internal communication and collaboration.

Extranets, as may be surmised by the name, play essentially the same role but on an outward-facing basis. Extending a company's intranet to offer information to external users -- usually partners, vendors, and suppliers -- they allow controlled access from the outside for specific business or educational purposes.

Web content management tools enable intranets, extranets, and Web portals to offer users various capabilities, such as check-in/check-out, personalization, and other methods of customization and collaboration.

Web development is the broad term used to describe the development of a Website for the Internet, intranet, extranet, or Web portal. Areas of activity include design, content development, client liaison, client-side/server-side scripting, Web server and network security configuration, and e-commerce development.

In practice, many Web developers have several basic interdisciplinary skills/roles, including:

Graphic design/Web design

Information architecture

Copywriting/copyediting,

Web usability and accessibility, and



Search engine optimization

With the growing use of commercial and social media Websites as vehicles for marketing and e-commerce, more and more entrepreneurs and corporations are integrating their intranets, extranets, and portals with sites such as Facebook and Google. The integration process can go both ways, for example by integrating Facebook or Google Maps into an organization's Website and by allowing third-party sites to gain access to some content from the organization's extranet.

One popular way to do this is through the use of service connectors like Facebook Connect, which is the mechanism that allows you to create and sign into Web accounts using your Facebook credentials, rather than having to establish new identities each and every time. Twitter and Google have the same sort of capability, and for the user, the result is much greater convenience and usability.

Web Content Management (WCM) Principles and Standards

Web Content Management, or WCM, is a lot like content management in that it manages the integrity, revisions, and lifecycle of information -- except it specializes in content that is specifically destined for the Web. The key features of WCM systems are:

The ability to design and organize Websites to provide efficient and effective access to relevant and up-to-date content

The ability to control and prepare the content for publication, including orchestrating and controlling content evaluation and approval before publication on the Web site, and

The automation of key parts of the publishing process

Most systems today are designed to allow users with little knowledge of Web programming or markup languages to create and manage Website content with relative ease. Typically, they use a database to store page content, metadata, and other information assets that might be needed by the system, and administration is usually done through browser-based interfaces.

WCM comes into play whether the site is intended to be used internally by organizational personnel (an intranet), externally by business partners and other outside-but-business-critical parties (an extranet), or publicly on the Web itself (on the Internet). The Web content being managed may be different, however. For instance, intranets are generally as secure as any inside resource, and, in fact, need not even connect to the outside Internet at all. Informationally, they typically contain:

Work content to help people do their day-to-day jobs, including
accounting, IT, phone/email directories, and conference room
scheduling

HR information like benefits, the employee newsletter, vacation requests, and training availability

Corporate material including annual reports, governance, and press releases

Social content such as information about social events, company sports, and charitable activities

Extranets are also secured, but are opened for use by trusted external entities like distribution channel partners and critical suppliers. They typically contain content like: 1) virtual meeting rooms in which joint project teams can share documents, make comments, and ask questions; 2) product catalogs for reference by sales partners; and 3) procurement pages to facilitate customer ordering and account status review.

Internet sites, in contrast to the other two, are accessible to the general public, though they may have protected areas for customers and subscribers to use for self-service purposes. Information found here can encompass a little bit of everything, including:

Organizational background information

Service offerings

Product specs

Blogs

Financials, and perhaps most importantly for marketing reasons,

Contact info

As is the case with other information management systems, not every WCM solution provides the same functions as every other, or the same level of depth. So depending upon what you need, and what you anticipate needing in the time ahead, it is smart to consider the alternatives in terms of how different capabilities can be activated or added later on. Among these are:

Configuration: the ability to turn built-in features on and off, universally or selectively, by changing administrative settings

Extension: the ability to install modules of new functionality to the original solution, as when plugging new capabilities into an application platform

Customization: the ability to take what you're given and programmatically form-fit it to your specifications via supplied or purchased toolkits or Application Programming Interfaces

Integration: the ability to tie the WCM solution to others that have already been installed, either programmatically or by leveraging interoperability methods such as Web services

Talking about Web services brings us into the realm of Web standards, without which ready interoperability -- and even the Web as we know it -- would not exist. Generally speaking, Web standards compliance relates to a Web site's or page's officially correct use of HTML and xHTML for page construction, CSS stylesheets, and JavaScript for interactivity. Full standards compliance also encompasses the use of valid character encoding, RSS or Atom news feeds, RDF, metadata, XML, object embedding, script embedding, browser- and resolution-independent codes, and server settings. Foundational publications governing these and other attributes cover the likes of:

Proper use of HTTP and MIME to deliver pages, return data from them, and request other resources referenced, based on the IETF's RFC 2616

Properly formed names and addresses for pages and all other resources referenced from them (URIs), based on the IETF's RFC 2396
Recommendations for Document Object Models (DOM), from the W3C

Web Content Accessibility Guidelines from the W3C's Web Accessibility Initiative

Semantic Web work at the W3C related to the Resource Description
Framework (RDF), Gleaning Resource Descriptions from Dialects of
Languages (GRDDL), and Web Ontology Language (OWL).
WCM Tools

Web content management often begins with the use of a Web template, a Web publishing tool present in content management systems, software frameworks, HTML editors, and many other contexts. Essentially a Website layout design, it amounts to a ready-made empty site shell that allows people to create their own quality sites in a fraction of the time and cost it would take to design and build one from scratch. It may be important to note that most Web templates do not contain any type of programming or scripting since how such capabilities are deployed depends heavily on the Web server platform and applications involved. But their very nature enables a great deal of control over how pages are created and rendered.

Content on a Web page -- text, images, form fields, etc. -- can be made to change in response to different contexts or conditions. In these so-called dynamic sites, page content and page layout are created separately, and usually are managed as components rather than completed entities unto themselves. These components are then stored in a database and retrieved for placement on a Web page only when needed or asked for, according to predefined layout rules such as those imposed by the likes of PHP.
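To make the idea concrete, here is a minimal sketch (in Python rather than PHP, purely for illustration) of assembling a page from separately managed components at request time. The component store, layout, and content are all hypothetical.

```python
from string import Template

# Hypothetical component store standing in for a content database.
components = {
    "header": "<h1>Acme Widgets</h1>",
    "promo": "<p>Spring sale: 20% off all widgets.</p>",
    "footer": "<p>Contact us for details.</p>",
}

# The layout rule is kept separate from the content itself.
layout = Template("""<html><body>
$header
<main>$promo</main>
$footer
</body></html>""")

def render_page(store):
    """Pull components out of the store and place them into the layout."""
    return layout.substitute(store)

print(render_page(components))
```

Because the promo component lives in the store rather than in the page, an editor can change it once and every page that uses it picks up the change on the next request.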

This allows for quicker page loading, and it allows just about anyone with limited Web design experience to update their own Website via an administrative tool. This set-up is ideal when frequent changes need to be made to a site, such as in an e-commerce situation.

Beyond this, dynamic Web pages can further provide custom content for users, based on the results of a search, form fields filled or left blank, location, or some other parameter. Also known as "dynamic HTML" or "dynamic content," the "dynamic" term here is used when referring to interactive Web pages created for each user in contrast to the billions of static Web pages that do not change.

This puts sites firmly on the road to personalization, a process in which Web content is individualized by tailoring it on-the-fly to a particular user, based on the characteristics (interests, social category, context, etc.) of that Website visitor. Changes are based on implicit data such as items purchased or pages viewed, and can cause performance issues under heavy usage.

Web personalization models include rules-based filtering, based on "if this, then that" rules processing, and collaborative filtering, which serves up relevant material like books, music, and video to customers by comparing their own personal preferences with the preferences of other like-minded persons. Many companies offer Web recommendations and referral services that are based on anonymously collected user behaviors. Amazon, Google, Yahoo, AOL, and eBay are among the many high-profile sites that use personalization extensively.
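A toy sketch of the two models just named may help; all of the profiles, rules, and liked items below are invented for illustration.

```python
# Rules-based filtering: "if this, then that" processing.
rules = [
    (lambda visitor: "gardening" in visitor["pages_viewed"], "show lawn-care banner"),
    (lambda visitor: visitor["purchases"] > 5, "show loyalty-discount offer"),
]

def rules_based(visitor):
    """Return every action whose condition the visitor satisfies."""
    return [action for condition, action in rules if condition(visitor)]

# Collaborative filtering: recommend what the most similar user liked.
likes = {
    "alice": {"book_a", "book_b", "book_c"},
    "bob": {"book_b", "book_c", "book_d"},
    "carol": {"book_e"},
}

def similarity(a, b):
    """Jaccard overlap between two sets of liked items."""
    return len(a & b) / len(a | b) if a | b else 0.0

def recommend(user):
    others = {u: items for u, items in likes.items() if u != user}
    nearest = max(others, key=lambda u: similarity(likes[user], others[u]))
    return others[nearest] - likes[user]

print(rules_based({"pages_viewed": ["gardening"], "purchases": 7}))
print(recommend("alice"))  # -> {'book_d'}, because bob's tastes overlap most
```

Production recommendation engines are far more elaborate, but the division of labor is the same: explicit rules on one side, similarity between users on the other.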

Another tool in common use is Web-based workflow, an application that controls and drives the tasks, people, required inputs and outputs, and collaboration or transaction processes resulting from a Website interaction.

Each step in a workflow can be designed to launch one or more additional steps, which means that the tool can exert tremendous influence over how work gets done and thus how it can be improved. In the site design context, for instance, completing a layout can trigger simultaneous requests for artwork, content, and approval; or in a customer scenario, placing an order can kick off all at once the billing, fulfillment, and shipping processes.
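A bare-bones sketch of that triggering behavior, with hypothetical step names, might look like this in Python:

```python
# Each completed step launches the steps listed against it.
workflow = {
    "place_order": ["billing", "fulfillment", "shipping"],
    "complete_layout": ["request_artwork", "request_content", "request_approval"],
}

def complete(step, done=None):
    """Record a step as finished and trigger everything it launches."""
    done = done if done is not None else set()
    done.add(step)
    for next_step in workflow.get(step, []):
        print(f"{step} triggered {next_step}")
        complete(next_step, done)
    return done

complete("place_order")
```

A real workflow engine would add queues, assignments, and approvals, but the core idea is the same map from a finished step to the steps it kicks off.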

Optimizing workflow invariably improves usability by automating many tasks and providing simple interfaces to move things along, as well as collaboration by automatically linking the people and departments involved in ensuring Web site quality, control, and outcomes.

If all this sounds like a lot, don't despair! The good news is that most Web content management capabilities can be added on either to a solution you already have or to an otherwise out-of-the-box offering that you really like but wish could be made to take this one extra step.

For instance, Web services are a great way of tying systems together -- internally or even with outside organizations like business partners, which can be enabled to connect to and leverage your content without having to "screen scrape" it off your public Website. Or your repository can be made interoperable with theirs through the use of standards like CMIS (the Content Management Interoperability Services standard). Or, perhaps more simply, they can be given direct access to your back-end data using a log-in.

In any case, proper authentication and authorization techniques need to be applied to ensure your system's security is maintained. Underlying all this is a series of capabilities that enable these tools to mesh.

One of these is actually beyond the control of the organization since it involves the management of the settings of the user's Web browser. This requires only the use of simple check boxes to activate or deactivate certain options like whether Websites are allowed to run or install scripts or controls, display pop-up windows, or even access the Internet itself -- all of which have significant ramifications on what your users' experiences end up being.

Assuming the user enables it, scripting is a great way to manage and enhance the Web experience. A method of programming, it sends a sequence of instructions that result in the browser achieving a specific effect like "assemble the Web page out of these components" or "open a mortgage calculator" or "display the following banners." Web scripting languages include Perl and VBScript, and are often written to handle forms input or other Website services processed by the Web server. Script written in JavaScript, on the other hand, runs in the browser itself. In general, script languages are easier and faster to code in than the more structured and compiled languages such as C and C++.

Elsewhere, the Wireless Application Protocol (WAP) is a device-independent standard for providing mobile access to messaging
and Web-based services. It includes a wireless version of TCP/IP
and a framework for telephony integration such as call control and
phone book access, and it supports keypad and voice recognition
as well. WAP is independent of the air interface and runs over all
major wireless networks.



Website Usability

Website usability is an approach to making Websites easy to use. Upon arriving at your site, visitors should be able to intuitively relate the actions to be performed there with other interactions seen in the general domain of computing life -- e.g., when an icon or a hyperlink is clicked, something happens on the screen. No surprises, no specialized training. Just a familiar set of actions and results.

Broadly speaking, Website usability is tied to these best practices:

Present the information to the user in a clear and concise way.

Make sure that Website navigation is simple, straightforward,
and reliable.
Give the correct choices to the users in a very obvious way.

Eliminate any ambiguity regarding the consequences of an action, e.g., clicking, deleting, purchasing, etc.
Put the most important items in the most prominent places on a Web page or in a Web application.

A big part of usability is out-and-out accessibility, a term that
speaks to the fundamental ability to actually discern what is on the
site and enjoy a complete and rich viewing experience.

For example, best practice is to avoid the use of tables to lay out Web pages because different browsers -- and different versions of the same browsers -- can render them differently, and result in quite the mixed-up page. A reliance on multimedia plug-ins is a dicey way to go for similar browser-compatibility reasons, and for the simple fact that not everyone -- or every device -- has the ones you might require.

Other recommendations include:

Using standard color palettes to maximize the odds a visitor's screen will display them properly -- and avoiding the use of color coding or color contrasts alone to highlight information since people with color blindness may not be able to tell the difference

Embedding alt text in image tags so some information can be provided even if the image doesn't load (a simple automated check for this is sketched after this list), as well as avoiding the use of images as the sole means of navigation, supporting them by including text links as well, and

Using scripts with care so as not to limit the experience of visitors who have turned off their browser's scripting support
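As promised above, here is a small, hypothetical sketch of an automated accessibility check, using Python's standard-library HTML parser to flag images that lack alt text. The sample markup is invented.

```python
from html.parser import HTMLParser

class AltTextChecker(HTMLParser):
    """Flag <img> tags that lack an alt attribute (a basic accessibility check)."""

    def __init__(self):
        super().__init__()
        self.problems = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "img" and "alt" not in attributes:
            self.problems.append(attributes.get("src", "unknown image"))

sample = '<p><img src="logo.png"><img src="chart.png" alt="Q3 sales chart"></p>'
checker = AltTextChecker()
checker.feed(sample)
print("Images missing alt text:", checker.problems)  # -> ['logo.png']
```

Checks like this are no substitute for testing with real assistive technology, but they catch the most common omissions cheaply.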

Usability is often overlooked as a discipline because site designers and owners know their sites so well that it's easy to forget that first-time visitors don't have the same familiarity. Usability testing, therefore, is a very good idea, so you can gauge how easily a newcomer can find his or her way around. A non-functional requirement, usability cannot be directly measured, but it can be quantified by means of indirect measures like, say, tracking the number and frequency of reported problems with the interface, navigation, or vocabulary used on the site.

The preferred method for this is to observe actual users of a working system. Although there are many variants, one accepted way is to perform direct user testing, which involves: 1) Getting some representative users; 2) Asking them to perform representative tasks within the design; and 3) Observing and recording what they do, where they succeed, and where they have difficulties. It's important to test users individually and let them solve any problems on their own lest you sway the results by becoming involved yourself.

Usability can also be enhanced by integrating intranet, extranet, and portal sites with commercial sites like Facebook and Google. This integration process can go both ways; for example, integrating Facebook or Google Maps into an organization's Website and allowing third-party sites to access specific content from the organization's extranet. Larger businesses allow users within their intranet to access the public Internet through firewall servers, screening messages coming and going to keep security intact.

While we're on the subject, intranets typically feature a broad scope of functionality and a wide variety of content and system interfaces, and thus can be much more complex than their public counterparts. As a result, how usable they are is of special importance since a poor score in this regard can be a real drag on productivity.

Metrics such as those compiled when tracking page loads and time-on-site can be illuminating when seeking areas in need of particular improvement.

Means of Internet Access

For the most part, how users access the Internet is a function first of availability, and then of cost. Here are some of the most common methods:

Dial-Up uses a modem and standard telephone line, which can
be used either for calls or "Internetting," but not both at the same
time. The connection is made as needed, and maximum
speed does not exceed 56 kilobits per second (Kbps).

ISDN, Integrated Services Digital Network, also utilizes existing telephone lines but allows 64Kbps on a single channel. Two channels can be combined for a maximum of 128Kbps.

DSL (for Digital Subscriber Line) also uses telephone lines, integrating regular phone service and Internet access via a DSL hub so as to allow for an "always connected" situation in which voice calls don't interrupt data sessions. Download speeds can vary from 256Kbps to 6 megabits per second (Mbps), depending upon the level of service and the physical distance from the phone company's central office. Upload speeds are almost always much slower.

Cable modems employ cable TV coaxial cables rather than traditional phone lines, and require service from a cable TV provider. The connection to a computer is made via a network interface card (NIC) and an Ethernet cable. Speeds here theoretically can reach 30 Mbps, but most providers offer service with between 1 and 6 Mbps for downloads, and between 128 and 768 Kbps for uploads as the total bandwidth is divided among all the users in the given area.

T-1 lines are highly specialized telecommunications circuits that do not work over normal telephone lines. Popular in a large number of businesses for many years, they are divided into 24 channels that can be used for numerous purposes, but can be combined to achieve a maximum speed of 1.544Mbps.


Finally, there are satellite connections, which are made via satellites orbiting the Earth. In this arrangement, each subscriber's hardware includes a satellite dish antenna and a transceiver (transmitter/receiver) that operates in the microwave portion of the radio spectrum. Typical speeds are comparable to T-1 lines for downloading but are more similar to low-end DSL on the upload.

Once connected to the Internet, the next step is to connect to the information management resources you need in order to accomplish the tasks of the day. There are a number of ways to do this, including these:

A Virtual Private Network (VPN) essentially forges a secure "tunnel" through the public Internet. It usually requires remote users to be authenticated and often calls for the use of data encryption as well to prevent disclosure of private information to unauthorized parties. Once connected, the VPN user experience is exactly that of a user directly hooked to the network in the office, and supports any and all usual functions like file sharing, database lookup, and printing. Functionally, VPNs have eliminated the need to requisition and maintain expensive dedicated leased-line telecommunication circuits once typical of wide-area network installations.

Remote desktop is a feature in high-end versions of Windows (XP Pro, Vista Business, Win7 Pro, etc.) that allows a Windows computer to be run remotely from another Windows machine over any TCP/IP connection (dial-up, LAN, or Internet). Also called "Remote Desktop Connection," it uses the Remote Desktop Protocol (RDP) to exchange keystrokes, mouse movements, and screen changes.

Finally, a Citrix Server uses Microsoft Terminal Services software to deliver Windows applications to PCs, Apple computers, X terminals, and UNIX workstations, where each device plays the role of a "dumb terminal." This configuration enables users of those systems to access and use Windows programs.

In recent years, the notion of "remote access" has shifted to one of "mobile access" to take advantage of the growing sophistication of smartphones and tablet computers, classes of devices that both provide better Internet access and browser-based Web experiences today than their predecessors did.

As the trend continues, mobile browsers will gain increased direct access to mobile hardware (including accelerometers and GPS chips), and the speed and abilities of browser-based applications will improve. Persistent storage and access to sophisticated browser GUI functions also will appear, and the result will be an intensification of the emerging debate regarding browser-based and platform-specific native applications. About all that is certain is that both will continue onwards, as connecting in real-time to enterprise servers is a long-established practice on the one hand, and gathering and analyzing data (captured via electronic forms or direct input) in areas without a reliable -- or even present -- connection is crucial in certain situations on the other.

Other issues to be sorted out include interoperability and usability,
as multiple platforms, form factors, and interfaces still roam the
marketscape. This makes it difficult for organizations to know
which will be around for the long haul and thus to invest in. The smartphone's relatively small size also is problematic for general-purpose information management since display real estate is scarce and interaction with the screen requires a precision that can be tiresome.

Finally, the penetration of the intranet into all corners of an organization is causing the lines between the topics just discussed to blur. For instance, if enabled by the system admin, a user logging on to her internal network can as seamlessly access a business partner's product catalog or project chat room as she can the public Internet. In such a circumstance, the system itself could automatically establish a VPN to connect the two enterprises when the extranet link is clicked, and it would matter to nobody if she ditched her company-issued ThinkPad in favor of an iPad.

As the worlds draw inexorably together, and as technology continues to advance in terms of device processing power, availability of high-speed bandwidth, and cultural acceptance, the door will be flung open even wider to Enterprise 2.0 and Information Workplace capabilities than it is today. These combine traditional corporate computing with new-era social networking to create comprehensive and integrated user experiences, and unlock opportunities to share knowledge and act on it.

Key Links:

Background information on the CIP.
Practice Test -- Do a Self-Assessment.
Free videos to prepare for the test.
White paper on the CIP.
Register for the test.
Contact for more information: jwilkins [at] aiim.org




Plan and Implement

SECTION 1

Strategic Planning


In This Section...

1. Key Planning Components

2. Planning and Analysis Tools
and Types
3. Maturity Models,
Technology Trends, and
Internal IT Impact Analysis




















Key Planning Components

In a nutshell, strategic planning for information management involves defining an organization's information management strategy or direction, and making decisions about how to allocate resources to pursue this strategy. A proper information management program involves creating and
following policies and procedures based on the
principles that:

Information assets are corporate assets.

Information must be made available and
shared.



Information the organization needs to keep is managed and retained corporately.

Because information management is a corporate responsibility, an organization must require that all of its employees, from top to bottom, be held accountable to capture, manage, store, share, preserve, and deliver information in an appropriate and responsible manner.

In order to achieve maximum participation, an organization must
rely on executives to design a roadmap, convey a vision, persuade
people to buy in, and follow up to ensure compliance. Accordingly,
it is important that these executives recruit key stakeholders, from
both inside and outside the organization, whose input and support
will be vital to the success of the program. This can include, but is
not limited to, members of IT, Records, Legal, Business, and Human Resources departments. Although every organization has its
own particular needs and challenges, many will require someone
responsible for:

Data classification/retention

Data disposition

Backup and archiving

Legal hold and electronic discovery

Information technology

Training and human resources

This activity needs to take place in support of the business objectives that made information management a priority in the first place. Like as not, these include one or more of the following:
Regulatory compliance

Risk avoidance

Cost control/profit enhancement

Greater efficiency

Increased collaboration

Establishment of an environment that will support new initiatives

One way to develop and prioritize objectives is to use TimeManagementHQ's SMART framework, which requires that they be Specific, Measurable, Attainable, Relevant, and Time-bound.
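As a quick, hypothetical illustration, a proposed objective can be checked against the five SMART attributes before it goes into the plan; the field names and the example objective below are invented.

```python
SMART_FIELDS = ("specific", "measurable", "attainable", "relevant", "time_bound")

objective = {
    "specific": "Reduce average document retrieval time",
    "measurable": "from 4 minutes to under 1 minute",
    "attainable": "using the search platform already in place",
    "relevant": "supports the efficiency objective",
    "time_bound": "by the end of Q4",
}

missing = [field for field in SMART_FIELDS if not objective.get(field)]
print("SMART" if not missing else f"Not yet SMART, missing: {missing}")
```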
Business objectives can -- and should -- be both short and long
term. For example, an organization might want to do the following in
the short-term:

Design a system that is reliable and bug free

Provide users with timely and appropriate information in an
easy to use format, and
Improve the effectiveness of business operations

In the long term, it also may want to:

Design a system that is easy to maintain and update

Improve the productivity of the users and managers

Generate financial returns


A big part of the challenge is that information management programs can encompass many projects that each require a lot of time and effort -- and significant changes take place in technology every two or three years. So while long-term strategies can establish useful timelines and set target dates for important long-term investment decisions, they should be reviewed on a short-term periodic basis to determine what is still realistic and what may need to be changed.

Planning and Analysis Tools and Types

Strategic planning tools provide various methodologies for evaluating variables, both inside and outside an organization, that must be quantified and prioritized in order to create an effective plan. Some tools, like a Balanced Scorecard, are primarily focused internally, while others, such as PEST analysis, are more focused on external factors like market conditions and position. And there are some, such as a SWOT analysis, that attempt to strike a balance between the two.

Porter's Five Forces is a framework for industry analysis and business strategy development, created by Michael E. Porter of Harvard Business School in 1979, that derives five forces that determine a market's competitive intensity, and thus its attractiveness -- which in this context refers to overall profitability.

PEST analysis stands for "Political, Economic, Social, and Technological" analysis and describes a framework of factors an organization should take into consideration when seeking to understand market growth or decline, business position, business potential, and future direction.
SWOT stands for Strengths, Weaknesses, Opportunities, and Threats, and is used to evaluate how well an organization is prepared to achieve a particular objective based on positive and negative internal and external factors.

Popularized by Robert S. Kaplan and David P. Norton in the early 1990s, the balanced scorecard was designed to be "a more robust, general set of measurements that goes beyond the financials to capture the drivers of future value creation." More specifically, it is predicated on the notion that organizational evaluations should encompass not only traditional finances, but also the likes of customer satisfaction, internal process efficiencies, and the ability to innovate.






Today, the balanced scorecard is a semi-standard structured report that has been adapted for use in a wide variety of enterprise contexts, not the least of which is information management. Here, the idea is to use it to capture metrics that can help align and support key processes, and translate strategy into operational objectives, measures, targets, and initiatives.

These tools are especially effective when used with a business impact analysis, which is aimed at differentiating between critical and non-critical organization functions or activities. A function may be considered critical if the implications of damage or disruption to it are deemed unacceptable operationally, financially, or legally.

In the information management arena, such activities might include those involving data that affects revenue or expenses, process efficiency or effectiveness, organizational change, stakeholder expectations, or knowledge sharing, including the building of communities of practice.

Business impact analyses endeavor to assign two values to each critical function. The first is the Recovery Point Objective (RPO), which is the acceptable latency of data that will be recovered, a point set to ensure the maximum tolerable data loss for each activity is not exceeded. The second is the Recovery Time Objective (RTO), which is the acceptable amount of time to restore the function. It also is set to ensure the Maximum Tolerable Period of Disruption (MTPD) for each activity is not exceeded.

Once these values have been assigned, recovery requirements for each critical function can be set in place. Recovery requirements consist of both the business and the technical requirements for recovery of the critical function.
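To make the terms concrete, here is a hedged sketch of a business impact analysis record in Python: each critical function carries an RPO and an RTO, checked against its MTPD. Every function name and hour value below is hypothetical.

```python
critical_functions = [
    # (name, RPO in hours, RTO in hours, MTPD in hours)
    ("order processing", 1, 4, 8),
    ("payroll", 24, 48, 72),
    ("public website", 4, 12, 8),  # RTO exceeds MTPD, so this one gets flagged
]

for name, rpo, rto, mtpd in critical_functions:
    status = "OK" if rto <= mtpd else "RTO exceeds MTPD - revisit the recovery plan"
    print(f"{name}: RPO={rpo}h RTO={rto}h MTPD={mtpd}h -> {status}")
```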

Risk analysis is related to business impact analysis but focuses on potential risks and opportunities, rather than the most critical points of possible failure. It uses techniques involving analytic review and predictive analysis.

Analytic review is an auditing process that tests relationships and looks for unusual changes and questionable items among the factors being studied: in this case, business and environment factors that could put the organization and its information at risk.
Predictive analysis refers to various statistical and analytical techniques employed in the development of models that forecast behaviors or events. Depending on the type of predicted behavior or event, these models can take on a number of forms, but they will usually involve some method of scoring (e.g., a credit score). Data mining plays a large part in this as it is centered on analyzing data to find patterns, trends, and other connections (e.g., the likelihood a hurricane will destroy a data center in South Florida).

Among the items to consider when analyzing and quantifying
risks are the likes of:

Understanding the responses that are available in the event the risk becomes real
Gauging your organization's willingness to accept the risk, and

Determining its tolerance of the outcomes of the risk
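One simple way to quantify the items just listed is to score each risk as probability times impact and rank by the resulting exposure. The risks and numbers below are invented purely for illustration.

```python
risks = [
    {"risk": "data center outage", "probability": 0.05, "impact_cost": 500_000},
    {"risk": "regulatory change", "probability": 0.30, "impact_cost": 80_000},
    {"risk": "key staff turnover", "probability": 0.20, "impact_cost": 60_000},
]

# Expected exposure = probability of occurrence x cost if it occurs.
for r in risks:
    r["exposure"] = r["probability"] * r["impact_cost"]

for r in sorted(risks, key=lambda r: r["exposure"], reverse=True):
    print(f'{r["risk"]}: expected exposure ${r["exposure"]:,.0f}')
```

Ranked this way, the list makes it easier to discuss which exposures the organization is willing to accept and which demand a mitigation plan.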

Underlying all of the tools and techniques we just discussed is the
need to develop and utilize metrics at every opportunity -- and not
merely metrics, but metrics that can be applied most effectively to
the tasks at hand. According to The Data Warehouse Institute,
there are 10 attributes that make metrics effective. They must be:

1. Strategic, helping an organization monitor whether it is making
progress toward its goals
2. Simple, ensuring that calculations, targets, and what is being measured are understandable
3. Owned, so that someone is held accountable for its outcome



4. Actionable, so that corrective measures can be taken to improve
performance
5. Timely, so that action can be taken before it is too late

6. Referenceable, so users can have confidence in the data

7. Correlated, to ensure the metrics are driving desired outcomes

8. Game-proof, limiting the impact of potentially negative influences
9. Aligned, to avoid undermining corporate objectives through diffusion of energy

10. Standardized, to promote consistent use throughout an organization

Maturity Models, Technology Trends, and Internal IT Impact Analysis

Maturity models have become a popular means of "benchmarking" an organization's performance against that of others, or at the very least, of developing an understanding of how it can best improve its performance.

A few years ago, Baseline Magazine and the BTM Institute expanded their examination of companies attempting to improve information management to include a maturity model that cuts across four areas: Process, Organization, Information, and Technology.

Such an assessment typically involves baselining and benchmarking the organization's perceived status in each area (where it is now), targeting a future status (where it wants to be), and evaluating its progress along the way. Models vary in the number and type of stages and categories, but the thinking behind all of them is basically the same: if an organization doesn't know its current position and doesn't know its destination, it can expect a drawn-out and costly voyage.

Another important exercise is to understand the latest technology trends in and around your organization's areas of activity. By conducting an industry analysis, insights can be developed regarding which technologies are "hot" and which are "not," how competitors and collaborators may be viewing the world and what they are doing about it, and what this all may mean for you.

It is especially important to identify any potential disruptive technologies on the horizon -- those that are likely to transform your market or customer base or method of operation in the way, say, Web, social, local, and mobile technologies have. Individually and together, these have created a highly connected world in which people have more ways to create, access, and share information than ever before, and organizations must learn not only how to embrace the new opportunities created thereby, but also to mitigate the new risks (of competition, perhaps, or obsolescence, or noncompliance) that come along for the ride.

A third excellent introspection technique is the internal IT impact analysis, which is used to determine which of your IT-related services or assets (e.g., information, people, software, hardware, facilities, etc.) are most essential to protect. As a general rule, a service or asset is essential when disclosing, modifying, misusing, or destroying it will impede your organization's progress toward achieving its goals in mission-critical areas such as finance, compliance, reputation, and human safety.

According to Wikipedia, a Classical Life Cycle Impact Assessment contains the following elements:

Selection of impact categories, category indicators, and characterization models;
Classification of specific inventory parameters; and

Impact measurement, quantifying the damage that would result from an issue with particular assets

One last point to remember is that achieving operational effectiveness is not a substitute for sound strategic positioning. Although it is important to eliminate waste, save money, and limit risk, these outcomes are focused on day-to-day operations. Strategic positioning, on the other hand, is concerned with assembling a set of activities that deliver a unique combination of values, resulting in a competitive advantage that is not easily replicated.

Key Links:

Background information on the CIP.
Practice Test -- Do a Self-Assessment.
Free videos to prepare for the test.
White paper on the CIP.
Register for the test.
Contact for more information: jwilkins [at] aiim.org








SECTION 2

Building the Business Case


In This Section...

1. Clarifying Needs and
Metrics
2. Business Case and Risk
Analysis Elements and
Expertise
3. Metrics, Trade Studies
and Budgets


















Clarifying Needs and Metrics

Developing the business case for your information management initiative is one of the most critical tasks of all, for a poor job here means the project will be burdened with improperly set expectations at best, and terminated before beginning at worst.


















This activity centers on developing and communicating the business reasons for initiating the project or endeavor, and weighing the associated time and expenditure against the value of the project's specific outcomes. In taking it on, it is important to remember that a business case is not the same as a business plan. Where a plan is more


strategic and broad strokes, a case is more tactical and focused
on specific costs and advantages, in terms of both hard dollars
and soft benefits.

One of the first steps in making the business case is clarifying the
business need. This involves more than simply gathering data;
rather, it also mandates collecting opinions and perspectives from
all the relevant stakeholders -- supporters and detractors alike -- to
describe both the tangible and intangible benefits to derive from
the project.

This can be done using a variety of techniques, and like as not, you'll use more than one when the time comes. Among the more popular techniques are:

Simple observation

Brainstorming

Formal surveys

Personal interviews, and

Focus groups

Besides helping you to truly understand what their needs and desires are, spending time with stakeholders is crucial to helping them believe that your effort will result in something that will truly benefit them. This is about the best way there is to cultivate their buy-in to the initiative and any changes in process or technology that they'll have to accommodate -- and that buy-in is crucial to success since non-belief will breed skepticism and resistance, and ultimately will sour the whole experience no matter how good an idea it may be.
Part and parcel of clarifying the business need is developing an understanding of how both the current and future state of the organization are to be measured -- and thus, how to chart and manage the activity in between. It's very tough to determine where, when, why, and to what degree improvements are being made -- or regressions are cropping up -- if you don't have any current benchmarks to compare them to.

The challenge here is that there are many different kinds of
metrics to choose from, and it is important to know how to think
about each one.

Financially quantifiable and non-financially quantifiable data are exactly what they sound like: directly measurable types of information that either do or do not relate to financial performance. Cost decreases, income enhancement, and the time value of money are three common examples of financially quantifiable data. Used in cost-benefit analyses, these usually are involved in the setting of targets and dates by which those targets will be met.

Examples of non-financially quantifiable data include reduced processing times, shorter sales cycles, and quicker information retrieval. Numeric targets can be set for these types of benefits, to measure whether they are being delivered. For example, a metric can be set for the availability of information, and targets could be set at a departmental level to measure the number of "lost" records, or the number of times problems occurred when finding, retrieving, or sharing records.

Non-quantifiable or intangible data in general refers to information about something that isn't directly measurable but clearly has value to the organization. Examples here include improved customer satisfaction, shorter time to market, and better employee morale. Note that measurable targets can be set for these by using indirect measures -- like, say, a reduction in employee turnover rate in the case of the last example -- that indicate the benefit is being achieved.

Qualifiable data relates to measurements of "softer" issues like opinions and experiences, rather than "hard" factors like money, time, and simple counts of the number of times something happens. In the context of a Web site, the qualifiable data involves user satisfaction ratings or confidence levels, while the quantifiable data measures users' time-on-site, clicks, and successful task completion.

Sustainable data is that which can be communicated to and understood by unknown users, and unknown processing systems, today and at unknown times in the future. Related to format and systems obsolescence in archiving terms, it centers more on long-term relevance in the context of the business case, such as the ability to capture and compare certain system performance metrics (say, storage capacities and response times) today and 10 years from now.

Value-added data combines different types of data to deepen an organization's understanding of a given issue. Roughly equivalent to a Website mashup, it might take the server access statistics of a distributed content management system and roll up the times of day, length of time connected, and repositories touched by users in a given location in order to gain greater insight into the organization's network bandwidth requirements.

Business Case and Risk Analysis Elements and Expertise

The prototypical business case contains a number of standard elements, some overtly financial and others not so much. These include:

Costs, for the hardware, software, and human resources required

Return on Investment (ROI), or how long until the new solution
is paid for out of the savings achieved
Budget, or the amount of money to be allocated in given periods
of time, and the source or sources of that funding
Timeline, or an accounting of how the project and its associated
expenditures are expected to unfold
Key stakeholders, or the people inside and outside the organization who are most likely to be affected by the new solution
Business benefits, in terms of cost savings, process
efficiencies, revenue opportunities, etc., and
Scope, or the reach the new solution has across technology
stacks and operational departments

Other elements to provide include:

The strategic business vision, defining what the organization
sees as its future state
Strategic critical success factors (CSFs), or the high-level
business drivers or opportunities for the project
Strategic key performance indicators (KPIs), defining the statistics that need to be captured to measure and manage success or failure


Strategic success measures, refining the KPIs into specific targets to be met, the timetable to adhere to, and the scope of the organization to be involved (department, region, enterprise, etc.), and

Strategic change drivers, which are the primary motivations for
achieving the business vision, including any current difficulties
being experienced

These "difficulties" lead directly into the realm of risk analysis, a task
performed to identify and assess factors that may jeopardize the
success of a project. Details that should be addressed include:

An accounting of the technical and non-technical risks that have
been uncovered
The probability of those risks occurring -- particularly those associated with inaction
The impact on the project or organization should the risks occur

The probable cost of this impact, and

Actions to take to avoid and/or minimize such an occurrence

Effective business case development requires the following:

Clear and compelling writing and presentation skills

Thorough research skills to collect and analyze input from all affected stakeholders in the context of the stated business goals
Industry knowledge to use as a benchmark against which to compare the organization's current and future states
Knowledge of the organization's operations and critical business processes
An understanding of, and the ability to articulate, both financial and non-financial costs and benefits associated with the initiative in question

A deep understanding of the key audience members to whom
the business case will be made, and the skills to present the
case in that context

The ability to anticipate and answer objections, in real time and
as follow-up action items

Metrics, Trade Studies and Budgets

Making the business case for an information management project is critical on a number of levels, not the least of which is to support the development of metrics to determine the value of a given solution to your organization and, ultimately, the creation of a budget to pay for it.

There are several calculations that are commonly used to
perform this work:

Return on investment (ROI)

Payback period

Net present value (NPV), and

Internal rate of return (IRR)

The math involved with these can be complex, and is beyond the scope of this Guide. Suffice it to say, they are collectively aimed at quantifying the level of benefit that can be expected from the system and will depend upon or primarily emphasize: how long it will take to "pay for itself" out of improved efficiencies and cost savings, how much it's worth spending today to achieve those benefits in the future, and/or its effects on cash flows.
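For readers who want to see the mechanics, here is a rough sketch of the four calculations applied to an invented project: $100,000 up front and $40,000 of net benefit in each of the following four years. The bisection-based IRR is a simplification, not a production-grade routine.

```python
cash_flows = [-100_000, 40_000, 40_000, 40_000, 40_000]  # year 0 through year 4

def simple_roi(flows):
    """Total net benefit divided by the initial investment."""
    invested = -flows[0]
    return (sum(flows[1:]) - invested) / invested

def payback_period(flows):
    """First year in which cumulative benefits cover the investment."""
    running = 0
    for year, amount in enumerate(flows[1:], start=1):
        running += amount
        if running >= -flows[0]:
            return year
    return None

def npv(rate, flows):
    """Discount every cash flow back to today at the given rate."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(flows))

def irr(flows, low=0.0, high=1.0, tolerance=1e-6):
    """Find the rate at which NPV is approximately zero (by bisection)."""
    while high - low > tolerance:
        mid = (low + high) / 2
        if npv(mid, flows) > 0:
            low = mid
        else:
            high = mid
    return (low + high) / 2

print(f"Simple ROI: {simple_roi(cash_flows):.0%}")
print(f"Payback: year {payback_period(cash_flows)}")
print(f"NPV at 10%: ${npv(0.10, cash_flows):,.0f}")
print(f"IRR: {irr(cash_flows):.1%}")
```

Running it shows a 60% simple return, payback in year 3, a positive NPV at a 10% discount rate, and an IRR of roughly 22%; change the assumptions and the picture changes with them.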

In all of this, it is important to remember that there is a difference between operational effectiveness and strategic positioning, and the calculations and judgment calls you make must be aligned with whichever of these you have top-of-mind. It's fine to select a highly efficient and cost-effective solution that works perfectly, but if the reason you're acquiring it is to move your organization in a certain long-term direction -- rather than, say, simply to drive costs out of your business -- then you may not have applied the right formulas or made the right decision.

Strategic goals are those that are more macro and far-reaching, like, for instance, enabling new products to be brought to market faster than they can be today. From a solution standpoint, this may suggest the need for strong interoperability with the design systems that are in place in the R&D or engineering departments so prototype drawings, spec sheets, and other documents can be readily shared, commented upon, and approved.

Operational goals, on the other hand, are more granular and may be centered on particular departments or business processes. In the scenario just painted, wherein the strategic goal is paramount, this may mean the best solution is relatively light on enterprise workflow but is worth the tradeoff in order to speed the interaction with R&D.
As in so many other corners of information management, keeping
all your stakeholders up to date on the factors being considered
and the decisions being made is critical to your eventual success:
not only does it help solidify their buy-in, but it keeps you on
course and greatly mitigates the risk of losing time, energy, and
sleep by virtue of having to back up and revisit your assumptions.

This prioritization can be supported through the conducting of a trade study, or trade-off study, a practice bred in the aerospace industry that Wikipedia defines as "the activity of identify[ing] the most balanced technical solutions among a set of proposed viable solutions. These viable solutions are judged by their satisfaction of a series of measures or cost functions [that] describe the desirable characteristics of a solution. They may be conflicting or even mutually exclusive."

The Lyle School of Engineering at SMU identifies trade studies
as falling into three categories:

Controlled Convergence, a quick method to compare "primitive"
design variables
Cost Effectiveness, which links force structure implications to
top-level requirements analysis, and
Comprehensive, which considers all applicable decision criteria

It doesn't take much of a leap to translate this into information technology terms and apply them to:

Quickly comparing your solution alternatives to your most important requirements
Developing the metrics discussed a moment ago, and

Rolling all your other criteria into the equation
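A weighted scoring matrix is one lightweight way to run that kind of comparison. The criteria, weights, and scores below are invented for illustration only.

```python
# Weights reflect how much each criterion matters (they sum to 1.0).
criteria = {"interoperability": 0.4, "cost": 0.3, "usability": 0.2, "vendor_support": 0.1}

# Each candidate is scored 1-10 against every criterion.
candidates = {
    "Solution A": {"interoperability": 9, "cost": 5, "usability": 7, "vendor_support": 6},
    "Solution B": {"interoperability": 6, "cost": 7, "usability": 8, "vendor_support": 7},
}

def weighted_score(scores):
    return sum(weight * scores[name] for name, weight in criteria.items())

for candidate, scores in sorted(candidates.items(), key=lambda kv: -weighted_score(kv[1])):
    print(f"{candidate}: {weighted_score(scores):.2f}")
```

The arithmetic is trivial; the value lies in forcing the team to agree on the criteria and their weights before the scores are filled in.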

By the end of this activity, you should have a fairly solid idea as to
what your solution should cost, how quickly it should generate a
return, and what size that return should be. Now, you can turn
your attention to the budget itself.

Budget-building is as much art as science because of the organizational dynamics -- read: politics -- that are typically involved. So once again, establishing and maintaining a good rapport with all the stakeholders is an important piece of the puzzle.

Beyond the raw numbers associated with purchase, there are a
number of other factors to take into consideration, including any
costs (internal or external) related to:

Technical training: does your IT staff have the skills necessary to work with the new solution?
Existing infrastructure: do you have a sufficient number of servers, storage units, network connections, etc. -- of a sufficient capacity -- in place already, can you leverage some of what is there, or do you need to buy these as well?

Software licensing and maintenance: do you need to acquire
more licenses for any application with which the new system is
expected to interoperate, or for any new ones? Will your existing
maintenance agreements apply to the future-state solution?

User training: will you have to train end users on how to use the
new system?

All of these calculations need to be performed as well, and then plugged into a timeline so the impact on cash flow can be properly determined, and the conversations held with any departments expected to contribute to the initiative.
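A hypothetical sketch of that roll-up might simply sum each cost line by year so the cash-flow impact is visible at a glance; every figure below is invented.

```python
# Cost lines by year (year 1, year 2, year 3), in dollars.
costs = {
    "licenses": [50_000, 10_000, 10_000],
    "infrastructure": [30_000, 0, 0],
    "technical_training": [8_000, 0, 2_000],
    "user_training": [12_000, 3_000, 3_000],
    "maintenance": [0, 9_000, 9_000],
}

years = len(next(iter(costs.values())))
for year in range(years):
    total = sum(line[year] for line in costs.values())
    print(f"Year {year + 1}: ${total:,}")

print(f"Three-year total: ${sum(sum(line) for line in costs.values()):,}")
```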

Key Links:

Background information on the CIP.
Practice Test -- Do a Self-Assessment.
Free videos to prepare for the test.
White paper on the CIP.
Register for the test.
Contact for more information: jwilkins [at] aiim.org

SECTION 3

Implementation Planning


In This Section...

1. Project Planning

2. Software Development
Methodologies
3. RFIs and RFPs
4. Management Statement
of Work, Procurement,
and Scope


















Project Planning

Project planning involves a number of critical steps in which the initiative's objectives are initially broken down and preliminary estimates are prepared regarding costs, timetable, and required resources. Effective plans incorporate ideas from both the people executing them and
those having it inflicted upon them. Hence, it's important that they include ways to resolve conflicts between these two groups and among all other affected stakeholders as well.

The first step is to determine the project scope,
which is the work that needs to be accomplished
to deliver the specified features, functions, and,


ultimately, results. A good job here makes it easier to detect and deter scope creep, which is not an unlikeable person but rather a term that refers to the incremental expansion of the project to include or introduce things that may not have been a part of the initial spec, and do not lead to the adjustment of the schedule, resources, or budget.

Next, the identified tasks are decomposed, or broken down into smaller tasks, and grouped by logical dependencies in order to determine which can be done in parallel, and which are those whose completion is required before further progress can be made. Those falling into the latter category are said to be on the project's "critical path" and generally are given priority when their time comes in order to ensure no bottlenecks arise that can interfere with on-time completion.
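A minimal sketch of that decomposition, with invented tasks and durations in days, can show how parallel work and the longest dependency chain fall out of the data. Real scheduling tools add calendars, resources, and float, none of which is modeled here.

```python
from functools import cache

tasks = {
    "requirements": {"duration": 5, "depends_on": []},
    "design": {"duration": 10, "depends_on": ["requirements"]},
    "build": {"duration": 20, "depends_on": ["design"]},
    "train_users": {"duration": 5, "depends_on": ["requirements"]},  # runs in parallel
    "deploy": {"duration": 2, "depends_on": ["build", "train_users"]},
}

@cache
def finish(task):
    """Earliest finish day for a task, following its dependency chain."""
    start = max((finish(dep) for dep in tasks[task]["depends_on"]), default=0)
    return start + tasks[task]["duration"]

print(f"Earliest completion: day {max(finish(t) for t in tasks)}")
```

With these numbers the chain requirements, design, build, deploy drives the 37-day finish, so those tasks sit on the critical path while user training has slack.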

Now that the tasks have been identified and their priority determined, estimates can be prepared regarding schedules, costs, and resources (technical and human), for each task and overall. The result here is a baseline against which future progress -- and possible instances of scope creep -- will be measured.

All of this is prelude to the nitty-gritty of implementation planning, which specifies in detail how a solution actually is to be designed, built, tested, and deployed.

According to the MIKE2 open source standard for information management, this should be an iterative process after a point, as feedback from earlier activities informs and improves those that follow. In MIKE2, there are five basic phases to be planned and executed upon, as follows:
Phase One is the Business Assessment, which forms the basis
of the strategy for the entire implementation.
Phase Two is the Technology Assessment, which determines
where your organization is in terms of its current technology,
where you want to go with the technology that ultimately will be
deployed, and where the gaps are between the two.

Phase Three involves the Information Management Roadmap, which is built by gathering everything that has been assessed in the first two Phases, and

Phases Three, Four, and Five are approached as an iterative loop
in which solution build-out leads to design increments based on
the roadmap and foundation activities, iterative development and
testing, deployment, and continuous improvement.

Software Development Methodologies

The waterfall (or cascade) framework involves a sequential process in which each stage generally must be completed before the next can begin. Although it allows for review to make sure each one has been completed, and completed properly, it discourages revisiting stages once they are complete. As such, it has been criticized as being too rigid, especially compared to more flexible methods like iterative frameworks.

Iterative frameworks call for the periodic, if not frequent, review and revision of work done, promoting development through evolutionary advancement. Iterations typically will occur at an incremental level, as with a "time-boxed mini-project," though some methods are iterative but not incremental (e.g., Spiral).




Agile and Lean are among the many iterative development methods.

Agile software frameworks develop solutions and requirements through the use and collaboration of self-organizing, cross-functional teams that utilize these techniques to do their work:

Adaptive planning

Evolutionary development and delivery

A time-boxed iterative approach

Rapid and flexible responses to change

Lean development takes its cues from Toyota's lean manufacturing strategy, and was first proposed by Mary and Tom Poppendieck in their 2003 book "Lean Software Development: An Agile Toolkit." As in car building, the idea is to "manufacture" code with as much creativity and as little wasted energy as possible. This methodology can be broken down into seven principles:

1. Eliminate waste

2. Amplify learning

3. Decide as late as possible

4. Deliver as fast as possible

5. Empower the team

6. Build in integrity

7. See the whole
The spiral method is both top-down and bottom-up in its approach to development, combining aspects of the waterfall framework and prototype modeling. Spiral is also both iterative and incremental, and includes elements of risk management by virtue of identifying both technical and managerial risks. Beginning in the middle, every trip around the spiral passes through the following task regions:


Determine the objectives, alternatives, and constraints on the
new iteration
Evaluate alternatives and identify and resolve risk issues

Develop and verify the product for this iteration

Plan the next iteration

Rapid Application Development methods (RAD) design systems by using structured techniques and prototyping to define user requirements. In the first stage, preliminary data and business process models are developed; in the next, the requirements are verified through prototyping. Then these stages are repeated iteratively. RAD approaches can trade performance and functionality for quicker development and better application maintenance, so care must be taken that too much quality is not sacrificed in the name of speed.

RFIs and RFPs

RFI stands for Request for Information. A formal document issued
in an early stage of a procurement process, it usually is a prelude
to an RFP, which is a Request for Proposal.





RFIs are sent to potential suppliers in order to gather information for use in an equitable and simultaneous vendor comparison. They also are often used as a solicitation and are sent to a broad base of prospective vendors, VARs, integrators, consultants, and other potential responders for the purpose of conditioning their minds for any forthcoming RFP.

Among the typical varieties of information requested are these:

Vendor essentials, including facilities, finances, attitudes, and
motivations
Vendor strategic focus, business strategies, and product plans

Breadth and width of product/service offerings, by supplier

Market conditions and trends

Alternative pricing strategies

Additional sources

A well-written RFI should be short, sweet, and to the point. Since it comes with no promise of an opportunity to bid on the project, respondents appreciate a conciseness that allows them to spend a minimum of time and money in answering the request. Here is a list of items that RFIs typically include:

A Statement of Need that clearly defines your project and articulates its goals and objectives. Four or five sentences should be enough for a small or medium-sized project.

A Background section consisting of a paragraph describing your
organization
A List of Qualifications that you expect your prospective suppliers to possess
An Outline of Information Requested that encompasses everything you want to know. Vendors hate it -- and can hold it against you later on -- when important information is left out and they are asked to supply additional information for the same RFI.

Evaluation Criteria so respondents know on what basis you will be making a decision. Be sure to include an insurance and bonding requirement and a non-disclosure agreement to which the candidate must adhere.

Timetable for Response so everyone knows when you require a response and how long you will take to review the material received

Your Contact information, including email and phone number, so that vendors may talk freely with you. Where RFPs generally have "no contact" rules (except for requests for clarification), RFIs should encourage any interaction that produces useful information. It is not uncommon to learn from vendors' questions what information was left out, something that could save a lot of rework at RFP time.

And speaking of RFPs, these are formal invitations that are sent to a broad base of prospective vendors, VARs, integrators, consultants, and other potential suppliers, often through a bidding process, to submit a proposal on a specific product or service. The RFP process brings structure to the customer's procurement decision and allows the risks and benefits of a project to be identified clearly

up front. RFPs dictate to varying degrees the exact structure and format of the supplier's expected response. Those that are the most effective reflect the strategy and short/long-term business objectives of the customer, and provide detailed insights upon which vendors will be able to build their solutions. A good RFP has these characteristics:

Informs suppliers that you are looking to buy and encourages
them to make their best effort
Specifies what you propose to purchase. Any requirements analysis can be incorporated quite easily.
Alerts suppliers that the selection process is competitive

Allows for wide distribution and response

Ensures that suppliers respond factually to the identified requirements
Details the evaluation and selection procedure

RFPs should include more than an inquiry about price. Questions about basic corporate information and history, finances, technical capability, product information such as stock availability, estimated completion period, and availability of customer references are all within bounds, as is a request for the professional credentials of anyone who will be working on the project.

Common elements include:

Your company background, so respondents can get a feel for
who you are
A project description summarizing your key problems, opportu-
nities, and goals
Design requirements, perhaps including information pertaining
to how the project deliverable will operate
Technical and infrastructure requirements, including the likes of
the applications, databases, servers, and other components to
be accommodated or worked around

Functional requirements, including a short description of each
moving part
Estimated project duration or required completion date

Assumptions and agreements, such as a limit on the bid
amount, the non-returnability of proposals, and your right to
dismiss any proposal for any reason

Submission deadline and return-to information

Allowed contacts, either one for all clarifications or several for different disciplines, such as one for technical questions and another for business inquiries

Basis for Award of Contract information, so respondents understand your evaluation criteria
Anticipated Selection Schedule

Management Statement of Work, Procurement, and Scope

A statement of work is a document that describes the work that
needs to be done in an information management project, detailing
with contractual precision such items as:



Major deliverables

Specific deliverable timeframes

Tasks and jobs

Staff assignments

Financial resources and facilities required

Sources of funding

In cases where vendors or contractors actually are to be engaged, pricing and terms are also sometimes specified, if not already indicated in a pre-existing contract.

Because the statement of work is so significant as a planning and, ultimately, legal document, it is absolutely critical that it be prepared properly and accurately, and include inputs and guidance from all affected departments. While the specifics will vary from organization to organization, here are a few simple-sounding roles and responsibilities that should be part of every such initiative:

The project coordinator plans out the SOW and its parts

An author or authorship team creates a draft, sometimes by parsing out the sections to individuals with specific expertise
Reviewers then check it over, offering comments and making
changes as necessary
A designated editor analyzes and synthesizes the inputs, asking
questions where necessary, and incorporates the changes into
the final document
Appropriate managers then approve and sign it

Preparing the statement of work is a precursor to actually procuring the services needed to get that work done. While price is obviously an important consideration, timetable, reputation, and resources (yours and the service provider's) also are key elements to weigh carefully. Most times, the procurement process can be broken into five phases (a small illustrative sketch follows the list):

1. Defining the business need: This step entails capturing detailed business requirements and formulating them into a requirement statement, identifying and committing resources, and obtaining stakeholder buy-in to the project.

2. Developing the procurement strategy: Here, timescales must be understood and the current business environment -- inside and outside the organization -- must be evaluated. A list of possible suppliers is also readied, and an approach to each of them is prepared.

3. Supplier evaluation and selection: Here, a short list of suppliers and related solutions is selected to be taken to final negotiations.

4. Negotiation and award of contract: This is where the deal actually gets done.

5. Induction and integration: Here, it is important to make sure that the supplier is fully prepared and able to deliver all aspects of the contract, and that all the relevant performance measures and reporting mechanisms have been put into place.
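
To make the sequence concrete, here is a minimal sketch, in Python, of how these five phases might be tracked so that no phase gets skipped. The class and field names are illustrative assumptions, not part of any particular methodology or product.

from enum import IntEnum

class ProcurementPhase(IntEnum):
    # The five phases described above, in order.
    DEFINE_NEED = 1
    DEVELOP_STRATEGY = 2
    EVALUATE_AND_SELECT = 3
    NEGOTIATE_AND_AWARD = 4
    INDUCT_AND_INTEGRATE = 5

class ProcurementProject:
    # Minimal tracker that advances one phase at a time.
    def __init__(self, name):
        self.name = name
        self.phase = ProcurementPhase.DEFINE_NEED
        self.history = [self.phase]

    def advance(self):
        if self.phase == ProcurementPhase.INDUCT_AND_INTEGRATE:
            raise ValueError("Procurement is already in its final phase.")
        self.phase = ProcurementPhase(self.phase + 1)
        self.history.append(self.phase)
        return self.phase

project = ProcurementProject("ECM replacement")
project.advance()           # DEFINE_NEED -> DEVELOP_STRATEGY
print(project.phase.name)   # DEVELOP_STRATEGY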




Once the project is underway, guarding against and managing "scope creep" is vital to keeping a project on schedule and on budget. All too often, project managers find their projects "growing" as tasks not originally planned make their way onto the list of things to do.

One of the simplest ways to mitigate the effects of scope creep is to make sure that the scope is clearly and thoroughly defined in the beginning, communicated to all of the relevant participants, and then referenced at every point along the way. This isn't to say that additional work won't become necessary as the weeks go along. But establishing boundaries and implementing vetting processes go a long way toward keeping the project from becoming unmanageable.

When changes do need to be made -- say, upon discovery of an error or omission in the statement of work, or a regulatory change -- then a change order procedure should be followed to document the particulars and limit the work to the tasks defined. In this day and age, this process is managed electronically so all the attributes of records management can be brought to bear on it.

The process itself is fairly straightforward, typically involving stating what the changes are, the reasons they must be made, and the estimated impact on costs and schedule. When approved, the change order will serve as an official modification to the original agreement and become part of the new plan.
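
To illustrate, here is a minimal sketch, in Python, of the kind of change order record such an electronic process might keep. The field names and the approval step are assumptions for illustration only, not a prescribed schema.

from dataclasses import dataclass, field
from datetime import date
from typing import Optional

@dataclass
class ChangeOrder:
    # One change order against a statement of work.
    order_id: str
    description: str            # what the change is
    reason: str                 # why it must be made (error, omission, regulation, ...)
    cost_impact: float          # estimated impact on cost
    schedule_impact_days: int   # estimated impact on schedule
    requested_on: date = field(default_factory=date.today)
    approved_by: Optional[str] = None

    def approve(self, approver):
        # Once approved, the order becomes an official modification to the agreement.
        self.approved_by = approver

co = ChangeOrder("CO-001", "Add an OCR step to the capture workflow",
                 "Regulatory change requires searchable text", 12000.0, 10)
co.approve("Project Sponsor")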

Key Links:

Background information on the CIP.
Practice Test -- Do a Self-Assessment.
Free videos to prepare for the test.
White paper on the CIP.
Register for the test.
Contact for more information: jwilkins [at] aiim.org











SECTION 4

Requirements Definition


In This Section...

1. Requirements Definition































Requirements Definition

Business requirements are the operational needs called for by, well, business users, and they usually are expressed in terms of broad outcomes, rather than specific functions. Although design standards may be referenced, specific design elements are usually outside the scope here, as the exercise is not primarily concerned with how to meet the expressed needs through IT development.

Best gathered early in the solution development cycle, business requirements may encompass prerequisites for both functional and non-functional requirements.


Functional requirements are those that describe the actual functionality required to accomplish the stated business requirements: e.g., integration between the capture engine and the repository, out-of-the-box compliance with Dublin Core metadata management, LDAP-enabled security management, etc.

Non-functional requirements are more qualitative in nature and may include the likes of a certain interface look-and-feel, the solution's Total Cost of Ownership, the solution provider's knowledge of your vertical industry, etc.

System requirements, on the other hand, lay out the expectations for technology and infrastructure, and can be more granular than their business counterparts. These, too, should be gathered early in the solution development cycle, and may relate either to the technology itself or the domain of expertise.

Technical requirements include those regarding the likes of performance, maintainability, adaptability, reliability, availability, security, and scalability.

Domain requirements reflect needs that are endemic to the area in
which the system or organization is active; for instance, a records
management system requirement for a military agency likely will
call for compliance with the standard DoD 5015.2, which is specific
to both records management and the military.

Gathering requirements of both types can be done using a variety of techniques, and like as not, you'll use more than one when the time comes. Regardless of the precise mix, it will be important to capture the inputs in a way that is fully traceable, so the source of each requirement can be found and any changes made can be tracked.
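
As a small illustration of that traceability, here is a sketch, in Python, of a requirement record that carries its type, its source, and a change history. The field names and type labels are assumptions chosen for the example, not a standard.

from dataclasses import dataclass, field
from typing import List

@dataclass
class Requirement:
    # A single requirement with enough metadata to stay traceable.
    req_id: str
    req_type: str        # e.g. "business", "functional", "non-functional", "technical", "domain"
    statement: str
    source: str          # where it came from: interview, survey, workshop, document, ...
    history: List[str] = field(default_factory=list)

    def revise(self, new_statement, changed_by):
        # Record the old wording before replacing it, so every change can be traced.
        self.history.append(f"{changed_by}: {self.statement}")
        self.statement = new_statement

r = Requirement("REQ-017", "functional",
                "Capture engine must pass metadata to the repository",
                "Interview with the records manager")
r.revise("Capture engine must pass Dublin Core metadata to the repository", "J. Analyst")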

Among the more popular techniques are:

Simple observation

Brainstorming

Formal surveys

Personal interviews

Focus groups

Scenario development

Personas, which are possibly fictitious characters created to represent different user types
User analysis, a similar creature identifying and characterizing
potential users of a system
Task analysis, studying how work gets done

Gap analysis, comparing actual with potential performance

It is also important to involve representatives of all stakeholder communities in the requirements development process. These are people who have a valid interest in the process, whether or not they are affected directly by it or even work for the organization itself. They may include the likes of:

Senior management

Business unit managers




Legal staff

Records managers

IT personnel

End users

Business partners

Investors

The last major piece of requirements-gathering involves inventorying the existing system to identify any dependencies on other systems, or other systems that depend on it. This way, care can be taken to manage the impact on applications and databases that may be outside the direct scope of work but nonetheless will be affected by it. Besides analyzing current use, a study should be made of potential future overlaps as well, so as clear and predictable a road map as possible can be developed.

Key Links:

Background information on the CIP.
Practice Test -- Do a Self-Assessment.
Free videos to prepare for the test.
White paper on the CIP.
Register for the test.
Contact for more information: jwilkins [at] aiim.org













SECTION 5

Solution Design


In This Section...

1. Solution Design

Solution Design

In the context of information management, the term solution design encompasses activities relating to a process or product that is intended to meet specific organizational and user requirements. Understanding what these are generally involves a sequence of major steps:

Conducting an information needs assessment

Performing a requirements analysis to determine the features and functions required to meet those needs, and



Developing change procedures to capture, evaluate, and make
any modifications deemed necessary once the solution design
is done

Solution design itself consists of two key components: information
architecture and system architecture, both of which need to be
based on business requirements that are developed in parallel
with the just-mentioned steps.

According to the Information Architecture Institute, information architecture is "the structural design of shared information environments," a succinct phrase that encompasses the organizing and labeling of Web sites, intranets, online communities, and software to support usability and findability.

IA's primary objective, says IA consultant and guru Louis Rosenfeld, is "balancing the characteristics and needs of users, content, and context," explaining that, among other things, it can "make work easier and save money for individual business units; and improve the user experience and build brand loyalty among customers, and organizational loyalty among employees."

Information architects are thus tasked with optimizing a tricky three-way balance, and must be familiar with the intentions and mindsets affecting all three areas.

System architecture translates the logical design of an information system into a physical structure that supports it by properly orchestrating such basic components as servers, applications, storage systems, network support, workflows, and security. A key part of this has to do with enabling integration and interoperability so all these components can communicate regarding the information they are involved with. If this sounds complex, that's because it is! Fortunately, though, a number of standards have been developed to help facilitate the process.

One of these is the Content Management Interoperability Services (CMIS) specification, which can be used to improve interoperability between enterprise content management systems. Single sign-on tackles the issue as well, though from an entirely different direction, by allowing users to access all the systems for which they are authorized by logging in only once. Not a technical standard, it's a best practice that not only makes users' lives easier by freeing them from having to remember, and take the time to enter, multiple passwords, but also lowers the IT costs associated with password support.
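
To make the CMIS idea a bit more tangible, here is a minimal sketch, in Python, of querying a CMIS-compliant repository with the open-source cmislib client. The service URL, credentials, and query are placeholders, the endpoint path varies by product, and property access may differ slightly between cmislib versions, so treat this as an assumption-laden illustration rather than vendor guidance.

# pip install cmislib
from cmislib import CmisClient

# The CMIS service URL differs by product; this path is only an example.
client = CmisClient('http://example.com/cmis/atom', 'username', 'password')
repo = client.defaultRepository

# CMIS defines a SQL-like query language that works the same way
# against any compliant repository.
results = repo.query("SELECT * FROM cmis:document WHERE cmis:name LIKE 'Invoice%'")
for doc in results:
    # Property names come from the CMIS specification itself.
    print(doc.properties.get('cmis:name'), doc.properties.get('cmis:objectId'))

The point is less the particular library than the fact that the same query could be pointed at any repository that implements the specification.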

Solution design includes a raft of other considerations as well, such as whether you buy a system or build one yourself. The latter can provide a more custom fit, performing at just the right level and giving you total control over all you have created. But the time, expertise, and, ultimately, cost this can entail may be prohibitive compared to simply acquiring the fruits of someone else's labor -- but that, of course, means you end up with someone else's vision of what the features and functions ought to be, and in what relative strength (for instance, the presence, absence, or mix of enterprise search, imaging, workflow, records management, collaboration, etc.).

Another big question has to do with whether to implement a point solution that addresses a certain specific need, or go the platform route that permits the plugging in of capabilities that address different needs as they arise. And then there's the matter of using open source or proprietary software. Open source solutions, of course, are not controlled by a single vendor, and feature the ability to modify the code (as long as those modifications are made available to the general community). And while they may be less expensive to acquire -- even free, maybe -- they may not enjoy the same level or consistency of support as a proprietary offering, and may or may not be as secure and standardized. And either way, there are decisions to be made regarding whether they're workable out-of-the-box or will require customization.

If this isn't enough, options for systems deployment are varied too. Do you want to host and staff it in-house and take direct charge of its support, maintenance, integration, and upgrading? Do you want to retain an outside service provider to handle these tasks but do so on your premises? Or would you prefer to have it handle the hosting too, and simply connect your network to its servers?

This last option brings you dangerously close to cloud computing, which is coming to be a catch-all phrase to describe nearly any kind of outside service bureau but can involve many discrete variations, not the least of which is whether your instance of the solution lives on a dedicated or shared host, and whether it was even designed to support self-provisioning and remote access and sharing.

Some of the major competing interests will be cost, performance, scalability, compatibility, and security, and how they are weighted depends entirely upon your organization's strategy and objectives.

Key Links:

Background information on the CIP.
Practice Test -- Do a Self-Assessment.
Free videos to prepare for the test.
White paper on the CIP.
Register for the test.
Contact for more information: jwilkins [at] aiim.org







SECTION 6

Change Management


In This Section...

1. Change Management
Techniques and Principles
2. Transition Planning
Strategies
3. Organization
Readiness, Governance
and User Support


















Change Management Techniques and Principles

Change management is the process of guiding an organization from a current state to a future state that is aligned with a desired business outcome.

Reasons for its adoption include a change in the organization's mission, a restructuring of its operations, the hiring of a new senior executive, the adoption of new technology, and a post-merger reorganization.

Centered as it is on human comportment, change management is one of the more challenging technology implementation tasks, as it is focused on taking a structured and controlled approach to transforming the values, priorities, and behaviors of individuals so they align with and enable that defined future state.

Change management planning should begin with a clear articulation of why a change is even needed -- without this, buy-in from all affected stakeholders is extremely difficult to achieve, and any strategic direction is very difficult to maintain.

The next steps involve a series of activities aimed at preparing
for, managing, and ultimately reinforcing change. The research
firm Prosci describes one methodology:

Phase 1 is aimed at getting ready and provides the situational
awareness that is critical for effective change management.
Phase 2 focuses on creating the plans that ultimately devolve into the project's actual activities -- what people typically think of when they talk about change management.

Finally, Phase 3 is all about ensuring that the change is sustained
and incorporates the development of measures and mechanisms
to see if the change has taken hold.

The key to making all this work is communication. Because change breeds uncertainty about the future, and human beings generally dislike uncertainty, the more people know about what's going on -- and the more opportunity they have to influence it -- the more comfortable they become with the eventuality.

What this communication is, and what form it takes, depends upon the nature of the message being delivered, and the roles of the people needing to deliver and receive it. For example, a senior manager or executive sponsor likely would communicate directly to his subordinates and perhaps send a broad message of support to the whole organization; individual line workers, on the other hand, would get their marching orders from their immediate supervisor, who may narrow the message into only those things that affect them.

Best practices dictate that several varieties of communication should take place during the different phases of activity. For instance:

Notice of the coming change is something that should be given to all stakeholders whenever possible so as not to spring a potentially unhappy surprise upon them and to begin boosting their awareness so their expectations can align with the future vision.

Progress reports should be provided throughout, starting with an articulation of the reasons for change, including how both the stakeholders and the organization will benefit from successful implementation. It also may be prudent to communicate details of the changes, including those affecting roles, locations, processes, technology, and costs.

Training should be conducted so stakeholders are thoroughly prepared to understand, embrace, and function in the new environment. In particular, introducing and orienting users to the new or changed systems and processes will help to ensure a smoother rollout.

Incentives should be created to motivate stakeholders to participate and ultimately buy into the change initiative.

And audits should be conducted after the change is implemented to analyze its impact on both organizational functions and stakeholder sentiment. If resistance or an impediment is discovered, it can be addressed by revisiting the communication, training, and incentives programs just mentioned.

Most every successful change management program has an organizational champion working on its behalf. This means that someone -- often at a high level, but not necessarily -- takes a leadership role in garnering support for the initiative. The ideal champion is a respected, highly-networked person with a broad understanding of how the organization does and should operate. Considered an opinion maker, he or she can translate the overall vision into local scenarios that illustrate the benefits to other stakeholders.

The presence of change champions can lead to faster project delivery and sustainable benefit achievement. Though not everyone has the skills to be ambassadors or advocates of change, or even the requisite commitment to the cause, some of the key attributes can be developed through training and coaching. Here are a few of the most important:

Communication (including presentation) and engagement skills

Facilitation skills

Interpersonal skills

Influencing skills

Problem solving skills

Project management skills

Once the change initiative has gained even a little traction, it is important to find and foster the so-called early adopters in the organization -- those stakeholders in the first wave of people who are interested in and accepting of the program. Actively supported and encouraged, their influence can help build momentum and shorten the overall adoption curve.

Like champions, early adopters generally hold opinions that are respected by their peers and are more open to embracing change. However, unlike the term "champion," the term "early adopter" is more of a label than a role since it entails less responsibility and fewer duties than a more activist champion.

Nevertheless, identifying and uplifting early adopters can be extremely important in launching a change management program because they can influence their peers by participating in pilots, helping to develop or validate documentation and training materials, and becoming "super-users" who can provide basic assistance to others when the time comes.

Transition Planning Strategies

Change management involves more than just figuring out what and how to move from today's way of doing things to tomorrow's -- it also requires figuring out how to get from here to there! This is where transition planning strategy development comes into play, and generally speaking, there are four options commonly chosen from: direct cutover, parallel operation, pilot operation, and a phased approach.

With a direct cutover, there is an immediate transition from the old system to the new once it is operational. Although it is one of the less expensive approaches, it does involve more risk of total system failure because it is implementing the complete new system without first testing it on a smaller scale.


In a parallel operation, both the old and new systems will operate simultaneously for a period of time, and the output from each will be compared to the other. Once management, users, and the IT department are satisfied that the new system is operating correctly, the old system will be taken offline. This approach is one of the more costly, but has a lower risk of system failure since the organization doesn't put all its eggs in the new basket until their safety is assured.

A pilot operation begins implementation on a small scale, testing it
in one area of the organization before making a full transition to
the whole. It can be used in conjunction with a direct cutover or
parallel operation, but restricting initial implementation to a pilot
site avoids the risk of a full system failure right out of the chute.

In a phased approach, the transition to the new system is made in stages or modules. Each subsystem can be implemented in conjunction with the three previously mentioned system transition methods. With this approach, risk of failure or error is limited only to the implemented module.

Whichever route is chosen, certain key steps must be taken to ensure the change management project's success.

First, it's enormously important to gain the support of top management so the weight of authority can be brought to bear on all communications and training, and on the organization's commitment to the effort itself.

Second, be sure to get users and business owners involved early on. Not only does this lead to a future state that meets their articulated needs, but it helps cement their support since they had a hand in designing it.

Next, focus on the business process and not the underlying technology. Though the right technology can create new process possibilities, process is in the driver's seat and should be viewed that way.

Fourth, develop "personas" -- composite characterizations of various stakeholders -- to understand how the new system will impact different kinds of users, and to begin to prepare the help desk for the support demands it inevitably will receive.

Fifth, start small, and build from there.

Sixth, be fanatical about internal PR and communication. Let no opportunity pass to let stakeholders know what's happening and why they should be excited about it.

Finally, train, train, train your people in the "whats," "whys," and
"hows" of the new situation.

Organization Readiness, Governance and User Support

A big part of the success of any information management program is the readiness of the organization to adapt to the new policies and procedures that likely will be required. Best assessed when the specific requirements are defined, since that's when the strongest endorsements or objections are likely to be expressed, this readiness can be examined by studying the corporate culture, structure, processes, communication models, leadership styles, and even vertical industry influences.

Many times, a SWOT analysis (strengths-weaknesses-opportunities-threats) is then performed to gauge people's reactions to the more concrete issues it showcases, and thus to identify areas of possible resistance. This is especially important to do, perhaps in simplified fashion, when working with end users, since the impact of the new system on them -- and their response to it -- will be central to its ultimate effectiveness.
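
For teams that want to capture this input somewhere more durable than a whiteboard, here is a minimal sketch, in Python, of collecting SWOT entries and flagging the categories where resistance usually shows up first. The category labels and sample entries are purely illustrative assumptions.

from collections import defaultdict

swot = defaultdict(list)
swot["strengths"].append("Strong executive sponsorship")
swot["weaknesses"].append("No shared taxonomy across departments")
swot["opportunities"].append("Retire three legacy file shares")
swot["threats"].append("End users attached to the current folder structure")

# Weaknesses and threats are the first places to look for likely resistance.
for category in ("weaknesses", "threats"):
    for item in swot[category]:
        print(f"Possible resistance area ({category}): {item}")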

The state of your organization's governance practices is another factor that can affect the ability to manage change, since good governance boils down to consistent and considered control, and control is something not many people are comfortable with changing or relinquishing.

Governance is a culture of accountability to which employees at all
levels -- senior executives, business unit managers, end users,
and IT, records, and legal staff -- must be committed. Otherwise,
the best technology and the most well-considered guidelines will
mean little, and operational standardization and compliance both
will go out the window.

Achieving governance requires the establishment of an organizational structure to guide, oversee, and arbitrate the process. Populated with representatives from all walks of organizational life, this body has a long list of responsibilities that generally includes:

Establishing policies and standards, including implementation methodologies, development platforms, and integration protocols, so everything works together the way it is supposed to

Prioritizing projects, starting with the most achievable as defined by feasibility, impact, or sponsorship (in other words, who wants it)

Enforcing rules and providing a conduit to executive authority for final judgment

Maintaining best practices through shared vocabularies and standard operating procedures

Establishing a measure-and-improve mindset by capturing metrics and analyzing query logs and click trails to identify areas needing enlargement (a small sketch follows this list)

Integrating the handling of taxonomy, metadata, user interfaces, and search to ensure they all work together for usability, compliance, and proper tagging to facilitate automation
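
Here is the promised sketch, in Python, of one way that measure-and-improve habit might start: mining a search query log for terms that return no results, which usually point at gaps in content, metadata, or the taxonomy. The log file name and its column names ("query", "result_count") are assumptions about your environment, not a standard format.

import csv
from collections import Counter

zero_hits = Counter()
with open("search_query_log.csv", newline="") as f:
    for row in csv.DictReader(f):
        # Queries that returned nothing are candidates for governance review.
        if int(row["result_count"]) == 0:
            zero_hits[row["query"].strip().lower()] += 1

for query, count in zero_hits.most_common(10):
    print(f"{count:4d}  {query}")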

Good governance requires that all of these tasks be undertaken, and in an organized way. It won't all happen overnight, though, so breaking it into smaller pieces -- and perhaps assigning those pieces to smaller subcommittees -- is not a bad way to go. Since even the best-laid plans can go awry at times, it is critical to build troubleshooting resources into the plan so people running into snags have a readily identifiable, confidence-inspiring route to follow when seeking solutions.

These resources should exist on both the business and technology sides of the house, since issues are likely to crop up in both areas. Business issues may include the likes of not understanding a new process, having to follow updated compliance procedures, or utilizing new metadata tags. Technology issues, on the other hand, may include the likes of interface confusion ("where do I go to find X?"), search mechanisms and terms (especially if there's a new taxonomy in use), permission management, or even out-and-out software debugging.



In-person training and online resources can serve well as minimizers of and guideposts to solving problems. But there's no substitute for a support or help desk when it comes to digging deep or responding rapidly.

A typical help desk provides users a single point of contact to receive help on various information technology issues. It typically manages its requests via help desk software, such as an issue tracking system, that allows the staff to track user requests with a unique number.

Extensive knowledge of the system and easy and friendly interaction with users are the twin keys to boosting people's comfort with something new and thus fostering their willingness to change. The benefits work in two directions, though, as the regular communication with users allows the help desk to monitor the environment and anticipate possibly disruptive issues ranging from the highly technical to simple preferences and satisfaction. This kind of information is invaluable in terms of avoiding surprises, and embedding the responses in the online help materials can result in fewer calls to the desk and thus lower support costs.

Running a help desk generally involves three main levels of support: one by phone, one in person, and one for especially thorny issues related to specific products or custom applications. Though every situation is different, there are at least four touchstones for achieving maximum efficiency and effectiveness (a small escalation sketch follows the list):
Designate and adhere to the support levels, and staff them appropriately. Assign individuals with less training and expertise to the lower levels, and those with more to higher levels. Anyone unable to resolve an issue kicks it up a level in a controlled escalation that ensures efficiency and allows for higher-level employees to focus on higher-level jobs.

Establish a workflow that enters the problem into the help desk software when it first comes in, assigns it to a member of the support staff, and makes clear who is responsible for ensuring the problem is satisfactorily resolved.

Track all issues, even if they are fixed quickly. Careful time stamping and tracking helps technicians take care of issues in a timely manner, and charting which machines and people are having the problem can highlight possibly defective devices and employees who may need more training.

Provide constant training and support for help desk staff, who
need to know every inch of both the help desk software and the
information solutions in their care.
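
As promised above, here is a minimal Python illustration of a ticket that carries a unique number, time stamps its history, and escalates one level at a time. The field names, the three-level scheme, and the sample data are assumptions for illustration, not a description of any particular help desk product.

from dataclasses import dataclass, field
from datetime import datetime
from typing import List

@dataclass
class Ticket:
    # A help desk request tracked by a unique number.
    number: int
    summary: str
    assigned_to: str
    level: int = 1                      # 1 = phone, 2 = in person, 3 = product specialists
    opened_at: datetime = field(default_factory=datetime.now)
    log: List[str] = field(default_factory=list)

    def escalate(self, reason):
        # Controlled escalation: one level at a time, never past the top level.
        if self.level >= 3:
            raise ValueError("Already at the highest support level.")
        self.level += 1
        self.log.append(f"{datetime.now().isoformat()} escalated to level {self.level}: {reason}")

    def resolve(self, note):
        self.log.append(f"{datetime.now().isoformat()} resolved: {note}")

t = Ticket(1042, "Cannot find scanned invoices in the new repository", "Level 1 agent")
t.escalate("Needs a taxonomy/search specialist")
t.resolve("User shown the new metadata-based search form")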

Key Links:

Background information on the CIP.
Practice Test -- Do a Self-Assessment.
Free videos to prepare for the test.
White paper on the CIP.
Register for the test.
Contact for more information: jwilkins [at] aiim.org
