
Lecture Plan (MBA)

MS-3402 Business Intelligence and Applications


What Is an Information System?

Too often you hear someone say, "Oh yeah, I know how to use a computer. I can surf the
Web with the best of them and I can play Solitaire for hours. I'm really good at
computers." Okay. So that person can pound a keyboard, use a mouse at lightning speed,
and has a list of favorite Web sites a mile long. But the real question is "Is that person
information literate?" Just because you can pound the keyboard doesn't necessarily mean
you can leverage the technology to your advantage or the advantage of your organization.
An organization can gather and keep all the data on its customers that a hard drive can
hold. You can get all the output reports that one desk can physically hold. You can have
the fastest Internet connection created to date. But if the organization doesn't take
advantage of customer data to create new opportunities, then all it has is useless
information. If the output report doesn't tell the management that it has a serious problem
on the factory floor, then all that's been accomplished is to kill a few more trees. If you
don't know how to analyze the information from a Web site to take advantage of new
sales leads, then what have you really done for yourself today?
Most of us think only of hardware and software when we think of an Information System.
But there is a third side of the triangle that should be considered, and that's the
people side, or "persware." Think of an information system as a triangle whose three
sides are hardware, software, and persware.

In this section of the text, Laudon & Laudon discuss the components of an information
system. They talk about the input, processing, output and feedback processes. Most
important is the feedback process; unfortunately it's the one most often overlooked. Just
as in the triangle above, the hardware (input and output) and the software (processing)
receive the most attention. With those two alone, you have computer literacy. But if you
don't use the "persware" side of the triangle to complete the feedback loop, you don't
accomplish much. Add the "persware" angle with good feedback and you have the
beginnings of information literacy.
A Business Perspective on Information Systems
Using feedback completes the information processing loop. To be a good Information
Systems manager, however, you must bring into that loop far more than just the computer
data. For instance, your information system reports that you produced 100,000 widgets
last week with a "throwback" rate of 10%. The feedback loop tells you that the throwback
rate has fallen 2% in the last month. Wow, you say, that's a pretty good improvement. So
far, so good. But if you put that information into a broader context, you're still costing the
organization a huge sum of money, because each percentage point on the throwback rate
costs an average of $10,000. And when you bring in available information from the external
environment, you learn that your company's rate is 5% above the industry norm. Now that's
information you can use, to your advantage or disadvantage!
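The arithmetic behind the widget example can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the 2% drop and the 5% gap are read as percentage points, and the $10,000 figure is treated as the average cost of one percentage point over the same reporting period. The variable and function names are hypothetical.

```python
# Hypothetical figures taken from the widget example above.
COST_PER_POINT = 10_000                # assumed average cost of one percentage point ($)
current_rate = 10.0                    # this week's throwback rate (%)
last_month_rate = current_rate + 2.0   # assumes the 2% drop is in percentage points
industry_norm = current_rate - 5.0     # assumes "5% above the norm" means 5 points

def throwback_cost(rate_pct: float) -> float:
    """Cost attributed to a throwback rate, at COST_PER_POINT per percentage point."""
    return rate_pct * COST_PER_POINT

savings_from_improvement = throwback_cost(last_month_rate) - throwback_cost(current_rate)
excess_over_industry = throwback_cost(current_rate) - throwback_cost(industry_norm)

print(f"Current cost:               ${throwback_cost(current_rate):,.0f}")
print(f"Saved by the 2-point drop:  ${savings_from_improvement:,.0f}")
print(f"Gap to the industry norm:   ${excess_over_industry:,.0f}")
```

Read this way, the improvement is worth about $20,000, while the remaining gap to the industry norm is still worth about $50,000. That broader context is what turns the raw feedback into usable information.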
If you, as a manager, can then take other information from the internal and external
environments to come up with a solution to this problem, you can consider yourself
"information literate."
Organizations are funny things. Each one tends to have its own individual personality and
yet share many things in common with other organizations. Look at some of the
organizations you may be associated with - softball team, fraternity/sorority, health club,
or a child's soccer team. See, organizations exist everywhere and each of them has its
own structure, just as workplace organizations have their own structure and personality to
fit their needs, or in some cases, habits.
A baseball team needs talented, well-trained players at different positions. Sometimes,
the success of the team depends on a good, well-informed coach or manager. So too with
the workplace organization. Business organizations need many kinds of players with
various talents, who are well-trained and well-informed, in order to succeed.
Every organization requires tools to help it succeed. If the baseball team uses bats that are
25 years old against a team whose bats are 2 years old, they will have to work harder on
their own to make up for that disadvantage. If your child's soccer team uses balls with
torn seams, they're going to have a harder time kicking the ball into the goal. So if your
organization is using older equipment or uses it the wrong way, it just stands to reason it
is going to have a harder time beating the odds.
Every good organization needs a good manager. Pretty simple, pretty reasonable. Take
professional baseball managers. They don't actually play the game, they don't hit the
home run, catch the fly ball for the last out, or hang every decoration for the celebration
party. They stay on the sidelines during the game. Their real role is to develop the game
plan by analyzing their team's strengths and weaknesses. But that's not all; they also
determine the competition's strengths and weaknesses. Every good manager has a game
plan before the team even comes out of the locker room. That plan may change as the
game progresses, but managers pretty much know what they're going to do if they are
losing or if they are winning. The same is true in workplace organizations.
Do you own a Digital Video Disk? Probably not, since it's only been on the market for a
short time. How old is your car or truck? Manufacturers are constantly offering us new
vehicles, yet we tend to upgrade only every few years. Your personal computer may be a
year old or three years old. Do you have the latest gadgets? Chances are you don't. Face
it, you just can't keep up with all the new stuff. No one can. Think about how hard, not to
mention expensive, it is for an individual to acquire everything introduced to the
marketplace. Think how difficult it is sometimes to learn how to use every feature of all
those new products.
Now put those thoughts into a much larger context of an organization. Yes, it would be
nice if your company could purchase new computers every three months so you could
have the fastest and best technology on the market. But it can't. Not only is it expensive
to buy the hardware and the software, but the costs of installing, maintaining, updating,
integrating, and training must all be taken into account. We'll look at the hardware and
software sides of the Information Systems triangle in upcoming chapters, but it's
important that you understand now how difficult it is for an organization, large or small,
to take advantage of all the newest technology.

Role of Information Systems

As a consumer, you have instant access to millions of pieces of data. With a few clicks of
the mouse button, you can find anything from current stock prices to video clips of
current movies. You can get product descriptions, pictures, and prices from thousands of
companies across India and around the world. Trying to sell services and products? You
can purchase demographic, economic, consumer buying pattern, and market-analysis
data. Your firm will have internal financial, marketing, production, and employee data for
past years. This tremendous amount of data provides opportunities to managers and
consumers who know how to obtain it and analyze it to make better decisions.
The speed with which Information Technology (IT) and Information Systems (IS) are
changing our lives is amazing. Only 50 years ago communication was limited almost entirely to
the telephone, the first word processors came out in the mid-sixties and the fax entered
our offices in the 1970s. Today information systems are everywhere; from supermarkets
to airline reservations, libraries and banking operations they have become part of our
daily lives.
The first step in learning how to apply information technology to solve problems is to get
a broader picture of what is meant by the term information system. You probably have
some experience with using computers and various software packages. Yet, computers
are only one component of an information system. A computer information system
(CIS) consists of related components like hardware, software, people, procedures, and
collections of data. The term information technology (IT) represents the various types
of hardware and software used in an information system, including computers and
networking equipment. The goal of an information system is to enable managers to make
better decisions by providing quality information. The physical equipment used in
computing is called hardware. The set of instructions that controls the hardware is
known as software. In the early days of computers, the people directly involved with them
tended to be programmers, design analysts, and a few external users. Today, almost
everyone in the firm is involved with the information system. Procedures are
instructions that help people use the systems. They include items such as user manuals,
documentation, and procedures to ensure that backups are made regularly. Databases are
collections of related data that can be retrieved easily and processed by the computers.
To create an effective information system, you need to do more than simply purchase the
various components. Quality is an important issue in business today, particularly as it
relates to information systems. The quality of an information system is measured by its
ability to provide exactly the information needed by managers in a timely manner. The
information must be accurate and up-to-date. Users should be able to receive the
information in a variety of formats: tables of data, graphs, summary statistics, or even
pictures or sound.
Users have different perspectives and different requirements, and a good information
system must have the flexibility to present information in diverse forms for each user.
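To make the idea of flexible presentation concrete, the sketch below shows one small, hypothetical set of monthly sales figures in two of the formats mentioned above: a table of data and summary statistics. The data and the function names are illustrative assumptions, not part of any particular product.

```python
import statistics
from typing import Dict

# Hypothetical monthly sales (in units) for one product line.
SALES = {"Jan": 120, "Feb": 135, "Mar": 150, "Apr": 128}

def as_table(data: Dict[str, int]) -> str:
    """One user's view: every month, one row each."""
    rows = [f"{'Month':<6}{'Units':>6}"]
    rows += [f"{month:<6}{units:>6}" for month, units in data.items()]
    return "\n".join(rows)

def as_summary(data: Dict[str, int]) -> Dict[str, float]:
    """Another user's view: a few summary statistics over the same data."""
    values = list(data.values())
    return {
        "total": sum(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }

print(as_table(SALES))
print(as_summary(SALES))
```

The underlying data never changes; only the presentation does, which is the flexibility the paragraph asks of a good information system.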
Lecture - 3

The Relationship Between Organizations and Information Systems

This lecture examines how organizations and information systems work together, or sometimes
against each other. The idea, of course, is to keep them in sync, but that's not always
possible. We'll look at the nature of organizations and how they relate to Information Systems.
The Two-Way Relationship

This figure shows the complexity of the relationship between organizations and
information technology. Installing a new system or changing the old one involves much
more than simply plunking down new terminals on everyone's desk. The greatest
influence, as the text points out, could simply be sheer luck!
What Is an Organization?
An organization is very similar to the Information System described previously.

These figures have many things in common. Both require inputs and some sort of
processing, both have outputs, and both then depend on feedback for successful
completion of the loop.
Information Systems use data as their main ingredient; organizations rely on people.
Even so, the similarities are remarkable. Both are structured methods of turning
raw inputs (data/people) into useful outputs (information/producers).
Think of some of the organizations you've been involved in. Didn't each of them have a
structure, even if it wasn't readily apparent? Perhaps the organization seemed chaotic or
didn't seem to have any real purpose. Maybe that was due to poor input, broken-down
processing, or unclear output. It could very well be that feedback was ignored or missing.

Oftentimes an organization's technical definition, the way it's supposed to work, is quite
different from the behavioral definition, the way it really works. For instance, even
though Sally is technically assigned to the Production Department with Sam as her
supervisor on paper, she really works for Tom in Engineering. When a company is
developing a new information system, it's important to keep both the technical and
behavioral definitions in perspective and build the system accordingly.

Salient Features of Organizations

This section gives you a perspective on how organizations are constructed and compares
their common and uncommon features.
Why Organizations Are So Much Alike: Common Features
The class you're enrolled in is an organization of sorts, isn't it? Think about it. Look at the
table describing the characteristics of an organization:

When you hear the term bureaucracy, you immediately think of government agencies.
Not so; bureaucracies exist in many private and public companies. Bureaucracies are
simply very formal organizations with strict divisions of labor and very structured ways
of accomplishing tasks. They are usually thought of in a negative way, but they can be
Standard Operating Procedures
How many of these characteristics fit your college class? How many fit any organization
you're in? Some of the Standard Operating Procedures (SOPs), politics, and culture
are so ingrained in organizations that they actually hinder the success of the group. Think
about your experiences in groups. You had a leader (hierarchy), a set of rules by which
you operated (explicit rules and procedures), and people appointed to perform certain
tasks (clear division of labor). You probably voted on different issues (impartial
judgments), and you decided on the best person to fill various positions within the group
(technical qualifications for positions). Hopefully, the organization was able to fulfill its
goals (maximum organizational efficiency), whether winning a softball game or putting
on an award-winning play. If your organization wasn't successful, perhaps it was because
of the SOPs, the politics, or the culture.
The point is, every group of people is an organization. The interesting question you could
ask yourself would be "How would the world look and function without some kind of
organization?"
Organizational Politics
Everyone has their own opinion about how things should get done. People have
competing points of view. What might be good for Accounting may not be to the
advantage of Human Resources. The Production Department may have a different agenda
for certain tasks than the Shipping Department. Especially when it comes to the
allocation of important resources in an organization, competition heats up between people
and departments. The internal competition can have a positive or negative influence on
the organization, depending on how it's handled by management. The fact remains that
politics exist in every organization and should be taken into account when it comes to the
structure of the information system.
Organizational Culture
Just as countries or groups of people have their own habits, methods, norms, and values,
so too do businesses. It's not unusual for companies to experience clashes between the
culture and desired changes brought about by new technologies. Many companies are
facing such challenges as they move toward a totally different way of working, thanks to
the Internet.

Introduction to Decision Making
Everybody makes decisions. It's a natural part of life, and most of the time we don't even
think about the process. In an organization, decisions are made at every level. The level at
which the decision is made can also determine the complexity of the decision in relation
to the input of data and output of information.
Levels of Decision Making
In Chapter 2 we discussed the various types of Information Systems and how they relate
to the levels of an organization. We can also relate those Information Systems to the
types of decisions managers make.

• Strategic Decision Making. These decisions are usually concerned with the
major objectives of the organization, such as "Do we need to change the core
business we are in?" They also concern policies of the organization, such as "Do
we want to support affirmative action?"
• Management Control. These decisions affect the use of resources, such as "Do
we need to find a different supplier of packaging materials?" Management-level
decisions also determine the performance of the operational units, such as "How
much is the bottleneck in Production affecting the overall profit and loss of the
organization, and what can we do about it?"
• Knowledge-Level Decision Making. These decisions determine new ideas or
improvements to current products or services. A decision made at this level could
be "Do we need to find a new chocolate recipe that results in a radically different
taste for our candy bar?"
• Operational Control. These decisions determine specific tasks that support
decisions made at the strategic or managerial levels. An example is "How many
candy bars do we produce today?"

Types of Decisions: Structured versus Unstructured

Some decisions are very structured while others are very unstructured. You may wake up
in the morning and make the structured, routine decision to get out of bed. Then you have
to make the unstructured decision of what clothes to wear that day (for some of us this
may be a very routine decision!). Structured decisions involve definite procedures and are
not necessarily very complex. The more unstructured a decision becomes the more
complex it becomes.
Types of Decisions and Types of Systems
One size does not fit all when it comes to pairing the types of systems to the types of
decisions. Every level of the organization makes different types of decisions, so the
system used should fit the organizational level, as shown in Figure 4.4.
It's easy to develop an information system to support structured decision making. Do you
increase production on the day shift or hold it to the swing shift; do you purchase another
piece of equipment or repair the old one? What hasn't been so easy to develop is a system
that supports the unstructured decision making that takes place in the upper echelons of a
company. Do we expand into foreign markets or stay within the confines of our own
country; do we build a new plant in Arizona or Alabama; do we stop production of a
long-time product due to falling demand or boost our marketing? The ability to create
information systems to support the latter decisions is long overdue.
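Structured decisions are easy to support because they follow a definite procedure that can be written down. As a minimal sketch, assuming a hypothetical policy that equipment is replaced when it is past its planned service life or when the repair estimate exceeds half the replacement price, the repair-or-replace question above becomes a short Python function. The threshold and field names are illustrative assumptions, not a recommended policy.

```python
from dataclasses import dataclass

@dataclass
class Equipment:
    repair_estimate: float      # quoted cost to repair ($)
    replacement_price: float    # cost of a new unit ($)
    age_years: float
    service_life_years: float   # planned useful life

# Hypothetical policy threshold: replace if a repair costs more than half a new unit.
REPAIR_SHARE_LIMIT = 0.5

def repair_or_replace(item: Equipment) -> str:
    """Apply a fixed, written-down rule: the hallmark of a structured decision."""
    if item.age_years >= item.service_life_years:
        return "replace"
    if item.repair_estimate > REPAIR_SHARE_LIMIT * item.replacement_price:
        return "replace"
    return "repair"

print(repair_or_replace(Equipment(2_000, 10_000, 3, 8)))  # repair
print(repair_or_replace(Equipment(6_000, 10_000, 3, 8)))  # replace
```

No comparable rule can be written for questions such as whether to expand into foreign markets, which is why unstructured decisions are so much harder to support.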

Stages of Decision Making

Some people seem to make sudden or impulsive decisions. Other people seem to make
very slow, deliberate decisions. But regardless of appearances, the decision-making
process follows the same stages of development and implementation. Let's use the
example of purchasing a new television, using Figure

• Intelligence. You identify the facts: You don't have a television or the one that
you do have isn't any good. You intuitively understand what the problem is and
the effect it's having on you. You missed your favorite show last night.
• Design. You design possible solutions: You could watch the television in your
neighbor's apartment or you could purchase a new one for yourself. Your
neighbor will get annoyed if you keep coming over. On the other hand, you won't
be able to go on vacation if you use your money to buy a new television.
• Choice. You gather data that helps you make a better decision: Your neighbor
doesn't like the same shows you like or she's getting rather tired of you being
there. You also determine that televisions cost a lot of money so you figure out
how you can afford one. You choose to purchase a new television instead of
watching your neighbor's.
• Implementation. You implement the decision: You stop at the appliance store on
your way home from work and carry out your decision to purchase a new television.
• Feedback. You gather feedback: You're broke but you can watch anything you want.

Of course this is a simplified example of the decision-making process. But the same
process is used for almost every decision made by almost every person.
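The five stages can be read as a cycle in which feedback either closes the problem or sends the decision maker back to the intelligence stage. The Python sketch below models that reading using the television example; `Stage`, `next_stage`, and `walk` are hypothetical names, and the loop from feedback back to intelligence is an assumption drawn from the feedback-loop idea earlier in these notes.

```python
from enum import Enum
from typing import List, Optional

class Stage(Enum):
    INTELLIGENCE = 1    # identify the problem: the TV is missing or broken
    DESIGN = 2          # list solutions: watch the neighbor's TV or buy one
    CHOICE = 3          # pick one alternative: buy a TV
    IMPLEMENTATION = 4  # carry it out: stop at the appliance store
    FEEDBACK = 5        # check the result: broke, but able to watch anything

def next_stage(stage: Stage, problem_solved: bool = True) -> Optional[Stage]:
    """Return the stage that follows `stage`, or None when the process ends."""
    if stage is Stage.FEEDBACK:
        # Assumed loop: unsatisfactory feedback sends you back to intelligence.
        return None if problem_solved else Stage.INTELLIGENCE
    return Stage(stage.value + 1)

def walk(feedback_results: List[bool]) -> List[Stage]:
    """Follow the stages; feedback_results[i] says whether pass i solved the problem.

    Once the list runs out, the problem is treated as solved so the walk ends.
    """
    path: List[Stage] = []
    stage: Optional[Stage] = Stage.INTELLIGENCE
    results = iter(feedback_results)
    while stage is not None:
        path.append(stage)
        solved = next(results, True) if stage is Stage.FEEDBACK else True
        stage = next_stage(stage, solved)
    return path

print([s.name for s in walk([True])])
```

Calling `walk([False, True])` passes through all five stages twice, modeling a first decision whose feedback showed the problem was not yet solved.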
Information Systems help improve the decision-making process by
• providing more information about the problem
• presenting a greater variety of possible alternatives
• showing consequences and effects of choices
• measuring the outcome of different possible solutions
• providing feedback on the decision that is made
Codd's Rules
Rule 1 : The information Rule.
"All information in a relational data base is represented explicitly at the
logical level and in exactly one way - by values in tables."
Everything within the database exists in tables and is accessed via table
access routines.
Rule 2 : Guaranteed access Rule.
"Each and every datum (atomic value) in a relational data base is
guaranteed to be logically accessible by resorting to a combination of
table name, primary key value and column name."
To access any data item, you specify the table, the primary key value, and the column
in which it exists; there is no reading of characters 10 to 20 of a 255-byte string.
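As an illustration, Python's built-in `sqlite3` module can show guaranteed access in practice: one value is located by table name, primary key value, and column name, never by its position in a record. SQLite is used only because it ships with Python; it is not claimed to satisfy all of Codd's rules, and the `customer` table is a hypothetical example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customer (id, name, city) VALUES (?, ?, ?)",
    [(1, "Asha", "Pune"), (2, "Ravi", "Delhi")],
)

# Table name + primary key value + column name pin down exactly one datum.
city = conn.execute("SELECT city FROM customer WHERE id = ?", (2,)).fetchone()[0]
print(city)  # Delhi
```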
Rule 3 : Systematic treatment of null values.
"Null values (distinct from the empty character string or a string of blank
characters and distinct from zero or any other number) are supported in
fully relational DBMS for representing missing information and
inapplicable information in a systematic way, independent of data type."
If data does not exist or does not apply, a value of NULL is used; the RDBMS
understands this as meaning missing or non-applicable data.
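A short sketch with Python's built-in `sqlite3` module shows that NULL is distinct from both an empty string and zero. The `employee` table and its rows are hypothetical; Python's `None` is stored as SQL NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, phone TEXT, bonus REAL)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [
        (1, "Meera", None, 0.0),  # phone unknown (NULL); bonus is a real zero
        (2, "Karan", "", None),   # phone is an empty string; bonus does not apply (NULL)
    ],
)

# NULL never equals anything, not even another NULL, so test it with IS NULL.
missing_phone = conn.execute("SELECT name FROM employee WHERE phone IS NULL").fetchall()
no_bonus = conn.execute("SELECT name FROM employee WHERE bonus IS NULL").fetchall()
zero_bonus = conn.execute("SELECT name FROM employee WHERE bonus = 0").fetchall()
print(missing_phone, no_bonus, zero_bonus)  # [('Meera',)] [('Karan',)] [('Meera',)]
```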
Rule 4 : Dynamic on-line catalog based on the relational model.
"The data base description is represented at the logical level in the
same way as ordinary data, so that authorized users can apply the
same relational language to its interrogation as they apply to the regular
data."
The Data Dictionary is held within the RDBMS, so there is no need for
off-line volumes to tell you the structure of the database.
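In SQLite, for example, the catalog is the `sqlite_master` table, which is queried with an ordinary `SELECT` statement. The sketch below uses Python's built-in `sqlite3` module; the `product` table and `cheap_product` view are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, price REAL)")
conn.execute("CREATE VIEW cheap_product AS SELECT name FROM product WHERE price < 10")

# The catalog is itself a table, queried with the same language as the data.
catalog = conn.execute("SELECT type, name FROM sqlite_master ORDER BY name").fetchall()
print(catalog)  # [('view', 'cheap_product'), ('table', 'product')]
```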
Rule 5 : Comprehensive data sub-language Rule.
"A relational system may support several languages and various modes
of terminal use (for example, the fill-in-the-blanks mode). However,
there must be at least one language whose statements are expressible,
per some well-defined syntax, as character strings and that is
comprehensive in supporting all the following items:

• Data Definition
• View Definition
• Data Manipulation (Interactive and by program)
• Integrity Constraints
• Authorization
• Transaction boundaries (begin, commit, and rollback)."

Every RDBMS should provide a language to allow the user to query the
contents of the RDBMS and also manipulate the contents of the RDBMS.
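SQL is the usual example of such a language. The sketch below uses Python's built-in `sqlite3` module to cover several items from the list in one language: data definition, an integrity constraint, a view definition, data manipulation, and transaction boundaries. SQLite has no SQL statements for authorization, so that item is not shown; the `account` table and `rich` view are hypothetical.

```python
import sqlite3

# isolation_level=None lets the script issue BEGIN/COMMIT/ROLLBACK itself.
conn = sqlite3.connect(":memory:", isolation_level=None)

# Data definition, including an integrity constraint.
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance REAL CHECK (balance >= 0))")
# View definition.
conn.execute("CREATE VIEW rich AS SELECT id FROM account WHERE balance > 1000")
# Data manipulation.
conn.execute("INSERT INTO account (id, balance) VALUES (1, 500), (2, 2000)")

# Transaction boundaries: this transfer is undone as a single unit.
conn.execute("BEGIN")
conn.execute("UPDATE account SET balance = balance - 300 WHERE id = 2")
conn.execute("UPDATE account SET balance = balance + 300 WHERE id = 1")
conn.execute("ROLLBACK")

balances = conn.execute("SELECT id, balance FROM account ORDER BY id").fetchall()
rich_ids = conn.execute("SELECT id FROM rich").fetchall()
print(balances, rich_ids)  # [(1, 500.0), (2, 2000.0)] [(2,)]
```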
Rule 6 : View updating Rule.
"All views that are theoretically updatable are also updatable by the
system."
If a view could in principle be updated, the user must be able to update
data through it, and the RDBMS must carry the change through to the
underlying base tables.
Rule 7 : High-level insert, update and delete.
"The capability of handling a base relation or a derived relation as a
single operand applies not only to the retrieval of data but also to the
insertion, update and deletion of data."
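The user should be able to insert, update, or delete a whole set of rows with a
single statement rather than one record at a time, and this applies to views as
well: modifying a view should modify the base tables beneath it.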
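The core idea, one statement acting on a whole set of rows, can be sketched with Python's built-in `sqlite3` module: a single `UPDATE` changes every product in a category, with no loop over individual records in the program. SQLite views are read-only unless `INSTEAD OF` triggers are defined, so this sketch works on a base table; the `product` table and its data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, category TEXT, price REAL)")
conn.executemany(
    "INSERT INTO product (category, price) VALUES (?, ?)",
    [("candy", 10.0), ("candy", 20.0), ("soda", 15.0)],
)

# One UPDATE changes every matching row; the program never loops over records.
cur = conn.execute("UPDATE product SET price = price + 5 WHERE category = 'candy'")
updated = cur.rowcount

# Deletion works on a set of rows in the same way.
conn.execute("DELETE FROM product WHERE price > 20")
prices = [price for (price,) in conn.execute("SELECT price FROM product ORDER BY id")]
print(updated, prices)  # 2 [15.0, 15.0]
```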
Rule 8 : Physical data independence.
"Application programs and terminal activities remain logically
unimpaired whenever any changes are made in either storage
representations or access methods."
The user should not be aware of where or upon which media data-files
are stored.
Rule 9 : Logical data independence.
"Application programs and terminal activities remain logically
unimpaired when information-preserving changes of any kind that
theoretically permit un-impairment are made to the base tables."
User programs and the user should not be aware of any changes to the
structure of the tables (such as the addition of extra columns).
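The example in the explanation, adding an extra column, can be sketched with Python's built-in `sqlite3` module. The `customer` table and the `customer_names` function are hypothetical stand-ins for a base table and an application program.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO customer (name) VALUES ('Asha')")

def customer_names():
    """A stand-in 'application program' that names only the columns it uses."""
    return [name for (name,) in conn.execute("SELECT name FROM customer ORDER BY id")]

before = customer_names()
# An information-preserving change to the base table: add a column.
conn.execute("ALTER TABLE customer ADD COLUMN email TEXT")
after = customer_names()
print(before, after)  # ['Asha'] ['Asha']
```

The program is unaffected because it names the columns it needs; a program that used `SELECT *` and depended on the number of columns returned would notice the change.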
Rule 10 : Integrity independence.
"Integrity constraints specific to a particular relational data base must be
definable in the relational data sub-language and storable in the catalog,
not in the application programs."
If a column only accepts certain values, then it is the RDBMS that
enforces these constraints and not the user program. This means that an
invalid value can never be entered into this column, whereas if the
constraints were enforced via programs there would always be a chance that a
buggy program might allow incorrect values into the system.
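A minimal sketch with Python's built-in `sqlite3` module: the allowed values are declared once in the table definition as a `CHECK` constraint, and the database rejects an invalid value no matter which program tries to insert it. The `employee` table and its grade codes are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The constraint lives in the schema (the catalog), not in application code.
conn.execute(
    "CREATE TABLE employee ("
    " id INTEGER PRIMARY KEY,"
    " grade TEXT NOT NULL CHECK (grade IN ('A', 'B', 'C')))"
)
conn.execute("INSERT INTO employee (grade) VALUES ('B')")  # accepted

try:
    # Even a buggy program cannot store an invalid value.
    conn.execute("INSERT INTO employee (grade) VALUES ('Z')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(rejected)  # True
```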
Rule 11 : Distribution independence.
"A relational DBMS has distribution independence."
The RDBMS may be spread across more than one system and across
several networks; however, to the end user the tables should appear no
different from those that are local.
Rule 12 : Non-subversion Rule.
"If a relational system has a low-level (single-record-at-a-time)
language, that low level cannot be used to subvert or bypass the
integrity Rules and constraints expressed in the higher level relational
language (multiple-records-at-a-time)."

Enhancing Management Decision Making

The more information you have, based on internal experiences or from external sources,
the better your decisions. Business executives are faced with the same dilemmas when
they make decisions. They need the best tools available to help them.

Decision-Support Systems
When we discussed Transaction Processing Systems and Management Information
Systems, the decisions were clear-cut: "Should we order more sugar to support the
increased production of candy bars?" Most decisions facing executives are unstructured
or semistructured: "What will happen to our sales if we increase our candy bar prices?"
Decision Support Systems (DSS) help executives make better decisions by using
historical and current data from internal Information Systems and external sources. By
combining massive amounts of data with sophisticated analytical models and tools, and
by making the system easy to use, they provide a much better source of information to
use in the decision-making process.
In order to better understand a decision support system, let's compare the characteristics
of an MIS system with those of a DSS system:

MIS | DSS
Structured decisions | Semistructured, unstructured decisions
Reports based on routine flows of data | Focused on specific decisions or classes of decisions
General control of organization | End-user control of data, tools, and sessions
Structured information flows | Emphasizes change, flexibility, quick response
Presentation in form of reports | Presentation in form of graphics
(none listed) | Greater emphasis on models, assumptions, ad hoc queries
Traditional systems development | Develop through prototyping; iterative
You can also understand the differences between these two types of systems by
understanding the differences in the types of decisions made at the two levels of the
organization.

Types of Decision-Support Systems

Because of the limitations of hardware and software, early DSS systems provided
executives only limited help. With the increased power of computer hardware, and the
sophisticated software available today, DSS can crunch lots more data, in less time, in
greater detail, with easy to use interfaces. The more detailed data and information
executives have to work with, the better their decisions can be.
Model-Driven DSS were isolated from the main Information Systems of the organization
and were primarily used for the typical "what-if" analysis. That is, "What if we increase
production of our candy bars and decrease the shipment time?" These systems rely
heavily on models to help executives understand the impact of their decisions on the
organization, its suppliers, and its customers.
Data-Driven DSS take the massive amounts of data available through the company's
TPS and MIS systems and cull from it useful information which executives can use to
make more informed decisions. They don't have to have a theory or model but can "free-
flow" the data.
By using data mining, executives can get more information than ever before from their
data. One danger in data mining is the problem of getting information that, on the surface,
may seem meaningful but when put into context of the organization's needs, simply
doesn't provide any useful information.
For instance, data mining can tell you that on a hot summer day in the middle of Texas,
more bottled water is sold in convenience stores than in grocery stores. That's useful
information executives can use to make sure more stock is targeted to convenience stores.
Data mining could also reveal that when customers purchase white socks, they also
purchase bottled water 62% of the time. We seriously doubt there is any meaningful connection
between the two purchases. The point is that you need to beware of using data mining as
a sole source of decision making and make sure your requests are as focused as possible.
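The "62% of the time" figure is what association mining usually calls the confidence of a rule: of the baskets that contain white socks, the share that also contain bottled water. The Python sketch below computes support and confidence over a handful of baskets; the baskets are made up for illustration and say nothing about real shopping behavior.

```python
from typing import List, Set

# Hypothetical shopping baskets.
baskets: List[Set[str]] = [
    {"white socks", "bottled water"},
    {"white socks", "bread"},
    {"bottled water", "chips"},
    {"white socks", "bottled water", "chips"},
    {"bread"},
]

def support(items: Set[str]) -> float:
    """Share of all baskets that contain every item in `items`."""
    return sum(items <= basket for basket in baskets) / len(baskets)

def confidence(antecedent: Set[str], consequent: Set[str]) -> float:
    """Share of baskets containing `antecedent` that also contain `consequent`."""
    return support(antecedent | consequent) / support(antecedent)

rule_conf = confidence({"white socks"}, {"bottled water"})
print(f"support={support({'white socks', 'bottled water'}):.2f}, confidence={rule_conf:.2f}")
```

A high confidence on its own does not make a rule meaningful; as the paragraph warns, the result still has to make sense in the context of the organization's needs.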
Laudon and Laudon describe five types of information you can get from data mining
customer information:

• Associations: Immediate links between one purchase and another purchase

• Sequences: Phased links; because of one purchase, another purchase will be made
at a later time
• Classification: Predicting purchases based on group characteristics and then
targeting marketing campaigns
• Clustering: Predicting consumer behavior based on demographic information
about groups to which individuals belong
• Forecasting: Using existing values to determine what other values will be
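Of these, forecasting is the easiest to show in a few lines. As a minimal sketch, assuming the simplest possible model (a straight-line trend fitted by ordinary least squares to hypothetical monthly sales), existing values are used to estimate the next one. Real forecasting tools use far richer models; `linear_forecast` is a hypothetical helper that only illustrates the idea.

```python
from typing import List

def linear_forecast(history: List[float], steps_ahead: int = 1) -> float:
    """Fit y = a + b*t by least squares (t = 0, 1, 2, ...) and extrapolate."""
    n = len(history)
    if n < 2:
        raise ValueError("need at least two observations")
    t_mean = (n - 1) / 2
    y_mean = sum(history) / n
    # Slope b = cov(t, y) / var(t); intercept a = y_mean - b * t_mean.
    cov = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(history))
    var = sum((t - t_mean) ** 2 for t in range(n))
    b = cov / var
    a = y_mean - b * t_mean
    return a + b * (n - 1 + steps_ahead)

monthly_sales = [100.0, 110.0, 120.0, 130.0]  # hypothetical, perfectly linear
print(linear_forecast(monthly_sales))  # 140.0
```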
Components of DSS

A DSS has three main components, as shown in Figure 15.1: the database, software and
models. The database is, of course, data collected from the organization's other
Information Systems. Another important source of information the organization may use
is external data from governmental agencies or research data from universities. The data
can be accessed from the warehouse or from a data mart (extraction of data from the
warehouse). Many databases are now being maintained on desktop computers instead of
mainframe computers.
The DSS software system must be easy to use and adaptable to the needs of each
executive. A well-built DSS uses the models that the text describes. You've probably
used statistical models in other classes to determine the mean, median, or standard deviation of
data. These statistical models are the basis of data mining.
The What-If decisions most commonly made by executives use sensitivity analysis to
help them predict what effect their decisions will have on the organization. Executives
don't make decisions based solely on intuition. The more information they have, the more
they experiment with different outcomes in a safe mode, the better their decisions. That's
the benefit of the models used in the software tools.
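A what-if model can be as small as a formula evaluated under several assumptions. The Python sketch below asks the candy-bar question, "what happens to revenue if we raise prices?", under an assumed constant price elasticity of demand. The baseline price, the volume, and the elasticity values are invented for illustration; trying several elasticities is the sensitivity-analysis step.

```python
BASE_PRICE = 1.00      # hypothetical current price per candy bar ($)
BASE_VOLUME = 100_000  # hypothetical bars sold per month at that price

def revenue(price_change_pct: float, elasticity: float) -> float:
    """Monthly revenue after a price change, assuming constant price elasticity."""
    new_price = BASE_PRICE * (1 + price_change_pct / 100)
    new_volume = BASE_VOLUME * (new_price / BASE_PRICE) ** elasticity
    return new_price * new_volume

# Sensitivity analysis: the same 10% increase under different demand assumptions.
for e in (-0.5, -1.0, -2.0):
    print(f"elasticity {e:+.1f}: revenue ${revenue(10, e):,.0f} "
          f"(baseline ${revenue(0, e):,.0f})")
```

The pattern matters more than the numbers: if demand is inelastic the price rise increases revenue, and if it is elastic revenue falls, so the decision hinges on an assumption the executive can test safely before acting.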

Examples of DSS Applications

• The Advanced Planning System, a Manufacturing DSS: Uses the sensitivity analysis
model of "What-If" analysis.

• Southern California Gas Company: Uses classification and clustering data mining
techniques to focus new marketing efforts.

• Shop-Ko Stores: Uses the data mining technique of associations to recognize
customers' purchasing patterns.

• Geographic Information Systems (GIS): Very popular with, of all people, farmers
and ranchers. Using GIS tools, they can determine exactly how much fertilizer to
spread on their fields without over- or under-spraying. They save money, time,
and the land! Because of pinpoint accuracy, GIS systems are used by emergency
response teams to help rescue stranded skiers, hikers, and bicyclists.

Web-Based DSS
Of course, no discussion would be complete without information about how companies
are using the Internet and the Web in the customer DSS decision-making process. Figure
15.3 shows an Internet CDSS (Customer Decision-Support System).

Here's an example: You decide to purchase a new home and use the Web to search real
estate sites. You find the perfect house in a good neighborhood but it seems a little
pricey. You don't know the down payment you'll need. You also need to find out how
much your monthly payments will be based on the interest rate you can get. Luckily the
real estate Web site has several helpful calculators (customer decision support systems)
you can use to determine the down payment, current interest rates available, and the
monthly payment. Some customer decision support systems will even provide an
amortization schedule. You can make your decision about the purchase of the home or
know instantly that you need to find another house.
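Mortgage calculators of this kind usually rely on the standard fixed-rate loan formula. For a loan principal P, a monthly interest rate r (the annual rate divided by 12), and n monthly payments, the payment is M = P·r·(1+r)^n / ((1+r)^n - 1). The Python sketch below computes the payment and the first rows of an amortization schedule; it assumes a fixed rate and monthly compounding, and the house price, down payment share, and rate are hypothetical.

```python
from typing import List, Tuple

def monthly_payment(principal: float, annual_rate: float, years: int) -> float:
    """Fixed monthly payment on a fully amortizing loan."""
    n = years * 12
    r = annual_rate / 12
    if r == 0:
        return principal / n
    growth = (1 + r) ** n
    return principal * r * growth / (growth - 1)

def amortization(principal: float, annual_rate: float,
                 years: int) -> List[Tuple[int, float, float, float]]:
    """Rows of (month, interest, principal repaid, remaining balance)."""
    payment = monthly_payment(principal, annual_rate, years)
    r = annual_rate / 12
    balance, rows = principal, []
    for month in range(1, years * 12 + 1):
        interest = balance * r
        repaid = payment - interest
        balance -= repaid
        rows.append((month, interest, repaid, max(balance, 0.0)))
    return rows

price, down_share = 300_000, 0.20  # hypothetical house price and 20% down payment
loan = price * (1 - down_share)
payment = monthly_payment(loan, 0.06, 30)
print(f"Down payment ${price * down_share:,.0f}, monthly payment ${payment:,.2f}")
for month, interest, repaid, balance in amortization(loan, 0.06, 30)[:3]:
    print(f"{month:>3}  interest {interest:8.2f}  principal {repaid:8.2f}  balance {balance:11.2f}")
```

Each row shows the interest share of the payment shrinking as the balance falls, which is what the amortization schedules on such Web sites display.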


Group Decision-Support Systems

More and more, companies are turning to groups and teams to get work done. Hours
upon hours are spent in meetings, in group collaboration, in communicating with many
people. To help groups make decisions, a new category of systems was developed--the
group decision-support system (GDSS).
What Is a GDSS?
You've been there: a meeting where nothing seemed to get done, where some people
dominated the agenda and others never said a word, which dragged on for hours with no
clear agenda. When it was all over no one was sure what was accomplished, if anything.
But the donuts and coffee were good!
Organizations have been struggling with this problem for years. They are now using
GDSS as a way to increase the efficiency and effectiveness of meetings. The text
includes a list of elements that GDSS use to help organizations. We'll highlight a few of them:

• Preplanning: A clear-cut agenda of the topics for the meeting.

• Open, collaborative meeting atmosphere: Free flow of ideas and communications
without any of the attendees feeling shy about contributing
• Evaluation objectivity: Reduces "office politics" and the chance that ideas will be
dismissed because of who presented them instead of what was presented
• Documentation: Clear communication about what took place and what decisions
were made by the group
• Preservation of "organizational memory": Even those unable to attend the meeting
will know what took place; great for geographically separated team members.

GDSS Characteristics and Software Tools

In GDSS the hardware includes more than just computers and peripheral equipment. It
also includes the conference facilities, audiovisual equipment, and networking equipment
that connects everyone. The persware extends to the meeting facilitators and the staff that
keeps the hardware operating correctly. As the hardware becomes more sophisticated and
widely available, many companies are bypassing specially equipped rooms in favor of
having the group participants "attend" the meeting through their individual desktop
computers.

Many of the software tools and programs discussed in Chapter 14, Groupware, can also
be used to support GDSS. Some of these software tools are being reworked to allow
people to attend meetings through Intranets or Extranets. Some highlights:

• Electronic questionnaires: Set an agenda and plan ahead for the meeting
• Electronic brainstorming: Allows all users to participate without fear of reprisal or criticism
• Questionnaire tools: Gather information even before the meeting begins, so facts
and information are readily available
• Stakeholder identification: Determines the impact of the group's decision
• Group dictionaries: Reduce the problem of different interpretations

Now instead of wasting time in meetings, people will know ahead of time what is on the
agenda. All of the information generated during the meeting is maintained for future use
and reference. Because input is anonymous, ideas are evaluated on their own merit. And
for geographically separated attendees, travel time and dollars are saved. Electronic
meeting systems (EMS) make these efficiencies possible. Figure 15.6 shows the sequence of
activities at a typical EMS meeting.

All is not perfect with EMS, however. Face-to-face communication is critical for
managers and others to gain insight into how people feel about ideas and topics. Body
language can often speak louder than words. Some people still may not contribute freely
because they know that all input is stored on the file server, even though it is anonymous.
And the system itself imposes disciplines on the group that members may not like.

How GDSS Can Enhance Group Decision Making

Go back to the previous list of problems associated with meetings and you can determine
how GDSS solve some of these problems.

1. Improved preplanning: Forces an agenda to keep the meeting on track.
2. Increased participation: Increases the number of people who can effectively
contribute to the meeting.
3. Open, collaborative meeting atmosphere: Nonjudgmental input by all attendees.
4. Criticism-free idea generation: Anonymity can generate more input and better
ideas.
5. Evaluation objectivity: The idea itself is evaluated, not the person contributing
the idea.
6. Idea organization and evaluation: Organized input makes it easier to comprehend
the results of the meeting.
7. Setting priorities and making decisions: All management levels are on equal
footing.
8. Documentation of meetings: Results of the meeting are available soon after for
further use and discussion.
9. Access to external information: Reduces the amount of disagreement by having the
relevant facts available to all participants.

Lecture – 10

Groupware Technologies

Groupware technology has long been heralded as a way to improve business processes
and individual work practices. However, many instantiations of groupware technologies
have not met expectations. Some groupware has failed to be adopted by enough
individuals in an organization to make its use beneficial. Failure has been in part
attributed to deployment problems where the technology was not available to those who
could most benefit from it [4], or required those who would not benefit from it to adopt it.

Electronic calendars/on-line meeting schedulers make good groupware examples for
study because they have obvious mappings to real world artifacts, putting them among
the seemingly simplest groupware technologies. Additionally, a 1991 Internet-
administered survey found that calendaring systems were the most available groupware
technology, although they were the least used [1]. Further, information about their early
use is available as a result of investigations by Ehrlich [2,3] and Grudin [5] who studied
organizations, including software development companies, that failed to adopt
calendaring systems.
Today there are examples where groupware--and calendaring systems in particular--are
taking a strong hold. What has changed? Studies of the use of calendaring systems at two
sites--Microsoft and Sun Microsystems--reported in Grudin & Palen [6] have revealed
several organizational, behavioral and technical factors that enable widespread use. This
study has also raised additional research issues about adaptation of groupware
technologies to their organizational environments and individual work practices. Interim
findings indicate that social norms and communication behaviors about meeting
arranging might be influenced by the amount of information calendars reveal; that
tangible artifacts can be born out of technologically-supported collaborations which in
turn are useful for other purposes; and that there are potentially critical trade-offs
between efficiency, information resource creation, and privacy. My dissertation research
will refine and elaborate the conditions that facilitate groupware adoption, and investigate
subsequent integration of groupware technology into work practices and the
organizational environment.

Expert System

An expert system, also known as a knowledge based system, is a computer program that
contains the knowledge and analytical skills of one or more human experts, related to a
specific subject. This class of program was first developed by researchers in artificial
intelligence during the 1960s and 1970s and applied commercially throughout the 1980s.

The primary goal of expert systems research is to make expertise available to decision
makers and technicians who need answers quickly. There is never enough expertise to go
around; certainly it is not always available at the right place and the right time. Portable
computers loaded with in-depth knowledge of specific subjects can bring decades'
worth of knowledge to a problem. The same systems can assist supervisors and managers
with situation assessment and long-range planning. Many small systems now exist that
bring a narrow slice of in-depth knowledge to a specific problem, and these provide
evidence that the broader goal is achievable.
These knowledge-based applications of artificial intelligence have enhanced productivity
in business, science, engineering, and the military. With advances in the last decade,
today's expert systems clients can choose from dozens of commercial software packages
with easy-to-use interfaces.
Each new deployment of an expert system yields valuable data for what works in what
context, thus fueling the AI research that provides even better applications.
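The text does not describe how an expert system reasons internally. One common design, assumed here, stores the experts' knowledge as if-then rules and applies them with a forward-chaining inference engine. The engine keeps firing rules until no new facts appear. The sketch below is a minimal Python illustration, and the loan-approval rules and fact names are hypothetical.

```python
# Hypothetical knowledge base: each rule is (set of required facts, conclusion).
RULES = [
    ({"income_verified", "low_debt"}, "good_capacity"),
    ({"good_capacity", "good_credit_history"}, "approve_loan"),
    ({"missed_payments"}, "poor_credit_history"),
    ({"poor_credit_history"}, "refer_to_analyst"),
]


def forward_chain(facts, rules):
    """Fire every rule whose conditions are all known until nothing new is derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conditions <= known and conclusion not in known:
                known.add(conclusion)   # new fact derived from the rule
                changed = True
    return known


applicant = {"income_verified", "low_debt", "good_credit_history"}
print("approve_loan" in forward_chain(applicant, RULES))  # True
```

Because the knowledge is kept separate from the inference code, the system can be extended by adding rules rather than by rewriting the program.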
Lecture – 11

What Is a Data Warehouse?

A decision support database that is maintained separately from the organization’s
operational database
Supports information processing by providing a solid platform of consolidated, historical
data for analysis.
“A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile
collection of data in support of management’s decision-making process.”—W. H. Inmon

Data Warehouse—Integrated

Constructed by integrating multiple, heterogeneous data sources:
relational databases, flat files, on-line transaction records
Data cleaning and data integration techniques are applied.
Ensure consistency in naming conventions, encoding structures, attribute measures, etc.
among different data sources
E.g., Hotel price: currency, tax, breakfast covered, etc.
When data is moved to the warehouse, it is converted
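The hotel price example can be made concrete. The sketch below is a hypothetical cleaning step. Two sources quote room prices differently, and each quote is converted to one warehouse convention (US dollars, tax included, room only) before loading. The exchange rate, tax rate, breakfast value and `RawQuote` fields are all assumptions for illustration.

```python
from dataclasses import dataclass

# Hypothetical conversion constants; a real ETL job would look these up.
EXCHANGE_TO_USD = {"USD": 1.00, "EUR": 1.10}
TAX_RATE = 0.10            # assumed room tax
BREAKFAST_USD = 15.00      # assumed tax-inclusive value of breakfast


@dataclass
class RawQuote:
    hotel: str
    amount: float
    currency: str
    tax_included: bool
    breakfast_included: bool


def to_warehouse_price(q: RawQuote) -> float:
    """Convert a source quote to USD, tax included, breakfast excluded."""
    usd = q.amount * EXCHANGE_TO_USD[q.currency]
    if not q.tax_included:
        usd *= 1 + TAX_RATE
    if q.breakfast_included:
        usd -= BREAKFAST_USD
    return round(usd, 2)


source_a = RawQuote("Hotel X", 100.0, "EUR", tax_included=False, breakfast_included=False)
source_b = RawQuote("Hotel X", 136.0, "USD", tax_included=True, breakfast_included=True)
print(to_warehouse_price(source_a), to_warehouse_price(source_b))  # 121.0 121.0
```

The two raw quotes look different, but after conversion they turn out to describe the same price. This is the consistency that data integration is meant to provide.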

Data Warehouse—Time Variant

The time horizon for the data warehouse is significantly longer than that of operational
systems.
Operational database: current value data.
Data warehouse data: provide information from a historical perspective (e.g., past 5-10
years).
Every key structure in the data warehouse contains an element of time, explicitly or
implicitly.
But the key of operational data may or may not contain a “time element”.

Data Warehouse—Non-Volatile

Operational update of data does not occur in the data warehouse environment.
Does not require transaction processing, recovery, and concurrency control mechanisms
Requires only two operations in data accessing:
Initial loading of data and access of data.

Data warehousing and on-line analytical processing (OLAP) are essential elements of
decision support, which has increasingly become a focus of the database industry. Many
commercial products and services are now available, and all of the principal database
management system vendors now have offerings in these areas. Decision support places
some rather different requirements on database technology compared to traditional on-
line transaction processing applications. We describe back end tools for extracting,
cleaning and loading data into a data warehouse; multidimensional data models typical of
OLAP; front end client tools for querying and data analysis; server extensions for
efficient query processing; and tools for metadata management and for managing the
warehouse.

Data Warehouse vs. Heterogeneous DBMS

Traditional heterogeneous DB integration:

Query driven approach
When a query is posed to a client site, a meta-dictionary is used to translate the query into
queries appropriate for individual heterogeneous sites involved, and the results are
integrated into a global answer set
Complex information filtering, compete for resources

Data warehouse: update-driven, high performance

Information from heterogeneous sources is integrated in advance and stored in
warehouses for direct query and analysis

Data Warehouse vs. Operational DBMS

OLTP (on-line transaction processing)

Major task of traditional relational DBMS
Day-to-day operations: purchasing, inventory, banking, manufacturing, payroll,
registration, accounting, etc.

OLAP (on-line analytical processing)

Major task of data warehouse system
Data analysis and decision making

Distinct features (OLTP vs. OLAP):

User and system orientation: customer vs. market
Data contents: current, detailed vs. historical, consolidated
Database design: ER + application vs. star + subject
View: current, local vs. evolutionary, integrated
Access patterns: update vs. read-only but complex queries
The Case Against Data Warehousing
Data warehousing systems, for the most part, store historical data that have been
generated in internal transaction processing systems. This is a small part of
the universe of data available to manage a business. Sometimes this part has
limited value.
Data warehousing systems can complicate business processes significantly.
If most of your business needs are to report on data in one transaction processing
system and/or all the historical data you need are in that system and/or the
data in the system are clean and/or your hardware can support reporting
against the live system data and/or the structure of the system data is
relatively simple and/or your firm does not have much interest in end user ad
hoc query/report tools, data warehousing may not be for your business.
Data warehousing can have a learning curve that may be too long for impatient firms.
Many "strategic applications" of data warehousing have a short life span and
require the developers to put together a technically inelegant system quickly.
Some developers are reluctant to work this way.
There is a limited number of people available who have worked with the full data
warehousing system project "life cycle".
Data warehousing systems can require a great deal of "maintenance" which many
organizations cannot or will not support.
From Tables and Spreadsheets to Data Cubes

A data warehouse is based on a multidimensional data model, which views data in the form
of a data cube
A data cube, such as sales, allows data to be modeled and viewed in multiple dimensions
Dimension tables, such as item (item_name, brand, type), or time(day, week, month,
quarter, year)
Fact table contains measures (such as dollars_sold) and keys to each of the related
dimension tables
In data warehousing literature, an n-D base cube is called a base cuboid. The topmost
0-D cuboid, which holds the highest level of summarization, is called the apex cuboid.
The lattice of cuboids forms a data cube.
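The cuboids form a lattice because each cuboid is a group-by over one subset of the dimensions. With n dimensions there are 2^n cuboids, running from the base cuboid (all dimensions) down to the apex cuboid (no dimensions). The minimal Python sketch below computes every cuboid for a tiny, hypothetical sales table with the dimensions item, city and quarter and the measure dollars_sold.

```python
from collections import defaultdict
from itertools import combinations

DIMS = ("item", "city", "quarter")
# Hypothetical fact rows: (item, city, quarter, dollars_sold)
SALES = [
    ("TV", "Chicago", "Q1", 400),
    ("TV", "Toronto", "Q1", 300),
    ("PC", "Chicago", "Q2", 800),
    ("PC", "Chicago", "Q1", 500),
]


def compute_cube(rows, dims):
    """Return {cuboid_dimensions: {group_key: total dollars_sold}} for all 2^n cuboids."""
    cube = {}
    for k in range(len(dims), -1, -1):           # base cuboid first, apex last
        for subset in combinations(range(len(dims)), k):
            totals = defaultdict(int)
            for row in rows:
                key = tuple(row[i] for i in subset)
                totals[key] += row[-1]           # the measure is the last column
            cube[tuple(dims[i] for i in subset)] = dict(totals)
    return cube


cube = compute_cube(SALES, DIMS)
print(len(cube))        # 8 cuboids for 3 dimensions
print(cube[()])         # apex cuboid: {(): 2000}
print(cube[("item",)])  # {('TV',): 700, ('PC',): 1300}
```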

Conceptual Modeling of Data Warehouses

Modeling data warehouses: dimensions & measures

Star schema: A fact table in the middle connected to a set of dimension tables

Snowflake schema: A refinement of star schema where some dimensional hierarchy is
normalized into a set of smaller dimension tables, forming a shape similar to a snowflake

Fact constellations: Multiple fact tables share dimension tables, viewed as a collection
of stars, therefore called galaxy schema or fact constellation
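A star schema maps directly onto relational tables. The sketch below uses Python's built-in sqlite3 module to build a tiny, hypothetical star. A sales fact table holds foreign keys and the dollars_sold measure, and time, item and location dimension tables surround it. A typical star-join query follows. The table names, columns and rows are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time_dim     (time_key INTEGER PRIMARY KEY, quarter TEXT, year INTEGER);
CREATE TABLE item_dim     (item_key INTEGER PRIMARY KEY, item_name TEXT, type TEXT);
CREATE TABLE location_dim (location_key INTEGER PRIMARY KEY, city TEXT, country TEXT);

-- Fact table: one foreign key per dimension plus the numeric measure.
CREATE TABLE sales_fact (
    time_key     INTEGER REFERENCES time_dim(time_key),
    item_key     INTEGER REFERENCES item_dim(item_key),
    location_key INTEGER REFERENCES location_dim(location_key),
    dollars_sold REAL
);

INSERT INTO time_dim     VALUES (1, 'Q1', 2003), (2, 'Q2', 2003);
INSERT INTO item_dim     VALUES (1, 'TV', 'electronics'), (2, 'PC', 'computer');
INSERT INTO location_dim VALUES (1, 'Chicago', 'USA'), (2, 'Toronto', 'Canada');
INSERT INTO sales_fact   VALUES (1, 1, 1, 400), (1, 1, 2, 300),
                                (2, 2, 1, 800), (1, 2, 1, 500);
""")

# Star join: the fact table joins only to the dimensions the question needs.
STAR_QUERY = """
SELECT l.country, t.quarter, SUM(f.dollars_sold) AS total
FROM sales_fact f
JOIN time_dim t     ON f.time_key = t.time_key
JOIN location_dim l ON f.location_key = l.location_key
GROUP BY l.country, t.quarter
ORDER BY l.country, t.quarter
"""
for row in conn.execute(STAR_QUERY):
    print(row)
```

A snowflake variant would move country into its own table referenced from location_dim, which adds one more join to the same query.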

Online Transaction Processing (OLTP) Schema

In Online Transaction Processing (OLTP), the database is designed to achieve efficient
transactions such as INSERT and UPDATE. This is very different from the OLAP design.
Unlike OLAP, normalization is very important to reduce duplicates and also cut down on
the size of the data. Our OLTP schema may look like this:

Locations Table

Field Name     Type
Loc_Id         INTEGER (4)
Loc_Code       VARCHAR (5)
Loc_Name       VARCHAR (30)
State_Id       INTEGER (4)
Country_Id     INTEGER (4)

States Table

Field Name     Type
State_Id       INTEGER (4)
State_Name     VARCHAR (50)

Countries Table

Field Name     Type
Country_Id     INTEGER (4)
Country_Name   VARCHAR (50)
In order to query for all locations that are in the country 'USA', we have to join these
three tables. The SQL will look like this:

SELECT *
FROM Locations, States, Countries
WHERE Locations.State_Id = States.State_Id
  AND Locations.Country_Id = Countries.Country_Id
  AND Countries.Country_Name = 'USA';

Dimension Tables - Key elements of a Dimension Table

Dimensional modeling allows only one table per dimension. But your OLTP data spans
multiple tables, as described above. So we need to de-normalize the OLTP schema and
export it into the dimension tables.

For example, for the location dimension, you achieve this by joining the three OLTP
tables and inserting the data into a single Location dimension table. The Location
Table will look like this:

Location Dimension Table Schema

Field Name Type

Dim_Id INTEGER (4)
Loc_Code VARCHAR (4)
Name VARCHAR (50)
State_Name VARCHAR (20)
Country_Name VARCHAR (20)
All Dimension tables contain a key column called the dimension key. In this example
Dim_Id is our dimension Id. This is the unique key into our Location dimension table.
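The de-normalization step can be sketched with Python's built-in sqlite3 module. The OLTP table and column names follow the text, and the rows are a small illustrative subset. Letting SQLite assign Dim_Id (as an INTEGER PRIMARY KEY) is an assumption. A production load would typically generate its own surrogate keys, such as the 1001, 1002, ... values in the sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalized OLTP tables, as in the schema above (rows are illustrative).
CREATE TABLE Countries (Country_Id INTEGER PRIMARY KEY, Country_Name VARCHAR(50));
CREATE TABLE States    (State_Id INTEGER PRIMARY KEY, State_Name VARCHAR(50));
CREATE TABLE Locations (Loc_Id INTEGER PRIMARY KEY, Loc_Code VARCHAR(5),
                        Loc_Name VARCHAR(30), State_Id INTEGER, Country_Id INTEGER);
INSERT INTO Countries VALUES (1, 'USA'), (2, 'Canada');
INSERT INTO States    VALUES (10, 'Illinois'), (20, 'Ontario');
INSERT INTO Locations VALUES (100, 'IL01', 'Chicago', 10, 1),
                             (101, 'TO01', 'Toronto', 20, 2);

-- One de-normalized table for the whole location dimension.
CREATE TABLE Location_Dim (Dim_Id INTEGER PRIMARY KEY, Loc_Code VARCHAR(4),
                           Name VARCHAR(50), State_Name VARCHAR(20),
                           Country_Name VARCHAR(20));
""")

# Do the three-table join once, at load time, and store the flattened rows.
conn.execute("""
INSERT INTO Location_Dim (Loc_Code, Name, State_Name, Country_Name)
SELECT l.Loc_Code, l.Loc_Name, s.State_Name, c.Country_Name
FROM Locations l
JOIN States s    ON l.State_Id = s.State_Id
JOIN Countries c ON l.Country_Id = c.Country_Id
""")
conn.commit()

# Afterwards, "all locations in the USA" needs no joins at all.
usa = conn.execute(
    "SELECT Name FROM Location_Dim WHERE Country_Name = 'USA'"
).fetchall()
print(usa)  # [('Chicago',)]
```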

The actual data in your Location Table may look like this:

Location Dimension Table Data

Dim_Id  Loc_Code  Name         State_Name        Country_Name
1001    IL01      Chicago      Illinois          USA
1002    IL02      Arlington    Illinois          USA
1003    NY01      Brooklyn     New York          USA
1004    TO01      Toronto      Ontario           Canada
1005    MX01      Mexico City  Distrito Federal  Mexico

You may notice that some of the information is repeated in the above dimension table:
the State Name and Country Name are repeated throughout the table. You may feel that
this is a waste of space and against the normalization principles. But in dimensional
modeling this type of design optimizes querying and reduces query times, because no
joins are needed to read a location's state and country. Also, we will learn later that in
a typical data warehouse, the dimension tables make up only 10 to 15% of the storage, as
the fact table is by far the largest table and takes up the rest of the storage allocation.

Time Dimension Table

After de-normalization, your Time table will look like this:

Time Dimension Table Schema

Field Name     Type
Dim_Id         INTEGER (4)
Month_Name     VARCHAR
Quarter        SMALLINT
Quarter_Name   VARCHAR

The actual data in your Time Table may look like this:

Time Dimension Data

Dim_Id  Month  Month_Name  Quarter  Quarter_Name
1001    1      Jan         1        Q1 2003
1002    2      Feb         1        Q1 2003
1003    3      Mar         1        Q1 2003
1004    4      Apr         2        Q2 2003
1005    5      May         2        Q2 2003

Product Dimension Table

After de-normalization, your Product table will look like this:
Product Dimension Table Schema

Field Name Type

Dim_Id INTEGER (4)
Name VARCHAR (30)
Category VARCHAR (30)

In this table Dim_Id is our dimension Id. This is the unique key into our Product
dimension table.
The actual data in your Product Table may look like this:

Product Dimension Table Data

Dim_Id  SKU       Name             Category
1001    DOVE6K    Dove Soap 6Pk    Sanitary
1002    MLK66F#   Skim Milk 1 Gal  Dairy
1003    SMKSAL55  Smoked Salmon    Meat

Lecture – 13

Data Warehouse Design Process

Top-down, bottom-up approaches or a combination of both

Top-down: Starts with overall design and planning (mature)
Bottom-up: Starts with experiments and prototypes (rapid)

Typical data warehouse design process

Choose a business process to model, e.g., orders, invoices, etc.
Choose the grain (atomic level of data) of the business process
Choose the dimensions that will apply to each fact table record
Choose the measure that will populate each fact table record

Three Data Warehouse Models

Enterprise warehouse
collects all of the information about subjects spanning the entire organization

Data Mart
A subset of corporate-wide data that is of value to a specific group of users. Its scope is
confined to specific, selected groups, such as a marketing data mart
Independent vs. dependent (directly from warehouse) data mart

Virtual warehouse
A set of views over operational databases
Only some of the possible summary views may be materialized

OLAP Server Architectures

Relational OLAP (ROLAP)

Use relational or extended-relational DBMS to store and manage warehouse data and
OLAP middleware to support missing pieces
Include optimization of DBMS backend, implementation of aggregation navigation logic,
and additional tools and services
greater scalability

Multidimensional OLAP (MOLAP)

Array-based multidimensional storage engine (sparse matrix techniques)
fast indexing to pre-computed summarized data

Hybrid OLAP (HOLAP)

User flexibility, e.g., low level: relational, high-level: array
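The sketch below is a rough illustration of the MOLAP idea. Cells are addressed by row-major array offsets, and only non-empty cells are stored (a simple sparse technique). One summary is pre-computed as data is loaded. The `SparseCube` class, its dimensions and its values are hypothetical; real MOLAP engines use far more elaborate chunking and compression.

```python
class SparseCube:
    """Toy MOLAP-style store: cells live at row-major array offsets, but only
    non-empty cells are kept, and totals along the first axis are precomputed."""

    def __init__(self, *axes):
        self.axes = [list(a) for a in axes]
        self.position = [{v: i for i, v in enumerate(a)} for a in self.axes]
        self.shape = [len(a) for a in self.axes]
        self.cells = {}               # offset -> value (empty cells are absent)
        self.first_axis_totals = {}   # pre-computed summary along axis 0

    def _offset(self, coords):
        """Row-major offset, as if all cells were laid out in one flat array."""
        offset = 0
        for pos, size, value in zip(self.position, self.shape, coords):
            offset = offset * size + pos[value]
        return offset

    def add(self, coords, amount):
        off = self._offset(coords)
        self.cells[off] = self.cells.get(off, 0) + amount
        key = coords[0]
        self.first_axis_totals[key] = self.first_axis_totals.get(key, 0) + amount

    def get(self, coords):
        return self.cells.get(self._offset(coords), 0)


cube = SparseCube(["TV", "PC"], ["Chicago", "Toronto"], ["Q1", "Q2"])
for coords, amount in [(("TV", "Chicago", "Q1"), 400), (("TV", "Toronto", "Q1"), 300),
                       (("PC", "Chicago", "Q2"), 800), (("PC", "Chicago", "Q1"), 500)]:
    cube.add(coords, amount)

print(cube.get(("PC", "Chicago", "Q2")))   # 800, found by direct offset lookup
print(cube.first_axis_totals["TV"])        # 700, read from the precomputed summary
print(len(cube.cells), "of", 2 * 2 * 2, "cells stored")
```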

The Case for Data Warehousing

To perform server/disk bound tasks associated with querying and reporting on
servers/disks not used by transaction processing systems
To use data models and/or server technologies that speed up querying and reporting
and that are not appropriate for transaction processing
To provide an environment where a relatively small amount of knowledge of the
technical aspects of database technology is required to write and maintain
queries and reports and/or to provide a means to speed up the writing and
maintaining of queries and reports by technical personnel
To provide a repository of "cleaned up" transaction processing systems data that
can be reported against and that does not necessarily require fixing the
transaction processing systems
To make it easier, on a regular basis, to query and report data from multiple
transaction processing systems and/or from external data sources and/or
from data that must be stored for query/report purposes only
To provide a repository of transaction processing system data that contains data
from a longer span of time than can efficiently be held in a transaction
processing system and/or to be able to generate reports "as was" as of a
previous point in time
Lecture – 14

Data Warehouse Usage

Three kinds of data warehouse applications

Information processing
supports querying, basic statistical analysis, and reporting using crosstabs, tables, charts
and graphs

Analytical processing
multidimensional analysis of data warehouse data
supports basic OLAP operations, slice-dice, drilling, pivoting

Data mining
knowledge discovery from hidden patterns
supports associations, constructing analytical models, performing classification and
prediction, and presenting the mining results using visualization tools.
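The basic OLAP operations can be sketched over a handful of fact rows. The Python below is a minimal, hypothetical illustration; the function names and data are assumptions. Roll-up aggregates to coarser dimensions, and drill-down is the reverse, returning to finer ones. Slice fixes one dimension to a single value. Dice restricts several dimensions to sets of values. Pivot rotates the result into a row-by-column cross-tab.

```python
from collections import defaultdict

# Hypothetical fact rows with their dimension attributes and one measure.
FACTS = [
    {"quarter": "Q1", "city": "Chicago", "country": "USA", "item": "TV", "sales": 400},
    {"quarter": "Q1", "city": "Toronto", "country": "Canada", "item": "TV", "sales": 300},
    {"quarter": "Q2", "city": "Chicago", "country": "USA", "item": "PC", "sales": 800},
    {"quarter": "Q1", "city": "Chicago", "country": "USA", "item": "PC", "sales": 500},
    {"quarter": "Q2", "city": "Toronto", "country": "Canada", "item": "PC", "sales": 200},
]


def roll_up(facts, dims):
    """Aggregate the measure up to the given (coarser) dimensions."""
    totals = defaultdict(int)
    for f in facts:
        totals[tuple(f[d] for d in dims)] += f["sales"]
    return dict(totals)


def slice_cube(facts, dim, value):
    """Fix one dimension to a single value."""
    return [f for f in facts if f[dim] == value]


def dice(facts, criteria):
    """Keep rows whose values fall in the allowed set for every listed dimension."""
    return [f for f in facts if all(f[d] in allowed for d, allowed in criteria.items())]


def pivot(facts, row_dim, col_dim):
    """Rotate the data into a cross-tab: {row value: {column value: total}}."""
    table = defaultdict(lambda: defaultdict(int))
    for f in facts:
        table[f[row_dim]][f[col_dim]] += f["sales"]
    return {r: dict(cols) for r, cols in table.items()}


print(roll_up(FACTS, ["country"]))              # city rolled up to country
print(len(slice_cube(FACTS, "quarter", "Q1")))  # rows in the Q1 slice
print(pivot(FACTS, "item", "quarter"))          # item-by-quarter cross-tab
```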

Data explosion problem

Automated data collection tools and mature database technology lead to tremendous
amounts of data stored in databases, data warehouses and other information repositories
We are drowning in data, but starving for knowledge!
Solution: Data warehousing and data mining
–Data warehousing and on-line analytical processing
–Extraction of interesting knowledge (rules, regularities, patterns, constraints) from data
in large databases

Data Marts
In some data warehouse implementations, a data mart is a miniature data warehouse; in
others, it is just one segment of the data warehouse. Data marts are often used to provide
information to functional segments of the organization. Typical examples are data marts
for the sales department, the inventory and shipping department, the finance department,
upper level management, and so on. Data marts can also be used to segment data
warehouse data to reflect a geographically compartmentalized business in which each
region is relatively autonomous. For example, a large service organization may treat
regional operating centers as individual business units, each with its own data mart that
contributes to the master data warehouse.
Data marts are sometimes designed as complete individual data warehouses and
contribute to the overall organization as a member of a distributed data warehouse. In
other designs, data marts receive data from a master data warehouse through periodic
updates, in which case the data mart functionality is often limited to presentation services
for clients.
Regardless of the functionality provided by data marts, they must be designed as
components of the master data warehouse so that data organization, format, and schemas
are consistent throughout the data warehouse. Inconsistent table designs, update
mechanisms, or dimension hierarchies can prevent data from being reused throughout the
data warehouse, and they can result in inconsistent reports from the same data. For
example, it is unlikely that summary reports produced from a finance department data
mart that organizes the sales force by management reporting structure will agree with
summary reports produced from a sales department data mart that organizes the same
sales force by geographical region. It is not necessary to impose one view of data on all
data marts to achieve consistency; it is usually possible to design consistent schemas and
data formats that permit rich varieties of data views without sacrificing interoperability.
For example, the use of a standard format and organization for time, customer, and
product data does not preclude data marts from presenting information in the diverse
perspectives of inventory, sales, or financial analysis. Data marts should be designed
from the perspective that they are components of the data warehouse regardless of their
individual functionality or construction. This provides consistency and usability of
information throughout the organization.

Data Warehouse Architecture

Architecture Design & Project Planning for Business Intelligence, Data Warehouse, and
Corporate Performance Management Projects

Examine data warehouse architecture along the following areas:
• Data
• Information
• Technology
• Product

Architecture Deliverables

Data
• Define what data is needed to meet business user needs.
• Examine the completeness and correctness of source systems that are needed to obtain
the data.
• Identify the data facts and dimensions.
• Define the logical data models.
• Establish a preliminary aggregation plan.

Information
• Define the framework for the transformation of data into information, from the source
systems to the information used by the business.
• Recommend the data stages necessary for data transform and information access.
• Develop source-to-target data mapping for each data stage.
• Review data quality procedures and reconciliation techniques.
• Define the physical data models.

Technology
• Define technical functionality used to build a data warehousing and business
intelligence environment.
• Identify available technologies and review tradeoffs between any overlapping or
competing technologies.
• Review current technical environment and company's strategic technical directions.
• Recommend technologies to be used to meet your business requirements and
implementation plan.

Product
• List product categories needed to implement the technology architecture.
• Review tradeoffs between overlapping or competing product categories.
• Outline implementation of product architecture in stages.
• Identify a short list of products in each of these categories.
• Recommend products and implementation plan.

What Is Data Mining?

Data mining (knowledge discovery in databases):

Extraction of interesting (non-trivial, implicit, previously unknown and potentially
useful) information or patterns from data in large databases

Data mining: a misnomer?

Alternative names: knowledge discovery (mining) in databases (KDD), knowledge extraction,
data/pattern analysis, data archeology, data dredging, information harvesting, business
intelligence, etc.

From On-Line Analytical Processing to On-Line Analytical Mining (OLAM)

Why online analytical mining?

High quality of data in data warehouses

DW contains integrated, consistent, cleaned data
Available information processing structure surrounding data warehouses
ODBC, OLEDB, Web accessing, service facilities, reporting and OLAP tools
OLAP-based exploratory data analysis
mining with drilling, dicing, pivoting, etc.
On-line selection of data mining functions
Integration and swapping of multiple mining functions, algorithms, and tasks.

Data Mining: On What Kind of Data?

Relational databases
Data warehouses
Transactional databases
Advanced DB and information repositories
Object-oriented and object-relational databases
Spatial databases
Time-series data and temporal data
Text databases and multimedia databases
Heterogeneous and legacy databases
Data Mining Functionalities

Concept description: Characterization and discrimination

–Generalize, summarize, and contrast data characteristics, e.g., dry vs. wet regions

Association (correlation and causality)

–Multi-dimensional vs. single-dimensional association
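$\mathit{age}(X, \text{``20..29''}) \land \mathit{income}(X, \text{``20..29K''}) \Rightarrow \mathit{buys}(X, \text{``PC''})$ [support = 2%, confidence = 60%]
$\mathit{contains}(T, \text{``computer''}) \Rightarrow \mathit{contains}(T, \text{``software''})$ [1%, 75%]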
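Support and confidence in these rules are simple ratios over transactions. Support is the fraction of all transactions that contain every item in the rule. Confidence is the fraction of transactions containing the left-hand side that also contain the right-hand side. The Python sketch below computes both for a hypothetical set of five baskets; the data are invented for illustration.

```python
# Hypothetical market baskets, one set of items per transaction.
TRANSACTIONS = [
    {"computer", "software", "mouse"},
    {"computer", "software"},
    {"computer", "printer"},
    {"software"},
    {"computer", "software", "printer"},
]


def count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t)


def support(itemset, transactions):
    return count(itemset, transactions) / len(transactions)


def confidence(lhs, rhs, transactions):
    """P(rhs | lhs), estimated as count(lhs and rhs) / count(lhs)."""
    lhs_count = count(lhs, transactions)
    return count(lhs | rhs, transactions) / lhs_count if lhs_count else 0.0


rule_lhs, rule_rhs = {"computer"}, {"software"}
print(support(rule_lhs | rule_rhs, TRANSACTIONS))    # 0.6
print(confidence(rule_lhs, rule_rhs, TRANSACTIONS))  # 0.75
```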

Classification and Prediction

Finding models (functions) that describe and distinguish classes or concepts for future prediction
E.g., classify countries based on climate, or classify cars based on gas mileage
Presentation: decision-tree, classification rule, neural network
Prediction: Predict some unknown or missing numerical values
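Of the presentations listed, a decision tree is the easiest to sketch. The Python below learns a one-level decision tree, often called a decision stump. It classifies cars as high or low mileage from a single attribute by choosing the weight threshold with the fewest training errors. The attribute, labels and data are hypothetical. Real decision-tree learners split recursively on many attributes, using measures such as information gain.

```python
from collections import Counter

# Hypothetical training data: (weight in thousands of lb, mileage class).
CARS = [(2.1, "high_mpg"), (2.4, "high_mpg"), (2.6, "high_mpg"),
        (3.5, "low_mpg"), (3.9, "low_mpg"), (4.2, "low_mpg")]


def majority(labels):
    return Counter(labels).most_common(1)[0][0] if labels else None


def learn_stump(samples):
    """Return (threshold, label_if_at_or_below, label_if_above) with fewest errors."""
    best, best_errors = None, len(samples) + 1
    for threshold in sorted({x for x, _ in samples}):
        left = [y for x, y in samples if x <= threshold]
        right = [y for x, y in samples if x > threshold]
        left_label, right_label = majority(left), majority(right)
        errors = (sum(y != left_label for y in left)
                  + sum(y != right_label for y in right))
        if errors < best_errors:
            best, best_errors = (threshold, left_label, right_label), errors
    return best


def predict(stump, x):
    threshold, left_label, right_label = stump
    return left_label if x <= threshold else right_label


stump = learn_stump(CARS)
print(stump)                # (2.6, 'high_mpg', 'low_mpg')
print(predict(stump, 3.0))  # low_mpg
```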

Cluster analysis

Class label is unknown: Group data to form new classes, e.g., cluster houses to find
distribution patterns
Clustering based on the principle: maximizing the intra-class similarity and minimizing
the interclass similarity
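The text does not name a clustering algorithm; k-means, assumed here, is one of the most common. It follows the stated principle by alternating two steps. Each point is assigned to its nearest centroid, then each centroid moves to the mean of its points, which shrinks within-cluster distances. The Python sketch below clusters hypothetical houses described by (price in $100k, size in 1,000 sq ft). The data and parameter values are illustrative.

```python
import math
import random

# Hypothetical houses: (price in $100k, size in 1,000 sq ft).
HOUSES = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]


def kmeans(points, k, max_iter=100, seed=0):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)          # start from k distinct data points
    clusters = []
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centroids == centroids:         # converged: centroids stopped moving
            break
        centroids = new_centroids
    return centroids, clusters


centroids, clusters = kmeans(HOUSES, k=2)
for c in clusters:
    print(sorted(c))
```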

Outlier analysis

Outlier: a data object that does not comply with the general behavior of the data
It can be considered as noise or exception but is quite useful in fraud detection, rare
events analysis
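The text names no specific method for flagging outliers. One simple choice, assumed here, is Tukey's interquartile-range rule. It reports values more than 1.5 IQR below the first quartile or more than 1.5 IQR above the third quartile. The Python sketch below applies the rule to hypothetical card charges, where a single unusually large charge stands out as a fraud-detection candidate.

```python
from statistics import quantiles

# Hypothetical card charges for one customer, in dollars.
CHARGES = [42, 38, 45, 40, 41, 39, 44, 43, 40, 950]


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


print(iqr_outliers(CHARGES))  # [950]
```

Quartiles are used instead of the mean and standard deviation because a single extreme value inflates the standard deviation and can hide itself.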

Trend and evolution analysis

Trend and deviation: regression analysis

Sequential pattern mining, periodicity analysis
Similarity-based analysis
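A least-squares line is the simplest model for regression-based trend analysis. The Python sketch below fits y = a + b·x to hypothetical quarterly sales. It then uses the fitted line both to forecast the next quarter and to measure each quarter's deviation from the trend. The figures are invented for illustration.

```python
def linear_trend(xs, ys):
    """Ordinary least-squares fit; returns (intercept, slope)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


quarters = [1, 2, 3, 4]
sales = [100, 130, 140, 170]              # hypothetical quarterly sales
intercept, slope = linear_trend(quarters, sales)
print(intercept, slope)                   # 80.0 22.0
print(intercept + slope * 5)              # trend forecast for quarter 5: 190.0
print([y - (intercept + slope * x) for x, y in zip(quarters, sales)])  # [-2.0, 6.0, -6.0, 2.0]
```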
Lecture – 16

OLAP Mining: An Integration of Data Mining and Data Warehousing

Data mining systems, DBMS, and data warehouse systems coupling

No coupling, loose coupling, semi-tight coupling, tight coupling

On-line analytical mining data

Integration of mining and OLAP technologies

Interactive mining multi-level knowledge

Necessity of mining knowledge and patterns at different levels of abstraction by
drilling/rolling, pivoting, slicing/dicing, etc.

Major Issues in Data Mining

Mining methodology and user interaction

–Mining different kinds of knowledge in databases

–Interactive mining of knowledge at multiple levels of abstraction
–Incorporation of background knowledge
–Data mining query languages and ad-hoc data mining
–Expression and visualization of data mining results
–Handling noise and incomplete data
–Pattern evaluation: the interestingness problem

Performance and scalability

–Efficiency and scalability of data mining algorithms

–Parallel, distributed and incremental mining methods
Issues relating to the diversity of data types

Handling relational and complex types of data

Mining information from heterogeneous databases and global information systems

Issues related to applications and social impacts

Application of discovered knowledge

• Domain-specific data mining tools
• Intelligent query answering
• Process control and decision making
Integration of the discovered knowledge with existing knowledge: A knowledge fusion problem
Protection of data security, integrity, and privacy

Data Mining System Architectures

Coupling data mining systems with DB/DW systems

No coupling: flat file processing, not recommended

Loose coupling
Fetching data from DB/DW

Semi-tight coupling: enhanced DM performance

Provide efficient implementations of a few data mining primitives in a DB/DW system, e.g.,
sorting, indexing, aggregation, histogram analysis, multiway join, precomputation of
some stat functions

Tight coupling: a uniform information processing environment

DM is smoothly integrated into a DB/DW system, and mining queries are optimized using
mining query, indexing, and query processing methods, etc.
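The difference between loose and semi-tight coupling lies in where a mining primitive such as aggregation runs. The hypothetical Python sketch below uses the built-in sqlite3 module and an invented baskets table. The loose version fetches every raw row and counts items in the mining program. The semi-tight version pushes the counting into the database with GROUP BY, so only the summary is returned. Both versions produce the same item frequencies.

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE baskets (basket_id INTEGER, item TEXT)")
conn.executemany(
    "INSERT INTO baskets VALUES (?, ?)",
    [(1, "computer"), (1, "software"), (2, "computer"),
     (3, "software"), (3, "computer")],
)


def item_counts_loose(conn):
    """Loose coupling: pull all raw rows out, then aggregate in the mining program."""
    return dict(Counter(item for (item,) in conn.execute("SELECT item FROM baskets")))


def item_counts_semi_tight(conn):
    """Semi-tight coupling: the DBMS runs the aggregation primitive itself."""
    return dict(conn.execute("SELECT item, COUNT(*) FROM baskets GROUP BY item"))


print(item_counts_loose(conn) == item_counts_semi_tight(conn))  # True
```

On a large table, the semi-tight version moves far less data between the database and the mining program.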

Types of Text Data Mining

-Keyword-based association analysis

-Automatic document classification
-Similarity detection
--Cluster documents by a common author
--Cluster documents containing information from a common source
-Link analysis: unusual correlation between entities
-Sequence analysis: predicting a recurring event
-Anomaly detection: find information that violates usual patterns
-Hypertext analysis
--Patterns in anchors/links
---Anchor text correlations with linked objects

Keyword-based association analysis

-Collect sets of keywords or terms that occur frequently together and then find the
association or correlation relationships among them
-First preprocess the text data by parsing, stemming, removing stop words, etc.
-Then invoke association mining algorithms
--Consider each document as a transaction
--View a set of keywords in the document as a set of items in the transaction
-Term level association mining
--No need for human effort in tagging documents
--The number of meaningless results and the execution time are greatly reduced
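These steps can be sketched in a few lines of Python. Each document is preprocessed (lower-casing, dropping stop words, crude stemming), and its keyword set becomes a transaction. The code then counts keyword pairs that co-occur in at least a minimum fraction of documents. The stop-word list, the plural-stripping "stemmer" and the documents are hypothetical simplifications. A full system would use a real stemmer and an association algorithm such as Apriori to find larger itemsets and rules.

```python
import re
from collections import Counter
from itertools import combinations

STOP_WORDS = {"the", "a", "of", "and", "for", "in", "to", "is"}   # hypothetical list


def stem(word):
    """Crude stand-in for a real stemmer: strip a simple plural 's'."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def to_transaction(doc):
    """Parse, remove stop words, stem; the document becomes a set of items."""
    words = re.findall(r"[a-z]+", doc.lower())
    return {stem(w) for w in words if w not in STOP_WORDS}


def frequent_keyword_pairs(docs, min_support):
    transactions = [to_transaction(d) for d in docs]
    pair_counts = Counter()
    for t in transactions:
        pair_counts.update(combinations(sorted(t), 2))
    n = len(transactions)
    return {pair: c / n for pair, c in pair_counts.items() if c / n >= min_support}


DOCS = [
    "Data mining in large databases",
    "Mining the databases for patterns",
    "Data warehouse and OLAP",
    "Mining data in databases",
]
print(frequent_keyword_pairs(DOCS, min_support=0.5))
```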

Mining the World-Wide Web

-The WWW is a huge, widely distributed, global information service center for:
--Information services: news, advertisements, consumer information, financial
management, education, government, e-commerce, etc.
--Hyper-link information
--Access and usage information
-WWW provides rich sources for data mining

Challenges:
-Too huge for effective data warehousing and data mining
-Too complex and heterogeneous: no standards and structure

Web search engines

Index-based: search the Web, index Web pages, and build and store huge keyword-based
Help locate sets of Web pages containing certain keywords

Deficiencies
-A topic of any breadth may easily contain hundreds of thousands of documents
-Many documents that are highly relevant to a topic may not contain keywords defining
them (the synonymy problem)
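
An index-based engine can be sketched as an inverted index that maps each keyword to the pages containing it. The sketch below is a simplified assumption (Boolean AND matching, no ranking, no stemming), and the page ids and texts are hypothetical. It also reproduces the second deficiency listed above: a page about automobiles is missed by a query for "car".

```python
from collections import defaultdict


def build_index(pages):
    """Build a keyword-based inverted index: keyword -> set of page ids."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in text.lower().split():
            index[word].add(page_id)
    return index


def search(index, *keywords):
    """Return the ids of pages that contain every query keyword (Boolean AND)."""
    sets = [index.get(k.lower(), set()) for k in keywords]
    return set.intersection(*sets) if sets else set()


pages = {
    "p1": "cheap used car for sale",
    "p2": "automobile dealer with low prices",
    "p3": "car insurance quotes",
}
index = build_index(pages)
print(sorted(search(index, "car")))         # p2 is missed: it says "automobile", not "car"
print(sorted(search(index, "car", "insurance")))
```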

Web Mining: A more challenging task

Searches for
-Web access patterns
-Web structures
-Regularity and dynamics of Web contents

Problems
-The “abundance” problem: far too many pages match a typical query for a user to examine
-Limited coverage of the Web: hidden Web sources, majority of data in DBMS
-Limited query interface based on keyword-oriented search
-Limited customization to individual users

Mining the Web's Link Structures

Finding authoritative Web pages

-Retrieving pages that are not only relevant, but also of high quality, or authoritative on
the topic
The notion of authority can be inferred from hyperlinks
-The Web consists not only of pages, but also of hyperlinks pointing from one page to
another
-These hyperlinks contain an enormous amount of latent human annotation
-A hyperlink pointing to another Web page can be considered the author's endorsement
of the other page
Problems with the Web linkage structure
-Not every hyperlink represents an endorsement
--Other purposes are navigation or paid advertisements
--If the majority of hyperlinks are for endorsement, the collective opinion will still
dominate
-One authority will seldom have its Web page point to its rival authorities in the same
field
-Authoritative pages are seldom particularly descriptive
Hub: a set of Web pages that provides collections of links to authorities
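
The notes describe authoritative pages and pages that collect links to authorities (hubs) without naming an algorithm. One well-known way to compute both is Kleinberg's HITS algorithm, sketched below under simplifying assumptions: a fixed number of iterations, a small hypothetical link graph, and no construction of a query-specific base set. It is an illustration of the idea, not how any particular search engine works.

```python
import math


def hits(links, iterations=50):
    """Compute hub and authority scores with the HITS iteration.

    links maps each page to the list of pages it points to.
    """
    pages = set(links) | {q for targets in links.values() for q in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority: sum of the hub scores of the pages that link to it.
        auth = {p: 0.0 for p in pages}
        for p, targets in links.items():
            for q in targets:
                auth[q] += hub[p]
        # Hub: sum of the authority scores of the pages it links to.
        hub = {p: sum(auth[q] for q in links.get(p, [])) for p in pages}
        # Normalize so the scores do not grow without bound.
        for scores in (auth, hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for p in scores:
                scores[p] /= norm
    return hub, auth


# Hypothetical graph: h1..h3 are link collections, a1 and a2 are content pages.
links = {
    "h1": ["a1", "a2"],
    "h2": ["a1", "a2"],
    "h3": ["a1"],
}
hub, auth = hits(links)
print(max(auth, key=auth.get))  # page endorsed by the most (and best) hubs
```

A page becomes a good authority by being linked from good hubs, and a good hub by linking to good authorities; the mutual reinforcement is what lets the link structure, rather than page text, signal authority.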


Data Mining Applications

Data mining is a young discipline with wide and diverse applications

-There is still a nontrivial gap between general principles of data mining and domain-
specific, effective data mining tools for particular applications
Some application domains
-Biomedical and DNA data analysis
-Financial data analysis
-Retail industry
-Telecommunication industry

Biomedical Data Mining and DNA Analysis

-DNA sequences: 4 basic building blocks (nucleotides): adenine (A), cytosine (C),
guanine (G), and thymine (T).
-Gene: a sequence of hundreds of individual nucleotides arranged in a particular order
-Humans have roughly 20,000 protein-coding genes (early estimates ran as high as 100,000)
-Tremendous number of ways that the nucleotides can be ordered and sequenced to form
distinct genes
-Semantic integration of heterogeneous, distributed genome databases
--Current: highly distributed, uncontrolled generation and use of a wide variety of DNA
data
--Data cleaning and data integration methods developed in data mining will help
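
To put the "tremendous number of ways" above into numbers: if each of the n positions in a sequence can hold any of the 4 nucleotides (a counting simplification that ignores biological constraints), the number N(n) of distinct sequences of length n grows exponentially. Even a short stretch of 100 nucleotides gives:

```latex
N(n) = 4^{n}, \qquad N(100) = 4^{100} = 2^{200} \approx 1.6 \times 10^{60}
```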

Data Mining for Financial Data Analysis

Financial data collected in banks and financial institutions are often relatively complete,
reliable, and of high quality
Design and construction of data warehouses for multidimensional data analysis and data
mining
-View the debt and revenue changes by month, by region, by sector, and by other factors
-Access statistical information such as max, min, total, average, trend, etc.
Loan payment prediction/consumer credit policy analysis
-feature selection and attribute relevance ranking
-Loan payment performance
-Consumer credit rating
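
Viewing figures "by month, by region, by sector" with max, min, total and average is a group-by aggregation over chosen dimensions. The sketch below shows that roll-up in plain Python; the records, the `summarize` function and the dimension names are hypothetical, and a real warehouse would run the equivalent query in SQL or an OLAP engine.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (month, region, sector, revenue).
records = [
    ("2024-01", "North", "Retail", 120.0),
    ("2024-01", "North", "Energy", 80.0),
    ("2024-01", "South", "Retail", 95.0),
    ("2024-02", "North", "Retail", 130.0),
    ("2024-02", "South", "Energy", 70.0),
    ("2024-02", "South", "Retail", 105.0),
]

# Position of each dimension inside a record.
DIMENSIONS = {"month": 0, "region": 1, "sector": 2}


def summarize(records, *dims):
    """Group revenue by the chosen dimensions and report max, min, total, average."""
    groups = defaultdict(list)
    for rec in records:
        key = tuple(rec[DIMENSIONS[d]] for d in dims)
        groups[key].append(rec[3])
    return {
        key: {"max": max(v), "min": min(v), "total": sum(v), "avg": mean(v)}
        for key, v in sorted(groups.items())
    }


for key, stats in summarize(records, "month", "region").items():
    print(key, stats)
```

Changing the dimensions passed to `summarize` (for example only `"region"`, or `"sector", "month"`) gives another view of the same data, which is the multidimensional idea in miniature.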

Data Mining for Retail Industry

Retail industry: huge amounts of data on sales, customer shopping history, etc.

Applications of retail data mining

-Identify customer buying behaviors

-Discover customer shopping patterns and trends
-Improve the quality of customer service
-Achieve better customer retention and satisfaction
-Enhance goods consumption ratios
-Design more effective goods transportation and distribution policies

Data Mining for Telecomm. Industry

A rapidly expanding and highly competitive industry with a great demand for data mining
-Understand the business involved
-Identify telecommunication patterns
-Catch fraudulent activities
-Make better use of resources
-Improve the quality of service
Multidimensional analysis of telecommunication data
-Intrinsically multidimensional: calling-time, duration, location of caller, location of
callee, type of call, etc.
Fraudulent pattern analysis and the identification of unusual patterns
-Identify potentially fraudulent users and their atypical usage patterns
-Detect attempts to gain fraudulent entry to customer accounts
-Discover unusual patterns which may need special attention
Multidimensional association and sequential pattern analysis
-Find usage patterns for a set of communication services by customer group, by month, etc.
-Promote the sales of specific services
-Improve the availability of particular services in a region
Use of visualization tools in telecommunication data analysis
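
A very simple way to "identify potentially fraudulent users and their atypical usage patterns" is to flag users whose usage lies far from the population average. The sketch below uses a z-score rule; the threshold of 2 standard deviations, the user ids and the minute counts are all hypothetical, and production fraud systems use far richer features and models.

```python
from statistics import mean, pstdev


def flag_unusual_users(daily_minutes, threshold=2.0):
    """Return users whose usage deviates from the population mean by more than
    `threshold` population standard deviations (a simple z-score rule)."""
    values = list(daily_minutes.values())
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []  # everyone behaves identically: nothing stands out
    return sorted(
        user for user, minutes in daily_minutes.items()
        if abs(minutes - mu) / sigma > threshold
    )


# Hypothetical call minutes per user for one day.
usage = {
    "u1": 25, "u2": 30, "u3": 35, "u4": 28, "u5": 32,
    "u6": 27, "u7": 33, "u8": 31, "u9": 29, "u10": 400,
}
print(flag_unusual_users(usage))
```

One design caveat: an extreme outlier inflates both the mean and the standard deviation it is measured against, so robust statistics (such as the median and median absolute deviation) are often preferred in practice.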

How to choose a data mining system

Commercial data mining systems have little in common
-Different data mining functionality or methodology
-May even work with completely different kinds of data sets
A multidimensional view is needed in selection
Data types: relational, transactional, text, time sequence, spatial?
System issues
-running on only one or on several operating systems?
-a client/server architecture?
-Provide Web-based interfaces and allow XML data as input and/or output?

Data sources
-ASCII text files, multiple relational data sources
-support ODBC connections (OLE DB, JDBC)?
Data mining functions and methodologies
-One vs. multiple data mining functions
-One vs. variety of methods per function
--More data mining functions and methods per function provide the user with greater
flexibility and analysis power
Coupling with DB and/or data warehouse systems
-Four forms of coupling: no coupling, loose coupling, semitight coupling, and tight
coupling
--Ideally, a data mining system should be tightly coupled with a database system
Scalability
-Row (or database size) scalability
-Column (or dimension) scalability
-Curse of dimensionality: it is much more challenging to make a system column scalable
than row scalable
Visualization tools
-“A picture is worth a thousand words”
-Visualization categories: data visualization, mining result visualization, mining process
visualization, and visual data mining
Data mining query language and graphical user interface
-Easy-to-use and high-quality graphical user interface
-Essential for user-guided, highly interactive data mining
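
One reason column scalability is harder is that distances lose their meaning as dimensions grow: the nearest and farthest points become almost equally far away. The sketch below measures this "relative contrast" on uniformly random data; the function name, point count and uniform-data assumption are illustrative choices, not part of any particular system.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """(farthest - nearest) / nearest distance from a random query point to
    n_points uniform random points in the unit hypercube of the given dimension."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = [
        math.dist(query, [rng.random() for _ in range(dim)])
        for _ in range(n_points)
    ]
    return (max(dists) - min(dists)) / min(dists)


for dim in (2, 10, 100, 1000):
    print(dim, round(relative_contrast(dim), 3))
```

As the dimension rises the contrast shrinks toward zero, so distance-based methods (nearest-neighbour search, clustering) struggle to separate near from far; adding rows does not cause this effect, adding columns does.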

Examples of Data Mining Systems

IBM Intelligent Miner

-A wide range of data mining algorithms
-Scalable mining algorithms
-Toolkits: neural network algorithms, statistical methods, data preparation, and data
visualization tools
-Tight integration with IBM's DB2 relational database system

SAS Enterprise Miner

-A variety of statistical analysis tools
-Data warehouse tools and multiple data mining algorithms
Microsoft SQL Server 2000
-Integrate DB and OLAP with mining
-Support the OLE DB for DM standard
SGI MineSet
-Multiple data mining algorithms and advanced statistics
-Advanced visualization tools

Clementine (SPSS)
-An integrated data mining development environment for end-users and developers
-Multiple data mining algorithms and visualization tools

DBMiner (DBMiner Technology Inc.)

-Multiple data mining modules: discovery-driven OLAP analysis, association,
classification, and clustering
-Efficient association and sequential-pattern mining functions, and visual classification
-Mining both relational databases and data warehouses

Visual Data Mining

Visualization: use of computer graphics to create visual images which aid in the
understanding of complex, often massive representations of data
Visual Data Mining: the process of discovering implicit but useful knowledge from
large data sets using visualization techniques
Purpose of Visualization
-Gain insight into an information space by mapping data onto graphical primitives
-Provide qualitative overview of large data sets
-Search for patterns, trends, structure, irregularities, relationships among data.
-Help find interesting regions and suitable parameters for further quantitative analysis.
-Provide a visual proof of computer representations derived

Visual Data Mining & Data Visualization

Integration of visualization and data mining
-data visualization
-data mining result visualization
-data mining process visualization
-interactive visual data mining
Data visualization
-Data in a database or data warehouse can be viewed
--at different levels of granularity or abstraction
--as different combinations of attributes or dimensions
-Data can be presented in various visual forms

Audio Data Mining

-Uses audio signals to indicate the patterns of data or the features of data mining results
-An interesting alternative to visual mining
-An inverse task of mining audio (such as music) databases which is to find patterns from
audio data
-Visual data mining may disclose interesting patterns using graphical displays, but
requires users to concentrate on watching patterns
-Instead, transform patterns into sound and music and listen to pitches, rhythms, tune, and
melody in order to identify anything interesting or unusual
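
Turning patterns into sound (sonification) can be as simple as mapping each data value to a pitch, so an unusual value is heard as an unusually high or low note. The sketch below maps values linearly onto MIDI note numbers and converts them to equal-temperament frequencies; the note range (48 to 84), the function names and the sample series are assumptions, and the code only computes pitches rather than producing audio.

```python
def value_to_midi(value, lo, hi, low_note=48, high_note=84):
    """Linearly map a data value in [lo, hi] onto a MIDI note number."""
    if hi == lo:
        return low_note
    fraction = (value - lo) / (hi - lo)
    return round(low_note + fraction * (high_note - low_note))


def midi_to_hz(note):
    """Equal-temperament frequency of a MIDI note (A4 = note 69 = 440 Hz)."""
    return 440.0 * 2 ** ((note - 69) / 12)


def sonify(series):
    """Turn a numeric series into (note, frequency in Hz) pairs, one per value."""
    lo, hi = min(series), max(series)
    notes = [value_to_midi(v, lo, hi) for v in series]
    return [(n, round(midi_to_hz(n), 2)) for n in notes]


sales = [10, 12, 11, 13, 12, 40, 12]  # hypothetical daily sales with one spike
for note, hz in sonify(sales):
    print(note, hz)
```

Played in sequence, the steady values form a low, narrow melody and the spike jumps three octaves, which a listener notices without watching a chart.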

Scientific and Statistical Data Mining

There are many well-established statistical techniques for data analysis, particularly for
numeric data
-applied extensively to data from scientific experiments and data from economics and the
social sciences
-predict the value of a response (dependent) variable from one or more predictor
(independent) variables where the variables are numeric
-forms of regression: linear, multiple, weighted, polynomial, nonparametric, and robust
Generalized linear models
-allow a categorical response variable (or some transformation of it) to be related to a set
of predictor variables
-similar to the modeling of a numeric response variable using linear regression
-include logistic regression and Poisson regression
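
Predicting a numeric response from one numeric predictor is the simplest case of the regression described above. The sketch below fits y = a + b*x by ordinary least squares; the function name and the sample data (for example, advertising spend versus sales) are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x with one numeric predictor.
    Returns (intercept a, slope b)."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("predictor values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


spend = [1, 2, 3, 4, 5]    # hypothetical predictor (independent variable)
sales = [3, 5, 7, 9, 11]   # hypothetical response (dependent variable)
print(fit_line(spend, sales))
```

On Python 3.10 and later, `statistics.linear_regression` computes the same simple fit; multiple, weighted or robust regression and generalized linear models such as logistic regression need dedicated libraries.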

Regression trees
-Binary trees used for classification and prediction
-Similar to decision trees: tests are performed at the internal nodes
-Difference is at the leaf level
--In a decision tree a majority voting is performed to assign a class label to the leaf
--In a regression tree the mean of the objective attribute is computed and used as the
predicted value
Analysis of variance
-Analyze experimental data for two or more populations described by a numeric response
variable and one or more categorical variables (factors)
Mixed-effect models
-For analyzing grouped data, i.e., data that can be classified according to one or more
grouping variables
-Typically describe relationships between a response variable and some covariates in data
grouped according to one or more factors
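
The difference between regression-tree and decision-tree leaves can be shown with a one-split tree (a "stump"). The sketch below picks the threshold test for the internal node that minimizes squared error and predicts the mean on each side, next to a decision-tree leaf that takes a majority vote. All function names and data are hypothetical, and a real tree would split recursively on many attributes.

```python
from collections import Counter
from statistics import mean


def best_split(xs, ys):
    """Choose the threshold on one numeric attribute that minimizes the summed
    squared error when each side predicts the mean of its target values."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        ml, mr = mean(left), mean(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, ml, mr)
    if best is None:
        raise ValueError("need at least two distinct attribute values")
    return best[1:]  # (threshold, left-leaf prediction, right-leaf prediction)


def regression_leaf(targets):
    """Regression tree leaf: predict the mean of the objective attribute."""
    return mean(targets)


def decision_leaf(labels):
    """Decision tree leaf: assign the majority class label."""
    return Counter(labels).most_common(1)[0][0]


xs = [1, 2, 3, 10, 11, 12]
ys = [5, 6, 7, 20, 21, 22]
print(best_split(xs, ys))
print(decision_leaf(["buy", "buy", "skip"]))
```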

Factor analysis
-determine which variables are combined to generate a given factor
-e.g., for much psychiatric data, the factor of interest cannot be measured directly, but
one can measure other quantities (such as test scores) that reflect it
Discriminant analysis
-predict a categorical response variable, commonly used in social science
-Attempts to determine several discriminant functions (linear combinations of the
independent variables) that discriminate among the groups defined by the response
variable
Time series: many methods such as autoregression, ARIMA (Autoregressive integrated
moving-average modeling), long memory time-series modeling
Survival analysis
-predict the probability that a patient undergoing a medical treatment would survive at
least to time t (life span prediction)
Quality control
-display group summary charts
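
Autoregression, the first time-series method listed above, predicts each value from earlier ones. The sketch below fits the simplest case, a zero-mean AR(1) model x_t = phi * x_{t-1} + noise, by least squares and iterates it forward. The zero-mean assumption (subtract the series mean first for real data) and the function names are assumptions for illustration.

```python
def fit_ar1(series):
    """Least-squares estimate of phi in the zero-mean AR(1) model
    x_t = phi * x_{t-1} + noise."""
    prev, curr = series[:-1], series[1:]
    denom = sum(p * p for p in prev)
    if denom == 0:
        raise ValueError("series is constant at zero")
    return sum(p * c for p, c in zip(prev, curr)) / denom


def forecast_ar1(last_value, phi, steps):
    """Iterate the fitted model forward to produce point forecasts."""
    out = []
    for _ in range(steps):
        last_value = phi * last_value
        out.append(last_value)
    return out


deviations = [64, 32, 16, 8, 4, 2, 1]  # hypothetical deviations from a long-run mean
phi = fit_ar1(deviations)
print(phi, forecast_ar1(deviations[-1], phi, 3))
```

ARIMA extends this idea with differencing and moving-average terms; libraries such as statsmodels (`statsmodels.tsa.arima.model.ARIMA`) provide fuller implementations.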

Understanding Knowledge

• Knowledge can be defined as the “understanding obtained through the process of
experience or appropriate study.”
• Knowledge can also be an accumulation of facts, procedural rules, or heuristics.
o A fact is generally a statement representing truth about a subject matter or
domain.
o A procedural rule is a rule that describes a sequence of actions.
o A heuristic is a rule of thumb based on years of experience.
• Intelligence implies the capability to acquire and apply appropriate knowledge.
o Memory indicates the ability to store and retrieve relevant experience
at will.
o Learning represents the skill of acquiring knowledge through instruction or
study.
• Experience relates to the understanding that we develop through our past actions.
• Knowledge can develop over time through successful experience, and experience
can lead to expertise.
• Common sense refers to the natural and mostly unreflective opinions of humans.

Cognitive Psychology

• Cognitive psychology tries to identify the cognitive structures and processes that
closely relate to skilled performance within an area of operation.
• It provides a strong background for understanding knowledge and expertise.
• In general, it is the interdisciplinary study of human intelligence.
• The two major components of cognitive psychology are:
o Experimental Psychology: This studies the cognitive processes that
constitute human intelligence.
o Artificial Intelligence (AI): This studies the cognition of computer-based
intelligent systems.
• The process of eliciting and representing experts' knowledge usually involves a
knowledge developer and some human experts (domain experts).
• In order to gather the knowledge from human experts, the developer usually
interviews the experts and asks for information regarding a specific area of
expertise.
• It is almost impossible for humans to provide completely accurate reports of
their mental processes.
• Research in the area of cognitive psychology leads to a better understanding
of what constitutes knowledge, how knowledge is elicited, and how it should be
represented in a corporate knowledge base.
• Hence, cognitive psychology contributes a great deal to the area of knowledge
management.

Data, Information and Knowledge

• Data represents unorganized and unprocessed facts.

o Usually data is static in nature.
o It can represent a set of discrete facts about events.
o Data is a prerequisite to information.
o An organization sometimes has to decide on the nature and volume of data
that is required for creating the necessary information.
• Information
o Information can be considered as an aggregation of data (processed data)
which makes decision making easier.
o Information usually has some meaning and purpose.
• Knowledge
o By knowledge we mean human understanding of a subject matter that has
been acquired through proper study and experience.
o Knowledge is usually based on learning, thinking, and proper
understanding of the problem area.
o Knowledge is not information and information is not data.
o Knowledge is derived from information in the same way information is
derived from data.
o We can view it as an understanding of information based on its perceived
importance or relevance to a problem area.
o It can be considered as the integration of human perceptive processes that
help humans draw meaningful conclusions.
Figure 1.1: Data, Information, Knowledge and Wisdom
Kinds of Knowledge

• Deep Knowledge: Knowledge acquired through years of proper experience.

• Shallow Knowledge: Minimal understanding of the problem area.
• Knowledge as Know-How: Accumulated lessons of practical experience.
• Reasoning and Heuristics: Some of the ways in which humans reason are as
follows:
o Reasoning by analogy: This indicates relating one concept to another.
o Formal Reasoning: This indicates reasoning by using deductive (exact) or
inductive reasoning.
- Deduction uses major and minor premises.
- In deductive reasoning, new knowledge is generated by
using previously specified knowledge.
- Inductive reasoning implies reasoning from a set of facts to a
general conclusion.
- Inductive reasoning is the basis of scientific discovery.
- A case is knowledge associated with an operational level.
• Common Sense: This implies a type of knowledge that almost every human being
possesses in varying forms/amounts.
• We can also classify knowledge on the basis of whether it is procedural,
declarative, semantic, or episodic.
o Procedural knowledge represents the understanding of how to carry out a
specific procedure.
o Declarative knowledge is routine knowledge about which the expert is
conscious. It is shallow knowledge that can be readily recalled since it
consists of simple and uncomplicated information. This type of knowledge
often resides in short-term memory.
o Semantic knowledge is highly organized, “chunked” knowledge that
resides mainly in long-term memory. Semantic knowledge can include
major concepts, vocabulary, facts, and relationships.
o Episodic knowledge represents the knowledge based on episodes
(experiential information). Each episode is usually “chunked” in long-term
memory.
• Another way of classifying knowledge is to determine whether it is tacit or explicit:
o Tacit knowledge usually gets embedded in the human mind through experience.
o Explicit knowledge is that which is codified and digitized in documents,
books, reports, spreadsheets, memos etc.

Expert Knowledge

It is the information woven inside the mind of an expert for accurately and quickly
solving complex problems.

• Knowledge Chunking
o Knowledge is usually stored in an expert's long-range memory as chunks.
o Knowledge chunking helps experts to optimize their memory capacity and
enables them to process the information quickly.
o Chunks are groups of ideas that are stored and recalled together as a unit.
• Knowledge as an Attribute of Expertise
o In most areas of specialization, insight and knowledge accumulate
quickly, and the criteria for expert performance usually undergo
continuous change.
o In order to become an expert in a particular area, one is expected to master
the necessary knowledge and make significant contributions to the
concerned field.
o The unique performance of a true expert can be easily noticed in the
quality of decision making.
o True (knowledgeable) experts are usually more selective about the
information they acquire, and they are also better able to acquire
information in less structured situations.
o They can quantify soft information, and can categorize problems on the
basis of solution procedures that are embedded in the expert's long-range
memory and readily available on recall.
o Hence, they tend to use knowledge-based decision strategies starting with
known quantities to deduce unknowns.
o If a first-cut solution path fails, then the expert can trace back a few steps
and then proceed again.
o Nonexperts use means-end decision strategies to approach the problem.
o Nonexperts usually focus on goals rather than on essential features of
the task, which makes the task more time-consuming and sometimes
unreliable.
o Specific individuals are found to consistently perform at higher levels than
others and they are labeled as experts.

Thinking and Learning in Humans

• Research in the area of artificial intelligence has introduced more structure into
human thinking about thinking.
• Humans do not necessarily receive and process information in exactly the same
way as the machines do.
• Humans can receive information via seeing, smelling, touching, hearing (sensing)
etc., which promotes a way of thinking and learning that is unique to humans.
• On a macro level, humans and computers can receive inputs from a multitude of
sources.
• Computers can receive inputs from keyboards, touch screens etc.
• On a micro level, both the human brain and the CPU of a computer receive information as
electrical impulses.
• The point to note here is that the computers must be programmed to do specific
tasks. Performing one task does not necessarily carry over to other tasks as it
may with humans.
• Human learning: Humans learn new facts, integrate them in some way which they
think is relevant and organize the result to produce necessary solution, advice and
decision. Human learning can occur in the following ways:
o Learning through Experience.
o Learning by Example.
o Learning by Discovery.

Challenges in KM Systems Development

• Changing Organizational Culture:

o Involves changing people's attitudes and behaviours.
• Knowledge Evaluation:
o Involves assessing the worth of information.
• Knowledge Processing:
o Involves the identification of techniques to acquire, store, process and
distribute information.
o Sometimes it is necessary to document how certain decisions were reached.
• Knowledge Implementation:
o An organization should commit to change, learn, and innovate.
o It is important to extract meaning from information that may have an
impact on specific missions.
o Lessons learned from feedback can be stored for future to help others
facing similar problems.

Capturing Knowledge

• Capturing knowledge involves extracting, analyzing and interpreting the
concerned knowledge that a human expert uses to solve a specific problem.
• Explicit knowledge is usually captured in repositories from appropriate
documentation, files etc.
• Tacit knowledge is usually captured from experts, and from the organization's stored
databases.
• Interviewing is one of the most popular methods used to capture knowledge.
• Data mining is also useful in terms of using intelligent agents that may analyze
the data warehouse and come up with new findings.
• In KM systems development, the knowledge developer acquires the necessary
heuristic knowledge from the experts for building the appropriate knowledge
base.
• Knowledge capture and knowledge transfer are often carried out through teams
(refer to Figure 2.4).
• Knowledge capture includes determining feasibility, choosing the appropriate
expert, tapping the expert's knowledge, retapping knowledge to plug the gaps in
the system, and verifying/validating the knowledge base (refer to Table 3.4 on page 76
of your textbook).

Figure 2.4: Matching business strategies with KM strategies

The Role of Rapid Prototyping

• In most cases, knowledge developers use an iterative approach for capturing
knowledge.
• For example, the knowledge developer may start with a prototype (based on the
somewhat limited knowledge captured from the expert during the first few
sessions).
• The following can turn the approach into rapid prototyping:
o Knowledge developer explains the preliminary/fundamental procedure
based on rudimentary knowledge extracted from the expert during the past few
sessions.
o The expert reacts with comments.
o While the expert watches, the knowledge developer enters the additional
knowledge into the computer-based system (that represents the prototype).
o The knowledge developer again runs the modified prototype and continues
adding knowledge as suggested by the expert until the expert is
satisfied.
• This spontaneous, iterative process of building a knowledge base is referred to
as rapid prototyping.

The Role of the Knowledge Developer

• The knowledge developer can be considered as the architect of the system.

• He/she identifies the problem domain, captures knowledge, writes/tests the
heuristics that represent knowledge, and co-ordinates the entire project.
• Some necessary attributes of knowledge developer:
o Communication skills.
o Knowledge of knowledge capture tools/technology.
o Ability to work in a team with professional/experts.
o Tolerance for ambiguity.
o Ability to think conceptually.
o Ability to frequently interact with the champion, knowledge workers and
knowers in the organization.
Figure 2.5: Knowledge Developer's Role

Designing the KM Blueprint

This phase indicates the beginning of designing the IT infrastructure/Knowledge
Management infrastructure. The KM Blueprint (KM system design) addresses a number
of issues:

• Aiming for system interoperability/scalability with the existing IT infrastructure of
the organization.
• Finalizing the scope of the proposed KM system.
• Deciding about the necessary system components.
• Developing the key layers of the KM architecture to meet organization's
requirements. These layers are:
o User interface
o Authentication/security layer
o Collaborative agents and filtering
o Application layer
o Transport internet layer
o Physical layer
o Repositories

Knowledge Generation

• Knowledge update can mean creating new knowledge based on ongoing
experience in a specific domain and then using the new knowledge in
combination with the existing knowledge to come up with updated knowledge for
knowledge sharing.
• Knowledge can be created through teamwork (refer to Figure 3.1)
• A team can commit to perform a job over a specific period of time.
• A job can be regarded as a series of specific tasks carried out in a specific order.
• When the job is completed, then the team compares the experience it had initially
(while starting the job) to the outcome (successful/disappointing).
• This comparison translates experience into knowledge.
• While performing the same job in future, the team can take corrective steps and/or
modify the actions based on the new knowledge they have acquired.
• Over time, experience usually leads to expertise where one team (or individual)
can be known for handling a complex problem very well.
• This knowledge can be transferred to others in a reusable format.

Figure 3.1: Knowledge Creation/Knowledge Sharing via Teams

• There exist factors that encourage (or retard) knowledge transfer.
• Personality is one factor in knowledge sharing.
• For example, extroverted people usually possess self-confidence, feel secure, and
tend to share experiences more readily than introverted, self-centered, and
security-conscious people.
• People with positive attitudes, who usually trust others and who work in
environments conducive to knowledge sharing, tend to be better at sharing
knowledge.
• Vocational reinforcers are the key to knowledge sharing.
• People whose vocational needs are sufficiently met by job reinforcers are usually
found to be more likely to favour knowledge sharing than the people who are
deprived of one or more reinforcers.

Figure 3.2: Impediments to Knowledge Sharing

Capturing the Tacit Knowledge

• Knowledge capture can be defined as the process by which the expert's
thoughts and experiences can be captured.
• In this case, the knowledge developer collaborates with the expert in order to
convert the expertise into the necessary program code(s).
• Important steps:
o Using appropriate tools for eliciting information.
o Interpreting the elicited information and consequently inferring the expert's
underlying knowledge/reasoning process.
o Finally, using the interpretation to construct the necessary rules which
can represent the expert's reasoning process.
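
Rules that represent an expert's reasoning are often written as IF-THEN statements and executed by an inference engine. The notes do not fix an inference method; the sketch below uses forward chaining, one common choice, and the car-diagnosis rules are purely hypothetical stand-ins for knowledge captured from an expert.

```python
# Each rule: (set of conditions that must all hold, conclusion to add).
# Hypothetical rules, standing in for knowledge captured from an expert.
RULES = [
    ({"engine cranks", "engine does not start"}, "suspect fuel or ignition"),
    ({"suspect fuel or ignition", "fuel gauge reads empty"}, "advise refuelling"),
]


def forward_chain(facts, rules):
    """Repeatedly fire every rule whose conditions are all known facts,
    adding its conclusion, until no new facts can be derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conditions <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known


observed = {"engine cranks", "engine does not start", "fuel gauge reads empty"}
print(sorted(forward_chain(observed, RULES) - observed))
```

Because each rule is a separate, readable statement, the expert can check individual rules during a capture session, which is what makes rule-based prototypes convenient for verification.
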
Fuzzy Reasoning & Quality of Knowledge Capture

• Sometimes, the information gathered from the experts via interviewing is not
precise and it involves fuzziness and uncertainty.
• The fuzziness may increase the difficulty of translating the expert's notions into
applicable rules.
• Analogies/Uncertainties:
o In the course of explaining events, experts can use analogies (comparing a
problem with a similar problem which has been encountered in possibly
different settings, months or years ago).
o An expert's knowledge or expertise represents the ability to gather
uncertain information as input and to use a plausible line of reasoning to
clarify the fuzzy details.
o Belief, an aspect of uncertainty, tends to describe the level of credibility.
o People may use different kinds of words in order to express belief.
o These words are often paired with qualifiers such as highly, extremely.
• Understanding experience:
o Knowledge developers can benefit from their understanding/knowledge of
cognitive psychology.
o When a question is asked, then an expert operates on certain stored
information through deductive, inductive, or other kinds of problem-
solving methods.
o The resulting answer is often found to be the culmination of the processing
of stored information.
o The right question usually evokes the memory of experiences that
produced good and appropriate solutions in the past.

Sometimes, how quickly an expert responds to a question depends on the
clarity of content, whether the content has been recently used, and how
well the expert has understood the question.

• Problem with the language: How well the expert can represent internal processes
can vary with their command of the language they are using and the knowledge
developer's interviewing skills.

The language may be unclear in the following ways:

o Comparative words (e.g., better, faster) are sometimes left hanging.

o Specific words or components may be left out of an explanation.
o Absolute words and phrases may be used loosely.
o Some words always seem to have a built-in ambiguity
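
Belief words paired with qualifiers such as "highly" or "extremely" can be turned into applicable rules with fuzzy logic. One common convention treats qualifiers as Zadeh-style linguistic hedges that raise a membership degree to a power. In the sketch below, the membership function for "likely", its breakpoints (0.3 and 0.9) and the exponent assigned to each qualifier are all assumptions chosen for illustration.

```python
def likely(evidence):
    """Hypothetical membership function for the fuzzy set "likely":
    0 at or below 0.3, 1 at or above 0.9, rising linearly in between."""
    if evidence <= 0.3:
        return 0.0
    if evidence >= 0.9:
        return 1.0
    return (evidence - 0.3) / 0.6


# Qualifiers as hedges; the exponents are a common convention, assumed here.
HEDGES = {
    "somewhat": lambda mu: mu ** 0.5,  # dilation: easier to satisfy
    "very": lambda mu: mu ** 2,        # concentration: harder to satisfy
    "highly": lambda mu: mu ** 2,      # treated like "very" in this sketch
    "extremely": lambda mu: mu ** 3,
}


def belief(evidence, qualifier=None):
    """Degree to which "<qualifier> likely" holds for a given evidence level."""
    mu = likely(evidence)
    return HEDGES[qualifier](mu) if qualifier else mu


for q in (None, "somewhat", "very", "extremely"):
    print(q or "(plain)", round(belief(0.6, q), 3))
```

For the same evidence, "extremely likely" holds to a lower degree than "likely", so the qualifier makes the expert's statement stricter in a precise, testable way.
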
Interviewing as a Tacit Knowledge Capture Tool

• Advantages of using interviewing as a tacit knowledge capture tool:
o It is a flexible tool.
o It is excellent for evaluating the validity of information.
o It is very effective in eliciting information about complex
issues.
o Often people enjoy being interviewed.
• Interviews can range from the highly unstructured type to highly structured type.
o The unstructured types are difficult to conduct, and they are used in the
case when the knowledge developer really needs to explore an issue.
o The structured types are found to be goal-oriented, and they are used in the
case when the knowledge developer needs specific information.
o Structured questions can be of the following types:
- Multiple-choice questions.
- Dichotomous questions.
- Ranking scale questions.
o In semistructured types, the knowledge developer asks predefined
questions, but he/she allows the expert some freedom in expressing his/her
answers.
• Guidelines for successful interviewing:
o Setting the stage and establishing rapport.
o Phrasing questions.
o Listening closely/avoiding arguments.
o Evaluating the session outcomes.
• Reliability of the information gathered from experts:
Some uncontrolled sources of error that can reduce the information's reliability:
o Expert's perceptual slant.
o The expert's failure to remember exactly what has happened.
o Fear of the unknown on the part of the expert.
o Problems with communication.
o Role bias.
• Errors in part of the knowledge developer: validity problems are often caused by
the interviewer effect (something about the knowledge developer colours the
response of the expert). Some of the effects can be as follows:
o Gender effect
o Age effect
o Race effect
• Problems encountered during interviewing
o Response bias.
o Inconsistency.
o Problem with communication.
o Hostile attitude.
o Standardizing the questions.
o Setting the length of the interview.
• Process of ending the interview:
o The end of the session should be carefully planned.
o One procedure calls for the knowledge developer to halt the questioning a
few minutes before the scheduled ending time, and to summarize the key
points of the session.
o This allows the expert to comment and schedule a future session.
o Many verbal/nonverbal cues can be used for ending the interview.
• Issues: Many issues may arise during the interview, and to be prepared for the
most important ones, the knowledge developer can consider the following questions:
o How would it be possible to elicit knowledge from experts who
cannot say what they mean or cannot mean what they say?
o How to set up the problem domain.
o How to deal with uncertain reasoning processes.
o How to deal with the situation of difficult relationships with expert(s).
o How to deal with the situation when the expert does not like the
knowledge developer for some reason.
• Rapid Prototyping in interviews:
o Rapid prototyping is an approach to building KM systems, in which
knowledge is added with each knowledge capture session.
o This is an iterative approach which allows the expert to verify the rules as
they are built during the session.
o This approach can open up communication through its demonstration of
the KM system.
o Due to the process of instant feedback and modification, it reduces the risk
of failure.
o It allows the knowledge developer to learn each time a change is
incorporated in the prototype.
o This approach is highly interactive.
o The prototype can create user expectations which in turn can become
obstacles to further development effort.

Some Knowledge Capturing Techniques

On-Site Observation (Action Protocol)

• It is a process which involves observing, recording, and interpreting the expert's
problem-solving process while it takes place.
• The knowledge developer does more listening than talking; avoids giving advice
and usually does not pass his/her own judgment on what is being observed, even
if it seems incorrect; and most of all, does not argue with the expert while the
expert is performing the task.
• Compared to the process of interviewing, on-site observation brings the
knowledge developer closer to the actual steps, techniques, and procedures used
by the expert.
• One disadvantage is that some experts do not like the idea of being observed.
• The reaction of other people (in the observation setting) can also be a problem
causing distraction.
• Another disadvantage is the accuracy/completeness of the captured knowledge.


Brainstorming

• It is an unstructured approach towards generating ideas about creative solutions to
a problem, which involves multiple experts in a session.
• In this case, questions can be raised for clarification, but no evaluations are done
on the spot.
• Similarities (that emerge through opinions) are usually grouped together logically
and evaluated by asking questions such as:
o What benefits are to be gained if a particular idea is followed?
o What specific problems can that idea possibly solve?
o What new problems can arise through this?

The general procedure for conducting a brainstorming session:

o Introducing the session.
o Presenting the problem to the experts.
o Prompting the experts to generate ideas.
o Looking for signs of possible convergence.
• If the experts are unable to agree on a specific solution, the knowledge developer
may call for a vote/consensus.

Electronic Brainstorming

• It is a computer-aided approach for dealing with multiple experts.
• It usually begins with a pre-session plan which identifies objectives and structures
the agenda, which is then presented to the experts for approval.
• During the session, each expert sits at a PC, engages in a predefined approach
towards resolving an issue, and then generates ideas.
• This allows experts to present their opinions through their PCs without having to
wait for their turn.
• Usually the comments/suggestions are displayed electronically on a large screen
without identifying the source.
• This approach protects introverted experts and prevents tagging comments to
individual experts.
• The benefits include improved communication, effective discussion of
sensitive issues, and closing the meeting with concise recommendations for
necessary action (refer to Figure 5.1 for the sequence of steps).
• This eventually leads to convergence of ideas and helps to set final specifications.
• The result is usually the joint ownership of the solution.

Protocol Analysis (Think-Aloud Method)

• In this case, protocols (scenarios) are collected by asking experts to solve a
specific problem and verbalize their decision process by stating directly what they
are thinking.
• Knowledge developers do not interrupt in the interim.
• The elicited information is structured later when the knowledge developer
analyzes the protocol.
• Here the term scenario refers to a detailed and somewhat complex sequence of
events or, more precisely, an episode.
• A scenario can involve individuals and objects.

A scenario provides a concrete vision of how some specific human activity can be
supported by information technology.

Consensus Decision Making

• Consensus decision making usually follows brainstorming.

• It is effective if and only if each expert has been provided with equal and
adequate opportunity to present their views.
• In order to arrive at a consensus, the knowledge developer conducting the
exercise tries to rally the experts towards one or two alternatives.
• The knowledge developer follows a procedure designed to ensure fairness and
• This method is democratic in nature.
• This method can sometimes be tedious and can take hours.

Repertory Grid

• This is a tool used for knowledge capture.
• The domain expert classifies and categorizes a problem domain using his/her own
model.
• The grid is used for capturing and evaluating the expert's model.
• Two experts (in the same problem domain) may produce distinct sets of personal
and subjective results.
• The grid is a scale (or a bipolar construct) on which elements can be placed within
gradations.
• The knowledge developer usually elicits the constructs and then asks the domain
expert to provide a set of examples called elements.
• Each element is rated according to the constructs which have been provided.
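
To make rating elements on bipolar constructs concrete, here is a minimal Python sketch. The constructs, elements, and 1-to-5 ratings are hypothetical, and comparing elements by city-block distance is an assumed, simple way of reading the grid (not a prescribed step of the technique) to see which elements the expert's own ratings place close together.

```python
from itertools import combinations

# Hypothetical bipolar constructs: a rating of 1 means the left pole applies,
# 5 means the right pole applies.
CONSTRUCTS = [("cheap", "expensive"), ("easy to use", "hard to use"), ("slow", "fast")]

# Hypothetical elements supplied by the expert, rated on every construct in order.
GRID = {
    "laptop A": [4, 2, 5],
    "laptop B": [4, 3, 4],
    "laptop C": [1, 4, 2],
}


def distance(a: list[int], b: list[int]) -> int:
    """City-block distance between two elements' ratings (0 means rated identically)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def most_similar_pair(grid: dict[str, list[int]]) -> tuple[str, str]:
    """The two elements that the expert's ratings place closest together."""
    return min(combinations(grid, 2), key=lambda pair: distance(grid[pair[0]], grid[pair[1]]))


if __name__ == "__main__":
    # Print the grid one construct (row) at a time.
    for (left, right), ratings in zip(CONSTRUCTS, zip(*GRID.values())):
        print(f"{left} (1) .. {right} (5):", dict(zip(GRID, ratings)))
    print("Most similar:", most_similar_pair(GRID))
```

Because each expert supplies his/her own constructs and ratings, two experts would generally produce different grids, which is exactly the personal, subjective result described above.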

Nominal Group Technique (NGT)

• This provides an interface between consensus and brainstorming.

• Here the panel of experts becomes a Nominal Group whose meetings are
structured in order to effectively pool individual judgment.
• Ideawriting is a structured group approach used for developing ideas as well as
exploring their meaning and the net result is usually a written report.
• NGT is an ideawriting technique.

Delphi Method

• It is a survey of experts where a series of questionnaires are used to pool the
experts' responses for solving a specific problem.
• Each expert's contributions are shared with the rest of the experts by using the
results from each questionnaire to construct the next questionnaire.
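
The round-by-round feedback loop can be sketched in Python under some loudly stated assumptions: answers are numeric estimates, the group summary fed into the next questionnaire is the median and interquartile range, and the way experts revise their answers (moving halfway toward the median) is a hypothetical stand-in for real human responses. The function names are illustrative.

```python
import statistics


def summarize(estimates: list[float]) -> tuple[float, float]:
    """Group feedback for the next questionnaire: median and interquartile range."""
    q1, _, q3 = statistics.quantiles(estimates, n=4)
    return statistics.median(estimates), q3 - q1


def revise(estimates: list[float], median: float, pull: float = 0.5) -> list[float]:
    """Hypothetical expert behaviour: each expert moves part of the way toward the median."""
    return [e + pull * (median - e) for e in estimates]


def run_delphi(estimates: list[float], tolerance: float = 1.0,
               max_rounds: int = 10) -> list[tuple[int, float, float]]:
    """Repeat questionnaire rounds until the spread of answers is within tolerance."""
    history = []
    for round_no in range(1, max_rounds + 1):
        median, iqr = summarize(estimates)
        history.append((round_no, median, iqr))
        if iqr <= tolerance:  # responses have converged
            break
        # The results of this round shape the answers to the next questionnaire.
        estimates = revise(estimates, median)
    return history


if __name__ == "__main__":
    # Hypothetical first-round estimates (e.g., weeks needed to finish a project).
    for round_no, median, iqr in run_delphi([10, 14, 20, 25, 40]):
        print(f"round {round_no}: median={median}, IQR={iqr}")
```

In a real Delphi study the revision step is done by the experts themselves after reading the summary; only the summarizing and the stopping rule are mechanical.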

Concept Mapping

• It is a network of concepts consisting of nodes and links.
• A node represents a concept, and a link represents the relationship between
concepts (refer to Figure 6.5 on page 172 of your textbook).
• Concept mapping is designed to integrate new concepts/propositions into the
existing cognitive structures related to knowledge capture.
• It is a structured conceptualization.
• It is an effective way for a group to function without losing their individuality.
• Concept mapping can be done for several reasons:
o To design complex structures.
o To generate ideas.
o To communicate ideas.
o To diagnose misunderstanding.
• Six-step procedure for using a concept map as a tool:
o Preparation.
o Idea generation.
o Statement structuring.
o Representation.
o Interpretation.
o Utilization.
• Similar to concept mapping, a semantic net is a collection of nodes linked
together to form a net.
o A knowledge developer can graphically represent descriptive/declarative
knowledge through a net.
o Each idea of interest is usually represented by a node linked by lines
(called arcs) which shows relationships between nodes.
o Fundamentally, it is a network of concepts and relationships.
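
A semantic net maps naturally onto a labeled graph. The following Python sketch stores each arc as a (relation, target) pair and follows "is-a" arcs so that a concept inherits the properties of more general concepts. The relation names "is-a" and "has", the `SemanticNet` class, and the bird example are illustrative assumptions, not the content of Figure 6.5.

```python
from collections import defaultdict


class SemanticNet:
    """Nodes are concepts; each arc from a node carries a relationship label."""

    def __init__(self) -> None:
        self.arcs = defaultdict(list)  # node -> [(relation, target), ...]

    def link(self, source: str, relation: str, target: str) -> None:
        self.arcs[source].append((relation, target))

    def targets(self, node: str, relation: str) -> list[str]:
        """Nodes reachable from `node` through a single arc with this label."""
        return [t for r, t in self.arcs.get(node, []) if r == relation]

    def properties(self, node: str) -> list[str]:
        """'has' properties of a node plus those inherited along 'is-a' arcs."""
        found, stack, seen = [], [node], set()
        while stack:
            current = stack.pop()
            if current in seen:  # guard against cycles in the net
                continue
            seen.add(current)
            found.extend(self.targets(current, "has"))
            stack.extend(self.targets(current, "is-a"))
        return found


def example_net() -> SemanticNet:
    """A small hypothetical net: canary is-a bird is-a animal."""
    net = SemanticNet()
    net.link("canary", "is-a", "bird")
    net.link("canary", "has", "yellow feathers")
    net.link("bird", "is-a", "animal")
    net.link("bird", "has", "wings")
    net.link("animal", "has", "skin")
    return net


if __name__ == "__main__":
    print(example_net().properties("canary"))
```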


Blackboarding

• In this case, the experts work together to solve a specific problem using the
blackboard as their workspace.
• Each expert gets equal opportunity to contribute to the solution via the
blackboard.
• It is assumed that all participants are experts, but they might have acquired their
individual expertise in situations different from those of the other experts in the
group.
• The process of blackboarding continues until the solution has been reached.
• Characteristics of blackboard system:
o Diverse approaches to problem-solving.
o Common language for interaction.
o Efficient storage of information.
o Flexible representation of information.
o Iterative approach to problem-solving.
o Organized participation.
• Components of blackboard system:
o The Knowledge Source (KS): Each KS is an independent expert observing
the status of the blackboard and trying to contribute a higher level partial
solution based on the knowledge it has and how well such knowledge
applies to the current blackboard state.
o The Blackboard: It is a global memory structure, a database, or a
repository that can store all partial solutions and other necessary data that
are presently in various stages of completion.
o A Control Mechanism: It coordinates the pattern and flow of the problem
solution.
• The inference engine and the knowledge base are part of the blackboard system.
• This approach is useful in situations involving multiple expertise, diverse
knowledge representations, or uncertain knowledge.
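
The three components can be sketched in Python: a shared dictionary plays the blackboard, each knowledge source has a precondition and a contribution, and a control loop decides which source writes next. The machine-temperature sources, the 80-degree threshold, and all function names are hypothetical, and "fire the first applicable source" is only one assumed, very simple control strategy.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class KnowledgeSource:
    """An independent 'expert' that watches the blackboard and adds a partial solution."""
    name: str
    can_contribute: Callable[[dict], bool]
    contribute: Callable[[dict], None]


def control(board: dict, sources: list[KnowledgeSource], goal: str,
            max_cycles: int = 10) -> list[str]:
    """Control mechanism: each cycle, let the first applicable source write to the board."""
    trace = []
    for _ in range(max_cycles):
        if goal in board:  # a complete solution is on the blackboard
            break
        ready = [ks for ks in sources if ks.can_contribute(board)]
        if not ready:  # no source can make further progress
            break
        ready[0].contribute(board)
        trace.append(ready[0].name)
    return trace


# Hypothetical knowledge sources for monitoring a machine's temperature.
def average(board: dict) -> None:
    board["mean_temp"] = sum(board["readings"]) / len(board["readings"])


def rate_status(board: dict) -> None:
    board["status"] = "overheating" if board["mean_temp"] > 80 else "normal"


def advise(board: dict) -> None:
    board["action"] = "shut down" if board["status"] == "overheating" else "continue"


SOURCES = [  # deliberately listed out of order: the control mechanism sequences them
    KnowledgeSource("advisor", lambda b: "status" in b and "action" not in b, advise),
    KnowledgeSource("classifier", lambda b: "mean_temp" in b and "status" not in b, rate_status),
    KnowledgeSource("averager", lambda b: "readings" in b and "mean_temp" not in b, average),
]

if __name__ == "__main__":
    board = {"readings": [78, 85, 90]}
    print(control(board, SOURCES, goal="action"), board["action"])
```

Each source only reads and writes the blackboard, never another source, which is what lets sources with different kinds of knowledge be added or removed independently.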

Knowledge Codification

• Knowledge codification means converting tacit knowledge to explicit knowledge
in a usable form for the organizational members.
• Tacit knowledge (e.g., human expertise) is identified and leveraged through a
form that is able to produce the highest return for the business.
• Explicit knowledge is organized, categorized, indexed and accessed.
• The organizing often includes decision trees, decision tables, etc.
• Codification must be done in a form/structure which will eventually build the
knowledge base.
• The resulting knowledge base supports training and decision making in areas such as:
o Diagnosis.
o Training/Instruction.
o Interpretation.
o Prediction.
o Planning/Scheduling.
• The knowledge developer should note the following points before initiating
knowledge codification:
o Recorded knowledge is often difficult to access (because it is either
fragmented or poorly organized).
o Diffusion of new knowledge is too slow.
o Knowledge is not shared, but hoarded (this can involve political issues).
o Often knowledge is not found in the proper form.
o Often knowledge is not available at the correct time when it is needed.
o Often knowledge is not present in the proper location where it should be.
o Often the knowledge is found to be incomplete.

Modes of Knowledge Conversion

• Conversion from tacit to tacit knowledge produces socialization, where the knowledge
developer looks for experience in the case of knowledge capture.
• Conversion from tacit to explicit knowledge involves externalizing, explaining or
clarifying tacit knowledge via analogies, models, or metaphors.
• Conversion from explicit to tacit knowledge involves internalizing (or fitting)
explicit knowledge to tacit knowledge.
• Conversion from explicit to explicit knowledge involves combining, categorizing,
reorganizing or sorting different bodies of explicit knowledge to lead to new
knowledge.

Codifying Knowledge

• An organization must focus on the following before codification:

o What organizational goals will the codified knowledge serve?
o What knowledge exists in the organization that can address these goals?
o How useful is the existing knowledge for codification?
o How would someone codify knowledge?
• Codifying tacit knowledge (in its entirety) in a knowledge base or repository is
often difficult because it is usually developed and internalized in the minds of the
human experts over a long period of time.

Decision Table

• It is another technique used for knowledge codification.
• It consists of some conditions, rules, and actions.

A phonecard company sends out monthly invoices to permanent customers and gives
them a discount if payments are made within two weeks. Their discounting policy is as
follows: "If the amount of the order of phone cards is greater than $35, subtract 5% of the order;
if the amount is greater than or equal to $20 and less than or equal to $35, subtract a 4%
discount; if the amount is less than $20, do not apply any discount."
We shall develop a decision table for their discounting decisions, where the condition
alternatives are "Yes" and "No".

Figure 6.2: Example: Decision Table
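
The decision table can also be written down as data. The Python sketch below is one possible encoding, not necessarily laid out like Figure 6.2: each rule column lists the "Yes"/"No" entry it requires for every condition, plus the discount rate it triggers. The names `CONDITIONS`, `RULES`, `discount_rate`, and `amount_due` are illustrative.

```python
# Conditions of the phonecard discount policy (order amount in dollars).
CONDITIONS = {
    "amount > $35": lambda amount: amount > 35,
    "$20 <= amount <= $35": lambda amount: 20 <= amount <= 35,
    "amount < $20": lambda amount: amount < 20,
}

# Rule columns: the required Yes/No entry for each condition, and the action (discount rate).
RULES = [
    ({"amount > $35": "Yes", "$20 <= amount <= $35": "No", "amount < $20": "No"}, 0.05),
    ({"amount > $35": "No", "$20 <= amount <= $35": "Yes", "amount < $20": "No"}, 0.04),
    ({"amount > $35": "No", "$20 <= amount <= $35": "No", "amount < $20": "Yes"}, 0.0),
]


def discount_rate(amount: float) -> float:
    """Answer every condition, then find the rule column that matches the answers."""
    answers = {name: "Yes" if test(amount) else "No" for name, test in CONDITIONS.items()}
    for entries, rate in RULES:
        if entries == answers:
            return rate
    raise ValueError(f"no rule covers an order of ${amount}")


def amount_due(amount: float) -> float:
    """Order amount after the discount, rounded to cents."""
    return round(amount * (1 - discount_rate(amount)), 2)


if __name__ == "__main__":
    for order in (50, 35, 20, 12):
        print(order, discount_rate(order), amount_due(order))
```

Keeping the table as data rather than as nested if statements means the policy can be checked column by column against the figure, and a missing or contradictory column shows up as the `ValueError`.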

Decision Tree

• It is also a knowledge codification technique.

• A decision tree is usually a hierarchically arranged semantic network.

A decision tree for the phonecard company discounting policy (as discussed above) is
shown next.

Figure 6.3: Example: Decision Tree
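
The same policy can be sketched as a tree in Python: each internal node asks one yes/no question and each leaf holds a discount rate. The `Node` class and the wording of the questions are assumptions made for this sketch; `decide` also returns the path of answers, which mirrors reading the tree from the root down to a leaf.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class Node:
    question: str
    test: Callable[[float], bool]
    yes: Union["Node", float]  # a subtree, or a leaf holding the discount rate
    no: Union["Node", float]


DISCOUNT_TREE = Node(
    "Is the amount greater than $35?", lambda a: a > 35,
    yes=0.05,
    # On this branch the amount is already at most $35, so one more question suffices.
    no=Node("Is the amount at least $20?", lambda a: a >= 20, yes=0.04, no=0.0),
)


def decide(tree: Union[Node, float], amount: float) -> tuple[float, list[str]]:
    """Walk from the root to a leaf, recording each question and its answer."""
    path = []
    node = tree
    while isinstance(node, Node):
        answer = node.test(amount)
        path.append(f"{node.question} {'Yes' if answer else 'No'}")
        node = node.yes if answer else node.no
    return node, path


if __name__ == "__main__":
    rate, path = decide(DISCOUNT_TREE, 28)
    print(rate, path)
```

Compared with the decision table, the tree needs only two questions because each answer narrows the remaining cases.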


Frames

• A frame is a codification scheme used for organizing knowledge through previous
experience.
• It deals with a combination of declarative and operational knowledge.
• Key elements of frames:
o Slot: A specific object being described/an attribute of an entity.
o Facet: The value of an object/slot.
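
A minimal Python sketch of a frame, assuming a frame may inherit slots from a more general parent frame and that a facet may be either a stored value (declarative knowledge) or a procedure run when the value is needed (operational knowledge). The `Frame` class, the employee/clerk frames, and the slot names are hypothetical.

```python
from typing import Any, Optional


class Frame:
    """A frame: named slots whose facets are values or if-needed procedures."""

    def __init__(self, name: str, parent: Optional["Frame"] = None, **slots: Any) -> None:
        self.name = name
        self.parent = parent      # more general frame to inherit slots from
        self.slots = dict(slots)  # slot name -> facet

    def get(self, slot: str) -> Any:
        """Look up a slot here, then in parent frames; run procedural facets on demand."""
        frame = self
        while frame is not None:
            if slot in frame.slots:
                facet = frame.slots[slot]
                # Operational knowledge: a callable facet is computed for this frame.
                return facet(self) if callable(facet) else facet
            frame = frame.parent
        raise KeyError(f"{self.name} has no slot {slot!r}")


# Hypothetical frames: a generic employee and a more specific clerk.
EMPLOYEE = Frame("Employee", works_for="ACME",
                 annual_salary=lambda f: 12 * f.get("monthly_salary"))
CLERK = Frame("Clerk", parent=EMPLOYEE, monthly_salary=3000)

if __name__ == "__main__":
    print(CLERK.get("works_for"), CLERK.get("annual_salary"))
```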
Lecture-29 & 30

Production Rules

• They are conditional statements specifying an action to be taken in case a certain
condition is true.
• They codify knowledge in the form of premise-action pairs.
• Syntax: IF (premise) THEN (action)
• Example: IF income is 'standard' and payment history is 'good', THEN 'approve
home loan'.
• In the case of knowledge-based systems, rules are based on heuristics or experimental
knowledge.
• Rules can incorporate certain levels of uncertainty.
• A certainty factor is synonymous with a confidence level, which is a subjective
quantification of an expert's judgment.
quantification of an expert's judgment.
• The premise is a Boolean expression that should evaluate to be true for the rule to
be applied.
• The action part of the rule is separated from the premise by the keyword THEN.
• The action clause consists of a statement or a series of statements separated by
ANDs or commas and is executed if the premise is true.
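
The home-loan rule above can be written as a small Python sketch: the premise is a set of attribute/value conditions joined by AND, the action is carried out only when all of them hold, and a certainty factor is attached to the conclusion. The `Rule` class, the `evaluate` function, and the 0.8 certainty value are assumptions for illustration; the text does not specify a certainty scale.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rule:
    premise: dict[str, str]  # attribute -> required value; conditions are joined by AND
    action: str
    certainty: float = 1.0   # expert's subjective confidence (hypothetical 0-1 scale)

    def premise_holds(self, facts: dict[str, str]) -> bool:
        """The premise is a Boolean expression: true only if every condition holds."""
        return all(facts.get(attr) == value for attr, value in self.premise.items())


# The home-loan rule from the text; the 0.8 certainty factor is an assumed value.
LOAN_RULE = Rule({"income": "standard", "payment history": "good"},
                 "approve home loan", certainty=0.8)


def evaluate(rule: Rule, facts: dict[str, str]) -> Optional[tuple[str, float]]:
    """IF premise THEN action: return the action and its certainty, or None."""
    return (rule.action, rule.certainty) if rule.premise_holds(facts) else None


if __name__ == "__main__":
    print(evaluate(LOAN_RULE, {"income": "standard", "payment history": "good"}))
```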

In the case of knowledge-based systems, planning involves:

• Breaking the entire system into manageable modules.
• Considering partial solutions and linking them through rules and procedures to
arrive at a final solution.
• Deciding on the programming language(s).
• Deciding on the software package(s).
• Testing and validating the system.
• Developing the user interface.
• Promoting clarity and flexibility, and making rules clear.
• Reducing unnecessary risk.

Role of inferencing:

• Inferencing implies the process of deriving a conclusion based on statements that
only imply that conclusion.
• An inference engine is a program that manages the inferencing strategies.
• Reasoning is the process of applying knowledge to arrive at the conclusion.
o Reasoning depends on premise as well as on general knowledge.
o People usually draw informative conclusions.
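
One common inferencing strategy that an inference engine can manage is forward chaining: keep firing rules whose premises are all known facts until no new conclusions appear. The Python sketch below assumes rules are (premises, conclusion) pairs of plain strings; the two loan rules are hypothetical. Backward chaining, which starts from a goal and works back toward the facts, is the usual alternative and is not shown.

```python
Rule = tuple[frozenset[str], str]  # (premises joined by AND, conclusion)


def forward_chain(facts: set[str], rules: list[Rule]) -> tuple[set[str], list[str]]:
    """Derive every conclusion reachable from the initial facts, in firing order."""
    known = set(facts)
    derived = []
    changed = True
    while changed:  # repeat until a full pass adds nothing new
        changed = False
        for premises, conclusion in rules:
            if premises <= known and conclusion not in known:
                known.add(conclusion)
                derived.append(conclusion)
                changed = True
    return known, derived


# Hypothetical rule base: the second rule depends on the first rule's conclusion.
RULES: list[Rule] = [
    (frozenset({"income is standard", "payment history is good"}), "applicant is creditworthy"),
    (frozenset({"applicant is creditworthy", "property is valued"}), "approve home loan"),
]

if __name__ == "__main__":
    start = {"income is standard", "payment history is good", "property is valued"}
    print(forward_chain(start, RULES)[1])
```
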
Case-Based Reasoning

• It is reasoning from relevant past cases in a way similar to humans' use of past
experiences to arrive at conclusions.
• Case-based reasoning is a technique that records and documents cases and then
searches the appropriate cases to determine their usefulness in solving new cases
presented to the expert.
• The aim is to bring up the most similar historical case that matches the present
case.
• Adding new cases and reclassifying the case library usually expands knowledge.
• A case library may require considerable database storage as well as an efficient
retrieval system.
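
Retrieval and retention can be sketched in Python as nearest-neighbour search over a case library. The help-desk cases, the feature weights, and weighted feature matching as the similarity measure are all assumptions; real case-based reasoning systems use many different similarity functions and indexing schemes.

```python
from dataclasses import dataclass


@dataclass
class Case:
    features: dict[str, str]
    solution: str


def similarity(a: dict[str, str], b: dict[str, str], weights: dict[str, int]) -> float:
    """Weighted share of features on which two cases agree (0.0 to 1.0)."""
    matched = sum(w for feature, w in weights.items() if a.get(feature) == b.get(feature))
    return matched / sum(weights.values())


def retrieve(library: list[Case], problem: dict[str, str], weights: dict[str, int]) -> Case:
    """Bring up the stored case most similar to the present problem."""
    return max(library, key=lambda case: similarity(case.features, problem, weights))


def retain(library: list[Case], problem: dict[str, str], solution: str) -> None:
    """Add the solved problem as a new case, expanding the library's knowledge."""
    library.append(Case(dict(problem), solution))


# Hypothetical help-desk case library and feature weights.
WEIGHTS = {"symptom": 3, "device": 2, "age": 1}
LIBRARY = [
    Case({"symptom": "no power", "device": "printer", "age": "new"}, "check power cable"),
    Case({"symptom": "paper jam", "device": "printer", "age": "old"}, "replace feed roller"),
    Case({"symptom": "no power", "device": "monitor", "age": "old"}, "replace power supply"),
]

if __name__ == "__main__":
    new_problem = {"symptom": "no power", "device": "monitor", "age": "new"}
    best = retrieve(LIBRARY, new_problem, WEIGHTS)
    print(best.solution)
    retain(LIBRARY, new_problem, best.solution)
```

A linear scan like `retrieve` is fine for a handful of cases; the storage and retrieval concern mentioned above appears once the library grows large enough to need indexing.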

Knowledge-Based Agents

• An intelligent agent is a piece of program code that is capable of performing
autonomous action in a timely fashion.
• Agents can exhibit goal-directed behaviour by taking initiative.
• They can be programmed to interact with other agents or humans by using some
agent communication language.

In terms of knowledge-based systems, an agent can be programmed to learn from the user
behaviour and deduce future behaviour for assisting the user.
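
The learning part of such an agent can be sketched in Python with a simple, assumed technique: count which action the user took after each action, then suggest the most frequent follower. The `AssistantAgent` class and the e-mail actions are hypothetical, and goals, initiative, and an agent communication language are not shown.

```python
from collections import Counter, defaultdict
from typing import Optional


class AssistantAgent:
    """Learns which action a user tends to take next and suggests it."""

    def __init__(self) -> None:
        self.transitions = defaultdict(Counter)  # action -> Counter of following actions
        self.last: Optional[str] = None

    def observe(self, action: str) -> None:
        """Record the user's action as following the previous one."""
        if self.last is not None:
            self.transitions[self.last][action] += 1
        self.last = action

    def suggest(self) -> Optional[str]:
        """Deduce the most likely next action from past behaviour, if any is known."""
        followers = self.transitions.get(self.last)
        if not followers:
            return None
        return followers.most_common(1)[0][0]


if __name__ == "__main__":
    agent = AssistantAgent()
    for action in ["open inbox", "read mail", "archive", "open inbox", "read mail",
                   "reply", "open inbox", "read mail", "archive", "open inbox"]:
        agent.observe(action)
    print(agent.suggest())
```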