
SYSTEMS, INFORMATION AND DECISION THEORY

Information Theory
bits of information in messages
(Shannon & Weaver model of communication)
Information theory deals with measurement and transmission of information through a
channel. A fundamental work in this area is Shannon's Information Theory, which
provides many useful tools that are based on measuring information in terms of bits or - more
generally - in terms of (the minimal amount of) the complexity of structures needed to encode
a given piece of information.
Noise can be considered data without meaning; that is, data that is not being used to transmit
a signal, but is simply produced as an unwanted by-product of other activities. Noise is still
considered information, in the sense of Information Theory.
Shannon's ideas:

Form the basis for the field of Information Theory

Provide the yardsticks for measuring the efficiency of communication systems.

Identified problems that had to be solved to get to what he described as ideal communications systems.

Information:
In defining information, Shannon identified the critical relationships among the elements of a communication system: the power at the source of a signal; the bandwidth or frequency range of an information channel through which the signal travels; the noise of the channel, such as unpredictable static on a radio, which will alter the signal by the time it reaches the last element of the system; and the receiver, which must decode the signal.
To get a high-level understanding of his theory, a few basic points should be made. First, words are symbols that carry information between people. If one says to an American, "Let's go!", the command is immediately understood. But if we give the command in Russian, "Pustim v xod!", we only get a quizzical look: Russian is the wrong code for an American.
Second, all communication involves three steps:

Coding a message at its source



Transmitting the message through a communications channel


Decoding the message at its destination.
In the first step, the message has to be put into some kind of symbolic representation: words, musical notes, icons, mathematical equations, or bits. When we write "Hello", we encode a greeting. When we write a musical score, it's the same thing, only we're encoding sounds.
For any code to be useful, it has to be transmitted to someone or, in a computer's case, to
something. Transmission can be by voice, a letter, a billboard, a telephone conversation, a
radio or television broadcast. At the destination, someone or something has to receive the
symbols, and then decode them by matching them against his or her own body of information
to extract the data.
Fourth, there is a distinction between a communications channel's designed symbol rate of so many bits per second and its actual information capacity. Shannon defines channel capacity as how many kilobits per second of user information can be transmitted over a noisy channel with an arbitrarily small error rate, which can be less than the channel's raw symbol rate.
EXAMPLE:
Suppose we are watching cars going past on a highway. For simplicity, suppose 50% of the
cars are black, 25% are white, 12.5% are red, and 12.5% are blue. Consider the flow of cars
as an information source with four words: black, white, red, and blue. A simple way of
encoding this source into binary symbols would be to associate each color with two bits, that
is: black = 00, white = 01, red = 10, and blue = 11, an average of 2.00 bits per color.
A Better Code Using Information Theory
A better encoding can be constructed by allowing for the frequency of certain symbols, or
words: black = 0, white = 10, red = 110, blue = 111.
How is this encoding better?
0.50 black x 1 bit = .500
0.25 white x 2 bits = .500
0.125 red x 3 bits = .375
0.125 blue x 3 bits = .375
Average = 1.750 bits per car
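The figures above can be checked with a short sketch; this is only an illustrative calculation (Python is used here for convenience, and the dictionaries simply restate the example's probabilities and code words):

```python
import math

# Car-colour source from the example above, with the variable-length code
# black = 0, white = 10, red = 110, blue = 111.
probs = {"black": 0.50, "white": 0.25, "red": 0.125, "blue": 0.125}
code  = {"black": "0", "white": "10", "red": "110", "blue": "111"}

# Shannon entropy H = -sum p*log2(p): the lower bound on average bits per symbol.
entropy = -sum(p * math.log2(p) for p in probs.values())

# Average length of the variable-length code, weighted by probability.
avg_len = sum(probs[c] * len(code[c]) for c in probs)

print(f"entropy        = {entropy:.3f} bits per car")   # 1.750
print(f"average length = {avg_len:.3f} bits per car")   # 1.750
```

The average code length matches the entropy exactly here because every probability is a power of 1/2; in general a prefix code can only approach the entropy.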

ENTROPY:
Entropy is a quantitative measure of the disorder of a system, inversely related to the amount of energy available to do work in an isolated system. The more the energy has become dispersed, the less work it can perform and the greater the entropy.
Furthermore, information theory tells us that the entropy of this information source is 1.75 bits per car, and thus no encoding scheme will do better than the scheme just described. In general, an efficient code for a source will not represent single letters, as in the example above, but will represent strings of letters or words. If we see three black cars, followed by a white car, a red car, and a blue car, the sequence is encoded as 00010110111, and the original sequence of cars can readily be recovered from the encoded sequence.
SHANNON'S THEOREM
Shannon's theorem, proved by Claude Shannon in 1948, describes the maximum possible efficiency of error-correcting methods versus levels of noise interference and data corruption. The theorem does not describe how to construct the error-correcting method; it only tells us how good the best possible method can be. Shannon's theorem has wide-ranging applications in both communications and data storage.

C = W log2(1 + S/N)

where:
C is the post-correction effective channel capacity in bits per second;
W is the raw channel capacity in hertz (the bandwidth); and
S/N is the signal-to-noise ratio of the communication signal to the Gaussian noise interference, expressed as a straight power ratio (not as decibels).
Channel capacity, shown often as "C" in communication formulas, is the amount of discrete
information bits that a defined area or segment in a communications medium can hold.
The phrase signal-to-noise ratio, often abbreviated SNR or S/N, is an engineering term for
the ratio between the magnitude of a signal (meaningful information) and the magnitude of
background noise. Because many signals have a very wide dynamic range, SNRs are often
expressed in terms of the logarithmic decibel scale.
Example

If the SNR is 20 dB and the bandwidth available is 4 kHz, which is appropriate for telephone communications, then C = 4 log2(1 + 100) = 4 log2(101) = 26.63 kbit/s. Note that the value of 100 is appropriate for an SNR of 20 dB. If it is required to transmit at 50 kbit/s and a bandwidth of 1 MHz is used, then the minimum SNR required is given by 50 = 1000 log2(1 + S/N), so S/N = 2^(C/W) - 1 = 0.035, corresponding to an SNR of -14.5 dB. This shows that it is possible to transmit using signals which are actually much weaker than the background noise level.
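The same arithmetic can be reproduced with a short sketch (the function name shannon_capacity is illustrative only):

```python
import math

def shannon_capacity(bandwidth_hz: float, snr_linear: float) -> float:
    """Shannon-Hartley capacity C = W * log2(1 + S/N), in bits per second."""
    return bandwidth_hz * math.log2(1 + snr_linear)

# Telephone channel: 4 kHz bandwidth, SNR of 20 dB (a power ratio of 100).
c = shannon_capacity(4_000, 100)
print(f"C = {c / 1000:.2f} kbit/s")                        # about 26.63 kbit/s

# Minimum SNR to carry 50 kbit/s in 1 MHz: S/N = 2^(C/W) - 1.
snr = 2 ** (50_000 / 1_000_000) - 1
print(f"S/N = {snr:.3f} ({10 * math.log10(snr):.1f} dB)")  # 0.035, about -14.5 dB
```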
SHANNON'S LAW
Shannon's law is any statement defining the theoretical maximum rate at which error-free digits can be transmitted over a bandwidth-limited channel in the presence of noise.
Core Assumptions and Statements
According to the theory, transmission of the message involves sending information through electronic signals. Information, in the information-theory sense of the word, should not be confused with information as we commonly understand it. According to Shannon and Weaver, information is defined as a measure of one's freedom of choice when one selects a message. In information theory, information and uncertainty are closely related. Information refers to the degree of uncertainty present in a situation. The larger the uncertainty removed by a message, the stronger the correlation between the input and output of a communication channel, the more detailed the particular instructions can be, and the more information is transmitted. Uncertainty also relates to the concept of predictability. When something is completely predictable, it is completely certain; therefore, it contains very little, if any, information. A related term, entropy, is also important in information theory. Entropy refers to the degree of randomness, lack of organization, or disorder in a situation. Information theory measures the quantities of all kinds of information in terms of bits (binary digits). Redundancy is another concept which has emerged from the application of information theory to communication. Redundancy is the opposite of information. Something that is redundant adds little, if any, information to a message. Redundancy is important because it helps combat noise in a communication system (e.g. by repeating the message). Noise is any factor in the process that works against the predictability of the outcome of the communication process. Information theory has contributed to the clarification of certain concepts such as noise, redundancy and entropy. These concepts are inherently part of the communication process.
Shannon and Weaver broadly defined communication as "all of the procedures by which one mind may affect another." Their communication model consisted of an information source (the source's message), a transmitter, a signal,
a receiver (the receiver's message), and a destination. Eventually, the standard communication model featured the source or encoder, who encodes a message by translating an idea into a code in terms of bits. A code is a language or other set of symbols or signs that can be used to transmit a thought through one or more channels to elicit a response in a receiver or decoder. Shannon and Weaver also included the factor of noise in the model. The study conducted by Shannon and Weaver was motivated by the desire to increase the efficiency and accuracy, or fidelity, of transmission and reception. Efficiency refers to the bits of information per second that can be sent and received. Accuracy is the extent to which signals of information can be understood. In this sense, accuracy refers more to clear reception than to the meaning of the message. This engineering model asks quite different questions than do other approaches to human communication research.
Conceptual Model

The mathematical (information) model of communication: information source → transmitter (encoder) → channel (signal plus noise) → receiver (decoder) → destination.
Scope and Application
Studies on the model of Shannon and Weaver take two major orientations. One stresses the engineering principles of transmission and perception (in the electronic sciences). The other orientation considers how people are able or unable to communicate accurately because they have different experiences and attitudes (in the social sciences).

Applications of information theory:

1) Data compression
2) Constructing decision trees using information gain (see the sketch below)
3) Accounting applications
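As a rough illustration of point 2 above, the sketch below computes the information gain of a candidate split; the toy dataset and the helper names (entropy, information_gain) are assumptions made for the example and are not part of the text:

```python
import math
from collections import Counter

def entropy(labels):
    """H = -sum p*log2(p) over the class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(rows, labels, attr_index):
    """Entropy of the labels minus the weighted entropy after splitting
    on the attribute in column attr_index."""
    subsets = {}
    for row, label in zip(rows, labels):
        subsets.setdefault(row[attr_index], []).append(label)
    remainder = sum(len(s) / len(labels) * entropy(s) for s in subsets.values())
    return entropy(labels) - remainder

# Toy data: [outlook, windy] -> play?
rows   = [["sunny", "no"], ["sunny", "yes"], ["rain", "no"], ["rain", "yes"]]
labels = ["yes", "no", "yes", "no"]
print(information_gain(rows, labels, 0))  # 0.0: outlook tells us nothing here
print(information_gain(rows, labels, 1))  # 1.0: windy fully determines the label
```

A decision-tree learner such as ID3 would split on the attribute with the highest gain, here the second one.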

Information content


The term information content is used to refer to the meaning of information, as opposed to the form or carrier of the information: for example, the meaning that is conveyed in an expression (which may be a proposition) or document, which can be distinguished from the sounds, symbols or codes and the carrier that physically form the expression or document. An information content is composed of a propositional content and an illocutionary force.
Shannon's theorem expressed the capacity of a channel, defining the amount of information that can be sent down a noisy channel in terms of transmit power and bandwidth. He showed that engineers could choose to send a given amount of information using high power and low bandwidth, or high bandwidth and low power.
The traditional solution was to use narrow-band radios, which would focus all their power into a small range of frequencies. The problem was that as the number of users increased, the available channels began to be used up. Additionally, such radios were highly susceptible to interference: so much power was confined to a small portion of the spectrum that a single interfering signal in the frequency range could disrupt communication.
Shannon offered a solution to this problem by redefining the relationship between information, noise and power. He quantified the amount of information in a signal, stating that it is the amount of unexpected data the message contains. He called this information content of a message its entropy. In digital communication, a stream of unexpected bits is just random noise. Shannon showed that the more a transmission resembles random noise, the more information it can hold, as long as it is modulated to an appropriate carrier: one needs a low-entropy carrier to carry a high-entropy message. Thus Shannon stated that an alternative to narrow-band radios was sending a message with low power, spread over a wide bandwidth.
Spread spectrum is one such technique: it takes a narrow-band signal and spreads its power over a wide band of frequencies, which makes it far more resistant to interference.

ENTROPY AS INFORMATION CONTENT:



Entropy is defined in the context of a probabilistic model. Independent fair coin flips have an entropy of 1 bit per flip. A source that always generates a long string of B's has an entropy of 0, since the next character will always be a B.
The entropy rate of a data source means the average number of bits per symbol needed to encode it. Shannon's experiments with human predictors show an information rate of between 0.6 and 1.3 bits per character, depending on the experimental setup; the PPM compression algorithm can achieve a compression ratio of about 1.5 bits per character in English text.
From the preceding example, note the following points:
1) The amount of entropy is not always an integer number of bits.
2) Many data bits may not convey information. For example, data structures often store
information redundantly, or have identical sections regardless of the information in
the data structure.
There are a number of entropy-related concepts that mathematically quantify information content in some way:
1) The self-information of an individual message or symbol taken from a given probability distribution.
2) The entropy of a given probability distribution of messages or symbols
3) The entropy rate of a stochastic process
Limitations of entropy as information content:
Although entropy is often used as a characterization of the information content of a data source, this information content is not absolute: it depends crucially on the probabilistic model. A source that always generates the same symbol has an entropy rate of 0, but the definition of what a symbol is depends on the alphabet.
Example:
Consider a source that produces the string ABABABABAB... in which A is always followed by B and vice versa. If the probabilistic model considers individual letters as independent, the entropy rate of the sequence is 1 bit per character. But if the sequence is considered as "AB AB AB AB AB ...", with symbols as two-character blocks, then the entropy rate is 0 bits per character.
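The dependence on the model can be made concrete with a small sketch; the estimator below simply counts symbol frequencies under the two models just described (single letters versus two-character blocks):

```python
import math
from collections import Counter

def entropy_per_char(text: str, block: int) -> float:
    """Estimate bits per character when the text is modelled as independent
    blocks of `block` characters, with frequencies taken from the text itself."""
    symbols = [text[i:i + block] for i in range(0, len(text) - block + 1, block)]
    counts = Counter(symbols)
    total = sum(counts.values())
    h_per_symbol = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return h_per_symbol / block

s = "AB" * 1000
print(entropy_per_char(s, 1))  # 1.0 bit/char: A and B each occur half the time
print(entropy_per_char(s, 2))  # 0.0 bit/char: the only two-character symbol is "AB"
```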


However, if we use very large blocks, then the estimate of per-character entropy rate may
become artificially low. This is because in reality, the probability distribution of the sequence
is not knowable exactly; it is only an estimate. For example, suppose one considers the text of
every book ever published as a sequence, with each symbol being the text of a complete
book. If there are N published books, and each book is only published once, the estimate of
the probability of each book is 1/N, and the entropy (in bits) is -log 2 1/N = log2 N. As a
practical code, this corresponds to assigning each book a unique identifier and using it in
place of the text of the book whenever one wants to refer to the book. This is enormously
useful for talking about books, but it is not so useful for characterizing the information
content of an individual book, or of language in general: it is not possible to reconstruct the
book from its identifier without knowing the probability distribution, that is, the complete
text of all the books. The key idea is that the complexity of the probabilistic model must be
considered. Kolmogorov complexity is a theoretical generalization of this idea that allows the
consideration of the information content of a sequence independent of any particular
probability model; it considers the shortest program for a universal computer that outputs the
sequence. A code that achieves the entropy rate of a sequence for a given model, plus the
codebook (i.e. the probabilistic model), is one such program, but it may not be the shortest.
For example, the Fibonacci sequence is 1, 1, 2, 3, 5, 8, 13, ... . Treating the sequence as a message and each number as a symbol, there are almost as many symbols as there are characters in the message, giving an entropy of approximately log2(n). So the first 128 symbols of the Fibonacci sequence have an entropy of approximately 7 bits/symbol. However, the sequence can be expressed using a formula [F(n) = F(n-1) + F(n-2) for n = {3, 4, 5, ...}, F(1) = 1, F(2) = 1], and this formula has a much lower entropy and applies to any length of the Fibonacci sequence.
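In the spirit of the Kolmogorov-complexity remark above, the entire sequence can be reproduced by a program only a few lines long, whatever its length; the function name fib is purely illustrative:

```python
def fib(n: int) -> list[int]:
    """Return the first n Fibonacci numbers: F(1) = 1, F(2) = 1, F(k) = F(k-1) + F(k-2)."""
    seq = []
    a, b = 1, 1
    for _ in range(n):
        seq.append(a)
        a, b = b, a + b
    return seq

print(fib(10))  # [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
# The same short program describes the first 128 terms or the first million:
# its length does not grow with the length of the sequence it outputs.
```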

Redundancy
Redundancy in information theory is the number of bits used to transmit a message minus
the number of bits of actual information in the message. Informally, it is the amount of
wasted "space" used to transmit certain data. Data compression is a way to reduce or
eliminate unwanted redundancy, while checksums are a way of adding desired redundancy
for purposes of error detection when communicating over a noisy channel of limited capacity.


Quantitative definition
In describing the redundancy of raw data, recall that the rate of a source of information is the average entropy per symbol. For memoryless sources, this is merely the entropy of each symbol, while, in the most general case of a stochastic process, it is

r = lim (n→∞) (1/n) H(M1, M2, ..., Mn),

the limit, as n goes to infinity, of the joint entropy of the first n symbols divided by n. It is common in information theory to speak of the "rate" or "entropy" of a language. This is appropriate, for example, when the source of information is English prose. The rate of a memoryless source is simply H(M), since by definition there is no interdependence of the successive messages of a memoryless source.

The absolute rate of a language or source is simply

R = log |M|,

the logarithm of the cardinality of the message space, or alphabet. (This formula is sometimes called the Hartley function.) This is the maximum possible rate of information that can be transmitted with that alphabet. (The logarithm should be taken to a base appropriate for the unit of measurement in use.) The absolute rate is equal to the actual rate if the source is memoryless and has a uniform distribution.

The absolute redundancy can then be defined as

D = R - r,

the difference between the absolute rate and the rate.

The quantity D/R is called the relative redundancy and gives the maximum possible data compression ratio, when expressed as the percentage by which a file size can be decreased. (When expressed as a ratio of original file size to compressed file size, the quantity R : r gives the maximum compression ratio that can be achieved.) Complementary to the concept of relative redundancy is efficiency, defined as r/R, so that r/R + D/R = 1. A memoryless source with a uniform distribution has zero redundancy (and thus 100% efficiency), and cannot be compressed.
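These quantities are easy to compute for a concrete source; the sketch below reuses the car-colour distribution from the earlier example (the variable names are illustrative only):

```python
import math

# Car-colour source from the earlier example (a memoryless source).
probs = [0.50, 0.25, 0.125, 0.125]

rate = -sum(p * math.log2(p) for p in probs)   # r: entropy per symbol = 1.75 bits
absolute_rate = math.log2(len(probs))          # R = log2 |alphabet| = 2 bits
redundancy = absolute_rate - rate              # D = R - r = 0.25 bits

print(f"relative redundancy D/R = {redundancy / absolute_rate:.3f}")  # 0.125
print(f"efficiency r/R          = {rate / absolute_rate:.3f}")        # 0.875
```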


Other notions of redundancy


A measure of redundancy between two variables is the mutual information or a normalized
variant. A measure of redundancy among many variables is given by the total correlation.
Redundancy of compressed data refers to the difference between the expected compressed data length of n messages, L(M^n), (or the expected data rate L(M^n)/n) and the entropy n·r (or the entropy rate r). (Here we assume the data is ergodic and stationary, e.g., a memoryless source.) Although the rate difference L(M^n)/n - r can be made arbitrarily small as n is increased, the actual difference L(M^n) - n·r cannot, although it can be theoretically upper-bounded by 1 in the case of finite-entropy memoryless sources.

CLASSIFICATION AND COMPRESSION

Availability classification

Sensitivity classification

    Concepts
    Class: Public / non-classified information
    Class: Internal information
    Class: Confidential information
    Class: Secret information

Information of different types needs to be secured in different ways. Therefore a classification system is needed, whereby information is classified, a policy is laid down on how to handle information according to its class, and security mechanisms are enforced on systems handling information accordingly.
4.1 Availability classification
Here a classification system is proposed which has four availability classes. It is based on the
author's experience, as no equivalent standards are available for reference.
To improve availability, preventative measures reduce the probability of downtime and
recovery measures reduce the downtime after an incident.


Class | Maximum allowed server downtime, per event | On which days? | During what hours? | Expected availability percentage | ==> expected max. downtime
1     | 1 Week                                     | Mon-Fri        | 07:00-18:00        | 80%                              | = 1 day/week
2     | 1 Day                                      | Mon-Fri        | 07:00-18:00        | 95%                              | = 2.75 hours/week
3     | 1 Hour                                     | Mon-Fri        | 07:00-18:00        | 99.5%                            | = 20 min./week
4     | 1 Hour                                     | 7 Days         | 24h                | 99.9%                            | = 12 min./month
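The last column of the table follows from the availability percentage and the weekly service window; a small sketch of that arithmetic (the exact values differ slightly from the rounded figures in the table):

```python
def max_downtime_minutes(availability_pct: float, service_hours_per_week: float) -> float:
    """Expected maximum downtime per week, in minutes, for a given availability
    percentage over the weekly service window."""
    return (1 - availability_pct / 100) * service_hours_per_week * 60

window = 5 * 11                                    # Mon-Fri, 07:00-18:00 -> 55 h/week
print(max_downtime_minutes(80.0, window) / 60)     # 11.0 hours/week, about one working day
print(max_downtime_minutes(95.0, window) / 60)     # 2.75 hours/week
print(max_downtime_minutes(99.5, window))          # 16.5 min/week (the table rounds this)
# Class 4 applies the same formula to a 24x7 window (7 * 24 = 168 hours/week).
```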

4.2 Sensitivity classification


A classification system is proposed which classes information / processes into four levels.
The lowest is the least sensitive and the highest is for the most important information /
processes.
4.2.1 Concepts

All data has an owner.

The data or process owner must classify the information into one of the security levels, depending on legal obligations, costs, corporate policy and business needs.

If the owner is not sure at what level data should be classified, the default level defined by corporate policy should be used.

The owner must declare who is allowed access to the data.

The owner is responsible for this data and must secure it or have it secured (e.g. via a security administrator) according to its classification.

All documents should be classified and the classification level should be written on at
least the title page.

Once the data on a system has been classified into one of the following levels, that system should be installed to conform to all directives for that class and the classes below it. Each level is a superset of the previous level: a system classified at a given level must also follow the directives of all lower levels.


If a system contains data of more than one sensitivity class, it must be classified according to the level needed for the most confidential data on the system.

4.2.2 Class: Public / non-classified information


Data on these systems could be made public without any implications for the
company (i.e. the data is not confidential). Data integrity is not vital. Loss of service
due to malicious attacks is an acceptable danger.
Examples: Test services without confidential data, certain public information services,
product brochures widely distributed, data available in the public domain anyway.
4.2.3 Class: Internal information
External access to this data is to be prevented, but should this data become public, the
consequences are not critical (e.g. the company may be publicly embarrassed).
Internal access is selective. Data integrity is important but not vital.
Examples of this type of data are found in development groups (where no live data is
present), certain production public services, certain Customer Data, "normal" working
documents and project/meeting protocols, Telephone books.
4.2.4 Class: Confidential information
Data in this class is confidential within the company and protected from external
access. If such data were to be accessed by unauthorised persons, it could influence
the company's operational effectiveness, cause an important financial loss, provide a
significant gain to a competitor or cause a major drop in customer confidence. Data
integrity is vital.
Examples: Datacenters normally maintain this level of security. Salaries, Personnel
data, Accounting data, passwords, information on corporate security weaknesses, very
confidential customer data and confidential contracts.
4.2.5 Class: Secret information
Unauthorised external or internal access to this data would be critical to the company.
Data integrity is vital. The number of people with access to this data should be very
small. Very strict rules must be adhered to in the usage of this data.
Examples: Military data, secret contracts.

Data compression

In computer science and information theory, data compression, source coding,[1] or bit-rate
reduction involves encoding information using fewer bits than the original representation.
Compression is useful because it helps reduce the consumption of resources such as data
space or transmission capacity. Because compressed data must be decompressed to be used,
this extra processing imposes computational or other costs through decompression. For
instance, a compression scheme for video may require expensive hardware for the video to be
decompressed fast enough to be viewed as it is being decompressed, and the option to
decompress the video in full before watching it may be inconvenient or require additional
storage. The design of data compression schemes involves trade-offs among various factors,
including the degree of compression, the amount of distortion introduced (e.g., when using
lossy data compression), and the computational resources required to compress and
uncompress the data. Compression was one of the main drivers for the growth of information
during the past two decades.[2]

Lossless and lossy compression


Lossless compression algorithms usually exploit statistical redundancy to represent data more concisely without losing information. Lossless compression is possible because most real-world data has statistical redundancy. For example, an image may have areas of colour that do not change over several pixels; instead of coding "red pixel, red pixel, ..." the data may be encoded as "279 red pixels". This is a simple example of run-length encoding; there are many schemes to reduce size by eliminating redundancy.
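A minimal run-length encoder in the spirit of the "279 red pixels" example might look like the following sketch (the function names are illustrative; real formats add escaping and limits on run lengths):

```python
from itertools import groupby

def rle_encode(data: str) -> list[tuple[int, str]]:
    """Collapse runs of identical symbols into (count, symbol) pairs."""
    return [(len(list(run)), symbol) for symbol, run in groupby(data)]

def rle_decode(pairs: list[tuple[int, str]]) -> str:
    """Expand (count, symbol) pairs back into the original string."""
    return "".join(symbol * count for count, symbol in pairs)

pixels = "RRRRRRRRGGB"
encoded = rle_encode(pixels)
print(encoded)                           # [(8, 'R'), (2, 'G'), (1, 'B')]
assert rle_decode(encoded) == pixels     # lossless: the original is recovered exactly
```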
Lossless compression is contrasted with lossy data compression. In these schemes, some loss
of information is acceptable. Depending upon the application, detail can be dropped from the
data to save storage space. Generally, lossy data compression schemes are guided by research
on how people perceive the data in question. For example, the human eye is more sensitive to
subtle variations in luminance than it is to variations in color. JPEG image compression
works in part by "rounding off" less-important visual information. There is a corresponding
trade-off between information lost and the size reduction. A number of popular compression
formats exploit these perceptual differences, including those used in music files, images, and
video.

Lossy

Lossy image compression is used in digital cameras, to increase storage capacities with
minimal degradation of picture quality. Similarly, DVDs use the lossy MPEG-2 Video codec
for video compression.
In lossy audio compression, methods of psychoacoustics are used to remove non-audible (or
less audible) components of the signal. Compression of human speech is often performed
with even more specialized techniques, so that "speech compression" or "voice coding" is
sometimes distinguished as a separate discipline from "audio compression". Different audio
and speech compression standards are listed under audio codecs. Voice compression is used
in Internet telephony for example, while audio compression is used for CD ripping and is
decoded by audio players.

Lossless
The Lempel–Ziv (LZ) compression methods are among the most popular algorithms for lossless storage. DEFLATE is a variation on LZ which is optimized for decompression speed and compression ratio, but compression can be slow. DEFLATE is used in PKZIP, gzip and PNG. LZW (Lempel–Ziv–Welch) is used in GIF images. Also noteworthy are the LZR (LZ-Renau) methods, which serve as the basis of the Zip method. LZ methods utilize a table-based compression model where table entries are substituted for repeated strings of data. For most LZ methods, this table is generated dynamically from earlier data in the input. The table itself is often Huffman encoded (e.g. SHRI, LZX). A current LZ-based coding scheme that performs well is LZX, used in Microsoft's CAB format.
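As a rough illustration of the table-based idea, here is a compact LZW-style encoder; it is a teaching sketch (plain single-byte input assumed), not the exact variant used in GIF:

```python
def lzw_encode(data: str) -> list[int]:
    """LZW: grow a dictionary of previously seen strings and emit their codes."""
    table = {chr(i): i for i in range(256)}    # start with all single-byte entries
    current, output = "", []
    for ch in data:
        if current + ch in table:
            current += ch                      # keep extending the current match
        else:
            output.append(table[current])      # emit code for the longest known prefix
            table[current + ch] = len(table)   # add the new string to the table
            current = ch
    if current:
        output.append(table[current])
    return output

print(lzw_encode("TOBEORNOTTOBEORTOBEORNOT"))
# Repeated substrings such as "TOBEOR" and "NOT" are replaced by single table codes.
```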
The very best modern lossless compressors use probabilistic models, such as prediction by partial matching. The Burrows–Wheeler transform can also be viewed as an indirect form of statistical modelling.
In a further refinement of these techniques, statistical predictions can be coupled to an
algorithm called arithmetic coding. Arithmetic coding, invented by Jorma Rissanen, and
turned into a practical method by Witten, Neal, and Cleary, achieves superior compression to
the better-known Huffman algorithm, and lends itself especially well to adaptive data
compression tasks where the predictions are strongly context-dependent. Arithmetic coding is
used in the bilevel image-compression standard JBIG, and the document-compression
standard DjVu. The text entry system, Dasher, is an inverse-arithmetic-coder.

Theory

The theoretical background of compression is provided by information theory (which is closely related to algorithmic information theory) for lossless compression, and by rate–distortion theory for lossy compression. These fields of study were essentially created by
Claude Shannon, who published fundamental papers on the topic in the late 1940s and early
1950s. Coding theory is also related. The idea of data compression is deeply connected with
statistical inference.

Machine learning
There is a close connection between machine learning and compression: a system that
predicts the posterior probabilities of a sequence given its entire history can be used for
optimal data compression (by using arithmetic coding on the output distribution), while an
optimal compressor can be used for prediction (by finding the symbol that compresses best,
given the previous history). This equivalence has been used as justification for data
compression as a benchmark for "general intelligence".[3]
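A toy version of this "predictor as compressor" connection: a model that assigns a probability p to the next symbol implies an ideal code length of -log2(p) bits for it, so better prediction means fewer bits. The simple adaptive frequency model below is an assumption made for illustration, and the totals ignore the overhead of the arithmetic coder itself:

```python
import math
from collections import Counter

def ideal_compressed_bits(text: str) -> float:
    """Sum of -log2 P(next char) under a simple adaptive frequency model; this is
    the length an arithmetic coder driven by that model would approach."""
    counts = Counter()
    total_bits = 0.0
    for ch in text:
        # Laplace-smoothed probability of this character given the history so far.
        p = (counts[ch] + 1) / (sum(counts.values()) + 256)
        total_bits += -math.log2(p)
        counts[ch] += 1
    return total_bits

repetitive = "ab" * 500
varied = "the quick brown fox jumps over the lazy dog " * 22
print(ideal_compressed_bits(repetitive) / len(repetitive))  # few bits/char: easy to predict
print(ideal_compressed_bits(varied) / len(varied))          # noticeably more bits/char
```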

Data differencing
Data compression can be viewed as a special case of data differencing:[4][5] data differencing
consists of producing a difference given a source and a target, with patching producing a
target given a source and a difference, while data compression consists of producing a
compressed file given a target, and decompression consists of producing a target given only a
compressed file. Thus, one can consider data compression as data differencing with empty
source data, the compressed file corresponding to a "difference from nothing". This is the
same as considering absolute entropy (corresponding to data compression) as a special case
of relative entropy (corresponding to data differencing) with no initial data.
When one wishes to emphasize the connection, one may use the term differential
compression to refer to data differencing.

Outlook and currently unused potential


It is estimated that the total amount of the information that is stored on the world's storage
devices could be further compressed with existing compression algorithms by a remaining
average factor of 4.5 : 1. It is estimated that the combined technological capacity of the world
to store information provides 1,300 exabytes of hardware digits in 2007, but when the
corresponding content is optimally compressed, this only represents 295 exabytes of Shannon
information.[2]

Uses
Audio
Audio data compression, as distinguished from dynamic range compression, reduces the transmission bandwidth and storage requirements of audio data. Audio compression algorithms are implemented in software as audio codecs. Lossy audio compression algorithms provide higher compression at the cost of fidelity and are used in numerous audio applications. These algorithms almost all rely on psychoacoustics to eliminate less audible or meaningful sounds, thereby reducing the space required to store or transmit them.
In both lossy and lossless compression, information redundancy is reduced, using methods
such as coding, pattern recognition and linear prediction to reduce the amount of information
used to represent the uncompressed data.
The acceptable trade-off between loss of audio quality and transmission or storage size
depends upon the application. For example, one 640 MB compact disc (CD) holds approximately one hour of uncompressed high-fidelity music, less than 2 hours of music compressed losslessly, or 7 hours of music compressed in the MP3 format at a medium bit rate. A digital sound recorder can typically store around 200 hours of clearly intelligible speech in 640 MB.[6]
Lossless audio compression produces a representation of digital data that decompresses to an exact digital duplicate of the original audio stream, unlike playback from lossy compression techniques such as Vorbis and MP3. Compression ratios are around 50–60% of the original size,[7] similar to those for generic lossless data compression. Lossy compression depends upon the quality required, but typically yields files of 5 to 20% of the size of the uncompressed original.[8] Lossless compression is unable to attain high compression ratios due to the complexity of waveforms and the rapid changes in sound forms. Codecs like FLAC, Shorten and TTA use linear prediction to estimate the spectrum of the signal. Many of these algorithms use convolution with the filter [-1 1] to slightly whiten or flatten the spectrum, thereby allowing traditional lossless compression to work more efficiently. The process is reversed upon decompression.
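The [-1 1] filter mentioned above is just a first difference; the sketch below shows that whitening step and its exact inverse (illustrative only, since real codecs such as FLAC use higher-order linear predictors):

```python
def whiten(samples: list[int]) -> list[int]:
    """Convolve with [-1, 1]: keep the first sample, then store successive differences.
    A slowly varying waveform becomes a small-valued, easier-to-compress residual."""
    return samples[:1] + [samples[i] - samples[i - 1] for i in range(1, len(samples))]

def unwhiten(residual: list[int]) -> list[int]:
    """Exact inverse: a running sum of the residual reconstructs the original samples."""
    out, acc = [], 0
    for r in residual:
        acc += r
        out.append(acc)
    return out

signal = [100, 102, 105, 107, 108, 108, 107]
res = whiten(signal)
print(res)                         # [100, 2, 3, 2, 1, 0, -1]: mostly small values
assert unwhiten(res) == signal     # the round trip is lossless
```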

When audio files are to be processed, either by further compression or for editing, it is
desirable to work from an unchanged original (uncompressed or losslessly compressed).
Processing of a lossily compressed file for some purpose usually produces a final result
inferior to creation of the same compressed file from an uncompressed original. In addition to
sound editing or mixing, lossless audio compression is often used for archival storage, or as
master copies.
A number of lossless audio compression formats exist. Shorten was an early lossless format.
Newer ones include Free Lossless Audio Codec (FLAC), Apple's Apple Lossless, MPEG-4
ALS, Microsoft's Windows Media Audio 9 Lossless (WMA Lossless), Monkey's Audio, and
TTA. See list of lossless codecs for a complete list.
Some audio formats feature a combination of a lossy format and a lossless correction; this
allows stripping the correction to easily obtain a lossy file. Such formats include MPEG-4
SLS (Scalable to Lossless), WavPack, and OptimFROG DualStream.
Other formats are associated with a distinct system, such as:

Direct Stream Transfer, used in Super Audio CD

Meridian Lossless Packing, used in DVD-Audio, Dolby TrueHD, Blu-ray and HD DVD

Lossy audio compression


Figure: comparison of acoustic spectrograms of a song in an uncompressed format and in various lossy formats. The fact that the lossy spectrograms are different from the uncompressed one indicates that they are in fact lossy, but nothing can be assumed about the effect of the changes on perceived quality.
Lossy audio compression is used in a wide range of applications. In addition to the direct
applications (mp3 players or computers), digitally compressed audio streams are used in most
video DVDs; digital television; streaming media on the internet; satellite and cable radio; and
increasingly in terrestrial radio broadcasts. Lossy compression typically achieves far greater
compression than lossless compression (data of 5 percent to 20 percent of the original stream,
rather than 50 percent to 60 percent), by discarding less-critical data.
The innovation of lossy audio compression was to use psychoacoustics to recognize that not
all data in an audio stream can be perceived by the human auditory system. Most lossy
compression reduces perceptual redundancy by first identifying sounds which are considered
perceptually irrelevant, that is, sounds that are very hard to hear. Typical examples include
high frequencies, or sounds that occur at the same time as louder sounds. Those sounds are
coded with decreased accuracy or not coded at all.
Due to the nature of lossy algorithms, audio quality suffers when a file is decompressed and
recompressed (digital generation loss). This makes lossy compression unsuitable for storing
the intermediate results in professional audio engineering applications, such as sound editing
and multitrack recording. However, they are very popular with end users (particularly MP3),
as a megabyte can store about a minute's worth of music at adequate quality.
Coding methods

In order to determine what information in an audio signal is perceptually irrelevant, most


lossy compression algorithms use transforms such as the modified discrete cosine transform
(MDCT) to convert time domain sampled waveforms into a transform domain. Once
transformed, typically into the frequency domain, component frequencies can be allocated
bits according to how audible they are. Audibility of spectral components is determined by
first calculating a masking threshold, below which it is estimated that sounds will be beyond
the limits of human perception.
The masking threshold is calculated using the absolute threshold of hearing and the principles of simultaneous masking (the phenomenon wherein a signal is masked by another signal separated by frequency) and, in some cases, temporal masking (where a signal is masked by another signal separated by time). Equal-loudness contours may also be used to weight the
perceptual importance of different components. Models of the human ear-brain combination


incorporating such effects are often called psychoacoustic models.
Other types of lossy compressors, such as the linear predictive coding (LPC) used with
speech, are source-based coders. These coders use a model of the sound's generator (such as
the human vocal tract with LPC) to whiten the audio signal (i.e., flatten its spectrum) prior to
quantization. LPC may also be thought of as a basic perceptual coding technique;
reconstruction of an audio signal using a linear predictor shapes the coder's quantization noise
into the spectrum of the target signal, partially masking it.
Lossy formats are often used for the distribution of streaming audio, or interactive
applications (such as the coding of speech for digital transmission in cell phone networks). In
such applications, the data must be decompressed as the data flows, rather than after the
entire data stream has been transmitted. Not all audio codecs can be used for streaming
applications, and for such applications a codec designed to stream data effectively will
usually be chosen.
Latency results from the methods used to encode and decode the data. Some codecs will
analyze a longer segment of the data to optimize efficiency, and then code it in a manner that
requires a larger segment of data at one time in order to decode. (Often codecs create
segments called a "frame" to create discrete data segments for encoding and decoding.) The
inherent latency of the coding algorithm can be critical; for example, when there is two-way
transmission of data, such as with a telephone conversation, significant delays may seriously
degrade the perceived quality.
In contrast to the speed of compression, which is proportional to the number of operations
required by the algorithm, here latency refers to the number of samples which must be
analysed before a block of audio is processed. In the minimum case, latency is zero samples (e.g., if the coder/decoder simply reduces the number of bits used to quantize the signal).
Time domain algorithms such as LPC also often have low latencies, hence their popularity in
speech coding for telephony. In algorithms such as MP3, however, a large number of samples
have to be analyzed in order to implement a psychoacoustic model in the frequency domain,
and latency is on the order of 23 ms (46 ms for two-way communication).
Speech encoding

Speech encoding is an important category of audio data compression. The perceptual models used to estimate what a human ear can hear are generally somewhat different from those used for music. The range of frequencies needed to convey the sounds of a human voice is
normally far narrower than that needed for music, and the sound is normally less complex. As
a result, speech can be encoded at high quality using a relatively low bit rate.
This is accomplished, in general, by some combination of two approaches:

Only encoding sounds that could be made by a single human voice.

Throwing away more of the data in the signal, keeping just enough to reconstruct an "intelligible" voice rather than the full frequency range of human hearing.

Perhaps the earliest algorithms used in speech encoding (and audio data compression in general) were the A-law algorithm and the μ-law algorithm.
History

Solidyne 922: The world's first commercial audio bit compression card for PC, 1990
A literature compendium for a large variety of audio coding systems was published in the
IEEE Journal on Selected Areas in Communications (JSAC), February 1988. While there
were some papers from before that time, this collection documented an entire variety of
finished, working audio coders, nearly all of them using perceptual (i.e. masking) techniques
and some kind of frequency analysis and back-end noiseless coding.[9] Several of these papers
remarked on the difficulty of obtaining good, clean digital audio for research purposes. Most,
if not all, of the authors in the JSAC edition were also active in the MPEG-1 Audio
committee.
The world's first commercial broadcast automation audio compression system was developed
by Oscar Bonello, an Engineering professor at the University of Buenos Aires.[10] In 1983,
using the psychoacoustic principle of the masking of critical bands first published in 1967, [11]
he started developing a practical application based on the recently developed IBM PC
computer, and the broadcast automation system was launched in 1987 under the name

Audicom. 20 years later, almost all the radio stations in the world were using similar
technology, manufactured by a number of companies.

Video
Video compression uses modern coding techniques to reduce redundancy in video data. Most
video compression algorithms and codecs combine spatial image compression and temporal
motion compensation. Video compression is a practical implementation of source coding in
information theory. In practice most video codecs also use audio compression techniques in
parallel to compress the separate, but combined data streams.
The majority of video compression algorithms use lossy compression. Large amounts of data
may be eliminated while being perceptually indistinguishable. As in all lossy compression,
there is a tradeoff between video quality, cost of processing the compression and
decompression, and system requirements. Highly compressed video may present visible or
distracting artifacts.
Video compression typically operates on square-shaped groups of neighboring pixels, often
called macroblocks. These pixel groups or blocks of pixels are compared from one frame to
the next and the video compression codec sends only the differences within those blocks. In
areas of video with more motion, the compression must encode more data to keep up with the
larger number of pixels that are changing. Commonly during explosions, flames, flocks of
animals, and in some panning shots, the high-frequency detail leads to quality decreases or to
increases in the variable bitrate.
Encoding theory
Video data may be represented as a series of still image frames. The sequence of frames
contains spatial and temporal redundancy that video compression algorithms attempt to
eliminate or code in a smaller size. Similarities can be encoded by only storing differences
between frames, or by using perceptual features of human vision. For example, small
differences in color are more difficult to perceive than are changes in brightness.
Compression algorithms can average a color across these similar areas to reduce space, in a
manner similar to those used in JPEG image compression.[12] Some of these methods are
inherently lossy while others may preserve all relevant information from the original,
uncompressed video.
One of the most powerful techniques for compressing video is interframe compression.
Interframe compression uses one or more earlier or later frames in a sequence to compress
the current frame, while intraframe compression uses only the current frame, effectively
being image compression.
The most commonly used method works by comparing each frame in the video with the
previous one. If the frame contains areas where nothing has moved, the system simply issues
a short command that copies that part of the previous frame, bit-for-bit, into the next one. If
sections of the frame move in a simple manner, the compressor emits a (slightly longer)
command that tells the decompresser to shift, rotate, lighten, or darken the copy: a longer
command, but still much shorter than intraframe compression. Interframe compression works
well for programs that will simply be played back by the viewer, but can cause problems if
the video sequence needs to be edited.
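A toy version of this frame-differencing idea is sketched below; frames are flat lists of pixel values here, whereas real codecs work on motion-compensated macroblocks, so this is only an illustration:

```python
def encode_frames(frames: list[list[int]]) -> list[list[int]]:
    """Store the first frame as-is, then only per-pixel differences from the previous
    frame. Unchanged regions become runs of zeros, which compress very well."""
    encoded = [frames[0]]
    for prev, cur in zip(frames, frames[1:]):
        encoded.append([c - p for c, p in zip(cur, prev)])
    return encoded

def decode_frames(encoded: list[list[int]]) -> list[list[int]]:
    """Rebuild each frame by adding the stored differences to the previous frame."""
    frames = [encoded[0]]
    for diff in encoded[1:]:
        frames.append([p + d for p, d in zip(frames[-1], diff)])
    return frames

video = [[10, 10, 10, 10], [10, 10, 11, 10], [10, 10, 11, 12]]
deltas = encode_frames(video)
print(deltas)                            # [[10, 10, 10, 10], [0, 0, 1, 0], [0, 0, 0, 2]]
assert decode_frames(deltas) == video    # cutting out an earlier frame would break this
```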
Because interframe compression copies data from one frame to another, if the original frame
is simply cut out (or lost in transmission), the following frames cannot be reconstructed
properly. Some video formats, such as DV, compress each frame independently using
intraframe compression. Making 'cuts' in intraframe-compressed video is almost as easy as
editing uncompressed video: one finds the beginning and ending of each frame, and simply
copies bit-for-bit each frame that one wants to keep, and discards the frames one doesn't
want. Another difference between intraframe and interframe compression is that with
intraframe systems, each frame uses a similar amount of data. In most interframe systems,
certain frames (such as "I frames" in MPEG-2) aren't allowed to copy data from other frames,
and so require much more data than other frames nearby.
It is possible to build a computer-based video editor that spots problems caused when I
frames are edited out while other frames need them. This has allowed newer formats like
HDV to be used for editing. However, this process demands a lot more computing power than
editing intraframe compressed video with the same picture quality.
Today, nearly all commonly used video compression methods (e.g., those in standards
approved by the ITU-T or ISO) apply a discrete cosine transform (DCT) for spatial
redundancy reduction. Other methods, such as fractal compression, matching pursuit and the
use of a discrete wavelet transform (DWT) have been the subject of some research, but are
typically not used in practical products (except for the use of wavelet coding as still-image
coders without motion compensation). Interest in fractal compression seems to be waning,
due to recent theoretical analysis showing a comparative lack of effectiveness of such
methods.


SUMMARIZING AND FILTERING


Information System with Summarization
Use
The Information System for investment programs consists of two parts. There are the
drilldown reports for the "normal" database of your system or client. Along with these, there
are also drilldown reports that access a summarization database. The summarization database
makes it possible for you to update and report on a summarized dataset for an investment
program and its dependent appropriation requests and measures. This offers several
advantages:

The drilldown reports that access the summarized dataset have better performance
than unsummarized reports.

Data from several local systems can be imported into the summarization database.
These local systems can be SAP R/3 Systems, SAP R/2 Systems or non-SAP systems.

You can manage different versions of the summarization database. You can thereby
manage "snapshots" of the investment data of your enterprise. Using versions, you
can create a sequence showing the data on the investments in your enterprise over
time.

You can still generate reports on data from the past, even if the objects to be reported
on have themselves already been deleted from the system.

Features
The central part of summarized reporting is the summarization database. In the
summarization database, you store summarized characteristics and key figures on investment
programs and the appropriation requests and measures assigned to them. You store these on a
periodic basis. The characteristics and key figures contain three types of data:

The actual values

The master data

The investment program hierarchy


The summarization database is client-dependent. You can define only one summarization
database in each client for investment programs. But you can still define any number of
summarization versions in a given client.
Data Transfer
Depending on the source of the summarization data, there are different system functions for transferring the data to the summarization database (Investment Programs → Periodic processing → Summarization).

The summarization data come from the same client or R/3 System as the client or
system in which the summarization database is managed. In this case, you can select
summarization data immediately (Copy program) and write it to the summarization
database (Summarize values).

The summarization database is supplied with values from other R/3 systems. Then
you have to first export the data to a file (Output to file). Then you can import the data
to the central system and summarize it there (Import from file).
When you export data to a file, you can select either already summarized data, or
unsummarized data. Selecting summarized data makes sense when the system
providing the data has its own summarization database that is supplied with data from
other local systems (such as, other subsidiaries).

The data comes from an SAP R/2 RK-P System. In this case, there is a data
procurement program you can use to select the relevant projects. It then writes them
to the summarization database in the R/3 System. You can also assign a project in
SAP R/2 RK-P to an investment program position in the reporting system. You can
either assign the project directly, or by means of the appropriation request.

The data comes from an external, non-SAP system. There is an interface for updating
the summarization database from external systems. For more information, see Special
Considerations for Summarizing from Local Systems

Entities
Along with the investment program structure and the values, you can also store the following
entities in the summarization database with their keys and short texts:

Measures

Appropriation requests


Profit centers

Cost centers

Plants

Functional locations

Storing the key and short text of the entities in the summarization database is necessary in order to be able to display the entities in reports. The same applies when the summarization takes place within one system or client. There is a function for this in the Investment Programs menu under Periodic processing → Summarization.
Storing in the summarization database also has another advantage. The entities are still
available, even if they are changed or reorganized in their original database.

Characteristics that Can Be Summarized


The system achieves the "summarization" by summarizing using specific characteristics,
rather than storing all characteristics and all characteristic values of the investment program.
For example, if you want to create reports using the summarization database that only have to
do with organizational units, you do not have to store investment data at the object level.
Instead, you summarize using the characteristics "appropriation request" and "measure". In
this way, you can considerably reduce the number of data records in the summarization
database.
In Customizing, you specify which characteristics the system should summarize, and which
should not be summarized (Information System Using Summarization).
You cannot summarize using the characteristics "fiscal year," "plan version," or "value type."
Otherwise the information about the type of key figure is lost. Summarizing using the
characteristics "investment program" or "approval year" is also not possible, because the
summarization data always has to be stored in relation to an investment program. Since most
standard reports use investment program positions as drilldown characteristics, it generally
does not make sense to use "investment program position" as a characteristic for
summarization.
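Conceptually, this summarization is a group-by aggregation that drops the detail characteristics; the rough sketch below illustrates the idea only, and its record layout is invented rather than SAP's actual data model:

```python
from collections import defaultdict

# Invented detail records: (program position, measure, fiscal year, actual value).
records = [
    ("POS-A", "M-001", 2007, 100.0),
    ("POS-A", "M-002", 2007, 250.0),
    ("POS-A", "M-003", 2008,  80.0),
    ("POS-B", "M-004", 2007,  40.0),
]

def summarize(rows, keep):
    """Sum the value over all records sharing the kept characteristics,
    discarding the others (here: the individual measure)."""
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[i] for i in keep)
        totals[key] += row[-1]
    return dict(totals)

# Keep program position and fiscal year; summarize away the "measure" characteristic.
print(summarize(records, keep=(0, 2)))
# {('POS-A', 2007): 350.0, ('POS-A', 2008): 80.0, ('POS-B', 2007): 40.0}
```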

Summarization by Scale

If you manage the characteristics "appropriation request" or "measure" in the summarization database in detail display form (meaning that the data are not summarized), then you can specify in Customizing that objects that have a certain scale are summarized anyway. To do this, remove the "detail display" indicator from the definition of the scale you want to use (Appropriation Requests → Master Data → Define allowed values for certain master data fields → Define scale).
Using this method, you can exclude measures and appropriation requests that have an
insignificant value from summarization reporting. At the same time you benefit from a detail
display of more important measures and appropriation requests.
You enter scales in the master data of investment program positions, appropriation requests
and measures. For objects that are not managed in an SAP R/3 System, the system determines
the scale from the appropriation request or investment program position to which they are
assigned.

Summarization Versions
In order for you to have multiple "snapshots" of your investment programs at different times,
the system saves the data in the summarization database in summarization versions. The
summarization versions allow you to "preserve" the investment program in different phases.
You have to enter a summarization version for the procurement of data for the
summarization, as well as when you run summarization reports. If you request data collection
for a summarization version that already has data in the summarization database, the system
overwrites the existing data.
You can define any number of summarization versions in Customizing (Information System
Information System Using Summarization). A summarization version consists solely of a key
and a text.

Activities
You create a summarization database in the following steps (Periodic processing → Summarization):

Copy the investment program to the summarization database (Summarization → In own client → Copy program).

Export the investment program values from the original system to a file (Summarization → Output to file).

Export the entities from the original system, each to its own file.

Import the investment program values and entities (Summarization → Import from file).

Exporting and importing are not necessary if the data does not come from other clients or systems (that is, all the data is in the same system or client as the summarization database). In that case, you have to use the functions Summarize values and Copy entities (under Summarization → In own client).

Information filtering system


An Information filtering system is a system that removes redundant or unwanted
information from an information stream using (semi)automated or computerized methods
prior to presentation to a human user. Its main goal is the management of the information
overload and increment of the semantic signal-to-noise ratio. To do this the user's profile is
compared to some reference characteristics. These characteristics may originate from the
information item (the content-based approach) or the user's social environment (the
collaborative filtering approach).
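
As a rough illustration of the content-based approach, the following Python sketch compares a user profile with candidate items using cosine similarity over keyword weights; the profile, the items and the threshold are all invented for the example.

# Content-based filtering sketch: items and the user profile are keyword-weight
# vectors; only items sufficiently similar to the profile are presented.
from math import sqrt

def cosine(a, b):
    # Cosine similarity between two sparse keyword -> weight dictionaries.
    dot = sum(a[k] * b[k] for k in set(a) & set(b))
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

profile = {"football": 0.8, "politics": 0.1, "music": 0.4}   # invented user profile
items = {
    "article_1": {"football": 0.9, "music": 0.2},
    "article_2": {"politics": 0.7, "economy": 0.6},
    "article_3": {"music": 0.8, "concert": 0.5},
}

threshold = 0.5   # chosen arbitrarily for the example
selected = {name: round(cosine(profile, vec), 2)
            for name, vec in items.items() if cosine(profile, vec) > threshold}
print(selected)   # only article_1 is similar enough to be presented

A collaborative filter would instead score article_3 highly if users with profiles similar to this one had liked it, even though its keywords barely overlap with the profile.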
Whereas in information transmission signal processing filters are used against syntax-disrupting noise on the bit level, the methods employed in information filtering act on the
semantic level.
The range of machine methods employed builds on the same principles as those for
information extraction. A notable application can be found in the field of email spam filters.
Thus, it is not only the information explosion that necessitates some form of filters, but also
inadvertently or maliciously introduced pseudo-information.
On the presentation level, information filtering takes the form of user-preferences-based
newsfeeds, etc.
Recommender systems are active information filtering systems that attempt to present to the
user information items (film, television, music, books, news, web pages) the user is interested
in. These systems add information items to the information flowing towards the user, as
opposed to removing information items from the information flow towards the user.
Recommender systems typically use collaborative filtering approaches or a combination of the collaborative filtering and content-based filtering approaches, although content-based recommender systems do exist.
History
Before the advent of the Internet, there were already several methods of filtering information. For instance, when a government controls and restricts the flow of information we speak of censorship, although in a democratic country some filtering is also carried out to satisfy the needs of its beneficiaries. On the other hand, we speak of information filters when newspaper editors and journalists provide a service that selects the most valuable information for their clients: readers of books, magazines and newspapers, radio listeners and TV viewers. This filtering operation is also present in schools and universities, where information is selected on academic criteria to assist the customers of this service, the students. With the advent of the Internet, anyone can publish whatever they wish at low cost, so the amount of less useful information increases considerably and, consequently, quality information becomes harder to find. This problem prompted the development of new filtering techniques with which the information required for each specific topic can be obtained easily and efficiently.
Operation
A filtering system of this style consists of several tools that help people find the most valuable information, so that the limited time available for reading, listening or viewing is directed to the most interesting and valuable documents, while the most inconsequential ones are set aside. These filters are also used to organize and structure information in a correct and understandable way, and to group the messages arriving in a mailbox. Such filters are essential for the results returned by search engines on the Internet, and filtering functions improve continually so that Web documents and messages can be retrieved more efficiently.
Criterion
One of the criteria used in this step is whether the knowledge is harmful or not, and whether it allows a better understanding with or without the concept. In this case, the task of information filtering is to reduce or eliminate the harmful information.
Learning System
A content-learning system generally consists of three basic stages:
1. First, a solver that provides solutions to a defined set of tasks.
2. Next, an assessment stage that applies evaluation criteria to measure how well the previous stage solves those problems.
3. Finally, an acquisition module whose output is the knowledge that is fed back into the solver of the first stage.
Future
Currently the problem is not so much finding the best way to filter information as getting these systems to learn a user's information needs on their own, automating not only the filtering process but also the construction and adaptation of the filter. Related fields such as statistics, machine learning, pattern recognition and data mining are the basis for developing information filters that adapt on the basis of experience. For this learning process to be carried out, part of the information has to be pre-filtered; that is, there must be positive and negative examples, called training data, which can be generated by experts or via feedback from ordinary users.
Error
As data is entered, the system induces new rules; if we assume that the system can generalize from the training data, then we have to evaluate its development and measure its ability to correctly predict the categories of new information. This step is simplified by setting aside part of the training data as a separate series called "test data", which is used to measure the error rate. As a general rule it is important to distinguish between the types of error (false positives and false negatives). For example, in the case of a content aggregator for children, letting through unsuitable information that shows violence or pornography is not of the same gravity as mistakenly discarding some appropriate information. To lower error rates and give these systems learning capabilities similar to those of humans, we require systems that simulate human cognitive abilities, such as natural language understanding, the capture of common-sense meaning and other forms of advanced processing of the semantics of information.
Fields of use
Nowadays there are numerous techniques for developing information filters, some of which reach error rates lower than 10% in various experiments. Among these techniques are decision trees, support vector machines, neural networks, Bayesian networks, linear discriminants, logistic regression, etc. At present, these techniques are used in different applications, not only in the Web context but also in areas as varied as voice recognition, the classification of astronomical telescope data and the evaluation of financial risk.

INFERENCES AND UNCERTAINTY


Inference
Inference is the act or process of deriving logical conclusions from premises known or assumed to be true.[1] The conclusion drawn is also called an inference. The laws of valid inference are studied in the field of logic.

Human inference (i.e. how humans draw conclusions) is traditionally studied within the field
of cognitive psychology; artificial intelligence researchers develop automated inference
systems to emulate human inference. Statistical inference allows for inference from
quantitative data.

Definition of inference
The process by which a conclusion is inferred from multiple observations is called inductive
reasoning. The conclusion may be correct or incorrect, or correct to within a certain degree of
accuracy, or correct in certain situations. Conclusions inferred from multiple observations
may be tested by additional observations.

This definition is disputable due to its lack of clarity (ref. Oxford English Dictionary: "induction ... 3. Logic the inference of a general law from particular instances."). The definition given thus applies only when the "conclusion" is general.
1. A conclusion reached on the basis of evidence and reasoning. 2. The process of reaching
such a conclusion: "order, health, and by inference cleanliness".

Examples of inference
Greek philosophers defined a number of syllogisms, correct three-part inferences, that can be
used as building blocks for more complex reasoning. We begin with the most famous of them
all:
1. All men are mortal
2. Socrates is a man
3. Therefore, Socrates is mortal.

The reader can check that the premises and conclusion are true, but Logic is concerned with
inference: does the truth of the conclusion follow from that of the premises?
The validity of an inference depends on the form of the inference. That is, the word "valid"
does not refer to the truth of the premises or the conclusion, but rather to the form of the
inference. An inference can be valid even if the parts are false, and can be invalid even if the
parts are true. But a valid form with true premises will always have a true conclusion.
For example, consider the form of the following argument:
1. All fruits are sweet.
2. A banana is a fruit.
3. Therefore, a banana is sweet.

For the conclusion to be necessarily true, the premises need to be true.


Now we turn to an invalid form.
1. All A are B.
2. C is a B.
3. Therefore, C is an A.

To show that this form is invalid, we demonstrate how it can lead from true premises to a
false conclusion.
1. All apples are fruit. (Correct)
2. Bananas are fruit. (Correct)
3. Therefore, bananas are apples. (Wrong)

A valid argument with false premises may lead to a false conclusion:


1. All fat people are Greek.
2. John Lennon was fat.
3. Therefore, John Lennon was Greek.

When a valid argument is used to derive a false conclusion from false premises, the inference
is valid because it follows the form of a correct inference.
A valid argument can also be used to derive a true conclusion from false premises:
1. All fat people are musicians
2. John Lennon was fat
3. Therefore, John Lennon was a musician

In this case we have two false premises that imply a true conclusion.

Incorrect inference
An incorrect inference is known as a fallacy. Philosophers who study informal logic have
compiled large lists of them, and cognitive psychologists have documented many biases in
human reasoning that favor incorrect reasoning.

Automatic logical inference


AI systems first provided automated logical inference and these were once extremely popular
research topics, leading to industrial applications under the form of expert systems and later
business rule engines.
An inference system's job is to extend a knowledge base automatically. The knowledge base
(KB) is a set of propositions that represent what the system knows about the world. Several
techniques can be used by that system to extend KB by means of valid inferences. An
additional requirement is that the conclusions the system arrives at are relevant to its task.

Example using Prolog


Prolog (for "Programming in Logic") is a programming language based on a subset of
predicate calculus. Its main job is to check whether a certain proposition can be inferred from
a KB (knowledge base) using an algorithm called backward chaining.
Let us return to our Socrates syllogism. We enter into our Knowledge Base the following
piece of code:
mortal(X) :- man(X).
man(socrates).

(Here :- can be read as "if". Generally, if P → Q (if P then Q), then in Prolog we would code Q :- P (Q if P).)
This states that all men are mortal and that Socrates is a man. Now we can ask the Prolog
system about Socrates:
?- mortal(socrates).

(where ?- signifies a query: can mortal(socrates) be deduced from the KB using the rules?)
gives the answer "Yes".
On the other hand, asking the Prolog system the following:


?- mortal(plato).

gives the answer "No".


This is because Prolog does not know anything about Plato, and hence defaults to any
property about Plato being false (the so-called closed world assumption). Finally, ?- mortal(X) (is anything mortal?) would result in "Yes" (and in some implementations: "Yes": X=socrates).
Prolog can be used for vastly more complicated inference tasks. See the corresponding article
for further examples.
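
For readers who want to experiment without a Prolog system, the following rough Python sketch mimics the same backward-chaining idea on the Socrates knowledge base; the data structures and the prove() helper are inventions for this illustration, not part of Prolog itself.

# Toy backward chaining over a tiny knowledge base mirroring the Prolog example.
facts = {("man", "socrates")}          # man(socrates).
rules = [("mortal", "man")]            # mortal(X) :- man(X).

def prove(predicate, individual):
    # True if predicate(individual) can be derived from the facts and rules.
    if (predicate, individual) in facts:
        return True
    for head, body in rules:           # try each rule whose head matches
        if head == predicate and prove(body, individual):
            return True
    return False                       # closed world assumption: unprovable means "No"

print(prove("mortal", "socrates"))     # True  ("Yes")
print(prove("mortal", "plato"))        # False ("No")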

Use with the semantic web


Recently, automated reasoners have found a new field of application in the semantic web. Since it is based upon first-order logic, knowledge expressed using a variant of OWL can be logically processed, i.e., inferences can be made upon it.

Bayesian statistics and probability logic


Philosophers and scientists who follow the Bayesian framework for inference use the
mathematical rules of probability to find this best explanation. The Bayesian view has a
number of desirable features; one of them is that it embeds deductive (certain) logic as a
subset (this prompts some writers to call Bayesian probability "probability logic", following
E. T. Jaynes).
Bayesians identify probabilities with degrees of beliefs, with certainly true propositions
having probability 1, and certainly false propositions having probability 0. To say that "it's
going to rain tomorrow" has a 0.9 probability is to say that you consider the possibility of rain
tomorrow as extremely likely.
Through the rules of probability, the probability of a conclusion and of alternatives can be
calculated. The best explanation is most often identified with the most probable (see Bayesian
decision theory). A central rule of Bayesian inference is Bayes' theorem, which gave its name
to the field.
See Bayesian inference for examples.
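
As a small worked illustration of Bayes' theorem (all of the numbers are invented), consider updating the belief that it will rain tomorrow after observing a cloudy evening:

# Bayes' theorem: P(rain | clouds) = P(clouds | rain) * P(rain) / P(clouds).
p_rain = 0.20                  # prior degree of belief that it rains tomorrow
p_clouds_given_rain = 0.80     # likelihood of a cloudy evening if it will rain
p_clouds_given_dry = 0.30      # likelihood of a cloudy evening if it stays dry

# Evidence term by the law of total probability.
p_clouds = p_clouds_given_rain * p_rain + p_clouds_given_dry * (1 - p_rain)
p_rain_given_clouds = p_clouds_given_rain * p_rain / p_clouds

print(round(p_rain_given_clouds, 2))   # 0.4: the observation doubles the belief in rain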

Nonmonotonic logic[2]
A relation of inference is monotonic if the addition of premises does not undermine
previously reached conclusions; otherwise the relation is nonmonotonic. Deductive inference
is monotonic: if a conclusion is reached on the basis of a certain set of premises, then that
conclusion still holds if more premises are added.
By contrast, everyday reasoning is mostly nonmonotonic because it involves risk: we jump to
conclusions from deductively insufficient premises. We know when it is worth or even
necessary (e.g. in medical diagnosis) to take the risk. Yet we are also aware that such
inference is defeasible: new information may undermine old conclusions. Various kinds of defeasible but remarkably successful inference have traditionally captured the attention of philosophers (theories of induction, Peirce's theory of abduction, inference to the best
explanation, etc.). More recently logicians have begun to approach the phenomenon from a
formal point of view. The result is a large body of theories at the interface of philosophy,
logic and artificial intelligence.

Applications
Computer applications
Bayesian inference has applications in artificial intelligence and expert systems. Bayesian
inference techniques have been a fundamental part of computerized pattern recognition
techniques since the late 1950s. There is also an ever growing connection between Bayesian
methods and simulation-based Monte Carlo techniques since complex models cannot be
processed in closed form by a Bayesian analysis, while a graphical model structure may allow
for efficient simulation algorithms like Gibbs sampling and other Metropolis-Hastings
algorithm schemes.[13] Recently Bayesian inference has gained popularity amongst the
phylogenetics community for these reasons; a number of applications allow many
demographic and evolutionary parameters to be estimated simultaneously. In the areas of
population genetics and dynamical systems theory, approximate Bayesian computation
(ABC) is also becoming increasingly popular.
As applied to statistical classification, Bayesian inference has been used in recent years to
develop algorithms for identifying e-mail spam. Applications which make use of Bayesian
inference for spam filtering include DSPAM, Bogofilter, SpamAssassin, SpamBayes, and
Mozilla. Spam classification is treated in more detail in the article on the naive Bayes
classifier.
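
A highly simplified sketch of the naive Bayes idea behind such spam filters is given below; the word probabilities and priors are invented, and real filters such as SpamAssassin or Bogofilter are considerably more elaborate.

# Naive Bayes spam scoring: word occurrences are assumed independent given the
# class, so per-word likelihoods are multiplied (here added in log space) and
# combined with the class priors via Bayes' theorem.
from math import log, exp

p_word_given_spam = {"free": 0.30, "offer": 0.20, "meeting": 0.01}   # invented estimates
p_word_given_ham  = {"free": 0.02, "offer": 0.03, "meeting": 0.25}
p_spam, p_ham = 0.4, 0.6                                             # class priors

def spam_posterior(words):
    log_spam = log(p_spam) + sum(log(p_word_given_spam.get(w, 1e-3)) for w in words)
    log_ham  = log(p_ham)  + sum(log(p_word_given_ham.get(w, 1e-3)) for w in words)
    m = max(log_spam, log_ham)          # normalise safely in log space
    return exp(log_spam - m) / (exp(log_spam - m) + exp(log_ham - m))

print(round(spam_posterior(["free", "offer"]), 3))   # close to 1: classify as spam
print(round(spam_posterior(["meeting"]), 3))         # close to 0: classify as ham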
Solomonoff's Inductive inference is the theory of prediction based on observations; for
example, predicting the next symbol based upon a given series of symbols. The only
assumption is that the environment follows some unknown but computable probability distribution. It combines two well-studied principles of inductive inference: Bayesian statistics and Occam's Razor. Solomonoff's universal prior probability of any prefix p of a computable sequence x is the sum of the probabilities of all programs (for a universal computer) that compute something starting with p. Given some p and any computable but unknown probability distribution from which x is sampled, the universal prior and Bayes' theorem can be used to predict the yet unseen parts of x in optimal fashion.[14][15]

Rule of inference
In logic, a rule of inference, inference rule, or transformation rule is the act of drawing a
conclusion based on the form of premises interpreted as a function which takes premises,
analyses their syntax, and returns a conclusion (or conclusions). For example, the rule of
inference modus ponens takes two premises, one in the form of "If p then q" and another in
the form of "p" and returns the conclusion "q". The rule is valid with respect to the semantics
of classical logic (as well as the semantics of many other non-classical logics), in the sense
that if the premises are true (under an interpretation) then so is the conclusion.
Typically, a rule of inference preserves truth, a semantic property. In many-valued logic, it
preserves a general designation. But a rule of inference's action is purely syntactic, and does
not need to preserve any semantic property: any function from sets of formulae to formulae
counts as a rule of inference. Usually only rules that are recursive are important; i.e. rules
such that there is an effective procedure for determining whether any given formula is the
conclusion of a given set of formulae according to the rule. An example of a rule that is not
effective in this sense is the infinitary ω-rule.[1]
Popular rules of inference include modus ponens, modus tollens from propositional logic and
contraposition. First-order predicate logic uses rules of inference to deal with logical
quantifiers.

Overview
In formal logic (and many related areas), rules of inference are usually given in the following
standard form:
Premise#1
Premise#2
...
Premise#n
Conclusion
This expression states that whenever in the course of some logical derivation the given premises have been obtained, the specified conclusion can be taken for granted as well. The exact formal language that is used to describe both premises and conclusions depends on the actual context of the derivations. In a simple case, one may use logical formulae, such as in:
A → B
A
B
This is just the modus ponens rule of propositional logic. Rules of inference are usually
formulated as rule schemata by the use of universal variables. In the rule (schema) above, A
and B can be instantiated to any element of the universe (or sometimes, by convention, some
restricted subset such as propositions) to form an infinite set of inference rules.
A proof system is formed from a set of rules chained together to form proofs, or derivations.
Any derivation has only one final conclusion, which is the statement proved or derived. If
premises are left unsatisfied in the derivation, then the derivation is a proof of a hypothetical
statement: "if the premises hold, then the conclusion holds."
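
Viewed as a function from premises to a conclusion, a rule of inference can be sketched as follows; the tuple encoding of formulae is merely an illustrative convention.

# Modus ponens as a purely syntactic function: it inspects the *form* of the
# premises and returns the conclusion without considering their truth.
# A formula "A -> B" is encoded here as the tuple ("->", "A", "B").
def modus_ponens(premise1, premise2):
    if isinstance(premise1, tuple) and premise1[0] == "->" and premise1[1] == premise2:
        return premise1[2]
    raise ValueError("modus ponens does not apply to these premises")

print(modus_ponens(("->", "A", "B"), "A"))   # prints B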

Admissibility and derivability


Main article: Admissible rule
In a set of rules, an inference rule could be redundant in the sense that it is admissible or
derivable. A derivable rule is one whose conclusion can be derived from its premises using
the other rules. An admissible rule is one whose conclusion holds whenever the premises
hold. All derivable rules are admissible. To appreciate the difference, consider the following
set of rules for defining the natural numbers (the judgment n nat asserts the fact that n is a natural number):

  -------
   0 nat

   n nat
  --------
  s(n) nat

The first rule states that 0 is a natural number, and the second states that s(n) is a natural
number if n is. In this proof system, the following rule, demonstrating that the second successor of a natural number is also a natural number, is derivable:

     n nat
  -----------
  s(s(n)) nat
Its derivation is just the composition of two uses of the successor rule above. The following
rule for asserting the existence of a predecessor for any nonzero number is merely admissible:

  s(n) nat
  --------
    n nat
This is a true fact of natural numbers, as can be proven by induction. (To prove that this rule
is admissible, assume a derivation of the premise and induct on it to produce a derivation of n nat.) However, it is not derivable, because it depends on the structure of the derivation of the premise. Because of this, derivability is stable under additions to the proof system, whereas admissibility is not. To see the difference, suppose the following nonsense rule were added to the proof system:

  ---------
  s(-3) nat

In this new system, the double-successor rule is still derivable. However, the rule for finding the predecessor is no longer admissible, because there is no way to derive -3 nat. The brittleness of admissibility comes from the way it is proved: since the proof can induct on the
structure of the derivations of the premises, extensions to the system add new cases to this
proof, which may no longer hold.
Admissible rules can be thought of as theorems of a proof system. For instance, in a sequent
calculus where cut elimination holds, the cut rule is admissible.

Other considerations
Inference rules may also be stated in this form: (1) some (perhaps zero) premises, (2) a
turnstile symbol ⊢, which means "infers", "proves" or "concludes", (3) a conclusion. This
usually embodies the relational (as opposed to functional) view of a rule of inference, where
the turnstile stands for a deducibility relation holding between premises and conclusion.
Rules of inference must be distinguished from axioms of a theory. In terms of semantics,
axioms are valid assertions. Axioms are usually regarded as starting points for applying rules
of inference and generating a set of conclusions. Or, in less technical terms:
Rules are statements ABOUT the system, axioms are statements IN the system. For example:

- The RULE that from ⊢ p you can infer ⊢ Provable(p) is a statement that says if you've proven p, then it is provable that p is provable. This holds in Peano arithmetic, for example.
- The AXIOM p → Provable(p) would mean that every true statement is provable. This does not hold in Peano arithmetic.


Rules of inference play a vital role in the specification of logical calculi as they are
considered in proof theory, such as the sequent calculus and natural deduction.

Uncertainty
Uncertainty is a term used in subtly different ways in a number of fields, including physics,
philosophy, statistics, economics, finance, insurance, psychology, sociology, engineering, and
information science. It applies to predictions of future events, to physical measurements
already made, or to the unknown.

Concepts
Although the terms are used in various ways among the general public, many specialists in
decision theory, statistics and other quantitative fields have defined uncertainty, risk, and
their measurement as:
1. Uncertainty: The lack of certainty; a state of having limited knowledge where it is impossible to exactly describe the existing state, a future outcome, or more than one possible outcome.
2. Measurement of Uncertainty: A set of possible states or outcomes where probabilities are assigned to each possible state or outcome; this also includes the application of a probability density function to continuous variables.
3. Risk: A state of uncertainty where some possible outcomes have an undesired effect or significant loss.
4. Measurement of Risk: A set of measured uncertainties where some possible outcomes are losses, together with the magnitudes of those losses; this also includes loss functions over continuous variables.[1]
Knightian uncertainty. In his seminal work Risk, Uncertainty, and Profit[2] University of
Chicago economist Frank Knight (1921) established the important distinction between risk
and uncertainty:

Uncertainty must be taken in a sense radically distinct from the familiar notion of
risk, from which it has never been properly separated.... The essential fact is that
'risk' means in some cases a quantity susceptible of measurement, while at other
times it is something distinctly not of this character; and there are far-reaching and
crucial differences in the bearings of the phenomena depending on which of the
two is really present and operating.... It will appear that a measurable uncertainty,
or 'risk' proper, as we shall use the term, is so far different from an unmeasurable
one that it is not in effect an uncertainty at all.

There are other taxonomies of uncertainties and decisions that include a broader sense of
uncertainty and how it should be approached from an ethics perspective:[3]

A taxonomy of uncertainty
There are some things that you know to be true, and others that you know to be false; yet, despite this
extensive knowledge that you have, there remain many things whose truth or falsity is not known to you.
We say that you are uncertain about them. You are uncertain, to varying degrees, about everything in the
future; much of the past is hidden from you; and there is a lot of the present about which you do not have
full information. Uncertainty is everywhere and you cannot escape from it.
Dennis Lindley, Understanding Uncertainty (2006)

For example, if you do not know whether it will rain tomorrow, then you have a state of
uncertainty. If you apply probabilities to the possible outcomes using weather forecasts or
even just a calibrated probability assessment, you have quantified the uncertainty. Suppose
you quantify your uncertainty as a 90% chance of sunshine. If you are planning a major,
costly, outdoor event for tomorrow then you have risk since there is a 10% chance of rain and
rain would be undesirable. Furthermore, if this is a business event and you would lose
$100,000 if it rains, then you have quantified the risk (a 10% chance of losing $100,000).
These situations can be made even more realistic by quantifying light rain vs. heavy rain, the
cost of delays vs. outright cancellation, etc.
Some may represent the risk in this example as the "expected opportunity loss" (EOL) or the
chance of the loss multiplied by the amount of the loss (10% × $100,000 = $10,000). That is
useful if the organizer of the event is "risk neutral," which most people are not. Most would
be willing to pay a premium to avoid the loss. An insurance company, for example, would
compute an EOL as a minimum for any insurance coverage, then add on to that other
operating costs and profit. Since many people are willing to buy insurance for many reasons,
then clearly the EOL alone is not the perceived value of avoiding the risk.
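
The expected opportunity loss from the example above, written out; the 20% loading used for the illustrative insurance premium is an assumption added here, not a figure from the text.

# Expected opportunity loss (EOL) = chance of the loss * amount of the loss.
p_rain = 0.10
loss_if_rain = 100_000

eol = p_rain * loss_if_rain
print(f"EOL = ${eol:,.0f}")                    # EOL = $10,000

# A risk-neutral organizer would pay at most the EOL to remove the risk; a
# risk-averse one may accept a premium above it (illustrative 20% loading).
premium = eol * 1.20
print(f"Premium quoted: ${premium:,.0f}")      # $12,000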
Quantitative uses of the terms uncertainty and risk are fairly consistent from fields such as
probability theory, actuarial science, and information theory. Some also create new terms
without substantially changing the definitions of uncertainty or risk. For example, surprisal is
a variation on uncertainty sometimes used in information theory. But outside of the more
mathematical uses of the term, usage may vary widely. In cognitive psychology, uncertainty
can be real, or just a matter of perception, such as expectations, threats, etc.
Vagueness or ambiguity are sometimes described as "second order uncertainty," where there
is uncertainty even about the definitions of uncertain states or outcomes. The difference here
is that this uncertainty is about the human definitions and concepts, not an objective fact of
nature. It has been argued that ambiguity, however, is always avoidable while uncertainty (of
the "first order" kind) is not necessarily avoidable.
Uncertainty may be purely a consequence of a lack of knowledge of obtainable facts. That is,
you may be uncertain about whether a new rocket design will work, but this uncertainty can
be removed with further analysis and experimentation. At the subatomic level, however,
uncertainty may be a fundamental and unavoidable property of the universe. In quantum
mechanics, the Heisenberg Uncertainty Principle puts limits on how much an observer can
ever know about the position and velocity of a particle. This may not just be ignorance of
potentially obtainable facts but that there is no fact to be found. There is some controversy in
physics as to whether such uncertainty is an irreducible property of nature or if there are
"hidden variables" that would describe the state of a particle even more exactly than
Heisenberg's uncertainty principle allows.

Measurements
Main article: Measurement uncertainty
In metrology, physics, and engineering, the uncertainty or margin of error of a measurement
is stated by giving a range of values likely to enclose the true value. This may be denoted by
error bars on a graph, or by the following notations:

- measured value ± uncertainty
- measured value +uncertainty / −uncertainty (asymmetric uncertainties)
- measured value(uncertainty)

The middle notation is used when the error is not symmetric about the value, for example 3.4 +0.3/−0.2. This can occur when using a logarithmic scale, for example. The latter "concise notation" is used for example by IUPAC in stating the atomic mass of elements. There, the uncertainty given in parentheses applies to the least significant figure(s) of the number prior to the parenthesized value (i.e. counting from the rightmost digit to the left). For instance, 1.00794(7) stands for 1.00794 ± 0.00007, while 1.00794(72) stands for 1.00794 ± 0.00072.[4]
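
Purely as an illustration, here is a small helper that expands this concise notation into an explicit value and uncertainty; the helper name and parsing rule are invented for this sketch, not an official IUPAC tool.

# Expand concise notation such as "1.00794(7)" into (value, uncertainty): the
# parenthesised digits apply to the least significant figures of the value.
def expand_concise(text):
    value_part, unc_part = text.rstrip(")").split("(")
    decimals = len(value_part.split(".")[1]) if "." in value_part else 0
    return float(value_part), int(unc_part) / 10 ** decimals

print(expand_concise("1.00794(7)"))    # (1.00794, 7e-05)  i.e. 1.00794 ± 0.00007
print(expand_concise("1.00794(72)"))   # (1.00794, 0.00072) i.e. 1.00794 ± 0.00072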
Often, the uncertainty of a measurement is found by repeating the measurement enough times
to get a good estimate of the standard deviation of the values. Then, any single value has an
uncertainty equal to the standard deviation. However, if the values are averaged, then the
mean measurement value has a much smaller uncertainty, equal to the standard error of the
mean, which is the standard deviation divided by the square root of the number of
measurements.
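
A short sketch of that procedure with made-up repeated readings:

# Uncertainty from repeated measurements: a single reading has uncertainty of
# roughly the sample standard deviation; the mean has the (smaller) standard
# error = standard deviation / sqrt(number of measurements).
from statistics import mean, stdev
from math import sqrt

readings = [9.78, 9.82, 9.80, 9.85, 9.79, 9.81]   # invented repeated readings

m = mean(readings)
s = stdev(readings)                  # sample standard deviation
sem = s / sqrt(len(readings))        # standard error of the mean

print(f"single reading: {readings[0]:.2f} ± {s:.3f}")
print(f"mean value:     {m:.3f} ± {sem:.3f}")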
When the uncertainty represents the standard error of the measurement, then about 68.2% of
the time, the true value of the measured quantity falls within the stated uncertainty range. For
example, it is likely that for 31.8% of the atomic mass values given on the list of elements by
atomic mass, the true value lies outside of the stated range. If the width of the interval is
doubled, then probably only 4.6% of the true values lie outside the doubled interval, and if
the width is tripled, probably only 0.3% lie outside. These values follow from the properties
of the normal distribution, and they apply only if the measurement process produces normally
distributed errors. In that case, the quoted standard errors are easily converted to 68.3% ("one
sigma"), 95.4% ("two sigma"), or 99.7% ("three sigma") confidence intervals.
In this context, uncertainty depends on both the accuracy and precision of the measurement
instrument. The lower the accuracy and precision of an instrument, the larger the
measurement uncertainty is. Notice that precision is often determined as the standard
deviation of the repeated measures of a given value, namely using the same method described
above to assess measurement uncertainty. However, this method is correct only when the
instrument is accurate. When it is inaccurate, the uncertainty is larger than the standard
deviation of the repeated measures, and it appears evident that the uncertainty does not
depend only on instrumental precision.

Uncertainty and the media


Uncertainty in science, and science in general, is often interpreted much differently in the
public sphere than in the scientific community [5]. This is due in part to the diversity of the
public audience, and the tendency for scientists to misunderstand lay audiences and therefore
not communicate ideas clearly and effectively [5]. One example is explained by the
information deficit model. Also, in the public realm, there are often many scientific voices
giving input on a single topic [5]. For example, depending on how an issue is reported in the
public sphere, discrepancies between outcomes of multiple scientific studies due to
methodological differences could be interpreted by the public as a lack of consensus in a
situation where a consensus does in fact exist [5]. This interpretation may even be
intentionally promoted, as scientific uncertainty may be managed to reach certain goals. For
example, global warming skeptics took the advice of Frank Luntz to frame global warming as
an issue of scientific uncertainty, which was a precursor to the conflict frame used by
journalists when reporting the issue [6].
Indeterminacy can be loosely said to apply to situations in which not all the parameters of
the system and their interactions are fully known, whereas ignorance refers to situations in
which it is not known what is not known [7]. These unknowns, indeterminacy and ignorance,
that exist in science are often transformed into uncertainty when reported to the public in
order to make issues more manageable, since scientific indeterminacy and ignorance are
difficult concepts for scientists to convey without losing credibility [5]. Conversely,
uncertainty is often interpreted by the public as ignorance [8]. The transformation of

indeterminacy and ignorance into uncertainty may be related to the public's misinterpretation
of uncertainty as ignorance.
Journalists often either inflate uncertainty (making the science seem more uncertain than it
really is) or downplay uncertainty (making the science seem more certain than it really is) [9].
One way that journalists inflate uncertainty is by describing new research that contradicts
past research without providing context for the change [9]. Other times, journalists give
scientists with minority views equal weight as scientists with majority views, without
adequately describing or explaining the state of scientific consensus on the issue [9]. In the
same vein, journalists often give non-scientists the same amount of attention and importance
as scientists [9].
Journalists may downplay uncertainty by eliminating scientists' carefully chosen tentative
wording, and by losing these caveats the information is skewed and presented as more certain
and conclusive than it really is [9]. Also, stories with a single source or without any context of
previous research mean that the subject at hand is presented as more definitive and certain
than it is in reality [9]. There is often a "product over process" approach to science journalism that also aids in the downplaying of uncertainty [9]. Finally, and most notably for this investigation, when
science is framed by journalists as a triumphant quest, uncertainty is erroneously framed as
reducible and resolvable [9].
Some media routines and organizational factors affect the overstatement of uncertainty; other
media routines and organizational factors help inflate the certainty of an issue. Because the
general public (in the United States) generally trusts scientists, when science stories are
covered without alarm-raising cues from special interest organizations (religious groups,
environmental organizations, political factions, etc.), they are often covered in a business-related sense, in an economic-development frame or a social progress frame [10]. The nature of
these frames is to downplay or eliminate uncertainty, so when economic and scientific
promise are focused on early in the issue cycle, as has happened with coverage of plant
biotechnology and nanotechnology in the United States, the matter in question seems more
definitive and certain [10].
Sometimes, too, stockholders, owners, or advertising will pressure a media organization to
promote the business aspects of a scientific issue, and therefore any uncertainty claims that
may compromise the business interests are downplayed or eliminated [9].

Applications

- Investing in financial markets such as the stock market.
- Uncertainty or error is used in science and engineering notation. Numerical values should only be expressed to those digits that are physically meaningful, which are referred to as significant figures. Uncertainty is involved in every measurement, such as measuring a distance, a temperature, etc., the degree depending upon the instrument or technique used to make the measurement. Similarly, uncertainty is propagated through calculations so that the calculated value has some degree of uncertainty depending upon the uncertainties of the measured values and the equation used in the calculation.[11]
- Uncertainty is designed into games, most notably in gambling, where chance is central to play.
- In scientific modelling, in which the prediction of future events should be understood to have a range of expected values.
- In physics, the Heisenberg uncertainty principle forms the basis of modern quantum mechanics.
- In weather forecasting it is now commonplace to include data on the degree of uncertainty in a weather forecast.
- Uncertainty is often an important factor in economics. According to economist Frank Knight, it is different from risk, where there is a specific probability assigned to each outcome (as when flipping a fair coin). Uncertainty involves a situation that has unknown probabilities, while the estimated probabilities of possible outcomes need not add to unity.
- In entrepreneurship: new products, services, firms and even markets are often created in the absence of probability estimates. According to entrepreneurship research, expert entrepreneurs predominantly use experience-based heuristics called effectuation (as opposed to causality) to overcome uncertainty.
- In risk assessment and risk management.[12]

- In metrology, measurement uncertainty is a central concept quantifying the dispersion one may reasonably attribute to a measurement result. Such an uncertainty can also be referred to as a measurement error. In daily life, measurement uncertainty is often implicit ("He is 6 feet tall", give or take a few inches), while for any serious use an explicit statement of the measurement uncertainty is necessary. The expected measurement uncertainty of many measuring instruments (scales, oscilloscopes, force gages, rulers, thermometers, etc.) is often stated in the manufacturer's specification. The most commonly used procedure for calculating measurement uncertainty is described in the Guide to the Expression of Uncertainty in Measurement (often referred to as "the GUM") published by ISO. Derived works are, for example, the National Institute of Standards and Technology (NIST) publication NIST Technical Note 1297 "Guidelines for Evaluating and Expressing the Uncertainty of NIST Measurement Results" and the Eurachem/Citac publication "Quantifying Uncertainty in Analytical Measurement" (available at the Eurachem homepage). The uncertainty of the result of a measurement generally consists of several components. The components are regarded as random variables, and may be grouped into two categories according to the method used to estimate their numerical values:
  o Type A, those evaluated by statistical methods
  o Type B, those evaluated by other means, e.g., by assigning a probability distribution
- By propagating the variances of the components through a function relating the components to the measurement result, the combined measurement uncertainty is given as the square root of the resulting variance (a small numerical sketch follows this list). The simplest form is the standard deviation of a repeated observation.

- Uncertainty has been a common theme in art, both as a thematic device (see, for example, the indecision of Hamlet), and as a quandary for the artist (such as Martin Creed's difficulty with deciding what artworks to make).
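
As a rough numerical sketch of the variance-propagation step mentioned in the metrology item above (the measured quantities, their uncertainties and the measurement function are all invented), consider a result computed as R = V / I:

# Combined standard uncertainty by propagating component variances through the
# measurement function R = V / I, assuming uncorrelated components (GUM-style).
from math import sqrt

V, u_V = 5.00, 0.02      # volts and its standard uncertainty
I, u_I = 0.100, 0.001    # amperes and its standard uncertainty

def R(v, i):
    return v / i

h = 1e-6                 # numerical estimate of the sensitivity coefficients
dR_dV = (R(V + h, I) - R(V - h, I)) / (2 * h)
dR_dI = (R(V, I + h) - R(V, I - h)) / (2 * h)

# Combined variance = sum of squared (sensitivity * component uncertainty);
# the combined standard uncertainty is its square root.
u_R = sqrt((dR_dV * u_V) ** 2 + (dR_dI * u_I) ** 2)
print(f"R = {R(V, I):.1f} ohm ± {u_R:.2f} ohm")    # about 50.0 ohm ± 0.54 ohm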

INFORMATION NEEDED TO SUPPORT DECISION:

Information requirements of key decision-making groups


Various levels of management in the firm have differing information requirements for
decision support because of their different job responsibilities and the nature of the decisions
made at each level.

Senior management. Senior management is concerned with general yet timely information on changes in the industry and society at large that may affect both the long-term and near-term future of the firm, the firm's strategic goals, short-term and future performance, specific bottlenecks and trouble affecting operational capabilities, and the overall ability of the firm to achieve its objectives.
Middle management and project teams. Middle management is concerned with specific,
timely information about firm performance, including revenue and cost reduction targets, and
with developing plans and budgets to meet strategic goals established by senior management.
This group needs to make important decisions about allocating resources, developing short-range plans, and monitoring the performance of departments, task forces, teams, and special
project groups. Often the work of middle managers is accomplished in teams or small groups
of managers working on a task.
Operational management and project teams. Operational management monitors the
performance of each subunit of the firm and manages individual employees. Operational
managers are in charge of specific projects and allocate resources within the project budget,
establish schedules, and make personnel decisions. Operational work may also be
accomplished through teams.
Individual employees. Employees try to fulfill the objectives of managers above them,
following established rules and procedures for their routine activities. Increasingly, however,
employees are granted much broader responsibilities and decision-making authority based on
their own best judgment and information in corporate systems. Employees may be making
decisions about specific vendors, customers, and other employees. Because employees
interact directly with the public, how well they make their decisions can directly impact the
firm's revenue streams.

PROBLEM CHARACTERISTICS AND IS CAPABILITIES IN DECISION MAKING


FUNCTION VIEW
The Challenges in IT Exploitation
1. Business and IT vision: The challenges of using IS services for business-IT (or business-technology) alignment.
2. Delivery of IS services: The challenges of delivering high-quality IS services more cheaply.
3. Design of IT architecture: The challenges of designing and implementing an IT architecture that is interoperable with existing and future intra-enterprise systems within the company and inter-enterprise systems that are running on collaborators' platforms.
The Core IS Capabilities
In order to face these three challenges, Feeny and Willcocks (1998) identify a set of nine core IS
capabilities:


1. Leadership: The capability of integrating IT efforts with business purposes and activities. This capability is about managing organizational arrangements, structures, processes and staffing, and tackling any challenges in these arrangements.
2. Business Systems Thinking: The capability of ensuring optimal business-IT arrangement; this
capability is about business problem solving with IT perspective, process reengineering, and
strategic development.
3. Relationship Building: The capability of facilitating wider dialogs between business and IS
communities.
4. Architectural Planning: The capability of creating IT architecture that can respond to present
and future business needs, and allow future growth of the business.
5. Making Technology work: The capability of rapidly achieving technical progress, making the
company forerunner (leader), or a quick adapter (follower).
6. Informed buying: The capability of analyzing the external market for IT services, judging their suitability for specific business opportunities, and selecting a sourcing strategy that matches business needs with technology solutions.
7. Contract Facilitation: The capability of ensuring the success of existing contracts for IT
services; this capability is about contract facilitating to ensure that problems and conflicts are
to be solved fairly. This is basically about forming a single point of contact for customer
relationship.
8. Contract Monitoring: The capability of measuring performance of suppliers and managing
suppliers.
9. Vendor Development: The capability of identifying the potential added value of IT business
service suppliers. This capability is about creating the Win-Win solution with collaborating
partners and forming a long-term relationship, which may, among other benefits, avoid switching costs.

Cultivating Core IS Capabilities


1. Technical skills,
2. Business skills,
3. Interpersonal skills,
4. Time horizons (balancing short-term and long-term interests), and
5. Motivating values (multi-talented work force).
PROCESS VIEW
The Core IS Capabilities
Marchand et al. (2000) found that there are three core IS capabilities:
Information Technology Practices (ITP): The capability of a company to effectively manage
appropriate IT applications and infrastructures in support of operational decision-making and
communication processes.
Information Management Practices (IMP): The capability of a company to manage information
effectively over its life cycle.
Information Behaviors and Values (IBV): The capability of a company to install and promote
behaviors and values in its people for effective use of information.

The three IS capabilities are further divided into 15 competencies:


ITP competencies:

1. IT for operational support


2. IT for business process support
3. IT for innovation support
4. IT for management support

IMP competencies:

5. Sensing information
6. Collecting information
7. Organizing information
8. Processing information
9. Maintaining information
IBV competencies:
10. Integrity: effective sharing of sensitive information
11. Formality: usage and trust of formal sources of information
12. Control: flow of information about business performance
13. Sharing: free exchange of non-sensitive (and sensitive) information
14. Transparency: openness about failures and mistakes
15. Proactiveness: reacting to changes in the competitive environment

HIERARCHICAL VIEW
The resource level denotes the resource components that are the key ingredients of the IS
competencies, such as skills (e.g., business skills, technical skills), knowledge and experience,
and behavior and attitudes.
The organizing level is concerned with how these resources are utilized, via structures, processes
and roles, to create IS competencies.
The enterprise level is where the IS capability is visible and is recognized in the performance of
the organization.

COMPARING 3 VIEWS ON IDENTIFYING CORE IS CAPABILITIES


1. A functional view based on the challenges in IT exploitation: Feeny and Willcocks (1998)
identify nine core IS capabilities within three overlapping functions,
2. A process view based on measurement of effective information use: Marchand et al. (2000)
identify core IS capabilities as three independent pillars, based on supporting processes, and
3. A hierarchical view based on resource utilization: Peppard and Ward (2004) identify core IS
capabilities based on resource utilization.
