
Data Mining: Concepts and Techniques

Chapter 11
Applications and Trends in Data Mining
Additional Theme: Visual Data Mining

Jiawei Han and Micheline Kamber


Department of Computer Science
University of Illinois at Urbana-Champaign
www.cs.uiuc.edu/~hanj
© 2006 Jiawei Han and Micheline Kamber. All rights reserved.
March 16, 20 Data Mining: Concept 1
Visual Data Mining: An Overview
What is Visual Data Mining?
Survey of techniques
Data Visualization
Visualizing Data Mining Results
Visual Data Mining



What Is Visual Data Mining?
Visual data mining discovers implicit and
useful knowledge from large data sets
using data and/or knowledge visualization
techniques
Data visualization + Data mining
techniques



Why Visual Data Mining?
Advantages of human visual system
Highly parallel processor
Sophisticated reasoning engine
Large knowledge base
Can be used to comprehend data distributions, patterns, clusters, and outliers

[Figure: data mining algorithms compared with visualization on the criteria Actionable, Evaluation, Flexibility, and User]
Why Not Only Visual Data Mining?
Disadvantages of human visual system
Needs training
Not automated
Intrinsic bias
Limit of about 10^6 or 10^7 observations
(Wegman 1995)
Power of integration with analytical
methods



Scope of Visual Data Mining
Visualization: Use of computer graphics to create
visual images which aid in the understanding of
complex, often massive representations of data
Visual Data Mining: The process of discovering
implicit but useful knowledge from large data sets
using visualization techniques

[Figure: related fields: Human Computer Interfaces, Computer Graphics, Multimedia Systems, High Performance Computing, Pattern Recognition]
Purpose of Visualization
Gain insight into an information space by
mapping data onto graphical primitives
Provide qualitative overview of large data sets
Search for patterns, trends, structure,
irregularities, relationships among data
Help find interesting regions and suitable
parameters for further quantitative analysis
Provide a visual proof of computer
representations derived



Visual Data Mining & Data Visualization
Integration of visualization and data mining
data visualization
data mining result visualization
data mining process visualization
interactive visual data mining
Data visualization
Data in a database or data warehouse can be viewed
at different levels of abstraction
as different combinations of attributes or dimensions
Data can be presented in various visual forms



Abilities of Humans and Computers

[Figure: abilities of the computer (data storage, numerical computation, searching) contrasted with human abilities (perception, creativity, general knowledge), with planning, diagnosis, logic, and prediction in between]
Visual Mining vs. Scientific Vis. & Graphics

Scientific Visualization
Often visualize physical model, low dimensionality
Graphics
More concerned with how to render (draw) rather than what to render



Data Visualization
View data in database or data warehouse
User may control
Different levels of details
Subset of attributes
Drawn using boxplots, histograms,
polylines, etc.



Historical Overview of Exploratory Data Visualization Techniques (cf. [WB 95])
Pioneering works of Tufte [Tuf 83, Tuf 90] and Bertin [Ber 81] focus on
Visualization of data with inherent 2D-/3D-semantics
General rules for layout, color composition, attribute mapping, etc.
Development of visualization techniques for different types of data with an underlying physical model
Geographic data, CAD data, flow data, image data, voxel data, etc.
Development of visualization techniques for arbitrary multidimensional data (without an underlying physical model)
Applicable to databases and other information resources



Dimensions of Exploratory Data Visualization

[Figure: three axes of exploratory data visualization. Data visualization techniques: geometric, icon-based, pixel-oriented, hierarchical, graph-based. Distortion techniques: simple, complex. Interaction techniques: mapping, projection, filtering, link & brush, zooming.]



Classification of Data Visualization Techniques

Geometric Techniques:
Scatterplots, Landscapes, Projection Pursuit, Prosection Views,
Hyperslice, Parallel Coordinates,...
Icon-based Techniques:
Chernoff Faces, Stick Figures, Shape-Coding, Color Icons,
TileBars,...
Pixel-oriented Techniques:
Recursive Pattern Technique, Circle Segments Technique, Spiral- &
Axes-Techniques,...
Hierarchical Techniques:
Dimensional Stacking, Worlds-within-Worlds, Treemap, Cone Trees,
InfoCube,...
Graph-Based Techniques:
Basic Graphs (Straight-Line, Polyline, Curved-Line,...)
Specific Graphs (e.g., DAG, Symmetric, Cluster,...)
Systems (e.g., Tom Sawyer, Hy+, SeeNet, Narcissus,...)
Hybrid Techniques: arbitrary combinations from above
Distortion & Dynamic/Interaction Techniques

Distortion Techniques
Simple Distortion (e.g. Perspective Wall, Bifocal Lenses,
TableLens, Graphical Fisheye Views,...)
Complex Distortion (e.g. Hyperbolic Repr., Hyperbox,...)
Dynamic/Interaction Techniques
Data-to-Visualization Mapping (e.g. Auto Visual, S Plus,
XGobi, IVEE,...)
Projections: (e.g. GrandTour, S Plus, XGobi,...)
Filtering (Selection, Querying) (e.g. MagicLens, Filter/Flow
Queries, InfoCrystal,...)
Linking & Brushing (e.g. Xmdv-Tool, XGobi, DataDesk,...)
Zooming (e.g. PAD++, IVEE, DataSpace,...)
Detail on Demand (e.g. IVEE, TableLens, MagicLens, VisDB,...)



Visual Survey
Data visualization techniques
Scatterplot Matrices, Landscapes, Parallel
Coordinates
Icon-based, Dimensional Stacking, Treemaps



Direct Visualization
Ribbons with Twists Based on Vorticity



Geometric Techniques
Basic Idea
Visualization of geometric transformations and projections of the data
Methods
Landscapes [Wis95]
Projection Pursuit Techniques [Hub85] (techniques for finding meaningful projections of multidimensional data)
Scatterplot-Matrices [And72, Cle93]
Prosection Views [FB94, STDS95]
Hyperslice [WL93]
Parallel Coordinates [Ins85, ID90]



Scatterplot-Matrices [Cleveland 93]

Used by permission of M. Ward, Worcester Polytechnic Institute

matrix of scatterplots (x-y-diagrams) of the k-dimensional data [total of (k^2 - k)/2 distinct scatterplots]
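As a rough illustration of how the panels of a scatterplot matrix are enumerated, the sketch below lists the distinct attribute pairs for k attributes. A full k-by-k grid has k^2 cells, of which k(k-1)/2 are distinct off-diagonal pairs. The attribute names are hypothetical and only serve the example.

```python
from itertools import combinations


def scatterplot_panels(attributes):
    """Return the distinct attribute pairs shown in a scatterplot matrix."""
    return list(combinations(attributes, 2))


# Hypothetical attribute names, chosen only for illustration.
attributes = ["age", "income", "education", "hours"]
panels = scatterplot_panels(attributes)
k = len(attributes)

# The number of distinct panels matches k(k-1)/2.
print(len(panels), k * (k - 1) // 2)
for x_attr, y_attr in panels:
    print(f"panel: x={x_attr}, y={y_attr}")
```

Each pair would be drawn as one x-y diagram; plotting itself is left out so the sketch stays independent of any graphics library.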
Landscapes [Wis 95]
Used by permission of B. Wright, Visible Decisions Inc.

[Figure: news articles visualized as a landscape]

Visualization of the data as perspective landscape
The data needs to be transformed into a (possibly artificial) 2D spatial representation which preserves the characteristics of the data



Parallel Coordinates [Ins 85, ID 90]
n equidistant axes which are parallel to one of the screen axes and correspond to the attributes
the axes are scaled to the [minimum, maximum] range of the corresponding attribute
every data item corresponds to a polygonal line which intersects each of the axes at the point which corresponds to the value for the attribute

[Figure: parallel axes labeled Attr. 1, Attr. 2, Attr. 3, ..., Attr. k]
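The construction described above can be sketched as follows: each attribute gets its own equidistant vertical axis, values are scaled to that attribute's [minimum, maximum] range, and every data item becomes a list of polyline vertices. This is a minimal sketch; the function name, the unit-square drawing area, and the sample rows are assumptions made for the example.

```python
def parallel_coordinate_polylines(rows, width=1.0, height=1.0):
    """Map each data row to the vertices of its polyline.

    Axis j is a vertical line at x = j * spacing. Each attribute is scaled
    to its own [minimum, maximum] range, so y lies in [0, height].
    """
    k = len(rows[0])
    spacing = width / (k - 1) if k > 1 else 0.0
    lows = [min(row[j] for row in rows) for j in range(k)]
    highs = [max(row[j] for row in rows) for j in range(k)]
    polylines = []
    for row in rows:
        vertices = []
        for j, value in enumerate(row):
            span = highs[j] - lows[j]
            # A constant attribute has no range; place it mid-axis.
            y = 0.5 * height if span == 0 else (value - lows[j]) / span * height
            vertices.append((j * spacing, y))
        polylines.append(vertices)
    return polylines


# Made-up data: three items with three attributes each.
rows = [(1.0, 10.0, 5.0), (2.0, 30.0, 5.0), (3.0, 20.0, 5.0)]
lines = parallel_coordinate_polylines(rows)
for vertices in lines:
    print(vertices)
```

Drawing each vertex list as a connected line yields the familiar display; scaling per attribute is what lets attributes with very different units share one plot.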


Parallel Coordinates



Icon-Based Techniques
Basic Idea
Visualization of the data values as features of icons
Overview
Chernoff-Faces [Che73, Tuf83]
Stick Figures [Pic70, PG88]
Shape Coding [Bed90]
Color Icons [Lev91, KK94]
TileBars [Hea95]
(use of small icons representing the relevance
feature vectors in document retrieval)



Stick Figures
Used by permission of G. Grinstein, University of Massachusetts at Lowell

[Figure: census data showing age, income, sex, education, etc.]



Hierarchical Techniques

Basic Idea: Visualization of the data using a hierarchical partitioning into subspaces.
Overview
Dimensional Stacking [LWW90]
Worlds-within-Worlds [FB90a/b]
Treemap [Shn92, Joh93]
Cone Trees [RMC91]
InfoCube [RG93]



Dimensional Stacking [LWW90]

[Figure: nested axes labeled attribute 1, attribute 2, attribute 3, attribute 4]

partitioning of the n-dimensional attribute space in 2-dimensional subspaces which are stacked into each other
partitioning of the attribute value ranges into classes
the important attributes should be used on the outer levels
adequate especially for data with ordinal attributes of low cardinality
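To make the stacking idea concrete, the sketch below bins attribute values into classes and then nests an inner 2D grid inside each cell of an outer 2D grid. It is a sketch for four attributes under the assumption of equal-width classes; the function names and bin counts are illustrative and not taken from [LWW90].

```python
def bin_index(value, low, high, bins):
    """Partition [low, high] into equal-width classes and return the class index."""
    if high == low:
        return 0
    index = int((value - low) / (high - low) * bins)
    return min(index, bins - 1)  # value == high falls into the last class


def stacked_cell(outer_x, outer_y, inner_x, inner_y, inner_x_bins, inner_y_bins):
    """Return the (column, row) screen cell for four binned attribute values.

    Every outer cell contains a complete inner grid of
    inner_x_bins by inner_y_bins cells.
    """
    column = outer_x * inner_x_bins + inner_x
    row = outer_y * inner_y_bins + inner_y
    return column, row


# Example: outer class (2, 1), inner class (0, 3), inner grid of 4 by 5 cells.
cell = stacked_cell(2, 1, 0, 3, inner_x_bins=4, inner_y_bins=5)
print(cell)
```

Because the outer attributes determine the coarse position, swapping outer and inner attributes changes the picture considerably, which is why the most important attributes belong on the outer levels.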
Dimensional Stacking

Visualization of oil mining data with longitude and latitude mapped to the outer x-, y-axes and ore grade and depth mapped to the inner x-, y-axes

Used by permission of M. Ward, Worcester Polytechnic Institute
Dimensional Stacking
Disadvantages:
Difficult to display more than nine dimensions
Important to map dimensions appropriately
May be difficult to understand visualizations at first



Treemap [JS 91, Shn92, Joh93]
Screen-filling method which uses a hierarchical
partitioning of the screen into regions depending on the
attribute values
The x- and y-dimension of the screen are partitioned
alternately according to the attribute values (classes)
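The alternating partitioning can be sketched with a small recursive layout, often called slice-and-dice. The version below is a minimal sketch, assuming each node is a (name, size, children) tuple whose size equals the sum of its children's sizes; the tree and the screen dimensions are made up for the example.

```python
def slice_and_dice(node, x, y, width, height, vertical=True, out=None):
    """Lay out a tree as nested rectangles, alternating the split direction.

    node is (name, size, children). A leaf has an empty children list and
    an inner node's size is assumed to be the sum of its children's sizes.
    """
    if out is None:
        out = []
    name, size, children = node
    out.append((name, x, y, width, height))
    offset = 0.0
    for child in children:
        share = child[1] / size if size else 0.0
        if vertical:  # partition the x-dimension
            slice_and_dice(child, x + offset * width, y,
                           share * width, height, False, out)
        else:  # partition the y-dimension
            slice_and_dice(child, x, y + offset * height,
                           width, share * height, True, out)
        offset += share
    return out


# Hypothetical file-system-like hierarchy with sizes.
tree = ("root", 8, [
    ("docs", 2, []),
    ("src", 6, [("a.py", 3, []), ("b.py", 3, [])]),
])
rects = slice_and_dice(tree, 0.0, 0.0, 8.0, 4.0)
for rect in rects:
    print(rect)
```

The area of every rectangle is proportional to the node's size, so the leaves together fill the whole screen region.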

MSR Netscan image:

Treemap of a File System
(Shneiderman)



Treemaps
The attributes used for the partitioning and
their ordering are user-defined (the most
important attributes should be used first)
The color of the regions may correspond to
an additional attribute
Suitable to get an overview over large
amounts of hierarchical data (e.g., file
system) and for data with multiple ordinal
attributes (e.g., census data)



Data Mining Result Visualization
Presentation of the results or knowledge obtained
from data mining in visual forms
Examples
Scatter plots and boxplots (obtained from
descriptive data mining)
Decision trees
Association rules
Clusters
Outliers
Generalized rules
Text mining
Boxplots from Statsoft: Multiple Variable Combinations



Visualization of Data Mining Results in SAS Enterprise Miner: Scatter Plots



Visualization of Association Rules in SGI/MineSet 3.0



Visualization of Decision Tree in SGI/MineSet 3.0



Visualization of Decision Trees



Visualization of Cluster Grouping
IBM Intelligent Miner



Association Rules (MineSet)

LHS and RHS items are mapped to x-, y-axis
Confidence, support correspond to height of the bar or disc, respectively
Interestingness is mapped to color
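The mapping from rule measures to visual attributes can be illustrated with a small sketch that computes support and confidence for one rule and packs them into a glyph description. This is not MineSet's API; the function name, the dictionary keys, and the transactions are hypothetical. The color channel is left out because the slide does not say which interestingness measure is used.

```python
def rule_glyph(transactions, lhs, rhs):
    """Compute support and confidence for lhs -> rhs and map them to a glyph."""
    lhs, rhs = frozenset(lhs), frozenset(rhs)
    n_lhs = sum(1 for t in transactions if lhs <= t)
    n_both = sum(1 for t in transactions if (lhs | rhs) <= t)
    support = n_both / len(transactions)
    confidence = n_both / n_lhs if n_lhs else 0.0
    return {
        "x": tuple(sorted(lhs)),     # LHS items on the x-axis
        "y": tuple(sorted(rhs)),     # RHS items on the y-axis
        "bar_height": confidence,    # confidence drives the bar
        "disc_height": support,      # support drives the disc
    }


# Made-up market-basket transactions.
transactions = [frozenset(t) for t in (
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
)]
glyph = rule_glyph(transactions, {"bread"}, {"milk"})
print(glyph)
```

One glyph per rule, placed at the grid position given by its LHS and RHS, gives the kind of overview shown in the display.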



MineSet: Association Rules



Association Ball Graph (DBMiner)

Items are visualized as balls
Arrows indicate rule implication
Size represents support



Classification (SAS EM [SAS 01])

Tree Viewer

Color corresponds to relative frequency of a class in a node
Branch line thickness is proportional to the square root of the number of objects
Cluster Analysis (H-BLOB: Hierarchical BLOB) [SBG 00]

[Figure: three panels: Cluster; Form ellipsoids; Form blobs (implicit surfaces)]



H-BLOB



Text Mining (ThemeRiver [WCF+ 00])

Visualization of thematic changes in documents
Vertical distance indicates collective strength of the themes
Data Mining Process Visualization
Presentation of the various processes of data mining
in visual forms so that users can see the flow of
data cleaning, integration, preprocessing, mining
Data extraction process
Where the data is extracted
How the data is cleaned, integrated,
preprocessed, and mined
Method selected for data mining
Where the results are stored
How they may be viewed



Visualization of Data Mining Processes by Clementine

See your solution discovery process clearly
Understand variations with visualized data



Interactive Visual Data Mining

Using visualization tools in the data mining process to help users make smart data mining decisions
Example
Display the data distribution in a set of attributes using colored sectors or columns (depending on whether the whole space is represented by either a circle or a set of columns)
Use the display to decide which sector should first be selected for classification and where a good split point for this sector may be



Visual data mining
Projection Pursuits
(Class) Tours [Dhillon et al. 98]
Visual Classification [Ankerst et al. KDD 99]



Projection Pursuits
Exploratory projection pursuit:
Goal: reduce dimensionality
Define an interestingness index for each possible projection of a data set
Maximize this index, project linearly
Not always possible/useful
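A toy version of this procedure is sketched below for 2D data projected onto a line. The interestingness index used here, the absolute excess kurtosis of the projected values, is an assumption chosen for simplicity (it is near zero for Gaussian-looking projections); the slides do not specify an index. The search is a plain grid over directions rather than a real optimizer, and the data are synthetic.

```python
import math
import random


def excess_kurtosis(values):
    """Sample excess kurtosis; close to 0 for Gaussian-like data."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / (m2 * m2) - 3.0 if m2 > 0 else 0.0


def best_projection(points, steps=180):
    """Grid-search unit directions in the plane; return (angle, index)."""
    best_angle, best_index = 0.0, -1.0
    for i in range(steps):
        angle = math.pi * i / steps
        c, s = math.cos(angle), math.sin(angle)
        projected = [c * x + s * y for x, y in points]  # linear projection
        index = abs(excess_kurtosis(projected))  # assumed interestingness index
        if index > best_index:
            best_angle, best_index = angle, index
    return best_angle, best_index


rng = random.Random(0)
# Synthetic data: two clusters separated along x, Gaussian noise along y.
points = [(rng.choice((-3.0, 3.0)) + rng.gauss(0, 0.3), rng.gauss(0, 3.0))
          for _ in range(400)]
angle, index = best_projection(points)
print(f"best direction: {math.degrees(angle):.0f} degrees, index {index:.2f}")
```

With this data the search should favor a direction close to the x-axis, where the two clusters are visible. When no projection scores well, the method has nothing useful to show, which matches the caveat above.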



Class Tours
Visualizing Class Structure of
Multidimensional Data by Dhillon et al.
1998
Problem: Visualize multidimensional data
categorized into classes
Solution: Project data into 2D while
preserving distances between class means



Class-Preserving Projection:
Preserves distances between
projected means
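For three classes, one simple way to obtain a 2D projection that preserves the distances between class means is to project onto the plane through the three means. The sketch below builds an orthonormal basis of that plane with Gram-Schmidt and checks the distances. It illustrates the idea only and is not claimed to be the exact construction of Dhillon et al.; the helper names and the 4D sample data are made up, and the means are assumed not to be collinear.

```python
import math
from itertools import combinations


def sub(a, b):
    return [x - y for x, y in zip(a, b)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def mean(points):
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def class_preserving_basis(m1, m2, m3):
    """Orthonormal basis of the plane spanned by m2 - m1 and m3 - m1."""
    u = sub(m2, m1)
    norm_u = math.sqrt(dot(u, u))
    e1 = [x / norm_u for x in u]
    v = sub(m3, m1)
    along = dot(v, e1)
    w = [x - along * e for x, e in zip(v, e1)]  # remove the e1 component
    norm_w = math.sqrt(dot(w, w))  # zero only if the means are collinear
    e2 = [x / norm_w for x in w]
    return e1, e2


def project(point, e1, e2):
    return (dot(point, e1), dot(point, e2))


# Made-up 4D data in three classes.
classes = {
    "a": [[0, 0, 0, 0], [2, 0, 0, 0]],
    "b": [[0, 4, 0, 1], [0, 6, 0, 1]],
    "c": [[3, 0, 5, 0], [3, 0, 7, 2]],
}
means = {label: mean(points) for label, points in classes.items()}
e1, e2 = class_preserving_basis(means["a"], means["b"], means["c"])
projected_means = {label: project(m, e1, e2) for label, m in means.items()}

for p, q in combinations(sorted(means), 2):
    original = math.dist(means[p], means[q])
    projected = math.dist(projected_means[p], projected_means[q])
    print(p, q, round(original, 6), round(projected, 6))
```

The individual data points are projected with the same `project` call; only the class means are guaranteed to keep their mutual distances.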



Tours
Tours are animated and interpolated
sequences of 2D projections [Asimov 1985]
Class tours: sequences of class-preserving
2-dimensional projections
Captures inter-class structure of complex,
multi-dimensional data



Perception-Based Classification (PBC)



Visual Classification
Visual Classification: An Interactive Approach to Decision Tree Construction by Ankerst et al. KDD 99
Exploit expert domain knowledge and human visual processing



Visual Classification



Visual Classification Results
Comparable classification accuracy
Can produce more understandable decision
trees
Expert domain knowledge can be exploited



Audio Data Mining
Uses audio signals to indicate the patterns of data
or the features of data mining results
An interesting alternative to visual mining
An inverse task of mining audio (such as music)
databases which is to find patterns from audio
data
Visual data mining may disclose interesting
patterns using graphical displays, but requires
users to concentrate on watching patterns
Instead, transform patterns into sound and music
and listen to pitches, rhythms, tune, and melody in
order to identify anything interesting or unusual
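As a small illustration of turning data into sound, the sketch below maps each value of a series linearly to a MIDI note number and then to a frequency in the equal-tempered scale (A4 = MIDI note 69 = 440 Hz). The note range, the function names, and the sample series are assumptions for the example; actual playback is left out.

```python
def value_to_midi(value, low, high, note_low=48, note_high=84):
    """Linearly map a data value in [low, high] to a MIDI note number."""
    if high == low:
        return (note_low + note_high) // 2
    fraction = (value - low) / (high - low)
    return round(note_low + fraction * (note_high - note_low))


def midi_to_hz(note):
    """Equal-tempered frequency with A4 (MIDI note 69) at 440 Hz."""
    return 440.0 * 2.0 ** ((note - 69) / 12)


# Made-up series; 9.0 is the unusual value a listener should notice.
series = [3.0, 3.5, 4.0, 9.0, 4.5]
notes = [value_to_midi(v, min(series), max(series)) for v in series]
frequencies = [midi_to_hz(n) for n in notes]
for value, note, hz in zip(series, notes, frequencies):
    print(f"value {value}: note {note}, {hz:.1f} Hz")
```

The outlier lands three octaves above the lowest note, so it stands out as a sudden jump in pitch when the series is played in order.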
Summary
Many visualization methods available
How to evaluate and compare methods?
Need for:
Integrated visualization/exploration systems
Studies of interaction techniques for mining
Practical case studies



Acknowledgments
Many slides and images from Mihael Ankerst,
Boeing, Daniel A. Keim, AT&T, Tutorial at
PKDD'2001
Some pictures from Information Visualization in
Data Mining and Knowledge Discovery, edited by
Usama Fayyad, Georges Grinstein and Andreas
Wierse
A good set of slides were prepared by Andrew Wu
(Spring 2004)
