
SPATIAL DATA MINING

Name: Shan Parvej Gayen
Roll No: 069123
Department of M.Tech CSE
Heritage Institute of Technology

5 April 2012

LEARNING OBJECTIVES

Understand the concept of spatial data mining (SDM)
Describe the concepts of patterns and SDM
Describe the motivation for SDM
Learn about the patterns explored by SDM
Learn techniques for finding spatial patterns

EXAMPLES OF SPATIAL PATTERNS


Historic Examples (section 7.1.5, p. 186)
  1854 Asiatic cholera in London: a water pump identified as the source
  Fluoride and healthy gums near the Colorado River
  Theory of Gondwanaland: continents fit like pieces of a jigsaw puzzle

Modern Examples
  Cancer clusters to investigate environmental health hazards
  Crime hotspots for planning police patrol routes
  Bald eagles nest on tall trees near open water
  West Nile virus spreading from the northeastern USA to the south and west
  Unusual warming of the Pacific Ocean (El Nino) affects weather in the USA

WHAT IS A SPATIAL PATTERN?

What is not a pattern?
  Random, haphazard, chance, stray, accidental, unexpected.
  Without definite direction, trend, rule, method, design, aim, purpose.

What is a pattern?
  A frequent arrangement, configuration, composition, regularity.
  A rule, law, method, design, description.
  A major direction, trend, prediction.

DEFINING SPATIAL DATA MINING

Search for spatial patterns.
Non-trivial search, as automated as possible.
  Large search space of plausible hypotheses
  Ex. Asiatic cholera: candidate causes include water, food, air, insects.
Interesting, useful, and unexpected spatial patterns.
  Useful in a certain application domain
    Ex. Shutting off the identified water pump saved human lives.
  May provide a new understanding of the world
    Ex. The water pump and cholera connection led to the germ theory.

WHAT IS NOT SPATIAL DATA MINING

Simple querying of spatial data
  Finding neighbors of Canada given names and boundaries of all countries (search space not large)
Uninteresting or obvious patterns
  Heavy rainfall in Minneapolis is correlated with heavy rainfall in St. Paul (10 miles apart).
  Common knowledge: nearby places have similar rainfall.
Mining of non-spatial data
  Diaper sales and beer sales are correlated in evenings

FAMILIES OF SPATIAL DATA MINING PATTERNS

Location Prediction
  Where will a phenomenon occur?
Spatial Interactions
  Which subset of spatial phenomena interact?
Hot spot
  Which locations are unusual or share commonalities?
Note: Other families of spatial patterns may be defined. SDM is a growing field, which should accommodate new pattern families.

LOCATION PREDICTION

Where will a phenomenon occur?
Which spatial events are predictable?
How can a spatial event be predicted from other spatial events?
Examples
  Where will an endangered bird nest?
  Which areas are prone to fire, given maps of vegetation and drought?
  What should be recommended to a traveler in a given location?

SPATIAL INTERACTIONS

Which spatial events are related to each other?
Which spatial phenomena depend on other phenomena?
Examples
  Earth science: climate and disturbance => {wild fires, hot, dry, lightning}
  Epidemiology: disease type and environmental events => {West Nile disease, stagnant water source, dead birds, mosquitoes}

HOT SPOTS

Is a phenomenon spatially clustered?
Which spatial entities are unusual or share common characteristics?
Examples
  Crime hot spots to plan police patrols
  Cancer clusters to investigate environmental health hazards

SPATIAL QUERIES

Spatial Range Queries
  Find all cities within 50 miles of Paris
  Query has an associated region (location, boundary)
  Answer includes overlapping or contained data regions
Nearest-Neighbor Queries
  Find the 10 cities nearest to Paris
  Results must be ordered by proximity
Spatial Join Queries
  Find all cities near a lake
  Join condition involves regions and proximity
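The three query types on this slide can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: every object is a point in a flat plane, distances are in miles, and the coordinates and names other than Paris are made up. A real spatial database would work with regions and a spatial index rather than the brute-force scans used here.

```python
import math

# Hypothetical planar coordinates in miles; the towns and lakes are invented.
cities = {"Paris": (0.0, 0.0), "Town B": (30.0, 40.0),
          "Town C": (60.0, 0.0), "Town D": (3.0, 4.0)}
lakes = {"Lake 1": (5.0, 5.0), "Lake 2": (100.0, 100.0)}


def dist(p, q):
    """Euclidean distance between two planar points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def range_query(points, center, radius):
    """Spatial range query: names of all points within `radius` of `center`."""
    return sorted(n for n, p in points.items() if dist(p, center) <= radius)


def nearest_neighbors(points, center, k):
    """Nearest-neighbor query: the k closest names, ordered by proximity."""
    return sorted(points, key=lambda n: dist(points[n], center))[:k]


def spatial_join(left, right, max_dist):
    """Spatial join: (left name, right name) pairs within `max_dist`."""
    return sorted((a, b) for a, pa in left.items()
                  for b, pb in right.items() if dist(pa, pb) <= max_dist)


print(range_query(cities, cities["Paris"], 50.0))
print(nearest_neighbors(cities, cities["Paris"], 2))
print(spatial_join(cities, lakes, 10.0))
```

Note that only the nearest-neighbor result depends on ordering; the range query and the join return sets of matches, sorted here just for stable output.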



UNIQUE PROPERTIES OF SPATIAL PATTERNS

Items in traditional data are independent of each other, whereas properties of locations on a map are often auto-correlated (patterns exist).
Traditional data deals with simple domains, e.g. numbers and symbols, whereas spatial data types are complex.
Items in traditional data describe discrete objects, whereas spatial data is continuous.

ASSOCIATION RULES

For a rule X => Y:
  Support = how often the rule shows up in the database, i.e. the fraction of records that contain both X and Y
  Confidence = conditional probability of Y given X
Example
  (bedrock type = limestone), (soil depth < 50 ft) => (sink hole risk = high)
  Support = 20%, confidence = 0.8
  Interpretation: locations with limestone bedrock and low soil depth have a high risk of sink hole formation.
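To make the two measures concrete, here is a minimal Python sketch that computes them for the sink hole rule. The 20 records are invented and chosen only so that the result matches the slide's figures; the item names are hypothetical labels, not part of any dataset.

```python
def support_and_confidence(records, antecedent, consequent):
    """Support and confidence of the rule antecedent => consequent.

    Each record is a set of items; support is returned as a fraction.
    """
    both = sum(1 for r in records if antecedent | consequent <= r)
    lhs = sum(1 for r in records if antecedent <= r)
    support = both / len(records)
    confidence = both / lhs if lhs else 0.0
    return support, confidence


# Invented data: 20 locations, 5 match the antecedent, 4 of those are high risk.
X = {"bedrock=limestone", "soil_depth<50ft"}
Y = {"sinkhole_risk=high"}
records = [X | Y] * 4 + [X] * 1 + [{"bedrock=granite"}] * 15

support, confidence = support_and_confidence(records, X, Y)
print(support, confidence)  # 0.2 0.8
```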

APRIORI ALGORITHM TO MINE ASSOCIATION RULES

Key challenge
  Very large search space
Key assumption
  Few associations have support above the given threshold
  Associations with low support are not interesting
Key insight
  If an association item set has high support, then so do all its subsets
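The key insight is what keeps the search manageable: a candidate item set can be discarded as soon as one of its subsets is known to be infrequent. The following is a minimal sketch of the frequent item set phase of Apriori, assuming transactions are Python sets and the threshold is an absolute count. The function name, the parameter `min_support`, and the sample data are illustrative choices, not taken from the slides.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): count} for all frequent item sets."""
    items = {i for t in transactions for i in t}
    current = {frozenset([i]) for i in items}   # candidate 1-item sets
    frequent = {}
    k = 1
    while current:
        # Count each candidate and keep those that meet the threshold.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Join frequent k-item sets into (k+1)-candidates, then prune every
        # candidate that has an infrequent k-subset (the key insight).
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        current = {c for c in joined
                   if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return frequent


# Invented transactions, one per location.
transactions = [
    {"limestone", "shallow", "sinkhole"},
    {"limestone", "shallow", "sinkhole"},
    {"limestone", "shallow"},
    {"granite", "shallow"},
    {"granite"},
]
frequent = apriori(transactions, min_support=2)
print(len(frequent))  # 8
```

With these five transactions, four single items, three pairs, and one triple reach the threshold. The pair {granite, shallow} occurs only once, so no candidate containing it is ever counted.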

ASSOCIATION RULES EXAMPLE

[Figure: association rules example]

TECHNIQUES FOR ASSOCIATION MINING

Classical method
  Association rules given item types and transactions
  Assumes spatial data can be decomposed into transactions
    Such decomposition may alter spatial patterns
New spatial methods
  Spatial association rules
  Spatial co-location

ASSOCIATIONS, SPATIAL ASSOCIATIONS, CO-LOCATION

[Figures: associations, spatial associations, co-location]

CO-LOCATION RULES

For point data in space
Does not need transactions; works directly with continuous space
Uses a neighborhood definition and spatial joins
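As a rough illustration of working without transactions, the sketch below defines the neighborhood as "within a fixed radius", finds neighboring point pairs with a brute-force spatial self-join, and then measures how often instances of one feature type have a neighbor of another type. That fraction is just one simple measure chosen for this example, not necessarily the one used in the co-location literature, and the points are invented. It assumes the two feature types are distinct.

```python
import math


def neighbor_pairs(points, radius):
    """Spatial self-join: index pairs of points within `radius` of each other.

    `points` is a list of (feature_type, x, y) tuples.
    """
    pairs = []
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            _, xi, yi = points[i]
            _, xj, yj = points[j]
            if math.hypot(xi - xj, yi - yj) <= radius:
                pairs.append((i, j))
    return pairs


def colocation_fraction(points, radius, a, b):
    """Fraction of type-`a` instances with at least one type-`b` neighbor."""
    has_b = set()
    for i, j in neighbor_pairs(points, radius):
        ti, tj = points[i][0], points[j][0]
        if ti == a and tj == b:
            has_b.add(i)
        elif ti == b and tj == a:
            has_b.add(j)
    total = sum(1 for p in points if p[0] == a)
    return len(has_b) / total if total else 0.0


# Invented point data: eagle nests and open water.
points = [("nest", 0, 0), ("water", 1, 0), ("nest", 5, 5),
          ("water", 5, 6), ("nest", 20, 20)]
print(colocation_fraction(points, 2.0, "nest", "water"))
print(colocation_fraction(points, 2.0, "water", "nest"))  # 1.0
```

The measure is not symmetric: here every water point has a nest nearby, but only two of the three nests are near water.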

[Figure: co-location]

CLUSTERING

Process of discovering groups in large databases
  Spatial view: rows in a database = points in a multidimensional space
  Visualization may reveal interesting groups

CLUSTERING

Hierarchical
  All points in one cluster
  Split and merge till a stop criterion is reached
Partitional
  Start with random central points
  Assign points to nearest central point
  Update the central points
  Approach with statistical rigor
Density
  Find clusters based on density of regions
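The partitional steps listed above match the familiar k-means procedure. A minimal sketch for 2-D points follows; the fixed iteration count, the seed parameter, and the sample points are assumptions made for this example rather than details from the slides.

```python
import math
import random


def kmeans(points, k, iterations=20, seed=0):
    """Partition 2-D points into k clusters; returns (centers, labels)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)            # start with random central points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest central point.
        labels = [min(range(k),
                      key=lambda c: math.hypot(p[0] - centers[c][0],
                                               p[1] - centers[c][1]))
                  for p in points]
        # Update each central point to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:                        # keep the old center if empty
                centers[c] = (sum(x for x, _ in members) / len(members),
                              sum(y for _, y in members) / len(members))
    return centers, labels


# Two well-separated invented groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = kmeans(points, k=2)
print(labels)
```

Which group receives label 0 depends on the random start, so only the grouping itself is meaningful.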

OUTLIERS

Global outliers: observations inconsistent with the rest of the dataset
Spatial outliers: observations inconsistent with their neighborhoods
  A local instability or discontinuity

VARIOGRAM CLOUD

Create a variogram cloud by plotting (attribute difference, distance) for each pair of points
Select points common to many outlying pairs
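A small sketch of this idea, without the plotting step: compute the cloud of (distance, attribute difference) pairs, treat pairs that are close together yet very different as outlying, and count how often each point appears in such pairs. Using the absolute difference and the two cut-offs `max_dist` and `min_diff` are assumptions of this example, as is the sample data.

```python
import math
from collections import Counter
from itertools import combinations


def variogram_cloud(points):
    """(i, j, distance, |attribute difference|) for every pair of points.

    `points` is a list of (x, y, value) tuples.
    """
    cloud = []
    for (i, p), (j, q) in combinations(enumerate(points), 2):
        d = math.hypot(p[0] - q[0], p[1] - q[1])
        cloud.append((i, j, d, abs(p[2] - q[2])))
    return cloud


def outlier_votes(points, max_dist, min_diff):
    """Count, per point, the nearby pairs whose values differ strongly."""
    votes = Counter()
    for i, j, d, diff in variogram_cloud(points):
        if d <= max_dist and diff >= min_diff:   # close together, yet different
            votes[i] += 1
            votes[j] += 1
    return votes


# Invented readings: the point at index 3 disagrees with all its neighbors.
points = [(0, 0, 10), (1, 0, 11), (0, 1, 9), (1, 1, 50), (2, 1, 10)]
votes = outlier_votes(points, max_dist=1.5, min_diff=20)
print(votes.most_common(1))  # [(3, 4)]
```

Point 3 takes part in all four outlying pairs, while each of its neighbors appears in only one, which is why it stands out as the point common to many outlying pairs.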

MORAN SCATTER PLOT


Plot (normalized attribute value, weighted average in the neighborhood) for each location
Select points in the upper left and lower right quadrants
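The quadrant test can be sketched without drawing the plot. The example below normalizes values to z-scores, averages the z-scores of each location's neighbors with equal weights (an assumption; other weightings are possible), and flags locations where the two have opposite signs. The chain of ten locations and its adjacency list are invented.

```python
import statistics


def moran_scatter(values, neighbors):
    """Return [(z_i, average z of i's neighbors)] for each location.

    `neighbors[i]` lists the indices adjacent to location i.
    """
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    z = [(v - mean) / sd for v in values]
    return [(z[i], statistics.mean([z[j] for j in neighbors[i]]))
            for i in range(len(values))]


def moran_outliers(values, neighbors):
    """Indices in the upper-left or lower-right quadrant of the plot."""
    return [i for i, (zi, avg) in enumerate(moran_scatter(values, neighbors))
            if zi * avg < 0]


# Invented chain: a low region, a high region, and one low value inside
# the high region (index 6).
values = [1.0, 1.0, 1.0, 1.0, 9.0, 9.0, 1.0, 9.0, 9.0, 9.0]
neighbors = [[j for j in (i - 1, i + 1) if 0 <= j < len(values)]
             for i in range(len(values))]
print(moran_outliers(values, neighbors))  # [6]
```

Location 6 has a below-average value but an above-average neighborhood, which places it in the upper left quadrant.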

SCATTER PLOT

Plot (normalized attribute value, weighted average in the neighborhood) for each location
Fit a linear regression line
Select points that are unusually far from the regression line
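A minimal sketch of the regression variant follows. It fits an ordinary least-squares line of the neighborhood average against the attribute value and flags locations whose residual exceeds a multiple of the residual standard deviation. The equal neighbor weights, the `threshold` of 2.0, and the data are assumptions of this example. Raw values are used instead of normalized ones, which rescales the axes but does not change which residuals are relatively large.

```python
import statistics


def scatterplot_outliers(values, neighbors, threshold=2.0):
    """Indices whose residual exceeds `threshold` residual standard deviations."""
    x = list(values)
    y = [statistics.mean([values[j] for j in neighbors[i]])
         for i in range(len(x))]
    mx, my = statistics.mean(x), statistics.mean(y)
    # Ordinary least-squares fit of y = slope * x + intercept.
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    intercept = my - slope * mx
    residuals = [b - (slope * a + intercept) for a, b in zip(x, y)]
    sd = statistics.pstdev(residuals)
    return [i for i, r in enumerate(residuals)
            if sd and abs(r) > threshold * sd]


# Invented chain: six low values, then a high region containing one low
# value at index 9.
values = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 9.0, 9.0, 9.0, 1.0, 9.0, 9.0]
neighbors = [[j for j in (i - 1, i + 1) if 0 <= j < len(values)]
             for i in range(len(values))]
print(scatterplot_outliers(values, neighbors))  # [9]
```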

CONCLUSION

Patterns are the opposite of random
Common spatial patterns:
  Location prediction
  Feature interaction
  Hot spot
Spatial patterns may be discovered using:
  Techniques like associations, clustering and outlier detection
