J.K.

OKE
WITH I.T. APPLICATIONS, HUGE DATA IS
ROUTINELY GENERATED / CREATED &
EQUALLY ROUTINELY COLLECTED &
CONFINED TO THE ARCHIVES OF MOST OF
THE CORPORATES.
THIS LEADS TO A TYPICAL SITUATION
WHERE THE CORPORATE ORGANISATIONS
ARE “DATA-RICH”
BUT “INFORMATION POOR”
 THE HUGE DATA IS POTENTIALLY THE MOST
POWERFUL ASSET IN TODAY’S INFORMATION
AGE,
 THE PROBLEM THAT ARISES IS HOW TO
EXTRACT VALUABLE INFORMATION FROM THIS
DATA & MAKE IT AVAILABLE TO THE RIGHT
PERSON, AT THE RIGHT PLACE, AT THE RIGHT
TIME, IN AN APPROPRIATE FORM
 TO IMPROVE DECISION MAKING ABILITY/
QUALITY
 DATA WAREHOUSING & DATA MINING PLAY
A CRUCIAL ROLE IN THIS SITUATION
 IT IS INDEED SAID THAT IN TODAY’S
INCREASING & INTENSIFYING GLOBALISATION
& COMPETITION,
 THE CORPORATE FUTURE HINGES ON
 DATA WAREHOUSING & DATA MINING
 DATA WAREHOUSE COLLECTS BUSINESS
INFORMATION FROM MANY SOURCES IN THE
ENTERPRISE COVERING ALL ASPECTS OF THE
COMPANY’S PROCESSES, PRODUCTS &
CUSTOMERS
... AMY HELEN JOHNSON
 A DATA WAREHOUSE IS A SUBJECT-ORIENTED,
INTEGRATED, TIME-VARIANT, NON-VOLATILE
COLLECTION OF DATA IN SUPPORT OF THE
MANAGEMENT’S DECISION-MAKING PROCESS
 W.H.INMON,
(FATHER OF DATA WAREHOUSING)
 HELPS INTEGRATE DATA FROM A
NUMBER OF DIVERSE, HETEROGENEOUS
SOURCES
 CONSOLIDATES & STORES DATA IN SUCH
A MANNER AS CAN BE USED FOR
“INFORMATION ANALYTICAL PROCESSING”
 HELPS THE TOP / SENIOR MANAGEMENT TO
CONSIDER ISSUES IN CONTEXT, BY PROVIDING
A CONSISTENT VIEW OF THE ENTERPRISE
 HELPS CREATE AN ENTERPRISE-WIDE
INTEGRATED DATABASE OF SUMMARISED,
HISTORICAL INFORMATION.
 SUPPORTS BUSINESS ANALYSIS & DECISION
MAKING & HELPS IMPROVE BUSINESS
PROCESSES.
 IMPROVES OPTIMISATION EFFORTS
THROUGH CONSOLIDATION, CONVERSION,
TRANSFORMATION & INTEGRATION OF
OPERATIONAL DATA

 EXECUTIVE SPONSORSHIP A MUST
 EXPERIENCED FULL-TIME TEAM
 INVOLVEMENT OF IT, BUSINESS UNITS &
BUSINESS ANALYSTS
 UNDERSTANDING OF & FAMILIARITY WITH
DATA WAREHOUSING / MINING TECHNOLOGIES
 FOCUSSED & ACHIEVABLE / PRAGMATIC
SCOPE OF PROJECT
 SPECIAL EMPHASIS ON DATA QUALITY
 USE OF AN ITERATIVE APPROACH FOR
BUILDING, TESTING & IMPLEMENTING THE
SOLUTION
 BROADER PERSPECTIVE
 GENERALLY A DATA WAREHOUSE ADOPTS
A THREE-TIER ARCHITECTURE.
 FOLLOWING ARE THE THREE TIERS OF THE
DATA WAREHOUSE ARCHITECTURE:
 BOTTOM TIER
 MIDDLE TIER
 TOP TIER
 THE BOTTOM TIER OF THE ARCHITECTURE
IS THE DATA WAREHOUSE DATABASE
SERVER.
 IT IS THE RELATIONAL DATABASE SYSTEM.
WE USE THE BACK END TOOLS AND
UTILITIES TO FEED DATA INTO THE BOTTOM
TIER.
 THESE BACK END TOOLS AND UTILITIES
PERFORM THE EXTRACT, CLEAN, LOAD, AND
REFRESH FUNCTIONS.
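 AS A MINIMAL SKETCH OF THESE BACK END
FUNCTIONS, THE PYTHON EXAMPLE BELOW RUNS
EXTRACT, CLEAN, LOAD AND REFRESH AGAINST
AN IN-MEMORY SQLITE DATABASE. THE SOURCE
ROWS, THE `orders` TABLE AND ALL FUNCTION
NAMES ARE ILLUSTRATIVE ASSUMPTIONS, NOT
PART OF ANY PARTICULAR PRODUCT.

```python
import sqlite3

# Hypothetical raw rows, as they might arrive from an operational system.
SOURCE_ROWS = [
    {"order_id": 1, "customer": " alice ", "amount": "120.50"},
    {"order_id": 2, "customer": "BOB", "amount": "80"},
    {"order_id": 2, "customer": "BOB", "amount": "80"},   # duplicate record
    {"order_id": 3, "customer": "", "amount": "15.00"},   # missing customer
]

def extract(rows):
    """Extract: read rows from the source (here, an in-memory list)."""
    return list(rows)

def clean(rows):
    """Clean: normalise names, convert types, drop incomplete and duplicate rows."""
    seen, cleaned = set(), []
    for row in rows:
        name = row["customer"].strip().title()
        if not name or row["order_id"] in seen:
            continue
        seen.add(row["order_id"])
        cleaned.append((row["order_id"], name, float(row["amount"])))
    return cleaned

def load(conn, rows):
    """Load: create the warehouse table if needed and insert the cleaned rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders ("
        "order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    # INSERT OR REPLACE keeps a repeated refresh from duplicating rows.
    conn.executemany("INSERT OR REPLACE INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()

def refresh(conn, source_rows):
    """Refresh: re-run extract, clean and load to propagate source changes."""
    load(conn, clean(extract(source_rows)))

warehouse = sqlite3.connect(":memory:")
refresh(warehouse, SOURCE_ROWS)
loaded = warehouse.execute("SELECT * FROM orders ORDER BY order_id").fetchall()
print(loaded)
```

 CALLING `refresh` AGAIN WITH UPDATED
SOURCE ROWS OVERWRITES THE MATCHING
ORDERS INSTEAD OF APPENDING COPIES.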
 IN THE MIDDLE TIER, WE HAVE THE OLAP
SERVER THAT CAN BE IMPLEMENTED IN
EITHER OF THE FOLLOWING WAYS.
◦ BY RELATIONAL OLAP (ROLAP), WHICH IS AN
EXTENDED RELATIONAL DATABASE
MANAGEMENT SYSTEM. THE ROLAP MAPS THE
OPERATIONS ON MULTIDIMENSIONAL DATA
TO STANDARD RELATIONAL OPERATIONS.
◦ BY MULTIDIMENSIONAL OLAP (MOLAP)
MODEL, WHICH DIRECTLY IMPLEMENTS THE
MULTIDIMENSIONAL DATA AND OPERATIONS.
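 THE DIFFERENCE CAN BE SKETCHED IN
PYTHON WITH A ROLL-UP OF SALES OVER THE
QUARTER DIMENSION. THE ROLAP VARIANT MAPS
THE ROLL-UP TO A RELATIONAL `GROUP BY`;
THE MOLAP VARIANT KEEPS CELLS IN A
STRUCTURE KEYED BY DIMENSION COORDINATES.
THE FACT ROWS AND NAMES ARE HYPOTHETICAL,
AND A REAL OLAP SERVER IS FAR MORE
ELABORATE THAN THIS.

```python
import sqlite3
from collections import defaultdict

# Hypothetical fact rows: (region, quarter, sales).
FACTS = [
    ("NORTH", "Q1", 100), ("NORTH", "Q2", 150),
    ("SOUTH", "Q1", 200), ("SOUTH", "Q2", 50),
]

# ROLAP style: the roll-up becomes a standard relational GROUP BY.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_fact (region TEXT, quarter TEXT, sales INTEGER)")
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?)", FACTS)
rolap_rollup = dict(conn.execute(
    "SELECT region, SUM(sales) FROM sales_fact GROUP BY region"
))

# MOLAP style: cells live in a multidimensional structure keyed by
# (region, quarter); the roll-up aggregates along the quarter axis.
cube = {(region, quarter): sales for region, quarter, sales in FACTS}
molap_rollup = defaultdict(int)
for (region, _quarter), sales in cube.items():
    molap_rollup[region] += sales

print(rolap_rollup)
print(dict(molap_rollup))
```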
 THE TOP TIER IS THE FRONT-END CLIENT
LAYER.
 THIS LAYER HOLDS THE QUERY TOOLS AND
REPORTING TOOLS, ANALYSIS TOOLS AND
DATA MINING TOOLS.
 IT EXPLAINS EIGHT DIFFERENT TYPES OF
DATA WAREHOUSE ARCHITECTURE
INCLUDING SINGLE-, TWO- AND THREE-
LAYER ARCHITECTURE, BUS ARCHITECTURE,
FEDERATED ARCHITECTURE AND HUB-AND-
SPOKE.
 THE DATA WAREHOUSE ARCHITECTURE
DESCRIBED HERE IS A HIGH LEVEL
ARCHITECTURE AND THE PARTS IN THE
ARCHITECTURES MENTIONED ARE FULL
BODIED SYSTEMS AND NOT SYSTEM-PARTS.
 SOURCE DATA COMPONENT
 DATA STAGING COMPONENT
 DATA STORAGE COMPONENT
 INFORMATION DELIVERY COMPONENT
 METADATA COMPONENT
 MANAGEMENT AND CONTROL COMPONENT
 TWO DIFFERENT CLASSIFICATIONS ARE
COMMONLY ADOPTED FOR DATA
WAREHOUSE ARCHITECTURES. THE FIRST
CLASSIFICATION IS A STRUCTURE-ORIENTED
ONE THAT DEPENDS ON THE NUMBER OF
LAYERS USED BY THE ARCHITECTURE.
 THE SECOND CLASSIFICATION DEPENDS ON
HOW THE DIFFERENT LAYERS ARE EMPLOYED
TO CREATE ENTERPRISE-ORIENTED OR
DEPARTMENT-ORIENTED VIEWS OF DATA
WAREHOUSES.
 A SINGLE-LAYER ARCHITECTURE IS NOT
FREQUENTLY USED IN PRACTICE. ITS GOAL IS
TO MINIMIZE THE AMOUNT OF DATA STORED;
TO REACH THIS GOAL, IT REMOVES DATA
REDUNDANCIES.
 THERE IS ONLY ONE LAYER PHYSICALLY
AVAILABLE: THE SOURCE LAYER. IN THIS
CASE, DATA WAREHOUSES ARE VIRTUAL.
 THIS MEANS THAT A DATA WAREHOUSE IS
IMPLEMENTED AS A MULTIDIMENSIONAL VIEW
OF OPERATIONAL DATA CREATED BY SPECIFIC
MIDDLEWARE, OR AN INTERMEDIATE
PROCESSING LAYER.
 THE WEAKNESS OF THIS ARCHITECTURE LIES
IN ITS FAILURE TO MEET THE REQUIREMENT
FOR SEPARATION BETWEEN ANALYTICAL AND
TRANSACTIONAL PROCESSING. ANALYSIS
QUERIES ARE SUBMITTED TO OPERATIONAL
DATA AFTER THE MIDDLEWARE INTERPRETS
THEM. IN THIS WAY, THE QUERIES AFFECT
REGULAR TRANSACTIONAL WORKLOADS.
 IN ADDITION, ALTHOUGH THIS
ARCHITECTURE CAN MEET THE REQUIREMENT
FOR INTEGRATION AND CORRECTNESS OF
DATA, IT CANNOT LOG MORE DATA THAN
SOURCES DO. FOR THESE REASONS, A
VIRTUAL APPROACH TO DATA WAREHOUSES
CAN BE SUCCESSFUL ONLY IF ANALYSIS NEEDS
ARE PARTICULARLY RESTRICTED AND THE
DATA VOLUME TO ANALYZE IS HUGE.
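 A VIRTUAL DATA WAREHOUSE CAN BE
SKETCHED AS A DATABASE VIEW OVER THE
OPERATIONAL TABLES. IN THE PYTHON EXAMPLE
BELOW, THE `order_lines` TABLE AND THE
`v_sales_by_product` VIEW ARE HYPOTHETICAL
NAMES, AND A PLAIN SQL VIEW STANDS IN FOR
THE MIDDLEWARE. NO DATA IS COPIED, SO
EVERY ANALYSIS QUERY RUNS AGAINST THE
LIVE TRANSACTIONAL TABLE.

```python
import sqlite3

# Hypothetical operational (transactional) database.
ops = sqlite3.connect(":memory:")
ops.executescript("""
    CREATE TABLE order_lines (order_id INTEGER, product TEXT, qty INTEGER, price REAL);
    INSERT INTO order_lines VALUES
        (1, 'PEN', 10, 1.5), (2, 'PEN', 4, 1.5), (3, 'INK', 2, 6.0);
    CREATE VIEW v_sales_by_product AS
        SELECT product, SUM(qty) AS units, SUM(qty * price) AS revenue
        FROM order_lines GROUP BY product;
""")

def sales_by_product(conn):
    """Analytical query; it is answered from the operational table itself."""
    return conn.execute(
        "SELECT product, units, revenue FROM v_sales_by_product ORDER BY product"
    ).fetchall()

before = sales_by_product(ops)
# A new transaction shows up in the analysis at once, because the view
# stores nothing of its own.
ops.execute("INSERT INTO order_lines VALUES (4, 'INK', 1, 6.0)")
after = sales_by_product(ops)
print(before)
print(after)
```

 THE SAME PROPERTY IS THE WEAKNESS NOTED
ABOVE: ANALYSIS AND TRANSACTIONS COMPETE
FOR THE SAME TABLE, AND NO HISTORY BEYOND
WHAT THE SOURCE KEEPS IS AVAILABLE.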
 THE REQUIREMENT FOR SEPARATION PLAYS A
FUNDAMENTAL ROLE IN DEFINING THE
TYPICAL ARCHITECTURE FOR A DATA
WAREHOUSE SYSTEM. ALTHOUGH IT IS
TYPICALLY CALLED A TWO-LAYER
ARCHITECTURE TO HIGHLIGHT A SEPARATION
BETWEEN PHYSICALLY AVAILABLE SOURCES
AND DATA WAREHOUSES, IT ACTUALLY
CONSISTS OF FOUR SUBSEQUENT DATA FLOW
STAGES (LECHTENBÖRGER, 2001):
 SOURCE LAYER
 DATA STAGING
 DATA WAREHOUSE LAYER
 ANALYSIS
 A TWO-LAYER ARCHITECTURE IS USUALLY
JUST A PRESENTATION LAYER AND A DATA
STORE LAYER.
 THESE CAN BE ON 1 TIER (1 MACHINE) OR ON
2 TIERS (2 MACHINES), WHICH ACHIEVES
BETTER PERFORMANCE BY DISTRIBUTING THE
WORKLOAD.
 THE KIMBALL GROUP’S ENTERPRISE DATA
WAREHOUSE BUS ARCHITECTURE IS A KEY
ELEMENT OF THE KIMBALL APPROACH.
 INTRODUCED IN THE 1990S, THE
TECHNOLOGY- AND DATABASE-
INDEPENDENT BUS ARCHITECTURE ALLOWS
FOR INCREMENTAL DATA WAREHOUSE AND
BUSINESS INTELLIGENCE (DW/BI)
DEVELOPMENT.
 IT DECOMPOSES THE DW/BI PLANNING
PROCESS INTO MANAGEABLE PIECES BY
FOCUSING ON THE ORGANIZATION’S CORE
BUSINESS PROCESSES, ALONG WITH THE
ASSOCIATED CONFORMED DIMENSIONS.
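 ONE WAY TO PICTURE THIS PLANNING STEP IS
A SMALL "BUS MATRIX" THAT LISTS, FOR EACH
BUSINESS PROCESS, THE DIMENSIONS IT USES.
THE PYTHON SKETCH BELOW IS ILLUSTRATIVE
ONLY: THE PROCESSES AND DIMENSIONS ARE
HYPOTHETICAL, AND "CONFORMED" IS
SIMPLIFIED HERE TO "SHARED BY MORE THAN
ONE PROCESS" (IN PRACTICE THE SHARED
DIMENSIONS MUST ALSO AGREE ON KEYS AND
ATTRIBUTES).

```python
# Hypothetical bus matrix: business process -> dimensions it uses.
BUS_MATRIX = {
    "ORDERS":    {"DATE", "CUSTOMER", "PRODUCT"},
    "SHIPMENTS": {"DATE", "CUSTOMER", "PRODUCT", "CARRIER"},
    "INVENTORY": {"DATE", "PRODUCT", "WAREHOUSE"},
}

def conformed_dimensions(matrix):
    """Return the dimensions used by more than one business process."""
    usage = {}
    for process, dims in matrix.items():
        for dim in dims:
            usage.setdefault(dim, []).append(process)
    return {
        dim: sorted(procs)
        for dim, procs in sorted(usage.items())
        if len(procs) > 1
    }

shared = conformed_dimensions(BUS_MATRIX)
for dim, procs in shared.items():
    print(dim, "->", ", ".join(procs))
```

 EACH ROW OF THE MATRIX CAN THEN BE BUILT
AS A SEPARATE INCREMENT, AS LONG AS IT
REUSES THE SHARED DIMENSIONS.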
 NOWADAYS, A CORPORATION USUALLY HAS A
HETEROGENEOUS SYSTEM LANDSCAPE THAT
CONTAINS TRANSACTION SYSTEMS AND
BUSINESS INTELLIGENCE TOOLS, WHICH
PROVIDE ANALYTICAL CAPABILITIES FOR
EACH INDIVIDUAL DEPARTMENT'S NEEDS.
 EACH DEPARTMENT VIEWS THE BUSINESS
MODEL FROM ITS OWN PERSPECTIVE. FOR
EXAMPLE, A PRODUCT IN SALES CAN BE
DEFINED AS A MATERIAL IN MANUFACTURING
AND AS EQUIPMENT IN SERVICE MANAGEMENT.
 THE FEDERATED DATA WAREHOUSE WAS
INVENTED TO INTEGRATE THOSE
HETEROGENEOUS SYSTEMS AND TO PROVIDE
ANALYTIC CAPABILITIES ACROSS THE
DIFFERENT FUNCTIONS AND DEPARTMENTS.
 A FEDERATED DATA WAREHOUSE IS A
PRACTICAL APPROACH TO ACHIEVING THE
“SINGLE VERSION OF THE TRUTH” ACROSS
THE ORGANIZATION. THE FEDERATED DATA
WAREHOUSE IS USED TO INTEGRATE KEY
BUSINESS MEASURES AND DIMENSIONS. THE
FOUNDATIONS OF THE FEDERATED DATA
WAREHOUSE ARE THE COMMON BUSINESS
MODEL AND COMMON STAGING AREA.
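 A COMMON BUSINESS MODEL CAN BE SKETCHED
AS A MAPPING FROM EACH DEPARTMENT'S LOCAL
TERM TO ONE SHARED TERM, APPLIED WHILE
RECORDS ENTER A COMMON STAGING AREA. IN
THE PYTHON EXAMPLE BELOW, THE SHARED TERM
`ITEM`, THE RECORD LAYOUT AND THE
ASSUMPTION THAT ALL DEPARTMENTS USE THE
SAME KEY `P-100` ARE HYPOTHETICAL AND
CHOSEN ONLY FOR ILLUSTRATION.

```python
# Hypothetical common business model: (department, local term) -> shared term.
COMMON_MODEL = {
    ("SALES", "PRODUCT"): "ITEM",
    ("MANUFACTURING", "MATERIAL"): "ITEM",
    ("SERVICE", "EQUIPMENT"): "ITEM",
}

# Hypothetical records as each departmental system would report them.
DEPARTMENT_RECORDS = [
    {"dept": "SALES", "entity": "PRODUCT", "key": "P-100",
     "measure": "revenue", "value": 500.0},
    {"dept": "MANUFACTURING", "entity": "MATERIAL", "key": "P-100",
     "measure": "cost", "value": 320.0},
    {"dept": "SERVICE", "entity": "EQUIPMENT", "key": "P-100",
     "measure": "repairs", "value": 3.0},
]

def to_common_staging(records):
    """Translate local terms to the shared term and merge measures per key."""
    staged = {}
    for rec in records:
        common_entity = COMMON_MODEL[(rec["dept"], rec["entity"])]
        row = staged.setdefault((common_entity, rec["key"]), {})
        row[rec["measure"]] = rec["value"]
    return staged

staging = to_common_staging(DEPARTMENT_RECORDS)
print(staging)
```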
 ACCORDING TO FORRESTER RESEARCH,
ADOPTING A "HUB AND SPOKE"
ARCHITECTURE FOR INFORMATION SYSTEMS
CAN HELP ORGANISATIONS MAXIMISE THE
VALUE OF THEIR DATA.
 ACCORDING TO FORRESTER ANALYST BRIAN
HOPKINS, MANY BUSINESSES ARE STRUGGLING
TO MANAGE DATA IN THEIR CONVENTIONAL
"LAYER-CAKE" ARCHITECTURES. THIS IS
CHARACTERISED BY HAVING A SINGLE
ENTERPRISE DATA WAREHOUSE AND
BUILDING A BUSINESS INTELLIGENCE SYSTEM
ON TOP OF IT.
 “YESTERDAY’S CORRECT DATA
ARCHITECTURE INVOLVED CENTRALISED
WAREHOUSES, MARTS, OPERATIONAL DATA
STORES, AND A LOT OF [EXTRACT, TRANSFORM
AND LOAD] ETL,” HOPKINS SAID. “BECAUSE
DATA WAREHOUSES WERE EXPENSIVE,
BUILDING ONE AND FORCING ALL DATA INTO
IT WAS A LOGICAL APPROACH.
 “WITH THE MATURING OF DATA WAREHOUSE
APPLIANCES, THIS NEED HAS DISAPPEARED.”
 INSTEAD, HOPKINS ADVOCATES FOR A
DISTRIBUTED ARCHITECTURE CONSISTING OF
DATA REPOSITORIES OF VARYING
CHARACTERISTICS, WHICH CAN BE USED FOR
DIFFERENT SORTS OF ANALYSIS.
 "HUB-AND-SPOKE … FEATURES RAPID
ANALYTICS AND EXTREME-SCALE
OPERATIONS ON RAW DATA IN AN
AFFORDABLE DISTRIBUTED DATA HUB," HE
SAYS. "FIRMS THAT GET THIS CONCEPT
REALISE ALL DATA DOES NOT NEED FIRST-
CLASS SEATING."
 HE ARGUES THAT THE HUB-AND-SPOKE
MODEL ALLOWS ORGANISATIONS TO
ANALYSE AND PROCESS MORE FORMS OF
DATA, AND IT SAVES MONEY BY REMOVING
THE NEED FOR CAPITAL INTENSIVE DATA
WAREHOUSING INFRASTRUCTURE.
 HUB-AND-SPOKE ARCHITECTURES MATCH
THE BUSINESS STRUCTURE OF MOST LARGE
ENTERPRISES BY OFFERING A CENTRALIZED
EDW AND A SET OF DEPENDENT DATA
MARTS.
 THE EDW HUB ALLOWS THE ENTERPRISE AS A
WHOLE TO SET AND ENFORCE COMMON
STANDARDS WHILE ENABLING ANALYSIS AND
REPORTS THAT CUT ACROSS BUSINESS UNITS.
THE DATA MART SPOKES ALLOW BUSINESS
UNITS TO MEET THEIR OWN NEEDS QUICKLY
AT RELATIVELY LOW COST AND ALSO
CONFORM TO THE IT REQUIREMENTS OF THE
OVERALL ENTERPRISE.
 A WELL-MANAGED HUB-AND-SPOKE
ARCHITECTURE ALLOWS BUSINESS UNITS TO
SET THEIR OWN BUDGETS AND PRIORITIES,
WHILE CONTRIBUTING AS NECESSARY TO THE
CENTRAL EDW.
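 THE RELATIONSHIP BETWEEN THE EDW HUB AND
A DEPENDENT DATA MART CAN BE SKETCHED IN
PYTHON AS FOLLOWS. THE `edw_sales` TABLE,
THE BUSINESS UNIT NAMES AND THE FUNCTION
`build_data_mart` ARE HYPOTHETICAL; THE
POINT IS ONLY THAT EACH SPOKE IS DERIVED
FROM THE HUB, WHILE CROSS-UNIT ANALYSIS
RUNS ON THE HUB ITSELF.

```python
import sqlite3

# Hypothetical central EDW (the hub).
hub = sqlite3.connect(":memory:")
hub.executescript("""
    CREATE TABLE edw_sales (business_unit TEXT, product TEXT, amount REAL);
    INSERT INTO edw_sales VALUES
        ('RETAIL', 'PEN', 100.0),
        ('RETAIL', 'INK', 40.0),
        ('WHOLESALE', 'PEN', 900.0);
""")

def build_data_mart(conn, business_unit):
    """Derive a dependent data mart (a spoke) for one business unit."""
    return conn.execute(
        "SELECT product, SUM(amount) FROM edw_sales "
        "WHERE business_unit = ? GROUP BY product ORDER BY product",
        (business_unit,),
    ).fetchall()

# Spoke: one business unit's own view of the data.
retail_mart = build_data_mart(hub, "RETAIL")
# Hub: an analysis that cuts across business units.
enterprise_total = hub.execute("SELECT SUM(amount) FROM edw_sales").fetchone()[0]
print(retail_mart, enterprise_total)
```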
 DIFFERENT DATA WAREHOUSING SYSTEMS
HAVE DIFFERENT STRUCTURES.
 SOME MAY HAVE AN ODS (OPERATIONAL
DATA STORE), WHILE SOME MAY HAVE
MULTIPLE DATA MARTS.
 SOME MAY HAVE A SMALL NUMBER OF
DATA SOURCES, WHILE SOME MAY HAVE
DOZENS OF DATA SOURCES.
 IN VIEW OF THIS, IT IS FAR MORE
REASONABLE TO PRESENT THE DIFFERENT
LAYERS OF A DATA WAREHOUSE ARCHITECTURE
RATHER THAN DISCUSSING THE SPECIFICS OF
ANY ONE SYSTEM.
 DATA SOURCE LAYER
 DATA EXTRACTION LAYER
 STAGING AREA
 ETL LAYER
 DATA STORAGE LAYER
 DATA LOGIC LAYER
 DATA PRESENTATION LAYER
 METADATA LAYER
 SYSTEM OPERATIONS LAYER
 THE RELATIONSHIPS AMONG THE DIFFERENT
COMPONENTS OF THE DATA WAREHOUSE
ARCHITECTURE ARE ILLUSTRATED ON THE
SLIDE THAT FOLLOWS
 THE DATA SOURCE LAYER REPRESENTS THE
DIFFERENT DATA SOURCES THAT FEED DATA
INTO THE DATA WAREHOUSE. A DATA SOURCE
CAN BE OF ANY FORMAT: A PLAIN TEXT FILE,
A RELATIONAL DATABASE, ANOTHER TYPE OF
DATABASE, AN EXCEL FILE, ETC., CAN ALL
ACT AS A DATA SOURCE.
 MANY DIFFERENT TYPES OF DATA CAN BE A
DATA SOURCE:
 OPERATIONS -- SUCH AS SALES DATA, HR
DATA, PRODUCT DATA, INVENTORY DATA,
MARKETING DATA, SYSTEMS DATA.
 WEB SERVER LOGS WITH USER BROWSING
DATA.
 INTERNAL MARKET RESEARCH DATA.
 THIRD-PARTY DATA, SUCH AS CENSUS
DATA, DEMOGRAPHICS DATA, OR SURVEY
DATA.
 ALL THESE DATA SOURCES TOGETHER FORM
THE DATA SOURCE LAYER.
 IN THE DATA EXTRACTION LAYER, DATA GETS
PULLED FROM THE DATA SOURCE INTO THE
DATA WAREHOUSE SYSTEM. THERE IS LIKELY
TO BE SOME MINIMAL DATA CLEANSING, BUT
THERE IS UNLIKELY TO BE ANY MAJOR DATA
TRANSFORMATION.
 THE STAGING AREA IS WHERE DATA SITS
PRIOR TO BEING SCRUBBED AND TRANSFORMED
INTO A DATA WAREHOUSE / DATA MART.
HAVING ONE COMMON AREA MAKES
SUBSEQUENT DATA PROCESSING /
INTEGRATION EASIER.
 THE ETL LAYER IS WHERE DATA GAINS ITS
"INTELLIGENCE", AS LOGIC IS APPLIED TO
TRANSFORM THE DATA FROM A
TRANSACTIONAL NATURE TO AN
ANALYTICAL NATURE. THIS LAYER IS ALSO
WHERE DATA CLEANSING HAPPENS. THE ETL
DESIGN PHASE IS OFTEN THE MOST TIME-
CONSUMING PHASE IN A DATA
WAREHOUSING PROJECT, AND AN ETL TOOL
IS OFTEN USED IN THIS LAYER.
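 THE SHIFT FROM A TRANSACTIONAL TO AN
ANALYTICAL SHAPE CAN BE SKETCHED IN
PYTHON. THE EXAMPLE BELOW TURNS
INDIVIDUAL TRANSACTIONS INTO A PRODUCT
DIMENSION WITH SURROGATE KEYS AND A DAILY
SALES FACT TABLE. THE INPUT ROWS, FIELD
NAMES AND THE DAILY GRAIN ARE ASSUMPTIONS
MADE FOR ILLUSTRATION; AN ETL TOOL WOULD
NORMALLY DO THIS WORK.

```python
# Hypothetical transactional rows, one per sale.
TRANSACTIONS = [
    {"ts": "2024-01-05T09:15:00", "customer": "Alice", "product": "PEN",
     "qty": 2, "unit_price": 1.5},
    {"ts": "2024-01-05T17:40:00", "customer": "Bob", "product": "PEN",
     "qty": 1, "unit_price": 1.5},
    {"ts": "2024-01-06T11:00:00", "customer": "Alice", "product": "INK",
     "qty": 3, "unit_price": 6.0},
]

def transform(transactions):
    """Build a product dimension and a daily sales fact from transactions."""
    product_dim = {}   # product name -> surrogate key
    daily_fact = {}    # (date, product key) -> [units, revenue]
    for tx in transactions:
        # Assign the next surrogate key the first time a product is seen.
        key = product_dim.setdefault(tx["product"], len(product_dim) + 1)
        day = tx["ts"][:10]  # keep the date part, drop the time of day
        cell = daily_fact.setdefault((day, key), [0, 0.0])
        cell[0] += tx["qty"]
        cell[1] += tx["qty"] * tx["unit_price"]
    return product_dim, daily_fact

product_dim, daily_fact = transform(TRANSACTIONS)
print(product_dim)
print(daily_fact)
```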
 THE DATA STORAGE LAYER IS WHERE THE
TRANSFORMED AND CLEANSED DATA SIT. BASED
ON SCOPE AND FUNCTIONALITY, THREE TYPES
OF ENTITIES CAN BE FOUND HERE: DATA
WAREHOUSE, DATA MART, AND OPERATIONAL
DATA STORE (ODS). ANY GIVEN SYSTEM MAY
HAVE JUST ONE OF THE THREE, TWO OF THE
THREE, OR ALL THREE TYPES.
 THE DATA LOGIC LAYER IS WHERE BUSINESS
RULES ARE STORED. BUSINESS RULES STORED
HERE DO NOT AFFECT THE UNDERLYING DATA
TRANSFORMATION RULES, BUT DO AFFECT
WHAT THE REPORT LOOKS LIKE.
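 AS AN ILLUSTRATION, THE PYTHON SKETCH
BELOW APPLIES A BUSINESS RULE WHEN THE
REPORT IS BUILT AND LEAVES THE STORED
DATA UNTOUCHED. THE THRESHOLD RULE, THE
SALES FIGURES AND THE FUNCTION
`build_report` ARE HYPOTHETICAL.

```python
# Hypothetical data as stored in the warehouse.
STORED_SALES = [("NORTH", 1200.0), ("SOUTH", 300.0)]

# Hypothetical business rule kept apart from the transformation logic.
BUSINESS_RULES = {"high_sales_threshold": 1000.0}

def build_report(rows, rules):
    """Apply the rules at report time; the stored rows are not modified."""
    threshold = rules["high_sales_threshold"]
    return [
        {"region": region, "sales": sales,
         "status": "HIGH" if sales >= threshold else "NORMAL"}
        for region, sales in rows
    ]

report = build_report(STORED_SALES, BUSINESS_RULES)
for line in report:
    print(line)
```

 CHANGING THE THRESHOLD CHANGES THE
REPORT, NOT THE DATA.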
 THE DATA PRESENTATION LAYER REFERS TO
THE INFORMATION THAT REACHES THE USERS.
THIS CAN BE IN THE FORM OF A TABULAR /
GRAPHICAL REPORT IN A BROWSER, AN
EMAILED REPORT THAT GETS AUTOMATICALLY
GENERATED AND SENT EVERY DAY, OR AN
ALERT THAT WARNS USERS OF EXCEPTIONS,
AMONG OTHERS. USUALLY AN OLAP TOOL
AND/OR A REPORTING TOOL IS USED IN THIS
LAYER.
 THE METADATA LAYER IS WHERE
INFORMATION ABOUT THE DATA STORED IN THE
DATA WAREHOUSE SYSTEM IS KEPT. A LOGICAL
DATA MODEL WOULD BE AN EXAMPLE OF
SOMETHING THAT'S IN THE METADATA
LAYER. A METADATA TOOL IS OFTEN USED
TO MANAGE METADATA.
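 AS A ROUGH SKETCH, METADATA CAN BE HELD
AS A SMALL CATALOG THAT DESCRIBES TABLES
AND COLUMNS. THE PYTHON CLASSES
`TableMeta` AND `ColumnMeta`, THE CATALOG
ENTRIES AND THE SOURCE SYSTEM NAME BELOW
ARE HYPOTHETICAL AND DO NOT REPRESENT ANY
SPECIFIC METADATA TOOL.

```python
from dataclasses import dataclass, field

@dataclass
class ColumnMeta:
    name: str
    data_type: str
    description: str

@dataclass
class TableMeta:
    name: str
    source_system: str
    columns: list = field(default_factory=list)

# Hypothetical catalog entry: data about the data, not the data itself.
CATALOG = {
    "sales_fact": TableMeta("sales_fact", "ORDER ENTRY", [
        ColumnMeta("date_key", "INTEGER", "Surrogate key of the calendar date"),
        ColumnMeta("amount", "REAL", "Sale amount in local currency"),
    ]),
}

def describe(table_name):
    """Return one readable line per column of the named table."""
    meta = CATALOG[table_name]
    return [
        f"{meta.name}.{col.name} ({col.data_type}): {col.description}"
        for col in meta.columns
    ]

for line in describe("sales_fact"):
    print(line)
```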
 THE SYSTEM OPERATIONS LAYER INCLUDES
INFORMATION ON HOW THE DATA WAREHOUSE
SYSTEM OPERATES, SUCH AS ETL JOB STATUS,
SYSTEM PERFORMANCE, AND USER ACCESS
HISTORY.
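 ONE POSSIBLE WAY TO RECORD ETL JOB
STATUS AND RUN TIME IS SKETCHED BELOW IN
PYTHON. THE `JOB_LOG` LIST, THE `run_job`
WRAPPER AND THE TWO SAMPLE JOBS ARE
HYPOTHETICAL; A REAL SYSTEM WOULD
NORMALLY WRITE TO A TABLE OR A MONITORING
SERVICE RATHER THAN A LIST IN MEMORY.

```python
import time

JOB_LOG = []  # hypothetical in-memory operations log

def run_job(name, job):
    """Run one ETL job and record its status, row count and duration."""
    started = time.perf_counter()
    try:
        rows = job()
        status = "SUCCESS"
    except Exception as exc:
        rows, status = 0, f"FAILED: {exc}"
    JOB_LOG.append({
        "job": name,
        "status": status,
        "rows": rows,
        "seconds": time.perf_counter() - started,
    })
    return status

# Two sample jobs: one returns a row count, one fails on purpose.
run_job("load_sales", lambda: 42)
run_job("load_customers", lambda: 1 / 0)

for entry in JOB_LOG:
    print(entry["job"], entry["status"], entry["rows"])
```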
 ACCORDING TO KELLY (1997), THE
FOLLOWING ARCHITECTURE PROPERTIES ARE
ESSENTIAL FOR A DATA WAREHOUSE:
 SEPARATION
 SCALABILITY
 EXTENSIBILITY
 SECURITY
 ADMINISTERABILITY
 SEPARATION: ANALYTICAL &
TRANSACTIONAL PROCESSING SHOULD BE
KEPT APART AS MUCH AS POSSIBLE.
 SCALABILITY: HARDWARE AND SOFTWARE
ARCHITECTURES SHOULD BE EASY TO
UPGRADE AS THE DATA VOLUME, WHICH
HAS TO BE MANAGED AND PROCESSED,
AND THE NUMBER OF USERS'
REQUIREMENTS, WHICH HAVE TO BE MET,
PROGRESSIVELY INCREASE.


 EXTENSIBILITY: THE ARCHITECTURE SHOULD
BE ABLE TO HOST NEW APPLICATIONS AND
TECHNOLOGIES WITHOUT REDESIGNING THE
WHOLE SYSTEM.
 SECURITY: MONITORING ACCESSES IS
ESSENTIAL BECAUSE OF THE STRATEGIC
DATA STORED IN DATA WAREHOUSES.
 ADMINISTERABILITY: DATA WAREHOUSE
MANAGEMENT SHOULD NOT BE OVERLY
DIFFICULT.
