
BUSINESS INTELLIGENCE: CONCEPTS, TECHNOLOGIES AND INDUSTRY ANALYSIS

Andrew J. Brust
Founder, Blue Badge Insights

MEET ANDREW
Founder, Blue Badge Insights
Chief Technology Officer, Tallan
Member, Microsoft BI Partner Advisory Council
Founder, Microsoft BI User Group of NYC (http://www.msbinyc.com)
Redmond Review columnist for Visual Studio Magazine and Redmond Developer News
brustblog.com
@andrewbrust

READ ALL ABOUT IT!

AGENDA
Concepts, terms, approaches and products for DW and multidimensional databases
Other BI product categories
CPM, scorecards, dashboards
Stacks, acquisitions, vendors
Open Source
Excel
Cloud
NoSQL
Fusion

BI'S PREMISE
Conventional RDBMSes are for operational, transactional use
Execute a trade; look up an order; book a flight
They perform data maintenance very well
If we want to move from tracking data to discovering information, we need different methodologies and architectures
We may need different technologies and products too
The focus shifts to reading data, not writing it

PREPARING FOR BUSINESS INTELLIGENCE


[Diagram: Transactions Process, Transaction Database, Data Warehouse, Data Relationships Analysis]

DIMENSIONAL MODEL
Measure
Dimension
Hierarchy
Grain
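A minimal sketch of how the four terms relate, assuming a hypothetical sales fact set: the measure is a sales amount, the grain is one row per product per day, and the Date dimension carries a Year -> Month -> Day hierarchy. Rolling up simply re-sums the measure at a coarser level of the hierarchy. The data and the DATE_HIERARCHY mapping are illustrative assumptions, not taken from any product.

```python
from collections import defaultdict
from datetime import date

# Hypothetical fact rows at a daily grain: (day, product, amount).
# "amount" is the measure; "day" and "product" are dimension keys.
facts = [
    (date(2011, 1, 3), "Widget", 120.0),
    (date(2011, 1, 17), "Gadget", 80.0),
    (date(2011, 2, 5), "Widget", 200.0),
    (date(2012, 3, 9), "Widget", 50.0),
]

# Levels of a simple Date hierarchy: Year -> Month -> Day.
# Each level maps a day to the key of its member at that level.
DATE_HIERARCHY = {
    "year": lambda d: (d.year,),
    "month": lambda d: (d.year, d.month),
    "day": lambda d: (d.year, d.month, d.day),
}

def roll_up(rows, level):
    """Sum the measure at the requested level of the Date hierarchy."""
    key_of = DATE_HIERARCHY[level]
    totals = defaultdict(float)
    for day, _product, amount in rows:
        totals[key_of(day)] += amount
    return dict(totals)

print(roll_up(facts, "year"))   # {(2011,): 400.0, (2012,): 50.0}
print(roll_up(facts, "month"))
```

The grain matters because it fixes the finest level available: with daily rows you can roll up to months and years, but you cannot drill below a day.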

STAR SCHEMAS
Physical data model
Central fact table
Multiple dimension tables, used to constrain fact table queries
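The star schema layout on this slide can be sketched with Python's built-in sqlite3 module. The table names (fact_sales, dim_date, dim_product), columns and rows are hypothetical. The query shows the pattern described: attributes of the dimension tables appear in WHERE and GROUP BY to constrain and slice the central fact table, which holds only foreign keys and the measure.

```python
import sqlite3

# Hypothetical retail star schema: one central fact table (fact_sales)
# whose foreign keys point at two dimension tables (dim_date, dim_product).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE fact_sales  (
    date_key    INTEGER REFERENCES dim_date(date_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    amount      REAL      -- the measure
);
INSERT INTO dim_date VALUES (1, 2011, 1), (2, 2011, 2), (3, 2012, 1);
INSERT INTO dim_product VALUES (10, 'Widget', 'Hardware'), (20, 'Manual', 'Books');
INSERT INTO fact_sales VALUES (1, 10, 120.0), (2, 10, 200.0), (2, 20, 15.0), (3, 10, 50.0);
""")

# Dimension attributes (category, year) constrain and group the fact rows;
# the fact table itself supplies only keys and the measure being summed.
query = """
SELECT d.year, SUM(f.amount) AS total
FROM fact_sales AS f
JOIN dim_date    AS d ON f.date_key = d.date_key
JOIN dim_product AS p ON f.product_key = p.product_key
WHERE p.category = 'Hardware'
GROUP BY d.year
ORDER BY d.year
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(2011, 320.0), (2012, 50.0)]
```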

DATA WAREHOUSE VS. DATA MART AND BILL INMON VS. RALPH KIMBALL
A Data Mart is suited to a specific purpose, team, department, division or other subgrouping of the organization's data
May be a subset of the DW
May be a standalone, tactical repository
Kimball vs. Inmon
Kimball (bottom-up): the Data Warehouse is constructed from Data Marts; both use the dimensional model/star schema
Inmon (top-down): Data Marts are extracted from the DW; the DW uses 3rd Normal Form, while the Marts use star schemas

DATA WAREHOUSE 2.0


Introduced by Bill Inmon in 2008 in DW 2.0: The Architecture for the Next Generation of Data Warehousing (Elsevier Press)
Recognizes the importance of:
Transactional data
Age of data
Unstructured data
Metadata
Still a challenger to Kimball

DW 2.0

DW VENDORS
All major RDBMS vendors, including:
IBM (DB2)
Oracle
Microsoft (SQL Server)
Specialty DW technologies and vendors

MASSIVELY PARALLEL PROCESSING (MPP)

A scale-out, shared-nothing, divide-and-conquer approach to data warehousing
Split the query amongst nodes, then merge the results
Abstract that from developers and users
Used to handle DWs in the hundreds of terabytes
Often implemented as an appliance, with network and storage hardware
Products:
Microsoft SQL Server Parallel Data Warehouse (PDW), formerly DATAllegro
Teradata
Netezza (acquired by IBM)
Greenplum, ParAccel, others
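A minimal sketch of the split-then-merge MPP idea, with threads standing in for nodes. The rows, NUM_NODES and the round-robin distribution are assumptions for illustration; real MPP products often hash-distribute on a key and run each slice on separate hardware. One detail the sketch makes visible: per-node averages cannot simply be averaged, so each node returns a partial SUM and COUNT and the coordinator divides only after merging.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical rows: (region, sale_amount).
rows = [("east", 10.0), ("west", 20.0), ("east", 30.0),
        ("north", 5.0), ("west", 40.0), ("north", 15.0)]

NUM_NODES = 3  # illustrative; a real appliance has many more

def distribute(rows, num_nodes):
    """Shared nothing: each row lives on exactly one node (round-robin here)."""
    partitions = [[] for _ in range(num_nodes)]
    for i, row in enumerate(rows):
        partitions[i % num_nodes].append(row)
    return partitions

def local_aggregate(partition):
    """Each node computes partial SUM and COUNT per region over its own slice."""
    partial = {}
    for region, amount in partition:
        s, c = partial.get(region, (0.0, 0))
        partial[region] = (s + amount, c + 1)
    return partial

def merge(partials):
    """The coordinator combines partials; AVG is rebuilt from SUM and COUNT."""
    totals = {}
    for partial in partials:
        for region, (s, c) in partial.items():
            ts, tc = totals.get(region, (0.0, 0))
            totals[region] = (ts + s, tc + c)
    return {region: s / c for region, (s, c) in sorted(totals.items())}

partitions = distribute(rows, NUM_NODES)
with ThreadPoolExecutor(max_workers=NUM_NODES) as pool:  # stand-in for nodes
    partials = list(pool.map(local_aggregate, partitions))
print(merge(partials))  # {'east': 20.0, 'north': 10.0, 'west': 30.0}
```

The caller only sees the merged answer, which mirrors how MPP products abstract the distribution away from developers and users.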

THE LIMITATIONS OF RELATIONAL AND DW


SQL not designed for aggregation
GROUP BY is tedious
Aggregate functions are minimal
Relational engine designed for transactions
Oriented to creation, update and quick retrieval of rows
These capabilities go largely unused in DW scenarios
Need databases oriented to dimensional query and pre-aggregation optimization
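A minimal sketch of what pre-aggregation means, assuming a tiny hypothetical fact list with three dimensions (year, region, product). It computes SUM(amount) for every subset of dimensions, 2**3 = 8 group-bys, so later queries become dictionary lookups instead of scans. Real OLAP engines are far more selective about which aggregations they materialize; this full-cube version is only an illustration.

```python
from collections import defaultdict
from itertools import combinations

DIMENSIONS = ("year", "region", "product")

# Hypothetical fact rows: dimension values plus one measure, "amount".
facts = [
    {"year": 2011, "region": "East", "product": "Widget", "amount": 100.0},
    {"year": 2011, "region": "West", "product": "Widget", "amount": 50.0},
    {"year": 2011, "region": "East", "product": "Gadget", "amount": 25.0},
    {"year": 2012, "region": "East", "product": "Widget", "amount": 75.0},
]

def build_cube(rows, dims):
    """Pre-aggregate SUM(amount) for every subset of dims (2**len(dims) group-bys)."""
    cube = {}
    for r in range(len(dims) + 1):
        for group in combinations(dims, r):
            totals = defaultdict(float)
            for row in rows:
                totals[tuple(row[d] for d in group)] += row["amount"]
            cube[group] = dict(totals)
    return cube

cube = build_cube(facts, DIMENSIONS)

# At query time, answers are lookups instead of scans over the fact rows.
print(cube[()][()])                              # grand total: 250.0
print(cube[("year", "region")][(2011, "East")])  # 125.0
```

The trade-off is storage and load time: the number of group-bys doubles with each added dimension, which is why real engines pre-aggregate selectively.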

ENTER OLAP: ONLINE ANALYTICAL PROCESSING

Multi-Dimensional Model
Multi-Dimensional Data Store
Relational Data Store

FROM DW TO MULTIDIMENSIONAL
[Diagram: Transactions Process, Transaction Database, Data Warehouse, Data Relationships Analysis, OLAP Database (multidimensional, hierarchical)]

CONVENTIONAL OLAP PLAYERS


Microsoft SQL Server Analysis Services
Oracle/Hyperion Essbase
IBM/Cognos (PowerPlay)
SAP BW (aka NetWeaver BI)

OLAP STANDARDS
MDX: MultiDimensional eXpression language
An OLAP query language that is superficially similar to SQL
XML for Analysis (XMLA)
A SOAP Web Service interface for performing admin/DDL and query tasks over HTTP
Both originated with (Microsoft) Analysis Services
They are nonetheless implemented (to varying degrees) in numerous competing products
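A minimal sketch of how the two standards fit together: an MDX statement travels inside an XMLA Execute request, which is itself a SOAP envelope. The cube, dimension and measure names in the MDX, and the catalog name SalesDW, are hypothetical. The sketch only builds the request body with Python's xml.etree.ElementTree; actually sending it as an HTTP POST to a server's XMLA endpoint is omitted.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
XMLA_NS = "urn:schemas-microsoft-com:xml-analysis"

# Readable prefixes for serialization (prefix names are arbitrary).
ET.register_namespace("soap", SOAP_NS)
ET.register_namespace("xmla", XMLA_NS)

def build_execute_envelope(mdx, catalog):
    """Wrap an MDX statement in an XMLA Execute request (SOAP 1.1 envelope)."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    execute = ET.SubElement(body, f"{{{XMLA_NS}}}Execute")
    command = ET.SubElement(execute, f"{{{XMLA_NS}}}Command")
    ET.SubElement(command, f"{{{XMLA_NS}}}Statement").text = mdx
    properties = ET.SubElement(execute, f"{{{XMLA_NS}}}Properties")
    props = ET.SubElement(properties, f"{{{XMLA_NS}}}PropertyList")
    ET.SubElement(props, f"{{{XMLA_NS}}}Catalog").text = catalog
    ET.SubElement(props, f"{{{XMLA_NS}}}Format").text = "Multidimensional"
    return ET.tostring(envelope, encoding="unicode")

# Hypothetical cube, dimension and measure names, for illustration only.
MDX = """SELECT [Measures].[Sales Amount] ON COLUMNS,
       [Date].[Calendar Year].MEMBERS ON ROWS
FROM [Sales]"""

payload = build_execute_envelope(MDX, catalog="SalesDW")
print(payload)
```

Note how the MDX reads like SQL (SELECT ... FROM) but places members on axes (COLUMNS, ROWS) rather than listing result columns.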

COLUMN-ORIENTED
Store values for a given column next to each other, instead of values for a given row
More efficient for aggregating single columns
Facilitates a high degree of compression
Manipulate large amounts of data in memory
Examples: Sybase IQ (now owned by SAP), Vertica, QlikView, Microsoft PowerPivot (and SQL Server v.Next)
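A minimal sketch contrasting the two layouts, using a hypothetical five-row table. Summing one column in the column layout reads a single contiguous list, and a repetitive column such as region compresses well with run-length encoding. RLE is just one simple technique chosen for illustration; columnar products also use dictionary encoding and other schemes.

```python
from itertools import groupby

# Hypothetical table in a row layout: all values of one row sit together.
rows = [
    ("2011-01-03", "East", 120.0),
    ("2011-01-03", "East", 80.0),
    ("2011-01-04", "East", 200.0),
    ("2011-01-04", "West", 50.0),
    ("2011-01-05", "West", 75.0),
]

# The same table in a column layout: all values of one column sit together.
columns = {
    "day":    [r[0] for r in rows],
    "region": [r[1] for r in rows],
    "amount": [r[2] for r in rows],
}

# Aggregating one column reads one list, not every field of every row.
total = sum(columns["amount"])

def rle_encode(values):
    """Run-length encode a column: runs of equal neighbors become (value, count)."""
    return [(v, len(list(run))) for v, run in groupby(values)]

def rle_decode(pairs):
    """Expand (value, count) pairs back into the original column."""
    return [v for v, n in pairs for _ in range(n)]

encoded_region = rle_encode(columns["region"])
print(total)           # 525.0
print(encoded_region)  # [('East', 3), ('West', 2)]
```

Compression pays off most on sorted or low-cardinality columns, which is also what lets columnar engines keep large data volumes in memory.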

OTHER BI TECHNOLOGIES
Enterprise Reporting
Enterprise Data Management (EDM)
Extract, transform and load (ETL)
Data Quality Management (DQM)
Master Data Management (MDM)
Data Mining/Predictive Analytics
Statistics software

PERFORMANCE MANAGEMENT (CPM)


C is for corporate
Highly focused on Key Performance Indicators (KPIs) and scorecards
Balanced Scorecards
Management methodology
Perspectives, Objectives, KPIs, Strategy Maps
Data Visualization
Dashboards
If BI is the platform, CPM is the application
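A minimal sketch of the KPI idea behind scorecards: each KPI compares an actual value with a target and reports a traffic-light status, grouped here by Balanced Scorecard perspective. The KPI names, values and the 90%/75% thresholds are hypothetical assumptions, and the logic only suits "higher is better" indicators.

```python
from dataclasses import dataclass

@dataclass
class Kpi:
    """A hypothetical scorecard KPI: an actual value measured against a target."""
    name: str
    actual: float
    target: float
    warn_ratio: float = 0.9   # assumed: below 90% of target is "yellow"
    fail_ratio: float = 0.75  # assumed: below 75% of target is "red"

    def status(self) -> str:
        """Traffic-light status for a higher-is-better KPI."""
        ratio = self.actual / self.target
        if ratio >= self.warn_ratio:
            return "green"
        if ratio >= self.fail_ratio:
            return "yellow"
        return "red"

# Hypothetical scorecard grouped by Balanced Scorecard perspective.
scorecard = {
    "Financial": [Kpi("Revenue ($M)", actual=9.5, target=10.0)],
    "Customer": [Kpi("Retention rate", actual=0.70, target=0.85)],
    "Internal Process": [Kpi("On-time delivery", actual=0.60, target=0.95)],
}

for perspective, kpis in scorecard.items():
    for kpi in kpis:
        print(f"{perspective:17} {kpi.name:18} {kpi.status()}")
```

A dashboard is then largely a visual rendering of these statuses; the BI platform supplies the actual values.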

BSC EXAMPLE

THE TYPICAL BI STACK


Data Visualization and RIA

ACQUISITIONS
Acquisitions define this market
Biggies:
IBM -> Cognos, SPSS, Netezza
Oracle -> Hyperion (-> Arbor Software)
SAP -> Business Objects (-> Crystal Decisions), Sybase
Remaining pure-play biggies: MicroStrategy and SAS
Pure plays were the rule; they are now assumed to be short-lived exceptions

M&A RAMIFICATIONS
Most mega-vendors' stacks are stitched together
Best of breed is a difficult strategy
Common licenses may bundle disjoint product sets

VENDORS - COMMERCIAL
Mega-vendors: IBM, Oracle, SAP, Microsoft
Others: MicroStrategy, SAS, QlikTech, Tableau, Tibco/Spotfire

OPEN SOURCE
Pentaho
Jaspersoft

OTHER OPEN SOURCE BI


BIRT: open-sourced version of Actuate
R Project: open source competitor to SAS and SPSS
Mondrian: Pentaho's OLAP engine
JFree: Java chart library and supporting functionality
JPivot: Java browser-based, PivotTable-like UI for Mondrian
SpagoBI: integrates OSS and proprietary products into a full BI suite
Marvelit DASH: reporting and dashboards
Palo: ETL, in-memory OLAP, Web-based reporting, dashboards, Excel add-in
OpenReports: supports JasperReports, JFreeReport and BIRT; works with Mondrian and JPivot
OpenI: combines JPivot, Mondrian, R Project, JFree and JasperReports

EXCEL AS BI TOOL
Formal classifications notwithstanding, Excel is the #1 BI tool out there
Most BI technologies have an Excel add-in story
Excel has native database connectivity; PivotTables (and charts) are designed for dimensional analysis
Excel can query SQL Server Analysis Services cubes directly
PivotTables, PivotCharts, CUBE formulas
PowerPivot is an Excel add-in
The pros and cons of spreadmarts

CLOUD BI
Amazon Web Services-based: Pentaho/ParAccel, Jaspersoft, GoodData
Proprietary: Oco, Birst, myDIALS, Tableau Public, Predixion
Microsoft: SQL Azure Federation and SQL Azure Reporting

NOSQL
Works with unstructured data; abhors schemas
NoSQL is not really a BI technology, but it:
Uses MapReduce and sharding, which have some commonality with MPP architectures
Lives in the same open source and/or startup milieu as MPP and columnar
Is in the cloud, almost by definition
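A minimal sketch of the two NoSQL techniques this slide compares to MPP: hash sharding places each schemaless document on one shard, and a MapReduce pass (map, shuffle by key, reduce) counts tags across all shards. The documents, NUM_SHARDS and the SHA-256-based shard function are assumptions for illustration; real NoSQL stores each define their own partitioning and MapReduce APIs.

```python
import hashlib
from collections import defaultdict

NUM_SHARDS = 3  # illustrative

def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Hash sharding: a stable hash of the key decides which shard owns a document."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards

# Hypothetical schemaless documents: fields vary from one document to the next.
docs = [
    {"id": "u1", "tags": ["bi", "olap"]},
    {"id": "u2", "tags": ["nosql"], "country": "US"},
    {"id": "u3", "tags": ["bi", "nosql", "mpp"]},
    {"id": "u4"},  # no tags at all
]

shards = defaultdict(list)
for doc in docs:
    shards[shard_for(doc["id"])].append(doc)

def map_phase(doc):
    """Map: emit (tag, 1) for every tag; a missing field simply emits nothing."""
    for tag in doc.get("tags", []):
        yield tag, 1

def reduce_phase(key, values):
    """Reduce: combine every value emitted for one key."""
    return key, sum(values)

# Each shard maps its own documents; the shuffle groups emitted pairs by key.
grouped = defaultdict(list)
for shard_docs in shards.values():
    for doc in shard_docs:
        for key, value in map_phase(doc):
            grouped[key].append(value)

counts = dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))
print(counts)  # {'bi': 2, 'mpp': 1, 'nosql': 2, 'olap': 1}
```

The parallel with MPP is the shape of the work: partition the data, compute locally where it lives, then merge.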

FUSION, CONSOLIDATION
MPP Data Warehouse appliances may be built on open source databases (even SQL Server PDW started that way)
MPP products adding columnar features

Columnar is replacing OLAP, but it is replacing DW too
All products are going to the cloud
Open source, startup, MPP, columnar, NoSQL and cloud are all cross-pollinating, with mega-vendors taking an interest

WHAT NEXT?
If you're not using BI, you should be
Start simple: small data marts, cubes
Exercise healthy skepticism of MPP, columnar and cloud, but do not dismiss any of them
Have an Excel strategy
Design for your users
After some experimentation, consider a DQM and MDM strategy

RESOURCES
Links to most companies, products and subjects mentioned in this talk can be found at: http://bit.ly/IASABrustBILinks

QUESTIONS
andrew.brust@tallan.com
@andrewbrust
