What is Reporting?

A report is an organized collection of data, prepared for viewing and analysis. It is the presentation of a formatted collection of information, which can also be presented on the web. A report typically contains data from many transactions. It presents the information retrieved from a table or query in a preformatted manner.

[Figure: sample report with the columns Product name, City, Country, Sales total, and Unit price, listing products sold in Paris, Strasbourg, and Orléans, France.]

Report
- Report: collection of data
- Purpose: analysis (comparative study of data, historical data)
- Final: improve decisions

Analysis
- Comparison of sales (fact) of a product (dimension) over years (dimension) in the same region (dimension).
- What is the total sales value (fact) of a particular product (dimension) in a particular grocery store (dimension), over a period (dimension)?
- What is the amount spent (fact) for a particular product promotion (dimension) in a particular branch (dimension) or in a particular city (dimension), over a period (dimension)?
Business Questions
1. Reporting capabilities for key performance measures such as:
- product profitability;
- units sold;
- gross revenue.

Business Questions
2. Performing complex analysis for:
- evaluating success, timing, and duration of promotion campaigns;
- evaluating shopper buying patterns and products, i.e., market basket analysis;
- evaluating pricing and promotion strategy by category.

Business Questions
- Finance: What are the net income, expenses, gross profit, and net profit for this quarter or year?
- Accounts: What is the sales amount this month, and what is the outstanding pending payment?
- Production: How many products are manufactured in each production unit today, weekly, monthly?
- Sales: How many products have been sold in each area today, weekly, monthly?
- Quality: How many products have been defective today, weekly, monthly, quarterly, yearly?

Offer Letters - Example
Property | IBM | Accenture | Start-up
Salary | 60K | 55K | 70K
Location | New York | Dallas | California
Role | Software Engg. | Software Consultant | Consultant
Responsibility | ETL Consultant | Developer PL/SQL | ETL Consultant
Working Hrs | Day time | Day / Night | Day / Night
Career Growth | Less chance of ETL Lead | 1+ years, then ETL Lead | After 1 year: hike
Improve Decision: comparative study, looking at data from different perspectives.

The single purpose of building a data warehouse: improve decisions.

How to Achieve Improved Decisions
- Stage 1, Reporting: WHAT happened?
- Stage 2, Analyzing: WHY did it happen?
- Stage 3, Predicting: WHAT will happen?
- Stage 4, Operationalizing: WHAT is happening?
- Stage 5, Final: What do I WANT to happen?

Data Warehouse
A data warehouse is the most effective means of organizing and analyzing the vast amount of information generated in the business, and it helps the business build a more effective business model for staying profitable.

Why Data Warehouse?
- Multidimensional analysis of data (reporting)
- Improved decision making

Associations: Which items are purchased in a retail store at the same time?
Sequential Patterns: What is the likelihood that a customer will buy a product next month if he buys a related item today?

Data >> Decision
- Data: raw observations; no meaning
- Information: meaning by relational connection
- Knowledge: appropriate collection of information; the intent is to be useful and to change the business processes; usually answers 'how' questions

What is Knowledge?
 | Data | Information | Knowledge
Description | Raw facts, numbers | Data in context; readily captured in documents and databases | Information + experience, judgement; two types: tacit (hard to capture) and explicit (easier to capture)
Examples | "Call ..." | "Avg. resolution time", "Agent-wise case closure rate" | "Top performers and underachievers", "Trends in call volume over days / weeks"

Strategic Value
- Knowledge applied to decision-making and work
- "Train underachievers"
- "Resource allocation following trend"

Chapter 2
- OLTP or Operational Systems
- Characteristics - OLTP
- Business Intelligence
- Why BI?
- Definition of DW
- Architecture of DW
- Advantages - Data Warehouse
- Goals of Data Warehouse

Information System / OLTP Systems
- OLTP systems are highly normalized databases.
- The purpose of OLTP systems is to capture data through DML activities. Good examples: ATM transactions, railway systems, retail business.
- The purpose of a data warehouse is multidimensional analysis.

OLTP Applications
Equity Plans, Shares, Insurance, Loans, Savings

Example - OLTP
Railway system, Airtel, ATM, Lifestyle. These are information-capturing systems built on normalized databases. Their purpose is to insert, update, and delete data, that is, DML activity.

What is a normalized database? One designed to 3rd Normal Form (3NF).

Business Intelligence
- From an information systems standpoint, BI provides users with online analytical processing or data analysis capabilities to predict trends, evaluate business questions, and so on.
- From a business analyst viewpoint, it is the process of gathering high-quality, meaningful information about a subject, which enables the analyst to draw conclusions.

Definition: Data Warehouse
Data warehousing is a process, not a product. It is the entire process of extracting, transforming, and loading data into the warehouse, together with the access of that data by end users and applications.

Data Warehouse Architecture
[Figure: Source Systems (Legacy, External, Operational) -> ETL (Informatica; extraction, transformation) -> Data Warehouse -> OLAP (Business Objects; Online Analytical Processing) -> End Users]

Advantage through DW (Data Analysis)
- Acquire new customers
- Retain existing customers
- Improve customer satisfaction
- Sell more products
- Improve marketing initiatives
- Realize niche opportunities

Goals of Data Warehouse
- Easy access to the organization's information.
- The data warehouse must be adaptive.
- Secure environment to protect information assets.
- Foundation for improved decision making.

Chapter 3
- Characteristics of a Data Warehouse
- Difference - OLTP / DW
- Dimensional Modeling
- Size of the Fact Table
- Star Schema
- Snowflake Schema

Data Warehouse Characteristics
A data warehouse is subject-oriented, integrated, nonvolatile, and time-variant.

Subject-Oriented
- Data is categorized and stored by business subject rather than by application.
[Figure: the OLTP applications Equity Plans, Shares, Insurance, Loans, and Savings map to one data warehouse subject, customer financial information.]

Integrated
- Data on a given subject is integrated.
Nonvolatile
[Figure: operational systems allow insert, read, update, and delete; the warehouse allows only load and read.]

Time-Variant
[Figure: the data warehouse holds data by time period, e.g. 01/01 January, 02/01 February, 03/01 March.]

OLTP v/s Data Warehouse
 | OLTP | DW
Access | Read/write | Lots of scans
Unit of work | Short, simple transaction | Complex query
# of records accessed | Thousands | Millions
# of users | Thousands | Hundreds
DB size | 100 MB - GB | 100 GB - terabytes

OLTP v/s Data Warehouse
 | OLTP | DW
Users | Clerk, IT professional | Knowledge worker
Function | Day-to-day operations | Decision support
DB design | Application oriented | Subject oriented
Data | Current, up to date, detailed, relational | Historical, summarized

Purpose - DW and OLTP
The main intent of a data warehouse is served by a multidimensional (denormalized) design, where the goal is to support end-user querying, i.e., efficiency of reporting.
The main intent of OLTP/transaction processing is served by 3NF designs, where the goal is efficiency of insert as opposed to efficiency of reporting.
It is much easier to write a query against a data warehouse with four or six tables than against the equivalent 3NF design, which may have 15 or more tables.

Differences - DW and OLTP
- They are both databases.
- They both hold data.
- But they have been designed for different scopes: running the business (operational systems) v/s managing the business (DWH):
  - Operational systems focus on present data.
  - DWHs focus on historical data (present, past).
  - Operational systems are optimized to insert/update and store data.
  - DWHs are optimized to select/analyze data.
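The contrast above between operational systems (optimized to insert/update current data) and a warehouse (loaded periodically, then only read and analyzed) can be sketched in a few lines. This is a minimal illustration, not a real system: the account IDs, amounts, period labels, and function names are all assumptions made for the example.

```python
from collections import defaultdict

# Operational (OLTP-style) store: one current row per account, updated in place.
oltp_balances = {}

# Warehouse-style store: append-only snapshots, loaded periodically, never updated.
dw_balance_history = []


def post_transaction(account_id, amount):
    """Short, simple unit of work: touch a single record."""
    oltp_balances[account_id] = oltp_balances.get(account_id, 0) + amount


def load_snapshot(period):
    """Periodic load: copy the current operational state into the warehouse."""
    for account_id, balance in sorted(oltp_balances.items()):
        dw_balance_history.append(
            {"period": period, "account_id": account_id, "balance": balance}
        )


def total_balance_by_period():
    """Analytical query: scan every history row and summarize by period."""
    totals = defaultdict(int)
    for row in dw_balance_history:
        totals[row["period"]] += row["balance"]
    return dict(totals)


# Hypothetical activity: two periods of transactions, each followed by a load.
post_transaction("A", 100)
post_transaction("B", 50)
load_snapshot("2024-01")
post_transaction("A", -30)
load_snapshot("2024-02")

print(oltp_balances)              # current state only
print(total_balance_by_period())  # one total per loaded period
```

The operational dictionary only ever knows the present balance, while the history list keeps one row per account per period, which is what makes the warehouse nonvolatile and time-variant in the sense described above.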
Dimensional Modeling
Dimensional modeling gives users the ability to view data based on the organization of the business and the important characteristics of the data.
- The flexibility provided by dimensional analysis allows users to restrict data sets and dictate how data will be grouped together on the fly.
- Dimensional modeling allows for analysis at the intersection of two or more dimensions.
There are two major components of dimensional analysis:
- Dimensions, which determine how data will be presented; and
- Facts, which determine what data will be presented.

Dimension Table Examples
- Retail: store name, zip code, product name, product category, day of week
- Telecommunications: call origin, call destination
- Banking: customer name, account number, branch, account officer
- Insurance: policy type, insured party

Dimension Table Characteristics
Dimension tables have the following characteristics:
- They contain textual information that represents the attributes of the business.
- They contain relatively static data.
- They are joined to a fact table through a foreign key reference.
- They are hierarchical in nature and provide the ability to view data at varying levels of detail.
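To make the roles of dimensions (how data is presented) and facts (what is presented) concrete, here is a minimal sketch of restricting and grouping fact rows on the fly. The rows, the dimension names `product`, `store`, and `year`, and the helper `summarize` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical fact rows: each carries dimension values plus numeric facts.
sales_facts = [
    {"product": "Tea", "store": "North", "year": 2023, "sales_amount": 120.0, "units": 12},
    {"product": "Tea", "store": "South", "year": 2023, "sales_amount": 80.0, "units": 8},
    {"product": "Coffee", "store": "North", "year": 2023, "sales_amount": 200.0, "units": 10},
    {"product": "Tea", "store": "North", "year": 2024, "sales_amount": 150.0, "units": 15},
]


def summarize(facts, group_by, measure, where=None):
    """Sum one fact over the chosen dimensions, optionally restricting rows first."""
    where = where or {}
    totals = defaultdict(float)
    for row in facts:
        if all(row[dim] == value for dim, value in where.items()):
            key = tuple(row[dim] for dim in group_by)
            totals[key] += row[measure]
    return dict(totals)


# Sales of one product over the years in the same store.
by_year = summarize(
    sales_facts, ["year"], "sales_amount", where={"product": "Tea", "store": "North"}
)

# Analysis at the intersection of two dimensions: product by store.
by_product_store = summarize(sales_facts, ["product", "store"], "units")

print(by_year)
print(by_product_store)
```

The same fact rows answer both questions; only the choice of grouping and restricting dimensions changes.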
[Figure: example dimension tables. A product dimension (attributes such as ProductName, Category, Subcategory, Brand) and a time dimension (attributes such as DayOfWeek, DayOfMonth, Month, Holiday, Weekend) are joined to a sales fact table.]

Fact Table Examples
- Retail: number of units sold, sales amount
- Telecommunications: length of call in minutes, average number of calls
- Banking: average monthly balance
- Insurance: claims amount

Fact Table Characteristics
Fact tables have the following characteristics:
- They contain numerical metrics of the business.
- They can hold large volumes of data.
- They can grow quickly.
- They are joined to dimension tables through foreign keys that reference primary keys in the dimension tables.

Sales Fact Table
Product_id, Store_id, Customer_id, Date, Sales_amount, Sales_units

Identifying Measures and Dimensions
- The attribute varies continuously (measures): Balance, Units Sold, Cost, Sales
- The attribute is perceived as constant or discrete (dimensions): Product, Location, Time, Size

Size of the Fact Table
- Total company revenue: $20 billion
- Avg. sale price per line item: $5
- Number of line items per year: $20 billion / $5 = 4 billion
- Number of base fact records: 4 billion x 3 years = 12 billion
- Key fields: 4 (x 4 bytes)
- Fact fields: 4 (x 4 bytes)
- Base fact table size: 12 billion x 8 fields x 4 bytes = 384 GB

Star Schema
- Fact table: Date, Employee Id, Product Id, Customer Id, Location Id, Sales Amount, Unit Sales
- Employee_Dim (dimension table): Employee Id, Emp Name
- Product_Dim (dimension table): Product Id, Product Name
- Time_Dim (dimension table): Date, Month
- Location_Dim (dimension table): Location Id, Location Name
- Customer_Dim (dimension table): Customer Id, Customer Name

Star Schema
[Figure: a central fact table holding the facts (units, price), surrounded by the dimension tables Product, Location, Customer, and Time.]

Star Schema
Product_Table
Key 1 | Brand
1001 | ABC

Market_Table (Key 2, market)
6001 | SF

Time_Table
Key 3 | Year | Month
9001 | 1998 | March

Fact_Table
Key 1 | Key 2 | Key 3 | Dollars
1001 | 6001 | 9001 | 6000
1002 | 6002 | 9002 | 10000
1003 | 6003 | 9003 | 15200
1004 | 6002 | 9003 | 9526

Snowflake Schema
[Figure: a SALES fact table (measures: Amount, Cost, Margin, Quantity) joined to the dimensions CHANNELS, CUSTOMERS, PRODUCTS, and PROMOTIONS, with COUNTRIES as a snowflaked dimension; the figure also shows a US_State_Tax table.]

Chapter 4
- Surrogate Keys
- Grain Level
- Data Mart / Data Warehouse
- Data Warehouse Strategies - Kimball / Inmon
- Staging Area

Surrogate Keys
- A surrogate key has no meaning other than stating uniqueness for each record stored in the dimension tables.
- Surrogate keys are used in all dimension tables.
- Example: Product key: 38972. It is just a sequence number.
- Advantages of surrogate keys include control over data and reduced fact table size.
- Avoid using the OLTP keys as data warehouse keys.

Surrogate Keys Example
Salesperson_ID | Salesperson_name | Manager_ID | Emp_Change_Date
101-SM-99 | Smith | 200 | 030199
102-JN-99 | Jones | 300 | 050599
103-HR-99 | Harvey | 300 | 060599
104-JH-01 | John | 400 | 061001

Prod_SK | Prod_ID | Prod_Name | Prod_Grouping | Brand_Code | Prod_Change_Date
8001 | 3258-CY | Coffee | Hot | YUBN | 032200
8002 | 3259-CH | Coffee | Hot | MAXH | 110100
8003 | 1172-PS | Pop | Cold | SCHW | 061001
8004 | 1173-TR | Tea | Hot | RRSE | 061001

Determining Granularity
YEAR? QUARTER? MONTH? WEEK? DAY?

Data Mart - Data Warehouse
- A data mart stores data for a limited number of subject areas, such as marketing or sales data.
- A data warehouse deals with multiple subject areas and is typically implemented and controlled by a central organizational unit such as the Corporate Information Factory. It is often called a central or enterprise data warehouse.
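The surrogate key idea from this chapter (a meaningless sequence number per dimension record, kept separate from the OLTP key) can be sketched as follows. The class `DimensionKeyGenerator` and the sample sale amount are hypothetical; the product IDs and the starting value 8001 mirror the example table above.

```python
from itertools import count


class DimensionKeyGenerator:
    """Hands out meaningless sequence numbers, one per natural (OLTP) key."""

    def __init__(self, start=1):
        self._next_key = count(start)
        self._by_natural_key = {}

    def key_for(self, natural_key):
        # Reuse the existing surrogate key if this source key was seen before.
        if natural_key not in self._by_natural_key:
            self._by_natural_key[natural_key] = next(self._next_key)
        return self._by_natural_key[natural_key]


product_keys = DimensionKeyGenerator(start=8001)

# Source rows keyed by the OLTP product ID.
source_products = [
    ("3258-CY", "Coffee"),
    ("3259-CH", "Coffee"),
    ("1172-PS", "Pop"),
    ("1173-TR", "Tea"),
]

product_dim = [
    {"prod_sk": product_keys.key_for(prod_id), "prod_id": prod_id, "prod_name": name}
    for prod_id, name in source_products
]

# A fact row stores only the compact integer surrogate key, not the OLTP key.
sale = {"prod_sk": product_keys.key_for("1172-PS"), "sales_amount": 25.0}

print(product_dim)
print(sale)
```

Because the fact row carries a small integer instead of the source system's string identifier, the fact table stays narrower, and the warehouse is insulated from changes to the OLTP key format.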
Data Warehouse Versus Data Mart
Property | Data Warehouse | Data Mart
Scope | Enterprise | Department
Subjects | Multiple | Single
Data source | Many | Few
Implementation time | Months to years | Months

Data Warehousing Strategies
- Enterprise-wide warehouse, top down: the Inmon methodology
- Data mart, bottom up: the Kimball methodology
When properly executed, both result in an enterprise-wide data warehouse, but with different architectures.

Top-Down Approach
[Figure: flat files, operational systems, and external data feed the data warehouse, which in turn feeds the Marketing, Sales, Finance, and Human Resources data marts.]

Independent Data Mart
[Figure: operational systems, flat files, and external data feed a single Sales or Marketing data mart directly.]

Bottom-Up Approach
[Figure: legacy data, operations data, and external data sources feed the Marketing, Sales, and Finance data marts, which together form the data warehouse.]

Data Staging
- Often used as an interim step between data extraction and later steps
- No end-user access to the staging area

Basic Elements of the Data Warehouse
- Source: source database or other source form
- Data staging area: intermediate area
- Target: data warehouse or data mart
[Figure: Source -> Staging -> Target]

Data Warehouse Architecture
[Figure: Source Systems (External, Operational) -> Staging Area -> Presentation Area (Data Warehouse, Data Marts) -> Access Tools]

- Slowly Changing Dimensions
- SCD Type 1
- SCD Type 2
- SCD Type 3
- Conformed Dimension
- Degenerated Dimension
- Junk Dimension
- Types of Fact
- Factless Fact

Slowly Changing Dimensions (SCD)
- Slowly changing dimensions change gradually and occasionally over time.
Example: Employees change their address, name, or marital status.

Type | Approach | Records kept | Results
SCD Type 1 | Overwriting the old values in the dimension record | Current records only (no history) | Losing the ability to track the old history
SCD Type 2 | Creating an additional dimension record (with a time stamp) at the time of the change with the new attribute values | History + current records | Segmenting history very accurately between the old description and the new description
SCD Type 3 | Creating new "current" fields and moving the old attribute into a "Precedent" field | Previous + current records | Describing both the historical and the current view

Conformed Dimensions
- A dimension table that is shared across data marts or by more than one fact table
- Examples: Calendar/Date/Time dimension, Customer dimension, Product dimension

Conformed Dimensions
Customers create orders for products, and the same customers return the products. The orders and returns fact tables are assumed to have different cardinalities (i.e., not every return corresponds exactly to a single order).
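Because orders and returns have different cardinalities, one common way to report on both is to summarize each fact table separately and then line the results up on the shared (conformed) customer dimension. The sketch below assumes made-up customers and quantities; only the key and measure names follow the orders/returns example.

```python
from collections import defaultdict

# Conformed dimension: one customer table shared by both fact tables.
customer_dim = {1: "Alice", 2: "Bob"}

# Hypothetical fact rows; returns do not map one-to-one onto orders.
orders_fact = [
    {"customer_key": 1, "quantity_ordered": 5},
    {"customer_key": 1, "quantity_ordered": 3},
    {"customer_key": 2, "quantity_ordered": 4},
]
returns_fact = [
    {"customer_key": 1, "quantity_returned": 2},
]


def total_by_customer(fact_rows, measure):
    """Summarize one fact table down to the grain of the shared dimension."""
    totals = defaultdict(int)
    for row in fact_rows:
        totals[row["customer_key"]] += row[measure]
    return totals


ordered = total_by_customer(orders_fact, "quantity_ordered")
returned = total_by_customer(returns_fact, "quantity_returned")

# Combine the two summaries through the conformed customer dimension.
report = [
    {"customer": name, "ordered": ordered.get(key, 0), "returned": returned.get(key, 0)}
    for key, name in sorted(customer_dim.items())
]

print(report)
```

Joining the two fact tables row by row on `customer_key` would instead repeat the single return once per order for the same customer and overstate the returned quantity, which is why each fact is aggregated first.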
- Orders FACT: Time_key, Customer_key, Product_key, Quantity_ordered, Extended_amount
- Customer Dimension (shared): Customer_key, Customer attributes ...
- Returns FACT: Time_key, Customer_key, Product_key, Reason_key, Quantity_returned, Refund_amount

Degenerated Dimension
- A degenerate dimension is something dimensional in nature that exists only in the fact table, with no dimension table of its own.
- Orders FACT: Customer_key, Product_key, ORDER_NUM, Quantity_ordered, Extended_amount, Discount_amount
- Customer Dimension: Customer_key, Customer attributes ...
- Product Dimension: Product_key, Product attributes ...

Junk Dimension
A junk dimension is a dimension that collects together a number of indicator flags relating to a transaction.
- Fact table: Customer_key, Product_key, Trade_key, Price per Share, No. of Shares, Commission
- Time Dimension: Time attributes ...
- Product Dimension: Product_key, Product attributes ...
- Trade Dimension (junk dimension): Trade_key, Origin Code, Reinvest Flag, Special Instructions

Junk Dimension
Origin Code | Reinvest Flag | Special Instructions
Phone | Yes | Reinvested
Web | No | No Reinvestment
Phone | No | No Reinvestment
Web | Yes | Reinvested

Fact Tables
Types of measures:
- Additive facts
- Non-additive facts
- Semi-additive facts

Fact Tables
- Additive Facts
Additive facts are facts that can be summed up through all of the dimensions in the fact table.
Example: Dollar value is an additive fact. To find the amount for a particular place over a particular period of time, we can add the dollar amounts and come up with the total.

Fact Tables
- Non-Additive Facts
Non-additive facts are facts that cannot be summed up for any of the dimensions present in the fact table.
Example: Heights measured for citizens by geographical location. When we roll up city data to state level, we should not add the heights of the citizens; rather, we may want to use the rows to derive a count.
Example: Percentage (%).

Fact Tables
- Semi-Additive Facts
Semi-additive facts are facts that can be summed up for some of the dimensions in the fact table, but not the others.
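The three kinds of measures can be compared side by side in a short sketch. The data is invented, and the choice of an account balance as the semi-additive fact is an assumption for illustration (balances can be added across accounts at one point in time, but not across time).

```python
# Additive fact: sales dollars can be summed across every dimension.
sales = [("2024-01", "North", 200), ("2024-01", "South", 100), ("2024-02", "North", 300)]
total_sales = sum(amount for _, _, amount in sales)

# Hypothetical month-end balance snapshots: (month, account, balance).
balances = [
    ("2024-01", "A", 100), ("2024-01", "B", 50),
    ("2024-02", "A", 70), ("2024-02", "B", 50),
]

# Semi-additive fact: balances add up across accounts within one month ...
total_feb = sum(balance for month, _, balance in balances if month == "2024-02")

# ... but not across months; take the latest snapshot instead of summing over time.
account_a = [balance for month, acct, balance in sorted(balances) if acct == "A"]
latest_a = account_a[-1]
misleading_sum_a = sum(account_a)  # 100 + 70 is not a balance anyone ever held

# Non-additive fact: a percentage must be recomputed from its components.
margins = [(200, 50), (100, 10)]  # hypothetical (revenue, profit) per store
overall_margin_pct = 100 * sum(p for _, p in margins) / sum(r for r, _ in margins)
naive_pct_sum = sum(100 * p / r for r, p in margins)  # adding percentages is meaningless

print(total_sales, total_feb, latest_a, misleading_sum_a)
print(overall_margin_pct, naive_pct_sum)
```

Storing the additive components (revenue and profit) in the fact table and deriving the percentage at query time avoids the problem shown in the last two lines.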
Factless Fact Table
- Fact table: Teacher_FK, Course_FK, Student_FK, Location_FK
- Teacher dimension: Teacher_PK
- Location dimension: Location_PK
- Course dimension: Course_PK
- Student dimension: Student_PK
PK = Primary Key, FK = Foreign Key

- Implementation Methodology
- Environments: Development / Test / Production

Project Team
- Project Manager
- Business Analyst. Role: requirement gathering (data modeler + database + domain knowledge)
- Architect. Role: design, solution provider (DW concepts + database + data modeler + good knowledge of ETL + good knowledge of OLAP)
- ETL Lead. Role: team lead (ETL)
- ETL Dev./Cons. Role: source system study specialist
- ETL Dev./Cons. Role: developer, builds ETL programs
- Data Modeler. Role: design the logical and physical model (ER modeling, dimensional modeling)
- OLAP Dev./Cons. Role: developer, builds reports
- Test Lead. Role: team lead (specialist in SQL and the testing process)
- DBA. Role: manage the database (create and manage the database)
- Testers

Example
- Development of any application, e.g. a web site or Lifestyle
- Development process of a data warehouse project
- Test phase, production phase

[Figure: environment layout showing database software and BI server software, each running on its own hardware.]
