
expressor Thought leadership webinar

Semantic Types: Making ETL data mapping simpler and easier to maintain

• Bill Kehoe, Chief Architect


• Wiqar Chaudry, Product Manager

www.expressor-software.com
Bill Kehoe

Bill Kehoe is a founding engineer at expressor and has been a key
developer since the original 1.0 version of the product. He is now a
chief architect providing technical leadership on all aspects of product
engineering.

Previously, Bill was an architect at Blue Agave Software, where he was
the lead developer for the data sub-system of a supply chain
management product. Bill also held a senior architect role at Versata
and was a senior developer and program manager at Sybase. At Sybase,
he architected and led the development of SQL Debug, a client-server
application for debugging SYBASE Transact-SQL stored procedures.

Bill graduated magna cum laude from Tufts University in Civil
Engineering and is a member of the Tau Beta Pi engineering honor
society. He has also done postgraduate work at Harvard University.

copyright © 2011 expressor software corporation


Today’s Agenda

• expressor Studio 3.1 product overview
• Semantic Types
– What are they?
– Sample Application
• Studio Demo
– Multiple data source formats mapped to a common semantic type
• 2011 product roadmap
• Q&A



expressor Studio 3.1
• Free & downloadable
• Integrated visual development studio for ETL applications
• Built-in productivity tools and wizards
– Automatically capture DB connectivity details
– Automatically capture metadata
– Access to standard business rules, e.g. date format conversions
– Automatic type conversion
• Just-in-time error notification and help
• Graphical library system for reusing design assets
• Fast data processing engine
• Semantic Types
– Field name and type standardization capabilities
– More to come in upcoming 3.x releases

www.expressorStudio.com



expressor

Semantic Types



Traditional Point-to-point vs. Canonical Mapping
[Diagram with two panels: "Point-to-point Mapping (traditional ETL)" and "Canonical Mapping (expressor)"; the canonical panel is labeled "Semantic Type"]

Semantic Types improve time-to-value through greater reuse and
simplified data mappings
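
The reuse argument can be made concrete by counting mappings. The sketch below is illustrative arithmetic only, not an expressor feature: it assumes every source must be able to reach every target, and both function names are hypothetical.

```python
def point_to_point_mappings(sources: int, targets: int) -> int:
    """Each source is mapped directly to each target."""
    return sources * targets


def canonical_mappings(sources: int, targets: int) -> int:
    """Each source and each target is mapped once to the shared semantic type."""
    return sources + targets


if __name__ == "__main__":
    for sources, targets in [(3, 1), (5, 4), (12, 6)]:
        print(
            f"{sources} sources, {targets} targets: "
            f"point-to-point={point_to_point_mappings(sources, targets)}, "
            f"canonical={canonical_mappings(sources, targets)}"
        )
```

With a single target the canonical count is not smaller (4 mappings versus 3). The gain in that case is that rules are written once against the semantic type, and the count grows additively rather than multiplicatively as sources and targets are added.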



What is a Semantic Type?

An abstract data interface that:

• Is independent of physical formatting details such as delimiters, character encoding, and field formats (date, time, currency, etc.)
• Defines the logical structure of data, free of positional constraints (i.e. field-order independent)
• Enables rule expression (constraints, data quality, transformations and derivations) independent of external interface “baggage”
• Enables rapid application assembly
• Eases data governance and data lineage tracking
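
A minimal sketch of what independence from delimiters, character encoding, and field order can look like in practice. This is plain Python with the standard csv module, not expressor code; the field names, sample data, and the `has_identifier` rule are hypothetical.

```python
import csv
import io
from typing import Dict, List


def read_records(raw: bytes, encoding: str, delimiter: str) -> List[Dict[str, str]]:
    """Decode a delimited file with a header row into name-keyed records."""
    text = raw.decode(encoding)
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return [dict(row) for row in reader]


def has_identifier(record: Dict[str, str]) -> bool:
    """Hypothetical rule, written once against the logical field name."""
    return bool(record["ItemIdentifier"].strip())


# Same logical content, different delimiter, encoding, and column order.
source_a = "ItemIdentifier,ItemDescription\nA-1,Café supplies\n".encode("utf-8")
source_b = "ItemDescription|ItemIdentifier\nCafé supplies|A-1\n".encode("latin-1")

records_a = read_records(source_a, "utf-8", ",")
records_b = read_records(source_b, "latin-1", "|")
```

Because the records are keyed by field name rather than position, both sources produce equal records and the same rule applies to each without modification.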



Physical vs. Semantic

Physical metadata                      Semantic metadata

Field, Column, [Leaf] Element          [Attribute of] Atomic Semantic Type
CSV Record, Database Table, Array      Composite Semantic Type
[Non-Leaf] Element, Nested Table       Nested Composite Type (i.e. Composite Attribute)
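
The atomic, composite, and nested composite levels can be sketched as a small data model. This is an assumption about how such a hierarchy might be represented, not expressor's internal design; the class names and the Customer/Address example are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class AtomicType:
    """Counterpart of a field, column, or leaf element."""
    name: str
    base: str  # e.g. "String", "Decimal", "Integer"


@dataclass
class CompositeType:
    """Counterpart of a record or table; attributes may themselves be composite."""
    name: str
    attributes: Dict[str, Union[AtomicType, "CompositeType"]] = field(default_factory=dict)


def leaf_paths(composite: CompositeType, prefix: str = "") -> List[str]:
    """List the dotted path of every atomic attribute, descending into nested composites."""
    paths: List[str] = []
    for attr_name, attr_type in composite.attributes.items():
        path = f"{prefix}{attr_name}"
        if isinstance(attr_type, CompositeType):
            paths.extend(leaf_paths(attr_type, prefix=f"{path}."))
        else:
            paths.append(path)
    return paths


# Hypothetical example: a composite type with one nested composite attribute.
address = CompositeType("Address", {
    "City": AtomicType("CityName", "String"),
    "PostalCode": AtomicType("PostalCode", "String"),
})
customer = CompositeType("Customer", {
    "Name": AtomicType("PersonName", "String"),
    "Address": address,
})
```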



Recurring Data Integration challenge

• One logical target schema
• Apply common set of rules to ensure logical data integrity, but …

Input data is messy!
• Dozens of external formats to consume
• Data quality varies depending on data source

Goal: Easily assemble a single, maintainable application that can
consume all sources of data and supports extensions over time



Sample application: Exploration expenses

• Data warehouse for analyzing energy exploration expenses
• Sub-contractors used for site work
• One target data warehouse table
• One set of validation rules
• Multiple, contractor-specific expense data formats

[Diagram labels: Contractor1, Contractor2, Contractor3, Validation Rules, EXPENSE]

EXPENSE
Contract_ID : integer
Item_ID : varchar(20)
Item_Description : varchar(1024)
ExpType : varchar(10)
Amount : decimal(10,2)
StartDepth : integer
EndDepth : integer
ExpenseDate : date
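
The deck does not list the validation rules themselves, so the sketch below checks only what the slide does state: the declared column types of the EXPENSE table. The function name and the row layout are assumptions made for illustration.

```python
from datetime import date
from decimal import Decimal
from typing import List

# Column definitions copied from the EXPENSE table on the slide.
VARCHAR_LIMITS = {"Item_ID": 20, "Item_Description": 1024, "ExpType": 10}
INTEGER_COLUMNS = ("Contract_ID", "StartDepth", "EndDepth")


def column_violations(row: dict) -> List[str]:
    """Return a message for each column whose value does not fit its declared type."""
    problems: List[str] = []
    for col in INTEGER_COLUMNS:
        value = row[col]
        if not isinstance(value, int) or isinstance(value, bool):
            problems.append(f"{col}: expected integer")
    for col, limit in VARCHAR_LIMITS.items():
        value = row[col]
        if not isinstance(value, str) or len(value) > limit:
            problems.append(f"{col}: expected string of at most {limit} characters")
    amount = row["Amount"]
    # decimal(10,2): at most 10 digits in total, 2 of them after the decimal point.
    if (
        not isinstance(amount, Decimal)
        or not amount.is_finite()
        or abs(amount) >= Decimal(10) ** 8
        or amount != amount.quantize(Decimal("0.01"))
    ):
        problems.append("Amount: expected decimal(10,2)")
    if not isinstance(row["ExpenseDate"], date):
        problems.append("ExpenseDate: expected date")
    return problems


# Hypothetical sample row.
good_row = {
    "Contract_ID": 42, "Item_ID": "I-7", "Item_Description": "Drill bit",
    "ExpType": "EQUIP", "Amount": Decimal("9999.99"),
    "StartDepth": 100, "EndDepth": 250, "ExpenseDate": date(2011, 3, 15),
}
```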



Semantic Type example: ContractExpense
Atomic semantic types:
ContractIdentifier : String
ItemIdentifier : String
ItemDescription : String
ExpenseType : String
MonetaryAmount : Decimal
BookDate : Datetime
Depth : Integer

Composite semantic type ContractExpense, with attributes:
ContractIdentifier, ItemIdentifier, ItemDescription, ExpenseType, Amount, ExpenseBookDate, StartDepth, EndDepth









Semantic Type example: ContractExpense
[Diagram: the ContractExpense semantic type and its attributes mapped to two contractor schemas]

Schema: Contractor1 (Contractor 1 Data Format)
ContractId ItemId ItemDescription ExpType Amount Date StartDepth EndDepth

Schema: Contractor2 (Contractor 2 Data Format)
Contract Date ItemId Amount ItemDescription ExpType StartDepth EndDepth

Callout: Different field orders, date and numeric value formats



Semantic Type example: ContractExpense
[Diagram: the ContractExpense semantic type and its attributes mapped to two contractor schemas, with a sample numeric value]

Contractor 1 Data Format
ContractId ItemId ItemDescription ExpType Amount Date StartDepth EndDepth
Sample numeric value: 9.999,99

Schema: Contractor2
Contract Date ItemId Amount ItemDescription ExpType StartDepth EndDepth

Contractor 2 Data Format
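
A rough sketch of how two contractor layouts could be normalized to the ContractExpense attributes. The field orders come from the slides; everything else is an assumption made for illustration: the name mapping, which contractor uses the 9.999,99 style, the date formats, the sample rows, and the function names. It is not expressor's mapping mechanism.

```python
from datetime import date, datetime
from decimal import Decimal
from typing import Dict, List

# Field orders copied from the two contractor schemas on the slides.
CONTRACTOR1_FIELDS = ["ContractId", "ItemId", "ItemDescription", "ExpType",
                      "Amount", "Date", "StartDepth", "EndDepth"]
CONTRACTOR2_FIELDS = ["Contract", "Date", "ItemId", "Amount",
                      "ItemDescription", "ExpType", "StartDepth", "EndDepth"]

# Assumed mapping from physical field names to ContractExpense attribute names.
TO_SEMANTIC: Dict[str, str] = {
    "ContractId": "ContractIdentifier", "Contract": "ContractIdentifier",
    "ItemId": "ItemIdentifier", "ItemDescription": "ItemDescription",
    "ExpType": "ExpenseType", "Amount": "Amount", "Date": "ExpenseBookDate",
    "StartDepth": "StartDepth", "EndDepth": "EndDepth",
}


def parse_amount(text: str, decimal_mark: str) -> Decimal:
    """Parse '9.999,99' (decimal_mark=',') or '9,999.99' (decimal_mark='.')."""
    group_mark = "." if decimal_mark == "," else ","
    return Decimal(text.replace(group_mark, "").replace(decimal_mark, "."))


def parse_date(text: str, date_format: str) -> date:
    """Parse a date string using a strptime-style format."""
    return datetime.strptime(text, date_format).date()


def to_contract_expense(values: List[str], fields: List[str],
                        decimal_mark: str, date_format: str) -> dict:
    """Turn one positional physical row into a name-keyed, typed logical record."""
    record: dict = {TO_SEMANTIC[name]: value for name, value in zip(fields, values)}
    record["Amount"] = parse_amount(record["Amount"], decimal_mark)
    record["ExpenseBookDate"] = parse_date(record["ExpenseBookDate"], date_format)
    record["StartDepth"] = int(record["StartDepth"])
    record["EndDepth"] = int(record["EndDepth"])
    return record


# Hypothetical sample rows carrying the same logical expense.
row1 = ["C-100", "I-7", "Drill bit", "EQUIP", "9.999,99", "15.03.2011", "100", "250"]
row2 = ["C-100", "2011-03-15", "I-7", "9,999.99", "Drill bit", "EQUIP", "100", "250"]

expense1 = to_contract_expense(row1, CONTRACTOR1_FIELDS, ",", "%d.%m.%Y")
expense2 = to_contract_expense(row2, CONTRACTOR2_FIELDS, ".", "%Y-%m-%d")
```

Once both rows are in the logical form, they compare equal, so any downstream rule can ignore where the data came from.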



Semantic Type example: ContractExpense

• Key advantages:
– Same dataflow accepts data from any number of different external formats
– Trivial to add support for new source and target formats (files, databases, spreadsheets)
– Rule logic is completely insulated from the chaos of physical/external data storage representations

[Diagram: physical → semantic → physical]
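
The physical → semantic → physical shape can be sketched as one dataflow function that takes any number of readers, one rule, and one writer. This is a simplified illustration in plain Python, not the Studio dataflow model; the depth rule, the sample data, and all function names are hypothetical.

```python
import csv
import io
import json
from typing import Callable, Dict, Iterable, Iterator, List

RENAME = {"ItemId": "ItemIdentifier", "StartDepth": "StartDepth", "EndDepth": "EndDepth"}


def read_delimited(text: str, delimiter: str) -> Iterator[Dict[str, str]]:
    """Physical to semantic: yield records keyed by semantic attribute name."""
    for row in csv.DictReader(io.StringIO(text), delimiter=delimiter):
        yield {RENAME[name]: value for name, value in row.items()}


def depth_rule(record: Dict[str, str]) -> bool:
    """Hypothetical rule, written once against semantic attribute names."""
    return int(record["StartDepth"]) <= int(record["EndDepth"])


def write_json_lines(records: List[Dict[str, str]]) -> str:
    """Semantic to physical: one JSON object per line."""
    return "\n".join(json.dumps(record, sort_keys=True) for record in records)


def run_dataflow(sources: Iterable[Iterable[Dict[str, str]]],
                 rule: Callable[[Dict[str, str]], bool],
                 writer: Callable[[List[Dict[str, str]]], str]) -> str:
    """Apply one rule to records from every source, then hand them to one writer."""
    accepted = [record for records in sources for record in records if rule(record)]
    return writer(accepted)


# Two hypothetical sources with different delimiters and column orders.
source1 = "ItemId,StartDepth,EndDepth\nI-1,100,250\nI-2,300,200\n"
source2 = "EndDepth|ItemId|StartDepth\n400|I-3|350\n"

output = run_dataflow(
    [read_delimited(source1, ","), read_delimited(source2, "|")],
    depth_rule,
    write_json_lines,
)
```

Adding a new source format means adding a reader; the rule and the writer stay unchanged.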



expressor Studio – Multiple data source formats
mapped to a common Semantic Type

Demo

On demand Webinar Link:

http://bit.ly/fW27F7



Semantic Types white paper

Download our Semantic Types white paper

www.expressor-software.com/semantic-types



Thank You!

Questions?




info@expressor-software.com

www.expressor-software.com