
Apply the power of Azure Data Lake Analytics

to your BI Landscape
Tillmann Eitelberg
Oliver Engels
Our Sponsors
Oliver Engels
CEO, oh22data AG

/oengels @oengels oengels

Tillmann Eitelberg
CEO, oh22information services GmbH

/tillmanneitelberg @_Tillmann tillmanneitelberg


Agenda

• Why Big Data, and why Azure Data Lake Analytics?
• Ingress and egress of data loads
• Create federated queries for integration into your IT landscape
http://scet.berkeley.edu/wp-content/uploads/Big-data.jpg
My problem?
I am a #sqlfamily guy!
I love my SQL Server!
I never touched Java, only for good coffee!
I want T-SQL!
I don't want to be a zookeeper!
I just want to jumpstart into big data jobs!
However, big data is hard…

Top 3 Challenges To Adopting Big Data


Current Big Data promises and real life

Current State Of Technology

Hadoop promises to deliver on the need, and HAS:
• a large and vibrant community
• adoption growth
• lots of innovation
• a rich eco-system

But currently, Hadoop IS NOT:
• cheap to operationalize
• easy to use
• bi-lingual – it is exclusive to Java
• enterprise friendly
Azure Data Lake Analytics: Decision tree
My problem?
I want to do big data when I have a real big data problem!
Everything else I do with my favourite SQL Server!
If I do big data, it will never stand alone, I need to integrate!
Azure Data Lake
as part of Cortana Analytics Suite
[Diagram: the Cortana Analytics Suite pipeline – DATA → INTELLIGENCE → ACTION]
• Information Management: Azure Data Factory, Azure Data Catalog, Azure Event Hub
• Big Data Stores: Azure SQL Data Warehouse, Azure Data Lake Store
• Machine Learning and Analytics: Azure Machine Learning, Azure Stream Analytics, Azure HDInsight (Hadoop), Azure Data Lake Analytics
• Intelligence: Cortana (personal digital assistant), perceptual intelligence (face, vision, speech, text)
• Dashboards and Visualizations: Power BI
• Inputs: business apps, custom apps, sensors and devices, systems
• Business scenarios: recommendations, automated customer churn, forecasting, etc.


Traditional business analytics process
1. Start with end-user requirements to identify desired reports and analysis
2. Define the corresponding database schema and queries
3. Identify the required data sources
4. Create an Extract-Transform-Load (ETL) pipeline to extract the required data (curation) and transform it to the target schema ('schema-on-write')
5. Create reports. Analyze data

[Diagram: LOB applications → ETL pipeline built with dedicated ETL tools (e.g. SSIS) → defined schema → relational queries → results]

All data not immediately required is discarded or archived

New big data thinking: All data has value
All data has potential value
Data hoarding
No defined schema – data is stored in its native format
Schema is imposed and transformations are done at query time (schema-on-read).
Apps and users interpret the data as they see fit

Gather data from all sources → Store indefinitely → Analyze → See results → Iterate
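Later in this deck the query language is U-SQL, where schema-on-read looks like this: a minimal sketch (file path and column names are hypothetical) in which the EXTRACT clause, not the store, defines the schema.

// Schema-on-read: the schema lives in the query, not in the store.
// File path and column names are hypothetical.
@raw =
    EXTRACT Country string,
            Sales decimal
    FROM "/rawdata/sales.csv"
    USING Extractors.Csv();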
Why data lakes?

"A data lake is a massive, easily accessible, centralized repository of large volumes of structured and unstructured data."
http://www.pwc.com/us/en/technology-forecast/2014/cloud-computing
ADL Store: Ingress
Data can be ingested into Azure Data Lake Store from a variety of sources:

• Sources: server logs, Azure SQL DB, Azure SQL DW, Azure tables / Table Storage, Azure Storage Blobs, Azure Event Hub, on-premises databases
• Tools: Apache Flume, Apache Sqoop, the built-in copy service, Azure Data Factory, the .NET SDK, the JavaScript CLI, the Azure Portal, Azure PowerShell, custom programs

ADL Store: Egress
Data can be exported from Azure Data Lake Store into numerous targets/sinks:

• Targets: Azure SQL DB, Azure SQL DW, Azure Storage Blobs, Azure tables / Table Storage, on-premises databases
• Tools: Apache Sqoop, the built-in copy service, Azure Data Factory, the .NET SDK, the JavaScript CLI, the Azure Portal, Azure PowerShell, custom programs


App Development – Languages and Tools
Azure Data Lake Store supports multiple languages for application development:

• Java developers: WebHDFS
• C++ developers: LibWebHDFS
• .NET developers: Azure .NET SDK
• Other languages: x-plat SDK

Note: If you are using Hadoop (MapReduce programs, Hive, HBase) or Spark, you will not be programming directly against Azure Data Lake Store, as they all transparently access it under the covers.
Developing scripting applications
• Azure PowerShell cmdlets and the JavaScript CLI provide a native Windows and cross-platform (Mac, Linux) scripting experience for the Azure Data Lake Store
• Scripting operations include:
  • Create new directories
  • List the contents of a directory
  • Upload files to a directory
  • Delete files/directories
  • Rename files/directories
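A hedged sketch of those operations with the AzureRM PowerShell cmdlets of the time (account name and paths are hypothetical; parameter names per the AzureRM.DataLakeStore module):

# Scripting the ADL Store (account name and paths are hypothetical).
$account = "adlsoh22"

New-AzureRmDataLakeStoreItem -Account $account -Path "/marketresearch" -Folder        # create a directory
Get-AzureRmDataLakeStoreChildItem -Account $account -Path "/"                         # list directory contents
Import-AzureRmDataLakeStoreItem -Account $account -Path "C:\data\mr01.csv" `
    -Destination "/marketresearch/mr01.csv"                                           # upload a file
Move-AzureRmDataLakeStoreItem -Account $account -Path "/marketresearch/mr01.csv" `
    -Destination "/marketresearch/mr01_v1.csv"                                        # rename
Remove-AzureRmDataLakeStoreItem -Account $account -Paths "/marketresearch/mr01_v1.csv" # delete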

Federated queries:
Query data where it lives
Easily query data in multiple Azure data stores without moving it to a single store

Benefits
• Avoid moving large amounts of data across the Azure network between stores
• Single view of data irrespective of physical location
• Minimize data proliferation issues caused by maintaining multiple copies
• Single query language for all data
• Each data store maintains its own sovereignty
• Design choices based on the need

[Diagram: a U-SQL query in Azure Data Lake Analytics spans Azure Storage Blobs, Azure SQL DB, and SQL Server in Azure VMs, returning a single query result]
Combining RowSets
U-SQL provides a number of operators to combine RowSets:

• Joins: INNER JOIN, LEFT OUTER JOIN, RIGHT OUTER JOIN, FULL OUTER JOIN, CROSS JOIN, LEFT SEMI JOIN, RIGHT SEMI JOIN
• Set operators: UNION ALL, UNION DISTINCT, INTERSECT ALL, INTERSECT DISTINCT, EXCEPT ALL, EXCEPT DISTINCT
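A minimal sketch with two hypothetical rowsets; note that U-SQL expressions use C# syntax, so the join comparison is == rather than =:

// Two hypothetical rowsets @orders and @customers, combined two ways.
@joined =
    SELECT o.CustomerId, o.Amount, c.Country
    FROM @orders AS o
         INNER JOIN @customers AS c ON o.CustomerId == c.CustomerId;

@ids =
    SELECT CustomerId FROM @orders
    UNION ALL
    SELECT CustomerId FROM @customers;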
How do I integrate? Let's do a sample

OK, I have:
• Transactional data in my Azure SQL DB (PaaS)
• Master data on my SQL Server 2016 (IaaS)
• and a bunch of files in my Azure Blob Storage!
How do I integrate? Let's do a sample

• Master data management on SQL Server 2016 (IaaS)
• Transactional data in Azure SQL DB (PaaS)
• Market research data as CSV files on-prem

Merge internal and market research data, standardize it with MDS, and analyse it in SQL Server or Power BI, maybe?
How do I integrate? Let's do a sample
• 3 data sources from different systems
• For the CSV files we use the built-in ADLA extractors, which can read files stored on the Azure Data Lake and on Azure Blob Storage
• For Azure SQL DB and for SQL Server 2016 (IaaS) we use federation, an ADLA feature that lets you link relational sources into your queries
• The query result will be output to the Azure Data Lake or to Azure Blob Storage
How do I integrate? Let's do a sample
• CSV files:
  • We have > 100 files from market research
  • We upload them to Azure Blob Storage using the upload task from the Azure Feature Pack for SSIS
    https://msdn.microsoft.com/de-de/library/mt146770(v=sql.120).aspx
  • We connect Azure Blob Storage as a data source to Azure Data Lake Analytics
  • Now we have access to storage that we can use from on-prem and from ADLA = easy!
  • ADLS is currently not supported by Microsoft's SSIS components… but…
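The built-in extractor mentioned above can read all of those >100 files with a single EXTRACT by using a file set pattern; a sketch assuming a hypothetical wasb:// path and column layout:

// Read >100 market research CSVs from Blob Storage in one EXTRACT.
// Container, path and schema are hypothetical; {FileName} is a virtual
// column that records which file each row came from.
@mr =
    EXTRACT Country string,
            Ttype string,
            SValue decimal,
            FileName string
    FROM "wasb://research@oh22storage.blob.core.windows.net/{FileName}.csv"
    USING Extractors.Csv();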
DEMO
Loading Files with SSIS
Azure Feature Pack
How do I integrate? Let's do a sample

• Federation step by step:
  1. Create a secret in your ADLA database for each federated system with the help of PowerShell
  2. Create a credential in your ADLA database for each federated system with the help of U-SQL
  3. Create a data source in your ADLA database for each federated system with the help of U-SQL
DEMO
Creating Federated Systems
Visual Studio & U-SQL
How do I integrate? Let's do a sample

• Create secrets for your PaaS and IaaS databases in a database (here: master) of your ADLA account
• Check that your network settings allow access to the DBs from outside
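A sketch of step 1 for the PaaS database (account, server and secret names are hypothetical; the user name of the PSCredential becomes the secret's name):

# Store the SQL login's password as a secret in the ADLA catalog (master database).
# Account, host and secret name ("SalesDbSecret") are hypothetical.
New-AzureRmDataLakeAnalyticsCatalogSecret -AccountName "adlaoh22" `
    -DatabaseName "master" `
    -Secret (Get-Credential -UserName "SalesDbSecret" -Message "Password of the SQL login") `
    -Host "oh22sales.database.windows.net" -Port 1433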
How do I integrate? Let's do a sample

• Create the credential with an existing user name (SQL Authentication) and, as its identity, the secret you created in step one with PowerShell
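In the 2016-era U-SQL syntax, step 2 might look like this (credential, login and secret names are hypothetical):

// Run against the ADLA catalog: bind a SQL Authentication login to the secret.
USE DATABASE master;

CREATE CREDENTIAL IF NOT EXISTS SalesDbCred
WITH USER_NAME = "sqlreader",     // existing SQL Authentication user
     IDENTITY = "SalesDbSecret";  // name of the secret created in step one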
How do I integrate? Let's do a sample

• Create data sources of type AZURESQLDB and SQLSERVER, based on your previously defined credentials and a provider string with the name of the database you want to connect to
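A sketch of both data sources (all names are hypothetical; the server and port live in the secret, the database name in the provider string):

USE DATABASE master;

// PaaS: Azure SQL Database with the transactional data
CREATE DATA SOURCE IF NOT EXISTS SALES
FROM AZURESQLDB
WITH (
    PROVIDER_STRING = "Database=SalesDB;Trusted_Connection=False;Encrypt=True",
    CREDENTIAL = SalesDbCred,
    REMOTABLE_TYPES = (bool, short, int, long, decimal, float, double, string, DateTime)
);

// IaaS: SQL Server 2016 in an Azure VM with the master data
CREATE DATA SOURCE IF NOT EXISTS COUNTRY_MASTER
FROM SQLSERVER
WITH (
    PROVIDER_STRING = "Database=TransformationDB;Trusted_Connection=False;Encrypt=True",
    CREDENTIAL = MdsDbCred,
    REMOTABLE_TYPES = (bool, short, int, long, decimal, float, double, string, DateTime)
);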
How do I integrate? Let's do a sample

• Federated queries:

// Get data from the Azure SQL Database
@sales_internal =
    SELECT Countryiso,
           Market,
           Product,
           SYear,
           SMonth,
           Sales,
           Units
    FROM EXTERNAL SALES EXECUTE @"
        SELECT [COUNTRYISO] AS Countryiso
              ,[MARKET] AS Market
              ,[PRODUCT] AS Product
              ,[SMONTH] AS SMonth
              ,[SYEAR] AS SYear
              ,[SALES] AS Sales
              ,[UNITS] AS Units
        FROM [dbo].[Sales]";

// Get data from the SQL Server 2016 IaaS database on Azure
@country_mds =
    SELECT *
    FROM EXTERNAL COUNTRY_MASTER EXECUTE @"
        SELECT [Code]
              ,[Name] AS Country
              ,[ISO2] AS Countryiso
              ,[Capital]
              ,[Area]
              ,[Population]
        FROM [TransformationDB].[dbo].[vw_GetCountryMaster]";
How do I integrate? Let's do a sample

• Federated queries (continued):

// Calculate sales: join the unpivoted external data to the country master
@sales_external =
    SELECT C.Countryiso,
           S.Market,
           S.Product,
           S.SYear,
           S.SMonth,
           SUM((Ttype == "SALES") ? SValue : 0) AS Sales,
           SUM((Ttype == "UNITS") ? SValue : 0) AS Units
    FROM @sales_external_unp AS S
         INNER JOIN @country_mds AS C ON S.Country == C.Country
    GROUP BY C.Countryiso, S.Market, S.Product, S.SYear, S.SMonth;

// Union the two streams from internal sales and external market research
@sales =
    SELECT Countryiso, Market, Product, SYear, SMonth, Sales, Units,
           "External" AS Source
    FROM @sales_external
    UNION
    SELECT Countryiso, Market, Product,
           Convert.ToInt32(SYear) AS SYear,
           Convert.ToInt32(SMonth) AS SMonth,
           Sales, Units,
           "Internal" AS Source
    FROM @sales_internal;
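The deck doesn't show the final step, but as noted later the result has to land in the Azure Data Lake or in Blob Storage; a minimal sketch with a hypothetical path:

// Write the merged rowset out; a relative path goes to the default ADLS
// account, a wasb:// URI would target Azure Blob Storage instead.
OUTPUT @sales
TO "/output/marketsales.csv"
USING Outputters.Csv();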
How do I integrate?

• Simple data transformation flow
• You can do this in Hadoop or SSIS as well!
• But:
  • Can you simply use your SQL skills?
  • Can you do this as a service?
  • Can you do this with 1 TB of data, automatically scaled?
How do I integrate? Full Circle

• Using Azure Data Factory (ADF) you can:
  • Load files to Blob Storage via the Data Management Gateway
  • Run U-SQL scripts to transform data
  • Orchestrate your transformation process in the cloud
How do I integrate? Full Circle

https://azure.microsoft.com/en-us/documentation/articles/data-factory-data-movement-activities/#supported-data-stores-and-formats
How do I integrate? Access ADLA results

• Azure Data Lake Analytics results can only be output to:
  • Azure Data Lake Store
  • Azure Blob Storage
• How can you integrate this into your on-prem world without downloading the data?
• POLYBASE!
• Build an external table with PolyBase to access the data
• Currently works only with Azure Blob Storage – not ADLS
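Roughly what the demo builds: a hedged T-SQL sketch for SQL Server 2016 with PolyBase installed and configured (all object names, the container and the storage key are hypothetical, and a database master key must already exist):

-- Credential holding the storage account access key (IDENTITY is arbitrary here).
CREATE DATABASE SCOPED CREDENTIAL BlobStorageCred
WITH IDENTITY = 'user', SECRET = '<storage account access key>';

-- External data source pointing at the blob container ADLA wrote to.
CREATE EXTERNAL DATA SOURCE AzureBlob
WITH (TYPE = HADOOP,
      LOCATION = 'wasbs://research@oh22storage.blob.core.windows.net',
      CREDENTIAL = BlobStorageCred);

CREATE EXTERNAL FILE FORMAT CsvFormat
WITH (FORMAT_TYPE = DELIMITEDTEXT,
      FORMAT_OPTIONS (FIELD_TERMINATOR = ','));

-- External table over the U-SQL output file.
CREATE EXTERNAL TABLE dbo.MarketSales (
    Countryiso NVARCHAR(2),
    Market     NVARCHAR(50),
    Product    NVARCHAR(50),
    SYear      INT,
    SMonth     INT,
    Sales      DECIMAL(18, 2),
    Units      DECIMAL(18, 2),
    Source     NVARCHAR(10)
) WITH (LOCATION = '/output/marketsales.csv',
        DATA_SOURCE = AzureBlob,
        FILE_FORMAT = CsvFormat);

-- Query it like any local table:
SELECT TOP 10 * FROM dbo.MarketSales;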
DEMO
Polybase to Access Azure
Data Lake Analytics results
How do I integrate? Trigger ADLA jobs

• Visual Studio
• ADF
• Azure Portal
• .NET (SDKs)
• PowerShell:

# Azure Data Lake Analytics job execution via PowerShell
$ADLA_Account = "adlaoh22"
$usql = "C:\Users\oengels\OneDrive\PASS\Summit2016\FederatedQuery.usql"

Submit-AzureRmDataLakeAnalyticsJob -Name "PS_FederatedQueryMarketSales" `
    -AccountName $ADLA_Account -ScriptPath $usql
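Submit-AzureRmDataLakeAnalyticsJob returns as soon as the job is accepted; a sketch (cmdlet names per the AzureRM module of the time) of capturing the job object and waiting for the result:

# Capture the job object, block until it finishes, then re-read its state.
$job = Submit-AzureRmDataLakeAnalyticsJob -Name "PS_FederatedQueryMarketSales" `
    -AccountName $ADLA_Account -ScriptPath $usql

Wait-AzureRmDataLakeAnalyticsJob -AccountName $ADLA_Account -JobId $job.JobId
Get-AzureRmDataLakeAnalyticsJob -AccountName $ADLA_Account -JobId $job.JobId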
How do I integrate?

Woohoo! That is nice technology, I will start today with the preview!
Thank You!
