LATAM AI+ Tour

aka.ms/latamai
Agenda
• Best Practices
• Predictions/trends
• AI on Azure
Not Best Practices – Canonical Cavestatistician

kill data
Best Practices – working with a team
• Working as a team means working together
• Pick a framework and use it
• CRISP-DM
• Microsoft Team Data Science Process
Best Practices - Frameworks
Best Practices – Source Control
• Key for collaboration
• The company is paying for these assets; make sure they're usable
• Not always a normal skill for a statistician
• Some options
• Git
• Visual Studio Team Services / Team Foundation Server -> Azure DevOps
Best Practices – Org Chart
• Should team include Data Engineering?
• Where should Analytics reside:
• IT: bias towards hardware or DevOps?
• LOB: multiple Analytic groups which all need to be linked by a community of practice (CoP)
• Analytics: need close relationship with the business for
domain expertise
Best Practices – building a team
• Talent level
• Headcount cost
• Level of resource needed
• Grow your own?
Best Practices – building a team
• Backgrounds
• all analytics/stats?
• size of team
• Roles
• business analysts, data engineers, developers, architects, machine learning
engineers, DevOps specialists, compliance specialists, security professionals
• Strengths
• Sales
• Consulting
Best Practices – building a team
• Languages/tools
• Data Engineering
• Front end/consumption
Best Practices – leveraging data
Best Practices – leveraging data (data protection)
• PII (personally identifiable information)
• GDPR
• bias blog post

https://en.wikipedia.org/wiki/List_of_data_breaches
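One common way to limit PII exposure in analytics data sets is to replace direct identifiers with keyed pseudonyms before the data reaches the modeling environment. The sketch below is only an illustration under stated assumptions: the field names, the `SECRET_KEY` value, and the helper functions are hypothetical, and pseudonymized data is generally still treated as personal data under GDPR.

```python
import hashlib
import hmac

# Hypothetical key for illustration; a real key would live in a secrets store.
SECRET_KEY = b"replace-with-a-managed-secret"


def pseudonymize(value: str, key: bytes = SECRET_KEY) -> str:
    """Return a keyed SHA-256 digest that stands in for a PII value."""
    normalized = value.strip().lower()
    return hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def protect_record(record: dict, pii_fields: set) -> dict:
    """Copy a record, replacing the listed PII fields with pseudonyms."""
    return {
        name: pseudonymize(str(val)) if name in pii_fields else val
        for name, val in record.items()
    }


if __name__ == "__main__":
    row = {"email": "Ana@example.com", "country": "BR", "spend": 120.5}
    print(protect_record(row, {"email"}))
```

Because the digest is keyed and deterministic, the same customer still joins across tables, while the raw identifier stays out of the analytics copy.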
Best Practices – prepping data
Before we begin, remember:

Always keep your raw data, as we can sometimes lose information in processing
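A minimal sketch of this rule, assuming a hypothetical layout with separate `raw` and `processed` folders and a CSV file with an `amount` column: the cleaning step only ever writes a new file, so rows dropped during processing can still be recovered from the raw copy.

```python
import csv
import tempfile
from pathlib import Path


def clean_rows(raw_path: Path, processed_dir: Path) -> Path:
    """Write a cleaned copy of raw_path into processed_dir; the raw file is only read."""
    processed_dir.mkdir(parents=True, exist_ok=True)
    out_path = processed_dir / raw_path.name
    with raw_path.open(newline="") as src, out_path.open("w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            # Lossy step: rows with a missing amount are left out of the
            # processed copy, but they remain available in the raw file.
            if row["amount"].strip():
                writer.writerow(row)
    return out_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        raw = Path(tmp) / "raw" / "sales.csv"
        raw.parent.mkdir()
        raw.write_text("id,amount\n1,10\n2,\n3,7\n")
        processed = clean_rows(raw, Path(tmp) / "processed")
        print(raw.read_text())
        print(processed.read_text())
```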
Best Practices – prepping data
Best Practices - ETL
• Tactical / Enterprise
• Schema on read?
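To make the schema-on-read question concrete, here is a small sketch: raw events are stored untouched, and types and defaults are applied only when the data is read. The event fields and the `SCHEMA` mapping are assumptions made up for illustration.

```python
import json
from datetime import date

# Raw events are stored as-is (here: JSON lines with inconsistent fields).
RAW_EVENTS = [
    '{"id": "1", "ts": "2018-10-01", "amount": "19.90"}',
    '{"id": "2", "ts": "2018-10-02", "amount": "5", "channel": "web"}',
]

# Hypothetical read-time schema: field name -> (converter, default value).
SCHEMA = {
    "id": (int, None),
    "ts": (date.fromisoformat, None),
    "amount": (float, 0.0),
    "channel": (str, "unknown"),
}


def read_with_schema(lines, schema):
    """Parse raw lines and apply types and defaults only at read time."""
    for line in lines:
        raw = json.loads(line)
        yield {
            field: convert(raw[field]) if field in raw else default
            for field, (convert, default) in schema.items()
        }


if __name__ == "__main__":
    for event in read_with_schema(RAW_EVENTS, SCHEMA):
        print(event)
```

A different consumer can read the same raw lines with a different schema, which is the main trade-off against enforcing one schema at write time.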
Best Practices – Modeling Environment
• In-database/In-Spark/In-place
• Efficient queries
• Laptops and local server
• Cloud
Best Practices - Modeling
• Standard predictive analytics
• simple/explainable vs complex
• big data vs sampling
• hyper parameter tuning/learning
• model management/retraining
• AI
• pretrained model usage
• MaaS
• Dev talent
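For the "big data vs sampling" choice, one option when the data does not fit in memory is to draw a fixed-size uniform sample in a single pass. The sketch below uses reservoir sampling (Algorithm R); it is one possible technique, not something the slides prescribe, and the function name is an assumption.

```python
import random


def reservoir_sample(stream, k, seed=None):
    """Return k items drawn uniformly at random from an iterable of unknown length."""
    rng = random.Random(seed)
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            sample.append(item)
        else:
            # Keep the new item with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = item
    return sample


if __name__ == "__main__":
    print(reservoir_sample(range(1_000_000), 5, seed=42))
```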
Best Practices – Showing ROI
• Just because you can doesn't mean you should; create an Analytic Roadmap!

• Think about:
(1) Value
(2) Difficulty
(3) Time
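One way to turn value, difficulty, and time into a roadmap ordering is a simple score per candidate project. The formula, the 1 to 5 scales, and the example projects below are all hypothetical; any real roadmap would use estimates agreed with the business.

```python
from dataclasses import dataclass


@dataclass
class Project:
    name: str
    value: float       # estimated business value, 1 (low) to 5 (high)
    difficulty: float  # 1 (easy) to 5 (hard)
    months: float      # estimated time to deliver


def priority(p: Project) -> float:
    """Hypothetical score: value per unit of difficulty and time."""
    return p.value / (p.difficulty * p.months)


def roadmap(projects):
    """Order candidate projects from highest to lowest priority score."""
    return sorted(projects, key=priority, reverse=True)


if __name__ == "__main__":
    candidates = [
        Project("churn model", value=4, difficulty=2, months=3),
        Project("image search", value=5, difficulty=5, months=12),
        Project("dashboard refresh", value=2, difficulty=1, months=1),
    ]
    for p in roadmap(candidates):
        print(f"{p.name}: {priority(p):.2f}")
```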
Best Practices - Operationalization
• The best models are meaningless if you don't
do something with them
• Model scores can be used directly, but think
about decisioning (where we tie in with rules
engines or optimization)
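A minimal sketch of decisioning on top of a model score: business rules can override or refine what the raw score suggests. The churn scenario, field names, thresholds, and action labels are assumptions for illustration only.

```python
def decide(score: float, customer: dict) -> str:
    """Combine a churn score with simple business rules (all hypothetical)."""
    # Rules that override the model regardless of the score.
    if customer.get("opted_out"):
        return "no_contact"
    if customer.get("open_complaint"):
        return "route_to_support"
    # Score-based tiers; the thresholds are assumptions, not tuned values.
    if score >= 0.8 and customer.get("annual_value", 0) >= 1000:
        return "retention_call"
    if score >= 0.5:
        return "discount_email"
    return "no_action"


if __name__ == "__main__":
    print(decide(0.91, {"annual_value": 4200}))
    print(decide(0.91, {"annual_value": 4200, "opted_out": True}))
    print(decide(0.55, {"annual_value": 300}))
```

The same score leads to different actions depending on the rules, which is why the score alone is rarely the final output of an operationalized model.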
Agenda
• Best Practices
• Predictions/trends
• AI on Azure
Predictions/trends - AI
Not everything is AI, BUT it is truly disruptive

Better ways to deal with unstructured data benefit all analytic functions
AI has great potential, but also risk

https://www.telegraph.co.uk/technology/2016/03/24/microsofts-teen-girl-ai-turns-into-a-hitler-loving-sex-robot-wit/
AI has great potential, but also risk

https://www.cnet.com/news/what-happens-when-ai-bots-invent-their-own-language/
AI has great potential, but also risk

https://www.propublica.org/article/facebook-enabled-advertisers-to-reach-jew-haters
AI has great potential, but also risk

https://www.reuters.com/article/us-amazon-com-jobs-automation-insight/amazon-scraps-secret-ai-recruiting-tool-that-showed-bias-against-women-idUSKCN1MK08G
AI has great potential, but also risk

https://www.techrepublic.com/article/google-home-mini-spied-on-user-thousands-of-times-a-day-sent-recordings-to-google/
Predictions/trends – Open Source
The battle has been won;
the empire is still growing

• Previously we saw R and Python
• Devs are also taking up analytics and bringing additional languages
• Particularly around deep learning
http://www.asimovinstitute.org/neural-network-zoo/
Predictions/trends – Additional techniques
• Online learning
• Transfer learning
• Incremental learning
• Reinforcement learning
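As an illustration of online (incremental) learning, the sketch below updates a linear model one example at a time with stochastic gradient descent, so the full data set never has to be held in memory. The class name, learning rate, and synthetic stream are assumptions; libraries offer more robust versions of the same idea.

```python
import random


class OnlineLinearRegression:
    """Linear model trained one example at a time with SGD on squared error."""

    def __init__(self, n_features, learning_rate=0.01):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict(self, x):
        return self.bias + sum(w * xi for w, xi in zip(self.weights, x))

    def learn_one(self, x, y):
        """Update the model with a single (x, y) pair and return the prediction error."""
        error = self.predict(x) - y
        self.weights = [
            w - self.learning_rate * error * xi for w, xi in zip(self.weights, x)
        ]
        self.bias -= self.learning_rate * error
        return error


if __name__ == "__main__":
    rng = random.Random(0)
    model = OnlineLinearRegression(n_features=1, learning_rate=0.1)
    # Synthetic stream following y = 2x + 1; each example is seen exactly once.
    for _ in range(5000):
        x = rng.random()
        model.learn_one([x], 2 * x + 1)
    print(model.weights, model.bias)
```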
Predictions/trends - Data
• Aggregators will continue to grow and more companies will begin to monetize data
• Big Other: Surveillance Capitalism and the Prospects of an Information Civilization by Shoshana Zuboff
https://mashable.com/2017/03/23/senate-voted-to-let-internet-providers-collect-and-sell-your-data/
Predictions/trends – Compute
Hardware arms race
• CPU / GPU / FPGA / ASIC / Edge device

Usage options
• Auto-scale / Serverless / PaaS

Microsoft Research Project: https://aka.ms/project-brainwave


Predictions/trends - Bookends

Modeling
Predictions/trends - Automation
• Tracking changes to input stream
• Model management
• Push inference to event

Just remember: lights-out processes can be risky
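Tracking changes to the input stream can be sketched with a drift metric that compares recent inputs against the data the model was trained on. The example below uses the Population Stability Index (PSI); the choice of metric, the bin count, and the 0.25 alert level (a commonly quoted rule of thumb) are assumptions rather than anything the slides specify.

```python
import math


def psi(expected, actual, bins=10):
    """Population Stability Index between a reference sample and a recent sample."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0

    def fractions(values):
        counts = [0] * bins
        for v in values:
            # Values outside the reference range fall into the first or last bin.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # A small floor avoids taking log(0) for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = fractions(expected), fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


if __name__ == "__main__":
    training = [i / 100 for i in range(100)]
    recent = [0.5 + i / 200 for i in range(100)]  # shifted toward higher values
    score = psi(training, recent)
    print(f"PSI = {score:.2f}", "-> review model" if score > 0.25 else "-> ok")
```

A check like this can raise an alert for a person to review, which keeps a human in the loop instead of retraining fully unattended.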
Predictions/trends – Meta-analysis
• Which algorithms work best, without trial and error?
• Which hyperparameters should we start with?
• How do we minimize the gap between a model degrading and our response?
ORG CHART
ANALYTICS
Agenda
• Best Practices
• Predictions/trends
• AI on Azure
Software

Custom
Pre-built
Training |
Deployment

Hardware
AI
Azure Conversational
Agents
Bots
Search
On-premises Edge
Architectural Patterns
The Azure Data/Analytics Landscape

Services shown in the landscape diagram:
• Azure Analysis Services, Azure Data Factory, Azure Import/Export Service, Azure SQL DB, Azure Cosmos DB, Azure SQL Data Warehouse, Power BI
• Azure CLI, Azure SDK
• Azure Data Lake Store, Azure Storage Blobs, Azure Data Lake Analytics, Azure HDInsight, Azure Databricks, Azure ML, ML Server
• Azure IoT Hub, Azure Event Hubs
• Azure Search, Azure Stream Analytics, Azure Data Catalog, Kafka on Azure HDInsight, Azure HDInsight, Azure Databricks, Azure Bot Service, Cognitive Services
• Azure ExpressRoute, Azure Active Directory, Azure Network Security Groups, Azure Key Management Service, Operations Management Suite, Azure Functions, Visual Studio
The Azure BIG Data/Analytics Landscape
[Diagram: the same Azure services as in the Data/Analytics landscape slide above]
Diagram labels: DevOps, Clients, Management, Applications, PaaS & DevOps, App Frameworks & Tools, Databases & Middleware, Infrastructure
Customers, System integrators, ISVs, Training partners
ADVANCED ANALYTICS PATTERN IN AZURE
Performing data collection/understanding, modeling and deployment

Diagram labels:
• Data sources: Sensors and IoT (unstructured); Logs, files and media (unstructured); Business / custom apps (structured)
• Modeling tools: Azure ML Studio, ML Server, Azure Databricks (Spark ML), SQL Server (In-database ML), Data Science VM, Batch AI
• Data services: Data Factory, Data Lake Store, Azure Storage, Cosmos DB, SQL DB, SQL DW, Data Lake Analytics, Azure Databricks, HDInsight, Azure Analysis Services
• Deployment: Azure Container Service, SQL Server (In-database ML)
• Consumption: Applications, Dashboards
Batch scoring on Azure for deep learning models
Big Data Real Time Architecture

AZURE DATABRICKS
(Spark ML, SparkR, sparklyr)
AZURE HDINSIGHT
(Kafka)

WEB & MOBILE APPS


AZURE DATA FACTORY
AZURE STORAGE AZURE DATABRICKS AZURE COSMOS DB
(Spark)

Polybase

AZURE DATA FACTORY


AZURE SQL DATA
WAREHOUSE

ANALYTICAL DASHBOARDS
Improved text prediction
Deep learning and natural language processing boosts search efficacy and tagging accuracy

Diagram labels: Azure GPU Data Science Virtual Machine; Microsoft SQL Server; Azure Machine Learning; Azure Kubernetes Service; Azure Machine Learning managed deployment; SDK for AML service (Python); Jupyter Notebook; Predictive web application; Machine learning model
Case Studies
Drone-based electric grid
inspector powered by deep
learning

Challenge
• Traditional power line inspection services are costly
• Demand for low cost image scoring and support for
multiple concurrent customers
• Needed powerful AI to execute on a drone solution

Solution
• Deep learning to analyze multiple streaming data
feeds
• Azure GPUs support Single Shot MultiBox Detectors (SSD)
• Reliable, consistent, and highly elastic scalability with Azure Batch Shipyard
eSmart architecture
Pipeline stages: Data Sources, Ingest, Prepare, Analyze, Publish, Consume
Diagram labels: Drone-collected images; Batch upload of drone images; Azure Blob raw storage; Azure Functions; Azure Batch; Docker image (DNN contained in a Docker image); Azure Blob; Azure Cosmos DB; Contain; inventory; Cosmos DB results and state changes; On-prem command center

DATA INTELLIGENCE ACTION


Our strategy is to build best-in-class platforms and
productivity services for an intelligent cloud and an
intelligent edge infused with artificial intelligence
(“AI”).
2017 Annual Report
By the numbers:
• 18.5B Annual revenue
• 1-1.5B Users worldwide
• 50M Lines of code (Windows Server 2003)
By the numbers:
• $2.5B acquisition of Mojang in 2014
• 144M copies sold
• 75M MAU
By the numbers:
• 60M members
• 50k skills
• 20M companies
• 15M open jobs
• 60K schools
• $26.2B acquisition
By the numbers:
• WW market share 9%
• 12B monthly searches
• 6.1B Annual revenue
(with 13% ann. growth)
Microsoft Research
Source: http://scikit-learn.org/stable/tutorial/machine_learning_map/index.html
http://nicolofusi.com
Azure ML Services – Automated ML
Capabilities
• Supported Frameworks – Scikit-Learn and TensorFlow
• Scenarios – Supervised Learning: Classification & Regression
• Python SDK for deployment and hosting for inference
• Notebook integration – Azure & Jupyter
• Input Data – Numeric and Text data
• Model Training – Local Machine, Remote Windows and Linux DSVM, Batch AI
• Faster model training using multiple cores and parallel experiments
• Support for custom and automated cross validation splits
• Transparent – View run history, model explanation and model selection
• Secure & Compliant – GDPR, ISO 27001, SOC, EU Model Clauses, HIPAA ready at GA

• Integrated with Azure Machine Learning
• Azure Databricks (ADB) Integration*
  • Run AutoML training jobs on ADB Spark clusters
• Azure Notebooks
• Power BI Integration*
  • Enables citizen data scientists to train models for churn prediction etc. on Dynamics CRM & Power BI data
• ONNX framework*
  • Enables training models in any framework (TensorFlow, CNTK etc.) and inference in any runtime
• VS Tools for AI*
  • Enables developers to quickly build models and deploy in Azure
• ML.NET*
  • Enables C# and .NET developers to build ML models using AutoML
• CI/CD Integration*
  • Enables Train / Test / Deploy in a continuous manner with minimal human intervention
• Azure ML Packages*

* In Development
Project BrainWave
A Scalable FPGA-powered DNN Serving Platform

Diagram labels: L0 and L1 network switches; FPGAs; Instr Decoder & Control; Neural FU
Pretrained DNN Model in CNTK, etc.; Scalable DNN Hardware Microservice; BrainWave Soft DPU
Planet scale Real-Time Inferencing for GeoAI

vs
Freely Available Imagery Labeled Training Data Inferred Land Cover Map
Real-Time Low-Latency Inferencing with FPGAs
Setup:
• 800 FPGAs on Azure
• 195 million images; 20 TB
• Real-time inferencing, 1 image at a time

Results:
• 415K inferences/second @ 1.8 ms latency
• 10.6 minutes total
• Order of magnitude better price/performance for FPGA over CPU & GPU (V100 with TensorRT)
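The figures on this slide can be cross-checked with some quick arithmetic. The sketch below uses only the numbers quoted above; the interpretation of 415K inferences/second as a peak rate is an assumption. Under that reading, each FPGA handles roughly 519 images per second, the full set would take at least about 7.8 minutes at peak, the reported 10.6 minutes corresponds to an average of about 307K images per second, and Little's law (throughput times latency) gives about 747 images in flight, or a little under one per FPGA, which is consistent with scoring one image at a time.

```python
# Figures quoted on the slide.
IMAGES = 195_000_000
FPGAS = 800
PEAK_RATE = 415_000      # inferences per second (assumed to be a peak figure)
LATENCY_S = 0.0018       # 1.8 ms per inference
TOTAL_MINUTES = 10.6

# Throughput contributed by each FPGA at the peak rate.
per_fpga_rate = PEAK_RATE / FPGAS

# Lower bound on wall-clock time if the peak rate were sustained throughout.
min_minutes = IMAGES / PEAK_RATE / 60

# Average rate implied by the reported total run time.
avg_rate = IMAGES / (TOTAL_MINUTES * 60)

# Little's law: requests in flight = throughput x latency.
in_flight = PEAK_RATE * LATENCY_S
in_flight_per_fpga = in_flight / FPGAS

if __name__ == "__main__":
    print(f"per-FPGA rate:       {per_fpga_rate:.1f} images/s")
    print(f"minimum run time:    {min_minutes:.1f} min")
    print(f"implied average:     {avg_rate:,.0f} images/s")
    print(f"in flight (total):   {in_flight:.0f}")
    print(f"in flight per FPGA:  {in_flight_per_fpga:.2f}")
```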
Using Project Brainwave for Land Use Mapping
FPGAs for ultra-fast inferencing of different types of land use for the entire United States (ESRI NAIP data, 20+ TB)

Pipeline stages: Data, Build, Train, Deploy, Intelligent Apps
• Data: Satellite images for the entire US; NAIP data (20 TB, 200M images) stored on Azure Premium Storage
• Tools: Geo AI Data Science Virtual Machine, Azure Machine Learning, Azure Batch AI, Visual Studio Tools for AI
• Model: Land classification model (ResNet-50)
• Deployment: Ultra-fast inferencing using FPGAs

DATA INTELLIGENCE ACTION


Project Brainwave under the hood
Azure ML integration
• End-to-end deployment and model lifecycle support

Hardware Accelerated Model Gallery
• Accelerated models available through Azure ML
• Initially: ResNet50 – object classification at blazing speed and affordable cost
• Just announced: ResNet 152, VGG-16, SSD-VGG, and DenseNet-121

Brainwave Compiler & Runtime
• Low-friction path to deploying accelerated models – no hardware knowledge required
• Easily convert models to scalable hardware microservices
• Federated runtime supports TensorFlow & CNTK and orchestrates across FPGA & CPU

“Brainslice” Soft Neural Processing Unit (NPU)
• Exposes optimized DNN operators for blazing fast inference
• Flexible and extensible to support fast-changing AI algorithms
• Allows us to reach higher performance at small batch sizes – important for real-time AI
https://aka.ms/LatamAI

Thank you very much!
