
Data warehousing in the cloud

Ekta Parashar
Solutions Architect Manager, AISPL

© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Agenda
• Quick recap of Amazon Redshift

• Overview of updates in the past 6-12 months

• Redshift best practices

• Additional resources

• Open Q&A

Who am I?
• Solutions Architect Manager

• Based in Mumbai, India

Amazon Redshift
• OLAP
• MPP
• Columnar
• PostgreSQL

Surrounding AWS services (diagram labels): Amazon SWF, Amazon VPC, IAM, Amazon EC2, Amazon S3, AWS KMS, Amazon Route 53, Amazon CloudWatch
February 2013 to November 2018
• > 140 significant patches
• > 220 significant features

Amazon Redshift architecture
Massively parallel, shared-nothing columnar architecture

SQL clients/BI tools connect via JDBC/ODBC

Leader node (128 GB RAM, 16 cores, 16 TB disk)
• SQL endpoint
• Stores metadata
• Coordinates parallel SQL processing

Compute nodes (each 128 GB RAM, 16 cores, 16 TB disk)
• Local, columnar storage
• Executes queries in parallel
• Load, unload, backup, restore via Amazon S3

Amazon Redshift Spectrum nodes (1, 2, 3, 4 ... N)
• Execute queries directly against Amazon Simple Storage Service (Amazon S3)
Use the ETL, SQL, and BI tools you love
Data Integration Business Intelligence Systems Integrators

Amazon Redshift: What’s new and what’s coming
We’re innovating across the 4 things that matter most to customers

Speed Scale Simplicity Security

Improvements to speed*
• Result caching
• Compiled code cache
• Late materialization
• Improvements for the COPY operation when ingesting data from Parquet and ORC formats
• Support for lateral column alias reference
• Queries operating over CHAR and VARCHAR columns
• Single-row inserts
• Queries with intermediate subquery results that can be distributed
• Query processing
• 2x the number of tables in a cluster
• Query planning improvements
• Complex EXCEPT subqueries
• Cluster resize operations
• DC2 nodes
• Short query acceleration
• Hash join memory utilization optimizations and cache line prefetching
• Resource management for memory-intensive queries
• Performance improvement for queries that refer to stable functions over constant expressions
• Expressions on the partition columns of external tables
• Faster string manipulation
• Commit processing enhancements
• Joins involving large numbers of NULL values in a join key column
• Query rewrites that push down selective joins into a subquery
*Since re:Invent 2017

Amazon Redshift is now 10x faster than it was two years ago

More than 200 features and enhancements released due to lessons learned from more than 10,000 customer deployments processing more than 2 exabytes of data every day
Concurrency Scaling (New!)
• Automatically creates more clusters on-demand
• Consistently fast performance even with thousands of concurrent queries
• No advance hydration required
• Quickly scale to serve changing query workload

Diagram labels: Amazon Redshift, caching layer, backup, Amazon Redshift Managed S3

Enabling Concurrency Scaling

Concurrency Scaling configuration

Amazon Redshift’s throughput scales with concurrent users
• Throughput scales linearly
• 97% of users will never see a charge for auto-scale resources
• For every 24 hours your main cluster is in use, we’ll provide a one-hour credit for concurrent cluster usage

[Chart: Queries per Hour (QpH), 0 to 12,000, versus number of concurrently active users (5, 40, 80, 120, 150, 180)]
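
The credit rule on this slide (one credit hour for every 24 hours of main-cluster use) can be illustrated with a small calculation. This is a minimal sketch only: the function name is hypothetical, and it assumes credits are pooled over the whole period with no cap or expiry, which the slide does not state.

```python
def billable_concurrency_hours(main_cluster_hours: float,
                               concurrency_hours_used: float) -> float:
    """Return the concurrency-scaling hours left to pay for after credits.

    Assumption (not from the slide): credits earned over the period are
    pooled, never expire, and are not capped.
    """
    # One credit hour is earned for every 24 hours the main cluster is in use.
    credit_hours = main_cluster_hours / 24.0
    # Usage covered by credits costs nothing; only the excess is billable.
    return max(0.0, concurrency_hours_used - credit_hours)


if __name__ == "__main__":
    month_hours = 30 * 24  # main cluster running continuously for 30 days
    print(billable_concurrency_hours(month_hours, 12))  # usage within the credits
    print(billable_concurrency_hours(month_hours, 45))  # usage beyond the credits
```

Under these assumptions, a cluster that runs continuously for 30 days earns 30 credit hours, so only usage above that amount would be charged.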

Improvements to scale*
Integrate seamlessly with your data lake
• Renaming external table columns
• DATE data type
• Push the LENGTH() string function to Spectrum
• Support for Parquet, ORC, Avro, CSV, and other open file formats
• Retrieving metadata for late-binding views
• Support for Enhanced VPC Routing
• Query external tables during a resize operation
• New Amazon Redshift Spectrum regions
• Specify the root of an S3 bucket as the source for an existing table
• Arrays of arrays and arrays of maps
• Table property to specify the file compression type for external tables
• Spectrum support for JSON and ION
• Spectrum support for nested data
• IN-list predicate processing in Spectrum scans
• Map data types in Spectrum to contain arrays
• Spectrum queries with aggregations on partition columns
• ALTER TABLE ADD/DROP COLUMN for external tables is now supported via standard JDBC calls
*Since re:Invent 2017
Amazon Redshift elastic resize (GA)
Scale up and down in minutes
• Adds nodes to Amazon Redshift cluster
• Run queries faster in busy periods
• Minimal transition time
• Scale compute and storage on-demand

Diagram labels: JDBC/ODBC, Amazon Redshift cluster (leader node, compute nodes CN1 to CN4), backup to Amazon Redshift Managed S3

Sizing an Amazon Redshift cluster for production
Estimate the uncompressed size of the incoming data
Assume 3x compression (actual can be > 4x)
Target 30-40% free space (resize to add/remove storage as needed)
• Disk utilization should be at least 15% and less than 80%
Based on performance requirements, pick SSD or HDD
• If required, nodes can be added for increased performance

Example:
20 TB of uncompressed data ~= 6.67 TB compressed (at 3x compression)
Depending on performance requirements, recommendation:
• 4 x dc2.8xlarge or 5 x ds2.xlarge = ~10 TB of capacity
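
The sizing steps above can be sketched as a short calculation. This is an illustration under stated assumptions, not an official sizing tool: the function name is hypothetical, the per-node storage figures (2.56 TB for dc2.8xlarge, 2 TB for ds2.xlarge) are approximate values that should be checked against current AWS documentation, and the default free-space target uses the low end (30%) of the range on the slide.

```python
import math

# Assumed usable storage per node in TB (approximate; verify before relying on it).
NODE_STORAGE_TB = {"dc2.8xlarge": 2.56, "ds2.xlarge": 2.0}


def size_cluster(uncompressed_tb: float, node_type: str,
                 compression_ratio: float = 3.0,
                 free_space: float = 0.30) -> dict:
    """Estimate node count, capacity, and disk utilization for a cluster."""
    # Step 1: estimate the compressed size from the uncompressed size.
    compressed_tb = uncompressed_tb / compression_ratio
    # Step 2: leave the target share of the disk free.
    required_tb = compressed_tb / (1.0 - free_space)
    # Step 3: round up to whole nodes of the chosen type.
    nodes = math.ceil(required_tb / NODE_STORAGE_TB[node_type])
    capacity_tb = nodes * NODE_STORAGE_TB[node_type]
    return {
        "nodes": nodes,
        "capacity_tb": capacity_tb,
        "utilization": compressed_tb / capacity_tb,
    }


if __name__ == "__main__":
    for node_type in NODE_STORAGE_TB:
        print(node_type, size_cluster(20, node_type))
```

With the slide's 20 TB example, this sketch arrives at 4 dc2.8xlarge nodes or 5 ds2.xlarge nodes, with disk utilization inside the 15% to 80% band.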

Resizing Amazon Redshift
Classic resize
• Data is transferred from old cluster to new cluster (within hours)
• Change node types
• Enable/disable full disk encryption

Elastic resize
• Nodes are added/removed to/from existing cluster (within minutes)
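
As a rough sketch of how either resize type might be requested programmatically, the snippet below builds parameters for the boto3 Redshift client's `resize_cluster` call. The helper names and the cluster identifier are hypothetical, and the call itself is only made if you invoke `submit_resize` with valid AWS credentials; treat this as an illustration rather than a recommended workflow.

```python
from typing import Any, Dict, Optional


def build_resize_request(cluster_id: str, number_of_nodes: int,
                         node_type: Optional[str] = None,
                         classic: bool = False) -> Dict[str, Any]:
    """Build keyword arguments for boto3's Redshift resize_cluster call.

    classic=False requests an elastic resize; classic=True requests a
    classic resize (the slide lists node type changes under classic resize).
    """
    if number_of_nodes < 1:
        raise ValueError("number_of_nodes must be at least 1")
    params: Dict[str, Any] = {
        "ClusterIdentifier": cluster_id,
        "NumberOfNodes": number_of_nodes,
        "Classic": classic,
    }
    if node_type is not None:
        params["NodeType"] = node_type
    return params


def submit_resize(params: Dict[str, Any]) -> Dict[str, Any]:
    """Send the request to AWS. Requires boto3 and configured credentials."""
    import boto3  # imported here so the builder works without boto3 installed

    return boto3.client("redshift").resize_cluster(**params)


if __name__ == "__main__":
    # "my-cluster" is a placeholder identifier, not a real cluster.
    print(build_resize_request("my-cluster", 4))
    print(build_resize_request("my-cluster", 4, node_type="dc2.8xlarge", classic=True))
```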

Classic resize
Diagram labels: SQL clients/BI tools, JDBC/ODBC, source cluster (leader node, nodes 1 to 3), binary data transfer, target cluster (leader node, nodes 1 to 4)

• Source cluster is placed into read-only mode during resize
• All data is copied and redistributed on the target cluster
• Allows for changing node types

Elastic resize
Timeline: elastic resize is requested (15 ±10 min)
Diagram labels: SQL clients/BI tools, JDBC/ODBC, leader node, nodes 1 to 4, backups to Amazon S3

• At the start of elastic resize, we take an automatic snapshot to Amazon S3 and provision the new node(s)
• Cluster is fully available for reads and writes

Elastic resize
Timeline: elastic resize is requested → (15 ±10 min) → elastic resize starts → (~4 min) → elastic resize finishes
Diagram labels: SQL clients/BI tools, JDBC/ODBC, leader node, nodes 1 to 4, backups to Amazon S3

• Slices are redistributed to/from nodes
• In-flight queries/connections are put on hold
• Some queries within transactions may be rolled back

Elastic resize
Timeline: elastic resize is requested → (15 ±10 min) → elastic resize starts → (~4 min) → elastic resize finishes → node rehydrated from Amazon S3 → data transfer finishes
Diagram labels: SQL clients/BI tools, JDBC/ODBC, leader node, nodes 1 to 4, backups to and restore from Amazon S3

• Cluster is fully available; data transfer continues in the background
• Hot blocks are moved first

When to use elastic vs. classic resize
                                               Elastic resize          Classic resize
Scale up and down for workload spikes          ✔
Incrementally add/remove storage
Change cluster instance type (SSD ←→ HDD)                              ✔
If elastic resize is not an option because
of sizing limits                                                       ✔
Limited availability during resize             < 5 minutes             1-24 hours
                                               (parked connections)    (read-only)
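
The guidance in this comparison, together with the earlier resize slides, can be condensed into a small decision helper. This is a sketch with hypothetical names; it encodes only the rules stated in the deck (classic resize for node type changes, encryption changes, or when elastic sizing limits are exceeded) and reports the availability impact quoted on the slide.

```python
from typing import Tuple


def choose_resize(change_node_type: bool = False,
                  toggle_encryption: bool = False,
                  within_elastic_limits: bool = True) -> Tuple[str, str]:
    """Pick a resize type and return it with the availability impact from the slide."""
    needs_classic = (
        change_node_type            # e.g. switching between SSD and HDD node types
        or toggle_encryption        # enabling/disabling full disk encryption
        or not within_elastic_limits  # elastic resize ruled out by sizing limits
    )
    if needs_classic:
        return ("classic", "1-24 hours (read-only)")
    return ("elastic", "< 5 minutes (parked connections)")


if __name__ == "__main__":
    print(choose_resize())                        # workload spike, same node type
    print(choose_resize(change_node_type=True))   # moving between node types
```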

Improvements to simplicity
• Efficiency of backup performance
• CloudWatch metrics for workload execution breakdown
• Enhancements to VACUUM DELETE
• Automatic vacuum delete
• Cluster resize
• Amazon Redshift Advisor for best practice recommendations
• Query editor
• Amazon CloudWatch metrics for query throughput by WLM queues
• Manage components of a multi-part query in the AWS console
• Current and trailing tracks for release updates
• Lateral column alias reference
• Stream real-time data in Parquet or ORC formats using Amazon Kinesis Data Firehose
• CloudWatch metrics for query duration by WLM queues
• Cluster resize operations
• Short query acceleration is self-optimizing
• Query Monitoring Rules (QMR) now support 3x more rules
• Free upgrade from DC1 RIs to DC2
• DISTSTYLE AUTO distribution style
• CloudWatch query runtime breakdown metric
• CloudWatch metrics for query throughput, query duration
Amazon Redshift query editor
Launched in October!
• Query data directly from the AWS console
• Results are instantly visible within the console
• No need to install and set up an external JDBC/ODBC client

Run stored procedures in Amazon Redshift
Bring your existing stored procedure and run it in Amazon Redshift.

Migrating to Amazon Redshift is even easier!

… where the data must efficiently run ETL, data validation, and custom business logic.

Amazon Redshift will support stored procedures in PL/pgSQL format, enabling you to bring your existing stored procedures to Amazon Redshift.
Amazon Redshift Advisor
• Provides automated recommendations to help optimize database performance and decrease operating costs
• >96% of clusters have tailored feedback
• Actionable WLM, COPY, storage, and system maintenance advice
• Intelligent recommendations for tuning based on continuous workload analysis

Amazon Redshift performs administration automatically
• Automates data distribution in tables for improved performance and disk space utilization.
• Provides intelligent recommendations for tuning based on continuous workload analysis.
• No more messing with distkeys!

Diagram: the auto distribution key chooses among three distribution styles, each shown across Node 1 (Slice 1, Slice 2) and Node 2 (Slice 3, Slice 4):
• ALL
• KEY (keyA, keyB, keyC, keyD)
• EVEN
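
To make the three distribution styles in the diagram concrete, the toy simulation below places rows on the same two-node, four-slice layout. It is a conceptual sketch only: the function and constant names are hypothetical, `zlib.crc32` stands in for whatever hash Amazon Redshift actually uses, and storing the ALL copy on the first slice of each node is a modeling assumption.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Optional

# Layout from the slide: two nodes with two slices each.
NODES = {"Node 1": ["Slice 1", "Slice 2"], "Node 2": ["Slice 3", "Slice 4"]}
SLICES = [s for slices in NODES.values() for s in slices]


def distribute(rows: List[dict], style: str,
               key: Optional[str] = None) -> Dict[str, List[dict]]:
    """Return a mapping of slice name to the rows placed on that slice."""
    placement: Dict[str, List[dict]] = defaultdict(list)
    if style == "EVEN":
        # Round-robin: rows are spread evenly regardless of their values.
        for i, row in enumerate(rows):
            placement[SLICES[i % len(SLICES)]].append(row)
    elif style == "KEY":
        if key is None:
            raise ValueError("KEY distribution needs a distribution key")
        # Rows with the same key value always hash to the same slice.
        for row in rows:
            digest = zlib.crc32(str(row[key]).encode("utf-8"))
            placement[SLICES[digest % len(SLICES)]].append(row)
    elif style == "ALL":
        # Every node holds a full copy of the table.
        for slices in NODES.values():
            placement[slices[0]].extend(rows)
    else:
        raise ValueError(f"unknown distribution style: {style}")
    return dict(placement)


if __name__ == "__main__":
    rows = [{"k": f"key{c}", "v": i} for i, c in enumerate("ABCDABCD")]
    for style in ("EVEN", "KEY", "ALL"):
        counts = {s: len(r) for s, r in distribute(rows, style, key="k").items()}
        print(style, counts)
```

The trade-off the sketch illustrates is that KEY keeps matching values together (helpful for joins), EVEN balances row counts, and ALL multiplies storage by the number of nodes.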

Improvements to security*
• Federated authentication with single sign-on
• Encrypt your previously unencrypted cluster with 1 click
• Cross-region backups for KMS-encrypted clusters
• Default access privileges
• Tag-based permissions
• Utilization alerts for RIs
• Enhanced VPC routing
• SAS integration enhancements
• Encrypt unloaded data using S3 server-side encryption with AWS KMS keys
• Superusers can grant users access to all rows in selected system tables
• IAM roles with COPY and UNLOAD commands
*Since re:Invent 2015


Integration with AWS Lake Formation (Coming Soon!)

Data sources: OLTP, ERP, CRM, LOB, sensors, devices, social, web
Services (diagram labels): IAM, KMS, Data Catalog, AI services, Amazon Athena, Amazon EMR, Amazon ES, Amazon Redshift, Kinesis, Amazon QuickSight

Amazon Redshift
New features
• Speed: elastic resize; Concurrency Scaling; improving short query acceleration; 10x average performance improvement
• Scale: Spectrum Request Accelerator; unload to Parquet
• Simplicity: Auto-WLM concurrency setting; deferred maintenance; Vacuum & Auto-Analyze; snapshot scheduler; auto data distribution; support for stored procedures
• Security: AWS Lake Formation integration

More than 10K customers use AWS for their data warehouse workloads

Nasdaq uses AWS to build a data lake

• Migrate legacy on-premises warehouse to Amazon Redshift
• 4.8B rows inserted per trading day (orders, trades, quotes)
• Ingests data from multiple sources, validates it, and stages it in S3
• Amazon Redshift reads data out of S3 for fast queries
• Presto on Amazon EMR and S3 used for analysis of massive historical dataset

Diagram labels: flat files, operational databases, Amazon S3, Amazon EMR, Amazon Redshift, SQL clients
Data from all 7 exchanges operated by Nasdaq (orders, quotes, trade executions)

Data lake analytics with Amazon Redshift Spectrum

NUVIAD is a mobile marketing platform providing professional marketers, agencies, and local businesses with hyper-targeted analytics at petabyte scale

• Seamlessly analyzing open file formats directly in Amazon S3 to provide fresh, up-to-the-minute insights
• Unlimited analytics and query concurrency with Amazon Redshift
• Unlimited data capacity with Amazon S3
• 80% performance gain using Parquet data format

Diagram labels: data sources, Amazon S3, AWS Glue, Amazon Redshift Spectrum, Amazon Redshift, BI tools

“Amazon Redshift Spectrum is a game changer for us. Reports that took minutes to produce are now delivered in seconds. We like the ability to scale compute on-demand to query petabytes of data in S3 in various open file formats.”
– Rafi Ton, CEO, NUVIAD
Amazon Redshift @ Dow Jones
• We have spun up […] different dashboards to support our stakeholders
• These dashboards are powered by […] different fact & dimension tables in Amazon Redshift
• Across the whole of the warehouse, we hold upwards of […] of data
• The platform is currently accessed by […] users

More places to learn about Amazon Redshift
Try it out for yourself: https://aws.amazon.com/redshift/

Modern Data Warehousing on AWS ebook


Sign up for Concurrency Scaling
Amazon Redshift and the art of performance optimization in the cloud by Werner Vogels
Performance matters: Amazon Redshift is now up to 3.5x faster for real-world workloads
Amazon Redshift customer use cases
Building a Proof of Concept for Amazon Redshift

Learn from AWS experts. Advance your skills and
knowledge. Build your future in the AWS Cloud.

• Digital Training: free, self-paced online courses built by AWS experts
• Classroom Training: classes taught by accredited AWS instructors
• AWS Certification: exams to validate expertise with an industry-recognized credential
Ready to begin building your cloud skills?
Get started at: https://www.aws.training/
Why work with an APN Partner?
APN Partners are uniquely positioned to help your organization at any stage of your cloud adoption journey, and they:
• Share your goals, focused on your success
• Help you take full advantage of all the business benefits that AWS has to offer
• Provide services and solutions to support any AWS use case across your full customer life cycle

APN Partners with deep expertise in AWS services:
• AWS Managed Service Provider (MSP) Partners: APN Partners with cloud infrastructure and application migration expertise
• AWS Competency Partners: APN Partners with verified, vetted, and validated specialized offerings
• AWS Service Delivery Partners: APN Partners with a track record of delivering specific AWS services to customers

Find the right APN Partner for your needs: https://aws.amazon.com/partners/find/


Thank you for attending AWS Innovate
We hope you found it interesting! A kind reminder to complete the survey.
Let us know what you thought of today’s event and how we can improve the event
experience for you in the future.

aws-apac-marketing@amazon.com
twitter.com/AWSCloud
facebook.com/AmazonWebServices
youtube.com/user/AmazonWebServices
slideshare.net/AmazonWebServices
twitch.tv/aws