1
About Me
Work [ed | s] @
Co-Chair for
Maintainer of
Spare time
2
Apache Airflow
What is it?
3
Apache Airflow : What is it?
In a nutshell:
Airflow is a platform to programmatically author, schedule and monitor workflows (a.k.a. DAGs or Directed Acyclic Graphs)
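To make the DAG idea concrete, here is a minimal sketch using only Python's standard library (graphlib, Python 3.9+). It is not Airflow code, and the task names are hypothetical. Each task maps to the set of tasks it depends on, and a topological sort yields an order in which every task runs after its dependencies.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each key is a task, each value is the set of
# tasks that must finish before it can start.
workflow = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "report": {"load"},
}


def execution_order(graph):
    """Return one valid run order; raises graphlib.CycleError on a cycle."""
    return list(TopologicalSorter(graph).static_order())


order = execution_order(workflow)
print(order)
```

Because the graph is acyclic, a valid order always exists; a cycle would make the workflow impossible to schedule.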
4
Apache Airflow
UI Walk-Through
5
Apache Airflow : UI Walk-through
6
Airflow - Authoring DAGs
Airflow: Visualizing a DAG
7
Airflow - Authoring DAGs
Airflow: Author DAGs in Python! No need to bundle many XML files!
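Airflow supports declaring task dependencies in Python with the >> operator. The toy sketch below shows how that declaration style can work through operator overloading. The Task class is a hypothetical stand-in, not Airflow's API, so it runs without Airflow installed.

```python
class Task:
    """Hypothetical stand-in for a workflow task (not an Airflow class)."""

    def __init__(self, task_id):
        self.task_id = task_id
        self.downstream = []  # tasks that run after this one

    def __rshift__(self, other):
        # `a >> b` records that b depends on a, and returns b so that
        # chains like `a >> b >> c` read left to right.
        self.downstream.append(other)
        return other


download = Task("download")
score = Task("score")
publish = Task("publish")

# Declare the pipeline in plain Python: no XML required.
download >> score >> publish

print([t.task_id for t in download.downstream])  # ['score']
print([t.task_id for t in score.downstream])     # ['publish']
```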
8
Airflow - Authoring DAGs
Airflow: The Tree View offers a view of DAG Runs over time!
9
Airflow - Performance Insights
Airflow: Gantt charts reveal the slowest tasks for a run!
10
Airflow - Performance Insights
Airflow: …And we can easily see performance trends over time
11
Apache Airflow
Why use it?
12
Apache Airflow : Why use it?
When would you use a Workflow Scheduler like Airflow?
• ETL Pipelines
• Enforce SLAs
  • E.g. Alerting if time or correctness SLAs are not met
• Configuration-as-code
• Usability - Stunning UI / UX
• Centralized configuration
• Resource Pooling
• Extensibility
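The SLA bullet above can be illustrated with a minimal sketch: compare each task's runtime against a deadline and collect the ones that missed it. The task names, durations, and the find_sla_misses helper are hypothetical, and this does not reproduce Airflow's own SLA feature.

```python
from datetime import timedelta


def find_sla_misses(durations, sla):
    """Return the IDs of tasks whose runtime exceeded the SLA, sorted."""
    return sorted(task for task, took in durations.items() if took > sla)


# Hypothetical runtimes for one DAG run.
durations = {
    "extract": timedelta(minutes=4),
    "score": timedelta(minutes=22),
    "import": timedelta(minutes=9),
}

misses = find_sla_misses(durations, sla=timedelta(minutes=15))
for task in misses:
    print(f"ALERT: {task} missed its 15-minute SLA")
```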
15
Use-Case : Message Scoring
Batch Pipeline Architecture
16
Use-Case : Message Scoring
S3 uploads every 15 minutes
[Diagram: enterprise A, enterprise B, enterprise C; S3]
17
Use-Case : Message Scoring
[Diagram: enterprise A, enterprise B; S3]
18
Use-Case : Message Scoring
Spark job writes scored messages and stats to another S3 bucket
[Diagram: enterprise A, enterprise B, enterprise C; S3; S3]
19
Use-Case : Message Scoring
This triggers SNS/SQS message events
[Diagram: enterprise A, enterprise B, enterprise C; S3; S3; SNS; SQS]
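A consumer of these events has to unwrap two layers of JSON. The sketch below assumes the common wiring where S3 publishes to SNS and an SQS queue subscribes to that topic without raw message delivery, so the SQS body is an SNS envelope whose Message field holds the S3 event as a JSON string. The bucket and key names are made up, and the sample is trimmed to the fields the function reads.

```python
import json
from urllib.parse import unquote_plus


def s3_objects_from_sqs_body(body):
    """Extract (bucket, key) pairs from an SNS-wrapped S3 event."""
    envelope = json.loads(body)               # SNS envelope
    event = json.loads(envelope["Message"])   # S3 event, JSON-encoded as a string
    return [
        # Object keys arrive URL-encoded in S3 events, so decode them.
        (r["s3"]["bucket"]["name"], unquote_plus(r["s3"]["object"]["key"]))
        for r in event.get("Records", [])
    ]


# Hand-built sample with made-up names.
s3_event = {
    "Records": [
        {
            "eventName": "ObjectCreated:Put",
            "s3": {
                "bucket": {"name": "scored-messages"},
                "object": {"key": "enterprise-a/batch+0001.json"},
            },
        }
    ]
}
sample_body = json.dumps({"Type": "Notification", "Message": json.dumps(s3_event)})

objects = s3_objects_from_sqs_body(sample_body)
print(objects)
```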
20
Use-Case : Message Scoring
An Autoscale Group (ASG) of Importers spins up when it detects SQS messages
[Diagram: enterprise A, enterprise B, enterprise C; S3; S3; SNS; SQS; ASG Importers]
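The scaling behaviour can be sketched as simple arithmetic on queue depth. The rule below (one importer per fixed number of queued messages, clamped to a minimum and maximum) and all of its parameter values are hypothetical; the talk does not state the actual policy, and real ASG policies are configured in AWS rather than in application code.

```python
import math


def desired_importers(queue_depth, msgs_per_importer=100, min_size=0, max_size=10):
    """Hypothetical rule: one importer per `msgs_per_importer` queued messages."""
    wanted = math.ceil(queue_depth / msgs_per_importer)
    # Clamp to the group's allowed size range.
    return max(min_size, min(max_size, wanted))


for depth in (0, 1, 250, 5000):
    print(depth, "->", desired_importers(depth))
```

With an empty queue the group scales to zero, which matches the idea of importers that only spin up when messages arrive.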
21
Use-Case : Message Scoring
The importers rapidly ingest scored messages and aggregate statistics into the DB
[Diagram: enterprise A, enterprise B, enterprise C; S3; S3; SNS; SQS; ASG Importers; DB]
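As a rough illustration of the import step, the sketch below bulk-loads scored messages into a database and computes per-enterprise statistics. It uses an in-memory SQLite database as a stand-in; the schema, the sample rows, and the 0.5 "untrusted" threshold are all assumptions, since the talk does not describe the real DB.

```python
import sqlite3

# Hypothetical scored messages: (enterprise, message_id, trust_score).
scored = [
    ("enterprise A", "m1", 0.92),
    ("enterprise A", "m2", 0.15),
    ("enterprise B", "m3", 0.40),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages (enterprise TEXT, message_id TEXT PRIMARY KEY, score REAL)"
)
# executemany batches the inserts; INSERT OR REPLACE keeps re-imports idempotent.
conn.executemany("INSERT OR REPLACE INTO messages VALUES (?, ?, ?)", scored)
conn.commit()

# Aggregate statistics per enterprise: message count and untrusted count.
stats = conn.execute(
    "SELECT enterprise, COUNT(*), SUM(score < 0.5) FROM messages "
    "GROUP BY enterprise ORDER BY enterprise"
).fetchall()
print(stats)
```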
22
Use-Case : Message Scoring
Users receive alerts of untrusted emails & can review them in the web app
[Diagram: enterprise A, enterprise B, enterprise C; S3; S3; SNS; SQS; ASG Importers; DB]
23
Use-Case : Message Scoring
[Diagram: enterprise A, enterprise B, enterprise C; S3; S3; SNS; ASG Importers; DB]
24
Airflow DAG
25
Apache Airflow
Incubating
26
Apache Airflow : Incubating
Timeline
• Airflow was created @ Airbnb in 2015 by Maxime Beauchemin
• Max launched it @ Hadoop Summit in Summer 2015
• On 3/31/2016, Airflow entered the Apache Incubator
Today
• 2400+ Forks
• 7600+ GitHub Stars
• 430+ Contributors
• 150+ companies officially using it!
• 14 Committers/Maintainers <- We're growing here
27
Thank You!
28
Apache Airflow
Behind the Scenes
29
Apache Airflow : Behind the Scenes
It ships with a
• DAG Scheduler
• Web application (UI)
• Powerful CLI
• Celery Workers!
30
Apache Airflow : Behind the Scenes
3. The scheduler picks up new schedules and distributes work over Celery / RabbitMQ
[Diagram: Webserver; Celery / RabbitMQ; Worker, Worker, Worker]
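The scheduler-to-worker hand-off follows the familiar producer/consumer pattern. The sketch below imitates it with a standard-library queue and threads; the queue stands in for the Celery / RabbitMQ broker and the threads for workers. It is an analogy under those assumptions, not how Airflow or Celery is implemented, and the task names are hypothetical.

```python
import queue
import threading


def run_workers(task_ids, n_workers=3):
    """Distribute tasks over worker threads through a shared queue."""
    tasks = queue.Queue()   # stand-in for the Celery / RabbitMQ broker
    done = []
    lock = threading.Lock()

    def worker(name):
        while True:
            task_id = tasks.get()
            if task_id is None:      # sentinel: no more work
                break
            with lock:
                done.append((name, task_id))

    threads = [
        threading.Thread(target=worker, args=(f"worker-{i}",))
        for i in range(n_workers)
    ]
    for t in threads:
        t.start()
    for task_id in task_ids:         # the "scheduler" enqueues work
        tasks.put(task_id)
    for _ in threads:                # one sentinel per worker
        tasks.put(None)
    for t in threads:
        t.join()
    return done


results = run_workers(["extract", "score", "import"])
print(sorted(task for _, task in results))
```

Which worker picks up which task varies from run to run; only the set of completed tasks is fixed.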
35