
FLIGHT DELAY PREDICTION
IN SOEKARNO-HATTA INTERNATIONAL AIRPORT
Alva Thomson -- IGN Putra Sattvika -- M. Reza Qorib
Project Background

╺ Flight delays are not uncommon
╺ Passengers might only be informed a couple of minutes beforehand
╺ Passengers who are en route to the airport most probably have no way to know whether their flight is on schedule
2
Project Aim

╺ Predict how long a flight will be delayed (in minutes)

3
Data Science Questions
╺ How can flight delays at Soekarno-Hatta International Airport be predicted using weather and flight information?

Supporting questions:

╺ Is there any correlation between each feature and delay time?

╺ Which airline has the most delayed flights?

4
Data Acquisition
Data acquisition, integration, and wrangling

5
Data Sources
Flight Data
Acquired from the FlightRadar24 website every 6 hours

Meteorological Terminal Aviation Routine Weather Report (METAR)
Acquired from the BMKG website once every week

Airport Weather Data
Acquired from the BMKG website every 15 to 30 minutes

6
Features
╺ Visibility
╺ Day of the week
╺ Temperature
╺ Dew Point
╺ Wind Direction
╺ Domestic/International
╺ Air Pressure
╺ Airline Code
╺ Previous Flight
╺ Cloud Elevation
╺ Wind Speed
╺ Weather
╺ Departure Hour
╺ Cloud Condition
7
Data Analysis
Descriptive and Exploratory Analysis

8
Airlines with Worst Average Delay (in minutes)
[Bar chart: average delay in minutes, vertical axis marked at 20, 40, and 60, for Korean Air, Qantas, Sriwijaya Air, Asiana Airlines, and Philippine Airlines]
9
Feature Correlation: ANOVA
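
The slides do not show how the ANOVA was computed. As a rough illustration only, a one-way ANOVA F statistic compares the variation between group means with the variation inside each group. The function name `anova_f` and the delay values below are made up for this sketch and are not taken from the project data.

```python
from statistics import mean

def anova_f(groups):
    """One-way ANOVA F statistic for a list of groups of numeric values."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    # Between-group sum of squares: distance of each group mean from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of values around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Made-up delays (minutes), grouped by a hypothetical categorical feature.
delays_by_group = [[10, 12, 14], [20, 22, 24], [30, 32, 34]]
f_stat = anova_f(delays_by_group)
print(f"F = {f_stat:.1f}")  # prints "F = 75.0"
```

A large F value suggests the feature separates delay times well; a value near 1 suggests little relationship.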

10
Experiments &
Results
Data preprocessing & Regression

11
Tools
╺ Python 3.6
╺ scikit-learn

12
Data preprocessing
╺ Numeric → Min-max scaler

╺ Categorical → One-hot encoding
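
A minimal sketch of these two preprocessing steps with scikit-learn, assuming `MinMaxScaler` and `OneHotEncoder` were the classes used (the slides name only the techniques). The column choices and values are hypothetical.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Hypothetical numeric columns: temperature, wind speed.
numeric = np.array([[28.0, 5.0], [31.0, 12.0], [25.0, 8.0], [30.0, 3.0]])
# Hypothetical categorical column: airline code.
categorical = np.array([["GA"], ["QF"], ["KE"], ["GA"]])

# Rescale each numeric column to the [0, 1] range.
scaled = MinMaxScaler().fit_transform(numeric)
# Expand each category into its own 0/1 indicator column.
encoded = OneHotEncoder(handle_unknown="ignore").fit_transform(categorical).toarray()

# Final feature matrix: 2 scaled columns followed by 3 indicator columns.
features = np.hstack([scaled, encoded])
print(features.shape)
```

In practice both transformers would be fitted on training data only and then applied to unseen flights.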

13
Regression
╺ Linear Regression
╺ Kernel Ridge Regression

14
Performance Measure
╺ 10-fold Cross Validation
╺ Error measure: root-mean-square error (RMSE)
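
The results slides report CV RMSE, so the sketch below assumes the error measure is root-mean-square error, averaged over 10 folds and reported with its standard deviation. The data is synthetic, and the shuffling and random seeds are assumptions; the slides do not say how the folds were built.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in data; the real flight and weather features are not in the slides.
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

cv = KFold(n_splits=10, shuffle=True, random_state=0)
# scikit-learn maximizes scores, so it exposes the negated mean squared error.
neg_mse = cross_val_score(
    LinearRegression(), X, y, cv=cv, scoring="neg_mean_squared_error"
)
# Convert each fold's negated MSE into an RMSE.
fold_rmse = np.sqrt(-neg_mse)
print(f"CV RMSE: {fold_rmse.mean():.4f} ± {fold_rmse.std():.4f}")
```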

15
Linear Regression
╺ CV RMSE: 20.9768 ± 1.8526
╺ Deemed to be inadequate
╺ More experiments are needed
with different algorithms

16
Ridge Regression with polynomial kernel

Degree   Training RMSE   CV RMSE   CV StDev
14       18.4433         20.0864   2.2570
15       18.2534         20.0745   2.2395
16       18.0641         20.0712   2.2172
17       17.8757         20.0770   2.1907
18       17.6891         20.0922   2.1605

Degree = 16 yields the best result (lowest CV RMSE)
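
A degree sweep like the one above could be sketched as follows. This is an illustration on synthetic data, not the authors' code: the regularization strength and the other kernel parameters are not reported in the slides, so scikit-learn defaults are assumed, and the numbers it prints will not match the table.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import KFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in data; the project data is not available in the slides.
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
folds = KFold(n_splits=10, shuffle=True, random_state=0)

results = {}
for degree in range(14, 19):
    # Scale inside the pipeline so each fold is scaled using its own training split.
    model = make_pipeline(
        MinMaxScaler(), KernelRidge(kernel="polynomial", degree=degree)
    )
    scores = cross_validate(
        model, X, y, cv=folds,
        scoring="neg_mean_squared_error", return_train_score=True,
    )
    train_rmse = np.sqrt(-scores["train_score"])
    cv_rmse = np.sqrt(-scores["test_score"])
    results[degree] = (train_rmse.mean(), cv_rmse.mean(), cv_rmse.std())

# Pick the degree with the lowest mean CV RMSE.
best_degree = min(results, key=lambda d: results[d][1])
for degree, (train, cv_mean, cv_std) in results.items():
    print(f"{degree}  {train:.4f}  {cv_mean:.4f}  {cv_std:.4f}")
print("best degree:", best_degree)
```

Selecting on CV RMSE rather than training RMSE matters here: in the table, training RMSE keeps falling as the degree grows while CV RMSE starts rising after degree 16.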


17
Unreported Experiments
╺ Label Encoding
╺ Polynomial Feature Interaction
╺ Feature Selection with RFE
(Linear Regression)
╺ Feature Selection with Lasso
Regression
╺ Feature Selection with RBM
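
The slides give no details on these experiments. As a rough sketch of two of them, RFE with linear regression and Lasso-based selection might look like this in scikit-learn; the data is synthetic, and the number of features to keep and the Lasso `alpha` are arbitrary assumptions.

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import RFE, SelectFromModel
from sklearn.linear_model import Lasso, LinearRegression

# Synthetic data: 10 features, of which only 4 carry signal.
X, y = make_regression(
    n_samples=200, n_features=10, n_informative=4, noise=5.0, random_state=0
)

# Recursive feature elimination: repeatedly drop the weakest coefficient.
rfe = RFE(LinearRegression(), n_features_to_select=4).fit(X, y)
rfe_mask = rfe.support_

# Lasso-based selection: keep features whose coefficient is not shrunk to zero.
lasso_selector = SelectFromModel(Lasso(alpha=1.0)).fit(X, y)
lasso_mask = lasso_selector.get_support()

print("RFE keeps:  ", rfe_mask.nonzero()[0])
print("Lasso keeps:", lasso_mask.nonzero()[0])
```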
18
Conclusion

19
Conclusion: Data Analysis
╺ Some international airlines have the worst delay times, namely Korean Air and Qantas
╺ Correlation between features and delay time
╶ Best: temperature
╶ Worst: destination
20
Conclusion: Regression
╺ Kernel Ridge Regression
╺ CV RMSE: 20.0712 minutes
╺ Not a very good result

21
Conclusion: Future Work
╺ Introduce new features
╶ Real-time airport congestion
╶ Weather along the flight path
╶ Conditions at the destination airport
╺ Meta-algorithms (e.g. AdaBoost)
╺ Add more data

22
Thank you

23
