
DATA VISUALISATION USING PYTHON

Dr. Roopa Chandrika R
Professor, Dept. of IT
Malla Reddy College of Engineering and Technology
Hyderabad
mroopachandrika@gmail.com
Why Data Visualisation?
Example:
Query: The Chief Minister of Tamil Nadu asks the health care department to report the
number of coronavirus cases
Answer: Reports are generated and presented
First format of the report:
• TAMIL NADU : 1,149 new coronavirus cases were reported as of 8:00 AM on Jun 1
in Tamil Nadu, according to data released by the Ministry of Health and Family
Welfare.
• This brings the total reported cases of coronavirus in Tamil Nadu to 22,333.
• Among the total people infected as on date, 12,757 have recovered and 173 have
passed away.
• District-wise breakup is available for 1520 of the total 22333 cases reported in the
state.
• Chennai had the highest number of Covid-19 cases at 303 confirmed infections

22-06-2020 DV - PYTHON 2
Second format of report presented:

State            Total Cases   New Cases   Deaths
Maharashtra      24427         1026        921
Gujarat          8904          362         537
Tamil Nadu       8718          716         61
Delhi            7639          406         86
Rajasthan        4021          138         117
Madhya Pradesh   3986          201         225
Uttar Pradesh    3664          91          82
West Bengal      2173          110         198
Andhra Pradesh   2051          33          46
Punjab           1914          37          32
Third format of report presented:

• The third format is an easier and quicker way to understand the required information
• Visualisation is most useful when the information includes thousands of values or many
variables and relationships
• “A picture is worth a thousand words”
• Complicated data starts to make sense when presented graphically
• Graphical results help organizations focus on the areas that are most
likely to influence their most important goals
• We can identify new patterns
• We can find relationships and understand the data
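As an illustration of the graphical format, here is a minimal sketch that draws the "Total Cases" column of the state-wise table as a horizontal bar chart. The numbers are copied from that table; the colour, figure size and output file name are assumptions made only for this example.

```python
import matplotlib.pyplot as plt

# Total cases per state, copied from the second report format
states = ["Maharashtra", "Gujarat", "Tamil Nadu", "Delhi", "Rajasthan",
          "Madhya Pradesh", "Uttar Pradesh", "West Bengal",
          "Andhra Pradesh", "Punjab"]
total_cases = [24427, 8904, 8718, 7639, 4021, 3986, 3664, 2173, 2051, 1914]

fig, ax = plt.subplots(figsize=(8, 5))
ax.barh(states, total_cases, color="tab:red")
ax.invert_yaxis()                   # put the first (largest) state at the top
ax.set_xlabel("Total cases")
ax.set_title("Total reported cases by state")
fig.tight_layout()
fig.savefig("cases_by_state.png")   # hypothetical output file name
```

The ranking of the states is visible at a glance here, whereas in the table it has to be read off number by number.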

Common general types of data visualization:

• Charts
• Tables
• Graphs
• Maps
• Infographics
• Dashboards

More specific examples of methods to visualize data:
• Area Chart
• Bar Chart
• Box-and-whisker Plots
• Bubble Cloud
• Cartogram
• Dot Distribution Map
• Gantt Chart
• Heat Map
• Histogram
• Matrix
• Scatter Plot (2D or 3D)
• Text Tables
• Timeline
• Treemap
• Word Cloud

Common data visualization tools
• The R language (ggplot2, ggplot) and Python are used in academia.
• The most familiar tool for ordinary users is MS Excel.
• Commercial products include Tableau, FineReport, Power BI
Commonly used Python Libraries For Data Visualization
Matplotlib
• Matplotlib is the most popular data visualization library of Python
• It is a 2D plotting library.
Plotly
• Plotly is a web-based toolkit to form data visualisations.
Seaborn
• Seaborn is a Python data visualization library based on Matplotlib.
ggplot
• ggplot is also a declarative style library
Different types of chart using Matplotlib
• Bar plot ( distribution)
• Histogram plot ( distribution)
• Area plot ( deviation )
• Box plot ( distribution)
• Scatter plot (correlation)
• Waffle chart ( composition)
• Word Cloud (word frequency analysis)

Import matplotlib
• the package is imported into the Python script by
adding the following statement
• import matplotlib.pyplot as plt
• #Plotting to canvas
• plt.plot( [1,2,3] , [4,5,1] )
• #display the plot
• plt.show()

import numpy as np
from matplotlib import pyplot as plt

x = np.arange(1, 11)
y = 2 * x + 5
plt.title("Matplotlib demo")
plt.xlabel("x axis caption")
plt.ylabel("y axis caption")
plt.plot(x, y)
plt.show()

Color plotting

Character   Color
'b'         Blue
'g'         Green
'r'         Red
'c'         Cyan
'm'         Magenta
'y'         Yellow
'k'         Black
'w'         White

Sr.No.   Character   Description
1        '-'         Solid line style
2        '--'        Dashed line style
3        '-.'        Dash-dot line style
4        ':'         Dotted line style
5        '.'         Point marker
6        ','         Pixel marker
7        'o'         Circle marker
8        'v'         Triangle_down marker
9        '^'         Triangle_up marker
10       '<'         Triangle_left marker
11       '>'         Triangle_right marker
12       '1'         Tri_down marker
13       '2'         Tri_up marker
14       '3'         Tri_left marker
15       '4'         Tri_right marker
16       's'         Square marker
17       'p'         Pentagon marker
18       '*'         Star marker
19       'h'         Hexagon1 marker
20       'H'         Hexagon2 marker
21       '+'         Plus marker
22       'x'         X marker
23       'D'         Diamond marker
24       'd'         Thin_diamond marker
25       '|'         Vline marker
26       '_'         Hline marker
• import numpy as np
• import matplotlib.pyplot as plt
• x = np.arange(1,11)
• y=2*x+5
• plt.title("Matplotlib demo")
• plt.xlabel("x axis caption")
• plt.ylabel("y axis caption")
• plt.plot(x,y,"bo")
• plt.show()

Bar plot
• A bar plot (or bar chart) is one of the most common types of plot
• It shows the relationship between a numerical variable and
a categorical variable
• An example is a bar chart displaying the height of several individuals
• The length of each bar represents the magnitude of the variable
• A bar plot can also display values for several levels of grouping
[Figure: example bar chart with labels Mercury, Venus, Earth, Mars, Jupiter and values 156, 160, 163, 139, 212]
[Figure: three panels titled Basic Bar plot, Horizontal Bar plot, Color Bar plot]
matplotlib.pyplot.bar
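• Typical call: matplotlib.pyplot.bar(x, height, width=0.8, color='b', align='center')
• x : the x coordinates of the bars
• height : the height(s) of the bars
• width : the width of the bars (default 0.8)
• color : the fill color of the bars
• align : {'center', 'edge'}, how the bars are aligned to the x coordinates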
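To make the width and align parameters concrete, the sketch below draws the same made-up bars twice, once with align='center' and once with align='edge', and reads back the left edge of the first bar in each panel. The data values and the output file name are assumptions for illustration only.

```python
import matplotlib.pyplot as plt

x = [0, 1, 2]          # x coordinates of the bars
height = [3, 5, 2]     # bar heights (made-up values)

fig, (ax_center, ax_edge) = plt.subplots(1, 2, figsize=(8, 3))

# align='center': each bar is centred on its x coordinate
ax_center.bar(x, height, width=0.8, color="b", align="center")
ax_center.set_title("align='center'")

# align='edge': the left edge of each bar sits on its x coordinate
ax_edge.bar(x, height, width=0.8, color="g", align="edge")
ax_edge.set_title("align='edge'")

# Left edge of the first bar in each panel
first_center = ax_center.patches[0].get_x()   # x - width / 2
first_edge = ax_edge.patches[0].get_x()       # x itself
print(first_center, first_edge)

fig.savefig("bar_align.png")   # hypothetical output file name
```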

import numpy as np
import matplotlib.pyplot as plt

# vertical bar chart
objects = ('Python', 'C++', 'Java', 'Perl', 'Scala', 'Lisp')
x = np.arange(len(objects))  # bar positions 0 to 5
y = [10, 8, 6, 4, 2, 1]

plt.bar(x, y, align='center', alpha=0.5)

plt.xticks(x, objects)  # label each bar position with the language name
plt.ylabel('Usage')
plt.title('Programming language usage')

plt.show()

[Figure: vertical bar chart of usage: Python 10, C++ 8, Java 6, Perl 4, Scala 2, Lisp 1]
Horizontal bar chart
• import numpy as np
• import matplotlib.pyplot as plt
• # horizontal bar chart
• objects = ('Python', 'C++', 'Java', 'Perl', 'Scala',
'Lisp')
• x = np.arange(len(objects))
• y = [10,8,6,4,2,1]

• plt.barh(x, y, align='center')
• plt.yticks(x, objects)
• plt.xlabel('Usage')
• plt.title('Programming language usage')

• plt.show()
Bar chart – To compare two data series
import numpy as np
import matplotlib.pyplot as plt

# data to plot
frank = (90, 55, 40, 65)
guido = (85, 62, 54, 20)

x = np.arange(4)
width = 0.35  # bar width; the two series are shifted so they sit side by side
rect1 = plt.bar(x - width / 2, frank, width, color='b', label='Frank')
rect2 = plt.bar(x + width / 2, guido, width, color='g', label='Guido')
plt.xlabel('Person')
plt.ylabel('Scores')
plt.title('Scores by person')
plt.xticks(x, ('A', 'B', 'C', 'D'))  # one tick per group, at positions 0 to 3
plt.legend()
plt.show()
Histogram
• A histogram is a graphical representation of the
distribution of numerical data.
• It takes only one numerical variable as input.
• The variable is cut into several bins, and the number of observations
per bin is represented by the height of the bar.
• The shape of the histogram changes with the number of bins
we set.
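The effect of the number of bins can be checked numerically. The sketch below bins one made-up sample (500 normally distributed values, an assumption for illustration) into 5 bins and into 20 bins with np.histogram.

```python
import numpy as np

rng = np.random.default_rng(seed=0)            # fixed seed, made-up sample
data = rng.normal(loc=50, scale=10, size=500)

# The same data cut into 5 bins and into 20 bins
counts_5, edges_5 = np.histogram(data, bins=5)
counts_20, edges_20 = np.histogram(data, bins=20)

print(counts_5)     # 5 bar heights: coarse shape
print(counts_20)    # 20 bar heights: finer, more jagged shape

# Every observation lands in exactly one bin in both cases
print(counts_5.sum(), counts_20.sum())
```

There is always one more bin edge than there are bins, which is why np.histogram returns the edges as a separate array.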

• Case study: frequency distribution (continuous) of the number
(population) of new immigrants from the various countries to Canada
in 2013
• Answer: we need to plot a histogram
• Using numpy (df_Canada is the dataframe holding the immigration data):
import numpy as np
# np.histogram returns 2 values:
# count is the frequency, bin_edges are the bin ranges
count, bin_edges = np.histogram(df_Canada['2013'])
print(count)
print(bin_edges)

Histogram – using matplotlib
import numpy as np
import matplotlib.pyplot as plt
# df_Canada is the dataframe holding the immigration data
df_Canada['2013'].plot(kind='hist', figsize=(8, 5))
plt.title('Histogram of Immigrants from 195 countries in 2013')
plt.xlabel('Number of Immigrants')
plt.ylabel('Number of Countries')
plt.show()

Basic area plot
• An area chart is similar to a line chart, except that the area
between the x axis and the line is filled in with color or shading.
• It represents the evolution of one numerical
variable as a function of another numerical variable
• In Python, an area chart can be drawn using the fill_between function
of matplotlib
import numpy as np
import matplotlib.pyplot as plt
x = range(1, 6)
y = [1, 4, 6, 8, 4]
# Area plot
plt.fill_between(x, y)
plt.show()
Area plot – more than one series
# df_country is the dataframe
import numpy as np
import matplotlib.pyplot as plt
# stacked must be the boolean False; the string 'false' would count as True
df_country.plot(kind='area', figsize=(10, 10), stacked=False)
plt.title('Immigration trend of Top 5 countries')
plt.legend()
plt.ylabel('Number of Immigrants')
plt.xlabel('Years')
plt.show()

Box Plot
• A box plot is also one of the most common types of graphic.
• It gives a compact summary of one or several numeric
variables.
• The line that divides the box into 2 parts represents
the median of the data
• The ends of the box show the upper and lower quartiles.
• The extreme lines (whiskers) show the highest and lowest values,
excluding outliers
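The quantities a box plot draws can be computed directly. This is a minimal sketch on a made-up sample (an assumption for illustration) that compares NumPy percentiles with the statistics Matplotlib derives for its box plots via matplotlib.cbook.boxplot_stats.

```python
import numpy as np
from matplotlib import cbook

# Made-up sample with one obvious outlier (40)
data = np.array([2, 4, 4, 5, 6, 7, 8, 9, 10, 40])

# Quartiles and median computed directly
q1, median, q3 = np.percentile(data, [25, 50, 75])

# The same numbers as Matplotlib computes them for a box plot
stats = cbook.boxplot_stats(data)[0]

print("median:", median, stats["med"])
print("box:", (q1, q3), (stats["q1"], stats["q3"]))
print("whiskers:", stats["whislo"], stats["whishi"])
print("outliers:", stats["fliers"])
```

With the default settings the whiskers stop at the most extreme data points within 1.5 times the interquartile range of the box, so the value 40 is reported as an outlier rather than stretching the upper whisker.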

Box plot
• import numpy as np
• import pandas as pd
• import matplotlib.pyplot as plt
• # load the dataset
• df = pd.read_csv("tips.csv")
• # display 5 rows of dataset
• df.head()
• df.boxplot(by='day', column=['total_bill'], grid=False)
• plt.show()

SCATTER PLOT
• Scatter plots are useful for showing the association or correlation
between two variables
• A correlation can be quantified, for example with a line of best fit, which
can be drawn as a line plot on the same chart to make the relationship
clearer.
• A scatter plot (dot plot) displays the values of 2 sets of data on 2
dimensions.
• Each dot represents an observation.
• The position on the X (horizontal) and Y (vertical) axis represents the
values of the 2 variables
• It is common to provide even more information using colors or shapes
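A minimal sketch of quantifying a correlation and drawing the line of best fit on the same chart. The data are made up (a noisy straight line), and the use of np.corrcoef and np.polyfit is one possible choice rather than a method prescribed by the slides; the output file name is also an assumption.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(seed=1)                   # made-up data
x = np.arange(1, 51)
y = 2.0 * x + 5.0 + rng.normal(scale=4.0, size=x.size)

# Quantify the relationship: correlation coefficient and least-squares line
r = np.corrcoef(x, y)[0, 1]
slope, intercept = np.polyfit(x, y, deg=1)

fig, ax = plt.subplots()
ax.plot(x, y, "bo", label="observations")             # the scatter of points
ax.plot(x, slope * x + intercept, "r-", label="line of best fit")
ax.set_xlabel("X")
ax.set_ylabel("Y")
ax.set_title(f"r = {r:.2f}")
ax.legend(loc="best")
fig.savefig("scatter_fit.png")   # hypothetical output file name
```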
Two sets of scatter plots in same plot
• import matplotlib.pyplot as plt
• # Draw two sets of points
• # green dots
• plt.plot([1,2,3,4,5], [1,2,3,4,10], 'go')
• # blue stars
• plt.plot([1,2,3,4,5], [2,3,4,5,11], 'b*')
• plt.show()

Add the basic plot features:
Title, Legend, X and Y axis labels

• plt.plot([1,2,3,4,5], [1,2,3,4,10], 'go', label='GreenDots')


• plt.plot([1,2,3,4,5], [2,3,4,5,11], 'b*', label='Bluestars')
• plt.title('A Simple Scatterplot')
• plt.xlabel('X')
• plt.ylabel('Y')
• plt.legend(loc='best')
• # legend text comes from the plot's label parameter.
• plt.show()

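Horizontal subplot (two rows, one column)
import numpy as np
import matplotlib.pyplot as plt

t = np.arange(0.0, 20.0, 1)
s = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20]

# upper panel
plt.subplot(2, 1, 1)
plt.title('subplot(2,1,1)')
plt.plot(t, s)

# lower panel
plt.subplot(2, 1, 2)
plt.title('subplot(2,1,2)')
plt.plot(t, s, 'r-')

plt.show()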

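Vertical subplot (one row, two columns)
import numpy as np
import matplotlib.pyplot as plt

t = np.arange(0.0, 20.0, 1)
s = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20]

# left panel, with tick marks hidden
plt.subplot(1, 2, 1)
plt.xticks([])
plt.yticks([])
plt.title('subplot(1,2,1)')
plt.plot(t, s)

# right panel, with tick marks hidden
plt.subplot(1, 2, 2)
plt.xticks([])
plt.yticks([])
plt.title('subplot(1,2,2)')
plt.plot(t, s, 'r-')

plt.show()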
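The same kind of panel layout can also be built with plt.subplots, which creates the figure and all axes in one call. This is a sketch of that alternative, not the approach used on the slides; the 2 x 2 grid, the format strings and the output file name are assumptions for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

t = np.arange(0.0, 20.0, 1)
s = np.arange(1, 21)

# One call creates the figure and a 2 x 2 grid of axes
fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(8, 6))

styles = ["b-", "r-", "go", "k--"]           # one format string per panel
for ax, style in zip(axes.flat, styles):
    ax.plot(t, s, style)
    ax.set_title(f"style {style!r}")

fig.tight_layout()                            # keep titles from overlapping
fig.savefig("subplots_grid.png")   # hypothetical output file name
```

Each panel is addressed through its own axes object, which avoids relying on whichever subplot happens to be "current".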
Draw 2 scatterplots in different panels
• # Left hand side plot
• # (nRows, nColumns, axes number to plot)
• plt.subplot(1,2,1)
• # green dots
• plt.plot([1,2,3,4,5], [1,2,3,4,10], 'go')
• plt.title('Scatterplot Greendots')
• plt.xlabel('X'); plt.ylabel('Y')
• plt.xlim(0, 6); plt.ylim(0, 12)
• # Right hand side plot
• plt.subplot(1,2,2)
• # blue stars
• plt.plot([1,2,3,4,5], [2,3,4,5,11], 'b*')
• plt.title('Scatterplot Bluestars')
• plt.xlabel('X'); plt.ylabel('Y')
• plt.xlim(0, 6); plt.ylim(0, 12)
• plt.show()
Basic Matplotlib scatterplot - dataframe

import matplotlib.pyplot as plt


import numpy as np
import pandas as pd

# Create a dataset:
df=pd.DataFrame(
{
'x': range(1,101),
'y': np.random.randn(100)*15+ range(1,101)
}
)

# plot
plt.plot( 'x', 'y', data=df, linestyle='none', marker='o')
plt.show()

import numpy as np
import matplotlib.pyplot as plt

# Create dataset
N = 60
g1 = (0.6 + 0.6 * np.random.rand(N), np.random.rand(N), 0.4 + 0.1 * np.random.rand(N))
g2 = (0.4 + 0.3 * np.random.rand(N), 0.5 * np.random.rand(N), 0.1 * np.random.rand(N))
g3 = (0.3 * np.random.rand(N), 0.3 * np.random.rand(N), 0.3 * np.random.rand(N))
data = (g1, g2, g3)
colors = ("red", "green", "blue")
groups = ("coffee", "tea", "water")

fig = plt.figure()
# Request 3D axes directly; recent Matplotlib releases no longer accept
# fig.gca(projection='3d')
ax = fig.add_subplot(1, 1, 1, projection='3d')
for (x, y, z), color, group in zip(data, colors, groups):
    ax.scatter(x, y, z, alpha=0.8, c=color, s=30, label=group)
plt.title('Matplot 3d scatter plot')
plt.legend(loc='best')
plt.show()
Note: zip() pairs up the items at the same index of several containers so that they can be used together as a single entity.
gca(): the "get current axes" function
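A short sketch of what zip() does, using the colour and group names from the 3D scatter example:

```python
colors = ("red", "green", "blue")
groups = ("coffee", "tea", "water")

# zip() yields one tuple per index:
# (colors[0], groups[0]), (colors[1], groups[1]), ...
pairs = list(zip(colors, groups))
print(pairs)

# Typical use: walk through several containers in step
for color, group in zip(colors, groups):
    print(f"{group} is drawn in {color}")
```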
Introduction to seaborn
• Seaborn is a library for making statistical graphics in Python
• It is built on top of matplotlib
• Seaborn uses matplotlib to draw plots
• It is closely integrated with pandas data structures
• Seaborn aims to make visualization a central part of exploring and
understanding data.
• It has dataset-oriented plotting functions that operate on dataframes
and arrays containing whole datasets
• Many tasks can be accomplished with only seaborn functions, but
further customization might require using matplotlib directly

Seaborn Functionality
• A dataset-oriented API for examining relationships between multiple variables
• Specialized support for using categorical variables to
show observations or aggregate statistics
• Options for visualizing univariate or bivariate distributions and
for comparing them between subsets of data
• Automatic estimation and plotting of linear regression models
for different kinds of dependent variables
• Convenient views onto the overall structure of complex datasets
• High-level abstractions for plotting multi-plot grids that help to easily
build complex visualizations
• Concise control over matplotlib figure styling with several built-in themes
• Tools for choosing color palettes that faithfully reveal patterns in your data

Scatterplot using Seaborn
• A scatterplot is helpful in visualizing
relationships between two variables.
• Each point shows an observation in
the dataset
• These observations are represented
by dot-like structures
• The plot shows the joint distribution
of two variables using a cloud of
points
• To draw the scatter plot,
the relplot() function of the seaborn
library is used
Import packages
• import numpy as np
• import matplotlib.pyplot as plt
• import seaborn as sns
• import pandas as pd

• sns.relplot(x="Views", y="Upvotes", data=df)
• The parameters x, y, and data give the variables on the X-axis and Y-axis
and the dataframe that holds them
• Hue plot:
• To see the tag associated with each data point, differentiate the points by colour
with the hue parameter
• sns.relplot(x="Views", y="Upvotes", hue="Tag", data=df)

• If the hue semantic is categorical, each category gets its own distinct colour
• For example, the Tag attribute in the dataframe has values (a,b,j,p,…)
• If the hue semantic is numeric, the colouring becomes
sequential (the colour shades run from light to dark)
• sns.relplot(x="Views", y="Upvotes", hue="Answers", data=df)
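A self-contained sketch of the two hue cases. The dataframe is a small made-up stand-in that reuses the column names Views, Upvotes, Tag and Answers from the calls above; its values and the output file names are assumptions for illustration.

```python
import pandas as pd
import seaborn as sns

# Small made-up stand-in for the dataframe used on the slides
df = pd.DataFrame({
    "Views":   [120, 340, 560, 80, 910, 450],
    "Upvotes": [10, 42, 65, 3, 120, 38],
    "Tag":     ["a", "b", "j", "p", "a", "b"],   # categorical column
    "Answers": [1, 3, 5, 0, 9, 4],               # numeric column
})

# Categorical hue: one distinct colour per tag
g_cat = sns.relplot(x="Views", y="Upvotes", hue="Tag", data=df)

# Numeric hue: sequential colour ramp
g_num = sns.relplot(x="Views", y="Upvotes", hue="Answers", data=df)

g_cat.savefig("relplot_tag.png")       # hypothetical output file names
g_num.savefig("relplot_answers.png")
```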
Seaborn Lmplots
• Lmplots are used for linear regression data analysis
• Linear regression is a statistical technique for predictive analytics
• It predicts an outcome (dependent) variable
• It analyses which variables in particular are significant predictors of the
outcome variable
• sns.lmplot() has three mandatory parameters; the rest are
optional
• These 3 parameters are the values for the X-axis, the values for the Y-axis and
a reference to the dataset
• The optional hue parameter takes a categorical column and
groups the plotted data by its values
Load the dataset

The straight line across the plot is the best available fit for the trend of the tip customers usually give with respect to
the total bill.
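A minimal sketch of an lmplot of tip against total bill. The dataframe is made up so that the example runs without downloading anything; the column names total_bill and tip follow the tips data used elsewhere in these slides, while the smoker column, the generated values and the output file names are assumptions.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Made-up bills and tips standing in for the dataset on the slide
rng = np.random.default_rng(seed=2)
total_bill = rng.uniform(10, 50, size=80)
tips = pd.DataFrame({
    "total_bill": total_bill,
    "tip": 0.15 * total_bill + rng.normal(scale=0.8, size=80),
    "smoker": rng.choice(["Yes", "No"], size=80),   # categorical column for hue
})

# The three mandatory parameters: x, y and the dataframe
g = sns.lmplot(x="total_bill", y="tip", data=tips)

# Optional hue: one regression line per category
g_hue = sns.lmplot(x="total_bill", y="tip", hue="smoker", data=tips)

g.savefig("lmplot_tips.png")           # hypothetical output file names
g_hue.savefig("lmplot_tips_hue.png")
```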
Seaborn Regplots:
• regplot() is very similar to lmplot()
• regplot() is more flexible in its mandatory input parameters
• The x and y variables do NOT have to be strings (column names)
• Unlike lmplot(), these two parameters also accept other formats, such as
• NumPy arrays
• Pandas Series objects
• references to variables in a Pandas DataFrame object
• Sample program
import numpy as np
import seaborn as sns
# x and y must have the same length
x = np.random.randn(100)
y = np.random.randn(100)
sns.regplot(x=x, y=y, color='red')
seaborn.boxplot
• A box plot (or box-and-whisker plot) shows the distribution of
quantitative data
• The box shows the quartiles of the dataset
• The whiskers extend to show the rest of the distribution, except for
points that are determined to be “outliers”
• Draw a single horizontal boxplot:
import seaborn as sns
sns.set(style="whitegrid")
tips = sns.load_dataset("tips")
ax = sns.boxplot(x=tips["total_bill"])

Advanced Visualisation tool-WordCloud
• The wordcloud library was developed by Andreas Mueller
• A word cloud (or tag cloud) is a visual representation of text data
• It displays a list of words, with the importance of each word shown by
font size or color
• A word cloud is a technique to show which words are the most frequent
in the given text
• In other words, it is a cloud filled with words in different sizes, which represent
the frequency or the importance of each word
• Wordcloud needs the Pillow library, a package that enables image reading
• Pillow is a maintained fork of PIL, the Python Imaging Library
• The library is required to read in an image as the mask for the word cloud
Simple example –word cloud
from wordcloud import WordCloud
import matplotlib.pyplot as plt
# Create the text (a string of words)
text = ("Python Python Python Matplotlib Matplotlib Seaborn Network "
        "Plot Violin Chart Pandas Datascience Wordcloud Spider Radar Parallel "
        "Alpha Color Brewer Density Scatter Barplot Barplot Boxplot Violinplot "
        "Treemap Stacked Area Chart Chart Visualization Dataviz Donut Pie "
        "Time-Series Wordcloud Wordcloud Sankey Bubble")
# Create the wordcloud object
wordcloud = WordCloud(width=480, height=480, margin=0).generate(text)
# Display the generated image:
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.margins(x=0, y=0)
plt.show()

Word cloud with specific shape

• We can use an external image as a mask to give a specific shape to the
word cloud.
• We need to have the image in the current directory and pass it to the
WordCloud constructor through the mask parameter.

Steps to mask an image in wordcloud
Step 1. load libraries
from wordcloud import WordCloud
import matplotlib.pyplot as plt
import numpy as np
Step 2. library to load the image
from PIL import Image
Step 3. Create the text
text = ("Data visualization or data visualisation is viewed by many disciplines as a "
        "modern equivalent of visual communication. It involves the creation and study of "
        "the visual representation of data, meaning information that has been abstracted in "
        "some schematic form, including attributes or variables for the units of information. "
        "A primary goal of data visualization is to communicate information clearly and "
        "efficiently via statistical graphics, plots and information graphics. Tables are "
        "generally used where users will look up a specific measurement, while charts of "
        "various types are used to show patterns or relationships in the data for one or "
        "more variables")
Step 4. Load the image
wave_mask = np.array(Image.open("wave.jpg"))
Step 5. Make the figure
wordcloud = WordCloud(mask=wave_mask).generate(text)
plt.figure()
plt.imshow(wordcloud, interpolation="bilinear")
plt.axis("off")
plt.margins(x=0, y=0)
plt.show()

Wordcloud interpolation methods

Advanced visualization tool-Waffle charts
• PyWaffle is an open source, MIT-licensed Python package for
plotting waffle charts
• A waffle chart is an interesting visualization that is normally
created to display progress toward goals
• It is also known as the "square pie chart"
• The waffle chart has the shape of a square with small
squares inside the big square
• Each small square represents one percent of a total of one hundred
• It is an effective option for adding interesting visualization
features to a visual that consists mainly of cells, such as an Excel
dashboard
• It is ideal for displaying percent contributions and simple distributions
over a single dimension
• It is used to show the percentage of a certain category compared
to all the categories

Basic Example: Waffle chart
• Plot a 5-row, 10-column chart with a list of values
import matplotlib.pyplot as plt
from pywaffle import Waffle
fig = plt.figure(
    FigureClass=Waffle,
    rows=5,
    columns=10,
    values=[48, 46, 6]
)
plt.show()

Waffle chart
The values parameter also accepts a dict.
The keys of the dict are used as labels and legend entries.

fig = plt.figure(
    FigureClass=Waffle, rows=5, columns=10,
    values={'Cat1': 25, 'Cat2': 15, 'Cat3': 10},
    legend={'loc': 'upper right'}
)

Spatial Visualizations and Analysis in Python with Folium
• Folium is a Python library that lets us visualize spatial data in
an interactive manner
• Folium makes it easy to visualize data that has been manipulated in Python
on an interactive Leaflet map
• Leaflet is the leading open-source JavaScript library for mobile-friendly
interactive maps
• The library has a number of built-in tilesets from OpenStreetMap,
Mapbox, and Stamen
• Folium supports Image, Video, GeoJSON and TopoJSON overlays
• By default, Folium creates a map in a separate HTML file
• In a Jupyter notebook, maps are displayed inline
• Creating a simple interactive web map
• The first thing we need to do is create a Map instance
import folium
# Create a Map instance
m = folium.Map(location=[60.25, 24.8], zoom_start=10, control_scale=True)
• The first parameter, location, takes a pair of latitude, longitude values
(it determines where the map is positioned when the user opens it)
• The zoom_start parameter sets the default zoom level of the map
(the higher the number, the closer the zoom)
• control_scale defines whether the map should have a scale bar
# Save the map as an HTML file, base_map.html
outfp = "base_map.html"
m.save(outfp)
• To create a base map, simply pass the starting coordinates to Folium
• import folium
• m = folium.Map(location=[45.5236, -122.6750])

Use StamenToner tileset
folium.Map(
location=[45.5236, -122.6750],
tiles='Stamen Toner',
zoom_start=13
)

Adding layers : Leaflet’s Circle and CircleMarker
m = folium.Map(
location=[45.5236, -122.6750],
tiles='Stamen Toner',
zoom_start=13
)
folium.CircleMarker(
location=[45.5215, -122.6261],
radius=50,
popup='Laurelhurst Park',
color='#3186cc',
fill=True,
fill_color='#3186cc'
).add_to(m)

Case study - An e-commerce company wants to
get into logistics: “Deliver4U”
• It wants to know the pattern of maximum pickup calls from different
areas of the city throughout the day
• For this, the company uses its existing customer data in Delhi to find
the highest density of probable pickup locations in the future, in order to:
i) Build the optimum number of stations where its pickup and delivery
personnel will be located.
ii) Ensure pickup personnel reach the pickup location at the earliest
possible time.
Case study – using folium
Development Tools (Stack)
• Python 3
• PyData stack (Pandas, matplotlib, seaborn)
• Folium
Setting Up Environment
import folium
from folium import plugins
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
• Data set: downloaded from the location specified by
the trainer.
• The dataset contains two separate data files,
train_del.csv and test_del.csv.
• The difference is that train_del.csv contains an additional column,
trip_duration.
• Drop the trip_duration column and
combine the 2 files (train_del.csv and test_del.csv) into one
dataframe
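The drop-and-combine step could look like the sketch below. Two tiny made-up dataframes stand in for train_del.csv and test_del.csv so that the example runs on its own; the column names pickup_latitude and pickup_longitude are taken from the heat map code in this case study, and the values are invented.

```python
import pandas as pd

# Tiny made-up frames standing in for train_del.csv and test_del.csv;
# with the real files these would come from pd.read_csv(...)
train = pd.DataFrame({
    "pickup_latitude":  [28.61, 28.70],
    "pickup_longitude": [77.21, 77.10],
    "trip_duration":    [600, 1250],      # only present in the training file
})
test = pd.DataFrame({
    "pickup_latitude":  [28.55],
    "pickup_longitude": [77.25],
})

# Drop the extra column, then stack the two files into one dataframe
df = pd.concat(
    [train.drop(columns=["trip_duration"]), test],
    ignore_index=True,
)
print(df.shape)
print(df.columns.tolist())
```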
Reference :
x = lambda a : a + 10
print(x(5))
• First step: generate the base map

• Second step: visualize the rides data using the HeatMap class
from folium.plugins
Generate the heat map with this class and
add the layer to the existing base map
• Using data from May (5th month) to June 2016 to generate the heat map
df_copy = df[df.month > 4].copy()
df_copy['count'] = 1
# HeatMap lives in folium.plugins (imported above as "plugins")
plugins.HeatMap(
    data=df_copy[['pickup_latitude', 'pickup_longitude', 'count']]
        .groupby(['pickup_latitude', 'pickup_longitude'])
        .sum().reset_index().values.tolist(),
    radius=8, max_zoom=13
).add_to(base_map)
• The heat map shows spots where
the demand is higher than in the surrounding areas
• Demand for cabs is high in the areas marked by the heat map, which
are most probably central Delhi and its surrounding areas
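To see what data the heat map layer receives, the sketch below runs the same filter and groupby chain on a few made-up pickups using pandas alone. The coordinates and months are invented for illustration; the column names follow the heat map code.

```python
import pandas as pd

# Made-up pickups; two of them share the same coordinates
df = pd.DataFrame({
    "pickup_latitude":  [28.61, 28.61, 28.70, 28.55],
    "pickup_longitude": [77.21, 77.21, 77.10, 77.25],
    "month":            [5, 6, 5, 3],
})

df_copy = df[df.month > 4].copy()     # keep May and June only
df_copy["count"] = 1                  # each ride counts once

# One [latitude, longitude, weight] triple per distinct pickup point
heat_data = (
    df_copy[["pickup_latitude", "pickup_longitude", "count"]]
    .groupby(["pickup_latitude", "pickup_longitude"])
    .sum()
    .reset_index()
    .values.tolist()
)
print(heat_data)
```

Pickup points that occur several times end up with a larger weight, which is what makes them show up as hotter spots on the map.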
