Uber - Analysis - Jupyter - Notebook

9/14/2019 Uber_Analysis - Jupyter Notebook
In [45]: %pylab inline

import pandas
import seaborn
import matplotlib.pyplot as plt
Populating the interactive namespace from numpy and matplotlib
Load CSV file into Memory

In [2]: data=pandas.read_csv('uber-raw-data-apr14.csv')
In [3]: data.head()
Out[3]:
Date/Time Lat Lon Base
0 04/01/2014 0:11 40.7690 -73.9549 B02512
1 04/01/2014 0:17 40.7267 -74.0345 B02512
2 04/01/2014 0:21 40.7316 -73.9873 B02512
3 04/01/2014 0:28 40.7588 -73.9776 B02512
4 04/01/2014 0:33 40.7594 -73.9722 B02512
Adding useful Columns

In [4]: data['Date/Time']=data['Date/Time'].map(pandas.to_datetime)
#This might take a little time
#it maps each cell of Date/Time column with the Python Timestamp format
In [5]: def get_dom(dt):

return dt.day
data['dom']=data['Date/Time'].map(get_dom)
In [6]: def get_weekday(dt):

return dt.weekday()
data['weekday']=data['Date/Time'].map(get_weekday)
In [7]: def get_hour(dt):

return dt.hour
data['hour']=data['Date/Time'].map(get_hour)
localhost:8888/notebooks/Uber_Analysis.ipynb 1/10
In [8]: data.tail()
Out[8]:
Date/Time Lat Lon Base dom weekday hour
564511 2014-04-30 23:22:00 40.7640 -73.9744 B02764 30 2 23
564512 2014-04-30 23:26:00 40.7629 -73.9672 B02764 30 2 23
564513 2014-04-30 23:31:00 40.7443 -73.9889 B02764 30 2 23
564514 2014-04-30 23:32:00 40.6756 -73.9405 B02764 30 2 23
564515 2014-04-30 23:48:00 40.6880 -73.9608 B02764 30 2 23
Start the Analysis

In [13]: hist(data.dom,bins=30,rwidth=0.8,range=(0.5,30.5))
#by default histogram has 10 bins; but because we have 30 days in a month,
#so we have to take bins=30
#rwidth is to provide width between each bar
#range is to take data on xlabel from 0.5 to 30.5
xlabel('Day Of Month')
ylabel('Frequency')
title('Uber Frequency details for Apl 14')
#xlabel/ylabel might return an error as 'str object not callable'

#This might be because xlabel/ylabel might have been set as some string somewhere in the pr
#Close the notebook and restart the kernel to resolve this
Out[13]: Text(0.5, 1.0, 'Uber Frequency details for Apl 14')
In [14]: for k,rows in data.groupby('dom'):
print((k,len(rows)))
#This returns the frequency in numbers for each day of the month
(1, 14546)
(2, 17474)
(3, 20701)
(4, 26714)
(5, 19521)
(6, 13445)
(7, 19550)
(8, 16188)
(9, 16843)
(10, 20041)
(11, 20420)
(12, 18170)
(13, 12112)
(14, 12674)
(15, 20641)
(16, 17717)
(17, 20973)
(18, 18074)
(19, 14602)
(20, 11017)
(21, 13162)
(22, 16975)
(23, 20346)
(24, 23352)
(25, 25095)
(26, 24925)
(27, 14677)
(28, 15475)
(29, 22835)
(30, 36251)
In [16]: def count_rows(rows):

return len(rows)
by_date=data.groupby('dom').apply(count_rows)
#works the same as above function but is more according to pandas convention
In [17]: plot(by_date)
#plotting above Data on a graph
Out[17]: [<matplotlib.lines.Line2D at 0x20a9fe9cf28>]
In [19]: bar(range(1,31),by_date)
#Bar Graph- same as the one plotted earlier
Out[19]: <BarContainer object of 30 artists>
In [20]: by_date_sorted=by_date.sort_values()
by_date_sorted
#sorts the data as per the values
Out[20]: dom
20 11017
13 12112
14 12674
21 13162
6 13445
1 14546
19 14602
27 14677
28 15475
8 16188
9 16843
22 16975
2 17474
16 17717
18 18074
12 18170
5 19521
7 19550
10 20041
23 20346
11 20420
15 20641
3 20701
17 20973
29 22835
24 23352
26 24925
25 25095
4 26714
30 36251
dtype: int64
In [76]: plt.figure(figsize=(12,8))
bar(range(1,31),by_date_sorted)
#This plots a bar graph but x axis is still sorted as 1,2,3...
xticks(range(1,31),by_date_sorted.index)
#This sorts the x-axis according to the values in ascending order
#but it generates some random string
#To prevent that, append a ';' in the end
xlabel('Day Of Month')
ylabel('Frequency')
title('Uber Frequency details for Apl 14')
#applying x-axis,y-axis and labels
Out[76]: Text(0.5, 1.0, 'Uber Frequency details for Apl 14')
Analysis on the basis of Weekday
In [83]: hist(data.weekday,bins=7,range=(-0.5,6.5),rwidth=0.8,color='#AA6666',alpha=0.4)
xticks(range(7),'Mon Tue Wed Thu Fri Sat Sun'.split())
;
Out[83]: ''
Cross Analysis
In [91]: cross=data.groupby('weekday hour'.split()).apply(count_rows).unstack()
seaborn.heatmap(cross)
yticks(range(7),'Mon Tue Wed Thu Fri Sat Sun'.split())
plt.xlabel('Hour',fontsize=20)
plt.ylabel('Weekday',fontsize=20)
Out[98]: Text(87.0, 0.5, 'Weekday')
Analysis by Latitude/Longitude
In [119]: hist(data['Lon'],bins=100,range=(-74.2,-73.7),color='y',alpha=0.5,label='Longitude')
legend(loc='upper left')
twiny()
hist(data['Lat'],bins=100,range=(40.50,41),color='b',alpha=0.5,label='Latitude')
grid()
legend()
;
Out[119]: ''
plot(data['Lon'],data['Lat'],'.',ms=1,alpha=0.5)
grid()
xlim(-74.2,-73.7)
ylim(41,40.50)
Out[127]: (41, 40.5)
In [ ]:

Uber - Analysis - Jupyter - Notebook

Загружено:

Сведения о документе

Оригинальное название

Авторское право

Доступные форматы

Поделиться этим документом

Поделиться или встроить документ

Параметры публикации

Этот документ был вам полезен?

Это неприемлемый материал?

Авторское право:

Доступные форматы

Uber - Analysis - Jupyter - Notebook

Загружено:

Авторское право:

Доступные форматы

9/14/2019 Uber_Analysis - Jupyter Notebook

In [45]: %pylab inline

Populating the interactive namespace from numpy and matplotlib

Load CSV file into Memory

0 04/01/2014 0:11 40.7690 -73.9549 B02512

1 04/01/2014 0:17 40.7267 -74.0345 B02512

2 04/01/2014 0:21 40.7316 -73.9873 B02512

3 04/01/2014 0:28 40.7588 -73.9776 B02512

4 04/01/2014 0:33 40.7594 -73.9722 B02512

Adding useful Columns

In [5]: def get_dom(dt):

In [6]: def get_weekday(dt):

In [7]: def get_hour(dt):

564511 2014-04-30 23:22:00 40.7640 -73.9744 B02764 30 2 23

564512 2014-04-30 23:26:00 40.7629 -73.9672 B02764 30 2 23

564513 2014-04-30 23:31:00 40.7443 -73.9889 B02764 30 2 23

564514 2014-04-30 23:32:00 40.6756 -73.9405 B02764 30 2 23

564515 2014-04-30 23:48:00 40.6880 -73.9608 B02764 30 2 23

Start the Analysis

#xlabel/ylabel might return an error as 'str object not callable'

Out[13]: Text(0.5, 1.0, 'Uber Frequency details for Apl 14')

In [16]: def count_rows(rows):

Out[17]: [<matplotlib.lines.Line2D at 0x20a9fe9cf28>]

Out[19]: <BarContainer object of 30 artists>

Out[76]: Text(0.5, 1.0, 'Uber Frequency details for Apl 14')

Analysis on the basis of Weekday

In [91]: cross=data.groupby('weekday hour'.split()).apply(count_rows).unstack()

Out[98]: Text(87.0, 0.5, 'Weekday')

Out[127]: (41, 40.5)

Вам также может понравиться