
9/14/2019 Uber_Analysis - Jupyter Notebook

In [45]: %pylab inline


import pandas
import seaborn
import matplotlib.pyplot as plt

Populating the interactive namespace from numpy and matplotlib

Load CSV file into Memory


In [2]: data=pandas.read_csv('uber-raw-data-apr14.csv')

In [3]: data.head()

Out[3]:
         Date/Time      Lat       Lon    Base
0  04/01/2014 0:11  40.7690  -73.9549  B02512
1  04/01/2014 0:17  40.7267  -74.0345  B02512
2  04/01/2014 0:21  40.7316  -73.9873  B02512
3  04/01/2014 0:28  40.7588  -73.9776  B02512
4  04/01/2014 0:33  40.7594  -73.9722  B02512

Adding useful Columns


In [4]: data['Date/Time']=data['Date/Time'].map(pandas.to_datetime)
#This can take a while: pandas.to_datetime is called once per cell, and the file has over 564,000 rows
#Each string in the Date/Time column is converted to a pandas Timestamp

In [5]: def get_dom(dt):
    #day of the month (1 to 30 for April)
    return dt.day

data['dom']=data['Date/Time'].map(get_dom)

In [6]: def get_weekday(dt):
    #weekday() returns 0 for Monday through 6 for Sunday
    return dt.weekday()

data['weekday']=data['Date/Time'].map(get_weekday)

In [7]: def get_hour(dt):
    #hour of the day (0 to 23)
    return dt.hour

data['hour']=data['Date/Time'].map(get_hour)
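The three helper columns can also be built without per-cell function calls. The following is a minimal sketch rather than the notebook's own method: it assumes the CSV stores timestamps as month/day/year hour:minute (which the data.head() output suggests), and it uses a three-row stand-in frame named sample in place of the real file.

```python
import pandas as pd

# Stand-in for the CSV: the first three Date/Time strings shown by data.head()
sample = pd.DataFrame(
    {"Date/Time": ["04/01/2014 0:11", "04/01/2014 0:17", "04/01/2014 0:21"]}
)

# Parse the whole column in one call; the explicit format is an assumption
# based on the values above and avoids guessing the format row by row
sample["Date/Time"] = pd.to_datetime(sample["Date/Time"], format="%m/%d/%Y %H:%M")

# The .dt accessor exposes the same fields the helper functions return
sample["dom"] = sample["Date/Time"].dt.day
sample["weekday"] = sample["Date/Time"].dt.weekday  # 0 = Monday
sample["hour"] = sample["Date/Time"].dt.hour

print(sample)
```

On a column of this size, the single vectorised call is usually much faster than mapping a Python function over every cell.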

localhost:8888/notebooks/Uber_Analysis.ipynb 1/10
In [8]: data.tail()

Out[8]:
                  Date/Time      Lat       Lon    Base  dom  weekday  hour
564511  2014-04-30 23:22:00  40.7640  -73.9744  B02764   30        2    23
564512  2014-04-30 23:26:00  40.7629  -73.9672  B02764   30        2    23
564513  2014-04-30 23:31:00  40.7443  -73.9889  B02764   30        2    23
564514  2014-04-30 23:32:00  40.6756  -73.9405  B02764   30        2    23
564515  2014-04-30 23:48:00  40.6880  -73.9608  B02764   30        2    23

Start the Analysis


In [13]: hist(data.dom,bins=30,rwidth=0.8,range=(0.5,30.5))

#hist defaults to 10 bins; April has 30 days, so bins=30 gives one bar per day
#rwidth=0.8 draws each bar at 80% of its bin width, which leaves a gap between bars
#range=(0.5,30.5) puts the bin edges on half-integers, so each day 1..30 sits in the middle of its own bin
xlabel('Day Of Month')
ylabel('Frequency')
title('Uber Frequency details for Apr 14')

#xlabel/ylabel may fail with "'str' object is not callable"


#That usually means the name xlabel or ylabel was rebound to a string earlier in the session
#Close the notebook and restart the kernel to resolve this

Out[13]: Text(0.5, 1.0, 'Uber Frequency details for Apr 14')
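The effect of bins=30 together with range=(0.5,30.5) can be checked without drawing anything. This is a small sketch using numpy.histogram, which takes the same bins and range arguments as hist; the days array is made-up illustration data, not the real dom column.

```python
import numpy as np

# Hypothetical day-of-month values standing in for data.dom
days = np.array([1, 1, 2, 15, 30, 30, 30])

# Same binning arguments as the hist call above
counts, edges = np.histogram(days, bins=30, range=(0.5, 30.5))

# 31 edges at 0.5, 1.5, ..., 30.5: every bin is one day wide and centred on an integer
print(edges[:3], edges[-1])

# counts[d - 1] is the number of values equal to day d
print(counts[0], counts[1], counts[14], counts[29])
```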

In [14]: for k,rows in data.groupby('dom'):
    print((k,len(rows)))

#Prints (day, number of rows) for each day of the month

(1, 14546)
(2, 17474)
(3, 20701)
(4, 26714)
(5, 19521)
(6, 13445)
(7, 19550)
(8, 16188)
(9, 16843)
(10, 20041)
(11, 20420)
(12, 18170)
(13, 12112)
(14, 12674)
(15, 20641)
(16, 17717)
(17, 20973)
(18, 18074)
(19, 14602)
(20, 11017)
(21, 13162)
(22, 16975)
(23, 20346)
(24, 23352)
(25, 25095)
(26, 24925)
(27, 14677)
(28, 15475)
(29, 22835)
(30, 36251)

In [16]: def count_rows(rows):
    return len(rows)

by_date=data.groupby('dom').apply(count_rows)
#Same counts as the loop above, but returned as a Series indexed by dom, which is the more idiomatic pandas approach
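pandas also has built-in ways to count rows per group, so a helper such as count_rows is optional. The sketch below shows two of them on a made-up six-row frame named sample; it is an alternative to the cell above, not a description of what the notebook ran.

```python
import pandas as pd

# Hypothetical stand-in for the dom column of the real data
sample = pd.DataFrame({"dom": [1, 1, 2, 3, 3, 3]})

# size() returns the number of rows in each group as a Series indexed by dom
by_date = sample.groupby("dom").size()

# value_counts() gives the same numbers; sort_index() orders them by day
by_date_alt = sample["dom"].value_counts().sort_index()

print(by_date)
```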

In [17]: plot(by_date)
#Line plot of the per-day counts computed above

Out[17]: [<matplotlib.lines.Line2D at 0x20a9fe9cf28>]

In [19]: bar(range(1,31),by_date)
#Bar Graph- same as the one plotted earlier

Out[19]: <BarContainer object of 30 artists>

In [20]: by_date_sorted=by_date.sort_values()
by_date_sorted
#Sorts the per-day counts in ascending order; the index still holds the day of the month

Out[20]: dom
20 11017
13 12112
14 12674
21 13162
6 13445
1 14546
19 14602
27 14677
28 15475
8 16188
9 16843
22 16975
2 17474
16 17717
18 18074
12 18170
5 19521
7 19550
10 20041
23 20346
11 20420
15 20641
3 20701
17 20973
29 22835
24 23352
26 24925
25 25095
4 26714
30 36251
dtype: int64
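The sorted output can be cross-checked with a few Series methods. The sketch below rebuilds the per-day counts by hand from the values printed earlier (the real by_date comes from the groupby), then asks for the total, the extremes, and the three busiest days.

```python
import pandas as pd

# Per-day row counts for days 1..30, copied from the printed output above
counts = [
    14546, 17474, 20701, 26714, 19521, 13445, 19550, 16188, 16843, 20041,
    20420, 18170, 12112, 12674, 20641, 17717, 20973, 18074, 14602, 11017,
    13162, 16975, 20346, 23352, 25095, 24925, 14677, 15475, 22835, 36251,
]
by_date = pd.Series(counts, index=range(1, 31))
by_date.index.name = "dom"

total = by_date.sum()           # should equal the number of rows in the file
busiest = by_date.idxmax()      # day with the most rows
quietest = by_date.idxmin()     # day with the fewest rows
top3 = by_date.nlargest(3)      # three busiest days, largest first

print(total, busiest, quietest)
print(top3)
```

The total agrees with data.tail(), whose last index is 564515, meaning 564516 rows.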

In [76]: plt.figure(figsize=(12,8))
bar(range(1,31),by_date_sorted)
#Bars are drawn in ascending order of count, but the x-axis ticks still read 1,2,3...

xticks(range(1,31),by_date_sorted.index)
#Relabels the 30 tick positions with the day numbers in sorted order
#When xticks is the last line of a cell, the notebook echoes its return value (tick locations and label objects)
#To prevent that, append a ';' to the end of the line

xlabel('Day Of Month')
ylabel('Frequency')
title('Uber Frequency details for Apr 14')
#axis labels and title

Out[76]: Text(0.5, 1.0, 'Uber Frequency details for Apr 14')
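The same relabelled bar chart can be drawn through an explicit Axes object, which does not depend on global names such as xlabel or xticks and so is not affected if one of them gets overwritten. This is a sketch under assumptions: it uses only the five smallest counts from the sorted output, and it selects the off-screen Agg backend so it runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen; omit this line inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Five smallest per-day counts from the sorted output, indexed by day of month
by_date_sorted = pd.Series([11017, 12112, 12674, 13162, 13445],
                           index=[20, 13, 14, 21, 6])

fig, ax = plt.subplots(figsize=(12, 8))

# Bars go at positions 0..n-1; the day numbers are attached as tick labels
positions = list(range(len(by_date_sorted)))
ax.bar(positions, by_date_sorted.values)
ax.set_xticks(positions)
ax.set_xticklabels([str(day) for day in by_date_sorted.index])

ax.set_xlabel("Day Of Month")
ax.set_ylabel("Frequency")
```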


Analysis on the basis of Weekday

In [83]: hist(data.weekday,bins=7,range=(-0.5,6.5),rwidth=0.8,color='#AA6666',alpha=0.4)
#weekday values run from 0 (Monday) to 6 (Sunday), so the seven bins are centred on 0..6
xticks(range(7),'Mon Tue Wed Thu Fri Sat Sun'.split())
;

Out[83]: ''
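The tick labels rely on weekday() numbering Monday as 0. A quick standard-library check, using three timestamps taken from the head and tail outputs above, illustrates the mapping; the names list and the Counter are only for demonstration.

```python
from collections import Counter
from datetime import datetime

# Same label order as the xticks call: index 0 is Monday
names = 'Mon Tue Wed Thu Fri Sat Sun'.split()

# One timestamp from data.head() and two from data.tail()
stamps = [
    datetime(2014, 4, 1, 0, 11),
    datetime(2014, 4, 30, 23, 22),
    datetime(2014, 4, 30, 23, 48),
]

# Translate each weekday number into its label and count the labels
per_weekday = Counter(names[s.weekday()] for s in stamps)
print(per_weekday)
```

April 30, 2014 maps to index 2, which matches the weekday column shown by data.tail().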

Cross Analysis

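In [91]: cross=data.groupby('weekday hour'.split()).apply(count_rows).unstack()
#Rows are weekdays (0-6), columns are hours (0-23), and each cell holds the number of rows for that pair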
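To see what the groupby followed by unstack produces, here is a sketch on a made-up four-row frame named sample. It uses size() instead of the count_rows helper and also shows pandas.crosstab, which builds the same table in one call; fill_value=0 is an addition so that weekday/hour pairs with no rows show 0 instead of NaN.

```python
import pandas as pd

# Hypothetical (weekday, hour) pairs standing in for the real columns
sample = pd.DataFrame({"weekday": [1, 1, 1, 2], "hour": [0, 0, 23, 0]})

# Count rows per (weekday, hour), then pivot the hour level into columns
cross = sample.groupby(["weekday", "hour"]).size().unstack(fill_value=0)

# crosstab produces the same weekday-by-hour table of counts
cross_alt = pd.crosstab(sample["weekday"], sample["hour"])

print(cross)
```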

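In [98]: plt.figure(figsize=(12,8))
seaborn.heatmap(cross)
#heatmap cells are centred at 0.5, 1.5, ..., so the weekday labels are placed at those positions
yticks(arange(7)+0.5,'Mon Tue Wed Thu Fri Sat Sun'.split())
plt.xlabel('Hour',fontsize=20)
plt.ylabel('Weekday',fontsize=20)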

Out[98]: Text(87.0, 0.5, 'Weekday')

Analysis by Latitude/Longitude

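In [119]: hist(data['Lon'],bins=100,range=(-74.2,-73.7),color='y',alpha=0.5,label='Longitude')
#Longitude histogram on the bottom x-axis

legend(loc='upper left')
twiny()
#twiny adds a second x-axis on top that shares the y-axis, so latitude can use its own value range
hist(data['Lat'],bins=100,range=(40.50,41),color='b',alpha=0.5,label='Latitude')
grid()
legend()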
;

Out[119]: ''

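In [127]: plt.figure(figsize=(12,8))
plot(data['Lon'],data['Lat'],'.',ms=1,alpha=0.5)
#One small, semi-transparent dot per row, so dense areas appear darker
grid()
xlim(-74.2,-73.7)
ylim(41,40.50)
#Note: passing the larger latitude first inverts the y-axis, so north is drawn at the bottom; ylim(40.50,41) would put north at the top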

Out[127]: (41, 40.5)
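xlim and ylim only change what is visible; rows outside the window are still in the data. If the rows themselves are needed, a boolean mask with the same bounds is one option. The sketch below is illustrative only: it uses the five coordinates from data.head() plus one invented point outside the window.

```python
import pandas as pd

# First five coordinates from data.head(), plus one made-up point far outside the window
sample = pd.DataFrame({
    "Lat": [40.7690, 40.7267, 40.7316, 40.7588, 40.7594, 42.0000],
    "Lon": [-73.9549, -74.0345, -73.9873, -73.9776, -73.9722, -72.0000],
})

# Same bounds as the xlim/ylim calls; between() includes both endpoints
in_window = sample["Lon"].between(-74.2, -73.7) & sample["Lat"].between(40.50, 41.0)
visible = sample[in_window]

print(len(visible), "of", len(sample), "points fall inside the plotted window")
```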

