Recap of Phase-2
Extension of Project
Attribute               Datatype
vendorid                number
trip_pickup_datetime    floating_timestamp
trip_dropoff_datetime   floating_timestamp
passenger_count         number
trip_distance           number
pickup_longitude        number
pickup_latitude         number
ratecodeid              number
store_and_fwd_flag      text
dropoff_longitude       number
dropoff_latitude        number
payment_type            number
fare_amount             number
extra                   number
mta_tax                 number
tip_amount              number
tolls_amount            number
total_amount            number
The 2013 NYC Taxi dataset was made available in 2014 under FOIL (the Freedom of
Information Law).
The data was requested by Chris Whong, who collected it on a hard disk, and his
analysis project was made available as open source on GitHub.
Later, the 2013 dataset was decoded by Vijay Pandurangan: two fields of the dataset,
medallion and hack_license, were de-anonymized, and the logic used was made
openly available.
In September 2014, Anthony Tockar documented at least two cases in which the database did in
fact reveal, or at least confirm, passenger data. These passengers were the
celebrities Bradley Cooper and Jessica Alba.
These two fields were removed from the 2014 and 2015 datasets, so we are limited to the
attributes listed on the previous slide.
Frequent Trip Analysis on the Data
A new user-defined type, Location, containing latitude and longitude, will be created to make
the analysis simpler.
Mapper2 Output <Key, Value>: <Pair(Round(pickup_Location), Round(dropoff_Location)), list(occurrence)>
Reducer2 Output <Key, Value>: <Pair(Round(pickup_Location), Round(dropoff_Location)), Count>
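The key/value flow above can be sketched in plain Python, without a Hadoop cluster. This is only an illustration under assumptions: the function names `round_location`, `mapper2`, `reducer2`, and `run_job` are hypothetical, trips are assumed to be dictionaries keyed by the attribute names from the recap slide, and rounding to two decimal places is an arbitrary choice.

```python
from collections import defaultdict


def round_location(lat, lon, digits=2):
    """Round a coordinate pair so that nearby points share one key (assumed precision)."""
    return (round(lat, digits), round(lon, digits))


def mapper2(trip):
    """Emit <(rounded pickup, rounded dropoff), 1> for one trip record."""
    pickup = round_location(trip["pickup_latitude"], trip["pickup_longitude"])
    dropoff = round_location(trip["dropoff_latitude"], trip["dropoff_longitude"])
    yield (pickup, dropoff), 1


def reducer2(key, occurrences):
    """Collapse the list of occurrences for one key into a count."""
    return key, sum(occurrences)


def run_job(trips):
    """Simulate the shuffle step: group mapper output by key, then reduce each group."""
    grouped = defaultdict(list)
    for trip in trips:
        for key, value in mapper2(trip):
            grouped[key].append(value)
    return dict(reducer2(key, values) for key, values in grouped.items())


sample_trips = [
    {"pickup_latitude": 40.7512, "pickup_longitude": -73.9941,
     "dropoff_latitude": 40.7614, "dropoff_longitude": -73.9776},
    {"pickup_latitude": 40.7533, "pickup_longitude": -73.9912,
     "dropoff_latitude": 40.7591, "dropoff_longitude": -73.9811},
    {"pickup_latitude": 40.6413, "pickup_longitude": -73.7781,
     "dropoff_latitude": 40.7614, "dropoff_longitude": -73.9776},
]
trip_counts = run_job(sample_trips)
```

The first two sample trips start and end within the same rounded cells, so they fall under one key, while the third forms its own key.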
Taxi Request Frequency by Day, Month & Time
A few analyses are simple but useful on our data, such as the overall NYC taxi request count.
Now we will write the output of Date_Reducer to a CSV file and use it as the input of another
program.
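A minimal sketch of that hand-off, assuming pickup timestamps arrive as ISO-8601 strings and that the CSV has two columns named `date` and `requests` (both column names, and all function names here, are hypothetical). An in-memory buffer stands in for the CSV file.

```python
import csv
import io
from collections import Counter
from datetime import datetime


def count_requests_by_date(pickup_timestamps):
    """Count taxi requests per calendar date (the role played by Date_Reducer)."""
    counts = Counter()
    for ts in pickup_timestamps:
        counts[datetime.fromisoformat(ts).date().isoformat()] += 1
    return counts


def write_counts_csv(counts, out):
    """Write one row per date, sorted by date, with a header row."""
    writer = csv.writer(out)
    writer.writerow(["date", "requests"])
    for day in sorted(counts):
        writer.writerow([day, counts[day]])


def read_counts_csv(src):
    """Read the CSV back, as the follow-up program would."""
    return {row["date"]: int(row["requests"]) for row in csv.DictReader(src)}


daily = count_requests_by_date([
    "2015-01-15T19:05:39",
    "2015-01-15T08:00:00",
    "2015-01-16T00:10:00",
])
buffer = io.StringIO()          # stands in for the CSV file on disk
write_counts_csv(daily, buffer)
buffer.seek(0)
loaded = read_counts_csv(buffer)
```

Grouping by month or by hour would follow the same pattern with a different key taken from the parsed timestamp.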
By Time
The Generous Area of New York
This is a simple but interesting analysis. As already mentioned, we are
going to introduce a new class (datatype) named Location.
Rounding a location groups nearby points into a single area on the map.
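One possible shape for such a Location type is sketched below. It is an assumption-laden illustration, not the project's actual class: the `rounded` method, the two-decimal precision, and the `mean_tip_by_area` helper are hypothetical, and reading "generous" as the average `tip_amount` per pickup area is an interpretation rather than something stated on the slide.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    """Latitude/longitude pair; frozen so instances can be used as dictionary keys."""
    latitude: float
    longitude: float

    def rounded(self, digits=2):
        """Return the area this point falls into after rounding both coordinates."""
        return Location(round(self.latitude, digits), round(self.longitude, digits))


def mean_tip_by_area(trips, digits=2):
    """Average tip per rounded pickup area; trips are (Location, tip_amount) pairs."""
    totals = defaultdict(lambda: [0.0, 0])  # area -> [sum of tips, trip count]
    for pickup, tip in trips:
        area = pickup.rounded(digits)
        totals[area][0] += tip
        totals[area][1] += 1
    return {area: tip_sum / n for area, (tip_sum, n) in totals.items()}


tips = mean_tip_by_area([
    (Location(40.7512, -73.9941), 2.0),
    (Location(40.7533, -73.9912), 4.0),
    (Location(40.6413, -73.7781), 1.0),
])
most_generous = max(tips, key=tips.get)
```

With two decimal places each area is a grid cell roughly a kilometre across; fewer digits give larger areas, more digits give smaller ones.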
Fare Increase of Taxi and Outlier Trip Days
For this analysis we will integrate the 2014 and 2015 NYC Taxi datasets and then
perform the analysis below.
We will use the output of Analysis A as an extension of this one: we will take the
most frequent trip locations and use them for the fare data.
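As a rough sketch of how the frequent routes could feed the fare comparison, the snippet below averages `fare_amount` per route and year, restricted to a given set of frequent routes. The record layout `(year, route, fare)`, the function names, and the sample values are all hypothetical and only illustrate the idea.

```python
from collections import defaultdict


def mean_fare_by_route_and_year(trips, frequent_routes):
    """Average fare per (route, year), keeping only routes from the frequent-trip output."""
    totals = defaultdict(lambda: [0.0, 0])  # (route, year) -> [sum of fares, trip count]
    for year, route, fare in trips:
        if route in frequent_routes:
            totals[(route, year)][0] += fare
            totals[(route, year)][1] += 1
    return {key: fare_sum / n for key, (fare_sum, n) in totals.items()}


def fare_change(means, route, old_year=2014, new_year=2015):
    """Difference in mean fare for one route between two years."""
    return means[(route, new_year)] - means[(route, old_year)]


# A route is a pair of rounded (latitude, longitude) tuples: (pickup, dropoff).
busy_route = ((40.75, -73.99), (40.76, -73.98))
rare_route = ((40.64, -73.78), (40.76, -73.98))

# Made-up sample records, only to exercise the functions.
combined = [
    (2014, busy_route, 10.0),
    (2014, busy_route, 12.0),
    (2015, busy_route, 13.0),
    (2015, busy_route, 15.0),
    (2015, rare_route, 50.0),
]
means = mean_fare_by_route_and_year(combined, frequent_routes={busy_route})
change = fare_change(means, busy_route)
```

Restricting the comparison to frequent routes keeps the trip mix similar across the two years, so a change in the mean is less likely to come from different routes being driven.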
Thank You