
NYC TAXI DATA ANALYSIS

Parth Shah - 0989400

Recap of Phase-2

Dataset and Attribute

Analysis using Map-Reduce (Abstract)

Data collection and Integration

Extension of Project
Attribute              Datatype
vendorid               number
dropoff_latitude       number
trip_pickup_datetime   floating_timestamp
payment_type           number
trip_dropoff_datetime  floating_timestamp
fare_amount            number
passenger_count        number
extra                  number
trip_distance          number
mta_tax                number
pickup_longitude       number
tip_amount             number
pickup_latitude        number
tolls_amount           number
ratecodeid             number
store_and_fwd_flag     text
total_amount           number
dropoff_longitude      number

Introduction and Past

The NYC Taxi dataset for 2013 was made available in 2014 under FOIL (the Freedom of
Information Law).

The data was requested and collected on a hard disk by Chris Whong (pictured above), and
the analysis project was made available as open source on GitHub.

Later, the 2013 dataset was decoded by Vijay Pandurangan: two of its fields, namely
medallion and hack_license, were decrypted and normalized, and the logic was made
openly available.

In September 2014, Anthony Tockar documented at least two cases in which the database did in
fact reveal, or at least confirm, passenger data. These passengers were the famous
celebrities Bradley Cooper and Jessica Alba.

An online article was released on this analysis:

Public NYC Taxicab Database Lets You See How Celebrities Tip.

These two fields were removed from the 2014 and 2015 datasets, so we are limited to the
attributes listed on the previous slide.


Frequent Trip analysis on the data

A new user-defined type, Location, containing latitude and longitude will be created to make
the analysis simpler

Mapper1 Output <Key, Value> : <Round (pickup_Location), list (Round (Dropoff_Location))>

Reducer1 Output <Key, Value> : <Round (pickup_Location), Round (Dropoff_Location)>

Utilize Reducer1's output for the next phase of Map-Reduce

Mapper2 Output <Key, Value> : <Pair (Round (pickup_Location), Round (Dropoff_Location)), list (occurrence)>

Reducer2 Output <Key, Value> : <Pair (Round (pickup_Location), Round (Dropoff_Location)), Count>
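
The two phases above can be sketched in plain Python. This is a minimal in-memory illustration, not a Hadoop job: it collapses both Map-Reduce phases into one pass, and the `Location` class, the `rounded` method, the three-decimal precision, and the sample coordinates are all assumptions made for the example.

```python
from collections import Counter
from typing import Iterable, NamedTuple, Tuple


class Location(NamedTuple):
    """Hypothetical user-defined type holding a latitude/longitude pair."""
    latitude: float
    longitude: float

    def rounded(self, digits: int = 3) -> "Location":
        # Rounding snaps nearby points onto the same coarse coordinate.
        return Location(round(self.latitude, digits), round(self.longitude, digits))


def frequent_trips(trips: Iterable[Tuple[Location, Location]], digits: int = 3) -> Counter:
    """Count trips per (rounded pickup, rounded dropoff) pair."""
    counts: Counter = Counter()
    for pickup, dropoff in trips:
        # Map step: the rounded pair is the key, each trip is one occurrence.
        key = (pickup.rounded(digits), dropoff.rounded(digits))
        # Reduce step: sum the occurrences per key.
        counts[key] += 1
    return counts


# Made-up sample trips; the first two fall into the same rounded pair.
trips = [
    (Location(40.75012, -73.99431), Location(40.64131, -73.77812)),
    (Location(40.75049, -73.99398), Location(40.64108, -73.77789)),
    (Location(40.71274, -74.00597), Location(40.75801, -73.98553)),
]
top_pair, top_count = frequent_trips(trips).most_common(1)[0]
```

Here `top_pair` is the most frequent rounded pickup/dropoff pair and `top_count` is the number of trips that fell into it. The rounding precision controls how large each area is, so it would need tuning against the real data.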


Taxi Request Frequency by Day, Month & Time

A few analyses are simple but useful on our data, such as the overall NYC taxi request count

Date_Mapper Output <Key, Value> :- <RoundbyDate (trip_pickup_datetime), list (occurrence)>

Date_Reducer Output <Key, Value> :- <trip_pickup_date, count>

Now we will write the output of Date_Reducer to a CSV file and use it as the input of another
program

Month_Mapper Output <Key, Value> :- <RoundbyMonth (trip_pickup_date), list (occurrence)>

Month_Reducer Output <Key, Value> :- <trip_pickup_Month, count>

By Time

Time_Mapper Output <Key, Value> :- <RoundbyHours (trip_pickup_datetime), list (occurrence)>

Time_Reducer Output <Key, Value> :- <hour, count>
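
The date, month, and hour counts can be sketched as follows. This is an in-memory approximation under stated assumptions: the helper `count_by` and the sample timestamps are hypothetical, and the monthly counts are derived directly from the daily counts instead of going through an intermediate CSV file.

```python
from collections import Counter
from datetime import datetime


def count_by(pickups, key_func):
    """Group pickup timestamps by key_func and count each group."""
    return Counter(key_func(ts) for ts in pickups)


# Made-up trip_pickup_datetime values.
pickups = [
    datetime(2015, 1, 1, 0, 15),
    datetime(2015, 1, 1, 0, 45),
    datetime(2015, 1, 2, 8, 5),
    datetime(2015, 2, 3, 8, 30),
]

# Date_Mapper / Date_Reducer: one key per calendar date.
by_date = count_by(pickups, lambda ts: ts.date())

# Month_Mapper / Month_Reducer: reuse the daily counts as input.
by_month = Counter()
for day, n in by_date.items():
    by_month[(day.year, day.month)] += n

# Time_Mapper / Time_Reducer: one key per hour of the day.
by_hour = count_by(pickups, lambda ts: ts.hour)
```

Keying the month on a `(year, month)` tuple keeps 2014 and 2015 apart if both years are loaded together.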


The generous area of New York

This is a simple analysis but an interesting one. As already mentioned, we are
going to introduce a new class (datatype) named Location

Rounding a location groups nearby points into one area, like a round region on the map

Mapper Output <Key, Value> :- <Round (Location), list (tip)>

Reducer Output <Key, Value> :- < Round(Location), Avg (tip)>
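
A minimal sketch of the average-tip computation, assuming the location is the pickup point, that coordinates are rounded to two decimals, and that each record is a `(latitude, longitude, tip_amount)` tuple. The function name and the sample values are hypothetical.

```python
from collections import defaultdict


def average_tip_by_area(records, digits=2):
    """Return the mean tip per rounded (latitude, longitude) area."""
    # Each area keeps a running [sum of tips, number of trips].
    totals = defaultdict(lambda: [0.0, 0])
    for lat, lon, tip in records:
        # Map step: the rounded location is the key, the tip is the value.
        key = (round(lat, digits), round(lon, digits))
        totals[key][0] += tip
        totals[key][1] += 1
    # Reduce step: average the tips collected for each area.
    return {area: tip_sum / n for area, (tip_sum, n) in totals.items()}


# Made-up records: (pickup_latitude, pickup_longitude, tip_amount).
records = [
    (40.7512, -73.9943, 2.0),
    (40.7538, -73.9921, 4.0),
    (40.6413, -73.7781, 10.0),
]
averages = average_tip_by_area(records)
```

Carrying a sum and a count per area, instead of a list of tips, keeps the reduce step cheap and would also let partial results be combined before the final average.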


Fare Increase of Taxi and Outlier Trip Days

For this analysis we are going to integrate the 2014 and 2015 datasets of NYC Taxi and then
perform the analysis below.

We will use the output of Analysis A as the starting point for this one: we will take the
most frequent trip locations and use them for the fare data

Mapper Output <Key, Value> : <ForFrequentTrip (RoundbyDate (trip_pickup_datetime)), List (Fare)>

Reducer Output <Key, Value> : <ForFrequentTrip (RoundbyDate (trip_pickup_datetime)), Avg (Fare)>
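
The fare analysis can be sketched as below. It assumes the frequent pickup/dropoff pairs are already known from the earlier analysis and that the 2014 and 2015 records share one layout of `(pickup_date, trip_pair, fare_amount)`. The function name, the string labels for trip pairs, and the sample fares are hypothetical.

```python
from collections import defaultdict
from datetime import date


def average_fare_by_date(trips, frequent_pairs):
    """Mean fare per pickup date, restricted to the frequent trip pairs."""
    sums = defaultdict(lambda: [0.0, 0])
    for pickup_date, pair, fare in trips:
        # Keep only trips whose pickup/dropoff pair is a frequent one.
        if pair not in frequent_pairs:
            continue
        sums[pickup_date][0] += fare
        sums[pickup_date][1] += 1
    # Reduce step: average the fares for each date, in date order.
    return {d: total / n for d, (total, n) in sorted(sums.items())}


# Made-up records; "A->B" stands in for a rounded pickup/dropoff pair.
trips_2014 = [
    (date(2014, 6, 1), "A->B", 50.0),
    (date(2014, 6, 1), "A->B", 54.0),
    (date(2014, 6, 1), "C->D", 9.0),
]
trips_2015 = [
    (date(2015, 6, 1), "A->B", 55.0),
    (date(2015, 6, 1), "A->B", 57.0),
]

# Integrate both years, then average per date for the frequent pair only.
fare_by_date = average_fare_by_date(trips_2014 + trips_2015, {"A->B"})
```

Comparing the averages for matching dates across the two years would show a fare increase, and dates whose average sits far from their neighbours would be candidates for outlier trip days.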

Thank You
