Predicting Bus Ridership in King County:

Uncovering Trends at the Census Tract Level

Authors:

Obinna Amobi, Jessie Huang, Ryan Gregory Miller

Final Paper

URBDP 422: Urban and Regional Geospatial Analysis

DEPARTMENT OF URBAN DESIGN AND PLANNING

UNIVERSITY OF WASHINGTON

EXECUTIVE SUMMARY
This study explores King County Metro bus ridership in an effort to identify the spatial and demographic factors that influence transit utilization. We sought to determine what it is about a neighborhood that makes an individual more or less likely to use Metro for their morning commute. Boarding data provided by Metro is used to derive average morning ridership by Census Tract over a five-year period. Average morning ridership is then compared with spatial and socioeconomic characteristics to identify significant correlations. Finally, eight spatial and socioeconomic characteristics are used to construct a regression model which explains over 70% of the variation in average morning ridership across the County.

AGENCY BRIEF
This project uses average ridership sample data provided by Metro to construct a model that predicts the percentage of people who board the bus in the morning at the Census Tract level. Our model, using a spatial variable along with seven demographic variables, is able to predict 71% of the variation in Metro's AM Boardings across Census Tracts. As part of a course project, this report is considered a preliminary attempt at modeling Metro's ridership. To that end, the data we used for our analysis are being included with this report. A data dictionary is provided in Appendix A of this document, which summarizes the shapefiles and data tables, in the hope that Metro may be interested in further analysis. Key conclusions for Metro include the following:

- The number of stops and routes per acre in a Census Tract is the most significant predictor of the number of people who board the bus in the morning.
- Key demographics such as household size, population density, and the percentage of workers who do not have access to a car stand out as strong predictors.
- Many of the Seattle neighborhoods with the largest percentage of residents boarding Metro in the morning exhibit small, carless households at high densities. As Metro plans for future service increases, it should target neighborhoods that are trending in these directions.

Predicting Bus Ridership in King County

Amobi, Huang, and Miller

PROBLEM DEFINITION
King County Metro would like to benefit the community by expanding transit access and offering attractive alternatives to automobile commuting. In order to understand the ways in which Metro can be most effective in its goal, the factors that influence transit ridership must be identified. Creation of a universal model that includes both socioeconomic/demographic factors as well as spatial factors allows Metro to identify which individual variables have the greatest effect on transit ridership. Identification of these variables can help Metro target specific populations and specific areas in its efforts to grow market demand.

PROJECT QUESTION
In plain English, this project aims to answer the question: what is it about a neighborhood that makes an individual more or less likely to use Metro for their morning commute? Because an answer to this question requires the use of demographic data, analysis was performed by Census Tract. This is the smallest level for which detailed demographics are available via the American Community Survey. Our question is then further defined as we move into bivariate and multivariate analysis. Bivariate analysis asks the question: what is the relationship between morning ridership and a single individual variable that might explain it? Multivariate analysis asks a more complex question: what is the overall relationship between morning ridership and all of the variables that might explain it, and to what degree does each variable contribute to that relationship? The table below summarizes our hypotheses about how these factors predict transit ridership.

Neighborhood Factor                                                  Expected Effect
Average Bus Stop Boarding Possibilities Per Acre                     Positive
Average Commute Time                                                 Negative
Population Per Acre                                                  Positive
Percent of Population Attending Undergraduate or Graduate Studies    Positive
Dummy Variable - Average Housing Age Older than 1950                 Positive
Average Household Size                                               Negative
Median Household Income                                              Negative
Percent of Workers with No Car Available                             Negative

METHODOLOGY
The following section describes the methods used to prepare the dependent and independent variables. ArcGIS 10.0, Microsoft Access 2010, and PASW Statistics 18 were used for this analysis. References to specific geographic tools occur in bold.

DEPENDENT VARIABLE

Several steps were taken to transform the data provided by Metro into our dependent variable, AM Boardings by Census Tract.

Step 1: Removing Park-And-Ride Stops

DATA AGGREGATION, REMOVE UNWANTED STOPS

Tabular data provided by Metro contained information about average boardings by stop by route by several time periods throughout the day. First, this data was converted to DBASE format using Microsoft Access, and it was then opened in ArcMap. In ArcMap, the data was joined to a list maintained informally by Metro staff of bus stops which function as park-and-ride facilities. Records of data for these bus stops were then removed from the analysis.

Step 2: Aggregating AM Boardings and Joining to Bus Stop Geography

Once park-and-ride data was removed from the DBASE table, a selection of morning boardings was performed. A Select by Attribute highlighted only records for AAM or AM boardings, which correspond to all morning boardings prior to 9 AM. With this selection in place, Summary Statistics were performed, using each bus stop's unique identifier as a case field to aggregate average boardings by stop, regardless of route. The output of this process was then joined to a bus stop point shapefile provided by Metro.

Step 3: Generating a Point Density Raster

In order to estimate the areas from which people are walking to board the bus in the morning, a raster surface was generated from the bus stop point file with ridership data attached. A grid of 200-foot by 200-foot raster cells (40,000 square feet each) was generated using the Point Density function in Spatial Analyst. Average AM Boardings was used as the population field, and values were spread over a 7-cell (1,400-foot) radius. This radius was chosen because it approximates 1,320 feet, or a quarter mile, a commonly accepted radius for bus stop walksheds.

NORMALIZE DATA

Step 4: Using Raster Calculator to Normalize Values

The output from Step 3 represented average AM boardings per square foot. In order to show average AM boardings per cell, the output from Step 3 was multiplied by 40,000 (the area of one cell in square feet) using the Raster Calculator.
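Steps 3 and 4 can be illustrated with a minimal sketch in plain Python. It assumes a circular neighborhood in which each cell sums the boardings of every stop within the search radius of the cell center and divides by the area of the circle, then multiplies by the cell area to express the result per cell. The function name, the grid extent, and the two bus stops are hypothetical; this is not the ArcGIS implementation or the authors' data.

```python
import math

def point_density(points, x_min, y_min, n_cols, n_rows, cell_size, radius):
    """Return a grid (list of rows) of density values per square foot.

    points: iterable of (x, y, population) tuples, coordinates in feet.
    Row 0 is the row of cells nearest y_min. Each cell sums the population
    of the points within `radius` of its center and divides that total by
    the area of the circular neighborhood.
    """
    neighborhood_area = math.pi * radius ** 2
    grid = []
    for row in range(n_rows):
        center_y = y_min + (row + 0.5) * cell_size
        grid_row = []
        for col in range(n_cols):
            center_x = x_min + (col + 0.5) * cell_size
            total = sum(
                pop for (x, y, pop) in points
                if math.hypot(x - center_x, y - center_y) <= radius
            )
            grid_row.append(total / neighborhood_area)
        grid.append(grid_row)
    return grid

# Hypothetical stops: (x, y, average AM boardings)
stops = [(1000.0, 1000.0, 50.0), (1600.0, 1000.0, 20.0)]

# Step 3: 200-foot cells, 7-cell (1,400-foot) search radius
density = point_density(stops, 0.0, 0.0, 10, 10, 200.0, 1400.0)

# Step 4: boardings per square foot times 40,000 square feet per cell
CELL_AREA = 200.0 * 200.0
per_cell = [[value * CELL_AREA for value in row] for row in density]
```

Because every stop's boardings are spread over the same circular area, cells near several busy stops accumulate higher values than cells near a single stop.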

Step 5: Zonal Statistics by Census Tract

ZONAL STATISTICS JOIN OUTPUT TABLE AVERAGE 5 YEARS

Normalized boarding data for each 40,000 square foot cell was then ready to be aggregated to the Census Tract level. Zonal Statistics to Table was performed, using Census Tracts as the input feature, the Census Tract GEOID as the zone field, and the output from Step 4 as the input value raster. This process created a table of aggregate AM boardings for each Census Tract.
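The aggregation in Step 5 can be sketched in plain Python as a grouping of cell values by the zone each cell falls in. This is a simplified illustration that assumes the tracts have already been rasterized onto the same grid as the boardings; the function name, the tract labels, and the values are hypothetical.

```python
from collections import defaultdict

def zonal_statistics(zone_grid, value_grid):
    """Summarize cell values by zone.

    zone_grid and value_grid are equally sized lists of rows. A zone of
    None marks a cell outside every tract and is skipped.
    Returns {zone: {"COUNT": ..., "SUM": ..., "MEAN": ...}}.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for zone_row, value_row in zip(zone_grid, value_grid):
        for zone, value in zip(zone_row, value_row):
            if zone is None:
                continue
            totals[zone] += value
            counts[zone] += 1
    return {
        zone: {
            "COUNT": counts[zone],
            "SUM": totals[zone],
            "MEAN": totals[zone] / counts[zone],
        }
        for zone in totals
    }

# Hypothetical 3 x 3 grid: tract label per cell and boardings per cell
zones = [["tract_A", "tract_A", "tract_B"],
         ["tract_A", "tract_B", "tract_B"],
         [None,      "tract_B", "tract_B"]]
boardings = [[1.0, 2.0, 3.0],
             [3.0, 1.0, 2.0],
             [9.0, 4.0, 0.0]]

table = zonal_statistics(zones, boardings)
```

The SUM column corresponds to aggregate boardings per tract, while the MEAN column is the kind of statistic used later for the boarding opportunities index.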

Step 6: Calculating a 5-year Average

In order to prepare boarding data at the same time interval as demographic information, Steps 1-5 were performed independently for five years of data: 2007, 2008, 2009, 2010, and 2011. Using a series of joins, a table was constructed that contained aggregate AM boarding data for each Census Tract over each time period. Using the Field Calculator, we added a new floating-point field and calculated a 5-year average. This 5-year average column was finally joined to 2010 Census Tract geometry for visualization.

INDEPENDENT VARIABLES

Preparing independent variables required gathering tabular data from the 2007-2011 American Community Survey, and also generating a spatial variable that measures opportunities to board buses.

Step 1: Obtaining Census Data

DOWNLOAD VIA CENSUS FTP

Target sequences of the 2007-2011 American Community Survey Summary File were downloaded via the Census Bureau's FTP server. Data were then linked to geographic identifiers in Microsoft Access, exported as a DBASE table, and joined to the Census Tract shapefile containing the dependent variable in ArcMap.

Step 2: Creating a Spatial Index for Boarding Opportunities

A spatial index was generated that measures the average number of opportunities available to board a bus in a given Census Tract. Point Density was used to generate a grid, much as in Step 3 of the dependent variable calculation, but with the number of routes going through each stop during the morning hours used as the population field, not the number of boardings. The Raster Calculator was used to normalize these values, as in Step 4 of the dependent variable process, and Zonal Statistics to Table was used to find the mean index value for each Census Tract. This output table was then joined to the Census Tract geometry shapefile containing the dependent variable and demographic data.

RESULTS
A series of bivariate comparisons were performed between our dependent and independent variables. A multivariate regression model was then constructed using all of the independent variables together. The following section summarizes the correlations and patterns we uncovered.

BIVARIATE STATISTICS

For Full-Size Maps, See Appendix B

Variable                                                             Expected    Observed           r
Bus Stop Boarding Possibilities                                      Positive    Positive           .811
Average Commute Time                                                 Negative    Negative           -.218
Population Per Acre                                                  Positive    Positive           .268
Percent of Population Attending Undergraduate or Graduate Studies    Positive    Positive           .131
Binary Variable: Average Housing Age Older Than 1950                 Positive    Not Significant    N/A
Average Household Size                                               Negative    Negative           -.400
Median Household Income                                              Negative    Negative           -.260
Percent of Workers with No Car Available                             Positive    Positive           .533
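For readers who want to reproduce a bivariate comparison outside PASW, the correlation coefficient (r) can be computed as in the following minimal sketch. It assumes the Pearson product-moment form of r; the function name and the small sample of tract values are hypothetical and are not taken from the study data.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

# Hypothetical tracts: boarding opportunities index vs. percent boarding in the AM
boarding_index = [0.5, 1.0, 2.0, 3.5, 5.0]
pct_am_on = [1.0, 1.8, 3.1, 6.0, 7.9]

r = pearson_r(boarding_index, pct_am_on)
```

A value near +1 or -1 indicates a strong linear relationship, and a value near 0 indicates a weak one, which is how the coefficients in the table above can be read.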

MULTIVARIATE MODEL
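The previous eight variables were used to create a multivariate linear regression model using PASW Statistics 18. The model assumes the following functional form:

\[
\begin{aligned}
\mathrm{PCT\_AM\_ON}_i ={}& \beta_0 + \beta_1\,\mathrm{AVG\_STPS}_i - \beta_2\,\mathrm{AVG\_COM\_MI}_i + \beta_3\,\mathrm{POP\_ACRE}_i + \beta_4\,\mathrm{PCT\_UNIV}_i \\
&+ \beta_5\,\mathrm{BEF\_1950}_i - \beta_6\,\mathrm{AVG\_HHSIZE}_i - \beta_7\,\mathrm{MED\_HH\_INC}_i + \beta_8\,\mathrm{PCT\_NOCAR}_i
\end{aligned}
\]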

Model Summary

The model above exhibits an Adjusted R² of .710, meaning that 71% of the variation in AM boardings per Census Tract can be explained using the explanatory variables that we identified. The following table summarizes the standardized coefficients and significance of each variable in the model. Standardized coefficients are measures of relative importance, not actual unit change.

Variable Name                                            Standardized Coefficient    Significance
Bus Stop Boarding Possibilities                          .923                        Significant
Average Commute Time                                     -.065                       Significant
Population Per Acre                                      -.297                       Significant
Percent of Population Attending College                  .018                        Not Significant
Binary Variable: Average Housing Age Older than 1950     .061                        Significant
Average Household Size                                   -.056                       Not Significant
Median Household Income                                  -.102                       Significant
Percent of Workers with No Car Available                 -.049                       Not Significant
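To make the idea of a standardized coefficient concrete, the sketch below converts every variable to z-scores and then fits an ordinary least squares model, so each coefficient is expressed in standard deviations rather than original units. This is one common way to obtain such coefficients and is offered as an illustration only; it is not the PASW procedure the authors ran, and the function names and sample values are hypothetical.

```python
import math

def zscores(values):
    """Center on the mean and scale by the sample standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x

def standardized_coefficients(predictors, response):
    """Fit OLS on z-scored data; no intercept is needed because all means are zero."""
    zx = [zscores(col) for col in predictors]
    zy = zscores(response)
    xtx = [[sum(p * q for p, q in zip(ci, cj)) for cj in zx] for ci in zx]
    xty = [sum(p * q for p, q in zip(ci, zy)) for ci in zx]
    return solve(xtx, xty)

# Hypothetical data: the response depends only on the first predictor
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0]
y = [2.0, 4.0, 6.0, 8.0, 10.0]

betas = standardized_coefficients([x1, x2], y)
```

In this toy example the first coefficient is close to 1 and the second is close to 0, even though the second predictor is itself strongly correlated with the response when examined alone.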

Analysis

The multivariate regression results show markedly different findings than the bivariate statistics. Before too much weight is given to the results of this model, it is important to understand the limitations of multivariate regression. The problem of multicollinearity is at work here, meaning that the explanatory variables are all correlated not only with the dependent variable but with one another. In such a case, it becomes extremely difficult for statistical software to determine which explanatory variable is truly causing the change in the dependent variable.
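One common way to gauge multicollinearity, which this report does not include, is the variance inflation factor (VIF). The sketch below is limited to the two-predictor case, where the VIF reduces to 1 / (1 - r²) with r the correlation between the two predictors. The function names and data are hypothetical and serve only to illustrate the concept.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

def vif_two_predictors(x1, x2):
    """VIF shared by both predictors in a two-predictor model: 1 / (1 - r^2)."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r ** 2)

# Hypothetical predictors that move together across five tracts
pop_per_acre = [1.0, 2.0, 3.0, 4.0, 5.0]
pct_no_car = [2.0, 1.0, 4.0, 3.0, 6.0]

vif = vif_two_predictors(pop_per_acre, pct_no_car)
```

A VIF of 1 means the predictors are uncorrelated; the value grows without bound as the correlation between them approaches 1 or -1, and the coefficient estimates become correspondingly less stable.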

INTERPRETATION OF RESULTS
Determination of relationships between our explanatory variables and AM ridership, while imperfect, offers several helpful pieces of information for King County Metro. Identification of boarding opportunities as the strongest predictor of AM ridership in both bivariate and multivariate analyses speaks to the importance of living proximate to many bus stops with many routes passing through each of them. Frequency, though not addressed in this study, is also a likely related variable. This finding suggests that ridership faces declines if and when bus stops and/or routes are eliminated. Identification of household size, population per acre, and workers without cars as significant variables also provides usable data for Metro. As Seattle's urban neighborhoods continue to develop, they are becoming more dense, and the people who live in these neighborhoods often form smaller households and choose not to own cars at the same rates as their predecessors. These trends reflect broader cultural and demographic shifts taking place throughout the nation. As these trends continue, Metro would do best to focus on the neighborhoods which are most experiencing changes in these variables in an effort to expand its service in an efficient manner.

A NOTE ON THE LIMITATIONS OF GIS


A Geographic Information System, like any effort to describe the real world using digital models, is a tool that is often crude and imprecise. The application of GIS for this project is no exception. Error was likely introduced at several steps along the process. This includes, but is not limited to, sampling errors in the data provided by Metro, errors stemming from assumptions about how far individuals are willing to walk to a bus stop, and errors inherent in working between both raster and vector data formats.

Working at the Census Tract level also creates error. American Community Survey data, used for many of our explanatory variables, not only has high margins of error associated with it, but also assumes an even distribution of demographic conditions throughout an entire Census Tract. In reality, we know this is not true. In a world where infinite resources could be spent on geographically detailed surveys, this analysis might be performed at the Census Block or even at the household level. There is always error inherent in summarizing activities performed by individuals (point data) within the context of often arbitrary polygons such as Census Tracts.

Our statistical analysis is also prone to severe error. Not only are our independent variables highly correlated with one another, making identification of specific factors difficult, but our dependent variable is highly autocorrelated, meaning that the power of our statistical model is likely overstated. See Appendix B for a map of autocorrelation in the data.

Despite the limitations of our methods, we sincerely believe there is value in our analysis. We hope that Metro can make use of this report as it considers service changes, and that the methodology outlined here is enough to inspire future research into the factors that predict bus ridership across the County.
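The spatial autocorrelation mentioned above is mapped in Appendix B using Moran's I. As a minimal sketch of how the global form of that statistic is computed, the code below assumes a simple binary weights matrix in which neighboring tracts receive a weight of 1. The function name, the four-tract layout, and the values are hypothetical and do not reproduce the weighting scheme or results behind the Appendix B map.

```python
def morans_i(values, weights):
    """Global Moran's I for a list of values and a square spatial weights matrix.

    weights[i][j] is the weight linking observation i to observation j;
    the diagonal is expected to be zero.
    """
    n = len(values)
    mean = sum(values) / n
    deviations = [v - mean for v in values]
    weight_total = sum(sum(row) for row in weights)
    cross_products = sum(
        weights[i][j] * deviations[i] * deviations[j]
        for i in range(n)
        for j in range(n)
    )
    squared_deviations = sum(d * d for d in deviations)
    return (n / weight_total) * (cross_products / squared_deviations)

# Hypothetical row of four tracts; each tract neighbors the one beside it
neighbors = [[0, 1, 0, 0],
             [1, 0, 1, 0],
             [0, 1, 0, 1],
             [0, 0, 1, 0]]

clustered = morans_i([1.0, 2.0, 3.0, 4.0], neighbors)    # similar values adjacent
alternating = morans_i([1.0, 4.0, 1.0, 4.0], neighbors)  # dissimilar values adjacent
```

Positive values indicate that similar ridership levels cluster in neighboring tracts, which violates the independence assumption of ordinary regression and is why the model's power is likely overstated.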

APPENDIX A: DATA DICTIONARY


Data Layers Provided for Metro:

The whole enchilada: shapefile with average ridership and all explanatory variables
Relevant Attributes: PCT_AM_ON, MED_HH_INC, AVG_HHSIZE, PCT_NOCAR, PCT_UNIV, POP_ACRE, AVG_STPS_A, AVG_COM_MI
- 5_Year_Average_AM_Ridership_With_Demographics.shp

Raster Grids: Normalized Average Boardings Per 40,000 ft²
- am_on_07n.grid
- am_on_08n.grid
- am_on_09n.grid
- am_on_10n.grid
- am_on_11n.grid

Shapefiles for AM boardings by bus stop for the five-year period
Relevant Attribute: SUM_AVG_ON
- AM_Ridership_2007.shp
- AM_Ridership_2008.shp
- AM_Ridership_2009.shp
- AM_Ridership_2010.shp

APPENDIX B: MAPS
Visualization and Map of Autocorrelation using Moran's I

Boardings Per Acre: 2007 - 2011

Average AM Boardings as a Percent of Census Tract Population: 2007-2011 Average

Average AM Boarding Opportunities by Census Tract

Average AM Boardings and Selected Demographics: Comparison Maps
