
EXPLORATORY APPLICATION OF NEURAL NETWORKS TO SCHOOL FINANCE: FORECASTING EDUCATIONAL SPENDING

Bruce D. Baker
Department of Teaching and Leadership
202 Bailey Hall, University of Kansas
Lawrence, KS 66045
(785) 864-9844
bdbaker@ukans.edu www.soe.ukans.edu/baker

Craig E. Richards
Department of Organization and Leadership
Teachers College, Columbia University
525 West 120th Street, Box 16, Main Hall 212A
New York, NY 10027
(212) 678-3420
cer8@columbia.edu

DRAFT DO NOT CITE OR QUOTE WITHOUT PERMISSION

ANNUAL MEETING OF THE AMERICAN EDUCATIONAL RESEARCH ASSOCIATION SAN DIEGO, CA APRIL 13, 1998

EXPLORATORY APPLICATION OF NEURAL NETWORKS TO SCHOOL FINANCE: FORECASTING EDUCATIONAL SPENDING

Bruce D. Baker, University of Kansas
Craig E. Richards, Teachers College, Columbia University

Abstract

This study provides a side-by-side comparison of the linear regression methodology used by the National Center for Education Statistics (NCES) in preparing projections of educational spending with relatively new, flexible, non-linear regression methods. These methods have come to be known as Neural Networks because they are designed to mimic the pattern-learning processes of a simple brain. Neural Networks have been promoted for their predictive accuracy in both cross-sectional (Buchman et al., 1994; Odom, 1994; Worzala, Lenk and Silva, 1995) and time series analyses (Caudill, 1995b; Hansen and Nelson, 1997; McMenamin, 1997). Others have recently highlighted the inferential value of Neural Networks in revealing nonlinearities in complex data (Liao, 1992). This study finds that Neural Networks provide prediction accuracy comparable to the NCES model. More importantly, however, the Neural Networks reveal theoretically sound, non-linear patterns overlooked by the simple linear approach.

Introduction

Over the past decade, Neural Network technologies have found their way into numerous competitive industries, from the financial markets (McMenamin, 1997), to real estate (Worzala, Lenk and Silva, 1995), to medicine (Buchman et al., 1994). Neural Networks are touted primarily for their predictive accuracy compared with more common linear modeling methods. In essence, Neural Networks are simply an extension of regression modeling and can be referred to as flexible non-linear regression models (McMenamin, 1997). More recently, researchers have begun to assess the usefulness of neural network methodologies beyond predictive accuracy, toward developing a deeper understanding of trends and patterns in data (Hansen and Nelson, 1997; Liao, 1992). Application of Neural Networks in the public sector has been extremely limited. Failure to advance forecasting methods may be due, in part, to the lack of competitiveness among public agencies and hence the diminished need for predictive tools. Competition, however,


need not be the sole force that drives the advancement of methodologies. Much can be gained in the public sector by testing possible applications of new methodologies, including the enhancement of strategic financial planning through tax revenue prediction (Hansen and Nelson, 1997). Given the long-run history of growth in educational spending, and current scrutiny regarding the returns to our educational investment (Hanushek, 1994), we would be wise to consider how these new tools can provide an advantage in projecting and understanding the future of educational spending. This study explores the value of using flexible non-linear regression methods, or neural networks, alongside the National Center for Education Statistics linear regression model for forecasting educational spending.

Forecasting in Public Finance

The most common method used in public finance forecasting is expert judgmental forecasting (McCollough, 1990). While the dominance of judgmental methods is partly due to a lack of technical training among public officials, lack of sufficient time series data is also an issue (Nabangi, 1992). When statistical time series methods are used, emphasis tends to be placed on univariate analyses (Hansen and Nelson, 1997; Nabangi, 1992). Hansen and Nelson note that over the past five years the Utah legislature has developed a portfolio of primarily univariate methods, including exponential smoothing and autoregressive integrated moving average (ARIMA) models. Most recently, for the 1996 legislative session, it added univariate Neural Network models to this forecasting portfolio (Hansen and Nelson, 1997). Tools for univariate time series analysis continue to be more developed and better understood than tools for multivariate analyses. While more complex methods are available, multivariate methods for forecasting in public finance tend to be based on a typical multiple linear regression framework. One common method involves ordinary least squares (OLS) estimation of the linear regression model, testing the residuals for autocorrelation, then adjusting the regression coefficients if necessary (Newbold, 1983, p. 599). A second method involves treating the autocorrelated errors as a separate univariate time series and constructing an ARIMA model of the error term (Pankratz, 1991, p. 165). While advantages might be gained by using more intricate, non-linear, and


multiple equation models, such methods present additional difficulties to the forecaster that render them an unlikely tool for most involved with public finance forecasting.

A Primer on Neural Networks

Model Structure

There are two basic differences between simple linear regression models and Neural Network models. The first is that the linear regression model is linear in its parameters, and the second is that the linear regression model contains no hidden layer, or middle layer, functions (McMenamin, 1997). In Neural Network terms, the simple linear model

Y = XB + u

can be viewed as a single output, feed forward system with no hidden layer and with a linear activation function (McMenamin, 1997). To dissect this description of the linear model, let us compare the simple linear form with that of a relatively simple three layer feed forward Neural Network:

Figure 1: Network Diagram with 3 Inputs and Two Hidden Neurons

[Diagram: three input-layer nodes (X1, X2, X3) each feed forward into two hidden-layer neurons (H1, H2), which in turn feed a single output-layer node. Adapted from McMenamin (1997).]


A regression specification for this model, with two hidden or middle neurons, is:

Y = b0 + b1H1 + b2H2 + u

where H1 and H2 represent the middle layer functions of the network. In general, these functions consist of a logistic, or S-shaped, activation function, sometimes referred to as a hidden layer transfer function (McMenamin, 1997).1 These functions may also be referred to as squashing functions (Rao and Rao, 1993). Note in Figure 1 that all inputs feed forward into each hidden layer neuron. This is why some refer to Neural Network models as connectionist models (Buchman et al., 1994). Duplication of inputs in the middle layer neurons appears to create irresolvable multicollinearities in the model. Duplication of inputs presents greater concern when the goal is to interpret the parameters and weightings of the middle layer. Most frequently, however, neural networks are applied for predictive purposes rather than for inference. The advantage gained by including two or more different mathematical treatments of the same inputs is that some of the inputs may be emphasized in one neuron, while others are emphasized in another. Likewise, the degree of nonlinearities and interactions between inputs in neurons may vary.
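To make the structure concrete, the following minimal sketch (in Python, with arbitrary illustrative weights; it is not tied to any of the software packages cited here) computes the forward pass of the three-input, two-hidden-neuron network in Figure 1, with logistic hidden-layer activations and a linear output:

    import numpy as np

    def logistic(z):
        """S-shaped ("squashing") activation used by the hidden-layer neurons."""
        return 1.0 / (1.0 + np.exp(-z))

    def forward(x, hidden_w, hidden_b, out_w, out_b):
        """Forward pass of a 3-input, 2-hidden-neuron, single-output network.

        x        : shape (3,)   inputs X1, X2, X3
        hidden_w : shape (2, 3) input-to-hidden weights (every input feeds each neuron)
        hidden_b : shape (2,)   hidden-layer biases
        out_w    : shape (2,)   hidden-to-output weights (b1, b2)
        out_b    : float        output bias (b0)
        """
        h = logistic(hidden_w @ x + hidden_b)   # H1, H2
        return out_b + out_w @ h                # Y = b0 + b1*H1 + b2*H2

    # Example call with arbitrary, purely illustrative weights
    x = np.array([1.2, 0.4, 2.0])
    y_hat = forward(x,
                    hidden_w=np.array([[0.5, -0.3, 0.1],
                                       [0.2, 0.8, -0.4]]),
                    hidden_b=np.array([0.1, -0.2]),
                    out_w=np.array([1.5, -0.7]),
                    out_b=0.3)

Because both hidden neurons receive every input, each can weight the same inputs differently, which is the source of the flexibility (and of the multicollinearity concern) described above.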

Estimation

While the model structure for Neural Networks is a departure from traditional regression modeling, methods for estimating network coefficients can be quite similar. In neural networks, the coefficients are referred to as connection strengths, or weights; constant terms are called biases; and, at times, slopes are called tilts (McMenamin, 1997). In general, estimation procedures consist of some type of non-linear search algorithm. The network begins by randomly assigning weights. The residuals of the output are then assessed and the weights adjusted in the appropriate direction2, until they ultimately converge on a solution. The iterative, convergent estimation procedure known as backpropagation is quite similar to Iterative Maximum Likelihood Estimation in

1 A hyperbolic tangent function can also be used (Rao and Rao, 1993).
2 Momentum terms are used to keep the weights changing in the established direction. One benefit of these terms is that they often keep the network from getting stuck in local minima (Rao and Rao, 1993).


regression modeling. The difference is in what is estimated, rather than how the estimates are achieved. Another estimation procedure that is gaining popularity uses genetic algorithms to select optimum equation structures from a pool of randomly generated equations. This procedure has been integrated into commercial software for selecting smoothing parameters in Generalized Regression Neural Networks (WSG, 1994). While consistent predictive accuracy is generally attainable by this method, equation structures and coefficient values may vary widely from one model to the next, making the inferential value of the models questionable (Baker, 1997).
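As an illustration of the iterative idea (a minimal sketch in Python on made-up data, not the algorithm of any particular commercial package), the loop below adjusts the weights of the small network described above by gradient descent with a momentum term:

    import numpy as np

    rng = np.random.default_rng(0)

    def logistic(z):
        return 1.0 / (1.0 + np.exp(-z))

    # Made-up data: 3 inputs, 1 output, 30 cases
    X = rng.normal(size=(30, 3))
    targets = 0.8 * X[:, 0] - 0.5 * X[:, 1] ** 2 + 0.3 * X[:, 2] + rng.normal(scale=0.05, size=30)

    # Randomly assigned starting weights, as described in the text
    W, b_h = rng.normal(scale=0.5, size=(2, 3)), np.zeros(2)
    w_o, b_o = rng.normal(scale=0.5, size=2), 0.0
    vW, vbh, vwo, vbo = np.zeros_like(W), np.zeros_like(b_h), np.zeros_like(w_o), 0.0
    lr, mu = 0.01, 0.9            # learning rate and momentum term (illustrative values)

    for epoch in range(500):
        for x, t in zip(X, targets):
            h = logistic(W @ x + b_h)
            y = b_o + w_o @ h
            e = y - t                              # residual drives the adjustment
            # Gradients, i.e. the error propagated back through the layers
            g_wo, g_bo = e * h, e
            g_z = e * w_o * h * (1.0 - h)
            g_W, g_bh = np.outer(g_z, x), g_z
            # Momentum keeps the weights moving in the established direction
            vW  = mu * vW  - lr * g_W;  W   += vW
            vbh = mu * vbh - lr * g_bh; b_h += vbh
            vwo = mu * vwo - lr * g_wo; w_o += vwo
            vbo = mu * vbo - lr * g_bo; b_o += vbo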

Model Testing

With most Neural Networks, data are segregated into two classes: Training Data and Test Data. The training set data are the data to which the weights, or coefficients, are initially applied. In regression, training data are equivalent to the data used for estimation. For neural nets to converge on a generalizable solution through iteration, there must be a test set against which prediction accuracy is compared. The test set may be randomly extracted from the larger sample or, in the case of time series, may consist of the most recent few events. In some software packages there is also an option to extract a production set (WSG, 1995). The production set may include predictors for which the outcome measure is still unknown. The trained network is applied to the production set predictors to determine the new predicted outcomes.

A commonly expressed concern over flexible non-linear estimation methods is the tendency to overfit sample data (Murphy, Fogler and Koehler, 1994). In regression, as the number of predictors approaches the number of cases, we can achieve a near perfect fit to the outcome measure, but we sacrifice the significance of the individual parameters, the inferences that can be drawn from those parameters, and the ability to generalize. It is assumed that, due to the relatively high number of weights in Neural Networks, overfitting would be equally likely and would yield similar complications. Murphy, Fogler and Koehler (1994) note that as tolerance, nodes, or layers are increased in backpropagation networks, training set errors decline asymptotically while test set errors fail to improve beyond an identifiable optimum. These findings provide a basis for using test set error to optimize the model structure during training.
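The sketch below illustrates that use of test set error in the simplest possible setting (Python, synthetic data, a linear model fit by gradient descent; the twelve training and three test observations mirror the split used later in this study, while the learning rate and patience values are arbitrary):

    import numpy as np

    rng = np.random.default_rng(1)

    # A 15-observation series: the first 12 cases train the model, the last 3 are the test set
    X = rng.normal(size=(15, 3))
    y = X @ np.array([0.6, -0.4, 0.2]) + rng.normal(scale=0.1, size=15)
    X_train, y_train, X_test, y_test = X[:12], y[:12], X[12:], y[12:]

    w = np.zeros(3)
    best_w, best_err, since_best = w.copy(), np.inf, 0
    lr, patience = 0.01, 200                 # hypothetical settings

    for iteration in range(100_000):
        grad = X_train.T @ (X_train @ w - y_train) / len(y_train)
        w -= lr * grad                                     # training-set update
        test_err = np.mean((X_test @ w - y_test) ** 2)
        if test_err < best_err:                            # test error still improving
            best_w, best_err, since_best = w.copy(), test_err, 0
        else:
            since_best += 1
            if since_best >= patience:                     # stop before overfitting sets in
                break

    w = best_w    # keep the weights from the best test-set iteration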

Neural Network Architectures

Until recently, backpropagation neural networks made up approximately 80% of all neural network applications (Caudill, 1995a). Use of backpropagation has declined due to the relatively long training times required by the iterative algorithm and the development of new, quicker estimation procedures (McMenamin, 1997). Figure 1 displays a structure common to backpropagation neural networks. Permutations of this structure include: (1) the number of layers in the network, (2) the number of neurons in each layer, and (3) the number and location of the connections. Recurrent backpropagation networks, in particular, include feedback connections from the output layer to either the middle or input layer. Backpropagation has proven to be an effective tool for both time series prediction (Hansen and Nelson, 1997; Baker, 1997) and cross-sectional prediction (Buchman et al., 1994; Odom and Sharda, 1994; Worzala, Lenk and Silva, 1995). A variation on the traditional backpropagation model, Recurrent Backpropagation, which includes connections from output to middle or hidden layers, has been recommended for use in time series prediction (Caudill, 1995b). Drawing inferences from backpropagation models can be difficult due to the relatively high number of weights (coefficients).3 Some software packages, however, contain aggregate indexes of weightings that aid in determining the relative importance of model inputs to prediction (WSG, 1995). These aggregate indexes, however, are not helpful in determining the nature of the non-linear relationships within the model.

Specht (1991) developed a flexible form of non-linear regression referred to as the Generalized Regression Neural Network (GRNN). GRNN removes the necessity to specify a functional form by making use of the probability density function of the observed data. The GRNN model interpolates the relationship between each input, and between the inputs and the outcome measure, applying a smoothing parameter, σ, to each relationship to moderate the degree of non-linearity. Optimized models generally
3 The total number of weights increases dramatically as additional layers, neurons or connections are added, further complicating interpretation of the models.


include different smoothing parameters for each input (Specht, 1991). The difficulty is in estimating these parameters. Two methods have generally been employed: (1) the holdout method (Specht, 1991) and (2) genetic algorithms (WSG, 1995). The holdout method involves using randomly removed samples as a test set for the prediction accuracy of the model. Genetic algorithms involve the random creation of sets of equations, followed by fitness testing and selective breeding; that is, equations with poor predictive power cease to exist, while smoothing parameters from the fitter equations are randomly recombined to create a new pool to begin the next cycle. GRNNs have not been used as widely as backpropagation, but they are recommended for use with sparse data and are not as sensitive to the scale of the data (Specht, 1991). GRNN has been proven effective as a prediction tool (Buchman et al., 1994). (A minimal sketch of the GRNN prediction rule appears at the end of this section.)

Potential Advantages of Neural Networks

While a non-linear regression model is likely to predict as well as a flexible non-linear model, there are distinct advantages to using the flexible estimation approach. The difficulty in constructing the optimal non-linear regression model by traditional methods comes with the need to make a priori assumptions regarding each of the non-linear components of the model. In many cases, we simply do not know what nonlinearities exist, nor have we developed the necessary theoretical insights to understand all relationships within more complex, multivariate systems. Similar problems exist with understanding the lag structure of multivariate time series models. Even more traditional time series software developers have moved toward the use of iterative search algorithms to aid in identification of lag structures in multivariate models (Liu and Hudak, 1996). Recurrent nets, which include feedback connections from output to middle or input layers, are one Neural Network approach to dealing with the lag structures and autoregressive properties of time series data (Caudill, 1995b). Efforts have been made to promote the value of combining Neural Network methods with traditional time series methods (Hansen and Nelson, 1997; Lachtermacher and Fuller, 1995). These studies, however, have focused on understanding and forecasting univariate time series. Lachtermacher and Fuller promote using Box-Jenkins analyses to determine degrees of autocorrelation, stationarity, seasonality and outliers as a preprocessing step in Neural modeling. Hansen and Nelson promote using a portfolio of models, including both ARIMA and Neural models, when forecasting state tax revenues. Hansen and Nelson indicate that while Neural Nets do not always produce the greatest predictive accuracy, they often provide important insights into turning points in business cycles missed by other methods (p. 872). Thus, inferences gained from neural modeling can be used to locally optimize ARIMA models.

Neural Networks are most often used with relatively large cross-sectional or univariate time series data sets. While the goal of early studies has been to assess and compare the predictive accuracy of alternative methods, more recent endeavors have focused on the use of flexible models for inductive exploration of data (McMenamin, 1997). Questions remain as to whether these flexible non-linear models can provide similar advantages with more limited data, or in a multivariate time series context. This study addresses the comparative value of two neural network architectures, backpropagation and GRNN, against the multiple linear regression forecasts of educational expenditures produced by the National Center for Education Statistics.
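As promised above, here is a minimal sketch of the Specht-style GRNN prediction rule (Python, made-up numbers): the forecast is a kernel-weighted average of the observed outcomes, with a separate smoothing parameter for each input. Commercial implementations such as Neuroshell add their own refinements, so this is only the core idea.

    import numpy as np

    def grnn_predict(x_new, X_train, y_train, sigma):
        """Kernel-weighted average of the observed outputs (Specht-style GRNN).
        `sigma` holds one smoothing parameter per input: larger values flatten
        (smooth) the response to that input, smaller values let it bend more."""
        d2 = (((X_train - x_new) / sigma) ** 2).sum(axis=1)  # scaled squared distances
        w = np.exp(-0.5 * d2)                                 # kernel weight per training case
        return (w @ y_train) / w.sum()

    # Illustrative use with made-up numbers
    X_train = np.array([[1.0, 2.0], [1.5, 1.0], [2.0, 3.0]])
    y_train = np.array([4.0, 5.0, 6.0])
    y_hat = grnn_predict(np.array([1.6, 2.1]), X_train, y_train,
                         sigma=np.array([0.5, 1.2]))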

Methods

Data

All data were acquired from the National Center for Education Statistics annual Projections of Education Statistics series (Gerald and Hussar, 1990-1997). For each model, fifteen years of annually collected data were used for estimation. Therefore, forecasts of 1991-1995 were based on data from 1976-1990, forecasts of 1992-1995 were based on data from 1977-1991, and so on. Economic time series variables used in developing the forecasts included the following:

CUREXP: current expenditures of public elementary and secondary schools per pupil in average daily attendance, in constant dollars.
PCI: disposable income per capita, in constant dollars.
SGRNT: local governments' education revenue receipts from state sources, per capita, in constant dollars.
ADAPOP: the ratio of average daily attendance to the population.


PERTAX1: personal taxes and non-tax receipts to state and local governments, per capita, in constant dollars.
BUSTAX1: indirect business taxes and tax accruals, excluding property taxes, to state and local governments, per capita, in constant dollars.
ININCR (1992-1995 models): rate of change of inflation.
RCPIANN (1996 model): inflation rate measured by the CPI.
RCPIANN1 (1996 model): inflation rate measured by the CPI, lagged one period.

Because multivariate methods were employed, forecasts of predictors were required for the periods 1990-1995, 1992-1995, and so on. For the NCES forecasts, DRI-McGraw Hill provides (1) Optimistic, (2) Pessimistic and (3) Trend scenarios for forecast predictors, from which NCES generates its high, middle and low estimates (Gerald and Hussar, 1990-1997). Only middle estimates, based on the trend scenario, were considered in this study.

Linear Regression Models

Since 1990, there have been few modifications to the general structure of the NCES linear regression forecasting model. The permutations of the basic structure are displayed in Appendix A. The main equation consists of a multiplicative functional form which has been estimated as an AR(1) model. In 1992, a second equation was included for the prediction of state level grants to public education, for subsequent use in the main equation. While this underlying model took an additive form from 1992-1995, the 1996 iteration of the model used a multiplicative form for estimating state support and included a lagged indicator for the consumer price index (Hussar and Gerald, 1996).
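For readers who wish to see the shape of such a model, the sketch below fits the multiplicative (log-log) main equation with AR(1) errors using statsmodels' GLSAR estimator. The data here are synthetic stand-ins, not the actual NCES series, and NCES's own estimation procedure may differ in detail; the point is only the functional form.

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm

    # Synthetic stand-in for a 15-year estimation window (NOT the actual NCES data)
    rng = np.random.default_rng(2)
    years = np.arange(1976, 1991)
    df = pd.DataFrame({
        "PCI":    12000 + 200 * (years - 1976) + rng.normal(scale=100, size=15),
        "SGRNT":  500 + 15 * (years - 1976) + rng.normal(scale=10, size=15),
        "ADAPOP": 0.18 - 0.001 * (years - 1976) + rng.normal(scale=0.002, size=15),
    }, index=years)
    df["CUREXP"] = np.exp(-0.8 + 0.45 * np.log(df["PCI"]) + 0.70 * np.log(df["SGRNT"])
                          - 0.42 * np.log(df["ADAPOP"]) + rng.normal(scale=0.01, size=15))

    # Multiplicative (log-log) main equation estimated with AR(1) errors
    X = sm.add_constant(np.log(df[["PCI", "SGRNT", "ADAPOP"]]))
    model = sm.GLSAR(np.log(df["CUREXP"]), X, rho=1)   # one autoregressive lag in the errors
    results = model.iterative_fit(maxiter=10)          # alternate OLS fits and rho updates
    print(results.params)   # elasticities comparable in form to the Appendix A coefficients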

Neural Analyses

Two types of neural networks were applied to the multivariate time series: (1) Recurrent Backpropagation and (2) Generalized Regression. In each case, for all time period tests (1990-1995), the first twelve years of estimation data were identified as the Training Set and the final three years of estimation data were identified as the Test Set.4 Trend scenario forecast predictors were identified as Production Set data and used in generating the forecasts (WSG, 1995). Duplicate sets of neural models were developed whereby each of the following differential treatments was applied:

1. Forecasts of current expenditures using the main equation variables were performed with both neural forecasts of state level support and NCES forecasts of state level support.
2. Data were entered into the neural models both as untreated time series and as ln (natural log) transformed series.5

The recurrent backpropagation structure used in this study, referred to as a Jordan-Elman network, is displayed in Figure 2 (WSG, 1995). Consistent with the AR(1) specification of the NCES model, the Jordan-Elman network relates each output at time = t to each input at time = t + 1. The Jordan-Elman network also, however, relates each input at time = t with other inputs at time = t + 1. Settings for backpropagation training, or iterative estimation, were based on reduction of test set error.6

Figure 2: Jordan-Elman Recurrent Backpropagation Neural Network

[Network diagram showing four connected slabs (Slab 1 through Slab 4); from Ward Systems Group (1995), Neuroshell v2.0.]

4 Unlike the NCES models, which rely on data from 1959-60 on, only 15 years of data were available for model estimation. Acquisition of additional, compatible time series data through DRI/McGraw-Hill was cost prohibitive for this project.
5 Because the neural networks include their own sigmoid squashing function, in this case a logistic function, it was questionable whether the transformation would yield improvement.
6 The network was set to stop learning when 100,000 iterations had occurred without improving test set error.


For the Generalized Regression Neural Networks, the genetic algorithm, rather than the hold-out method, was used for smoothing parameter (σ) estimation (WSG, 1995, p. 133). The equation breeding pool size was set to the maximum (300) to increase the chances of finding an optimal solution. In place of an iteration limit, a stopping criterion of 20 generations without improvement in test set error was used.
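A simplified sketch of this kind of search follows (Python, toy data). The pool size of 300 and the 20-generation budget echo the settings above, but the selection and recombination details are illustrative rather than Neuroshell's proprietary algorithm:

    import numpy as np

    rng = np.random.default_rng(3)

    def grnn_predict(x_new, X_tr, y_tr, sigma):
        w = np.exp(-0.5 * (((X_tr - x_new) / sigma) ** 2).sum(axis=1))
        return (w @ y_tr) / w.sum()

    def test_error(sigma, X_tr, y_tr, X_te, y_te):
        preds = np.array([grnn_predict(x, X_tr, y_tr, sigma) for x in X_te])
        return np.mean((preds - y_te) ** 2)

    # Toy series: 12 training cases and 3 test cases, 3 inputs (as in this study's split)
    X = rng.normal(size=(15, 3))
    y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(scale=0.05, size=15)
    X_tr, y_tr, X_te, y_te = X[:12], y[:12], X[12:], y[12:]

    pop_size, n_generations = 300, 20                    # settings echoing those in the text
    pop = rng.uniform(0.05, 1.0, size=(pop_size, 3))     # one smoothing parameter per input

    for generation in range(n_generations):
        fitness = np.array([test_error(s, X_tr, y_tr, X_te, y_te) for s in pop])
        keep = pop[np.argsort(fitness)[: pop_size // 2]]  # poor equations "cease to exist"
        # Recombine smoothing parameters from surviving equations, with a small mutation
        parents = keep[rng.integers(0, len(keep), size=(pop_size, 2))]
        mask = rng.integers(0, 2, size=(pop_size, 3)).astype(bool)
        pop = np.where(mask, parents[:, 0, :], parents[:, 1, :])
        pop = np.clip(pop + rng.normal(scale=0.02, size=pop.shape), 0.01, 1.0)

    best_sigma = pop[np.argmin([test_error(s, X_tr, y_tr, X_te, y_te) for s in pop])]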

Comparison of Forecast Accuracy

All forecasts were compared against CPI-adjusted actual values of current expenditures per pupil in average daily attendance from the most recent edition of the Projections of Education Statistics series (Hussar and Gerald, 1997). The comparison measure used was the Mean Absolute Percentage Error (MAPE).
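For reference, MAPE averages the absolute forecast errors expressed as percentages of the actual values. A minimal sketch (Python, purely illustrative numbers):

    import numpy as np

    def mape(actual, forecast):
        """Mean Absolute Percentage Error, in percent."""
        actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
        return 100.0 * np.mean(np.abs((forecast - actual) / actual))

    # Purely illustrative values, not the study's series
    print(mape([4800, 4850, 4900], [4750, 4880, 4990]))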

Inferential Exploration

While the predictive power of Neural Networks has gained attention, few standards exist for the inferential use of Neural Networks. The initial step used for comparing the linear regression and flexible non-linear models was to assess the visual characteristics of each forecast to determine whether particular trends or patterns were revealed by the neural networks that may have been missed by the linear models.7

The second phase of inferential analysis involved more closely analyzing the weightings, or coefficients, applied to generate the nonlinearities in the models. In backpropagation, one can go directly to the matrices of weightings applied between each input and all other inputs, and between each input and the outputs, which becomes a particularly cumbersome task with recurrent nets, or one can view aggregate indices, in some cases referred to as contribution factors (WSG, 1995). Unfortunately, contribution factors do not express any direct mathematical relationship between input and output measures; they simply summarize the relative importance of each input to the model's predictive capabilities. Also, because contribution factors reduce the relationship between input and output to a single value, non-linear characteristics of the relationship are lost. With time series data, however, some non-linearities in the response of the output to inputs can be revealed by assessing contribution factors of annually updated models (Baker, 1997).8

Similar to contribution factors in backpropagation, Neuroshell 2.0 provides an aggregate index for the smoothing parameters of the GRNN algorithm (WSG, 1995). While actual, individual smoothing parameters range from 0 to 1, the aggregate indices provided in Neuroshell 2.0 range from 0 to 3 and, like the contribution factors, do not express any direct mathematical relationship of input to outcome. They simply relate the relative sensitivity of the response of the output to a given input (WSG, 1995).9 Another difficulty in using the aggregate smoothing indices is that their magnitudes are not comparable from one network to the next; thus only the relative importance of inputs can be compared across updated models (Baker, 1997).

7 This approach is similar to the approach taken by Hansen and Nelson (1997) in determining where Neural Networks picked up turns in patterns in their univariate analyses of state revenues (pp. 869-872).
8 This approach alleviates the necessity to extensively analyze each weighting matrix by revealing general changes in the importance of inputs in the prediction of outputs that occur over time (pp. 91-92).
9 For a more detailed description, see p. 155 of the Neuroshell 2.0 User's Manual (WSG, 1995).
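Neuroshell's contribution factors are proprietary, but the flavor of such an aggregate can be illustrated with a Garson-style index that summarizes relative input importance from the weight matrices of a one-hidden-layer network. This is a hypothetical illustration, not the formula used by the software in this study:

    import numpy as np

    def relative_importance(hidden_w, out_w):
        """Garson-style summary of input importance for a one-hidden-layer network:
        |input-to-hidden| weight shares, scaled by |hidden-to-output| weights,
        summed over hidden neurons and normalized to sum to one."""
        contrib = np.abs(hidden_w) * np.abs(out_w)[:, None]        # (hidden, inputs)
        shares = contrib / contrib.sum(axis=1, keepdims=True)      # per-neuron shares
        per_input = shares.sum(axis=0)
        return per_input / per_input.sum()

    # Example with arbitrary weights for 3 inputs and 2 hidden neurons
    hidden_w = np.array([[0.5, -0.3, 0.1],
                         [0.2, 0.8, -0.4]])
    out_w = np.array([1.5, -0.7])
    print(relative_importance(hidden_w, out_w))   # one relative-importance share per input

Like the contribution factors discussed above, such an index collapses each input-output relationship to a single number, so it indicates relative importance but not the shape of any nonlinearity.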

Results

Forecast Accuracy

Table 1 displays the Mean Absolute Percentage Error results for the predictive accuracy of the Neural Network models and the NCES linear regression models. In each case, the best predictions are indicated in bold and the poorest predictions are underlined. Across all years, the results are somewhat mixed. While in most cases the best prediction was produced by a neural network, in all cases the worst predictions were also produced by neural networks. Only the GRNN of the ln transformed data outperformed all other models with relative consistency.


Table 1: Summary of MAPE values for NCES and Neural models


                                        1991-1995   1992-1995   1993-1995   1994-1995    1995
NCES                                      2.84%       2.98%       4.29%       0.80%      1.06%
Untreated Data
  Recurrent Backpropagation               3.20%       4.02%       3.82%       4.11%      2.00%
  GRNN                                    3.76%       5.32%       5.05%       5.75%      2.74%
Log Transformed Data
  Recurrent Backpropagation               3.30%       5.25%       4.70%       5.53%      4.79%
  GRNN                                    2.64%       2.00%       0.42%       2.02%      1.81%
Untreated Data (neural SGRANT)
  Recurrent Backpropagation                 --          --        2.03%       1.47%      1.87%
  GRNN                                      --          --        1.76%       1.63%      2.93%
Log Transformed Data (neural SGRANT)
  Recurrent Backpropagation                 --          --        1.86%       2.34%      1.92%
  GRNN                                      --          --        2.15%       2.22%      2.34%

Most forecasts, NCES and Neural Network alike, were significantly improved from 1993 to 1994, but with spending increasing much less than expected in 1995, prediction accuracy again eroded (1994-1995). Earlier versions (1991-1993) of the DRI/McGraw-Hill forecast predictors seemed to produce consistent overestimates of educational spending for both the NCES linear models and all neural models.

Inferential Results

Forecast characteristics could only be assessed where long enough forecast time periods were available. Figure 3 and Figure 4 display the forecast characteristics for the 1991-1995 backpropagation and GRNN forecasts compared with NCES forecasts and actual expenditures per pupil. While the forecasts in Figure 3 and Figure 4 were not necessarily the most accurate forecasts, the neural networks did display some advantages. Despite consistent overestimation of spending by the backpropagation neural network, this network still appeared to better identify the slowed growth in educational expenditures from 1991 through 1993. The network then seemed to pick up the upturn in spending in 1994. The GRNN (Figure 4) closely approximated the slowed growth of the early 1990s, but then overestimated the magnitude of the upturn in spending. For this same period the NCES linear regression model projected a nearly constant slope, yielding consistent overestimation beyond 1991.


Figure 3: Characteristics of Backpropagation Forecast (MAPE = 3.30%)


[Line chart: Backprop & NCES Forecasts, 1991-1995. Vertical axis: expenditures per pupil, $4,500 to $5,100; horizontal axis: 1991-1995; plotted series: Actuals ('88-89), NCES, Neural.]

Figure 4: Characteristics of GRNN Forecast (MAPE = 2.64%)


[Line chart: GRNN & NCES Forecasts, 1991-1995. Vertical axis: expenditures per pupil, $4,500 to $5,200; horizontal axis: 1991-1995; plotted series: Actuals ('88-89), NCES, Neural.]

Table 2 displays the linear regression coefficients (AR(1) estimation), contribution factors and GRNN smoothing parameters for each input of the main model, for annually updated models. Perhaps the most revealing pattern is the changing relative importance of the role of the states in providing funding to public education (SGRANT). Overall, the relative importance of SGRANT increased from earlier to later backpropagation models. To the contrary, the magnitude of the coefficient for SGRANT in the NCES model is greatest in the 1991 forecast model. We know, however, that state funding began to level off in the early 1990s, while the state role, in general, had increased over the past fifteen to twenty years (Gold, 1995). The placement of increased weighting on state support to education may provide one explanation for why the backpropagation neural network was able to pick up the slowed growth in total spending from 1991-1993. In short, the backpropagation models became increasingly driven by SGRANT, while the linear regression models became increasingly PCI (per capita income) driven. GRNN smoothing factors were much less useful for deriving inferences, in that vast inconsistencies existed from model to model. The inconsistencies were likely due to the process used for selection of the optimum model: the genetic algorithm. Optimum models produced by this method vary according to the members of the initial gene pool of equations.

Table 2: Comparison of regression coefficients from the NCES model with contribution factors generated by neural networks
Year of   Predictor   Regression    Backprop Contribution   Backprop Contribution    GRNN Smoothing Factor*   GRNN Smoothing Factor
Model                 Coefficient   (Untreated Data)        (Log Transformed Data)   (Untreated Data)         (Log Transformed Data)
1991      PCI         0.445         0.509                   0.381                    2.918                    1.753
1991      SGRNT       0.702         0.241                   0.261                    1.776                    1.082
1991      ADAPOP      0.416         0.149                   0.176                    1.718                    0.188
1992      PCI         0.466         0.371                   0.351                    2.812                    2.529
1992      SGRNT       0.691         0.394                   0.327                    2.847                    0.482
1992      ADAPOP      0.409         0.176                   0.177                    1.424                    1.471
1993      PCI         0.640         0.390                   0.290                    0.647                    2.988
1993      SGRNT       0.591         0.366                   0.286                    2.953                    2.941
1993      ADAPOP      0.334         0.193                   0.333                    2.682                    0.541
1994      PCI         0.521         0.186                   0.181                    0.271                    0.753
1994      SGRNT       0.651         0.370                   0.380                    1.165                    0.482
1994      ADAPOP      0.374         0.325                   0.334                    2.765                    2.176
1995      PCI         0.597         0.250                   0.244                    1.094                    2.282
1995      SGRNT       0.614         0.395                   0.374                    2.094                    1.847
1995      ADAPOP      0.345         0.162                   0.189                    0.576                    0.541

Conclusions and Recommendations

While the results on prediction accuracy were mixed, there is evidence to suggest that an appropriate neural network architecture can be selected that will consistently outperform simple linear regression models. In this study, the GRNN model of the ln transformed series maintained the most consistent predictive accuracy. While it remains likely that a non-linear regression model can be constructed by traditional methods that will perform equally well, neural networks provide a relatively simple, user-friendly approach to the development of data-driven, flexible, non-linear models.

The future value of Neural Networks in public finance policy research may have little to do with prediction accuracy at all. Prediction tools generally find their place in competitive industries, where gaining the edge on prediction accuracy yields an identifiable competitive advantage. In public finance policy research, the real value of neural networks may be their potential use as an inductive, or exploratory, analytical tool. While in this study recurrent backpropagation revealed patterns of changing state support that can easily be grounded in economic theory, applications of similar methods to data with greater variance and more complex nonlinearities may reveal patterns or relationships that have long evaded economists and educational researchers.


References

Baker, Bruce (1997) A Comparison of Statistical and Neural Network Models for Forecasting Educational Spending. Doctoral Dissertation. Teachers College, Columbia University.

Buchman, Timothy G.; Kubos, Ken L.; Seidler, Alexander J.; Siegforth, Michael J. (1994) A comparison of statistical and connectionist models for the prediction of chronicity in a surgical intensive care unit. Journal of Critical Care Medicine. 22 (5) 750-762.

Caudill, Maureen (1995a, February) Part 1: The View from Now. In Using Neural Networks. AI Expert. 5-12.

Caudill, Maureen (1995b, February) Part 3: Putting Time in a Bottle. In Using Neural Networks. AI Expert. 19-24.

Caudill, Maureen (1995c, February) Part 7: GRNN and Bear It. In Using Neural Networks. AI Expert. 47-52.

Gerald, Deborah and Hussar, William (1991) Projections of Education Statistics to 2001. National Center for Education Statistics. U.S. Dept. of Ed.

Gerald, Deborah and Hussar, William (1992) Projections of Education Statistics to 2002. National Center for Education Statistics. U.S. Dept. of Ed.

Gerald, Deborah and Hussar, William (1993) Projections of Education Statistics to 2003. National Center for Education Statistics. U.S. Dept. of Ed.

Gerald, Deborah and Hussar, William (1994) Projections of Education Statistics to 2004. National Center for Education Statistics. U.S. Dept. of Ed.

Gerald, Deborah and Hussar, William (1995) Projections of Education Statistics to 2005. National Center for Education Statistics. U.S. Dept. of Ed.

Gerald, Deborah and Hussar, William (1996) Projections of Education Statistics to 2006. National Center for Education Statistics. U.S. Dept. of Ed.

Gerald, Deborah and Hussar, William (1997) Projections of Education Statistics to 2007. National Center for Education Statistics. U.S. Dept. of Ed.

Gold, Steve (1995) The Outlook for School Revenue in the Next Five Years. Consortium for Policy Research in Education. CPRE Research Report Series #34.

Hansen, James V.; Nelson, Ray D. (1997) Neural Networks and Traditional Time Series Methods: A Synergistic Combination in State Economic Forecasts. IEEE Transactions on Neural Networks. 8 (4) 863-873.

Hanushek, Eric (1994) Making Schools Work: Improving Performance and Controlling Costs. The Brookings Institution, Washington DC.

Lachtermacher, Gerson; Fuller, David (1995) Backpropagation in time-series forecasting. Journal of Forecasting. 14 (4) 381-393.

Liao, Tim Futing (1992) A modified GMDH approach for social science research: exploring patterns of relationships in the data. Quality and Quantity. 26. 19-38.

Liu, Lon-Mu and Hudak, Gregory B. (1996) Modeling and Forecasting Time Series Using SCA-Expert Capabilities. Scientific Computing Associates. Oak Brook, Ill.

McCollough, Jane (1990, October) Municipal Revenue and Expenditure Forecasting: Current Status and Future Prospects. Government Finance Review. 38-40.

McMenamin, J. Stuart (1997, Fall) A Primer on Neural Networks for Forecasting. Journal of Business Forecasting. 17-22.

Murphy, Christopher; Fogler, H. Russell; Koehler, Gary (1994, March 30) Artificial Stupidity. ICFA Continuing Education. 44-49.

Nabangi, Fabian Kafuko (1992) The utility of ARIMA models to revenue forecasting for Alabama urban county governments. Doctoral Dissertation. Department of Economics. The University of Alabama.

Newbold, P. (1984) Statistics for Business and Economics. Prentice-Hall, Englewood Cliffs, NJ.

Odom, Marcus (1994) A Neural Network Model for Bankruptcy Prediction. Oklahoma State University. Unpublished Manuscript.

Pankratz, Alan (1991) Forecasting with Dynamic Regression Models. John Wiley and Sons. New York.

Rao, Valluru; Rao, Hayagriva (1993) C++ Neural Networks and Fuzzy Logic. MIS Press.

Shen, Jue-Chi (1994) Estimating state expenditures on public education: an economic study from pooled time-series and cross-sectional data. Doctoral Dissertation. Department of Economics. The University of Tennessee. Dissertation Abstracts International 56-04A.

Specht, D. F. (1991) A Generalized Regression Neural Network. IEEE Transactions on Neural Networks. 2 (5) 568-576.

Vandaele, Walter (1983) Applied Time Series and Box-Jenkins Models. Academic Press Inc. San Diego.

Ward Systems Group (WSG) (1995) Neuroshell 2 Users Guide. Frederick, MD.

Worzala, Elaine; Lenk, Margarita; Silva, Ana (1995) An exploration of neural networks and its application to real estate valuation. Journal of Real Estate Research. 10 (2) 185-201.


Appendix A: NCES Equations for Forecasting Educational Expenditures


Publication year 1991
Equation for forecasting CUREXP: ln(CUREXP) = -0.771 + 0.445 ln(PCI) + 0.702 ln(SGRNT) - 0.416 ln(ADAPOP)

Publication year 1992
Equation for forecasting CUREXP: ln(CUREXP) = -0.892 + 0.466 ln(PCI) + 0.691 ln(SGRNT) - 0.409 ln(ADAPOP)
Equation for forecasting SGRNT: SGRNT = -161 + 0.30 PERTAX1 + 0.18 BUSTAX1 + 1047 ADAPOP - 10.2 ININCR

Publication year 1993
Equation for forecasting CUREXP: ln(CUREXP) = -1.836 + 0.640 ln(PCI) + 0.591 ln(SGRNT) - 0.334 ln(ADAPOP)
Equation for forecasting SGRNT: SGRNT = -125.8 + 0.31 PERTAX1 + 0.28 BUSTAX1 + 718 ADAPOP - 13.3 ININCR

Publication year 1994
Equation for forecasting CUREXP: ln(CUREXP) = -1.145 + 0.521 ln(PCI) + 0.651 ln(SGRNT) - 0.374 ln(ADAPOP)
Equation for forecasting SGRNT: SGRNT = -126.1 + 0.27 PERTAX1 + 0.31 BUSTAX1 + 688 ADAPOP - 13.6 ININCR

Publication year 1995
Equation for forecasting CUREXP: ln(CUREXP) = -1.620 + 0.597 ln(PCI) + 0.614 ln(SGRNT) - 0.345 ln(ADAPOP)
Equation for forecasting SGRNT: SGRNT = -85.8 + 0.22 PERTAX1 + 0.35 BUSTAX1 + 438 ADAPOP - 8.7 ININCR

Publication year 1996
Equation for forecasting CUREXP: ln(CUREXP) = -1.704 + 0.612 ln(PCI) + 0.605 ln(SGRNT) - 0.338 ln(ADAPOP)
Equation for forecasting SGRNT: ln(SGRNT) = -0.26 + 0.33 ln(PERTAX1) + 0.63 ln(BUSTAX1) + 0.34 ln(ADAPOP) - 0.031 (RCPIANN/RCPIANN1)

