ANN-Model Training. The third step is to train the ANN model against the database. The objective of the training is for the network to learn the underlying behavior of the training data set. There are numerous algorithms available for training neural-network models. During the training phase, the ANN determines the weighting values W and the bias values b to quantify the relationships between input values x and corresponding output values y.

In this GOR ANN example, 80% of the 1,834 points available from the PVT database were selected at random for training, 10% were chosen as the calibration set, and the remaining 10% were used for validation of the model. Furthermore, to avoid possible overfitting, the training algorithm was developed to minimize not only the GOR-training error but also the differences between the ANN-model derivatives and physically sound GOR derivatives with respect to mass composition (Gaganis and Varotsis 2005). The backpropagation method with a batch-learning paradigm, in which optimization is carried out with respect to the entire training set simultaneously, was used. In addition, the Broyden-Fletcher-Goldfarb-Shanno algorithm (Nocedal and Wright 1999) was chosen for the optimization step.

During the training phase, the objective is to minimize the error between the ANN model and the training set (i.e., 80% of the database). The purpose of the calibration set (i.e., 10% of the database) is to monitor the error independently during training, thereby serving as a cross-validation. This allows an early-stopping technique to be used: the training is terminated when the error for the calibration set stops decreasing, even if the error for the training set itself is still decreasing. We can say that during training, the training set is directly seen by the ANN model, and the calibration set is indirectly seen. Thus, 90% of the database participates in the training, while the remaining 10%, the validation set, is unseen by the ANN during training.

Model Evaluation. The fourth and final step is to evaluate the ANN model. This is done by checking the performance of the trained ANN against the unseen validation set. The results are presented in Table 1. (The statistical terms are defined in the Appendix.) The ANN model performs almost identically against both the training and validation data, indicating the robustness of the model. Note that although, for example, the mean relative error for the validation set is 1.5% larger than that for the training data, it is normal to have the validation error statistics slightly larger because that set is relatively small in size and the database inevitably contains some noise. However, all differences in Table 1 are considered to be well within an acceptable range. Even at 3.0% mean relative error for the validation set, this indicates only a small error for a model that predicts GOR covering a range of 10 scf/STB to more than 100,000 scf/STB.

For an additional evaluation of the GOR ANN model, we compared its performance with two other models: (1) the algorithm for the DFA tool described by Fujisawa et al. (2002), which is a four-component model (C1, C2–C5, C6+, CO2); and (2) the algorithm for the DFA tool described by Dong et al. (2007), which, like the ANN, is a five-component model. Also note that in contrast to the ANN model, which was trained with 90% of the 1,834-point database, the DFA-tool algorithms had never seen the database. A statistical summary of the performance evaluation is given in Table 2. The validation set is the only data unseen by all three models; the comparison of the results for this set is shown in the last row of the table. These results show the improved performance of the ANN model in comparison to the DFA-tool algorithms. Fig. 2 shows the predicted vs. laboratory GOR values for the ANN model, and Fig. 3 displays the relative error for the ANN model. Corresponding graphs for DFA Tool 1 are presented in Figs. 4 and 5, respectively, and for DFA Tool 2 in Figs. 6 and 7, respectively. In summary, the ANN model performed consistently well throughout the GOR range, while the other two models showed larger errors at low-GOR and high-GOR values, with a majority of the errors being negative.

As we previously described, the ANN model was trained and validated against a worldwide database. For an additional check of the model, we examined its performance against data subsets for the various main geographic regions represented in the database, such as the North Sea, Middle East, and Gulf of Mexico regions. We observed that the ANN GOR model errors showed no regional dependency. To verify this observation further, we collected additional PVT laboratory reports for formation-fluid samples and tested the model. For example, we collected 453 reports for samples from the Gulf of Mexico region. The performance of the ANN GOR model against these completely unseen data is presented in Fig. 8. The results are consistent with the model performance against the worldwide database (Fig. 2).
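The error statistics discussed under Model Evaluation can be illustrated with a short Python sketch. It assumes that relative error is the signed quantity (predicted minus laboratory) divided by laboratory GOR, expressed in percent; the exact definitions are those given in the Appendix and may differ. The function name `relative_error_stats` and the four sample values are hypothetical and are not taken from the PVT database.

```python
from statistics import mean


def relative_error_stats(gor_pred, gor_lab):
    """Summarize signed relative errors (in percent) of predicted vs. laboratory GOR.

    Assumed definition: 100 * (predicted - laboratory) / laboratory.
    """
    rel = [100.0 * (p - m) / m for p, m in zip(gor_pred, gor_lab)]
    return {
        "mean_rel_err_pct": mean(rel),
        "mean_abs_rel_err_pct": mean(abs(r) for r in rel),
        # Share of samples for which the model underpredicts GOR.
        "fraction_negative": sum(r < 0 for r in rel) / len(rel),
    }


# Illustrative values only (scf/STB), spanning the low and high ends of the GOR range.
gor_lab = [10.0, 500.0, 2000.0, 100000.0]
gor_pred = [11.0, 490.0, 2000.0, 95000.0]
stats = relative_error_stats(gor_pred, gor_lab)
print(stats)
```

Dividing by the laboratory value keeps samples at 10 scf/STB and at 100,000 scf/STB on a comparable footing, and the fraction of negative errors is one way to quantify a tendency to underpredict.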
Fig. 5: Distribution of relative error for DFA-Tool-1 (four-component composition) GOR model against PVT database.
Fig. 6: Comparison of DFA-Tool-2 (five-component composition) GOR model against PVT database.
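The 80/10/10 random split and the early-stopping rule described under ANN-Model Training can be sketched in Python with NumPy. This is a minimal illustration under several assumptions, not the authors' implementation: a linear model stands in for the ANN, plain batch gradient descent replaces the Broyden-Fletcher-Goldfarb-Shanno optimizer, the derivative-matching penalty is omitted, and the data are synthetic. The names `split_indices` and `train_with_early_stopping`, the `patience` parameter, and the rounding of the subset sizes are all hypothetical.

```python
import numpy as np


def split_indices(n_points, rng, train_frac=0.8, calib_frac=0.1):
    """Randomly partition indices into training, calibration, and validation sets."""
    perm = rng.permutation(n_points)
    n_train = int(train_frac * n_points)
    n_calib = int(calib_frac * n_points)
    return perm[:n_train], perm[n_train:n_train + n_calib], perm[n_train + n_calib:]


def train_with_early_stopping(x, y, train, calib, lr=0.05, max_epochs=5000, patience=20):
    """Fit y ~ x @ w + b by batch gradient descent, stopping on the calibration error."""
    w = np.zeros(x.shape[1])
    b = 0.0
    best_mse, best_w, best_b = np.inf, w.copy(), b
    stale = 0
    for _ in range(max_epochs):
        # Batch learning: the gradient uses the entire training set at once.
        resid = x[train] @ w + b - y[train]
        grad_w = 2.0 * (x[train].T @ resid) / len(train)
        grad_b = 2.0 * resid.mean()
        w = w - lr * grad_w
        b = b - lr * grad_b
        # The calibration set is only monitored; it never enters the gradient.
        calib_mse = float(np.mean((x[calib] @ w + b - y[calib]) ** 2))
        if calib_mse < best_mse - 1e-12:
            best_mse, best_w, best_b = calib_mse, w.copy(), b
            stale = 0
        else:
            stale += 1
            if stale >= patience:  # calibration error has stopped decreasing
                break
    return best_w, best_b, best_mse


rng = np.random.default_rng(0)
n = 1834  # database size quoted in the text; the data themselves are synthetic
x = rng.normal(size=(n, 5))
y = x @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.1 * rng.normal(size=n)

train, calib, valid = split_indices(n, rng)
w, b, calib_mse = train_with_early_stopping(x, y, train, calib)
# The validation set is touched only here, after training has finished.
valid_mse = float(np.mean((x[valid] @ w + b - y[valid]) ** 2))
print(len(train), len(calib), len(valid), calib_mse, valid_mse)
```

The parameters returned are those from the epoch with the lowest calibration error, which is why the calibration set counts as indirectly seen, while the validation set stays fully independent.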