
ISIJ International, Vol. 57 (2017), No. 12, pp. 2229–2236

Prediction of Ac3 and Martensite Start Temperatures by a Data-driven Model Selection Approach

Hoheok KIM,1)* Junya INOUE,1,2) Masato OKADA3) and Kenji NAGATA3)

1) Graduate School of Materials Engineering, The University of Tokyo, 7-3-1 Bunkyo, Hongo, Tokyo, 113-8656 Japan.
2) Research Center for Advanced Science and Technology, The University of Tokyo, 4-6-1 Komaba, Meguro, Tokyo, 153-0041
Japan.
3) Graduate School of Frontier Science, The University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba, 277-8651 Japan.
(Received on April 13, 2017; accepted on August 2, 2017)

Four different information criteria, which are widely used for model selection problems, are applied to reveal the explanatory variables for two phase transformation temperatures of steels, the austenite transformation temperature (Ac3) and the martensite-start temperature (Ms). Using existing CCT-diagram datasets for various steels, predictive equations for these critical temperatures are derived. A number of empirical equations have been proposed to enable efficient prediction of the Ac3 and Ms temperatures of steels; however, the key parameters in those equations were usually chosen through the researchers' trial and error. In this study, the performance of the information criteria is first evaluated using a simulated dataset mimicking the characteristics of the datasets for the Ac3 and Ms temperatures. The criteria are then applied to experimental data obtained from two different sources. The key parameters are chosen for the Ac3 and Ms temperatures, and the derived equations are found to be in better agreement with the experimental data than the previous empirical equations. Thus, it was clarified that these methods can be applied to automatically discover hidden mechanisms in complex multi-dimensional datasets of steel chemical compositions.
KEY WORDS: steel; martensite start temperature; Ac3 temperature; modeling; model selection criterion;
AIC; ABIC; BIC; cross validation.

* Corresponding author: E-mail: ho_heok_kim@metall.t.u-tokyo.ac.jp
DOI: http://dx.doi.org/10.2355/isijinternational.ISIJINT-2017-212

1. Introduction

Due to the importance of phase transformations and heat treatments for the mechanical properties of steels, a large number of studies have been conducted to clarify the effect of various alloying elements on phase transformation temperatures, such as the martensite-start temperature (Ms) and the austenite transformation temperature (Ac3). Therefore, various empirical models have been proposed, and the most commonly used equations among them are listed in Table 1.1–7) All of these equations were derived by multivariate regression analysis using large experimental datasets composed of several tens or hundreds of steels with various chemical compositions.

However, the key parameters in these equations, which are considered to have notable effects on the phase transformation temperatures, were selected mostly from the insight of experienced researchers. As a result, there are several disagreements on how alloying elements affect the phase transformation temperatures. For instance, the terms corresponding to the effect of carbon on Ac3 are totally different between the equations of Andrews and Hougardy.1,2) For reliable prediction, therefore, determination of the key parameters is the most important step, which unfortunately is not done automatically by conventional multivariate regression analysis. Accordingly, several data-driven approaches have been introduced. For instance, a Bayesian neural network has been used to establish predictive models for Ac3 by Vermeulen et al.8) and for Ms by Sourmail and Garcia-Mateo,9) and it was demonstrated that this data-driven approach enables automatic model selection and provides superior estimation of those transformation temperatures.

Although the neural network approach does provide good estimates, it does not allow the explicit separation of the different roles of the alloying elements. Of course, it is possible to infer the effect of each alloying element from the output of the neural network for a series of controlled inputs. However, from an engineering point of view, an explicit correlation is usually more important, so the earlier empirical regression models are still widely used.10,11)

Recently, another data-driven approach, called data-driven model selection, has been increasingly introduced to solve physical problems. Data-driven model selection is a method in which the model most suitable for a given dataset is selected on the basis of an information criterion. Several information criteria exist, such as the Akaike information criterion (AIC) by Akaike,12) the Bayesian information criterion (BIC) by Schwarz,13) Akaike's Bayesian information criterion (ABIC) by Akaike,14) and cross-validation (CV).15) Some applications of these methods in the materials engineering area are as follows: Al-Rubaie et al.16) used the AIC to select a model for the fatigue crack growth rate, and Cockayne and van de Walle17) obtained a cluster expansion model for the CaZr1−xTixO3 solid solution by applying CV.


Table 1. A summary of the past equations for the Ac3 and Ms temperatures based on the chemical composition of steels (contents in wt%).

Ac3 temperature
Andrews (1965): Ac3(°C) = 910 − 203C^{1/2} + 44.7Si − 15.2Ni + 31.5Mo + 104.4V + 13.1W
Hougardy (1984): Ac3(°C) = 902 − 255C + 19Si − 11Mn − 5Cr + 13Mo − 20Ni + 55V
Kasatkin et al. (1984): Ac3(°C) = 912 − 370C − 27.4Mn + 27.3Si − 6.35Cr − 32.7Ni + 95.2V + 190Ti + 72Al + 65.6Nb + 5.57W + 332S + 276P + 485N − 900B + 16.2CMn + 32.3CSi + 15.4CCr + 48CNi + 4.32SiCr − 17.3SiMo + 18.6SiNi + 4.8MnNi + 40.5MoV + 174C^2 + 2.46Mn^2 − 6.86Si^2 + 0.322Cr^2 + 9.9Mo^2 + 1.24Ni^2 − 60.2V^2
Trzaska and Dobrzański (2007): Ac3(°C) = 973 − 224.5C^{1/2} − 17Mn + 34Si − 14Ni + 21.6Mo + 41.8V − 20Cu

Ms temperature
Payson and Savage (1944): Ms(°C) = 489.9 − 316.7C − 33.3Mn − 27.8Cr − 16.7Ni − 11.1(Si + Mo + W)
Grange and Stewart (1946): Ms(°C) = 537.8 − 361.1C − 38.9(Mn + Cr) − 19.4Ni − 27.8Mo
Andrews (linear, 1965): Ms(°C) = 539 − 423C − 30.4Mn − 17.7Ni − 12.1Cr − 7.5Mo
Andrews (non-linear, 1965): Ms(°C) = 512 − 453C − 16.9Ni + 15Cr − 9.5Mo + 217C^2 − 71.5CMn − 67.6CCr
Wang et al. (2000): Ms(°C) = 545 − 470.4C − 3.96Si − 37.7Mn − 21.5Cr + 38.9Mo

All of these criteria can be used to select the best model for a given dataset, but differences in their assumptions may lead to different performance under different circumstances, such as the amount of data and the kind of model to be selected. Therefore, a comprehensive understanding of the behavior of each model selection criterion is needed when applying it to a specific problem.

In this paper, the AIC, BIC, ABIC, and CV are first briefly explained, and then the performance of each model selection criterion is evaluated using a simulated dataset that reflects the characteristics of the databases of Ac3 and Ms temperatures. Finally, the criteria are applied to determine the key parameters for the Ac3 and Ms temperatures using two kinds of data sources: one is exactly the same dataset used by Andrews,1) and the other is the CCT database for steels provided by the NIMS Materials Database (MatNavi).18)

2. Likelihood Function and Model Selection Criteria

This section presents a brief review of the basis of the typical model selection criteria.

2.1. Likelihood Function
In many scientific studies, the objective is to find the underlying relationship that yields the data. This problem usually leads to the evaluation of a set of candidate models. In such a problem, a likelihood is used to estimate the probability of a model for given observations. Assuming that each observation y_n includes Gaussian noise around the model prediction, the likelihood of y_n is given by Eq. (1), where x, θ, and s indicate the input data, the parameters, and the standard deviation of the noise, respectively.

p(y_n \mid x, \theta) = \frac{1}{\sqrt{2\pi s^2}} \exp\!\left[ -\frac{(y_n - \theta x)^2}{2s^2} \right]   ....... (1)

Suppose each observation is generated independently; then the likelihood function of the entire dataset, p(y|x, θ), can be represented by

p(y \mid x, \theta) = \prod_{n=1}^{N} p(y_n \mid x, \theta) = \frac{1}{(2\pi s^2)^{N/2}} \exp\!\left[ -\frac{1}{2s^2} \sum_{n=1}^{N} (y_n - \theta x)^2 \right]   .... (2)

where N represents the number of data. The model parameters that maximize the likelihood are chosen for the given data. This maximizes the agreement of the candidate model with the observations, and the method is known as maximum likelihood estimation (MLE). With MLE, the maximum likelihood L̂ is written as

\hat{L} = p(y \mid x, \hat{\theta}) = \frac{1}{(2\pi s^2)^{N/2}} \exp\!\left[ -\frac{1}{2s^2} \sum_{n=1}^{N} (y_n - \hat{\theta} x)^2 \right]   ... (3)

where θ̂ is the parameter set determined by the least-squares method. In other words, the probability of a model with the optimized parameters for the given data can be calculated by MLE.
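For concreteness, the estimation of θ̂ and ln L̂ in Eqs. (1)–(3) can be sketched in a few lines of Python. This snippet is an illustrative addition (NumPy only, synthetic data), not part of the original analysis.

```python
import numpy as np

def max_log_likelihood(X, y):
    """Least-squares fit of a linear model y = X @ theta + noise and the
    corresponding maximized Gaussian log-likelihood ln(L_hat) of Eq. (3)."""
    n = len(y)
    theta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)   # MLE = least squares under Gaussian noise
    s2 = np.mean((y - X @ theta_hat) ** 2)              # ML estimate of the noise variance s^2
    log_L = -0.5 * n * (np.log(2.0 * np.pi * s2) + 1.0)
    return theta_hat, s2, log_L

# small synthetic check with hypothetical data
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(50, 3))
y = X @ np.array([3.0, 0.0, 7.0]) + rng.normal(0.0, 1.0, size=50)
print(max_log_likelihood(X, y))
```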
2.2. Model Selection Criteria
Simply comparing candidate models by their maximum likelihood may lead to an overfitting problem. This occurs when a model is too complex, for example when it has too many parameters relative to the number of observations. The model then describes error or noise instead of the underlying relationship. This problem is illustrated in Fig. 1(a). A simple model cannot describe the given data precisely and also gives a poor prediction. When the complexity is moderate, the model fits both the given data and new observations successfully. A more complex model, on the other hand, describes the given data perfectly but fails to predict new observations. As shown in Fig. 1(b), the disagreement of a model with the given data, which is usually represented as the training error, decreases as the model becomes more complex. However, the prediction error of the model on unseen data is high when the model is too simple or too complex, and low when the complexity is moderate. Therefore, MLE, which estimates how well a model fits the observations, should not be used as the only indicator when choosing the best model. For this reason, a variety of methods have been suggested that add a penalty term to MLE to account for the model complexity.

Fig. 1. Schematic plot of the input, the observations and models with different complexity (a), and the training and prediction errors as the model complexity increases (b).

Akaike12) proposed the AIC based on information theory. It estimates the relative quality of a given model by measuring the Kullback–Leibler distance, and it provides an estimate of the information loss when a model is used to express the data-generating process. In this way, it evaluates both the goodness of fit and the complexity of a model. The AIC is defined as Eq. (4), which evaluates the model fit with the term −2 ln L̂ and penalizes the model complexity with the term 2k, where k is the number of parameters.

AIC = -2\ln\hat{L} + 2k   ........................... (4)

The BIC is a criterion for model selection among a finite set of models and was developed by Schwarz.13) The expression for the probability of a model for given data was derived by the Laplace approximation. It balances the increase in the likelihood against a term calculated from the number of parameters (k) and the number of data (N).

BIC = -2\ln\hat{L} + k\ln(N)   ....................... (5)
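Under the Gaussian likelihood of Eq. (3), −2 ln L̂ reduces to N[ln(2πs²) + 1], so Eqs. (4) and (5) can be evaluated directly from the least-squares residuals. The helper below is a minimal sketch of such a scoring function (the function name is ours, not the paper's).

```python
import numpy as np

def aic_bic(X, y):
    """AIC and BIC (Eqs. (4)-(5)) of a least-squares linear model; under the
    Gaussian likelihood of Eq. (3), -2*ln(L_hat) = n*(ln(2*pi*s^2) + 1)."""
    n, k = X.shape                                      # k counts only the regression coefficients
    theta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    s2 = np.mean((y - X @ theta_hat) ** 2)
    minus_2_log_L = n * (np.log(2.0 * np.pi * s2) + 1.0)
    return minus_2_log_L + 2.0 * k, minus_2_log_L + k * np.log(n)
```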

Next, Akaike's Bayesian information criterion (ABIC), proposed by Akaike,14) is considered. With the assumptions that the noise in the data follows a Gaussian distribution and that the prior follows a uniform distribution, the probability of a model for the given data can be expressed as Eq. (6)

ABIC = -\frac{1}{2}\mu^{T}\Lambda^{-1}\mu + \frac{1}{2s^2}y^{T}y + \frac{n-k}{2}\log\left(2\pi s^2\right) - \frac{1}{2}\log\left|\Lambda\right|   ............ (6)

where

\Lambda = \left(xx^{T}\right)^{-1}   ................................ (7)

\mu = \left(xx^{T}\right)^{-1}xy   .............................. (8)

The first three terms estimate how well a model fits the data, and the last term is a penalty for the model complexity.

Finally, we consider cross-validation (CV), which is a technique for assessing how well an analysis will generalize to an independent dataset.15) The calculation procedure is illustrated in Fig. 2. In the calculation, part of the data is used for the analysis (the training dataset) and the remainder for validation (the validation dataset). The goal of cross-validation is to avoid overfitting problems by verifying a model with the validation dataset. In particular, in leave-one-out cross-validation (LOOCV), only one datum is used as the validation set and the remaining data are used as the training set. Regarding these criteria, a model with a lower AIC, BIC, ABIC, or CV score is preferred.

Fig. 2. Schematic representation of the cross-validation method.
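A minimal LOOCV scorer for a linear candidate model is sketched below as an illustration of the procedure in Fig. 2 (plain NumPy; the function name is ours, not the paper's).

```python
import numpy as np

def loocv_error(X, y):
    """Leave-one-out cross-validation: mean squared prediction error of a
    least-squares linear model, holding out one observation at a time."""
    n = len(y)
    errors = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i                          # training indices
        theta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        errors[i] = (y[i] - X[i] @ theta) ** 2            # squared error on the held-out datum
    return errors.mean()
```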

3. Validation of Each Criterion Using a Toy Model

3.1. Setting of the Toy Models
First, to demonstrate the effectiveness of each criterion in the feature and model selection problem, the following simple linear combination model is considered:

Model #1:  y = \sum_{i=1}^{10} f(x_i) + w,   f(x_i) = x_i \theta_i,   w \sim N(0, \sigma^2)   .................... (9)

where y, x_i, f(x_i), w, and θ_i are the observations, input data, underlying relationship, noise, and parameters, respectively. The input data are distributed uniformly between 0 and 1, and the noise is drawn from a zero-mean normal distribution with standard deviation σ. An overview of the toy model is listed in Table 2. Only five out of the 10 inputs are designed to actually have an effect on the output. Varying the standard deviation σ from 1 to 10, between 50 and 800 combinations of input data and output observations are randomly generated to observe the trend of the model selection result with an increasing number of data. Parameters for two cross terms of the input data (a9 for x1·x7 and a10 for x2·x8) are also considered. The whole procedure is repeated 100 times to evaluate the performance of each criterion.

Table 2. An overview of the values of the parameters of Model #1 and the fraction of zero data for each.

Parameter       a1  a2  a3  a4  a5  a6  a7  a8  a9     a10
Value            3   0   7   0  11   0  15   0  19      0
% of zero data  10  10  30  30  50  50  70  70  70–80  70–80
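The setting of Model #1 and a BIC-based subset search can be reproduced with a short script such as the following. It is an illustrative sketch under the assumptions stated in the comments (one realization, exhaustive enumeration of subsets), not the authors' code.

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
TRUE_THETA = np.array([3, 0, 7, 0, 11, 0, 15, 0, 19, 0], dtype=float)

def make_model1_dataset(n, sigma):
    """One realization of Model #1: 10 candidate inputs, 5 of them active;
    columns 9 and 10 carry the cross terms x1*x7 and x2*x8."""
    X = rng.uniform(0.0, 1.0, size=(n, 10))
    X[:, 8] = X[:, 0] * X[:, 6]
    X[:, 9] = X[:, 1] * X[:, 7]
    y = X @ TRUE_THETA + rng.normal(0.0, sigma, size=n)
    return X, y

def bic(X, y):
    """BIC of a least-squares fit, i.e. Eq. (5) with the Gaussian likelihood of Eq. (3)."""
    n, k = X.shape
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    s2 = np.mean((y - X @ theta) ** 2)
    return n * (np.log(2.0 * np.pi * s2) + 1.0) + k * np.log(n)

def best_subset(X, y):
    """Exhaustive search over the 2**10 - 1 non-empty feature subsets."""
    candidates = ((bic(X[:, s], y), s)
                  for r in range(1, X.shape[1] + 1)
                  for s in itertools.combinations(range(X.shape[1]), r))
    return min(candidates)

X, y = make_model1_dataset(n=100, sigma=1.0)
score, subset = best_subset(X, y)
print(subset)   # a successful run recovers the active columns (0, 2, 4, 6, 8)
```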


Additionally, zero data are inserted into the input data at different ratios to investigate the effect of blank data on the model selection procedure. The zero data are randomly inserted into the input data, and their fraction for each parameter is increased from 33% to 99%. The following simple linear combination model is considered:

Model #2:  y = \sum_{i=1}^{20} f(x_i) + w,   f(x_i) = x_i \theta_i,   w \sim N(0, \sigma^2)   .......... (10)

An overview of the parameter values and the fraction of zero data is listed in Table 3.

Table 3. An overview of the values of the parameters of Model #2 and the fraction of zero data for each.

Parameter       a1  a2  a3  a4  a5  a6  a7  a8  a9  a10  a11  a12  a13  a14  a15  a16  a17  a18  a19  a20
Value            0   0   0   0   0   5   5   5   5    5   10   10   10   10   10   20   20   20   20   20
% of zero data  33  66  77  88  99  33  66  77  88   99   33   66   77   88   99   33   66   77   88   99

3.2. Simulation Results for the Toy Models
Figure 3 illustrates the frequency of selecting the true relationship for each criterion with Model #1 under different numbers of data and noise levels. In this analysis, if θ1, θ3, θ5, θ7, and θ9 are all selected and θ2, θ4, θ6, θ8, and θ10 are all excluded, the result is counted as successful. For example, Fig. 3(a) plots the model selection result for different numbers of data (N) for the AIC. When the noise level is 1, the AIC succeeds 39 times out of 100 trials with 50 data and 54 times with 800 data. All four criteria share a common trend in that the frequency of selecting the true model decreases as the noise level increases. Also, when there are more data, the true model is more likely to be found. It is natural that with more noise and fewer data it becomes more difficult to find the true underlying relationship.

Fig. 3. The frequency of choosing the true model by each criterion. Panels (a), (b), (c) and (d) show the results for the AIC, BIC, ABIC and CV, respectively.

Among the four criteria, the BIC shows the best performance on the true model selection problem, with a probability of 70% of finding the true model with the largest dataset over the entire range of noise levels. With small datasets, it still chooses the true model in 70% of cases when there is little noise.

The frequency of finding the true model was lowest with the AIC and CV and was less than 60% even with a large dataset. With small datasets, the frequency of finding the true model was less than 50%. In addition to their poorer performance, the trends of their performance appear to be similar. It was proved by Stone19) that the calculation processes of the AIC and CV are equivalent in the case of linear model selection. Therefore, both criteria can be treated as the same method in this linear case.


Furthermore, the frequency with which each feature is chosen as a working parameter was also evaluated to investigate the model selection performance from the viewpoint of feature selection with Model #1, and the result is plotted in Fig. 4. This figure shows how frequently each parameter is estimated as a necessary feature by the four methods for various numbers of data and noise levels. The result shows that parameters with lower values and larger percentages of zero data, as shown in Table 2, tend to be excluded as the noise level increases. This suggests that such parameters are considered unnecessary because they are buried in noise. Among the four criteria, the BIC recorded the highest ratio of finding the true key parameters (a1, a3, a5, a7 and a9) and excluding the unnecessary ones. The ABIC is not an effective method with small datasets, but it showed improved performance with larger datasets. The AIC and CV produced almost the same results. From the viewpoints of both true model selection and feature selection, the BIC method showed the best performance.

Fig. 4. The result of parameter selection by each criterion. The y axis indicates the number of times each parameter is chosen as a key parameter by each model selection criterion for different noise levels (x axis) and numbers of data (n) when the calculation is repeated 100 times.

In addition, Fig. 5 shows the effect of zero data on the model selection result using the BIC with Model #2. Each chart indicates the fraction of success for various fractions of zero data, noise levels, and parameter values. When there are no zero data and the number of data is 100, the parameter with a value of 20 is chosen 100% of the time over the whole noise range. As the value of the parameter decreases, the fraction of success decreases with increasing noise. The trend of the result changes only slightly until the fraction of zero data reaches 88%. However, with so little information, the key parameters were barely chosen when the standard deviation of the noise was higher than 5. This result suggests that zero data have no considerable effect on the model selection as long as the quality of the input data is good and the noise level is sufficiently low.

Fig. 5. The model selection result with different zero-data ratios. There is no large difference in the results until the percentage of zero data reaches 88%.

4. Derivation of Working Parameters of Ac3 and Ms Temperatures of Steels Using Existing Datasets

4.1. Datasets Used to Derive Equations for Ac3 and Ms Temperatures
Two sets of data are used to find the working parameters for the Ac3 and Ms temperatures. The first dataset is from the various sources in Andrews' paper,1) which include high carbon contents and low contents of alloying elements. The second dataset is provided by the NIMS Materials Database (MatNavi),18) which covers lower carbon contents and higher contents of alloying elements than the sources used by Andrews. An overview of the data and the distribution of the important elements in both databases are listed in Table 4 and plotted in Fig. 6, respectively.

Fig. 6. Chemical composition ranges of major alloying elements in the Andrews and NIMS data sets.
Table 4. The chemical composition ranges of the data used to derive the predictive equations for the Ac3 and Ms temperatures (wt%, given as minimum/maximum; a dash means the element is not reported in that source).

Ac3, Andrews, 155 data: C 0.11/0.95, Si 0.06/1.78, Mn 0.04/1.98, Ni 0/5.00, Cr 0/4.48, Cu 0/0.91, Mo 0/1.02, V 0/0.7, Ti 0/0.05, Nb –, B –, P 0/0.06, S 0/0.06, Al 0/0.10, N –, W 0/4.1, As 0/0.07
Ac3, NIMS, 198 data: C 0.03/0.4, Si 0.01/1.76, Mn 0/1.98, Ni 0/9.33, Cr 0/9.04, Cu 0/0.92, Mo 0/1.32, V 0/0.56, Ti 0/0.03, Nb 0/0.18, B 0/0.02, P 0/0.19, S 0/0.14, Al 0/0.1, N 0/0.01, W –, As –
Ms, Andrews, 243 data: C 0.11/0.58, Si 0.11/1.89, Mn 0.04/4.87, Ni 0/5.04, Cr 0/4.61, Cu 0/0.91, Mo 0/5.4, V 0/0.7, Ti –, Nb –, B –, P 0/0.05, S 0/0.04, Al –, N –, W 0/8.88, As 0/0.07
Ms, NIMS, 258 data: C 0.02/0.94, Si 0/1.76, Mn 0/2.05, Ni 0/9.11, Cr 0/9.04, Cu 0/1.1, Mo 0/1.66, V 0/0.56, Ti 0/0.1, Nb 0/0.18, B 0/0.02, P 0/0.19, S 0/0.04, Al 0/0.1, N 0/0.01, W –, As –


In this study, the data used by Andrews are analyzed first and the results are compared with the equation obtained by Andrews. Then, both sets of data are employed to derive new equations.

4.2. Model Selection Using the Andrews Dataset
First, the four criteria are applied to the dataset used by Andrews. In this way, the determination of the key parameters by the researcher and by the data-driven approach can be compared. The following linear combination models, including lower- and higher-order terms of carbon, are considered:

Ac3(°C) = 910 + a_1 C^{1/3} + a_2 C^{1/2} + a_3 C + a_4 C^2 + a_5 Si + a_6 Mn + a_7 Ni + a_8 Cr + a_9 Cu + a_{10} Mo + a_{11} V + a_{12} Ti + a_{13} P + a_{14} S + a_{15} Al + a_{16} W + a_{17} As   .... (11)

Ms(°C) = 539 + a_1 C^{1/3} + a_2 C^{1/2} + a_3 C + a_4 C^2 + a_5 Si + a_6 Mn + a_7 Ni + a_8 Cr + a_9 Cu + a_{10} Mo + a_{11} V + a_{12} Ti + a_{13} P + a_{14} S + a_{15} Al + a_{16} W + a_{17} As   ... (12)

The symbol of each element in Eqs. (11) and (12) represents its content in the steel in wt%. In many equations for Ac3, the carbon term is expressed as either a linear term or a square-root term. For example, Hougardy2) derived his equation using a linear form, whereas Andrews1) assumed that the Ac3 temperature is proportional to the square root of the carbon content. For this reason, the basic models for the Ac3 and Ms temperatures in the present paper include all the power terms of carbon found in the literature.

4.3. Model Selection Using Both the Andrews and NIMS Datasets
Next, we applied each criterion to the combined dataset of Andrews and NIMS. In this analysis, the linear combination models also include cross terms between carbon and strong carbide-forming elements such as chromium, vanadium, titanium, manganese, and molybdenum:

Ac3(°C) = 910 + a_1 C^{1/3} + a_2 C^{1/2} + a_3 C + a_4 C^2 + a_5 Si + a_6 Mn + a_7 Ni + a_8 Cr + a_9 Cu + a_{10} Mo + a_{11} V + a_{12} Ti + a_{13} P + a_{14} S + a_{15} Al + a_{16} W + a_{17} As + a_{18} CCr + a_{19} CMo + a_{20} CTi + a_{21} CMn + a_{22} CV   ... (13)

Ms(°C) = 539 + a_1 C^{1/3} + a_2 C^{1/2} + a_3 C + a_4 C^2 + a_5 Si + a_6 Mn + a_7 Ni + a_8 Cr + a_9 Cu + a_{10} Mo + a_{11} V + a_{12} Ti + a_{13} P + a_{14} S + a_{15} Al + a_{16} W + a_{17} As + a_{18} CCr + a_{19} CMo + a_{20} CTi + a_{21} CMn + a_{22} CV   ... (14)
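As an illustration of how the candidate models of Eqs. (11)–(14) can be assembled and screened, the sketch below builds the design matrix (carbon power terms, element terms, and carbon cross terms) from a composition table and selects terms greedily by BIC. The column names, the use of pandas, and the greedy search are our assumptions; the paper does not state how candidate subsets were enumerated.

```python
import numpy as np
import pandas as pd

CARBON_POWERS = {"C^1/3": 1 / 3, "C^1/2": 1 / 2, "C": 1.0, "C^2": 2.0}
ELEMENTS = ["Si", "Mn", "Ni", "Cr", "Cu", "Mo", "V", "Ti", "P", "S", "Al", "W", "As"]
CROSS_WITH_C = ["Cr", "Mo", "Ti", "Mn", "V"]      # carbide formers crossed with carbon, Eqs. (13)-(14)

def design_matrix(comp):
    """Candidate terms of Eqs. (13)/(14) from a composition table in wt%."""
    cols = {name: comp["C"] ** p for name, p in CARBON_POWERS.items()}
    for el in ELEMENTS:
        cols[el] = comp.get(el, 0.0)              # element absent from the table -> treated as 0
    for el in CROSS_WITH_C:
        cols["C*" + el] = comp["C"] * comp.get(el, 0.0)
    return pd.DataFrame(cols)

def bic_of(X, y):
    n, k = X.shape
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    s2 = np.mean((y - X @ theta) ** 2)
    return n * (np.log(2.0 * np.pi * s2) + 1.0) + k * np.log(n)

def forward_select(X, y):
    """Greedy forward selection of terms by BIC; only one plausible search strategy."""
    chosen, best = [], np.inf
    while True:
        trials = [(bic_of(X[chosen + [c]].to_numpy(), y), c)
                  for c in X.columns if c not in chosen]
        if not trials:
            return chosen, best
        score, col = min(trials)
        if score >= best:
            return chosen, best
        chosen.append(col)
        best = score

# Hypothetical usage, with `data` a DataFrame of compositions and measured Ac3 (deg C):
# X = design_matrix(data)
# terms, score = forward_select(X, data["Ac3"].to_numpy() - 910.0)   # 910 is the fixed constant of Eq. (13)
```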


4.4. Model Selection Results with the Andrews Dataset
4.4.1. Ac3 Temperature
The same procedure as that used for the toy model was conducted. The model selection results are compared with the equations derived by Andrews and are listed in Table 5. The results for each criterion are similar to the equation defined by Andrews. However, Andrews failed to include terms for titanium, sulfur, aluminum, and arsenic. This is because the amounts of these elements were less than 0.1 wt% in the steels used in his study, which made it difficult to discover their effects on the Ac3 and Ms temperatures. The result for the ABIC appears to be different from the others in that it includes many parameters that are determined to be unnecessary by the other methods. Considering that the number of samples is only 155, the method using the ABIC is likely to overfit the data, as indicated above using the toy model.

Table 5. Comparison of the coefficients of the Andrews equations for Ac3 and Ms with the results derived by each criterion. The same data collected by Andrews were used for the model selection calculation. Coefficients are listed in the order: Constant, C^{1/3}, C^{1/2}, C, C^2, Si, Mn, Ni, Cr, Cu, Mo, V, Ti, P, S, Al, W, As.

Ac3
  Andrews (1965): 910, 0, −203, 0, 0, 44.7, 0, −15.2, 0, 0, 31.5, 104.4, 0, 0, 0, 0, 13.1, 0
  AIC:  910, 0, −191, 0, 0, 35, 0, −15, 0, 0, 29, 98, 673, 0, −281, 231, 14, 0
  BIC:  910, 0, −198, 0, 0, 33, 0, −15, 0, 0, 29, 102, 772, 0, 0, 0, 0, 14
  ABIC: 910, 123, −434, 203, −112, 35, 0, −15, 0, −12, 29, 97, 700, 64, −297, 234, 14, 28
  CV:   910, 0, −191, 0, 0, 35, 0, −15, 0, 0, 29, 98, 673, 0, −281, 231, 14, 0
Ms
  Andrews (1965): 539, 0, 0, −423, 0, 0, −30.4, −17.7, −12.1, 0, 38.9, 0, 0, 0, 0, 0, 0, 0
  AIC:  539, −4758, 8491, −6014, 2252, 0, −25, −14, −11, 34, 0, 0, 0, 336, −438, 0, 5, −343
  BIC:  539, −4925, 8832, −6301, 2401, 0, −26, −15, −10, 0, 0, 0, 0, 0, 0, 0, 0, −248
  ABIC: 539, −4914, 8771, −6195, 2316, 0, −25, −14, −11, 33, 0, 30, 0, 319, −438, 0, 0, −342
  CV:   539, 0, 0, −394, 0, −9, −34, −18, −16, 0, −9, 19, 0, 0, 0, 0, 4, −262

4.4.2. Ms Temperature
It is clear from Table 5 that the terms for manganese, nickel, and chromium, which are considered important in the Andrews equation, also appear in every equation obtained from the criteria. The results for the Ms temperature, however, are considerably different from those of Andrews in the selection of the terms concerning the carbon content. The Andrews equation and the equation obtained by CV include only the linear carbon term listed in Table 5. However, the other criteria include all the carbon terms, indicating that the carbon content does not affect the Ms temperature monotonically. In addition, molybdenum, which is included in the Andrews equation, does not appear in the equations obtained by the AIC, BIC, and ABIC. This is because molybdenum's actual effect on the Ms temperature is not greater than the noise level; therefore, its effect is buried by the noise in the data.

These results clearly indicate that the data-driven approach can efficiently find the working parameters without the experience and knowledge of experts such as Andrews, simply by collecting a large number of existing datasets. Note that Andrews selected the working parameters in accordance with the knowledge obtained from his experiments, in which the chemical composition of each alloying element was systematically controlled.1)

4.5. Model Selection Results Using Both the Andrews and NIMS Datasets
4.5.1. Ac3 Temperature
The equations for the Ac3 temperature based on the chemical composition are estimated using the same procedure as that for the toy model. Each criterion selects different features, as shown in Table 6. Carbon, silicon, and nickel are included in all the Ac3 equations.

Table 6. The equations for the Ac3 and Ms temperatures derived from the combined data. In total, 353 Ac3 temperature data (155 from Andrews and 198 from NIMS) and 501 Ms temperature data (243 from Andrews and 258 from NIMS) were used for the model selection calculation. Coefficients are listed in the order: Constant, C^{1/3}, C^{1/2}, C, C^2, Si, Mn, Ni, Cr, Cu, Mo, V, Ti, P, S, Al, W, As, CCr, CMo, CTi, CMn, CV.

Ac3
  AIC:  910, 1282, −2664, 1985, −1003, 29, −22, −16, 0, 0, 20, 0, 732, 182, 0, −149, 19, 139, 0, 0, 0, 0, 165
  BIC:  910, 0, −161, 0, 0, 27, −27, −16, 0, 0, 0, 0, 1078, 0, 0, 0, 25, 0, 0, 108, 0, 0, 0
  ABIC: 910, 1301, −2629, 1850, −912, 30, −28, −17, 1, 0, 0, 0, 1069, 208, −127, −158, 21, 131, 0, 85, −982, 27, 110
  CV:   910, 1393, −2826, 2020, −964, 29, −24, −17, 0, 0, 0, 0, 724, 258, 0, 0, 26, 129, 0, 111, 0, 0, 0
Ms
  AIC:  539, −985, 1722, −1525, 512, 0, −32, −14, −5, 17, 0, −123, 0, 202, −446, −206, 6, −363, −30, −18, 0, 0, 352
  BIC:  539, −954, 1663, −1513, 535, 0, −32, −14, −5, 0, 0, 0, 0, 0, 0, −211, 6, −304, −26, 0, 0, 0, 0
  ABIC: 539, −993, 1738, −1537, 517, 0, −33, −14, −5, 17, 0, −121, 271, 203, −445, −205, 6, −363, −30, −18, −3865, 0, 348
  CV:   539, −985, 1722, −1525, 512, 0, −32, −14, −5, 17, 0, −123, 0, 202, −446, −206, 6, −363, −30, −18, 0, 0, 352
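For reference, the BIC row of Table 6 for Ac3 can be written out as a small predictor. The coefficients below are read directly from that row (compositions in wt%); the function itself is an illustrative addition, not part of the paper.

```python
def ac3_bic(C, Si=0.0, Mn=0.0, Ni=0.0, Ti=0.0, W=0.0, Mo=0.0):
    """Ac3 (deg C) from the BIC-selected equation of Table 6 (combined Andrews + NIMS
    data), compositions in wt%. Note that Mo enters only through the C*Mo cross term,
    as discussed in the text."""
    return (910.0 - 161.0 * C**0.5 + 27.0 * Si - 27.0 * Mn - 16.0 * Ni
            + 1078.0 * Ti + 25.0 * W + 108.0 * C * Mo)

# e.g. a plain 0.2C-1.0Mn steel: ac3_bic(0.2, Mn=1.0) gives about 811 deg C
```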

From the data-driven methods, molybdenum, which is a well-known carbide-forming element, turned out to be a working parameter appearing in a cross term rather than in a linear term as in the Andrews equation. This can be explained from the austenite region of the steel. Figure 7(a) illustrates the region where only the austenite phase exists. Its lower-left and lower-right boundaries indicate the Ac3 line and the Acm line, respectively. The Ac3 line of the Fe–C–Mo system based on the equation derived by the BIC, and the Ac3 and Acm lines of the same system derived using the thermodynamic calculation software Thermo-Calc, are plotted in Fig. 7(b). The Ac3 line for 0 wt% molybdenum estimated using the equation derived by the BIC corresponds well with that derived from the thermodynamic calculation. It is clear from the thermodynamic calculation that, as the molybdenum content increases, cementite becomes more stable and the Acm line rises. The regression equation is therefore actually tracing the behavior of the Acm line instead of the Ac3 line as the molybdenum content increases. In order to verify this, the Ac3 and the Acm temperatures were calculated from the chemical compositions of the 353 experimental Ac3 data with the Thermo-Calc software. The estimated Acm of 14 out of the 353 data points was found to be higher than the estimated Ac3. In addition, the differences between the experimental Ac3 and the estimated Acm of these 14 data were smaller than those between the experimental and the estimated Ac3. This result suggests that some of the Ac3 temperatures in the data might be confused with Acm temperatures, which are difficult to distinguish from Ac3 simply by the dilatometric analysis commonly used to determine Ac3 temperatures.

Fig. 7. The austenite region of the Fe–C system calculated by Thermo-Calc (a), and the Ac3 and Acm lines of the Fe–C–Mo system drawn with Thermo-Calc and based on the BIC result for the Andrews and NIMS data (b).

The derived equations are found to be in good agreement with the experimental data, as shown in Fig. 8. The root mean square errors (RMSE) of the previously reported equations (Figs. 8(a), 8(b)) are as high as 39, while those of the newly derived equations (Figs. 8(c)–8(f)) are about 30.

Fig. 8. Comparison of the experimental and estimated Ac3 temperatures by past researchers (a and b) and by the model selection criteria (c, d, e and f).
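The RMSE values quoted here follow the usual definition; for completeness, a one-line helper (an illustrative addition, not the authors' script):

```python
import numpy as np

def rmse(observed, predicted):
    """Root mean square error between measured and estimated temperatures (deg C)."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((observed - predicted) ** 2)))
```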


4.5.2. Ms Temperature
The equations for the Ms temperature were derived in the same way as for the Ac3 temperature. The newly derived equations and the past results by Andrews1) and Payson and Savage5) both indicate that carbon, manganese, nickel, and chromium are key elements for the Ms temperature. In addition, the cross term between carbon and chromium, which is included in one of the Andrews equations, is also found in all four data-driven equations. To clarify the reason why these cross terms are chosen, further research is required. It is shown in Fig. 9 that the Ms temperatures estimated by these equations show better agreement with the observation data than the previously reported equations.

Fig. 9. Comparison of the experimental and estimated Ms temperatures by past researchers (a and b) and by the model selection criteria (c, d, e and f).

5. Conclusion

To demonstrate the effectiveness of the AIC, BIC, ABIC, and CV, these criteria were evaluated using a toy model. The results suggested that they can find meaningful parameters, i.e., those not buried in the noise of the given data. Among the four criteria, the BIC method showed the best performance for a linear combination model.

We applied the technique to the actual experimental dataset used by Andrews1) and to that provided by NIMS to derive predictive equations for the Ac3 and Ms temperatures. By applying the model selection criteria to the datasets collected from these sources, it was demonstrated that the key parameters, traditionally chosen by skilled researchers on the basis of systematically controlled experiments, could be found efficiently. In addition, simply by integrating data obtained under various, randomly selected conditions, the derived equations not only show improved agreement with the experimental data but also clarify the hidden mechanisms that are usually difficult to uncover because of the high dimensionality of materials datasets. These results suggest that the model selection method can successfully be applied to other problems involving feature and working-parameter selection, such as the estimation of fatigue and creep lifetimes.
REFERENCES
1) K. W. Andrews: J. Iron Steel Inst., 203 (1965), 721.
2) H. P. Hougardy: Werkstoffkunde Stahl, Band 1: Grundlagen, Springer/Verlag Stahleisen, Düsseldorf, (1984), 229.
3) O. G. Kasatkin, B. B. Vinokur and V. L. Pilyushenko: Met. Sci. Heat Treat., 26 (1984), 27.
4) J. Trzaska and L. A. Dobrzański: J. Mater. Process. Technol., 192 (2007), 504.
5) P. Payson and C. H. Savage: Trans. ASM, 33 (1944), 261.
6) R. A. Grange and H. M. Stewart: Trans. AIME, 167 (1946), 467.
7) J. Wang, P. J. van der Wolk and S. van der Zwaag: Mater. Trans. JIM, 41 (2000), 761.
8) W. G. Vermeulen, P. F. Morris, A. P. De Weijer and S. Van der Zwaag: Ironmaking Steelmaking, 23 (1996), 433.
9) T. Sourmail and C. Garcia-Mateo: Comput. Mater. Sci., 34 (2005), 323.
10) E. J. Seo, L. Cho and B. C. De Cooman: Metall. Mater. Trans. A, 46 (2015), 27.
11) C.-N. Li, F.-Q. Ji, G. Yuan, J. Kang, R. D. K. Misra and G.-D. Wang: Mater. Sci. Eng. A, 662 (2016), 100.
12) H. Akaike: IEEE Trans. Autom. Control, 19 (1974), 716.
13) G. Schwarz: Ann. Stat., 6 (1978), 461.
14) H. Akaike: Trab. Estad. Investig. Oper., 31 (1980), 143.
15) C. E. Rasmussen and C. Williams: Gaussian Processes for Machine Learning, The MIT Press, Cambridge, (2006), 105.
16) K. S. Al-Rubaie, E. K. L. Barroso and L. B. Godefroid: Int. J. Fatigue, 28 (2006), 934.
17) E. Cockayne and A. van de Walle: Phys. Rev. B, 81 (2010), 012104.
18) NIMS Materials Database (MatNavi), NIMS, http://mits.nims.go.jp/index_en.html, (accessed 2015-04-10).
19) M. Stone: J. R. Stat. Soc. Ser. B, 39 (1974), 44.