
multivariate calibration problems

Daniel Vitor de Lucena (dvlucena@gmail.com), Telma Woerle de Lima Soares (telma@inf.ufg.br), Anderson da Silva Soares (anderson@inf.ufg.br), Clarimar José Coelho (clarimarc@gmail.com)

Instituto de Informática, UFG; Departamento de Computação, PUC-GO

Abstract

This paper proposes the use of the multi-objective genetic algorithm NSGA-II for variable selection in multivariate calibration problems. NSGA-II selects variables for Multiple Linear Regression (MLR) according to two conflicting objectives: the prediction error and the number of variables used in the MLR model. As a case study, wheat data obtained by NIR spectrometry are used with the goal of determining a subset of variables carrying information about protein concentration. Results of traditional multivariate calibration techniques, Partial Least Squares (PLS) and the Successive Projections Algorithm (SPA) for MLR, are presented for comparison. The obtained results show that the proposed approach outperforms both a mono-objective evolutionary algorithm and the traditional multivariate calibration techniques.

Keywords: Multivariate Calibration, Genetic Algorithm, Multiobjective Optimization

Introduction

The need to obtain relevant information about the concentration of certain chemical species, called analytes, from the output of analytical instruments stimulated the field of chemometrics, defined as the application of mathematical and statistical techniques to the analysis of chemical data (Beebe, et al., 1998). Since the analyte concentration is rarely provided directly by the analysis tool (Nunes, 2008), chemometrics aims, through calibration, to extract this information using regression models.

According to Martens (1989), calibration is the process of constructing a mathematical model that relates the output of an instrument to a certain property of the sample, and prediction is the process of using that model to forecast the property of a given sample from the instrument output; for example, the absorbance at a wavelength can be related to the concentration of an analyte.

The term multivariate calibration refers to the construction of a mathematical model that allows the value of a quantity of interest to be predicted from measured values of a set of explanatory variables. Well-known techniques for building such regression models include Multiple Linear Regression (MLR) (Martens, 1989), Principal Component Regression (PCR) (Jolliffe, 1982) and Partial Least Squares Regression (PLS) (Beebe, et al., 1998; Wold, et al., 2001; Martens & Naes, 1989).

It is not always necessary to use all the data collected from a sample during calibration, particularly when only some features of the sample are of interest. Selecting the variables that carry information related to these features of interest allows the construction of more robust, simpler and more easily interpreted models, while avoiding the processing of irrelevant information. Other problems also found in calibration are collinearity, where two or more variables carry correlated information, and sensitivity to noise; both degrade the calibration and the prediction of the sample compounds, in particular for MLR (Martens & Naes, 1989; Draper & Smith, 1998).

A solution to the collinear variables problem is to eliminate them through variable selection (Guyon, 2003). In this process, the use of evolutionary algorithms, in particular genetic algorithms (GAs), is an option: an optimization algorithm of this kind can be used to choose a subset of variables that is robust, has little redundancy and carries information related to the characteristics of interest.

This work studies the use of the multiobjective genetic algorithm NSGA-II in the variable selection process under conflicting objectives: minimizing the residual error between the protein concentration predicted by the regression model and the real protein concentration of the grain, while reducing the computational cost and simplifying the calibration model.

Multivariate Calibration

Multiple Linear Regression

Regression analysis is a statistical methodology for predicting the values of one or more response (dependent) variables from a set of predictor (independent) variables (Johnson & Wichern, 2002). The classical multiple linear regression model is given by:

$$Y = X\beta + \varepsilon \qquad (1)$$

where X is the (n × p) data matrix obtained from the instrumental responses, with n the number of samples and p the number of variables of each sample; β is the regression coefficients vector, calculated by least squares from the pseudo-inverse of X; ε is the (n × 1) residual error vector; and Y is the (n × 1) vector containing the values of the property of interest obtained by a standard method. Each dependent value in Y is a linear combination of the independent variables of the data matrix X (Johnson & Wichern, 2002; Nunes, 2008).

In matrix notation, model (1) reads:

$$\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix} = \begin{bmatrix} 1 & X_{11} & X_{12} & \cdots & X_{1p} \\ 1 & X_{21} & X_{22} & \cdots & X_{2p} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & X_{n1} & X_{n2} & \cdots & X_{np} \end{bmatrix} \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_p \end{bmatrix} + \begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{bmatrix} \qquad (2)$$

The regression coefficients are determined by the linear combination

$$\hat{\beta} = (X^{T}X)^{-1}X^{T}Y \qquad (3)$$

and the estimated regression coefficients are taken as equal to those calculated (Johnson & Wichern, 2002).

According to Rencher (2002), the estimated response variable is defined by the linear combination of the data matrix X and the estimated regression coefficients $\hat{\beta}$, so:

$$\hat{Y} = X\hat{\beta} \qquad (4)$$

The residual error is calculated as the difference between the reference response variable Y and the estimated response variable $\hat{Y}$ (Rencher, 2002), thus:

$$\varepsilon = Y - \hat{Y} \qquad (5)$$

The Root Mean Square Error of Prediction (RMSEP) evaluates how closely the concentration predicted by the model approximates the expected concentration. RMSEP is expressed by

$$\mathrm{RMSEP} = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n}} \qquad (6)$$

where $\hat{y}_i$ is the i-th concentration predicted by the regression model. This measurement helps to evaluate calibration model performance and allows the selection of models better suited to prediction. The regression parameters may be estimated with some noise, both because the data are measured with error and because the parameters are estimated from data samples (Varmuza & Filzmoser, 2000; Gemperline, 2006).
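The least-squares fit of Eq. (3) and the RMSEP of Eq. (6) can be sketched as follows (a minimal Python/NumPy illustration; the toy data and all variable names are ours, not from the paper):

```python
import numpy as np

def fit_mlr(X, Y):
    """Estimate regression coefficients by least squares, Eq. (3),
    computed via the pseudo-inverse of X for numerical stability."""
    return np.linalg.pinv(X) @ Y

def rmsep(Y_ref, Y_pred):
    """Root Mean Square Error of Prediction, Eq. (6)."""
    return np.sqrt(np.mean((Y_ref - Y_pred) ** 2))

# Toy example: 5 samples, an intercept column plus 2 variables.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(5), rng.normal(size=(5, 2))])
beta_true = np.array([1.0, 2.0, -0.5])
Y = X @ beta_true          # noise-free response

beta = fit_mlr(X, Y)       # recovers beta_true
print(rmsep(Y, X @ beta))  # essentially zero for noise-free data
```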

Multicollinearity Problem and Variable Selection

In statistics, the existence of linear correlation between two or more independent variables in a multiple regression model is called multicollinearity. This problem may seriously compromise the reliability of the estimated model coefficients and make it difficult to interpret the values obtained for the response variable (Alin, 2010; Chong & Jun, 2005).

In prediction problems where the regression model has many variables, most of them may contribute little or nothing to prediction accuracy, so selecting a reduced set of variables that positively influence the regression model is crucial; but how many, and which, variables should compose this subset? (Snedecor & Cochran, 1972; Hocking, 1976).

Defining a smaller set of independent explanatory variables to be included in the final regression model is a frequent problem in regression analysis (Hocking, 1976). The problem of determining an appropriate equation based on a subset of the original set of variables contains three basic ingredients, namely (a) the computational technique used to provide the information for the analysis, (b) the criterion used to analyze the variables and select a subset, if that is appropriate, and (c) the estimation of the coefficients in equation (3) (Hocking, 1976).

According to Miller (1984), the reasons for using only some of the available or possible predictor variables include: (a) to estimate or predict at lower cost by reducing the number of variables on which data are collected; (b) to predict accurately by eliminating uninformative variables; (c) to describe a multivariate data set parsimoniously; and (d) to estimate regression coefficients with small standard errors (particularly when some of the predictors are highly correlated).

The strategy proposed here for the variable selection problem in multiple linear regression is the use of a genetic algorithm to address the multicollinearity problem, reduce cost by reducing the number of variables, and minimize the residual errors.

Partial Least Squares Regression

Partial Least Squares (PLS) is a method for constructing predictive models when the factors are many and highly collinear. Note that the emphasis is on predicting the responses, not necessarily on understanding the underlying relationship between the variables (Tobias, 1997). PLS regression generalizes and combines features from principal component analysis and multiple linear regression; its goal is to predict Y from X and to describe their common structure (Abdi, 2003).

Successive Projections Algorithm

The Successive Projections Algorithm (SPA) is a variable selection technique designed to minimize collinearity problems in multiple linear regression (MLR) (Galvão, 2008).

SPA comprises three main phases. The first consists of projection operations carried out on the matrix X of instrumental responses; these projections are used to generate chains of variables with successively more elements, each element in a chain being selected so as to show the least collinearity with the previous one. In the next phase, the candidate subsets of variables are evaluated according to the RMSEP predictive performance of the resulting MLR model. The last phase is a variable elimination procedure aimed at improving the parsimony of the model.

Recent results in the multivariate calibration literature have shown that SPA-MLR achieves better results in terms of RMSEP and parsimony than the classical genetic algorithm and PLS (Soares(a), Galvao Filho, Galvao, & Araujo, 2010; Soares(b), Galvao Filho, Galvao, & Araujo, 2010).

Genetic Algorithm

Genetic Algorithms (GAs) were proposed by Holland in the 1970s. He studied natural evolution, regarding it as a simple and powerful process that could be adapted to obtain efficient computational solutions to optimization problems. In this context, robustness refers to the fact that GAs generally produce appropriate solutions regardless of the initial parameters (Goldberg, 1989). The main distinguishing feature of GAs is the creation of descendants by the recombination operator named crossover (De Jong, 2006).

Another important feature of GAs is the use of mutation and recombination operators to balance two possibly conflicting objectives: preserving the best solutions and exploring the search space. The search process is therefore multidimensional, preserving candidate solutions and inducing information exchange between the explored solutions (Michalewicz, 1996; Von Zuben, 2000).

According to Michalewicz (1996), the main steps of a GA are:

1. During iteration gen, the GA keeps a population of potential solutions P(gen) = {x_1^gen, ..., x_n^gen};
2. Each individual x_i^gen is evaluated, producing a fitness measurement;
3. New individuals are generated from individuals of the current population, which are selected for reproduction by a process that tends to choose individuals with larger fitness;
4. Some individuals undergo changes by recombination and mutation, forming new potential solutions;
5. Among old and new solutions, individuals are selected to survive into the next generation (gen + 1).

This process is repeated until a stop condition is satisfied. This condition can be an expected level of solution quality or a maximum number of iterations.

In the context of multivariate calibration, the population is the set of possible solutions and each individual is a candidate solution. Using a binary representation for the chromosome, the number of genes is the total number of variables available for selection; each gene determines, by its value 1 or 0, whether the corresponding variable is selected or not, respectively.
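The loop above can be sketched for this variable selection setting (a minimal Python illustration on synthetic data; for simplicity it uses one-point crossover and bit-flip mutation rather than the SBX and polynomial-mutation operators of the paper's experiments, and all names and data are ours):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data: 30 samples, 8 candidate variables, 3 of them informative.
X = rng.normal(size=(30, 8))
Y = X[:, [0, 3, 5]] @ np.array([2.0, -1.0, 0.5])
N_GENES = 8  # one binary gene per candidate variable

def fitness(mask):
    """Negative RMSEP of the MLR model on the selected variables
    (higher is better); an empty selection gets a large penalty."""
    if not mask.any():
        return -1e9
    Xs = X[:, mask]
    beta = np.linalg.pinv(Xs) @ Y
    return -np.sqrt(np.mean((Y - Xs @ beta) ** 2))

def evolve(pop_size=20, generations=40, p_mut=1.0 / N_GENES):
    pop = rng.integers(0, 2, size=(pop_size, N_GENES)).astype(bool)
    for _ in range(generations):
        fit = np.array([fitness(ind) for ind in pop])
        children = []
        for _ in range(pop_size):
            # Binary tournament selection of two parents.
            a, b = rng.integers(0, pop_size, size=2)
            p1 = pop[a] if fit[a] >= fit[b] else pop[b]
            a, b = rng.integers(0, pop_size, size=2)
            p2 = pop[a] if fit[a] >= fit[b] else pop[b]
            # One-point crossover followed by bit-flip mutation.
            cut = int(rng.integers(1, N_GENES))
            child = np.concatenate([p1[:cut], p2[cut:]])
            child ^= rng.random(N_GENES) < p_mut
            children.append(child)
        pop = np.array(children)
    fit = np.array([fitness(ind) for ind in pop])
    best = int(fit.argmax())
    return pop[best], float(fit[best])

best_mask, best_fit = evolve()
```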

Multiobjective Optimization

Many real-world decision-making problems require the simultaneous optimization of multiple objectives (Michalewicz, 1996), which makes analyzing an optimization problem with respect to a single objective insufficient to find satisfactory solutions. A large part of these problems presents a collection of objectives to be optimized that are not always harmonious: there may be conflicts between the objectives, so that improving one causes the deterioration of another.

Multiobjective optimization (MOO) problems require distinct techniques, far removed from the standard optimization techniques for mono-objective problems. Clearly, if there are two objectives to be optimized, it may be possible to find one solution that is best for the first objective and a different solution that is best for the second (Michalewicz, 1996).

Takahashi (2004) describes a MOO problem, in general terms, as:

$$X^{*} = \{\, x^{*} \in \mathbb{R}^{n} \mid x^{*} = \arg\min_{x} f(x), \ \text{subject to } x \in F_{x} \,\} \qquad (7)$$

where $\mathbb{R}^{n}$ is the optimization parameter space, $x^{*}$ is a solution point of the problem, and $F_{x}$ is the set of feasible points belonging to the optimization parameter space.

It is convenient to classify all possible solutions of a MOO problem into dominated and non-dominated (Pareto-optimal) solutions. A solution x is dominated if there exists a feasible solution y that is not worse than x in all coordinates, in other words, for all objectives $f_i$ (i = 1, ..., k) (Michalewicz, 1996):

$$f_i(x) \ge f_i(y), \quad \text{for } 1 \le i \le k \qquad (8)$$

If no such relation exists, in other words, if a solution is not dominated by any other feasible solution, it is called a non-dominated (Pareto-optimal) solution. All Pareto-optimal solutions may be of some interest and, ideally, the system should report the set of all Pareto-optimal points (Michalewicz, 1996).
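The dominance test and the extraction of the non-dominated set can be sketched as follows (minimization assumed, with the usual strict form of dominance; the example objective pairs, read as (RMSEP, number of variables), are invented):

```python
def dominates(fa, fb):
    """True if objective vector fa dominates fb under minimization:
    fa is no worse in every objective and strictly better in at least one."""
    return (all(a <= b for a, b in zip(fa, fb))
            and any(a < b for a, b in zip(fa, fb)))

def pareto_front(points):
    """Return the non-dominated (Pareto-optimal) subset of objective vectors."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q != p)]

sols = [(0.06, 17), (0.08, 12), (0.13, 25), (0.06, 20)]
print(pareto_front(sols))  # [(0.06, 17), (0.08, 12)]
```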

Schaffer, in 1984, implemented the first GA for MOO, called VEGA (Vector Evaluated Genetic Algorithm), as an extension of the GENESIS program to include multi-criteria functions. In 1994, Srinivas & Deb (1994) proposed a new technique called NSGA (Non-dominated Sorting Genetic Algorithm), based on classifying the individuals into several groups called fronts; this grouping is performed according to non-domination, before selection (Michalewicz, 1996).

In multivariate calibration, the two conflicting objectives are to minimize the error in predicting the analyte property and to minimize the number of selected variables. These objectives conflict because the lower the number of selected variables, the higher the prediction error can be.

Non-Dominated Sorting Genetic Algorithm II (NSGA-II)

Developed by Deb, Pratap, Agarwal, & Meyarivan (2002), NSGA-II, like the first NSGA version, implements the dominance concept, classifying the population into fronts according to dominance level. The best solutions of each generation are located in the first front, while the worst are located in the last front. The classification process continues until every individual of the population is assigned to a front. Once this classification is finished, the individuals belonging to the first front are non-dominated but dominate the individuals of the second front, the individuals of the second front dominate those of the third front, and so on (Deb, Pratap, Agarwal, & Meyarivan, 2002). Fig. 1 illustrates the steps of NSGA-II:

Fig. 1: NSGA-II working

The main difference between NSGA-II and a simple GA is the way the selection operator is applied; this operator is subdivided into two processes, Fast Non-Dominated Sorting and Crowding Distance, while the other operators are applied in the traditional way (Deb, Pratap, Agarwal, & Meyarivan, 2002).

The Fast Non-Dominated Sorting process is executed in two parts. First, all population individuals are compared with each other in order to calculate their dominance levels; when this first part is finished, the individuals whose dominance level equals zero are classified as non-dominated and are inserted into the first front (Pareto-optimal) (Deb, Pratap, Agarwal, & Meyarivan, 2002).

The second part of the Fast Non-Dominated Sorting process treats the individuals whose dominance level is different from zero. In this step, each individual is removed from the population, classified, and inserted into the front corresponding to its dominance level. Fast Non-Dominated Sorting ends when the population is empty (Deb, Pratap, Agarwal, & Meyarivan, 2002).
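The classification into fronts can be sketched by repeatedly extracting the non-dominated subset (a naive Python illustration of the result only; Deb et al.'s fast version computes the same fronts more efficiently with dominance counters; the example values are invented):

```python
def sort_into_fronts(points):
    """Classify objective vectors (minimization) into fronts by dominance
    level: front 0 holds the non-dominated solutions, front 1 those
    dominated only by front 0 members, and so on."""
    def dominates(a, b):
        return (all(x <= y for x, y in zip(a, b))
                and any(x < y for x, y in zip(a, b)))
    remaining = list(points)
    fronts = []
    while remaining:
        front = [p for p in remaining
                 if not any(dominates(q, p) for q in remaining if q is not p)]
        fronts.append(front)
        remaining = [p for p in remaining if p not in front]
    return fronts

sols = [(0.06, 17), (0.08, 12), (0.13, 25), (0.06, 20)]
fronts = sort_into_fronts(sols)
```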

The search for a group of Pareto-optimal solutions tends to make the solutions converge into the same region. However, a desired feature in a GA is a good spread of the found solutions, and this is where the second process of the NSGA-II selection operator, Crowding Distance, comes in. The crowding distance is a comparison-based density measure proposed to replace the sharing-function approach of NSGA, eliminating two known problems: the performance of the sharing method depends strongly on the value of the parameter σ_share chosen by the user, and the global complexity of the approach is O(N²), since each solution is compared with all other solutions. Another function of the crowding distance is to order all solutions within the same front (Deb, Pratap, Agarwal, & Meyarivan, 2002).

To better understand the crowding distance approach, it is necessary to define the density estimation metric and the comparison operator.

The density of solutions around a particular solution of the population is estimated by computing the average distance between the two points on either side of this point along each of the objectives. This distance value serves as an estimate of the perimeter of the cuboid formed by using the nearest neighbors as vertices, as shown in Fig. 2 (Deb, Pratap, Agarwal, & Meyarivan, 2002):

Fig. 2: Crowding-distance calculation. Points marked as filled circles are solutions of the non-dominated front (Deb, Pratap, Agarwal, & Meyarivan, 2002).

In the crowding distance calculation, I[i].m refers to the m-th objective value of the i-th individual in the set I, and the parameters $f_m^{max}$ and $f_m^{min}$ are the maximum and minimum of the m-th objective function. The complexity is O(M N log N), for M sortings of at most N solutions. The calculation requires sorting the population according to each objective function value in ascending order of magnitude, and each objective function is normalized before the calculation. After all population members in the set I have been assigned a distance metric, two solutions can be compared by their level of closeness to other solutions: the lower this distance value, the closer the solution is to its neighbors (Deb, Pratap, Agarwal, & Meyarivan, 2002).
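The density estimate can be sketched as follows (a minimal Python illustration; the sample front values are invented):

```python
def crowding_distance(front):
    """Crowding distance of each solution in one front: for every objective,
    sort the front, give the boundary solutions infinite distance, and add
    the normalized gap between each solution's two neighbours."""
    n = len(front)
    dist = [0.0] * n
    for m in range(len(front[0])):                     # each objective
        order = sorted(range(n), key=lambda i: front[i][m])
        f_min, f_max = front[order[0]][m], front[order[-1]][m]
        dist[order[0]] = dist[order[-1]] = float("inf")  # boundary solutions
        if f_max == f_min:
            continue
        for k in range(1, n - 1):
            i = order[k]
            dist[i] += ((front[order[k + 1]][m] - front[order[k - 1]][m])
                        / (f_max - f_min))
    return dist

front = [(0.06, 25), (0.08, 20), (0.10, 14), (0.13, 12)]
d = crowding_distance(front)
```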

The crowded-comparison operator ($\prec_n$) guides the selection process at the various stages of the algorithm toward a uniformly spread Pareto-optimal front. Suppose that every individual in the population has two attributes:

1. non-domination rank ($i_{rank}$);
2. crowding distance ($i_{distance}$).

A partial order $\prec_n$ is defined by:

$$i \prec_n j \iff (i_{rank} < j_{rank}) \ \text{or} \ \big((i_{rank} = j_{rank}) \ \text{and} \ (i_{distance} > j_{distance})\big) \qquad (9)$$

That is, between two solutions from different non-dominated fronts, the solution with the better (lower) rank is preferred; otherwise, the solution located in the less crowded region is chosen (Deb, Pratap, Agarwal, & Meyarivan, 2002).
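The partial order of Eq. (9) can be sketched as a comparator over (rank, distance) pairs (a minimal Python illustration; the example values are ours):

```python
from functools import cmp_to_key

def crowded_compare(a, b):
    """Eq. (9): prefer the lower non-domination rank; on rank ties, prefer
    the larger crowding distance. Solutions are (rank, distance) pairs.
    Returns -1 if a is preferred, 1 if b is, 0 on a full tie."""
    if a[0] != b[0]:
        return -1 if a[0] < b[0] else 1
    if a[1] != b[1]:
        return -1 if a[1] > b[1] else 1
    return 0

sols = [(1, 0.9), (0, 0.2), (0, 1.5), (2, float("inf"))]
ranked = sorted(sols, key=cmp_to_key(crowded_compare))  # best solution first
```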

Multiobjective Decision Maker

The NSGA-II algorithm presents a set of solutions to the multiobjective problem in its first front. To help choose a solution within this set, the t-test was applied as a multiobjective decision maker.

The t-test statistically evaluates the significance of the difference between the means of two independent samples and is appropriate when two group means must be compared (Trochim & Donnelly, 2007). In the context of this problem, the t-test is used to verify the difference between the RMSEP values of solutions at a 5% significance level, considering the increase in the number of variables of the model.

The t statistic is given by:

$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{\mathrm{var}_1}{n_1} + \dfrac{\mathrm{var}_2}{n_2}}} \qquad (10)$$

where $\bar{X}_1$ and $\bar{X}_2$ are the sample means. The numerator of formula (10) is the difference between the means of samples $\bar{X}_1$ and $\bar{X}_2$; the denominator is the standard error of the difference, obtained by dividing the variance of each group by the number of elements of that group (Trochim & Donnelly, 2007).

The null hypothesis states that the solutions are random samples from independent normal distributions with equal means and equal, but unknown, variances, against the alternative that the means are not equal. A t-test result of 1 indicates rejection of the null hypothesis at the chosen significance level, and 0 indicates failure to reject the null hypothesis at that level.
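The statistic of Eq. (10) can be sketched as follows (a minimal Python illustration; the sample RMSEP values are invented, and the rejection decision would additionally require comparing |t| against the critical value for the chosen significance level):

```python
import math

def t_statistic(sample1, sample2):
    """Two-sample t statistic of Eq. (10): difference of means over the
    standard error of the difference (each group's variance over its size)."""
    n1, n2 = len(sample1), len(sample2)
    m1 = sum(sample1) / n1
    m2 = sum(sample2) / n2
    var1 = sum((x - m1) ** 2 for x in sample1) / (n1 - 1)
    var2 = sum((x - m2) ** 2 for x in sample2) / (n2 - 1)
    return (m1 - m2) / math.sqrt(var1 / n1 + var2 / n2)

# e.g. RMSEP values of two candidate solutions over repeated runs:
a = [0.08, 0.07, 0.09, 0.08]
b = [0.13, 0.12, 0.14, 0.13]
t = t_statistic(a, b)  # strongly negative: a has the lower RMSEP
```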

Materials and Methods

Wheat Data

All samples are whole-grain wheat, obtained from plant material from western Canadian producers. The reference data were determined at the Grain Research Laboratory, in Winnipeg. The reference data are: protein content (%); test weight (kg/hl); PSI (wheat kernel texture) (%); farinograph water absorption (%); farinograph dough development time (minutes); and farinograph mixing tolerance index (Brabender units). The data set for the multivariate calibration study consists of 775 VIS-NIR spectra of whole-kernel wheat samples, which were used as shoot-out data at the 2008 International Diffuse Reflectance Conference (http://www.idrc-chambersburg.org/shootout.html). Protein content was chosen as the property of interest. The spectra were acquired in the range 400-2500 nm with a resolution of 2 nm; in the present work, only the NIR region in the range 1100-2500 nm was employed. In order to remove undesirable baseline features, first-derivative spectra were calculated using a Savitzky-Golay filter with a 2nd-order polynomial and an 11-point window. Only the data referring to protein concentration were used in these tests.

The Kennard-Stone (KS) algorithm (Kennard & Stone, 1969) was applied to the resulting spectra to divide the data into calibration, validation and prediction sets with 389, 193 and 193 samples, respectively. The validation set was employed to guide the selection of variables in SPA-MLR, MONO-GA-MLR and MULTI-GA-MLR; the prediction set was employed only in the final performance assessment of the resulting MLR models. In the PLS study, the calibration and validation sets were joined into a single modeling set, which was used in the leave-one-out cross-validation procedure. The number of latent variables was selected on the basis of the cross-validation error by using the F-test criterion of Haaland and Thomas with α = 0.25, as suggested elsewhere (Haaland & Thomas, 1988). The prediction set was employed only in the final evaluation of the PLS model.
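The Kennard-Stone selection rule can be sketched as follows (a simplified Python illustration of the classic "farthest from the already selected set" rule, on invented data; names are ours):

```python
import numpy as np

def kennard_stone(X, n_select):
    """Kennard-Stone sample selection sketch: start from the two most
    distant samples, then repeatedly add the sample whose minimal distance
    to the already-selected set is largest (Kennard & Stone, 1969)."""
    # Pairwise Euclidean distance matrix between all samples.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    i, j = np.unravel_index(d.argmax(), d.shape)
    selected = [int(i), int(j)]
    while len(selected) < n_select:
        remaining = [k for k in range(len(X)) if k not in selected]
        # Distance of each remaining sample to its nearest selected sample.
        nearest = d[np.ix_(remaining, selected)].min(axis=1)
        selected.append(remaining[int(nearest.argmax())])
    return selected

rng = np.random.default_rng(2)
X = rng.normal(size=(12, 3))
picked = kennard_stone(X, 5)  # indices of 5 representative samples
```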

Environment and Tools

The proposed NSGA-II algorithm, as well as the regression calculations applying MLR and the RMSEP used as fitness value for the objective of determining the wheat protein concentration, were executed in Matlab version 7.10 (R2010a). The algorithm was executed with the parameters listed in Table 1:

Table 1 - NSGA-II settings.

Population Size: 100
Generations Number: 30, 50 and 100
Selection Operator: Binary Tournament
Mutation Operator: Polynomial Mutation
Mutation Probability: 1/Population Size
Crossover Operator: Simulated Binary Crossover (SBX)
Crossover Probability: 0.9

Results and Discussion

Fig. 3 presents the derivative spectra of a wheat sample.

Fig. 3: Wheat sample spectrum.

Thirty executions of NSGA-II were performed using the parameters shown in Table 1, changing only the number of generations between execution sets of 30 runs each. The executions used the wheat data set presented in Fig. 3, with 389 samples and 690 spectral variables per sample; each spectral variable is a variable of the model. Each NSGA-II execution aims to produce a set of solutions on the Pareto-optimal front. Fig. 4 shows the legend for the Pareto front graphs, and Figs. 5, 6 and 7 show an arrangement of solutions after one NSGA-II execution, out of the 30 executed, for each generation-number setting:

Fig. 4 - Pareto fronts legend


Fig. 5 - Pareto fronts for 30 generations

Fig. 6 - Pareto fronts for 50 generations

Fig. 7 - Pareto fronts for 100 generations

Figs. 8, 9 and 10 show bar graphs of the number of times each spectral variable was selected in a Pareto-optimal front solution, for each NSGA-II execution set. From these graphs it can be observed that some regions, such as spectral variables 0-50 and 110-150, are important for building the MLR model across distinct GA runs. On the other hand, other regions, such as spectral variables 190-220, are not frequently used. These observations are important for reducing the hardware cost of acquiring the spectral variables.

Fig. 8 - Selected spectra at Pareto-optimal front

solutions for 30 generations

Fig. 9 - Selected spectra at Pareto-optimal front

solutions for 50 generations

Fig. 10 - Selected spectra at Pareto-optimal front

solutions for 100 generations

To choose one of the possible solutions from the Pareto-optimal front, the t-test was used as a multiobjective decision maker with a 5% significance level. Table 2 presents the results of the choices made by the multiobjective decision maker.

Table 2 - Choices of the Pareto-optimal front solutions by the t-test with 5% significance.

                           30 generations   50 generations   100 generations
RMSEP Average                   0.08             0.08             0.08
Largest RMSEP                   0.13             0.16             0.13
Smallest RMSEP                  0.06             0.06             0.06
Variables Number Average        21               20               19
Largest Variables Number        25               26               24
Smallest Variables Number       17               14               12

Table 3 - Results of the traditional techniques PLS, SPA-MLR and MONO-GA-MLR.

                RMSEP
PLS             0.21 (15*)
SPA-MLR         0.20 (13#)
MONO-GA-MLR     0.21 (146#)

Range of protein content in the prediction set: 10.2-16.2 % m/m. *Number of latent variables. #Number of spectral variables selected.

Table 3 shows the prediction results for PLS, SPA-MLR and MONO-GA-MLR, which are considered the main techniques in multivariate calibration. As can be seen, the proposed algorithm obtained better RMSEP results while using a slightly larger number of variables.

Conclusion

This paper presented a study proposing the use of a genetic algorithm with multiobjective optimization to select a set of variables for multivariate calibration using MLR for wheat protein concentration prediction.

The obtained results show the efficiency of the technique in reducing RMSEP with a small number of spectral variables relative to the number of spectral variables present in the sample.

From the results, it was possible to observe that the spectral variables selected in the solutions found by the GA are similar. It was also observed that the GA converges within few generations, which could indicate possible local rather than global optima.

When compared with traditional algorithms, such as PLS, SPA-MLR and MONO-GA-MLR, the results obtained by the proposed algorithm were better in all observed cases.

As future work, the implementation of other genetic algorithms for multiobjective optimization, such as SPEA2 (Strength Pareto Evolutionary Algorithm 2), is suggested.

References

Abdi, H. (2003). Partial least squares (PLS) regression. In M. Lewis-Beck, A. Bryman, & T. Futing (Eds.), Encyclopedia for research methods for the social sciences (pp. 792-795). Thousand Oaks, CA: Sage.

Alin, A. (2010). Multicollinearity. WIREs Computational Statistics, 2, 370-374. doi: 10.1002/wics.84.

Beebe, K. R., Pell, R. J., & Seasholtz, M. B. (1998). Chemometrics: A Practical Guide. New York: John Wiley & Sons.

Chong, I.-G., & Jun, C.-H. (2005). Performance of some variable selection methods when multicollinearity is present. Chemometrics and Intelligent Laboratory Systems, 78(1-2), 103-112.

De Jong, K. A. (2006). Evolutionary Computation: A Unified Approach. Massachusetts Institute of Technology.

Deb, K., Pratap, A., Agarwal, S., & Meyarivan, T. (2002). A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation, 6(2).

Draper, N. R., & Smith, H. (1998). Applied Regression Analysis. Wiley Series in Probability and Statistics.

Galvão, R. K. H. (2008). A variable elimination method to improve the parsimony of MLR models using the successive projections algorithm. Chemometrics and Intelligent Laboratory Systems, 92(1), 83-91.

Gemperline, P. (2006). Practical Guide to Chemometrics. Boca Raton: CRC Taylor & Francis.

Goldberg, D. E. (1989). Genetic Algorithms in Search, Optimization, and Machine Learning. New York: Addison-Wesley.

Guyon, I. (2003). An introduction to variable and feature selection. Journal of Machine Learning Research, 3, 1157-1182.

Haaland, D. M., & Thomas, E. V. (1988). Partial least-squares methods for spectral analysis 1. Relation to other quantitative calibration methods and the extraction of quantitative information. Analytical Chemistry, 60, 1193.

Härdle, W., & Simar, L. (2003). Applied Multivariate Statistical Analysis.

Hocking, R. R. (1976). The analysis and selection of variables in linear regression. Biometrics, 32, 1-49.

Johnson, R. A., & Wichern, D. W. (2002). Applied Multivariate Statistical Analysis. Prentice Hall.

Jolliffe, I. (1982). A note on the use of principal components in regression. Journal of the Royal Statistical Society, Series C (Applied Statistics), 31(3), 300-303.

Kennard, R. W., & Stone, L. A. (1969). Computer aided design of experiments. Technometrics, 11, 137-148.

Martens, H., & Naes, T. (1989). Multivariate Calibration. Chichester: John Wiley & Sons.

Michalewicz, Z. (1996). Genetic Algorithms + Data Structures = Evolution Programs (3rd ed.). Berlin: Springer.

Miller, A. J. (1984). Selection of subsets of regression variables. Journal of the Royal Statistical Society, Series A (General), 147(3), 389-425.

Nunes, P. G. A. (2008). Uma nova técnica para seleção de variáveis em calibração multivariada aplicada às espectrometrias UV-VIS e NIR. Tese de Doutorado, UFPB/CCEN, João Pessoa.

Rencher, A. C. (2002). Methods of Multivariate Analysis. Wiley-Interscience.

Snedecor, G. W., & Cochran, W. G. (1972). Statistical Methods (6th ed.). Ames: Iowa.

Soares(a), A. S., Galvao Filho, A. R., Galvao, R. K. H., & Araujo, M. C. U. (2010). Improving the computational efficiency of the successive projections algorithm by using a sequential regression implementation: a case study involving NIR spectrometric analysis of wheat samples. Journal of the Brazilian Chemical Society, São Paulo, 21(4).

Soares(b), A. S., Galvao Filho, A. R., Galvao, R. K. H., & Araujo, M. C. U. (2010). Multi-core computation in chemometrics: case studies of voltammetric and NIR spectrometric analyses. Journal of the Brazilian Chemical Society, São Paulo, 21(9).

Srinivas, N., & Deb, K. (1994). Multiobjective optimization using nondominated sorting genetic algorithms. Evolutionary Computation, 2(3).

Takahashi, R. H. C. (2004). Otimização Escalar e Vetorial. Universidade Federal de Minas Gerais.

Tobias, R. (1997). An Introduction to Partial Least Squares Regression, TS-509. Cary, NC: SAS Institute Inc.

Trochim, W., & Donnelly, J. P. (2007). The Research Methods Knowledge Base (3rd ed.). Mason, OH: Thompson Publishing.

Varmuza, K., & Filzmoser, P. (2000). Introduction to Multivariate Statistical Analysis in Chemometrics. CRC Press.

Von Zuben, F. J. (2000). Computação evolutiva: uma abordagem pragmática. In Anais da I Jornada de Estudos em Computação de Piracicaba e Região (1a JECOMP), Piracicaba, SP (pp. 25-45).

Wold, S., Sjöström, M., & Eriksson, L. (2001). PLS-regression: a basic tool of chemometrics. Chemometrics and Intelligent Laboratory Systems, 58, 109-130.
