Multivariate Analysis of Crude Oil Composition

Article
pubs.acs.org/EF
Multivariate Analysis of Crude Oil Composition and Fluid Properties Used in Multiphase Flow Metering (MFM)
Andreas L. Tomren,*,†,§ Tanja Barth,†,§ and Kjetil Folgerø‡,§
† Department of Chemistry, University of Bergen, Bergen, Norway
‡ Christian Michelsen Research AS, Bergen, Norway
§ Michelsen Centre, Bergen, Norway
ABSTRACT: Crude oil characterization by infrared (IR) spectroscopy and whole oil gas chromatography (GC) has been used
to provide data for establishing multivariate prediction models for physical properties of crude oils. The parameters of interest are
used in multiphase flowmeters (MFMs) for monitoring production and transport of petroleum fluids, and permittivity
parameters are of special interest. Data for 20 crude oils and condensates have been acquired and modeled using partial least
squares (PLS) modeling. Good quality predictions were obtained for density, velocity of sound, and static and high frequency
permittivity. Biodegradation of crude oil is the main cause of variation in the modeled variables. The data required are obtained in
standard analytical procedures, and thus the approach has a considerable potential for use in on-site quality assurance.
Received: April 12, 2012
Revised: July 27, 2012
Published: August 2, 2012
© 2012 American Chemical Society | dx.doi.org/10.1021/ef300620r | Energy Fuels 2012, 26, 5679−5688

1. INTRODUCTION

Modern, sophisticated analytical techniques generate large amounts of data that reflect both the composition and properties of crude oils and crude oil fractions. This opens up possibilities for estimating a wide range of important properties and quality parameters for such complex mixtures based on a few, or even a single, set of analytical data. Conventionally, the values of the physical properties of interest are mostly used in the form of simpler numerical scales or expressions, that is, the density, permittivity, or octane number. Such parameters have traditionally been individually determined, but applications based on estimations or calibrations using complementary data have been increasing. Typically, the potential for estimating a number of conventional classification parameters based on near infrared (NIR) spectral data using multivariate statistical methods has recently been reported,1,2 and this approach is already extensively used in modern refinery operations (e.g., ref 3). Similar strategies have also been used previously to predict more limited sets of parameters.4−9

Determining the permittivity of mixed oil−water−gas (OWG) fluid flow is a typical example of a measurement where calibration models could be useful. In crude oil production monitoring, multiphase flowmeters (MFMs) are used for online monitoring of the OWG flow in pipelines, where the output of the measurement is the volume or mass of each phase passing the flowmeter in a given time. Data from multiphase meters help in optimizing petroleum production, increasing the oil recovery and lowering the investments and operational costs.10,11 Permittivity measurements provide input for the determination of flow rates and relative distributions of the fluids in several well established measurement technologies.12 Such systems are, however, calibrated to the initial oil composition and may lose precision over time due to changes in the fluid compositions that result in changes in the actual permittivities relative to the incorporated calibration values. Monitoring the permittivity regularly is one approach to quality assurance of the meter readings, but the necessary instrumentation is often not easily available. Quality control based on monitoring the fluid composition using standard crude oil analytical data provided by generally available analytical instrumentation in combination with multivariate estimation of the required parameters is thus an attractive alternative.

An MFM measures the permittivity of oil−water−gas mixtures, and it is important to know the permittivity of each phase (oil, water, and gas) in order to calculate the ratio of the different phases in the mixture. The topic of this work is to determine the permittivity, in addition to other important variables for multiphase flow metering such as density and velocity of sound, of the crude oil phase, and to investigate which compounds in the oil are important for the variation in the different variables.

Permittivity of liquids is normally measured using dielectric spectroscopy, and the spectra are described by the parameters of a Cole−Cole model13 fitted to the dielectric spectrum. When predicting the permittivity of a crude oil, either the parameters of the Cole−Cole equation or the whole spectrum can be modeled.14 This work presents models for prediction of the Cole−Cole parameters for the permittivity spectra based on compositional analysis of crude oils. Some additional parameters that are relevant for the permittivity and flow monitoring in general are also modeled. Both spectral profiles and compound based compositional data are tested for their potential to provide reliable estimates of the parameter values. Standard IR spectra comprise the spectral data, while individual hydrocarbon distributions are determined using whole oil gas chromatography. For both analytical techniques, the analytical data are so extensive that direct bivariate correlation is not feasible, and data treatment procedures that can handle large data sets are needed. PCA (principal component analysis) of the whole data set is therefore used initially to explore the systematic variation in the analytical data,15 while partial least squares
(PLS) is used to establish the multivariate calibration models.16 The quality of the models is determined by evaluating the accuracy of the prediction of samples not included in the initially modeled data. Simple data pretreatment is applied to make the data set internally comparable.

2. MATERIAL AND METHODS

2.1. Crude Oil Samples. Twenty crude oils, originating mainly from the North Sea, have been analyzed for physical and electrical properties and chemical composition. The types of measurements performed and the number of variables produced are specified in Table 1. The data have also previously been used in a preliminary presentation of some models for an extended range of parameters.9 The data set consists of 4 condensates, which are light crude oils with a clear yellow to brown color, while the remaining 16 crude oils are black and opaque. The biodegradation level of all the crude oils in the data set has been determined on the qualitative scale of Peters and Moldowan17 by visual inspection of the hydrocarbon distribution as observed in the GC traces.

Table 1. Types of Analytical Data and Number of Variables from Each Procedure

| analysis | no. of variables | units |
|---|---|---|
| density | 1 | g/mL at 20 °C |
| velocity of sound | 1 | m/s at 30 °C |
| whole oil GC | 82 | normalized peak area |
| FTIR | 1738 | absorbance |
| dielectric spectroscopy | 5 | as specified for eq 2, 20 °C |

All oils, prior to all measurements, have been placed in an oven at 60 °C for 4 h, to dissolve waxes that may have precipitated during storage. They have also been shaken and turned upside down multiple times, to homogenize the oils thoroughly.

2.2. Infrared (IR) Spectroscopy. A Nicolet Protégé 460 FTIR (Fourier transform infrared) spectrometer with an ATR (attenuated total reflection) measuring cell, equipped with a diamond crystal, has been used for obtaining FTIR spectra of the oils. One drop of oil is placed on the crystal, and measurements (32 scans, giving an averaged spectrum) are taken. For quality assurance, 5 drops have been measured for each oil to eliminate differences due to lack of homogenization, and the average of the resulting five spectra has been used in the modeling. The data are given as absorbance: A = log(1/R), where R is the percentage reflectance divided by 100. Percent reflectance shows the amount of infrared energy reflected from the sample: %R = (IS/IB)100, where IS is the intensity of infrared energy reflected from the sample and IB is the intensity of the infrared energy passing through the reflection accessory without a sample in place.

2.3. Whole Oil Gas Chromatography (WOGC). The oils have been analyzed on a ThermoFinnigan Trace GC instrument equipped with a flame ionization detector (FID). The stationary phase is a HP-PONA dimethylpolysiloxane column (50 m × 0.20 mm × 0.5 μm) from Agilent Technologies. The mobile phase is helium. The temperature program is as follows: 30 °C for 15 min, 1.5 °C/min to 60 °C, 4 °C/min to 320 °C, 320 °C for 35 min. The injector temperature is 300 °C while the FID is kept at 350 °C. Warm, homogenized crude oil (1 μL) is introduced manually into the GC system through a syringe, using split injection. The assignment of chromatographic peaks and quality assessment are based on the Norwegian Standard Oil (NSO-1),18 in which the compounds have been identified in a GC-MS, and by manual inspection, the compounds in the chromatograms have been identified and quantified for all the oils, assuming a constant response factor. The quantified values for two gas chromatograms for each oil are averaged, giving the quantified values used in the modeling.

2.4. Velocity of Sound. The velocity of sound of the oils was measured by the technique described by Bjørndal et al. (2008),19 where the measurement cell was modified to include a pressure seal and a pressure transmitter. As eq 1 shows, the velocity is dependent on the density of the given fluid.

$$c_{\mathrm{fluid}} = \sqrt{\frac{K}{\rho}} \qquad (1)$$

Equation 1: Velocity of sound in a given fluid. cfluid = speed of sound in given fluid, K = bulk modulus of given fluid, and ρ = density of the given fluid.20

2.5. Dielectric Spectroscopy. The dielectric spectra were measured on a measurement system for complex permittivity measurements at 20 °C, based on a system developed by Christian Michelsen Research AS in 1996.21

2.6. Density. The density of the oils has been obtained by using an Anton Paar K.G. DMA 60 densitometer with DMA 602 measuring cell. Air and distilled water are first measured for calibration, then the oil. Five measurements for each oil have been averaged, and the resulting value is used in the modeling.

2.7. Data Set for the Multivariate Analysis. The values for density and velocity of sound are used directly as variables in the data set. A total of 82 compounds have been quantified from the GC analysis, all of them compounds containing only carbon and hydrogen. 1738 variables have been collected from the FTIR, corresponding to the absorbance at each wavenumber. Five variables have been derived from the dielectric spectroscopy, corresponding to the 5 variables in the Cole−Cole equation obtained from curve fitting of the spectra (eq 2). The data available for the analysis are summarized in Table 1.

$$\varepsilon^{*} = \varepsilon_{\infty} + \frac{\varepsilon_{s} - \varepsilon_{\infty}}{1 + (j\omega\tau)^{1-\alpha}} - j\,\frac{\sigma}{\omega\varepsilon_{0}} \qquad (2)$$

Equation 2: Cole−Cole equation, for curve fitting of dielectric spectroscopy. ε* = relative permittivity (dimensionless), ε∞ = high frequency permittivity (dimensionless), εs = static permittivity (dielectric constant at low frequencies, dimensionless), ω = angular frequency (radians/second), τ = macroscopic relaxation time (seconds), σ = finite conductivity (Siemens/meter), α = empirical factor (distribution factor, dimensionless), ε0 = permittivity in vacuum (Farads/meter).

3. MULTIVARIATE ANALYSIS

3.1. Data Pretreatment. Data collected directly from an instrument are termed raw data. Raw data can contain noise, baseline drift, scattering effects, and other factors that may influence the significant information in the data set. Therefore, it may be necessary to pretreat the raw data, in order to remove effects that do not represent chemical, physical, electrical, or biological properties in the sample. To find the variation between the objects, the raw data need to be centered. This can be done by calculating the average value for each variable and then subtracting this from each of the original variables.
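As a minimal illustration of the centering step described in Section 3.1, the sketch below subtracts the per-variable mean from a small data matrix with objects as rows and variables as columns. The function name `center_columns` and the toy numbers are assumptions made for illustration; they are not taken from the SIRIUS software used in this work.

```python
def center_columns(X):
    """Subtract each variable's mean (column mean) from every object (row)."""
    n_objects = len(X)
    n_variables = len(X[0])
    # Average value of each variable over all objects.
    means = [sum(row[j] for row in X) / n_objects for j in range(n_variables)]
    # Centered matrix: the origin now sits at the center of gravity of the data.
    centered = [[row[j] - means[j] for j in range(n_variables)] for row in X]
    return centered, means


# Hypothetical toy data: 3 objects (oils) described by 2 variables.
X = [[1.0, 10.0],
     [2.0, 20.0],
     [3.0, 60.0]]
X_centered, column_means = center_columns(X)
print(column_means)  # [2.0, 30.0]
print(X_centered)    # [[-1.0, -20.0], [0.0, -10.0], [1.0, 30.0]]
```

After centering, every column sums to zero, which is a quick check that the step was applied to variables rather than to objects.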
By doing this, the origin of the coordinate system is placed at the center of gravity in the data set.

$$X_{\mathrm{centred}}(i,j) = X(i,j) - \frac{1}{N}\sum_{i=1}^{N} x(i,j) \qquad (3)$$

Equation 3: Centring of the data set. X = full matrix, i = column in matrix, j = row in matrix, N = total number of objects.

Pretreatment of GC Data. When injecting samples onto the GC column, it is not certain that the amount of sample is exactly the same in every injection. To eliminate any effects from this, the quantified amounts have been normalized to constant sum. This is done by dividing each selected variable of an object by the sum of the selected variables for that object, which gives the relative distribution of the variables in each object. This procedure is normal for GC.22,23 For models based on GC, the data sets have been centered and normalized to constant sum.

Pretreatment of IR Data. For the models based on FTIR, the raw data have been centered, but no further pretreatment has been done.

3.2. Modeling. Multivariate data analysis has been performed using the SIRIUS program package, version 7.0.24 PLS models have been built, based on GC and FTIR data, in order to investigate the possibility of predicting the other physical and electrical properties that have been measured.

The data sets are first investigated as PCA models to determine the degree of systematic variation in the data. The principal components (PCs) provide a system of orthogonal axes that each describe a maximum of the systematic variation remaining in the data set.

The PLS models are then established. PLS models generate Latent Variables (LVs) that are orthogonal and describe the maximum covariation between the independent data matrix (the analytical data) and the dependent data (the predicted properties).

Each model is based on 17 of the 20 oils in the data sets. Three oils in each data set have been omitted from the models in order to use them as validation objects. It is important to balance the number of validation objects against the total number of samples, as the model might be less robust if too many oils are omitted from the model building. The predictive quality of the models can be examined by testing the apparently unknown objects against the models. For each model, one biodegraded oil, one nonbiodegraded oil, and one condensate have been omitted from the data set, as these oils in general are chemically different. These validation objects have been chosen based on their values for each variable; for each model, there is one validation object with high value, one validation object with low value, and one validation object with medium value. This is done in order to validate the predictive quality of the model, given unknown samples with high, low, and medium values of the modeled variable.

When building a PLS model, you get a model in the form of eq 4.

$$Y = B_0 + B_1 X_1 + B_2 X_2 + B_3 X_3 + \dots + B_N X_N \qquad (4)$$

Equation 4: Regression model for PLS modeling. N is the total number of variables in the model. B0 is the starting point; B1, B2, ..., BN are regression coefficients for variable 1, 2, ..., N. X1, X2, ..., XN are variables 1, 2, ..., N.

By evaluating the regression coefficients for a given model, you can detect which variables have the most significant effect on the model, be it positive or negative effect and, hence, which variables are most important for the variation in the model. Nevertheless, the combined effect of all the coefficients is usually more important than the effect of one single coefficient.

In the modeling stage, the goal was to find the models that gave as low deviations as possible for the validation objects. The number of latent variables (LVs) giving the lowest deviations is chosen for each model. Based on this, the number of LVs in the different models is not necessarily the same, causing some of the models to be more robust than the others. The number of LVs has been chosen bearing in mind that if too many LVs are chosen, there is a possibility that noise is modeled as well as the significant signal, causing the model to give poor predictions for unknown samples. Also, if too few LVs are chosen, there is a possibility that some of the significant information remains unmodeled and gets categorized as noise, also causing the model to give poor predictions for unknown samples.

4. RESULTS

4.1. Initial Data Evaluation. The ranges for the measured physical properties for all the oils are given in Table 2. The measurements span a reasonable range of values for most of the parameters, with the exception of τ, dielectric relaxation time, and σ, finite conductivity, where the value is zero for 3 and 10 of the samples, respectively. This lack of variation can represent a problem for establishing robust models. The uncertainty in α and τ is high because they are based on the curve fit of the Cole−Cole model, while σ is close to zero for oils in general. This means that, among the parameters extracted from the Cole−Cole model, static permittivity and high frequency permittivity are easier to determine than α, τ, and σ.

Table 2. Range of Measured Physical Properties

| | static permittivity | high frequency permittivity | α | τ (s) | σ (S/m) | density (g/mL) | velocity of sound (m/s) |
|---|---|---|---|---|---|---|---|
| range | 2.017−2.270 | 1.996−2.337 | 0.0318−0.9210 | 0−5.7 × 10^−7 | 0−1 × 10^−7 | 0.7300−0.940 | 1141−1447 |
| standard dev. | 0.02 | 0.02 | 0.03 | 7.29% | | 0.007 | |
| exptl uncertainty | 0.05 | 0.04 | 0.06 | | | 0.013 | |

The standard deviations in Table 2 are based on replicate experiments. For the permittivity variables, the standard deviation is based on two replicates, density is based on three replicates, while for velocity of sound no replicates were measured. The experimental uncertainty in Table 2 is the standard deviation multiplied by 2, as ±1.96 of the standard deviation about the mean marks the range within which, when a sample exists, there is a 95% chance that it is a part of the population. Standard deviations for σ and velocity of sound could not be obtained because of the insufficient number of measurements. The standard deviation for τ is given in %, as the standard deviation varies relatively with size.

PCA performed on GC data of the sample set of crude oils shows that three groupings occur, separating biodegraded oils, nonbiodegraded oils and condensates, as shown in Figure 1. The two first PCs explain 67.6% of the total variance in the data set,
which shows that there is a high degree of systematic variation in the data, as opposed to unsystematic noise.

Figure 1 shows that biodegraded oils span out in one direction from the origin of the coordinate system, while the nonbiodegraded oils span out in a different direction of the coordinate system. Condensates are partly clustering with the nonbiodegraded oils and partly spanning out in a different direction than biodegraded and nonbiodegraded oils. This indicates, as expected, that biodegraded and nonbiodegraded oils are chemically different, based on GC data.

Figure 1. PCA score plot of GC data. Brown circles = biodegraded oils. Blue squares = nonbiodegraded oils. Light blue triangles = condensates.

PCA performed on IR data of the same set of crude and model oils shows similar groupings (Figure 2), though with a tendency to less variation within each group. The two first PCs explain 93% of the total variance in the data set, which shows that the systematic variation in the data is high.

Figure 2 shows that biodegraded oils span out in one direction from the origin of the coordinate system, while the nonbiodegraded oils span out in a different direction of the coordinate system. Condensates also span out in a different direction of the coordinate system. This indicates that biodegraded oils, nonbiodegraded oils, and condensates are different based on IR data.

Figure 2. PCA score plot of IR data. Brown circles = biodegraded oils. Blue squares = nonbiodegraded oils. Light blue triangles = condensates.

The presence of such systematic variation indicates that the models for each data type describe variations due to chemical compositional factors. If this variation is connected directly or indirectly to the properties to be modeled, there is a potential for generating prediction models for the parameters in question and to use them to predict unknown physical properties of new oils based on GC and IR data.

Some objects in the data set might be classified as outliers, but since all of the oils in the data set are representative samples, they cannot be excluded from the model. If they are excluded, the range in which the model is valid might be reduced. A larger data set would most likely expand the range of the model, in addition to reducing the impression that some objects are outliers.

4.2. Prediction Models by PLS. Table 3 gives an overview of the established models, presented at a qualitative scale ranging from poor, via OK for distinguishing between high and low value, via OK, via good, to very good. This is done by comparing the experimental uncertainty of the measurements (shown in Tables 4 and 5) with the prediction error of the validation objects; if the error is much lower than the experimental uncertainty, then the predictive quality of the model is rated as very good; if the error is similar to the experimental uncertainty, then the predictive quality of the model is rated as good. If the difference between the experimental uncertainty and the prediction error is larger, the prediction errors are more closely examined. If the validation objects are predicted to be of high value, while the measured values are low, the model is rated as poor. If the validation objects are predicted to be of high value, and the measured values are of high value, the model is rated as
Table 3. Overview of Established Models with Qualitative Evaluation

| modeled variable | coding | GC quality | GC no. LVs | GC explained variance(a) | IR quality | IR no. LVs | IR explained variance(a) |
|---|---|---|---|---|---|---|---|
| static permittivity | permvar e_st | good | 8 | 99.29% (99.32%) | good | 4 | 96.93% (98.52%) |
| high frequency permittivity | permvar e_inf | good | 8 | 99.77% (99.36%) | good | 5 | 97.38% (98.29%) |
| α | permvar a | OK | 6 | 96.56% (83.24%) | OK | 4 | 96.87% (77.51%) |
| τ | permvar t | OK for distinguishing between high and low level | 7 | 98.68% (96.94%) | OK for distinguishing between high and low level | 6 | 98.78% (91.75%) |
| σ | permvar s | poor | | | OK for distinguishing between high and low level | | |
| density | density | good | 6 | 95.49% (97.66%) | good | 3 | 89.80% (96.03%) |
| velocity of sound | velocity of sound | good | 5 | 94.43% (96.38%) | very good | 5 | 99.57% (99.95%) |

(a) Total explained variance from the independent block, with total explained variance from the dependent block in parentheses.
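The "no. LVs" column of Table 3 comes from choosing, for each model, the number of latent variables that gives the lowest deviation for the validation objects. The following is a minimal sketch of that selection for a single modeled variable, using a plain NIPALS-style PLS1 routine as a stand-in for the SIRIUS package used in this work. The function names (`pls1_fit`, `pls1_predict`, `choose_n_lv`) and the toy data are hypothetical and serve only to illustrate the procedure.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def pls1_fit(X, y, n_lv):
    """Fit a one-response PLS model with up to n_lv latent variables."""
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(m)] for row in X]  # centered X
    f = [v - y_mean for v in y]                                # centered y
    components = []
    for _ in range(n_lv):
        # Weight vector: direction in X with maximum covariance with y.
        w = [dot([row[j] for row in E], f) for j in range(m)]
        norm = dot(w, w) ** 0.5
        if norm < 1e-12:  # no covariance left to model
            break
        w = [v / norm for v in w]
        t = [dot(row, w) for row in E]                              # scores
        tt = dot(t, t)
        p = [dot([row[j] for row in E], t) / tt for j in range(m)]  # X loadings
        q = dot(f, t) / tt                                          # y loading
        # Deflate X and y before extracting the next latent variable.
        E = [[row[j] - t[i] * p[j] for j in range(m)] for i, row in enumerate(E)]
        f = [f[i] - q * t[i] for i in range(n)]
        components.append((w, p, q))
    return x_mean, y_mean, components


def pls1_predict(model, x_new):
    """Predict the response for one new object."""
    x_mean, y_mean, components = model
    e = [a - b for a, b in zip(x_new, x_mean)]
    y_hat = y_mean
    for w, p, q in components:
        t = dot(e, w)
        y_hat += q * t
        e = [a - t * b for a, b in zip(e, p)]
    return y_hat


def choose_n_lv(X_cal, y_cal, X_val, y_val, max_lv):
    """Return (n_lv, average absolute deviation) with the lowest deviation."""
    best = None
    for n_lv in range(1, max_lv + 1):
        model = pls1_fit(X_cal, y_cal, n_lv)
        dev = sum(abs(pls1_predict(model, x) - y)
                  for x, y in zip(X_val, y_val)) / len(y_val)
        if best is None or dev < best[1]:
            best = (n_lv, dev)
    return best


# Hypothetical noise-free toy data following y = 1 + 2*x1 - x2.
X_cal = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [2.0, 3.0]]
y_cal = [1.0, 3.0, 0.0, 2.0]
X_val = [[1.0, 1.0]]
y_val = [2.0]
print(choose_n_lv(X_cal, y_cal, X_val, y_val, max_lv=2))
```

With real, noisy data the lowest validation deviation does not necessarily occur at the largest number of LVs, which is the overfitting concern raised in Section 3.2.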

Table 4. Overview of the Quantitative Predictive Quality of the Established Models for IR

| modeled variable | coding | low meas. | low pred. | low dev. | medium meas. | medium pred. | medium dev. | high meas. | high pred. | high dev. | avg. dev. val. obj.(a) | exptl. unc.(b) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| static permittivity | permvar e_st | 2.08 | 2.04 | 0.04 | 2.22 | 2.15 | 0.07 | 2.43 | 2.39 | 0.04 | 0.05 | 0.05 |
| high frequency permittivity | permvar e_inf | 2.04 | 2.03 | 0.01 | 2.14 | 2.12 | 0.02 | 2.22 | 2.24 | −0.02 | 0.02 | 0.04 |
| α | permvar a | 0.33 | 0.26 | 0.07 | 0.51 | 0.43 | 0.08 | 0.72 | 0.62 | 0.1 | 0.08 | 0.06 |
| τ | permvar t | 0 | −2.79 × 10^−7 | 2.79 × 10^−7 | 4.20 × 10^−9 | 9.24 × 10^−8 | −8.82 × 10^−8 | 4.24 × 10^−7 | 2.54 × 10^−7 | 1.7 × 10^−7 | 1.79 × 10^−7 | |
| σ | permvar s | 0 | 7.70 × 10^−9 | 7.70 × 10^−9 | 9.80 × 10^−9 | 4.30 × 10^−9 | 5.5 × 10^−9 | 7.70 × 10^−8 | 6.75 × 10^−8 | 9.5 × 10^−9 | 7.57 × 10^−9 | |
| density | density | 0.766 | 0.762 | 0.004 | 0.824 | 0.833 | −0.009 | 0.926 | 0.940 | −0.014 | 0.009 | 0.013 |
| velocity of sound | velocity of sound | 1266 | 1268 | −2 | 1347 | 1344 | 3 | 1398 | 1397 | 1 | 2 | |

(a) Average deviation of the validation objects. (b) Standard deviation for measured values (see Table 2) multiplied by 2, as ±1.96 of the standard deviation about the mean marks the range within which, when a sample exists, there is a 95% chance that it is a part of the population.
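The deviation columns of Table 4 follow directly from the measured and predicted values. As a small worked check, the sketch below recomputes the signed deviations, their average absolute value, and the relative deviations in percent for the IR-based density model. The function names are illustrative assumptions; the numbers are the density entries of Table 4.

```python
def deviations(measured, predicted):
    """Signed deviations (measured - predicted) for the validation objects."""
    return [m - p for m, p in zip(measured, predicted)]


def average_abs_deviation(devs):
    """Mean of the absolute deviations."""
    return sum(abs(d) for d in devs) / len(devs)


def percent_deviation(measured, predicted):
    """Absolute deviation relative to the measured value, in percent."""
    return [abs(m - p) / m * 100 for m, p in zip(measured, predicted)]


# Density based on IR (Table 4): low, medium, and high value objects.
meas = [0.766, 0.824, 0.926]
pred = [0.762, 0.833, 0.940]

devs = deviations(meas, pred)
print([round(d, 3) for d in devs])                            # [0.004, -0.009, -0.014]
print(round(average_abs_deviation(devs), 3))                  # 0.009
print([round(x, 1) for x in percent_deviation(meas, pred)])   # [0.5, 1.1, 1.5]
```

The average of 0.009 g/mL lies below the experimental uncertainty of 0.013 g/mL listed for density.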
Table 5. Overview of the Quantitative Predictive Quality of the Established Models for GC

| modeled variable | coding | low meas. | low pred. | low dev. | medium meas. | medium pred. | medium dev. | high meas. | high pred. | high dev. | avg. dev. val. obj.(a) | exptl. unc.(b) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| static permittivity | permvar e_st | 2.08 | 2.05 | 0.03 | 2.22 | 2.22 | 0 | 2.43 | 2.41 | 0.02 | 0.02 | 0.05 |
| high frequency permittivity | permvar e_inf | 2.04 | 2.05 | −0.01 | 2.14 | 2.14 | 0 | 2.22 | 2.22 | 0 | 0.003 | 0.04 |
| α | permvar a | 0.27 | 0.42 | −0.15 | 0.50 | 0.48 | 0.02 | 0.59 | 0.68 | −0.09 | 0.09 | 0.06 |
| τ | permvar t | 0 | −9.00 × 10^−10 | 9.00 × 10^−10 | 4.00 × 10^−9 | 4.20 × 10^−9 | −2 × 10^−10 | 7.70 × 10^−9 | 8.60 × 10^−9 | −9 × 10^−10 | 3.37 × 10^−9 | |
| σ | permvar s | 0 | 1.92 × 10^−8 | −1.92 × 10^−8 | 9.70 × 10^−9 | 3.16 × 10^−8 | −2.19 × 10^−8 | 7.70 × 10^−8 | 6.24 × 10^−8 | 1.46 × 10^−8 | 1.86 × 10^−8 | |
| density | density | 0.766 | 0.788 | −0.022 | 0.824 | 0.847 | −0.023 | 0.904 | 0.902 | 0.002 | 0.016 | 0.013 |
| velocity of sound | velocity of sound | 1157 | 1189 | −32 | 1324 | 1329 | −5 | 1423 | 1387 | 36 | 24 | |

(a) Average deviation of the validation objects. (b) Standard deviation for measured values multiplied by 2, as ±1.96 of the standard deviation about the mean marks the range within which, when a sample exists, there is a 95% chance that it is a part of the population.
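The qualitative ratings rest on comparing the average deviation of the validation objects with the experimental uncertainty. One way to make that comparison explicit is to form their ratio, as sketched below for the GC-based models in Table 5 that list both quantities. The ratio and the "within"/"above" labels are illustrative assumptions, not the rating criterion applied in this work.

```python
# Values copied from Table 5 (GC-based models):
# (average deviation of validation objects, experimental uncertainty).
table5 = {
    "static permittivity": (0.02, 0.05),
    "high frequency permittivity": (0.003, 0.04),
    "alpha": (0.09, 0.06),
    "density": (0.016, 0.013),
}


def error_to_uncertainty_ratio(avg_dev, exptl_unc):
    """Ratio below 1 means the prediction error is inside the uncertainty."""
    return avg_dev / exptl_unc


ratios = {name: error_to_uncertainty_ratio(dev, unc)
          for name, (dev, unc) in table5.items()}

for name, ratio in ratios.items():
    status = "within" if ratio <= 1.0 else "above"
    print(f"{name}: ratio {ratio:.3f} ({status} the experimental uncertainty)")
```

A ratio only slightly above 1, as for density, corresponds to an error that is similar to the experimental uncertainty rather than clearly worse.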

either OK for distinguishing between high and low value, or OK, depending on how close to the experimental uncertainty the prediction errors are.

The number of LVs for the different models is also given in Table 3.

Figure 3 shows modeling results for density and velocity of sound based on GC data, and static permittivity, high frequency permittivity, density, and velocity of sound based on IR data, all presented as predicted value plotted against measured value. These figures represent the best modeling results for each variable for GC and IR. As the models for the permittivity variables α, τ, and σ at best give “OK” results, the results for these models are not shown here. In addition, the modeling for static and high frequency permittivity based on GC data gives good quality models, which already have been presented in ref 9.

Figure 3. Predicted value plotted against measured for (a) density based on GC, (b) velocity of sound based on GC, (c) static permittivity (e_st) based on IR data, (d) high frequency permittivity (e_inf) based on IR data, (e) density based on IR data, (f) velocity of sound based on IR data. Brown squares = biodegraded oils. Blue squares = nonbiodegraded oils. Light blue squares = condensates. Red squares = validation objects.

Figure 3a and b shows that prediction of density and velocity of sound based on GC data is possible with good results within the range of oil compositions represented in the data. Figure 3c−f shows that prediction of static permittivity, high frequency permittivity, density, and velocity of sound, all based on IR data, is possible with good results within the range of oil compositions represented in the data. No standard deviation is obtained for velocity of sound, since only one measurement was made for each oil, so the quality of these models is evaluated based on a modeling perspective. As clearly shown in Figure 3e and f, the validation objects do not stand out in comparison to the model objects, and with a maximum error of 0.22% for the model for IR, it is difficult to argue against the statement that the model is very good. However, since no replicate measurements of velocity of sound were done, this cannot be verified completely.

4.3. Chemical Significance of the Models. Figure 4a−f shows the regression coefficients for the same set of models, giving an overview of which variables are important for building the model, and therefore, the most important variables when looking at variation in the different properties.

Figure 4a shows that positive effect on the model for density based on GC data originates almost exclusively from branched alkanes and the higher molecular weight straight chained alkanes, while negative effect originates almost exclusively from low to medium molecular weight straight chained alkanes. This indicates that biodegradation of crude oil has an important effect on the variance of density in crude oils, since the smallest straight chained alkanes are the first to be removed or altered during biodegradation.17

Similar trends are observed for the model for velocity of sound, as shown in Figure 4b. Positive effects on the model originate almost exclusively from branched alkanes and the higher molecular weight straight chained alkanes, while negative
effects originate almost exclusively from small to medium molecular weight straight chained alkanes. Thus, biodegradation has an important effect also on the variance of the velocity of sound in crude oils.

Figure 4. Regression coefficients for model for (a) density based on GC data, (b) velocity of sound based on GC data, (c) static permittivity (e_st) based on IR data, (d) high frequency permittivity based on IR data, (e) density based on IR data. The biodegradation effect is more evident for the model for density than for the models for static and high frequency permittivity, as there are fewer effects that might be considered as noise. (f) Regression coefficients for the model for velocity of sound based on IR data. Coding for the variables is given in Appendix 1.

For static permittivity modeled from IR, the area between 2800 cm−1, which corresponds to the signals originating from CH3 stretch, has a positive effect, while signals originating from CH2 stretch have a negative effect. This is an indication of biodegradation, as a high amount of CH3 indicates a high amount of branched alkanes and a high amount of CH2 indicates a high amount of straight chained alkanes.4 As straight chained alkanes are the first compounds to be attacked during biodegradation, the branched alkanes will dominate the composition increasingly. Other regions having a positive effect are the CH3 bend at around 1375 cm−1 and CO at around 1450 cm−1; these signals also indicate biodegradation, as the CO might originate from carboxylic acids formed during biodegradation.4 From 800 cm−1 to 694 cm−1, there are signals having both positive and negative effects; these signals are in the fingerprint region and are not very easy to assess, but it is likely that they originate from C−H bonds in aromatic compounds. The positive effects are due to the fact that biodegraded oils have more absorbance in these regions.

The coefficients for high frequency permittivity look very similar to those for static permittivity, with a positive effect from CH3, both stretch and bend, and a negative effect from CH2 stretch. In addition, there is a negative effect from CH2 bend at around 1465 cm−1, enhancing the indication that biodegradation is the most important effect on the model.

For velocity of sound, the same trends for CH3 and CH2 as for the models for density and e_inf are present, but there are also a lot of effects that look like noise. However, the predictive quality of the model is very good.

4.4. Precision of Prediction. Figure 5 shows the predicted and measured values for the three validation objects for the models for static permittivity, high frequency permittivity, permittivity variable α, and density, all based on IR data, and Table 4 summarizes the quantitative modeling results. It is clear that the deviations are small for most of the models and validation objects. Exceptions are the permittivity variables α, τ, and σ, where α is considered to give the best results of these three variables.

For the variables static permittivity and α, all validation objects are predicted with a value that is lower than the measured. It is not possible using PLS to determine whether this is a bias or due to the choice of validation objects. A larger data set
Figure 5. Predicted and measured values for PLS models for static
permittivity, high frequency permittivity, permittivity variable α, and
density based on IR data. (a) High value validation objects; (b) Figure 6. Predicted and measured values for PLS models for static
medium value validation objects; (c) low value validation objects. permittivity, high frequency permittivity, permittivity variable α, and
density based on GC data. (a) High value validation objects; (b)
medium value validation objects; (c) low value validation objects.
might clarify this, as more validation objects can be used in the
model validation.
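The difficulty of separating a systematic bias from a chance selection of validation objects can be illustrated with a one-sided sign test. The sketch below is an illustration only and is not part of the original analysis: it assumes that, for an unbiased model, each prediction is independently and equally likely to fall above or below the measured value, and the function names and the 5% significance level are hypothetical choices.

```python
def prob_all_below(n_objects: int) -> float:
    """Chance that all n predictions of an unbiased model fall below the measured value.

    Assumption (not from the source): residual signs are independent and
    each sign has probability 0.5 when the model has no bias.
    """
    return 0.5 ** n_objects


def min_objects_for_evidence(alpha: float = 0.05) -> int:
    """Smallest number of same-sign validation objects whose chance probability is below alpha."""
    n = 1
    while prob_all_below(n) >= alpha:
        n += 1
    return n


if __name__ == "__main__":
    # Three validation objects, all predicted too low, as for static permittivity and alpha.
    print(prob_all_below(3))
    # Number of consistently underpredicted objects needed at the (hypothetical) 5% level.
    print(min_objects_for_evidence(0.05))
```

Under these assumptions, three underpredictions out of three occur by chance in one of eight cases, so three validation objects cannot by themselves establish a bias; at least five objects all deviating in the same direction would be needed to reach the 5% level.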
Compared to the deviations that Satya et al.1 achieved for density based on NIR spectral data, the results here are somewhat better; Satya et al. achieved deviations of 1.6% and 5.3% for their two validation objects, while the three validation objects in this work have deviations of 0.5%, 1.1%, and 1.5%.

Figure 6 shows the predicted and measured values for the three validation objects for the models for static permittivity, high frequency permittivity, permittivity variable α, and density, all based on GC data, and Table 4 summarizes the quantitative modeling results. It is clear that the deviations are small for most of the models and validation objects. Exceptions are the permittivity variables α, τ, and σ, where α is considered to give the best results of the three variables.

Compared to the deviations that Satya et al.1 achieved for their model of density based on NIR, the deviations for the model for density based on GC in this work are in the same range; Satya et al. achieved deviations of 1.6% and 5.3% for their two validation objects, while the three validation objects in this work have deviations of 2.9%, 2.8%, and 0.3%.

5. DISCUSSION

As Figures 3−6 and Tables 4 and 5 show, PLS calibration models for prediction of several properties of crude oil based on both GC data and IR data can be built with good results. The results are comparable to, or in some cases even better than, previously established models.1 The reason for the improved quality of the models in this work is uncertain, as the basis for the models in Satya et al.1 cannot be identified precisely. The improvement may be due to more consistent experimental data, since all measurements were performed by one operator during a short time period. The sample quality may also be relevant. In addition, the distribution of the sample properties over a reasonable range may contribute, since the sample sets are not very large.

Overall, the values estimated for the experimental uncertainties and the model uncertainties lie in the same range, indicating that the data give a good basis for accurate predictions.

The model for velocity of sound based on IR data is in fact very good, at least from a modeling perspective, with an average error of 0.12% for the validation objects, while the models for τ (dielectric relaxation) and σ (conductivity) are more inaccurate.

For quality assurance purposes for the flowmeters, the models for static permittivity, high frequency permittivity, density, and velocity of sound, based on GC and IR, respectively, are considered precise enough to be useful. As a minimum, the prediction of values significantly different from those used in the initial calibration of the MFM will indicate that a new calibration of the system is required.

The models also give information on the chemical basis for the variations in the sample set.

For most of the models, the regression coefficients, both for GC and IR, strongly indicate that biodegradation of crude oils is the main cause of variation in the modeled variables. For models based on GC, the main positive contributors to the model are branched alkanes and long, straight chained alkanes. As low to medium molecular weight straight chained alkanes are the first molecules to be depleted by biodegradation, the branched and large molecular weight straight chained alkanes will have a proportionally higher relative abundance in the crude oil mixture. The small to medium molecular weight alkanes are mostly negative contributors, further supporting that biodegradation is the main cause of variation in the modeled variables.

The permittivity of a pure compound increases with increasing number of carbon atoms in the compound;25 hence, biodegraded oils generally have a higher permittivity than non-biodegraded oils, since they have a higher relative abundance of higher molecular weight straight chained alkanes. This is also observed in the measurements for this data set.

For the models based on IR, the biodegradation effect is also observed, since significant values for the regression coefficients tend to originate from areas in the IR spectra that are influenced by the biodegradation process. For all models, CH3 stretch regions are positive contributors, while CH2 stretch regions are negative contributors, reflecting that biodegraded oils have higher absorbance in the CH3 regions than non-biodegraded oils, while they have lower absorbance in the CH2 regions. A strong absorbance in the CH3 regions indicates a high amount of branched alkanes, while a strong absorbance in the CH2 regions indicates more straight chained alkanes.

Biodegradation also causes increased contents of polar compounds in crude oils, either due to production of, for example, organic acids as metabolites in the microbial processes, or due to the partial removal of nonpolar bulk hydrocarbons. The IR spectra can reflect this increase directly in the parts of the spectra that reflect carbon−oxygen bonds. This is observed in the regression coefficients as having a positive effect on the model, and the trend is that the biodegraded oils have stronger absorbance in those areas than the oils that are not biodegraded, which further supports the interpretation that biodegradation is the major cause of variation in the modeled variables.

The condensate samples are strongly dominated by hydrocarbons in the low molecular range, so the differences in the hydrocarbon composition will be dominant in these samples. A larger sample set consisting of condensates only might reveal patterns of variation that are not obvious in the sample set used here.

Overall, GC- and IR-based PLS calibration models of properties that are important in MFM operation have shown good predictive quality, supporting their usefulness in oil production and transport facilities. A typical application would be to detect changes in oil composition during the production lifetime of an oil field and highlight the need for updating calibration values. Both analytical techniques provide data that give good model results. If MFM calibration is the major purpose, IR spectroscopy is the best approach. It is a much faster and easier measuring technique; also, portable measuring devices for IR spectroscopy already exist. GC measurements take several hours but are already in use for quality control in other contexts, and may therefore be a good choice in a combined flow assurance perspective.

■ APPENDIX A

Table A1 shows the variable coding and variable names for the gas chromatography data, as used in Figure 4.

Table A1. Variable Coding and Variable Names for the GC Data
variable coding | variable name | variable coding | variable name
iC5 | iso-pentane | nC8 | n-octane
nC5 | n-pentane | e-cyC6 | ethylcyclohexane
22dm-C4 | 2,2-dimethylbutane | i-C9 | iso-nonane
cyC5 | cyclopentane | e-benzene | ethylbenzene
23dm-C4 | 2,3-dimethylbutane | m-xylene | meta-xylene
2m-C5 | 2-methylpentane | p-xylene | para-xylene
3m-C5 | 3-methylpentane | 4m-C8 | 4-methyloctane
nC6 | n-hexane | 2m-C8 | 2-methyloctane
22dm-C5 | 2,2-dimethylpentane | 3m-C8 | 3-methyloctane
m-cyC5 | methylcyclopentane | o-xylene | ortho-xylene
24dm-C5 | 2,4-dimethylpentane | nC9 | n-nonane
223tm-C4 | 2,2,3-trimethylbutane | i-C10 | iso-decane
benzene | benzene | nC10 | n-decane
33dm-C5 | 3,3-dimethylpentane | i-C11 | iso-undecane
cyC6 | cyclohexane | nC11 | n-undecane
2m-C6 | 2-methylhexane | nC12 | n-dodecane
23dm-C5 | 2,3-dimethylpentane | i-C13 | iso-tridecane
11dm-cyC5 | 1,1-dimethylcyclopentane | i-C14 | iso-tetradecane
3m-C6 | 3-methylhexane | nC13 | n-tridecane
1c.3dm-cyC5 | cis-1,3-dimethylcyclopentane | i-C15 | iso-pentadecane
1t.3dm-cyC5 | trans-1,3-dimethylcyclopentane | nC14 | n-tetradecane
1t.2dm-cyC5 | trans-1,2-dimethylcyclopentane | i-C16 | iso-hexadecane
nC7 | n-heptane | nC15 | n-pentadecane
m-cyC6 | methylcyclohexane | nC16 | n-hexadecane
113tm-cyC5 | 1,1,3-trimethylcyclopentane | i-C18 | iso-octadecane
e-cyC5 | ethylcyclopentane | nC17 | n-heptadecane
25dm-C6 | 2,5-dimethylhexane | pristane | pristane
223tm-C5/24dm-C6 | 2,2,3-trimethylpentane/2,4-dimethylhexane | nC18 | n-octadecane
1c.2t.4tm-cyC5 | cis-1,trans-2,4-trimethylcyclopentane | phytane | phytane
33dmC6 | 3,3-dimethylhexane | nC19 | n-nonadecane
1t.2c.3tm-cyC5 | trans-1,cis-2,3-trimethylcyclopentane | nC20 | n-icosane
234tm-C5 | 2,3,4-trimethylpentane | nC21 | n-henicosane
Toluen/233tm-C5 | toluene/2,3,3-trimethylpentane | nC22 | n-docosane
23dm-C6 | 2,3-dimethylhexane | nC23 | n-tricosane
2m-C7 | 2-methylheptane | nC24 | n-tetracosane
4m-C7 | 4-methylheptane | nC25 | n-pentacosane
3m-C7 | 3-methylheptane | nC26 | n-hexacosane
1.c3dm-cyC6 | cis-1,3-dimethylcyclohexane | nC27 | n-heptacosane
1.14dm-cyC6 | trans-1,4-dimethylcyclohexane | nC28 | n-octacosane
11dm-cyC6 | 1,1-dimethylcyclohexane | nC29 | n-nonacosane
1t.2dm-cyC6 | trans-1,2-dimethylcyclohexane | nC30 | n-triacontane

■ AUTHOR INFORMATION
Corresponding Author
*E-mail: andreas.tomren@kj.uib.no.
Notes
The authors declare no competing financial interest.

■ ACKNOWLEDGMENTS
The funding for this work comes from the Norwegian Research Council through the Michelsen Centre for Research Based Innovation and from Norwegian industrial partner Roxar.

■ REFERENCES
(1) Satya, S.; Roehner, R. M.; Milind, D. D.; Hanson, F. V. Energy Fuels 2007, 21, 998−1005.
(2) Morris, R. E.; Hammond, M. H.; Cramer, J. A.; Johnson, K. J.; Giordano, B. C.; Kramer, K. E.; Rose-Pehrsson, S. L. Energy Fuels 2009, 23, 1610−1618.
(3) Statoil, Norway. Advanced analysis technology. Available online: http://www.statoil.com/en/technologyinnovation/refiningandprocessing/oilrefining/nir/pages/default.aspx (accessed March 1, 2012).
(4) Genov, G.; Nodland, E.; Skaare, B. B.; Barth, T. Org. Geochem. 2008, 39 (8), 1229−1234.
(5) Parisotto, G.; Ferrao, M.; Muller, A. L. H.; Muller, E. I.; Santos, M. F. P.; Guimaraes, R. C. L.; Dias, J. C. M.; Flores, E. M. M. Energy Fuels 2010, 24, 5474−5478.
(6) Abbas, O.; Rebufa, C.; Dupuy, N.; Permanyer, A.; Kister, J. Talanta 2008, 75, 857−871.
(7) Flumignan, D. L.; Ferreira, F. O.; Tininis, A. G.; de Oliveira, J. G. Chemom. Intell. Lab. Syst. 2008, 92, 53−60.
(8) Peinder, P.; Visser, T.; Petrauskas, D. D.; Salvatori, F.; Soulimani, F.; Weckhuysen, B. M. Vib. Spectrosc. 2009, 51, 205−212.
(9) Tomren, A. L.; Barth, T.; Folgerø, K. Unpublished results 2012.
(10) Falcone, G.; Harrison, B. Oil Gas J. 2011, 109 (10), 68−73.
(11) Thorn, R.; Johansen, G. A.; Hammer, E. A. In 1st World Conference on Industry Process Tomography, Buxton, Greater Manchester, U.K., 1999.
(12) Falcone, G.; Hewitt, G. F.; Alimonti, C.; Harrison, B. J. Pet. Technol. 2002, 54, 77.
(13) Cole, K. S.; Cole, R. H. J. Chem. Phys. 1941, 9, 341−351.
(14) Carlson, J. E.; Tomren, A. L.; Folgerø, K.; Barth, T. Chemom. Intell. Lab. Syst. 2012, submitted for publication.
(15) Wold, S.; Esbensen, K.; Geladi, P. Chemom. Intell. Lab. Syst. 1987, 2, 37.
(16) Hoskuldsson, A. J. Chemom. 1995, 9, 91.
(17) Peters, K. E.; Moldowan, J. M. The Biomarker Guide: Interpreting Molecular Fossils in Petroleum and Ancient Sediments, 1st ed.; Prentice Hall: Englewood Cliffs, NJ, 1993.
(18) Weiss, H. M.; Wilhelms, A.; Mills, N.; Scotchmer, J.; Hall, P. B.; Lind, K.; Brekke, T. NIGOGA, The Norwegian Industry Guide to Organic Geochemical Analyses, edition 4.0; Norsk Hydro, Statoil, Geolab Nor, SINTEF Petroleum Research, and the Norwegian Petroleum Directorate: Norway, 2000. Available online: http://www.npd.no/engelsk/nigoga/default.htm
(19) Bjørndal, E.; Frøysa, K. E.; Engeseth, S. A. IEEE Trans. Ultrason., Ferroelectr. Freq. Control 2008, 55 (8), 1794−1808.
(20) Halliday, D.; Resnick, R.; Walker, J. Fundamentals of Physics, 5th ed. extended; Wiley: Hoboken, NJ, 1997.
(21) Folgerø, K. PhD thesis, University of Bergen, 1996.
(22) Blomquist, G.; Johansson, E.; Söderström, B.; Wold, S. J. Chromatogr. 1979, 173, 7−17.
(23) Karrer, L. L.; Gordon, H. L.; Rothstein, S. M.; Miller, J. M.; Jones, T. R. B. Anal. Chem. 1983, 55, 1723−1728.
(24) SIRIUS, Pattern Recognition Systems AS (PRS AS), Version 7.0, 2004. Available online: http://www.prs.no/Sirius/Sirius.html (accessed March 28, 2012)
(25) Maryott, A. A.; Smith, E. R. Table of Dielectric Constants of Pure Liquids, Circular 514; United States Department of Commerce, National Bureau of Standards: Washington, DC, 1951.


dx.doi.org/10.1021/ef300620r | Energy Fuels 2012, 26, 5679−5688
