
dtscalibration

Release 0.6.3

Apr 09, 2019


Contents

1 Overview
  1.1 Installation
  1.2 Learn by examples
  1.3 Documentation

2 Installation

3 Usage

4 Learn by Examples
  4.1 1. Load your first measurement files
  4.2 2. Common DataStore functions
  4.3 3. Define calibration sections
  4.4 4. Calculate variance of Stokes and anti-Stokes measurements
  4.5 5. Calibration of single ended measurement with OLS
  4.6 6. Calibration of double ended measurement with OLS
  4.7 7. Calibration of single ended measurement with WLS and confidence intervals
  4.8 8. Calibration of double ended measurement with WLS and confidence intervals
  4.9 9. Import a time series
  4.10 10. Align double ended measurements

5 Reference
  5.1 dtscalibration

6 Contributing
  6.1 Bug reports
  6.2 Documentation improvements
  6.3 Feature requests and feedback
  6.4 Development

7 Authors

8 Changelog
  8.1 0.6.3 (2019-04-03)
  8.2 0.6.2 (2019-02-26)
  8.3 0.6.1 (2019-01-04)
  8.4 0.6.0 (2018-12-08)
  8.5 0.5.3 (2018-10-26)
  8.6 0.5.2 (2018-10-26)
  8.7 0.5.1 (2018-10-19)
  8.8 0.4.0 (2018-09-06)
  8.9 0.2.0 (2018-08-16)
  8.10 0.1.0 (2018-08-01)

9 Indices and tables

Python Module Index
CHAPTER 1

Overview

A Python package to load raw DTS files, perform a calibration, and plot the result.
• Free software: BSD 3-Clause License

1.1 Installation

pip install dtscalibration

Or the development version directly from GitHub:

pip install https://github.com/dtscalibration/python-dts-calibration/zipball/master --upgrade

1.2 Learn by examples

Interactively run the example notebooks online by clicking the launch-binder button.

1.3 Documentation

https://python-dts-calibration.readthedocs.io/

CHAPTER 2

Installation

At the command line:

pip install dtscalibration

CHAPTER 3

Usage

To use dtscalibration in a project:

import dtscalibration

CHAPTER 4

Learn by Examples

4.1 1. Load your first measurement files

This notebook is located in https://github.com/bdestombe/python-dts-calibration/tree/master/examples/notebooks


The goal of this notebook is to show the different options for loading measurements from raw DTS files. These files are loaded into a DataStore object, which has various methods for calibration and plotting. The currently supported devices are:

• Silixa
• Sensornet

This example loads Silixa files. Both single-ended and double-ended measurements are supported. The first step is to load the correct read routine from dtscalibration:

• Silixa -> dtscalibration.read_silixa_files
• Sensornet -> dtscalibration.read_sensornet_files

import os
import glob

from dtscalibration import read_silixa_files

The example data files are located in ./python-dts-calibration/tests/data.

filepath = os.path.join('..', '..', 'tests', 'data', 'double_ended2')


print(filepath)

../../tests/data/double_ended2

# Bonus: Just to show which files are in the folder


filepathlist = sorted(glob.glob(os.path.join(filepath, '*.xml')))
filenamelist = [os.path.basename(path) for path in filepathlist]

for fn in filenamelist:
    print(fn)


channel 1_20180328014052498.xml
channel 1_20180328014057119.xml
channel 1_20180328014101652.xml
channel 1_20180328014106243.xml
channel 1_20180328014110917.xml
channel 1_20180328014115480.xml

Define the timezone in which the measurements were taken. In this case the Silixa Ultima computer was set to ‘Europe/Amsterdam’. The default timezone of netCDF files is UTC. All steps after loading the raw files are performed in this timezone. Please see www..com for a full list of supported timezones. We also explicitly define the file extension (.xml) because the folder is polluted with files other than measurement files.
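Timezone names follow the IANA tz database. If you are unsure whether a name is valid, Python's standard library can list the supported names (a quick check, assuming Python 3.9+ where zoneinfo is available; this is not part of dtscalibration):

```python
from zoneinfo import available_timezones

# set of all IANA timezone names known to this Python installation
tz_names = available_timezones()
print('Europe/Amsterdam' in tz_names)
```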

ds = read_silixa_files(directory=filepath,
timezone_netcdf='UTC',
file_ext='*.xml')

6 files were found, each representing a single timestep


6 recorded vars were found: LAF, ST, AST, REV-ST, REV-AST, TMP
Recorded at 1693 points along the cable
The measurement is double ended
Reading the data from disk

The object tries to gather as much metadata from the measurement files as possible (temporal and spatial coordinates, filenames, temperature probe measurements). All other configuration settings are loaded from the first file and stored as attributes of the DataStore.

print(ds)

<dtscalibration.DataStore>
Sections: ()
Dimensions: (time: 6, x: 1693)
Coordinates:
  * x                 (x) float64 -80.5 -80.38 -80.25 ... 134.3 134.4 134.5
    filename          (time) <U31 'channel 1_20180328014052498.xml' ... 'channel 1_20180328014115480.xml'
    filename_tstamp   (time) int64 20180328014052498 ... 20180328014115480
    timeFWstart       (time) datetime64[ns] 2018-03-28T00:40:52.097000 ... 2018-03-28T00:41:15.061000
    timeFWend         (time) datetime64[ns] 2018-03-28T00:40:54.097000 ... 2018-03-28T00:41:17.061000
    timeFW            (time) datetime64[ns] 2018-03-28T00:40:53.097000 ... 2018-03-28T00:41:16.061000
    timeBWstart       (time) datetime64[ns] 2018-03-28T00:40:54.097000 ... 2018-03-28T00:41:17.061000
    timeBWend         (time) datetime64[ns] 2018-03-28T00:40:56.097000 ... 2018-03-28T00:41:19.061000
    timeBW            (time) datetime64[ns] 2018-03-28T00:40:55.097000 ... 2018-03-28T00:41:18.061000
    timestart         (time) datetime64[ns] 2018-03-28T00:40:52.097000 ... 2018-03-28T00:41:15.061000
    timeend           (time) datetime64[ns] 2018-03-28T00:40:56.097000 ... 2018-03-28T00:41:19.061000
  * time              (time) datetime64[ns] 2018-03-28T00:40:54.097000 ... 2018-03-28T00:41:17.061000
    acquisitiontimeFW (time) timedelta64[ns] 00:00:02 00:00:02 ... 00:00:02
    acquisitiontimeBW (time) timedelta64[ns] 00:00:02 00:00:02 ... 00:00:02
Data variables:
ST (x, time) float64 1.281 -0.5321 ... -43.44 -41.08
AST (x, time) float64 0.4917 1.243 ... -30.14 -32.09
REV-ST (x, time) float64 0.4086 -0.568 ... 4.822e+03
REV-AST (x, time) float64 2.569 -1.603 ... 4.224e+03
TMP (x, time) float64 196.1 639.1 218.7 ... 8.442 18.47
acquisitionTime (time) float32 2.098 2.075 2.076 2.133 2.085 2.062
referenceTemperature (time) float32 21.0536 21.054 ... 21.0531 21.057
probe1Temperature (time) float32 4.36149 4.36025 ... 4.36021 4.36118
probe2Temperature (time) float32 18.5792 18.5785 ... 18.5805 18.5723
referenceProbeVoltage (time) float32 0.121704 0.121704 ... 0.121705
probe1Voltage (time) float32 0.114 0.114 0.114 0.114 0.114 0.114
probe2Voltage (time) float32 0.121 0.121 0.121 0.121 0.121 0.121
userAcquisitionTimeFW (time) float32 2.0 2.0 2.0 2.0 2.0 2.0
userAcquisitionTimeBW (time) float32 2.0 2.0 2.0 2.0 2.0 2.0
Attributes:
uid: ...
nameWell: ...
nameWellbore: ...
name: ...
indexType: ...
startIndex:uom: ...
startIndex:#text: ...
endIndex:uom: ...
endIndex:#text: ...

.. and many more attributes. See: ds.attrs

4.2 2. Common DataStore functions

Examples of how to do some of the more commonly used functions:


1. mean, min, max, std
2. Selecting
3. Selecting by index
4. Downsample (time dimension)
5. Upsample / Interpolation (length and time dimension)

import os

from dtscalibration import read_silixa_files

First we load the raw measurements into a DataStore object, as we learned from the previous notebook.

filepath = os.path.join('..', '..', 'tests', 'data', 'single_ended')

ds = read_silixa_files(
directory=filepath,
timezone_netcdf='UTC',
file_ext='*.xml')


3 files were found, each representing a single timestep


4 recorded vars were found: LAF, ST, AST, TMP
Recorded at 1461 points along the cable
The measurement is single ended
Reading the data from disk

4.2.1 0 Access the data

The implemented read routines try to read as much data from the raw DTS files as possible. Usually they have coordinates (time and space) and Stokes and anti-Stokes measurements. We can access the data by key; it is presented as a DataArray. More examples are found at http://xarray.pydata.org/en/stable/indexing.html

ds['ST'] # is the data stored, presented as a DataArray

<xarray.DataArray 'ST' (x: 1461, time: 3)>


array([[-8.05791e-01, 4.28741e-01, -5.13021e-01],
[-4.58870e-01, -1.24484e-01, 9.68469e-03],
[ 4.89174e-01, -9.57734e-02, 5.62837e-02],
...,
[ 4.68457e+01, 4.72201e+01, 4.79139e+01],
[ 3.76634e+01, 3.74649e+01, 3.83160e+01],
[ 2.79879e+01, 2.78331e+01, 2.88055e+01]])
Coordinates:
  * x                 (x) float64 -80.74 -80.62 -80.49 ... 104.6 104.7 104.8
    filename          (time) <U31 'channel 2_20180504132202074.xml' ... 'channel 2_20180504132303723.xml'
    filename_tstamp   (time) int64 20180504132202074 ... 20180504132303723
    timestart         (time) datetime64[ns] 2018-05-04T12:22:02.710000 ... 2018-05-04T12:23:03.716000
    timeend           (time) datetime64[ns] 2018-05-04T12:22:32.710000 ... 2018-05-04T12:23:33.716000
  * time              (time) datetime64[ns] 2018-05-04T12:22:17.710000 ... 2018-05-04T12:23:18.716000
    acquisitiontimeFW (time) timedelta64[ns] 00:00:30 00:00:30 00:00:30
Attributes:
name: ST
description: Stokes intensity
units: -

ds['TMP'].plot(figsize=(12, 8));

4.2.2 1 mean, min, max, std

The first argument is the dimension along which the function is applied. dim can be any dimension (e.g., time, x). The returned DataStore no longer contains that dimension.
Normally you would like to keep the attributes (the informative texts from the loaded files), so set keep_attrs to True. They don’t take any space compared to your Stokes data, so keep them.
Note that the sections are also stored as an attribute. If you delete the attributes, you have to redefine the sections.

# Take the mean of all data variables (e.g., Stokes, temperature) along the time dimension
ds_mean = ds.mean(dim='time', keep_attrs=True)

# Take the maximum of all data variables along the x dimension
ds_max = ds.max(dim='x', keep_attrs=True)

# Calculate the standard deviation along the time dimension
ds_std = ds.std(dim='time', keep_attrs=True)

4.2.3 2 Selecting

What if you would like to get the maximum temperature between 𝑥 >= 20 m and 𝑥 < 35 m over time? We first have
to select a section along the cable.

section = slice(20., 35.)


section_of_interest = ds.sel(x=section)

section_of_interest_max = section_of_interest.max(dim='x')

What if you would like to have the measurement at approximately 𝑥 = 20 m?

point_of_interest = ds.sel(x=20., method='nearest')

4.2.4 3 Selecting by index

What if you would like to see what the values on the first timestep are? We can use isel (index select)

section_of_interest = ds.isel(time=slice(0, 2)) # The first two time steps

section_of_interest = ds.isel(x=0)

4.2.5 4 Downsample (time dimension)

We currently have measurements at 3 time steps, with 30.001 seconds in between. For our next exercise we would like to downsample the measurements to 2 time steps with 47 seconds in between. The calculated variances are not valid anymore. We use the function resample_datastore.

ds_resampled = ds.resample_datastore(how='mean', time="47S")

4.2.6 5 Upsample / Interpolation (length and time dimension)

So we have measurements approximately every 0.12 m along the cable. What if we would like to shift our coordinate system to have a value every 0.12 m starting at 𝑥 = 0.05 m? We use (linear) interpolation; extrapolation is not supported. The calculated variances are not valid anymore.

x_old = ds.x.data
x_new = x_old[:-1] + 0.05 # no extrapolation
ds_xinterped = ds.interp(coords={'x': x_new})

We can do the same in the time dimension


import numpy as np
time_old = ds.time.data
time_new = time_old + np.timedelta64(10, 's')
ds_tinterped = ds.interp(coords={'time': time_new})

4.3 3. Define calibration sections

The goal of this notebook is to show how you can define calibration sections. That means that we assign certain parts of the fiber to a time series of reference temperature measurements. Here, we assume the temperature time series is already part of the DataStore object.
import os

from dtscalibration import read_silixa_files

filepath = os.path.join('..', '..', 'tests', 'data', 'double_ended2')


ds = read_silixa_files(
directory=filepath,
timezone_netcdf='UTC',
file_ext='*.xml')

6 files were found, each representing a single timestep


6 recorded vars were found: LAF, ST, AST, REV-ST, REV-AST, TMP
Recorded at 1693 points along the cable
The measurement is double ended
Reading the data from disk

First we have a look at which temperature timeseries are available for calibration. Therefore we access ds.data_vars and we find probe1Temperature and probe2Temperature, which refer to the temperature measurement timeseries of the two probes attached to the Ultima.
Alternatively, we can access the ds.timeseries_keys property to list all timeseries that can be used for calibration.

print(ds.timeseries_keys)    # list the available timeseries
ds.probe1Temperature.plot(figsize=(12, 8));    # plot one of the timeseries

['acquisitionTime', 'referenceTemperature', 'probe1Temperature', 'probe2Temperature',
 'referenceProbeVoltage', 'probe1Voltage', 'probe2Voltage', 'userAcquisitionTimeFW',
 'userAcquisitionTimeBW']


A calibration is needed to estimate temperature from Stokes and anti-Stokes measurements. There are three unknowns for a single ended calibration procedure: 𝛾, 𝐶, and 𝛼. The parameters 𝛾 and 𝛼 remain constant over time, while 𝐶 may vary.
At least two calibration sections of different temperatures are needed to perform a decent calibration procedure.
This setup has two baths, named ‘cold’ and ‘warm’. Each bath has 2 sections. probe1Temperature is the
temperature timeseries of the cold bath and probe2Temperature is the temperature timeseries of the warm bath.

Section      Reference temperature time series   Number of sections   Location of sections (m)
Cold bath    probe1Temperature                   2                    7.5-17.0; 70.0-80.0
Warm bath    probe2Temperature                   2                    24.0-34.0; 85.0-95.0

Sections are defined in a dictionary whose keys are the names of the reference temperature time series and whose values are lists of slice objects, where each slice object is a section.
Note that slice is part of the standard Python library and no import is required.

sections = {
'probe1Temperature': [slice(7.5, 17.), slice(70., 80.)], # cold bath
'probe2Temperature': [slice(24., 34.), slice(85., 95.)], # warm bath
}
ds.sections = sections

ds.sections

{'probe1Temperature': [slice(7.5, 17.0, None), slice(70.0, 80.0, None)],
 'probe2Temperature': [slice(24.0, 34.0, None), slice(85.0, 95.0, None)]}

NetCDF files do not support reading/writing Python dictionaries. Internally, the sections dictionary is stored in ds._sections as a string encoded with yaml, which can be saved to a netCDF file. Each time the sections dictionary is requested, yaml decodes the string and evaluates it to the Python dictionary.
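The idea can be sketched with plain yaml. This is an illustrative round-trip (encoding each slice as a (start, stop) pair), not dtscalibration's exact internal encoding:

```python
import yaml  # PyYAML

sections = {'probe1Temperature': [slice(7.5, 17.0), slice(70.0, 80.0)]}

# encode: represent each slice by its (start, stop) pair so yaml can store it as a string
encoded = yaml.dump({k: [[s.start, s.stop] for s in v] for k, v in sections.items()})

# decode: rebuild the slice objects from the stored pairs
decoded = {k: [slice(a, b) for a, b in v]
           for k, v in yaml.safe_load(encoded).items()}
```

After the round-trip, decoded compares equal to the original sections dictionary, which is exactly what is needed to persist sections in a netCDF attribute.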

4.4 4. Calculate variance of Stokes and anti-Stokes measurements

The goal of this notebook is to estimate the variance of the noise in the Stokes measurement. The measured Stokes and anti-Stokes signals contain noise that is approximately normally distributed. We need to estimate the variance of the noise to:

• Perform a weighted calibration
• Construct confidence intervals
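To see why the noise variance matters, here is a minimal weighted least squares sketch on a synthetic straight-line fit (this is not the dtscalibration solver; the data and model are invented for illustration). Each measurement is weighted by the inverse of its noise variance:

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(0.0, 10.0, 50)
noise_var = 4.0                                   # known measurement-noise variance
y = 2.0 * x + 1.0 + rng.normal(0.0, noise_var ** 0.5, x.size)

# design matrix for the model y = slope * x + intercept
A = np.vstack([x, np.ones_like(x)]).T
w = np.full(x.size, 1.0 / noise_var)              # weights = 1 / variance

# normal equations of weighted least squares: (A^T W A) p = A^T W y
AtWA = A.T @ (A * w[:, None])
AtWy = A.T @ (w * y)
slope, intercept = np.linalg.solve(AtWA, AtWy)

# parameter covariance matrix, needed later for confidence intervals
p_cov = np.linalg.inv(AtWA)
```

With correct weights, p_cov gives an honest estimate of the parameter uncertainty, which is what the confidence-interval notebooks below build on.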

import os

from dtscalibration import read_silixa_files


from matplotlib import pyplot as plt

%matplotlib inline

filepath = os.path.join('..', '..', 'tests', 'data', 'double_ended2')

ds = read_silixa_files(
directory=filepath,
timezone_netcdf='UTC',
file_ext='*.xml')


6 files were found, each representing a single timestep


6 recorded vars were found: LAF, ST, AST, REV-ST, REV-AST, TMP
Recorded at 1693 points along the cable
The measurement is double ended
Reading the data from disk

And we define the sections as we learned from the previous notebook. Sections are required to calculate the variance in the Stokes signal.

sections = {
'probe1Temperature': [slice(7.5, 17.), slice(70., 80.)], # cold bath
'probe2Temperature': [slice(24., 34.), slice(85., 95.)], # warm bath
}
ds.sections = sections

Let’s first read the documentation of the ds.variance_stokes method.

print(ds.variance_stokes.__doc__)

Calculates the variance between the measurements and a best fit


at each reference section. This fits a function to the nt * nx
measurements with ns * nt + nx parameters, where nx are the total
number of observation locations along all sections. The temperature is
constant along the reference sections, so the expression of the
Stokes power can be split in a time series per reference section and
a constant per observation location.

Assumptions: 1) the temperature is the same along a reference


section.

Idea from discussion at page 127 in Richter, P. H. (1995). Estimating


errors in least-squares fitting.

Parameters
----------
reshape_residuals
st_label : str
label of the Stokes, anti-Stokes measurement.
E.g., ST, AST, REV-ST, REV-AST
sections : dict, optional
Define sections. See documentation

Returns
-------
I_var : float
Variance of the residuals between measured and best fit
resid : array_like
Residuals between measured and best fit

Notes
-----
Because there are a large number of unknowns, spend time on
calculating an initial estimate. Can be turned off by setting to False.

I_var, residuals = ds.variance_stokes(st_label='ST')

print("The variance of the Stokes signal along the reference sections "
      "is approximately {} on a {} sec acquisition time".format(
          I_var, ds.userAcquisitionTimeFW.data[0]))

The variance of the Stokes signal along the reference sections is approximately 8.181920419777416 on a 2.0 sec acquisition time

from dtscalibration import plot

fig_handle = plot.plot_residuals_reference_sections(
residuals,
sections,
title='Distribution of the noise in the Stokes signal',
plot_avg_std=I_var ** 0.5,
plot_names=True,
robust=True,
units='',
method='single')



The residuals should be normally distributed and independent from previous time steps and other points along the cable. If you observe patterns in the residuals plot (above), it might be caused by:

• The temperature in the calibration bath is not uniform
• Attenuation caused by coils/sharp bends in the cable
• Attenuation caused by a splice

import scipy
import numpy as np

sigma = residuals.std()
mean = residuals.mean()
x = np.linspace(mean - 3*sigma, mean + 3*sigma, 100)
approximated_normal_fit = scipy.stats.norm.pdf(x, mean, sigma)
residuals.plot.hist(bins=50, figsize=(12, 8), density=True)
plt.plot(x, approximated_normal_fit);
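As a numerical companion to the histogram above, a normality test can flag non-normal residuals. The sketch below uses synthetic residuals standing in for the real ones (which come from variance_stokes); the variance value is borrowed from the output above:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# synthetic stand-in for the Stokes residuals, with the variance found above (~8.18)
residuals = rng.normal(loc=0.0, scale=8.18 ** 0.5, size=10_000)

# D'Agostino-Pearson test: a large p-value means no evidence against normality
stat, p = stats.normaltest(residuals)
print(f"statistic: {stat:.2f}, p-value: {p:.3f}")
```

If the p-value is very small for your actual residuals, revisit the possible causes listed above before trusting the variance estimate.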


We can follow the same steps to calculate the variance from the noise in the anti-Stokes measurements by setting st_label='AST' and redoing the steps.

4.5 5. Calibration of single ended measurement with OLS

A single ended calibration is performed with ordinary least squares, over all timesteps simultaneously. 𝛾 and 𝛼 remain constant, while 𝐶 varies over time. The weights are considered equal here and no variance or confidence interval is calculated.
Note that the internal reference section cannot be used, since there is a connector between the internal and external fiber and therefore the integrated differential attenuation cannot be considered linear anymore.

import os

from dtscalibration import read_silixa_files


import matplotlib.pyplot as plt

%matplotlib inline

filepath = os.path.join('..', '..', 'tests', 'data', 'single_ended')

ds = read_silixa_files(
directory=filepath,
timezone_netcdf='UTC',
file_ext='*.xml')

ds100 = ds.sel(x=slice(-30, 101)) # only calibrate parts of the fiber, in meters
sections = {
'probe1Temperature': [slice(20, 25.5)], # warm bath
'probe2Temperature': [slice(5.5, 15.5)], # cold bath
}
ds100.sections = sections

3 files were found, each representing a single timestep


4 recorded vars were found: LAF, ST, AST, TMP
Recorded at 1461 points along the cable
The measurement is single ended
Reading the data from disk

print(ds100.calibration_single_ended.__doc__)

Parameters
----------
store_p_cov : str
Key to store the covariance matrix of the calibrated parameters
store_p_val : str
Key to store the values of the calibrated parameters
nt : int, optional
Number of timesteps. Should be defined if method=='external'
z : array-like, optional
Distances. Should be defined if method=='external'
p_val
p_var
p_cov
sections : dict, optional
st_label : str
Label of the forward stokes measurement
ast_label : str
Label of the anti-Stoke measurement
st_var : float, optional
The variance of the measurement noise of the Stokes signals in
the forward
direction Required if method is wls.
ast_var : float, optional
The variance of the measurement noise of the anti-Stokes signals
in the forward
direction. Required if method is wls.
store_c : str
Label of where to store C
store_gamma : str
Label of where to store gamma
store_dalpha : str
Label of where to store dalpha; the spatial derivative of alpha.
store_alpha : str
Label of where to store alpha; The integrated differential
attenuation.
alpha(x=0) = 0
store_tmpf : str
Label of where to store the calibrated temperature of the forward
direction
variance_suffix : str, optional
String appended for storing the variance. Only used when method
is wls.
method : {'ols', 'wls'}
Use 'ols' for ordinary least squares and 'wls' for weighted least
squares
solver : {'sparse', 'stats'}
Either use the homemade weighted sparse solver or the weighted
dense matrix solver of
statsmodels

Returns
-------

ds100.calibration_single_ended(st_label='ST',
ast_label='AST',
method='ols')

Let’s compare our calibrated values with the device calibration.

ds1 = ds100.isel(time=0)  # take only the first timestep

ds1.TMPF.plot(linewidth=1, figsize=(12, 8), label='User calibrated')  # temperature calibrated by us
ds1.TMP.plot(linewidth=1, label='Device calibrated')  # temperature calibrated by the device

plt.title('Temperature at the first time step')
plt.legend();

4.6 6. Calibration of double ended measurement with OLS


A double ended calibration is performed with ordinary least squares, over all timesteps simultaneously. 𝛾 and ∫₀ˡ 𝛼 d𝑥 remain constant, while 𝐶 varies over time. The weights are considered equal here and no variance is calculated.
Before starting the calibration procedure, the forward and the backward channel should be aligned.
import os

from dtscalibration import read_silixa_files


import matplotlib.pyplot as plt
%matplotlib inline

filepath = os.path.join('..', '..', 'tests', 'data', 'double_ended2')

ds = read_silixa_files(
directory=filepath,
timezone_netcdf='UTC',
file_ext='*.xml')

ds100 = ds.sel(x=slice(0, 100)) # only calibrate parts of the fiber


sections = {
'probe1Temperature': [slice(7.5, 17.), slice(70., 80.)], # cold bath
'probe2Temperature': [slice(24., 34.), slice(85., 95.)], # warm bath
}

6 files were found, each representing a single timestep


6 recorded vars were found: LAF, ST, AST, REV-ST, REV-AST, TMP
Recorded at 1693 points along the cable
The measurement is double ended
Reading the data from disk

print(ds100.calibration_double_ended.__doc__)

Parameters
----------
store_p_cov
store_p_val
nt
z
p_val
p_var
p_cov
sections : dict, optional
st_label : str
Label of the forward stokes measurement
ast_label : str
Label of the anti-Stoke measurement
rst_label : str
Label of the reversed Stoke measurement
rast_label : str
Label of the reversed anti-Stoke measurement
st_var : float, optional
The variance of the measurement noise of the Stokes signals in
the forward
direction Required if method is wls.
ast_var : float, optional
The variance of the measurement noise of the anti-Stokes signals
in the forward
direction. Required if method is wls.
rst_var : float, optional
The variance of the measurement noise of the Stokes signals in
the backward
direction. Required if method is wls.
rast_var : float, optional
The variance of the measurement noise of the anti-Stokes signals
in the backward
direction. Required if method is wls.
store_d : str
Label of where to store D. Equals the integrated differential
attenuation at x=0
And should be equal to half the total integrated differential
attenuation.
store_gamma : str
Label of where to store gamma
store_alpha : str
Label of where to store alpha
store_tmpf : str
Label of where to store the calibrated temperature of the forward
direction
store_tmpb : str
Label of where to store the calibrated temperature of the
backward direction
store_tmpw : str
tmpw_mc_size : int
variance_suffix : str, optional
String appended for storing the variance. Only used when method
is wls.
method : {'ols', 'wls', 'external'}
Use 'ols' for ordinary least squares and 'wls' for weighted least
squares
solver : {'sparse', 'stats'}
Either use the homemade weighted sparse solver or the weighted
dense matrix solver of
statsmodels

Returns
-------

st_label = 'ST'
ast_label = 'AST'
rst_label = 'REV-ST'
rast_label = 'REV-AST'
ds100.calibration_double_ended(sections=sections,
st_label=st_label,
ast_label=ast_label,
rst_label=rst_label,
rast_label=rast_label,
method='ols')

After calibration, two data variables are added to the DataStore object:

• TMPF, temperature calculated along the forward direction
• TMPB, temperature calculated along the backward direction

A better estimate of the temperature along the fiber, with a lower expected variance, is the average of the two. We cannot weigh one more than the other, as we have no information to support a different weighing.
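The variance argument can be checked numerically. The sketch below uses synthetic forward and backward estimates with independent noise, standing in for TMPF and TMPB:

```python
import numpy as np

rng = np.random.default_rng(1)
true_T = 20.0
n = 10_000

tmpf = true_T + rng.normal(0.0, 0.5, n)  # forward estimate with noise
tmpb = true_T + rng.normal(0.0, 0.5, n)  # backward estimate with independent noise
tmpw = (tmpf + tmpb) / 2                 # equal-weight average

# averaging two independent, equal-variance estimates halves the variance
print(tmpf.var(), tmpw.var())
```

The averaged series has roughly half the variance of either channel alone, which is why the weighted-average temperature is preferred in the notebooks that follow.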

ds1 = ds100.isel(time=0)  # take only the first timestep

ds1.TMPF.plot(linewidth=1, label='User cali. Forward', figsize=(12, 8))  # temperature calibrated by us
ds1.TMPB.plot(linewidth=1, label='User cali. Backward')  # temperature calibrated by us
ds1.TMP.plot(linewidth=1, label='Device calibrated')  # temperature calibrated by the device

plt.legend();

Let’s compare our calibrated values with the device calibration, averaging the temperature of the forward channel and the backward channel first.

ds1['TMPAVG'] = (ds1.TMPF + ds1.TMPB) / 2


ds1_diff = ds1.TMP - ds1.TMPAVG

ds1_diff.plot(figsize=(12, 8));


The device calibration sections differ from the sections we defined: the device only allows two sections, one per thermometer, and most likely 𝛾 is fixed in the device calibration.

4.7 7. Calibration of single ended measurement with WLS and confidence intervals

A single ended calibration is performed with weighted least squares, over all timesteps simultaneously. 𝛾 and 𝛼 remain constant, while 𝐶 varies over time. The weights are not considered equal here: they decrease quadratically with the signal strength of the measured Stokes and anti-Stokes signals.
Because the weights are correctly defined, confidence intervals can be calculated. They comprise two sources of uncertainty:
1. Measurement noise in the measured Stokes and anti-Stokes signals, expressed in a single variance value.
2. Inherent to least squares procedures / overdetermined systems, the parameters are estimated with limited certainty and all parameters are correlated, which is expressed in the covariance matrix.
Both sources of uncertainty are propagated to an uncertainty in the estimated temperature via Monte Carlo.
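The Monte Carlo idea can be sketched as follows. Note this is a toy model: the parameter values, covariance, ratio, and the temperature expression are invented for illustration and are not dtscalibration's actual calibration model:

```python
import numpy as np

rng = np.random.default_rng(42)

# hypothetical parameter estimates (gamma, C) and their covariance matrix
p_mean = np.array([482.6, 1.47])
p_cov = np.array([[4.0e-2, 1.0e-3],
                  [1.0e-3, 2.0e-4]])

def temperature(p, ratio):
    """Toy single-ended expression: T = gamma / (C - ln(Stokes / anti-Stokes))."""
    gamma, C = p
    return gamma / (C - np.log(ratio)) - 273.15  # degrees Celsius

ratio = 0.85          # assumed Stokes / anti-Stokes ratio at one location
noise_var = 1.0e-5    # assumed measurement-noise variance of that ratio

# 1) sample parameter sets from the covariance; 2) add measurement noise
p_samples = rng.multivariate_normal(p_mean, p_cov, size=5000)
ratio_samples = ratio + rng.normal(0.0, noise_var ** 0.5, size=5000)
T_mc = np.array([temperature(p, r) for p, r in zip(p_samples, ratio_samples)])

# empirical 95% confidence interval of the estimated temperature
ci_low, ci_high = np.percentile(T_mc, [2.5, 97.5])
```

Both uncertainty sources enter the sampled temperatures, so the percentile interval reflects their combined effect, which is the same strategy the notebook applies with the real calibration model.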

import os

from dtscalibration import read_silixa_files


import matplotlib.pyplot as plt
%matplotlib inline


filepath = os.path.join('..', '..', 'tests', 'data', 'single_ended')

ds = read_silixa_files(
    directory=filepath,
    timezone_netcdf='UTC',
    file_ext='*.xml')

ds = ds.sel(x=slice(-30, 101))  # only calibrate parts of the fiber

sections = {
    'probe1Temperature': [slice(20, 25.5)],  # warm bath
    'probe2Temperature': [slice(5.5, 15.5)],  # cold bath
    # 'referenceTemperature': [slice(-24., -4)]  # The internal coil is not so uniform
}
ds.sections = sections

3 files were found, each representing a single timestep


4 recorded vars were found: LAF, ST, AST, TMP
Recorded at 1461 points along the cable
The measurement is single ended
Reading the data from disk

print(ds.calibration_single_ended.__doc__)

Parameters
----------
store_p_cov : str
    Key to store the covariance matrix of the calibrated parameters
store_p_val : str
    Key to store the values of the calibrated parameters
nt : int, optional
    Number of timesteps. Should be defined if method=='external'
z : array-like, optional
    Distances. Should be defined if method=='external'
p_val
p_var
p_cov
sections : dict, optional
st_label : str
    Label of the forward stokes measurement
ast_label : str
    Label of the anti-Stoke measurement
st_var : float, optional
    The variance of the measurement noise of the Stokes signals in
    the forward direction. Required if method is wls.
ast_var : float, optional
    The variance of the measurement noise of the anti-Stokes signals
    in the forward direction. Required if method is wls.
store_c : str
    Label of where to store C
store_gamma : str
    Label of where to store gamma
store_dalpha : str
    Label of where to store dalpha; the spatial derivative of alpha.
store_alpha : str
    Label of where to store alpha; the integrated differential
    attenuation. alpha(x=0) = 0
store_tmpf : str
    Label of where to store the calibrated temperature of the forward
    direction
variance_suffix : str, optional
    String appended for storing the variance. Only used when method
    is wls.
method : {'ols', 'wls'}
    Use 'ols' for ordinary least squares and 'wls' for weighted least
    squares
solver : {'sparse', 'stats'}
    Either use the homemade weighted sparse solver or the weighted
    dense matrix solver of statsmodels

Returns
-------

st_label = 'ST'
ast_label = 'AST'

First calculate the variance in the measured Stokes and anti-Stokes signals. The Stokes and anti-Stokes signals should follow a smooth decaying exponential. This function fits a decaying exponential to each reference section for each time step. The variance of the residuals between the measured signals and the fitted signals is used as an estimate of the variance in the measured signals.
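The fitting idea can be sketched with synthetic data. This is only an illustration of the principle, not the library's exact implementation, and all numbers below are made up:

```python
import numpy as np

# Sketch of the noise-variance estimate: fit a decaying exponential to the
# Stokes intensity over a reference section and take the variance of the
# residuals. Positions, decay rate and noise level are synthetic.
rng = np.random.default_rng(0)
x = np.linspace(20., 25.5, 50)                                 # positions in a warm bath (m)
st = 4000. * np.exp(-0.002 * x) + rng.normal(0., 5., x.size)   # synthetic Stokes signal

# A decaying exponential is linear in log-space, so an order-1 polyfit suffices.
coef = np.polyfit(x, np.log(st), 1)
st_fit = np.exp(np.polyval(coef, x))
st_var_est = np.var(st - st_fit)
print(st_var_est)  # close to the injected noise variance of 5**2 = 25
```

The residual variance recovers the injected noise level because the fitted exponential absorbs the smooth attenuation trend, leaving (mostly) noise.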

st_var, resid = ds.variance_stokes(st_label=st_label)


ast_var, _ = ds.variance_stokes(st_label=ast_label)

Similar to the ols procedure, we make a single function call to calibrate the temperature. If the method is wls and confidence intervals are passed to conf_ints, the confidence intervals are calculated. As weights are correctly passed to the least squares procedure, the covariance matrix can be used. This matrix holds the covariances between all the parameters. A large parameter set is generated from this matrix, assuming the parameter space is normally distributed with its mean at the best estimate of the least squares procedure.
The large parameter set is used to calculate a large set of temperatures. By using percentiles or quantiles, the 95% confidence interval of the calibrated temperature, between 2.5% and 97.5%, is calculated.
The confidence intervals differ per time step. If you would like to calculate confidence intervals of all time steps together, you have the option ci_avg_time_flag=True: ‘We can say with 95% confidence that the temperature remained between this line and this line during the entire measurement period’.
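The Monte Carlo step described above can be sketched as follows. The parameter values, the covariance matrix and the temperature model are all made up; the model is only a simplified stand-in resembling the calibration equation:

```python
import numpy as np

# Draw parameter sets from a normal distribution defined by the least-squares
# estimate (mean) and its covariance matrix, evaluate the model for each draw,
# and take percentiles of the resulting temperatures.
rng = np.random.default_rng(42)
p_best = np.array([482.6, 1.465])        # e.g. gamma and C (illustrative values)
p_cov = np.array([[0.04, 1e-4],
                  [1e-4, 5e-7]])         # covariance of the estimated parameters

p_mc = rng.multivariate_normal(p_best, p_cov, size=500)   # (500, 2) parameter sets

def temperature(p):
    # Hypothetical model resembling T = gamma / (ln(ST/AST) + C), in deg C
    gamma, c = p[..., 0], p[..., 1]
    return gamma / (np.log(4000. / 3300.) + c) - 273.15

t_mc = temperature(p_mc)
ci_low, ci_high = np.percentile(t_mc, [2.5, 97.5])
print(ci_low, ci_high)
```

Because the draws respect the full covariance matrix, correlated parameter errors partially cancel in the temperature, which a naive per-parameter error propagation would miss.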

ds.calibration_single_ended(sections=sections,
                            st_label=st_label,
                            ast_label=ast_label,
                            st_var=st_var,
                            ast_var=ast_var,
                            method='wls',
                            solver='sparse',
                            store_p_val='p_val',
                            store_p_cov='p_cov'
                            )


ds.conf_int_single_ended(
    p_val='p_val',
    p_cov='p_cov',
    st_label=st_label,
    ast_label=ast_label,
    st_var=st_var,
    ast_var=ast_var,
    store_tmpf='TMPF',
    store_tempvar='_var',
    conf_ints=[2.5, 97.5],
    mc_sample_size=500,
    ci_avg_time_flag=False)

Let's compare our calibrated values with the device calibration.

ds1 = ds.isel(time=0)  # take only the first timestep

ds1.TMPF.plot(linewidth=0.8, figsize=(12, 8), label='User calibrated')  # plot the temperature calibrated by us
ds1.TMP.plot(linewidth=0.8, label='Device calibrated')  # plot the temperature calibrated by the device
ds1.TMPF_MC.plot(linewidth=0.8, hue='CI', label='CI device')

plt.title('Temperature at the first time step')
plt.legend();


ds.TMPF_MC_var.plot(figsize=(12, 8));

ds1.TMPF_MC.sel(CI=2.5).plot(label='2.5% CI', figsize=(12, 8))
ds1.TMPF_MC.sel(CI=97.5).plot(label='97.5% CI')
ds1.TMPF.plot(label='User calibrated')
plt.title('User calibrated temperature with 95% confidence interval')
plt.legend();


We can tell from the graph above that the 95% confidence interval widens further down the cable. Let's have a look at the calculated variance along the cable for a single timestep. According to the device manufacturer this should be around 0.0059 degC.

ds1.TMPF_MC_var.plot(figsize=(12, 8));


The variance of the temperature measurement appears to be larger than what the manufacturer reports. This is already the case for the internal cable; it is not caused by a dirty connector/bad splice on our side. Maybe the length of the calibration section was not sufficient.
At 30 m the variance sharply increases. There are several possible explanations, e.g., high temperatures or decreased signal strength.
Let's have a look at the Stokes and anti-Stokes signals.

ds1.ST.plot(figsize=(12, 8))
ds1.AST.plot();


Clearly there was a bad splice at 30 m that resulted in the sharp increase of measurement uncertainty for the cable
section after the bad splice.

4.8 8. Calibration of double ended measurement with WLS and confidence intervals

4.8.1 Calibration procedure

A double ended calibration is performed with weighted least squares, over all timesteps simultaneously. 𝛾 and 𝛼 remain constant, while 𝐶 varies over time. The weights are not considered equal here: they decrease quadratically with the signal strength of the measured Stokes and anti-Stokes signals.
The confidence intervals can be calculated because the weights are correctly defined.
The confidence intervals consist of two sources of uncertainty.
1. Measurement noise in the measured Stokes and anti-Stokes signals, expressed in a single variance value.
2. Inherent to least squares procedures / overdetermined systems, the parameters are estimated with limited certainty and all parameters are correlated. This is expressed in the covariance matrix.
Both sources of uncertainty are propagated to an uncertainty in the estimated temperature via Monte Carlo.
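The weighted least squares idea itself is simple to sketch: scale each row of the design matrix and the observations by the reciprocal noise standard deviation and solve the ordinary problem. The system below is a toy stand-in, not the actual calibration matrices:

```python
import numpy as np

# Toy weighted least squares with heteroscedastic noise: observations with
# more noise get less weight. A, b, x_true and sigma are all made up.
rng = np.random.default_rng(1)
A = np.column_stack([np.ones(100), np.linspace(0., 1., 100)])
x_true = np.array([2.0, -0.5])
sigma = np.linspace(0.01, 0.1, 100)    # noise standard deviation grows along the fiber
b = A @ x_true + rng.normal(0., sigma)

w = 1.0 / sigma**2                     # weights: inverse variance
Aw = A * np.sqrt(w)[:, None]           # scale rows of the design matrix
bw = b * np.sqrt(w)                    # scale the observations
x_wls, *_ = np.linalg.lstsq(Aw, bw, rcond=None)
print(x_wls)  # close to x_true
```

Scaling by the square root of the weights turns the weighted problem into an ordinary one, which is why a standard (sparse) least squares solver can be reused.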

import os

from dtscalibration import read_silixa_files

import matplotlib.pyplot as plt
%matplotlib inline

filepath = os.path.join('..', '..', 'tests', 'data', 'double_ended2')

ds_ = read_silixa_files(
    directory=filepath,
    timezone_netcdf='UTC',
    file_ext='*.xml')

ds = ds_.sel(x=slice(0, 100))  # only calibrate parts of the fiber

sections = {
    'probe1Temperature': [slice(7.5, 17.), slice(70., 80.)],  # cold bath
    'probe2Temperature': [slice(24., 34.), slice(85., 95.)],  # warm bath
}
ds.sections = sections

6 files were found, each representing a single timestep


6 recorded vars were found: LAF, ST, AST, REV-ST, REV-AST, TMP
Recorded at 1693 points along the cable
The measurement is double ended
Reading the data from disk

st_label = 'ST'
ast_label = 'AST'
rst_label = 'REV-ST'
rast_label = 'REV-AST'

First calculate the variance in the measured Stokes and anti-Stokes signals, in the forward and backward directions. The Stokes and anti-Stokes signals should follow a smooth decaying exponential. This function fits a decaying exponential to each reference section for each time step. The variance of the residuals between the measured signals and the fitted signals is used as an estimate of the variance in the measured signals.

st_var, resid = ds.variance_stokes(st_label=st_label)


ast_var, _ = ds.variance_stokes(st_label=ast_label)
rst_var, _ = ds.variance_stokes(st_label=rst_label)
rast_var, _ = ds.variance_stokes(st_label=rast_label)

resid.plot(figsize=(12, 8));


We calibrate the measurement with a single method call. The labels refer to the keys in the DataStore object containing the Stokes, anti-Stokes, reverse Stokes and reverse anti-Stokes signals. The variance in those measurements was calculated in the previous step. We use a sparse solver because it saves memory.

ds.calibration_double_ended(
    st_label=st_label,
    ast_label=ast_label,
    rst_label=rst_label,
    rast_label=rast_label,
    st_var=st_var,
    ast_var=ast_var,
    rst_var=rst_var,
    rast_var=rast_var,
    store_tmpw='TMPW',
    method='wls',
    solver='sparse')

ds.TMPW.plot()

<matplotlib.collections.QuadMesh at 0x11e485e48>


4.8.2 Confidence intervals

With another method call we estimate the confidence intervals. If the method is wls and confidence intervals are passed to conf_ints, the confidence intervals are calculated. As weights are correctly passed to the least squares procedure, the covariance matrix can be used as an estimator for the uncertainty in the parameters. This matrix holds the covariances between all the parameters. A large parameter set is generated from this matrix as part of the Monte Carlo routine, assuming the parameter space is normally distributed with its mean at the best estimate of the least squares procedure.
The large parameter set is used to calculate a large set of temperatures. By using percentiles or quantiles, the 95% confidence interval of the calibrated temperature, between 2.5% and 97.5%, is calculated.
The confidence intervals differ per time step. If you would like to calculate confidence intervals of all time steps together, you have the option ci_avg_time_flag=True: ‘We can say with 95% confidence that the temperature remained between this line and this line during the entire measurement period’. This is ideal if you’d like to calculate the background temperature with a confidence interval.
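The difference between per-time-step intervals and ci_avg_time_flag=True can be illustrated with a random stand-in for the Monte Carlo temperature set; pooling realisations and time steps into one interval is one plausible reading of "all time steps together":

```python
import numpy as np

# Random stand-in for the Monte Carlo temperature set: 500 realisations of a
# temperature trace over 6 time steps (all values are made up).
rng = np.random.default_rng(7)
t_mc = rng.normal(12.0, 0.3, size=(500, 6))           # (mc_sample, time)

# Default: one confidence interval per time step
ci_per_t = np.percentile(t_mc, [2.5, 97.5], axis=0)   # shape (2, 6)

# All time steps together: pool realisations and time steps into one interval
ci_avg = np.percentile(t_mc, [2.5, 97.5])             # shape (2,)
print(ci_per_t.shape, ci_avg)
```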

ds.conf_int_double_ended(
    p_val='p_val',
    p_cov='p_cov',
    st_label=st_label,
    ast_label=ast_label,
    rst_label=rst_label,
    rast_label=rast_label,
    st_var=st_var,
    ast_var=ast_var,
    rst_var=rst_var,
    rast_var=rast_var,
    store_tmpf='TMPF',
    store_tmpb='TMPB',
    store_tmpw='TMPW',
    store_tempvar='_var',
    conf_ints=[2.5, 50., 97.5],
    mc_sample_size=500,  # <- choose a much larger sample size
    ci_avg_time_flag=False)

ds1 = ds.isel(time=-1)  # take only the last timestep


ds1.TMPW.plot(linewidth=0.7, figsize=(12, 8))
ds1.TMPW_MC.isel(CI=0).plot(linewidth=0.7, label='CI: 2.5%')
ds1.TMPW_MC.isel(CI=2).plot(linewidth=0.7, label='CI: 97.5%')
plt.legend();

The DataArrays TMPF_MC and TMPB_MC and the dimension CI are added. MC stands for Monte Carlo and the CI dimension holds the confidence interval ‘coordinates’.

(ds1.TMPW_MC_var**0.5).plot(figsize=(12, 4));
plt.ylabel(r'$\sigma$ ($^\circ$C)');


ds.data_vars

Data variables:
    ST                     (x, time) float64 4.049e+03 4.044e+03 ... 3.501e+03
    AST                    (x, time) float64 3.293e+03 3.296e+03 ... 2.803e+03
    REV-ST                 (x, time) float64 4.061e+03 4.037e+03 ... 4.584e+03
    REV-AST                (x, time) float64 3.35e+03 3.333e+03 ... 3.707e+03
    TMP                    (x, time) float64 16.69 16.87 16.51 ... 13.6 13.69
    acquisitionTime        (time) float32 2.098 2.075 2.076 2.133 2.085 2.062
    referenceTemperature   (time) float32 21.0536 21.054 ... 21.0531 21.057
    probe1Temperature      (time) float32 4.36149 4.36025 ... 4.36021 4.36118
    probe2Temperature      (time) float32 18.5792 18.5785 ... 18.5805 18.5723
    referenceProbeVoltage  (time) float32 0.121704 0.121704 ... 0.121705
    probe1Voltage          (time) float32 0.114 0.114 0.114 0.114 0.114 0.114
    probe2Voltage          (time) float32 0.121 0.121 0.121 0.121 0.121 0.121
    userAcquisitionTimeFW  (time) float32 2.0 2.0 2.0 2.0 2.0 2.0
    userAcquisitionTimeBW  (time) float32 2.0 2.0 2.0 2.0 2.0 2.0
    gamma                  float64 482.6
    alpha                  (x) float64 -0.007156 -0.003301 ... -0.005165
    d                      (time) float64 1.465 1.465 1.464 1.465 1.465 1.465
    gamma_var              float64 0.03927
    alpha_var              (x) float64 1.734e-07 1.814e-07 ... 1.835e-07
    d_var                  (time) float64 4.854e-07 4.854e-07 ... 4.854e-07
    TMPF                   (x, time) float64 16.8 17.05 16.32 ... 13.49 13.78
    TMPB                   (x, time) float64 16.8 16.83 16.88 ... 13.74 13.69
    TMPF_MC_var            (x, time) float64 dask.array<shape=(787, 6), chunksize=(699, 6)>
    TMPB_MC_var            (x, time) float64 dask.array<shape=(787, 6), chunksize=(699, 6)>
    TMPW                   (x, time) float64 dask.array<shape=(787, 6), chunksize=(699, 6)>
    TMPW_MC_var            (x, time) float64 dask.array<shape=(787, 6), chunksize=(699, 6)>
    p_val                  (params1) float64 482.6 1.465 ... -0.005271 -0.005165
    p_cov                  (params1, params2) float64 0.03927 ... 1.835e-07
    TMPF_MC                (CI, x, time) float64 dask.array<shape=(3, 787, 6), chunksize=(3, 699, 6)>
    TMPB_MC                (CI, x, time) float64 dask.array<shape=(3, 787, 6), chunksize=(3, 699, 6)>
    TMPW_MC                (CI, x, time) float64 dask.array<shape=(3, 787, 6), chunksize=(3, 699, 6)>


4.9 9. Import a time series

In this tutorial we are adding a timeseries to the DataStore object. This might be useful if the temperature in one of the calibration baths was measured with an external device. It requires three steps to add the measurement files to the DataStore object:
1. Load the measurement files (e.g., csv, txt) with pandas into a pandas.Series object
2. Add the pandas.Series object to the DataStore
3. Align the time to that of the DTS measurement (required for calibration)

import pandas as pd
import os

from dtscalibration import read_silixa_files

4.9.1 Step 1: load the measurement files

filepath = os.path.join('..', '..', 'tests', 'data',
                        'external_temperature_timeseries',
                        'Loodswaternet2018-03-28 02h.csv')

# Bonus:
print(filepath, '\n')
with open(filepath, 'r') as f:
    head = [next(f) for _ in range(5)]
print(' '.join(head))

../../tests/data/external_temperature_timeseries/Loodswaternet2018-03-28 02h.csv

"time","Pt100 2"
2018-03-28 02:00:05, 12.748
2018-03-28 02:00:10, 12.747
2018-03-28 02:00:15, 12.746
2018-03-28 02:00:20, 12.747

ts = pd.read_csv(filepath, sep=',', index_col=0, parse_dates=True,
                 squeeze=True, engine='python')  # the latter 2 kwargs are to ensure a pd.Series is returned

ts = ts.tz_localize('Europe/Amsterdam')  # set the timezone

ts.head() # Double check the timezone

time
2018-03-28 02:00:05+02:00 12.748
2018-03-28 02:00:10+02:00 12.747
2018-03-28 02:00:15+02:00 12.746
2018-03-28 02:00:20+02:00 12.747
2018-03-28 02:00:26+02:00 12.747
Name: Pt100 2, dtype: float64

Now we quickly create a DataStore from xml-files with Stokes measurements to add the external timeseries to.

filepath_ds = os.path.join('..', '..', 'tests', 'data', 'double_ended2')

ds = read_silixa_files(directory=filepath_ds,
                       timezone_netcdf='UTC',
                       file_ext='*.xml')


6 files were found, each representing a single timestep


6 recorded vars were found: LAF, ST, AST, REV-ST, REV-AST, TMP
Recorded at 1693 points along the cable
The measurement is double ended
Reading the data from disk

4.9.2 Step 2: Add the temperature measurements of the external probe to the DataStore

First add the coordinates

ds.coords['time_external'] = ts.index.values

Second we add the measured values

ds['external_probe'] = (('time_external',), ts)

4.9.3 Step 3: Align the time of the external measurements to the Stokes measurement times

We linearly interpolate the measurements of the external sensor to the times we have DTS measurements

ds['external_probe_dts'] = ds['external_probe'].interp(time_external=ds.time)

print(ds.data_vars)

Data variables:
ST (x, time) float64 1.281 -0.5321 ... -43.44 -41.08
AST (x, time) float64 0.4917 1.243 ... -30.14 -32.09
REV-ST (x, time) float64 0.4086 -0.568 ... 4.822e+03
REV-AST (x, time) float64 2.569 -1.603 ... 4.224e+03
TMP (x, time) float64 196.1 639.1 218.7 ... 8.442 18.47
acquisitionTime (time) float32 2.098 2.075 2.076 2.133 2.085 2.062
referenceTemperature (time) float32 21.0536 21.054 ... 21.0531 21.057
probe1Temperature (time) float32 4.36149 4.36025 ... 4.36021 4.36118
probe2Temperature (time) float32 18.5792 18.5785 ... 18.5805 18.5723
referenceProbeVoltage (time) float32 0.121704 0.121704 ... 0.121705
probe1Voltage (time) float32 0.114 0.114 0.114 0.114 0.114 0.114
probe2Voltage (time) float32 0.121 0.121 0.121 0.121 0.121 0.121
userAcquisitionTimeFW (time) float32 2.0 2.0 2.0 2.0 2.0 2.0
userAcquisitionTimeBW (time) float32 2.0 2.0 2.0 2.0 2.0 2.0
external_probe (time_external) float64 12.75 12.75 ... 12.76 12.76
external_probe_dts (time) float64 12.75 12.75 12.75 12.75 12.75 12.75

Now we can use external_probe_dts when we define sections and use it for calibration.

4.10 10. Align double ended measurements

The cable length was initially configured during the DTS measurement. For double ended measurements it is important
to enter the correct length so that the forward channel and the backward channel are aligned.


This notebook shows how to better align the forward and the backward measurements. Do this before the calibration
steps.
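The idea behind suggesting a shift can be sketched with a toy profile: try a range of integer shifts and pick the one with the smallest mismatch between the two channels. This is only an illustration in the spirit of suggest_cable_shift_double_ended, not its actual implementation:

```python
import numpy as np

# Synthetic attenuation-like profile; the "backward" channel is a copy of the
# "forward" channel offset by 3 spatial indices.
rng = np.random.default_rng(3)
signal = np.cumsum(rng.normal(0., 1., 200))   # random walk as a stand-in profile
fw = signal[5:195]
bw = signal[8:198]                            # off by 3 indices

shifts = np.arange(-5, 6)
# Mean squared mismatch for every candidate shift; trim the edges so the
# wrap-around of np.roll does not contaminate the comparison.
errors = [np.mean((fw - np.roll(bw, s))[5:-5] ** 2) for s in shifts]
best = int(shifts[np.argmin(errors)])
print(best)  # → 3: shifting bw by 3 indices aligns it with fw
```

The library compares log(ST/AST) of the forward channel against the flipped backward channel in a similar spirit, which is why invalid log values trigger the RuntimeWarnings seen below.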

import os
from dtscalibration import read_silixa_files
from dtscalibration.datastore_utils import suggest_cable_shift_double_ended, shift_double_ended

import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

suggest_cable_shift_double_ended?

filepath = os.path.join('..', '..', 'tests', 'data', 'double_ended2')

ds_aligned = read_silixa_files(
directory=filepath,
timezone_netcdf='UTC',
file_ext='*.xml') # this one is already correctly aligned

6 files were found, each representing a single timestep


6 recorded vars were found: LAF, ST, AST, REV-ST, REV-AST, TMP
Recorded at 1693 points along the cable
The measurement is double ended
Reading the data from disk

Because our loaded files were already nicely aligned, we purposely offset the forward and backward channels by 3 ‘spatial indices’.

ds_notaligned = shift_double_ended(ds_aligned, 3)

I dont know what to do with the following data ['TMP']

The device-calibrated temperature doesnot have a valid meaning anymore and is dropped

suggested_shift = suggest_cable_shift_double_ended(
    ds_notaligned,
    np.arange(-5, 5),
    plot_result=True,
    figsize=(12, 8))

/Users/bfdestombe/Projects/dts-calibration/python-dts-calibration/src/dtscalibration/datastore_utils.py:240: RuntimeWarning: invalid value encountered in log
  i_f = np.log(st / ast)
/Users/bfdestombe/Projects/dts-calibration/python-dts-calibration/src/dtscalibration/datastore_utils.py:241: RuntimeWarning: invalid value encountered in log
  i_b = np.log(rst / rast)


The two approaches suggest a shift of -3 and -4. It is up to the user which suggestion to follow. Usually the two suggested shifts are close.

ds_restored = shift_double_ended(ds_notaligned, suggested_shift[0])

print(ds_aligned.x, 3*'\n', ds_restored.x)

<xarray.DataArray 'x' (x: 1693)>


array([-80.5043, -80.3772, -80.2501, ..., 134.294 , 134.421 , 134.548 ])
Coordinates:
* x (x) float64 -80.5 -80.38 -80.25 -80.12 ... 134.2 134.3 134.4 134.5
Attributes:
name: distance
description: Length along fiber
long_description: Starting at connector of forward channel
units: m

<xarray.DataArray 'x' (x: 1687)>


array([-80.123 , -79.9959, -79.8688, ..., 133.913 , 134.04 , 134.167 ])
Coordinates:
* x (x) float64 -80.12 -80.0 -79.87 -79.74 ... 133.8 133.9 134.0 134.2
Attributes:
name: distance
description: Length along fiber
long_description: Starting at connector of forward channel
units: m

Note that our fiber has become shorter by 2*3 spatial indices.
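The bookkeeping behind that statement: shifting a double ended measurement by n spatial indices trims n points from each end of the x-coordinate, so the record shrinks by 2·n points:

```python
# Point counts taken from the x-coordinate printouts above.
n_points_before = 1693   # len(ds_aligned.x)
shift = 3
n_points_after = n_points_before - 2 * abs(shift)
print(n_points_after)  # → 1687, matching len(ds_restored.x)
```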



CHAPTER 5

Reference

5.1 dtscalibration

class dtscalibration.DataStore(*args, **kwargs)


The data class that stores the measurements, contains calibration methods to relate Stokes and anti-Stokes to
temperature. The user should never initiate this class directly, but use read_xml_dir or open_datastore functions
instead.
data_vars [dict-like, optional] A mapping from variable names to DataArray objects,
Variable objects or tuples of the form (dims, data[, attrs]) which can be used
as arguments to create a new Variable. Each dimension must have the same length in all
variables in which it appears.
coords [dict-like, optional] Another mapping in the same form as the variables argument, except that each item is saved on the datastore as a “coordinate”. These variables have an associated meaning: they describe constant/fixed/independent quantities, unlike the varying/measured/dependent quantities that belong in variables. Coordinates values may be given by 1-dimensional arrays or scalars, in which case dims do not need to be supplied: 1D arrays will be assumed to give index values along the dimension with the same name.
attrs [dict-like, optional] Global attributes to save on this datastore.
sections [dict, optional] Sections for calibration. The dictionary should contain key-var couples in
which the key is the name of the calibration temp time series. And the var is a list of slice objects
as ‘slice(start, stop)’; start and stop in meter (float).
compat [{‘broadcast_equals’, ‘equals’, ‘identical’}, optional] String indicating how to compare variables of the same name for potential conflicts when initializing this datastore:

• ‘broadcast_equals’: all values must be equal when variables are broadcast against each other to ensure common dimensions.

• ‘equals’: all values and dimensions must be the same.

• ‘identical’: all values, dimensions and attributes must be the same.


See also:
dtscalibration.read_xml_dir : Load measurements stored in XML-files
dtscalibration.open_datastore : Load (calibrated) measurements from netCDF-like file
calibration_double_ended(sections=None, st_label=’ST’, ast_label=’AST’,
rst_label=’REV-ST’, rast_label=’REV-AST’, st_var=None,
ast_var=None, rst_var=None, rast_var=None,
store_d=’d’, store_gamma=’gamma’, store_alpha=’alpha’,
store_tmpf=’TMPF’, store_tmpb=’TMPB’, store_tmpw=’TMPW’,
tmpw_mc_size=50, store_p_cov=’p_cov’, store_p_val=’p_val’,
variance_suffix=’_var’, method=’ols’, solver=’sparse’,
nt=None, z=None, p_val=None, p_var=None, p_cov=None,
remove_mc_set_flag=True, reduce_memory_usage=False)
Parameters
• store_p_cov
• store_p_val
• nt
• z
• p_val
• p_var
• p_cov
• sections (dict, optional)
• st_label (str) – Label of the forward stokes measurement
• ast_label (str) – Label of the anti-Stoke measurement
• rst_label (str) – Label of the reversed Stoke measurement
• rast_label (str) – Label of the reversed anti-Stoke measurement
• st_var (float, optional) – The variance of the measurement noise of the Stokes signals in the forward direction. Required if method is wls.
• ast_var (float, optional) – The variance of the measurement noise of the anti-Stokes signals in the forward direction. Required if method is wls.
• rst_var (float, optional) – The variance of the measurement noise of the Stokes signals in the backward direction. Required if method is wls.
• rast_var (float, optional) – The variance of the measurement noise of the anti-Stokes signals in the backward direction. Required if method is wls.
• store_d (str) – Label of where to store D. Equals the integrated differential attenuation at x=0 and should be equal to half the total integrated differential attenuation.
• store_gamma (str) – Label of where to store gamma
• store_alpha (str) – Label of where to store alpha
• store_tmpf (str) – Label of where to store the calibrated temperature of the forward direction
• store_tmpb (str) – Label of where to store the calibrated temperature of the backward direction
• store_tmpw (str)
• tmpw_mc_size (int)


• variance_suffix (str, optional) – String appended for storing the variance. Only used when
method is wls.
• method ({‘ols’, ‘wls’, ‘external’}) – Use ‘ols’ for ordinary least squares and ‘wls’ for
weighted least squares
• solver ({‘sparse’, ‘stats’}) – Either use the homemade weighted sparse solver or the
weighted dense matrix solver of statsmodels
calibration_single_ended(sections=None, st_label=’ST’, ast_label=’AST’, st_var=None,
ast_var=None, store_c=’c’, store_gamma=’gamma’,
store_dalpha=’dalpha’, store_alpha=’alpha’, store_tmpf=’TMPF’,
store_p_cov=’p_cov’, store_p_val=’p_val’, variance_suffix=’_var’,
method=’ols’, solver=’sparse’, nt=None, z=None, p_val=None,
p_var=None, p_cov=None)
Parameters
• store_p_cov (str) – Key to store the covariance matrix of the calibrated parameters
• store_p_val (str) – Key to store the values of the calibrated parameters
• nt (int, optional) – Number of timesteps. Should be defined if method==’external’
• z (array-like, optional) – Distances. Should be defined if method==’external’
• p_val
• p_var
• p_cov
• sections (dict, optional)
• st_label (str) – Label of the forward stokes measurement
• ast_label (str) – Label of the anti-Stoke measurement
• st_var (float, optional) – The variance of the measurement noise of the Stokes signals in the forward direction. Required if method is wls.
• ast_var (float, optional) – The variance of the measurement noise of the anti-Stokes signals in the forward direction. Required if method is wls.
• store_c (str) – Label of where to store C
• store_gamma (str) – Label of where to store gamma
• store_dalpha (str) – Label of where to store dalpha; the spatial derivative of alpha.
• store_alpha (str) – Label of where to store alpha; the integrated differential attenuation. alpha(x=0) = 0
• store_tmpf (str) – Label of where to store the calibrated temperature of the forward direction
• variance_suffix (str, optional) – String appended for storing the variance. Only used when method is wls.
• method ({‘ols’, ‘wls’}) – Use ‘ols’ for ordinary least squares and ‘wls’ for weighted least squares
• solver ({‘sparse’, ‘stats’}) – Either use the homemade weighted sparse solver or the weighted dense matrix solver of statsmodels
channel_configuration


chbw
chfw
conf_int_double_ended(p_val=’p_val’, p_cov=’p_cov’, st_label=’ST’, ast_label=’AST’, rst_label=’REV-ST’, rast_label=’REV-AST’, st_var=None, ast_var=None, rst_var=None, rast_var=None, store_tmpf=’TMPF’, store_tmpb=’TMPB’, store_tmpw=’TMPW’, store_tempvar=’_var’, conf_ints=None, mc_sample_size=100, ci_avg_time_flag=False, ci_avg_x_flag=False, var_only_sections=False, da_random_state=None, remove_mc_set_flag=True, reduce_memory_usage=False)
Parameters
• p_val (array-like or string) – parameter solution directly from calibration_double_ended_wls
• p_cov (array-like or string) – parameter covariance at the solution directly from calibration_double_ended_wls. If set to False, no uncertainty in the parameters is propagated into the confidence intervals. Similar to the spec sheets of the DTS manufacturers, and similar to passing an array filled with zeros.
• st_label (str) – Key of the forward Stokes
• ast_label (str) – Key of the forward anti-Stokes
• rst_label (str) – Key of the backward Stokes
• rast_label (str) – Key of the backward anti-Stokes
• st_var (float) – Float of the variance of the Stokes signal
• ast_var (float) – Float of the variance of the anti-Stokes signal
• rst_var (float) – Float of the variance of the backward Stokes signal
• rast_var (float) – Float of the variance of the backward anti-Stokes signal
• store_tmpf (str) – Key of how to store the Forward calculated temperature. Is calculated
using the forward Stokes and anti-Stokes observations.
• store_tmpb (str) – Key of how to store the Backward calculated temperature. Is calculated
using the backward Stokes and anti-Stokes observations.
• store_tmpw (str) – Key of how to store the forward-backward-weighted temperature. First, the variance of TMPF and TMPB are calculated. The Monte Carlo set of TMPF and TMPB are averaged, weighted by their variance. The median of this set is thought to be a reasonable estimate of the temperature
• store_tempvar (str) – a string that is appended to the store_tmp_ keys, and the variance is calculated for those store_tmp_ keys
• conf_ints (iterable object of float) – A list with the confidence boundaries that are calculated. Valid values are between [0, 1].
• mc_sample_size (int) – Size of the Monte Carlo parameter set used to calculate the confidence interval
• ci_avg_time_flag (bool) – The confidence intervals differ per time step. If you would like to calculate confidence intervals of all time steps together: ‘We can say with 95% confidence that the temperature remained between this line and this line during the entire measurement period’.


• ci_avg_x_flag (bool) – Similar to ci_avg_time_flag but then the averaging takes place over the x dimension, and we can observe the variance over time.
• var_only_sections (bool) – useful if using the ci_avg_x_flag. Only calculates the var over the sections, so that the values can be compared with accuracy along the reference sections. Where the accuracy is the variance of the residuals between the estimated temperature and the temperature of the water baths.
• da_random_state – For testing purposes. Similar to a random seed. The seed for dask. Makes random not so random. To produce reproducible results for testing environments.
• remove_mc_set_flag (bool) – Remove the Monte Carlo data set, from which the CI and the variance are calculated.
• reduce_memory_usage (bool) – Use less memory but at the expense of longer computation time
conf_int_single_ended(p_val=’p_val’, p_cov=’p_cov’, st_label=’ST’, ast_label=’AST’,
st_var=None, ast_var=None, store_tmpf=’TMPF’,
store_tempvar=’_var’, conf_ints=None, mc_sample_size=100,
ci_avg_time_flag=False, ci_avg_x_flag=False, da_random_state=None,
remove_mc_set_flag=True, reduce_memory_usage=False)
Parameters
• p_val (array-like or string) – parameter solution directly from calibration_double_ended_wls
• p_cov (array-like or string or bool) – parameter covariance at p_val directly from calibration_double_ended_wls. If set to False, no uncertainty in the parameters is propagated into the confidence intervals. Similar to the spec sheets of the DTS manufacturers, and similar to passing an array filled with zeros. If set to string, the p_cov is retrieved by accessing ds[p_cov]. See the p_cov keyword argument in the calibration routine.
• st_label (str) – Key of the forward Stokes
• ast_label (str) – Key of the forward anti-Stokes
• st_var (float) – Float of the variance of the Stokes signal
• ast_var (float) – Float of the variance of the anti-Stokes signal
• store_tmpf (str) – Key of how to store the Forward calculated temperature. Is calculated
using the forward Stokes and anti-Stokes observations.
• store_tempvar (str) – a string that is appended to the store_tmp_ keys. and the variance
is calculated for those store_tmp_ keys
• conf_ints (iterable object of float) – A list with the confidence boundaries that are calcu-
lated. Valid values are between [0, 1].
• mc_sample_size (int) – Size of the monte carlo parameter set used to calculate the confi-
dence interval
• ci_avg_time_flag (bool) – By default, a separate confidence interval is computed per time step. Set this flag to compute the confidence intervals over all time steps together: ‘We can say with 95% confidence that the temperature remained between this line and this line during the entire measurement period’.
• ci_avg_x_flag (bool) – Similar to ci_avg_time_flag, but over the x-dimension instead of the time-dimension.
• da_random_state – For testing purposes. Seed for dask’s random number generator, similar to a random seed, to produce reproducible results in testing environments.


• remove_mc_set_flag (bool) – Remove the Monte Carlo data set from which the confidence intervals and the variance are calculated.
• reduce_memory_usage (bool) – Use less memory, at the expense of longer computation time.
get_default_encoding()
get_time_dim(data_var_key=None)
Find the relevant time dimension by educated guessing.
Parameters data_var_key (str) – The data variable key that contains a relevant time dimension.
If None, ‘ST’ is used.
get_x_dim(data_var_key=None)
Find the relevant x dimension by educated guessing.
Parameters data_var_key (str) – The data variable key that contains a relevant x dimension.
If None, ‘ST’ is used.
in_confidence_interval(ci_label, conf_ints, sections=None)
Returns an array of bools indicating whether the temperature of the reference sections is within the confidence intervals.
Parameters
• sections (Dict[str, List[slice]])
• ci_label
• conf_ints
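The check itself is elementwise interval logic. A minimal numpy sketch (the bound and reference arrays below are made-up values, not output of this method):

```python
import numpy as np

# Hypothetical lower/upper confidence bounds along a reference section
ci_low = np.array([19.5, 19.8, 20.0])
ci_high = np.array([20.5, 20.6, 20.4])
# Hypothetical reference temperature at the same locations
t_ref = np.array([20.0, 20.7, 20.2])

# True where the reference temperature lies within the interval
inside = (ci_low <= t_ref) & (t_ref <= ci_high)
```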
inverse_variance_weighted_mean(tmp1=’TMPF’, tmp2=’TMPB’,
tmp1_var=’TMPF_MC_var’,
tmp2_var=’TMPB_MC_var’, tmpw_store=’TMPW’,
tmpw_var_store=’TMPW_var’)
Average two temperature datasets with the inverse of the variance as weights. The two temperature datasets tmp1 and tmp2, with their variances tmp1_var and tmp2_var respectively, are averaged and stored in the DataStore.
Parameters
• tmp1 (str) – The label of the first temperature dataset that is averaged
• tmp2 (str) – The label of the second temperature dataset that is averaged
• tmp1_var (str) – The variance of tmp1
• tmp2_var (str) – The variance of tmp2
• tmpw_store (str) – The label of the averaged temperature dataset
• tmpw_var_store (str) – The label of the variance of the averaged temperature dataset
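The weighting can be illustrated with plain numpy (made-up numbers; the actual method operates on the DataArrays named by the arguments above):

```python
import numpy as np

# Hypothetical forward and backward temperature estimates with variances
tmpf = np.array([20.1, 20.5, 19.8])
tmpf_var = np.array([0.04, 0.09, 0.04])
tmpb = np.array([20.3, 20.1, 20.0])
tmpb_var = np.array([0.01, 0.09, 0.16])

# Inverse-variance weights: the more precise estimate contributes more
w_f, w_b = 1.0 / tmpf_var, 1.0 / tmpb_var
tmpw = (w_f * tmpf + w_b * tmpb) / (w_f + w_b)
# Variance of the weighted mean is never larger than the smallest input variance
tmpw_var = 1.0 / (w_f + w_b)
```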
inverse_variance_weighted_mean_array(tmp_label=’TMPF’,
tmp_var_label=’TMPF_MC_var’,
tmpw_store=’TMPW’,
tmpw_var_store=’TMPW_var’, dim=’time’)
Calculates the weighted average across a dimension.
is_double_ended


resample_datastore(how, freq=None, dim=None, skipna=None, closed=None, label=None, base=0, keep_attrs=True, **indexer)
Returns a resampled DataStore. Always define the how. Handles both downsampling and upsampling. If any intervals contain no values from the original object, they will be given the value NaN.
Parameters
• freq
• dim
• how (str) – Any function that is available via groupby, e.g., ‘mean’. See http://pandas.pydata.org/pandas-docs/stable/groupby.html#groupby-dispatch
• skipna (bool, optional) – Whether to skip missing values when aggregating in downsampling.
• closed (‘left’ or ‘right’, optional) – Side of each interval to treat as closed.
• label (‘left or ‘right’, optional) – Side of each interval to use for labeling.
• base (int, optional) – For frequencies that evenly subdivide 1 day, the “origin” of the aggregated
intervals. For example, for ‘24H’ frequency, base could range from 0 through 23.
• keep_attrs (bool, optional) – If True, the object’s attributes (attrs) will be copied from the original
object to the new one. If False (default), the new object will be returned without attributes.
• **indexer ({dim: freq}) – Dictionary with a key indicating the dimension name to resample over and
a value corresponding to the resampling frequency.

Returns resampled (same type as caller) – This object resampled.
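The resampling semantics follow pandas/xarray. A small pandas sketch of downsampling with how=’mean’ (synthetic data, not a DataStore):

```python
import pandas as pd

# Four synthetic measurements, one every 30 s
idx = pd.date_range('2019-01-01', periods=4, freq='30s')
s = pd.Series([1.0, 3.0, 5.0, 7.0], index=idx)

# Downsample to 1-minute bins with 'mean', as a call like
# ds.resample_datastore(how='mean', time='1min') would do over the time dim
out = s.resample('1min').mean()
```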

sections
Define calibration sections. Each section requires a reference temperature time series, such as the temper-
ature measured by an external temperature sensor. They should already be part of the DataStore object.
Please look at the example notebook on sections if you encounter difficulties.
Parameters sections (Dict[str, List[slice]]) – Sections are defined in a dictionary with its key-
words of the names of the reference temperature time series. Its values are lists of slice
objects, where each slice object is a stretch.
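A minimal sketch of such a dictionary (the time-series names ‘probe1Temperature’ and ‘probe2Temperature’ are hypothetical; use the names of your own reference sensors):

```python
# Keys are names of reference temperature time series in the DataStore;
# values are lists of slices of the x-coordinate (in meters), one per stretch
sections = {
    'probe1Temperature': [slice(7.5, 17.0), slice(70.0, 80.0)],   # e.g. cold bath
    'probe2Temperature': [slice(24.0, 34.0), slice(85.0, 95.0)],  # e.g. warm bath
}
# ds.sections = sections  # assuming ds is a DataStore
```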
temperature_residuals(label=None)
Parameters label (str) – The key of the temperature DataArray
Returns resid_da (xarray.DataArray) – The residuals as DataArray
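The residual is simply the calibrated temperature minus the reference temperature at the reference sections; sketched with made-up numbers:

```python
import numpy as np

tmp_est = np.array([20.2, 20.6, 19.9])  # calibrated temperature (made up)
tmp_ref = np.array([20.0, 20.5, 20.0])  # reference bath temperature
resid = tmp_est - tmp_ref               # what temperature_residuals returns per section
```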
timeseries_keys
Returns the keys of all time series that can be used for calibration.
to_netcdf(path=None, mode=’w’, format=None, group=None, engine=None, encoding=None, unlimited_dims=None, compute=True)
Write datastore contents to a netCDF file.
Parameters
• path (str, Path or file-like object, optional) – Path to which to save this dataset. File-like objects are only supported by the scipy engine. If no path is provided, this function returns the resulting netCDF file as bytes; in this case, we need to use scipy, which does not support netCDF version 4 (the default format becomes NETCDF3_64BIT).

• mode ({‘w’, ‘a’}, optional) – Write (‘w’) or append (‘a’) mode. If mode=’w’, any existing
file at this location will be overwritten. If mode=’a’, existing variables will be overwritten.
• format ({‘NETCDF4’, ‘NETCDF4_CLASSIC’, ‘NETCDF3_64BIT’, ‘NETCDF3_CLASSIC’}, optional) – File format for the resulting netCDF file:
– NETCDF4: Data is stored in an HDF5 file, using netCDF4 API features.
– NETCDF4_CLASSIC: Data is stored in an HDF5 file, using only netCDF 3 compatible API features.
– NETCDF3_64BIT: 64-bit offset version of the netCDF 3 file format, which fully supports 2+ GB files, but is only compatible with clients linked against netCDF version 3.6.0 or later.
– NETCDF3_CLASSIC: The classic netCDF 3 file format. It does not handle 2+ GB files very well.

All formats are supported by the netCDF4-python library. scipy.io.netcdf only supports the
last two formats. The default format is NETCDF4 if you are saving a file to disk and have
the netCDF4-python library available. Otherwise, xarray falls back to using scipy to write
netCDF files and defaults to the NETCDF3_64BIT format (scipy does not support netCDF4).
• group (str, optional) – Path to the netCDF4 group in the given file to open (only works for
format=’NETCDF4’). The group(s) will be created if necessary.
• engine ({‘netcdf4’, ‘scipy’, ‘h5netcdf’}, optional) – Engine to use when writing netCDF
files. If not provided, the default engine is chosen based on available dependencies, with a
preference for ‘netcdf4’ if writing to a file on disk.
• encoding (dict, optional) – Defaults to reasonable compression. Use encoding={} to disable encoding. Nested dictionary with variable names as keys and dictionaries of variable-specific encodings as values, e.g., {'my_variable': {'dtype': 'int16', 'scale_factor': 0.1, 'zlib': True}, ...}
The h5netcdf engine supports both the NetCDF4-style compression encoding parameters {'zlib': True, 'complevel': 9} and the h5py ones {'compression': 'gzip', 'compression_opts': 9}. This allows using any compression plugin installed in the HDF5 library, e.g. LZF.
• unlimited_dims (sequence of str, optional) – Dimension(s) that should be serialized as un-
limited dimensions. By default, no dimensions are treated as unlimited dimensions. Note
that unlimited_dims may also be set via dataset.encoding['unlimited_dims'].
• compute (boolean) – If true compute immediately, otherwise return a dask.delayed.
Delayed object that can be computed later.

ufunc_per_section(sections=None, func=None, label=None, subtract_from_label=None, temp_err=False, x_indices=False, ref_temp_broadcasted=False, calc_per=’stretch’, **func_kwargs)
User function applied to parts of the cable. Super useful, many options and slightly complicated.
The function func is applied over all time steps and calculated per calc_per. The result is returned as a dictionary.
Parameters
• sections (Dict[str, List[slice]])
• func (callable, str) – A numpy function, or lambda function to apply to each ‘calc_per’.
• label
• subtract_from_label
• temp_err (bool) – The argument of the function is label minus the reference temperature.


• x_indices (bool) – To retrieve an integer array with the indices of the x-coordinates in the section/stretch.
• ref_temp_broadcasted (bool)
• calc_per ({‘all’, ‘section’, ‘stretch’})
• func_kwargs (dict) – Dictionary with options that are passed to func
• TODO – Spend time on creating a slice instead of appending everything to a list and concatenating afterwards.

Examples

# Calculate the variance of the residuals along ALL the reference
# sections wrt the temperature of the water baths
TMPF_var = d.ufunc_per_section(
    func='var', calc_per='all', label='TMPF', temp_err=True)

# Calculate the variance of the residuals PER reference section stretch
# wrt the temperature of the water baths
TMPF_var = d.ufunc_per_section(
    func='var', calc_per='stretch', label='TMPF', temp_err=True)

# Calculate the variance of the residuals PER water bath (section)
# wrt the temperature of the water baths
TMPF_var = d.ufunc_per_section(
    func='var', calc_per='section', label='TMPF', temp_err=True)

# Obtain the coordinates of the measurements per section
locs = d.ufunc_per_section(
    func=None, label='x', temp_err=False, ref_temp_broadcasted=False,
    calc_per='stretch')

# Number of observations per stretch
nlocs = d.ufunc_per_section(
    func=len, label='x', temp_err=False, ref_temp_broadcasted=False,
    calc_per='stretch')

# Broadcast the temperature of the reference sections to
# stretch/section/all dimensions. The value of the reference temperature
# (a time series) is broadcast to the shape of self[label]; self[label]
# is not used for anything else.
temp_ref = d.ufunc_per_section(
    label='ST', ref_temp_broadcasted=True, calc_per='all')

# x-coordinate index
ix_loc = d.ufunc_per_section(x_indices=True)

Note: If self[label] or self[subtract_from_label] is a Dask array, a Dask array is returned; otherwise a numpy array is returned.

variance_stokes(st_label, sections=None, reshape_residuals=True)
Calculates the variance between the measurements and a best fit at each reference section. This fits a function to the nt * nx measurements with ns * nt + nx parameters, where nx is the total number of observation locations along all sections. The temperature is constant along the reference sections, so the expression of the Stokes power can be split into a time series per reference section and a constant per observation location.
Assumptions: 1) the temperature is the same along a reference section.
Idea from discussion at page 127 in Richter, P. H. (1995). Estimating errors in least-squares fitting.
Parameters
• reshape_residuals


• st_label (str) – label of the Stokes, anti-Stokes measurement. E.g., ST, AST, REV-ST,
REV-AST
• sections (dict, optional) – Define sections. See documentation
Returns
• I_var (float) – Variance of the residuals between measured and best fit
• resid (array_like) – Residuals between measured and best fit

Notes

Because there is a large number of unknowns, time is spent on calculating an initial estimate. This can be turned off by setting it to False.
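The idea can be sketched with synthetic data: simulate one reference section as an outer product of a time series and per-location constants plus noise, compute the best fit of that same form (here via a rank-1 SVD, a simplified stand-in for the actual solver), and take the variance of the residuals:

```python
import numpy as np

rng = np.random.default_rng(42)
nt, nx = 50, 8                            # time steps, locations in one section
g_t = 100.0 + rng.normal(0.0, 1.0, nt)    # Stokes time series of the section
c_x = rng.uniform(0.9, 1.1, nx)           # constant per observation location
noise_std = 0.5
st = np.outer(g_t, c_x) + rng.normal(0.0, noise_std, (nt, nx))

# Best fit of the form (time series) x (constant per location): rank-1 SVD
u, s, vt = np.linalg.svd(st, full_matrices=False)
fit = s[0] * np.outer(u[:, 0], vt[0])

resid = st - fit
i_var = resid.var()  # estimate of the Stokes noise variance, near noise_std**2
```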
variance_stokes_exponential(st_label, sections=None, use_statsmodels=False, suppress_info=True, reshape_residuals=True)
Calculates the variance between the measurements and a best fit exponential at each reference section.
This fits a two-parameter exponential to the Stokes measurements. The temperature is constant and there
are no splices/sharp bends in each reference section. Therefore all signal decrease is due to differential
attenuation, which is the same for each reference section. The scale of the exponential does differ per
reference section.
Assumptions: 1) the temperature is the same along a reference section. 2) no sharp bends and splices in
the reference sections. 3) Same type of optical cable in each reference section.
Idea from discussion at page 127 in Richter, P. H. (1995). Estimating errors in least-squares fitting. For the weights, error propagation is used: w^2 = 1/sigma(ln y)^2 = y^2/sigma(y)^2 = y^2
Parameters
• reshape_residuals
• use_statsmodels
• suppress_info
• st_label (str) – label of the Stokes, anti-Stokes measurement. E.g., ST, AST, REV-ST,
REV-AST
• sections (dict, optional) – Define sections. See documentation
Returns
• I_var (float) – Variance of the residuals between measured and best fit
• resid (array_like) – Residuals between measured and best fit
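The log-linearized weighted fit can be sketched in numpy (synthetic single-section data; the actual routine fits all reference sections simultaneously with a shared decay):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 100.0, 40)               # position along one section (m)
st_obs = 120.0 * np.exp(-0.002 * x) + rng.normal(0.0, 0.3, x.size)

# Linearize: ln(st) = ln(a) - b*x. With constant sigma(y), error
# propagation gives sigma(ln y) = sigma(y)/y, hence weights w^2 = y^2
w = st_obs
A = np.stack([np.ones_like(x), -x], axis=1)
coef, *_ = np.linalg.lstsq(A * w[:, None], np.log(st_obs) * w, rcond=None)
a_est, b_est = np.exp(coef[0]), coef[1]       # close to 120 and 0.002
```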
dtscalibration.open_datastore(filename_or_obj, group=None, decode_cf=True, mask_and_scale=None, decode_times=True, concat_characters=True, decode_coords=True, engine=None, chunks=None, lock=None, cache=None, drop_variables=None, backend_kwargs=None, **kwargs)
Load and decode a datastore from a file or file-like object.
Parameters
• filename_or_obj (str, Path, file or xarray.backends.*DataStore) – Strings and Path objects are interpreted as a path to a netCDF file or an OpenDAP URL and opened with python-netCDF4, unless the filename ends with .gz, in which case the file is gunzipped and opened with scipy.io.netcdf (only netCDF3 supported). File-like objects are opened with scipy.io.netcdf (only netCDF3 supported).

• group (str, optional) – Path to the netCDF4 group in the given file to open (only works for
netCDF4 files).


• decode_cf (bool, optional) – Whether to decode these variables, assuming they were saved
according to CF conventions.
• mask_and_scale (bool, optional) – If True, replace array values equal to _FillValue with NA and scale values according to the formula original_values * scale_factor + add_offset, where _FillValue, scale_factor and add_offset are taken from variable attributes (if they exist). If the _FillValue or missing_value attribute contains multiple values, a warning will be issued and all array values matching one of the multiple values will be replaced by NA. mask_and_scale defaults to True except for the pseudonetcdf backend.
• decode_times (bool, optional) – If True, decode times encoded in the standard NetCDF datetime
format into datetime objects. Otherwise, leave them encoded as numbers.
• concat_characters (bool, optional) – If True, concatenate along the last dimension of character
arrays to form string arrays. Dimensions will only be concatenated over (and removed) if they
have no corresponding variable and if they are only used as the last dimension of character
arrays.
• decode_coords (bool, optional) – If True, decode the ‘coordinates’ attribute to identify coordi-
nates in the resulting dataset.
• engine ({‘netcdf4’, ‘scipy’, ‘pydap’, ‘h5netcdf’, ‘pynio’, ‘pseudonetcdf’}, optional) – Engine to use when reading files. If not provided, the default engine is chosen based on available dependencies, with a preference for ‘netcdf4’.
• chunks (int or dict, optional) – If chunks is provided, it is used to load the new dataset into dask arrays. chunks={} loads the dataset with dask using a single chunk for all arrays.
• lock (False, True or threading.Lock, optional) – If chunks is provided, this argument is passed
on to dask.array.from_array(). By default, a global lock is used when reading data
from netCDF files with the netcdf4 and h5netcdf engines to avoid issues with concurrent access
when using dask’s multithreaded backend.
• cache (bool, optional) – If True, cache data loaded from the underlying datastore in memory as NumPy arrays when accessed, to avoid reading from the underlying datastore multiple times. Defaults to True unless you specify the chunks argument to use dask, in which case it defaults to False. Does not change the behavior of coordinates corresponding to dimensions, which always load their data from disk into a pandas.Index.
• drop_variables (string or iterable, optional) – A variable or list of variables to exclude from
being parsed from the dataset. This may be useful to drop variables with problems or inconsistent
values.
• backend_kwargs (dictionary, optional) – A dictionary of keyword arguments to pass on to the
backend. This may be useful when backend options would improve performance or allow user
control of dataset processing.

Returns dataset (Dataset) – The newly created dataset.

See also:
read_xml_dir()
dtscalibration.read_sensornet_files(filepathlist=None, directory=None, file_ext=’*.ddf’, timezone_netcdf=’UTC’, timezone_input_files=’UTC’, silent=False, **kwargs)
Read a folder with measurement files. Each measurement file contains values for a single timestep. Remember
to check which timezone you are working in.
Parameters


• filepathlist (list of str, optional) – List of paths that point to the sensornet files
• directory (str, Path, optional) – Path to folder
• timezone_netcdf (str, optional) – Timezone string of the netcdf file. UTC follows CF-
conventions.
• timezone_input_files (str, optional) – Timezone string of the measurement files. Remember to check when the measurements were taken, and whether daylight saving time was in use.
• file_ext (str, optional) – file extension of the measurement files
• silent (bool) – If set to True, some verbose texts are not printed to stdout/screen
• kwargs (dict-like, optional) – keyword-arguments are passed to DataStore initialization

Notes

Compressed sensornet files cannot be decoded directly, because the files are encoded with encoding=’windows-1252’ instead of UTF-8.
Returns datastore (DataStore) – The newly created datastore.
dtscalibration.read_silixa_files(filepathlist=None, directory=None, zip_handle=None, file_ext=’*.xml’, timezone_netcdf=’UTC’, silent=False, load_in_memory=’auto’, **kwargs)
Read a folder with measurement files. Each measurement file contains values for a single timestep. Remember
to check which timezone you are working in.
The Silixa files are already timezone aware.
Parameters
• filepathlist (list of str, optional) – List of paths that point to the Silixa files
• directory (str, Path, optional) – Path to folder
• timezone_netcdf (str, optional) – Timezone string of the netcdf file. UTC follows CF-
conventions.
• file_ext (str, optional) – file extension of the measurement files
• silent (bool) – If set to True, some verbose texts are not printed to stdout/screen
• load_in_memory ({‘auto’, True, False}) – If ‘auto’, the Stokes data is only loaded into memory for small files
• kwargs (dict-like, optional) – keyword-arguments are passed to DataStore initialization
Returns datastore (DataStore) – The newly created datastore.
dtscalibration.plot_dask(arr, file_path=None)
For debugging the scheduling of the calculation of dask arrays. Requires additional libraries to be installed.
Parameters
• arr (Dask-Array) – An uncomputed dask array
• file_path (Path-like, str, optional) – Path to save graph
Returns out (array-like) – The calculated array

CHAPTER 6

Contributing

Contributions are welcome, and they are greatly appreciated! Every little bit helps, and credit will always be given.

6.1 Bug reports

When reporting a bug please include:


• Your operating system name and version.
• Any details about your local setup that might be helpful in troubleshooting.
• Detailed steps to reproduce the bug.

6.2 Documentation improvements

dtscalibration could always use more documentation, whether as part of the official dtscalibration docs, in docstrings,
or even on the web in blog posts, articles, and such.

6.3 Feature requests and feedback

The best way to send feedback is to file an issue at https://github.com/bdestombe/python-dts-calibration/issues.


If you are proposing a feature:
• Explain in detail how it would work.
• Keep the scope as narrow as possible, to make it easier to implement.
• Remember that this is a volunteer-driven project, and that code contributions are welcome :)

53
dtscalibration, Release 0.6.3

6.4 Development

To set up python-dts-calibration for local development:


1. Fork python-dts-calibration (look for the “Fork” button).
2. Clone your fork locally:

git clone git@github.com:your_name_here/python-dts-calibration.git

3. Create a branch for local development:

git checkout -b name-of-your-bugfix-or-feature

Now you can make your changes locally.


4. When you’re done making changes, run all the checks, doc builder and spell checker with one tox command:

tox

5. Commit your changes and push your branch to GitHub:

git add .
git commit -m "Your detailed description of your changes."
git push origin name-of-your-bugfix-or-feature

6. Submit a pull request through the GitHub website.

6.4.1 Pull Request Guidelines

If you need some code review or feedback while you’re developing the code, just make the pull request.
For merging, you should:
1. Include passing tests (run tox)[1].
2. Update documentation when there’s new API, functionality etc.
3. Add a note to CHANGELOG.rst about the changes.
4. Add yourself to AUTHORS.rst.

6.4.2 Tips

To run a subset of tests:

tox -e envname -- pytest -k test_myfeature

To run all the test environments in parallel (you need to pip install detox):

detox

[1] If you don’t have all the necessary Python versions available locally, you can rely on Travis; it will run the tests for each change you add in the pull request. It will be slower though...

CHAPTER 7

Authors

• Bas des Tombe - https://github.com/bdestombe


• Bart Schilperoort - https://github.com/BSchilperoort


CHAPTER 8

Changelog

8.1 0.6.3 (2019-04-03)

• Added reading support for zipped Silixa files. It still rarely fails due to an upstream bug.
• pretty __repr__
• Reworked double ended calibration procedure. Integrated differential attenuation outside of reference sections
is now calculated separately.
• New approach for estimation of Stokes variance. Not restricted to a decaying exponential
• Fixed a bug in averaging TMPF and TMPB to TMPW
• Modified residuals plot, especially useful for long fibers (Great work Bart!)
• Example notebooks updated accordingly
• Fixed a bug in to_netcdf when passing encodings
• Better support for sections that are not related to a timeseries.

8.2 0.6.2 (2019-02-26)

• Double-ended weighted calibration procedure is rewritten so that the integrated differential attenuation outside
of the reference sections is calculated separately. Better memory usage and faster
• Other calibration routines cleaned up
• Official support for Python 3.7
• Coverage figures are now trustworthy
• String representation improved
• Include test for aligning double ended measurements
• Example for aligning double ended measurements


8.3 0.6.1 (2019-01-04)

• Many examples were shown in the documentation


• Fixed verbose settings of solvers
• Revised example notebooks
• Moved to 80 characters per line (PEP)
• More Python formatting using YAPF
• Use example of plot_residuals_reference_sections function in Stokes variance example notebook
• Support Python 3.7

8.4 0.6.0 (2018-12-08)

• Reworked the double-ended calibration routine and the routine for confidence intervals. The integrated differ-
ential attenuation is not zero at x=0 anymore.
• Verbose commands carpentry
• Bug fixed that would make the read_silixa routine crash if there are copies of the same file in the same folder
• Routine to read sensornet files. Only single-ended configurations are supported for now. Does anyone have double-ended measurements?
• Lazy calculation of the confidence intervals
• Bug solved. The x-coordinates were not calculated correctly. The bug only appeared for measurements along
long cables.
• Example notebook of importing a timeseries. For example, importing measurements from an external temperature sensor for calibration.
• Updated documentation

8.5 0.5.3 (2018-10-26)

• No changes

8.6 0.5.2 (2018-10-26)

• New resample_datastore method (see basic usage notebook)


• New notebook on basic usage of DataStore
• Support for Silixa v4 (Windows xp based system) and Silixa v6 (Windows 7) measurement files
• The representation string now includes the sections
• Reorganized the IO related files
• CI: Add AppVeyor to continuously test on the Windows platform
• Auto load Silixa files to memory option, if size is small


8.7 0.5.1 (2018-10-19)

• Rewritten the routine that reads Silixa measurement files


• dts-calibration is now citable
• Refactored the MC confidence interval routine
• MC confidence interval routine speed up, with full dask support
• Link to mybinder.org to try the example notebooks online
• Added a few missing dependencies
• The routine to read the Silixa files is completely refactored. Faster, smarter. Supports both the path to a directory and a list of file paths.
• Changed imports from dtscalibration to be relative

8.8 0.4.0 (2018-09-06)

• Single ended calibration


• Confidence intervals for single ended calibration
• Example notebooks have figures embedded
• Several bugs squashed
• Reorganized DataStore functions

8.9 0.2.0 (2018-08-16)

• Double ended calibration


• Confidence intervals for double ended calibration

8.10 0.1.0 (2018-08-01)

• First release on PyPI.

CHAPTER 9

Indices and tables

• genindex
• modindex
• search




Python Module Index

d
dtscalibration, 41

Index

C
calibration_double_ended() (dtscalibration.DataStore method), 42
calibration_single_ended() (dtscalibration.DataStore method), 43
channel_configuration (dtscalibration.DataStore attribute), 43
chbw (dtscalibration.DataStore attribute), 43
chfw (dtscalibration.DataStore attribute), 44
conf_int_double_ended() (dtscalibration.DataStore method), 44
conf_int_single_ended() (dtscalibration.DataStore method), 45

D
DataStore (class in dtscalibration), 41
dtscalibration (module), 41

G
get_default_encoding() (dtscalibration.DataStore method), 46
get_time_dim() (dtscalibration.DataStore method), 46
get_x_dim() (dtscalibration.DataStore method), 46

I
in_confidence_interval() (dtscalibration.DataStore method), 46
inverse_variance_weighted_mean() (dtscalibration.DataStore method), 46
inverse_variance_weighted_mean_array() (dtscalibration.DataStore method), 46
is_double_ended (dtscalibration.DataStore attribute), 46

O
open_datastore() (in module dtscalibration), 50

P
plot_dask() (in module dtscalibration), 52

R
read_sensornet_files() (in module dtscalibration), 51
read_silixa_files() (in module dtscalibration), 52
resample_datastore() (dtscalibration.DataStore method), 46

S
sections (dtscalibration.DataStore attribute), 47

T
temperature_residuals() (dtscalibration.DataStore method), 47
timeseries_keys (dtscalibration.DataStore attribute), 47
to_netcdf() (dtscalibration.DataStore method), 47

U
ufunc_per_section() (dtscalibration.DataStore method), 48

V
variance_stokes() (dtscalibration.DataStore method), 49
variance_stokes_exponential() (dtscalibration.DataStore method), 50