Application to Forecasting
sets such as the smart meter data.
Typically, the raw time series data are not immediately applicable for analytics purposes, and several operations are required for cleaning and spatio-temporal alignment. In the proposed data model, the concept of TimeSeries is also used to support and trace such operations, which can be calculated on the fly where possible or in batch processes using the materialization concept. Some examples are:

TimeSeriesWeightedSum: Time series whose value $x_t$, at time $t$, is the weighted sum of the values of $n$ input time series $y_t^j$, $j = 1, \ldots, n$, at the same time, namely $x_t = \sum_{j=1}^{n} w_j y_t^j$. Typical use cases of weighted sums are spatial aggregations (interpolation of high-resolution weather data, extraction of PV generation in a spatial region, etc.) or applications of power flow equations to derive the electrical load at a substation.

TimeSeriesLagged: Time series whose value $x_t$, at time $t$, is given as the value of a reference time series $y_t$ at time $t - h$, namely $x_t = y_{t-h}$. Lagged time series are critical for representing delayed dynamical effects in statistical models. Note that the combination of lags and weighted sums can be used to represent quite complex time series models.

TimeSeriesWindowed: Represents a time series with values $x_t$, at time $t$, resulting from an operation applied to the values of an input time series $y$ at times within a trailing window ending at $t$. Use cases for such an operator are time interpolation and integration, when raw data come at irregular sampling intervals or a moving average is required (e.g. SCADA gives instantaneous power but we are interested in modelling energy). An equally important use case is the computation of summary statistics of high-resolution data, such as the daily minimum/maximum of temperature or energy consumption, which can be quite useful in building forecasting models, as demonstrated in section 4.

Another important type of TimeSeries is the TimeSeriesPriority, internally composed of a sorted map of TimeSeries indexed according to a priority order. For a given timestamp, by default, the TimeSeriesPriority returns the value from the first TimeSeries in the map where a value is available. Alternative behaviors can be implemented, where the value is returned from the TimeSeries at a specific index, or from the first TimeSeries whose index is above a certain threshold.

The main use case of the TimeSeriesPriority is dealing with rolling-horizon forecasts, which was the primary application behind the development of the proposed data model, as mentioned in section 2. Rolling-window forecasts, e.g. weather forecasts produced by numerical models, are multi-step-ahead forecasts that are refreshed at regular intervals. Such quantities cannot be represented as a one-dimensional time series because there is a one-to-many relation between timestamps and values. One option would be to overwrite the values with the latest available forecasts, at the cost of losing potentially valuable information (the most recent forecasts are not necessarily more accurate) and the traceability of operations between live and batch calculations. By using the TimeSeriesPriority, rolling-window forecasts are represented as multiple TimeSeries indexed by a quantity proportional to the forecasting horizon $h$, where each TimeSeries is $x(t + h \mid t)$ with $h$ fixed. By default, the most recent forecast is returned, but one could instead select the value from forecasts made at least 24 hours ahead, or implement many other alternative behaviors.

An alternative use case of the TimeSeriesPriority is the fallback mechanism between multiple forecasting
models applied to the same energy signal. If the models are prioritised based on some accuracy measure, the TimeSeriesPriority makes it possible to handle temporary issues in one model (e.g. data anomalies or missing inputs) and to fall back transparently to the next available model output. Such a mechanism will be demonstrated in section 4.

Finally, the TimeSeriesFlag is also defined, as a mechanism to associate flags with particular data points of a time series, in order to deal with anomalies or data quality issues. Flagging time series points prevents them from being used, for example, as input to analytical models, and increases the robustness of the live system, for example in the case of faulty autoregressive model features.

The proposed data model is quite general and can be easily extended with other fundamental types of TimeSeries. The listed examples already form the basis for quite a rich grammar able to derive powerful features from the raw data.

3.2 Signals and Entities

One of the main objectives of the proposed data model is to provide context to the existing data with respect to high-level physical entities and types of quantities of interest in the specific domain of application. As a result, as also shown in Fig. 2, two main dimensions are utilised in the system for identifying one or more TimeSeries:

Entity: represents a physical entity of interest. In the context of energy utilities, for example, we have administrative or geographical entities such as State, DistributionUtility, County, Town, and grid assets such as DistributionSubstation, DistributionFeeder, NetworkBus, NetworkBranch, ServicePoint.

SignalType: represents the type of a physical quantity for which observation data can be expected or for which estimated time series data are expected to be required. Some examples are TEMPERATURE, ENERGY_DEMAND, ENERGY_RESIDUAL_DEMAND, ENERGY_GENERATION_PV, ACTIVE_POWER. Signal types are the mechanism for cataloguing time series according to some high-level human-understandable meaning, and for maintaining consistency between data of the same type, for example with respect to UnitMeasure.

The two dimensions of Entity and SignalType are gathered within the concept of Signal, which is a required property of a TimeSeries. The proposed constructs allow the data and more abstract time series available within the system to be navigated with very high-level queries of the type:

    SELECT * FROM TIME_SERIES TS
    INNER JOIN SIGNALS S ON S.ID = TS.SIGNAL
    INNER JOIN SIGNAL_TYPES ST ON ST.ID = S.STYPE
    INNER JOIN ENTITIES E ON E.ID = S.ENTITY
    WHERE E.NAME = 'Substation name'
    AND ST.NAME = 'ENERGY_RESIDUAL_DEMAND'

3.3 Models

Another important component of the proposed data model was designed to represent the output of statistical models, which relies on yet another type of time series, the TimeSeriesModelled. The details of the model are represented through the following entities:

ModelClass: Specifies the structure of the model, in terms of the required set of inputs, as pairs of SignalType and variable name, and the SignalType of the output.

Model: A realization of a ModelClass, with specific values for the parameters and trained to model a specific Signal (pair SignalType/Entity). The parameters are stored in an XML file following the principles of the Predictive Model Markup Language (PMML) (footnote 1).

ModelInstance: An instantiation of a Model, where each required input, a CovariateInstance, is linked to a specific TimeSeries.

Such a representation of the analytical models is powerful in supporting the data scientist's offline tasks of model design and training: the relevant data can be transparently extracted and aligned by relying on queries of the type given in section 3.2, and derived features can be designed by using the abstract fundamental operations described in section 3.1. Moreover, when dealing with thousands of statistical models, the system makes it quite easy to navigate between models to verify performance and retrain them where needed.

4. SYSTEM DEMONSTRATION

In this section, the application of our system to short-term forecasting of electricity demand is demonstrated. For confidentiality reasons, the demonstration uses electricity demand data for Vermont from January 1st, 2012 to August 31st, 2015, obtained from the website of ISO New England (footnote 2). The data came in hourly format, with the measurements describing energy usage (in MWh) over the previous hour. When ingesting the data into our database, the time stamps were converted to Eastern Standard Time (EST). Three classes of covariates were used in the forecasting model: weather data, calendar variables, and lagged demand values. Next, the workflow that was used for designing and training forecasting models is described, and the configuration of the system to apply forecasting models in a live environment is demonstrated.

4.1 Modelling

For the modelling and forecasting of electricity demand, we used Generalized Additive Models (GAMs), a popular class of non-linear regression models that represent the effect of covariates in an additive fashion. For more background on GAMs and their application to electricity demand data, we refer to [13]. Note that, in principle, the proposed data model would support any other class of regression or classification methods, as it only describes the inputs and outputs of the models, but not the exact form of the functional relation. The mgcv package in R was used for training GAMs (see [14]), as part of the following general workflow:

1. A training data frame is compiled by querying historical electricity demand data and the covariates aligned with it. For the Vermont data, the choice of the covariates had already been defined and implemented through abstract time series in the data model. Since the relation between the model and covariates is also represented in our data model, the compilation of the training data frame could be done fully automatically.

Footnote 1: http://dmg.org/
Footnote 2: http://iso-ne.com
iterative fashion - new model covariates are derived. Some of the key operations supported by our data model are, e.g., the registration of new predictive models for a given entity, and the linking of new time series to the model covariates.

[Figure: node labels recovered from the diagram: "Signal 6792, ENERGY DEMAND MEAN, ISO-NE Vermont"; "Ts 360978, TS PRIORITY, Short-term forecast of ISO-NE Vermont"; "Ts 360979, TS MODELLED, Output of S ISO NE Vermont mean demand A5 v0.pmml"; "Ts 43775, TS CATEGORICAL UNSTRUCTURED, hour"]
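
The derived time series types of section 3.1 (weighted sum, lag, window, priority) can be illustrated with a minimal Python sketch. This is not the implementation of the described system: the function names, the use of integer time steps, the integer window width, and the convention that a missing value is represented by None are all assumptions made here for illustration only.

```python
from typing import Callable, Dict, List, Optional, Sequence

# Assumption: a series is a function from an integer time step to a value,
# returning None when no value is available at that time.
Series = Callable[[int], Optional[float]]


def from_dict(data: Dict[int, float]) -> Series:
    """Wrap raw observations (hypothetical in-memory storage) as a series."""
    return lambda t: data.get(t)


def weighted_sum(inputs: Sequence[Series], weights: Sequence[float]) -> Series:
    """x_t = sum_j w_j * y_t^j; missing if any input is missing at t."""
    def value(t: int) -> Optional[float]:
        vals = [y(t) for y in inputs]
        if any(v is None for v in vals):
            return None
        return sum(w * v for w, v in zip(weights, vals))
    return value


def lagged(y: Series, h: int) -> Series:
    """x_t = y_{t-h}."""
    return lambda t: y(t - h)


def windowed(y: Series, width: int, op: Callable[[List[float]], float]) -> Series:
    """Apply op to the available values of y at times t-width, ..., t."""
    def value(t: int) -> Optional[float]:
        vals = [y(s) for s in range(t - width, t + 1)]
        present = [v for v in vals if v is not None]
        return op(present) if present else None
    return value


def priority(ranked: Sequence[Series]) -> Series:
    """Return the value of the first series, in priority order, available at t."""
    def value(t: int) -> Optional[float]:
        for y in ranked:
            v = y(t)
            if v is not None:
                return v
        return None
    return value


# Usage with made-up numbers.
load_a = from_dict({0: 1.0, 1: 2.0, 2: 3.0})
load_b = from_dict({0: 10.0, 1: 20.0, 2: 30.0})
total = weighted_sum([load_a, load_b], [0.5, 0.25])
previous = lagged(load_a, 1)
peak = windowed(load_a, 2, max)

primary = from_dict({0: 5.0})            # preferred model, output missing at t=1
fallback = from_dict({0: 6.0, 1: 7.0})   # next model in the priority order
forecast = priority([primary, fallback])
```

Because each operator returns another series, the operators compose: a lagged weighted sum or a windowed maximum of a priority series is built by nesting calls, which mirrors the "grammar" of derived features described in the text. The priority example also shows the fallback use case: at a time step where the preferred model has no output, the value of the next model is returned.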