Email: {wolf}@ece.gatech.edu
Abstract: In this paper, we propose a novel framework, called Hierarchical MDP framework for Compact System-level Modeling (HMCSM), for design and implementation of adaptive embedded signal processing systems. The HMCSM framework applies Markov decision processes (MDPs) to enable autonomous adaptation of embedded signal processing under multidimensional constraints and optimization objectives. The framework integrates automated, MDP-based generation of optimal reconfiguration policies, dataflow-based application modeling, and implementation of embedded control software that carries out the generated reconfiguration policies. HMCSM systematically decomposes a complex, monolithic MDP into a set of separate MDPs that are connected hierarchically, and that operate more efficiently through such a modularized structure. We demonstrate the effectiveness of our new MDP-based system design framework through experiments with an adaptive wireless communications receiver.

I. INTRODUCTION

Modern signal processing applications impose increasing demands of adaptivity, flexibility and reconfigurability. On one hand, adaptive signal processing systems must adjust to dynamically-changing environmental conditions, system status or user requirements; on the other hand, the systems must often satisfy stringent constraints on energy-efficiency and real-time performance.

In this work, we apply Markov decision processes (MDPs) to address such challenges. MDPs have been used in many application areas as a foundation for dynamic determination of system configurations in stochastic environments. For example, Boutilier et al. propose factored MDPs as a method for compact representation of large, structured MDPs [1]. Benini et al. introduce a finite-state, abstract system model for power-managed systems [2]. However, the complexity of the MDP algorithms in general grows exponentially with increases in the size of the state space. Jonsson and Barto present an algorithm that performs hierarchical decomposition of factored MDPs to help alleviate this growth in complexity [3]. However, their development of hierarchical MDPs is focused on algorithms and theoretical analysis for state abstraction and MDP computation, and the connection to implementation of the hierarchical MDPs and application to real-world systems is not addressed. One objective of this paper is to help bridge this gap in the context of embedded signal processing systems.

In particular, in this work, we integrate the MDP schemes presented in [4] and [3]. This results in a novel approach to formulating MDPs for policy optimization in embedded signal processing systems with complex state spaces, and stringent implementation constraints. In our proposed design framework, we apply hierarchical MDPs to decompose the modeling of the application and embedded processing system into multiple MDPs. Each smaller MDP is formulated using an approach similar to that developed in [2]. We refer to this hybrid MDP approach as the Hierarchical MDP approach for Compact System-level Modeling (HMCSM).

To promote systematic derivation of embedded implementations using the HMCSM approach, we integrate the approach into the framework of dataflow-based design of signal processing systems and develop a dataflow-based framework for design and implementation of adaptive signal processing systems using HMCSM.

To demonstrate the efficiency and flexibility of the proposed design framework and the corresponding libraries and tools, we implement an adaptive wireless communication receiver that dynamically optimizes its system configuration in response to changes in different use cases. As a key part of the receiver, the channelizer accounts for most of the computational complexity and energy consumption [5]. By adapting the configuration of the channelizer based on the communication scenario, we seek to optimize its energy efficiency while ensuring its correct functionality at any given time. We design an HMCSM MDP to perform this adaptation, and apply our dataflow-based MDP implementation framework to realize the resulting adaptive signal processing system on a state-of-the-art embedded platform.
Fig. 1: An illustration of the HMCSM framework for design and implementation of adaptive signal processing systems.
direction for this work is the development of tools to help automate the factoring and decomposition process.

The Configuration Control Machine (CCM), shown in the lower (run time) portion of Figure 1, is used to manage dynamic system-level reconfiguration throughout operation of the embedded signal processing system. Here, by dynamic reconfiguration, we mean changes to any system-level parameters, including software, platform, and algorithmic parameters, that can be manipulated while the system is executing. At run-time, the CCM executes periodically and we refer to each periodic execution of the CCM as a reconfiguration round. During a reconfiguration round, the CCM determines, based on the current environmental state and system state, whether or not to perform a dynamic reconfiguration operation and the specific reconfiguration operation if reconfiguration is to be applied to the system. The blocks labeled System Sensor, System State Model, Environmental Sensor, and Environmental State Model represent measurements and models that are used by the CCM to determine the system and environmental state during a given reconfiguration round.

The block labeled Control Actions, in the design time portion, encompasses the set of possible reconfiguration operations that can be applied by the CCM in a given reconfiguration round. If A denotes the set of control actions, then the CCM can be viewed as an implementation of a function P : S_e × S_s → A, where S_e is the set of environmental states, and S_s is the set of system states. This function P is referred to as the reconfiguration policy in an adaptive signal processing system that is developed using the HMCSM framework. In HMCSM, the policy is applied at run-time, and derived at design-time. It is derived using an MDP Solver, which is a software module that automatically generates optimized policies from MDP model specifications.

The Policy Mapping Engine, shown near the center of Figure 1, translates control actions into updates to dynamic parameters in the embedded software that achieve the intended actions. These parameter updates are made by setting appropriate variables in an implementation of the application dataflow graph that is developed using LIDE (see Section II-B). This dataflow graph implementation is represented by the block labeled Parameterized LIDE Implementation, and the software tool that we use to construct this implementation is represented by the block labeled LIDE Library.

In the HMCSM framework, each control action is formulated in terms of specific changes to specific parameters in the parameterized dataflow application model M. Execution of the control action A at run-time involves setting each parameter to the corresponding value such that subsequent operation of M will be performed using these new parameter settings. Operation continues with the new parameter settings until a new control action is applied to the system (in some subsequent reconfiguration round).

Besides the aforementioned Control Actions, the Stochastic Models of Environment and System and Reward Function are the other two main components in the formulation of an MDP in HMCSM (see Figure 1). These two components are developed by hand using well-established foundations of MDP modeling, along with domain knowledge of the targeted application. The Stochastic Models of Environment and System include, for each of the two models, the definition of the state space and the state transition matrix (STM). The STM is a stochastic matrix that defines the probability of the transition to the next state given the existing state, and conditioned on a given action. The Reward Function maps state-action pairs into scores that assess the utility of performing the associated control action during the given state. The approach for incorporating multidimensional design objectives into the scores produced by the reward function is provided in [4]. In our wireless communications case study, which we present in Section IV, the specific design objectives that we target are (a) energy consumption and (b) probability of packet collisions.

In summary, the HMCSM framework presented in this section provides a comprehensive methodology and supporting tools for design and implementation of adaptive embedded signal processing systems. The specific tools that we apply in our demonstration of the framework are MDPSOLVE [11] and LIDE, which correspond to the blocks labeled MDP Solver and LIDE Library, respectively, in Figure 1. However, the framework is not intended to be specific to these tools, and can readily be adapted to other tools for solving MDPs and implementing parametric dataflow graphs, respectively.
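As a rough illustration of how the design-time and run-time parts described in this section fit together, the Python sketch below solves a toy MDP by value iteration and then uses the resulting table as a reconfiguration policy (a map from environmental state and system state to a control action) in one CCM round. Everything specific in it is an assumption made for illustration only: the two traffic states, the two configurations, the transition probabilities, the cost values and weights, the discount factor, and the names `solve_policy` and `ccm_round`. It stands in for a generic MDP solver and is not the MDPSOLVE or LIDE interface.

```python
from itertools import product

# Hypothetical toy model: every state, probability, and reward below is illustrative.
ENV_STATES = ["low_traffic", "high_traffic"]   # environmental states
SYS_STATES = ["one_channel", "two_channels"]   # system states
ACTIONS = ["one_channel", "two_channels"]      # control actions = next configuration

# Environment STM: P(next env state | current env state), assumed action-independent.
ENV_STM = {
    "low_traffic": {"low_traffic": 0.9, "high_traffic": 0.1},
    "high_traffic": {"low_traffic": 0.2, "high_traffic": 0.8},
}
COLLISION = {
    ("low_traffic", "one_channel"): 0.10, ("low_traffic", "two_channels"): 0.05,
    ("high_traffic", "one_channel"): 0.80, ("high_traffic", "two_channels"): 0.20,
}
ENERGY = {"one_channel": 0.2, "two_channels": 0.5}
W_COLLISION, W_ENERGY, SWITCH_COST = 1.0, 1.0, 0.05  # designer-chosen weights (assumed)


def reward(env, sys_state, action):
    """Score a state-action pair as a negated weighted sum of the two costs."""
    cost = W_COLLISION * COLLISION[(env, action)] + W_ENERGY * ENERGY[action]
    if action != sys_state:
        cost += SWITCH_COST  # small assumed penalty for reconfiguring
    return -cost


def solve_policy(gamma=0.9, tol=1e-9):
    """Value iteration; returns a table mapping (env, sys_state) to an action."""
    states = list(product(ENV_STATES, SYS_STATES))
    value = {s: 0.0 for s in states}
    while True:
        new_value, policy = {}, {}
        for env, sys_state in states:
            q_values = {}
            for action in ACTIONS:
                # Assumed deterministic system model: next system state == action.
                future = sum(p * value[(nxt, action)] for nxt, p in ENV_STM[env].items())
                q_values[action] = reward(env, sys_state, action) + gamma * future
            best = max(q_values, key=q_values.get)
            policy[(env, sys_state)] = best
            new_value[(env, sys_state)] = q_values[best]
        delta = max(abs(new_value[s] - value[s]) for s in states)
        value = new_value
        if delta < tol:
            return policy


def ccm_round(policy, env_state, sys_state):
    """One reconfiguration round: return the new configuration, or None to keep the current one."""
    action = policy[(env_state, sys_state)]
    return None if action == sys_state else action


policy = solve_policy()                                   # design time
print(ccm_round(policy, "high_traffic", "one_channel"))   # run time
```

The policy is computed once and stored as a plain lookup table, so the run-time cost of a reconfiguration round in this sketch is a single dictionary access, which mirrors the split between design-time derivation and run-time application of the policy.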
IV. CASE STUDY

A. Background

In this section, we describe a detailed case study as an illustrative example of the methodology introduced in Section III. In addition to providing a demonstration of our proposed HMCSM design framework on a practical, adaptive signal processing application, the case study also serves as a demonstration of a novel, application-specific, MDP-based system design.

B. Adaptive Receiver Architecture

Our case study is a wireless receiver for a Low Power Wide Area Network (LPWAN) used in a “Smart Cities” Internet of Things (IoT) application [12]. We propose an adaptive LPWAN receiver that dynamically adjusts the system bandwidth continually, and periodically transmits the new bandwidth setting to end nodes through the use of a downlink beacon. This case study implements the physical layer signal processing for such an adaptive receiver. The implemented architecture consists of reconfigurable channelizer and baseband processing algorithms.

In this work, we build on the MDP-based dynamically reconfigurable channelizer presented in [4]. Our work on this case study and the underlying HMCSM framework developed in this paper goes beyond the developments of [4] by (1) our application of hierarchical MDP techniques; (2) integrating MDP-based control with model-based methods for signal processing system implementation; (3) expanding the role of the MDP to control not just the channelizer configuration, but also the system bandwidth; and (4) comparing a baseline MDP framework with our proposed hierarchical MDP approach that reduces the model size.

C. Application Specification

Figure 2 shows a block diagram of the adaptive receiver architecture that is investigated in this case study. The channelizer is used to separate the incoming wideband signal into multiple data streams, where each of the data streams is associated with a distinct channel. Each channel is then oversampled for symbol timing recovery, and then processed by a Generalized Likelihood Ratio Test (GLRT) detector, which looks for the transmission preamble. Once a detection is successful, a matched filter demodulator recovers the transmitted data and confirms it with an error detection function (e.g., CRC32).

We compare the relative merits of two separate MDP schemes, labeled MDP-I and MDP-II. These two schemes are illustrated together in Figure 2. MDP-I consists of a single MDP employed for control of both the dynamic bandwidth as well as the channelizer processing configuration. MDP-II splits the modeling into two MDPs arranged in a hierarchy. These two MDPs are labeled MDP-II-a and MDP-II-b.

Fig. 2: Block diagram of receiver signal processing and two MDP schemes.

The channelizer can be implemented by one of two options, as detailed in [4]: a bank of M polyphase decimators and mixers (labeled as DCM[1], DCM[2], ..., DCM[M]) or a Discrete Fourier Transform Filter Bank (labeled as DFTFB). Both options are capable of meeting the processing requirements of the channelizer subsystem. However, they do so using different algorithmic means and the relative efficiency of one option compared to the other is highly platform- and implementation-dependent. Using our proposed MDP-based approach, the reconfiguration policy is systematically optimized for the exact processing characteristics of a specific platform (e.g., measured power consumption).

D. PSDF Model

We implement the channelizer subsystem as a PSDF design with dynamic parameters. Our PSDF model of the channelizer system is illustrated in Figure 3. Here, M is a static parameter of the application that represents the total number of available channels.

At run-time, the MDP-generated reconfiguration policy determines how many channels to enable, based on the data rate, and whether to apply DCM processing or DFTFB processing. These policy decisions are used to manipulate a set of dynamic parameters {Y_1, Y_2, ..., Y_M} that is associated with M distinct DCM actors DCM[1], DCM[2], ..., DCM[M], respectively. The policy decisions are also used to manipulate a parameter Z that is associated with an actor labeled DFTFB, which represents DFTFB processing on all of the enabled channels. Each of these (M + 1) parameters is binary valued. In particular, for each i ∈ {1, 2, ..., M}, if DCM processing is enabled, and the ith channel is enabled, then Y_i = 1 and Z = 0. Conversely, if DFTFB processing is enabled, then Z = 1, and Y_i = 0 for all i.

After each reconfiguration round, updated values of {Y_1, Y_2, ..., Y_M} and Z are propagated to the adaptive channelizer flowgraph through the Init Graph, as illustrated in Figure 3. The channelizer flowgraph is parameterized in terms of these (M + 1) dynamic parameters and encapsulated within the Body Graph, as shown in the figure. The Init Graph and
[Fig. 3: PSDF model of the channelizer system, showing the Init Graph, which receives the optimized parameters {Y_1, Y_2, ..., Y_M, Z} from the Policy Mapping Engine, and the Body Graph, which contains the actors DCM[1], DCM[2], ..., DCM[M] and DFTFB feeding Channel 1 through Channel M.]

V. EXPERIMENTS

As mentioned in Section IV, we compare the relative merits of a conventional, monolithic MDP approach (labeled MDP-I) with our proposed hierarchical MDP approach (labeled MDP-II) to adaptive signal processing system design.

1) MDP-I: In MDP-I, the (single) MDP state space consists of the instantaneous rate of uplink packets generated by all end nodes, and the configuration of the basestation processor (number of channels enabled and channelizer configuration). The action space consists of the next configuration of the basestation processor (number of channels enabled and channelizer configuration).

The STM for the packet generation rate is computed a priori from the traffic rate measured in a design-time simulation. The STM for the processing system is obtained from the dynamics of the implementation on the reference platform.

The MDP reward metric is a linear combination of two competing metrics: the probability of packet collision and the power consumption expended in basestation processing. This linear combination of conflicting metrics tasks the MDP with finding an optimal trade-off at any time based on relative weightings that are provided by the designer for the two metrics.

2) MDP-II: In MDP-II, the control task is split across two hierarchically arranged MDPs: MDP-II-a and MDP-II-b. MDP-II-a is used to determine the optimal number of channels to enable at a given time, while being agnostic to what processing configuration is actually used in the receiver to implement that configuration. MDP-II-b is used to determine the optimal processing configuration for an exogenously specified number of channels.

B. Simulation Results

We performed a physical layer signal processing simulation in MATLAB to generate uplink traffic from 1,000 end nodes. The simulation compared the use of the two (adaptive) MDP Schemes and also (non-adaptive) cases where only the fixed channelizer configurations were used. Each run of the simulation generated results of the form shown in Figure 4. In this figure, the MDP (MDP-I) is in control of the number R of receiver channels enabled. Note that when the traffic rate is high, the system increases R in order to reduce packet collisions.
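One way to see why splitting the control task can shrink the model is to count the entries of a dense transition model for the monolithic and hierarchical formulations. The Python sketch below does this under explicitly assumed numbers: ten quantized traffic-rate levels (hypothetical), eight channel-count settings, two processing configurations (DCM or DFTFB), and one dense |S| × |S| matrix per action. The way the state and action spaces of MDP-II-a and MDP-II-b are composed here is also an assumption. The result is only meant to show the trend and is not intended to reproduce the model sizes measured in the experiments.

```python
def stm_entries(num_states, num_actions):
    """Entries in a dense transition model: one |S| x |S| matrix per action."""
    return num_actions * num_states * num_states


# Assumed problem dimensions (illustrative only).
RATE_LEVELS = 10   # hypothetical number of quantized traffic-rate states
CHANNELS = 8       # possible numbers of enabled channels (1 to 8)
CONFIGS = 2        # processing configuration: DCM or DFTFB

# Monolithic (MDP-I style): state = (rate, channels, config); action = next (channels, config).
mono = stm_entries(RATE_LEVELS * CHANNELS * CONFIGS, CHANNELS * CONFIGS)

# Hierarchical (MDP-II style), assumed composition:
#   upper MDP: state = (rate, channels), action = next number of channels
#   lower MDP: state = (channels, config), action = next config
upper = stm_entries(RATE_LEVELS * CHANNELS, CHANNELS)
lower = stm_entries(CHANNELS * CONFIGS, CONFIGS)
hier = upper + lower

print(f"monolithic entries:   {mono}")
print(f"hierarchical entries: {hier}")
print(f"reduction factor:     {mono / hier:.2f}")
```

Because the entry count grows with the square of the number of states, removing one state variable from the larger of the two MDPs accounts for most of the reduction in this sketch.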
The two competing MDP schemes, MDP-I and MDP-II, produced the same control policy. However, a key difference is that the hierarchical MDP reduced the model size from 1.63 MB to 265 kB, which is a factor of 6.1 times smaller than the original size. This reduction becomes especially relevant when housing the MDP model and policy generation code on the processing platform itself. Such an embedded MDP realization is useful because it allows the MDP and generated policy to adapt dynamically based on learned characteristics of the operating environment. Integration of embedded MDP techniques into the HMCSM framework is a useful direction for future work.

MDP-II also provides a benefit in the execution time required for the MDP solver to compute the optimal policy. We applied the MATLAB-based open source solver called MDPSOLVE [11]. The execution times were measured as 294 ms for MDP-I, 50.8 ms for MDP-II-a and 41.5 ms for MDP-II-b. As a result, in a deployment with a fixed processing system that periodically re-computes the control policy in response to a changing external environment, the hierarchical MDP scheme reduces the solver time from 294 ms to 50.8 ms, which is a factor of over 5.7X smaller.

All of the simulation runs that we carried out are compared in Figure 5. Different simulations for the MDP-generated policy were carried out using different relative weightings for the two performance metrics. Simulations were also carried out for fixed signal processing configurations. The square-shaped points in Figure 5 correspond to DCM-based channelizer operation with the number of enabled channels ranging from 1 to 8. The fixed configuration DFTFB case is represented by the triangle-shaped point in the top left region of the figure. In the context of all of the design points evaluated and the two performance metrics, we see from the figure that the MDP-based approach generates a Pareto front. This demonstrates that the adaptive reconfiguration capability provided by the MDP leads to better performance in comparison with fixed configurations for each of the available signal processing algorithms.

Fig. 4: Simulation results for MDP-I. (Panels: traffic rate [packets/sec] and number of RX channels, each plotted against time [hours].)

Fig. 5: Comparison among MDP-generated policies and fixed-configuration designs. (Horizontal axis: probability of packet collision; legend: MDP, DFTFB, DCM.)

VI. CONCLUSIONS

In this paper, we have developed the Hierarchical MDP framework for Compact System-level Modeling (HMCSM) and its application to design and implementation of adaptive embedded signal processing systems. HMCSM provides a structured design methodology that integrates model-based design for embedded signal processing in terms of dataflow methods; MDP formulation using compact, hierarchical models; optimal policy generation from these models at design time; and dynamic, system-level reconfiguration at run time. The framework enables systematic derivation of system-level reconfiguration policies that are based on application-specific functional requirements and operational constraints. Useful directions for future work include adapting the HMCSM framework for use with other parametric dataflow models (beyond PSDF), development of tools to help automate the factoring and hierarchical decomposition of MDPs, and integration of embedded MDP techniques into HMCSM.

ACKNOWLEDGMENTS

This research was sponsored in part by the US National Science Foundation (CNS1514425 and CNS151304).

REFERENCES

[1] C. Boutilier, R. Dearden, and M. Goldszmidt, “Exploiting structure in policy construction,” in Proceedings of the International Joint Conference on Artificial Intelligence, 1995, pp. 1104–1111.
[2] L. Benini, A. Bogliolo, G. A. Paleologo, and G. De Micheli, “Policy optimization for dynamic power management,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 18, no. 6, pp. 742–760, June 1999.
[3] A. Jonsson and A. Barto, “Causal graph based decomposition of factored MDPs,” Journal of Machine Learning Research, vol. 7, pp. 2259–2301, 2006.
[4] A. Sapio, M. Wolf, and S. S. Bhattacharyya, “Compact modeling and management of reconfiguration in digital channelizer implementation,” in Proceedings of the IEEE Global Conference on Signal and Information Processing, December 2016.
[5] S. J. Darak, A. P. Vinod, R. Mahesh, and E. M.-K. Lai, “A reconfigurable filter bank for uniform and non-uniform channelization in multi-standard wireless communication receivers,” in Proceedings of the International Conference on Telecommunications, 2010, pp. 951–956.
[6] B. Bhattacharya and S. S. Bhattacharyya, “Parameterized dataflow modeling for DSP systems,” IEEE Transactions on Signal Processing, vol. 49, no. 10, pp. 2408–2421, October 2001.
[7] B. Bhattacharya and S. S. Bhattacharyya, “Quasi-static scheduling of reconfigurable dataflow graphs for DSP systems,” in Proceedings of the International Workshop on Rapid System Prototyping, Paris, France, June 2000, pp. 84–89.
[8] V. Bebelis, P. Fradet, A. Girault, and B. Lavigueur, “BPDF: A statically analyzable dataflow model with integer and Boolean parameters,” in
[9] K. Desnos, M. Pelcat, J. Nezan, S. S. Bhattacharyya, and S. Aridhi, “PiMM: Parameterized and interfaced dataflow meta-model for MPSoCs runtime reconfiguration,” in Proceedings of the International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation, 2013, pp. 41–48.
[10] C. Shen, W. Plishker, H. Wu, and S. S. Bhattacharyya, “A lightweight dataflow approach for design and implementation of SDR systems,” in Proceedings of the Wireless Innovation Conference and Product Exposition, Washington DC, USA, November 2010, pp. 640–645.
[11] P. L. Fackler, “MDPSOLVE a MATLAB toolbox for solving Markov decision problems with dynamic programming - user’s guide,” North Carolina State University, Tech. Rep., January 2011.
[12] A. Zanella, N. Bui, A. Castellani, L. Vangelista, and M. Zorzi, “Internet of things for smart cities,” IEEE Internet of Things Journal, vol. 1, no. 1, pp. 22–32, 2014.