arXiv:1309.5504 [nlin.CD]
Chaos Forgets and Remembers:
Measuring Information Creation, Destruction, and Storage
FIG. 1. A process's I-diagram showing how the past X:0, present X0, and future X1: partition each other into seven distinct information atoms. We focus only on the four regions contained in the present information H[X0] (blue circle). That is, the present decomposes into three components: ρµ (horizontal lines), rµ (vertical lines), and bµ (diagonal crosshatching). The redundant information ρµ overlaps with the past H[X:0]; the ephemeral information rµ falls outside both the past and the future H[X1:]. The bound information bµ is that part of H[X0] which is in the future yet not in the past.

…all the prior observations X:0:

hµ = H[X0 | X:0] ,    (1)

where H[Y | Z] denotes the Shannon conditional entropy of random variable Y given variable Z. This quantity arises in various contexts and goes by many names: e.g., the Shannon entropy rate and the Kolmogorov-Sinai metric entropy, mentioned above [8]. The complement of the entropy rate is the predicted information ρµ:

ρµ = I[X:0 : X0] ,    (2)

where I[Y : Z] denotes the mutual information between random variables Y and Z [12]. Hence, ρµ is the information in the present that can be predicted from prior observations. Together, we have a decomposition of the information contained in the present: H[X0] = hµ + ρµ.

A simple application of the entropy chain rule [12] to Eq. (1) leads us to a different view:

hµ = I[X0 : X1: | X:0] + H[X0 | X:0, X1:]
   = bµ + rµ .    (3)

This introduces two new information measures:

bµ = I[X0 : X1: | X:0]  and    (4)
rµ = H[X0 | X:0, X1:] .    (5)

That is, created information (hµ) decomposes into two parts: information (bµ) shared by the present and the future but not in the past and information (rµ) in the present but in neither the past nor the future.

The rµ component was first studied by Verdú and Weissman [13] as the erasure entropy (their H−) to measure information loss in erasure channels. To emphasize that it is information existing only in a single moment—created and then immediately forgotten—we refer to rµ as the ephemeral information. The second component bµ we call the bound information since it is information created in the present that the system stores and that goes on to affect the future [14]. It was first studied as a measure of "interestingness" in computational musicology by Abdallah and Plumbley [15]. For a more complete analysis of this decomposition, as well as computation methods and related measures, see Ref. [16].

Isolating the information H[X0] contained in the present and identifying its components provides the partitioning illustrated in Fig. 1. This is a particularly intuitive way of thinking about the information contained in an observation. While some behavior (ρµ) can be predicted, the rest (hµ = bµ + rµ) cannot. Of that which cannot be predicted, some (bµ) plays a role in the future behavior and some (rµ) does not. As such, this is a natural decomposition of a time series, one that results in a semantic dissection of the entropy rate.

By way of an example, consider a few simple processes and how their present information decomposes into these three components. A periodic process of alternating 0s and 1s (…01010101…) has H[X0] = 1 bit since 0s and 1s occur equally often. Given a prior observation, one can accurately predict exactly which symbol will occur next and so H[X0] = ρµ = 1 bit, while rµ = bµ = 0 bits. On the other extreme is a fair coin flip. Again, each outcome is equally likely and so H[X0] = 1 bit. However, each flip is independent of all others and so H[X0] = rµ = 1 bit, while ρµ = bµ = 0 bits.

Between these two extrema lie interesting processes: those with stochastic structure. Processes expressing a fixed template, like the periodic process above, contain a finite amount of information. Those with stochastic structure, however, constantly generate information and store it in the form of patterns. Being neither purely predictable nor independently random, these patterns are captured by bµ. The more intricate the organization, the larger bµ. More to the point, generating these patterns requires intrinsic computation in a system—information creation, storage, and transformation [17]. We propose bµ as a simple method of discovering this type of physical computation: Where there are intricate patterns, there is sophisticated processing.

How useful is the proposed decomposition and its measures?
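To make these definitions concrete before turning to specific systems, here is a minimal Python sketch (not from the paper; the function names and the finite history length L are my own choices) that estimates H[X0], ρµ, hµ, bµ, and rµ from a finite symbolic time series by counting blocks. The semi-infinite past and future are truncated to L symbols, so the estimates approach the quantities defined above only as L and the amount of data grow.

```python
from collections import Counter
import numpy as np

def _entropy(counter):
    # Shannon entropy (bits) of an empirical distribution given as a Counter.
    counts = np.array(list(counter.values()), dtype=float)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def block_entropy(s, length):
    # Entropy of length-`length` blocks of the symbol sequence s.
    if length == 0:
        return 0.0
    return _entropy(Counter(tuple(s[i:i + length]) for i in range(len(s) - length + 1)))

def anatomy(s, L=8):
    # Finite-history estimates (bits per step) of H[X0], rho, h, b, r.
    H1 = block_entropy(s, 1)
    h = block_entropy(s, L + 1) - block_entropy(s, L)           # ~ H[X0 | past]
    W = 2 * L + 1
    windows = [tuple(s[i:i + W]) for i in range(len(s) - W + 1)]
    context = Counter(w[:L] + w[L + 1:] for w in windows)       # past and future with the present removed
    r = _entropy(Counter(windows)) - _entropy(context)          # ~ H[X0 | past, future]
    return {"H[X0]": H1, "rho": H1 - h, "h": h, "b": h - r, "r": r}

# The two extreme processes discussed in the text:
print(anatomy([0, 1] * 5000, L=4))                         # period-2: rho ~ 1, h = b = r ~ 0
print(anatomy(list(np.random.randint(0, 2, 10000)), L=4))  # fair coin: h = r ~ 1, rho = b ~ 0
```

The two sanity checks reproduce the worked examples above: the alternating process has all of its bit of present information in ρµ, while the fair coin has it all in rµ.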
FIG. 2. Logistic map information anatomy as a function of control parameter a: Bound information bµ is the lower (green shaded) component; ephemeral information rµ is the upper (blue shaded) component. Entropy rate is the top (blue) line: hµ = bµ + rµ. As reference to the dynamical behavior, the map's bifurcation diagram is displayed in the background.

FIG. 3. Tent map information anatomy: Although hµ = bµ + rµ is a smooth function of control—hµ = log2 a—the decomposition into bound and ephemeral informations is not. Graphics layout as in previous figure.
We first consider the Logistic map,

xn+1 = a xn (1 − xn) ,    (6)

where a ∈ [0, 4] is the control parameter and the initial condition is x0 ∈ [0, 1]. Its generating partition is defined by:

sn = 0 if xn < 1/2 , and sn = 1 if xn ≥ 1/2 .    (7)

Figure 2 shows the resulting measures as a function of control a, with the map's bifurcation diagram displayed in the background for reference. The first point of interest is that the system's infor-

The Tent map is xn+1 = a min(xn, 1 − xn), where a ∈ [0, 2] is the control parameter. The generating partition for the Tent map is the same as for the Logistic map. Since the Tent map is piecewise linear, its Lyapunov exponent is simply λ = log2 a and, by Pesin's theorem [18], so is the information generation hµ = log2 a; a rather smooth parameter dependence. As a result, the intricate structures exhibited in the Tent map's bifurcation diagram cannot be resolved by studying solely the behavior of the Lyapunov exponent (or hµ) itself.
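As a concrete illustration of the two ingredients just described, the generating partition of Eq. (7) and the Lyapunov-exponent route to hµ, here is a small Python sketch (my own, not from the paper; the parameter choice a = 3.8 and the function names are illustrative). It symbolizes a Logistic-map orbit and estimates the map's Lyapunov exponent in bits per step, which a Pesin-type identity equates with hµ in the chaotic regime.

```python
import numpy as np

def logistic(x, a):
    return a * x * (1.0 - x)

def symbolize(orbit):
    # Generating partition of Eq. (7): s_n = 0 if x_n < 1/2, else 1.
    return (np.asarray(orbit) >= 0.5).astype(int)

def lyapunov_bits(a, n=200_000, burn=1_000, x0=0.3):
    # lambda = < log2 |f'(x_n)| > with f'(x) = a (1 - 2x), averaged over an orbit.
    x = x0
    for _ in range(burn):
        x = logistic(x, a)
    total = 0.0
    for _ in range(n):
        total += np.log2(abs(a * (1.0 - 2.0 * x)))
        x = logistic(x, a)
    return total / n

a = 3.8
x, orbit = 0.3, []
for _ in range(10_000):
    x = logistic(x, a)
    orbit.append(x)

print("symbol sequence:", symbolize(orbit)[:20], "...")
print("Lyapunov exponent (bits/step):", lyapunov_bits(a))
```

Feeding such symbol sequences to a block-entropy estimator like the earlier sketch is one route to the curves plotted in Figs. 2 and 3.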
Figure 3 demonstrates that, despite the entropy rate's simple logarithmic dependence on control, its decomposition hµ = bµ + rµ is not a smooth function of a. To emphasize, in sharp contrast with hµ's simplicity, rµ and bµ again appear nondifferentiable—a complexity masked by the smooth hµ. Thus, the two informational components capture a property in the chaotic system's behavior that is both quantitatively and qualitatively new. As with the Logistic map, we once again find that the bound information vanishes and that all of the information the Tent map generates is forgotten (hµ = rµ) at parameters corresponding to mergings of chaotic bands (a = 2^(1/2^k), k = 0, 1, 2, …). In the Supplementary Materials we show how to calculate bµ and rµ in closed form for the Tent map at Misiurewicz parameters.
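For reference, the band-merging parameters just quoted, and the entropy rate the text assigns to them, are easy to tabulate; per the text, at these parameters hµ = rµ and bµ = 0 (a trivial sketch of the formula a = 2^(1/2^k), hµ = log2 a = 2^(−k), not code from the paper):

```python
import math

# Band-merging parameters of the Tent map and the entropy rate there.
for k in range(5):
    a_k = 2.0 ** (1.0 / 2 ** k)
    print(f"k={k}: a = {a_k:.6f}, h_mu = log2(a) = {math.log2(a_k):.6f} bits/step")
```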
To explore how these measures apply more generally, we extend information anatomy to two dimensions by analyzing the Lozi map,

xn+1 = 1 − a |xn| + yn ,   yn+1 = b xn ,

with controls a and b.

FIG. 4. Lozi map information anatomy: (Left) hµ as a function of controls a and b. (Right) bµ similarly. bµ is maximized on the upper-right and lower-right edges of the a–b region that supports an attractor near the origin.

…mergings of chaotic bands. Notably, while the maximal hµ occurs along the line b = 0, maximal bµ occurs far from b = 0. Hence, large hµ does not necessarily imply large bound information bµ.

To sum up, we showed that a process's information creation rate decomposes, via a chain rule, into two structurally meaningful components. The components, the ephemeral information rµ and the bound information bµ, provide direct insights into a system's behavior without detailed modeling or appealing to domain-specific knowledge. That is to say, they are relatively easily defined measures that can be straightforwardly estimated. More to the point, however, bµ is a strong indicator of intrinsic computation. While related to information generation, we demonstrated that it captures a different kind of informational processing—a mechanism that actively stores information.

Concretely, decomposing information creation in the symbolic dynamics of the Logistic, Tent, and Lozi systems delineated the topography of their intrinsic-computation landscape. Awareness of this rich (and previously hidden) landscape will lead to improved engineering of natural systems as substrates for information processing [11]. And, it will lead to an expanded understanding of evolved information processing systems, such as the linguistic processes comprising human natural languages. A sequel will develop the decomposition further, including a geometric interpretation of active information storage that parallels the geometric view of information creation expressed in the Lyapunov exponents.

KB is supported by a UC Davis Chancellor's Post-Doctoral Fellowship. This work was partially supported by ARO grant W911NF-12-1-0288.
[5] S. H. Strogatz. Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering. Westview Press, 1994.
[6] J. V. José and E. J. Saletan. Classical Dynamics: A Contemporary Approach. Cambridge University Press, New York, 1998.
[7] R. F. Streater. Statistical Dynamics: A Stochastic Approach to Nonequilibrium Thermodynamics. Imperial College Press, London, second edition, 2009.
[8] P. Gaspard and X.-J. Wang. Noise, chaos, and (ε, τ)-entropy per unit time. Physics Reports, 235(6):291–343, 1993.
[9] A. N. Kolmogorov. Entropy per unit time as a metric invariant of automorphisms. Dokl. Akad. Nauk. SSSR, 124:754, 1959. (Russian) Math. Rev. vol. 21, no. 2035b.
[10] C. E. Shannon. A mathematical theory of communication. Bell Sys. Tech. J., 27:379–423, 623–656, 1948.
[11] W. L. Ditto, A. Miliotis, K. Murali, S. Sinha, and M. L. Spano. Chaogates: Morphing logic gates that exploit dynamical patterns. Chaos, 20(3):037107, 2010.
[12] R. W. Yeung. Information Theory and Network Coding. Springer, New York, 2008.
[13] S. Verdú and T. Weissman. The information lost in erasures. IEEE Trans. Info. Th., 54(11):5030–5058, 2008.
[14] Our terminology avoids the misleading use of the phrase "predictive information" for bµ. The latter is not the amount of information needed to predict the future. Rather, it is part of the predictable information—that portion of the future which can be predicted.
[15] S. A. Abdallah and M. Plumbley. Information dynamics: Patterns of expectation and surprise in the perception of music. Connection Science, 21(2):89–117, June 2009.
[16] R. G. James, C. J. Ellison, and J. P. Crutchfield. Anatomy of a bit: Information in a time series observation. Chaos, 21(3):1–15, 2011.
[17] J. P. Crutchfield and K. Young. Inferring statistical complexity. Phys. Rev. Lett., 63:105–108, 1989.
[18] Y. B. Pesin. Characteristic Lyapunov exponents and smooth ergodic theory. Russ. Math. Surveys, 32(4):55–114, 1977.
[19] Window width is adaptively chosen in inverse proportion to the LCE. When the latter is low we use a longer window than when the system is fully chaotic. The minimum window width of L = 31 and adaptive widths were chosen so that numerical estimates varied by less than 0.01% when the width is incremented.
Supplementary Material:
Chaos Forgets and Remembers: Measuring Information Creation, Destruction, and Storage

Ryan G. James, Korana Burke, and James P. Crutchfield

…of Eq. (4).

We explicitly implement this calculation for one parameter value of the Tent map. In particular, consider the Misiurewicz point where f^4(1/2) = f^5(1/2). Solving this constraint gives parameter value:

a = α + 2/(3α)    (12)
  = 1.76929235… ,

where α = ( √(19/27) + 1 )^(1/3). There, the Tent map admits a Markov partition [S3], as Fig. 5 demonstrates. From this, a Markov chain is constructed and the generating parti-

…Misiurewicz parameter a has the following information measures (in bits per step):

hµ = log2 a = log2 [ ( (9 + √57)^(1/3) + (9 − √57)^(1/3) ) / 3^(2/3) ]
   = 0.823172…

rµ = (1/4) [ 3 − 2/(a + 1) − 4/(a + 2) + 9/(2a + 3) ]

FIG. 7. The (unifilar) ε-machine for the hidden Markov model of Fig. 6 (right). That is, for each state there is at most a single…
of Fig. 6(right). That is, for each state there is at most a single 3