
Santa Fe Institute Working Paper 13-09-030

arxiv.org:1309.5504 [nlin.CD]
Chaos Forgets and Remembers:
Measuring Information Creation, Destruction, and Storage

Ryan G. James,1, ∗ Korana Burke,1, † and James P. Crutchfield1, 2, ‡


1 Complexity Sciences Center and Physics Department,
University of California at Davis, One Shields Avenue, Davis, CA 95616
2 Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM 87501
(Dated: December 16, 2013)
The hallmark of deterministic chaos is that it creates information—the rate being given by the
Kolmogorov-Sinai metric entropy. Since its introduction half a century ago, the metric entropy has
been used as a unitary quantity to measure a system’s intrinsic unpredictability. Here, we show
that it naturally decomposes into two structurally meaningful components: A portion of the created
information—the ephemeral information—is forgotten and a portion—the bound information—is
remembered. The bound information is a new kind of intrinsic computation that differs fundamen-
tally from information creation: it measures the rate of active information storage. We show that
it can be directly and accurately calculated via symbolic dynamics, revealing a hitherto unknown
richness in how dynamical systems compute.
Keywords: chaos, entropy rate, bound information, Shannon information measures, information
diagram, Tent map, Logistic map, Lozi map

PACS numbers: 05.45.-a 89.75.Kd 89.70.+c 05.45.Tp

The world is replete with systems that generate information—information that is then encoded in a variety of ways: Erratic ant behavior eventually leads to intricate, structured colony nests [1, 2]; thermally fluctuating magnetic spins form complex domain structures [3]; music weaves theme, form, and melody with surprise and innovation [4]. We now appreciate that the underlying dynamics in such systems is frequently deterministic chaos [5, 6]. In others, the underlying dynamics appears to be fundamentally stochastic [7]. For continuous-state systems, at least, one operational distinction between deterministic chaos and stochasticity is found in whether or not information generation diverges with measurement resolution [8]. This result calls back to Kolmogorov's original use [9] of Shannon's mathematical theory of communication [10] to measure a system's rate of information generation in terms of the metric entropy. Since that time, metric entropy has been understood as a unitary quantity. Whether deterministic or stochastic, it is a system's degree of unpredictability. Here, we show that this is far too simple a picture—one that obscures much.

To ground this claim, consider two systems. The first, a fair coin: Each flip is independent of the others, leading to a simple uncorrelated randomness. As a result, no statistical fluctuation is predictively informative. For the second system consider a stock traded via a financial market: While its price is unpredictable, the direction and magnitude of fluctuations can hint at its future behavior. (This, at least, is the guiding assumption of the now-global financial engineering industry.) We make this distinction rigorous here, dividing a system's information generation into a component that is relevant to temporal structure and a component divorced from it. We show that the temporal component captures the system's internal information processing and, therefore, is of practical interest when harnessing the chaotic nature of physical systems to build novel machines and devices [11]. We first introduce the new measures, describe how to interpret and calculate them, and then apply them via a generating partition to analyze several dynamical systems—the Logistic, Tent, and Lozi maps—revealing a previously hidden form of active information storage.

We observe these systems via an optimal measuring instrument—called a generating partition—that encodes all of their behaviors in a stationary process: A distribution Pr(. . . , X−2, X−1, X0, X1, X2, . . .) over a bi-infinite sequence of random variables with shift-invariant statistics. A contiguous block of observations Xt:t+ℓ begins at index t and extends for length ℓ. (The index is inclusive on the left and exclusive on the right.) If an index is infinite, we leave it blank. So, a process is compactly denoted Pr(X:). Our analysis splits X: into three segments: the present X0, a single observation; the past X:0, everything prior; and future X1:, everything that follows.
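As a concrete illustration of this convention (our own sketch, with hypothetical data, not part of the original text), Python slices are likewise inclusive on the left and exclusive on the right:

x = [0, 1, 1, 0, 1, 0, 0, 1]                 # a hypothetical finite stretch of observations
t, ell = 2, 3
block = x[t:t + ell]                         # plays the role of X_{t:t+ell}: inclusive left, exclusive right
past, present, future = x[:4], x[4], x[5:]   # X_{:0}, X_0, X_{1:}, with the "present" placed at index 4
print(block, past, present, future)          # [1, 0, 1] [0, 1, 1, 0] 1 [0, 0, 1]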
The information-theoretic relationships between these three random variable segments are graphically expressed in a Venn-like diagram, known as an I-diagram [12]; see Fig. 1. The rate hµ of information generation is the amount of new information in an observation X0 given all the prior observations X:0:

hµ = H[X0 | X:0] ,    (1)

where H[Y | Z] denotes the Shannon conditional entropy of random variable Y given variable Z. This quantity arises in various contexts and goes by many names: e.g., the Shannon entropy rate and the Kolmogorov-Sinai metric entropy, mentioned above [8]. The complement of the entropy rate is the predicted information ρµ:

ρµ = I[X:0 : X0] ,    (2)

where I[Y : Z] denotes the mutual information between random variables Y and Z [12]. Hence, ρµ is the information in the present that can be predicted from prior observations. Together, we have a decomposition of the information contained in the present: H[X0] = hµ + ρµ. A simple application of the entropy chain rule [12] to Eq. (1) leads us to a different view:

hµ = I[X0 : X1: | X:0] + H[X0 | X:0, X1:]
   = bµ + rµ .    (3)

This introduces two new information measures:

bµ = I[X0 : X1: | X:0]  and    (4)
rµ = H[X0 | X:0, X1:] .    (5)

That is, created information (hµ) decomposes into two parts: information (bµ) shared by the present and the future but not in the past and information (rµ) in the present but in neither the past nor the future.

FIG. 1. A process's I-diagram showing how the past X:0, present X0, and future X1: partition each other into seven distinct information atoms. We focus only on the four regions contained in the present information H[X0] (blue circle). That is, the present decomposes into three components: ρµ (horizontal lines), rµ (vertical lines), and bµ (diagonal crosshatching). The redundant information ρµ overlaps with the past H[X:0]; the ephemeral information rµ falls outside both the past and the future H[X1:]. The bound information bµ is that part of H[X0] which is in the future yet not in the past.

The rµ component was first studied by Verdú and Weissman [13] as the erasure entropy (their H−) to measure information loss in erasure channels. To emphasize that it is information existing only in a single moment—created and then immediately forgotten—we refer to rµ as the ephemeral information. The second component bµ we call the bound information since it is information created in the present that the system stores and that goes on to affect the future [14]. It was first studied as a measure of "interestingness" in computational musicology by Abdallah and Plumbley [15]. For a more complete analysis of this decomposition, as well as computation methods and related measures, see Ref. [16].

Isolating the information H[X0] contained in the present and identifying its components provides the partitioning illustrated in Fig. 1. This is a particularly intuitive way of thinking about the information contained in an observation. While some behavior (ρµ) can be predicted, the rest (hµ = bµ + rµ) cannot. Of that which cannot be predicted, some (bµ) plays a role in the future behavior and some (rµ) does not. As such, this is a natural decomposition of a time series; one that results in a semantic dissection of the entropy rate.

By way of an example, consider a few simple processes and how their present information decomposes into these three components. A periodic process of alternating 0s and 1s (. . . 01010101 . . .) has H[X0] = 1 bit since 0s and 1s occur equally often. Given a prior observation, one can accurately predict exactly which symbol will occur next and so H[X0] = ρµ = 1 bit, while rµ = bµ = 0 bits. On the other extreme is a fair coin flip. Again, each outcome is equally likely and so H[X0] = 1 bit. However, each flip is independent of all others and so H[X0] = rµ = 1 bit, while ρµ = bµ = 0 bits.

Between these two extrema lie interesting processes: those with stochastic structure. Processes expressing a fixed template, like the periodic process above, contain a finite amount of information. Those with stochastic structure, however, constantly generate information and store it in the form of patterns. Being neither purely predictable nor independently random, these patterns are captured by bµ. The more intricate the organization, the larger bµ. More to the point, generating these patterns requires intrinsic computation in a system—information creation, storage, and transformation [17]. We propose bµ as a simple method of discovering this type of physical computation: Where there are intricate patterns, there is sophisticated processing.
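To make the decomposition concrete, the sketch below (our own illustration, not part of the original text) estimates these quantities from a finite sample by truncating the semi-infinite past and future of Eqs. (1)-(5) to k symbols and taking block-entropy differences; the function and variable names are ours. Applied to the two example processes above, it recovers the stated values.

from collections import Counter
from math import log2
import random

def block_entropy(blocks):
    """Shannon entropy (bits) of the empirical distribution over the given blocks."""
    counts = Counter(blocks)
    n = sum(counts.values())
    return -sum(c / n * log2(c / n) for c in counts.values())

def anatomy(symbols, k=4):
    """Estimate H[X0], rho, h, b, r with pasts and futures truncated to k symbols."""
    w = 2 * k + 1
    wins = [tuple(symbols[i:i + w]) for i in range(len(symbols) - w + 1)]
    h0 = block_entropy([s[k] for s in wins])                                             # H[X0]
    h = block_entropy([s[:k + 1] for s in wins]) - block_entropy([s[:k] for s in wins])  # H[X0 | past]
    r = block_entropy(wins) - block_entropy([s[:k] + s[k + 1:] for s in wins])           # H[X0 | past, future]
    return {"H[X0]": h0, "rho": h0 - h, "h": h, "b": h - r, "r": r}

print(anatomy([0, 1] * 50_000))                                  # period-2: H[X0] = rho = 1 bit, h = b = r = 0
print(anatomy([random.getrandbits(1) for _ in range(100_000)]))  # fair coin: H[X0] ≈ h ≈ r ≈ 1 bit, rho ≈ b ≈ 0

Larger truncation lengths k converge toward Eqs. (1)-(5) but demand correspondingly more data, which is why the results below rest on very long samples and an adaptive window width [19].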

FIG. 2. Logistic map information anatomy as a function of control parameter a: Bound information bµ is the lower (green shaded) component; ephemeral information rµ is the upper (blue shaded) component. Entropy rate is the top (blue) line: hµ = bµ + rµ. As reference to the dynamical behavior, the map's bifurcation diagram is displayed in the background.

FIG. 3. Tent map information anatomy: Although hµ = bµ + rµ is a smooth function of control—hµ = log2 a—the decomposition into bound and ephemeral informations is not. Graphics layout as in previous figure.

How useful is the proposed decomposition and its measures? To answer this we analyze several discrete-time chaotic dynamical systems—the Logistic and Tent maps of the interval and the Lozi map of the plane—uncovering a number of novel properties embedded in these familiar and oft-studied systems. As an independent calibration for the measures, we employ Pesin's theorem [18]: hµ is the sum of the positive Lyapunov characteristic exponents (LCEs). The maps here have at most one positive LCE λ, so hµ = max{0, λ}. The symbols s0, s1, s2, . . . , sN for each process we analyze come from a generating partition. We produce a long sample of N ≈ 10^10 symbols, extracting subsequence statistics via a sliding window [19]. Each window consists of a past, present, and future symbol sequence and we estimate rµ and bµ using truncated forms of Eqs. (4) and (5).

Consider first the Logistic map, perhaps one of the most studied chaotic systems:

xn+1 = a xn (1 − xn) ,    (6)

where a ∈ [0, 4] is the control parameter and the initial condition is x0 ∈ [0, 1]. Its generating partition is defined by:

sn = 0 if xn < 1/2,  1 if xn ≥ 1/2 .    (7)

Figure 2 shows the resulting measures as a function of control a, with the map's bifurcation diagram displayed in the background for reference.
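A minimal version of this numerical experiment might look like the following (our own sketch, not the authors' code, and with far less data than the N ≈ 10^10 symbols behind the figures): iterate Eq. (6), symbolize the orbit with the partition of Eq. (7), estimate hµ, bµ, and rµ from truncated block entropies, and compare hµ with the Lyapunov exponent λ = ⟨log2 |a(1 − 2xn)|⟩ as the Pesin calibration suggests.

from collections import Counter
from math import log2
import random

def block_entropy(blocks):
    counts = Counter(blocks)
    n = sum(counts.values())
    return -sum(c / n * log2(c / n) for c in counts.values())

def anatomy(symbols, k=5):
    """h, b, r with the past and future of Eqs. (4)-(5) truncated to k symbols."""
    w = 2 * k + 1
    wins = [tuple(symbols[i:i + w]) for i in range(len(symbols) - w + 1)]
    h = block_entropy([s[:k + 1] for s in wins]) - block_entropy([s[:k] for s in wins])
    r = block_entropy(wins) - block_entropy([s[:k] + s[k + 1:] for s in wins])
    return h, h - r, r

a, N = 3.8, 200_000              # one chaotic parameter value; a modest sample size
x = random.random()
for _ in range(1_000):           # discard a transient
    x = a * x * (1 - x)

symbols, lyap = [], 0.0
for _ in range(N):
    lyap += log2(abs(a * (1 - 2 * x)))   # running sum of log2 |f'(xn)| for the Lyapunov exponent
    x = a * x * (1 - x)                  # Logistic map, Eq. (6)
    symbols.append(0 if x < 0.5 else 1)  # generating partition, Eq. (7)

h, b, r = anatomy(symbols)
print(f"a = {a}: lambda = {lyap / N:.3f}, h = {h:.3f}, b = {b:.3f}, r = {r:.3f}")

Sweeping a over [3.5, 4] and plotting b and r against it reproduces the qualitative structure of Fig. 2; with finite truncation, h and r are mild overestimates of the true rates.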
The first point of interest is that the system's information generation is, in fact, a mixture of ephemeral (rµ) and bound (bµ) informations at nearly all chaotic (hµ > 0) parameter values. The second is that the division into the two components varies in a nontrivial way as a function of the control parameter a. Moreover, the boundary between the two appears nondifferentiable. At first blush, this is not surprising given that their sum hµ (= λ) is known to be nondifferentiable. Finally, bµ vanishes nontrivially only at parameters that coincide with the merging of the chaotic bands (e.g., a = 4.0, 3.67857 . . . , 3.59257 . . . , . . .). Thus, the information generated by the Logistic map at these parameters is entirely forgotten.

Is the complex and nondifferentiable boundary between rµ and bµ simply a consequence of the entropy rate's complicated behavior or due to a dynamical mechanism distinct from information creation? We answer this by analyzing the Tent map:

xn+1 = (a/2) (1 − 2 |xn − 1/2|) ,    (8)

where a ∈ [0, 2] is the control parameter. The generating partition for the Tent map is the same as for the Logistic map. Since the Tent map is piecewise linear, its Lyapunov exponent is simply λ = log2 a and, by Pesin's theorem, so is the information generation: hµ = log2 a; a rather smooth parameter dependence. As a result, the intricate structures exhibited in the Tent map's bifurcation diagram cannot be resolved by studying solely the behavior of the Lyapunov exponent (or hµ) itself. Figure 3 demonstrates that, despite the entropy rate's simple logarithmic dependence on control, its decomposition hµ = bµ + rµ is not a smooth function of a.

To emphasize, in sharp contrast with hµ's simplicity, rµ and bµ again appear nondifferentiable—a complexity masked by the smooth hµ. Thus, the two informational components capture a property in the chaotic system's behavior that is both quantitatively and qualitatively new. As with the Logistic map, we once again find that the bound information vanishes and that all of the information the Tent map generates is forgotten (hµ = rµ) at parameters corresponding to merging of chaotic bands (a = 2^(1/2^k), k = 0, 1, 2, . . .). In the Supplementary Materials we show how to calculate bµ and rµ in closed form for the Tent map at Misiurewicz parameters.

To explore how these measures apply more generally, we extend information anatomy to two dimensions by analyzing the Lozi map:

xn+1 = 1 − a |xn| + yn    (9)
yn+1 = b xn .

The map exhibits an attractor near the origin within a diamond-shaped parameter region inside (a, b) ∈ [1, 2] × [−0.9, 0.9]. Note that when b = 0 the map becomes isomorphic to the Tent map. The generating partition is given by:

sn = 0 if xn < 0,  1 if xn ≥ 0 .    (10)
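For the Lyapunov-exponent side of the Pesin calibration in two dimensions, the largest LCE can be obtained by iterating a tangent vector with the map's piecewise-constant Jacobian [[−a·sgn(xn), 1], [b, 0]]. The sketch below is our own illustration (the parameter values and the initial condition, assumed to lie in the attractor's basin, are ours); at b = 0 it should return log2 a, the Tent-map value.

from math import hypot, log2

def lozi_lce(a, b, n=200_000, transient=1_000):
    """Largest Lyapunov exponent (bits/step) of the Lozi map, Eq. (9), via tangent-vector iteration."""
    x, y = 0.1, 0.1
    u, v = 1.0, 0.0                          # tangent vector
    total = 0.0
    for i in range(transient + n):
        s = -a if x >= 0 else a              # d(1 - a|x| + y)/dx = -a sgn(x)
        x, y = 1.0 - a * abs(x) + y, b * x   # Lozi map, Eq. (9)
        u, v = s * u + v, b * u              # Jacobian evaluated at the previous point
        norm = hypot(u, v)
        u, v = u / norm, v / norm
        if i >= transient:
            total += log2(norm)
    return total / n

for a, b in [(1.7, 0.0), (1.7, 0.5)]:        # b = 0 reduces to the Tent map; (1.7, 0.5) is a standard Lozi attractor
    print(f"a = {a}, b = {b}: lambda = {lozi_lce(a, b):.3f} bits/step (Pesin: h_mu = max(0, lambda))")

The symbolic estimates of hµ and bµ then follow exactly as for the one-dimensional maps, using the partition of Eq. (10) in place of Eq. (7).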
FIG. 4. Lozi map information anatomy: (Left) hµ as a function of controls a and b. (Right) bµ similarly. bµ is maximized on the upper-right and lower-right edges of the a-b region that supports an attractor near the origin.

Figure 4 shows hµ (left) and bµ (right) in the attracting parameter region. Mirroring the Tent map, the Lozi map's entropy rate varies smoothly over the attractor region, whereas bµ varies in a more complicated manner. There are swaths of low bµ corresponding to "fuzzy" mergings of chaotic bands. Notably, while the maximal hµ occurs along the line b = 0, maximal bµ occurs far from b = 0. Hence, large hµ does not necessarily imply large bound information bµ.

To sum up, we showed that a process's information creation rate decomposes, via a chain rule, into two structurally meaningful components. The components, the ephemeral information rµ and the bound information bµ, provide direct insights into a system's behavior without detailed modeling or appealing to domain-specific knowledge. That is to say, they are relatively easily defined measures that can be straightforwardly estimated. More to the point, however, bµ is a strong indicator of intrinsic computation. While related to information generation, we demonstrated that it captures a different kind of informational processing—a mechanism that actively stores information.

Concretely, decomposing information creation in the symbolic dynamics of the Logistic, Tent, and Lozi systems delineated the topography of their intrinsic-computation landscape. Awareness of this rich (and previously hidden) landscape will lead to improved engineering of natural systems as substrates for information processing [11]. And, it will lead to an expanded understanding of evolved information processing systems, such as the linguistic processes comprising human natural languages. A sequel will develop the decomposition further, including a geometric interpretation of active information storage that parallels the geometric view of information creation expressed in the Lyapunov exponents.

KB is supported by a UC Davis Chancellor's Post-Doctoral Fellowship. This work was partially supported by ARO grant W911NF-12-1-0288.

∗ rgjames@ucdavis.edu
† kburke@ucdavis.edu
‡ chaos@ucdavis.edu

[1] E. Bonabeau, G. Theraulaz, and M. Dorigo, editors. Swarm Intelligence: From Natural to Artificial Systems. Oxford University Press, New York, 1999.
[2] S. Camazine, J.-L. Deneubourg, N. R. Franks, J. Sneyd, G. Theraulas, and E. Bonabeau. Self-Organization in Biological Systems. Princeton University Press, New York, 2003.
[3] J. J. Binney, N. J. Dowrick, A. J. Fisher, and M. E. J. Newman. The Theory of Critical Phenomena. Oxford University Press, Oxford, 1992.
[4] B. R. Simms. Music of the Twentieth Century: Style and Structure. Schirmer Books, New York, second edition, 1996.
[5] S. H. Strogatz. Nonlinear Dynamics and Chaos: With
Applications to Physics, Biology, Chemistry, and Engi-
neering. Westview Press, 1994.
[6] J. B. Jose and E. J. Saletan. Classical Dynamics: A
Contemporary Approach. Cambridge University Press,
New York, 1998.
[7] R. F. Streater. Statistical Dynamics: A Stochastic Ap-
proach to Nonequilibrium Thermodynamics. Imperial
College Press, London, second edition, 2009.
[8] P. Gaspard and X.-J. Wang. Noise, chaos, and (ε, τ)-entropy per unit time. Physics Reports, 235(6):291–343, 1993.
[9] A. N. Kolmogorov. Entropy per unit time as a metric
invariant of automorphisms. Dokl. Akad. Nauk. SSSR,
124:754, 1959. (Russian) Math. Rev. vol. 21, no. 2035b.
[10] C. E. Shannon. A mathematical theory of communica-
tion. Bell Sys. Tech. J., 27:379–423, 623–656, 1948.
[11] W. L. Ditto, A. Miliotis, K. Murali, S. Sinha, and M. L.
Spano. Chaogates: Morphing logic gates that exploit
dynamical patterns. Chaos, 20(3):037107, 2010.
[12] R. W. Yeung. Information Theory and Network Coding.
Springer, New York, 2008.
[13] S. Verdú and T. Weissman. The Information Lost in Era-
sures. IEEE Trans. Info. Th., 54(11):5030–5058, 2008.
[14] Our terminology avoids the misleading use of the phrase
“predictive information” for bµ . The latter is not the
amount of information needed to predict the future.
Rather, it is part of the predictable information—that
portion of the future which can be predicted.
[15] S. A. Abdallah and M. Plumbley. Information dynamics:
Patterns of expectation and surprise in the perception of
music. Connection Science, 21(2):89–117, June 2009.
[16] R. G. James, C. J. Ellison, and J. P. Crutchfield.
Anatomy of a Bit: Information in a Time Series Ob-
servation. Chaos, 21(3):1–15, 2011.
[17] J. P. Crutchfield and K. Young. Inferring statistical com-
plexity. Phys. Rev. Let., 63:105–108, 1989.
[18] Y. B. Pesin. Characteristic Lyapunov exponents and
smooth ergodic theory. Russ. Math. Surveys, 32(4):55–
114, 1977.
[19] Window width is adaptively chosen in inverse proportion
to the LCE. When the latter is low we use a longer win-
dow than when the system is fully chaotic. The minimum
window width of L = 31 and adaptive widths were cho-
sen so that numerical estimates varied by less than 0.01%
when the width is incremented.

Chaos Forgets and Remembers:
Measuring Information Creation, Destruction, and Storage

Supplementary Material

Ryan G. James, Korana Burke, and James P. Crutchfield

COMPUTING BOUND AND EPHEMERAL INFORMATIONS ANALYTICALLY

To obviate the data requirements for accurately estimating bµ from a time series, we present a method for computing it analytically, in closed form. An analytic expression is possible if one can construct forward-time and reverse-time models of the system. These models are sufficient statistics of the past about the present and future, and the future about the present and past, respectively. Here, we use ε-machines [S1, S2], which are the minimal sufficient statistics. From these, bµ can be computed via:

bµ = I[X0 : S0+ | S1−] ,    (11)

where S0+ is the forward ε-machine's state random variable at time 0—the minimal sufficient statistic of the past about the present and future—and S1− is the reverse-time ε-machine's state random variable at time 1—the minimal sufficient statistic of the future about the present and past. Due to their standing as sufficient statistics, these states stand in for the future X1: and the past X:0 of Eq. (4).

FIG. 5. (Above) Tent map's invariant distribution at parameter a as defined in Eq. (12), consisting of three contiguous, uniformly distributed parts, each differently colored for clarity. (Below) The same colors superimposed on the map itself along with (dotted line) guides to show that the uniform components of the invariant distribution do indeed form a Markov partition.

FIG. 6. (Left) Markov chain induced by the Markov partition. (Right) The generating partition applied to the transitions, resulting in a hidden Markov model that describes the Tent map's symbolic stochastic process.

We explicitly implement this calculation for one parameter value of the Tent map. In particular, consider the Misiurewicz point where f^4(1/2) = f^5(1/2). Solving this constraint gives parameter value:

a = α + 2/(3α)    (12)
  = 1.76929235 . . . ,

where α = (√(19/27) + 1)^(1/3). There, the Tent map admits a Markov partition [S3], as Fig. 5 demonstrates. From this, a Markov chain is constructed and the generating partition overlaid. The result is the hidden Markov model of Fig. 6(right) that exactly describes the map's symbolic dynamics stochastic process.
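A quick numerical check of Eq. (12) (our own sketch, not part of the supplement): the parameter below sends the Tent map's critical point onto a fixed point after four iterates, so that f^4(1/2) = f^5(1/2) as required.

from math import sqrt

alpha = (sqrt(19 / 27) + 1) ** (1 / 3)
a = alpha + 2 / (3 * alpha)              # Eq. (12)
print(a)                                 # 1.76929235...

def tent(x):
    return a * (0.5 - abs(x - 0.5))      # Tent map, Eq. (8)

x, orbit = 0.5, [0.5]
for _ in range(5):
    x = tent(x)
    orbit.append(x)
print(orbit[4], orbit[5], abs(orbit[4] - orbit[5]) < 1e-12)   # the Misiurewicz condition f^4(1/2) = f^5(1/2)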
Equation (11) requires the model to be a sufficient statistic to calculate bµ. And so, we transform the hidden Markov model of Fig. 6(right) to one that is unifilar and, in particular, to the ε-machine of Fig. 7. As the final step, we construct the process's bidirectional machine [S1, S2] from the ε-machine; the result is shown in Fig. 8. Then, from it we calculate the joint distribution Pr(S0+, S0−, X0, S1+, S1−). This, in turn, allows one to calculate bµ = I[X0 : S0+ | S1−] and rµ = H[X0 | S0+, S1−].

FIG. 7. The (unifilar) ε-machine for the hidden Markov model of Fig. 6(right). That is, for each state there is at most a single outgoing transition labeled with each symbol. This makes the states a function of the past X:0 and so allows for the required calculation.

FIG. 8. Bidirectional machine of the stochastic process generated by the Tent map's symbolic dynamics at the Misiurewicz parameter a. Its states are pairs S0+ : S0−. By utilizing the dynamic (the edges that connect states), it allows one to directly calculate H[X0 | S0+, S1−] and, thus, rµ and bµ.

We find that the Tent map at the Misiurewicz parameter a has the following information measures (in bits per step):

hµ = log2 a = log2 [ ((9 + √57)^(1/3) + (9 − √57)^(1/3)) / 3^(2/3) ]
   = 0.823172 . . .

rµ = (1/4) [ 3 − 2/(a + 1) − 4/(a + 2) + 9/(2a + 3) ]
   = 0.648258 . . .

bµ = hµ − rµ
   = 0.174915 . . . .
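As a cross-check (our own arithmetic, not part of the supplement), the quoted decimals follow directly from Eq. (12) and the expressions above:

from math import sqrt, log2

alpha = (sqrt(19 / 27) + 1) ** (1 / 3)
a = alpha + 2 / (3 * alpha)                                                  # Eq. (12)

h = log2(a)
h_radical = log2(((9 + sqrt(57)) ** (1 / 3) + (9 - sqrt(57)) ** (1 / 3)) / 3 ** (2 / 3))
r = (3 - 2 / (a + 1) - 4 / (a + 2) + 9 / (2 * a + 3)) / 4
print(h, h_radical, r, h - r)   # 0.823172..., 0.823172..., 0.648258..., 0.174915...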

Supplementary References

S1. J. P. Crutchfield, C. J. Ellison, and J. R. Mahoney, "Time's Barbed Arrow: Irreversibility, Crypticity, and Stored Information", Phys. Rev. Lett. 103:9 (2009) 094101.
S2. C. J. Ellison, J. R. Mahoney, and J. P. Crutchfield, "Prediction, Retrodiction, and the Amount of Information Stored in the Present", J. Stat. Phys. 136:6 (2009) 1005–1034.
S3. D. Lind and B. Marcus. An Introduction to Symbolic Dynamics and Coding. Cambridge University Press, 1999.
