Академический Документы
Профессиональный Документы
Культура Документы
domains, and this division enables large-scale motion in the we get 25 ¼ 32 effective AA species, which are encoded as
protein [23]. This provides a concrete map between five-letter binary codons [24]. These codons specify the
sequence, configuration, and function of the protein. The bonds in the protein in a 2550-long sequence of the gene
computational simplicity allows for a massive survey of [5 × 30 × ð18 − 1Þ, because the lowest layer is connected
the sequence universe, which reveals a strong signature of only upwards].
the protein’s structure and function within correlation To become functional, we want the protein to evolve to a
ripples that appear in the space of DNA sequences. configuration of AAs and bonds that can transduce a
mechanical signal from a prescribed input at the bottom
of the cylinder to a prescribed output at its top [25]. The
II. MODEL AND RESULTS solution we search turns out to be a large-scale, low-energy
Here, we give a summary and interpretation of our deformation where one domain moves rigidly with respect
results. The Appendixes contain further details and explain to another in a shear or hinge motion, which is facilitated by
choices we make in designing the model as close as the presence of a fluidized, floppy channel separating the
possible to real proteins. rigid domains [27–29].
These large-scale deformations are governed by the
rigidity pattern of the configuration, which is determined
A. Mechanical model of protein evolution by the connectivity of the AA network via a simple majority
Our model is based on two structures: a gene, and a rule (Fig. 1), which we detail in Appendix A 3. The basic
protein, which are coupled by the genotype-to-phenotype idea is that each AA can be either rigid or fluidized and that
map. The coarse-grained protein is an aggregate of amino this rigidity state propagates upwards: Depending on the
acids, modeled as beads, with short-range interactions number of bonds and the state of other AAs in its immediate
given as bonds (Fig. 1). A typical protein is made of neighborhood, an AAwill be rigidly connected, “shearable,”
several hundred AAs, and we take N ¼ 540. We layer the i.e., loosely connected, or in a pocket of less connected AAs
AAs on a cylinder, 18 high and 30 wide, similar to within a rigid neighborhood [30]. As the sequence and hence
dimensions of globular proteins. The cylindrical configu- the connections mutate, the model protein adapts to the
ration allows for fast calculation of the low-energy modes, desired input-output relation specified by the extremities of
and thereby fast evolution of the protein. Each AA may the separating fluid channel [Fig. 1 (right)].
connect to the nearest five AAs in the layer below, so that The model is easy to simulate: We start from a random
gene of 2550 bits, and at each time step we flip a randomly
drawn bit, thus adding or deleting a bond. In a zero-
temperature Metropolis fashion, we keep only mutations
that do not increase the distance from the target function,
i.e., the number of errors between the state in the top row
and the prescribed outcome. Note that, following the logics
of biological evolution, the “fitness” of the protein is only
measured at its functional surface (e.g., where a substrate
binds to an enzyme) but not in its interior.
Typically, after 103 –105 mutations this input-output
problem is solved (Fig. 2). Although the functional
sequences are extremely sparse among the 22550 possible
sequences, the small bias for getting closer to the target in
configuration space directs the search rather quickly.
Therefore, we could calculate as many as 106 runs of
the simulation, which gives 106 independent solutions of
FIG. 1. The main features of the physical model. Left: The the evolutionary task.
mapping from the binary gene to the connectivity of the amino
acid (AA) network that makes a functional protein. AAs are B. Dimensional reduction in the
beads and links are bonds. The color of the AAs represents their phenotype-to-genotype map
rigidity state as determined by the connectivity according to the
Thanks to the large number of simulations, we can
algorithm of Appendix A 3. Each AA can be in one of three
states: rigid (gray) or fluid (i.e., nonrigid), which are divided
explore vast regions of the genetic universe. That the
between shearable (blue) and nonshearable (red). Right: The AAs sampling is well distributed can be seen from the typical
in the model protein are arranged in the shape of a cylinder, in this intersequences distance, which is comparable with the
case with a fluid channel (blue region). Such a configuration can universe diameter (Fig. 4). This also indicates that the
transduce a mechanical signal of shear or hinge motion along the dimension of the solution set is high. Indeed, the observed
fluid channel. dimension of sequence space, as estimated following
021037-2
PHYSICAL MODEL OF THE GENOTYPE-TO-PHENOTYPE … PHYS. REV. X 7, 021037 (2017)
021037-3
TLUSTY, LIBCHABER, and ECKMANN PHYS. REV. X 7, 021037 (2017)
021037-4
PHYSICAL MODEL OF THE GENOTYPE-TO-PHENOTYPE … PHYS. REV. X 7, 021037 (2017)
021037-5
TLUSTY, LIBCHABER, and ECKMANN PHYS. REV. X 7, 021037 (2017)
E. Fluid channel supports low-energy shear modes F. Proteins can adapt simultaneously to multiple tasks
The evolved rigidity pattern supports low-energy modes Our models are designed to trace the evolution of a
with strain localized in the floppy, fluid channel. We test mechanical function and show how it constrains the
whether the evolved AA network indeed induces such genotype-to-phenotype map, as we show above. Real
modes (Fig. 9), by calculating the mechanical spectrum of a proteins also evolve towards other essential functions, such
spring network in which bonds are substituted by harmonic as binding affinity and biochemical catalysis at specific
springs. The shear motion of the network is characterized binding sites. Here, we examine another important molecu-
by the modes of H, its elastic tensor. H is the 2N × 2N lar trait, stability.
curvature matrix in the harmonic expansion of the elastic Many studies examine the energetic stability of the
energy E ≃ 12 δrT Hδr, where δr is the 2N vector of the 2D protein, as measured by its overall free energy (ΔG)
displacements of the N AAs. H has the structure of the [4,5,8]. In the present model, this free energy is given
network Laplacian multiplied by the 2 × 2 tensors of by the number of bonds, which represent chemical and
directional derivatives (see Appendix B 3, which is derived physical interactions among the amino acids. The higher
from Ref. [38], pp. 618–619). the number of bonds, the more stable and less flexible the
We trace the mechanical spectrum of the protein during protein. By tuning stability, organisms adapt to their
the evolution of the fluidized channel (a shear band). We environment. Thermophiles that live in hotter places, such
find that the formation of a continuous channel of less as hydrothermal vents, evolve more stable proteins to
connected amino acids indeed facilitates the emergence of withstand the heat. Cryophiles that reside in colder niches
low-energy modes of shear or hinge deformations (Fig. 9). have more flexible proteins [40].
The energy of such low modes nearly vanishes as the We simulate the evolution of the two phenotypes, our
channel is close to completion. Similar deformations, specific dynamical mode together with an energetic state
where the strain is localized in a rather narrow channel, (i.e., a given bond density). We find that the large solution
occur in real proteins, as shown in a recent analysis of set of the mechanical problem allows the protein to select a
structural data [22]. subset with a specific energetic state. Thus, the evolu-
tionary dynamics could find solutions to the same mechani-
cal function when we impose extreme values of bond
density (Fig. 10). This demonstrates the capacity of the
protein to search in parallel for the solutions of several
biological tasks. Evolving a specific binding site is
expected to be an easier task, since such sites are confined
to a small fraction of the protein.
021037-6
PHYSICAL MODEL OF THE GENOTYPE-TO-PHENOTYPE … PHYS. REV. X 7, 021037 (2017)
A0 A1 A2
FIG. 11. The AA interaction model. (a) Dimensions of the
A0 1 1 1
genotype and phenotype spaces are similar to the standard model
A1 1 1 0
(Fig. 3). (b) Left: The first few eigenfunctions for the configu-
A2 1 0 0
ration. Right: The same for the bond patterns.
021037-7
TLUSTY, LIBCHABER, and ECKMANN PHYS. REV. X 7, 021037 (2017)
Table I. The robustness of the results manifests the predicted by the model in high-resolution maps of molecu-
universality of the dimensional reduction that originates lar fitness landscapes [48–51].
from the continuity of the mechanical transduction. Past studies have shown that the motion of proteins
[52–55] and their hydrophobicity patterns [56] may often
be approximated by a few normal modes, while others have
III. CONCLUSIONS demonstrated that the variation in aligned sequences may
Our models of the genotype-to-phenotype map put be characterized by a few correlation modes [39,45–47].
forward a new physical picture of protein evolution. Our The present study links the genotype and phenotype spaces,
thesis is that rather than structure itself, it is the dynamics and explains the dimensional reduction as the outcome of a
that governs protein fitness. Our method considers proteins nonlinear mapping between genes and patterns of mechani-
as evolving amorphous matter with a mechanical function, cal forces: We characterize the emergent functional mode to
a specific low-energy conformational change. The rigidity be a soft, floppy mode, localized around a fluidized channel
or shearability pattern of the protein, and hence its (a shear band), a region of lower connectivity that is
dynamical modes, are determined by the connectivity of therefore easier to deform. The contiguity of this rigidity
the amino acid interaction network. The model explains pattern implies that it can be described by a few collective
how the spatially extended modes appear as the gene degrees of freedom, implying a vast dimensional reduction
mutates and changes the amino acid network. These modes of configuration space.
are shear and hinge motions where the strain is localized in The concrete genotype-to-phenotype map in our simple
the shearable channel and where the surrounding domains models demonstrates that most of the gene records random
translate or rotate as rigid bodies (Fig. 9). evolution, while only a small nonrandom fraction is
A main insight from our model is that requiring the constrained by the biophysical function. This drastic
protein to have floppy modes puts strong constraints on the dimensional reduction is the origin of the flexibility and
space of mechanical phenotypes. As a consequence, there evolvability in the functional solution set.
is a huge dimensional reduction when mapping genotypes
to phenotypes. We find that the collective mechanical ACKNOWLEDGMENTS
interactions among the amino acids are mirrored in corre- We thank Stanislas Leibler, Michael R. Mitchell,
sponding modes of sequence correlation in the genes. Elisha Moses, and Giovanni Zocchi for essential discus-
These main results do not depend on details of the model sions and encouragement. We thank T. Guhr for helpful
and have been reproduced in versions with (i) a different discussions on the spectra of constrained random matrices.
number of AA species (16 instead of 32), (ii) bonds that J.-P. E. is supported by an ERC advanced grant “Bridges,”
depend on pairwise interactions, and (iii) a harmonic spring and T. T. is supported by the Institute for Basic Science
network [26]. All these suggest that the results are generic IBS-R020, Ulsan National Institute of Science and
and apply to a wide range of realizations. Technology (UNIST) 2016 Research Fund (160070.01),
Our models are distilled to their simplest physical- and the Simons Center for Systems Biology of the Institute
mathematical schemes, but have concrete, experimentally for Advanced Study, Princeton, NJ.
testable predictions. In the functional protein, the least
random, strongly correlated sites are concentrated in a rigid APPENDIX A: PROTEIN EVOLUTION MODEL
shell that envelops the shearable channel [22]. Our model,
therefore, predicts that these sites are also the most 1. Cylindrical amino acid network
vulnerable to mutations [Fig. 8(b)], which distort the We model the protein as an aggregate of amino acids
low-frequency modes and thus hamper the biological with short-range interactions. In our coarse-grained model,
function. These effects can be examined by combining beads represent the AAs and bonds their interactions with
mutation surveys, biochemical assays of the function, and neighboring AAs (Fig. 1). We consider a simplified cylin-
physical measurements of the low-frequency spectrum, drical geometry, where the AAs are layered on the surface of a
especially in allosteric proteins. cylinder at randomized positions, to represent the noncrystal-
To that end, one may take an enzyme with a known shear line packing of this amorphous matter. Throughout this study,
band (via analysis similar to Ref. [22]) and mutate amino we examine a geometry with height hð¼ 18Þ, i.e., the number
acids within and around the band. We expect the mutation of layers in the z direction, and width wð¼ 30Þ, i.e., the
of these amino acids to have a significant impact on the circumference of the cylinder. When the cylinder is shown as
dynamics and biochemical function of the protein, as a flat 2D surface (such as in Fig. 2), there are still periodic
compared to other mutations in the rigid subdomains. boundary conditions in the horizontal w direction. The row
By sequence alignment methods [39,45–47], it is possible and column coordinates of an AA are ðr; cÞ, with r for the row
to test whether these sensitive positions in the protein ð1; …; hÞ and c for the column ð1; …; wÞ. The cylindrical
exhibit strong correlations in the gene, as predicted by the periodicity is accounted for by taking the horizontal coor-
model. One may also search for the dimensional reduction dinate c modulo w ¼ 30, c → modw ðc − 1Þ þ 1.
021037-8
PHYSICAL MODEL OF THE GENOTYPE-TO-PHENOTYPE … PHYS. REV. X 7, 021037 (2017)
021037-9
TLUSTY, LIBCHABER, and ECKMANN PHYS. REV. X 7, 021037 (2017)
X
1 Also, the propagation of fluidity should not be confused
sr;c ¼ ð1 − σ r;c Þθ sr−1;cþk − s0 ; ðA2Þ with learning in neural networks, but is rather of the
k¼−1 percolation type.
where s0 ¼ 1. The first term on the lhs ensures that a solid
AA can never become shearable. This completes the 5. Simulation of evolutionary dynamics
definition of the map from the sequence to the state.
All simulations are done on the 30 × 18 ¼ 540 play-
ground, as described above. We do simulations for many
4. Fitness and mutations
variants of the model, and many targets, but we present
As we explain above, the aim is to find a functional only two specific problems, for which we did the most
protein that can transfer forces. To find such a protein, we extensive study: In the first, the fluid regions of the input
start from a random sequences (of 2550 codons), and from and the target are opposite and of length 6 at the bottom
an initial state (input) in the bottom row of the cylinder. and length 5 at the top. In the second run, the top and
This initial state is just made from rigid and fluid beads, as bottom are the same, but the top is shifted sideways by 5
shown, e.g., in Fig. 2. For most simulations, we just take units. We call these two examples straight and tilted,
five consecutive fluid beads among the remaining solid denoted as S and T. We also studied examples in which
beads. the position of the target (relative to the input) is left free,
We next define the target. It is a chain of w values, but here we discuss only the results for the S and T case.
fluid and shearable (σ ¼ 0; s ¼ 1) or solid (σ ¼ 1; s ¼ 0), This serves to illustrate that the results are largely
in the top row, which the protein should yield as independent of the details of the model. We also studied
an output: fσ c ; sc gc¼1;…;w . Given (i) a gene sequence, many other variants, and in all cases, the main results are
which determines the connectivity lrck and (ii) the qualitatively unchanged.
input state fσ 1;c ; s1;c gc¼1;…;w , the algorithm we describe Remark.—We view this as an important outcome of our
above uniquely defines the output state in the top row, theory, namely, that it illustrates a close connection
fσ h;c ; sh;c gc¼1;…;w . At each step of evolution, the output between gene and protein which goes way beyond the
state is compared to the fixed, given target, by measuring simple model we consider here.
the Hamming distance, the number of positions where the For both, S and T, we study 200 independent branches,
output differs from the target: starting from a random sequence with about 90% of the
bonds present at the start. Given any fixed sequence, we
X
w
sweep according to the rules of Eqs. (A1) and (A2) through
F¼ ½1 − ðjsh;c − sc j − 1Þðjσ h;c − σ c j − 1Þ: ðA3Þ the net and measure the Hamming distance F [Eq. (A3)]
c¼1
between the last row and the desired target. When this
In the biological convention, −F is the fitness that should Hamming distance is 0, we consider the problem as solved.
increase towards a maximum value of −F ¼ 0, when the If not, we randomly flip a bond (exchanging 0 with 1)
input-output problem is solved. and recalculate the Hamming distance. We view this flip as
Solutions are found by mutations. At each iteration, a a mutation of the sequence, equivalent to mutating
randomly drawn digit in the gene is flipped; that is, the one nucleic base in a gene. If the Hamming distance
values of 0 and 1 are exchanged. This corresponds to decreases or remains unchanged, we keep the flip, other-
erasing or creating a randomly chosen link of a randomly wise we backtrack and flip another randomly chosen bond.
chosen AA. After each flip, a sweep is performed, and the This is repeated until a solution is found. (This is really a
new output at the top row is again compared to the target. A Metropolis algorithm [57] at zero temperature.) Typically,
mutation is kept only if the Hamming distance is not after 103 –105 mutations this input-output problem is
increased as compared to the value before the mutation solved. Although the functional sequences are extremely
(equivalently, the fitness is not allowed to decrease). sparse among the 22550 possible sequences, the small bias
This procedure is repeated until a solution (F ¼ 0) is for getting closer to the target in configuration space directs
found. This will happen with probability 1, perhaps after the search rather quickly.
very many flips, if the problem has a solution at all. This is Once a solution is found, we destroy it by further
really the Metropolis algorithm [57] (at 0 temperature). mutations and then look for a new solution, as before,
Remark.—It is an important feature of our model that the starting from the destroyed state. This we call a generation.
quality of a network is only measured at the target line. This For each of the 200 branches, we follow 5000 generations,
corresponds to the biological fact that the protein can leading to a total of 106 solutions. The time to recover from
interact with the outside world only through is surface (in a destroyed state is about 1500 flips per error in that state,
our case, the ends of the cylinder). One of the surprising which is similar to the time it takes to find a solution
outcomes of our study is that this requirement has a strong starting from a random gene. A destruction takes around
influence on what happens in the interior of the protein. 11.2 mutations on average.
021037-10
PHYSICAL MODEL OF THE GENOTYPE-TO-PHENOTYPE … PHYS. REV. X 7, 021037 (2017)
We also do another 106 simulations starting each time shearability) of vectors of the configurations. (This is
from another random configuration. The statistics in both reasonable, because, in general, there are very few non-
cases are very similar, but the destruction-reconstruction hearable and fluid AAs.)
simulations obviously show some correlations between a Apart from the numerical findings, which are shown in
generation and the next. This effect disappears after about Fig. 6 for the straight (S) example and in Fig. 7 for the tilted
four generations. (T) one, some comments are in order:
Configuration space.— (The nine figures on the bottom left
APPENDIX B: RESULTS, ANALYSIS, of Figs. 6 and 7.) The first mode is proportional to the
AND INTERPRETATION average configuration. The next modes reflect the basic
deviations of the solution around this average. For example,
1. Dimension of solution set
the second mode is left-to-right shift, the third mode is
The dimension of a space measures the number of expansion-contraction, etc. Since the shearable-nonshear-
directions in which one can move from a point. In the able interface can move at most one AA sideways between
case of our model, since from any sequence in sequence consecutive rows, the modes are constrained to diamond-
space one can move along N S ¼ 2550 axes by flipping just shaped areas in the center of the protein. This is the joint
one bit, we see that the sequence space has dimension 2550, effect of the influence zones of the input and output rows.
and the number of different elements in this space is a Sequence space.— (The nine figures on the bottom
hypercube with 22550 ∼ 10768 elements. right of Figs. 6 and 7.) The first eigenvector is the average
The set of solutions that we find has, however, bond occupancy in the 106 solutions. The higher eigen-
much smaller dimension, as we show in Fig. 3 for the values reflect the structure in the many-body correlations
straight and tilted example. In the case of experimental among the bonds. The typical pattern is that of diffraction
data, such as ours, the dimension is most conveniently or oscillations around the fluid channel. This pattern
determined by the box-counting (Grassberger-Procaccia mirrors the biophysical constraint of constructing a rigid
[33]) algorithm. This is obtained by just counting shell around the shearable region. Higher modes exhibit
the number NðϱÞ of pairs at distances ≤ ϱ, and then finding more stripes, until they become noisy, after about the tenth
the slope in a log-log plot. This is indicated by the black eigenvalue.
lines in Fig. 3, We see that, clearly, the dimension in the The bond spectrum, top right in Figs. 6 and 7, has
space of configurations is about 8–9, while, in the space of some outliers, which correspond to the localized modes
sequences, the dimension is basically infinite, namely, just shown in the eight panels below. Apart from that,
limited by the maximal slope one can obtain [31]. the majority of the eigenvalues seem to obey the
Marčenko-Pastur formula; see Ref. [58]. If the matrix is
2. Spectrum in phenotype and genotype spaces m × n, m > n, then the support of the spectrum is
1
pffiffiffiffi pffiffiffi 6
We compute spectra for both the sequences and the 2 ð m nÞ. In our case, since we have a 10 × 2550
configurations, for the 106 solutions. Let us detail this for matrix, one expects
pffiffiffiffiffiffiffi (if p
they were really random) to find the
ffiffiffiffiffiffiffiffiffiffi
the case of sequences: We have 106 binary vectors with 1 6
spectrum at 2 ð 10 2550Þ, which is close to the experi-
N S ¼ 2550 components each, and we want to know the ment, and confirms that most of the bonds are just randomly
typical spectrum of such vectors. This is conveniently present or absent. We attribute the slight enlargement of the
found with the singular value decomposition (SVD), in spectrum to memory effects between generation in the same
which one forms a matrix W of size m × n ¼ 106 × 2550. branch. This corresponds to the well-known phylogenetic
This matrix can be written as UDV , where U is m × m, V correlations among descendants in the same tree.
is n × n, and D is an m × n matrix that is diagonal in the It is tempting to also study the continuous part of this
sense that only the elements Dii with i ¼ 1; …; n are spectrum, which is not quite of the standard form. While in
nonzero. (We assume here that we are in the case principle this could be done by taking into account the
m > n.) The Dii are in general greater than 0, and in this known correlations, even the techniques of Ref. [59] seem
case the singular value decomposition is unique. We call difficult to implement.
the set of the λG
i ¼ Dii the spectrum of the sequences, and
the vectors in V the eigenvectors of the SVD. It is the first
few of those that are shown in Fig. 6. 3. Shear modes in the amino acid network
Note that the SVD eigenvalues λG i are the square roots Consider now either of the two examples, straight or
of the spectrum of the covariance matrix W T W, which has tilted (S and T). A solution of such an example is given by a
the same eigenvectors as W. Therefore, the high SVD set of bonds, and this set of bonds defines a graph on the
eigenvalues correspond to the principal components, the N AA ¼ h × w ¼ 540 AAs. This graph is embedded in 2D,
directions with maximal variation in the solution set. where ⃗xr;c are the positions of the AAs, which are
Mutatis mutandis, we perform the same SVD for the case connected by straight bonds. We now extend the scope
of the configurations, using the s values (that is, of the of our study somewhat, by assuming that the bonds are not
021037-11
TLUSTY, LIBCHABER, and ECKMANN PHYS. REV. X 7, 021037 (2017)
totally rigid, but given by harmonic springs (see also distribution of bonds) there to be about N × 2−10 ∼
Ref. [26]). This allows us to study mechanical properties 0.001N isolated nodes, i.e., isolated singletons, and even
that would be too stiff if we worked only with bonds that fewer patches of greater size.
are rigid sticks. Further zero modes come from nodes that can oscillate
In this case, the calculations are straightforward, if sideways without first-order effects. This will happen if a
somewhat complex, and they are, e.g., well explained in node is connected by only one bond. Since ϱ ∼ 1=2, the
Ref. [38] (see pp. 618–619). We thus consider the elastic probability of finding such a node is about
tensor H, which is the tensor product of the network
Laplacian with the 2 × 2 tensor of directional derivatives. ð10
1Þ
N ∼ 0.01N:
For the reader who is unfamiliar with Ref. [38], we 210
describe what this means componentwise. The playground
Ω ⊂ Z2 has size h in the z direction and size w in the x Thus, we show in Figs. 6 and 7 the eigenfunctions only for
direction, with periodic boundary condition in the x the first eigenvalues after the trivial ones. Because of the
direction. All bonds go from some ðr; cÞ to ðr þ 1; cÞ, tensorial nature of the problem, the eigenvectors have two
ðr þ 1; c 1Þ, ðr þ 1; c 2Þ, again with periodic boun- components, which we show as 2D shear flow.
dary conditions in the c direction. Each such bond defines a
direction vector ðdz ; dx Þ in R2 which we normalize to 4. Genetic correlation matrix
d2x þ d2z ¼ 1. Note that this vector depends on both the In Fig. 3, we study the correlations among the 106 solutions
origin and the target of the bond. in sequence space. Given the matrix W ij , of all sequences,
If we imagine harmonic springs between the nodes with i ¼ 1; …; N ¼ 106 , j ¼ 1; …; P 2550 (of binary digits),
connected by bonds (all with the same spring constant), we compute the means hW ·j i ¼ Ni¼1 W ij =N and the stan-
then we can define the (symmetric) tensor matrix of P
dard deviations SDj ¼ ð i jW ij − hW ·j ij2 Þ1=2 . Then, in the
deformation energies in the x and y direction by
usual way, we form Mij ¼ W ij − hW ·j i and
H0km ¼ Mðk; mÞ; with k; m ∈ Ω;
ðM MÞj;j0
Cj;j0 ¼ :
and where each element of H0km is—when k and m are SDj SDj0
connected by a bond—the 2 × 2 matrix (indexed by
i; j ∈ f1; 2g) Figure 5 then shows logðjCj;j0 jÞ, with the autocorrelation Cjj
omitted.
Mðk; mÞ ¼ (dx ðk; mÞ; dz ðk; mÞ)T ⊗ (dx ðk; mÞ; dz ðk; mÞ) Note that both the means and the variances depend very
2 weakly on j. Figure 5 reveals and reinforces several
dx dx dz
¼ : observations also made in other calculations of this paper.
dx dz d2z
First, looking onto the axis j ¼ j0 in the figure, one sees a
If k and m are not connected, then Mðk; mÞ is the 0 matrix. periodicity of the patterns corresponding to the 17 gaps
The elements of Mðk; mÞ are denoted Mðk; mÞij . between the 18 rows of the configuration space. This
Finally we complete the 2N × 2N matrix H0 to a reflects the necessity to maintain a connected liquid
Laplacian H by adding diagonal elements to it, so that channel. Also, as seen in Fig. 5, the correlations grow
the row (and column) sums are 0. In components, this somewhat towards the ends, especially toward the upper
means that we require, for each k ∈ Ω and each (j ¼ 2550) end. This is because of the mechanical con-
i; j ∈ f1; 2g, the sums straint which forces the channel to become more precise
towards the ends, in analogy with Fig. 8(b).
X The periodic patterns all over the square reflect not only
ðHkm Þij the natural periodicity of 150 (¼ 5 × w) elements in the
l
sequence, but also show that the boundaries of the channel
to vanish. Other properties of A are described in Ref. [38]. form a special shell (with two peaks per row).
Since we take periodic boundary conditions in the x
direction, there will always be a (simple) 0 eigenvalue of H 5. Survival under mutations
in this direction. Other 0 eigenvalues correspond to trans- Here, we ask how robust the solutions are as further
lation in the z direction or rotation in the x-z plane. Another mutations take place. First, we determine how many
type of (double) 0 eigenvalues are associated with any mutations lead to a destruction of the solution. The statistics
patch of nodes that is totally disconnected from the rest of of this is shown in Fig. 8. We note that about 10% of all
the lattice. Since the density ϱ of bonds is about 1=2 and solutions are destroyed by just one mutation, while there is
otherwise quite random, and there are twice five bonds an exponential decay of survival of m mutations. This
at each interior node, we expect (assuming random signals that the mutations act independently.
021037-12
PHYSICAL MODEL OF THE GENOTYPE-TO-PHENOTYPE … PHYS. REV. X 7, 021037 (2017)
One can also ask where the critical mutations take place.
This is illustrated in Fig. 8(b), and is discussed in the main
text. We also study the places where exactly two mutations
will kill a solution (and none of the two is a single site
“killer”) [Fig. 8(c)], and in these cases, one finds that the
two mutations are generally close to each other, acting on
the same site. Again, the channel is less vulnerable to
mutations, but now the mutations are evenly distributed
over the rest of the network.
021037-13
TLUSTY, LIBCHABER, and ECKMANN PHYS. REV. X 7, 021037 (2017)
[9] D. E. Koshland, Application of a Theory of Enzyme Speci- [27] S. Alexander, C. Laermans, R. Orbach, and H. M.
ficity to Protein Synthesis, Proc. Natl. Acad. Sci. U.S.A. 44, Rosenberg, Fraction Interpretation of Vibrational Proper-
98 (1958). ties of Cross-Linked Polymers, Glasses, and Irradiated
[10] K. A. Henzler-Wildman, V. Thai, M. Lei, M. Ott, M. Wolf- Quartz, Phys. Rev. B 28, 4615 (1983).
Watz, T. Fenn, E. Pozharski, M. A. Wilson, G. A. Petsko, M. [28] J. C. Phillips and M. F. Thorpe, Constraint Theory, Vector
Karplus, C. G. Hubner, and D. Kern, Intrinsic Motions Percolation and Glass-Formation, Solid State Commun.
along an Enzymatic Reaction Trajectory, Nature (London) 53, 699 (1985).
450, 838 (2007). [29] S. Alexander, Amorphous Solids: Their Structure, Lattice
[11] Y. Savir and T. Tlusty, Conformational Proofreading: The Dynamics and Elasticity, Phys. Rep. 296, 65 (1998).
Impact of Conformational Changes on the Specificity of [30] The propagation of rigidity is effectively a double perco-
Molecular Recognition, PLoS One 2, e468 (2007). lation problem in which both fluid (blue) and rigid (gray)
[12] T. M. Schmeing, R. M. Voorhees, A. C. Kelley, Y. G. Gao, regions are continuous (see Appendix A 3).
F. V. Murphy, 4th, J. R. Weir, and V. Ramakrishnan, The [31] I. Procaccia, Complex or Just Complicated?, Nature
Crystal Structure of the Ribosome Bound to EF-Tu and (London) 333, 498 (1988).
Aminoacyl-tRNA, Science 326, 688 (2009). [32] J.-P. Eckmann and D. Ruelle, Fundamental Limitations for
[13] Y. Savir and T. Tlusty, RecA-Mediated Homology Search as Estimating Dimensions and Lyapunov Exponents in
a Nearly Optimal Signal Detection System, Mol. Cell 40, Dynamical Systems, Physica (Amsterdam) 56D, 185
388 (2010). (1992).
[14] M. Huse and J. Kuriyan, The Conformational Plasticity of [33] P. Grassberger and I. Procaccia, Characterization of Strange
Protein Kinases, Cell 109, 275 (2002). Attractors, Phys. Rev. Lett. 50, 346 (1983).
[15] Y. Savir and T. Tlusty, The Ribosome as an Optimal [34] Y. Savir, E. Noor, R. Milo, and T. Tlusty, Cross-Species
Decoder: A Lesson in Molecular Recognition, Cell 153, Analysis Traces Adaptation of Rubisco toward Optimality
471 (2013). in a Low-Dimensional Landscape, Proc. Natl. Acad. Sci.
[16] M. F. Perutz, Stereochemistry of Cooperative Effects in U.S.A. 107, 3475 (2010).
Haemoglobin: Haem-Haem Interaction and the Problem [35] O. Shoval, H. Sheftel, G. Shinar, Y. Hart, O. Ramote, A.
of Allostery, Nature (London) 228, 726 (1970). Mayo, E. Dekel, K. Kavanagh, and U. Alon, Evolutionary
[17] N. M. Goodey and S. J. Benkovic, Allosteric Regulation and Trade-offs, Pareto Optimality, and the Geometry of Phe-
Catalysis Emerge via a Common Route, Nat. Chem. Biol. 4, notype Space, Science 336, 1157 (2012).
474 (2008). [36] Tsvi Tlusty, Self-Referring DNA and Protein: A Remark on
[18] S. W. Lockless and R. Ranganathan, Evolutionarily Physical and Geometrical Aspects, Phil. Trans. R. Soc. A
Conserved Pathways of Energetic Connectivity in Protein 374, DOI: (2016).
Families, Science 286, 295 (1999). [37] The natural genetic code is redundant; i.e., several codons
[19] A. C. M. Ferreon, J. C. Ferreon, P. E. Wright, and A. A. encode the same AA and are therefore synonymous. Such
Deniz, Modulation of Allostery by Protein Intrinsic Dis- redundancy reduces the fraction of destructive mutations,
order, Nature (London) 498, 390 (2013). since mutations that exchange synonymous codons do not
[20] H. Qu and G. Zocchi, How Enzymes Work: A Look through change the encoded AA and are theretofore bound to be
the Perspective of Molecular Viscoelastic Properties, Phys. neutral. A case of redundant code is examined in Sec. II G.
Rev. X 3, 011009 (2013). [38] F. R. K. Chung and S. Sternberg, Laplacian and Vibra-
[21] C. Joseph, C. Y. Tseng, G. Zocchi, and T. Tlusty, tional-Spectra for Homogeneous Graphs, Journal of Graph
Asymmetric Effect of Mechanical Stress on the Forward Theory 16, 605 (1992).
and Reverse Reaction Catalyzed by an Enzyme, PLoS One [39] T. Teşileanu, L. J. Colwell, and S. Leibler, Protein Sectors:
9, e101442 (2014). Statistical Coupling Analysis versus Conservation, PLoS
[22] M. R. Mitchell, T. Tlusty, and S. Leibler, Strain Analysis of Comput. Biol. 11, e1004091 (2015).
Protein Structures and Low Dimensionality of Mechanical [40] R. Jaenicke and G. Böhm, The Stability of Proteins in
Allosteric Couplings, Proc. Natl. Acad. Sci. U.S.A. 113, Extreme Environments, Curr. Opin. Struct. Biol. 8, 738
E5847 (2016). (1998).
[23] M. Gerstein, A. M. Lesk, and C. Chothia, Structural [41] K. A. Dill, Theory for the Folding and Stability of Globular
Mechanisms for Domain Movements in Proteins, Biochem- Proteins, Biochemistry (Moscow) 24, 1501 (1985).
istry 33, 6739 (1994). [42] T. Tlusty, A Model for the Emergence of the Genetic Code
[24] In our model, the AA species is determined by the bonds, as a Transition in a Noisy Information Channel, J. Theor.
while in real proteins the bonds are determined by the Biol. 249, 331 (2007).
chemical nature and position of the AA (see also Sec. IIG). [43] J.-P. Eckmann, Trading Codes for Errors, Proc. Natl. Acad.
[25] Note that in this simulation, we do not take as evolutionary Sci. U.S.A. 105, 8165 (2008).
criterion the mechanical signal itself, but require that the [44] T. Tlusty, A Colorful Origin for the Genetic Code:
protein forms a fluid channel with a prescribed configura- Information Theory, Statistical Mechanics and the
tion. We show that this configuration facilitates the Emergence of Molecular Codes, Phys. Life Rev. 7, 362
sought-after mechanical shear motion in Sec. II E and (2010).
Appendix B 3. (In Ref. [26], we take the mechanical modes [45] G. Casari, C. Sander, and A. Valencia, A Method to Predict
themselves as the target function.) Functional Residues in Proteins, Nat. Struct. Biol. 2, 171
[26] S. Dutta, J-P. Eckmann, and T. Tlusty (to be published). (1995).
021037-14
PHYSICAL MODEL OF THE GENOTYPE-TO-PHENOTYPE … PHYS. REV. X 7, 021037 (2017)
[46] A. Gogos, D. Jantz, S. Sentürker, D. Richardson, M. [52] M. Levitt, C. Sander, and P. S. Stern, Protein Normal-Mode
Dizdaroglu, and N. D. Clarke, Assignment of Enzyme Dynamics: Trypsin Inhibitor, Crambin, Ribonuclease and
Substrate Specificity by Principal Component Analysis of Lysozyme, J. Mol. Biol. 181, 423 (1985).
Aligned Protein Sequences: An Experimental Test Using [53] M. M. Tirion and D. Benavraham, Normal Mode Analysis of
DNA Glycosylase Homologs, Proteins Struct. Funct. Bioinf. g-Actin, J. Mol. Biol. 230, 186 (1993).
40, 98 (2000). [54] I. Bahar and A. J. Rader, Coarse-Grained Normal Mode
[47] O. Rivoire, K. A. Reynolds, and R. Ranganathan, Evolution- Analysis in Structural Biology, Curr. Opin. Struct. Biol. 15,
Based Functional Decomposition of Proteins, PLoS Com- 586 (2005).
put. Biol. 12, e1004817 (2016). [55] W. J. Zheng, B. R. Brooks, and D. Thirumalai, Low-Frequency
[48] H. Jacquier, A. Birgy, H. Le Nagard, Y. Mechulam, E. Normal Modes that Describe Allosteric Transitions in
Schmitt, J. Glodt, B. Bercot, E. Petit, J. Poulain, G. Barnaud, Biological Nanomachines Are Robust to Sequence Variations,
P.-A. Gros, and O. Tenaillon, Capturing the Mutational Proc. Natl. Acad. Sci. U.S.A. 103, 7664 (2006).
Landscape of the Beta-Lactamase TEM-1, Proc. Natl. Acad. [56] J. L. England, Allostery in Protein Domains Reflects a
Sci. U.S.A. 110, 13067 (2013). Balance of Steric and Hydrophobic Effects, Structure 19,
[49] B. P. Roscoe, K. M. Thayer, K. B. Zeldovich, D. Fushman, 967 (2011).
and D. N. A. Bolon, Analyses of the Effects of All Ubiquitin [57] N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth, A. H.
Point Mutants on Yeast Growth Rate, J. Mol. Biol. 425, Teller, and E. Teller, Equation of State Calculations by Fast
1363 (2013). Computing Machines, J. Chem. Phys. 21, 1087 (1953).
[50] E. Firnberg, J. W. Labonte, J. J. Gray, and M. Ostermeier, A [58] V. A. Marčenko and L. A. Pastur, Distribution of Eigenval-
Comprehensive, High-Resolution Map of a Gene’s Fitness ues for Some Sets of Random Matrices, Math. USSR Sb. 1,
Landscape, Mol. Biol. Evol. 31, 1581 (2014). 457 (1967).
[51] D. A. Kondrashov and F. A. Kondrashov, Topological [59] T. Guhr, A. Müller-Groeling, and H. A. Weidenmüller,
Features of Rugged Fitness Landscapes in Sequence Space, Random-Matrix Theories in Quantum Physics: Common
Trends Genet. 31, 24 (2015). Concepts, Phys. Rep. 299, 189 (1998).
021037-15