
Applying Novel Mathematical Techniques to the Reengineering of a Virus-like Nanoparticle

Andrew Favor

Dept. of Chemistry, University of California, Berkeley

A thesis submitted in partial fulfillment of the requirements for the degree of Bachelor of Science with Honors in Chemical Biology, under the supervision of Professor Matthew Francis

Professor Matthew Francis ___________________________________

Spring 2018
“In all chaos there is a cosmos, in all disorder a secret order.”

- Carl Jung
Contents

1 Introduction:

2 Analysis of positional preferences in MS2 backbone:
  2.1 Background:
  2.2 Analysis of physical preferences along MS2's backbone:
    2.2.1 Definitions:
    2.2.2 Standardization:
    2.2.3 Determination of Positional Property Preferences:
  2.3 Results and implications:

3 Development of a Convolutional Neural Net for predictive sequence analysis:
  3.1 Epistasis as a motivator for the development of a predictive tool:
  3.2 Experimental Design:
    3.2.1 Design of the neural network function:
  3.3 Results:
  3.4 Discussion:

4 Conclusion:

5 Acknowledgements:

References

A Appendix: Data and Figures
  A.1 Apparent Fitness for MS2:
  A.2 Epistasis values for double mutants in MS2's FG-loop:
  A.3 Neural Net Predictions and Additive Predictions

1 Introduction:

Many questions about the physical basis of life still weigh heavy upon our minds, unanswered. In self-replicating molecular systems we see an interplay of thermodynamic principles that are often counter to our conception of universal trends – a preference for increased organization and a reversal of entropy within a world otherwise driven towards disorder [1],[2],[3]. In recent years, a variety of mathematical models have been developed with the goal of characterizing the interplay of various forces that contribute to the perceived "order" of chemical systems that exhibit propagation of structural identity [4],[5]. In evolutionary time, the chemical frameworks that support our lives have moved dynamically through the manifold of many physical properties, and when projected upon a plane bounded by our current observational capabilities, much of the fundamental information regarding the reasons guiding such changes is lost or transformed beyond practicality. Within biological systems, proteins reside at the limit of solubility, which holds the physical consequence that at all times they exist on the edge of "falling apart", so to speak [6]. In the evolution of such molecules, the new capabilities that arise from mutations may often come at the cost of stability. Thus, the compounding effect of mutations over time may follow a path bounded by thermodynamic penalties that prevent beneficial new functionalities from arising. The goal of this study is to apply new analytical and predictive tools in order to establish a model for the evolution of self-assembling systems, which we hope to utilize in developing novel platforms for technological development.

Directing focus towards the self-assembly process of proteins, we see complex molecular structures fluctuating within a chaotic multiplex of conformational microstates [7],[8]. The foundation of protein engineering lies in introducing new mutations for the purpose of modifying protein function. However, the challenge of predicting the fitness of a given mutation is always a limiting factor when introducing changes to a given amino acid sequence [9],[10],[11]. Based on the immense number of variant primary sequences that can result from just two co-expressed mutations, it is virtually impossible to test the fitness of all possible combinations of mutations through physical synthesis or even computational modeling [12], which necessitates the development of a means by which to effectively predict the viability of potential mutant structures for the purpose of extensive protein modification [13],[14].

Virus-like nanoparticles hold great potential as versatile drug delivery systems due to their structural stability under physiological conditions and their ability to encapsulate a wide range of therapeutic agents and biomolecules [15],[16]. This research focuses on reengineering the icosahedral coat protein of the MS2 bacteriophage for such applications using a multifaceted approach. Herein we explore the use of computational chemistry, mutability analyses, and machine-learning algorithms as predictive tools for determining the resultant stability of imposing a variety of double mutations on this protein. Such analyses shed light upon the position-based requirements for the various physical properties that are needed in order to achieve successful self-assembly into a stable nanoparticle, and the structural parameters that can be altered in order to modify the functionality of the icosahedral MS2 bacteriophage coat protein. In effect, we have attained a design blueprint for how the structure of the MS2 nanoparticle may be altered in order to introduce changes in its physical properties, such as size, cargo affinity, and thermal stability. In order to achieve our goal of both introducing novel functionality and increasing stability, it is imperative to determine which mutations would allow the formation of stable capsids, and which would not.

It is important to note that due to their high mutability, RNA viruses serve as good subjects for case studies such as this, which peer into the natural order that governs mutability itself, and how evolution occurs [17],[18]. Much research has sought to explore both human-directed and naturally occurring trends in virus mutability, as the mechanism of their self-propagation is fundamentally dependent on both highly mutation-prone processes and the need to adapt to rapidly changing environmental parameters [19]. In addition to the basic knowledge that can be acquired from such analyses, the applications to medical and technological innovation that engineering virus-inspired nanoparticles promises are profound, to say the least. The coat proteins associated with viral anatomy provide a stable scaffolding for the development of nanomaterials, with a versatile range of new functionality and a remarkable tolerance to extensive synthetic modification [20],[21],[22]. As a result, in recent years, these materials have come to be used as the structural basis for a broad range of technological applications, such as the development of drug delivery vehicles [16],[15], nanoreactors [23], imaging agents [24],[25], and catalysts for the harnessing of renewable energy [26],[27],[28].

The information gained herein sheds light upon the underlying mechanics of evolution through the co-mutation of amino acid residues, and fosters an examination of the synergistic effects of their pairwise interactions. Each mutation can positively or negatively affect the protein's overall fitness on its own, but how a second mutation is impacted by the first is an issue still shrouded in mystery. The combined effect of any two mutations remains very challenging to predict, as the result is not simply an addition of the two individual effects. In the structures of multiply mutated proteins, a whole host of novel interactions is introduced, rendering the effective change in stability or functionality greater or less than the sum of its individual parts.

2 Analysis of positional preferences in MS2 backbone:

2.1 Background:

In order to quantify the position-based physical preferences along the backbone of MS2, a large library of mutants was generated to provide the fitness data for such mutants. A plasmid library of all possible single amino acid mutants of the MS2 coat protein was produced using the Golden Gate cloning technique [29]. These plasmids were then used to transform chemically competent E. coli (DH10B cells), which were allowed to grow on LB agar plates. Following subsequent bacterial growth in liquid media, coat-protein production was induced via arabinose addition, and the expressed proteins were purified through sonication and precipitation in ammonium sulfate. The precipitated coat proteins were then separated based on their ability to form stable capsids through size exclusion chromatography on an Akta FPLC system. An important characteristic of these proteins (as with many viral capsids) is their ability to encapsulate the mRNA sequences that encode their own protein [30],[31],[32]. Thus, after this size selection, the encapsulated RNA was collected from the capsids and reverse transcribed to DNA, which was submitted to Illumina for sequencing [16]. In this analysis, the wild-type proteins served as positive controls, while the nonsense mutations served as negative controls.

Fitness scores for each mutant sequence were generated by taking the base-10 logarithm of the ratio of each sequence's count observed in the capsid group over the non-capsid group. The full apparent fitness landscape for single amino acid mutations across MS2's backbone can be seen in Appendix [A.1]. From this data, we were able to apply further mathematical analyses in order to quantitatively make sense of the position-based physical requirements that this protein needs in order to form stable capsids.
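The scoring scheme above can be sketched as follows; the pseudocount handling is an illustrative assumption, not part of the original pipeline:

```python
import numpy as np

def apparent_fitness(capsid_counts, non_capsid_counts, pseudocount=1.0):
    """Apparent fitness: base-10 log of the capsid / non-capsid count ratio.

    The pseudocount (an assumption, not from the original pipeline) keeps
    sequences absent from one pool from producing log(0).
    """
    capsid = np.asarray(capsid_counts, dtype=float) + pseudocount
    non_capsid = np.asarray(non_capsid_counts, dtype=float) + pseudocount
    return np.log10(capsid / non_capsid)

# Hypothetical read counts: enriched, depleted, and unchanged sequences.
scores = apparent_fitness([10000, 5, 100], [100, 500, 100])
```

Sequences enriched in the capsid pool score positively, depleted ones negatively, and equal counts score zero.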

From a range of literature sources, indices pertaining to the physical properties of each canonical amino acid were collected, as shown in the following table:

Table 1: Physical property indices for the 20 canonical amino acids, serving as a basis set for analysis of the position-based physical requirements for effective protein folding and capsid stability [33],[34],[35],[36],[37],[38],[39],[40].

2.2 Analysis of physical preferences along MS2’s backbone:

2.2.1 Definitions:

a : one of the 20 canonical amino acids,

$$a \in \{A, S, T, V, C, E, D, K, R, Q, N, M, I, L, H, F, Y, W, G, P\}$$

ε : one of the 10 physical property indices used in our analysis (volume, molecular weight, length, sterics, polarity, polar area, fraction water, hydrophobicity, non-polar area, flexibility)

$f_{a,p}$ : a fitness score, indexed by amino acid, $a$, and position, $p$

$R_p$ : a vector containing the fitness scores of each amino acid for a given position, $p$:

$$R_p = \begin{bmatrix} f_{A,p} & f_{S,p} & f_{T,p} & \cdots & f_{G,p} & f_{P,p} \end{bmatrix}_{1\times 20}$$

$\xi_\varepsilon$ : a vector containing the physical property indices, $\varphi_{a,\varepsilon}$, corresponding to a given property, $\varepsilon$, and amino acid, $a$:

$$\xi_\varepsilon = \begin{bmatrix} \varphi_{A,\varepsilon} & \varphi_{S,\varepsilon} & \varphi_{T,\varepsilon} & \cdots & \varphi_{G,\varepsilon} & \varphi_{P,\varepsilon} \end{bmatrix}_{1\times 20}$$

$\mu(R_p)$ : the mean of the fitness scores for a given position

$\sigma(R_p)$ : the standard deviation of the fitness scores for a given position

$\mu(\xi_\varepsilon)$ : the mean of the amino acid indices for a given physical property

$\sigma(\xi_\varepsilon)$ : the standard deviation of the amino acid indices for a given physical property

2.2.2 Standardization:

We proceed to produce standardized fitness scores, $\tilde{f}_{a,p}$, by taking the difference from a given position's mean and dividing by that position's standard deviation:

$$\tilde{f}_{a,p} = \frac{f_{a,p} - \mu(R_p)}{\sigma(R_p)}$$

Combining these standardized fitness scores into an array, $F \in \mathbb{R}^{129\times 20}$:

$$F = \begin{bmatrix}
\frac{f_{A,1}-\mu_1}{\sigma_1} & \frac{f_{S,1}-\mu_1}{\sigma_1} & \cdots & \frac{f_{G,1}-\mu_1}{\sigma_1} & \frac{f_{P,1}-\mu_1}{\sigma_1} \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
\frac{f_{A,129}-\mu_{129}}{\sigma_{129}} & \frac{f_{S,129}-\mu_{129}}{\sigma_{129}} & \cdots & \frac{f_{G,129}-\mu_{129}}{\sigma_{129}} & \frac{f_{P,129}-\mu_{129}}{\sigma_{129}}
\end{bmatrix}
= \begin{bmatrix}
\tilde{f}_{A,1} & \tilde{f}_{S,1} & \cdots & \tilde{f}_{G,1} & \tilde{f}_{P,1} \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
\tilde{f}_{A,129} & \tilde{f}_{S,129} & \cdots & \tilde{f}_{G,129} & \tilde{f}_{P,129}
\end{bmatrix}_{129\times 20}$$

Similarly, we produced standardized property indices, $[\varphi_{a,\varepsilon}]_{scaled,0}$, by taking the difference from a given property's mean index value and dividing by the associated standard deviation:

$$[\varphi_{a,\varepsilon}]_{scaled,0} = \frac{\varphi_{a,\varepsilon} - \mu(\xi_\varepsilon)}{\sigma(\xi_\varepsilon)}$$

Next, we subtracted the minimum value of $[\varphi_{a,\varepsilon}]_{scaled,0}$ for a given property, setting the minimum value to zero:

$$[\varphi_{a,\varepsilon}]_{scaled,1} = [\varphi_{a,\varepsilon}]_{scaled,0} - \min\left\{ [\varphi_{A,\varepsilon}]_{scaled,0},\, [\varphi_{S,\varepsilon}]_{scaled,0},\, \cdots,\, [\varphi_{P,\varepsilon}]_{scaled,0} \right\}$$

Finally, we divide each value of $[\varphi_{a,\varepsilon}]_{scaled,1}$ for a given property by the maximum value of its associated set, thus setting the maximum value to 1 and producing a set of standardized indices, $\tilde{\varphi}_{a,\varepsilon}$, fit between 0 and 1:

$$\tilde{\varphi}_{a,\varepsilon} = \frac{[\varphi_{a,\varepsilon}]_{scaled,1}}{\max\left\{ [\varphi_{A,\varepsilon}]_{scaled,1},\, [\varphi_{S,\varepsilon}]_{scaled,1},\, \cdots,\, [\varphi_{P,\varepsilon}]_{scaled,1} \right\}}$$

Combining these standardized indices into an array, $\Phi \in \mathbb{R}^{10\times 20}$:

$$\Phi = \begin{bmatrix}
\tilde{\varphi}_{A,volume} & \tilde{\varphi}_{S,volume} & \cdots & \tilde{\varphi}_{G,volume} & \tilde{\varphi}_{P,volume} \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
\tilde{\varphi}_{A,flexibility} & \tilde{\varphi}_{S,flexibility} & \cdots & \tilde{\varphi}_{G,flexibility} & \tilde{\varphi}_{P,flexibility}
\end{bmatrix}_{10\times 20}$$
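The two standardization pipelines can be expressed compactly with NumPy; the random arrays below stand in for the real fitness and property data, which are not reproduced here:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for the real data: fitness scores f_{a,p} for 129 positions x
# 20 amino acids, and property indices phi_{a,eps} for 10 properties x 20
# amino acids (the true values come from the selection data and Table 1).
fitness = rng.normal(size=(129, 20))
properties = rng.uniform(size=(10, 20))

# F: z-score each row (position) of the fitness array.
F = (fitness - fitness.mean(axis=1, keepdims=True)) / fitness.std(axis=1, keepdims=True)

# Phi: z-score each row (property), then shift and rescale so that every
# row of standardized indices spans exactly [0, 1].
scaled0 = (properties - properties.mean(axis=1, keepdims=True)) / properties.std(axis=1, keepdims=True)
scaled1 = scaled0 - scaled0.min(axis=1, keepdims=True)
Phi = scaled1 / scaled1.max(axis=1, keepdims=True)
```

After these steps each row of F has zero mean, and each row of Phi runs from 0 to 1.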

2.2.3 Determination of Positional Property Preferences:

We can produce an array, $\Psi \in \mathbb{R}^{129\times 10}$, with entries representing each position's preference for a given physical property, by the following operation:

$$\Psi = F \cdot \Phi^{T}$$

with individual entries, $\psi_{p,\varepsilon}$, corresponding to a summation of the following product over all 20 canonical amino acids, where $\varepsilon$ corresponds to a given physical property, and $p$ corresponds to a position in the protein backbone:

$$\psi_{p,\varepsilon} = \sum_{i=1}^{20} \tilde{f}_{a_i,p} \cdot \tilde{\varphi}_{a_i,\varepsilon}$$

Such that:

$$\Psi = \begin{bmatrix}
\left( \sum_{i=1}^{20} \tilde{f}_{a_i,1} \cdot \tilde{\varphi}_{a_i,vol.} \right) & \cdots & \left( \sum_{i=1}^{20} \tilde{f}_{a_i,1} \cdot \tilde{\varphi}_{a_i,flex.} \right) \\
\vdots & \ddots & \vdots \\
\left( \sum_{i=1}^{20} \tilde{f}_{a_i,129} \cdot \tilde{\varphi}_{a_i,vol.} \right) & \cdots & \left( \sum_{i=1}^{20} \tilde{f}_{a_i,129} \cdot \tilde{\varphi}_{a_i,flex.} \right)
\end{bmatrix}
= \begin{bmatrix}
\psi_{1,volume} & \psi_{1,weight} & \cdots & \psi_{1,flexibility} \\
\vdots & \vdots & \ddots & \vdots \\
\psi_{129,volume} & \psi_{129,weight} & \cdots & \psi_{129,flexibility}
\end{bmatrix}_{129\times 10}$$

Thus, we have obtained an array containing information about position-wise preferences for various physical properties, wherein rows index the positions in the protein backbone, and columns index the various physical properties in question.
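The preference array is then a single matrix product; a sketch with stand-in arrays:

```python
import numpy as np

rng = np.random.default_rng(1)
F = rng.normal(size=(129, 20))    # standardized fitness scores (stand-in)
Phi = rng.uniform(size=(10, 20))  # standardized property indices (stand-in)

# Psi = F . Phi^T: entry Psi[p, e] sums f~_{a,p} * phi~_{a,e} over the
# 20 canonical amino acids.
Psi = F @ Phi.T                   # shape (129, 10)

# Equivalent explicit summation for a single entry:
p, e = 0, 3
psi_pe = sum(F[p, a] * Phi[e, a] for a in range(20))
```

The matrix product and the explicit sum agree entry by entry.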

2.3 Results and implications:

Figure 1: The effect of various physical properties on the apparent fitness of the MS2 capsid protein [16].

The results of this analysis provide information about favored residue types with respect to their sequential position in MS2's backbone, and their spatial arrangement in the overall 3D protein structure. From this, we have a blueprint that tells us which types of mutations are well tolerated within a given region of the capsid structure, and thereby provides a guiding range of parameters to work within as we further modify the structure of this protein.

3 Development of a Convolutional Neural Net for predictive sequence analysis:

3.1 Epistasis as a motivator for the development of a predictive tool:

As it stands, the protein folding problem remains unsolved, with much progress yet to be made [41],[42],[43],[44]. The goal of defining a relationship between peptide sequence and favored 3D structure has remained at the center of many scientific fields for decades; it is a multifaceted physics problem, with many hypotheses having been formed to predict how proteins find their minimum energetic state [45],[46],[47]. The physical properties of constituent amino acids have certainly been observed to be a primary factor driving the conformational dynamics of this process, largely due to phenomena such as the hydrophobic effect [48], salt-bridge formation [49], and hydrogen bonding [50],[51]. However, an exhaustive search of the conformational space remains a barrier only surpassable through complex cooperative behaviors between the protein's molecular substructures [52].

These aforementioned phenomena reflect the hidden dimension of importance that epistatic effects have, not only on the energetic stability of a protein's folded state, but also on its ability to reach that state. However, for the purposes of protein engineering, perhaps the full dynamical trajectory associated with a given structural alteration need not be known in order to reach our end goal. While our previous analysis provided information regarding the physical characteristics that contribute to protein fitness with respect to the effects of single mutations [16], such modifications barely scratch the surface when it comes to truly reengineering a protein. This led us to look towards double mutants for our next analytical pursuit. While the introduction of a single amino acid mutation provides information about the tolerance of that residue with respect to the wild-type protein overall, the mystery of epistasis still remains, leaving us to consider the introduction of new physical interactions, favorable or disfavorable, which are completely novel to the wild-type structure.

We define epistasis herein as the difference between the effect that multiple mutations have when expressed together and the additive sum of the individual mutations [53]. Due to the cooperative nature of interactions between neighboring amino acids in a protein, it is already hard enough to make sense of the effect that a single amino acid mutation has on the protein's overall ability to fold and retain a stable and functional state. The introduction of multiple mutations increases the complexity of this problem exponentially, and can effectively result in novel interactions between amino acids not observed in the wild type.

Neural nets and related machine learning algorithms have gained much acclaim in recent years for the efficacy of their application as tools for predictive data analysis, classification, and disease diagnostics [54],[55],[56], owing to their ability to find patterns in large data sets that may be elusive from the point of view of conventional analysis. Thus, it stands to reason that neural nets promise a means by which to project relevant information away from a space with a temporal dimension, and replace it with the dimensions of quantized physical descriptors. In the field of biological engineering, neural networks have been used as predictive tools to guide the design of mutant sequences with enhanced functionality [57],[58]. In the age of high-throughput sequencing, generating, expressing, and analyzing a large number of mutants has become remarkably feasible, allowing for the efficient acquisition of large bodies of data – a fundamental requisite for the effective use of neural networks, due to their "data hungry" nature [59],[60],[61].

For our examination of the efficacy of using such tools for the purposes of protein engineering and directed evolution, a basic objective is to develop a model that can predict the effects of epistasis between multiple co-expressed mutations with better accuracy than predictions obtained by adding multiple single-mutant fitness scores. There have been many approaches used to quantify a representative metric for epistasis – a way to quantify the effect of interactions between multiple mutations, which produce results that differ from the sum of their parts. Following the approach of previous work conducted in the evaluation of epistatic phenomena [58],[53], our analysis employs the following mathematical description of an epistasis metric describing the double mutation imposed on residues $i$ and $j$:

$$\varepsilon = \Delta f_{ij} - (\Delta f_i + \Delta f_j), \quad \text{where } \Delta f_x = f_x - f_{WT}$$

Quantification of the epistasis exhibited by double mutants throughout MS2's FG-loop can be seen in Appendix [A.2]. The positions where either favorable or disfavorable epistasis is most pronounced give us an idea of where to direct focus in our analysis of synergistic interactions between nearby residues, and of the magnitude of the overall physical influence of such interactions.
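As a minimal numeric sketch of this metric (the fitness values here are invented for illustration):

```python
def epistasis(f_ij, f_i, f_j, f_wt):
    """epsilon = (f_ij - f_WT) - ((f_i - f_WT) + (f_j - f_WT))."""
    return (f_ij - f_wt) - ((f_i - f_wt) + (f_j - f_wt))

# Invented scores: each single mutant costs 1.0 unit of fitness, but the
# double mutant only costs 0.5, giving positive (favorable) epistasis.
eps = epistasis(f_ij=-0.5, f_i=-1.0, f_j=-1.0, f_wt=0.0)  # eps = 1.5
```

A positive value means the pair performs better together than the single-mutant scores would predict; a negative value means worse.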

3.2 Experimental Design:

We trained a neural net by feeding it two sources of data to map together: information regarding the quantified physical properties exhibited at each position in the peptide backbone of a given mutant, and the associated fitness score of that mutant. In analogy to using a neural net to perform image classification, we trained the neural net to look at the physical properties along MS2's backbone in order to perform capsid-formation classification. The 12 physical properties used in our input data were treated mathematically as a range of discrete "colors" (more formally, channels), and the positions in the backbone were treated as order-correlated pixel values. All development of our neural network model was done in Python, using the TensorFlow library [62]. Additionally, the NumPy and Matplotlib libraries were used for mathematical processing of data and figure generation, respectively.

The input and output data for our functional neural network model, as well as the standardization process applied to them, are defined as follows:

 Input, $m_i$ (data reflecting the physical state of a given mutant): a 129 × 12 array, with rows corresponding to positions in a mutant's peptide backbone, and columns corresponding to the physical properties of the amino acid there (the same as used in our previous discussion), standardized as follows:

• Subtract the mean for each property (to center the data around zero)

• Divide by the standard deviation (to get unit variance)

• Subtract the minimum value for each property group (to set the minimum value to 0)

• Divide by the maximum value for each property group (to set the maximum value to 1)

For each mutant, $m_i$, in our mutation data set, we generated matrices containing information about the physical properties at each position in its backbone, based on which amino acid was present. Mutants with fitness scores greater than or equal to -4 were removed, and missing data points were excluded entirely. In the input matrices for each mutant data point, rows correspond to the sequence position (ordered from N-terminus to C-terminus), and columns correspond to the normalized physical property indices described above, but using 12 physical properties (positive charge, negative charge, volume, molecular weight, length, sterics, polarity, polar area, fraction water, hydrophobicity, non-polar area, flexibility) instead of the 10 properties used in our analysis of position-based physical preferences:

$$m_i = \begin{bmatrix}
\alpha^{(m_i)}_{1,volume} & \alpha^{(m_i)}_{1,weight} & \cdots & \alpha^{(m_i)}_{1,n.p.\,area} & \alpha^{(m_i)}_{1,flexibility} \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
\alpha^{(m_i)}_{129,volume} & \alpha^{(m_i)}_{129,weight} & \cdots & \alpha^{(m_i)}_{129,n.p.\,area} & \alpha^{(m_i)}_{129,flexibility}
\end{bmatrix}_{129\times 12}$$

 Output, $f_i$: fitness data reflecting a given mutant's ability to form stable capsids.

The data set used to "train" the network (develop optimal weight values for producing best-fit predictions) consisted of the physical property matrices and corresponding fitness scores of the single amino acid mutants found in previous experiments [16]. The data set used to validate the predictive abilities of our network consisted of the same data types, but with double mutants generated in an epistatic landscape of MS2's FG-loop.
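Building such an input matrix from a sequence is a simple per-residue lookup; the property table below is randomized for illustration (the real normalized indices come from the literature sources in Table 1):

```python
import numpy as np

AMINO_ACIDS = "ASTVCEDKRQNMILHFYWGP"

# Hypothetical normalized property table (20 amino acids x 12 properties);
# the real values are the literature indices scaled to [0, 1].
rng = np.random.default_rng(2)
PROPERTY_TABLE = {aa: rng.uniform(size=12) for aa in AMINO_ACIDS}

def encode_sequence(seq):
    """Map a 129-residue sequence to its 129 x 12 input matrix m_i:
    rows follow the backbone from N- to C-terminus, columns are the 12
    normalized physical property indices of the residue at each row."""
    return np.stack([PROPERTY_TABLE[aa] for aa in seq])

mutant = "".join(rng.choice(list(AMINO_ACIDS), size=129))
m_i = encode_sequence(mutant)
```

A single point mutation changes exactly one row of the matrix, which is what lets the convolutional layers treat local sequence windows like image patches.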

3.2.1 Design of the neural network function:

The following pseudocode describes the computational flow of information as organized by our neural network function:

Algorithm 1: Functional model for the convolutional neural net predictive function

Input: $m_i$, a second-order tensor $\in \mathbb{R}^{129\times 12}$.

Output: $f_i$, the predicted fitness score associated with mutant $i$, a scalar value.

1  CNN_model($m_i$);
2  Conv_1 ← $m_i$ : first convolutional layer, with 12 filters and a kernel size of 5, with rectified-linear activation function;
3  Pool_1 ← Conv_1 : first pooling layer, with a pool size of 2;
4  Conv_2 ← Pool_1 : second convolutional layer, with 12 filters and a kernel size of 5, with rectified-linear activation function;
5  Pool_2 ← Conv_2 : second pooling layer, with a pool size of 2;
6  Pool_2_Flat ← Pool_2 : reshape Pool_2 to a vector $\in \mathbb{R}^{128}$;
7  Dense_1 ← Pool_2_Flat : first fully-connected layer, with 128 units and rectified-linear activation function;
8  Dense_2 ← Dense_1 : second fully-connected layer, with 32 units and rectified-linear activation function;
9  $f_i$ ← Dense_2 : output scalar, the predicted fitness score;
10 return $f_i$

Figure 2: Schematic of the neural net functional model, with input $m_i$ (a 129 × 12 array of physical properties by backbone position) passing through two convolutional layers and two fully-connected layers to the scalar output $f_i$.

The design of our neural network model was composed of two sequential convolutional layers (each followed by a pooling operation), ultimately feeding into two fully-connected layers, which output a scalar fitness score prediction. The property columns in our input matrices are fed into the function separately, to be processed individually and combined later on. The first convolutional layer takes small fragments of the input vector (of length defined by the "kernel size" of 5), corresponding to 5-residue-long sequences of amino acids in MS2's backbone – a size chosen to represent small units of sequence that can exhibit characteristic patterns in their physical identities. Each of these is then passed through 12 "filters" (more formally, algebraic transformations), which possess weights that are iteratively adjusted in order to only let through data from fragment sequences containing mathematical features that provide useful information for the overall numerical flow. After each pass through the filters, a "pooling" operation reduces the size of the data passed along by taking the maximum value of each 2-unit-long subdivision of the filter outputs, and consolidating them to be fed into the next layer. Because the input to the second convolutional layer has undergone an extensive mathematical transformation by this point, it holds an abstract relation to the physical significance of the input data, but the principles of convolutional data processing performed are the same as in the first layer.

The pooled data that comes out of the second convolutional layer is then fed into the first "fully connected" layer. This means that all remaining pieces of data (which happen to be 128 distinct values now, rather than the 129 × 12 values that we started with, due to the dimension reductions performed in the pooling steps) are combined into a single vector and are together subjected to the same mathematical processing from this point on. This "fully connected" processing consists of combining the elements of the vector through a linear combination, and applying a rectified-linear function, ReLU(x) = max(0, x), to the resulting sum. This is done 128 times, to produce 128 new distinct data points, each using the same input, but being transformed by different weights. These 128 values then undergo the same processing operation of linear combination followed by rectified-linear transformation, but only 32 times, to produce 32 new values. Finally, these 32 values are passed through a regular linear combination (this time not transformed by a function) to produce a single value that should, in theory, match the fitness score associated with a given mutant. Thus, the architectural regions can be summarized as serving distinct mathematical roles in a data-processing procedure that maps the matrix $m_i$ to the scalar $f_i$.

The convolutional layers serve to reduce the data by filtering for significant motif patterns, producing a vector:

$$CONV(m_i) = \bar{x}, \quad \text{where } \bar{x} \in \mathbb{R}^{128} \text{ and } m_i \in \mathbb{R}^{129\times 12}$$

The fully-connected layers apply the following function to the vector provided by the convolutional layers, in order to find a general interactive relationship between the vector values and a fitness score:

$$f_i(\bar{x}) = \sum_{\alpha=1}^{32} a_\alpha \, \mathrm{ReLU}\!\left( \sum_{\beta=1}^{128} b_{\alpha,\beta} \, \mathrm{ReLU}\!\left( \sum_{\gamma=1}^{128} c_{\beta,\gamma} \, x_\gamma \right) \right)$$

The algorithm then iterates through the single-mutant data, and readjusts the weights within the functional model in a manner that best minimizes the error between the function output and the actual fitness score for a given mutant, with the ultimate goal of finding weight values that provide a "best fit" for all of the mutant data being trained on, as well as for potential new data that the developed function may later be used on.
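The architecture above can be sketched in tf.keras; the optimizer, loss, and padding choices are assumptions not stated in the text (with 'valid' padding the flattened vector is 29 × 12 = 348 values rather than the 128 reported, so intermediate sizes differ slightly):

```python
import tensorflow as tf

# Sketch of the described architecture: two Conv1D(12 filters, kernel 5)
# + MaxPooling1D(pool 2) stages, then Dense(128) -> Dense(32) -> Dense(1).
model = tf.keras.Sequential([
    tf.keras.Input(shape=(129, 12)),                         # m_i
    tf.keras.layers.Conv1D(12, kernel_size=5, activation="relu"),
    tf.keras.layers.MaxPooling1D(pool_size=2),
    tf.keras.layers.Conv1D(12, kernel_size=5, activation="relu"),
    tf.keras.layers.MaxPooling1D(pool_size=2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1),                                # f_i (linear output)
])
model.compile(optimizer="adam", loss="mse")                  # assumed training config
```

Calling `model.fit(X_train, y_train)` on the single-mutant matrices and fitness scores then performs the iterative weight adjustment described above.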

3.3 Results:

From computing the predicted fitness scores for MS2's FG-loop epistatic landscape (all of which can be found in Appendix [A.3]), we see the neural network predictions matching the experimental fitness trends for many of the position combinations. However, some deviations from predictive accuracy do stand out.

Figure 3: Comparison of the actual fitness scores and predicted fitness scores for double mutations between positions 71 and 72 (a), 71 and 73 (b), and 71 and 74 (c).

For example, as seen in Figure 3, mutations of positions 71 and 72, and positions 71 and 73 exhibit sim-

ilar patterns of positive fitness. However, co-mutation of position 71 with position 74 deviates from this

pattern, with overall low tolerance to mutation. This is somewhat surprising, because even in our single-

mutant analysis, positions 72 and 73 are highly mutable, while 74 is not. Having trained our model on

the single-mutant data set, one might expect position 74’s intolerance to mutation to carry over to the

predicted values in an almost additive manner, yet our data suggest that the predictive character of our

model is more heavily influenced by position 71’s single mutant data than position 74. Overall, the pre-

dicted fitness scores for co-mutating position 71 with position 74 heavily resemble those of 72 and 73,

suggesting that our model may be subject to bias towards the assumed recurrence of mutability trends. A

possible explanation for this is that as we feed our input data through the convolutional layers, the protein

backbone is divided into increments of 5 amino acids. In the single mutant data set, positions 70 through

73 are all highly mutable, whereas position 74 is not. If these five positions influence the neural net model

through the same input kernel, it makes sense that the influence of position 74’s general intolerance to

mutation is dwarfed by the high mutability of the other positions within the kernel. Nonetheless, the model does a remarkably good job of matching mutability patterns overall.

Following our computational experiments, we were able to calculate the error between the

actual fitness scores of all FG mutants sampled and the values predicted (through either neural network

predictions or fitness score addition):

\[
E_{\text{predicted}} = \frac{1}{N} \sum_{i=1}^{N} \sqrt{\left( f_{i,\text{actual}} - f_{i,\text{predicted}} \right)^{2}}
\]

The average error for the convolutional neural net predictions was found to be $E_{\text{CNN}} = 5.89 \times 10^{-1}$, with a 95% confidence interval of $\mu(E_{\text{CNN}}) = 5.89 \times 10^{-1} \pm 1.30 \times 10^{-2}$. Likewise, the average error for the additive fitness score calculations was found to be $E_{\text{additive}} = 6.81 \times 10^{-1}$, with a 95% confidence interval of $\mu(E_{\text{additive}}) = 6.81 \times 10^{-1} \pm 1.99 \times 10^{-2}$. These values indicate that the convolutional neural network predictions yielded less error than the additive fitness scores, with a statistically significant difference ($p = 3.91 \times 10^{-13}$, two-tailed paired t-test).
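The error metric and significance test can be sketched in plain Python. The function names are our own, and a real analysis would more likely use numpy/scipy; this is only meant to make the calculation explicit:

```python
import math

def prediction_errors(actual, predicted):
    """Per-mutant error sqrt((f_actual - f_pred)^2), i.e. |f_actual - f_pred|."""
    return [abs(a - p) for a, p in zip(actual, predicted)]

def mean_with_ci95(errors):
    """Mean error and half-width of a 95% confidence interval
    (normal approximation, appropriate for a large sample)."""
    n = len(errors)
    mean = sum(errors) / n
    var = sum((e - mean) ** 2 for e in errors) / (n - 1)
    return mean, 1.96 * math.sqrt(var / n)

def paired_t_statistic(errors_a, errors_b):
    """t statistic for a paired t-test on two matched sets of errors."""
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    return mean / math.sqrt(var / n)
```

Applying `paired_t_statistic` to the per-mutant CNN errors versus the per-mutant additive errors, matched mutant by mutant, is what justifies the paired test: both methods are evaluated on the identical set of double mutants.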

3.4 Discussion:

The development of such an algorithm holds great promise for protein engineering. To use a

predictive tool such as this, one would simply need to generate a library of mutants for each position,

and have a characteristic observable quantity with which to test the mutants, such as the ability to fold, the introduction of a novel functionality, or, as in this study, the ability to form a stable structure of interest.

During the process of training our neural network model, the validation-set error decreased at a rate directly proportional to that of the training-set error. This indicates that, in "learning" to predict single-mutant fitness from the physical property data provided, our model also picks up on the sequence-structure relationship needed to predict the effects of double mutants, which supports the idea that neural networks hold promise as effective tools for determining fitness in applications involving directed evolution.
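The training procedure described above amounts to an epoch loop that logs both error curves. In the sketch below, the one-weight linear "model" is a hypothetical stand-in for the actual convolutional network, used only to show the bookkeeping:

```python
def train_with_validation(train, val, lr=0.05, epochs=50):
    """Fit a one-weight toy model, logging train/validation error per epoch.

    Each data point is an (x, fitness) pair; the 'model' y = w * x stands in
    for the real convolutional network. One epoch = one full pass through
    the training set, matching the definition used in Figure 4.
    """
    w = 0.0
    history = []
    for _ in range(epochs):
        for x, y in train:
            w -= lr * (w * x - y) * x  # gradient step on squared error
        train_err = sum((w * x - y) ** 2 for x, y in train) / len(train)
        val_err = sum((w * x - y) ** 2 for x, y in val) / len(val)
        history.append((train_err, val_err))
    return w, history
```

Plotting the two columns of `history` against epoch number yields curves of the kind shown in Figure 4; proportional decrease of the two indicates that what is learned on the training set transfers to the held-out set.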

Figure 4: The error in predicting the fitness scores of the training set (single mutants) and the validation set (FG-epi mutants) decreases at a proportional rate between the two sets with respect to training iterations (1 epoch = one full pass through the entire training set).

Results of particular interest include mutants with pronounced epistasis that counters the expectations produced by single mutants: that is, double mutants whose fitness values differ significantly from the sum of the two constituent single-mutant changes in fitness. For instance, in co-expressing mutations at residues 71 and 76, we see that when a negatively charged amino acid is present at position 71 and residue 76 (natively negatively charged) undergoes charge inversion to a positively charged amino acid, like lysine or arginine, the resultant fitness value is positive. This runs counter to the additive fitness expectation, owing to complementary interactions between two residues, neither of which would yield a favorable state of fitness alone.

Figure 5: Experimental fitness data, neural network predictions, and additive predictions (top to bottom)

for double mutants of positions: [a,b,c] 72 (left) and 74 (bottom), [d,e,f ] 71 (left) and 76 (bottom), [g,h,i]

72 (left) and 75 (bottom), and [j,k,l] 73 (left) and 75 (bottom).

Despite being far from perfect overall, our model does successfully predict the epistatic results

for a significant number of double mutants. A common theme amongst tolerated double mutations is the

pairing of positive and negative charges, either of which might be unfavorable on its own. The introduction of new structural features, like salt bridges, that may confer improvements in capsid stability is an example of the types of interactions that would be desirable to introduce into our protein through directed evolution.

While neural networks are useful tools for learning about and interacting with complex systems characterized by mathematical patterns that often evade human definition, the proper development of their models requires extremely large data sets. Tuning their many weights in a manner that can best process and interpret the vast range of possible input data that may be encountered remains a challenge, even for our less demanding application of these tools.

To develop any function of this sort such that it generalizes to the novel situations arising from the interplay between units in an amino acid chain requires an appropriately large amount of training data, such that the training procedure can "cover ground" wide enough to represent most possible results one might encounter from moderate variation of peptide identity. Thus, despite the seemingly wide-spanning data set of our single-mutant fitness landscape, by machine-learning standards this set still leaves us with an under-defined system[60].

To overcome the large data set requirement for effectively training neural networks, a technique of growing popularity is "transfer learning". In this technique, larger data sets are utilized

to initially train the first few layers of the function model to recognize general motifs exhibited by both a

small function-specific data set, such as our fitness data, and a larger data source, like an online protein

data bank[61],[63],[64]. After this initial motif-recognition is achieved, that processing function is set up

to feed into a model specifically designed to develop predictive efficacy for the specific purposes of the

small data set, which in our case would be capsid formation.
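As an illustration only, the transfer-learning workflow amounts to freezing a pretrained feature extractor and fitting a small task-specific head on the limited fitness data. The function names and the linear "layers" below are hypothetical stand-ins for real convolutional layers trained on a large protein data bank:

```python
def make_frozen_extractor(pretrained_weights):
    """Return a fixed feature map using weights learned on the large,
    generic data set; these weights are never updated again."""
    def extract(x):
        return [w * xi for w, xi in zip(pretrained_weights, x)]
    return extract

def train_head(extract, inputs, targets, lr=0.1, epochs=200):
    """Fit only the head weights on the small, task-specific data set
    (in our case, capsid-formation fitness)."""
    head = [0.0] * len(inputs[0])
    for _ in range(epochs):
        for x, y in zip(inputs, targets):
            feats = extract(x)              # frozen general-motif features
            pred = sum(h * f for h, f in zip(head, feats))
            err = pred - y
            head = [h - lr * err * f for h, f in zip(head, feats)]
    return head
```

The key design point is that only `head` is updated: the small fitness data set never has to relearn general sequence motifs, only how those motifs map onto capsid formation.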

Future directions will utilize the transfer-learning technique in order to train the first few lay-

ers of a neural network to develop a comprehensive processing system to recognize the ways in which the

sequence-identity of neighboring amino acids interact in order to form local motifs and secondary struc-

tural elements[65]. However, the specific arrangements and interactions of secondary structure motifs

that contribute to the unique tertiary and quaternary structures of a protein or family of proteins remain

defined by a large body of case-specific information. Thus, making sense of these arrangements is where

having a data set like a fitness landscape for a given protein truly plays a necessary role.

4 Conclusion:

Unraveling the rules governing protein self-assembly and epistasis is a profound and hum-

bling endeavor. Thus, such an undertaking requires the use of extensive computational tools in order to

process the associated data systems, due to the difficulty of analyzing and understanding them by hand.

In the pursuit of extensive protein modification, we encounter complex dynamical trajectories, the birth

and extinction of new forms and functionality, all guided by interactions in a vast system of intricate inter-

dependencies. To harness these phenomena in a marriage between human innovation and the evolved

forms that uphold life itself provides a huge step in scientific progress, and thus requires novel tools and

techniques to meet its unique needs. To understand and utilize the intricate weavings of the fabric of

our reality would, in essence, truly be for us to reach an intimate communion with the very forces that

brought forth our existence.

5 Acknowledgements:

My research experience in the Francis group has undoubtedly been the most transformative

chapter of my undergraduate education. In my time there, I learned much about experimental design

and methodology, the raw excitement of discovery, and perhaps most importantly, how to acknowledge

failures within a project, accept them, learn from them, and carry on in an appropriately adjusted di-

rection. For the guidance that I received throughout my time there, I would like to thank my friend and

mentor, Marco Lobba, for exposing me to the many facets of life as a researcher, and also for helping me

navigate through this pivotal point in my scientific career. I would also like to thank Emily Hartman for

her work in generating all the experimental data used in my research analyses, and I would like to thank

Professor Matthew Francis for fostering such a fun and engaging scientific environment.

References

[1] Eric D Schneider and James J Kay. “Order from disorder: the thermodynamics of complexity in

biology”. In: What is life? The next fifty years: Speculations on the future of biology (1995), pp. 161–

172.

[2] F Eugene Yates. Self-organizing systems: The emergence of order. Springer Science & Business Media,

2012.

[3] W Ross Ashby. “Requisite variety and its implications for the control of complex systems”. In: Facets

of Systems Science. Springer, 1991, pp. 405–417.

[4] Jeremy L England. “Statistical physics of self-replication”. In: The Journal of chemical physics 139.12

(2013), 09B623_1.

[5] David Andrieux and Pierre Gaspard. “Nonequilibrium generation of information in copolymeriza-

tion processes”. In: Proceedings of the National Academy of Sciences 105.28 (2008), pp. 9516–9521.

[6] Tsutomu Arakawa and Serge N Timasheff. “Theory of protein solubility”. In: Methods in enzymol-

ogy. Vol. 114. Elsevier, 1985, pp. 49–77.

[7] Gabriel J Rocklin et al. “Global analysis of protein folding using massively parallel design, synthesis,

and testing”. In: Science 357.6347 (2017), pp. 168–175.

[8] Hege Beard et al. “Applying physics-based scoring to calculate free energies of binding for single

amino acid mutations in protein-protein complexes”. In: PloS one 8.12 (2013), e82849.

[9] A Elisabeth Eriksson et al. “Response of a protein structure to cavity-creating mutations and its

relation to the hydrophobic effect”. In: Science 255.5041 (1992), pp. 178–183.

[10] Philip A Romero and Frances H Arnold. “Exploring protein fitness landscapes by directed evolu-

tion”. In: Nature Reviews Molecular Cell Biology 10.12 (2009), p. 866.

[11] John Maynard Smith. “Natural selection and the concept of a protein space”. In: Nature 225.5232

(1970), p. 563.

[12] Frances H Arnold. “Combinatorial and computational challenges for biocatalyst design”. In: Nature

409.6817 (2001), p. 253.

[13] Thomas A Hopf et al. “Mutation effects predicted from sequence co-variation”. In: Nature biotech-

nology 35.2 (2017), p. 128.

[14] Romas J Kazlauskas and Uwe T Bornscheuer. “Finding better protein engineering strategies”. In:

Nature chemical biology 5.8 (2009), p. 526.

[15] Jeff E Glasgow et al. “Osmolyte-mediated encapsulation of proteins inside MS2 viral capsids”. In:

ACS nano 6.10 (2012), pp. 8658–8664.

[16] Emily C Hartman et al. “Quantitative characterization of all single amino acid variants of a viral

capsid-based drug delivery vehicle”. In: Nature communications 9.1 (2018), p. 1385.

[17] Maximilian Hecht, Yana Bromberg, and Burkhard Rost. “News from the protein mutability land-

scape”. In: Journal of molecular biology 425.21 (2013), pp. 3937–3948.

[18] DA Steinhauer and JJ Holland. “Rapid evolution of RNA viruses”. In: Annual Reviews in Microbiol-

ogy 41.1 (1987), pp. 409–431.

[19] Andrew L Ferguson et al. “Translating HIV sequences into quantitative fitness landscapes predicts

viral vulnerabilities for rational immunogen design”. In: Immunity 38.3 (2013), pp. 606–617.

[20] David S Peabody. “Subunit fusion confers tolerance to peptide insertions in a virus coat protein”.

In: Archives of biochemistry and biophysics 347.1 (1997), pp. 85–92.

[21] Adel M ElSohly et al. “Synthetically modified viral capsids as versatile carriers for use in antibody-

based cell targeting”. In: Bioconjugate chemistry 26.8 (2015), pp. 1590–1596.

[22] Ernest W Kovacs et al. “Dual-surface-modified bacteriophage MS2 as an ideal scaffold for a viral

capsid-based drug delivery system”. In: Bioconjugate chemistry 18.4 (2007), pp. 1140–1147.

[23] Jeff E Glasgow et al. “Influence of electrostatics on small molecule flux through a protein nanoreac-

tor”. In: ACS synthetic biology 4.9 (2015), pp. 1011–1019.

[24] Keunhong Jeong et al. “Targeted molecular imaging of cancer cells using MS2-based 129Xe NMR”.

In: Bioconjugate chemistry 27.8 (2016), pp. 1796–1801.

[25] Tyler Meldrum et al. “A xenon-based molecular sensor assembled on an MS2 viral capsid scaffold”.

In: Journal of the American Chemical Society 132.17 (2010), pp. 5936–5937.

[26] Nicholas Stephanopoulos, Zachary M Carrico, and Matthew B Francis. “Nanoscale integration of

sensitizing chromophores and porphyrins with bacteriophage MS2”. In: Angewandte Chemie Inter-

national Edition 48.50 (2009), pp. 9498–9502.

[27] Ying-Zhong Ma et al. “Energy transfer dynamics in light-harvesting assemblies templated by the

tobacco mosaic virus coat protein”. In: The Journal of Physical Chemistry B 112.22 (2008), pp. 6887–

6892.

[28] Rebekah A Miller, Andrew D Presley, and Matthew B Francis. “Self-assembling light-harvesting sys-

tems from synthetically modified tobacco mosaic virus coat proteins”. In: Journal of the American

Chemical Society 129.11 (2007), pp. 3104–3109.

[29] Carola Engler et al. “Golden gate shuffling: a one-pot DNA shuffling method based on type IIs re-

striction enzymes”. In: PloS one 4.5 (2009), e5553.

[30] Gabriel L Butterfield et al. “Evolution of a designed protein assembly encapsulating its own RNA

genome”. In: Nature 552.7685 (2017), p. 415.

[31] Wilf T Horn et al. “The crystal structure of a high affinity RNA stem-loop complexed with the bacte-

riophage MS2 capsid: further challenges in the modeling of ligand–RNA interactions”. In: Rna 10.11

(2004), pp. 1776–1782.

[32] Karim M ElSawy. “The impact of viral RNA on the association free energies of capsid protein assem-

bly: bacteriophage MS2 as a case study”. In: Journal of molecular modeling 23.2 (2017), p. 47.

[33] Joan Pontius, Jean Richelle, and Shoshana J Wodak. “Deviations from standard atomic volumes

as a quality measure for protein crystal structures”. In: Journal of molecular biology 264.1 (1996),

pp. 121–136.

[34] WP Jencks and J Regenstein. Handbook of Biochemistry and Molecular Biology; Fasman, GD, Ed.

1976.

[35] M Charton. “Protein folding and the genetic code: an alternative quantitative model”. In: Journal

of theoretical biology 91.1 (1981), pp. 115–123.

[36] Mauno Vihinen, Esa Torkkila, and Pentti Riikonen. “Accuracy of protein flexibility predictions”. In:

Proteins: Structure, Function, and Bioinformatics 19.2 (1994), pp. 141–149.

[37] JM Zimmerman, Naomi Eliezer, and R Simha. “The characterization of amino acid sequences in

proteins by statistical methods”. In: Journal of theoretical biology 21.2 (1968), pp. 170–201.

[38] Maria Sandberg et al. “New chemical descriptors relevant for the design of biologically active pep-

tides. A multivariate characterization of 87 amino acids”. In: Journal of medicinal chemistry 41.14

(1998), pp. 2481–2491.

[39] WR Krigbaum and Akira Komoriya. “Local interactions as a structure determinant for protein molecules:

II.” In: Biochimica et biophysica acta 576.1 (1979), pp. 204–248.

[40] Jean-Luc Fauchère et al. “Amino acid side chain parameters for correlation studies in biology and

pharmacology”. In: Chemical Biology & Drug Design 32.4 (1988), pp. 269–278.

[41] Ken A Dill and Justin L MacCallum. “The protein-folding problem, 50 years on”. In: science 338.6110

(2012), pp. 1042–1046.

[42] Michael J Behe, Eaton E Lattman, and George D Rose. “The protein-folding problem: the native fold

determines packing, but does packing determine the native fold?” In: Proceedings of the National

Academy of Sciences 88.10 (1991), pp. 4195–4199.

[43] Martin Karplus and David L Weaver. “Protein-folding dynamics”. In: Nature 260.5550 (1976), p. 404.

[44] Ulrich HE Hansmann and Yuko Okamoto. “Comparative study of multicanonical and simulated

annealing algorithms in the protein folding problem”. In: Physica A: Statistical Mechanics and its

Applications 212.3-4 (1994), pp. 415–437.

[45] Linus Pauling, Robert B Corey, et al. “Stable configurations of polypeptide chains”. In: Proc. R. Soc.

Lond. B 141.902 (1953), pp. 21–33.

[46] Steven S Plotkin, Jin Wang, and Peter G Wolynes. “Statistical mechanics of a correlated energy land-

scape model for protein folding funnels”. In: The Journal of chemical physics 106.7 (1997), pp. 2932–

2948.

[47] Zhenqin Li and Harold A Scheraga. “Monte Carlo-minimization approach to the multiple-minima

problem in protein folding”. In: Proceedings of the National Academy of Sciences 84.19 (1987), pp. 6611–

6615.

[48] Charles Tanford. “The hydrophobic effect and the organization of living matter”. In: Science 200.4345

(1978), pp. 1012–1018.

[49] George I Makhatadze et al. “Contribution of surface salt bridges to protein stability: guidelines for

protein engineering”. In: Journal of molecular biology 327.5 (2003), pp. 1135–1148.

[50] Ken A Dill. “Dominant forces in protein folding”. In: Biochemistry 29.31 (1990), pp. 7133–7155.

[51] George D Rose and Richard Wolfenden. “Hydrogen bonding, hydrophobicity, packing, and protein

folding”. In: Annual review of biophysics and biomolecular structure 22.1 (1993), pp. 381–415.

[52] Ken A Dill, Klaus M Fiebig, and Hue Sun Chan. “Cooperativity in protein-folding kinetics.” In: Pro-

ceedings of the National Academy of Sciences 90.5 (1993), pp. 1942–1946.

[53] Karen S Sarkisyan et al. “Local fitness landscape of the green fluorescent protein”. In: Nature 533.7603

(2016), p. 397.

[54] Henadzi Vaitsekhovich. “Neural Networks in Disease Diagnostics”. In: BALTIC CONFERENCE. Cite-

seer, p. 47.

[55] Abu Bakar Siddiquee et al. “A Constructive Algorithm for Feedforward Neural Networks for Medical

Diagnostic Reasoning”. In: arXiv preprint arXiv:1009.4564 (2010).

[56] Pedro J Ballester and John BO Mitchell. “A machine learning approach to predicting protein–ligand

binding affinity with applications to molecular docking”. In: Bioinformatics 26.9 (2010), pp. 1169–

1175.

[57] Robert J Tunney et al. “Accurate design of translational output by a neural network model of ribo-

some distribution”. In: bioRxiv (2017), p. 201517.

[58] Victoria Pokusaeva et al. “Experimental assay of a fitness landscape on a macroevolutionary scale”.

In: bioRxiv (2018), p. 222778.

[59] Jeffrey Dean et al. “Large scale distributed deep networks”. In: Advances in neural information pro-

cessing systems. 2012, pp. 1223–1231.

[60] Maryam M Najafabadi et al. “Deep learning applications and challenges in big data analytics”. In:

Journal of Big Data 2.1 (2015), p. 1.

[61] Jason Yosinski et al. “How transferable are features in deep neural networks?” In: Advances in neural

information processing systems. 2014, pp. 3320–3328.

[62] Martín Abadi et al. “Tensorflow: Large-scale machine learning on heterogeneous distributed systems”. In: arXiv preprint arXiv:1603.04467 (2016).

[63] Jeff Donahue et al. “Decaf: A deep convolutional activation feature for generic visual recognition”.

In: International conference on machine learning. 2014, pp. 647–655.

[64] Hoo-Chang Shin et al. “Deep convolutional neural networks for computer-aided detection: CNN

architectures, dataset characteristics and transfer learning”. In: IEEE transactions on medical imag-

ing 35.5 (2016), pp. 1285–1298.

[65] Ning Qian and Terrence J Sejnowski. “Predicting the secondary structure of globular proteins using

neural network models”. In: Journal of molecular biology 202.4 (1988), pp. 865–884.

A Appendix : Data and Figures

A.1 Apparent Fitness for MS2:

Figure 6: MS2’s single-mutant fitness landscape, the quantified effect of mutating each position in the

protein backbone to each of the 20 canonical amino acids and nonsense mutations[16].

A.2 Epistasis values for double mutants in MS2’s FG-loop:

Figure 7: Epistasis values, $\epsilon$, for the epistatic landscape of MS2's FG-loop, with epistasis defined as $\epsilon = \Delta f_{ij} - (\Delta f_i + \Delta f_j)$, where $\Delta f_x = f_x - f_{WT}$. This represents the difference between the actual fitness scores and the additive fitness scores for the FG-loop double mutants.
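The epistasis definition translates directly into code; a minimal helper (variable names are our own shorthand for the quantities in the caption):

```python
def epistasis(f_ij, f_i, f_j, f_wt):
    """epsilon = dF_ij - (dF_i + dF_j), with dF_x = f_x - f_WT.

    Positive epsilon: the double mutant is fitter than the additive
    expectation; negative epsilon: less fit than expected.
    """
    return (f_ij - f_wt) - ((f_i - f_wt) + (f_j - f_wt))
```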

A.3 Neural Net Predictions and Additive Predictions

Figure 8: Experimental fitness data, neural network predictions, and additive predictions (left to right) for

double mutants of positions 71 and 72.

Figure 9: Experimental fitness data, neural network predictions, and additive predictions (left to right) for

double mutants of positions 71 and 73.

Figure 10: Experimental fitness data, neural network predictions, and additive predictions (left to right)

for double mutants of positions 71 and 74.

Figure 11: Experimental fitness data, neural network predictions, and additive predictions (left to right)

for double mutants of positions 71 and 75.

Figure 12: Experimental fitness data, neural network predictions, and additive predictions (left to right)

for double mutants of positions 71 and 76.

Figure 13: Experimental fitness data, neural network predictions, and additive predictions (left to right)

for double mutants of positions 72 and 73.

Figure 14: Experimental fitness data, neural network predictions, and additive predictions (left to right)

for double mutants of positions 72 and 74.

Figure 15: Experimental fitness data, neural network predictions, and additive predictions (left to right)

for double mutants of positions 72 and 75.

Figure 16: Experimental fitness data, neural network predictions, and additive predictions (left to right)

for double mutants of positions 72 and 76.

Figure 17: Experimental fitness data, neural network predictions, and additive predictions (left to right)

for double mutants of positions 73 and 74.

Figure 18: Experimental fitness data, neural network predictions, and additive predictions (left to right)

for double mutants of positions 73 and 75.

Figure 19: Experimental fitness data, neural network predictions, and additive predictions (left to right)

for double mutants of positions 73 and 76.

Figure 20: Experimental fitness data, neural network predictions, and additive predictions (left to right)

for double mutants of positions 74 and 75.

Figure 21: Experimental fitness data, neural network predictions, and additive predictions (left to right)

for double mutants of positions 74 and 76.

Figure 22: Experimental fitness data, neural network predictions, and additive predictions (left to right)

for double mutants of positions 75 and 76.
