Abstract: Progress in the theory of protein evolution has been held back by
gaps between model complexity and accuracy, and a lack of data. Models
are typically either: cheap – which facilitates the study of evolution
through predictions over many sequences – but inaccurate; or else they
are accurate but too expensive to run at scale. With the growth of curated
databases and high-throughput datasets, it is becoming easier to create
simple physical models and test them on experimental data.
Likewise, the deep learning revolution has produced algorithms that can
make fast, accurate predictions. I will first introduce AlphaFold (AF),
the new structure prediction algorithm from DeepMind, and show that it
is accurate enough to predict effects of single mutations.[1] We show this
directly by comparing with structures from the Protein Data Bank, and
indirectly by demonstrating that AF can be used to reliably predict changes
in fluorescence in GFP and BFP. As an example of AF’s utility, I will show
recent results where AF predictions correlate with changes in Guanylate
Kinase activity. To link AF with physical models, I will show that AF can
identify evolutionary allosteric coupling, which closely mirrors the propagation
of mechanical force through elastic networks. Next I will introduce our
new theory of molecular discrimination by proteins.[2] We construct a model
of protein-ligand binding that is complex, yet computationally efficient.
We show that the affinity and specificity of a protein (w.r.t. a set of
ligands) depend non-linearly on flexibility, ligand mismatch, and binding
energy, which we summarize in a phase diagram. The key to achieving high
specificity is precision – the right amount of flexibility, coupled with
an appropriate degree of shape and chemical complementarity. We find that
mutations to residues far from the binding site lead to increasingly small
changes at the binding site, which enables fine-tuning of protein-ligand
interactions; this is more evident in larger proteins, and as such, they
are more evolvable and robust. These findings lead to the hypothesis that
the need for specificity results in a hard constraint on the minimum size
of proteins that can discriminate between certain ligands: proteins are
large because specificity requires it.
I will finish by discussing credible future directions for combining physical
models/theory, experimental data, and machine learning tools, and how they
will transform our understanding of protein evolution.
[1] McBride, Polev, Reinharz, Grzybowski, Tlusty, AlphaFold2 can predict
structural and phenotypic effects of single mutations, arXiv (2022), https://arxiv.org/abs/2204.06860
[2] McBride, Eckmann, Tlusty, General theory of specific binding: insights
from a genetic-mechano-chemical protein model, Mol. Biol. Evol. (2022), https://doi.org/10.1093/molbev/msac217