Biophysics Seminar

Description

Title: The “Digital” Nature of Protein Structure

Speaker: Gevorg Grigoryan, Dartmouth College

Notes: http://marylandbiophysics.umd.edu/seminars/

Abstract:

While we know that protein sequence encodes structure, capturing this sequence-to-structure mapping computationally has been difficult, particularly because the space of structural possibilities appears immense and complex. We propose that this space should nevertheless be describable as a combination of discrete local structural patterns. We introduce the concept of a TERM (tertiary motif), which encapsulates the full structural environment around a given residue, and show that the protein structural universe is highly degenerate at the level of TERMs. In fact, only 650 TERMs describe over 50% of the structural database at sub-Angstrom resolution. We go on to show that such degeneracy enables the direct quantification of sequence-structure relationships. Because TERMs are frequently reused in unrelated proteins, a local sequence model can be extracted for each TERM contained in a protein structure, and the overall structure can then be described as a combination of these models. We have begun to demonstrate the broad applicability of this framework across a variety of applications: 1) protein design: we have partially or fully redesigned multiple proteins using TERM data alone and have designed novel structures de novo (with experimental validation); 2) structure prediction: we found that TERM-based sequence statistics identify accurate models; 3) stability prediction: we have shown that changes in stability upon mutation can be predicted quantitatively from TERM data alone. Earlier findings of degeneracies in protein structure (e.g., in secondary and super-secondary motifs) have greatly advanced computational structural biology. TERM-based mining of structural data is the next logical step and should provide further quantitative insight into sequence-structure relationships.