Interface predictions for all 332 interactions reported by Gordon et al. were made in two phases (residue-level interface likelihood predictions, and guided protein-protein structural docking). The combined interface annotation across all interactions is provided below. Results from the guided docking were used over raw ECLAIR predictions wherever possible.
- Interface_annotation_2021_05.txt
In Phase I, we used our previously validated Ensemble Classifier Learning Algorithm to predict Interface Residues (ECLAIR; Meyer et al.) to make an initial prediction of likely interface residues across all interactions. Summary annotations and raw ECLAIR interface probabilities for each interaction are provided below.
Guided PyRosetta Docking
In Phase II, we generated atomic-resolution models of 250 interactions by combining the Rosetta scoring function (Alford et al.) with prior probabilities obtained from ECLAIR predictions to perform guided docking. The remaining 82 interactions lacked a reliable 3D model for at least one of their members and were therefore not amenable to docking. Interface residues were defined as residues that are on the surface (relative accessibility ≥15%) and in contact with the partner chain (absolute accessibility reduced by ≥1.0 Å² in the docked vs. undocked structure). Interface residue annotations were made using the top-ranked docked pose from ~50 independent trials. All other docking results are also provided.
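The interface definition above can be written as a simple filter. The following is a minimal sketch, assuming per-residue solvent accessibility values have already been computed for the undocked and docked structures. The function name, the dictionary inputs, and the choice to apply the 15% surface threshold to the undocked structure are illustrative assumptions, not the pipeline's actual code.

```python
def interface_residues(rel_sasa_undocked, abs_sasa_undocked, abs_sasa_docked,
                       min_rel_sasa=15.0, min_delta_sasa=1.0):
    """Return residue ids that are on the surface and lose accessibility on docking.

    rel_sasa_undocked: residue id -> relative accessibility (%) in the undocked chain
    abs_sasa_undocked: residue id -> absolute accessibility (A^2) in the undocked chain
    abs_sasa_docked:   residue id -> absolute accessibility (A^2) in the docked complex
    All three inputs are hypothetical, precomputed dictionaries.
    """
    selected = []
    for res_id, rel in rel_sasa_undocked.items():
        on_surface = rel >= min_rel_sasa
        # Accessibility lost when the partner chain is present.
        buried = abs_sasa_undocked[res_id] - abs_sasa_docked[res_id]
        if on_surface and buried >= min_delta_sasa:
            selected.append(res_id)
    return sorted(selected)
```

With the default thresholds, a residue with 40% relative accessibility that loses 20 Å² on docking is retained, while a buried residue (5%) or a surface residue that loses only 0.5 Å² is not.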
Protein structures for docking
Structures for SARS-CoV-2 proteins were generated in Modeller (Eswar et al.) using a multiple-template modelling procedure. A summary of the templates used and their coverage, as well as the homology models themselves, are provided. Models of the human proteins used for docking were obtained from either the Protein Data Bank or ModBase. Summaries of these models and the structure files are provided.
- SARS_CoV_2_homology_model_summary_2021_05.txt
- SARS_CoV_2_proteins_2021_05.zip
- Human_structure_summary_2021_05.txt
- Human_proteins_2021_05.zip
We performed protein-ligand docking using Smina (Koes et al.) for 76 candidate drugs reported by Gordon et al. that target the 332 human interactors of SARS-CoV-2. The top-ranked docked pose for each drug-target pair was retained for binding site comparison against viral interactors. Two batches (one containing only the top-ranked pose and one containing the top 10 poses) are provided. Annotated sets of drug binding interface residues and enrichment for overlap with protein binding interface residues are reported.
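The statistic used to quantify overlap enrichment is not specified here. As one possible illustration, the sketch below compares the observed overlap between a drug binding site and a protein-protein interface with the overlap expected by chance, using a one-sided hypergeometric test. The function name and inputs are hypothetical, and the test itself is an assumption rather than the reported method.

```python
from math import comb


def overlap_enrichment(drug_site, ppi_interface, n_residues):
    """Observed overlap, chance expectation, and one-sided hypergeometric p-value.

    drug_site, ppi_interface: iterables of residue ids on the same protein
    n_residues: total number of residues considered (the background)
    """
    drug_site, ppi_interface = set(drug_site), set(ppi_interface)
    k = len(drug_site & ppi_interface)   # observed shared residues
    K = len(ppi_interface)               # interface residues in the background
    n = len(drug_site)                   # residues drawn (drug binding site)
    expected = n * K / n_residues
    # P(X >= k) when drawing n residues without replacement.
    tail = sum(comb(K, i) * comb(n_residues - K, n - i)
               for i in range(k, min(K, n) + 1))
    return k, expected, tail / comb(n_residues, n)
```

A small p-value would suggest that the drug binding site coincides with the viral protein interface more often than expected for randomly placed sites of the same size. A background restricted to surface residues may be a more appropriate choice than the full sequence length.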
For analysis of genetic variation that may impact the viral-human interactome, two sets of mutations were compiled. Sequence deviations between SARS-CoV and SARS-CoV-2 were identified by comparing the SARS-CoV-2 protein sequences (GenBank accession MN985325) against the SARS-CoV proteome (UniProt Proteome ID UP000000354). Human population variants in all 332 human proteins shown to interact with SARS-CoV-2 proteins were obtained from gnomAD. These mutation sets were analyzed for enrichment or depletion of variation along SARS-CoV-2-human protein-protein interaction interfaces.
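As an illustration of how sequence deviations can be listed once two orthologous proteins have been aligned, the sketch below walks a pairwise alignment and reports substitutions numbered by SARS-CoV-2 position. The alignment method is not stated here, so the function takes pre-aligned strings as input. The function name and the decision to skip gapped columns are assumptions.

```python
def sequence_deviations(aligned_sars_cov, aligned_sars_cov_2, gap="-"):
    """List substitutions as (SARS-CoV-2 position, SARS-CoV residue, SARS-CoV-2 residue).

    Both inputs are rows of the same pairwise alignment (equal length).
    Columns containing a gap in either sequence are skipped.
    """
    if len(aligned_sars_cov) != len(aligned_sars_cov_2):
        raise ValueError("aligned sequences must have equal length")
    deviations = []
    pos = 0  # 1-based position in the ungapped SARS-CoV-2 sequence
    for old, new in zip(aligned_sars_cov, aligned_sars_cov_2):
        if new != gap:
            pos += 1
        if old != gap and new != gap and old != new:
            deviations.append((pos, old, new))
    return deviations
```

For example, aligning `MKT-LV` (SARS-CoV) against `MRTALI` (SARS-CoV-2) yields the substitutions K2R and V6I, while the inserted alanine at position 4 is ignored.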
Mutations between SARS-CoV and SARS-CoV-2
Human population variants
In order to predict the impact of variation at viral-human interaction interfaces on binding affinity, two sets of ΔΔG predictions were made using PyRosetta (Chaudhury et al.).
Set 1: A scanning mutation ΔΔG approach was implemented to explore the overall binding energy contributions of each interface residue, and to predict the impact of all possible mutations along the interface.
Set 2: Estimates of the overall impact of the cumulative set of mutations between SARS-CoV and SARS-CoV-2 were made using the same general framework. ΔΔG predictions for all population variants on the interfaces are also included.
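The scanning approach in Set 1 can be outlined as follows. This is a simplified sketch, assuming ΔΔG is computed as the binding energy of the mutant minus that of the wild type, so that positive values indicate destabilized binding. The `binding_energy` argument is a hypothetical, caller-supplied function standing in for a structure-based score such as one computed with PyRosetta. Operating on a sequence string rather than a docked structure is an illustrative shortcut.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def scan_interface(sequence, interface_positions, binding_energy):
    """Predict ddG for every possible substitution at each interface position.

    sequence: wild-type sequence of one interaction partner
    interface_positions: 1-based positions annotated as interface residues
    binding_energy: hypothetical callable mapping a sequence to a binding energy
    Returns {(position, wild-type residue, mutant residue): ddG}.
    """
    wild_type_energy = binding_energy(sequence)
    results = {}
    for pos in interface_positions:
        wt_aa = sequence[pos - 1]
        for aa in AMINO_ACIDS:
            if aa == wt_aa:
                continue  # not a mutation
            mutant = sequence[:pos - 1] + aa + sequence[pos:]
            # ddG = mutant binding energy minus wild-type binding energy
            results[(pos, wt_aa, aa)] = binding_energy(mutant) - wild_type_energy
    return results
```

The cumulative estimate in Set 2 would correspond to applying all substitutions to the sequence at once and calling `binding_energy` a single time, rather than scoring each substitution separately.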