Contents

Q. What percentage of the membrane protein coordinate files in the PDB is included in EncoMPASS?
A. Because low-resolution structures are excluded, the membrane protein structures described in EncoMPASS currently amount to >65% of the membrane protein structures contained in the PDB.

Q. How are sequence and structure neighbors defined?
A. Chain A is a sequence neighbor of chain B if they contain a similar number of transmembrane segments and the sequence identity based on the sequence alignment between A and B is ≥ 0.85. Likewise, chain A is a structure neighbor of chain B if they contain a similar number of transmembrane segments and the TM-score of the structure alignment between A and B is ≥ 0.6. Note that the sequence and structure alignments are not symmetric procedures and that the TM-score is not a symmetric operator, so the fact that chain A is a neighbor of chain B does not guarantee that chain B is a neighbor of chain A (see below).

Q. Protein A is structurally related to protein B, but B is not related to A. How is this possible?
A. Structural relationship is assessed using a rigid cutoff: two chains are structurally related if their structural alignment has a TM-score ≥ 0.6. The TM-score is not a symmetric operator, meaning that it does not have the property TM-score(A,B) = TM-score(B,A). Thus, it is possible that protein A is considered structurally related to protein B, but not the opposite. The user can redefine the similarity threshold as needed, yet this stands as a reminder of the kind of inconsistencies that can be encountered when using fixed cutoffs with an asymptotically correct estimator. Moreover, neither the sequence nor the structure alignments are performed via symmetric algorithms. Thus, the alignment of chain A on chain B is in general not the same as the alignment of chain B on chain A.
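
The effect of a fixed cutoff on a non-symmetric score can be illustrated with a minimal Python sketch. It assumes the standard TM-score definition of Zhang and Skolnick, in which the sum over aligned Cα pairs is divided by the length of the chain chosen for normalization. The chain lengths and distances are invented for the example, and the exact normalization used in EncoMPASS may differ.

```python
def d0(length):
    """Distance scale of the standard TM-score for a chain of `length` residues."""
    if length <= 15:
        return 0.5  # simplification for very short chains (assumption of this sketch)
    return max(1.24 * (length - 15) ** (1.0 / 3.0) - 1.8, 0.5)


def tm_score(distances, norm_length):
    """TM-score over aligned C-alpha distances (in Å), normalized by norm_length."""
    scale = d0(norm_length)
    return sum(1.0 / (1.0 + (d / scale) ** 2) for d in distances) / norm_length


# Hypothetical alignment: 150 aligned pairs, each 2.0 Å apart after superposition.
distances = [2.0] * 150
len_a, len_b = 160, 300  # invented chain lengths

score_norm_a = tm_score(distances, len_a)  # normalized by the shorter chain
score_norm_b = tm_score(distances, len_b)  # normalized by the longer chain

CUTOFF = 0.6
print(f"normalized by A: {score_norm_a:.2f}, passes cutoff: {score_norm_a >= CUTOFF}")
print(f"normalized by B: {score_norm_b:.2f}, passes cutoff: {score_norm_b >= CUTOFF}")
```

With these invented numbers the same set of aligned pairs passes the 0.6 cutoff under one normalization and fails under the other, which is the kind of one-directional relationship described above.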

Q. What is the difference between the Standard Symmetry Detection and Multi-Step Symmetry Detection sections?
A. In the Standard Symmetry Detection section, you will find the output from the programs CE-Symm and SymD when they are executed with their default (or author-selected) set of parameters. The Multi-Step Symmetry Detection section instead shows the result of a more sophisticated procedure where both programs have been executed multiple times with different sets of parameters in order to increase their sensitivity without compromising specificity. The results are then filtered to exclude irrelevant symmetries and a final result is selected so as to present the most information about the symmetric relationships in the given structure.

Q. Why is it that in the Standard Symmetry Detection section it looks like CE-Symm has found some symmetry but in the Multi-Step Symmetry Detection section it says that no symmetry was found?
A. The Multi-Step Symmetry Detection method is not yet available for beta-barrels. Furthermore, for alpha-helical proteins, it filters out symmetries comprising repeats with <2 TM helices or repeats that are entirely outside of the membrane-embedded region of the protein. These criteria have been defined so as to focus on the potentially functionally meaningful symmetries in the membrane.

Q. Why is it that in the Standard Symmetry Detection section it looks like SymD has found some symmetry but in the Multi-Step Symmetry Detection section it says that no symmetry was found?
A. The Multi-Step Symmetry Detection section aims to provide complete information about each defined symmetry - i.e. it reports the ranges and multiple alignments of the repeats associated with a symmetry. SymD does not provide information about individual repeats and there is no obvious way for such information to be extracted. Hence, while we use the results from SymD to enhance the abilities of CE-Symm for detecting a symmetry, we do not report its raw output in the Multi-Step Symmetry Detection section.

Q. How should the "repeats" entry be interpreted?
A. To illustrate this, consider the following example:

(4F35.D_43-140,4F35.C_43-140)(4F35.D_259-362,4F35.C_259-362)
(4F35.D_43-140,4F35.D_259-362)
(4F35.C_43-140,4F35.C_259-362)

Each line corresponds to the repeats associated with one axis of symmetry. Thus, in 4F35 we have found three axes of symmetry. Furthermore, each bracket defines the regions that have been aligned to each other. For example, the first axis links regions 4F35.D_43-140 and 4F35.D_259-362 to regions 4F35.C_43-140 and 4F35.C_259-362, highlighting the fact that the protein is a dimer, i.e. C2-symmetric. The second and third axes refer to a C2 symmetry within each protomer.
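
For automated processing, the entry can be split into axes, brackets and regions. The following Python sketch is one possible parser, not part of EncoMPASS; it assumes that every region follows the `PDB.CHAIN_START-END` pattern of the example above, and the function name `parse_repeats` is hypothetical.

```python
import re

# One region, e.g. "4F35.D_43-140": PDB code, chain identifier, first and last residue.
REGION = re.compile(
    r"^(?P<pdb>[A-Za-z0-9]{4})\.(?P<chain>[A-Za-z0-9]+)_(?P<start>-?\d+)-(?P<end>-?\d+)$"
)


def parse_repeats(text):
    """Return a list of axes; each axis is a list of brackets; each bracket is a
    list of (pdb, chain, start, end) tuples for regions aligned to each other."""
    axes = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        axis = []
        for bracket in re.findall(r"\(([^()]*)\)", line):
            regions = []
            for item in bracket.split(","):
                match = REGION.match(item.strip())
                if match is None:
                    raise ValueError(f"unrecognized region: {item!r}")
                regions.append(
                    (match["pdb"], match["chain"], int(match["start"]), int(match["end"]))
                )
            axis.append(regions)
        axes.append(axis)
    return axes


example = """
(4F35.D_43-140,4F35.C_43-140)(4F35.D_259-362,4F35.C_259-362)
(4F35.D_43-140,4F35.D_259-362)
(4F35.C_43-140,4F35.C_259-362)
"""

axes = parse_repeats(example)
print(len(axes), "axes of symmetry")
for number, axis in enumerate(axes, start=1):
    print(f"axis {number}: {len(axis)} bracket(s)")
```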

Q. What does "Symmetry Order C1" mean?
A. C1 means that no symmetry was detected.

Q. What does "Symmetry Order R" mean?
A. R is the order assigned by CE-Symm to cases where there is an open symmetry (as opposed to a point group symmetry such as C2 or D3) that is described either by negligible translation (rotational repeats) or by negligible rotation between repeats (translational repeats).

Q. Why are there multiple values divided by "and" for each symmetry estimator?
A. Some structures contain different symmetries in different structural regions. These symmetries are not hierarchically related to each other, and can each generate hierarchies of symmetric subregions. In order to keep them separated, such symmetries are annotated and separated by the word "and".

Q. Why are there lowercase letters in the alignments of the symmetry repeats?
A. Lowercase letters represent amino acids that were not included in the calculation of the RMSD and TM-scores between the repeats. They are either aligned with a gap or not close enough to their corresponding amino acids to be considered aligned.
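
As a small illustration, the residues that contribute to the scores can be counted directly from one row of the alignment. The Python sketch below assumes that gaps are written as `-`; the sequence is invented and the function name `count_alignment_classes` is hypothetical.

```python
def count_alignment_classes(aligned_row, gap="-"):
    """Count scored (uppercase), unscored (lowercase) and gap positions in one row."""
    counts = {"scored": 0, "unscored": 0, "gaps": 0}
    for symbol in aligned_row:
        if symbol == gap:
            counts["gaps"] += 1
        elif symbol.isupper():
            counts["scored"] += 1
        elif symbol.islower():
            counts["unscored"] += 1
    return counts


row = "mkTLLVAGgs--AILV"  # invented repeat sequence
counts = count_alignment_classes(row)
print(counts)
```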

Q. Why isn't the Multi-Step Symmetry Detection available for beta-barrel proteins?
A. We are in the process of defining filter criteria that help us exclude the symmetries that have no functional relevance, which in turn allows us to use more permissive options for CE-Symm and SymD. We plan to include beta-barrels in the future.

Q. What do the Additional CE-Symm data and Additional SymD data options show?
A. These options reveal more of the raw output of the corresponding symmetry analysis program.

Q. What do the RMSD and TM-scores reported in the symmetry-related sections mean?
A. In the results from CE-Symm, the Multi-Step Symmetry Detection, and the Symmetry Inferred From Neighbors sections, the RMSD and the TM-score are calculated after superimposing all hierarchically related repeats onto the first repeat. In the results from SymD, the RMSD and the TM-score are calculated over the aligned residues after superimposing the transformed protein (with a transformation defined by the reported symmetry axis, angle and translation) onto the original structure.

Methodology

Q. Why does EncoMPASS include only X-ray structures with resolution ≤ 3.5 Å?
A. A key step in the production of all data presented in EncoMPASS is the pairwise structure alignment of each single-chain subunit from each complex with all other topologically similar chains. To ensure the accuracy of the alignments, we imposed a cutoff on the resolution of the structures. However, we are exploring ways to include lower-resolution structures in EncoMPASS without compromising the accuracy of the present structure alignments.

Q. Is there any other reason why a structure might be excluded from EncoMPASS?
A. There are some rare exceptions caused by inconsistencies, repetitions or ambiguities in the coordinate files, such as unknown residue names or repeated residue indices (with no alternate location indicators) in the same chain. We also exclude structures with extended gaps in residue numbering (likely to be chimeric structures).

Q. How is a transmembrane domain defined?
A. A transmembrane domain is defined as a continuous range of amino acids containing at least one segment of secondary structure with Cα atoms inside the OPM-defined boundaries of the lipid bilayer by at least 1 Å. This definition includes membrane crossings and same-side membrane insertions, but not small membrane loops.
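
The 1 Å criterion can be sketched in Python as follows. This is an illustration under stated assumptions, not the EncoMPASS implementation: coordinates are taken to be in a membrane-aligned frame with the bilayer centered at z = 0 and its normal along z, the half-thickness is supplied by the caller, and a single Cα atom deep enough in the bilayer is assumed to be sufficient. The function name `segment_is_embedded` is hypothetical.

```python
def segment_is_embedded(ca_z, half_thickness, margin=1.0):
    """Return True if any C-alpha z coordinate (Å) lies inside the bilayer
    boundaries (|z| <= half_thickness) by at least `margin` Å."""
    depth_limit = half_thickness - margin
    return any(abs(z) <= depth_limit for z in ca_z)


# Invented z coordinates for two secondary structure segments; half-thickness 15 Å.
dipping_segment = [18.0, 15.2, 13.5]    # one atom is 1.5 Å inside the boundary
grazing_segment = [18.0, 15.2, 14.6]    # deepest atom is only 0.4 Å inside

print(segment_is_embedded(dipping_segment, 15.0))
print(segment_is_embedded(grazing_segment, 15.0))
```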

Q. Is the EncoMPASS definition of transmembrane domain related to the one used in OPM or PDBTM?
A. No. OPM, PDBTM and EncoMPASS have three different definitions of transmembrane domains (or segments). Specifically, OPM defines a transmembrane segment as a continuous segment of secondary structure that is at least partially contained inside the lipid bilayer boundaries. This implies that a kinked TM helix crossing the membrane can correspond to two different TM segments (if the kink region is extended enough to be considered a loop). When compared on the common set of membrane protein structures, OPM, PDBTM and EncoMPASS agree on ~60% of assignments of TM domains.

Q. Why does EncoMPASS consider single-chain subunits as fundamental units for sequence and structure comparison?
A. Many existing structure classifications and protein structure databases (such as DALI, SCOP and CATH) take structural domains as fundamental units for their structural analyses. However, membrane proteins usually have only 1 or 2 structural domains, and domain fusion is rare. Moreover, the definition of a structural domain is controversial. On the other hand, comparing the structures of whole complexes can result in very low sensitivity to structural similarity. We therefore use single-chain subunits as the fundamental unit. Chains are uniquely defined. Moreover, the program Fr-TM-Align is able to produce accurate structural alignments of chains with multiple structural domains.

Q. Why do sequence- and structure-wise comparisons involve only topologically similar chains?
A. EncoMPASS aims to maximize the accuracy of the similarity assessments it produces. Despite the reasonable accuracy of the sequence identity and TM-score estimators, the number of pairs of structures mistakenly considered to be related (false positives) increases with decreasing topological similarity. To limit the number of false positives, we only include alignments of chains with similar numbers of transmembrane segments (according to our definition).

Q. When are two chains considered topologically similar?
A. Two chains are considered to be topologically similar when the smaller of their two estimated numbers of transmembrane domains is >75% of the larger one. We are aware that this is an approximation which precludes some interesting and important comparisons. Yet, the current strategy is efficient and consistent with the philosophy of sacrificing sensitivity in favor of specificity. We are currently working on a more extensive definition of a topology class.
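
A minimal Python sketch of this test is given below. It reads the criterion as "the smaller number of transmembrane domains must be strictly greater than 75% of the larger one", which is our interpretation of the rule; the function name `topologically_similar` is hypothetical.

```python
def topologically_similar(n_tm_a, n_tm_b, fraction=0.75):
    """Return True if the smaller TM-domain count exceeds `fraction` of the larger."""
    if n_tm_a < 1 or n_tm_b < 1:
        raise ValueError("both chains need at least one transmembrane domain")
    smaller, larger = sorted((n_tm_a, n_tm_b))
    return smaller > fraction * larger


print(topologically_similar(10, 12))  # 10 is compared with 0.75 * 12 = 9
print(topologically_similar(6, 8))    # 6 is compared with 0.75 * 8 = 6
```

Under this reading the comparison is strict, so a 6-TM chain and an 8-TM chain fall just outside the same topology class, while the test itself is symmetric in its two arguments.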

Q. Why does EncoMPASS rely on the TM-score rather than the RMSD to assess structural similarity?
A. The structure alignment program Fr-TM-Align relies on the TM-score to generate its results. TM-score is independent of the size of the proteins being aligned, meaning that a given value of the TM-score will always imply the same degree of similarity regardless of the size of the two structures. This is not true for the RMSD (e.g., an RMSD of 2.5 Å does not have the same meaning for an alignment of two 600-residue long proteins and two 40-residue long proteins).

Q. Is the TM-score used in the structural comparisons calculated over the whole structure?
A. No. Both RMSD and TM-score are calculated only on the pairs of Cα atoms that have been aligned by the program Fr-TM-Align. This subset of atoms and their coordinates can be downloaded from the web page corresponding to the chain of interest.

Q. What definition of sequence identity is EncoMPASS using and why?
A. Sequence identity is not uniquely defined. We report it as the number of matches divided by the sum of matches and mismatches, thus ignoring the parts of the sequence alignment containing gaps. This estimator is therefore a sequence-similarity equivalent of RMSD.
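
As an illustration of this definition, the Python sketch below computes the identity from two rows of a pairwise alignment. It assumes that gaps are written as `-` and that both rows have equal length; the sequences are invented and the function name `sequence_identity` is hypothetical.

```python
def sequence_identity(aligned_a, aligned_b, gap="-"):
    """Matches / (matches + mismatches), skipping every column that contains a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = mismatches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # gapped columns do not count either way
        if res_a.upper() == res_b.upper():
            matches += 1
        else:
            mismatches += 1
    compared = matches + mismatches
    return matches / compared if compared else 0.0


# Invented alignment: 5 matches, 1 mismatch, 2 gapped columns.
identity = sequence_identity("MKT-LLVA", "MRTALL-A")
print(f"{identity:.3f}")
```

Because gapped columns are skipped, two chains of very different length can still reach a high identity over their aligned portion, much as the RMSD only reflects the aligned atoms.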

Features

Q. Can I compare my theoretical model with the structures contained in EncoMPASS?
A. Currently this is not possible, but we are working on adding this feature.

Issues

Q. Why is the PyMOL script that I downloaded from EncoMPASS for reproducing the figure not working?
A. During the upload of the database, coordinate files may have been renamed. Please change the name of the coordinate file in PyMOL to make it correspond to the coordinate file you downloaded.

Advanced Search

The user can search through results listed on the page for each protein complex ("Whole Structure" button) and subsequently filter those hits with criteria relating to results on the chains associated with those complexes. Alternatively, one can start with the results relating only to chains ("Chain Structure" button), and filter with results for the associated complexes. The table below lists the available search criteria.

General category | Criteria | Complex | Chain | Expected variable type | Description
Structure information | PDB | Yes | Yes | String | The 4-letter PDB code for a complex or the chain code in the format XXXX_Y, where the PDB code is designated with Xs and Y is the case-sensitive chain identifier
Structure information | Protein name | Yes | Yes | String | The full or partial name of the protein as it appears in the OPM database
Structure information | Protein type (alpha/beta) | Yes | Yes | String | Either alpha or beta, indicating the primary secondary structure content
Structure information | Num. of chains | Yes | No | Integer | The number of chains in the complex
Structure information | Num. of TM chains | Yes | No | Integer | The number of membrane-spanning chains in the complex
Structure information | Num. of TM domains | Yes | No | Integer | The number of transmembrane crossings of the chain
Structure information | Num. of amino acids | Yes | Yes | Integer | The number of residues in the structure
Structure information | Resolution | Yes | Yes | Float | The resolution of the structure
Structure information | Num. of sequence neighbors | No | Yes | Integer | The number of sequence homologues (≥85% identity) within the database
Structure information | Num. of structural neighbors | No | Yes | Integer | The number of structural homologues (TM-Score ≥ 0.6) within the database
Structure information | Num. of all neighbors | No | Yes | Integer | The number of sequence or structural homologues within the database
CE-Symm results | Order | Yes | Yes | String | Symmetry order such as C2, D3, for point group symmetries and H or R for helical or repeated symmetries
CE-Symm results | Num. of levels | Yes | Yes | Integer | The total number of levels of symmetry detected for multiple, hierarchically organized symmetries
CE-Symm results | Num. of repeats | Yes | Yes | Integer | The number of symmetry-related structural repeats within the structure
CE-Symm results | Repeat length | Yes | Yes | Integer | The average number of residues in a symmetry-related structural repeat
CE-Symm results | Coverage | Yes | Yes | Float | The fraction of all amino acids in the structure that contribute to symmetry-related repeats
CE-Symm results | Angle | Yes | Yes | Float | The angle [degrees] of a symmetry transformation
CE-Symm results | Translation | Yes | Yes | Float | The length [Å] of the component of the translation vector associated with a symmetry transformation that is parallel to the symmetry axis (i.e., screw translation)
CE-Symm results | RMSD | Yes | Yes | Float | The average root mean square deviation [Å] of the Cα atoms calculated after superposing all repeats in a hierarchical symmetry
CE-Symm results | TM-Score | No | Yes | Float | The TM-score calculated by superposing all repeats in a hierarchical symmetry (see FAQ for details)
SymD results | Order | Yes | Yes | Integer | The number of structural repeats found within the structure (with no distinction between circular, dihedral or helical symmetries)
SymD results | Coverage | Yes | Yes | Float | The fraction of all amino acids in the structure that contribute to symmetry-related repeats
SymD results | Angle | Yes | Yes | Float | The angle [degrees] of a symmetry transformation
SymD results | Translation | Yes | Yes | Float | The length [Å] of the component of the translation vector associated with a symmetry transformation that is parallel to the symmetry axis (i.e., screw translation)
SymD results | RMSD | Yes | Yes | Float | The root mean square deviation [Å] of the Cα atoms calculated over the symmetry-related residues after superimposing the coordinates transformed by the symmetry transformation onto the initial structure coordinates
SymD results | TM-Score | Yes | Yes | Float | The TM-score calculated after superimposing the coordinates transformed by the symmetry transformation onto the initial structure coordinates (see FAQ for details)
SymD results | Z-TM-Score | Yes | Yes | Float | The Z-score of the TM-score associated with the symmetry according to SymD
MSSD results | Order | Yes | Yes | String | Symmetry order such as C2, D3, for point group symmetries and H or R for helical or repeated symmetries
MSSD results | Num. of levels | Yes | Yes | Integer | The total number of levels of symmetry detected for multiple, hierarchically organized symmetries
MSSD results | Num. of repeats | Yes | Yes | Integer | The number of symmetry-related structural repeats within the structure
MSSD results | Repeat length | Yes | Yes | Integer | The average number of residues in a symmetry-related structural repeat
MSSD results | Coverage | Yes | Yes | Float | The fraction of all amino acids in the structure that contribute to symmetry-related repeats
MSSD results | Angle | Yes | Yes | Float | The angle [degrees] of a symmetry transformation
MSSD results | Translation | Yes | Yes | Float | The length [Å] of the component of the translation vector associated with a symmetry transformation that is parallel to the symmetry axis (i.e., screw translation)
MSSD results | RMSD | Yes | Yes | Float | The average root mean square deviation [Å] of the Cα atoms calculated after superposing all repeats in a hierarchical symmetry
MSSD results | TM-Score | Yes | Yes | Float | The TM-score calculated by superposing all repeats in a hierarchical symmetry (see FAQ for details)
MSSD results | Repeat topology (parallel/antiparallel) | Yes | Yes | String | Either parallel or antiparallel, indicating the relative topology of the repeats in the membrane
MSSD results | Angle with membrane normal | Yes | Yes | Float | The angle [degrees] between the symmetry axis and the membrane normal
Inferred symmetry results | Template PDB | No | Yes | String | Chain PDB code (XXXX_Y) of the structure that was used as a source for the symmetry information
Inferred symmetry results | Order | No | Yes | String | Symmetry order such as C2, D3, for point group symmetries and H or R for helical or repeated symmetries
Inferred symmetry results | Num. of levels | No | Yes | Integer | The total number of levels of symmetry detected for multiple, hierarchically organized symmetries
Inferred symmetry results | Num. of repeats | No | Yes | Integer | The number of symmetry-related structural repeats within the structure
Inferred symmetry results | Repeat length | No | Yes | Float | The average number of residues in a symmetry-related structural repeat
Inferred symmetry results | Angle | No | Yes | Float | The angle [degrees] of a symmetry transformation
Inferred symmetry results | Translation | No | Yes | Float | The length [Å] of the component of the translation vector associated with a symmetry transformation that is parallel to the symmetry axis (i.e., screw translation)
Inferred symmetry results | RMSD | No | Yes | Float | The average root mean square deviation [Å] of the Cα atoms calculated after superposing all repeats in a hierarchical symmetry
Inferred symmetry results | TM-Score | No | Yes | Float | The TM-score calculated by superposing all repeats in a hierarchical symmetry (see FAQ for details)
Inferred symmetry results | Repeat topology (parallel/antiparallel) | No | Yes | String | Either parallel or antiparallel, indicating the relative topology of the repeats in the membrane
Inferred symmetry results | Angle with membrane normal | No | Yes | Float | The angle [degrees] between the symmetry axis and the membrane normal

EncoMPASS API

If, instead of browsing the database through the Advanced Search option, you would like to query the EncoMPASS dataset in your automated workflow, we provide an API service. The service returns JSON formatted results.

General category | Criteria | Complex | Chain | Expected variable type | Description
Structure information | PDB | pdbid | chainPdbid | String | The 4-letter PDB code for a complex or the chain code in the format XXXX_Y, where the PDB code is designated with Xs and Y is the case-sensitive chain identifier
Structure information | Protein name | title | title | String | The full or partial name of the protein as it appears in the OPM database
Structure information | Protein type (alpha/beta) | class | class | String | Either alpha or beta, indicating the primary secondary structure content
Structure information | Num. of chains | numChains | numChains | Integer | The number of chains in the complex
Structure information | Num. of TM chains | numTmChains | numTmChains | Integer | The number of membrane-spanning chains in the complex
Structure information | Num. of TM domains | numTmDomains | - | Integer | The number of transmembrane crossings of the chain
Structure information | Num. of amino acids | length | chainLength | Integer | The number of residues in the structure
Structure information | Resolution | resolution | resolution | Float | The resolution of the structure
Structure information | Num. of sequence neighbors | - | numSeqNeighbors | Integer | The number of sequence homologues (≥85% identity) within the database
Structure information | Num. of structural neighbors | - | numStructNeighbors | Integer | The number of structural homologues (TM-Score ≥ 0.6) within the database
Structure information | Num. of all neighbors | - | numNeighbors | Integer | The number of sequence or structural homologues within the database
CE-Symm results | Order | cesymmOrder | chainCesymmOrder | String | Symmetry order such as C2, D3, for point group symmetries and H or R for helical or repeated symmetries
CE-Symm results | Num. of levels | cesymmNumLevels | chainCesymmNumLevels | Integer | The total number of levels of symmetry detected for multiple, hierarchically organized symmetries
CE-Symm results | Num. of repeats | cesymmNumRepeats | chainCesymmNumRepeats | Integer | The number of symmetry-related structural repeats within the structure
CE-Symm results | Repeat length | cesymmRepLength | chainCesymmRepLength | Integer | The average number of residues in a symmetry-related structural repeat
CE-Symm results | Coverage | cesymmCoverage | chainCesymmCoverage | Float | The fraction of all amino acids in the structure that contribute to symmetry-related repeats
CE-Symm results | Angle | cesymmAngle | chainCesymmAngle | Float | The angle [degrees] of a symmetry transformation
CE-Symm results | Translation | cesymmTranslation | chainCesymmTranslation | Float | The length [Å] of the component of the translation vector associated with a symmetry transformation that is parallel to the symmetry axis (i.e., screw translation)
CE-Symm results | RMSD | cesymmRmsd | chainCesymmRmsd | Float | The average root mean square deviation [Å] of the Cα atoms calculated after superposing all repeats in a hierarchical symmetry
CE-Symm results | TM-Score | cesymmTmScore | chainCesymmTmScore | Float | The TM-score calculated by superposing all repeats in a hierarchical symmetry (see FAQ for details)
CE-Symm results | Aligned Residues | cesymmAlignedResidues | chainCesymmAlignedResidues | String | -
CE-Symm results | Unrefined Rmsd | cesymmUnrefinedRmsd | chainCesymmUnrefinedRmsd | String | -
CE-Symm results | Unrefined TM-Score | cesymmUnrefinedTmScore | chainCesymmUnrefinedTmScore | String | -
CE-Symm results | Seed | cesymmSeed | chainCesymmSeed | String | -
SymD results | Order | symdOrder | chainSymdOrder | Integer | The number of structural repeats found within the structure (with no distinction between circular, dihedral or helical symmetries)
SymD results | Coverage | symdCoverage | chainSymdCoverage | Float | The fraction of all amino acids in the structure that contribute to symmetry-related repeats
SymD results | Angle | symdAngle | chainSymdAngle | Float | The angle [degrees] of a symmetry transformation
SymD results | Translation | symdTranslation | chainSymdTranslation | Float | The length [Å] of the component of the translation vector associated with a symmetry transformation that is parallel to the symmetry axis (i.e., screw translation)
SymD results | RMSD | symdRmsd | chainSymdRmsd | Float | The root mean square deviation [Å] of the Cα atoms calculated over the symmetry-related residues after superimposing the coordinates transformed by the symmetry transformation onto the initial structure coordinates
SymD results | TM-Score | symdTmScore | chainSymdTmScore | Float | The TM-score calculated after superimposing the coordinates transformed by the symmetry transformation onto the initial structure coordinates (see FAQ for details)
SymD results | Z-TM-Score | symdZTmScore | chainSymdZTmScore | Float | The Z-score of the TM-score associated with the symmetry according to SymD
Symmetry Detection | Order | mssdOrder | chainMssdOrder | String | Symmetry order such as C2, D3, for point group symmetries and H or R for helical or repeated symmetries
Symmetry Detection | Num. of levels | mssdNumLevels | chainMssdNumLevels | Integer | The total number of levels of symmetry detected for multiple, hierarchically organized symmetries
Symmetry Detection | Num. of repeats | mssdNumRepeats | chainMssdNumRepeats | Integer | The number of symmetry-related structural repeats within the structure
Symmetry Detection | Repeat length | mssdRepLength | chainMssdRepLength | Integer | The average number of residues in a symmetry-related structural repeat
Symmetry Detection | Coverage | mssdCoverage | chainMssdCoverage | Float | The fraction of all amino acids in the structure that contribute to symmetry-related repeats
Symmetry Detection | Angle | mssdAngle | chainMssdAngle | Float | The angle [degrees] of a symmetry transformation
Symmetry Detection | Translation | mssdTranslation | chainMssdTranslation | Float | The length [Å] of the component of the translation vector associated with a symmetry transformation that is parallel to the symmetry axis (i.e., screw translation)
Symmetry Detection | RMSD | mssdRmsd | chainMssdRmsd | Float | The average root mean square deviation [Å] of the Cα atoms calculated after superposing all repeats in a hierarchical symmetry
Symmetry Detection | TM-Score | mssdTmScore | chainMssdTmScore | Float | The TM-score calculated by superposing all repeats in a hierarchical symmetry (see FAQ for details)
Symmetry Detection | Repeat topology (parallel/antiparallel) | mssdRepTopology | chainMssdRepTopology | String | Either parallel or antiparallel, indicating the relative topology of the repeats in the membrane
Symmetry Detection | Angle with membrane normal | mssdAngleMembrane | chainMssdAngleMembrane | Float | The angle [degrees] between the symmetry axis and the membrane normal
Symmetry Detection | Aligned Residues | mssdAlignedResidues | chainMssdAlignedResidues | Integer | -
Inferred symmetry results | Template PDB | - | chainTemplateid | String | Chain PDB code (XXXX_Y) of the structure that was used as a source for the symmetry information
Inferred symmetry results | Order | - | chainInferOrder | String | Symmetry order such as C2, D3, for point group symmetries and H or R for helical or repeated symmetries
Inferred symmetry results | Num. of levels | - | chainInferNumLevels | Integer | The total number of levels of symmetry detected for multiple, hierarchically organized symmetries
Inferred symmetry results | Num. of repeats | - | chainInferNumRepeats | Integer | The number of symmetry-related structural repeats within the structure
Inferred symmetry results | Repeat length | - | chainInferRepLength | Integer | The average number of residues in a symmetry-related structural repeat
Inferred symmetry results | Coverage | - | chainInferCoverage | Float | The fraction of all amino acids in the structure that contribute to symmetry-related repeats
Inferred symmetry results | Angle | - | chainInferAngle | Float | The angle [degrees] of a symmetry transformation
Inferred symmetry results | Translation | - | chainInferTranslation | Float | The length [Å] of the component of the translation vector associated with a symmetry transformation that is parallel to the symmetry axis (i.e., screw translation)
Inferred symmetry results | RMSD | - | chainInferRmsd | Float | The average root mean square deviation [Å] of the Cα atoms calculated after superposing all repeats in a hierarchical symmetry
Inferred symmetry results | TM-Score | - | chainInferTmScore | Float | The TM-score calculated by superposing all repeats in a hierarchical symmetry (see FAQ for details)
Inferred symmetry results | Repeat topology (parallel/antiparallel) | - | chainInferRepTopology | String | Either parallel or antiparallel, indicating the relative topology of the repeats in the membrane
Inferred symmetry results | Angle with membrane normal | - | chainInferAngleMembrane | Float | The angle [degrees] between the symmetry axis and the membrane normal
Inferred symmetry results | Aligned Residues | - | chainInferAlignedResidues | Integer | -
API-specific options | page | page | page | Integer | Instead of returning all results from a query, which might be time intensive, results can be divided into pages with a predefined size (see size parameter) and only the requested page(s) will be displayed
API-specific options | size | size | size | Integer | Number of results to be returned or displayed on each page
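
As a rough sketch of how a query might be assembled in an automated workflow, the Python example below builds a URL from parameter names listed in the table. The base URL is a placeholder, not the real service address, and passing the criteria as URL query parameters is an assumption of this example; the helper name `build_query` is hypothetical, as is the choice of page 1 as the first page.

```python
from urllib.parse import urlencode

# Placeholder only: replace with the actual address of the API service.
BASE_URL = "https://example.org/encompass-api"


def build_query(base_url, **criteria):
    """Join the base URL and the search criteria into one query URL.
    Criteria set to None are left out."""
    params = {name: value for name, value in criteria.items() if value is not None}
    return f"{base_url}?{urlencode(params)}"


# Complexes with two TM chains and C2 symmetry from the multi-step procedure,
# requesting 25 results per page.
url = build_query(BASE_URL, mssdOrder="C2", numTmChains=2, page=1, size=25)
print(url)
```

Since the service returns JSON, the body of the response could then be decoded with the standard `json` module once the URL has been fetched with the HTTP client of your choice.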