Contents
Q. How are sequence and structure neighbors defined?
A. Chain A is a sequence neighbor of chain B if they contain a similar number of transmembrane segments and the sequence identity based on the sequence alignment between A and B is ≥ 0.85. Likewise, chain A is a structure neighbor of chain B if they contain a similar number of transmembrane segments and the TM-score of the structure alignment between A and B is ≥ 0.6.
Note that the sequence and structure alignments are not symmetric procedures and that the TM-score is not a symmetric operator, so the fact that chain A is a neighbor of chain B does not guarantee that chain B is a neighbor of chain A (see below).
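The two neighbor criteria amount to a pair of threshold tests on one directed comparison. The sketch below is illustrative only. It assumes that the sequence identity, the TM-score and the "similar number of TM segments" check have already been computed for chain A aligned on chain B. The `DirectedComparison` record, the function names and the numeric values in the example are all hypothetical.

```python
from dataclasses import dataclass

SEQ_ID_CUTOFF = 0.85   # sequence identity cutoff stated in the FAQ
TM_SCORE_CUTOFF = 0.6  # TM-score cutoff stated in the FAQ


@dataclass
class DirectedComparison:
    """Hypothetical record of one directed comparison (chain A aligned on chain B)."""
    similar_topology: bool  # similar number of transmembrane segments
    seq_identity: float     # identity from the A-on-B sequence alignment
    tm_score: float         # TM-score from the A-on-B structure alignment


def is_sequence_neighbor(c: DirectedComparison) -> bool:
    return c.similar_topology and c.seq_identity >= SEQ_ID_CUTOFF


def is_structure_neighbor(c: DirectedComparison) -> bool:
    return c.similar_topology and c.tm_score >= TM_SCORE_CUTOFF


# Made-up values: the two directions are computed separately,
# so a pair close to the cutoff can pass in one direction only.
a_on_b = DirectedComparison(similar_topology=True, seq_identity=0.40, tm_score=0.63)
b_on_a = DirectedComparison(similar_topology=True, seq_identity=0.40, tm_score=0.57)
print(is_structure_neighbor(a_on_b), is_structure_neighbor(b_on_a))  # True False
```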
Q. Protein A is structurally related to protein B, but B is not related to A. How is this possible?
A. Structural relationship is assessed using a rigid cutoff: two chains are structurally related if their structural alignment has a TM-score ≥ 0.6. The TM-score is not a symmetric operator, meaning that it does not have the property TM-score(A,B) = TM-score(B,A). Thus, it is possible that protein A is considered structurally related to protein B, but not the opposite. The user can redefine the similarity threshold as needed, yet this stands as a reminder of the kind of inconsistencies that can be encountered when using fixed cutoffs with an asymptotically correct estimator.
Moreover, neither the sequence nor the structure alignments are performed via symmetric algorithms. Thus, the alignment of chain A on chain B is in general not the same as the alignment of chain B on chain A.
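One source of the asymmetry is that a TM-score is normalized by the length of one chain, and its distance scale d0 also depends on that length. The sketch below uses the standard TM-score formula on a fixed superposition to isolate this effect. Two things are assumptions and are not stated in this FAQ: which chain Fr-TM-Align normalizes by, and the 0.5 Å lower clamp on d0. In real use the two alignments themselves can also differ. The chain lengths and distances are made up.

```python
from typing import Sequence


def d0(length: int) -> float:
    """Length-dependent distance scale of the standard TM-score (in Å).

    Clamping short chains to 0.5 Å is an assumption mirroring common implementations.
    """
    if length <= 15:
        return 0.5
    return max(0.5, 1.24 * (length - 15) ** (1.0 / 3.0) - 1.8)


def tm_score(distances: Sequence[float], norm_length: int) -> float:
    """TM-score of a fixed superposition, normalized by `norm_length` residues.

    `distances` are the Cα-Cα distances (Å) of the aligned residue pairs.
    """
    scale = d0(norm_length)
    return sum(1.0 / (1.0 + (d / scale) ** 2) for d in distances) / norm_length


# Made-up example: 100 aligned pairs, each 1 Å apart after superposition,
# between a 120-residue chain A and a 200-residue chain B.
distances = [1.0] * 100
tm_norm_a = tm_score(distances, 120)  # normalized by chain A
tm_norm_b = tm_score(distances, 200)  # normalized by chain B
print(f"{tm_norm_a:.2f} {tm_norm_b:.2f}")  # 0.79 0.48
```

With the 0.6 cutoff, the same superposition would make the pair structurally related in one direction and not in the other.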
Q. What is the difference between the Standard Symmetry Detection and Multi-Step Symmetry Detection sections?
A. In the Standard Symmetry Detection section, you will find the output from the programs CE-Symm and SymD when they are executed with their default (or author-selected) set of parameters. The Multi-Step Symmetry Detection section instead shows the result of a more sophisticated procedure where both programs have been executed multiple times with different sets of parameters in order to increase their sensitivity without compromising specificity. Symmetry assignments from QuatSymm are also considered for multi-chain structures. The results are then filtered to exclude irrelevant symmetries and a final result is selected so as to present the most information about the symmetric relationships in the given structure.
Q. Why is it that in the Standard Symmetry Detection section it looks like CE-Symm has found some symmetry but in the Multi-Step Symmetry Detection section it says that no symmetry was found?
A. The Multi-Step Symmetry Detection method is not yet available for beta-barrels. Furthermore, for alpha-helical proteins, it filters out symmetries whose repeats contain fewer than two TM helices or lie entirely outside the membrane-embedded region of the protein. These criteria were chosen to focus on the symmetries within the membrane that are potentially functionally meaningful.
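The filter criteria above can be sketched as a simple predicate. This is one interpretation and not the EncoMPASS implementation. It assumes that a symmetry is discarded as soon as any of its repeats fails a criterion. The `Repeat` summary and its fields are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Repeat:
    """Hypothetical summary of one symmetry repeat."""
    num_tm_helices: int        # TM helices contained in the repeat
    residues_in_membrane: int  # residues within the membrane-embedded region


def keep_symmetry(repeats: List[Repeat], is_beta_barrel: bool) -> bool:
    """Apply the two filter criteria described in the FAQ (one interpretation)."""
    if is_beta_barrel:
        return False  # multi-step detection is not yet available for beta-barrels
    for rep in repeats:
        if rep.num_tm_helices < 2:
            return False  # repeat has fewer than two TM helices
        if rep.residues_in_membrane == 0:
            return False  # repeat lies entirely outside the membrane
    return True


print(keep_symmetry([Repeat(5, 110), Repeat(5, 104)], is_beta_barrel=False))  # True
print(keep_symmetry([Repeat(1, 22), Repeat(1, 20)], is_beta_barrel=False))    # False
```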
Q. Why is it that in the Standard Symmetry Detection section it looks like SymD has found some symmetry but in the Multi-Step Symmetry Detection section it says that no symmetry was found?
A. The Multi-Step Symmetry Detection section aims to provide complete information about each defined symmetry - i.e. it reports the ranges and multiple alignments of the repeats associated with a symmetry. SymD does not provide information about individual repeats and there is no obvious way for such information to be extracted. Hence, while we use the results from SymD to enhance the abilities of CE-Symm for detecting a symmetry, we do not report its raw output in the Multi-Step Symmetry Detection section.
Q. How should the "repeats" entry be interpreted?
A. To illustrate this, consider the following example:
(4F35.D_43-140,4F35.C_43-140)(4F35.D_259-362,4F35.C_259-362)
(4F35.D_43-140,4F35.D_259-362)
(4F35.C_43-140,4F35.C_259-362)
Each line corresponds to the repeats associated with one symmetry axis; thus, three symmetry axes were found in 4F35. Within a line, each pair of brackets groups the regions that have been aligned to each other. For example, the first axis links regions 4F35.D_43-140 and 4F35.D_259-362 to regions 4F35.C_43-140 and 4F35.C_259-362, reflecting the fact that the protein is a dimer, i.e. C2-symmetric. The second and third axes describe an internal C2 symmetry within each protomer (chain D and chain C, respectively).
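When processing "repeats" entries in a script, each line can be split into its bracket groups and each region into PDB code, chain and residue range. The following is a minimal parser sketch. It assumes the `PDB.CHAIN_START-END` layout shown in the example, with non-negative residue numbers and no insertion codes. The `Region` type and `parse_axis` name are hypothetical.

```python
import re
from typing import List, NamedTuple


class Region(NamedTuple):
    pdb: str
    chain: str
    start: int
    end: int


# Assumes non-negative residue numbers without insertion codes.
REGION_RE = re.compile(r"^([0-9A-Za-z]{4})\.([0-9A-Za-z]+)_(\d+)-(\d+)$")


def parse_axis(line: str) -> List[List[Region]]:
    """Parse one line (one symmetry axis) into bracket groups of aligned regions."""
    groups: List[List[Region]] = []
    for body in re.findall(r"\(([^()]*)\)", line):
        group = []
        for token in body.split(","):
            match = REGION_RE.match(token.strip())
            if match is None:
                raise ValueError(f"unrecognized region: {token!r}")
            pdb, chain, start, end = match.groups()
            group.append(Region(pdb, chain, int(start), int(end)))
        groups.append(group)
    return groups


repeats_entry = """(4F35.D_43-140,4F35.C_43-140)(4F35.D_259-362,4F35.C_259-362)
(4F35.D_43-140,4F35.D_259-362)
(4F35.C_43-140,4F35.C_259-362)"""

axes = [parse_axis(line) for line in repeats_entry.splitlines()]
print(len(axes))  # 3 symmetry axes
for group in axes[0]:
    print([f"{r.chain}:{r.start}-{r.end}" for r in group])
```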
Q. What does "Symmetry Order C1" mean?
A. C1 means that no symmetry was detected.
Q. What does "Symmetry Order R" mean?
A. R is the order assigned by CE-Symm to cases where there is an open symmetry (as opposed to a point group symmetry such as C2 or D3) that is described either by negligible translation (rotational repeats) or by negligible rotation between repeats (translational repeats).
Q. Why are there multiple values divided by "and" for each symmetry estimator?
A. Some structures contain different symmetries in different structural regions. These symmetries are not hierarchically related to each other, and can each generate hierarchies of symmetric subregions. In order to keep them separated, such symmetries are annotated and separated by the word "and".
Q. Why are there lowercase letters in the alignments of the symmetry repeats?
A. Lowercase letters represent amino acids that were not included in the calculation of the RMSD and TM-scores between the repeats. They are either aligned with a gap or not close enough to their corresponding amino acids to be considered aligned.
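The case convention makes it easy to count, for each repeat, how many residues entered the RMSD and TM-score calculation. The sketch below assumes that gaps are written as "-". The helper names and the alignment row are hypothetical.

```python
from typing import Dict, List


def summarize_repeat_row(row: str, gap: str = "-") -> Dict[str, int]:
    """Count residues in one row of a repeat alignment.

    Uppercase letters: residues used for RMSD/TM-score.
    Lowercase letters: residues excluded (gapped or not close enough).
    """
    return {
        "aligned": sum(ch.isupper() for ch in row),
        "excluded": sum(ch.islower() for ch in row),
        "gaps": row.count(gap),
    }


def aligned_positions(row: str) -> List[int]:
    """Alignment columns holding uppercase (aligned) residues."""
    return [i for i, ch in enumerate(row) if ch.isupper()]


row = "MKlvAG-DeF"  # made-up alignment row
print(summarize_repeat_row(row))  # {'aligned': 6, 'excluded': 3, 'gaps': 1}
print(aligned_positions(row))     # [0, 1, 4, 5, 7, 9]
```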
Q. Why isn't the Multi-step Symmetry Detection available for beta-barrel proteins?
A. We are in the process of defining filter criteria that help us exclude the symmetries that have no functional relevance, which in turn allows us to use more permissive options for CE-Symm and SymD. We plan to include beta-barrels in the future.
Q. What do the Additional CE-Symm data and Additional SymD data options show?
A. These options reveal more of the raw output of the corresponding symmetry analysis program.
Q. What do the RMSD and TM-scores reported in the symmetry-related sections mean?
A. In the results from CE-Symm, the Multi-step Symmetry Detection Analysis, and the Symmetry Inferred From Neighbors sections, the RMSD and the TM-score are calculated after superimposing all hierarchically-related repeats onto the first repeat. In the results from SymD, the RMSD and the TM-score are calculated over the aligned residues after superimposing the transformed protein (with a transformation defined by the reported symmetry axis, angle and translation) onto the original structure.
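The CE-Symm-style calculation (superpose each repeat onto the first, then measure the deviation) can be sketched with NumPy and the Kabsch algorithm. This is an illustration under stated assumptions, not the EncoMPASS code. Each repeat is assumed to be given as an N × 3 array of the Cα coordinates of its aligned residues, in matching order. Averaging the per-repeat RMSD values is also an assumption, since the exact averaging used by the programs may differ.

```python
from typing import List

import numpy as np


def superpose(mobile: np.ndarray, target: np.ndarray) -> np.ndarray:
    """Optimally superpose `mobile` onto `target` (both N x 3) with the Kabsch algorithm."""
    mob_center = mobile.mean(axis=0)
    tgt_center = target.mean(axis=0)
    p = mobile - mob_center
    q = target - tgt_center
    u, _, vt = np.linalg.svd(p.T @ q)
    # Correct for a possible reflection so that the result is a proper rotation.
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return p @ rot.T + tgt_center


def rmsd(a: np.ndarray, b: np.ndarray) -> float:
    """Root mean square deviation between two equally ordered N x 3 arrays."""
    return float(np.sqrt(((a - b) ** 2).sum(axis=1).mean()))


def repeats_rmsd(repeats: List[np.ndarray]) -> float:
    """Average RMSD after superposing every repeat onto the first one.

    Each repeat holds the Cα coordinates of its aligned residues,
    in the same order, so all arrays have the same shape.
    """
    reference = repeats[0]
    values = [rmsd(superpose(rep, reference), reference) for rep in repeats[1:]]
    return float(np.mean(values))
```

A rigidly rotated and translated copy of a repeat should give an RMSD of essentially zero. A mirror image should not, because only proper rotations are allowed.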
Methodology
Q. Is there any reason why a structure might be excluded from EncoMPASS?
A. There are some rare exceptions caused by inconsistencies, repetitions or ambiguities in the coordinate files, such as unknown residue names or repeated residue indices (with no alternate location indicators) in the same chain. We also exclude structures with extended gaps in residue numbering (likely to be chimeric structures) or structures that are incompatible with a single flat membrane.
Q. How is a transmembrane domain defined?
A. A transmembrane domain is defined as a continuous range of amino acids containing at least one secondary structure segment with Cα atoms penetrating at least 1 Å inside the OPM-defined boundaries of the lipid bilayer. This definition includes membrane crossings and same-side membrane insertions, but not small membrane loops.
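The 1 Å penetration criterion can be sketched as a test on the Cα z-coordinates of each secondary structure segment. Several points here are assumptions. The coordinates are taken to be in an OPM-style frame, with the membrane normal along z, the bilayer center at z = 0 and the boundaries at ±half_thickness. The reading used is that one sufficiently deep Cα is enough for a segment to qualify. Merging qualifying segments into continuous domains and excluding small loops are not shown. All names and values are hypothetical.

```python
from typing import List, Sequence, Tuple


def penetrates_bilayer(ca_z: Sequence[float], half_thickness: float, margin: float = 1.0) -> bool:
    """True if at least one Cα lies `margin` Å or more inside the bilayer slab.

    Assumes OPM-style coordinates: membrane normal along z, bilayer center at z = 0,
    boundaries at z = ±half_thickness.
    """
    return any(abs(z) <= half_thickness - margin for z in ca_z)


def membrane_ss_segments(
    segments: List[Tuple[int, int]],
    ca_z: Sequence[float],
    half_thickness: float,
) -> List[Tuple[int, int]]:
    """Keep secondary structure segments (inclusive index ranges) that penetrate the bilayer."""
    return [(s, e) for s, e in segments if penetrates_bilayer(ca_z[s : e + 1], half_thickness)]


# Made-up Cα z-coordinates (Å) and two secondary structure segments.
ca_z = [25.0, 20.0, 15.5, 14.2, 16.0, 22.0, 10.0, 0.0, -10.0]
print(membrane_ss_segments([(1, 4), (6, 8)], ca_z, half_thickness=15.0))  # [(6, 8)]
```

The first segment only reaches 0.8 Å inside the boundary (z = 14.2 Å), so it is not counted.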
Q. Is the EncoMPASS definition of transmembrane domain related to the one used in OPM or PDBTM?
A. No. OPM, PDBTM and EncoMPASS use three different definitions of transmembrane domains (or segments). For instance, OPM defines a transmembrane segment as a continuous segment of secondary structure that is at least partially contained within the lipid bilayer boundaries. This implies that a kinked TM helix crossing the membrane can correspond to two different TM segments (if the kink region is extended enough to be considered a loop). When compared on a common set of membrane protein structures, OPM, PDBTM and EncoMPASS agree on ~60% of TM domain assignments.
Q. Why does EncoMPASS consider single-chain subunits as fundamental units for sequence and structure comparison?
A. Many existing structure classifications and protein structure databases (such as DALI, SCOP and CATH) take structural domains as the fundamental units of their structural analyses. However, membrane proteins usually have only one or two structural domains, and domain fusion is rare. Moreover, the definition of a structural domain is controversial. On the other hand, comparing the structures of whole complexes can result in very low sensitivity to structural similarity. We therefore use single-chain subunits as the fundamental unit: chains are uniquely defined, and the program Fr-TM-Align is able to produce accurate structural alignments of chains with multiple structural domains.
Q. Why do sequence- and structure-wise comparisons involve only topologically similar chains?
A. EncoMPASS aims to maximize the accuracy of the similarity assessments it produces. Despite the reasonable accuracy of the sequence identity and TM-score estimators, the number of pairs of structures mistakenly considered to be related (false positives) increases with decreasing topological similarity. To limit the number of false positives, we only include alignments of chains with similar numbers of transmembrane segments (according to our definition).
Q. When are two chains considered topologically similar?
A. Two chains are considered topologically similar when the number of transmembrane domains of the chain with fewer domains is more than 75% of that of the other chain. We are aware that this approximation precludes some interesting and important comparisons. Yet, the current strategy is efficient and consistent with our philosophy of sacrificing sensitivity in favor of specificity. We are currently working on a more extensive definition of a topology class.
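The similarity rule reduces to a ratio test between the two transmembrane-domain counts. The sketch below reads the rule as "the smaller count must exceed 75% of the larger one". It uses a strict inequality, following the ">75%" wording. How chains without TM domains are handled is a hypothetical choice.

```python
def topologically_similar(n_tm_a: int, n_tm_b: int, ratio: float = 0.75) -> bool:
    """Smaller TM-domain count must exceed `ratio` times the larger one."""
    smaller, larger = sorted((n_tm_a, n_tm_b))
    if larger == 0:
        return False  # hypothetical choice: chains without TM domains are not compared
    return smaller > ratio * larger


print(topologically_similar(12, 10))  # True  (10 > 9)
print(topologically_similar(12, 9))   # False (9 is not > 9)
print(topologically_similar(4, 3))    # False (3 is not > 3)
```

Unlike the alignment scores, this test is symmetric in its two arguments. The strict inequality matters only when the smaller count is exactly 75% of the larger one.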
Q. Why does EncoMPASS rely on the TM-score rather than the RMSD to assess structural similarity?
A. The structure alignment program Fr-TM-Align relies on the TM-score to generate its results. The TM-score is designed to be independent of the size of the proteins being aligned, meaning that a given TM-score value implies the same degree of similarity regardless of the size of the two structures. This is not true for the RMSD: for example, an RMSD of 2.5 Å does not have the same meaning for an alignment of two 600-residue proteins as for an alignment of two 40-residue proteins.
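For reference, this is the standard TM-score definition from the TM-score literature (Zhang and Skolnick). The length-dependent scale d_0 makes a fixed deviation count less in a larger protein, and that is how the score compensates for size. Here L_target is the length of the chain used for normalization, L_ali is the number of aligned residue pairs, d_i is the distance (Å) between the i-th pair after superposition, and the maximum is taken over superpositions. This FAQ does not state which chain Fr-TM-Align uses for the normalization.

```latex
\mathrm{TM\text{-}score} = \max\left[ \frac{1}{L_{\mathrm{target}}} \sum_{i=1}^{L_{\mathrm{ali}}} \frac{1}{1 + \left( d_i / d_0(L_{\mathrm{target}}) \right)^2} \right],
\qquad
d_0(L) = 1.24\,\sqrt[3]{L - 15} - 1.8
```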
Q. Is the TM-score used in the structural comparisons calculated over the whole structure?
A. No. Both RMSD and TM-score are calculated only on the pairs of Cα atoms that have been aligned by the program Fr-TM-Align. This subset of atoms and their coordinates can be downloaded from the web page corresponding to the chain of interest.
Q. What definition of sequence identity is EncoMPASS using and why?
A. Sequence identity is not uniquely defined. We define it as the number of matches divided by the number of matches plus mismatches, thus ignoring the positions of the sequence alignment that contain gaps. This estimator is therefore the sequence-similarity equivalent of the RMSD, which is likewise computed only over aligned positions.
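The definition can be written directly on a pair of aligned sequences. This minimal sketch assumes gaps are written as "-". The function name and the example alignment are hypothetical.

```python
def sequence_identity(aln_a: str, aln_b: str, gap: str = "-") -> float:
    """Matches / (matches + mismatches), skipping any column that contains a gap."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have the same length")
    matches = mismatches = 0
    for x, y in zip(aln_a, aln_b):
        if x == gap or y == gap:
            continue  # gapped columns are ignored
        if x == y:
            matches += 1
        else:
            mismatches += 1
    if matches + mismatches == 0:
        raise ValueError("alignment has no ungapped columns")
    return matches / (matches + mismatches)


print(sequence_identity("MKT-AYLL", "MRTQA-LL"))  # 0.8333333333333334
```

Here 5 of the 6 ungapped columns match. Dividing by the full alignment length (8 columns) instead would give 5/8 = 0.625 for the same pair, which shows why the choice of definition matters.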
Features
Q. Can I compare my theoretical model with the structures contained in EncoMPASS?
A. Currently this is not possible, but we are working on adding this feature.
Issues
Q. Why is the PyMOL script that I downloaded from EncoMPASS for reproducing the figure not working?
A. When the database was uploaded, coordinate files may have been renamed. Please edit the coordinate file name in the PyMOL script so that it matches the coordinate file you downloaded.
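The file name can also be changed programmatically, for example when fixing many scripts at once. The sketch below assumes the script is a plain PyMOL command file in which files are opened with `load <file>[, <object>]` at the start of a line. It does not handle Python-style `cmd.load(...)` calls. The helper name and the file names in the example are hypothetical.

```python
import re
from typing import Dict

# Matches the file argument of a PyMOL "load" command at the start of a line.
LOAD_RE = re.compile(r"^([ \t]*load[ \t]+)([^,\s]+)", re.IGNORECASE | re.MULTILINE)


def rename_loaded_files(script_text: str, renames: Dict[str, str]) -> str:
    """Replace coordinate-file names in `load` commands; other lines are untouched."""

    def repl(match: re.Match) -> str:
        name = match.group(2)
        return match.group(1) + renames.get(name, name)

    return LOAD_RE.sub(repl, script_text)


script = "load encompass_4f35.pdb, complex\nhide everything\n"  # made-up script
print(rename_loaded_files(script, {"encompass_4f35.pdb": "4f35_downloaded.pdb"}))
```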
Advanced Search
The user can search through results listed on the page for each protein complex ("Whole Structure" button) and subsequently filter those hits with criteria relating to results on the chains associated with those complexes. Alternatively, one can start with the results relating only to chains ("Chain Structure" button), and filter with results for the associated complexes. The table below lists the available search criteria.
General category | Criteria | Complex | Chain | Expected variable type | Description |
Structure information |
PDB | Yes | Yes | String | The 4-letter PDB code for a complex or the chain code in the format XXXX_Y, where the PDB code is designated with Xs and Y is the case-sensitive chain identifier |
Protein name | Yes | Yes | String | The full or partial name of the protein as it appears in the OPM database |
Protein type (alpha/beta) | Yes | Yes | String | Either alpha or beta, indicating the primary secondary structure content |
UniProt | Yes | Yes | String | Accession number for the UniProt protein sequence database |
Num. of chains | Yes | No | Integer | The number of chains in the complex |
Num. of TM chains | Yes | No | Integer | The number of membrane-spanning chains in the complex |
Num. of TM domains | Yes | No | Integer | The number of transmembrane crossings of the chain |
Num. of amino acids | Yes | Yes | Integer | The number of residues in the structure |
Resolution | Yes | Yes | Float | The resolution [Å] of the structure |
Method | Yes | Yes | String | The technique used to solve the structure (e.g. X-RAY DIFFRACTION, ELECTRON MICROSCOPY, ELECTRON CRYSTALLOGRAPHY, NMR) |
Num. of sequence neighbors | No | Yes | Integer | The number of sequence homologues (≥85% identity) within the database |
Num. of structural neighbors | No | Yes | Integer | The number of structural homologues (TM-score ≥ 0.6) within the database |
Num. of all neighbors | No | Yes | Integer | The number of sequence or structural homologues within the database |
CE-Symm results |
Order | Yes | Yes | String | Symmetry order, such as C2 or D3 for point group symmetries, and H or R for helical or repeated symmetries |
Num. of levels | Yes | Yes | Integer | The total number of levels of symmetry detected for multiple, hierarchically organized symmetries |
Num. of repeats | Yes | Yes | Integer | The number of symmetry-related structural repeats within the structure |
Repeat length | Yes | Yes | Integer | The average number of residues in a symmetry-related structural repeat |
Coverage | Yes | Yes | Float | The fraction of all amino acids in the structure that contribute to symmetry-related repeats |
Angle | Yes | Yes | Float | The angle [degrees] of a symmetry transformation |
Translation | Yes | Yes | Float | The length [Å] of the component of the translation vector associated with a symmetry transformation that is parallel to the symmetry axis (i.e., screw translation) |
RMSD | Yes | Yes | Float | The average root mean square deviation [Å] of the Cα atoms calculated after superposing all repeats in a hierarchical symmetry |
TM-Score | No | Yes | Float | The TM-score calculated by superposing all repeats in a hierarchical symmetry (see FAQ for details) |
SymD results |
Order | Yes | Yes | Integer | The number of structural repeats found within the structure (with no distinction between circular, dihedral or helical symmetries) |
Coverage | Yes | Yes | Float | The fraction of all amino acids in the structure that contribute to symmetry-related repeats |
Angle | Yes | Yes | Float | The angle [degrees] of a symmetry transformation |
Translation | Yes | Yes | Float | The length [Å] of the component of the translation vector associated with a symmetry transformation that is parallel to the symmetry axis (i.e., screw translation) |
RMSD | Yes | Yes | Float | The root mean square deviation [Å] of the Cα atoms calculated over the symmetry-related residues after superimposing the coordinates transformed by the symmetry transformation onto the initial structure coordinates |
TM-Score | Yes | Yes | Float | The TM-score calculated after superimposing the coordinates transformed by the symmetry transformation onto the initial structure coordinates (see FAQ for details) |
Z-TM-Score | Yes | Yes | Float | The Z-score of the TM-score associated with the symmetry according to SymD |
MSSD (Multi-Step Symmetry Detection) results |
Order | Yes | Yes | String | Symmetry order, such as C2 or D3 for point group symmetries, and H or R for helical or repeated symmetries |
Num. of levels | Yes | Yes | Integer | The total number of levels of symmetry detected for multiple, hierarchically organized symmetries |
Num. of repeats | Yes | Yes | Integer | The number of symmetry-related structural repeats within the structure |
Repeat length | Yes | Yes | Integer | The average number of residues in a symmetry-related structural repeat |
Coverage | Yes | Yes | Float | The fraction of all amino acids in the structure that contribute to symmetry-related repeats |
Angle | Yes | Yes | Float | The angle [degrees] of a symmetry transformation |
Translation | Yes | Yes | Float | The length [Å] of the component of the translation vector associated with a symmetry transformation that is parallel to the symmetry axis (i.e., screw translation) |
RMSD | Yes | Yes | Float | The average root mean square deviation [Å] of the Cα atoms calculated after superposing all repeats in a hierarchical symmetry |
TM-Score | Yes | Yes | Float | The TM-score calculated by superposing all repeats in a hierarchical symmetry (see FAQ for details) |
Repeat topology (parallel/antiparallel) | Yes | Yes | String | Either parallel or antiparallel, indicating the relative topology of the repeats in the membrane |
Angle with membrane normal | Yes | Yes | Float | The angle [degrees] between the symmetry axis and the membrane normal |
Inferred symmetry results |
Template PDB | No | Yes | String | Chain PDB code (XXXX_Y) of the structure that was used as a source for the symmetry information |
Order | No | Yes | String | Symmetry order, such as C2 or D3 for point group symmetries, and H or R for helical or repeated symmetries |
Num. of levels | No | Yes | Integer | The total number of levels of symmetry detected for multiple, hierarchically organized symmetries |
Num. of repeats | No | Yes | Integer | The number of symmetry-related structural repeats within the structure |
Repeat length | No | Yes | Float | The average number of residues in a symmetry-related structural repeat |
Angle | No | Yes | Float | The angle [degrees] of a symmetry transformation |
Translation | No | Yes | Float | The length [Å] of the component of the translation vector associated with a symmetry transformation that is parallel to the symmetry axis (i.e., screw translation) |
RMSD | No | Yes | Float | The average root mean square deviation [Å] of the Cα atoms calculated after superposing all repeats in a hierarchical symmetry |
TM-Score | No | Yes | Float | The TM-score calculated by superposing all repeats in a hierarchical symmetry (see FAQ for details) |
Repeat topology (parallel/antiparallel) | No | Yes | String | Either parallel or antiparallel, indicating the relative topology of the repeats in the membrane |
Angle with membrane normal | No | Yes | Float | The angle [degrees] between the symmetry axis and the membrane normal |
EncoMPASS API
If, instead of browsing the database through the Advanced Search option, you would like to query the EncoMPASS dataset from an automated workflow, we provide an API service. The service returns JSON-formatted results.
General category | Criteria | Complex | Chain | Expected variable type | Description |
Structure information |
PDB | pdbid | chainPdbid | String | The 4-letter PDB code for a complex or the chain code in the format XXXX_Y, where the PDB code is designated with Xs and Y is the case-sensitive chain identifier |
Protein name | title | title | String | The full or partial name of the protein as it appears in the OPM database |
Protein type (alpha/beta) | class | class | String | Either alpha or beta, indicating the primary secondary structure content |
UniProt | uniprot | chainUniprot | String | Accession code for the UniProt protein sequence database |
Num. of chains | numChains | numChains | Integer | The number of chains in the complex |
Num. of TM chains | numTmChains | numTmChains | Integer | The number of membrane-spanning chains in the complex |
Num. of TM domains | | numTmDomains | Integer | The number of transmembrane crossings of the chain |
Num. of amino acids | length | chainLength | Integer | The number of residues in the structure |
Resolution | resolution | resolution | Float | The resolution of the structure |
Method | method | method | String | Technique used to solve the structure (e.g. X-RAY DIFFRACTION, ELECTRON MICROSCOPY, ELECTRON CRYSTALLOGRAPHY, NMR) |
Num. of sequence neighbors | | numSeqNeighbors | Integer | The number of sequence homologues (≥85% identity) within the database |
Num. of structural neighbors | | numStructNeighbors | Integer | The number of structural homologues (TM-Score ≥ 0.6) within the database |
Num. of all neighbors | | numNeighbors | Integer | The number of sequence or structural homologues within the database |
CE-Symm results |
Order | cesymmOrder | chainCesymmOrder | String | Symmetry order, such as C2 or D3 for point group symmetries, and H or R for helical or repeated symmetries |
Num. of levels | cesymmNumLevels | chainCesymmNumLevels | Integer | The total number of levels of symmetry detected for multiple, hierarchically organized symmetries |
Num. of repeats | cesymmNumRepeats | chainCesymmNumRepeats | Integer | The number of symmetry-related structural repeats within the structure |
Repeat length | cesymmRepLength | chainCesymmRepLength | Integer | The average number of residues in a symmetry-related structural repeat |
Coverage | cesymmCoverage | chainCesymmCoverage | Float | The fraction of all amino acids in the structure that contribute to symmetry-related repeats |
Angle | cesymmAngle | chainCesymmAngle | Float | The angle [degrees] of a symmetry transformation |
Translation | cesymmTranslation | chainCesymmTranslation | Float | The length [Å] of the component of the translation vector associated with a symmetry transformation that is parallel to the symmetry axis (i.e., screw translation) |
RMSD | cesymmRmsd | chainCesymmRmsd | Float | The average root mean square deviation [Å] of the Cα atoms calculated after superposing all repeats in a hierarchical symmetry |
TM-Score | cesymmTmScore | chainCesymmTmScore | Float | The TM-score calculated by superposing all repeats in a hierarchical symmetry (see FAQ for details) |
Aligned Residues | cesymmAlignedResidues | chainCesymmAlignedResidues | String | |
Unrefined Rmsd | cesymmUnrefinedRmsd | chainCesymmUnrefinedRmsd | String | |
Unrefined TM-Score | cesymmUnrefinedTmScore | chainCesymmUnrefinedTmScore | String | |
Seed | cesymmSeed | chainCesymmSeed | String | |
SymD results |
Order | symdOrder | chainSymdOrder | Integer | The number of structural repeats found within the structure (with no distinction between circular, dihedral or helical symmetries) |
Coverage | symdCoverage | chainSymdCoverage | Float | The fraction of all amino acids in the structure that contribute to symmetry-related repeats |
Angle | symdAngle | chainSymdAngle | Float | The angle [degrees] of a symmetry transformation |
Translation | symdTranslation | chainSymdTranslation | Float | The length [Å] of the component of the translation vector associated with a symmetry transformation that is parallel to the symmetry axis (i.e., screw translation) |
RMSD | symdRmsd | chainSymdRmsd | Float | The root mean square deviation of the Cα atoms [Å] calculated over the symmetry-related residues after superimposing the coordinates transformed by the symmetry transformation onto the initial structure coordinates |
TM-Score | symdTmScore | chainSymdTmScore | Float | The TM-score calculated after superimposing the coordinates transformed by the symmetry transformation onto the initial structure coordinates (see FAQ for details) |
Z-TM-Score | symdZTmScore | chainSymdZTmScore | Float | The Z-score of the TM-score associated with the symmetry according to SymD |
Multi-Step Symmetry Detection results |
Order | mssdOrder | chainMssdOrder | String | Symmetry order, such as C2 or D3 for point group symmetries, and H or R for helical or repeated symmetries |
Num. of levels | mssdNumLevels | chainMssdNumLevels | Integer | The total number of levels of symmetry detected for multiple, hierarchically organized symmetries |
Num. of repeats | mssdNumRepeats | chainMssdNumRepeats | Integer | The number of symmetry-related structural repeats within the structure |
Repeat length | mssdRepLength | chainMssdRepLength | Integer | The average number of residues in a symmetry-related structural repeat |
Coverage | mssdCoverage | chainMssdCoverage | Float | The fraction of all amino acids in the structure that contribute to symmetry-related repeats |
Angle | mssdAngle | chainMssdAngle | Float | The angle [degrees] of a symmetry transformation |
Translation | mssdTranslation | chainMssdTranslation | Float | The length [Å] of the component of the translation vector associated with a symmetry transformation that is parallel to the symmetry axis (i.e., screw translation) |
RMSD | mssdRmsd | chainMssdRmsd | Float | The average root mean square deviation [Å] of the Cα atoms calculated after superposing all repeats in a hierarchical symmetry |
TM-Score | mssdTmScore | chainMssdTmScore | Float | The TM-score calculated by superposing all repeats in a hierarchical symmetry (see FAQ for details) |
Repeat topology (parallel/antiparallel) | mssdRepTopology | chainMssdRepTopology | String | Either parallel or antiparallel, indicating the relative topology of the repeats in the membrane |
Angle with membrane normal | mssdAngleMembrane | chainMssdAngleMembrane | Float | The angle [degrees] between the symmetry axis and the membrane normal |
Aligned Residues | mssdAlignedResidues | chainMssdAlignedResidues | Integer | |
Inferred symmetry results |
Template PDB | | chainTemplateid | String | Chain PDB code (XXXX_Y) of the structure that was used as a source for the symmetry information |
Order | | chainInferOrder | String | Symmetry order, such as C2 or D3 for point group symmetries, and H or R for helical or repeated symmetries |
Num. of levels | | chainInferNumLevels | Integer | The total number of levels of symmetry detected for multiple, hierarchically organized symmetries |
Num. of repeats | | chainInferNumRepeats | Integer | The number of symmetry-related structural repeats within the structure |
Repeat length | | chainInferRepLength | Integer | The average number of residues in a symmetry-related structural repeat |
Coverage | | chainInferCoverage | Float | The fraction of all amino acids in the structure that contribute to symmetry-related repeats |
Angle | | chainInferAngle | Float | The angle [degrees] of a symmetry transformation |
Translation | | chainInferTranslation | Float | The length [Å] of the component of the translation vector associated with a symmetry transformation that is parallel to the symmetry axis (i.e., screw translation) |
RMSD | | chainInferRmsd | Float | The average root mean square deviation [Å] of the Cα atoms calculated after superposing all repeats in a hierarchical symmetry |
TM-Score | | chainInferTmScore | Float | The TM-score calculated by superposing all repeats in a hierarchical symmetry (see FAQ for details) |
Repeat topology (parallel/antiparallel) | | chainInferRepTopology | String | Either parallel or antiparallel, indicating the relative topology of the repeats in the membrane |
Angle with membrane normal | | chainInferAngleMembrane | Float | The angle [degrees] between the symmetry axis and the membrane normal |
Aligned Residues | | chainInferAlignedResidues | Integer | |
API-specific options |
page | page | page | Integer | Instead of returning all results of a query at once, which can be time-consuming, results can be divided into pages of a predefined size (see the size parameter), and only the requested page(s) are returned |
size | size | size | Integer | Number of results to be returned or displayed on each page |
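A query from a script can be assembled by encoding the criteria names from the API table as URL parameters and parsing the JSON reply. This is a sketch using only the Python standard library, and several details are assumptions rather than documented behavior. The base URL is a placeholder, because this page does not give one. The sketch also assumes that criteria are passed as plain equality query parameters and that the `page` and `size` options sit alongside them. How ranges are expressed, and whether page numbering starts at 0 or 1, is not stated here.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict, Optional

# Placeholder only: the base URL of the EncoMPASS API is not given here.
BASE_URL = "https://example.org/encompass-api"  # hypothetical


def build_query_url(
    base_url: str,
    filters: Dict[str, Any],
    page: Optional[int] = None,
    size: Optional[int] = None,
) -> str:
    """Encode API criteria (e.g. class, numTmChains) and paging options as query parameters."""
    params = dict(filters)
    if page is not None:
        params["page"] = page
    if size is not None:
        params["size"] = size
    return f"{base_url}?{urllib.parse.urlencode(params)}"


def fetch_json(url: str, timeout: float = 30.0) -> Any:
    """Download and parse a JSON response."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    url = build_query_url(BASE_URL, {"class": "alpha", "numTmChains": 2}, page=0, size=50)
    print(url)
    # data = fetch_json(url)  # uncomment once BASE_URL points to the real service
```

Keeping URL construction separate from the network call lets the query be checked (or logged) before anything is sent.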