EncoMPASS

Encyclopedia of Membrane Proteins Analyzed by Structure and Symmetry

Q. How are sequence and structure neighbors defined?
A. Chain A is a sequence neighbor of chain B if they contain a similar number of transmembrane segments and the sequence identity based on the sequence alignment between A and B is ≥ 0.85. Likewise, chain A is a structure neighbor of chain B if they contain a similar number of transmembrane segments and the TM-score of the structure alignment between A and B is ≥ 0.6. Note that the sequence and structure alignments are not symmetric procedures and that the TM-score is not a symmetric operator, so the fact that chain A is a neighbor of chain B does not guarantee that chain B is a neighbor of chain A (see below).

Q. Protein A is structurally related to protein B, but B is not related to A. How is this possible?
A. Structural relationship is assessed using a rigid cutoff: two chains are structurally related if their structural alignment has a TM-score ≥ 0.6. The TM-score is not a symmetric operator, meaning that it does not have the property TM-score(A,B) = TM-score(B,A). Thus, it is possible that protein A is considered structurally related to protein B, but not the opposite. The user can redefine the similarity threshold as needed, yet this stands as a reminder of the kind of inconsistencies that can be encountered when using fixed cutoffs with an asymptotically correct estimator. Moreover, neither the sequence nor the structure alignments are performed via symmetric algorithms. Thus, the alignment of chain A on chain B is in general not the same as the alignment of chain B on chain A.
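
The effect of a fixed cutoff on a directional score can be sketched in a few lines of Python. This is only an illustration: the chain labels and TM-score values below are invented, and the function names are hypothetical rather than part of EncoMPASS.

```python
# Minimal sketch with hypothetical data: tm_scores maps (query, target) to the
# TM-score obtained when the query chain is aligned onto the target chain.
# The values are made up purely to illustrate the asymmetry.
tm_scores = {
    ("A", "B"): 0.62,
    ("B", "A"): 0.58,
    ("A", "C"): 0.71,
    ("C", "A"): 0.69,
}


def structure_neighbors(scores, cutoff=0.6):
    """Return the directed pairs (query, target) whose TM-score is >= cutoff."""
    return {pair for pair, score in scores.items() if score >= cutoff}


def one_way_relations(neighbors):
    """Return the directed pairs whose reverse pair did not pass the cutoff."""
    return {(a, b) for (a, b) in neighbors if (b, a) not in neighbors}


default_neighbors = structure_neighbors(tm_scores)
print(one_way_relations(default_neighbors))

# A user-defined, more permissive threshold removes this particular asymmetry.
relaxed_neighbors = structure_neighbors(tm_scores, cutoff=0.55)
print(one_way_relations(relaxed_neighbors))
```

With the default cutoff, A is reported as related to B while B is not reported as related to A, because the two directional scores fall on opposite sides of 0.6.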

Q. What is the difference between the Standard Symmetry Detection and Multi-Step Symmetry Detection sections?
A. In the Standard Symmetry Detection section, you will find the output from the programs CE-Symm and SymD when they are executed with their default (or author-selected) set of parameters. The Multi-Step Symmetry Detection section instead shows the result of a more sophisticated procedure where both programs have been executed multiple times with different sets of parameters in order to increase their sensitivity without compromising specificity. Symmetry assignments from QuatSymm are also considered for multi-chain structures. The results are then filtered to exclude irrelevant symmetries and a final result is selected so as to present the most information about the symmetric relationships in the given structure.

Q. Why is it that in the Standard Symmetry Detection section it looks like CE-Symm has found some symmetry but in the Multi-Step Symmetry Detection section it says that no symmetry was found?
A. The Multi-Step Symmetry Detection method is not yet available for beta-barrels. Furthermore, for alpha-helical proteins, it filters symmetries comprising repeats with <2 TM helices or repeats that are entirely outside of the membrane-embedded region of the protein. These criteria have been defined so as to focus on the potentially functionally meaningful symmetries in the membrane.

Q. Why is it that in the Standard Symmetry Detection section it looks like SymD has found some symmetry but in the Multi-Step Symmetry Detection section it says that no symmetry was found?
A. The Multi-Step Symmetry Detection section aims to provide complete information about each defined symmetry - i.e. it reports the ranges and multiple alignments of the repeats associated with a symmetry. SymD does not provide information about individual repeats and there is no obvious way for such information to be extracted. Hence, while we use the results from SymD to enhance the abilities of CE-Symm for detecting a symmetry, we do not report its raw output in the Multi-Step Symmetry Detection section.

Q. How should the "repeats" entry be interpreted?
A. To illustrate this, consider the following example:

(4F35.D_43-140,4F35.C_43-140)(4F35.D_259-362,4F35.C_259-362)
(4F35.D_43-140,4F35.D_259-362)
(4F35.C_43-140,4F35.C_259-362)

Each line corresponds to the repeats associated with one axis of symmetry. Thus, in 4F35 we have found three axes of symmetry. Furthermore, each bracket defines the regions that have been aligned to each other. For example, the first axis links regions 4F35.D_43-140 and 4F35.D_259-362 to regions 4F35.C_43-140 and 4F35.C_259-362, highlighting the fact that the protein is a dimer, i.e. C2-symmetric. The second and third axes refer to a C2 symmetry within each protomer.
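
For automated processing, a "repeats" entry in the format shown above can be split into axes, brackets and regions. The following Python sketch assumes that every region follows the pattern PDB.CHAIN_START-END exactly as in the 4F35 example; the function names are hypothetical and other entries may need a more permissive pattern.

```python
import re

# One bracket "( ... )" groups the regions that were aligned to each other.
GROUP_RE = re.compile(r"\(([^()]*)\)")
# One region, e.g. "4F35.D_43-140" -> PDB code, chain, first and last residue.
REGION_RE = re.compile(r"^([A-Za-z0-9]+)\.([A-Za-z0-9]+)_(-?\d+)-(-?\d+)$")

EXAMPLE = """\
(4F35.D_43-140,4F35.C_43-140)(4F35.D_259-362,4F35.C_259-362)
(4F35.D_43-140,4F35.D_259-362)
(4F35.C_43-140,4F35.C_259-362)
"""


def parse_region(token):
    """Convert one region string into a (pdb, chain, start, end) tuple."""
    match = REGION_RE.match(token.strip())
    if match is None:
        raise ValueError(f"unrecognized region: {token!r}")
    pdb, chain, start, end = match.groups()
    return (pdb, chain, int(start), int(end))


def parse_repeats(entry):
    """Return one list per line (axis); each axis is a list of brackets,
    and each bracket is a list of region tuples."""
    axes = []
    for line in entry.splitlines():
        line = line.strip()
        if not line:
            continue
        brackets = [
            [parse_region(token) for token in group.split(",")]
            for group in GROUP_RE.findall(line)
        ]
        axes.append(brackets)
    return axes


axes = parse_repeats(EXAMPLE)
for index, brackets in enumerate(axes, start=1):
    print(f"axis {index}: {brackets}")
```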

Q. What does "Symmetry Order C1" mean?
A. C1 means that no symmetry was detected.

Q. What does "Symmetry Order R" mean?
A. R is the order assigned by CE-Symm to cases where there is an open symmetry (as opposed to a point group symmetry such as C2 or D3) that is described either by negligible translation (rotational repeats) or by negligible rotation between repeats (translational repeats).

Q. Why are there multiple values divided by "and" for each symmetry estimator?
A. Some structures contain different symmetries in different structural regions. These symmetries are not hierarchically related to each other, and can each generate hierarchies of symmetric subregions. In order to keep them separated, such symmetries are annotated and separated by the word "and".

Q. Why are there lowercase letters in the alignments of the symmetry repeats?
A. Lowercase letters represent amino acids that were not included in the calculation of the RMSD and TM-scores between the repeats. They are either aligned with a gap or not close enough to their corresponding amino acids to be considered aligned.

Q. Why isn't the Multi-Step Symmetry Detection available for beta-barrel proteins?
A. We are in the process of defining filter criteria that help us exclude the symmetries that have no functional relevance, which in turn allows us to use more permissive options for CE-Symm and SymD. We plan to include beta-barrels in the future.

Q. What do the Additional CE-Symm data and Additional SymD data options show?
A. These options reveal more of the raw output of the corresponding symmetry analysis program.

Q. What do the RMSD and TM-scores reported in the symmetry-related sections mean?
A. In the results from CE-Symm, the Multi-Step Symmetry Detection Analysis, and the Symmetry Inferred From Neighbors sections, the RMSD and the TM-score are calculated after superimposing all hierarchically-related repeats onto the first repeat. In the results from SymD, the RMSD and the TM-score are calculated over the aligned residues after superimposing the transformed protein (with a transformation defined by the reported symmetry axis, angle and translation) onto the original structure.

Methodology

Q. Is there any reason why a structure might be excluded from EncoMPASS?
A. There are some rare exceptions caused by inconsistencies, repetitions or ambiguities in the coordinate files, such as unknown residue names or repeated residue indices (with no alternate location indicators) in the same chain. We also exclude structures with extended gaps in residue numbering (likely to be chimeric structures) or structures that are incompatible with a single flat membrane.

Q. How is a transmembrane domain defined?
A. A transmembrane domain is defined as a continuous range of amino acids containing at least one segment of secondary structure with Cα atoms inside the OPM-defined boundaries of the lipid bilayer by at least 1 Å. This definition includes membrane crossings and same-side membrane insertions, but not small membrane loops.

Q. Is the EncoMPASS definition of transmembrane domain related to the one used in OPM or PDBTM?
A. No. OPM, PDBTM and EncoMPASS have three different definitions of transmembrane domains (or segments). Specifically, OPM defines a transmembrane segment as a continuous segment of secondary structure that is at least partially contained inside the lipid bilayer boundaries. This implies that a kinked TM helix crossing the membrane can correspond to two different TM segments (if the kink region is extended enough to be considered a loop). When compared on the common set of membrane protein structures, OPM, PDBTM and EncoMPASS agree on ~60% of assignments of TM domains.

Q. Why does EncoMPASS consider single-chain subunits as fundamental units for sequence and structure comparison?
A. Many existing structure classifications and protein structure databases (such as DALI, SCOP and CATH) take structural domains as fundamental units for their structural analyses. However, membrane proteins usually have only 1 or 2 structural domains, and domain fusion is rare. Moreover, the definition of a structural domain is controversial. On the other hand, comparing the structures of whole complexes can result in very low sensitivity to structural similarity. We therefore use single-chain subunits as the fundamental unit: chains are uniquely defined, and the program Fr-TM-Align is able to produce accurate structural alignments of chains with multiple structural domains.

Q. Why do sequence- and structure-wise comparisons involve only topologically similar chains?
A. EncoMPASS aims to maximize the accuracy of the similarity assessments it produces. Despite the reasonable accuracy of the sequence identity and TM-score estimators, the number of pairs of structures mistakenly considered to be related (false positives) increases with decreasing topological similarity. To limit the number of false positives, we only include alignments of chains with similar numbers of transmembrane segments (according to our definition).

Q. When are two chains considered topologically similar?
A. Two chains are considered to be topologically similar when the smaller of their two estimated numbers of transmembrane domains is >75% of the larger one. We are aware that this is an approximation which precludes some interesting and important comparisons. Yet, the current strategy is efficient and consistent with the philosophy of sacrificing sensitivity for specificity. We are currently working on a more extensive definition of a topology class.
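
As a rough illustration, the criterion can be written as a one-line comparison. The sketch below assumes that the rule compares the smaller count of transmembrane domains against the larger one with a strict >75% threshold; the function name and the treatment of edge cases are hypothetical.

```python
def topologically_similar(n_tm_a, n_tm_b, min_ratio=0.75):
    """Return True if the smaller TM-domain count exceeds min_ratio times the larger.

    Assumption: the comparison is strict (>), so a ratio of exactly 0.75
    does not qualify. Two chains with zero TM domains also return False.
    """
    if n_tm_a < 0 or n_tm_b < 0:
        raise ValueError("numbers of TM domains must be non-negative")
    smaller, larger = sorted((n_tm_a, n_tm_b))
    return smaller > min_ratio * larger


print(topologically_similar(8, 10))   # ratio 0.80
print(topologically_similar(6, 8))    # ratio 0.75, not strictly above the threshold
print(topologically_similar(12, 12))  # identical counts
```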

Q. Why does EncoMPASS rely on the TM-score rather than the RMSD to assess structural similarity?
A. The structure alignment program Fr-TM-Align relies on the TM-score to generate its results. TM-score is independent of the size of the proteins being aligned, meaning that a given value of the TM-score will always imply the same degree of similarity regardless of the size of the two structures. This is not true for the RMSD (e.g., an RMSD of 2.5 Å does not have the same meaning for an alignment of two 600-residue long proteins and two 40-residue long proteins).
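
The size dependence can be made concrete with the standard TM-score definition of Zhang and Skolnick, in which each aligned Cα pair at distance d contributes 1 / (1 + (d / d0)^2) and d0 = 1.24 (L - 15)^(1/3) - 1.8 depends on the normalizing length L. The Python sketch below uses an idealized case in which every residue is aligned and deviates by exactly 2.5 Å; it illustrates the general formula and is not a description of how Fr-TM-Align or EncoMPASS normalize their scores.

```python
import math


def tm_d0(length):
    """Length-dependent distance scale of the standard TM-score (length > 15)."""
    return 1.24 * (length - 15) ** (1.0 / 3.0) - 1.8


def rmsd(distances):
    """Root mean square of the distances between aligned C-alpha pairs."""
    return math.sqrt(sum(d * d for d in distances) / len(distances))


def tm_score(distances, length):
    """Standard TM-score of the aligned pairs, normalized by `length`."""
    scale = tm_d0(length)
    return sum(1.0 / (1.0 + (d / scale) ** 2) for d in distances) / length


# Idealized alignments: every residue aligned, every pair 2.5 Angstroms apart.
small = [2.5] * 40
large = [2.5] * 600

tm_small = tm_score(small, 40)
tm_large = tm_score(large, 600)

print(f"40 residues:  RMSD = {rmsd(small):.2f}, TM-score = {tm_small:.2f}")
print(f"600 residues: RMSD = {rmsd(large):.2f}, TM-score = {tm_large:.2f}")
```

Both alignments have the same RMSD, yet the TM-score of the 40-residue case is far lower than that of the 600-residue case, which is the sense in which the same RMSD value means different things at different sizes.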

Q. Is the TM-score used in the structural comparisons calculated over the whole structure?
A. No. Both RMSD and TM-score are calculated only on the pairs of Cα atoms that have been aligned by the program Fr-TM-Align. This subset of atoms and their coordinates can be downloaded from the web page corresponding to the chain of interest.

Q. What definition of sequence identity is EncoMPASS using and why?
A. Sequence identity is not uniquely defined. We report the number of matches divided by the number of matches plus mismatches, thus ignoring the parts of the sequence alignment that contain gaps. This estimator is therefore a sequence-similarity equivalent of the RMSD, which is likewise computed only over aligned positions.
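
A minimal Python sketch of this definition is given below. It assumes that the two aligned sequences are supplied as equal-length strings with "-" marking gaps; the function name and input format are hypothetical and do not reflect the files distributed by EncoMPASS.

```python
def gapless_identity(aligned_a, aligned_b, gap="-"):
    """Matches / (matches + mismatches), skipping every column that contains a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    mismatches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # gapped columns do not count at all
        if res_a.upper() == res_b.upper():
            matches += 1
        else:
            mismatches += 1
    compared = matches + mismatches
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return matches / compared


# Columns: A/A, C/C, D/D match; -/G is skipped; E/Q mismatches; F/F matches.
identity = gapless_identity("ACD-EF", "ACDGQF")
print(identity)
```

In this toy alignment four of the five ungapped columns match, so the identity is 0.8 even though one of the six columns is a gap.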

Features

Q. Can I compare my theoretical model with the structures contained in EncoMPASS?
A. Currently this is not possible, but we are working on adding this feature.

Issues

Q. Why is the PyMOL script that I downloaded from EncoMPASS for reproducing the figure not working?
A. During the upload of the database, coordinate files may have been renamed. Please change the name of the coordinate file in PyMOL to make it correspond to the coordinate file you downloaded.

Advanced Search

The user can search through results listed on the page for each protein complex ("Whole Structure" button) and subsequently filter those hits with criteria relating to results on the chains associated with those complexes. Alternatively, one can start with the results relating only to chains ("Chain Structure" button), and filter with results for the associated complexes. The table below lists the available search criteria.

General category	Criteria	Complex	Chain	Expected variable type	Description
Structure information	PDB	Yes	Yes	String	The 4-letter PDB code for a complex or the chain code in the format XXXX_Y, where the PDB code is designated with Xs and Y is the case-sensitive chain identifier
	Protein name	Yes	Yes	String	The full or partial name of the protein as it appears in the OPM database
	Protein type (alpha/beta)	Yes	Yes	String	Either alpha or beta, indicating the primary secondary structure content
	UniProt	Yes	Yes	String	Accession number for the UniProt protein sequence database
	Num. of chains	Yes	No	Integer	The number of chains in the complex
	Num. of TM chains	Yes	No	Integer	The number of membrane-spanning chains in the complex
	Num. of TM domains	Yes	No	Integer	The number of transmembrane crossings of the chain
	Num. of amino acids	Yes	Yes	Integer	The number of residues in the structure
	Resolution	Yes	Yes	Float	The resolution of the structure
	Method	Yes	Yes	String	The technique used to solve the structure (e.g. X-RAY DIFFRACTION, ELECTRON MICROSCOPY, ELECTRON CRYSTALLOGRAPHY, NMR)
	Num. of sequence neighbors	No	Yes	Integer	The number of sequence homologues (≥85% identity) within the database
	Num. of structural neighbors	No	Yes	Integer	The number of structural homologues (TM-Score ≥ 0.6) within the database
	Num. of all neighbors	No	Yes	Integer	The number of sequence or structural homologues within the database
CE-Symm results	Order	Yes	Yes	String	Symmetry order such as C2, D3, for point group symmetries and H or R for helical or repeated symmetries
	Num. of levels	Yes	Yes	Integer	The total number of levels of symmetry detected for multiple, hierarchically organized symmetries
	Num. of repeats	Yes	Yes	Integer	The number of symmetry-related structural repeats within the structure
	Repeat length	Yes	Yes	Integer	The average number of residues in a symmetry-related structural repeat
	Coverage	Yes	Yes	Float	The fraction of all amino acids in the structure that contribute to symmetry-related repeats
	Angle	Yes	Yes	Float	The angle [degrees] of a symmetry transformation
	Translation	Yes	Yes	Float	The length [Angstroms] of the component of the translation vector associated with a symmetry transformation that is parallel to the symmetry axis (i.e., screw translation)
	RMSD	Yes	Yes	Float	The average root mean square deviation [Angstroms] of the Cα atoms calculated after superposing all repeats in a hierarchical symmetry
	TM-Score	No	Yes	Float	The TM-score calculated by superposing all repeats in a hierarchical symmetry (see FAQ for details)
SymD results	Order	Yes	Yes	Integer	The number of structural repeats found within the structure (with no distinction between circular, dihedral or helical symmetries)
	Coverage	Yes	Yes	Float	The fraction of all amino acids in the structure that contribute to symmetry-related repeats
	Angle	Yes	Yes	Float	The angle [degrees] of a symmetry transformation
	Translation	Yes	Yes	Float	The length [Angstroms] of the component of the translation vector associated with a symmetry transformation that is parallel to the symmetry axis (i.e., screw translation)
	RMSD	Yes	Yes	Float	The root mean square deviation [Angstroms] of the Cα atoms calculated over the symmetry-related residues after superimposing the coordinates transformed by the symmetry transformation onto the initial structure coordinates
	TM-Score	Yes	Yes	Float	The TM-score calculated after superimposing the coordinates transformed by the symmetry transformation onto the initial structure coordinates (see FAQ for details)
	Z-TM-Score	Yes	Yes	Float	The Z-score of the TM-score associated with the symmetry according to SymD
MSSD results	Order	Yes	Yes	String	Symmetry order such as C2, D3, for point group symmetries and H or R for helical or repeated symmetries
	Num. of levels	Yes	Yes	Integer	The total number of levels of symmetry detected for multiple, hierarchically organized symmetries
	Num. of repeats	Yes	Yes	Integer	The number of symmetry-related structural repeats within the structure
	Repeat length	Yes	Yes	Integer	The average number of residues in a symmetry-related structural repeat
	Coverage	Yes	Yes	Float	The fraction of all amino acids in the structure that contribute to symmetry-related repeats
	Angle	Yes	Yes	Float	The angle [degrees] of a symmetry transformation
	Translation	Yes	Yes	Float	The length [Angstroms] of the component of the translation vector associated with a symmetry transformation that is parallel to the symmetry axis (i.e., screw translation)
	RMSD	Yes	Yes	Float	The average root mean square deviation [Angstroms] of the Cα atoms calculated after superposing all repeats in a hierarchical symmetry
	TM-Score	Yes	Yes	Float	The TM-score calculated by superposing all repeats in a hierarchical symmetry (see FAQ for details)
	Repeat topology (parallel/antiparallel)	Yes	Yes	String	Either parallel or antiparallel, indicating the relative topology of the repeats in the membrane
	Angle with membrane normal	Yes	Yes	Float	The angle [degrees] between the symmetry axis and the membrane normal
Inferred symmetry results	Template PDB	No	Yes	String	Chain PDB code (XXXX_Y) of the structure that was used as a source for the symmetry information
	Order	No	Yes	String	Symmetry order such as C2, D3, for point group symmetries and H or R for helical or repeated symmetries
	Num. of levels	No	Yes	Integer	The total number of levels of symmetry detected for multiple, hierarchically organized symmetries
	Num. of repeats	No	Yes	Integer	The number of symmetry-related structural repeats within the structure
	Repeat length	No	Yes	Float	The average number of residues in a symmetry-related structural repeat
	Angle	No	Yes	Float	The angle [degrees] of a symmetry transformation
	Translation	No	Yes	Float	The length [Angstroms] of the component of the translation vector associated with a symmetry transformation that is parallel to the symmetry axis (i.e., screw translation)
	RMSD	No	Yes	Float	The average root mean square deviation [Angstroms] of the Cα atoms calculated after superposing all repeats in a hierarchical symmetry
	TM-Score	No	Yes	Float	The TM-score calculated by superposing all repeats in a hierarchical symmetry (see FAQ for details)
	Repeat topology (parallel/antiparallel)	No	Yes	String	Either parallel or antiparallel, indicating the relative topology of the repeats in the membrane
	Angle with membrane normal	No	Yes	Float	The angle [degrees] between the symmetry axis and the membrane normal

EncoMPASS API

If you would like to query the EncoMPASS dataset from an automated workflow instead of browsing it through the Advanced Search option, we provide an API service that returns JSON-formatted results. The table below lists the query parameter names for complexes and chains.

General category	Criteria	Complex	Chain	Expected variable type	Description
Structure information	PDB	pdbid	chainPdbid	String	The 4-letter PDB code for a complex or the chain code in the format XXXX_Y, where the PDB code is designated with Xs and Y is the case-sensitive chain identifier
	Protein name	title	title	String	The full or partial name of the protein as it appears in the OPM database
	Protein type (alpha/beta)	class	class	String	Either alpha or beta, indicating the primary secondary structure content
	UniProt	uniprot	chainUniprot	String	Accession code for the UniProt protein sequence database
	Num. of chains	numChains	numChains	Integer	The number of chains in the complex
	Num. of TM chains	numTmChains	numTmChains	Integer	The number of membrane-spanning chains in the complex
	Num. of TM domains		numTmDomains	Integer	The number of transmembrane crossings of the chain
	Num. of amino acids	length	chainLength	Integer	The number of residues in the structure
	Resolution	resolution	resolution	Float	The resolution of the structure
	Method	method	method	String	Technique used to solve the structure (e.g. X-RAY DIFFRACTION, ELECTRON MICROSCOPY, ELECTRON CRYSTALLOGRAPHY, NMR)
	Num. of sequence neighbors		numSeqNeighbors	Integer	The number of sequence homologues (≥85% identity) within the database
	Num. of structural neighbors		numStructNeighbors	Integer	The number of structural homologues (TM-Score ≥ 0.6) within the database
	Num. of all neighbors		numNeighbors	Integer	The number of sequence or structural homologues within the database
CE-Symm results	Order	cesymmOrder	chainCesymmOrder	String	Symmetry order such as C2, D3, for point group symmetries and H or R for helical or repeated symmetries
	Num. of levels	cesymmNumLevels	chainCesymmNumLevels	Integer	The total number of levels of symmetry detected for multiple, hierarchically organized symmetries
	Num. of repeats	cesymmNumRepeats	chainCesymmNumRepeats	Integer	The number of symmetry-related structural repeats within the structure
	Repeat length	cesymmRepLength	chainCesymmRepLength	Integer	The average number of residues in a symmetry-related structural repeat
	Coverage	cesymmCoverage	chainCesymmCoverage	Float	The fraction of all amino acids in the structure that contribute to symmetry-related repeats
	Angle	cesymmAngle	chainCesymmAngle	Float	The angle [degrees] of a symmetry transformation
	Translation	cesymmTranslation	chainCesymmTranslation	Float	The length [Angstroms] of the component of the translation vector associated with a symmetry transformation that is parallel to the symmetry axis (i.e., screw translation)
	RMSD	cesymmRmsd	chainCesymmRmsd	Float	The average root mean square deviation [Angstroms] of the Cα atoms calculated after superposing all repeats in a hierarchical symmetry
	TM-Score	cesymmTmScore	chainCesymmTmScore	Float	The TM-score calculated by superposing all repeats in a hierarchical symmetry (see FAQ for details)
	Aligned Residues	cesymmAlignedResidues	chainCesymmAlignedResidues	String
	Unrefined Rmsd	cesymmUnrefinedRmsd	chainCesymmUnrefinedRmsd	String
	Unrefined TM-Score	cesymmUnrefinedTmScore	chainCesymmUnrefinedTmScore	String
	Seed	cesymmSeed	chainCesymmSeed	String
SymD results	Order	symdOrder	chainSymdOrder	Integer	The number of structural repeats found within the structure (with no distinction between circular, dihedral or helical symmetries)
	Coverage	symdCoverage	chainSymdCoverage	Float	The fraction of all amino acids in the structure that contribute to symmetry-related repeats
	Angle	symdAngle	chainSymdAngle	Float	The angle [degrees] of a symmetry transformation
	Translation	symdTranslation	chainSymdTranslation	Float	The length [Angstroms] of the component of the translation vector associated with a symmetry transformation that is parallel to the symmetry axis (i.e., screw translation)
	RMSD	symdRmsd	chainSymdRmsd	Float	The root mean square deviation [Angstroms] of the Cα atoms calculated over the symmetry-related residues after superimposing the coordinates transformed by the symmetry transformation onto the initial structure coordinates
	TM-Score	symdTmScore	chainSymdTmScore	Float	The TM-score calculated after superimposing the coordinates transformed by the symmetry transformation onto the initial structure coordinates (see FAQ for details)
	Z-TM-Score	symdZTmScore	chainSymdZTmScore	Float	The Z-score of the TM-score associated with the symmetry according to SymD
Symmetry Detection	Order	mssdOrder	chainMssdOrder	String	Symmetry order such as C2, D3, for point group symmetries and H or R for helical or repeated symmetries
	Num. of levels	mssdNumLevels	chainMssdNumLevels	Integer	The total number of levels of symmetry detected for multiple, hierarchically organized symmetries
	Num. of repeats	mssdNumRepeats	chainMssdNumRepeats	Integer	The number of symmetry-related structural repeats within the structure
	Repeat length	mssdRepLength	chainMssdRepLength	Integer	The average number of residues in a symmetry-related structural repeat
	Coverage	mssdCoverage	chainMssdCoverage	Float	The fraction of all amino acids in the structure that contribute to symmetry-related repeats
	Angle	mssdAngle	chainMssdAngle	Float	The angle [degrees] of a symmetry transformation
	Translation	mssdTranslation	chainMssdTranslation	Float	The length [Angstroms] of the component of the translation vector associated with a symmetry transformation that is parallel to the symmetry axis (i.e., screw translation)
	RMSD	mssdRmsd	chainMssdRmsd	Float	The average root mean square deviation [Angstroms] of the Cα atoms calculated after superposing all repeats in a hierarchical symmetry
	TM-Score	mssdTmScore	chainMssdTmScore	Float	The TM-score calculated by superposing all repeats in a hierarchical symmetry (see FAQ for details)
	Repeat topology (parallel/antiparallel)	mssdRepTopology	chainMssdRepTopology	String	Either parallel or antiparallel, indicating the relative topology of the repeats in the membrane
	Angle with membrane normal	mssdAngleMembrane	chainMssdAngleMembrane	Float	The angle [degrees] between the symmetry axis and the membrane normal
	Aligned Residues	mssdAlignedResidues	chainMssdAlignedResidues	Integer
Inferred symmetry results	Template PDB		chainTemplateid	String	Chain PDB code (XXXX_Y) of the structure that was used as a source for the symmetry information
	Order		chainInferOrder	String	Symmetry order such as C2, D3, for point group symmetries and H or R for helical or repeated symmetries
	Num. of levels		chainInferNumLevels	Integer	The total number of levels of symmetry detected for multiple, hierarchically organized symmetries
	Num. of repeats		chainInferNumRepeats	Integer	The number of symmetry-related structural repeats within the structure
	Repeat length		chainInferRepLength	Integer	The average number of residues in a symmetry-related structural repeat
	Coverage		chainInferCoverage	Float	The fraction of all amino acids in the structure that contribute to symmetry-related repeats
	Angle		chainInferAngle	Float	The angle [degrees] of a symmetry transformation
	Translation		chainInferTranslation	Float	The length [Angstroms] of the component of the translation vector associated with a symmetry transformation that is parallel to the symmetry axis (i.e., screw translation)
	RMSD		chainInferRmsd	Float	The average root mean square deviation [Angstroms] of the Cα atoms calculated after superposing all repeats in a hierarchical symmetry
	TM-Score		chainInferTmScore	Float	The TM-score calculated by superposing all repeats in a hierarchical symmetry (see FAQ for details)
	Repeat topology (parallel/antiparallel)		chainInferRepTopology	String	Either parallel or antiparallel, indicating the relative topology of the repeats in the membrane
	Angle with membrane normal		chainInferAngleMembrane	Float	The angle [degrees] between the symmetry axis and the membrane normal
	Aligned Residues		chainInferAlignedResidues	Integer
API-specific options	page	page	page	Integer	Instead of returning all results from a query, which might be time intensive, results can be divided into pages with a predefined size (see size parameter) and only the requested page(s) will be displayed
	size	size	size	Integer	Number of results to be returned or displayed on each page
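
As a rough sketch of how the parameter names in the table might be combined in a script, the Python snippet below assembles a query URL. Several things here are assumptions and not documented above: the base URL is a placeholder (the real endpoint is not given on this page), the criteria are assumed to be passed as ordinary URL query parameters, and the function name is hypothetical. Whether page numbering starts at 0 or 1 is also not specified here.

```python
from urllib.parse import urlencode

# Placeholder only: replace with the actual EncoMPASS API endpoint.
PLACEHOLDER_BASE_URL = "https://example.invalid/api"


def build_query_url(base_url, criteria, page=None, size=None):
    """Assemble a query URL from search criteria and optional paging options.

    `criteria` maps parameter names from the table (e.g. "chainMssdOrder")
    to the values being searched for; entries set to None are skipped.
    """
    params = {name: value for name, value in criteria.items() if value is not None}
    if page is not None:
        params["page"] = page
    if size is not None:
        params["size"] = size
    return f"{base_url}?{urlencode(params)}"


url = build_query_url(
    PLACEHOLDER_BASE_URL,
    {"chainMssdOrder": "C2", "class": "alpha"},
    page=1,
    size=20,
)
print(url)
```

The resulting URL can then be fetched with any HTTP client and the JSON response decoded with the standard json module.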

National Institute of Neurological Disorders & Stroke