Appendix B: Structural Similarity per Residue


Appendix B: $Q_{res}$ Structural Similarity per Residue

Here we define another metric, $Q_{res}$, derived from $Q$, that measures the structural conservation of the environment of each residue in the alignment. $Q_{res}$ is a measure of the similarity of the $C^\alpha$-$C^\alpha$ distances between a particular residue and all other aligned residues, excluding nearest neighbors, in a set of aligned proteins. The result is a value between 0 and 1 that describes the similarity of the structural environment of a residue in a particular protein to the environment of that same residue in all other proteins in the set. Lower scores represent low similarity and higher scores high similarity. If the set of proteins represents an evolutionarily balanced set, then structural similarity corresponds to structural conservation. Formally, $Q_{res}$ is defined as follows:

$\begin{displaymath} Q_{res}^{(i,n)} = \aleph \sum _{(m\not=n)}^{proteins} \; \sum _{(j\not=i-1,i,i+1)}^{residues} \exp \left[ -\frac{\left( r_{ij}^{(n)} - r_{i^{\prime }j^{\prime }}^{(m)} \right)^{2}}{2\sigma ^{2}_{ij}}\right] \end{displaymath}$

(1)

where $Q_{res}^{(i,n)}$ is the structural similarity of the $i^{th}$ residue in the $n^{th}$ protein, $r_{ij}^{(n)}$ is the $C^\alpha$-$C^\alpha$ distance between residues $i$ and $j$ in protein $n$, and $r_{i^{\prime }j^{\prime }}^{(m)}$ is the $C^\alpha$-$C^\alpha$ distance between the residues in protein $m$ that correspond to residues $i$ and $j$ in protein $n$. The variance $\sigma ^{2}_{ij}$ is related to the sequence separation between residues $i$ and $j$,

$\begin{displaymath} \sigma ^{2}_{ij}=\left\vert i-j\right\vert ^{0.15} \end{displaymath}$

(2)

and the normalization is given by

$\begin{displaymath} \aleph =\frac{1}{\left( N_{seq}-1\right) \left( N_{res}-k\right)} \end{displaymath}$

(3)

where $N_{seq}$ is the number of proteins in the set, $N_{res}$ is the number of residues in protein $n$, and $k$ is 2 when residue $i$ is the N- or C-terminus and 3 otherwise.
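As an illustration of Equations (1) through (3), the following Python sketch evaluates the score for one residue. It is not the MultiSeq implementation: the function name `q_res` and the `coords` layout are hypothetical, and it assumes that all proteins have the same length and are aligned without gaps, so that residue $j$ in protein $n$ corresponds to residue $j$ in every other protein.

```python
import math


def q_res(coords, n, i):
    """Sketch of Q_res for residue i of protein n, following Eqs. (1)-(3).

    coords[m][j] is the (x, y, z) C-alpha position of residue j in protein m.
    Assumption: equal-length proteins aligned without gaps, so residue j in
    protein n corresponds to residue j in every other protein m.
    """
    n_seq = len(coords)
    n_res = len(coords[n])

    # k counts the residues excluded from the inner sum: residue i itself
    # plus its sequence neighbors (only one neighbor at a terminus).
    k = 2 if i in (0, n_res - 1) else 3
    norm = 1.0 / ((n_seq - 1) * (n_res - k))  # Eq. (3)

    total = 0.0
    for m in range(n_seq):
        if m == n:
            continue  # sum over all other proteins
        for j in range(n_res):
            if abs(i - j) <= 1:
                continue  # skip i-1, i, i+1
            sigma2 = abs(i - j) ** 0.15  # Eq. (2)
            r_n = math.dist(coords[n][i], coords[n][j])
            r_m = math.dist(coords[m][i], coords[m][j])
            total += math.exp(-((r_n - r_m) ** 2) / (2.0 * sigma2))
    return norm * total  # Eq. (1)


# Toy data: two straight five-residue chains with different spacing (in Angstroms).
chain_a = [(3.8 * j, 0.0, 0.0) for j in range(5)]
chain_b = [(4.8 * j, 0.0, 0.0) for j in range(5)]

identical = q_res([chain_a, chain_a], n=0, i=2)  # same structure twice
stretched = q_res([chain_a, chain_b], n=0, i=0)  # distances disagree
```

With identical structures every exponential term equals 1, and the normalization brings the score to exactly 1. When the distances disagree, as in the stretched chain, the score drops toward 0. Because the exponent in Equation (2) is small, the variance grows slowly with sequence separation: it is about 1.11 for a separation of 2 and about 2.0 for a separation of 100.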

To determine which residues correspond to each other across the set of proteins, $Q_{res}$ requires a multiple sequence alignment (MSA) of the proteins' sequences. Typically, the MSA is generated using a structural alignment program.
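The correspondence between residue $i$ in one protein and residue $i^{\prime }$ in another can be read off the alignment columns. The sketch below shows one way to do this in Python. The helper names `column_to_residue` and `corresponding_residues` are hypothetical, and it assumes that each aligned sequence is a string of equal length with `-` marking gaps.

```python
def column_to_residue(aligned_seq, gap="-"):
    """Map each alignment column to a 0-based residue index, or None at a gap."""
    mapping = []
    idx = 0
    for ch in aligned_seq:
        if ch == gap:
            mapping.append(None)
        else:
            mapping.append(idx)
            idx += 1
    return mapping


def corresponding_residues(aligned_n, aligned_m, gap="-"):
    """Return {i: i_prime} for residues of protein n aligned to residues of protein m."""
    map_n = column_to_residue(aligned_n, gap)
    map_m = column_to_residue(aligned_m, gap)
    return {
        i: i_prime
        for i, i_prime in zip(map_n, map_m)
        if i is not None and i_prime is not None
    }


# Two hypothetical aligned sequences with one gap each.
pairs = corresponding_residues("AC-DE", "A-GDE")
```

Here `pairs` contains only the residues that share a column in both sequences, so residue 1 of the first sequence, which sits opposite a gap, has no counterpart and does not appear in the mapping.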


multiseq@scs.uiuc.edu