

Appendix B: $Q_{res}$ Structural Similarity per Residue

Here we define another metric, Q$_{res}$, derived from Q, that measures the structural conservation of the environment of each residue in the alignment. Q$_{res}$ measures the similarity of the C$^\alpha$-C$^\alpha$ distances between a particular residue and all other aligned residues, excluding nearest neighbors, across a set of aligned proteins. The result is a value between 0 and 1 that describes how similar the structural environment of a residue in a particular protein is to the environment of the corresponding residue in all other proteins in the set. Lower scores indicate low similarity and higher scores indicate high similarity. If the set of proteins is evolutionarily balanced, then structural similarity corresponds to structural conservation. Formally, Q$_{res}$ is defined as follows:


\begin{displaymath}
Q_{res}^{(i,n)} = \aleph \sum _{(m\not=n)}^{proteins} \sum _{(j\not=i-1,i,i+1)}^{residues} \exp \left[ -\frac{\left( r_{ij}^{(n)}-r_{i^{\prime }j^{\prime }}^{(m)} \right)^{2}}{2\sigma ^{2}_{ij}}\right]
\end{displaymath} (1)

where $Q_{res}^{(i,n)}$ is the structural similarity of the $i^{th}$ residue in the $n^{th}$ protein, $r_{ij}^{(n)}$ is the $C^\alpha$-$C^\alpha$ distance between residues $i$ and $j$ in protein $n$ and $r_{i^{\prime }j^{\prime }}^{(m)}$ is the $C^\alpha$-$C^\alpha$ distance between the residues in protein $m$ that correspond to residues $i$ and $j$ in protein $n$. The variance is related to the sequence separation between residues $i$ and $j$,


\begin{displaymath}
\sigma ^{2}_{ij}=\left\vert i-j\right\vert ^{0.15}
\end{displaymath} (2)

and the normalization is given by


\begin{displaymath}
\aleph =\frac{1}{\left( N_{seq}-1\right) \left( N_{res}-k\right)}
\end{displaymath} (3)

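where $N_{seq}$ is the number of proteins in the set, $N_{res}$ is the number of residues in protein $n$, and $k$ is 2 when residue $i$ is the N- or C-terminus and 3 otherwise. The value of $k$ accounts for the residues left out of the sum over $j$: residue $i$ itself and its nearest neighbors, of which a terminal residue has only one.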
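
As a small illustration of equations (2) and (3), the following Python sketch evaluates the variance and the normalization factor. The function names and the zero-based residue indexing are assumptions made for this example and are not part of the original definition.

```python
def sigma_squared(i, j):
    """Variance term of equation (2) for residues i and j of protein n."""
    return abs(i - j) ** 0.15


def normalization(n_seq, n_res, i):
    """Normalization factor of equation (3).

    Assumes zero-based indexing, so the termini are residues 0 and n_res - 1.
    """
    k = 2 if i in (0, n_res - 1) else 3
    return 1.0 / ((n_seq - 1) * (n_res - k))


# The variance grows slowly with sequence separation:
# a separation of 2 gives about 1.11, a separation of 100 about 2.00.
print(sigma_squared(0, 2), sigma_squared(0, 100))
```

Because the exponent is small, pairs that are far apart in sequence are allowed only moderately larger distance deviations than pairs that are close in sequence.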

In order to know which residues correspond to each other across the set of proteins, Q$_{res}$ requires a multiple sequence alignment (MSA) of the proteins' sequences. Typically the MSA is generated using a structural alignment program.
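
The following Python sketch shows one way the definition above could be evaluated from an alignment. It is not the MultiSeq implementation. It assumes a hypothetical input format in which each protein is a list with one entry per alignment column, holding the C$^\alpha$ coordinates as an `(x, y, z)` tuple or `None` for a gap. It also assumes that terms are simply skipped when protein $m$ has a gap at either corresponding column; the text above does not specify how gaps are handled.

```python
import math


def q_res(aligned_coords, n, col):
    """Q_res of the residue of protein n found at alignment column col.

    aligned_coords[p][c] is the C-alpha (x, y, z) of protein p at column c,
    or None if protein p has a gap there (hypothetical input format).
    """
    prot_n = aligned_coords[n]
    if prot_n[col] is None:
        raise ValueError("protein n has a gap at this column")

    # Map alignment columns to zero-based residue indices within protein n.
    res_index = {}
    for c, xyz in enumerate(prot_n):
        if xyz is not None:
            res_index[c] = len(res_index)
    n_res = len(res_index)
    n_seq = len(aligned_coords)

    i = res_index[col]
    k = 2 if i in (0, n_res - 1) else 3  # terminal residues have one neighbor
    if n_seq < 2 or n_res <= k:
        raise ValueError("need at least two proteins and more than k residues")
    norm = 1.0 / ((n_seq - 1) * (n_res - k))  # equation (3)

    total = 0.0
    for m, prot_m in enumerate(aligned_coords):
        if m == n or prot_m[col] is None:
            continue  # assumption: no contribution if residue i is unaligned
        for c, j in res_index.items():
            if abs(i - j) <= 1 or prot_m[c] is None:
                continue  # skip i itself, its neighbors, and gaps in m
            r_n = math.dist(prot_n[col], prot_n[c])
            r_m = math.dist(prot_m[col], prot_m[c])
            sigma2 = abs(i - j) ** 0.15  # equation (2)
            total += math.exp(-((r_n - r_m) ** 2) / (2.0 * sigma2))
    return norm * total


# Two identical five-residue chains: every distance matches, so Q_res is 1.
chain = [(3.8 * t, 0.0, 0.0) for t in range(5)]
print(q_res([chain, list(chain)], n=0, col=2))
```

With this gap convention, a residue whose environment is poorly aligned in the other proteins receives a lower score, because the skipped terms still count in the normalization.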


multiseq@scs.uiuc.edu