Homology of sequences in genetics
In genetics, homology refers to protein or DNA sequences that share a common ancestor. Sequence homology may also indicate common function. Whether two sequences are homologous is a yes-or-no question: there is no such thing as "degrees of homology." Sequence regions that are homologous may also be called conserved.
Homology among proteins and DNA is often inferred from sequence similarity, especially in bioinformatics. For example, if two genes have almost identical DNA sequences, they are likely to be homologous. However, sequence similarity can arise without common ancestry: short sequences may be similar by chance, and sequences may be similar because both were selected to bind to a particular protein, such as a transcription factor. Such sequences are similar but not homologous.
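To give a rough sense of why short sequences can match by chance, the following Python sketch estimates how often a fixed motif would be expected to appear in unrelated sequence. It assumes a deliberately simplified model in which every base is independent and equally likely, which real genomes do not follow; the function names are hypothetical and chosen only for illustration.

```python
def chance_identity_probability(length: int, alphabet_size: int = 4) -> float:
    """Probability that two random sequences of this length are identical.

    Assumes independent, uniformly distributed residues (a simplification).
    Use alphabet_size=4 for DNA or 20 for proteins.
    """
    if length < 0:
        raise ValueError("length must be non-negative")
    return (1.0 / alphabet_size) ** length


def expected_chance_occurrences(
    motif_length: int, target_length: int, alphabet_size: int = 4
) -> float:
    """Expected number of exact matches to a fixed motif in a random target."""
    # Number of start positions at which the motif could fit in the target.
    positions = max(target_length - motif_length + 1, 0)
    return positions * chance_identity_probability(motif_length, alphabet_size)


if __name__ == "__main__":
    for k in (8, 12, 20):
        print(k, expected_chance_occurrences(k, 1_000_000))
```

Under this model, an 8-base motif is expected roughly 15 times in a million random bases, while a 20-base motif is expected far less than once. This is why a long, near-identical match is much stronger evidence of common ancestry than a short one.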
The phrase "percent homology", sometimes used by those outside evolutionary biology or bioinformatics, is incorrect. "Percent identity" or "percent similarity" should be used instead to quantify the similarity between biomolecule sequences. For two naturally occurring sequences, percent identity is a factual measurement, whereas homology is a hypothesis supported by evidence. One can, however, refer to partial homology, where a fraction of the compared sequences shares (or is presumed to share) descent while the rest does not.
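Percent identity can be computed directly from a pairwise alignment, which is what makes it a measurement and not a hypothesis. The Python sketch below assumes the two sequences have already been aligned to equal length with "-" marking gaps, and it counts only columns in which neither sequence has a gap. That denominator is one of several conventions in use, so the result may differ from what a particular alignment tool reports; the function name is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of ungapped alignment columns with identical residues.

    Assumes both inputs come from the same pairwise alignment, so they
    have equal length. Columns containing a gap are skipped (an assumed
    convention; other definitions divide by the full alignment length).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # gap column: not counted in the denominator
        compared += 1
        if x == y:
            matches += 1

    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Made-up example sequences, not real genes.
    print(percent_identity("ACGTACGT-A", "ACGAACGTTA"))
```

In this made-up example, 8 of the 9 ungapped columns match. The number describes similarity only; whether the two sequences are homologous remains a separate inference.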
Many algorithms exist to cluster protein sequences into sequence families, which are sets of mutually homologous sequences. (See sequence clustering and sequence alignment.)