Abstract
Molecular sequences that share a high degree of similarity are often thought to have evolved from common ancestral genes. Closely related protein sequences will presumably correspond to similar three-dimensional structures and conserved biological functions (although the reverse is not necessarily true: similar structures and conserved functions do not imply that the corresponding protein sequences will be similar; reviewed in ref. 1). These assumptions provide the basis for computational gene annotation. Typically, the first step in characterizing a novel gene is to compare its sequence against known sequences in available databases and to predict its origin and function by copying the annotation of those previously characterized sequences. This approach has been highly successful and is probably the only practical method applicable to large-scale annotation efforts at present. It should be pointed out, however, that this practice is not without its limitations (and is also unsatisfactory from the more theoretical perspective of those who wish to determine structure and function from primary sequence; for a provocative editorial on this subject, see ref. 2). The intrinsic problems of transitive propagation of historical annotation errors have been discussed elsewhere (ref. 3) and are all too familiar to any biologist who has looked into the databases only to find puzzling annotations that make no sense in light of current knowledge.
References
Weir, M., Swindells, M., and Overington, J. (2001) Insights into protein function through large-scale computational analysis of sequence and structure. Trends Biotechnol. 19, S61–S66.
Konopka, A. K. (2003) Selected dreams and nightmares about computational biology. Comp. Biol. & Chem. 27, 91–92.
Brendel, V. (2002) Integration of data management and analysis for genome research. In Schubert, S., Reusch, B., and Jesse, N. (eds.), “Informatik bewegt”. Lecture Notes in Informatics (LNI)—Proceedings P-20, 10–21.
Altschul, S. F., Gish, W., Miller, W., Myers, E. W., and Lipman, D. J. (1990) Basic local alignment search tool. J. Mol. Biol. 215, 403–410.
Altschul, S. F., Madden, T. L., Schäffer, A. A., et al. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402.
Benson, D. A., Karsch-Mizrachi, I., Lipman, D. J., Ostell, J., and Wheeler, D. L. (2003) GenBank. Nucleic Acids Res. 31, 23–27.
Westbrook, J., Feng, Z., Chen, L., Yang, H., and Berman, H. M. (2003) The Protein Data Bank and structural genomics. Nucleic Acids Res. 31, 489–491.
Higgins, D. G., Thompson, J. D., and Gibson, T. J. (1996) Using CLUSTAL for multiple sequence alignments. Methods Enzymol. 266, 383–402.
Kumar, S., Tamura, K., and Nei, M. (1994) MEGA: Molecular Evolutionary Genetics Analysis software for microcomputers. Comput. Appl. Biosci. 10, 189–191.
Felsenstein, J. (1989) PHYLIP-Phylogeny Inference Package (Version 3.2). Cladistics 5, 164–166.
Vogt, G., Etzold, T., and Argos, P. (1995) An assessment of amino acid exchange matrices in aligning protein sequences: the twilight zone revisited. J. Mol. Biol. 249, 816–831.
Henikoff, S. and Henikoff, J. G. (1992) Amino acid substitution matrices from protein blocks. Proc. Natl. Acad. Sci. USA 89, 10,915–10,919.
Dayhoff, M. O., Schwartz, R. M., and Orcutt, B. C. (1978) A model of evolutionary change in proteins. In: Dayhoff, M. O. (ed.), Atlas of Protein Sequence and Structure, National Biomedical Research Foundation, Washington, DC, pp. 345–362.
Altschul, S. F., Boguski, M. S., Gish, W., and Wootton, J. C. (1994) Issues in searching molecular sequence databases. Nat. Genet. 6, 119–129.
Rost, B. (2002) Enzyme function less conserved than anticipated. J. Mol. Biol. 318, 595–608.
Xing, L. and Brendel, V. (2001) Multi-query sequence BLAST output examination with MuSeq Box. Bioinformatics 17, 744–745.
Worley, K. C., Wiese, B. A., and Smith, R. F. (1995) BEAUTY: an enhanced BLAST-based search tool that integrates multiple biological information resources into sequence similarity search results. Genome Res. 5, 173–184.
Brinkman, F. S., Wan, I., Hancock, R. E., Rose, A. M., and Jones, S. J. (2001) PhyloBLAST: facilitating phylogenetic analysis of BLAST results. Bioinformatics 17, 385–387.
Paquola, A. C., Machado, A. A., Reis, E. M., Da Silva, A. M., and Verjovski-Almeida, S. (2003) Zerg: a very fast BLAST parser library. Bioinformatics 19, 1035–1036.
Altschul, S. F. and Koonin, E. V. (1998) Iterated profile searches with PSI-BLAST-a tool for discovery in protein databases. Trends Biochem. Sci. 23, 444–447.
Jones, D. T. and Swindells, M. B. (2002) Getting the most from PSI-BLAST. Trends Biochem. Sci. 27, 161–164.
Mitsuuchi, Y., Johnson, S. W., Sonoda, G., Tanno, S., Golemis, E. A., and Testa, J. R. (1999) Identification of a chromosome 3p14.3-21.1 gene, APPL, encoding an adaptor molecule that interacts with the oncoprotein-serine/threonine kinase AKT2. Oncogene 18, 4891–4898.
Miaczynska, M., Christoforidis, S., Giner, A., et al. (2004) APPL proteins link Rab5 to nuclear signal transduction via an endosomal compartment. Cell 116, 445–456.
Peter, B. J., Kent, H. M., Mills, I. G., et al. (2004) BAR domains as sensors of membrane curvature: the amphiphysin BAR structure. Science 303, 495–499.
Lipman, D. J. and Pearson, W. R. (1985) Rapid and sensitive protein similarity searches. Science 227, 1435–1441.
Pearson, W. R. and Lipman, D. J. (1988) Improved tools for biological sequence comparison. Proc. Natl. Acad. Sci. USA 85, 2444–2448.
Smith, T. and Waterman, M. (1981) Identification of common molecular subsequences. J. Mol. Biol. 147, 195–197.
Usuka, J., Zhu, W., and Brendel, V. (2000) Optimal spliced alignment of homologous cDNA to a genomic DNA template. Bioinformatics 16, 203–211.
Kent, W. J. (2002) BLAT-the BLAST-like alignment tool. Genome Res. 12, 656–664.
Pertsemlidis, A. and Fondon, J. W. 3rd. (2001) Having a BLAST with bioinformatics (and avoiding BLASTphemy). Genome Biol. 2, reviews 2002.1–2002.10.
Copyright information
© 2005 Humana Press Inc., Totowa, NJ
About this protocol
Cite this protocol
Dong, Q., Brendel, V. (2005). Computational Identification of Related Proteins. In: Walker, J.M. (ed.) The Proteomics Protocols Handbook. Springer Protocols Handbooks. Humana Press. https://doi.org/10.1007/978-1-59259-890-8_51
DOI: https://doi.org/10.1007/978-1-59259-890-8_51
Publisher Name: Humana Press
Print ISBN: 978-1-58829-343-5
Online ISBN: 978-1-59259-890-8
eBook Packages: Springer Protocols