Lesson 2_1: Sequence Alignments

Assignment 2_1

Since Margaret Dayhoff created the Mutation Data Matrix in the late 1960s, computers have been used more and more to investigate similarity among protein and DNA sequences. There are now sites that provide HTML- and MUD-mediated interactive programs for data submission and analysis. There is even a one-year-old Internet course on Biocomputing. This assignment is meant to familiarize you with the theory of, and the facilities for, sequence alignments and similarity searches. Please make use of the many links to additional materials in the background reading. There is also some useful information in the GCG Users Guide at the Biotec Center.

  1. Read the background information. Also make use of this excellent VSNS site (for this section, especially the chapter on weighted matrices), and/or check out the tutorials listed below at the Pittsburgh Supercomputing Center.
  2. Conduct the canned exercise to see what is meant by a multiple sequence alignment.
  3. Using a local copy of ClustalW, run an alignment on the same set of sequences. (Save this zip file on your hard drive, rename it to clust.exe, and execute it to unzip the program files.) UPDATE: ClustalW has since been given a windowed interface as ClustalX (perhaps more up to date; local copy placed 9/98). As far as I can tell, the two programs are very similar, but the windowed version shows a plot of alignment quality, and some alignment-parameter options (scoring matrices available, etc.) may differ between the two programs. Note the different options that you have, and be sure to read about them so that you understand what the choices mean: use the help feature and go to the various menu screens to see what options are available, or read the author's comments (ClustalW, ClustalX) or other helpful descriptions of ClustalW. Then conduct an alignment of sequences that you care about. Post a brief description of why you chose those sequences, and use the report to copy and paste the alignment as preformatted text so that you can discuss the significance of conserved regions.
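
When you discuss conserved regions in step 3, it may help to see what "conserved" means at the level of individual alignment columns. The sketch below is not part of ClustalW or ClustalX; it is a minimal illustration in Python, assuming the alignment is already available as equal-length strings with "-" for gaps. The function names and the toy sequences are made up for this example. It marks a column with "*" only when every sequence has the same residue there, similar in spirit to the "*" marks under a Clustal alignment.

```python
def conserved_columns(aligned):
    """Return indices of columns where all sequences share one non-gap residue."""
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")
    conserved = []
    for i in range(length):
        column = {seq[i] for seq in aligned}  # distinct symbols in this column
        if len(column) == 1 and "-" not in column:
            conserved.append(i)
    return conserved


def conservation_line(aligned):
    """Build a marker line with '*' under each fully conserved column."""
    marks = set(conserved_columns(aligned))
    return "".join("*" if i in marks else " " for i in range(len(aligned[0])))


# Toy alignment, invented for illustration only (not real biological data).
toy = [
    "MKT-AYIAK",
    "MKTLAYLAK",
    "MRT-AYIAK",
]

if __name__ == "__main__":
    for row in toy:
        print(row)
    print(conservation_line(toy))
```

Runs of consecutive "*" marks are the candidates for conserved regions worth discussing in your report. Real alignment programs also flag columns with similar (not only identical) residues, which this sketch does not attempt.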

Potentially Useful links:

Some Interesting Links found at the Pittsburgh Supercomputing Center (PSC)

Software available at the PSC

Nucleic Acid and Protein Sequence Analysis software

Sequence Analysis Tutorials
