Lesson 2_1:
Sequence Alignments
Assignment 2_1
Beginning with Margaret Dayhoff's creation of the Mutation Data Matrix in
the late 1960s, computers have been used more and more to investigate
similarity among protein and DNA sequences. There are now sites that
provide HTML- and MUD-mediated interactive programs for data submission and
analysis. There is even a one-year-old Internet course on Biocomputing. This
assignment is meant to familiarize you with the theory and facilities for
sequence alignments and similarity searches. Please make use of the many links
to additional materials in the background reading. There is also some useful
information in the GCG Users Guide at the Biotec Center.
-
Read the background information (and also make use of this excellent VSNS
site, especially, for this section, the chapter on weighted matrices; and/or
check out the tutorials listed below at the Pittsburgh Supercomputing Center).
-
Conduct the canned exercise to see what is meant by a multiple sequence
alignment.
-
Retrieve seqdata.txt. This file contains three sequences in the FASTA format.
Use the select and copy functions to copy the sequences from that window to
the clipboard.
-
Return to this page.
-
Click on the link to an Internet version of ClustalW.
-
Enter the sequences. Use copy / paste to aid in the data entry.
-
Check the ClustalW option (it is the default, but just be sure), and press
the submit button.
-
The page should return the alignment along with the data you submitted, though
it might take a few minutes. Note that for large runs, or if you don't want to
wait, there is an Email option for receiving the results.
-
Save the alignment report to a local file.
-
Post a link to your saved copy of the HTML report from your home page to
complete your assignment.
-
Repeat the alignment on the same set of sequences using a local copy of
ClustalW (save this zip file on your hard drive, rename it to clust.exe, and
execute it to unzip the program files). UPDATE: ClustalW has been updated with
a windowed interface as ClustalX (perhaps more up to date; local copy placed
9/98). As far as I can tell, the two programs are very similar, but the
windowed version shows a plot of alignment quality, and some alignment
parameter options may differ between the two programs (available scoring
matrices, etc.). Note the different options that you have, and be sure to read
about them so that you understand what the choices mean: use the help feature
and go through the various menu screens to see what options are available, or
read the author's comments (ClustalW, ClustalX) or other helpful descriptions
of ClustalW. Then conduct an alignment of sequences that you care about. Post a
brief description of why you chose those sequences, and copy / paste the
preformatted alignment text from the report so that you can discuss the
significance of conserved regions.
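When discussing conserved regions, it can help to locate them programmatically rather than by eye. The following is a minimal sketch, not part of ClustalW or ClustalX: it assumes the alignment has been saved or converted to gapped FASTA (with "-" marking gaps; ClustalW's default output is its own .aln format, so a conversion step may be needed). The function names `parse_fasta`, `conserved_columns`, and `conserved_runs`, and the toy sequences, are hypothetical and invented for illustration; they are not the contents of seqdata.txt. Here "conserved" is taken in the strictest sense, meaning every sequence has the identical residue in that column.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a dict mapping header -> sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            name = line[1:].strip()  # header line: everything after '>'
            records[name] = []
        elif name is not None:
            records[name].append(line)  # sequence may span several lines
    return {n: "".join(parts) for n, parts in records.items()}


def conserved_columns(aligned):
    """Return 0-based column indices where all sequences share one non-gap residue."""
    seqs = list(aligned.values())
    if not seqs:
        return []
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("aligned sequences must all have the same length")
    columns = []
    for i in range(length):
        residues = {s[i].upper() for s in seqs}
        if len(residues) == 1 and "-" not in residues:
            columns.append(i)
    return columns


def conserved_runs(columns, min_len=3):
    """Group consecutive column indices into inclusive (start, end) runs
    that are at least min_len columns long."""
    runs = []
    start = prev = None
    for c in columns:
        if start is None:
            start = prev = c
        elif c == prev + 1:
            prev = c
        else:
            if prev - start + 1 >= min_len:
                runs.append((start, prev))
            start = prev = c
    if start is not None and prev - start + 1 >= min_len:
        runs.append((start, prev))
    return runs


# Toy gapped alignment, made up for this example.
toy_alignment = """\
>seqA
MKTAYIAK-QR
>seqB
MKTAHIAKDQR
>seqC
MKTAYLAKDQR
"""

aligned = parse_fasta(toy_alignment)
cols = conserved_columns(aligned)
print(cols)                  # [0, 1, 2, 3, 6, 7, 9, 10]
print(conserved_runs(cols))  # [(0, 3)]
```

Strict identity is a deliberately simple criterion. A scoring matrix would also credit conservative substitutions (for example I versus L in column 5 of the toy data), which is closer to what the alignment quality plot in the windowed program reflects.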
Potentially Useful links:
Some Interesting Links found at the Pittsburgh Supercomputing Center (PSC)
Software available at the PSC
Nucleic Acid and Protein Sequence Analysis software
Sequence Analysis Tutorials