Lesson 2_7: Phylogenetic Trees
Assignment 2_7
Read the background information on phylogeny concepts and programs. As you will learn from the reading, many algorithms for identifying phylogenetic trees depend on the order in which sequences are analyzed. The program Phylip is freely available, and can be used to explore this issue.
A better strategy might be to use a distance method for determining the tree, given the large number of sequences. Why is this so?
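As a rough illustration of what a distance method starts from, the sketch below reduces an alignment to a table of pairwise distances. It is not PHYLIP's own calculation: it uses the simple proportion of differing sites (the p-distance), and the function names and the toy alignment are invented for this example.

```python
from itertools import combinations


def p_distance(a, b, gap="-"):
    """Fraction of differing sites, skipping columns where either sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = differing = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            continue
        compared += 1
        if x != y:
            differing += 1
    return differing / compared if compared else float("nan")


def distance_matrix(alignment):
    """Map each unordered pair of sequence names to its p-distance."""
    names = sorted(alignment)
    return {
        (m, n): p_distance(alignment[m], alignment[n])
        for m, n in combinations(names, 2)
    }


# Toy alignment, invented for illustration only.
toy = {"seqA": "WPGNVREL", "seqB": "WPGNIREL", "seqC": "WPG-VKEL"}
dist = distance_matrix(toy)
for pair, d in dist.items():
    print(pair, round(d, 3))
```

For n sequences the table holds n(n-1)/2 entries, and it is computed once before any tree is built.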
(Don't try this exercise on the lab computers; it is therefore optional.)
One way to estimate your confidence in the assigned groups is to bootstrap the original data file, determine a distance matrix for each bootstrapped data set, use the neighbor-joining program to establish groups for each data set, and then use the Consensu.exe program to derive the majority-rule consensus tree for all of the bootstrapped data sets. Doing so for 610 bootstrapped data sets took several days on the Pentium PC mentioned above. Note that the final tree here represents bootstrap frequencies, not evolutionary distances: the internode distances reflect the bootstrap frequencies, while the final line to each sequence is the same length for all sequences, chosen only to spread out the names so they can be read.
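The two ends of that pipeline can be sketched in a few lines of Python. This is a minimal illustration, not the PHYLIP programs themselves: the function names, the toy alignment, and the hand-written clade sets are all assumptions made for the example. The neighbor-joining step in the middle is left out, so the groups per replicate are supplied by hand.

```python
import random
from collections import Counter


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement (same columns for every sequence)."""
    length = len(next(iter(alignment.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in cols) for name, seq in alignment.items()}


def majority_rule(clade_sets, threshold=0.5):
    """Return {clade: frequency} for clades present in more than `threshold` of the trees."""
    counts = Counter(clade for clades in clade_sets for clade in clades)
    n = len(clade_sets)
    return {clade: c / n for clade, c in counts.items() if c / n > threshold}


# Toy alignment, invented for illustration only.
toy = {"seqA": "WPGNVREL", "seqB": "WPGNIREL", "seqC": "WPGTVKEL"}
rng = random.Random(0)  # fixed seed so the resampling is repeatable
replicates = [bootstrap_alignment(toy, rng) for _ in range(5)]

# Hypothetical groups from three replicate trees (a tree-building step
# such as neighbor-joining would normally produce these).
trees = [
    {frozenset({"seqA", "seqB"})},
    {frozenset({"seqA", "seqB"})},
    {frozenset({"seqB", "seqC"})},
]
support = majority_rule(trees)
for clade, freq in support.items():
    print(sorted(clade), round(freq, 2))
```

The frequencies returned by `majority_rule` correspond to the bootstrap frequencies that the consensus tree displays in place of evolutionary distances.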
Alternate (or additional optional) Exercise
The NCBI now shows data coming in from the effort to sequence the genome of Pseudomonas aeruginosa. If you do a BLAST search for WPGNVREL, a highly conserved motif in the sigma 54-dependent transcriptional activators that was used for one of the PCR primers used to obtain many of the sequences in the file all14d.aln, there are 12 hits. You can download the sequence contigs that are presently available, and use BLAST and a translation program to determine the amino acid sequence of the 12 putative activators. (Use the WordPad editor to open each of the 12 contigs; then cut and paste them into the BLAST search site.) After translating the sequences, you could eliminate any that were already in all14.dnd, add the new ones, and use ClustalX with Phylip to see if they cluster with any previously known genetic functions. Your BLAST search may also turn up some predicted functional genes that these activators are regulating. (Can you find the system that is a very likely candidate for regulating the expression of a transport gene? It may help to confine your search to sequences that flank the sigma 54-dependent activator region by removing the latter.) Also, you could use the program SEQSCAN that I provide to look for sigma 54-dependent promoters that may or may not be adjacent to IHF binding sites. Any such promoters should mark a point in the sequence close to the beginning of regulated genes, and indicate the direction of transcription. If you find any novel relationships, you are among the first in the world to know about them!
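The translation step can be sketched as follows. This is a minimal Python illustration, not the translation program or SEQSCAN referred to above: it assumes the standard genetic code, translates a DNA string in all six reading frames, and reports the frames in which the WPGNVREL motif appears. The function names and the toy sequence are invented for the example.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna):
    """Translate from the first base; codons with ambiguous bases become 'X'."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frames(dna):
    """Return translations for frames +1..+3 and -1..-3 (reverse complement)."""
    dna = dna.upper()
    rev = dna.translate(COMPLEMENT)[::-1]
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rev[offset:])
    return frames


def find_motif(dna, motif="WPGNVREL"):
    """List (frame, position of first match in the protein) for frames containing the motif."""
    return [
        (frame, protein.find(motif))
        for frame, protein in six_frames(dna).items()
        if motif in protein
    ]


# Toy sequence, invented for illustration: one leading base, then codons
# that encode WPGNVREL, then a stop codon.
toy = "A" + "TGGCCGGGCAACGTGCGCGAACTG" + "TAA"
print(find_motif(toy))
```

The frame label also gives the strand, which is useful when working out the direction of transcription for a hit on a contig.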
Potentially Useful links: