Lesson 2_7: Quantification of Biochemical Data
Five years ago my experience with quantifying ligand binding data was based entirely on using Scatchard plots. At that time, a graduate student and I wanted to measure the microscopic binding constants for the association of a transcription factor and its cognate binding site. Moreover, we wanted to know if binding of two copies of the protein to two adjacent sites was cooperative, or independent. In preparing to do this work, we learned about using simultaneous, non-linear regression as a tool for analyzing DNAse I footprint data. Our lesson is worth sharing, and that is the primary reason I began this course. When we are done, you should have a good feeling for:
An excellent source of information has been compiled in two volumes of Methods in Enzymology, vols. 210 and 240. I strongly urge you to take a look at this material. The articles that I found particularly helpful were the first one in each volume and the one by Beechem in vol. 210. Some notes that I have excerpted from those sources are posted, as is a description of the DNAse I problem as we experienced it. You may also want to see a web page that Ross Hardison and I made for BMB 400 (it presents its own content and also links to the earlier DNAse I pages).
I will be discussing this material in class, and making up some exercises as we go. For now, study the background information.
Assignment 2_7A - NONLIN Construction. Perform this exercise on compiling and linking a Fortran module with the modules of NONLIN, and write up an HTML document describing what you did and answering the questions posed in the exercise.
Assignment 2_7B - In performing the last exercise you probably found the programming process a little tedious, because you had to switch directories, set paths, rename files, etc. Plus, in running NONLIN you need to input a data file name, an output file name, a series of initial guesses, and several choices that depend on what happens as the fit converges. So, you had to stay by the terminal as the program ran. While this is not a major problem for simple analyses, it can be a major headache for longer ones - imagine running NONLIN 50 times to walk along a parameter axis to see how the variance of fit changes with that parameter! Batch files and I/O redirection can relieve such pain very effectively (I/O stands for input/output). This assignment on batch files and I/O redirection is designed to reveal what these are all about, and to help you begin to learn how to use them (such options are typical of any operating system; you just have to know how they are implemented in your particular case). When you are done, write an HTML document describing what you have learned, so that you will be able to recall it later.
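As a rough illustration of the idea, here is a minimal Python sketch of running a prompt-driven program unattended by supplying its answers on standard input. The order of the prompts (data file, output file, then initial guesses), the file names, and the executable name nonlin.exe are all hypothetical; check what your own NONLIN build actually asks for before adapting it.

```python
import subprocess


def build_answers(data_file, output_file, guesses):
    """Return the text one would otherwise type at the prompts, one answer per line.

    The prompt order used here is an assumption made for illustration only.
    """
    lines = [data_file, output_file]
    lines += [f"{g:g}" for g in guesses]
    return "\n".join(lines) + "\n"


def run_unattended(command, answers):
    """Run `command`, feed `answers` to its standard input, return its printed output."""
    result = subprocess.run(command, input=answers, capture_output=True, text=True)
    return result.stdout


# Walk one parameter along an axis, preparing one set of answers per run.
for i in range(3):
    k_guess = 1.0 + 0.5 * i
    answers = build_answers("run1.dat", f"fit_{i:02d}.out", [k_guess, 0.1])
    print(answers, end="")
    # Hypothetical call; uncomment once the executable name and prompts are known:
    # print(run_unattended(["nonlin.exe"], answers))
```

The same effect is obtained at a command prompt by saving the answers in a text file and redirecting it into the program; the loop simply automates writing 50 such files and launching 50 runs.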
Take a break - you deserve it!
Assignment 2_7C - Now you know how to compile and link projects that you can organize in directories specific to each project. Let's get some more practice writing the Fortran module that is needed for defining new versions of NONLIN.
The first exercise shows you one way to include more than one experiment in an analysis, where each experiment has parameters unique to itself as well as parameters shared with other experiments. This is very important, as we will see eventually that combining experiments into a global analysis is a much more powerful experimental approach than simply repeating an experiment to collect more of the same data.
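To make the idea of shared and experiment-specific parameters concrete, here is a minimal Python sketch (not the NONLIN Fortran module itself). It assumes a hypothetical model in which every experiment shares one binding constant K but has its own amplitude, and it only builds the combined list of residuals that a least-squares routine would minimize.

```python
def unpack(params, n_experiments):
    """Split the parameter vector: params = [shared K, amplitude_1, ..., amplitude_n]."""
    shared_k = params[0]
    amplitudes = params[1:1 + n_experiments]
    return shared_k, amplitudes


def global_residuals(params, experiments):
    """Return residuals for all experiments stacked into one list.

    experiments: list of (x_values, y_values) pairs, one pair per experiment.
    Illustrative model: y = amplitude_i * K * x / (1 + K * x), with K shared.
    """
    shared_k, amplitudes = unpack(params, len(experiments))
    residuals = []
    for amplitude, (xs, ys) in zip(amplitudes, experiments):
        for x, y in zip(xs, ys):
            predicted = amplitude * shared_k * x / (1.0 + shared_k * x)
            residuals.append(y - predicted)
    return residuals
```

Because every data point from every experiment contributes to the same sum of squares, the shared parameter is constrained by all of the data at once, while each amplitude is constrained only by its own experiment.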
The second exercise simply asks you to make a NONLIN executable for analyzing initial velocity data for equilibrium kinetic analysis using the model of Michaelis-Menten or Briggs-Haldane. This exercise is intended to bring home the point that you can write these modules for any function you know about, without much difficulty.
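For orientation only, here is a small Python sketch of fitting the hyperbolic rate law v = Vmax*S/(Km + S) to initial velocity data by nonlinear least squares. It uses a plain Gauss-Newton iteration with step halving, which is my choice for a short self-contained example and is not a description of the algorithm inside NONLIN; the function names are hypothetical.

```python
def mm_rate(s, vmax, km):
    """Initial velocity predicted by the hyperbolic rate law."""
    return vmax * s / (km + s)


def sum_sq(data, vmax, km):
    """Sum of squared residuals over (substrate, velocity) pairs."""
    return sum((v - mm_rate(s, vmax, km)) ** 2 for s, v in data)


def fit_mm(data, vmax, km, iterations=50):
    """Refine the starting guesses (vmax, km) by Gauss-Newton with step halving."""
    for _ in range(iterations):
        # Accumulate the 2x2 normal equations (J^T J) step = J^T r.
        a11 = a12 = a22 = b1 = b2 = 0.0
        for s, v in data:
            d_vmax = s / (km + s)               # d(rate)/d(vmax)
            d_km = -vmax * s / (km + s) ** 2    # d(rate)/d(km)
            r = v - mm_rate(s, vmax, km)
            a11 += d_vmax * d_vmax
            a12 += d_vmax * d_km
            a22 += d_km * d_km
            b1 += d_vmax * r
            b2 += d_km * r
        det = a11 * a22 - a12 * a12
        if det == 0.0:
            break
        step_vmax = (a22 * b1 - a12 * b2) / det
        step_km = (a11 * b2 - a12 * b1) / det
        # Halve the step until the fit does not get worse and km stays positive.
        current = sum_sq(data, vmax, km)
        scale = 1.0
        while scale > 1e-6:
            new_vmax = vmax + scale * step_vmax
            new_km = km + scale * step_km
            if new_km > 0 and sum_sq(data, new_vmax, new_km) <= current:
                break
            scale *= 0.5
        else:
            break  # no acceptable step was found; keep the current estimates
        vmax, km = new_vmax, new_km
        if abs(scale * step_vmax) < 1e-12 and abs(scale * step_km) < 1e-12:
            break
    return vmax, km
```

Calling fit_mm with a list of (substrate concentration, velocity) pairs and rough starting guesses returns the refined Vmax and Km. Only mm_rate and its two derivatives are specific to the model, which is the same point the exercise makes about writing a new function module.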
When done, write an HTML document describing your Fortran modules, explaining how they accomplish the desired end.
Take a deep breath, for the next exercise takes some time. But getting through it is worth the year of my time that I spent learning how to do it! Plus, I've made it a "canned" exercise, so you should be able to work through it pretty quickly.
Assignment 2_7D - OK, now we're ready for a bigger project. Now that you know about NONLIN and how it can be used to analyze data (as single experiments, or as combinations of more than one), let's do an exercise on some real DNAse I footprint data that Dean Scholl and I published in 1996 in J. Biol. Chem. You can read a detailed description of how we did the analysis in my posting on cooperativity. I have put together a series of exercises to walk you through such an analysis. Remember, you must (probably!) use a PC and not a Mac for this exercise, and you will also need to use a spreadsheet (Excel is on the computers in 120 S. Frear).
Here is a summary of my take on the various approaches for protein-DNA binding studies, together with one of the error analysis plots from exercise 2_7D.
So far we have used nonlinear regression to solve protein-DNA binding problems and to analyze steady-state kinetic experiments. Next we will see how exponential functions are used to model equilibrium analytical ultracentrifugation data. After that, we'll see how one of the software packages that provides for analysis of equilibrium data also provides several ways to analyze sedimentation velocity data. These experiments provide information about molecular weight (M; equilibrium data) and mass and shape (velocity data).
Assignment 2_7E - Analytical Ultracentrifugation.
Here is a review of analytical ultracentrifugation. Here are some slides used at a workshop on analytical ultracentrifugation, from which you can see the forces, equations, etc., together with lots of additional information. Beckman Coulter, the manufacturer of the analytical ultracentrifuge we have access to, posts some articles about the use of the instrument. They can be found by going to their home page and searching for 'xl-i'.
Read the review and other postings as needed to learn about the technique, then work through the exercises below which illustrate some of the possible experimental realities you might encounter.
In general, one needs to know sigma, M, vbar and buffer density for any sedimentation data analysis. A program written by Tom Laue and many others, distributed by John Philo, makes this easy. The program is called Sednterp, and it provides a lot of useful utility tools. Download it from our course ftp site, and load this program on your local machines. I'll help in class. Then, use it to calculate for NtrC:R461Q:
To do so, you'll want to place the sequence of NtrC:R461Q on the clipboard (select it with the mouse, and press ctrl-c). Then open Sednterp, press the "from composition" button, then the "from the clipboard" button. This loads the sequence into Sednterp. You may save it, or make other calculations after this. The help file for Sednterp is also very useful - take a look at it.
UltrascanII - Van Holde-Weischet and time derivative analysis (used to require Linux OS, now is available for Windows). We will install it during class.
Helpful link: Software and starting point for lots of info.
The NONLIN program we will use for equilibrium analysis is called WinNonlin, by Yphantis et al. It is a Windows 95 version of a NONLIN executable designed to analyze sedimentation equilibrium results. I got it by anonymous ftp to spin6.mcb.uconn.edu in the /pc/winnonln directory.
1. Read the user information in the WinNonlin Help file. You may also find the old documentation for the Mac version helpful (HID, this is an rtf file that can be opened in Word or other word processor; for file transfer purposes from this server, it is called HID.bin).
Learn about
2. Retrieve the XL-I data file 00012ra4.bin, open it in Microsoft Excel, and examine a plot of column 1 (radial position) vs column 2 (absorbance) [column 3 is the error for repeated measurements of each absorbance value]. There are three concentration profiles of interest in here, which have to be processed before they can be analyzed. First, identify the appropriate radial ranges. Second, use cut/paste to make a new spreadsheet of just columns 1 to 3. Then, change the radius values to 0.5*radius^2. In a real study, you'd have to save this as an ASCII text file, and do so for each data set.
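The same processing can be sketched in a few lines of Python instead of a spreadsheet. This is only an illustration: it assumes the three columns have already been read into (radius, absorbance, error) tuples, and the function names and the tab-separated output layout are my own choices, not a format required by WinNonlin.

```python
def select_range(rows, r_min, r_max):
    """Keep only the rows whose radius lies within one concentration profile."""
    return [row for row in rows if r_min <= row[0] <= r_max]


def half_radius_squared(rows):
    """Replace each radius r by 0.5*r**2, leaving absorbance and error unchanged."""
    return [(0.5 * r * r, a, e) for r, a, e in rows]


def to_ascii(rows):
    """Format rows as plain tab-separated text, one data point per line."""
    return "\n".join(f"{x:.5f}\t{a:.4f}\t{e:.4f}" for x, a, e in rows) + "\n"


# Made-up numbers, for illustration only.
rows = [(5.90, 0.020, 0.002), (6.00, 0.100, 0.010), (6.10, 0.150, 0.010)]
profile = half_radius_squared(select_range(rows, 5.95, 6.15))
print(to_ascii(profile), end="")
```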
3. Now, run the program WinNonlin and analyze the NtrC data files that are provided (1.bin, 2.bin, 3.bin, 4.bin, 5.bin, 6.bin). These were obtained for 1, 2, 4, 8.8, 12.9, and 16.8 µM monomer equivalents of NtrC loading concentration. The rotor speed was 8000 rpm. The gas constant R is 8.314 x 10^7 erg/(mol K), and the temperature for the runs was 293 K. What is sigma for a monomer of NtrC (mol wt 50 kDa)? For a dimer?
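As a check on your hand calculation, here is a minimal Python sketch of the reduced molecular weight, taking sigma = M(1 - vbar*rho)*omega^2/(R*T) with omega = 2*pi*rpm/60 in rad/s. The values of vbar and rho used in the example call are placeholders that I have assumed for illustration; use the vbar and buffer density you calculated for NtrC instead.

```python
import math

R_ERG = 8.314e7  # gas constant in erg/(mol K), as given in the exercise


def reduced_mol_wt(mol_wt, rpm, temp_k, vbar, rho):
    """Return sigma (cm^-2) for a species of molecular weight mol_wt (g/mol).

    vbar is the partial specific volume (mL/g) and rho the buffer density (g/mL).
    """
    omega = 2.0 * math.pi * rpm / 60.0  # angular velocity in rad/s
    return mol_wt * (1.0 - vbar * rho) * omega ** 2 / (R_ERG * temp_k)


# vbar = 0.73 mL/g and rho = 1.00 g/mL are assumed placeholder values.
sigma_monomer = reduced_mol_wt(50000.0, 8000.0, 293.0, vbar=0.73, rho=1.00)
sigma_dimer = reduced_mol_wt(100000.0, 8000.0, 293.0, vbar=0.73, rho=1.00)
print(sigma_monomer, sigma_dimer)
```

Because sigma is proportional to M at fixed speed and temperature, the dimer value is exactly twice the monomer value.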
First, analyze each data file separately, for a single species. Fit lnA, deltaY, and sigma (what are these?). Make a table to record the results, indicating if the residuals are random or not, and noting the variance of fit. From the estimated value of sigma, calculate the apparent molecular weight for the average size species based on the single species model, for each concentration run. Is there a trend for larger average molecular weight when increasing concentrations are loaded? If there is, what does this mean; if there is not, what does this mean?
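A minimal Python sketch of the single-species calculation follows. It assumes the usual NONLIN-style form, in which the absorbance at x = 0.5*r^2 is deltaY + exp(lnA + sigma*(x - x_ref)), so that lnA is the log of the absorbance at the reference position and deltaY is a baseline offset; confirm this against the WinNonlin help file. The vbar and rho arguments are again values you must supply yourself.

```python
import math


def single_species(x, x_ref, ln_a, sigma, delta_y):
    """Absorbance predicted at x = 0.5*r**2 for one ideal species (assumed form)."""
    return delta_y + math.exp(ln_a + sigma * (x - x_ref))


def apparent_mol_wt(sigma, rpm, temp_k, vbar, rho, gas_constant=8.314e7):
    """Convert a fitted sigma (cm^-2) back to an apparent molecular weight (g/mol)."""
    omega = 2.0 * math.pi * rpm / 60.0  # angular velocity in rad/s
    return sigma * gas_constant * temp_k / ((1.0 - vbar * rho) * omega ** 2)
```

Applying apparent_mol_wt to the sigma fitted at each loading concentration gives the column of average molecular weights for your table.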
Second, analyze all data files globally. Test monomer, monomer-dimer, dimer-tetramer, dimer-tetramer-octamer, etc. models. You do this by checking the k or k's to be fit, entering a 0 for initial guesses, and making sure the chosen k's correspond to the right N values. The N values can be held constant to define a given model, or floated to let the system tell you which values of N give the best fit; note that floating them may be asking too much of your data. In each case, note the pattern of residuals, and note the variance of fit. Perform an error analysis to see the confidence limits. Which model do you think best describes the data?
Modified Instructions for 2006: Sed Vel using Ultrascan II
On the ftp server you will find a folder called ultracentrifuge\ultrascanii\us. Copy the files in that folder into the c:\us folder on your local hard drive. Then, also copy the files you find in the ultracentrifuge\ultrascanii\us\results folder into the results folder on your local drive, c:\us\results. These files contain pre-edited velocity data for NtrC in the apo and phosphorylated states. In particular: 1stmodel is the apo state; rerun0604 is the phosphorylated state; btnoctrerun is the phosphorylated state but with ~1X EDTA added to chelate Mg2+; 2xedta is with excess EDTA added. Start Ultrascan and choose Velocity - Enhanced van Holde Weischet analysis. Load each data file one at a time. After loading, 'accept' the data. Then calculate the vbar value for the protein (follow the menus, selecting to load from hard drive and finding the sequence file for ntrcs160f3ala in the c:\us folder). Then calculate the density and viscosity (load from hard drive, finding the right buffer - glycerol or edta - in the c:\us folder). Then calculate the distribution and note the result. Do this for each of the 4 conditions. What do you think is happening when you phosphorylate the protein (EDTA binds the Mg2+ that is needed to stably phosphorylate the protein)? After you have really tried to do this on your own, you can check out what I think (with additional evidence in supplemental materials) at De Carlo et al, Genes and Dev.
Another program for Sed Vel (we'll do this in class on Tuesday)
For analyzing sedimentation velocity data, we are going to use Svedberg which simulates the run. Here is a zip file containing several datafiles, for the same protein.
Assignment 2_7F - Kinetic Simulation.
In addition to providing the computational power needed to perform serious nonlinear regression and error analyses, desktop computers also make it possible to use numerical integration to simulate the time course of a chemical reaction. By adjusting the parameter values and comparing the simulated time course with the data, one can see whether or not a proposed mechanism provides a good fit.
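To show what simulating a time course by numerical integration involves, here is a minimal Python sketch for a single reversible step, A + B <-> C, with forward and reverse rate constants kf and kr. The fourth-order Runge-Kutta integrator is my choice for a compact example and is not meant to describe how KINSIM integrates; the mechanism, names and rate constants are illustrative.

```python
def derivatives(conc, kf, kr):
    """Rates of change of [A], [B], [C] for A + B <-> C."""
    a, b, c = conc
    net = kf * a * b - kr * c  # net forward flux
    return (-net, -net, net)


def rk4_step(conc, dt, kf, kr):
    """Advance the concentrations by one time step with fourth-order Runge-Kutta."""
    k1 = derivatives(conc, kf, kr)
    k2 = derivatives([y + 0.5 * dt * k for y, k in zip(conc, k1)], kf, kr)
    k3 = derivatives([y + 0.5 * dt * k for y, k in zip(conc, k2)], kf, kr)
    k4 = derivatives([y + dt * k for y, k in zip(conc, k3)], kf, kr)
    return [y + dt / 6.0 * (p + 2.0 * q + 2.0 * r + s)
            for y, p, q, r, s in zip(conc, k1, k2, k3, k4)]


def simulate(conc0, kf, kr, dt, n_steps):
    """Return the list of concentration triples at t = 0, dt, 2*dt, ..."""
    conc = list(conc0)
    course = [conc]
    for _ in range(n_steps):
        conc = rk4_step(conc, dt, kf, kr)
        course.append(conc)
    return course


# Start with 1.0 each of A and B and no C (arbitrary concentration units).
course = simulate([1.0, 1.0, 0.0], kf=1.0, kr=0.5, dt=0.01, n_steps=5000)
print(course[-1])
```

Changing kf or kr and re-running the simulation is the manual version of what a fitting program automates: the simulated curve is compared with the measured one, and the rate constants are adjusted until the two agree or the mechanism is rejected.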
Read the paper describing KINSIM (Barshop et al., 1983, Anal. Biochem. 130:134-145) (pdf available for Penn State users). You may also want to read about FITSIM (Zimmerle, C.T. and Frieden, C., 1989, Biochem. J. 258:381-387), but the following tutorial does not require it. We will be using a version of KINSIM called KinTekSim, which you can download from the KinTek Corporation web site. It has FITSIM built into it. Using KinTekSim, work through the tutorial provided by Wachsstock and Pollard (Biophysical Journal 67:1260-1273). This tutorial has instructions that you should follow, using an appropriate version of the KINSIM software. You do not need FITSIM for this tutorial, and it is in fact better for you to do things manually to learn the relationships. Realize, however, that in practice FITSIM does a lot of the work for you, making the appropriate guesses until a good fit is obtained, if all works okay. (The tutorial and problem data files are provided in links that I have set up for you; the original source is here.)
To use the tutorial, consider the following.
For PC's, there are
The latter is the version that we will be using in class. For this year's class (1998), I have posted a zip file, kinteksim.zip, containing the setup.exe and associated files posted in June of 1998 that are needed to install KinTekSim. Please read the readme.btn file before unzipping and running setup. The data files for tutorial problems 1 to 9, and the tutorial in HTML and word processor formats, are located in this second zip file.
When you are done with the tutorial, write an HTML document explaining the value of kinetic simulation, compared to the use of steady state analyses.
Potentially Useful Sites
Interactive Biochemistry - Sean O'Hearn pointed out a Java applet page that lets you explore Michaelis-Menten plots. The authors of that page have posted other such applets, which are listed at their Interactive Biochemistry site.
KINSIM and FITSIM - most recent posting of these applications from the author, Dr. Frieden.
Solution X-ray Scattering
Some notes (introduced in 2006) and examples of small- and wide-angle X-ray solution scattering.
Crystal Reflections
Some notes (introduced in 2006). (pdf file with clear math text.)
Assignment 2_7G - Biacore
The Biacore is an instrument that uses surface plasmon resonance to measure the refractive index of solutions near a surface. Other instruments are available that do the same thing. This information can be used to determine the concentration of mass near the surface, which under special circumstances can be used to watch binding events in the millisecond to minutes time frame. So, SPR data is treated as kinetic data, with numerical integration and global statistical analysis greatly increasing the number of systems that can be effectively modeled.
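As an illustration of treating a sensorgram as kinetic data, here is a minimal Python sketch that integrates the simplest 1:1 binding model, dR/dt = ka*C*(Rmax - R) - kd*R, through an association phase (analyte present) and a dissociation phase (buffer only). The 1:1 model, the midpoint integration scheme, and all parameter values are assumptions chosen for a short example; real systems often need more elaborate models.

```python
def response_rate(r, conc, ka, kd, rmax):
    """dR/dt for 1:1 binding of analyte at concentration conc to the surface."""
    return ka * conc * (rmax - r) - kd * r


def sensorgram(conc, ka, kd, rmax, t_on, t_total, dt):
    """Return (time, response) pairs; analyte is injected from t = 0 until t_on."""
    r = 0.0
    trace = [(0.0, 0.0)]
    n_steps = int(round(t_total / dt))
    for i in range(n_steps):
        t = i * dt
        c = conc if t < t_on else 0.0  # analyte flows only during the injection
        # Second-order midpoint step.
        r_mid = r + 0.5 * dt * response_rate(r, c, ka, kd, rmax)
        r += dt * response_rate(r_mid, c, ka, kd, rmax)
        trace.append(((i + 1) * dt, r))
    return trace


# Illustrative values: ka in 1/(M s), kd in 1/s, conc in M, rmax in response units.
trace = sensorgram(conc=1e-7, ka=1e5, kd=1e-2, rmax=100.0,
                   t_on=100.0, t_total=300.0, dt=0.01)
print(trace[10000], trace[-1])
```

Simulating several analyte concentrations with one shared pair of ka and kd, and comparing all of the curves with the data at once, is the global analysis referred to above.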
1) Learn about SPR.
Browse these general information sites
You may use the Biacore information program (note - this installation file is 32 MB and requires sound; sound is not available on the microlab computers, but is on the department laptop under Windows95, in 330 SF, and 308 Althouse - ask for help from the Nixon and Simpson labs, respectively, if you want to use the latter two sites). After browsing that information,
For the serious: Read this chapter (by Dr. Earp) and the listed or other article:
2) Using Clamp, software made freely available by the Utah-based Protein Interaction Facility (installed on your computers as Clamp.exe), work through the provided tutorials.
************ skip savuka for 2003 ***************
New Assignment - Linux and Savuka - being developed for Friday.
We will be using Linux, so here are a few commands that will be helpful. You will download a static image of savuka, and I'll help you install it on the machine. The lab computers already have f2c-19970805-3 installed, which is good because savuka uses its library. If you install this image of savuka elsewhere, you may have to install this f2c program if it is not already there. I have provided the f2c-1997-0805-i386.rpm file at the ftp site for this purpose. So, follow these instructions to get and install savuka on the lab computers.
Once the program is installed, read the manual for its commands, and read and perform the two tutorials to learn about this program. Savuka - manual, original tutorial, my edited version of the tutorial. Once you have finished the author-provided tutorial
******************************* skip above in 2003 *****************************