Contents

  1. Introduction
  2. Global Analysis
  3. Math Models
  4. Dct Example
  5. Software


Global or Simultaneous Nonlinear Regression Analysis

The analysis of multiple-site titrations is often best performed by global nonlinear regression, which estimates the parameter values that have the maximum likelihood of being correct. The approach described here was first applied to protein/DNA interactions in studies of the lambda phage repressor/operator system (1-3), and has recently been reviewed in Methods in Enzymology Vol. 210, 1992 (see reference articles 4-7). In this context, a linear function is one whose second and higher order derivatives with respect to its parameters (constants) are all zero; a nonlinear function is one for which at least one of these derivatives is nonzero. The function $y = a_1 + a_2 x + a_3 x^2 + \dots$, defined by the parameter set $\{a_1, a_2, a_3, \dots\}$ and the independent variable $x$, is thus linear with respect to its parameters, even though it is not linear in $x$. By contrast, the function $y = (a_1 x + a_1 a_2 a_3 x^2)/(1 + a_1 x + a_2 x + a_1 a_2 a_3 x^2)$, which describes the fractional occupancy of one of the sites in a two-site system with cooperativity between adjacently bound proteins, is nonlinear.
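A quick numerical check makes the linear/nonlinear distinction concrete. The Python sketch below uses illustrative parameter values and assumed roles for $a_1$, $a_2$ and $a_3$ (site association constants and a cooperativity term; the text does not name them). It estimates the second derivative of each function with respect to $a_1$ by central finite differences: the estimate vanishes for the polynomial but not for the occupancy function.

```python
def linear_model(x, a1, a2, a3):
    """y = a1 + a2*x + a3*x**2: curved in x, but linear in a1, a2, a3."""
    return a1 + a2 * x + a3 * x ** 2


def site_occupancy(x, a1, a2, a3):
    """Fractional occupancy of one site in the two-site cooperative model.

    Assumed roles (illustrative only): a1 and a2 are site association
    constants, a3 is the cooperativity term, x is the free protein concentration.
    """
    cooperative = a1 * a2 * a3 * x ** 2
    return (a1 * x + cooperative) / (1 + a1 * x + a2 * x + cooperative)


def second_derivative(f, x, params, index, h=1e-4):
    """Central finite-difference estimate of d2f/d(params[index])2 at x."""
    up, down = list(params), list(params)
    up[index] += h
    down[index] -= h
    return (f(x, *up) - 2 * f(x, *params) + f(x, *down)) / h ** 2


params = (2.0, 1.0, 0.5)
print(second_derivative(linear_model, 1.5, params, 0))    # essentially zero
print(second_derivative(site_occupancy, 1.5, params, 0))  # clearly nonzero (about -0.074)
```

The same check with respect to any other parameter gives zero for the polynomial, which is what "linear in its parameters" means.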

In such an analysis, an iterative process is used to find the values of the $m$ parameters that minimize the variance of fit. The variance of fit is defined as the weighted sum of the squared differences between the fitting function $f(x_i, \mathbf{a})$ and the experimental data $y(x_i)$, averaged over the degrees of freedom:

$$ s^2 = \frac{1}{N-m-1} \sum_{i=1}^{N} \left[ \frac{y(x_i) - f(x_i, \mathbf{a})}{\sigma_i} \right]^2 $$

In this equation, each difference between the observed and predicted values is weighted by dividing it by the standard deviation ($\sigma_i$) of that observation; averaging is achieved by dividing by the number of degrees of freedom ($N-m-1$, for $N$ data points and $m$ parameters). For a given experimental data set, such an analysis often yields many parameter-value sets ("solutions") whose variances of fit are not significantly different. In such cases there is no basis for choosing which parameter values most likely describe the system being studied; in other words, the information content of the experiment is insufficient to resolve the parameters into a unique set of most probable values. A single parameter set associated with a unique minimal variance of fit often emerges when data from a different but related experiment are included, provided its modelling function shares one or more parameters with that of the first data set. This happens when constraints on a parameter in one experiment eliminate some of the values that parameter could otherwise take in the second experiment, and vice versa. To achieve this mutual refinement of the parameter estimates, one performs the least-squares regression on both experiments simultaneously, minimizing a single variance of fit that reflects both data sets.
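The variance of fit defined above translates directly into code. This Python sketch uses the $N-m-1$ degrees of freedom stated in the text; the function name, the straight-line test model and the numbers are made up purely for illustration.

```python
def variance_of_fit(xs, ys, sigmas, model, params):
    """Weighted variance of fit: sum(((y - f)/sigma)**2) / (N - m - 1).

    The N - m - 1 degrees of freedom follow the definition given in the text.
    """
    n, m = len(ys), len(params)
    dof = n - m - 1
    if dof <= 0:
        raise ValueError("need more data points than parameters + 1")
    total = 0.0
    for x, y, sigma in zip(xs, ys, sigmas):
        # Each residual is weighted by the standard deviation of its observation.
        weighted_residual = (y - model(x, *params)) / sigma
        total += weighted_residual ** 2
    return total / dof


def straight_line(x, a1, a2):
    """Hypothetical two-parameter model used only to exercise the function."""
    return a1 + a2 * x


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.0]
sigmas = [0.1] * 5
print(variance_of_fit(xs, ys, sigmas, straight_line, (1.0, 2.0)))  # approximately 5.0
```

A regression program searches for the `params` that make this quantity smallest; for a fixed data set that is the same as minimizing the weighted sum of squares, since the degrees of freedom do not change.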

Such a simultaneous analysis of two or more experimental data sets is called a global analysis. Besides improving one's ability to resolve unique parameter estimates, a simultaneous analysis provides a more realistic estimate of error, allowing one to interpret the data better. This is especially true when parameters are cross-correlated, so that uncertainty in estimating one parameter induces a systematic uncertainty in estimating another. Ignoring this correlation can cause the confidence intervals to be significantly underestimated. Such analyses have recently become practical for all molecular biologists, because desktop computers are now powerful enough to perform the required calculations rapidly. This document describes how to perform such an analysis of DNase I footprint data on an IBM-PC clone, but the information applies to the analysis of data from other experiments on any computer, provided one can formulate a quantitative model and has a basic working knowledge of the operating system.

In the example that follows, the program NONLIN was used to perform a global nonlinear regression analysis of several DNase I footprint titrations. Other statistical analysis programs could certainly have been used, but NONLIN was chosen because it includes information about the cross-correlation of parameters in its error analysis, unlike many programs that opt for speed by simplifying the error calculations. In addition, NONLIN converges efficiently on parameter sets that yield a minimal variance of fit. NONLIN is available for several computer platforms from Dr. Michael Johnson, Dept. of Pharmacology, University of Virginia, Charlottesville, VA 22908.
