DctD, a sigma-54-dependent, prokaryotic transcriptional activator, binds to tandem sites located between 90 and 160 base pairs upstream of the dctA promoter:
-160 Site 1 -120 site 2 -90
| --- > < --- | -------> <------- |
ATGCGAACATGGTGCATGTTTTCGCCCAGGACGCCAGCACTTCTGTGCGGAAATCCGCACATATCCACGAA...
Regulated expression of dctA provides for utilization of 4C-dicarboxylates by the free-living and symbiotic forms of the bacteria. In order to determine intrinsic binding constants and to establish whether binding was cooperative, it was necessary to perform DNAse I footprint titrations on "reduced valency" DNA templates that isolated the two sites from each other, as well as on the wild type configuration of sites. This is typical of complex systems that are defined by covariant parameters such as free energies of intrinsic binding and cooperative interactions.
In general, when trying to isolate one site from several in a complex binding region, it is recommended to make the smallest change in the DNA sequence that accomplishes site isolation, as changes in one site may affect the intrinsic binding to an adjacent site. Consequently, a point mutation that had previously been found to render binding site 2 nearly nonfunctional in in vivo assays (ref 8) was included in the study (template 2, Fig below). Unfortunately, it became clear that this point mutation, while lowering affinity for the mutant site, did not reduce affinity enough to allow the mutant site to be ignored in the data analysis.
In essence, then, including the mutant site in the analysis created an additional parameter for intrinsic binding to the new site, rather than placing constraints on the acceptable values for parameter estimates from the experiment with wild type DNA (here we are assuming that the point mutation disturbed only the intrinsic binding to the mutant site, and had no effect on cooperativity or on intrinsic binding to the adjacent, nonmutant site). A 5 bp insertion between sites was introduced in DNA templates 1 and 2 to disrupt cooperative interactions, effectively separating the tandem sites from one another (templates 3 and 4, Fig below). Since a 5 bp insertion may affect the intrinsic affinity of adjacent sites for DctD, site 1 was also deleted to provide an independent data set for estimating intrinsic binding to site 2 (template 5).
-160 Site 1 -120 site 2 -90
| --- > < --- | -------> <------- |
ATGCGAACATGGTGCATGTTTTCGCCCAGGACGCCAGCACTTCTGTGCGGAAATCCGCACATATCCACGAA... 1 (sites 1 and 2)
........................................................C................. 2 (site 1, site 3)
.....................................|.................................... 3 (site 1, site 2)
GTAAC
.....................................|..................C................. 4 (site 1, site 3)
GTAAC
(----------------------------------)...................................... 5 (site 2)
Legend: The first DNA template contains the wild type configuration in which the centers
of sites 1 and 2 are separated by three turns of the helix; the second template contains a
point mutation in site 2, modeled as site 3; the third template contains a 5 bp insertion
between wild type sites 1 and 2; template four contains the same 5 bp insertion in template
2, between site 1 and mutant site 3; and site 1 has been deleted from template five. All
templates were located in the center of an EcoR1-BamH1 restriction fragment of ~210 bp.
In summary, the 5 DNA templates can be modeled as a subset of the more general 3 site system (site 1, site 2, site 2* = site 3). In this case, states 6 and 7 of the three site system (Table 1) simply do not exist, because sites 2 and 2* cannot coexist. Note that it is being assumed that the 5 bp insertions in templates 3 and 4 eliminate cooperative interactions but do not disturb intrinsic binding constants. Likewise, deleting site 1 in template 5 is presumed not to affect the intrinsic affinity of site 2 for DctD, and it is assumed that the point mutation creating site 2* does not affect the intrinsic affinity of adjacent site 1. In general, these assumptions need not be true, but the results are consistent with them being true in this case (see below). Finally, cooperativity may or may not be affected by the point mutation in site 2. Both situations were modeled by creating two versions of the NONLIN program, one assigning dG13 to equal dG12 and the other providing for both parameters to be estimated from the combined data set.
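To illustrate how intrinsic and cooperative free energies enter such a model, here is a minimal Python sketch of fractional occupancy for a two-site template, built from a standard statistical-thermodynamic partition function. It is not the NONLIN source code. It assumes that dG1 and dG2 are intrinsic binding free energies in kcal/mol, that dG12 is the extra free energy of the doubly liganded state, and that T = 298.15 K (the temperature is not stated in the text). The function name two_site_occupancy is hypothetical.

```python
import math

R = 1.987e-3   # gas constant, kcal/(mol*K)
T = 298.15     # assumed temperature in K; not given in the text

def two_site_occupancy(x, dG1, dG2, dG12=0.0):
    """Return (Y1, Y2), the fractional occupancy of each site.

    x is the free ligand (DctD dimer) concentration in molar units.
    dG12 = 0 describes independent sites (no cooperativity).
    """
    k1 = math.exp(-dG1 / (R * T))     # intrinsic association constant, site 1
    k2 = math.exp(-dG2 / (R * T))     # intrinsic association constant, site 2
    k12 = math.exp(-dG12 / (R * T))   # cooperativity factor
    w1 = k1 * x                       # statistical weight: only site 1 bound
    w2 = k2 * x                       # statistical weight: only site 2 bound
    w12 = k1 * k2 * k12 * x * x       # statistical weight: both sites bound
    z = 1.0 + w1 + w2 + w12           # partition function
    return (w1 + w12) / z, (w2 + w12) / z

# Compare independent and cooperative binding at one concentration,
# using the same magnitudes as the initial guesses (-9 and -3).
x = 1.0e-8
print(two_site_occupancy(x, -9.0, -9.0))
print(two_site_occupancy(x, -9.0, -9.0, dG12=-3.0))
```

With dG12 = 0 the expression for each site reduces to the single-site isotherm, which is the behaviour assumed for the templates carrying the 5 bp insertion.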
The sites were placed in the middle of a 210 bp EcoR1-BamH1 restriction fragment. As described in Current Protocols Chapter 12 section 4, data were obtained showing protection of each DNA fragment from DNAse I digestion by Rhizobium meliloti DctD, present at concentrations ranging from µM to pM of presumed dimer. A BetaScope 603 Blot Analyzer was used to collect the protection data, which were converted to normalized fractional protection values using the formula:
All five of the footprint experiments were repeated, so there was a total of 18 site titrations (templates 1-4 have 2 sites each, while template 5 has only 1 site). An ASCII text file containing this data, for all site titrations, was then created with 5 columns for each data point for simultaneous analysis. The file contains 392 lines of data, roughly 22 per titration of a given site.
The number in column 5 is not simply decimal 1 to 5 identifying the DNA templates as illustrated above, but rather a decimal number that encodes in binary form the site structure of a given DNA template (see below). These latter two numbers are used by the program to determine which equation should be used to model fractional occupancy, and to identify the limits used to convert the observed fractional protection (raw data) to observed fractional occupancy for each site titration. Incorrect determination of the numbers in columns 4 and 5, which reflect the actual data set and model being tested, is a likely source of error that will cause NONLIN to malfunction.
A single 8-bit byte can be represented by a string of eight "1" or "0" characters. By using a "1" to indicate presence and "0" to indicate absence, the presence or absence of sites 1 to 3 can be indicated by placing the correct character in places 1 to 3 of the byte, counting from the right. Thus, the decimal number 3, representing the byte 00000011, stands for a DNA template in which sites 1 and 2 are present; likewise, the decimal number 2 represents the byte 00000010, indicating the presence of only site 2. Given the assumptions that have been made, data for experiments with templates 3 and 4 can be combined with the rest by treating the individual site titrations in templates 3 and 4 as if the DNA lacked the other site. For example, for site 1, template 4 can be identified by "00000001" (decimal 1), and for site 3, by "00000100" (decimal 4). Table 2 illustrates all of the template assignments for column 5 in the data file.
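The encoding just described can be sketched in a few lines of Python. This is only an illustration of the bit arithmetic; the function names template_code and sites_in are hypothetical and are not part of NONLIN.

```python
def template_code(sites):
    """Encode a collection of site numbers (1 to 3) as the column-5 decimal value."""
    code = 0
    for site in sites:
        if site not in (1, 2, 3):
            raise ValueError("site must be 1, 2 or 3")
        code |= 1 << (site - 1)   # site 1 -> bit 0, site 2 -> bit 1, site 3 -> bit 2
    return code

def sites_in(code):
    """Decode a column-5 value back into the list of sites present."""
    return [site for site in (1, 2, 3) if code & (1 << (site - 1))]

print(template_code({1, 2}))              # 3  (template 1: sites 1 and 2)
print(template_code({1, 3}))              # 5  (template 2: site 1 and mutant site 3)
print(format(template_code({3}), "08b"))  # 00000100
print(sites_in(5))                        # [1, 3]
```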

This number indicates two things.
This latter information is required for analyzing the raw data, since fractional protection (P, the raw data) is linearly related to fractional occupancy (Y, that which is modeled): Pobs = (U-L)*Yobs + L. This equation rearranges to Yobs = (Pobs-L)/(U-L); only by estimating U and L can Pobs be used to calculate Yobs, which can then be compared with Y predicted by the model. Indeed, it is necessary to use a wide range of protein concentrations in quantitative DNAse I footprints specifically to provide accurate estimates of U and L. It is also by knowing the values of U and L for each titration that the raw data can be recast as Yobs for plotting vs [Ligand] to illustrate the isotherms normally published. In the DctD titrations of the DNA templates being considered, L and U varied from about -0.1 to 0.1 and 0.6 to 0.9, respectively.
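As a small worked illustration of the rearranged equation, the following Python sketch converts observed protection values to observed fractional occupancy once L and U are known for a titration. In NONLIN the limits are fitted parameters; here they are simply supplied by the caller, and the function name and example numbers are hypothetical.

```python
def protection_to_occupancy(p_obs, lower, upper):
    """Apply Yobs = (Pobs - L) / (U - L) to every point of one titration."""
    if upper == lower:
        raise ValueError("U and L must differ")
    return [(p - lower) / (upper - lower) for p in p_obs]

# Made-up protection values for one site, with L = 0.0 and U = 0.8
print(protection_to_occupancy([0.0, 0.4, 0.8], lower=0.0, upper=0.8))
```

A point with Pobs equal to L maps to an occupancy of 0, and a point with Pobs equal to U maps to 1, so noisy points outside the limits give values slightly below 0 or above 1.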
The number in column 4 must thus be unique to identify each titration, and yet refer to the common sites 1-3. This may be accomplished in several ways; the example here is to use decimal numbers 1, 2, and 3 to refer to sites 1, 2, or 3 for the first experiment (titrations 1 and 2); for the second experiment, we will use decimal numbers 4, 5, or 6 to refer to sites 1, 2, and 3; and so on, so that the titrations of the 10th experiment will be described as 28, 29, and 30 to refer to sites 1, 2, or 3.
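The numbering scheme in this example amounts to simple integer arithmetic, sketched below in Python. The helper names are hypothetical; any other scheme that keeps each titration unique while still identifying the site would serve equally well.

```python
def titration_id(experiment, site):
    """Column-4 value for a given experiment (1, 2, ...) and site (1 to 3)."""
    if site not in (1, 2, 3):
        raise ValueError("site must be 1, 2 or 3")
    return 3 * (experiment - 1) + site

def decode_titration_id(tid):
    """Recover (experiment, site) from a column-4 value."""
    experiment0, site0 = divmod(tid - 1, 3)
    return experiment0 + 1, site0 + 1

print(titration_id(1, 1))        # 1
print(titration_id(10, 3))       # 30
print(decode_titration_id(13))   # (5, 1)
print(decode_titration_id(29))   # (10, 2)
```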
Recalling that column 1 is for site protection P, 2 for weighting factors, 3 for [DctD2], 4 for site and titration identification, and 5 for template identification, we have the following ASCII text data file (all data points are shown for the first titration, while only the first and last of subsequent ones; the entire data file is available):
7.440E-01 1.000E+00 3.900E-06 1.000E+00 3.000E+00 7.780E-01 1.000E+00 1.950E-06 1.000E+00 3.000E+00 8.040E-01 1.000E+00 7.800E-07 1.000E+00 3.000E+00 8.190E-01 1.000E+00 3.900E-07 1.000E+00 3.000E+00 7.810E-01 1.000E+00 1.950E-07 1.000E+00 3.000E+00 6.890E-01 1.000E+00 7.800E-08 1.000E+00 3.000E+00 7.980E-01 1.000E+00 3.900E-08 1.000E+00 3.000E+00 7.210E-01 1.000E+00 1.950E-08 1.000E+00 3.000E+00 6.670E-01 1.000E+00 7.800E-09 1.000E+00 3.000E+00 4.230E-01 1.000E+00 3.900E-09 1.000E+00 3.000E+00 3.060E-01 1.000E+00 1.950E-09 1.000E+00 3.000E+00 9.700E-02 1.000E+00 7.800E-10 1.000E+00 3.000E+00 -1.100E-02 1.000E+00 3.900E-10 1.000E+00 3.000E+00 -4.000E-02 1.000E+00 1.950E-10 1.000E+00 3.000E+00 -1.100E-01 1.000E+00 7.800E-11 1.000E+00 3.000E+00 -4.900E-02 1.000E+00 3.900E-11 1.000E+00 3.000E+00 3.200E-02 1.000E+00 1.950E-11 1.000E+00 3.000E+00 -4.700E-02 1.000E+00 7.800E-12 1.000E+00 3.000E+00 -1.210E-01 1.000E+00 3.900E-12 1.000E+00 3.000E+00 -3.700E-02 1.000E+00 1.950E-12 1.000E+00 3.000E+00 9.100E-02 1.000E+00 7.800E-13 1.000E+00 3.000E+00 4.400E-02 1.000E+00 3.900E-13 1.000E+00 3.000E+00 7.410E-01 1.000E+00 3.900E-06 2.000E+00 3.000E+00 ... -2.900E-02 1.000E+00 3.900E-13 2.000E+00 3.000E+00 8.540E-01 1.000E+00 3.900E-06 4.000E+00 3.000E+00 ... -7.800E-02 1.000E+00 3.900E-13 4.000E+00 3.000E+00 7.830E-01 1.000E+00 3.900E-06 5.000E+00 3.000E+00 ... 1.800E-02 1.000E+00 3.900E-13 5.000E+00 3.000E+00 5.810E-01 1.000E+00 3.900E-06 7.000E+00 1.000E+00 ... -1.100E-02 1.000E+00 1.950E-12 7.000E+00 1.000E+00 6.300E-01 1.000E+00 3.900E-06 8.000E+00 2.000E+00 ... 1.300E-02 1.000E+00 1.950E-12 8.000E+00 2.000E+00 8.150E-01 1.000E+00 3.900E-06 1.000E+01 1.000E+00 ... 2.800E-02 1.000E+00 3.900E-13 1.000E+01 1.000E+00 8.000E-01 1.000E+00 3.900E-06 1.100E+01 2.000E+00 ... 5.000E-03 1.000E+00 3.900E-13 1.100E+01 2.000E+00 7.800E-01 1.000E+00 3.900E-06 1.300E+01 5.000E+00 ... 2.200E-02 1.000E+00 3.900E-13 1.300E+01 5.000E+00 7.340E-01 1.000E+00 3.900E-06 1.500E+01 5.000E+00 ... 
-1.000E-02 1.000E+00 3.900E-13 1.500E+01 5.000E+00 8.020E-01 1.000E+00 3.900E-06 1.600E+01 5.000E+00 ... -4.600E-02 1.000E+00 3.900E-13 1.600E+01 5.000E+00 8.240E-01 1.000E+00 3.900E-06 1.800E+01 5.000E+00 ... 4.000E-03 1.000E+00 3.900E-13 1.800E+01 5.000E+00 8.660E-01 1.000E+00 3.900E-06 1.900E+01 1.000E+00 ... -1.800E-02 1.000E+00 3.900E-13 1.900E+01 1.000E+00 8.810E-01 1.000E+00 3.900E-06 2.100E+01 4.000E+00 ... 5.000E-03 1.000E+00 3.900E-13 2.100E+01 4.000E+00 8.430E-01 1.000E+00 3.900E-06 2.200E+01 1.000E+00 ... 6.000E-02 1.000E+00 3.900E-13 2.200E+01 1.000E+00 8.490E-01 1.000E+00 3.900E-06 2.400E+01 4.000E+00 ... -4.000E-03 1.000E+00 3.900E-13 2.400E+01 4.000E+00 8.300E-01 1.000E+00 3.900E-06 2.600E+01 2.000E+00 ... 7.200E-02 1.000E+00 3.900E-13 2.600E+01 2.000E+00 8.530E-01 1.000E+00 3.900E-06 3.000E+01 2.000E+00 ... -2.800E-02 1.000E+00 3.900E-13 3.000E+01 2.000E+00
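Before handing such a file to NONLIN, it can be useful to check that every record has five values and to see how many points each titration contributes. The Python sketch below does this for a short excerpt of the data above. It is a hypothetical helper, not part of NONLIN; it reads whitespace-separated tokens, skips the "..." elision markers used in this listing, and groups records by the column-4 titration number.

```python
from collections import defaultdict

# Three records copied from the listing above (columns: P, weight, [DctD2],
# titration id, template code).
SAMPLE = """\
7.440E-01 1.000E+00 3.900E-06 1.000E+00 3.000E+00
4.400E-02 1.000E+00 3.900E-13 1.000E+00 3.000E+00
7.410E-01 1.000E+00 3.900E-06 2.000E+00 3.000E+00
"""

def parse_records(text):
    """Return a list of (P, weight, conc, titration_id, template_code) tuples."""
    tokens = [tok for tok in text.split() if tok != "..."]
    if len(tokens) % 5 != 0:
        raise ValueError("expected 5 values per record")
    records = []
    for i in range(0, len(tokens), 5):
        p, weight, conc, tid, template = (float(tok) for tok in tokens[i:i + 5])
        records.append((p, weight, conc, int(tid), int(template)))
    return records

def group_by_titration(records):
    """Map each column-4 titration id to its list of records."""
    groups = defaultdict(list)
    for record in records:
        groups[record[3]].append(record)
    return dict(groups)

records = parse_records(SAMPLE)
for tid, rows in sorted(group_by_titration(records).items()):
    print(tid, len(rows), "points, template code", rows[0][4])
```

To check a real file, the same functions can be applied to its contents, for example parse_records(open("dctAuas.dat").read()).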
typed input comment
dctAuas.dat name of the input file containing the normalized, DNAse1 protection data for all titrations
dctAuas.nln name of the output file, for NONLIN to write the results to
-9 initial guess for dG1 - required for beginning the nonlinear regression process
-9 initial guess for dG2
-9 initial guess for dG3
-3 initial guess for dG12
-3 initial guess for dG13
-3 initial guess for dG23 - here not relevant, any value is ok
-3 initial guess for dG123 - not relevant
-1 Kd for monomer / dimer equilibrium; -1 flags NONLIN to omit such calculations
0 initial guess for lower limit L on titration identified as 1
1 initial guess for upper limit U on titration identified as 1
0 initial guess for lower limit L on titration identified as 2
1 initial guess for upper limit U on titration identified as 2
0 initial guess for lower limit L on titration identified as 4
1 initial guess for upper limit U on titration identified as 4
0 initial guess for lower limit L on titration identified as 5
1 initial guess for upper limit U on titration identified as 5
0 initial guess for lower limit L on titration identified as 7
1 initial guess for upper limit U on titration identified as 7
0 initial guess for lower limit L on titration identified as 8
1 initial guess for upper limit U on titration identified as 8
0 initial guess for lower limit L on titration identified as 10
1 initial guess for upper limit U on titration identified as 10
0 initial guess for lower limit L on titration identified as 11
1 initial guess for upper limit U on titration identified as 11
0 initial guess for lower limit L on titration identified as 13
1 initial guess for upper limit U on titration identified as 13
0 initial guess for lower limit L on titration identified as 15
1 initial guess for upper limit U on titration identified as 15
0 initial guess for lower limit L on titration identified as 16
1 initial guess for upper limit U on titration identified as 16
0 initial guess for lower limit L on titration identified as 18
1 initial guess for upper limit U on titration identified as 18
0 initial guess for lower limit L on titration identified as 19
1 initial guess for upper limit U on titration identified as 19
0 initial guess for lower limit L on titration identified as 21
1 initial guess for upper limit U on titration identified as 21
0 initial guess for lower limit L on titration identified as 22
1 initial guess for upper limit U on titration identified as 22
0 initial guess for lower limit L on titration identified as 24
1 initial guess for upper limit U on titration identified as 24
0 initial guess for lower limit L on titration identified as 26
1 initial guess for upper limit U on titration identified as 26
0 initial guess for lower limit L on titration identified as 29
1 initial guess for upper limit U on titration identified as 29
fit command to begin fitting process
11111000111111111111111111111111111111111111 vector assigning parameters to be held constant (0) or fit (1)
1 command to begin error analysis process
stop command to end NONLIN session
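The fitting vector is easy to mistype, so a short Python sketch for generating it may help. It assumes, from the instructions file above, that the flags follow the order in which the guesses were entered: seven free energies, the monomer/dimer Kd, and then a lower and an upper limit for each of the 18 titrations. The function name fit_vector and the parameter list are hypothetical, and the Kd is always held constant here.

```python
FREE_ENERGIES = ["dG1", "dG2", "dG3", "dG12", "dG13", "dG23", "dG123"]

def fit_vector(fit_names, n_titrations):
    """Build the 0/1 string: 1 = fit the parameter, 0 = hold it constant."""
    flags = ["1" if name in fit_names else "0" for name in FREE_ENERGIES]
    flags.append("0")                        # Kd held constant (omitted with -1)
    flags.extend(["1"] * (2 * n_titrations)) # fit L and U for every titration
    return "".join(flags)

# Version of the program in which dG12 and dG13 are both estimated
vector = fit_vector({"dG1", "dG2", "dG3", "dG12", "dG13"}, 18)
print(vector, len(vector))

# Version in which dG13 is equated with dG12, so parameter 5 is not fit
print(fit_vector({"dG1", "dG2", "dG3", "dG12"}, 18))
```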
To use the program 3sit5eq4.exe, the fitting vector would be changed so that parameter #5 is not fit (remember, dG13 is being equated with dG12 in this version of the program, so it would be a mistake to tell the program to try to fit parameter 5; doing so will cause the program to crash). To make this change, the fitting vector in the instructions file should read:
.
.
fit
11110000111111111111111111111111111111111111
.
.
Note that the NONLIN menu allows setting various program constants. For example, the analysis presented here used a 67% confidence level, and minimization stops short of a true minimum when the change in weighted squared summed residuals is less than .0001 (frequently the case for DNAse1 data). Using the program interactively, where one types in the information needed for NONLIN to proceed, allows one to explore the available settings. Alternatively, to change the confidence level to 95% but leave the cutoff at .0001, insert the following lines after the initial guesses, but before the "fit" command, so that the instructions file looks like:
.
.
.
0
1
constants
95
.0001
NO
fit
.
.
.
If NONLIN successfully minimizes the weighted squared sum of residuals, or reaches the user-assigned cutoff (default 0.0001) for significant change, the user is then provided with an option for conducting an error analysis:
0 - no error analysis
1 - analysis using F-statistic ratio of first type
2 - analysis using F-statistic ratio of second type
3 - analysis using user defined F-statistic ratio
In the above example instructions file, option 1 was chosen, which is recommended in the NONLIN documentation.
The Results
The output of the global analysis, in which all 18 titrations were simultaneously considered, is recorded in the output file (as named in the second line of the instructions file, or second entry when run interactively). The output file contains the following elements:
For graphical presentation of the data, one can use these parameter estimates to: