Edit or run this notebook

Fitting of equilibrium binding data

This notebook plots an equilibrium binding dataset (with error bars, for datasets containing replicates) and performs non-linear curve fitting of a model to the data.

The following reference explains very well the theory of equilibrium binding experiments, as well as many important practical considerations:

Jarmoskaite I, AlSadhan I, Vaidyanathan PP & Herschlag D (2020) How to measure and evaluate binding affinities. eLife 9: e57264 https://doi.org/10.7554/eLife.57264

Example datasets are from the following publication:

Gaullier G, Roberts G, Muthurajan UM, Bowerman S, Rudolph J, Mahadevan J, Jha A, Rae PS & Luger K (2020) Bridging of nucleosome-proximal DNA double-strand breaks by PARP2 enhances its interaction with HPF1. PLOS ONE 15: e0240932 https://doi.org/10.1371/journal.pone.0240932

The original files are freely available from https://doi.org/10.5281/zenodo.3519435 (CC-BY 4.0). They are also available reformatted for compatibility with this notebook in this repository: https://github.com/Guillawme/julia-curve-fitting/tree/main/datasets

2.7 ms

Load data

The data file must be in CSV format. The first row is assumed to contain column names. The first column is assumed to be the X values, all other columns are assumed to be replicate Y values that will be averaged (fitting will be done against the mean values, weighted by their standard deviations). In addition, there must not be any row with an X=0 value (this would result in an error when attempting to plot with a logarithmic scale). I always perform two measurements at X=0, and I always sort rows by descending X values, so this notebook automatically skips the last two rows of the CSV file; adjust accordingly if you don't measure at X=0 and want to keep all rows (see section Data processing below). The data does not need to be scaled such that Y takes values between 0 and 1: the binding models can account for arbitrary minimum and maximum Y values (see section Model functions below). Scaling data will hide differences in signal change between datasets, while these differences may tell you something about the system under study, so scaling should never be done "blindly"; always look at the raw data.

Indicate below which data files to process:

  • if the path given is not absolute, it is assumed to be relative to the notebook file (wherever the notebook is located)

  • a list of files can be provided with one file path per line and separated by commas

  • files can be located by local path or URL, but each type of location should be in the dedicated list

21.2 Î¼s
dataURLs
1.3 Î¼s
dataFiles
1.3 Î¼s

Number of rows to ignore at the end of files:

26.9 ms

Your data should appear below shortly, check that it looks normal. In addition to the columns present in your CSV file, you should see three columns named mean, std and measurement (these values will be used for fitting and plotting).

4.7 Î¼s
23.6 s

We will need to keep track of dataset names. You can edit them here if you want to use something more meaningful than the file name or URL; changes will propagate to plot legends. This list must contain the same number of elements as you have datasets: 2 in this case.

21.6 Î¼s
datasetNames
2.3 Î¼s

Visualizations

Your data and fit should appear below shortly. Take a good look at the data and fit, make sure you check the residuals. Once you're happy with it, check the numerical results.

9.5 Î¼s

Data and fit

Select binding model:

4.8 Î¼s
111 ms

Show fit line with initial parameters?

26.0 ms

For the quadratic model, indicate receptor concentration (the receptor is the binding partner kept at constant, low concentration across the titration series). Parameter R0= 5.0

138 ms
4.6 s

Residuals

3.0 Î¼s

The fit residuals should follow a random normal distribution around 0. If they show a systematic trend, it means the fit systematically deviates from your data, and therefore the model you chose might not be justified (but be careful when considering alternative models: introducing more free parameters will likely get the fit line closer to the data points and yield a lower sum of squared residuals, but this is not helpful if these additional parameters don't contribute to explaining the physical phenomenon being modeled). Another possibility is a problem with your data. The most common problems are:

  • the data does not cover the proper concentration range

  • the concentration of receptor is too high relative to the KD

In either case, your best option is to design a new experiment and collect new data.

12.4 Î¼s

Scatter plot

3.2 Î¼s
351 ms

Histogram

3.2 Î¼s
1.2 s

Numerical results

Model parameters

4.5 Î¼s
Dataset				Kd				Smin			Smax			h
dataset_003.csv		16.7 ± 0.1		62.9 ± 0.1		262.9 ± 0.2		0.9 ± 0.0
dataset_005.csv		11.8 ± 0.0		68.0 ± 0.2		279.7 ± 0.3		1.2 ± 0.0
223 ms

Sum of squared residuals

2.8 Î¼s
Dataset				Sum of squared residuals
dataset_003.csv		2161.2
dataset_005.csv		449.13
731 ms

Code

The code doing the actual work is in this section. Do not edit unless you know what you are doing.

4.7 Î¼s

Necessary packages and notebook setup

4.0 Î¼s
55.2 s
4.8 ms

Data processing

2.8 Î¼s

The commonProcessing() function computes the mean and standard deviation of replicates, defines measurements as mean ± std, and returns a DataFrame containing all the data. It is used by all methods of the following processData() function, which handle various inputs (path to a local file, URL to a remote file, loaded CSV file, loaded DataFrame).

3.9 Î¼s
commonProcessing (generic function with 1 method)
42.9 Î¼s

The processData() function loads one data file, computes the mean and standard deviation of replicates, defines measurements as mean ± std, and returns a DataFrame containing all the data.

4.3 Î¼s
processData (generic function with 1 method)
41.6 Î¼s

This functions should also work if passed an URL to a remote file:

3.4 Î¼s
processData (generic function with 2 methods)
44.5 Î¼s

This functions should also work if passed an already loaded CSV file:

3.4 Î¼s
processData (generic function with 3 methods)
23.5 Î¼s

This functions should also work if passed an already loaded data frame (for example, if the user wants to load data and pre-process it in a different way before averaging replicates):

3.0 Î¼s
processData (generic function with 4 methods)
16.1 Î¼s

Plotting

2.9 Î¼s

The initMainPlot() function initializes a plot, the plotOneDataset() function plots one dataset (call it repeatedly to plot more datasets on the same axes).

3.9 Î¼s
initMainPlot (generic function with 1 method)
32.9 Î¼s
plotOneDataset! (generic function with 3 methods)
70.8 Î¼s

The initResidualPlot() function initializes a plot, the plotOneResiduals!() function plots the fit residuals from one dataset (call it repeatedly to plot more datasets on the same axes).

4.0 Î¼s
initResidualPlot (generic function with 1 method)
31.7 Î¼s
plotOneResiduals! (generic function with 1 method)
33.5 Î¼s

The initResidualHistogram() function initializes a histogram, the plotOneResidualsHistogram!() function plots a histogram of the fit residuals from one dataset (call it repeatedly to plot more datasets on the same axes).

3.4 Î¼s
initResidualHistogram (generic function with 1 method)
32.6 Î¼s
plotOneResidualsHistogram! (generic function with 1 method)
30.5 Î¼s

Model functions

3.0 Î¼s

Model selection

3.4 Î¼s

This dictionary maps radio button options (in section Visualizations above) to their corresponding model function:

5.0 Î¼s
bindingModels
351 ms

The remaining cells in this section are only meant to check that the model selection buttons work. This first cell should return the name of the selected binding model (corresponding to the active radio button in section Visualizations above):

5.2 Î¼s
"Hill"
100 ns

This other cell should return the model function corresponding to the selected binding model (the active radio button in section Visualizations above):

4.8 Î¼s
hill (generic function with 1 method)
2.6 Î¼s

Hill model

This is the Hill equation:

S=Smin+(Smax−Smin)×LhKDh+Lh

In which S is the measured signal (Y value) at a given value of ligand concentration L (X value), Smin and Smax are the minimum and maximum values the observed signal can take, respectively, KD is the equilibrium dissociation constant and h is the Hill coefficient.

8.2 Î¼s
hill (generic function with 1 method)
38.5 Î¼s

Hyperbolic model

The hyperbolic equation is a special case of the Hill equation, in which h=1:

5.2 Î¼s
hyperbolic (generic function with 1 method)
27.4 Î¼s

Quadratic model

Unlike the Hill and hyperbolic models, the quadratic model does not make the approximation that the concentration of free ligand at equilibrium is equal to the total ligand concentration:

S=Smin+(Smax−Smin)×(KD+Rtot+Ltot)−(−KD−Rtot−Ltot)2−4×Rtot×Ltot2×Rtot

Symbols have the same meaning as in the previous equations, except here Ltot is the total concentration of ligand, not the concentration of free ligand at equilibrium. Rtot is the total concentration of receptor.

In principle, Rtot could be left as a free parameter to be determined by the fitting procedure, but in general it is known accurately enough from the experimental set up, and one should replicate the same experiment with different concentrations of receptor to check its effect on the results. Rtot should be set in the experiment to be smaller than KD, ideally, or at least of the same order of magnitude than KD. It might take a couple experiments to obtain an estimate of KD before one can determine an adequately small concentration of receptor at which to perform a definite experiment.

7.8 Î¼s
quadratic (generic function with 1 method)
34.8 Î¼s

Parameters and their initial values

2.9 Î¼s

The findInitialValues() function takes the measured data and returns an array containing initial values for the model parameters (in this order): Smin, Smax, KD and h (for the Hill model only, so the function needs to know which model was selected).

Initial values for Smin and Smax are simply taken as the minimal and maximal values found in the data. The initial estimate for KD is the concentration of the data point that has a signal closest to halfway between Smin and Smax (if the experiment was properly designed, this is a reasonable estimate and close enough to the true value for the fit to converge). The initial estimate of h is 1.0, meaning we assume no cooperativity.

5.7 Î¼s
findInitialValues (generic function with 1 method)
43.3 Î¼s

Determine initial values of the selected model's parameters from the currently loaded datasets:

3.4 Î¼s
initialParams
253 ms

Fitting

2.9 Î¼s

Perform fit of the selected model to the measurements' mean values using initial values for the model parameters determined previously. If the dataset contains replicates, the fit will be weighted by the measurements' standard deviations.

3.6 Î¼s
4.5 s

Degrees of freedom:

3.2 Î¼s
76.6 ms

Best fit parameters:

2.8 Î¼s
84.4 ms

Standard errors of best-fit parameters:

3.1 Î¼s
paramsStdErrors
474 ms
Loading...i