Tutorial about analyzing blink statistics#

SMLM depends critically on the fluorescence intermittency or, in other words, the blinking of fluorescence dyes. To characterize blinking properties you can compute on- and off-periods from clustered localizations assuming that they originate from the same fluorophore.

from pathlib import Path

%matplotlib inline

import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats

import locan as lc

lc.show_versions(system=False, dependencies=False, verbose=False)

Locan:
   version: 0.20.0.dev41+g755b969

Python:
   version: 3.11.6

Synthetic data#

We use synthetic data that represents localizations from a single fluorophore being normal distributed in space and emitting at a constant intensity. We assume that the on- and off-times in units of frames are distributed like a geometric distribution with a mean on_period mean_on and a mean off_period mean_off. Typically a geometric distribution is parameterized by a variable p with p = 1 / mean.

rng = np.random.default_rng(seed=1)

n_samples = 10_000
mean_on = 5
mean_off = 20

on_periods = stats.geom.rvs(p=1/mean_on, size=n_samples, random_state=rng)
off_periods = stats.geom.rvs(p=1/mean_off, size=n_samples, random_state=rng)

On- and off-times are converted in a series of frame numbers at which a localization was detected.

def periods_to_frames(on_periods, off_periods):
    """
    Convert on- and off-periods into a series of increasing frame values.
    """
    on_frames = np.arange(np.sum(on_periods))
    cumsums = np.r_[0, np.cumsum(off_periods)[:-1]]
    add_on = np.repeat(cumsums, on_periods)
    frames = on_frames + add_on
    return frames[:len(on_periods)]

frames = periods_to_frames(on_periods, off_periods)

offspring = [rng.normal(loc=0, scale=10, size=(n_samples, 2))]
locdata = lc.simulate_cluster(centers=[(50, 50)], region=[(0, 100), (0, 100)], offspring=offspring, clip=False, shuffle=False, seed=rng)
locdata.dataframe['intensity'] = 1
locdata.dataframe['frame'] = frames

locdata = lc.LocData.from_dataframe(dataframe=locdata.data)

print('Data head:')
print(locdata.data.head(), '\n')
print('Summary:')
locdata.print_summary()
print('Properties:')
print(locdata.properties)

Jupyter environment detected. Enabling Open3D WebVisualizer.
[Open3D INFO] WebRTC GUI backend enabled.
[Open3D INFO] WebRTCWindowSystem: HTTP handshake server disabled.

Data head:
   position_x  position_y  cluster_label  intensity  frame
0   50.507149   36.698297              0          1      0
1   44.539062   27.969248              0          1      1
2   56.876954   42.152826              0          1      2
3   38.878363   43.999842              0          1      3
4   61.001644   54.231150              0          1      4 

Summary:
identifier: "2"
comment: ""
source: DESIGN
state: RAW
element_count: 10000
frame_count: 10000
creation_time {
  2024-03-14T11:04:50.982889Z
}

Properties:
{'localization_count': 10000, 'position_x': 49.986893963647944, 'uncertainty_x': 0.09881426457266232, 'position_y': 49.84659129517861, 'uncertainty_y': 0.10007238629130723, 'intensity': 10000, 'frame': 0, 'region_measure_bb': 6466.041774468444, 'localization_density_bb': 1.5465411992056104, 'subregion_measure_bb': 322.0177777257221}

lc.render_2d(locdata, bin_size=5);

../../_images/f8eeb74e463fa7a51ad1dc0b025ea6cf8d3d343ab1d72dcaf3361553034ecbfc.png

Blinking statistics#

To determine on- and off-times for the observed blink events use the analysis class BlinkStatistics.

bs = lc.BlinkStatistics(memory=0, remove_heading_off_periods=False).compute(locdata)
bs

BlinkStatistics(memory=0, remove_heading_off_periods=False)

bs.results.keys()

dict_keys(['on_periods', 'on_periods_frame', 'off_periods', 'off_periods_frame', 'on_periods_indices'])

When plotting the histogram an exponential distribution is fitted by default.

bs.hist(data_identifier='on_periods');

../../_images/8484c0b68355ee3dd71c31a88027424aa7bb59a242f6209e88079e88746e207a.png

bs.hist(data_identifier='off_periods');

../../_images/6e10fef20a66a196dde86faf654cd56a731d7ba04b29ceac970042372169d907.png

bs.distribution_statistics

{'on_periods': _DistributionFits(analysis_class=BlinkStatistics, distribution=expon_gen, data_identifier=on_periods),
 'off_periods': _DistributionFits(analysis_class=BlinkStatistics, distribution=expon_gen, data_identifier=off_periods)}

The fit results provide loc and scale parameter (see scipy.stats documentation). For loc = 0, scale describes the mean of the distribution..

bs.distribution_statistics['on_periods'].parameter_dict()

{'on_periods_loc': 1.0, 'on_periods_scale': 3.9455984174085064}

bs.distribution_statistics['off_periods'].parameter_dict()

{'off_periods_loc': 1.0, 'off_periods_scale': 18.682830282038594}

Due to the default setting for the scaling parameter loc the mean on_period is on_periods_scale + on_periods_loc in agreement with our input value.

Geometric distribution#

We can compare this with a geometric distribution that is estimated from the observed mean on_period on_periods_mean.

on_periods_mean = bs.results['on_periods'].mean()
on_periods_mean.round(2)

4.95

off_periods_mean = bs.results['off_periods'].mean()
off_periods_mean.round(2)

19.68

# test result
x = np.arange(stats.geom.ppf(0.01, 1/on_periods_mean), stats.geom.ppf(0.9999, 1/on_periods_mean))
y = stats.geom.pmf(x, 1/on_periods_mean)
fig, ax = plt.subplots()
bs.hist(data_identifier='on_periods', fit=False, label='data')
bs.distribution_statistics['on_periods'].plot(label='exponential')
ax.plot(x, y, '-go', label='geometric')
ax.set_yscale('log')
ax.legend(loc='best')
plt.show()

../../_images/0e832ffb2bc5f2c696b7c216c61750aa5fc1d3b112cb0fbbc286195f500691f0.png