Tutorial about analyzing blink statistics#
SMLM depends critically on the fluorescence intermittency or, in other words, the blinking of fluorescence dyes. To characterize blinking properties you can compute on- and off-periods from clustered localizations assuming that they originate from the same fluorophore.
from pathlib import Path
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats
import locan as lc
lc.show_versions(system=False, dependencies=False, verbose=False)
Locan:
version: 0.20.0.dev41+g755b969
Python:
version: 3.11.6
Synthetic data#
We use synthetic data that represents localizations from a single fluorophore being normal distributed in space and emitting at a constant intensity. We assume that the on- and off-times in units of frames are distributed like a geometric distribution with a mean on_period mean_on
and a mean off_period mean_off
. Typically a geometric distribution is parameterized by a variable p
with p = 1 / mean
.
rng = np.random.default_rng(seed=1)
n_samples = 10_000
mean_on = 5
mean_off = 20
on_periods = stats.geom.rvs(p=1/mean_on, size=n_samples, random_state=rng)
off_periods = stats.geom.rvs(p=1/mean_off, size=n_samples, random_state=rng)
On- and off-times are converted in a series of frame numbers at which a localization was detected.
def periods_to_frames(on_periods, off_periods):
"""
Convert on- and off-periods into a series of increasing frame values.
"""
on_frames = np.arange(np.sum(on_periods))
cumsums = np.r_[0, np.cumsum(off_periods)[:-1]]
add_on = np.repeat(cumsums, on_periods)
frames = on_frames + add_on
return frames[:len(on_periods)]
frames = periods_to_frames(on_periods, off_periods)
offspring = [rng.normal(loc=0, scale=10, size=(n_samples, 2))]
locdata = lc.simulate_cluster(centers=[(50, 50)], region=[(0, 100), (0, 100)], offspring=offspring, clip=False, shuffle=False, seed=rng)
locdata.dataframe['intensity'] = 1
locdata.dataframe['frame'] = frames
locdata = lc.LocData.from_dataframe(dataframe=locdata.data)
print('Data head:')
print(locdata.data.head(), '\n')
print('Summary:')
locdata.print_summary()
print('Properties:')
print(locdata.properties)
Jupyter environment detected. Enabling Open3D WebVisualizer.
[Open3D INFO] WebRTC GUI backend enabled.
[Open3D INFO] WebRTCWindowSystem: HTTP handshake server disabled.
Data head:
position_x position_y cluster_label intensity frame
0 50.507149 36.698297 0 1 0
1 44.539062 27.969248 0 1 1
2 56.876954 42.152826 0 1 2
3 38.878363 43.999842 0 1 3
4 61.001644 54.231150 0 1 4
Summary:
identifier: "2"
comment: ""
source: DESIGN
state: RAW
element_count: 10000
frame_count: 10000
creation_time {
2024-03-14T11:04:50.982889Z
}
Properties:
{'localization_count': 10000, 'position_x': 49.986893963647944, 'uncertainty_x': 0.09881426457266232, 'position_y': 49.84659129517861, 'uncertainty_y': 0.10007238629130723, 'intensity': 10000, 'frame': 0, 'region_measure_bb': 6466.041774468444, 'localization_density_bb': 1.5465411992056104, 'subregion_measure_bb': 322.0177777257221}
lc.render_2d(locdata, bin_size=5);
Blinking statistics#
To determine on- and off-times for the observed blink events use the analysis class BlinkStatistics
.
bs = lc.BlinkStatistics(memory=0, remove_heading_off_periods=False).compute(locdata)
bs
BlinkStatistics(memory=0, remove_heading_off_periods=False)
bs.results.keys()
dict_keys(['on_periods', 'on_periods_frame', 'off_periods', 'off_periods_frame', 'on_periods_indices'])
When plotting the histogram an exponential distribution is fitted by default.
bs.hist(data_identifier='on_periods');
bs.hist(data_identifier='off_periods');
bs.distribution_statistics
{'on_periods': _DistributionFits(analysis_class=BlinkStatistics, distribution=expon_gen, data_identifier=on_periods),
'off_periods': _DistributionFits(analysis_class=BlinkStatistics, distribution=expon_gen, data_identifier=off_periods)}
The fit results provide loc
and scale
parameter (see scipy.stats
documentation). For loc = 0
, scale
describes the mean of the distribution..
bs.distribution_statistics['on_periods'].parameter_dict()
{'on_periods_loc': 1.0, 'on_periods_scale': 3.9455984174085064}
bs.distribution_statistics['off_periods'].parameter_dict()
{'off_periods_loc': 1.0, 'off_periods_scale': 18.682830282038594}
Due to the default setting for the scaling parameter loc
the mean on_period is on_periods_scale + on_periods_loc
in agreement with our input value.
Geometric distribution#
We can compare this with a geometric distribution that is estimated from the observed mean on_period on_periods_mean
.
on_periods_mean = bs.results['on_periods'].mean()
on_periods_mean.round(2)
4.95
off_periods_mean = bs.results['off_periods'].mean()
off_periods_mean.round(2)
19.68
# test result
x = np.arange(stats.geom.ppf(0.01, 1/on_periods_mean), stats.geom.ppf(0.9999, 1/on_periods_mean))
y = stats.geom.pmf(x, 1/on_periods_mean)
fig, ax = plt.subplots()
bs.hist(data_identifier='on_periods', fit=False, label='data')
bs.distribution_statistics['on_periods'].plot(label='exponential')
ax.plot(x, y, '-go', label='geometric')
ax.set_yscale('log')
ax.legend(loc='best')
plt.show()