Tutorial about Ripley’s k function#
from pathlib import Path
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import locan as lc
lc.show_versions(system=False, dependencies=False, verbose=False)
Locan:
version: 0.20.0.dev41+g755b969
Python:
version: 3.11.6
Synthetic data#
We simulate localization data that is homogeneously Poisson distributed (also described as complete spatial randomness, csr).
rng = np.random.default_rng(seed=1)
locdata_csr = lc.simulate_Poisson(intensity=1e-3, region=((0,1000), (0,1000)), seed=rng)
print('Data head:')
print(locdata_csr.data.head(), '\n')
print('Summary:')
locdata_csr.print_summary()
print('Properties:')
print(locdata_csr.properties)
Jupyter environment detected. Enabling Open3D WebVisualizer.
[Open3D INFO] WebRTC GUI backend enabled.
[Open3D INFO] WebRTCWindowSystem: HTTP handshake server disabled.
Data head:
position_x position_y
0 144.159613 948.649447
1 311.831452 423.326449
2 827.702594 409.199136
3 549.593688 27.559113
4 753.513109 538.143313
Summary:
identifier: "1"
comment: ""
source: SIMULATION
state: RAW
element_count: 1001
frame_count: 0
creation_time {
2024-03-14T11:05:13.014271Z
}
Properties:
{'localization_count': 1001, 'position_x': 497.1480704304259, 'uncertainty_x': 8.846806436903796, 'position_y': 503.49401443075675, 'uncertainty_y': 9.309630174812341, 'region_measure_bb': 997049.9168748705, 'localization_density_bb': 0.001003961770677952, 'subregion_measure_bb': 3994.0982202393607, 'region_measure': 1000000, 'localization_density': 0.001001, 'subregion_measure': 4000}
We also simulate data that follows a Neyman-Scott distribution (blobs):
locdata_blob = lc.simulate_Thomas(parent_intensity=1e-4, region=((0, 1000), (0, 1000)), cluster_mu=100, cluster_std=5, seed=rng)
print('Data head:')
print(locdata_blob.data.head(), '\n')
print('Summary:')
locdata_blob.print_summary()
print('Properties:')
print(locdata_blob.properties)
Data head:
position_x position_y cluster_label
0 639.830857 587.034592 13
1 499.121280 3.258363 15
2 347.348453 532.577203 62
3 563.963921 711.136973 61
4 258.865522 24.737190 47
Summary:
identifier: "2"
comment: ""
source: SIMULATION
state: RAW
element_count: 9860
frame_count: 0
creation_time {
2024-03-14T11:05:13.061376Z
}
Properties:
{'localization_count': 9860, 'position_x': 486.6445632771076, 'uncertainty_x': 2.851385115124155, 'position_y': 467.1399336108736, 'uncertainty_y': 3.0359994421575136, 'region_measure_bb': 999385.4817400454, 'localization_density_bb': 0.009866062875791034, 'subregion_measure_bb': 3998.7709412537642, 'region_measure': 1000000, 'localization_density': 0.00986, 'subregion_measure': 4000}
Scatter plot#
fig, ax = plt.subplots(nrows=1, ncols=2)
locdata_csr.data.plot.scatter(x='position_x', y='position_y', ax=ax[0], color='Blue', s=1, alpha=0.1, label='locdata_csr')
locdata_blob.data.plot.scatter(x='position_x', y='position_y', ax=ax[1], color='Blue', s=1, alpha=0.1, label='locdata_blobs')
plt.tight_layout()
plt.show()
Analyze Ripley’s h function#
We have a look at the Ripley’s h function from all localizations in locdata.
The analysis class Ripley_h_function provides numerical results, and a plot of results versus radii.
rhf_csr = lc.RipleysHFunction(radii=np.linspace(0, 200, 50))
rhf_csr.compute(locdata_csr)
rhf_csr.results.head()
Ripley_h_data | |
---|---|
radius | |
0.000000 | 0.000000 |
4.081633 | 0.279930 |
8.163265 | -0.240094 |
12.244898 | -0.194560 |
16.326531 | -0.301127 |
rhf_blob = lc.RipleysHFunction(radii=np.linspace(0, 200, 50))
rhf_blob.compute(locdata_blob)
rhf_blob.results.head()
Ripley_h_data | |
---|---|
radius | |
0.000000 | 0.000000 |
4.081633 | 18.353371 |
8.163265 | 31.787540 |
12.244898 | 38.295378 |
16.326531 | 39.369222 |
The plot reflects the amount of clustering. For homogeneous distributed data it decreases towards negative values since edge effects are not taken into account.
rhf_csr.plot()
rhf_blob.plot();
The maximum of the computed H-function is provided by the attribute.
rhf_blob.Ripley_h_maximum
radius | Ripley_h_maximum | |
---|---|---|
Ripley_h_data | 16.326531 | 39.369222 |
Estimate Ripley’s h function#
We can speed up the computation of an estimated Ripley’s k function by providing a subset of the original localizations as test points.
We first take a random subset of the original localizations as test data. Here we provide 10 shuffeled data sets.
from locan.data.filter import random_subset
subsets = [lc.random_subset(locdata_blob, n_points=5, seed=rng) for i in range(10)]
We then compute the estimated Ripley’s h function’
rhf_estimate = lc.RipleysHFunction(radii=np.linspace(0, 200, 50)).compute(locdata_blob, other_locdata=subsets[0])
rhf_estimate.plot();
We can do the same for all subsets
rhf_estimates = [lc.RipleysHFunction(radii=np.linspace(0, 200, 50)).compute(locdata_blob, other_locdata=subset) for subset in subsets]
fig, ax = plt.subplots(nrows=1, ncols=1)
for estimate in rhf_estimates:
estimate.plot(ax=ax)
plt.show()
Compute Ripley’s k, l and h function#
We can compute Ripley’s k, l and h function
rkf_csr = lc.RipleysKFunction(radii=np.linspace(0, 200, 20)).compute(locdata_csr)
rlf_csr = lc.RipleysLFunction(radii=np.linspace(0, 200, 20)).compute(locdata_csr)
rhf_csr = lc.RipleysHFunction(radii=np.linspace(0, 200, 20)).compute(locdata_csr)
fig, axes = plt.subplots(nrows=1, ncols=3, figsize=(15, 5))
for estimate, ax in zip([rkf_csr, rlf_csr, rhf_csr], axes.ravel()):
estimate.plot(marker='o', ax=ax)
plt.tight_layout()
plt.show()
Estimate Ripley’s h function for 3D data#
dat_blob_3D = lc.simulate_Thomas(parent_intensity=1e-7, region=((0, 1000), (0, 1000), (0, 1000)), cluster_mu=100, cluster_std=5, seed=rng)
dat_blob_3D.print_summary()
identifier: "23"
comment: ""
source: SIMULATION
state: RAW
element_count: 9394
frame_count: 0
creation_time {
2024-03-14T11:05:22.095819Z
}
sub = lc.random_subset(dat_blob_3D, n_points=1000, seed=rng)
rhf_3D = lc.RipleysHFunction(radii=np.linspace(0, 200, 100)).compute(dat_blob_3D, other_locdata=sub)
rhf_3D.plot();