Tutorial about Ripley’s k function#

from pathlib import Path

%matplotlib inline

import numpy as np
import matplotlib.pyplot as plt

import locan as lc
lc.show_versions(system=False, dependencies=False, verbose=False)
Locan:
   version: 0.20.0.dev41+g755b969

Python:
   version: 3.11.6

Synthetic data#

We simulate localization data that is homogeneously Poisson distributed (also described as complete spatial randomness, csr).

rng = np.random.default_rng(seed=1)
locdata_csr = lc.simulate_Poisson(intensity=1e-3, region=((0,1000), (0,1000)), seed=rng)

print('Data head:')
print(locdata_csr.data.head(), '\n')
print('Summary:')
locdata_csr.print_summary()
print('Properties:')
print(locdata_csr.properties)
Jupyter environment detected. Enabling Open3D WebVisualizer.
[Open3D INFO] WebRTC GUI backend enabled.
[Open3D INFO] WebRTCWindowSystem: HTTP handshake server disabled.
Data head:
   position_x  position_y
0  144.159613  948.649447
1  311.831452  423.326449
2  827.702594  409.199136
3  549.593688   27.559113
4  753.513109  538.143313 

Summary:
identifier: "1"
comment: ""
source: SIMULATION
state: RAW
element_count: 1001
frame_count: 0
creation_time {
  2024-03-14T11:05:13.014271Z
}

Properties:
{'localization_count': 1001, 'position_x': 497.1480704304259, 'uncertainty_x': 8.846806436903796, 'position_y': 503.49401443075675, 'uncertainty_y': 9.309630174812341, 'region_measure_bb': 997049.9168748705, 'localization_density_bb': 0.001003961770677952, 'subregion_measure_bb': 3994.0982202393607, 'region_measure': 1000000, 'localization_density': 0.001001, 'subregion_measure': 4000}

We also simulate data that follows a Neyman-Scott distribution (blobs):

locdata_blob = lc.simulate_Thomas(parent_intensity=1e-4, region=((0, 1000), (0, 1000)), cluster_mu=100, cluster_std=5, seed=rng)

print('Data head:')
print(locdata_blob.data.head(), '\n')
print('Summary:')
locdata_blob.print_summary()
print('Properties:')
print(locdata_blob.properties)
Data head:
   position_x  position_y  cluster_label
0  639.830857  587.034592             13
1  499.121280    3.258363             15
2  347.348453  532.577203             62
3  563.963921  711.136973             61
4  258.865522   24.737190             47 

Summary:
identifier: "2"
comment: ""
source: SIMULATION
state: RAW
element_count: 9860
frame_count: 0
creation_time {
  2024-03-14T11:05:13.061376Z
}

Properties:
{'localization_count': 9860, 'position_x': 486.6445632771076, 'uncertainty_x': 2.851385115124155, 'position_y': 467.1399336108736, 'uncertainty_y': 3.0359994421575136, 'region_measure_bb': 999385.4817400454, 'localization_density_bb': 0.009866062875791034, 'subregion_measure_bb': 3998.7709412537642, 'region_measure': 1000000, 'localization_density': 0.00986, 'subregion_measure': 4000}

Scatter plot#

fig, ax = plt.subplots(nrows=1, ncols=2)
locdata_csr.data.plot.scatter(x='position_x', y='position_y', ax=ax[0], color='Blue', s=1, alpha=0.1, label='locdata_csr')
locdata_blob.data.plot.scatter(x='position_x', y='position_y', ax=ax[1], color='Blue', s=1, alpha=0.1, label='locdata_blobs')
plt.tight_layout()
plt.show()
../../_images/5eeea5ecb3dfae3f47218fb555a6b3dc656803e675299b23a7854eea1c9fad7d.png

Analyze Ripley’s h function#

We have a look at the Ripley’s h function from all localizations in locdata.

The analysis class Ripley_h_function provides numerical results, and a plot of results versus radii.

rhf_csr = lc.RipleysHFunction(radii=np.linspace(0, 200, 50))
rhf_csr.compute(locdata_csr)
rhf_csr.results.head()
Ripley_h_data
radius
0.000000 0.000000
4.081633 0.279930
8.163265 -0.240094
12.244898 -0.194560
16.326531 -0.301127
rhf_blob = lc.RipleysHFunction(radii=np.linspace(0, 200, 50))
rhf_blob.compute(locdata_blob)
rhf_blob.results.head()
Ripley_h_data
radius
0.000000 0.000000
4.081633 18.353371
8.163265 31.787540
12.244898 38.295378
16.326531 39.369222

The plot reflects the amount of clustering. For homogeneous distributed data it decreases towards negative values since edge effects are not taken into account.

rhf_csr.plot()
rhf_blob.plot();
../../_images/0cdbad4e90f1913d8c587489bcb3d99853c56e3aba1751399143cc7bbdd8e60e.png

The maximum of the computed H-function is provided by the attribute.

rhf_blob.Ripley_h_maximum
radius Ripley_h_maximum
Ripley_h_data 16.326531 39.369222

Estimate Ripley’s h function#

We can speed up the computation of an estimated Ripley’s k function by providing a subset of the original localizations as test points.

We first take a random subset of the original localizations as test data. Here we provide 10 shuffeled data sets.

from locan.data.filter import random_subset
subsets = [lc.random_subset(locdata_blob, n_points=5, seed=rng) for i in range(10)]

We then compute the estimated Ripley’s h function’

rhf_estimate = lc.RipleysHFunction(radii=np.linspace(0, 200, 50)).compute(locdata_blob, other_locdata=subsets[0])
rhf_estimate.plot();
../../_images/501c3d077e987917d306c0b35d44e70e6d76793adda2f4d70e62fbb393cda9cc.png

We can do the same for all subsets

rhf_estimates = [lc.RipleysHFunction(radii=np.linspace(0, 200, 50)).compute(locdata_blob, other_locdata=subset) for subset in subsets]
fig, ax = plt.subplots(nrows=1, ncols=1)
for estimate in rhf_estimates:
    estimate.plot(ax=ax)
plt.show()
../../_images/7337e9838385044788790bece146cad5ec36103f1458b3ad12592af3abddaba8.png

Compute Ripley’s k, l and h function#

We can compute Ripley’s k, l and h function

rkf_csr = lc.RipleysKFunction(radii=np.linspace(0, 200, 20)).compute(locdata_csr)
rlf_csr = lc.RipleysLFunction(radii=np.linspace(0, 200, 20)).compute(locdata_csr)
rhf_csr = lc.RipleysHFunction(radii=np.linspace(0, 200, 20)).compute(locdata_csr)
fig, axes = plt.subplots(nrows=1, ncols=3, figsize=(15, 5))
for estimate, ax in zip([rkf_csr, rlf_csr, rhf_csr], axes.ravel()):
    estimate.plot(marker='o', ax=ax)
plt.tight_layout()
plt.show()
../../_images/41b1767fe7b07a5d954f3ace684192eefa2adeb6d41ec1428dd94b7b12feb8f9.png

Estimate Ripley’s h function for 3D data#

dat_blob_3D = lc.simulate_Thomas(parent_intensity=1e-7, region=((0, 1000), (0, 1000), (0, 1000)), cluster_mu=100, cluster_std=5, seed=rng)
dat_blob_3D.print_summary()
identifier: "23"
comment: ""
source: SIMULATION
state: RAW
element_count: 9394
frame_count: 0
creation_time {
  2024-03-14T11:05:22.095819Z
}
sub = lc.random_subset(dat_blob_3D, n_points=1000, seed=rng)
rhf_3D = lc.RipleysHFunction(radii=np.linspace(0, 200, 100)).compute(dat_blob_3D, other_locdata=sub)
rhf_3D.plot();
../../_images/11e08c4037c941b0e656ea68a9e64aa41f688ee41097e306391f83285b5afd78.png