{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Tutorial about simulating localization data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Locan provides methods for simulating basic localization data sets as LocData objects." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from pathlib import Path\n", "import numpy as np\n", "import pandas as pd\n", "\n", "%matplotlib inline\n", "\n", "import matplotlib.pyplot as plt\n", "from mpl_toolkits.mplot3d import Axes3D\n", "\n", "import locan as lc" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "lc.show_versions(system=False, dependencies=False, verbose=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Use random number generator" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "All simulation functions generate random numbers with numpy routines: each function takes a `seed` parameter and uses it to instantiate `numpy.random.default_rng`. We therefore recommend setting up a single random number generator in every script and passing that generator instance to all simulation functions through the `seed` parameter." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "rng = np.random.default_rng(seed=1)\n", "locdatas = [lc.simulate_uniform(n_samples=100, region=((0, 1000), (0, 1000)), seed=rng) for i in range(3)]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig, ax = plt.subplots(nrows=1, ncols=1)\n", "for i, locdata in enumerate(locdatas):\n", "    locdata.data.plot.scatter(x='position_x', y='position_y', color=plt.cm.tab10(i), ax=ax, label='locdata')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For parallel computation, make sure to follow the procedure described in the numpy documentation (https://numpy.org/doc/stable/reference/random/parallel.html)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Simulate localization data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Point coordinates are distributed within a region that is specified by a `Region` instance or by interval tuples." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Simulate localization data that follows a uniform distribution" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### in 1d" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "locdata = lc.simulate_uniform(n_samples=100, region=lc.data.regions.Interval(0, 1000), seed=1)\n", "\n", "locdata.print_summary()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig, ax = plt.subplots(nrows=1, ncols=1)\n", "ax.plot(locdata.data.position_x, [1] * len(locdata), 'o', color='Blue', label='locdata')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### in 2d" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "points = ((0, 0), (0, 10), (10, 10), (10, 0))\n", "holes = [((1, 1), (1, 9), (4, 9), (4, 1))]\n", "region = lc.data.regions.Polygon(points, holes)\n", "locdata = lc.simulate_uniform(n_samples=1000, region=region, seed=1)\n", "\n", "locdata.print_summary()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig, ax = plt.subplots(nrows=1, ncols=1)\n", "locdata.data.plot.scatter(x='position_x', y='position_y', ax=ax, color='Blue', label='locdata')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Simulate localization data that follows a homogeneous (Poisson) distribution" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### in 2d" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "points = ((0, 0), (0, 10), (10, 10), (10, 0))\n", "holes = [((1, 1), (1, 9), (4, 9), (4, 1))]\n", "region = lc.data.regions.Polygon(points, holes)\n", "locdata = lc.simulate_Poisson(intensity=10, region=region, seed=1)\n", "\n", "locdata.print_summary()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, 
"outputs": [], "source": [ "fig, ax = plt.subplots(nrows=1, ncols=1)\n", "locdata.data.plot.scatter(x='position_x', y='position_y', ax=ax, color='Blue', label='locdata')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### in 3d" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "locdata = lc.simulate_Poisson(intensity=1e-4, region=((0, 100), (0, 100), (0, 100)), seed=1)\n", "\n", "locdata.print_summary()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig = plt.figure()\n", "ax = fig.add_subplot(111, projection='3d')\n", "\n", "x,y,z = locdata.coordinates.T\n", "ax.scatter(x, y, z, color='Blue', label='locdata')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Neyman-Scott distribution" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In a Neyman-Scott distribution parent events are homogeneously distributed with a certain region and each parent event brings about a number of offspring events distributed around the parent event. \n", "For a typical Neyman-Scott process, both the number of parent events and the number of offspring events for each cluster are Poisson distributed. It is important to note that parent events can be outside the support region. For correct simulation, the support region is expanded to distribute parent events and then clipped after offspring substitution." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Matern distribution" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In a Matern process offspring localizations are distributed homogeneously in circles of a given radius around the parent event." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "locdata = lc.simulate_Matern(parent_intensity=1e-3, region=((0, 100), (0, 100)), cluster_mu=100, radius=10, clip=True, seed=1)\n", "locdata_expanded = lc.simulate_Matern(parent_intensity=1e-3, region=((0, 100), (0, 100)), cluster_mu=100, radius=10, clip=False, seed=1)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig, ax = plt.subplots(nrows=1, ncols=1)\n", "locdata.data.plot.scatter(x='position_x', y='position_y', ax=ax, color='Blue', label='clipped', alpha=0.1)\n", "locdata_expanded.data.plot.scatter(x='position_x', y='position_y', ax=ax, color='Red', label='extended', alpha=0.1)\n", "ax.add_patch(locdata.region.as_artist(fill=False))\n", "ax.add_patch(locdata_expanded.region.as_artist(fill=False))\n", "ax.axis('equal')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "More variability can be achieved by specifying an array for `radius`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "locdata = lc.simulate_Matern(parent_intensity=3e-3, region=((0, 100), (0, 100)), cluster_mu=100, radius=np.linspace(1, 30, 300), clip=True, seed=1)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig, ax = plt.subplots(nrows=1, ncols=1)\n", "locdata.data.plot.scatter(x='position_x', y='position_y', ax=ax, color='Blue', label='clipped', alpha=0.1)\n", "ax.add_patch(locdata.region.as_artist(fill=False))\n", "ax.axis('equal')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Thomas distribution" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In a Thomas process, offspring localizations follow a normal distribution that is centered on the parent event and has a given standard deviation. Here the region is expanded by a distance equal to `cluster_std * expansion_factor`." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "locdata = lc.simulate_Thomas(parent_intensity=1e-3, region=((0, 100), (0, 100)), cluster_mu=100, cluster_std=3, clip=True, seed=1)\n", "locdata_expanded = lc.simulate_Thomas(parent_intensity=1e-3, region=((0, 100), (0, 100)), cluster_mu=100, cluster_std=3, clip=False, seed=1)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig, ax = plt.subplots(nrows=1, ncols=1)\n", "locdata.data.plot.scatter(x='position_x', y='position_y', ax=ax, color='Blue', label='clipped', alpha=0.1)\n", "locdata_expanded.data.plot.scatter(x='position_x', y='position_y', ax=ax, color='Red', label='extended', alpha=0.1)\n", "ax.add_patch(locdata.region.as_artist(fill=False))\n", "ax.add_patch(locdata_expanded.region.as_artist(fill=False))\n", "ax.axis('equal')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "More variability can be achieved by specifying arrays of `cluster_mu` or `cluster_std`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Cluster distribution" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you need a fixed number of samples, use `simulate_cluster` and specify arbitrary offspring distributions." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "offspring_points = [((-10, -10), (0, 10), (10, -10))] * 5\n", "\n", "locdata = lc.simulate_cluster(centers=5, region=((0, 100), (0, 100)), expansion_distance=10, offspring=offspring_points, clip=True, seed=1)\n", "locdata_expanded = lc.simulate_cluster(centers=5, region=((0, 100), (0, 100)), expansion_distance=10, offspring=offspring_points, clip=False, seed=1)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig, ax = plt.subplots(nrows=1, ncols=1)\n", "locdata.data.plot.scatter(x='position_x', y='position_y', ax=ax, color='Blue', label='clipped')\n", "locdata_expanded.data.plot.scatter(x='position_x', y='position_y', ax=ax, color='Red', label='extended')\n", "ax.add_patch(locdata.region.as_artist(fill=False))\n", "ax.add_patch(locdata_expanded.region.as_artist(fill=False))\n", "ax.axis('equal')\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def offspring_points(parent):\n", "    angles = np.linspace(0, 360, 36)\n", "    for angle in angles:\n", "        circle = lc.data.regions.Ellipse(parent, 50, 30, angle)\n", "    # only the ellipse built for the last angle is returned\n", "    return circle.points\n", "\n", "locdata = lc.simulate_cluster(centers=5, region=((0, 100), (0, 100)), expansion_distance=10, offspring=offspring_points, clip=True, seed=1)\n", "locdata_expanded = lc.simulate_cluster(centers=5, region=((0, 100), (0, 100)), expansion_distance=10, offspring=offspring_points, clip=False, seed=1)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig, ax = plt.subplots(nrows=1, ncols=1)\n", "locdata.data.plot.scatter(x='position_x', y='position_y', ax=ax, color='Blue', label='clipped', alpha=0.2)\n", "locdata_expanded.data.plot.scatter(x='position_x', y='position_y', ax=ax, color='Red', label='extended', alpha=0.2)\n", "ax.add_patch(locdata.region.as_artist(fill=False))\n", 
"ax.add_patch(locdata_expanded.region.as_artist(fill=False))\n", "ax.axis('equal')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Resample data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The resample function provides additional localizations for each given localizations that are Gauss distributed around the original localizations with a standard deviation given by the `uncertainty_x` property. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "nrg = np.random.default_rng(seed=1)\n", "n_samples = 10\n", "dat = lc.simulate_uniform(n_samples=n_samples, region=((0, 1000), (0, 1000)), seed=rng)\n", "dat.dataframe = dat.dataframe.assign(uncertainty_x= 20*rng.random(n_samples))\n", "dat.dataframe = dat.dataframe.assign(uncertainty_y= 20*rng.random(n_samples))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dat_resampled = lc.resample(dat, n_samples=1000, seed=rng)\n", "dat_resampled.data.tail()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig, ax = plt.subplots(nrows=1, ncols=1)\n", "dat_resampled.data.plot.scatter(x='position_x', y='position_y', ax=ax, color='Red', label='locdata resampled', alpha=0.01)\n", "dat.data.plot.scatter(x='position_x', y='position_y', ax=ax, color='Blue', label='locdata')\n", "plt.show()" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.6" }, "widgets": { "application/vnd.jupyter.widget-state+json": { "state": {}, "version_major": 2, "version_minor": 0 } } }, "nbformat": 4, "nbformat_minor": 4 }