{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Tutorial about analyzing grouped cluster properties" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Localization properties vary within clusters. Analyzing such variations can help to characterize cluster populations. Here, we show examples for variations in convex hull properties or coordinate variances." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from pathlib import Path\n", "\n", "%matplotlib inline\n", "\n", "import numpy as np\n", "import pandas as pd\n", "import matplotlib.pyplot as plt\n", "from colorcet import m_fire, m_gray, m_dimgray\n", "\n", "import locan as lc" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "lc.show_versions(system=False, dependencies=False, verbose=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Synthetic data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We simulate localization data that follows a Neyman-Scott distribution in 2D:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "rng = np.random.default_rng(seed=1)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "locdata = lc.simulate_dstorm(parent_intensity=1e-5, region=((0, 10_000), (0, 10_000)), cluster_mu=10, cluster_std=10, seed=rng)\n", "\n", "locdata.print_summary()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "bin_range=((0, 2_000), (0, 2_000))" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "lc.render_2d_mpl(locdata, bin_size=20, bin_range=bin_range)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## True clusters" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First we look at ground truth clusters." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "grouped = locdata.data.groupby(\"cluster_label\")\n", "clust = lc.LocData.from_collection([lc.LocData.from_selection(locdata, indices=group.index) for name, group in grouped])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Filter out clusters with less than 3 localizations since on convex hull can be computed for such clusters." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "clust_selection = lc.select_by_condition(clust, condition=\"2 < localization_count\")\n", "references_ = [clust.references[i] for i in clust_selection.indices]\n", "clust_selection.reduce()\n", "clust_selection.references = references_\n", "clust = clust_selection" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "hide-input" ] }, "outputs": [], "source": [ "n_clustered_loc = np.sum([ref.properties['localization_count'] for ref in clust.references])\n", "print(f\"Number of clusters: {clust.properties['localization_count']}\")\n", "print(f\"Number of clustered localizations: {n_clustered_loc}\")\n", "print(f\"Ratio cluster to noise localizations: {n_clustered_loc / len(locdata):.3}\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig, ax = plt.subplots(nrows=1, ncols=1)\n", "lc.render_2d_mpl(locdata, bin_size=20, bin_range=bin_range, cmap=m_gray.reversed())\n", "clust.data.plot.scatter(x='position_x', y='position_y', ax=ax, color='Red', s=10, label='cluster centroids', xlim=bin_range[0], ylim=bin_range[1])\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "clust.data.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "clust.properties" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Investigate the convex hull areas" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Localization clusters can be analyzed with respect to their convex hull region properties as function of localization_count as outlined in Ebert et al. (https://doi:10.1093/bioinformatics/btac700)." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "che = lc.ConvexHullExpectation(convex_hull_property='region_measure_ch', expected_variance=10**2).compute(locdata=clust)\n", "che.results" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(12, 5))\n", "che.plot(ax=axes[0])\n", "che.hist(ax=axes[1]);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Investigate the position variances" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Localization coordinates in localization clusters come with a certain variance. The variance is related to the localization precision or other localization properties but also varies with localization_count if determined as biased sample variance (i.e. without Bessel's correction)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "pve_biased = lc.PositionVarianceExpectation(loc_property=\"position_x\", expectation=10**2, biased=True).compute(locdata=clust)\n", "pve_biased.results" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(12, 5))\n", "pve_biased.plot(ax=axes[0])\n", "pve_biased.hist(ax=axes[1]);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A similar analysis can be performed with unbiased variances in which Bessel's correction is applied." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "pve = lc.PositionVarianceExpectation(loc_property=\"position_x\", expectation=10**2, biased=False).compute(locdata=clust)\n", "pve.results" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(12, 5))\n", "pve.plot(ax=axes[0])\n", "pve.hist(ax=axes[1], log=True);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Investigate any grouped property" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A similar analysis can be carried out with any LocData property. For instance, let's check the coordinate uncertainties of cluster centroids. The uncertainty in one dimension should follow the square root of the biased position variance for clusters with variable number of localizations.\n", "\n", "It is important to consider the differences between variance and standard deviation. Position uncertainties are usually given as standard deviation with units equal to position units. Converting the ground truth for the coordinate standard deviation in each cluster (std as used in the simulations above) requires a Bessel correction on the squared std being the variance). In addition the coordinate uncertainties of cluster centroids should scale with the inverse square root of the number of localizations per cluster." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "n_locs = np.arange(1, 1000)\n", "ground_truth_std = 10\n", "ground_truth_variance = ground_truth_std**2\n", "biased_variance = ground_truth_variance * (1 - 1 / n_locs)\n", "biased_uncertainty = np.sqrt(biased_variance)\n", "expected_uncertainty = biased_uncertainty / np.sqrt(n_locs)\n", "expectation = pd.Series(data=expected_uncertainty, index=n_locs)\n", "expectation;" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "loc_property = \"uncertainty_x\"\n", "other_loc_property = \"localization_count\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "gpe = lc.GroupedPropertyExpectation(loc_property=loc_property, other_loc_property=other_loc_property, expectation=expectation).compute(locdata=clust)\n", "gpe.results" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(12, 5))\n", "gpe.plot(ax=axes[0])\n", "gpe.hist(ax=axes[1]);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Cluster localizations by dbscan" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When clustering data by dbscan slight deviations appear between expectation and computed properties." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "noise, clust = lc.cluster_dbscan(locdata, eps=20, min_samples=3)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "hide-input" ] }, "outputs": [], "source": [ "n_clustered_loc = np.sum([ref.properties['localization_count'] for ref in clust.references])\n", "print(f\"Number of clusters: {clust.properties['localization_count']}\")\n", "print(f\"Number of noise localizations: {noise.properties['localization_count']}\")\n", "print(f\"Number of clustered localizations: {n_clustered_loc}\")\n", "print(f\"Ratio cluster to noise localizations: {n_clustered_loc / (n_clustered_loc + noise.properties['localization_count']):.3}\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig, ax = plt.subplots(nrows=1, ncols=1)\n", "lc.render_2d_mpl(locdata, bin_size=20, bin_range=bin_range, cmap=m_gray.reversed())\n", "clust.data.plot.scatter(x='position_x', y='position_y', ax=ax, color='Red', s=10, label='cluster centroids', xlim=bin_range[0], ylim=bin_range[1])\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Investigate the position variances" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "pve_biased = lc.PositionVarianceExpectation(loc_property=\"position_x\", expectation=10**2, biased=True).compute(locdata=clust)\n", "pve_biased.results" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(12, 5))\n", "pve_biased.plot(ax=axes[0])\n", "pve_biased.hist(ax=axes[1]);" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "pve = lc.PositionVarianceExpectation(loc_property=\"position_x\", expectation=10**2, biased=False).compute(locdata=clust)\n", "pve.results" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(12, 5))\n", "pve.plot(ax=axes[0])\n", "pve.hist(ax=axes[1], log=True);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Investigate the convex hull areas" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "che = lc.ConvexHullExpectation(convex_hull_property='region_measure_ch', expected_variance=10**2).compute(locdata=clust)\n", "che.results" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(12, 5))\n", "che.plot(ax=axes[0])\n", "che.hist(ax=axes[1]);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Investigate the uncertainties for cluster centroids " ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "n_locs = np.arange(1, 1000)\n", "ground_truth_std = 10\n", "ground_truth_variance = ground_truth_std**2\n", "biased_variance = ground_truth_variance * (1 - 1 / n_locs)\n", "biased_uncertainty = np.sqrt(biased_variance)\n", "expected_uncertainty = biased_uncertainty / np.sqrt(n_locs)\n", "expectation = pd.Series(data=expected_uncertainty, index=n_locs)\n", "expectation;" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "loc_property = \"uncertainty_x\"\n", "other_loc_property = \"localization_count\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "gpe = lc.GroupedPropertyExpectation(loc_property=loc_property, other_loc_property=other_loc_property, expectation=expectation).compute(locdata=clust)\n", "gpe.results" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(12, 5))\n", "gpe.plot(ax=axes[0])\n", "gpe.hist(ax=axes[1]);" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.9" }, "widgets": { "application/vnd.jupyter.widget-state+json": { "state": {}, "version_major": 2, "version_minor": 0 } } }, "nbformat": 4, "nbformat_minor": 4 }