{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Tutorial about the LocData class"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "import locan as lc"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "lc.show_versions(system=False, dependencies=False, verbose=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Sample data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A localization has certain properties such as 'position_x'. A list of localizations can be assembled into a dataframe:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df = pd.DataFrame(\n",
    "    {\n",
    "        'position_x': np.arange(0,10),\n",
    "        'position_y': np.random.random(10),\n",
    "        'frame': np.arange(0,10),\n",
    "    })"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Instantiate LocData from a dataframe"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A LocData object carries localization data together with metadata and aggregated properties for the whole set of localizations.\n",
    "\n",
    "We first instantiate a LocData object from the dataframe:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "locdata = lc.LocData.from_dataframe(dataframe=df)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "attributes = [x for x in dir(locdata) if not x.startswith('_')]\n",
    "attributes"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## LocData attributes"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The class attribute Locdata.count represents the number of all current LocData instantiations."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print('LocData count: ', lc.LocData.count)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The localization dataset is provided by the data attribute:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(locdata.data.head())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Aggregated properties are provided by the attribute properties. E.g. the property `position_x` represents the mean of the `position_x` for all localizations. We keep the name, since the aggregated dataset can be treated as just a single locdata event with `position_x`. This is used when dealing with data clusters."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "locdata.properties"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Since spatial coordinates are quite important one can check on *coordinate_keys* and dimension:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "locdata.coordinate_keys"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "locdata.dimension"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A numpy array of spatial coordinates is returned by:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "locdata.coordinates"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Metadata "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For detailed information see the `Tutorial about metadata`."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Metadata is provided by the attribute meta and can be printed as"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "locdata.print_meta()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A summary of the most important metadata fields is printed as:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "locdata.print_summary()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Metadata fields can be printed and changed individually:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(locdata.meta.comment)\n",
    "locdata.meta.comment = 'user comment'\n",
    "print(locdata.meta.comment)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "LocData.meta.map represents a dictionary structure that can be filled by the user. Both key and value have to be strings, if not a TypeError is thrown."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(locdata.meta.map)\n",
    "locdata.meta.map['user field'] = 'more information'\n",
    "print(locdata.meta.map)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Metadata can also be added at Instantiation:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "locdata_2 = lc.LocData.from_dataframe(dataframe=df, meta={'identifier': 'myID_1', \n",
    "                                                   'comment': 'my own user comment'})\n",
    "locdata_2.print_summary()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Instantiate locdata from selection"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A LocData object can also be instantiated from a selection of localizations. In this case the LocData object keeps a reference to the original locdata together with a list of indices (or a slice object)). The new dataset is assembled on request of the data attribute."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "*Typically a selection is derived using a selection method such that using LocData.from_selection() is not often necessary.*"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "locdata_2 = lc.LocData.from_selection(locdata, indices=[1,2,3,4])\n",
    "locdata_3 = lc.LocData.from_selection(locdata, indices=[5,6,7,8])\n",
    "\n",
    "print('count: ', lc.LocData.count)\n",
    "print('')\n",
    "print(locdata_2.data)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "locdata_2.print_summary()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The reference is kept in a private attribute as are the indices."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(locdata_2.references)\n",
    "print(locdata_2.indices)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The reference is the same for both selections."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(locdata_2.references is locdata_3.references)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Instantiate locdata from collection"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A LocDat object can further be instantiated from a collection of other LocData objects."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "del(locdata_2, locdata_3)\n",
    "\n",
    "locdata_1 = lc.LocData.from_selection(locdata, indices=[0,1,2])\n",
    "locdata_2 = lc.LocData.from_selection(locdata, indices=[3,4,5])\n",
    "locdata_3 = lc.LocData.from_selection(locdata, indices=[6,7,8])\n",
    "locdata_c = lc.LocData.from_collection(locdatas=[locdata_1, locdata_2, locdata_3], meta={'identifier': 'my_collection'})\n",
    "\n",
    "print('count: ', lc.LocData.count, '\\n')\n",
    "print(locdata_c.data, '\\n')\n",
    "print(locdata_c.properties, '\\n')\n",
    "locdata_c.print_summary()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this case the reference are also kept in case the original localizations from the collected LocData object are requested."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(locdata_c.references)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In case the collected LocData objects are not needed anymore and should be free for garbage collection the references can be deleted by a dedicated Locdata method"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "locdata_c.reduce()\n",
    "print(locdata_c.references)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Concatenating LocData objects "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Lets have a second dataset with localization data:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "del(locdata_2)\n",
    "\n",
    "df_2 = pd.DataFrame(\n",
    "    {\n",
    "        'position_x': np.arange(0,10),\n",
    "        'position_y': np.random.random(10),\n",
    "        'frame': np.arange(0,10),\n",
    "    })\n",
    "\n",
    "locdata_2 = lc.LocData.from_dataframe(dataframe=df_2)\n",
    "\n",
    "print('First locdata:')\n",
    "print(locdata.data.head())\n",
    "print('')\n",
    "print('Second locdata:')\n",
    "print(locdata_2.data.head())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In order to combine two sets of localization data from two LocData objects into a single LocData object use the class method *LocData.concat*:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "locdata_new = lc.LocData.concat([locdata, locdata_2])\n",
    "print('Number of localizations in locdata_new: ', len(locdata_new))\n",
    "locdata_new.data.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Modifying data in place"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In case localization data has been modified in place, i.e. the dataset attribute is changed, all properties and hulls must be recomputed. This is best done by re-instantiating the LocData object using `LocData.from_dataframe()`; but it can also be done using the `LocData.reset()` function."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "del(df, locdata)\n",
    "\n",
    "df = pd.DataFrame(\n",
    "    {\n",
    "        'position_x': np.arange(0,10),\n",
    "        'position_y': np.random.random(10),\n",
    "        'frame': np.arange(0,10),\n",
    "    })\n",
    "\n",
    "locdata = lc.LocData.from_dataframe(dataframe=df)\n",
    "\n",
    "print(locdata.data.head())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "locdata.centroid"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now if localization data is changed in place (which you should not do unless you have a good reason), properties and bounding box are not automatically adjusted."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "locdata.dataframe = pd.DataFrame(\n",
    "    {\n",
    "        'position_x': np.arange(0,8),\n",
    "        'position_y': np.random.random(8),\n",
    "        'frame': np.arange(0,8),\n",
    "    })\n",
    "\n",
    "print(locdata.data.head())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "locdata.centroid  # so this returns incorrect values here"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Update them by re-instantiating a new LocData object:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "locdata_new = lc.LocData.from_dataframe(dataframe=locdata.data)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "locdata_new.centroid"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "locdata_new.meta"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Alternatively you can use `reset()`. In this case, however, metadata is not updated and will provide wrong information.  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "locdata.reset()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "locdata.centroid"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "locdata.meta"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Copy LocData"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Shallow and deep copies can be made from LocData instances. In either case the class variable count and the metadata is not just copied but adjusted accordingly."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print('count: ', lc.LocData.count)\n",
    "print('')\n",
    "print(locdata_2.meta)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from copy import copy, deepcopy\n",
    "\n",
    "print('count before: ', lc.LocData.count)\n",
    "locdata_copy = copy(locdata_2)\n",
    "locdata_deepcopy = deepcopy(locdata_2)\n",
    "print('count after: ', lc.LocData.count)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(locdata_copy.meta)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(locdata_deepcopy.meta)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Adding a property"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Any property that is created for a set of localizations (and represented as a python dictionary) can be added to the Locdata object. As an example, we compute the maximum distance between any two localizations and add that `max_distance` as new property to `locdata`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "max_distance = lc.max_distance(locdata)\n",
    "max_distance"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "locdata.properties.update(max_distance)\n",
    "locdata.properties"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Adding a property to each localization in LocData.data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In case you have processed your data and come up with a new property for each localization in the LocData object, this property can be added to data. As an example, we compute the nearest neighbor distance for each localization and add `nn_distance` as new property."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "locdata.data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "nn = lc.NearestNeighborDistances().compute(locdata)\n",
    "nn.results"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To add `nn_distance` as new property to each localization in LocData object, use the `pandas.assign` function on the `locdata.dataframe`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "locdata.dataframe = locdata.dataframe.assign(nn_distance=nn.results['nn_distance'])\n",
    "locdata.data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Adding nn_distance as new property to each localization in LocData object with dataframe=None"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In case the LocData object was created with LocData.from_selection() the LocData.dataframe attribute is None and LocData.data is generated from the referenced locdata and the index list. \n",
    "\n",
    "In this case LocData.dataframe can still be filled with additional data that is merged upon returning LocData.data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "locdata_selection = lc.LocData.from_selection(locdata, indices=[1, 3, 4, 5])\n",
    "locdata_selection.data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "locdata_selection.dataframe"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "nn_selection = lc.NearestNeighborDistances().compute(locdata_selection)\n",
    "nn_selection.results"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Make sure the indices in nn.results match those in dat_selection.data:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "locdata_selection.data.index"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "nn_selection.results.index = locdata_selection.data.index\n",
    "nn_selection.results"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Then assign the corresponding result to dataframe:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "locdata_selection.dataframe = locdata_selection.dataframe.assign(nn_distance= nn_selection.results['nn_distance'])\n",
    "locdata_selection.dataframe"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Calling `data` will return the complete dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "locdata_selection.data"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.7"
  },
  "widgets": {
   "application/vnd.jupyter.widget-state+json": {
    "state": {},
    "version_major": 2,
    "version_minor": 0
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}