{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Tutorial about setting up an analysis pipeline and batch processing"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Quite often you experiment with various analysis routines and appropriate parameters and come up with an analysis pipeline. A pipeline procedure then is a script defining analysis steps for a single locdata object (or a single group of corresponding locdatas as for instance used in 2-color measurements).\n",
    "\n",
    "The `Pipeline` class can be used to combine the pipeline code, metadata and analysis results in a single pickleable object (meaning it can be serialized by the python pickle module).\n",
    "\n",
    "This pipeline might then be applied to a number of similar datasets. A batch process is such a procedure for running a pipeline over multiple locdata objects and collecting and combing results."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "from pathlib import Path\n",
    "\n",
    "%matplotlib inline\n",
    "\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "import locan as lc"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "lc.show_versions(system=False, dependencies=False, verbose=False)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# A path in which test data can be found:\n",
    "TEST_DIR: Path = Path.cwd().parents[2] / \"tests\"\n",
    "TEST_DIR"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Apply a pipeline of different analysis routines"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Load rapidSTORM data file"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "path = TEST_DIR / 'test_data/npc_gp210.asdf'\n",
    "print(path, '\\n')\n",
    "dat = lc.load_locdata(path=path, file_type=lc.FileType.ASDF)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "dat.properties"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Set up an analysis procedure"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "First define the analysis procedure (pipeline) in form of a computation function. Make sure the first parameter is the `self` refering to the Pipeline object. Add arbitrary keyword arguments thereafter. When finishing with `return self` the compute method can easily be called with instantiation. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "def computation(self, locdata, n_localizations_min=4):\n",
    "    \n",
    "    # import required modules\n",
    "    from locan.analysis import LocalizationPrecision\n",
    "    \n",
    "    # prologue\n",
    "    self.file_indicator = locdata.meta.file.path\n",
    "    self.locdata = locdata\n",
    "    \n",
    "    # check requirements\n",
    "    if len(locdata)<=n_localizations_min:\n",
    "        return None\n",
    "    \n",
    "    # compute localization precision\n",
    "    self.lp = LocalizationPrecision().compute(self.locdata)\n",
    "    \n",
    "    return self"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Run the analysis procedure"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Instantiate a Pipeline object and run compute():"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "pipe = lc.Pipeline(computation=computation, locdata=dat, n_localizations_min=4).compute()\n",
    "pipe.meta"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Results are available from Pipeline object in form of attributes defined in the compute function:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "[attr for attr in dir(pipe) if not attr.startswith('__') and not attr.endswith('__')]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "pipe.lp.results.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "pipe.lp.hist();\n",
    "print(pipe.lp.distribution_statistics.parameter_dict())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can recover the computation procedure:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "pipe.computation_as_string()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "or save it as text protocol:"
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {},
   "source": [
    "pipe.save_computation(path)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The Pipeline object is pickleable and can thus be saved for revisits."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Apply the pipeline on multiple datasets - a batch process"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's create multiple datasets:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "path = TEST_DIR / 'test_data/npc_gp210.asdf'\n",
    "print(path, '\\n')\n",
    "dat = lc.load_locdata(path=path, file_type=lc.FileType.ASDF)\n",
    "\n",
    "locdatas = [lc.select_by_condition(dat, f'{min}<index<{max}') for min, max in ((0,100), (101,202))]\n",
    "locdatas"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Run the analysis pipeline as batch process"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "pipes = [lc.Pipeline(computation=computation, locdata=dat).compute() for dat in locdatas]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As long as the batch procedure runs in a single computer process, the identifier increases with every instantiation."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "[pipe.meta.identifier for pipe in pipes]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Visualize the combined results"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "fig, ax = plt.subplots(nrows=1, ncols=1)\n",
    "for pipe in pipes:\n",
    "    pipe.lp.plot(ax=ax, window=10)\n",
    "plt.show()"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.6"
  },
  "widgets": {
   "application/vnd.jupyter.widget-state+json": {
    "state": {},
    "version_major": 2,
    "version_minor": 0
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}