{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Tutorial about setting up an analysis pipeline and batch processing" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Quite often you experiment with various analysis routines and parameters before settling on an analysis pipeline. A pipeline procedure is then a script that defines the analysis steps for a single locdata object (or a single group of corresponding locdatas, as used for instance in 2-color measurements).\n", "\n", "The `Pipeline` class can be used to combine the pipeline code, metadata and analysis results in a single pickleable object (meaning it can be serialized by the Python `pickle` module).\n", "\n", "This pipeline might then be applied to a number of similar datasets. A batch process is a procedure for running a pipeline over multiple locdata objects and collecting and combining the results." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "from pathlib import Path\n", "\n", "%matplotlib inline\n", "\n", "import matplotlib.pyplot as plt\n", "\n", "import locan as lc" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "lc.show_versions(system=False, dependencies=False, verbose=False)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# A path in which test data can be found:\n", "TEST_DIR: Path = Path.cwd().parents[2] / \"tests\"\n", "TEST_DIR" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Apply a pipeline of different analysis routines" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Load rapidSTORM data file" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "path = TEST_DIR / 'test_data/npc_gp210.asdf'\n", "print(path, '\\n')\n", "dat = lc.load_locdata(path=path, file_type=lc.FileType.ASDF)" ] }, { "cell_type": "code", 
"execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "dat.properties" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Set up an analysis procedure" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First define the analysis procedure (pipeline) in the form of a computation function. Make sure the first parameter is `self`, which refers to the Pipeline object. Add arbitrary keyword arguments thereafter. If the function ends with `return self`, the compute method can be called directly on instantiation." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "def computation(self, locdata, n_localizations_min=4):\n", "\n", "    # import required modules\n", "    from locan.analysis import LocalizationPrecision\n", "\n", "    # prologue\n", "    self.file_indicator = locdata.meta.file.path\n", "    self.locdata = locdata\n", "\n", "    # check requirements\n", "    if len(locdata) <= n_localizations_min:\n", "        return None\n", "\n", "    # compute localization precision\n", "    self.lp = LocalizationPrecision().compute(self.locdata)\n", "\n", "    return self" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Run the analysis procedure" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Instantiate a Pipeline object and run `compute()`:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "pipe = lc.Pipeline(computation=computation, locdata=dat, n_localizations_min=4).compute()\n", "pipe.meta" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Results are available from the Pipeline object in the form of attributes defined in the computation function:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "[attr for attr in dir(pipe) if not attr.startswith('__') and not attr.endswith('__')]" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, 
"outputs": [], "source": [ "pipe.lp.results.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "pipe.lp.hist();\n", "print(pipe.lp.distribution_statistics.parameter_dict())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can recover the computation procedure:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "pipe.computation_as_string()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "or save it as a text protocol:" ] }, { "cell_type": "raw", "metadata": {}, "source": [ "pipe.save_computation(path)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The Pipeline object is pickleable and can thus be saved and revisited later." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Apply the pipeline to multiple datasets - a batch process" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's create multiple datasets:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "path = TEST_DIR / 'test_data/npc_gp210.asdf'\n", "print(path, '\\n')\n", "dat = lc.load_locdata(path=path, file_type=lc.FileType.ASDF)\n", "\n", "locdatas = [lc.select_by_condition(dat, f'{min}