Tutorial about loading localization data from file#

from pathlib import Path

import locan as lc

lc.show_versions(system=False, dependencies=False, verbose=False)

Locan:
   version: 0.20.0.dev41+g755b969

Python:
   version: 3.11.6

Localization data is typically provided as a text or binary file whose format depends on the fitting software. Locan provides functions for loading a variety of localization file formats.

All available functions can be looked up in the API documentation.

Locan provides functions for the file types listed in the enum FileType:

list(lc.FileType._member_names_)

['UNKNOWN_FILE_TYPE',
 'CUSTOM',
 'RAPIDSTORM',
 'ELYRA',
 'THUNDERSTORM',
 'ASDF',
 'NANOIMAGER',
 'RAPIDSTORMTRACK',
 'SMLM',
 'DECODE',
 'SMAP']

Currently the following io functions are available:

[name for name in dir(lc.locan_io) if not name.startswith("__")]

['Files',
 'annotations',
 'convert_property_names',
 'convert_property_types',
 'files',
 'find_file_upstream',
 'load_Elyra_file',
 'load_Elyra_header',
 'load_Nanoimager_file',
 'load_Nanoimager_header',
 'load_SMAP_file',
 'load_SMAP_header',
 'load_SMLM_file',
 'load_SMLM_header',
 'load_SMLM_manifest',
 'load_asdf_file',
 'load_decode_file',
 'load_decode_header',
 'load_locdata',
 'load_rapidSTORM_file',
 'load_rapidSTORM_header',
 'load_rapidSTORM_track_file',
 'load_rapidSTORM_track_header',
 'load_thunderstorm_file',
 'load_thunderstorm_header',
 'load_txt_file',
 'locdata',
 'manifest_file_info_from_locdata',
 'manifest_format_from_locdata',
 'manifest_from_locdata',
 'save_SMAP_csv',
 'save_SMLM',
 'save_asdf',
 'save_thunderstorm_csv',
 'utilities']

Throughout this manual it might be helpful to use pathlib to provide path information. In all cases a string path is also usable.
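As a small illustration of the point above, the following sketch builds the same location once as a pathlib.Path and once as a plain string. It uses only the standard library; the directory and file names are placeholders and are not part of locan.

```python
import os
from pathlib import Path

# Hypothetical location; substitute a real directory and file name.
base = Path("data")
path_obj = base / "localizations.txt"  # join path segments with the / operator
path_str = os.fspath(path_obj)         # plain string form of the same path

# Both forms describe the same location ...
same_location = Path(path_str) == path_obj
# ... and the Path object gives direct access to its parts.
print(path_obj.name, path_obj.suffix, same_location)
```

Either form can then be handed to a load function through its path argument.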

Load rapidSTORM data file#

Here we identify some data in the test_data directory and build a path with pathlib (lc.ROOT_DIR is a pathlib.Path object):

path = lc.ROOT_DIR / 'tests/test_data/rapidSTORM_dstorm_data.txt'
print(path, '\n')

/home/docs/checkouts/readthedocs.org/user_builds/locan/envs/latest/lib/python3.11/site-packages/locan/tests/test_data/rapidSTORM_dstorm_data.txt 

The data is then loaded from a rapidSTORM localization file. The file header is read to provide correct property names. The number of localizations to be read can be limited with nrows.

dat = lc.load_rapidSTORM_file(path=path, nrows=10)

Print information about the data:

print('Data head:')
print(dat.data.head(), '\n')
print('Summary:')
dat.print_summary()
print('Properties:')
print(dat.properties)

Data head:
   position_x  position_y  frame  intensity  chi_square  local_background
0     9657.40     24533.5      0   33290.10   1192250.0        767.732971
1    16754.90     18770.0      0   21275.40   2106810.0        875.460999
2    14457.60     18582.6      0   20748.70    526031.0        703.369995
3     6820.58     16662.8      0    8531.77   3179190.0        852.789001
4    19183.20     22907.2      0   14139.60    448631.0        662.770020 

Summary:
identifier: "1"
comment: ""
source: EXPERIMENT
state: RAW
element_count: 10
frame_count: 1
file {
  type: RAPIDSTORM
  path: "/home/docs/checkouts/readthedocs.org/user_builds/locan/envs/latest/lib/python3.11/site-packages/locan/tests/test_data/rapidSTORM_dstorm_data.txt"
}
creation_time {
  2024-03-14T11:08:51.023763Z
}

Properties:
{'localization_count': 10, 'position_x': 16878.898, 'uncertainty_x': 2252.823069869743, 'position_y': 18209.502, 'uncertainty_y': 1569.5233876858572, 'intensity': 147591.47, 'local_background': 707.37335, 'frame': 0, 'region_measure_bb': 378484578.7175999, 'localization_density_bb': 2.6421155741358055e-08, 'subregion_measure_bb': 80399.79999999999}

Column names are exchanged with standard locan property names according to the following mapping. If no mapping is defined a warning is issued and the original column name is kept.

lc.RAPIDSTORM_KEYS

{'Position-0-0': 'position_x',
 'Position-1-0': 'position_y',
 'Position-2-0': 'position_z',
 'ImageNumber-0-0': 'frame',
 'Amplitude-0-0': 'intensity',
 'FitResidues-0-0': 'chi_square',
 'LocalBackground-0-0': 'local_background',
 'TwoKernelImprovement-0-0': 'two_kernel_improvement',
 'Position-0-0-uncertainty': 'uncertainty_x',
 'Position-1-0-uncertainty': 'uncertainty_y',
 'Position-2-0-uncertainty': 'uncertainty_z'}
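The renaming rule described above (map known column names, warn and keep unknown ones) can be sketched in plain Python. This is an illustration under stated assumptions, not locan's implementation: the function rename_columns, the warning text, and the column "MyColumn-0-0" are made up for this example, and KEYS is only an excerpt of the mapping shown above.

```python
import warnings

# Excerpt of the mapping shown above (file column name -> locan property name).
KEYS = {
    "Position-0-0": "position_x",
    "Position-1-0": "position_y",
    "ImageNumber-0-0": "frame",
    "Amplitude-0-0": "intensity",
}


def rename_columns(columns, mapping):
    """Return mapped names; warn about and keep names that have no mapping."""
    renamed = []
    for name in columns:
        if name in mapping:
            renamed.append(mapping[name])
        else:
            warnings.warn(f"Column {name} has no standard property name.", UserWarning)
            renamed.append(name)
    return renamed


# "MyColumn-0-0" is a made-up identifier used to trigger the fallback.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    new_names = rename_columns(["Position-0-0", "Amplitude-0-0", "MyColumn-0-0"], KEYS)

print(new_names)    # mapped names, with the unknown column unchanged
print(len(caught))  # number of warnings raised
```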

Load Zeiss Elyra data file#

The Elyra super-resolution microscopy system from Zeiss uses a slightly different file format. Elyra column names are replaced by locan property names when the data is loaded.

path_Elyra = lc.ROOT_DIR / 'tests/test_data/Elyra_dstorm_data.txt'
print(path_Elyra, '\n')

/home/docs/checkouts/readthedocs.org/user_builds/locan/envs/latest/lib/python3.11/site-packages/locan/tests/test_data/Elyra_dstorm_data.txt 

dat_Elyra = lc.load_Elyra_file(path=path_Elyra, nrows=10)

print('Data head:')
print(dat_Elyra.data.head(), '\n')
print('Summary:')
dat_Elyra.print_summary()
print('Properties:')
print(dat_Elyra.properties)

Data head:
   original_index  frame  frames_number  frames_missing  position_x  \
0               1      1              1               0     15850.6   
1               2      1              1               0     25617.3   
2               3      1              1               0     20155.8   
3               4      1              1               0     10776.9   
4               5      1              1               0     28966.9   

   position_y  uncertainty  intensity  local_background_sigma  chi_square  \
0     23502.1          8.6      472.0                    5.33        0.28   
1     24310.2          9.5      529.0                    4.38        0.31   
2     24039.1         13.0      306.0                    3.06        0.23   
3     10047.4         13.4      369.0                    3.98        0.25   
4      8731.6         18.1      428.0                   14.73        0.41   

   psf_half_width  channel  slice_z  
0      110.000000        1      1.0  
1      129.800003        1      1.0  
2      131.100006        1      1.0  
3      143.000000        1      1.0  
4      150.100006        1      1.0   

Summary:
identifier: "2"
comment: ""
source: EXPERIMENT
state: RAW
element_count: 10
frame_count: 1
file {
  type: ELYRA
  path: "/home/docs/checkouts/readthedocs.org/user_builds/locan/envs/latest/lib/python3.11/site-packages/locan/tests/test_data/Elyra_dstorm_data.txt"
}
creation_time {
  2024-03-14T11:08:51.060046Z
}

Properties:
{'localization_count': 10, 'position_x': 19610.811087722697, 'uncertainty_x': 2109.405021031108, 'position_y': 18319.131814537763, 'uncertainty_y': 2608.8543142720196, 'intensity': 3145.0, 'frame': 1, 'region_measure_bb': 351167887.24, 'localization_density_bb': 2.847640790447807e-08, 'subregion_measure_bb': 75072.8}

Localization data from a custom text file#

Other custom text files can be read with a function that wraps the pandas.read_table() function.

path_csv = lc.ROOT_DIR / 'tests/test_data/five_blobs.txt'
print(path_csv, '\n')

/home/docs/checkouts/readthedocs.org/user_builds/locan/envs/latest/lib/python3.11/site-packages/locan/tests/test_data/five_blobs.txt 

Here data is loaded from a comma-separated values file. Column names are read from the first line, and a warning is given if a name does not comply with locan conventions. Column names can also be provided through the columns parameter. The separator, e.g. a tab '\t', is set through sep.

dat_csv = lc.load_txt_file(path=path_csv, sep=',', columns=None, nrows=10)
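Since the loader is described as a wrapper around pandas.read_table(), it may help to see how sep, nrows, and caller-supplied column names behave in pandas itself. The sketch below calls pandas directly on made-up in-memory text; it does not show how locan forwards its own columns argument, which is not covered here.

```python
from io import StringIO

import pandas as pd

# Made-up content standing in for a comma-separated localization file.
text = "position_x,position_y,frame\n624.0,919.0,0\n611.0,873.0,0\n388.0,1015.0,0\n"

# Column names taken from the first line; only the first two data rows are read.
df_header = pd.read_table(StringIO(text), sep=",", nrows=2)

# Column names supplied by the caller; the original header line is skipped.
df_named = pd.read_table(
    StringIO(text),
    sep=",",
    names=["position_x", "position_y", "frame"],
    skiprows=1,
    nrows=2,
)

print(df_header)
print(list(df_named.columns))
```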

print('Data head:')
print(dat_csv.data.head(), '\n')
print('Summary:')
dat_csv.print_summary()
print('Properties:')
print(dat_csv.properties)

Data head:
   index  position_x  position_y  cluster_label  frame
0      0       624.0       919.0              3      0
1      1       611.0       873.0              3      0
2      2       388.0      1015.0              0      0
3      3       209.0       465.0              2      0
4      4      1001.0       851.0              4      0 

Summary:
identifier: "3"
comment: ""
source: EXPERIMENT
state: RAW
element_count: 10
frame_count: 1
file {
  type: CUSTOM
  path: "/home/docs/checkouts/readthedocs.org/user_builds/locan/envs/latest/lib/python3.11/site-packages/locan/tests/test_data/five_blobs.txt"
}
creation_time {
  2024-03-14T11:08:51.086592Z
}

Properties:
{'localization_count': 10, 'position_x': 517.5, 'uncertainty_x': 87.40648971583543, 'position_y': 815.3, 'uncertainty_y': 65.24586832385123, 'frame': 0, 'region_measure_bb': 488950.0, 'localization_density_bb': 2.0451988955925965e-05, 'subregion_measure_bb': 2878.0}

Load localization data file#

A general function for loading localization data is provided. The localization file format is selected through the file_type parameter.

path = lc.ROOT_DIR / 'tests/test_data/rapidSTORM_dstorm_data.txt'
print(path, '\n')

/home/docs/checkouts/readthedocs.org/user_builds/locan/envs/latest/lib/python3.11/site-packages/locan/tests/test_data/rapidSTORM_dstorm_data.txt 

dat = lc.load_locdata(path=path, file_type=lc.FileType.RAPIDSTORM, nrows=10)

dat.print_summary()

identifier: "4"
comment: ""
source: EXPERIMENT
state: RAW
element_count: 10
frame_count: 1
file {
  type: RAPIDSTORM
  path: "/home/docs/checkouts/readthedocs.org/user_builds/locan/envs/latest/lib/python3.11/site-packages/locan/tests/test_data/rapidSTORM_dstorm_data.txt"
}
creation_time {
  2024-03-14T11:08:51.111785Z
}

The file type can be specified with the enum class FileType; tab completion helps to pick a member.

list(lc.FileType._member_names_)

['UNKNOWN_FILE_TYPE',
 'CUSTOM',
 'RAPIDSTORM',
 'ELYRA',
 'THUNDERSTORM',
 'ASDF',
 'NANOIMAGER',
 'RAPIDSTORMTRACK',
 'SMLM',
 'DECODE',
 'SMAP']

lc.FileType.RAPIDSTORM

<FileType.RAPIDSTORM: 2>
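For readers unfamiliar with Python enums, the sketch below shows the usual ways to refer to a member. FileTypeSketch is a stand-in defined here, not locan's FileType; only the value 2 for RAPIDSTORM is taken from the output above, and the other values are assumptions that follow the order of the listed names.

```python
from enum import Enum


class FileTypeSketch(Enum):
    """Stand-in enum for illustration; not locan's FileType."""

    UNKNOWN_FILE_TYPE = 0  # assumed value
    CUSTOM = 1             # assumed value
    RAPIDSTORM = 2         # value as shown in the output above
    ELYRA = 3              # assumed value


by_attribute = FileTypeSketch.RAPIDSTORM    # attribute access, works with tab completion
by_name = FileTypeSketch["RAPIDSTORM"]      # lookup by member name
by_value = FileTypeSketch(2)                # lookup by member value

# All three expressions refer to the same member object.
print(by_attribute is by_name is by_value)
print(by_attribute.name, by_attribute.value)

# __members__ is the public way to list member names.
print(list(FileTypeSketch.__members__))
```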

Adjust data types#

By default, all load functions adjust the data types of localization properties to the following standard types:

lc.PROPERTY_KEYS

{'index': 'integer',
 'original_index': 'integer',
 'position_x': 'float',
 'position_y': 'float',
 'position_z': 'float',
 'frame': 'integer',
 'frames_number': 'integer',
 'frames_missing': 'integer',
 'time': 'float',
 'intensity': 'float',
 'local_background': 'float',
 'local_background_sigma': 'float',
 'signal_noise_ratio': 'float',
 'signal_background_ratio': 'float',
 'chi_square': 'float',
 'two_kernel_improvement': 'float',
 'psf_amplitude': 'float',
 'psf_width': 'float',
 'psf_width_x': 'float',
 'psf_width_y': 'float',
 'psf_width_z': 'float',
 'psf_half_width': 'float',
 'psf_half_width_x': 'float',
 'psf_half_width_y': 'float',
 'psf_half_width_z': 'float',
 'psf_sigma': 'float',
 'psf_sigma_x': 'float',
 'psf_sigma_y': 'float',
 'psf_sigma_z': 'float',
 'uncertainty': 'float',
 'uncertainty_x': 'float',
 'uncertainty_y': 'float',
 'uncertainty_z': 'float',
 'channel': 'integer',
 'slice_z': 'float',
 'plane': 'integer',
 'cluster_label': 'integer'}

If this is not what you want, pass convert=False.

path = lc.ROOT_DIR / 'tests/test_data/rapidSTORM_dstorm_data.txt'
print(path, '\n')

/home/docs/checkouts/readthedocs.org/user_builds/locan/envs/latest/lib/python3.11/site-packages/locan/tests/test_data/rapidSTORM_dstorm_data.txt 

locdata = lc.load_locdata(path=path, file_type=lc.FileType.RAPIDSTORM, nrows=10, convert=False)
locdata.data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 6 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   position_x        10 non-null     float64
 1   position_y        10 non-null     float64
 2   frame             10 non-null     int64  
 3   intensity         10 non-null     float64
 4   chi_square        10 non-null     float64
 5   local_background  10 non-null     float64
dtypes: float64(5), int64(1)
memory usage: 612.0 bytes

You can then adjust the types of selected localization properties:

other_types = {"frame": float}
df = lc.convert_property_types(locdata.data, types=other_types)
locdata.update(dataframe=df)
locdata.data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 6 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   position_x        10 non-null     float64
 1   position_y        10 non-null     float64
 2   frame             10 non-null     float64
 3   intensity         10 non-null     float64
 4   chi_square        10 non-null     float64
 5   local_background  10 non-null     float64
dtypes: float64(6)
memory usage: 612.0 bytes
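The effect of such a per-column conversion can be reproduced with pandas alone. The sketch below uses DataFrame.astype on a small made-up table; it illustrates the idea only and is not how lc.convert_property_types is implemented.

```python
import pandas as pd

# Made-up localization table with an integer frame column.
df = pd.DataFrame({"position_x": [9657.4, 16754.9], "frame": [0, 0]})

# Convert only the selected column; all other columns keep their dtype.
converted = df.astype({"frame": float})

print(df.dtypes["frame"], converted.dtypes["frame"])
```

astype returns a new DataFrame and leaves df unchanged, which matches the pattern above of passing the converted frame back with locdata.update.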