Tutorial about the LocData class#

import numpy as np
import pandas as pd

import locan as lc
lc.show_versions(system=False, dependencies=False, verbose=False)
Locan:
   version: 0.20.0.dev41+g755b969

Python:
   version: 3.11.6

Sample data#

A localization has certain properties such as ‘position_x’. A list of localizations can be assembled into a dataframe:

df = pd.DataFrame(
    {
        'position_x': np.arange(0,10),
        'position_y': np.random.random(10),
        'frame': np.arange(0,10),
    })

Instantiate LocData from a dataframe#

A LocData object carries localization data together with metadata and aggregated properties for the whole set of localizations.

We first instantiate a LocData object from the dataframe:

locdata = lc.LocData.from_dataframe(dataframe=df)
attributes = [x for x in dir(locdata) if not x.startswith('_')]
attributes
['alpha_shape',
 'bounding_box',
 'centroid',
 'concat',
 'convex_hull',
 'coordinate_keys',
 'coordinates',
 'count',
 'data',
 'dataframe',
 'dimension',
 'from_chunks',
 'from_collection',
 'from_coordinates',
 'from_dataframe',
 'from_selection',
 'indices',
 'inertia_moments',
 'meta',
 'oriented_bounding_box',
 'print_meta',
 'print_summary',
 'projection',
 'properties',
 'reduce',
 'references',
 'region',
 'reset',
 'uncertainty_keys',
 'update',
 'update_alpha_shape',
 'update_alpha_shape_in_references',
 'update_convex_hulls_in_references',
 'update_inertia_moments_in_references',
 'update_oriented_bounding_box_in_references',
 'update_properties_in_references']

LocData attributes#

The class attribute LocData.count holds the number of LocData instances that currently exist.

print('LocData count: ', lc.LocData.count)
LocData count:  1
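
Counts printed later in this tutorial go down again after objects are deleted or rebound, which suggests a counter that follows object lifetime. A minimal sketch of such a counter in plain Python follows. The class `Counted` is hypothetical and only illustrates the idea; it is not locan's implementation.

```python
class Counted:
    """Hypothetical stand-in for a class that tracks its live instances."""

    count = 0  # class attribute shared by all instances

    def __init__(self):
        # Register the new instance on the class, not on self.
        type(self).count += 1

    def __del__(self):
        # Unregister when the instance is garbage collected.
        type(self).count -= 1


a = Counted()
b = Counted()
count_with_two = Counted.count

# In CPython the object is collected as soon as its last reference is dropped.
del b
count_after_del = Counted.count
```

Because the counter lives on the class, it is read as `Counted.count` here, in the same way the tutorial reads `lc.LocData.count`.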

The localization dataset is provided by the data attribute:

print(locdata.data.head())
   position_x  position_y  frame
0           0    0.721990      0
1           1    0.408696      1
2           2    0.684785      2
3           3    0.313493      3
4           4    0.740382      4

Aggregated properties are provided by the attribute properties. For example, the property position_x is the mean of position_x over all localizations. The name is kept because the aggregated dataset can be treated as a single localization event with its own position_x. This is used when dealing with data clusters.

locdata.properties
{'localization_count': 10,
 'position_x': 4.5,
 'uncertainty_x': 0.9574271077563381,
 'position_y': 0.551084446025565,
 'uncertainty_y': 0.08032460018240109,
 'frame': 0,
 'region_measure_bb': 6.405344503496165,
 'localization_density_bb': 1.561196278286328,
 'subregion_measure_bb': 19.423409889665816}
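
The numbers above can be reproduced with plain NumPy from the coordinates printed in this tutorial. The sketch below assumes that uncertainty_x and uncertainty_y are standard errors of the mean and that the _bb entries are the area and perimeter of the axis-aligned bounding box. Both assumptions are inferred from the printed values, not taken from locan's source.

```python
import numpy as np

# Coordinates copied from the locdata.coordinates output in this tutorial.
x = np.arange(10, dtype=float)
y = np.array([0.72198962, 0.40869622, 0.68478475, 0.31349324, 0.7403815,
              0.94204647, 0.33209912, 0.32689362, 0.23034153, 0.81011839])

n = len(x)
extent_x = x.max() - x.min()  # bounding-box width
extent_y = y.max() - y.min()  # bounding-box height

properties = {
    "localization_count": n,
    "position_x": x.mean(),
    # Assumption: uncertainty is the standard error of the mean (ddof=1).
    "uncertainty_x": x.std(ddof=1) / np.sqrt(n),
    "position_y": y.mean(),
    "uncertainty_y": y.std(ddof=1) / np.sqrt(n),
    # Assumption: region measure is the bounding-box area in 2D.
    "region_measure_bb": extent_x * extent_y,
    "localization_density_bb": n / (extent_x * extent_y),
    # Assumption: subregion measure is the bounding-box perimeter in 2D.
    "subregion_measure_bb": 2 * (extent_x + extent_y),
}
```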

Because spatial coordinates are central to most analyses, the attributes coordinate_keys and dimension report which columns hold them and how many spatial dimensions the data has:

locdata.coordinate_keys
['position_x', 'position_y']
locdata.dimension
2

A numpy array of spatial coordinates is returned by:

locdata.coordinates
array([[0.        , 0.72198962],
       [1.        , 0.40869622],
       [2.        , 0.68478475],
       [3.        , 0.31349324],
       [4.        , 0.7403815 ],
       [5.        , 0.94204647],
       [6.        , 0.33209912],
       [7.        , 0.32689362],
       [8.        , 0.23034153],
       [9.        , 0.81011839]])
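
For comparison, the same three pieces of information can be derived from a plain dataframe. This is a sketch that assumes spatial coordinates are identified by the column names position_x, position_y and position_z. The list `candidate_keys` is introduced here for illustration only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "position_x": np.arange(0, 10),
    "position_y": np.random.random(10),
    "frame": np.arange(0, 10),
})

# Hypothetical convention: spatial coordinates live in these columns.
candidate_keys = ["position_x", "position_y", "position_z"]

coordinate_keys = [key for key in candidate_keys if key in df.columns]
dimension = len(coordinate_keys)

# Array of shape (number of localizations, dimension).
coordinates = df[coordinate_keys].to_numpy()
```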

Metadata#

For detailed information see the Tutorial about metadata.

Metadata is provided by the attribute meta and can be printed with:

locdata.print_meta()
identifier: "1"
source: DESIGN
state: RAW
history {
  name: "LocData.from_dataframe"
}
element_count: 10
frame_count: 10
creation_time {
  2024-03-14T11:08:55.025282Z
}

A summary of the most important metadata fields is printed as:

locdata.print_summary()
identifier: "1"
comment: ""
source: DESIGN
state: RAW
element_count: 10
frame_count: 10
creation_time {
  2024-03-14T11:08:55.025282Z
}

Metadata fields can be printed and changed individually:

print(locdata.meta.comment)
locdata.meta.comment = 'user comment'
print(locdata.meta.comment)
user comment

LocData.meta.map represents a dictionary structure that can be filled by the user. Both key and value have to be strings; otherwise a TypeError is raised.

print(locdata.meta.map)
locdata.meta.map['user field'] = 'more information'
print(locdata.meta.map)
{}
{'user field': 'more information'}
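
The string-only restriction can be pictured with a small dictionary subclass. The class `StringMap` below is hypothetical and only mimics the behaviour described above; it is not the type locan uses for meta.map.

```python
class StringMap(dict):
    """Hypothetical dict that accepts only str keys and str values."""

    def __setitem__(self, key, value):
        if not isinstance(key, str) or not isinstance(value, str):
            raise TypeError("both key and value must be strings")
        super().__setitem__(key, value)


user_map = StringMap()
user_map["user field"] = "more information"

try:
    user_map["answer"] = 42  # the value is an int, so this is rejected
except TypeError as error:
    rejected = str(error)
else:
    rejected = None

# Converting the value to a string first makes the assignment valid.
user_map["answer"] = str(42)
```

In practice this means that numbers and other objects should be converted with str() before they are stored in the map.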

Metadata can also be added at instantiation:

locdata_2 = lc.LocData.from_dataframe(dataframe=df, meta={'identifier': 'myID_1', 
                                                   'comment': 'my own user comment'})
locdata_2.print_summary()
identifier: "myID_1"
comment: "my own user comment"
source: DESIGN
state: RAW
element_count: 10
frame_count: 10
creation_time {
  2024-03-14T11:08:55.107563Z
}

Instantiate locdata from selection#

A LocData object can also be instantiated from a selection of localizations. In this case the LocData object keeps a reference to the original locdata together with a list of indices (or a slice object). The new dataset is assembled on request of the data attribute.

Typically, a selection is derived with a dedicated selection method, so calling LocData.from_selection() directly is rarely necessary.

locdata_2 = lc.LocData.from_selection(locdata, indices=[1,2,3,4])
locdata_3 = lc.LocData.from_selection(locdata, indices=[5,6,7,8])

print('count: ', lc.LocData.count)
print('')
print(locdata_2.data)
count:  3

   position_x  position_y  frame
1           1    0.408696      1
2           2    0.684785      2
3           3    0.313493      3
4           4    0.740382      4
locdata_2.print_summary()
identifier: "3"
comment: "user comment"
source: DESIGN
state: MODIFIED
element_count: 4
frame_count: 4
creation_time {
  2024-03-14T11:08:55.025282Z
}
modification_time {
  2024-03-14T11:08:55.114205Z
}

The reference and the indices are kept as attributes of the new LocData object:

print(locdata_2.references)
print(locdata_2.indices)
<locan.data.locdata.LocData object at 0x7ff80c12add0>
[1, 2, 3, 4]

The reference is the same for both selections.

print(locdata_2.references is locdata_3.references)
True
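
The idea of keeping a reference plus indices and assembling the rows only on request can be sketched with pandas alone. The class `SelectionView` is hypothetical and greatly simplified compared to LocData.

```python
import numpy as np
import pandas as pd


class SelectionView:
    """Hypothetical lazy selection: stores a reference and indices, no copy."""

    def __init__(self, reference, indices):
        self.reference = reference  # the original dataframe, not a copy
        self.indices = indices      # row labels to pick on request

    @property
    def data(self):
        # Assemble the selected rows only when they are asked for.
        return self.reference.loc[self.indices]


original = pd.DataFrame({
    "position_x": np.arange(0, 10),
    "position_y": np.random.random(10),
    "frame": np.arange(0, 10),
})

view_a = SelectionView(original, indices=[1, 2, 3, 4])
view_b = SelectionView(original, indices=[5, 6, 7, 8])

same_reference = view_a.reference is view_b.reference
selected_rows = view_a.data
```

Both views share one underlying dataframe, so creating many selections costs little memory. The price is that the rows are looked up again each time data is requested.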

Instantiate locdata from collection#

A LocData object can further be instantiated from a collection of other LocData objects.

del(locdata_2, locdata_3)

locdata_1 = lc.LocData.from_selection(locdata, indices=[0,1,2])
locdata_2 = lc.LocData.from_selection(locdata, indices=[3,4,5])
locdata_3 = lc.LocData.from_selection(locdata, indices=[6,7,8])
locdata_c = lc.LocData.from_collection(locdatas=[locdata_1, locdata_2, locdata_3], meta={'identifier': 'my_collection'})

print('count: ', lc.LocData.count, '\n')
print(locdata_c.data, '\n')
print(locdata_c.properties, '\n')
locdata_c.print_summary()
count:  5 

   localization_count  position_x  uncertainty_x  position_y  uncertainty_y  \
0                   3         1.0        0.57735    0.605157       0.098816   
1                   3         4.0        0.57735    0.665307       0.185290   
2                   3         7.0        0.57735    0.296445       0.033086   

   frame  region_measure_bb  localization_density_bb  subregion_measure_bb  
0      0           0.626587                 4.787844              4.626587  
1      3           1.257106                 2.386433              5.257106  
2      6           0.203515                14.740915              4.203515   

{'localization_count': 3, 'position_x': 4.0, 'uncertainty_x': 1.7320508075688772, 'position_y': 0.3369779066935461, 'uncertainty_y': 0.055178550208717245, 'frame': 0, 'region_measure_bb': 2.213173912125862, 'localization_density_bb': 1.3555193216236465, 'subregion_measure_bb': 12.737724637375287} 

identifier: "my_collection"
comment: ""
source: DESIGN
state: RAW
element_count: 3
frame_count: 3
creation_time {
  2024-03-14T11:08:55.264923Z
}

In this case the references are also kept, in case the original localizations from the collected LocData objects are requested.

print(locdata_c.references)
[<locan.data.locdata.LocData object at 0x7ff7fe8295d0>, <locan.data.locdata.LocData object at 0x7ff80c10ac90>, <locan.data.locdata.LocData object at 0x7ff7fe85d490>]

If the collected LocData objects are no longer needed and should be freed for garbage collection, the references can be deleted with a dedicated LocData method:

locdata_c.reduce()
print(locdata_c.references)
None
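
The structure of the collection data above, with one row of aggregated properties per collected object, can be imitated in pandas. The sketch assumes that uncertainties are standard errors of the mean and that frame is the smallest frame of each group. Both are inferred from the printed output and not confirmed against locan's source.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "position_x": np.arange(0, 10),
    "position_y": np.random.random(10),
    "frame": np.arange(0, 10),
})

groups = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]

# One row per group, analogous to one element per collected object.
rows = []
for indices in groups:
    subset = df.loc[indices]
    rows.append({
        "localization_count": len(subset),
        "position_x": subset["position_x"].mean(),
        "uncertainty_x": subset["position_x"].sem(),  # standard error, ddof=1
        "position_y": subset["position_y"].mean(),
        "uncertainty_y": subset["position_y"].sem(),
        "frame": subset["frame"].min(),               # assumption: first frame
    })
collection = pd.DataFrame(rows)

# Aggregating the group centroids once more gives collection-level values.
collection_position_x = collection["position_x"].mean()
collection_uncertainty_x = collection["position_x"].sem()
```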

Concatenating LocData objects#

Let's create a second dataset with localization data:

del(locdata_2)

df_2 = pd.DataFrame(
    {
        'position_x': np.arange(0,10),
        'position_y': np.random.random(10),
        'frame': np.arange(0,10),
    })

locdata_2 = lc.LocData.from_dataframe(dataframe=df_2)

print('First locdata:')
print(locdata.data.head())
print('')
print('Second locdata:')
print(locdata_2.data.head())
First locdata:
   position_x  position_y  frame
0           0    0.721990      0
1           1    0.408696      1
2           2    0.684785      2
3           3    0.313493      3
4           4    0.740382      4

Second locdata:
   position_x  position_y  frame
0           0    0.920945      0
1           1    0.278350      1
2           2    0.306932      2
3           3    0.843144      3
4           4    0.623920      4

In order to combine two sets of localization data from two LocData objects into a single LocData object use the class method LocData.concat:

locdata_new = lc.LocData.concat([locdata, locdata_2])
print('Number of localizations in locdata_new: ', len(locdata_new))
locdata_new.data.head()
Number of localizations in locdata_new:  20
   position_x  position_y  frame
0           0    0.721990      0
1           1    0.408696      1
2           2    0.684785      2
3           3    0.313493      3
4           4    0.740382      4
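
At the dataframe level, combining two datasets corresponds to a row-wise concatenation. The sketch below uses pandas.concat. Whether LocData.concat resets the row index is not shown in this tutorial, so ignore_index=True is an assumption of this example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "position_x": np.arange(0, 10),
    "position_y": np.random.random(10),
    "frame": np.arange(0, 10),
})
df_2 = pd.DataFrame({
    "position_x": np.arange(0, 10),
    "position_y": np.random.random(10),
    "frame": np.arange(0, 10),
})

# Stack the rows of both frames; ignore_index=True renumbers them 0..19.
combined = pd.concat([df, df_2], ignore_index=True)
```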

Modifying data in place#

If localization data has been modified in place, i.e. the dataframe attribute has been changed, all properties and hulls must be recomputed. This is best done by instantiating a new LocData object with LocData.from_dataframe(), but it can also be done with the LocData.reset() method.

del(df, locdata)

df = pd.DataFrame(
    {
        'position_x': np.arange(0,10),
        'position_y': np.random.random(10),
        'frame': np.arange(0,10),
    })

locdata = lc.LocData.from_dataframe(dataframe=df)

print(locdata.data.head())
   position_x  position_y  frame
0           0    0.335945      0
1           1    0.437390      1
2           2    0.181114      2
3           3    0.530118      3
4           4    0.680089      4
locdata.centroid
array([4.5       , 0.51240132])

Now if localization data is changed in place (which you should not do unless you have a good reason), properties and bounding box are not automatically adjusted.

locdata.dataframe = pd.DataFrame(
    {
        'position_x': np.arange(0,8),
        'position_y': np.random.random(8),
        'frame': np.arange(0,8),
    })

print(locdata.data.head())
   position_x  position_y  frame
0           0    0.138824      0
1           1    0.606688      1
2           2    0.559792      2
3           3    0.130943      3
4           4    0.775142      4
locdata.centroid  # so this returns incorrect values here
array([4.5       , 0.51240132])

Update them by instantiating a new LocData object from the modified data:

locdata_new = lc.LocData.from_dataframe(dataframe=locdata.data)
locdata_new.centroid
array([3.5       , 0.37336717])
locdata_new.meta
identifier: "12"
source: DESIGN
state: RAW
history {
  name: "LocData.from_dataframe"
}
element_count: 8
frame_count: 8
creation_time {
  seconds: 1710414535
  nanos: 343495000
}

Alternatively you can use reset(). In this case, however, metadata is not updated and will provide wrong information.

locdata.reset()
<locan.data.locdata.LocData at 0x7ff7fe670e10>
locdata.centroid
array([3.5       , 0.37336717])
locdata.meta
identifier: "11"
source: DESIGN
state: RAW
history {
  name: "LocData.from_dataframe"
}
element_count: 10
frame_count: 10
creation_time {
  seconds: 1710414535
  nanos: 313441000
}
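
The behaviour in this section, with cached values going stale after an in-place change and a reset that refreshes properties but not metadata, can be mimicked with a small class. `CachedCentroid` is a hypothetical illustration, not locan code.

```python
import numpy as np


class CachedCentroid:
    """Hypothetical container that computes its centroid once and caches it."""

    def __init__(self, points):
        self.points = np.asarray(points, dtype=float)
        self.element_count = len(self.points)  # stands in for metadata
        self.reset()

    def reset(self):
        # Recompute the cached centroid; metadata is deliberately left alone.
        self.centroid = self.points.mean(axis=0)
        return self


container = CachedCentroid([[0.0, 0.0], [2.0, 2.0], [4.0, 4.0]])
centroid_before = container.centroid.copy()

# Replace the data in place: the cached centroid is now stale.
container.points = np.array([[0.0, 0.0], [2.0, 2.0]])
stale = container.centroid.copy()

container.reset()
refreshed = container.centroid.copy()
stale_count = container.element_count  # still 3 although only 2 points remain
```

Creating a fresh object avoids the mismatch because the constructor sets both the cached values and the metadata.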

Copy LocData#

Shallow and deep copies can be made from LocData instances. In either case the class variable count and the metadata are not just copied but adjusted accordingly.

print('count: ', lc.LocData.count)
print('')
print(locdata_2.meta)
count:  7

identifier: "9"
source: DESIGN
state: RAW
history {
  name: "LocData.from_dataframe"
}
element_count: 10
frame_count: 10
creation_time {
  seconds: 1710414535
  nanos: 289072000
}
from copy import copy, deepcopy

print('count before: ', lc.LocData.count)
locdata_copy = copy(locdata_2)
locdata_deepcopy = deepcopy(locdata_2)
print('count after: ', lc.LocData.count)
count before:  7
count after:  9
print(locdata_copy.meta)
identifier: "13"
source: DESIGN
state: MODIFIED
history {
  name: "LocData.from_dataframe"
}
history {
  name: "LocData.copy"
  parameter: "None"
}
ancestor_identifiers: "9"
element_count: 10
frame_count: 10
creation_time {
  seconds: 1710414535
  nanos: 289072000
}
modification_time {
  seconds: 1710414535
  nanos: 390941000
}
print(locdata_deepcopy.meta)
identifier: "14"
source: DESIGN
state: MODIFIED
history {
  name: "LocData.from_dataframe"
}
history {
  name: "LocData.deepcopy"
  parameter: "None"
}
ancestor_identifiers: "9"
element_count: 10
frame_count: 10
creation_time {
  seconds: 1710414535
  nanos: 289072000
}
modification_time {
  seconds: 1710414535
  nanos: 392597000
}
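
Python lets a class customize copy() and deepcopy() through the special methods `__copy__` and `__deepcopy__`. The sketch below shows how copies could receive a new identifier, an extended history and an ancestor entry, similar to the metadata printed above. The class `Tracked` and its meta dictionary are hypothetical.

```python
from copy import copy, deepcopy
from itertools import count


class Tracked:
    """Hypothetical class whose copies get fresh metadata."""

    _ids = count(1)  # source of unique identifiers

    def __init__(self, payload):
        self.payload = payload
        self.meta = {
            "identifier": str(next(self._ids)),
            "history": ["init"],
            "ancestors": [],
        }

    def _derive(self, payload, step):
        new = Tracked(payload)  # the new object gets its own identifier
        new.meta["history"] = self.meta["history"] + [step]
        new.meta["ancestors"] = [self.meta["identifier"]]
        return new

    def __copy__(self):
        # Shallow copy: the payload object is shared with the source.
        return self._derive(self.payload, "copy")

    def __deepcopy__(self, memo):
        # Deep copy: the payload is duplicated as well.
        return self._derive(deepcopy(self.payload, memo), "deepcopy")


source = Tracked(payload=[1, 2, 3])
shallow = copy(source)
deep = deepcopy(source)
```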

Adding a property#

Any property that is computed for a set of localizations (and represented as a Python dictionary) can be added to the LocData object. As an example, we compute the maximum distance between any two localizations and add max_distance as a new property to locdata.

max_distance = lc.max_distance(locdata)
max_distance
{'max_distance': 7.00140004632628}
locdata.properties.update(max_distance)
locdata.properties
{'localization_count': 8,
 'position_x': 3.5,
 'uncertainty_x': 0.8660254037844386,
 'position_y': 0.37336717472931963,
 'uncertainty_y': 0.08596222573944916,
 'frame': 0,
 'region_measure_bb': 4.509393984363259,
 'localization_density_bb': 1.7740743052704513,
 'subregion_measure_bb': 15.288398281246645,
 'region_measure_ch': 2.8511486808868756,
 'localization_density_ch': 2.8058866426816884,
 'subregion_measure_ch': 14.152580785545538,
 'max_distance': 7.00140004632628}
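
For illustration, the maximum pairwise distance can be computed by brute force with NumPy broadcasting and then merged into a properties dictionary in the same way. This is a sketch of the computation, not of how lc.max_distance is implemented. The coordinates are copied from the locdata.data output of this tutorial.

```python
import numpy as np

# Coordinates (position_x, position_y) of the eight localizations.
points = np.array([
    [0, 0.138824], [1, 0.606688], [2, 0.559792], [3, 0.130943],
    [4, 0.775142], [5, 0.180445], [6, 0.316270], [7, 0.278833],
])

# Pairwise differences via broadcasting: shape (n, n, 2).
differences = points[:, None, :] - points[None, :, :]
distances = np.sqrt((differences ** 2).sum(axis=-1))

# A property is a dictionary that maps the property name to its value.
property_dict = {"max_distance": distances.max()}

properties = {"localization_count": len(points)}
properties.update(property_dict)  # same dict.update pattern as above
```

The brute-force matrix needs memory proportional to the square of the number of localizations, so it is only suitable for small datasets.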

Adding a property to each localization in LocData.data#

In case you have processed your data and come up with a new property for each localization in the LocData object, this property can be added to data. As an example, we compute the nearest neighbor distance for each localization and add nn_distance as new property.

locdata.data
   position_x  position_y  frame
0           0    0.138824      0
1           1    0.606688      1
2           2    0.559792      2
3           3    0.130943      3
4           4    0.775142      4
5           5    0.180445      5
6           6    0.316270      6
7           7    0.278833      7
nn = lc.NearestNeighborDistances().compute(locdata)
nn.results
   nn_distance  nn_index
0     1.104037         1
1     1.001099         2
2     1.001099         1
3     1.088077         2
4     1.163471         5
5     1.009182         6
6     1.000701         7
7     1.000701         6

To add nn_distance as a new property to each localization in the LocData object, use the pandas DataFrame.assign method on locdata.dataframe.

locdata.dataframe = locdata.dataframe.assign(nn_distance=nn.results['nn_distance'])
locdata.data
   position_x  position_y  frame  nn_distance
0           0    0.138824      0     1.104037
1           1    0.606688      1     1.001099
2           2    0.559792      2     1.001099
3           3    0.130943      3     1.088077
4           4    0.775142      4     1.163471
5           5    0.180445      5     1.009182
6           6    0.316270      6     1.000701
7           7    0.278833      7     1.000701
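
The nearest-neighbor values above can be checked with a brute-force computation in NumPy. This sketch only illustrates what the two result columns mean. It makes no claim about the algorithm used by lc.NearestNeighborDistances.

```python
import numpy as np

# Coordinates (position_x, position_y) copied from the table above.
points = np.array([
    [0, 0.138824], [1, 0.606688], [2, 0.559792], [3, 0.130943],
    [4, 0.775142], [5, 0.180445], [6, 0.316270], [7, 0.278833],
])

# Full distance matrix via broadcasting.
differences = points[:, None, :] - points[None, :, :]
distances = np.sqrt((differences ** 2).sum(axis=-1))

# A point must not be reported as its own nearest neighbor.
np.fill_diagonal(distances, np.inf)

nn_index = distances.argmin(axis=1)    # row position of the nearest neighbor
nn_distance = distances.min(axis=1)    # distance to that neighbor
```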

Adding nn_distance as new property to each localization in LocData object with dataframe=None#

In case the LocData object was created with LocData.from_selection() the LocData.dataframe attribute is None and LocData.data is generated from the referenced locdata and the index list.

In this case LocData.dataframe can still be filled with additional data that is merged upon returning LocData.data.

locdata_selection = lc.LocData.from_selection(locdata, indices=[1, 3, 4, 5])
locdata_selection.data
   position_x  position_y  frame  nn_distance
1           1    0.606688      1     1.001099
3           3    0.130943      3     1.088077
4           4    0.775142      4     1.163471
5           5    0.180445      5     1.009182
locdata_selection.dataframe
nn_selection = lc.NearestNeighborDistances().compute(locdata_selection)
nn_selection.results
   nn_distance  nn_index
0     2.055805         1
1     1.189535         2
2     1.163471         3
3     1.163471         2

Make sure the indices in nn_selection.results match those in locdata_selection.data:

locdata_selection.data.index
Index([1, 3, 4, 5], dtype='int64')
nn_selection.results.index = locdata_selection.data.index
nn_selection.results
   nn_distance  nn_index
1     2.055805         1
3     1.189535         2
4     1.163471         3
5     1.163471         2

Then assign the corresponding result to dataframe:

locdata_selection.dataframe = locdata_selection.dataframe.assign(nn_distance= nn_selection.results['nn_distance'])
locdata_selection.dataframe
   nn_distance
1     2.055805
3     1.189535
4     1.163471
5     1.163471

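Calling data will return the complete dataset. Because the referenced locdata already contains a column nn_distance, the two columns appear with the suffixes _x (from the reference) and _y (from the selection).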

locdata_selection.data
   position_x  position_y  frame  nn_distance_x  nn_distance_y
1           1    0.606688      1       1.001099       2.055805
3           3    0.130943      3       1.088077       1.189535
4           4    0.775142      4       1.163471       1.163471
5           5    0.180445      5       1.009182       1.163471
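
The need to align indices, and the origin of the _x and _y suffixes, can be reproduced with pandas alone. In this sketch the frames `base` and `extra` are hypothetical stand-ins for the referenced data and the selection's own dataframe. The use of an index-based merge is an assumption suggested by the column names in the output.

```python
import pandas as pd

# Base data with an existing nn_distance column, indexed like the selection.
base = pd.DataFrame(
    {
        "position_x": [1, 3, 4, 5],
        "nn_distance": [1.001099, 1.088077, 1.163471, 1.009182],
    },
    index=[1, 3, 4, 5],
)

# Fresh results come back with a default 0..n-1 index.
results = pd.DataFrame({"nn_distance": [2.055805, 1.189535, 1.163471, 1.163471]})

# Without re-indexing, assign aligns on labels and misplaces the values:
# labels 1 and 3 pick up the wrong rows, labels 4 and 5 become NaN.
extra_misaligned = pd.DataFrame(index=base.index).assign(
    nn_distance=results["nn_distance"]
)

# Re-index first, then assign: every row receives its own value.
results.index = base.index
extra = pd.DataFrame(index=base.index).assign(nn_distance=results["nn_distance"])

# Merging on the index keeps both columns and disambiguates them by suffix.
merged = base.merge(extra, left_index=True, right_index=True)
```

To avoid the suffixes, the new column can be given a distinct name before it is assigned, for example nn_distance_selection.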