Tutorial about metadata#

import numpy as np
import pandas as pd
from google.protobuf import json_format, text_format

import locan as lc
lc.show_versions(system=False, dependencies=False, verbose=False)
Locan:
   version: 0.20.0.dev41+g755b969

Python:
   version: 3.11.6

Metadata definition#

We have defined a canonical set of metadata to accompany localization data.

Metadata is described by protobuf messages. Google's protobuf format makes it possible to enforce metadata definitions that can easily be attached to various file formats, exchanged with other programs, and implemented in different programming languages.

Metadata is instantiated through messages defined in the locan.data.metadata_pb2 module.

list(lc.data.metadata_pb2.DESCRIPTOR.message_types_by_name.keys())
['Operation',
 'File',
 'Address',
 'Affiliation',
 'Person',
 'ExperimentalSample',
 'ExperimentalSetup',
 'OpticalUnit',
 'Illumination',
 'Detection',
 'Camera',
 'Acquisition',
 'Lightsheet',
 'Experiment',
 'Localizer',
 'Relation',
 'Property',
 'Metadata']

Each message class groups a logical set of information; all of them are integrated in the main class Metadata.

Metadata contains the following keys:

metadata = lc.data.metadata_pb2.Metadata()
list(metadata.DESCRIPTOR.fields_by_name.keys())
['identifier',
 'comment',
 'source',
 'state',
 'history',
 'ancestor_identifiers',
 'properties',
 'localization_properties',
 'element_count',
 'frame_count',
 'file',
 'relations',
 'experiment',
 'localizer',
 'map',
 'creation_time',
 'modification_time',
 'production_time']

Each field has a predefined type and can be set to appropriate values:

metadata.comment = "This is a comment"
try:
    metadata.comment = 1
except Exception as e:
    print(e)
bad argument type for built-in operation
metadata
comment: "This is a comment"

Metadata values, including default values, can be shown as a dictionary or in JSON format:

json_format.MessageToDict(metadata)
{'comment': 'This is a comment'}
# Include fields that are still at their default value.
# Unset message-typed fields (e.g. file, experiment, localizer) are still omitted.
json_format.MessageToDict(metadata, including_default_value_fields=True, preserving_proto_field_name=True)
{'comment': 'This is a comment',
 'identifier': '',
 'source': 'UNKNOWN_SOURCE',
 'state': 'UNKNOWN_STATE',
 'history': [],
 'ancestor_identifiers': [],
 'properties': [],
 'localization_properties': [],
 'element_count': '0',
 'frame_count': '0',
 'relations': [],
 'map': {}}
json_format.MessageToJson(metadata, including_default_value_fields=True, preserving_proto_field_name=True)
'{\n  "comment": "This is a comment",\n  "identifier": "",\n  "source": "UNKNOWN_SOURCE",\n  "state": "UNKNOWN_STATE",\n  "history": [],\n  "ancestor_identifiers": [],\n  "properties": [],\n  "localization_properties": [],\n  "element_count": "0",\n  "frame_count": "0",\n  "relations": [],\n  "map": {}\n}'
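
The JSON string can be read back with Python's standard json module. The following is a minimal sketch that reuses the output above as a literal, so it runs without locan; the variable names are illustrative only. Note that element_count and frame_count arrive as the string "0": the protobuf JSON mapping encodes 64-bit integers as strings, so they need an explicit int() conversion.

```python
import json

# JSON text as returned by json_format.MessageToJson above (copied literally).
metadata_json = (
    '{\n  "comment": "This is a comment",\n  "identifier": "",\n'
    '  "source": "UNKNOWN_SOURCE",\n  "state": "UNKNOWN_STATE",\n'
    '  "history": [],\n  "ancestor_identifiers": [],\n  "properties": [],\n'
    '  "localization_properties": [],\n  "element_count": "0",\n'
    '  "frame_count": "0",\n  "relations": [],\n  "map": {}\n}'
)

parsed = json.loads(metadata_json)

# 64-bit integers are serialized as strings; convert them explicitly.
element_count = int(parsed["element_count"])
frame_count = int(parsed["frame_count"])

print(parsed["comment"])
print(element_count, frame_count)
```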

To print metadata with timestamps and durations as a well-formatted string, use:

lc.metadata_to_formatted_string(metadata)
'comment: "This is a comment"\n'

Set metadata fields#

Repeated fields#

To set selected fields, instantiate the appropriate messages. For repeated (list) fields, call add() on the field to append a new element.

metadata = lc.data.metadata_pb2.Metadata()

ou = metadata.experiment.setups.add().optical_units.add()
ou.detection.camera.electrons_per_count = 13.26

metadata
experiment {
  setups {
    optical_units {
      detection {
        camera {
          electrons_per_count: 13.26
        }
      }
    }
  }
}

Timestamp fields#

Timestamp fields hold a date and time in UTC and are of type google.protobuf.Timestamp.

import time
metadata = lc.data.metadata_pb2.Metadata()
metadata.creation_time.GetCurrentTime()
metadata.creation_time
seconds: 1710414556
nanos: 851861000
metadata.creation_time.FromJsonString('2022-05-14T06:58:00.514893Z')
metadata.creation_time.ToJsonString()
'2022-05-14T06:58:00.514893Z'

Time duration fields contain information on time intervals and are of type google.protobuf.Duration.

metadata.experiment.setups.add().optical_units.add().detection.camera.integration_time.FromMilliseconds(20)
metadata.experiment.setups[0].optical_units[0].detection.camera.integration_time.ToMilliseconds()
# metadata.experiment.setups[0].optical_units[0].detection.camera.integration_time.ToJsonString()
20
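
Both field types store their value as a pair of integers, seconds and nanos, as seen in the raw output above. The sketch below mimics the string forms that appear in this tutorial using only the standard library. The helper names are hypothetical and this is not the protobuf implementation; it assumes non-negative values and 3, 6 or 9 fractional digits.

```python
from datetime import datetime, timezone


def _fraction(nanos: int) -> str:
    """Format nanoseconds with 3, 6 or 9 digits, or nothing if zero."""
    if nanos == 0:
        return ""
    if nanos % 1_000_000 == 0:
        return f".{nanos // 1_000_000:03d}"
    if nanos % 1_000 == 0:
        return f".{nanos // 1_000:06d}"
    return f".{nanos:09d}"


def timestamp_to_string(seconds: int, nanos: int = 0) -> str:
    """Render seconds/nanos since the Unix epoch as an RFC 3339 UTC string."""
    moment = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return moment.strftime("%Y-%m-%dT%H:%M:%S") + _fraction(nanos) + "Z"


def duration_to_string(seconds: int, nanos: int = 0) -> str:
    """Render a non-negative duration given as seconds/nanos, e.g. '0.020s'."""
    return f"{seconds}{_fraction(nanos)}s"


creation = timestamp_to_string(1652511480, 514_893_000)
integration = duration_to_string(0, 20_000_000)  # 20 ms
print(creation)
print(integration)
```

The first call reproduces the timestamp string set with FromJsonString above, and the second corresponds to the 20 ms integration time.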

To print metadata with timestamps and durations as a well-formatted string, use:

lc.metadata_to_formatted_string(metadata)
'experiment {\n  setups {\n    optical_units {\n      detection {\n        camera {\n          integration_time {\n            0.020s\n          }\n        }\n      }\n    }\n  }\n}\ncreation_time {\n  2022-05-14T06:58:00.514893Z\n}\n'

Metadata scheme#

The overall scheme can be instantiated and visualized:

metadata = lc.data.metadata_pb2.Metadata()
scheme = lc.message_scheme(metadata)
scheme
{'identifier': '',
 'comment': '',
 'source': 'UNKNOWN_SOURCE',
 'state': 'UNKNOWN_STATE',
 'history': {'name': '', 'parameter': ''},
 'ancestor_identifiers': [],
 'properties': {'identifier': '',
  'comment': '',
  'name': '',
  'unit': '',
  'type': '',
  'map': {}},
 'localization_properties': {'identifier': '',
  'comment': '',
  'name': '',
  'unit': '',
  'type': '',
  'map': {}},
 'element_count': '0',
 'frame_count': '0',
 'relations': {'identifier': '',
  'comment': '',
  'map': {},
  'file': {'identifier': '',
   'comment': '',
   'type': 'UNKNOWN_FILE_TYPE',
   'path': '',
   'groups': []}},
 'map': {},
 'file': {'identifier': '',
  'comment': '',
  'type': 'UNKNOWN_FILE_TYPE',
  'path': '',
  'groups': []},
 'experiment': {'identifier': '',
  'comment': '',
  'experimenters': {'identifier': '',
   'comment': '',
   'first_name': '',
   'last_name': '',
   'title': '',
   'affiliations': {'institute': '',
    'department': '',
    'address': {'address_lines': [],
     'city': '',
     'city_code': '',
     'country': ''}},
   'emails': [],
   'roles': [],
   'address': {'address_lines': [],
    'city': '',
    'city_code': '',
    'country': ''}},
  'samples': {'identifier': '',
   'comment': '',
   'targets': [],
   'fluorophores': [],
   'buffers': [],
   'map': {}},
  'setups': {'identifier': '',
   'comment': '',
   'optical_units': {'identifier': '',
    'comment': '',
    'illumination': {'identifier': '',
     'comment': '',
     'lightsource': '',
     'power': 0.0,
     'area': 0.0,
     'power_density': 0.0,
     'wavelength': 0.0,
     'map': {}},
    'detection': {'identifier': '',
     'comment': '',
     'map': {},
     'camera': {'identifier': '',
      'comment': '',
      'name': '',
      'model': '',
      'gain': 0.0,
      'electrons_per_count': 0.0,
      'pixel_count_x': 0,
      'pixel_count_y': 0,
      'pixel_size_x': 0.0,
      'pixel_size_y': 0.0,
      'flipped': False,
      'map': {},
      'offset': 0.0,
      'serial_number': '',
      'integration_time': '0s'}},
    'acquisition': {'identifier': '',
     'comment': '',
     'frame_count': 0,
     'frame_of_interest_first': 0,
     'frame_of_interest_last': 0,
     'stack_count': 0,
     'stack_step_count': 0,
     'stack_step_size': 0.0,
     'map': {},
     'time_start': '1970-01-01T00:00:00Z',
     'time_end': '1970-01-01T00:00:00Z'},
    'lightsheet': {'identifier': '',
     'comment': '',
     'angle_x': 0.0,
     'angle_y': 0.0,
     'angle_z': 0.0,
     'map': {}}},
   'map': {}},
  'map': {}},
 'localizer': {'identifier': '',
  'comment': '',
  'software': '',
  'intensity_threshold': 0.0,
  'psf_fixed': False,
  'psf_size': 0.0,
  'map': {}},
 'creation_time': '1970-01-01T00:00:00Z',
 'modification_time': '1970-01-01T00:00:00Z',
 'production_time': '1970-01-01T00:00:00Z'}
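
Because the scheme is a plain nested dictionary, it can be walked with ordinary Python. As a minimal sketch, the hypothetical helper field_paths below lists dotted paths for all leaf entries. It runs on a small hand-copied excerpt of the scheme so that it does not require locan. Repeated fields appear as a single nested dictionary in the scheme, so the paths carry no list indices.

```python
def field_paths(scheme: dict, prefix: str = "") -> list[str]:
    """Return dotted paths for every leaf entry of a nested dictionary."""
    paths = []
    for key, value in scheme.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict) and value:
            # Non-empty dictionaries are nested messages: descend into them.
            paths.extend(field_paths(value, path))
        else:
            # Scalars, lists and empty dictionaries (e.g. 'map') are leaves.
            paths.append(path)
    return paths


# Small excerpt of the scheme printed above, copied by hand.
scheme_excerpt = {
    "identifier": "",
    "localizer": {"software": "", "psf_fixed": False, "map": {}},
    "experiment": {
        "setups": {
            "optical_units": {
                "detection": {"camera": {"gain": 0.0, "integration_time": "0s"}}
            }
        }
    },
}

for path in field_paths(scheme_excerpt):
    print(path)
```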

Metadata from toml file#

You can provide metadata in a TOML file.

metadata_toml = \
"""
# Define the class (message) instances.

[[messages]]
name = "metadata"
module = "locan.data.metadata_pb2"
class_name = "Metadata"


# Fill metadata attributes
# Headings must be a message name or valid attribute.
# Use [[]] to add repeated elements.
# Use string '2022-05-14T06:58:00Z' for Timestamp elements.
# Use int in nanoseconds for Duration elements.

[metadata]
identifier = "123"
comment = "my comment"
ancestor_identifiers = ["1", "2"]
production_time = '2022-05-14T06:58:00Z'

[[metadata.experiment.experimenters]]
first_name = "First name"
last_name = "Last name"

[[metadata.experiment.experimenters.affiliations]]
institute = "Institute"
department = "Department"

[[metadata.experiment.setups]]
identifier = "1"

[[metadata.experiment.setups.optical_units]]
identifier = "1"

[metadata.experiment.setups.optical_units.detection.camera]
identifier = "1"
name = "camera name"
model = "camera model"
electrons_per_count = 3.1
integration_time = 10_000_000

[metadata.localizer]
software = "rapidSTORM"

[[metadata.relations]]
identifier = "1"
"""
toml_out = lc.metadata_from_toml_string(metadata_toml)
for k, v in toml_out.items():
    print(k, ":\n\n", v)
metadata :

 identifier: "123"
comment: "my comment"
ancestor_identifiers: "1"
ancestor_identifiers: "2"
relations {
  identifier: "1"
}
experiment {
  experimenters {
    first_name: "First name"
    last_name: "Last name"
    affiliations {
      institute: "Institute"
      department: "Department"
    }
  }
  setups {
    identifier: "1"
    optical_units {
      identifier: "1"
      detection {
        camera {
          identifier: "1"
          name: "camera name"
          model: "camera model"
          electrons_per_count: 3.1
          integration_time {
            nanos: 10000000
          }
        }
      }
    }
  }
}
localizer {
  software: "rapidSTORM"
}
production_time {
  seconds: 1652511480
}

To load from file:

Metadata for LocData#

Metadata is instantiated for each LocData object and accessible through the LocData.meta attribute.

Sample data#

df = pd.DataFrame(
    {
        'position_x': np.arange(0,10),
        'position_y': np.random.random(10),
        'frame': np.arange(0,10),
    })
locdata = lc.LocData.from_dataframe(dataframe=df)

locdata.meta
identifier: "1"
source: DESIGN
state: RAW
history {
  name: "LocData.from_dataframe"
}
element_count: 10
frame_count: 10
creation_time {
  seconds: 1710414556
  nanos: 906704000
}

Fields can also be printed as a well-formatted string (using lc.metadata_to_formatted_string):

locdata.print_meta()
identifier: "1"
source: DESIGN
state: RAW
history {
  name: "LocData.from_dataframe"
}
element_count: 10
frame_count: 10
creation_time {
  2024-03-14T11:09:16.906704Z
}

A summary of the most important metadata is printed with:

locdata.print_summary()
identifier: "1"
comment: ""
source: DESIGN
state: RAW
element_count: 10
frame_count: 10
creation_time {
  2024-03-14T11:09:16.906704Z
}

Metadata fields can be printed and changed individually:

print(locdata.meta.comment)
locdata.meta.comment = 'user comment'
print(locdata.meta.comment)
user comment

Metadata can also be added at instantiation:

locdata_2 = lc.LocData.from_dataframe(
    dataframe=df,
    meta={'identifier': 'myID_1', 'comment': 'my own user comment'},
)
locdata_2.print_summary()
identifier: "myID_1"
comment: "my own user comment"
source: DESIGN
state: RAW
element_count: 10
frame_count: 10
creation_time {
  2024-03-14T11:09:16.933277Z
}