MetaWriter¶
class MetaWriter(settings: Optional[dict] = None)
Bases: BaseDataPlugin
What you get by inheriting from MetaWriter¶
MetaWriter is the base class for writing the data corresponding to events found within your nanopore data by a MetaEventFinder subclass instance, and represents the first analysis and transformation step. A MetaWriter depends on, and is linked at instantiation to, a MetaEventFinder subclass instance that serves as its source of nanopore data, meaning that creating and using one of these plugins requires that you first instantiate an eventfinder.
Poriscope ships with SQLiteEventWriter, a subclass of MetaWriter that writes data in sqlite3 format. While additional subclasses can write to almost any format you desire, we strongly encourage standardization around this format, so think twice before creating additional subclasses of this base class. Writing a MetaWriter subclass alone is not sufficient: you will also need a paired MetaEventLoader subclass to read back the data you write in any other format and use it for downstream analysis.
Warning
We strongly encourage standardization on the SQLiteDBWriter subclass, so please think carefully before creating other formats.
Public Methods¶
Abstract Methods¶
These methods must be implemented by subclasses.
- abstractmethod MetaWriter.close_resources(channel: int | None = None) None¶
- Parameters:
channel (Optional[int]) – channel ID
Purpose: Clean up any open file handles or memory.
This is called during app exit or plugin deletion to ensure proper cleanup of resources that could otherwise leak. If no channel is specified, do this for all channels; otherwise limit your closure to the specified channel. Your files should be closed here if they are not closed in your writing step. If no such operation is needed, it suffices to pass. In the case of writers, this method is also called with a specific channel identifier at the end of any batch write operation (a call to commit_events()), so it should be used to ensure atomic write operations if possible.
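The all-channels versus single-channel behaviour described above can be sketched as follows. This is a minimal illustration and not poriscope code: ConnectionPool, open_channel, and the connections dict are hypothetical names, and an in-memory sqlite3 database stands in for a real output file.

```python
import sqlite3
from typing import Dict, Optional


class ConnectionPool:
    """Hypothetical stand-in for the resource handling of a writer subclass."""

    def __init__(self) -> None:
        # One open connection per channel, keyed by channel ID.
        self.connections: Dict[int, sqlite3.Connection] = {}

    def open_channel(self, channel: int) -> None:
        # An in-memory database keeps the sketch self-contained.
        self.connections[channel] = sqlite3.connect(":memory:")

    def close_resources(self, channel: Optional[int] = None) -> None:
        # No channel given: close everything. Otherwise close only that channel.
        targets = list(self.connections) if channel is None else [channel]
        for ch in targets:
            conn = self.connections.pop(ch, None)
            if conn is not None:
                conn.commit()  # finish the batch before releasing the handle
                conn.close()


pool = ConnectionPool()
for ch in (0, 1, 2):
    pool.open_channel(ch)

pool.close_resources(1)  # e.g. end of a batch write on channel 1
remaining_after_one = sorted(pool.connections)

pool.close_resources()  # e.g. app exit: close whatever is still open
remaining_after_all = sorted(pool.connections)
```

Popping the connection before closing it makes a repeated call for the same channel harmless, which is convenient given that this method can be called both after each batch and again at exit.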
- abstractmethod MetaWriter.reset_channel(channel: int | None = None) None¶
- Parameters:
channel (Optional[int]) – channel ID
Purpose: Reset the state of a specific channel for a new operation or run.
This is called any time an operation on a channel needs to be cleaned up or reset for a new run. If channel is not None, handle only that channel; otherwise reset all of them. Most writers create permanent state changes in the form of data written to the output file, which should be deleted or otherwise set up for subsequent overwrite when this function is called.
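As a rough sketch of what deleting previously written data might look like for a database-backed writer, consider the following. The events table, its columns, and the free function form are assumptions made for illustration; they are not the schema or the method signature used by poriscope.

```python
import sqlite3
from typing import Optional

# Hypothetical schema: one row per written event.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (channel INTEGER, idx INTEGER)")
conn.executemany("INSERT INTO events VALUES (?, ?)", [(0, 0), (0, 1), (1, 0)])


def reset_channel(conn: sqlite3.Connection, channel: Optional[int] = None) -> None:
    """Delete previously written events so that the next run starts clean."""
    if channel is None:
        conn.execute("DELETE FROM events")
    else:
        conn.execute("DELETE FROM events WHERE channel = ?", (channel,))
    conn.commit()


reset_channel(conn, 0)  # reset channel 0 only
rows_after_single = conn.execute("SELECT channel, idx FROM events").fetchall()

reset_channel(conn)  # reset every channel
rows_after_all = conn.execute("SELECT channel, idx FROM events").fetchall()
```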
Concrete Methods¶
- MetaWriter.commit_events(channel: int) Generator[float, None, None]¶
Create a generator that loops through the events in self.eventfinder for the given channel and calls self._write_data() to commit each one to file.
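Because this method returns a generator, nothing is written until the caller iterates over it. The sketch below uses a stand-in with the same Generator[float, None, None] signature; the assumption that each yielded float is a completion fraction is ours and is not stated in this page.

```python
from typing import Generator, List


def fake_commit_events(n_events: int) -> Generator[float, None, None]:
    """Stand-in with the same generator signature as commit_events()."""
    for i in range(n_events):
        # A real writer would write event i here before yielding.
        yield (i + 1) / n_events


progress: List[float] = []
for fraction in fake_commit_events(4):
    # The caller drives the write loop one event at a time.
    progress.append(fraction)
```

Driving the loop from the caller makes it straightforward to update a progress indicator or stop between events.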
- MetaWriter.force_serial_channel_operations()¶
- Returns:
True if only one channel can run at a time, False otherwise
- Return type:
bool
Purpose: Indicate whether operations on different channels must be serialized (not run in parallel).
By default, writer plugins are assumed not to be threadsafe and will run in serial mode when called from the poriscope GUI. If you want to change this, you must also ensure that the parent eventfinder object is threadsafe for pulling data from it. You can play it safe by calling self.eventfinder.force_serial_channel_operations(), but it is possible that an eventfinder is not threadsafe for eventfinding yet is threadsafe for pulling the events it found for writing.
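The conservative option mentioned above, deferring to the eventfinder, could look roughly like this. StubEventFinder and StubWriter are hypothetical stand-ins so that the sketch runs on its own; a real plugin would subclass MetaWriter and use its linked eventfinder.

```python
class StubEventFinder:
    """Hypothetical stand-in for the linked eventfinder."""

    def __init__(self, serial: bool) -> None:
        self._serial = serial

    def force_serial_channel_operations(self) -> bool:
        return self._serial


class StubWriter:
    """Hypothetical stand-in for a MetaWriter subclass."""

    def __init__(self, eventfinder: StubEventFinder) -> None:
        self.eventfinder = eventfinder

    def force_serial_channel_operations(self) -> bool:
        # Allow parallel channels only if the eventfinder allows them too.
        return self.eventfinder.force_serial_channel_operations()


serial_writer = StubWriter(StubEventFinder(serial=True))
parallel_writer = StubWriter(StubEventFinder(serial=False))
```

Note that this only covers the eventfinder side. The writer's own file access must also be threadsafe before returning False is appropriate.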
- MetaWriter.get_empty_settings(globally_available_plugins: Dict[str, List[str]] | None = None, standalone=False) Dict[str, Dict[str, Any]]¶
- Parameters:
globally_available_plugins (Optional[ Mapping[str, List[str]]]) – a dict containing all data plugins that exist to date, keyed by metaclass. Must include “MetaReader” as a key, with explicitly set Type MetaReader.
standalone (bool) – False if this is called as part of a GUI, True otherwise. Default False
- Returns:
the dict that must be filled in to initialize the writer
- Return type:
Dict[str, Dict[str, Any]]
Purpose: Provide a list of settings details to users to assist in instantiating an instance of your MetaWriter subclass.
Get a dict populated with the keys needed to initialize the writer if they are not set yet. This dict must have the following structure, but Min, Max, and Options can be skipped or explicitly set to None if they are not used. Value and Type are required. All values provided must be consistent with Type.
    settings = {
        'Parameter 1': {
            'Type': <int, float, str, bool>,
            'Value': <value> or None,
            'Options': [<option_1>, <option_2>, ... ] or None,
            'Min': <min_value> or None,
            'Max': <max_value> or None
        },
        ...
    }
This function must return a dictionary of the settings required to initialize the writer, in the specified format. Values in this dictionary can be accessed downstream through the self.settings class variable. This structure is a nested dictionary that supplies both values and a variety of information about those values, used by poriscope to perform sanity and consistency checking at instantiation.
This function is technically not abstract in MetaWriter, which already has an implementation that ensures that settings will have the required MetaEventFinder key and Output File key available to users. In most cases, however, you will need to override it to add any other settings required by your subclass. If you need additional settings, which you almost certainly do, you MUST call super().get_empty_settings(globally_available_plugins, standalone) before any additional code that you add. For example, your implementation could look like this:
    settings = super().get_empty_settings(globally_available_plugins, standalone)
    settings["Output File"]["Options"] = [
        "SQLite3 Files (*.sqlite3)",
        "Database Files (*.db)",
        "SQLite Files (*.sqlite)",
    ]
    settings["Experiment Name"] = {"Type": str}
    settings["Voltage"] = {"Type": float, "Units": "mV"}
    settings["Membrane Thickness"] = {"Type": float, "Units": "nm", "Min": 0}
    settings["Conductivity"] = {"Type": float, "Units": "S/m", "Min": 0}
    return settings
which will ensure that you have the 4 keys specified above, as well as two additional keys, MetaEventFinder and Output File. By default, it will accept any file type as output, hence the specification of the Options key for the Output File setting in the example above.
Private Methods¶
Abstract Methods¶
These methods must be implemented by subclasses.
- abstractmethod MetaWriter._finalize_initialization()¶
Purpose: Perform generic class construction operations after settings are applied. This function is called at the end of the apply_settings() function to perform additional initialization specific to the algorithm being implemented.
Perform any initialization tasks required after settings are applied. You can access the values in the settings dict as needed through the class variable self.settings[key]['Value'], where key corresponds to the keys in the provided settings dict (as provided to apply_settings() or to the constructor). You can freely create class variables here, and you can assume (if using the poriscope app) that this will only be called from a single thread. This function should raise if initialization fails.
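A minimal sketch of pulling values out of the nested settings dict and raising on failure is shown below. StubWriter is a hypothetical stand-in, and its constructor calls _finalize_initialization() directly only so that the example runs by itself; in poriscope this call is made at the end of apply_settings(). The attribute names output_file and voltage are also illustrative.

```python
from typing import Any, Dict


class StubWriter:
    """Hypothetical stand-in; a real plugin would subclass MetaWriter."""

    def __init__(self, settings: Dict[str, Dict[str, Any]]) -> None:
        self.settings = settings
        self._finalize_initialization()

    def _finalize_initialization(self) -> None:
        # Copy plain values out of the nested settings dict for later use.
        self.output_file = self.settings["Output File"]["Value"]
        self.voltage = self.settings["Voltage"]["Value"]
        if not self.output_file:
            # Raise rather than continue with an unusable configuration.
            raise ValueError("Output File must be set before writing")


writer = StubWriter(
    {
        "Output File": {"Type": str, "Value": "events.sqlite3"},
        "Voltage": {"Type": float, "Value": 200.0},
    }
)
```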
- abstractmethod MetaWriter._init() None¶
Purpose: Perform generic class construction operations.
All data plugins have this function and must provide an implementation. This is called immediately at the start of class creation and is used to do whatever is required to set up your plugin. Note that no app settings are available when this is called, so this function should be used only for generic class construction operations. Most plugins simply pass in this function.
- abstractmethod MetaWriter._initialize_database(channel: int)¶
- Parameters:
channel (int) – the channel for which to initialize the database
Purpose: Initialize a database for subsequent write operations.
This function is called at the start of a write operation and is used to do anything you need to do in order to open the output file for writing. You are responsible for checking whether such an operation is needed (for example, by setting an appropriate flag to avoid duplicate initialization). Note that this operation will be called for each channel, and you must ensure that any initialization operations are threadsafe if you are not forcing serial channel operations (see force_serial_channel_operations()).
We strongly encourage atomic operations by ensuring that any file handles opened in this function are later closed in close_resources(), which will be called at the end of any batch write operation.
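The flag-based guard against duplicate initialization mentioned above might be sketched like this. The class, the initialized set, and the events table are hypothetical, and an in-memory database replaces the real output file.

```python
import sqlite3
from typing import Dict, Set


class StubWriter:
    """Hypothetical stand-in; a real plugin would subclass MetaWriter."""

    def __init__(self) -> None:
        self.connections: Dict[int, sqlite3.Connection] = {}
        self.initialized: Set[int] = set()  # channels whose schema already exists

    def _initialize_database(self, channel: int) -> None:
        # Open a handle for this channel if none is currently open.
        if channel not in self.connections:
            self.connections[channel] = sqlite3.connect(":memory:")
        # The flag makes repeated calls for the same channel a no-op.
        if channel in self.initialized:
            return
        self.connections[channel].execute(
            "CREATE TABLE IF NOT EXISTS events (idx INTEGER PRIMARY KEY, data BLOB)"
        )
        self.initialized.add(channel)


writer = StubWriter()
writer._initialize_database(0)
writer._initialize_database(0)  # second call changes nothing
tables = writer.connections[0].execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()
```

If channels run in parallel, the check and the update of the flag would additionally need to be protected, for example with a lock.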
- abstractmethod MetaWriter._set_output_dtype() str¶
- Returns:
A string representing a numpy dtype
- Return type:
str
Purpose: Set the datatype of the data to be saved for each event.
This function returns a string encoding a numpy datatype that tells the writer in what format the data should be stored in the database. If the output dtype exactly matches the input dtype, the plugin will attempt to store raw data without any precision loss. In the case of a mismatch, poriscope cannot guarantee that no precision is lost between the input and output operations. If there is any doubt, we suggest double precision floating point numbers ("<f8"), which will not incur any meaningful loss of precision in the vast majority of operations regardless of input type.
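The effect of the chosen dtype on precision can be checked directly with numpy. The sketch below is independent of poriscope: it shows that 16-bit integer codes survive a round trip through "<f8" unchanged, while narrowing a double to "<f4" does not.

```python
import numpy as np

# Hypothetical raw input: 16-bit signed integer ADC codes.
raw = np.array([-32768, -1, 0, 1, 32767], dtype="<i2")

# Every int16 value is exactly representable as a 64-bit float.
as_f8 = raw.astype("<f8")
lossless = np.array_equal(as_f8.astype("<i2"), raw)

# Narrowing can lose information: 0.1 rounds differently in 32 bits.
value = np.array([0.1], dtype="<f8")
roundtrip = value.astype("<f4").astype("<f8")
lossy = not np.array_equal(roundtrip, value)

bytes_per_sample = np.dtype("<f8").itemsize
```

The cost of choosing "<f8" is storage: each sample takes 8 bytes rather than the 2 bytes of the int16 input.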
- abstractmethod MetaWriter._validate_settings(settings: dict) None¶
Validate that the settings dict contains the correct information for use by the subclass.
- Parameters:
settings (dict) – Settings for the writer.
- Raises:
ValueError – If the settings dict does not contain the correct information.
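One possible shape for such a check is sketched below, written as a free function so that it runs on its own. Which keys are required and which checks are worthwhile depend on your subclass; the choice of MetaEventFinder and Output File as required keys follows the get_empty_settings() description, and the range check against Min and Max is an assumption about what a subclass might want.

```python
from typing import Any, Dict


def validate_settings(settings: Dict[str, Dict[str, Any]]) -> None:
    """Raise ValueError if a required key is missing or a value is out of range."""
    for key in ("MetaEventFinder", "Output File"):
        if key not in settings:
            raise ValueError(f"Missing required setting: {key}")
    for name, entry in settings.items():
        value = entry.get("Value")
        if value is None:
            continue  # nothing to range-check
        minimum, maximum = entry.get("Min"), entry.get("Max")
        if minimum is not None and value < minimum:
            raise ValueError(f"{name} is below its minimum of {minimum}")
        if maximum is not None and value > maximum:
            raise ValueError(f"{name} is above its maximum of {maximum}")


good = {
    "MetaEventFinder": {"Type": str, "Value": "finder"},
    "Output File": {"Type": str, "Value": "events.sqlite3"},
    "Membrane Thickness": {"Type": float, "Value": 10.0, "Min": 0},
}
validate_settings(good)  # passes silently
```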
- abstractmethod MetaWriter._write_channel_metadata(channel: int) None¶
- Parameters:
channel (int) – int indicating which output to flush
Purpose: Save any metadata required at the level of channels (for example, samplerate).
Given a channel index, write any required metadata for that channel. Typically this is done once per channel on the first related write operation. Remember to close any file handles used, either in this function or in close_resources(), depending on whether you need to keep those resources open for the event writing step that follows.
- abstractmethod MetaWriter._write_data(data: ndarray[tuple[int, ...], dtype[number]], channel: int, index: int, scale: float | None = None, offset: float | None = None, start_sample: int | None = 0, padding_before: int | None = 0, padding_after: int | None = None, baseline_mean: float | None = None, baseline_std: float | None = None, raw_data: bool = False, abort: bool | None = False, last_call: bool | None = False) bool¶
- Parameters:
data (numpy.ndarray) – 1D numpy array of data to write to the active file in the specified channel.
channel (int) – Int indicating the channel from which it was acquired.
index (int) – event index
scale (Optional[float]) – Float indicating scaling between provided data type and encoded form for storage, default None.
offset (Optional[float]) – Float indicating offset between provided data type and encoded form for storage, default None.
start_sample (Optional[int]) – Integer index of the starting point of the provided array relative to the start of the experimental run, default 0.
padding_before (Optional[int]) – the length of the padding before the actual event start
padding_after (Optional[int]) – the length of the padding after the actual event end
baseline_mean (Optional[float]) – The local baseline, if available
baseline_std (Optional[float]) – the local standard deviation, if available
raw_data (bool) – True means to simply write data as-is to file, False indicates to first rescale it. Default False.
batch_size (int) – Number of events to batch before insert, default 100.
last_call (bool) – If True, flush the remaining batch, default False.
- Returns:
success of the write operation.
- Return type:
bool
Purpose: Append the data and metadata of a single event to the database of event data.
Given the data and a series of metadata about the event to be written, write it to the database file (append to an existing database in the case of atomic operations). Return True if that operation succeeds. If the write operation fails, raise an exception for handling in the caller. Note that raising on a write failure will not cause a crash: poriscope will continue trying to write subsequent events and will store the string associated with the raised error as the reason for that write failure for downstream reporting.
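The contract described here (return True on success, raise on failure, and let the caller record the error string) can be sketched as follows. The write_event function, the events table, and the errors dict are hypothetical and cover only a few of the parameters of _write_data(); the caller-side handling is our illustration of the behaviour described, not poriscope's actual code.

```python
import sqlite3
from typing import Dict

import numpy as np

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (channel INTEGER, idx INTEGER, start_sample INTEGER, "
    "data BLOB, PRIMARY KEY (channel, idx))"
)


def write_event(
    conn: sqlite3.Connection,
    data: np.ndarray,
    channel: int,
    index: int,
    start_sample: int = 0,
) -> bool:
    """Append one event; any sqlite3 error propagates to the caller."""
    conn.execute(
        "INSERT INTO events VALUES (?, ?, ?, ?)",
        (channel, index, start_sample, data.astype("<f8").tobytes()),
    )
    return True


event = np.array([1.0, 0.5, 0.4, 1.0])
ok = write_event(conn, event, channel=0, index=0, start_sample=1200)

# Caller side: a failed write is recorded and the loop would carry on.
errors: Dict[int, str] = {}
try:
    write_event(conn, event, channel=0, index=0)  # duplicate key, so this fails
except sqlite3.Error as exc:
    errors[0] = str(exc)

blob = conn.execute(
    "SELECT data FROM events WHERE channel = 0 AND idx = 0"
).fetchone()[0]
restored = np.frombuffer(blob, dtype="<f8")
```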
Concrete Methods¶
- MetaWriter.__init__(settings: dict | None = None)¶
Initialize and set up output environment, save metadata for subclasses.
- MetaWriter._commit_events(channel: int) Generator[float, None, None]¶
Create a generator that loops through the events in self.eventfinder for the given channel and calls self._write_data() to commit each one to file.
- MetaWriter._rescale_data_to_adc(data: numpy.ndarray, scale: float | None = None, offset: float | None = None, raw_data: bool = False, dtype: type = <class 'numpy.int16'>, adc_min: int = -32768, adc_max: int = 32767) tuple[ndarray[tuple[int, ...], dtype[number]], float | None, float | None]¶
Rescale data to int16 Chimera VC100-style adc codes.
For other adc code types or encoding schemes, this function should be overridden. Default to Chimera-style conversion.
- Parameters:
data (numpy.ndarray) – 1D numpy array of data to write to the active file in the specified channel.
scale (float, optional) – Float indicating scaling between provided data type and encoded form for storage. If None, scale is calculated based on the data to maximally use the available adc range.
offset (float, optional) – Float indicating offset between provided data type and encoded form for storage. If None, offset is calculated based on the data to maximally use the available adc range.
raw_data (bool) – Boolean, True means to simply write data as-is to file, False indicates to first rescale it. Default False.
dtype (type, optional) – Numpy dtype to use for storage. Defaults to 16-bit signed int.
adc_min (int) – Integer encoding the minimum adc code for the adc conversion.
adc_max (int) – Integer encoding the maximum adc code for the adc conversion.
- Returns:
Tuple containing rescaled data as numpy array, scale factor, and offset.
- Return type:
tuple[numpy.ndarray, float | None, float | None]
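To illustrate what "maximally use the available adc range" can mean when scale and offset are not supplied, here is a minimal sketch. It assumes the convention data ≈ code * scale + offset, which may differ from the encoding poriscope actually uses, and rescale_to_adc is a hypothetical free function rather than the method itself.

```python
from typing import Tuple

import numpy as np


def rescale_to_adc(
    data: np.ndarray, adc_min: int = -32768, adc_max: int = 32767
) -> Tuple[np.ndarray, float, float]:
    """Map data onto the full int16 range, assuming data ~ code * scale + offset."""
    lo, hi = float(data.min()), float(data.max())
    # A flat signal has no range to map; any nonzero scale works.
    scale = 1.0 if hi == lo else (hi - lo) / (adc_max - adc_min)
    offset = lo - adc_min * scale  # the smallest value maps to adc_min
    codes = np.clip(np.rint((data - offset) / scale), adc_min, adc_max)
    return codes.astype(np.int16), scale, offset


data = np.array([-1.0, 0.0, 0.25, 1.0])
codes, scale, offset = rescale_to_adc(data)
restored = codes * scale + offset  # decode with the same convention
```

With this mapping the quantization error of the decoded signal is bounded by half of scale, so using the full code range keeps that error as small as the storage dtype allows.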
- MetaWriter._validate_param_types(settings: dict) None¶
Validate that the settings dict contains the correct data types.
- Parameters:
settings (dict) – A dict specifying the parameters of the plugin to be created. Required keys depend on subclass.
- Raises:
TypeError – If the settings parameters are of the wrong type.
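A type check of this kind, comparing each Value against its declared Type, might be sketched as below. This is not the poriscope implementation: the free function form, the decision to accept an int where a float is declared, and the decision to reject bool where a number is declared are all assumptions made for the example.

```python
from typing import Any, Dict


def validate_param_types(settings: Dict[str, Dict[str, Any]]) -> None:
    """Raise TypeError if any Value is inconsistent with its declared Type."""
    for name, entry in settings.items():
        expected, value = entry["Type"], entry.get("Value")
        if value is None:
            continue  # nothing to check yet
        # bool is a subclass of int in Python, so reject it where a number is expected.
        if isinstance(value, bool) and expected is not bool:
            raise TypeError(f"{name}: expected {expected.__name__}, got bool")
        # Assumption: an int is acceptable wherever a float is declared.
        if expected is float and isinstance(value, int):
            continue
        if not isinstance(value, expected):
            raise TypeError(
                f"{name}: expected {expected.__name__}, got {type(value).__name__}"
            )


validate_param_types({"Voltage": {"Type": float, "Value": 200}})  # passes
```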