MetaEventFitter¶
class MetaEventFitter(settings: Optional[dict] = None)
Bases: BaseDataPlugin
MetaEventFitter is the base class for fitting events within your nanopore data to extract physical insights from the details of translocation events. MetaEventFitter depends on, and is linked at instantiation to, a MetaEventLoader subclass instance that serves as its source of event data, meaning that creating and using one of these plugins requires that you first instantiate an event loader. MetaEventFitter can in turn be the child object of a MetaDatabaseWriter subclass instance for downstream saving of the metadata extracted by the fits.
What you get by inheriting from MetaEventFitter¶
MetaEventFitter provides a common and intuitive API through which to fit and extract metadata from nanopore events (whatever that means for you). In practice, this typically means fitting sublevels, peaks, or other features of interest within your event for downstream postprocessing, visualization, and statistical analysis. The nanopore field has produced numerous methods of fitting nanopore data over the years. All of them could be implemented as subclasses of this base class in order to fit them into the overall poriscope workflow.
Public Methods¶
Abstract Methods¶
These methods must be implemented by subclasses.
- abstractmethod MetaEventFitter.close_resources(channel: int | None = None) None¶
- Parameters:
channel (Optional[int]) – the channel identifier
Purpose: Clean up any open file handles or memory.
This is called during app exit or plugin deletion to ensure proper cleanup of resources that could otherwise leak. Perform any actions necessary to gracefully close resources before app exit. If channel is not None, handle only that channel, else close all of them (taking care to respect thread safety if necessary). If no such operation is needed, it suffices to pass.
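The channel handling described above can be sketched as follows. This is a minimal illustration, not poriscope code: `FakeHandle`, `ResourceOwner`, and the `_handles` attribute are hypothetical stand-ins for whatever per-channel resources your subclass actually holds.

```python
from typing import Dict, Iterable, Optional


class FakeHandle:
    """Hypothetical stand-in for an open file or buffer owned by one channel."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True


class ResourceOwner:
    """Hypothetical holder of per-channel resources (not a poriscope class)."""

    def __init__(self, channels: Iterable[int]) -> None:
        self._handles: Dict[int, FakeHandle] = {ch: FakeHandle() for ch in channels}

    def close_resources(self, channel: Optional[int] = None) -> None:
        # Close one channel if requested, otherwise close all of them.
        targets = list(self._handles) if channel is None else [channel]
        for ch in targets:
            handle = self._handles.pop(ch, None)
            if handle is not None:
                handle.close()


owner = ResourceOwner([0, 1, 2])
owner.close_resources(1)            # closes channel 1 only
remaining = sorted(owner._handles)  # channels that are still open
owner.close_resources()             # closes everything that is left
```

Popping each handle before closing it makes repeated calls harmless, which is convenient if cleanup can be triggered both at plugin deletion and at app exit.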
- abstractmethod MetaEventFitter.construct_fitted_event(channel: int, index: int) ndarray[tuple[int, ...], dtype[float64]] | None¶
- Parameters:
- Returns:
numpy array of fitted data for the event, or None
- Return type:
Optional[npt.NDArray[np.float64]]
- Raises:
RuntimeError – if fitting is not complete yet
Purpose: Construct an array of data corresponding to the fit for the specified event.
Return a numpy array of floats that corresponds 1:1 to the underlying data, but which shows the fit instead of the raw data. What this means practically depends on what you are fitting, but the returned array must have length equal to the length of the raw data that went into the fit.
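As one possible illustration of the length requirement, the sketch below builds a piecewise-constant fit from sublevel start indices and fitted levels. It assumes a sublevel-style fit and uses plain Python lists in place of numpy arrays; `build_fitted_trace` and its arguments are hypothetical names, not part of poriscope.

```python
from typing import List, Sequence


def build_fitted_trace(
    n_samples: int, starts: Sequence[int], levels: Sequence[float]
) -> List[float]:
    """Piecewise-constant fit: levels[i] holds from starts[i] up to the next start."""
    if len(starts) != len(levels):
        raise ValueError("one level is required per sublevel start")
    fitted = [0.0] * n_samples
    bounds = list(starts) + [n_samples]
    for i, level in enumerate(levels):
        for j in range(bounds[i], bounds[i + 1]):
            fitted[j] = level
    return fitted


raw = [100.0, 101.0, 60.0, 59.0, 61.0, 99.0]
fit = build_fitted_trace(len(raw), starts=[0, 2, 5], levels=[100.5, 60.0, 99.0])
# The fitted trace has exactly one value per raw sample.
```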
Concrete Methods¶
- MetaEventFitter.fit_events(channel: int, silent: bool = False, data_filter: Callable | None = None, indices: List[int] | None = None) Generator[float, bool | None, None]¶
Set up a generator that will walk through all provided events and calculate metadata relating to the sublevels, yielding its percentage completion each time next() is called on it. If silent flag is set, run through without yielding progress reports on the first call to next(). Once StopIteration is reached, internal lists of event metadata will be populated as entries in a dict keyed by event id.
- Parameters:
channel (int) – analyze only events from this channel
silent (bool) – indicate whether or not to report progress, default false
data_filter (Callable[[npt.NDArray[np.float64]],npt.NDArray[np.float64]]) – An optional function to call to preprocess the data before looking for events, usually a filter
indices (List[int]) – a list of indices to fit, ignoring the rest. Empty list fits all available indices.
- Returns:
Yield completion fraction on each iteration.
- Return type:
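The progress-reporting pattern can be sketched with a stand-in generator. `fit_all` below is hypothetical and only mimics the documented behaviour of yielding a completion value on each iteration while filling a dict keyed by event id; it yields a fraction between 0 and 1, which is an assumption, and the "fit" is a placeholder mean.

```python
from typing import Dict, Generator, List


def fit_all(
    events: List[List[float]], results: Dict[int, dict]
) -> Generator[float, None, None]:
    """Hypothetical stand-in for a progress-reporting fit loop."""
    total = len(events)
    for i, event in enumerate(events):
        results[i] = {"mean": sum(event) / len(event)}  # placeholder "fit"
        yield (i + 1) / total


results: Dict[int, dict] = {}
progress_log = []
events = [[1.0, 3.0], [2.0, 2.0], [5.0, 7.0], [0.0, 4.0]]
for progress in fit_all(events, results):
    progress_log.append(progress)  # e.g. update a progress bar here
```

Once the loop finishes (StopIteration is reached), `results` holds one entry per event.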
- MetaEventFitter.force_serial_channel_operations() bool¶
- Returns:
True if only one channel can run at a time, False otherwise
- Return type:
Purpose: Indicate whether operations on different channels must be serialized (not run in parallel).
By default, event fitter plugins defer to the thread safety of their child MetaEventLoader instance. If any operation in your event fitter is not thread-safe independent of the child loader object, this function should be overridden to simply return True. Most event loaders are thread-safe since reading from a file on disk is usually so, and therefore no override is necessary. Take care to verify that the MetaEventLoader subclass instance on which this object depends is also thread-safe by calling self.eventloader.force_serial_channel_operations() to check.
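A minimal sketch of the override, using stub classes rather than the real poriscope base classes. `StubLoader`, `StubFitterBase`, and `SharedCacheFitter` are hypothetical; only the `self.eventloader.force_serial_channel_operations()` call is taken from the text above.

```python
class StubLoader:
    """Hypothetical stand-in for the event loader the fitter depends on."""

    def force_serial_channel_operations(self) -> bool:
        return False  # this stub claims to be thread-safe


class StubFitterBase:
    """Hypothetical base that defers to the loader, as described above."""

    def __init__(self, eventloader: StubLoader) -> None:
        self.eventloader = eventloader

    def force_serial_channel_operations(self) -> bool:
        return self.eventloader.force_serial_channel_operations()


class SharedCacheFitter(StubFitterBase):
    """A fitter with its own non-thread-safe state forces serial operation."""

    def force_serial_channel_operations(self) -> bool:
        return True


default_fitter = StubFitterBase(StubLoader())
serial_fitter = SharedCacheFitter(StubLoader())
```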
- MetaEventFitter.get_channels()¶
Get the number of available channels in the reader.
- MetaEventFitter.get_empty_settings(globally_available_plugins: Dict[str, List[str]] | None = None, standalone=False) Dict[str, Dict[str, Any]]¶
- Parameters:
globally_available_plugins (Optional[ Dict[str, List[str]]]) – a dict containing all data plugins that exist to date, keyed by metaclass. Must include “MetaReader” as a key, with explicitly set Type MetaReader.
standalone (bool) – False if this is called as part of a GUI, True otherwise. Default False
- Returns:
the dict that must be filled in to initialize the fitter
- Return type:
Purpose: Provide a list of settings details to users to assist in instantiating an instance of your MetaEventFitter subclass.
Get a dict populated with the keys needed to initialize the fitter if they are not set yet. This dict must have the following structure, but Min, Max, and Options can be skipped or explicitly set to None if they are not used. Value and Type are required. All values provided must be consistent with Type.
Your event fitter MUST include at least the “MetaEventLoader” key, which can be ensured by calling
settings = super().get_empty_settings(globally_available_plugins, standalone)
before adding any additional settings keys.
This function must return a dictionary of the settings required to initialize the fitter, in the specified format. Values in this dictionary can be accessed downstream through the self.settings class variable. This structure is a nested dictionary that supplies both values and a variety of information about those values, used by poriscope to perform sanity and consistency checking at instantiation.
While this function is technically not abstract in MetaEventFitter, which already has an implementation that ensures the settings include the required “MetaEventLoader” key, in most cases you will need to override it to add any other settings required by your subclass. If you need additional settings, which you almost certainly do, you MUST call super().get_empty_settings(globally_available_plugins, standalone) before any additional code that you add. For example, your implementation could look like this:
settings = super().get_empty_settings(globally_available_plugins, standalone)
settings["Threshold"] = {"Type": float, "Value": None, "Min": 0.0, "Units": "pA"}
settings["Min Duration"] = {"Type": float, "Value": 0.0, "Min": 0.0, "Units": "us"}
settings["Max Duration"] = {"Type": float, "Value": 1000000.0, "Min": 0.0, "Units": "us"}
settings["Min Separation"] = {"Type": float, "Value": 0.0, "Min": 0.0, "Units": "us"}
return settings
which will ensure that you have the 4 keys specified above, as well as an additional key, "MetaEventLoader", as required by event fitters. In the case of categorical settings, you can also supply the “Options” key in the second-level dictionaries.
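To make the Type, Value, Min, Max, and Options rules concrete, here is a small consistency check written against that structure. It is only a sketch of the kind of checking described; `check_setting` is a hypothetical helper, and poriscope's actual validation may differ.

```python
from typing import Any, Dict


def check_setting(name: str, entry: Dict[str, Any]) -> None:
    """Raise if a settings entry is inconsistent with its own Type/Min/Max/Options."""
    for required in ("Type", "Value"):
        if required not in entry:
            raise KeyError(f"{name}: missing required key {required!r}")
    value, expected = entry["Value"], entry["Type"]
    if value is None:
        return  # not filled in yet, nothing further to check
    if not isinstance(value, expected):
        raise TypeError(f"{name}: {value!r} is not of type {expected.__name__}")
    if entry.get("Min") is not None and value < entry["Min"]:
        raise ValueError(f"{name}: {value} is below Min")
    if entry.get("Max") is not None and value > entry["Max"]:
        raise ValueError(f"{name}: {value} is above Max")
    if entry.get("Options") is not None and value not in entry["Options"]:
        raise ValueError(f"{name}: {value!r} is not one of the allowed Options")


settings = {
    "Threshold": {"Type": float, "Value": None, "Min": 0.0, "Units": "pA"},
    "Min Duration": {"Type": float, "Value": 0.0, "Min": 0.0, "Units": "us"},
}
for key, entry in settings.items():
    check_setting(key, entry)  # both entries pass

bad = {"Type": float, "Value": -1.0, "Min": 0.0, "Units": "us"}
try:
    check_setting("Min Duration", bad)
    rejected = False
except ValueError:
    rejected = True
```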
- MetaEventFitter.get_event_metadata_generator(channel: int) Generator[Tuple[dict, dict, ndarray[tuple[int, ...], dtype[float64]], ndarray[tuple[int, ...], dtype[float64]], ndarray[tuple[int, ...], dtype[float64]]], None, None]¶
Set up a generator that will return the metadata dictionary for events in sequence for a given channel
- MetaEventFitter.get_event_metadata_types() Dict[str, Type[int | float | str | bool]]¶
Return a dict of event metadata along with associated datatypes for use by the database writer downstream.
- MetaEventFitter.get_event_metadata_units() Dict[str, str | None]¶
Return a dict of event metadata units for use by the database writer downstream.
- MetaEventFitter.get_metadata_columns(channel: int) List[str]¶
- Parameters:
channel (int) – analyze only events from this channel
- Returns:
a list of column names
- Return type:
List[str]
Get a list of event metadata column variables
- MetaEventFitter.get_num_events(channel) int¶
Get the number of events fitted in the channel, if fitting has finished.
- Parameters:
channel (int) – analyze only events from this channel
- Returns:
number of successfully fitted events in the channel
- Return type:
- Raises:
RuntimeError – if called before fitting is completed in the given channel
- MetaEventFitter.get_plot_features(channel: int, index: int) Tuple[List[float] | None, List[float] | None, List[Tuple[float, float]] | None, List[str] | None, List[str] | None]¶
- Parameters:
- Returns:
a list of x locations to plot vertical lines, a list of y locations to plot horizontal lines, a list of (x,y) tuples on which to plot dots, labels for the vertical lines, labels for the horizontal lines.
- Return type:
Tuple[Optional[List[float]], Optional[List[float]], Optional[List[Tuple[float,float]]], Optional[List[str]], Optional[List[str]]]
- Raises:
RuntimeError – if fitting is not complete yet
Purpose: Flag features of interest on an event for display on plots
This optional function is used to get a list of horizontal and vertical lines, as well as a list of points, and associated labels for the lines, to overlay on the graph generated by construct_fitted_event(). If no features need to be highlighted, you can return None for that element. Otherwise, the list of horizontal lines must match in length the list of labels for it, and likewise for the vertical lines. A subset of labels can be None, but if a list is returned, it must match the length of the corresponding list of features.
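The length-matching rule can be illustrated with a hypothetical return value and a small checker. Neither function below is part of poriscope, and the numeric values are made up for illustration; label entries are typed as optional because the text allows a subset of labels to be None.

```python
from typing import List, Optional, Tuple

PlotFeatures = Tuple[
    Optional[List[float]],
    Optional[List[float]],
    Optional[List[Tuple[float, float]]],
    Optional[List[Optional[str]]],
    Optional[List[Optional[str]]],
]


def example_plot_features() -> PlotFeatures:
    """Hypothetical return value: two vertical lines, one horizontal line, no dots."""
    vlines = [120.0, 480.0]    # x locations, e.g. event start and end
    hlines = [850.0]           # y location, e.g. baseline level
    points = None              # nothing to mark with dots
    vlabels = ["start", None]  # a subset of labels may be None
    hlabels = ["baseline"]
    return vlines, hlines, points, vlabels, hlabels


def lengths_match(features: PlotFeatures) -> bool:
    """Check that each label list, if present, matches its line list in length."""
    vlines, hlines, _points, vlabels, hlabels = features
    for lines, labels in ((vlines, vlabels), (hlines, hlabels)):
        if lines is not None and labels is not None and len(lines) != len(labels):
            return False
    return True


ok = lengths_match(example_plot_features())
```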
- MetaEventFitter.get_samplerate(channel: int) float¶
- Parameters:
channel (int) – the channel index
- Returns:
the samplerate of the associated event loader object
- Return type:
Return the samplerate of the associated reader object.
- MetaEventFitter.get_single_event_metadata(channel: int, index: int) Tuple[dict, dict, ndarray[tuple[int, ...], dtype[float64]], ndarray[tuple[int, ...], dtype[float64]], ndarray[tuple[int, ...], dtype[float64]] | None]¶
Return the metadata for the event and sublevels of the event, as well as the raw and fitted data.
- Parameters:
channel – Channel from which to retrieve event
index – Index of the event to retrieve
- Returns:
Tuple of event metadata, sublevel metadata, filtered data, raw data, and fitted data
- Raises:
RuntimeError – if fitting is not complete or data is missing
- MetaEventFitter.get_sublevel_columns(channel: int) List[str]¶
Get a list of sublevel metadata column variables
- MetaEventFitter.get_sublevel_metadata_types() Dict[str, Type[int | float | str | bool]]¶
Assemble a dict of sublevel metadata along with associated datatypes for use by the database writer downstream.
- MetaEventFitter.get_sublevel_metadata_units() Dict[str, str | None]¶
Assemble a dict of sublevel metadata units for use by the database writer downstream.
- MetaEventFitter.report_channel_status(channel: int | None = None, init=False) str¶
Return a string detailing any pertinent information about the status of analysis conducted on a given channel
- MetaEventFitter.reset_channel(channel=None) None¶
- Parameters:
channel (Optional[int]) – the channel identifier
Purpose: Reset the state of a specific channel for a new operation or run, or all of them if no channel is specified.
MetaEventFitter already has an implementation of this function, but you may override it if you need to do further resetting beyond what is included in reset_channel() already.
Warning
This function implements core functionality required for broader plugin integration into Poriscope. If you do need to override it, you MUST call super().reset_channel(channel) before any additional code that you add, and it is on you to ensure that your additional code does not conflict with the implementation in MetaEventFitter.
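The call-super-first pattern from the warning can be sketched with stub classes. `StubBase` is a hypothetical stand-in whose reset logic is invented for illustration; it does not reflect what the real base class resets.

```python
from typing import Dict, List, Optional


class StubBase:
    """Hypothetical stand-in for the base class (not poriscope's implementation)."""

    def __init__(self) -> None:
        self.fits: Dict[int, List[float]] = {0: [1.0], 1: [2.0]}

    def reset_channel(self, channel: Optional[int] = None) -> None:
        targets = list(self.fits) if channel is None else [channel]
        for ch in targets:
            self.fits[ch] = []


class CachingFitter(StubBase):
    def __init__(self) -> None:
        super().__init__()
        self.cache: Dict[int, str] = {0: "a", 1: "b"}

    def reset_channel(self, channel: Optional[int] = None) -> None:
        super().reset_channel(channel)  # base-class reset always runs first
        # Subclass-specific state is cleared afterwards.
        if channel is None:
            self.cache.clear()
        else:
            self.cache.pop(channel, None)


fitter = CachingFitter()
fitter.reset_channel(0)  # resets channel 0 only, channel 1 is untouched
```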
Private Methods¶
Abstract Methods¶
These methods must be implemented by subclasses.
- abstractmethod MetaEventFitter._define_event_metadata_types() Dict[str, Type[int | float | str | bool]]¶
- Returns:
a dict of metadata keys and associated base dtypes
- Return type:
Purpose: Tell downstream operations what datatypes correspond to event metadata provided by this plugin
This data plugin divides event metadata into two types: event metadata, and sublevel metadata. Event metadata refers to numbers that apply to the event as a whole (for example, its duration, its maximal blockage state, etc. - things that have a single number per event). In this function, you must supply a dictionary in which the keys are the names of the event metadata you want to fit, and the values are the primitive datatype of that piece of metadata. All of this metadata must be populated during fitting. This dict must have the same keys as that supplied in _define_event_metadata_units(). Options for dtypes are int, float, str, bool - basic datatypes compatible with any downstream MetaDatabaseWriter subclass. For example:
metadata_types = {}
metadata_types["duration"] = float
metadata_types["fitted_ecd"] = float
metadata_types["raw_ecd"] = float
metadata_types["max_blockage"] = float
metadata_types["min_blockage"] = float
metadata_types["max_deviation"] = float
metadata_types["max_blockage_duration"] = float
metadata_types["min_blockage_duration"] = float
metadata_types["max_deviation_duration"] = float
metadata_types["baseline_current"] = float
metadata_types["baseline_stdev"] = float
return metadata_types
Note that the base class will add additional keys to all event metadata, so do not duplicate these keys, as they are handled for you: “start_time”, “num_sublevel”, “event_id”.
- abstractmethod MetaEventFitter._define_event_metadata_units() Dict[str, str | None]¶
- Returns:
a dict of metadata keys and associated units
- Return type:
Purpose: Tell downstream operations what units apply to event metadata provided by this plugin
This data plugin divides event metadata into two types: event metadata, and sublevel metadata. Event metadata refers to numbers that apply to the event as a whole (for example, its duration, its maximal blockage state, etc. - things that have a single number per event). In this function, you must supply a dictionary in which the keys are the names of the event metadata you want to fit, and the values are a string representing the units for that key. All of this metadata must be populated during fitting. This dict must have the same keys as that supplied in _define_event_metadata_types(). Units can be None. For example:
metadata_units = {}
metadata_units["duration"] = "us"
metadata_units["fitted_ecd"] = "pC"
metadata_units["raw_ecd"] = "pC"
metadata_units["max_blockage"] = "pA"
metadata_units["min_blockage"] = "pA"
metadata_units["max_deviation"] = "pA"
metadata_units["max_blockage_duration"] = "us"
metadata_units["min_blockage_duration"] = "us"
metadata_units["max_deviation_duration"] = "us"
metadata_units["baseline_current"] = "pA"
metadata_units["baseline_stdev"] = "pA"
return metadata_units
Note that the base class will add additional keys to all event metadata, so do not duplicate these keys, as they are handled for you: “start_time”, “num_sublevel”, “event_id”.
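Because the types and units dicts must share exactly the same keys and must avoid the reserved ones, a quick self-check during development can catch mismatches early. The helper below is a hypothetical sketch, not a poriscope function; the reserved names are the ones listed above.

```python
def keys_match(types: dict, units: dict) -> bool:
    """True when both dicts define exactly the same metadata keys."""
    return set(types) == set(units)


# Keys listed above as handled by the base class.
RESERVED = {"start_time", "num_sublevel", "event_id"}

metadata_types = {"duration": float, "max_blockage": float}
metadata_units = {"duration": "us", "max_blockage": "pA"}

consistent = keys_match(metadata_types, metadata_units)
collisions = RESERVED & set(metadata_types)  # should be empty
```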
- abstractmethod MetaEventFitter._define_sublevel_metadata_types() Dict[str, Type[int | float | str | bool]]¶
- Returns:
a dict of metadata keys and associated base dtypes
- Return type:
Purpose: Tell downstream operations what datatypes correspond to sublevel metadata provided by this plugin
This data plugin divides event metadata into two types: event metadata, and sublevel metadata. Sublevel metadata refers to numbers that apply to individual sublevels within an event (for example, the duration or blockage state of a single sublevel) and as such may have an arbitrary number of entries per event. In this function, you must supply a dictionary in which the keys are the names of the sublevel metadata you want to fit, and the values are the primitive datatype of that piece of metadata. All of this metadata must be populated during fitting. This dict must have the same keys as that supplied in _define_sublevel_metadata_units(). Options for dtypes are int, float, str, bool - basic datatypes compatible with any downstream MetaDatabaseWriter subclass. For example:
metadata_types = {}
metadata_types["sublevel_current"] = float
metadata_types["sublevel_stdev"] = float
metadata_types["sublevel_blockage"] = float
metadata_types["sublevel_duration"] = float
metadata_types["sublevel_start_times"] = float
metadata_types["sublevel_end_times"] = float
metadata_types["sublevel_max_deviation"] = float
metadata_types["sublevel_raw_ecd"] = float
metadata_types["sublevel_fitted_ecd"] = float
return metadata_types
- abstractmethod MetaEventFitter._define_sublevel_metadata_units() Dict[str, str | None]¶
Purpose: Tell downstream operations what units apply to sublevel metadata provided by this plugin
This data plugin divides event metadata into two types: event metadata, and sublevel metadata. Sublevel metadata refers to numbers that apply to individual sublevels within an event (for example, the duration or blockage state of a single sublevel) and as such may have an arbitrary number of entries per event. In this function, you must supply a dictionary in which the keys are the names of the sublevel metadata you want to fit, and the values are a string representing the units for that key. All of this metadata must be populated during fitting. This dict must have the same keys as that supplied in _define_sublevel_metadata_types(). Units can be None. For example:
metadata_units = {}
metadata_units["sublevel_current"] = "pA"
metadata_units["sublevel_stdev"] = "pA"
metadata_units["sublevel_blockage"] = "pA"
metadata_units["sublevel_duration"] = "us"
metadata_units["sublevel_start_times"] = "us"
metadata_units["sublevel_end_times"] = "us"
metadata_units["sublevel_max_deviation"] = "pA"
metadata_units["sublevel_raw_ecd"] = "pC"
metadata_units["sublevel_fitted_ecd"] = "pC"
return metadata_units
- abstractmethod MetaEventFitter._init() None¶
Purpose: Perform generic class construction operations.
All data plugins have this function and must provide an implementation. This is called immediately at the start of class creation and is used to do whatever is required to set up your fitter. Note that no app settings are available when this is called, so this function should be used only for generic class construction operations. Most fitters simply pass in this function.
- abstractmethod MetaEventFitter._locate_sublevel_transitions(data: ndarray[tuple[int, ...], dtype[float64]], samplerate: float, padding_before: int | None, padding_after: int | None, baseline_mean: float | None, baseline_std: float | None) List[Any] | None¶
- Parameters:
data (npt.NDArray[np.float64]) – an array of data from which to extract the locations of sublevel transitions
samplerate (float) – the sampling rate
padding_before (Optional[int]) – the number of data points before the estimated start of the event in the chunk
padding_after (Optional[int]) – the number of data points after the estimated end of the event in the chunk
baseline_mean (Optional[float]) – the local mean value of the baseline current
baseline_std (Optional[float]) – the local standard deviation of the baseline current
- Returns:
a list of entries that details sublevel transitions. Normally this would be as a list of ints, but can be a list of tuples or other entries if more info is needed. First entry must correspond to the start of the event.
- Return type:
Optional[List[Any]]
- Raises:
ValueError – if the event is rejected. Note that ValueError will skip and reject the event but will not stop processing of the rest of the dataset.
AttributeError – if the fitting method cannot operate without provision of specific padding and baseline metadata and cannot rescue itself. This will cause a stop to processing of the dataset.
Purpose: Get a list of indices and optionally other metadata corresponding to the starting point of all sublevels within an event.
In this function, you must locate and return all features that qualify as “sublevels” for downstream processing and return a list of information that identifies the starting point of those sublevels. The first element in the list must correspond to the start of the event (e.g. the level that corresponds to the padding before the event). This list can take any form at all and will be passed verbatim to _populate_event_metadata() and _populate_sublevel_metadata(), meaning that you can encode extra information about the sublevels that you need in order to implement those functions. For example, if you have two different kinds of sublevels, you might pass a list of tuples that encode the index of the start of each sublevel along with a string representing its type, as in [(0, 'padding_before'), (100, 'normal_blockage'), (200, 'padding_after')], or equivalently a list of dicts that encodes the same information, for example [{'index': 0, 'type': 'padding_before'}, {'index': 100, 'type': 'normal_blockage'}, {'index': 200, 'type': 'padding_after'}]. The only restrictions are that:
The top-level structure must be a 1D iterable
Each entry must contain the index of the start of the sublevel
The first entry must correspond to the start of the event
The plugin must gracefully handle the case where any of the arguments except data are None, as not all event loaders are guaranteed to return these values. Raising an exception is an acceptable way to handle this case, as it will be caught and the event simply skipped as not fitted.
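The contract above can be illustrated with a deliberately simple threshold-based locator. This is a sketch of one possible approach under stated assumptions, not the method used by any poriscope plugin: the function name, the `n_sigma` parameter, and the use of plain lists in place of numpy arrays are all hypothetical.

```python
from typing import List, Optional, Sequence


def locate_transitions(
    data: Sequence[float],
    baseline_mean: Optional[float],
    baseline_std: Optional[float],
    n_sigma: float = 5.0,
) -> List[int]:
    """Return start indices of alternating baseline/blocked segments; first entry is 0."""
    if not data:
        raise ValueError("empty event, rejected")
    if baseline_mean is None or baseline_std is None:
        # Cannot threshold without baseline statistics: reject this event only.
        raise ValueError("baseline statistics unavailable, event rejected")
    threshold = n_sigma * baseline_std
    starts = [0]
    blocked = abs(data[0] - baseline_mean) > threshold
    for i in range(1, len(data)):
        now_blocked = abs(data[i] - baseline_mean) > threshold
        if now_blocked != blocked:
            starts.append(i)  # a transition begins a new sublevel here
            blocked = now_blocked
    if len(starts) < 3:
        raise ValueError("no complete blockage found, event rejected")
    return starts


trace = [100.0, 101.0, 99.0, 60.0, 61.0, 59.0, 100.0, 99.0]
starts = locate_transitions(trace, baseline_mean=100.0, baseline_std=1.0)

try:
    locate_transitions(trace, baseline_mean=None, baseline_std=None)
    skipped = False
except ValueError:
    skipped = True  # the event would be rejected, not the whole dataset
```

Raising ValueError for missing baseline values follows the rule that a rejected event is skipped without stopping processing of the rest of the dataset.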
- abstractmethod MetaEventFitter._populate_event_metadata(data: ndarray[tuple[int, ...], dtype[float64]], samplerate: float, baseline_mean: float | None, baseline_std: float | None, sublevel_metadata: Dict[str, List[int | float | number]]) Dict[str, int | float | number]¶
Assemble a list of metadata to save in the event database later. Note that keys ‘start_time_s’ and ‘index’ are already handled in the base class and should not be touched here.
- Parameters:
data (npt.NDArray[np.float64]) – an array of data from which to extract the locations of sublevel transitions
samplerate (float) – the sampling rate
baseline_mean (Optional[float]) – the local mean value of the baseline current
baseline_std (Optional[float]) – the local standard deviation of the baseline current
sublevel_metadata (Dict[str, List[Numeric]]) – the dict of sublevel metadata built by self._populate_sublevel_metadata()
- Returns:
a dict of event metadata values
- Return type:
Purpose: Extract metadata for the event as a whole
The sublevel_metadata dict corresponds to the return value of _populate_sublevel_metadata(). Using this information, provide values for all of the event metadata required by the fitter. This should be returned as a dict with keys that match exactly those defined in _define_event_metadata_types() and _define_event_metadata_units(). The value for each key should be a single value with type consistent with _define_event_metadata_types(). Do not provide values for any reserved keys.
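A minimal sketch of collapsing per-sublevel lists into one value per event. The key names are borrowed from the examples on this page, but the function itself and the assumption that the first and last sublevels are padding segments are illustrative only.

```python
from typing import Dict, List, Optional


def event_metadata(
    sublevel_meta: Dict[str, List[float]], baseline_mean: Optional[float]
) -> Dict[str, float]:
    """Collapse per-sublevel lists into a single value per event."""
    currents = sublevel_meta["sublevel_current"]
    durations = sublevel_meta["sublevel_duration"]
    # Assumption: first and last sublevels are the padding around the event.
    inner_currents = currents[1:-1]
    inner_durations = durations[1:-1]
    if not inner_currents:
        raise ValueError("event has no sublevels between the padding segments")
    # Fall back to the leading padding level if no baseline was supplied.
    baseline = currents[0] if baseline_mean is None else baseline_mean
    return {
        "duration": sum(inner_durations),
        "max_blockage": max(baseline - c for c in inner_currents),
    }


sub = {
    "sublevel_current": [100.0, 60.0, 80.0, 100.0],
    "sublevel_duration": [2.0, 4.0, 3.0, 2.0],
}
meta = event_metadata(sub, baseline_mean=None)
```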
- abstractmethod MetaEventFitter._populate_sublevel_metadata(data: ndarray[tuple[int, ...], dtype[float64]], samplerate: float, baseline_mean: float | None, baseline_std: float | None, sublevel_starts: List[int]) Dict[str, ndarray[tuple[int, ...], dtype[int | float | number]]]¶
- Parameters:
data (npt.NDArray[np.float64]) – an array of data from which to extract the locations of sublevel transitions
samplerate (float) – the sampling rate
baseline_mean (Optional[float]) – the local mean value of the baseline current
baseline_std (Optional[float]) – the local standard deviation of the baseline current
sublevel_starts (List[int]) – the list of sublevel start indices located in self._locate_sublevel_transitions()
- Returns:
a dict of lists of sublevel metadata values, one list entry per sublevel for each piece of metadata
- Return type:
Dict[str, npt.NDArray[Numeric]]
Purpose: Extract metadata for each sublevel within the event
The sublevel_starts list corresponds verbatim to the return value of _locate_sublevel_transitions(). Using this information, provide values for all of the sublevel metadata required by the fitter. This should be returned as a dict with keys that match exactly those defined in _define_sublevel_metadata_types() and _define_sublevel_metadata_units(). Values for each key should be a list of data with length exactly equal to that of sublevel_starts and types consistent with _define_sublevel_metadata_types(). Do not provide values for any reserved keys.
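The one-entry-per-sublevel requirement can be sketched as follows, assuming `sublevel_starts` is a plain list of integer indices and using Python lists in place of numpy arrays. The function name and the choice of two metadata keys are illustrative assumptions.

```python
from statistics import fmean
from typing import Dict, List, Sequence


def sublevel_metadata(
    data: Sequence[float], samplerate: float, sublevel_starts: List[int]
) -> Dict[str, List[float]]:
    """Compute one list entry per sublevel for each metadata key."""
    bounds = list(sublevel_starts) + [len(data)]
    currents, durations = [], []
    for start, end in zip(bounds[:-1], bounds[1:]):
        currents.append(fmean(data[start:end]))
        durations.append((end - start) * 1e6 / samplerate)  # microseconds
    return {"sublevel_current": currents, "sublevel_duration": durations}


trace = [100.0, 100.0, 60.0, 60.0, 60.0, 60.0, 100.0, 100.0]
starts = [0, 2, 6]
meta = sublevel_metadata(trace, samplerate=1_000_000.0, sublevel_starts=starts)
```

Every list in the returned dict has the same length as `sublevel_starts`.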
- abstractmethod MetaEventFitter._post_process_events(channel: int) None¶
- Parameters:
channel (int) – the index of the channel to post-process
Purpose: Apply any operations to the fits that need to occur after preliminary fitting is finished, for example, refining fits using information about the global dataset structure. Try to avoid computationally intensive operations here if possible. Most fitters can simply pass.
- abstractmethod MetaEventFitter._pre_process_events(channel: int) None¶
- Parameters:
channel (int) – the channel to pre-process
Purpose: Apply any operations that need to occur before fitting starts, for example finding the longest and shortest events. Try to avoid computationally intensive operations here if possible. Most fitters can simply pass.
- abstractmethod MetaEventFitter._validate_settings(settings: dict) None¶
Validate that the settings dict contains the correct information for use by the subclass.
- Parameters:
settings (dict) – Parameters for event fitting.
- Raises:
ValueError – If the settings dict does not contain the correct information.
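What such validation might look like is sketched below. Apart from "MetaEventLoader", which this page names as a required key, the settings keys and the min/max rule are hypothetical, as is the function itself.

```python
def validate_settings(settings: dict) -> None:
    """Raise ValueError when required keys are missing or values conflict."""
    # "Min Duration" and "Max Duration" are hypothetical example settings.
    required = ("MetaEventLoader", "Min Duration", "Max Duration")
    missing = [key for key in required if key not in settings]
    if missing:
        raise ValueError(f"missing settings: {missing}")
    low = settings["Min Duration"]["Value"]
    high = settings["Max Duration"]["Value"]
    if low is not None and high is not None and low > high:
        raise ValueError("Min Duration must not exceed Max Duration")


good = {
    "MetaEventLoader": {"Type": str, "Value": "my_loader"},  # placeholder value
    "Min Duration": {"Type": float, "Value": 10.0},
    "Max Duration": {"Type": float, "Value": 500.0},
}
validate_settings(good)  # passes silently

bad = {**good, "Max Duration": {"Type": float, "Value": 1.0}}
try:
    validate_settings(bad)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
```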
Concrete Methods¶
- MetaEventFitter.__init__(settings: dict | None = None) None¶
Initialize the MetaEventFitter instance.
- MetaEventFitter._define_metadata_types() None¶
Define metadata datatypes for all columns calculated by the fitter - should not be touched
- MetaEventFitter._define_metadata_units() None¶
Define metadata units for all columns calculated by the fitter - should not be touched
- MetaEventFitter._finalize_initialization() None¶
Purpose: Apply application-specific settings to the plugin, if needed.
This function is called at the end of the class constructor to perform additional initialization specific to the algorithm being implemented. If additional initialization operations are required beyond the defaults provided in BaseDataPlugin or MetaEventFitter that must occur after settings have been applied to the fitter instance, you can override this function to add those operations, subject to the caveat below.
Warning
This function implements core functionality required for broader plugin integration into Poriscope. If you do need to override it, you MUST call super()._finalize_initialization() before any additional code that you add, and take care to understand the implementation of both apply_settings() and _finalize_initialization() before doing so to ensure that you are not conflicting with those functions.
Should Raise if initialization fails.
- MetaEventFitter._validate_param_types(settings: dict) None¶
Validate that the settings dict contains correct data types.
- Parameters:
settings (dict) – A dict specifying the parameters of the fitter to be created. Required keys depend on subclass.
- Raises:
TypeError – If the settings parameters are of the wrong type