MetaEventFinder
class MetaEventFinder(settings: Optional[dict] = None)
Bases: BaseDataPlugin
MetaEventFinder is the base class for finding events within your nanopore data and represents the first analysis and transformation step. MetaEventFinder depends on, and is linked at instantiation to, a MetaReader subclass instance that serves as its source of nanopore data, meaning that creating and using one of these plugins requires that you first instantiate a reader. A MetaEventFinder subclass instance can in turn be the child object of a MetaWriter subclass instance, which saves the data found by the event finder for downstream use.
What you get by inheriting from MetaEventFinder
MetaEventFinder provides a common and intuitive API through which to identify segments of a nanopore timeseries that represent events (whatever that means for you) and flag them for writing to disk, excluding the uninteresting parts. In practice, this means that the size of nanopore data can be reduced by up to 1000x by keeping only the segments that matter. This operation is a precursor to downstream analysis, which operates only on the data segments flagged by subclasses of this base class.
Public Methods
Abstract Methods
These methods must be implemented by subclasses.
- abstractmethod MetaEventFinder.close_resources(channel: int | None = None) → None
- Parameters:
channel (Optional[int]) – the channel identifier
Purpose: Clean up any open file handles or memory.
This is called during app exit or plugin deletion to ensure proper cleanup of resources that could otherwise leak. Perform any actions necessary to gracefully close resources before app exit. If channel is not None, handle only that channel; otherwise close all of them (taking care to respect thread safety if necessary). If no such operation is needed, it suffices to pass.
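As a minimal sketch, assume a hypothetical event finder that keeps one open handle per channel in a dict called _handles, guarded by a threading.Lock. The class is free-standing so that it runs without poriscope installed; in a real plugin, close_resources would be a method of your MetaEventFinder subclass, and what it must release depends entirely on your implementation.

```python
import io
import threading
from typing import Dict, Optional


class HandleOwningFinder:
    """Hypothetical stand-in for a MetaEventFinder subclass that owns per-channel handles."""

    def __init__(self) -> None:
        self._handles: Dict[int, io.IOBase] = {}  # hypothetical: channel -> open handle
        self._lock = threading.Lock()  # guards _handles if channels run in parallel

    def close_resources(self, channel: Optional[int] = None) -> None:
        with self._lock:
            if channel is None:
                targets = list(self._handles)  # close every channel
            else:
                targets = [channel] if channel in self._handles else []
            for ch in targets:
                handle = self._handles.pop(ch)
                if not handle.closed:
                    handle.close()
```

Because the handle is removed from the dict once closed, calling the method twice (for example at plugin deletion and again at app exit) is harmless.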
Concrete Methods
- MetaEventFinder.find_events(channel: int, ranges: List[Tuple[float, float]], chunk_length: float = 1.0, data_filter: Callable | None = None) → Generator[float, bool | None, None]
Orchestrates event finding over multiple (start, end) ranges for a single channel. Yields progress for each chunk processed.
- Parameters:
channel – The channel to process.
ranges – List of (start, end) tuples in seconds.
chunk_length – Length of each chunk in seconds.
data_filter – Optional callable filter to apply to each chunk.
- Returns:
Generator yielding fractional progress (0.0–1.0)
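The orchestration described above can be pictured with the following minimal sketch. It is an assumption-laden illustration, not the MetaEventFinder implementation: it splits each (start, end) range into chunk_length-second pieces and yields the fraction of the total requested time processed after each piece. The hypothetical process_chunk callback stands in for the real per-chunk event detection, and the send type (bool | None) declared by the real generator is not modelled.

```python
from typing import Callable, Generator, List, Tuple


def iter_chunk_progress(
    ranges: List[Tuple[float, float]],
    chunk_length: float = 1.0,
    process_chunk: Callable[[float, float], None] = lambda start, end: None,
) -> Generator[float, None, None]:
    """Hypothetical sketch: walk (start, end) ranges in chunks, yielding progress in [0, 1]."""
    if chunk_length <= 0:
        raise ValueError("chunk_length must be positive")
    total = sum(end - start for start, end in ranges if end > start)
    done = 0.0
    for start, end in ranges:
        k = 0
        # Compute each chunk start from k to avoid accumulating rounding error
        while start + k * chunk_length < end:
            chunk_start = start + k * chunk_length
            chunk_end = min(chunk_start + chunk_length, end)  # last chunk may be shorter
            process_chunk(chunk_start, chunk_end)  # stand-in for per-chunk event detection
            done += chunk_end - chunk_start
            yield min(done / total, 1.0)
            k += 1
```

A caller consumes the real find_events() generator the same way, for example by iterating over it and updating a progress bar with each yielded value.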
- MetaEventFinder.force_serial_channel_operations() → bool
- Returns:
True if only one channel can run at a time, False otherwise
- Return type:
bool
Purpose: Indicate whether operations on different channels must be serialized (not run in parallel).
By default, event finder plugins defer to the thread safety of their child MetaReader instance. If any operation in your event finder is not thread-safe independently of the child reader object, override this function to simply return True. Most event finders are thread-safe, since reading from a file on disk usually is, so no override is necessary. Take care to verify that the MetaReader subclass instance on which this object depends is also thread-safe by calling self.reader.force_serial_channel_operations().
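A minimal sketch of an override that combines both checks, assuming your subclass tracks its own thread-unsafe state with a hypothetical _uses_shared_state flag and reaches its reader through self.reader as described above. The stub classes exist only to make the example runnable without poriscope.

```python
class StubReader:
    """Hypothetical stand-in for a MetaReader subclass instance."""

    def __init__(self, serial: bool) -> None:
        self._serial = serial

    def force_serial_channel_operations(self) -> bool:
        return self._serial


class SerialAwareFinder:
    """Hypothetical stand-in for a MetaEventFinder subclass."""

    def __init__(self, reader: StubReader, uses_shared_state: bool) -> None:
        self.reader = reader
        self._uses_shared_state = uses_shared_state  # hypothetical: state shared across channels

    def force_serial_channel_operations(self) -> bool:
        # Serialize if our own code is unsafe, or if the reader we depend on requires it
        return self._uses_shared_state or self.reader.force_serial_channel_operations()
```

If your own code is unsafe regardless of the reader, the override can be reduced to return True, as the text above suggests.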
- MetaEventFinder.get_base_experiment_name() → str
Get the base name of the experiment being analyzed.
- Returns:
name of the experiment being analyzed
- Return type:
str
- MetaEventFinder.get_channels()
Get the number of available channels in the reader.
- MetaEventFinder.get_dtype() → object
Return the raw data type of the associated reader.
- Returns:
the raw data type of the associated reader
- Return type:
object
- MetaEventFinder.get_empty_settings(globally_available_plugins: Dict[str, List[str]] | None = None, standalone=False) → Dict[str, Dict[str, Any]]
- Parameters:
globally_available_plugins (Optional[Dict[str, List[str]]]) – a dict containing all data plugins that exist to date, keyed by metaclass. Must include “MetaReader” as a key, with explicitly set Type MetaReader.
standalone (bool) – False if this is called as part of a GUI, True otherwise. Default False
- Returns:
the dict that must be filled in to initialize the event finder
- Return type:
Dict[str, Dict[str, Any]]
Purpose: Provide a list of settings details to users to assist in instantiating an instance of your MetaEventFinder subclass.
Get a dict populated with the keys needed to initialize the event finder if they are not set yet. Each key maps to a second-level dict with the entries Type, Value, Min, Max, and Options; Min, Max, and Options can be omitted or explicitly set to None if they are not used, but Value and Type are required. All values provided must be consistent with Type. The example below also supplies a Units entry.
Your event finder MUST include at least the “MetaReader” key, which can be ensured by calling super().get_empty_settings(globally_available_plugins, standalone) before adding any additional settings keys.
This function must return a dictionary of the settings required to initialize the event finder, in the format described above. Values in this dictionary can be accessed downstream through the self.settings class variable. This structure is a nested dictionary that supplies both values and a variety of information about those values, used by poriscope to perform sanity and consistency checking at instantiation.
While this function is not abstract in MetaEventFinder, whose implementation already ensures that the required MetaReader key is available to users, in most cases you will need to override it to add any other settings required by your subclass. If you need additional settings, which you almost certainly do, you MUST call super().get_empty_settings(globally_available_plugins, standalone) before any additional code that you add. For example, your implementation could look like this:

```python
# Defined inside your MetaEventFinder subclass
def get_empty_settings(self, globally_available_plugins=None, standalone=False):
    # Start from the base-class settings so the required "MetaReader" key is present
    settings = super().get_empty_settings(globally_available_plugins, standalone)
    settings["Threshold"] = {"Type": float, "Value": None, "Min": 0.0, "Units": "pA"}
    settings["Min Duration"] = {"Type": float, "Value": 0.0, "Min": 0.0, "Units": "us"}
    settings["Max Duration"] = {"Type": float, "Value": 1000000.0, "Min": 0.0, "Units": "us"}
    settings["Min Separation"] = {"Type": float, "Value": 0.0, "Min": 0.0, "Units": "us"}
    return settings
```

This ensures that your settings contain the four keys specified above, as well as the additional "MetaReader" key required by event finders. For categorical settings, you can also supply the “Options” key in the second-level dictionaries.
- MetaEventFinder.get_event_data_generator(channel: int, data_filter: Callable | None = None, rectify: bool = False, raw_data: bool = False) → Generator[ndarray[tuple[int, ...], dtype[float64]], None, None]
Set up a generator over the events found in the given channel. If offset was provided during analysis, it will be included here.
- Parameters:
channel (int) – label for the channel from which to retrieve event data
data_filter (Optional[Callable]) – a function that is called to preprocess the data before it is returned
rectify (bool) – should the data be returned rectified?
raw_data (bool) – return raw adc codes on True, pA values on False
- Raises:
ValueError – If events have not been found or if index is out of bounds.
- Returns:
A Generator that gives data in an event and the index of the start of that event relative to the start of the file. If offset was provided during analysis, it will be included here.
- Return type:
Generator[ndarray[tuple[int, ...], dtype[float64]], None, None]
- MetaEventFinder.get_event_indices(index: int) → Tuple[Dict[int, List[int]], Dict[int, List[int]]]
Return the start and end indices of the event at the given index within the data chunk analyzed.
- Parameters:
index (int) – The index of the event to retrieve data for
- Raises:
IndexError – If index is out of bounds
- Returns:
Lists of start and end indices for all events found in the data. If offset was provided during analysis, it will be included here.
- Return type:
Tuple[Dict[int, List[int]], Dict[int, List[int]]]
- MetaEventFinder.get_eventfinding_status(channel: int) → int
Check whether the event finder has finished processing a given channel yet.
- MetaEventFinder.get_samplerate() → float
Return the samplerate of the associated reader object.
- Returns:
the samplerate of the associated reader object
- Return type:
float
- MetaEventFinder.get_single_event_data(channel: int, index: int, data_filter: Callable | None = None, rectify: bool | None = False, raw_data: bool | None = False) → Dict[str, ndarray[tuple[int, ...], dtype[float64]] | float] | None
Return a dictionary of data and metadata for the requested event
- Parameters:
channel (int) – label for the channel from which to retrieve event indices
index (int) – The index of the event to retrieve data for
data_filter (Optional[Callable]) – a function that is called to preprocess the data before it is returned
rectify (Optional[bool]) – should the data be returned rectified?
raw_data (Optional[bool]) – return raw adc codes on True, pA values on False
- Raises:
IndexError – If index is out of bounds
KeyError – If the channel does not exist
ValueError – if no events have been found in the channel
- Returns:
A dictionary of data and metadata for the specified event
- Return type:
Dict[str, ndarray[tuple[int, ...], dtype[float64]] | float] | None
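data_filter is documented only as a callable that preprocesses the data before it is returned. A minimal sketch, assuming the callable receives a 1-D float64 array and should return an array of the same length: the hypothetical factory below builds a moving-average low-pass filter that could be passed as data_filter.

```python
from typing import Callable

import numpy as np
import numpy.typing as npt

FloatArray = npt.NDArray[np.float64]


def make_moving_average(window: int) -> Callable[[FloatArray], FloatArray]:
    """Hypothetical helper: build a same-length moving-average data_filter."""
    if window < 1:
        raise ValueError("window must be >= 1")
    kernel = np.ones(window) / window

    def data_filter(data: FloatArray) -> FloatArray:
        data = np.asarray(data, dtype=np.float64)
        if data.size < window:
            return data.copy()  # too short to smooth; return unchanged
        # mode="same" keeps the output aligned with, and as long as, the input
        return np.convolve(data, kernel, mode="same")

    return data_filter
```

With an instance of your subclass called finder, a call such as finder.get_single_event_data(channel, index, data_filter=make_moving_average(5)) should then return the event smoothed over five samples. A same-length convolution distorts the first and last few samples, so filters applied chunk by chunk can leave small artifacts at chunk boundaries.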
- MetaEventFinder.report_channel_status(channel: int | None = None, init=False) → str
Return a string detailing any pertinent information about the status of analysis conducted on a given channel.
- MetaEventFinder.reset_channel(channel: int | None = None) → None
- Parameters:
channel (Optional[int]) – the channel identifier
Purpose: Reset the state of a specific channel for a new operation or run, or all of them if no channel is specified.
MetaEventFinder already has an implementation of this function, but you may override it if you need to do further resetting beyond what is already included in reset_channel().
Warning
This function implements core functionality required for broader plugin integration into Poriscope. If you do need to override it, you MUST call super().reset_channel(channel) before any additional code that you add, and it is on you to ensure that your additional code does not conflict with the implementation in MetaEventFinder.
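A minimal sketch of a compliant override, assuming your subclass maintains a hypothetical per-channel cache called _last_threshold. The stub base class only records call order so the example runs on its own; in a real plugin the parent would be MetaEventFinder.

```python
from typing import Dict, List, Optional, Tuple


class StubBase:
    """Hypothetical stand-in for MetaEventFinder's own reset logic."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, Optional[int]]] = []

    def reset_channel(self, channel: Optional[int] = None) -> None:
        self.calls.append(("base", channel))


class CachingFinder(StubBase):
    def __init__(self) -> None:
        super().__init__()
        self._last_threshold: Dict[int, float] = {}  # hypothetical per-channel cache

    def reset_channel(self, channel: Optional[int] = None) -> None:
        super().reset_channel(channel)  # MUST run first, per the warning above
        if channel is None:
            self._last_threshold.clear()  # reset every channel
        else:
            self._last_threshold.pop(channel, None)  # reset only this channel
        self.calls.append(("subclass", channel))
```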
Private Methods
Abstract Methods
These methods must be implemented by subclasses.
- abstractmethod MetaEventFinder._filter_events(event_starts: List[int], event_ends: List[int], channel: int, last_end=0) → Tuple[List[int], List[str]]
- Parameters:
event_starts (List[int]) – a list of starting data indices for events. You may assume that event_starts[0] < event_ends[0]. It is possible that there will be one more entry in this list than in event_ends.
event_ends (List[int]) – a list of ending data indices for events. You may assume that event_starts[0] < event_ends[0]
channel (int) – the channel identifier
last_end (int) – the index of the end of the last accepted event
- Returns:
A list of indices to reject from the given lists of event starts and ends, and a list of reasons for rejection
- Return type:
Tuple[List[int], List[str]]
Given the lists of event starts and event ends calculated by your implementation of _find_events_in_chunk(), select which ones to reject. For this, you may assume that poriscope has corrected for events that straddle the start of the chunk, but not the end, which is to say that event_starts[0] < event_ends[0] will be True, but it is possible that event_starts will have an additional trailing entry that you should not attempt to reject. You must return a list of indices (N.B. not the actual values in event_starts or event_ends) to reject, and an equal-length list of strings that provide a reason for each rejection (be very terse).
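A minimal sketch of the rejection logic, assuming the criteria have already been converted from the microsecond settings to sample counts and are passed in as hypothetical min_duration, max_duration, and min_separation arguments (the real method receives a channel and would read its criteria from self.settings instead). It returns indices rather than values and never rejects the trailing unmatched start.

```python
from typing import List, Optional, Tuple


def filter_events(
    event_starts: List[int],
    event_ends: List[int],
    min_duration: int,
    max_duration: int,
    min_separation: int,
    last_end: Optional[int] = None,
) -> Tuple[List[int], List[str]]:
    """Hypothetical sketch: indices of events to reject, with terse reasons."""
    reject: List[int] = []
    reasons: List[str] = []
    previous_end = last_end  # None means no earlier accepted event (sketch convention)
    # zip stops at the shorter list, so a trailing unmatched start is never rejected
    for i, (start, end) in enumerate(zip(event_starts, event_ends)):
        duration = end - start
        if duration < min_duration:
            reject.append(i)
            reasons.append("too short")
        elif duration > max_duration:
            reject.append(i)
            reasons.append("too long")
        elif previous_end is not None and start - previous_end < min_separation:
            reject.append(i)
            reasons.append("too close")
        else:
            previous_end = end  # separation is measured from the last accepted event
    return reject, reasons
```

Measuring separation from the last accepted event matches the meaning of last_end; measuring it from every event, accepted or not, is an equally defensible design choice.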
- abstractmethod MetaEventFinder._find_events_in_chunk(data: ndarray[tuple[int, ...], dtype[float64]], mean: float, std: float, offset: int, entry_state: bool = False, first_chunk: bool = False) → Tuple[List[int], List[int], bool]
- Parameters:
data (npt.NDArray[np.float64]) – Chunk of timeseries data to analyze. Assume it is rectified so that a blockage will always represent a reduction in absolute value.
mean (float) – Mean of the baseline on the given chunk. Must be positive.
std (float) – Standard deviation of the baseline on the given chunk
offset (int) – the index of the start of the chunk in the global dataset
entry_state (bool) – Bool indicating whether we start in the middle of an event (True) or not (False)
first_chunk (bool) – Bool indicating whether this is the first chunk of data in the series to be analyzed
- Raises:
ValueError – If event_params are invalid.
- Returns:
Lists of event start and end indices, and boolean entry state.
- Return type:
Tuple[List[int], List[int], bool]
This is the core of the event finder. You will be given a segment of data as well as a series of related arguments, and you must write a function that flags the start and end of all events in that data chunk. Bear in mind that events might straddle more than one data chunk. The entry_state argument encodes whether or not the previous data chunk ended inside an event, and the first_chunk argument encodes whether this is the first call to this function. You are also given, as inputs, the mean and standard deviation of the chunk as determined by your implementation of _get_baseline_stats().
Your function must return two lists and a boolean: the integer indices of the starts and ends of all events flagged in that chunk, and a boolean flag indicating whether or not the chunk ended partway through an event. These lists can be different lengths since, as noted previously, your chunk could have events that straddle the start, the end, or both, of the chunk.
You are responsible only for flagging the start and end of events that are present in the given data chunk; the base class will handle stitching them all together.
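A minimal threshold-crossing sketch, assuming a hypothetical fixed threshold of n_std baseline standard deviations below the mean (real event finders may use hysteresis or other criteria). Because the data is rectified, a blockage is a drop below the threshold. Indices are made global by adding offset, the inherited entry_state is prepended so that a transition at the first sample is caught, and the returned flag reports whether the chunk ended inside an event. first_chunk is accepted but unused here.

```python
from typing import List, Tuple

import numpy as np
import numpy.typing as npt


def find_events_in_chunk(
    data: npt.NDArray[np.float64],
    mean: float,
    std: float,
    offset: int,
    entry_state: bool = False,
    first_chunk: bool = False,
    n_std: float = 5.0,  # hypothetical: threshold depth in baseline standard deviations
) -> Tuple[List[int], List[int], bool]:
    if mean <= 0:
        raise ValueError("mean must be positive")
    below = data < mean - n_std * std  # True wherever the current is blocked
    # Prepend the state inherited from the previous chunk, then look for transitions
    padded = np.concatenate(([entry_state], below)).astype(np.int8)
    edges = np.diff(padded)
    starts = (np.flatnonzero(edges == 1) + offset).tolist()  # open -> blocked
    ends = (np.flatnonzero(edges == -1) + offset).tolist()  # blocked -> open
    exit_state = bool(below[-1]) if below.size else entry_state
    return starts, ends, exit_state
```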
- abstractmethod MetaEventFinder._get_baseline_stats(data: ndarray[tuple[int, ...], dtype[float64]]) → tuple[float, float]
- Parameters:
data (npt.NDArray[np.float64]) – Chunk of timeseries data to compute statistics on.
- Returns:
Tuple of the mean and standard deviation of the baseline.
- Return type:
tuple[float, float]
This function must calculate and return the mean and standard deviation of the baseline for the given chunk of data, excluding any events present in the chunk. These values are used downstream to determine where the baseline deviates from the open pore current. By default, MetaEventFinder assumes a Gaussian distribution of baseline noise. You may assume that the data is rectified.
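One possible approach, not necessarily what any shipped poriscope plugin does, is iterative sigma clipping: repeatedly compute the mean and standard deviation, discard samples more than n_sigma standard deviations below the mean (blockages, since the data is rectified), and stop once nothing more is removed. n_sigma and max_iter are hypothetical tuning parameters, and the Gaussian-noise assumption mentioned above is what makes the standard deviation a sensible clipping scale.

```python
import numpy as np
import numpy.typing as npt


def get_baseline_stats(
    data: npt.NDArray[np.float64], n_sigma: float = 3.0, max_iter: int = 10
) -> tuple[float, float]:
    baseline = np.asarray(data, dtype=np.float64)
    if baseline.size == 0:
        raise ValueError("cannot compute baseline statistics on empty data")
    for _ in range(max_iter):
        mean, std = baseline.mean(), baseline.std()
        keep = baseline >= mean - n_sigma * std  # drop downward excursions only
        if keep.all():
            break  # converged: no more samples to exclude
        baseline = baseline[keep]  # never empty: the maximum is always >= the cutoff
    return float(baseline.mean()), float(baseline.std())
```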
- abstractmethod MetaEventFinder._init() → None
Purpose: Perform generic class construction operations.
All data plugins have this function and must provide an implementation. This is called immediately at the start of class creation and is used to do whatever is required to set up your event finder. Note that no app settings are available when this is called, so this function should be used only for generic class construction operations. Most event finders simply pass this function.
- abstractmethod MetaEventFinder._validate_settings(settings: dict) → None
Validate that the settings dict contains the correct information for use by the subclass.
- Parameters:
settings (dict) – Parameters for event detection.
- Raises:
ValueError – If the settings dict does not contain the correct information.
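A minimal sketch for the example settings shown under get_empty_settings(), assuming each entry is still the nested dict with the chosen value under its Value key (an assumption; check how your plugin actually receives settings). It checks only cross-field consistency that the per-key Type, Min, and Max entries cannot express.

```python
from typing import Any, Dict


def validate_settings(settings: Dict[str, Dict[str, Any]]) -> None:
    """Hypothetical sketch: raise ValueError if the event finder settings are inconsistent."""
    required = ("Threshold", "Min Duration", "Max Duration", "Min Separation")
    missing = [key for key in required if key not in settings]
    if missing:
        raise ValueError(f"missing settings: {', '.join(missing)}")
    if settings["Threshold"]["Value"] is None:
        raise ValueError("Threshold must be set")
    min_duration = settings["Min Duration"]["Value"]
    max_duration = settings["Max Duration"]["Value"]
    if min_duration > max_duration:
        raise ValueError("Min Duration must not exceed Max Duration")
```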
Concrete Methods
- MetaEventFinder.__init__(settings: dict | None = None) → None
Initialize the MetaEventFinder instance.
- MetaEventFinder._finalize_initialization() → None
This function is called at the end of the class constructor to perform additional initialization specific to the algorithm being implemented. kwargs provided to the base class constructor are available as class attributes.
Purpose: Apply application-specific settings to the plugin, if needed.
If additional initialization operations are required beyond the defaults provided in BaseDataPlugin or MetaEventFinder that must occur after settings have been applied to the event finder instance, you can override this function to add those operations, subject to the caveat below.
Warning
This function implements core functionality required for broader plugin integration into Poriscope. If you do need to override it, you MUST call super()._finalize_initialization() before any additional code that you add, and take care to understand the implementation of both apply_settings() and _finalize_initialization() before doing so to ensure that you are not conflicting with those functions. Your override should raise an exception if initialization fails.
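A minimal sketch of a compliant override: call the parent first, then derive values that depend on the applied settings. Here, hypothetically, the microsecond durations from the example settings are converted into sample counts using get_samplerate(). The stub base class, its settings attribute layout, and the min_duration_samples and max_duration_samples attribute names are assumptions that only make the example runnable.

```python
from typing import Any, Dict


class StubFinderBase:
    """Hypothetical stand-in for MetaEventFinder after settings have been applied."""

    def __init__(self, settings: Dict[str, Dict[str, Any]], samplerate: float) -> None:
        self.settings = settings
        self._samplerate = samplerate
        self._finalize_initialization()

    def _finalize_initialization(self) -> None:
        pass  # the real base class does essential work here

    def get_samplerate(self) -> float:
        return self._samplerate


class DurationFinder(StubFinderBase):
    def _finalize_initialization(self) -> None:
        super()._finalize_initialization()  # MUST come first
        samplerate = self.get_samplerate()
        if samplerate <= 0:
            raise ValueError("samplerate must be positive")  # raise if initialization fails
        # Convert microsecond settings to whole numbers of samples
        self.min_duration_samples = round(self.settings["Min Duration"]["Value"] * 1e-6 * samplerate)
        self.max_duration_samples = round(self.settings["Max Duration"]["Value"] * 1e-6 * samplerate)
```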
- MetaEventFinder._find_events_single_range(channel: int, start: float = 0, end: float = 0, chunk_length: float = 1.0, data_filter: Callable | None = None) → Generator[float, bool | None, None]
Set up a generator that will walk through all provided data and find events, yielding its percentage completion each time next() is called on it. If silent flag is set, run through without yielding progress reports on the first call to next(). Once StopIteration is reached, internal lists of event starts and ends will be populated as entries in a dict keyed by channel index.
- MetaEventFinder._get_padding_length(event_starts: List[int], event_ends: List[int], last_end: int, last_duration: int, samplerate: float, last_call: bool = False, last_sample: int = 0) → Tuple[List[int], List[int], int, int]
Determine the number of data points before and after an event to use for visual padding.
- Parameters:
event_starts (List[int]) – List of start indices of events in a chunk of data, referenced from the start of the file. May contain events which are later rejected.
event_ends (List[int]) – List of end indices of events in a chunk of data, referenced from the start of the file. May contain events which are later rejected.
last_end (int) – index of the end of the last event detected in the previous chunk
samplerate (float) – Sampling rate for the reader in question
last_call (bool) – is this the last time the function will be called?
last_sample (int) – what is the value of the last data index in the channel? Can be 0 if last_call is False.
- Returns:
a list of padding-before values and a list of padding-after values that do not conflict with neighbouring events, whether good or bad. Also an int for the amount of padding to add to the trailing end of events already saved, or None if this is not necessary, and the value of the last event end
- Return type:
Tuple[List[int], List[int], int, int]
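The core idea, that padding must not run into neighbouring events, can be sketched as follows. This simplified, hypothetical version ignores the chunk-stitching arguments (last_end, last_duration, samplerate, last_call) and caps a hypothetical default pad_length on each side at half the gap to the neighbouring event, so padded windows never overlap; the real method's rules may differ.

```python
from typing import List, Tuple


def padding_lengths(
    event_starts: List[int],
    event_ends: List[int],
    pad_length: int,
    last_sample: int,
) -> Tuple[List[int], List[int]]:
    """Hypothetical sketch: per-event padding before/after, capped by neighbouring events."""
    before: List[int] = []
    after: List[int] = []
    for i in range(len(event_ends)):
        prev_end = event_ends[i - 1] if i > 0 else None
        next_start = event_starts[i + 1] if i + 1 < len(event_starts) else None
        # Split each gap evenly between the two events that share it
        gap_before = (event_starts[i] - prev_end) // 2 if prev_end is not None else event_starts[i]
        gap_after = (next_start - event_ends[i]) // 2 if next_start is not None else last_sample - event_ends[i]
        before.append(max(0, min(pad_length, gap_before)))
        after.append(max(0, min(pad_length, gap_after)))
    return before, after
```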
- MetaEventFinder._merge_overlapping_ranges(ranges)
Merge a list of overlapping or adjacent (start, end) ranges into non-overlapping intervals.
Ranges with start >= end are filtered out as invalid. Overlapping or adjacent ranges are merged into a single continuous interval.
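The behaviour described above is the classic sort-then-sweep interval merge. A minimal sketch that matches the description, though not necessarily the exact MetaEventFinder code:

```python
from typing import List, Tuple


def merge_overlapping_ranges(ranges: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Sketch: merge overlapping or adjacent (start, end) ranges; drop ranges with start >= end."""
    valid = sorted((start, end) for start, end in ranges if start < end)
    merged: List[Tuple[float, float]] = []
    for start, end in valid:
        if merged and start <= merged[-1][1]:  # overlaps or touches the previous interval
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```

Sorting first means each range only needs to be compared with the last merged interval, giving O(n log n) time overall.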
- MetaEventFinder._validate_param_types(settings: dict) → None
Validate that the settings dict contains correct data types.
- Parameters:
settings (dict) – A dict specifying the parameters of the event finder to be created. Required keys depend on subclass.
- Raises:
TypeError – If the settings parameters are of the wrong type
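A minimal sketch of type checking against the nested settings structure returned by get_empty_settings(), assuming each entry carries its expected Type and chosen Value (an assumption about how settings arrive). Accepting ints for float settings is a hypothetical convenience, and None values are skipped because the example settings use None for values not yet set.

```python
from typing import Any, Dict


def validate_param_types(settings: Dict[str, Dict[str, Any]]) -> None:
    """Hypothetical sketch: raise TypeError if a setting's Value does not match its Type."""
    for key, spec in settings.items():
        expected = spec.get("Type")
        value = spec.get("Value")
        if expected is None or value is None:
            continue  # nothing to check yet
        if expected is float and isinstance(value, int) and not isinstance(value, bool):
            continue  # accept ints where floats are expected
        if not isinstance(value, expected):
            raise TypeError(f"{key}: expected {expected.__name__}, got {type(value).__name__}")
```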