Build a MetaReader subclass
- class poriscope.utils.MetaReader.MetaReader(settings: dict | None = None)
MetaReader is the base class for all things related to reading raw nanopore timeseries datafiles. It handles mapping groups of files that belong in the same experiment, separating them by channel in the case of multichannel experimental operations, and time-ordering files within a channel when many data files are written as part of a single experiment. Subsequently, it provides a common API through which to interact with that data, effectively standardizing data reading operations regardless of the source. Given the number of different file formats commonly in use in the nanopore field, this plugin will likely always have the largest number of subclasses.
What you get by inheriting from MetaReader
Regardless of the details of how your data is actually stored, MetaReader will provide a common and intuitive API with which to interact with it, stitching together all the files in your dataset to work seamlessly together as a single dataset. Datasets are broken down by channel ID and time, allowing slicing into data that might be spread across multiple files as though it were a single contiguous memory structure. Data can be retrieved either on an ad-hoc basis, or as a continuous generator that allows you to iterate through on demand. Metadata like sampling rate, the length of data available in each channel, etc., can be retrieved through the API directly.
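The idea of slicing across several files as though they were one contiguous array can be illustrated with a small standalone sketch. This is not MetaReader's actual implementation: `slice_across_files` is a hypothetical helper, and the chunks stand in for the time-ordered files of a single channel.

```python
import numpy as np


def slice_across_files(chunks, start, stop):
    """Return samples [start, stop) from a channel stored as time-ordered chunks.

    Hypothetical helper for illustration only; not part of poriscope.
    """
    pieces = []
    offset = 0  # global index of the first sample in the current chunk
    for chunk in chunks:
        lo = max(start - offset, 0)
        hi = min(stop - offset, len(chunk))
        if lo < hi:
            pieces.append(chunk[lo:hi])
        offset += len(chunk)
        if offset >= stop:
            break
    if not pieces:
        return np.empty(0, dtype=np.float64)
    return np.concatenate(pieces)


# Three "files" of 5, 3, and 4 samples belonging to the same channel
chunks = [np.arange(0, 5), np.arange(5, 8), np.arange(8, 12)]
window = slice_across_files(chunks, 3, 10)  # spans all three chunks
```

The caller only ever works with global sample indices; the bookkeeping of which file holds which samples stays hidden behind the slicing function.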
Required Public API Methods
- MetaReader.get_empty_settings(globally_available_plugins: Dict[str, List[str]] | None = None, standalone=False) Dict[str, Dict[str, Any]]
- Parameters:
- Returns:
the dict that must be filled in to initialize the reader
- Return type:
Dict[str, Dict[str, Any]]
Purpose: Provide a list of settings details to users to assist in instantiating an instance of your MetaReader subclass.
Get a dict populated with the keys needed to initialize the reader if they are not set yet. This dict must have the following structure. Value and Type are required; Min, Max, and Options can be skipped or explicitly set to None if they are not used. All values provided must be consistent with Type.

    settings = {
        'Parameter 1': {
            'Type': <int, float, str, bool>,
            'Value': <value> or None,
            'Options': [<option_1>, <option_2>, ... ] or None,
            'Min': <min_value> or None,
            'Max': <max_value> or None,
            'Units': <unit str> or None
        },
        ...
    }
Several parameter keywords are reserved: ‘Input File’, ‘Output File’, ‘Folder’, and all MetaClass names. These must have Type str and will cause the GUI to generate appropriate widgets to allow selection of these elements when used.
This function must return a dictionary of the settings required to initialize the reader, in the format specified above. Values in this dictionary can be accessed downstream through the self.settings class variable. This structure is a nested dictionary that supplies both values and a variety of information about those values, used by poriscope to perform sanity and consistency checking at instantiation.

While this function is technically not abstract in MetaReader, which already has an implementation that ensures settings will have the key required by MetaReader, in most cases you will need to override it to add any other settings required by your subclass. The implementation in MetaReader provides a single key, Input File, without specifying file options. If you need additional settings, or if you want to specify the file types that will show up in related file dialogs and be accepted as inputs (recommended), override this function to specify options. You MUST call settings = super().get_empty_settings(globally_available_plugins, standalone) first, which ensures the existence of the “Input File” key. For example:

    settings = super().get_empty_settings(globally_available_plugins, standalone)
    settings["Input File"]["Options"] = ["ABF2 Files (*.abf)"]
    settings["Your Key"] = {"Type": float, "Value": None, "Min": 0.0, "Units": "pA"}
    return settings

This ensures that you have the key specified above (Your Key) as well as the Input File key required by readers. You can learn more about formatting input file option strings in the PySide6 module documentation. In the case of multiple file types, supply the relevant strings as a comma-separated list in the “Options” key; poriscope will handle formatting it for PySide6.
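To make the structural rules for a settings entry concrete, the following sketch checks one entry against them: Type and Value are required, a value that is set must match Type, and Min and Max are optional bounds. `check_settings_entry` is a hypothetical helper written for this illustration. It is not the check poriscope performs internally, which may differ.

```python
def check_settings_entry(name, entry):
    """Check one settings entry against the documented structure.

    Hypothetical helper for illustration only; not part of poriscope.
    """
    for key in ("Type", "Value"):
        if key not in entry:
            raise ValueError(f"{name}: missing required key '{key}'")
    value, expected = entry["Value"], entry["Type"]
    if value is None:
        return  # the value has not been filled in yet
    if not isinstance(value, expected):
        raise ValueError(f"{name}: value {value!r} is not of type {expected.__name__}")
    minimum = entry.get("Min")  # optional, may be absent or None
    if minimum is not None and value < minimum:
        raise ValueError(f"{name}: {value!r} is below Min {minimum!r}")
    maximum = entry.get("Max")  # optional, may be absent or None
    if maximum is not None and value > maximum:
        raise ValueError(f"{name}: {value!r} is above Max {maximum!r}")


settings = {"Threshold": {"Type": float, "Value": 2.5, "Min": 0.0, "Units": "pA"}}
for name, entry in settings.items():
    check_settings_entry(name, entry)  # passes silently
```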
- abstractmethod MetaReader.close_resources(channel: int | None = None) None
- Parameters:
channel (Optional[int]) – channel ID
Purpose: Clean up any open file handles or memory.
This is called during app exit or plugin deletion to ensure proper cleanup of resources that could otherwise leak. If channel is not None, handle only that channel, else close all of them. If no such operation is needed, it suffices to pass. Note that readers that operate based on memmaps need not explicitly close those memmaps, as they will be handled by the garbage collector, but it does no harm to do so. Any open file handles should be closed explicitly if not closed at the end of read operations.
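For readers that do keep file handles open, the per-channel versus all-channels behavior described above might look like the sketch below. `HandleBook` and its `handles` attribute are hypothetical stand-ins so the example runs without poriscope installed; in a real plugin the method would live on your MetaReader subclass.

```python
import io
from typing import Dict, IO, Optional


class HandleBook:
    """Hypothetical stand-in that tracks one open file handle per channel."""

    def __init__(self) -> None:
        self.handles: Dict[int, IO[bytes]] = {}

    def close_resources(self, channel: Optional[int] = None) -> None:
        # Close only the requested channel, or every channel if None is given
        channels = list(self.handles) if channel is None else [channel]
        for ch in channels:
            handle = self.handles.pop(ch, None)
            if handle is not None:
                handle.close()


book = HandleBook()
book.handles = {0: io.BytesIO(b"a"), 1: io.BytesIO(b"b")}  # in-memory stand-ins for files
first = book.handles[0]
book.close_resources(0)  # closes channel 0 only
```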
- abstractmethod MetaReader.reset_channel(channel: int | None = None) None
Perform any actions necessary to reset a channel before a new operation or run.
- Parameters:
channel (Optional[int]) – channel ID
Purpose: Reset the state of a specific channel for a new operation or run.
This is called any time an operation on a channel needs to be cleaned up or reset for a new run. If channel is not None, handle only that channel, else reset all of them. If reading through a channel does not create any persistent state changes in your plugin, you can simply pass in this function.
Required Private Methods
- abstractmethod MetaReader._init() None
Purpose: Perform generic class construction operations.
All data plugins have this function and must provide an implementation. This is called immediately at the start of class creation and is used to do whatever is required to set up your reader. Note that no app settings are available when this is called, so this function should be used only for generic class construction operations. Most readers simply pass in this function.
- abstractmethod MetaReader._set_file_extension() str
- Returns:
the file extension
- Return type:
str
Purpose: Set the file extension for the file type this reader plugin handles.
This is a simple function that allows you to set the file extension (including the leading dot) of the file type that this reader plugin will read. It is used by downstream functions while mapping your data to assist in identifying files. It should be a single line:
return ".ext"
If you need to refer to this value again, you can access it via the class variable self.file_extension.
- abstractmethod MetaReader._set_raw_dtype(configs: List[dict]) dtype
Set the data type for the raw data in files of this type
- Parameters:
configs (List[dict]) – List of configuration dictionaries corresponding to data files.
- Returns:
the dtype of the raw data in your data files
- Return type:
np.dtype
Purpose: Inform Poriscope what NumPy datatype to expect for raw data on disk.
This function tells Poriscope what datatype to expect on disk for downstream use by _map_data(). You should return a NumPy dtype object. For example, if you are using a 16-bit ADC code, you might return np.uint16. This is also a single-line function:
return np.uint16
If you need to refer to this value again, you can access it via the class variable self.dtype. For more details on NumPy dtypes, refer to the NumPy documentation on dtypes.
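When the on-disk format is not fixed, the configs argument can inform the choice of dtype. The sketch below assumes two hypothetical config keys, `bits` and `little_endian`, which you would have to populate yourself in _get_configs(); neither is required or defined by poriscope.

```python
import numpy as np


def pick_raw_dtype(configs):
    """Build an unsigned-integer dtype from the first config dict.

    'bits' and 'little_endian' are hypothetical keys used for illustration.
    """
    first = configs[0]
    byte_order = "<" if first.get("little_endian", True) else ">"
    n_bytes = first.get("bits", 16) // 8
    return np.dtype(f"{byte_order}u{n_bytes}")


raw_dtype = pick_raw_dtype([{"samplerate": 1e5, "bits": 16, "little_endian": True}])
```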
- abstractmethod MetaReader._get_file_pattern(file_name: str) str
- Parameters:
file_name (os.PathLike) – File name to get the base pattern for.
- Returns:
Base pattern for matching other files.
- Return type:
str
Purpose: Extract a glob pattern from an input filename to match all dataset files.
When you instantiate a reader plugin, you provide a single filename as input. However, in some cases, a dataset might comprise many files. This function requires you to extract a pattern from the given filename that can be used by glob to match all files belonging to your dataset.
If your dataset consists of only a single file, you can simply return the original filename.
Example:
Consider a scenario where your dataset files follow a pattern with channel numbers and serial numbers, such as:
experiment_1_channel_01_001.log
experiment_1_channel_01_002.log
experiment_1_channel_02_001.log
experiment_1_channel_02_002.log
In this case, you could return a glob pattern like:
experiment_1_channel_??_???.log
This pattern assumes the channel stamp will always be two digits and the serial number always three. If the lengths of these varying parts are uncertain, a more general pattern using wildcards would be:
experiment_1_channel_*_*.log
Poriscope will use this file pattern to search the folder of the input file for other files that match the pattern. It will not search outside of that folder.
For more information on glob patterns, refer to the glob module documentation.
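One possible way to derive the wildcard pattern from the example filenames above is to replace the trailing channel and serial numbers with a regular expression. This is a sketch under the assumption that filenames end in `_<channel>_<serial>` before the extension; `get_file_pattern` is written as a plain function here so it runs without poriscope, and whether the returned pattern should include the folder is not specified in this section.

```python
import fnmatch
import re
from pathlib import Path


def get_file_pattern(file_name: str) -> str:
    """Replace the trailing '_<channel>_<serial>' of the stem with wildcards."""
    path = Path(file_name)
    stem = re.sub(r"_\d+_\d+$", "_*_*", path.stem)
    return str(path.with_name(stem + path.suffix))


pattern = get_file_pattern("experiment_1_channel_01_001.log")
# fnmatch applies the same wildcard rules that glob uses for file names
sibling_matches = fnmatch.fnmatch("experiment_1_channel_02_002.log", pattern)
other_matches = fnmatch.fnmatch("experiment_2_channel_01_001.log", pattern)
```

A filename that does not end in two numeric fields is returned unchanged, which matches the single-file case described above.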
- abstractmethod MetaReader._get_configs(datafiles: List[PathLike]) List[dict]
- Parameters:
datafiles (List[os.PathLike]) – List of data files for which to load configurations.
- Returns:
List of configuration dictionaries.
- Return type:
List[dict]
Purpose: Extract configuration metadata from dataset files.
Given a list of filenames corresponding to the data files, construct a list of dictionaries containing any required configurations for use downstream. Your config dictionaries must have at a minimum the key ‘samplerate’ in them, and the list of configs must correspond one-to-one to the provided list of data files. All files in a dataset must have the same samplerate. Your reader will use these configs to map the data on disk, so you could include information like endianness, raw data type, details of any columns within the data, etc. Aside from the required samplerate key, this can be anything.
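The constraints on the returned configs (one dict per file, a samplerate key in each, and a single samplerate across the dataset) can be checked with a few lines. `check_configs` is a hypothetical helper for use while developing a reader; it is not part of poriscope.

```python
def check_configs(datafiles, configs):
    """Raise ValueError if configs violate the documented constraints.

    Hypothetical development aid; not part of poriscope.
    """
    if len(configs) != len(datafiles):
        raise ValueError("configs must correspond one-to-one to datafiles")
    rates = set()
    for path, config in zip(datafiles, configs):
        if "samplerate" not in config:
            raise ValueError(f"{path}: config is missing the 'samplerate' key")
        rates.add(config["samplerate"])
    if len(rates) > 1:
        raise ValueError(f"all files must share one samplerate, found {sorted(rates)}")


# 'little_endian' is an example of an extra, reader-specific key
check_configs(
    ["a_01_001.log", "a_01_002.log"],
    [{"samplerate": 1e5, "little_endian": True}, {"samplerate": 1e5, "little_endian": True}],
)
```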
- abstractmethod MetaReader._get_file_time_stamps(file_names: List[PathLike], configs: List[dict]) List[str | int | float | datetime | date | datetime64]
- Parameters:
file_names (List[os.PathLike]) – List of file names to get time stamps for.
configs (List[dict]) – List of configuration dictionaries corresponding to data files.
- Returns:
List of serialization keys for timestamps in almost any format.
- Return type:
List[Union[str, int, float, datetime.datetime, datetime.date, np.datetime64]]
Purpose: Extract time stamps for sorting files chronologically within a channel.
Given a list of all the files in the experiment and the list of config dictionaries you defined above, extract a corresponding list of timestamps. These timestamps will be used to time-order the mapped data within each channel. The list must have the same length as both input lists and must be of a type that can be sorted into the desired time-ordering using the built-in sort() method.
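If the time ordering is encoded as a serial number at the end of the filename, as in the experiment_1_channel_01_001.log naming used earlier, the stamps can be parsed directly from the names. The naming convention is an assumption of this sketch, and `serial_stamps` is a hypothetical standalone function; returning integers rather than strings keeps 010 sorted after 002 even if the zero padding is inconsistent.

```python
import re
from pathlib import Path


def serial_stamps(file_names):
    """Return the trailing serial number of each file name as an int."""
    stamps = []
    for name in file_names:
        match = re.search(r"_(\d+)$", Path(name).stem)
        if match is None:
            raise ValueError(f"no serial number found in {name}")
        stamps.append(int(match.group(1)))
    return stamps


files = [
    "experiment_1_channel_01_002.log",
    "experiment_1_channel_01_010.log",
    "experiment_1_channel_01_001.log",
]
stamps = serial_stamps(files)
# Sorting by stamp yields the time-ordered file list
ordered = [name for _, name in sorted(zip(stamps, files))]
```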
- abstractmethod MetaReader._get_file_channel_stamps(file_names: List[PathLike], configs: List[dict]) List[int]
- Parameters:
file_names (List[os.PathLike]) – List of file names to get channel stamps for.
configs (List[dict]) – List of configuration dictionaries corresponding to data files.
- Returns:
List of serialization keys for channels
- Return type:
List[int]
Purpose: Extract channel identifiers for grouping files by channel.
Given a list of all the files in the experiment and the list of config dictionaries you defined above, extract a corresponding list of channel identifiers as integers. These channel indices will be used to group the mapped data by channel. The list must have the same length as both input lists and must be a list of integers.
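Channel identifiers can be extracted in the same way when they appear in the filename. The sketch below assumes a `_channel_<number>_` fragment in each name, following the example naming used earlier; `channel_stamps` is a hypothetical standalone function, and the grouping loop only illustrates how such stamps partition the files.

```python
import re
from collections import defaultdict
from pathlib import Path


def channel_stamps(file_names):
    """Return the integer channel ID embedded in each file name."""
    stamps = []
    for name in file_names:
        match = re.search(r"_channel_(\d+)_", Path(name).name)
        if match is None:
            raise ValueError(f"no channel number found in {name}")
        stamps.append(int(match.group(1)))
    return stamps


files = [
    "experiment_1_channel_01_001.log",
    "experiment_1_channel_01_002.log",
    "experiment_1_channel_02_001.log",
    "experiment_1_channel_02_002.log",
]
channels = channel_stamps(files)

# Group file names by channel ID
by_channel = defaultdict(list)
for channel, name in zip(channels, files):
    by_channel[channel].append(name)
```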
- abstractmethod MetaReader._map_data(datafiles: List[PathLike], configs: List[dict]) List[ndarray[tuple[int, ...], dtype[Any]]]
- Parameters:
datafiles (List[os.PathLike]) – List of data files to map.
configs (List[dict]) – List of configuration dictionaries corresponding to data files.
- Returns:
List of memmaps or numpy arrays mapped from data files.
- Return type:
List[numpy.ndarray]
Purpose: Map the provided data files into an accessible format, preferably memory-mapped views.
Using all the information provided in the implementations so far, this function maps the list of files provided in datafiles according to the information given in configs. You can assume that the lists are of equal length and that the config at a given index corresponds to the data file at the same index. You must return a list of views into those files. We strongly encourage the use of memmap where possible, in which case you may return a list of such memmaps with length equal to the input list of filenames.
Warning
This function expects that the elements of the returned list can be indexed and sliced into like NumPy arrays, hence the suggestion to use memmaps, which avoid the need to actually load raw data into RAM before it is needed. In cases where memmap is not an option, you must still return a NumPy array for each file, which may involve significant memory consumption. If this is impractical, it is possible to override this function to return, for example, a list of file handles instead, with the caveat that this will in turn require that you completely override load_data() as well to properly handle your file access method manually.
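A minimal memmap-based mapping might look like the following sketch. It assumes flat binary files with an optional fixed-size header whose length is stored under a hypothetical `header_bytes` config key; `map_files` is a standalone function here, whereas in a plugin the dtype would come from self.dtype.

```python
import os
import tempfile

import numpy as np


def map_files(datafiles, configs, dtype):
    """Return one read-only memmap per data file."""
    maps = []
    for path, config in zip(datafiles, configs):
        offset = config.get("header_bytes", 0)  # hypothetical key: bytes to skip
        maps.append(np.memmap(path, dtype=dtype, mode="r", offset=offset))
    return maps


# Demonstration on a temporary file holding ten 16-bit samples
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "demo.bin")
    np.arange(10, dtype=np.uint16).tofile(path)
    mapped = map_files([path], [{"samplerate": 1000.0}], np.uint16)
    n_samples = len(mapped[0])
    first_three = mapped[0][:3].tolist()  # only this slice is read into RAM
    del mapped  # release the mapping before the folder is removed
```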
- abstractmethod MetaReader._convert_data(data: ndarray[tuple[int, ...], dtype[int16]], config: dict, raw_data: bool = False) ndarray[tuple[int, ...], dtype[float64]]
- Parameters:
- Returns:
Converted data, and scale and offset if and only if raw_data is True
- Return type:
Purpose: Convert raw data from disk format to a usable numerical format.
Given a numpy array of raw data extracted from one of the memmap instances you defined in the previous function, along with its associated config dict, provide a means to turn this raw data into a numpy array of numpy.float64 double precision floats. For this purpose, if convenient, you can use the _scale_data() function, which will apply bitmasks, multiply data by a scaling factor, and add an offset, like so:

    def _scale_data(self, data: npt.NDArray[Any], copy: Optional[bool] = True, bitmask: Optional[np.uint64] = None, dtype: Optional[str] = None, scale: Optional[float] = None, offset: Optional[float] = None, raw_data: Optional[bool] = False) -> npt.NDArray[Any]:
        if bitmask == 0:
            bitmask = None
        if not raw_data:
            if copy:
                data = np.copy(data)
            if bitmask is not None:
                data = np.bitwise_and(data.astype(type(bitmask)), bitmask)
            if dtype is not None:
                data = data.astype(dtype)
            if scale is not None:
                data *= scale
            if offset is not None:
                data += offset
            return data
        else:
            if not dtype:
                raise ValueError('Specify dtype to retrieve raw data')
            return data

If raw_data is True, your function must also return a scale and offset factor, like so:

    if raw_data:
        return data, scale, offset
    else:
        return data
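As a standalone illustration of the two return shapes, the sketch below converts signed 16-bit ADC codes using a linear scale and offset. The `scale` and `offset` config keys are hypothetical and would need to be populated by your own _get_configs(); the function does not use _scale_data() so that it runs without poriscope.

```python
import numpy as np


def convert_data(data, config, raw_data=False):
    """Convert ADC codes to float64, or hand back raw codes with their factors."""
    scale = config["scale"]    # hypothetical key: units per ADC code
    offset = config["offset"]  # hypothetical key: value at code 0
    if raw_data:
        return data, scale, offset
    return data.astype(np.float64) * scale + offset


codes = np.array([0, 100, -100], dtype=np.int16)
config = {"samplerate": 1e5, "scale": 0.5, "offset": 1.0}
current = convert_data(codes, config)
raw, scale, offset = convert_data(codes, config, raw_data=True)
```

Returning the raw codes together with scale and offset lets a caller defer the float conversion, which keeps memory use at the size of the on-disk dtype until converted values are actually needed.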
- abstractmethod MetaReader._validate_settings(settings: dict) None
Validate that the settings dict contains the correct information for use by the subclass.
- Parameters:
settings (dict) – Settings for the reader.
- Raises:
ValueError – If the settings dict does not contain the correct information.
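What counts as correct settings is up to the subclass. As one hedged example, a reader that only accepts .abf files might check the Input File entry as sketched below; the extension and the specific checks are assumptions chosen for illustration, and `validate_settings` is written as a plain function so it runs without poriscope.

```python
def validate_settings(settings: dict) -> None:
    """Raise ValueError if the 'Input File' entry is missing, empty, or not .abf."""
    entry = settings.get("Input File")
    value = entry.get("Value") if entry else None
    if not value:
        raise ValueError("'Input File' must be set before the reader can be used")
    if not str(value).lower().endswith(".abf"):  # assumed extension for this sketch
        raise ValueError(f"expected a .abf file, got {value!r}")


validate_settings({"Input File": {"Type": str, "Value": "trace.abf"}})  # passes silently
```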
Optional Method Overrides
Methods in this section have an implementation in either BaseDataPlugin or MetaReader, but they can be overridden if necessary to tweak the behavior of your plugin.
- MetaReader.force_serial_channel_operations() bool
- Returns:
True if only one channel can run at a time, False otherwise
- Return type:
bool
Purpose: Indicate whether operations on different channels must be serialized (not run in parallel).
By default this simply returns False, meaning that it is acceptable and thread-safe to run operations on different channels in different threads on this plugin. If such operation is not thread-safe, this function should be overridden to simply return True. Most readers are thread-safe, since reading from a file on disk usually is, and therefore no override is necessary.
- MetaReader._finalize_initialization() None
Purpose: Apply application-specific settings to the plugin, if needed.
If additional initialization operations are required beyond the defaults provided in BaseDataPlugin or MetaReader that must occur after settings have been applied to the reader instance, you can override this function to add those operations, subject to the caveat below.
Warning
This function implements core functionality required for broader plugin integration into Poriscope. If you do need to override it, you MUST call super()._finalize_initialization() before any additional code that you add, and take care to understand the implementation of both apply_settings() and _finalize_initialization() before doing so to ensure that you are not conflicting with those functions.