MetaDatabaseLoader

class MetaDatabaseLoader(settings: Optional[dict] = None)

Bases: BaseDataPlugin

What you get by inheriting from MetaDatabaseLoader

MetaDatabaseLoader is the base class for loading the data written by a MetaDatabaseWriter subclass instance or any other method that produces an equivalent format.

Poriscope ships with SQLiteDBLoader, a subclass of MetaDatabaseLoader that reads data written by the SQLiteDBWriter subclass. While additional subclasses can read almost any format you desire, we strongly encourage standardization around this format. Think twice before creating additional subclasses of this base class. It is not sufficient to write just a MetaDatabaseLoader subclass: you will also need a paired MetaDatabaseWriter subclass to write data in your target format.

Public Methods

Abstract Methods

These methods must be implemented by subclasses.

abstractmethod MetaDatabaseLoader.add_columns_to_table(df: DataFrame, units: List[str | None], table_name: str) → bool

Parameters:

df (pd.DataFrame) – A pandas DataFrame. Must contain an ‘id’ column corresponding to the primary key of the target table, and one or more additional columns to be added.
units (List[Optional[str]]) – A list of strings specifying units for the new columns to be added. Must have length equal to the number of new columns, but can contain None values.
table_name (str) – The name of the SQLite table to modify. This table must already exist in the database.

Returns:

True on success, False otherwise

Return type:

bool

Raises:

ValueError – If the DataFrame does not contain an ‘id’ column or if the specified table does not exist.
IOError – If any write-related error occurs

Purpose: Adds new columns from a pandas DataFrame to an existing SQLite table

Create new columns in the specified table and populate them with the provided data, matching the ‘id’ column against the primary key of the target table.
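A minimal sketch of this behaviour, using only the standard-library sqlite3 module. It is not the SQLiteDBLoader implementation: the helper name add_columns, the plain dict of column values standing in for the DataFrame, and the REAL column type are all assumptions made for illustration.

```python
import sqlite3
from typing import Dict, List


def add_columns(conn: sqlite3.Connection, table_name: str,
                ids: List[int], new_columns: Dict[str, List[float]]) -> bool:
    """Hypothetical helper: add columns to table_name and fill them by id."""
    cur = conn.cursor()
    # Refuse to touch a table that does not exist yet
    cur.execute("SELECT name FROM sqlite_master WHERE type='table' AND name=?",
                (table_name,))
    if cur.fetchone() is None:
        raise ValueError(f"table {table_name!r} does not exist")
    try:
        for name, values in new_columns.items():
            # Identifiers cannot be bound as parameters, so they are quoted
            cur.execute(f'ALTER TABLE "{table_name}" ADD COLUMN "{name}" REAL')
            cur.executemany(
                f'UPDATE "{table_name}" SET "{name}" = ? WHERE id = ?',
                list(zip(values, ids)),
            )
        conn.commit()
        return True
    except sqlite3.Error:
        conn.rollback()
        return False


# Demo on an in-memory database with an invented 'events' table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, duration REAL)")
conn.executemany("INSERT INTO events (id, duration) VALUES (?, ?)",
                 [(1, 0.5), (2, 1.5)])
conn.commit()
ok = add_columns(conn, "events", [1, 2], {"snr": [10.0, 12.5]})
rows = conn.execute("SELECT id, snr FROM events ORDER BY id").fetchall()
```

A real implementation would also need to record the units for each new column; where those are stored depends on the database format and is not shown here.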

abstractmethod MetaDatabaseLoader.alter_database(queries: List[str]) → bool

Parameters:: queries (List[str]) – a list of queries to run on the database
Returns:: True if the operation succeeded, False otherwise
Return type:: bool

Purpose: Run a given list of queries on the database. There is no validation here, use it sparingly.
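One way a subclass might satisfy this contract for an SQLite backend is sketched below. The helper run_queries and the notes table are hypothetical. Note that in Python's default sqlite3 transaction handling, data-modifying statements are rolled back together on failure, whereas a DDL statement executed while no transaction is open is committed immediately.

```python
import sqlite3
from typing import List


def run_queries(conn: sqlite3.Connection, queries: List[str]) -> bool:
    """Hypothetical helper: run every query, reporting success as a bool."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            for query in queries:
                conn.execute(query)
        return True
    except sqlite3.Error:
        return False


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT)")
good = run_queries(conn, ["INSERT INTO notes (body) VALUES ('a')"])
# The second statement fails, so the first insert of this batch is undone
bad = run_queries(conn, ["INSERT INTO notes (body) VALUES ('b')",
                         "INSERT INTO missing_table VALUES (1)"])
count = conn.execute("SELECT COUNT(*) FROM notes").fetchone()[0]
```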

abstractmethod MetaDatabaseLoader.close_resources(channel: int | None = None) → None

Parameters:: channel (Optional[int]) – channel ID

Purpose: Clean up any open file handles or memory on app exit.

This is called during app exit or plugin deletion to ensure proper cleanup of resources that could otherwise leak. Do this for all channels if no channel is specified, otherwise limit your closure to the specified channel. If no such operation is needed, it suffices to pass.

abstractmethod MetaDatabaseLoader.get_channels_by_experiment(experiment: str) → List[int] | None

Parameters:: experiment (str) – The name of the experiment.
Returns:: List of channel IDs.
Return type:: Optional[List[int]]

Purpose: Retrieve a list of all channel identifiers (the identifier, not the primary key of the channels table) associated with a given experiment name or None on failure

abstractmethod MetaDatabaseLoader.get_column_names_by_table(table: str | None = None) → List[str] | None

Parameters:: table (Optional[str]) – The name of the table.
Returns:: List of column names.
Return type:: Optional[List[str]]

Purpose: Retrieve the column names available in a specified table, all columns in the database if table is not specified, or None on failure

abstractmethod MetaDatabaseLoader.get_column_units(column_name: str) → str | None

Parameters:: column_name (str) – The name of the column.
Returns:: The units of the column.
Return type:: Optional[str]

Purpose: Retrieve the units associated with a specific column name or None on failure

abstractmethod MetaDatabaseLoader.get_empty_settings(globally_available_plugins: Dict[str, List[str]] | None = None, standalone=False) → Dict[str, Dict[str, Any]]

Parameters:

globally_available_plugins (Optional[ Dict[str, List[str]]]) – a dict containing all data plugins that exist to date, keyed by metaclass. Must include “MetaReader” as a key, with explicitly set Type MetaReader.
standalone (bool) – False if this is called as part of a GUI, True otherwise. Default False

Returns:

the dict that must be filled in to initialize the loader

Return type:

Dict[str, Dict[str, Any]]

Purpose: Provide a list of settings details to users to assist in instantiating an instance of your MetaDatabaseLoader subclass.

Get a dict populated with keys needed to initialize the loader if they are not set yet. This dict must have the following structure, but Min, Max, and Options can be skipped or explicitly set to None if they are not used. Value and Type are required. All values provided must be consistent with Type.

settings = {'Parameter 1': {'Type': <int, float, str, bool>,
                                'Value': <value> or None,
                                'Options': [<option_1>, <option_2>, ... ] or None,
                                'Min': <min_value> or None,
                                'Max': <max_value> or None
                               },
               ...
               }

Several parameter keywords are reserved: these are

‘Input File’ ‘Output File’ ‘Folder’

These must have Type str and will cause the GUI to generate widgets to allow selection of these elements when used

This function must return a dictionary of the settings required to initialize the loader, in the specified format. Values in this dictionary can be accessed downstream through the self.settings class variable. This structure is a nested dictionary that supplies both values and a variety of information about those values, used by poriscope to perform sanity and consistency checking at instantiation.
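To make the structure concrete, here is a minimal sketch of the kind of consistency check such a nested dict allows. The function check_settings and the 'Cache Size' parameter are hypothetical and not part of poriscope; the checks poriscope actually performs may differ.

```python
from typing import Any, Dict, List


def check_settings(settings: Dict[str, Dict[str, Any]]) -> List[str]:
    """Hypothetical checker: return a list of problems, empty if none."""
    problems = []
    for name, spec in settings.items():
        if "Type" not in spec or "Value" not in spec:
            problems.append(f"{name}: 'Type' and 'Value' are required")
            continue
        value, expected = spec["Value"], spec["Type"]
        if value is None:
            continue  # not set yet, nothing further to check
        if not isinstance(value, expected):
            problems.append(f"{name}: value is not of type {expected.__name__}")
            continue
        # Options, Min and Max may be missing or None, as described above
        options = spec.get("Options")
        if options is not None and value not in options:
            problems.append(f"{name}: value is not one of the options")
        low, high = spec.get("Min"), spec.get("Max")
        if low is not None and value < low:
            problems.append(f"{name}: value is below Min")
        if high is not None and value > high:
            problems.append(f"{name}: value is above Max")
    return problems


settings = {
    "Input File": {"Type": str, "Value": "events.sqlite3"},
    "Cache Size": {"Type": int, "Value": 500, "Min": 1, "Max": 100},
}
problems = check_settings(settings)
```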

While this function is technically not abstract in MetaDatabaseLoader, which already has an implementation that ensures that settings will have the required Input File key available to users, in most cases you will need to override it to add any other settings required by your subclass or to specify which file types are allowed. If you need additional settings, which you almost certainly do, you MUST call super().get_empty_settings(globally_available_plugins, standalone) before any additional code that you add. For example, your implementation could look like this, to limit it to sqlite files:

settings = super().get_empty_settings(globally_available_plugins, standalone)
settings["Input File"]["Options"] = [
                        "SQLite3 Files (*.sqlite3)",
                        "Database Files (*.db)",
                        "SQLite Files (*.sqlite)",
                        ]
return settings

which will ensure that you have the Input File key and limit visible options to SQLite files. By default, it will accept any file type as input, hence the specification of the Options key for Input File in the example above.
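The same pattern extends to extra settings. The sketch below shows an override in the context of a class; StandInLoaderBase is a hypothetical stand-in for MetaDatabaseLoader so that the example runs without poriscope installed, and the 'Timeout' setting is invented for illustration.

```python
from typing import Any, Dict, List, Optional


class StandInLoaderBase:
    """Hypothetical stand-in for MetaDatabaseLoader, for illustration only."""

    def get_empty_settings(self, globally_available_plugins=None, standalone=False):
        # Mimics the documented behaviour: the parent supplies 'Input File'
        return {"Input File": {"Type": str, "Value": None}}


class MyLoader(StandInLoaderBase):
    def get_empty_settings(
        self,
        globally_available_plugins: Optional[Dict[str, List[str]]] = None,
        standalone: bool = False,
    ) -> Dict[str, Dict[str, Any]]:
        # Call the parent first so that 'Input File' is present
        settings = super().get_empty_settings(globally_available_plugins, standalone)
        settings["Input File"]["Options"] = ["SQLite3 Files (*.sqlite3)"]
        # 'Timeout' is an invented extra setting, not a poriscope key
        settings["Timeout"] = {"Type": float, "Value": 5.0, "Min": 0.0}
        return settings


settings = MyLoader().get_empty_settings()
```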

abstractmethod MetaDatabaseLoader.get_event_counts_by_experiment_and_channel(experiment: str | None = None, channel: int | None = None) → int

Parameters:

experiment (Optional[str]) – The name of the experiment.
channel (Optional[int]) – The index of the channel

Returns:

event count matching the conditions

Return type:

int

Purpose: Return the number of events in the database matching the experiment name and channel identifier.

If no channel is provided, count across all channels for that experiment. If no experiment is provided, ignore channel and return the number of events in the entire database.
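The fallback rules can be sketched as follows. The helper count_events and the simplified single-table schema, with experiment and channel stored directly on each event row, are assumptions for illustration; the format described in this document references experiments and channels by id.

```python
import sqlite3
from typing import Optional


def count_events(conn: sqlite3.Connection,
                 experiment: Optional[str] = None,
                 channel: Optional[int] = None) -> int:
    """Hypothetical helper mirroring the documented fallback rules."""
    if experiment is None:
        # No experiment: ignore channel and count everything
        query, params = "SELECT COUNT(*) FROM events", ()
    elif channel is None:
        # Experiment only: count across all of its channels
        query = "SELECT COUNT(*) FROM events WHERE experiment = ?"
        params = (experiment,)
    else:
        query = "SELECT COUNT(*) FROM events WHERE experiment = ? AND channel = ?"
        params = (experiment, channel)
    return conn.execute(query, params).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, experiment TEXT, channel INTEGER)")
conn.executemany("INSERT INTO events (experiment, channel) VALUES (?, ?)",
                 [("a", 1), ("a", 2), ("b", 1)])
```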

abstractmethod MetaDatabaseLoader.get_experiment_names(experiment_id: int | None = None) → List[str] | None

Parameters:: experiment_id (Optional[int]) – the id of the experiment for which to fetch the name
Returns:: List of experiment names, or None on failure
Return type:: Optional[List[str]]

Purpose: Retrieve a list of all unique experiment names registered in the database, or a singleton list if an id is given.

abstractmethod MetaDatabaseLoader.get_llm_prompt() → str

Returns:: a prompt that gives an LLM context for the database and how to query it
Return type:: str

Purpose: Return a prompt that will tell the LLM the structure of the database to be queried to assist users in accessing the data written in your format

abstractmethod MetaDatabaseLoader.get_samplerate_by_experiment_and_channel(experiment: str, channel: int) → float | None

Parameters:

experiment (str) – The name of the experiment in the database.
channel (int) – The channel id to get sampling rate for.

Returns:

sampling rate for the specific experiment-channel combination, or None on failure

Return type:

Optional[float]

Purpose: Retrieve the sampling rate for a given experiment and channel id, or None on failure

abstractmethod MetaDatabaseLoader.get_table_by_column(column: str) → str | None

Parameters:: column (str) – The name of the column.
Returns:: The name of the table containing the column, or None on failure.
Return type:: Optional[str]

Purpose: Retrieve the name of the table in which the given column is found, or None on failure

abstractmethod MetaDatabaseLoader.get_table_names() → List[str] | None

Returns:: List of table names.
Return type:: Optional[List[str]]

Purpose: Retrieve the names of available tables in the database or None on failure.

abstractmethod MetaDatabaseLoader.reset_channel(channel: int | None = None) → None

Parameters:: channel (Optional[int]) – channel ID

Purpose: Reset the state of a specific channel for a new operation or run.

This is called any time an operation on a channel needs to be cleaned up or reset for a new run. If channel is not None, handle only that channel, else reset all of them. In most cases for MetaDatabaseLoaders there is no need to reset and you can simply pass.

abstractmethod MetaDatabaseLoader.validate_filter_query(query: str) → Tuple[bool, str]

Parameters:: query (str) – The SQL query string.
Returns:: True, "" if the query is valid, and False, "[[helpful explanation]]" if it is not
Return type:: Tuple[bool, str]

Purpose: Validate a SQL query without executing it.

Return True, "" if the query is valid, and False, "[[helpful explanation]]" if it is not
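For an SQLite backend, one possible approach (an assumption, not necessarily what SQLiteDBLoader does) is to prefix the query with EXPLAIN, which makes SQLite compile the statement and report syntax or schema errors without executing it. The helper validate_query and the events table below are hypothetical.

```python
import sqlite3
from typing import Tuple


def validate_query(conn: sqlite3.Connection, query: str) -> Tuple[bool, str]:
    """Hypothetical helper: compile the query with EXPLAIN instead of running it."""
    try:
        # EXPLAIN makes SQLite prepare the statement and return its bytecode
        # program; the statement itself is not executed
        conn.execute("EXPLAIN " + query)
    except sqlite3.Error as exc:
        return False, f"invalid query: {exc}"
    return True, ""


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, duration REAL)")
conn.execute("INSERT INTO events (duration) VALUES (0.5)")
conn.commit()

valid = validate_query(conn, "SELECT id FROM events WHERE duration > 1")
invalid = validate_query(conn, "SELECT nope FROM events")
# A destructive statement is only compiled, so the row survives
destructive = validate_query(conn, "DELETE FROM events")
remaining = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
```

This only confirms that the statement compiles against the current schema; restricting which kinds of statements are acceptable as a filter is a separate decision for the subclass.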

Concrete Methods

MetaDatabaseLoader.construct_event_data_query(conditions: str | None = None, experiments_and_channels: Dict[str, List[int] | None] | None = None) → Tuple[str, str]

Construct a query that will get all event data matching a set of conditions

Parameters:

conditions (Optional[str]) – Optional filter condition for query.
experiments_and_channels – a dict with experiment names as keys and lists of channels to include as values. Can be None, and individual channel lists can be None to include all channels for that experiment

Returns:

a valid SQL query and an empty string, or an empty string and a debug message

Return type:

Tuple[str, str]

MetaDatabaseLoader.construct_metadata_query(columns: List[str], conditions: str | None = None, experiments_and_channels: Dict[str, List[int] | None] | None = None) → Tuple[str, str, str]

The query to be constructed will take one of three forms, depending on the tables in which the metadata reside.

If all columns are in the events table, then the query executed will be:

SELECT id, experiment_id, channel_id, event_id, [[columns]]
FROM events
WHERE [[conditions]]

If all columns are in the sublevels table, then the query executed will be:

SELECT id, experiment_id, channel_id, event_id, [[columns]]
FROM sublevels
WHERE [[conditions]]

If the columns are mixed between the tables, the query will be:

SELECT e.id, e.experiment_id, e.channel_id, e.event_id, [[events_columns]], [[sublevels_columns]]
FROM events e
JOIN sublevels s on e.id = s.event_db_id
WHERE [[conditions]]

Note when constructing the conditions clause that it will need to take into account this structure.
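The three forms can be sketched with plain string building. The helper build_metadata_query, the table_of mapping, and the column names duration and blockage are hypothetical; the real method also handles experiments_and_channels and returns a debug message, neither of which is shown here.

```python
from typing import Dict, List, Optional, Tuple


def build_metadata_query(columns: List[str],
                         table_of: Dict[str, str],
                         conditions: Optional[str] = None) -> Tuple[str, str]:
    """Hypothetical helper: return (query, table that owns the id column)."""
    if not columns:
        raise ValueError("at least one column is required")
    tables = {table_of[c] for c in columns}
    keys = ["id", "experiment_id", "channel_id", "event_id"]
    if len(tables) == 1:
        # All columns live in one table: select from it directly
        table = next(iter(tables))
        query = f"SELECT {', '.join(keys + columns)} FROM {table}"
    else:
        # Mixed columns: join sublevels onto events and prefix every column
        table = "events"
        prefixed = [f"e.{k}" for k in keys]
        prefixed += [f"e.{c}" for c in columns if table_of[c] == "events"]
        prefixed += [f"s.{c}" for c in columns if table_of[c] == "sublevels"]
        query = (f"SELECT {', '.join(prefixed)} FROM events e "
                 "JOIN sublevels s ON e.id = s.event_db_id")
    if conditions:
        query += f" WHERE {conditions}"
    return query, table


table_of = {"duration": "events", "blockage": "sublevels"}
single = build_metadata_query(["duration"], table_of, "duration > 1")
mixed = build_metadata_query(["duration", "blockage"], table_of)
```

In the mixed case a condition would need to use the e. and s. prefixes, which is the structure the note above refers to.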

Parameters:

columns (List[str]) – List of column names to retrieve.
conditions (Optional[str]) – Optional filter condition for query.
experiments_and_channels – a dict with experiment names as keys and lists of channels to include as values. Can be None, and individual channel lists can be None to include all channels for that experiment

Returns:

a valid SQL query and an empty string, or an empty string and a debug message, and the table name of the affected id column

Return type:

Tuple[str, str, str]

MetaDatabaseLoader.export_subset_to_csv(output_folder: str, subset_name: str = '', conditions: str | None = None, experiments_and_channels: Dict[str, List[int] | None] | None = None) → Generator[float, None, None]

Return a generator that shows progress toward outputting a csv version of the subset of the database satisfying the conditions, including both data and metadata

Parameters:

output_folder (str) – The folder to which the subset should be printed. This is assumed to exist already and will raise an error if it does not.
conditions (Optional[str]) – Optional filter condition for query.
subset_name (str) – Optional string to append to filenames in the subset
experiments_and_channels – a dict with experiment names as keys and lists of channels to include as values. Can be None, and individual channel lists can be None to include all channels for that experiment

Returns:

a float between 0 and 1 representing progress toward completion

Return type:

float

Raises:

IOError – If output_folder does not already exist
ValueError – If the SQL string is invalid
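A progress-reporting generator of this kind has to be iterated for any work to happen. The sketch below illustrates the pattern with a hypothetical export_rows function and invented rows; it does not reproduce the files or columns that export_subset_to_csv actually writes.

```python
import csv
import os
import tempfile
from typing import Generator, List, Tuple


def export_rows(output_folder: str, rows: List[Tuple[int, float]],
                filename: str = "events.csv") -> Generator[float, None, None]:
    """Hypothetical exporter: write rows to CSV, yielding progress up to 1.0."""
    # Generator bodies run lazily, so this check fires on the first iteration
    if not os.path.isdir(output_folder):
        raise IOError(f"{output_folder} does not exist")
    path = os.path.join(output_folder, filename)
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["id", "duration"])
        for done, row in enumerate(rows, start=1):
            writer.writerow(row)
            yield done / len(rows)


with tempfile.TemporaryDirectory() as folder:
    # Consuming the generator drives the export; each value is the progress
    progress = list(export_rows(folder, [(1, 0.5), (2, 1.5), (3, 2.5), (4, 0.1)]))
    with open(os.path.join(folder, "events.csv")) as handle:
        line_count = len(handle.readlines())
```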

MetaDatabaseLoader.force_serial_channel_operations() → bool

Returns:: True if only one channel can run at a time, False otherwise
Return type:: bool

Purpose: Indicate whether operations on different channels must be serialized (not run in parallel).

MetaDatabaseLoader.get_experiment_id_by_name(experiment_name: str) → int | None

Retrieve the id of the experiment registered in the database under the given name, or None on failure.

Parameters:: experiment_name (str) – the name of the experiment for which to fetch the id
Returns:: The id of the experiment, or None on failure
Return type:: Optional[int]

MetaDatabaseLoader.get_experiments_and_channels() → Dict[str, List[int] | None]

Retrieve a mapping of experiment names to their associated channel lists.

Calls get_experiment_names() to fetch all experiment identifiers, then maps each experiment to its corresponding list of channels using get_channels_by_experiment().

Returns:: Dictionary mapping experiment names to lists of channel indices.
Return type:: dict[str, Optional[list[int]]]
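The composition described above can be sketched with a hypothetical stand-in class that serves canned data instead of querying a database; StandInLoader and the experiment names are invented for illustration.

```python
from typing import Dict, List, Optional


class StandInLoader:
    """Hypothetical stand-in with canned data instead of a database."""

    _channels = {"exp_a": [0, 1], "exp_b": [3]}

    def get_experiment_names(self) -> Optional[List[str]]:
        return list(self._channels)

    def get_channels_by_experiment(self, experiment: str) -> Optional[List[int]]:
        return self._channels.get(experiment)

    def get_experiments_and_channels(self) -> Dict[str, Optional[List[int]]]:
        # Treat a failed name lookup (None) as "no experiments"
        names = self.get_experiment_names() or []
        return {name: self.get_channels_by_experiment(name) for name in names}


mapping = StandInLoader().get_experiments_and_channels()
```

The resulting dict has the same shape as the experiments_and_channels argument accepted by the query-construction and loading methods.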

Load data and return a generator that yields the data corresponding to one row returned by the query at a time. Make sure you exhaust or explicitly abort the generator, or else connections will remain open. You can assume that the query was generated by self.construct_event_data_query() and will have 10 columns: event_id, channel_id, experiment_id, data_format, baseline, stdev, padding_before, padding_after, samplerate, data, where data is a bytes object to be interpreted using data_format.

Parameters:

conditions (Optional[str]) – Optional filter condition for query.
experiments_and_channels – a dict with experiment names as keys and lists of channels to include as values. Can be None, and individual channel lists can be None to include all channels for that experiment

Returns:

a generator that returns primary database id, experiment_id, channel_id, event_id, samplerate, padding_before, padding_after, and a numpy array with event data

Return type:

Generator[Tuple[int, int, int, int, float, int, int, npt.NDArray[np.float64]], bool, None]

MetaDatabaseLoader.load_metadata(columns: List[str], conditions: str | None = None, experiments_and_channels: Dict[str, List[int] | None] | None = None) → DataFrame

Fetch specified columns from the metadata database given a query

Will always include experiment_id, channel_id, and event_id in the dataframe in addition to requested columns.

Parameters:

columns (List[str]) – List of column names to retrieve.
conditions (Optional[str]) – Optional filter condition for query.
experiments_and_channels – a dict with experiment names as keys and lists of channels to include as values. Can be None, and individual channel lists can be None to include all channels for that experiment

Returns:

pandas dataframe containing retrieved data

Return type:

pd.DataFrame

MetaDatabaseLoader.query_database_directly(query: str) → DataFrame | None

Run a given query on the DB after basic validation.

Parameters:: query (str) – query to run on the database
Returns:: A dataframe containing the retrieved data, or None on failure.
Return type:: Optional[pd.DataFrame]

MetaDatabaseLoader.query_database_directly_and_get_generator(query: str) → Generator[DataFrame, bool, None]

Run a given query on the DB after basic validation and return a generator that feeds out one row at a time

Parameters:: query (str) – query to run on the database
Returns:: A generator that feeds out one row at a time in the form of a single-line dataframe
Return type:: Generator[pd.DataFrame, bool, None]

MetaDatabaseLoader.report_channel_status(channel: int | None = None, init=False) → str

Return a string detailing any pertinent information about the status of analysis conducted on a given channel

Parameters:

channel (Optional[int]) – channel ID
init (bool) – is the function being called as part of plugin initialization? Default False

Returns:

the status of the channel as a string

Return type:

str

Private Methods

Abstract Methods

These methods must be implemented by subclasses.

abstractmethod MetaDatabaseLoader._init() → None

Purpose: Perform generic class construction operations.

This is called immediately at the start of class creation and is used to do whatever is required to set up your reader. Note that no app settings are available when this is called, so this function should be used only for generic class construction operations. Most readers simply pass this function.

abstractmethod MetaDatabaseLoader._load_event_data(query: str) → Generator[Dict[str, int | float | ndarray[tuple[int, ...], dtype[float64]]], bool, None]

Load data and return a generator that yields one dict per row returned by the query. Make sure you exhaust the generator, or else connections will remain open. You can assume that the query was generated by self.construct_event_data_query() and will have 10 columns: event_id, channel_id, experiment_id, data_format, baseline, stdev, padding_before, padding_after, samplerate, data, where data is a bytes object to be interpreted using data_format.

Parameters:: query (str) – a valid SQL query, checked in the calling function for validity
Returns:: a generator that returns a dict with id, event_id, channel_id, experiment_id, samplerate, padding_before, padding_after, and numpy array with event data for raw, filtered, and fitted data
Return type:: Generator[Dict[str, Union[int, int, int, int, float, int, int, npt.NDArray[np.float64], npt.NDArray[np.float64], npt.NDArray[np.float64]]], bool, None]

abstractmethod MetaDatabaseLoader._load_metadata(query: str) → DataFrame | None

Parameters:: query (str) – a valid SQL query, checked in the calling function for validity
Returns:: A dataframe containing the requested event data as columns or None on failure
Return type:: Optional[pd.DataFrame]

Purpose: Load and return the data specified by a valid SQL query, or None on failure

The data should be formatted as a pandas Dataframe object

abstractmethod MetaDatabaseLoader._load_metadata_generator(query: str) → Generator[DataFrame, None, None]

Parameters:: query (str) – query to run on the database
Returns:: A generator that feeds out one row at a time in the form of a single-line dataframe
Return type:: Generator[pd.DataFrame, None, None]

Purpose: Load and yield the data specified by a valid SQL query one row at a time. Useful in cases where _load_metadata() returns too much data for memory.

Data should be formatted as a pandas dataframe in line with _load_metadata(). Make sure you exhaust the generator when done with it, or else database connections will remain open.

abstractmethod MetaDatabaseLoader._validate_settings(settings: dict) → None

Validate that the settings dict contains the correct information for use by the subclass.

Parameters:: settings (dict) – Parameters for the loader.
Raises:: ValueError – If the settings dict does not contain the correct information.

Concrete Methods

MetaDatabaseLoader.__init__(settings: dict | None = None): Initialize and set up the plugin, if settings are available at this stage

MetaDatabaseLoader._finalize_initialization()

Purpose: Apply application-specific settings to the plugin, if needed.

If additional initialization operations are required beyond the defaults provided in BaseDataPlugin or MetaDatabaseLoader that must occur after settings have been applied to the reader instance, you can override this function to add those operations.

MetaDatabaseLoader._format_debug_msg(debug: str) → str

Strip out newlines and unnecessary whitespace from SQL queries for printing

Parameters:: debug (str) – a string containing an error message and an SQL string for correction
Returns:: the input string with whitespace removed and newlines in it to format for export
Return type:: str