MetaDatabaseLoader
class MetaDatabaseLoader(settings: Optional[dict] = None)
Bases: BaseDataPlugin
What you get by inheriting from MetaDatabaseLoader
MetaDatabaseLoader is the base class for loading the data written by a MetaDatabaseWriter subclass instance or any other method that produces an equivalent format.
Poriscope ships with SQLiteDBLoader, a subclass of MetaDatabaseLoader that reads data written by the SQLiteDBWriter subclass. While additional subclasses can read almost any format you desire, we strongly encourage standardization around this format, so think twice before creating additional subclasses of this base class. Writing a MetaDatabaseLoader subclass alone is not sufficient: you will also need a paired MetaDatabaseWriter subclass to write data in your target format.
Public Methods
Abstract Methods
These methods must be implemented by subclasses.
- abstractmethod MetaDatabaseLoader.add_columns_to_table(df: DataFrame, units: List[str | None], table_name: str) → bool
- Parameters:
df (pd.DataFrame) – A pandas DataFrame. Must contain an ‘id’ column corresponding to the primary key of the target table, and one or more additional columns to be added.
units (List[Optional[str]]) – A list of strings specifying units for the new columns to be added. Must have length equal to the number of new columns, but can contain None values.
table_name (str) – The name of the SQLite table to modify. This table must already exist in the database.
- Returns:
True on success, False otherwise
- Return type:
bool
- Raises:
ValueError – If the DataFrame does not contain an ‘id’ column or if the specified table does not exist.
IOError – If any write-related error occurs
Purpose: Add new columns from a pandas DataFrame to an existing SQLite table.
Create new columns in the specified table and populate them with the provided data, matching the DataFrame's ‘id’ column against the primary key of the target table.
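As a minimal sketch of what a SQLite-backed implementation might do, the hypothetical helper add_columns_sqlite below uses only the standard-library sqlite3 module. To stay self-contained it takes the primary-key ids and the new column values as plain lists rather than a pandas DataFrame, declares every new column as REAL, and does not store units, since how a subclass records units is not specified here.

```python
import sqlite3
from typing import Dict, List


def add_columns_sqlite(
    conn: sqlite3.Connection,
    table_name: str,
    ids: List[int],
    new_columns: Dict[str, List[float]],
) -> bool:
    """Hypothetical helper: add REAL columns to an existing table, matched on id."""
    # The target table must already exist.
    found = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?", (table_name,)
    ).fetchone()
    if found is None:
        raise ValueError(f"Table {table_name!r} does not exist")
    # Each new column needs exactly one value per id.
    for name, values in new_columns.items():
        if len(values) != len(ids):
            raise ValueError(f"Column {name!r} has {len(values)} values for {len(ids)} ids")
    try:
        with conn:  # commit on success, roll back pending changes on error
            for name, values in new_columns.items():
                # Identifiers cannot be bound as parameters, so they are double-quoted.
                conn.execute(f'ALTER TABLE "{table_name}" ADD COLUMN "{name}" REAL')
                conn.executemany(
                    f'UPDATE "{table_name}" SET "{name}" = ? WHERE id = ?',
                    zip(values, ids),
                )
    except sqlite3.Error as exc:
        raise IOError(f"Could not add columns to {table_name!r}") from exc
    return True


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, duration REAL)")
conn.executemany("INSERT INTO events (id, duration) VALUES (?, ?)", [(1, 0.1), (2, 0.2)])
add_columns_sqlite(conn, "events", [1, 2], {"area": [5.0, 7.5]})
print(conn.execute("SELECT id, area FROM events ORDER BY id").fetchall())  # [(1, 5.0), (2, 7.5)]
```

Rows whose id is not listed keep NULL in the new column, which matches the "matching on id" behavior described above.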
- abstractmethod MetaDatabaseLoader.alter_database(queries: List[str]) → bool
- Parameters:
queries (List[str]) – a list of queries to run on the database
- Returns:
True if the operation succeeded, False otherwise
- Return type:
bool
Purpose: Run a given list of queries on the database. No validation is performed, so use this method sparingly.
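One possible SQLite implementation, sketched under the assumption that the connection was opened with isolation_level=None so that transactions are controlled explicitly: the hypothetical alter_database_sqlite wraps the queries in a single transaction, so either all of them take effect or none do. Running them one by one without a transaction would also satisfy this contract; the all-or-nothing behavior is a design choice of the sketch.

```python
import sqlite3
from typing import List


def alter_database_sqlite(conn: sqlite3.Connection, queries: List[str]) -> bool:
    """Hypothetical helper: run all queries in one transaction, rolling back on failure.

    Assumes conn was opened with isolation_level=None, so BEGIN/COMMIT are issued
    here explicitly instead of by the sqlite3 module.
    """
    try:
        conn.execute("BEGIN")
        for query in queries:
            conn.execute(query)  # no validation, exactly as the contract warns
        conn.execute("COMMIT")
        return True
    except sqlite3.Error:
        if conn.in_transaction:
            conn.execute("ROLLBACK")
        return False


conn = sqlite3.connect(":memory:", isolation_level=None)
ok = alter_database_sqlite(conn, [
    "CREATE TABLE notes (id INTEGER PRIMARY KEY, text TEXT)",
    "INSERT INTO notes (text) VALUES ('first')",
])
failed = alter_database_sqlite(conn, [
    "INSERT INTO notes (text) VALUES ('second')",
    "INSERT INTO no_such_table VALUES (1)",
])
print(ok, failed)  # True False
print(conn.execute("SELECT COUNT(*) FROM notes").fetchone()[0])  # 1: the failed batch was rolled back
```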
- abstractmethod MetaDatabaseLoader.close_resources(channel: int | None = None) → None
- Parameters:
channel (Optional[int]) – channel ID
Purpose: Clean up any open file handles or memory on app exit.
This is called during app exit or plugin deletion to ensure proper cleanup of resources that could otherwise leak. Do this for all channels if no channel is specified; otherwise, limit the cleanup to the specified channel. If no such operation is needed, the method body can simply be pass.
- abstractmethod MetaDatabaseLoader.get_channels_by_experiment(experiment: str) → List[int] | None
- Parameters:
experiment (str) – The name of the experiment.
- Returns:
List of channel IDs.
- Return type:
Optional[List[int]]
Purpose: Retrieve a list of all channel identifiers (the identifier, not the primary key of the channels table) associated with a given experiment name, or None on failure.
- abstractmethod MetaDatabaseLoader.get_column_names_by_table(table: str | None = None) → List[str] | None
- Parameters:
table (Optional[str]) – The name of the table.
- Returns:
List of column names.
- Return type:
Optional[List[str]]
Purpose: Retrieve the column names available in the specified table, or in all tables in the database if no table is specified, or None on failure.
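For a SQLite file, column names can be read with PRAGMA table_info. The hypothetical column_names_sqlite below is one way to do it; listing a shared name (such as id) only once when no table is given is an assumption about what "all columns in the database" should mean.

```python
import sqlite3
from typing import List, Optional


def column_names_sqlite(conn: sqlite3.Connection, table: Optional[str] = None) -> Optional[List[str]]:
    """Hypothetical helper: column names of one table, or of every user table if table is None."""
    try:
        if table is None:
            tables = [row[0] for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' "
                "AND name NOT LIKE 'sqlite_%' ORDER BY name")]
        else:
            tables = [table]
        names: List[str] = []
        for name in tables:
            # PRAGMA table_info yields one row per column; index 1 is the column name.
            info = conn.execute(f'PRAGMA table_info("{name}")').fetchall()
            if not info:  # unknown table: the PRAGMA returns no rows rather than raising
                return None
            for row in info:
                if row[1] not in names:  # shared names such as id are listed once
                    names.append(row[1])
        return names
    except sqlite3.Error:
        return None


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, duration REAL)")
conn.execute("CREATE TABLE sublevels (id INTEGER PRIMARY KEY, event_db_id INTEGER, level REAL)")
print(column_names_sqlite(conn, "events"))   # ['id', 'duration']
print(column_names_sqlite(conn))             # ['id', 'duration', 'event_db_id', 'level']
print(column_names_sqlite(conn, "missing"))  # None
```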
- abstractmethod MetaDatabaseLoader.get_column_units(column_name: str) → str | None
- Parameters:
column_name (str) – The name of the column.
- Returns:
The units of the column.
- Return type:
Optional[str]
Purpose: Retrieve the units associated with a specific column name, or None on failure.
- abstractmethod MetaDatabaseLoader.get_empty_settings(globally_available_plugins: Dict[str, List[str]] | None = None, standalone=False) → Dict[str, Dict[str, Any]]
- Parameters:
globally_available_plugins (Optional[Dict[str, List[str]]]) – a dict containing all data plugins that exist to date, keyed by metaclass. Must include “MetaReader” as a key, with explicitly set Type MetaReader.
standalone (bool) – False if this is called as part of a GUI, True otherwise. Defaults to False.
- Returns:
the dict that must be filled in to initialize the loader
- Return type:
Dict[str, Dict[str, Any]]
Purpose: Provide a list of settings details to users to assist in instantiating an instance of your MetaDatabaseLoader subclass.
Get a dict populated with keys needed to initialize the loader if they are not set yet. This dict must have the following structure, but Min, Max, and Options can be skipped or explicitly set to None if they are not used. Value and Type are required. All values provided must be consistent with Type.
settings = {'Parameter 1': {'Type': <int, float, str, bool>, 'Value': <value> or None, 'Options': [<option_1>, <option_2>, ... ] or None, 'Min': <min_value> or None, 'Max': <max_value> or None }, ... }
Several parameter keywords are reserved: ‘Input File’, ‘Output File’, and ‘Folder’.
These must have Type str and will cause the GUI to generate widgets to allow selection of these elements when used.
This function must return a dictionary of settings required to initialize the loader, in the specified format. Values in this dictionary can be accessed downstream through self.settings. This structure is a nested dictionary that supplies both values and a variety of information about those values, used by Poriscope to perform sanity and consistency checking at instantiation.
MetaDatabaseLoader already provides an implementation of this function that ensures that settings will have the required Input File key available to users, but in most cases you will need to override it to add any other settings required by your subclass or to specify which file types are allowed. If you need additional settings, which you almost certainly do, you MUST call super().get_empty_settings(globally_available_plugins, standalone) before any additional code that you add. For example, your implementation could look like this, to limit it to SQLite files:
    settings = super().get_empty_settings(globally_available_plugins, standalone)
    settings["Input File"]["Options"] = [
        "SQLite3 Files (*.sqlite3)",
        "Database Files (*.db)",
        "SQLite Files (*.sqlite)",
    ]
    return settings
This ensures that you have the Input File key and limits the visible options to SQLite files. By default, any file type is accepted as input, hence the Options key is set in the example above.
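The structure rules above can be checked mechanically. The hypothetical check_settings_structure below sketches such a check; it is not Poriscope's own validation, and the "Chunk Size" setting in the usage lines is invented for illustration. Because it compares values with isinstance, an int Value is rejected for a float Type.

```python
from typing import Any, Dict

RESERVED_KEYS = {"Input File", "Output File", "Folder"}


def check_settings_structure(settings: Dict[str, Dict[str, Any]]) -> None:
    """Hypothetical checker for the documented settings layout; raises ValueError."""
    for key, spec in settings.items():
        if "Type" not in spec or "Value" not in spec:
            raise ValueError(f"{key!r}: 'Type' and 'Value' are required")
        kind = spec["Type"]
        if not isinstance(kind, type):
            raise ValueError(f"{key!r}: Type must be a Python type such as int or str")
        if key in RESERVED_KEYS and kind is not str:
            raise ValueError(f"{key!r}: reserved keys must have Type str")
        # Value, Min, Max, and every Option must agree with Type when they are set.
        candidates = [spec["Value"], spec.get("Min"), spec.get("Max")]
        candidates += list(spec.get("Options") or [])
        for item in candidates:
            if item is not None and not isinstance(item, kind):
                raise ValueError(f"{key!r}: {item!r} is not of type {kind.__name__}")


settings = {
    "Input File": {"Type": str, "Value": None,
                   "Options": ["SQLite3 Files (*.sqlite3)", "Database Files (*.db)"]},
    "Chunk Size": {"Type": int, "Value": 1000, "Min": 1, "Max": None},  # hypothetical setting
}
check_settings_structure(settings)  # passes silently
```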
- abstractmethod MetaDatabaseLoader.get_event_counts_by_experiment_and_channel(experiment: str | None = None, channel: int | None = None) → int
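- Parameters:
experiment (Optional[str]) – The name of the experiment.
channel (Optional[int]) – The channel identifier.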
- Returns:
event count matching the conditions
- Return type:
int
Purpose: Return the number of events in the database matching the experiment name and channel identifier.
If no channel is provided, count across all channels for that experiment. If no experiment is provided, ignore channel and return the number of events in the entire database.
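The fallback rules can be expressed as a query that grows one filter at a time. This sketch assumes a simplified, hypothetical schema (experiments(id, name) and events(id, experiment_id, channel_id), with events.channel_id holding the channel identifier itself); the real SQLiteDBWriter layout may differ.

```python
import sqlite3
from typing import List, Optional


def count_events_sqlite(conn: sqlite3.Connection, experiment: Optional[str] = None,
                        channel: Optional[int] = None) -> int:
    """Hypothetical helper following the documented fallback rules on an assumed schema."""
    params: List[object] = []
    query = "SELECT COUNT(*) FROM events e"
    if experiment is not None:
        query += " JOIN experiments x ON e.experiment_id = x.id WHERE x.name = ?"
        params.append(experiment)
        if channel is not None:
            query += " AND e.channel_id = ?"
            params.append(channel)
    # With no experiment, channel is ignored and the whole events table is counted.
    return conn.execute(query, params).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE experiments (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE events (id INTEGER PRIMARY KEY, experiment_id INTEGER, channel_id INTEGER);
INSERT INTO experiments VALUES (1, 'exp_a'), (2, 'exp_b');
INSERT INTO events (experiment_id, channel_id) VALUES (1, 0), (1, 0), (1, 3), (2, 0);
""")
print(count_events_sqlite(conn))              # 4
print(count_events_sqlite(conn, "exp_a"))     # 3
print(count_events_sqlite(conn, "exp_a", 0))  # 2
print(count_events_sqlite(conn, None, 3))     # 4, channel ignored
```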
- abstractmethod MetaDatabaseLoader.get_experiment_names(experiment_id: int | None = None) → List[str] | None
- Parameters:
experiment_id (Optional[int]) – the id of the experiment for which to fetch the name
- Returns:
List of experiment names, or None on failure
- Return type:
Optional[List[str]]
Purpose: Retrieve a list of all unique experiment names registered in the database, or a singleton list if an id is given.
- abstractmethod MetaDatabaseLoader.get_llm_prompt() → str
- Returns:
a prompt that gives an LLM context for the database and how to query it
- Return type:
str
Purpose: Return a prompt that describes the structure of the database to an LLM, so that it can help users access data written in your format.
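A hand-written prompt is the most direct option, but part of it can also be generated from the schema. The hypothetical build_schema_prompt below lists each table and its columns using sqlite_master and PRAGMA table_info. Its surrounding wording is an assumption, and a useful prompt will usually also need hand-written notes on units and on how tables join, which cannot be derived this way.

```python
import sqlite3


def build_schema_prompt(conn: sqlite3.Connection) -> str:
    """Hypothetical helper: describe every user table and its columns for an LLM."""
    lines = ["You are helping a user query a SQLite database with these tables:"]
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' "
        "AND name NOT LIKE 'sqlite_%' ORDER BY name")]
    for table in tables:
        # Each PRAGMA row is (cid, name, type, notnull, dflt_value, pk).
        cols = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        described = ", ".join(f"{c[1]} ({c[2] or 'untyped'})" for c in cols)
        lines.append(f"- {table}: {described}")
    lines.append("Answer with a single read-only SELECT statement.")
    return "\n".join(lines)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, duration REAL)")
print(build_schema_prompt(conn))
```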
- abstractmethod MetaDatabaseLoader.get_samplerate_by_experiment_and_channel(experiment: str, channel: int) → float | None
- Parameters:
experiment (str) – The name of the experiment.
channel (int) – The channel identifier.
- Returns:
sampling rate for the specific experiment-channel combination, or None on failure
- Return type:
Optional[float]
Purpose: Retrieve the sampling rate for a given experiment and channel id, or None on failure
- abstractmethod MetaDatabaseLoader.get_table_by_column(column: str) → str | None
- Parameters:
column (str) – The name of the column.
- Returns:
The name of the table containing the column, or None on failure.
- Return type:
Optional[str]
Purpose: Retrieve the name of the table in which the given column is found, or None on failure.
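A SQLite sketch, assuming tables are searched in alphabetical order: the hypothetical table_by_column_sqlite returns the first table that has the column. Because a name such as id can appear in several tables, a real subclass needs its own rule for that case.

```python
import sqlite3
from typing import Optional


def table_by_column_sqlite(conn: sqlite3.Connection, column: str) -> Optional[str]:
    """Hypothetical helper: name of the first table (alphabetically) containing column."""
    try:
        tables = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' "
            "AND name NOT LIKE 'sqlite_%' ORDER BY name")]
        for table in tables:
            columns = {row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')}
            if column in columns:
                return table
        return None  # column not found in any table
    except sqlite3.Error:
        return None


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, duration REAL)")
conn.execute("CREATE TABLE sublevels (id INTEGER PRIMARY KEY, event_db_id INTEGER, level REAL)")
print(table_by_column_sqlite(conn, "level"))  # sublevels
print(table_by_column_sqlite(conn, "id"))     # events (first match wins)
```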
- abstractmethod MetaDatabaseLoader.get_table_names() → List[str] | None
- Returns:
List of table names.
- Return type:
Optional[List[str]]
Purpose: Retrieve the names of available tables in the database or None on failure.
- abstractmethod MetaDatabaseLoader.reset_channel(channel: int | None = None) → None
- Parameters:
channel (Optional[int]) – channel ID
Purpose: Reset the state of a specific channel for a new operation or run.
This is called any time an operation on a channel needs to be cleaned up or reset for a new run. If channel is not None, handle only that channel; otherwise, reset all of them. In most cases, MetaDatabaseLoader subclasses have no need to reset anything, and the method body can simply be pass.
- abstractmethod MetaDatabaseLoader.validate_filter_query(query: str) → Tuple[bool, str]
- Parameters:
query (str) – The SQL query string.
- Returns:
(True, "") if the query is valid, and (False, "[[helpful explanation]]") if it is not
- Return type:
Tuple[bool, str]
Purpose: Validate a SQL query without executing it.
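With SQLite, one way to check a query without running it is to prefix it with EXPLAIN: the statement is compiled, so syntax errors and unknown tables or columns are reported, but it is not executed. The hypothetical validate_query_sqlite below takes that approach. EXPLAIN accepts any statement, including DELETE, so a stricter validator might also restrict the statement type; and if the string passed in is only a WHERE fragment, it would first have to be wrapped in a full query.

```python
import sqlite3
from typing import Tuple


def validate_query_sqlite(conn: sqlite3.Connection, query: str) -> Tuple[bool, str]:
    """Hypothetical helper: compile the query with EXPLAIN so it is checked but not run."""
    try:
        # EXPLAIN returns the compiled program instead of executing the statement.
        conn.execute(f"EXPLAIN {query}").fetchall()
    except (sqlite3.Error, sqlite3.Warning) as exc:  # Warning covers multi-statement input on older Pythons
        return False, str(exc)
    return True, ""


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, duration REAL)")
conn.executemany("INSERT INTO events (duration) VALUES (?)", [(0.3,), (0.9,)])
print(validate_query_sqlite(conn, "SELECT id FROM events WHERE duration > 0.5"))  # (True, '')
print(validate_query_sqlite(conn, "SELECT id FROM missing"))  # (False, 'no such table: missing')
```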
Concrete Methods
- MetaDatabaseLoader.construct_event_data_query(conditions: str | None = None, experiments_and_channels: Dict[str, List[int] | None] | None = None) → Tuple[str, str]
Construct a query that will get all event data matching a set of conditions.
- Parameters:
conditions (Optional[str]) – Optional filter condition for query.
experiments_and_channels (Optional[Dict[str, Optional[List[int]]]]) – a dict with experiment names as keys and lists of channels to include as values. Can be None, and individual channel lists can be None to include all channels for that experiment.
- Returns:
a valid SQL query and an empty string, or an empty string and a debug message
- Return type:
Tuple[str, str]
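The experiments_and_channels argument can be translated into a parameterized WHERE fragment, with None meaning "all channels" for an experiment. The hypothetical experiments_and_channels_clause below sketches this; the default column names and the choice that an empty dict selects nothing are assumptions, not part of the documented API. Binding names and channel numbers as parameters avoids quoting problems.

```python
import sqlite3
from typing import Dict, List, Optional, Tuple


def experiments_and_channels_clause(
    experiments_and_channels: Optional[Dict[str, Optional[List[int]]]],
    experiment_col: str = "experiments.name",   # hypothetical column names
    channel_col: str = "channels.channel_id",
) -> Tuple[str, List[object]]:
    """Hypothetical helper: turn the selection dict into a parameterized WHERE fragment."""
    if experiments_and_channels is None:
        return "1", []  # None selects every experiment and channel
    parts: List[str] = []
    params: List[object] = []
    for experiment, channels in experiments_and_channels.items():
        if channels is None:
            # None means every channel of this experiment.
            parts.append(f"{experiment_col} = ?")
            params.append(experiment)
        else:
            placeholders = ", ".join("?" for _ in channels)
            parts.append(f"({experiment_col} = ? AND {channel_col} IN ({placeholders}))")
            params.extend([experiment, *channels])
    if not parts:
        return "0", []  # assumption: an empty dict selects nothing
    return "(" + " OR ".join(parts) + ")", params


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (experiment TEXT, channel INTEGER)")
conn.executemany("INSERT INTO events VALUES (?, ?)",
                 [("exp1", 1), ("exp1", 3), ("exp2", 0), ("exp3", 1)])
where, where_params = experiments_and_channels_clause(
    {"exp1": [1, 2], "exp2": None}, "experiment", "channel")
print(conn.execute(f"SELECT COUNT(*) FROM events WHERE {where}", where_params).fetchone()[0])  # 2
```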
- MetaDatabaseLoader.construct_metadata_query(columns: List[str], conditions: str | None = None, experiments_and_channels: Dict[str, List[int] | None] | None = None) → Tuple[str, str, str]
The query to be constructed will take one of three forms, depending on the tables in which the metadata reside.
If all requested columns are in the events table, the query executed will be:
SELECT id, experiment_id, channel_id, event_id, [[columns]] FROM events WHERE [[conditions]]
If all requested columns are in the sublevels table, the query executed will be:
SELECT id, experiment_id, channel_id, event_id, [[columns]] FROM sublevels WHERE [[conditions]]
If the columns are mixed between the tables, the query will be:
SELECT e.id, e.experiment_id, e.channel_id, e.event_id, [[events_columns]], [[sublevels_columns]] FROM events e JOIN sublevels s on e.id = s.event_db_id WHERE [[conditions]]
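Take this structure into account when constructing the conditions clause; in the joined form, for example, a column name that exists in both tables must be qualified with e. or s. to avoid ambiguity.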
- Parameters:
columns (List[str]) – List of column names to retrieve.
conditions (Optional[str]) – Optional filter condition for query.
experiments_and_channels (Optional[Dict[str, Optional[List[int]]]]) – a dict with experiment names as keys and lists of channels to include as values. Can be None, and individual channel lists can be None to include all channels for that experiment.
- Returns:
a valid SQL query and an empty string, or an empty string and a debug message, and the table name of the affected id column
- Return type:
Tuple[str, str, str]
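The choice among the three query forms can be sketched as follows. The hypothetical build_metadata_query takes an explicit map from each requested column to its table (a real subclass could look this up instead); it ignores experiments_and_channels and returns only the query string rather than the documented three-part tuple. The second usage line also shows why conditions must respect the structure: in the joined form the filter uses the s. prefix.

```python
from typing import Dict, List, Optional

KEY_COLUMNS = ["id", "experiment_id", "channel_id", "event_id"]


def build_metadata_query(columns: List[str], column_tables: Dict[str, str],
                         conditions: Optional[str] = None) -> str:
    """Hypothetical helper reproducing the three documented query forms.

    column_tables maps each requested column to "events" or "sublevels".
    """
    in_events = [c for c in columns if column_tables[c] == "events"]
    in_sublevels = [c for c in columns if column_tables[c] == "sublevels"]
    if in_events and in_sublevels:
        # Mixed: key columns come from events, joined to sublevels on event_db_id.
        selected = ([f"e.{c}" for c in KEY_COLUMNS + in_events]
                    + [f"s.{c}" for c in in_sublevels])
        query = (f"SELECT {', '.join(selected)} FROM events e "
                 "JOIN sublevels s ON e.id = s.event_db_id")
    else:
        # All columns live in one table, so select straight from it.
        table = "sublevels" if in_sublevels else "events"
        query = f"SELECT {', '.join(KEY_COLUMNS + columns)} FROM {table}"
    if conditions:
        query += f" WHERE {conditions}"
    return query


print(build_metadata_query(["duration"], {"duration": "events"}))
# SELECT id, experiment_id, channel_id, event_id, duration FROM events
print(build_metadata_query(["duration", "level"],
                           {"duration": "events", "level": "sublevels"}, "s.level > 0.2"))
# SELECT e.id, e.experiment_id, e.channel_id, e.event_id, e.duration, s.level FROM events e JOIN sublevels s ON e.id = s.event_db_id WHERE s.level > 0.2
```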
- MetaDatabaseLoader.export_subset_to_csv(output_folder: str, subset_name: str = '', conditions: str | None = None, experiments_and_channels: Dict[str, List[int] | None] | None = None) → Generator[float, None, None]
Return a generator that reports progress toward writing a CSV version of the subset of the database satisfying the conditions, including both data and metadata.
- Parameters:
output_folder (str) – The folder to which the subset should be printed. This is assumed to exist already and will raise an error if it does not.
conditions (Optional[str]) – Optional filter condition for query.
subset_name (str) – Optional string to append to filenames in the subset
experiments_and_channels (Optional[Dict[str, Optional[List[int]]]]) – a dict with experiment names as keys and lists of channels to include as values. Can be None, and individual channel lists can be None to include all channels for that experiment.
- Returns:
a generator yielding floats between 0 and 1 that represent progress toward completion
- Return type:
Generator[float, None, None]
- Raises:
IOError – If output_folder does not already exist.
ValueError – If the SQL string is invalid.
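Because the method returns a generator, the caller drives the export by iterating over it, for instance to update a progress bar. The hypothetical export_rows_to_csv below writes pre-fetched rows to a single file to show the pattern; its file name scheme metadata<subset_name>.csv is invented. Since a generator body runs lazily, an IOError for a missing folder surfaces on the first next() call rather than when the function is called.

```python
import csv
import os
import tempfile
from typing import Generator, List, Sequence


def export_rows_to_csv(output_folder: str, subset_name: str, header: Sequence[str],
                       rows: List[Sequence[object]]) -> Generator[float, None, None]:
    """Hypothetical stand-in: write rows to one CSV file, yielding progress in [0, 1]."""
    # Generator bodies run lazily: this check fires on the first next(), not at call time.
    if not os.path.isdir(output_folder):
        raise IOError(f"{output_folder!r} does not exist")
    path = os.path.join(output_folder, f"metadata{subset_name}.csv")
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        for index, row in enumerate(rows, start=1):
            writer.writerow(row)
            yield index / len(rows)
    if not rows:
        yield 1.0  # nothing to write still counts as complete


with tempfile.TemporaryDirectory() as folder:
    rows = [(1, 0.5), (2, 0.7), (3, 1.1), (4, 0.9)]
    progress = list(export_rows_to_csv(folder, "_demo", ["event_id", "duration"], rows))
    print(progress)  # [0.25, 0.5, 0.75, 1.0]
    with open(os.path.join(folder, "metadata_demo.csv"), newline="") as handle:
        written = list(csv.reader(handle))
```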
- MetaDatabaseLoader.force_serial_channel_operations() → bool
- Returns:
True if only one channel can run at a time, False otherwise
- Return type:
bool
Purpose: Indicate whether operations on different channels must be serialized (not run in parallel).
- MetaDatabaseLoader.get_experiment_id_by_name(experiment_name: str) → int | None
Retrieve the database id of the experiment with the given name, or None on failure.
- MetaDatabaseLoader.get_experiments_and_channels() → Dict[str, List[int] | None]
Retrieve a mapping of experiment names to their associated channel lists.
Calls get_experiment_names() to fetch all experiment names, then maps each experiment to its corresponding list of channels using get_channels_by_experiment().
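The composition described above amounts to a dict comprehension. In this sketch the two getters are passed in as callables so it runs without a database; treating a None result from get_experiment_names() as "no experiments" is an assumption. The resulting mapping has the same shape as the experiments_and_channels argument accepted by the query methods in this class.

```python
from typing import Callable, Dict, List, Optional


def experiments_and_channels(
    get_experiment_names: Callable[[], Optional[List[str]]],
    get_channels_by_experiment: Callable[[str], Optional[List[int]]],
) -> Dict[str, Optional[List[int]]]:
    """Sketch of the documented composition, with the two getters injected."""
    names = get_experiment_names() or []  # assumption: None (failure) means no experiments
    return {name: get_channels_by_experiment(name) for name in names}


fake = {"exp_a": [0, 1], "exp_b": [2]}  # stand-in for database contents
print(experiments_and_channels(lambda: list(fake), fake.get))  # {'exp_a': [0, 1], 'exp_b': [2]}
```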
- MetaDatabaseLoader.load_event_data(conditions: str | None = None, experiments_and_channels: Dict[str, List[int] | None] | None = None) → Generator[Dict[str, int | float | ndarray[tuple[int, ...], dtype[float64]]], bool, None]
Load event data matching the conditions and return a generator that yields one dict per row returned by the underlying query. Make sure you exhaust or explicitly abort the generator, or else connections will remain open. The query is generated by self.construct_event_data_query() and has 10 columns: event_id, channel_id, experiment_id, data_format, baseline, stdev, padding_before, padding_after, samplerate, and data, where data is a bytes object to be interpreted using data_format.
- Parameters:
conditions (Optional[str]) – Optional filter condition for query.
experiments_and_channels (Optional[Dict[str, Optional[List[int]]]]) – a dict with experiment names as keys and lists of channels to include as values. Can be None, and individual channel lists can be None to include all channels for that experiment.
- Returns:
a generator that yields, for each event, a dict containing the primary database id, experiment_id, channel_id, event_id, samplerate, padding_before, padding_after, and a numpy array with event data
- Return type:
Generator[Dict[str, Union[int, float, npt.NDArray[np.float64]]], bool, None]
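One standard Python way to abort such a generator early is Generator.close(), which raises GeneratorExit at the paused yield so that any finally block runs. The hypothetical iterate_events below owns its connection and closes it in finally; whether Poriscope's own generators also accept an abort signal through send() is not covered here.

```python
import sqlite3
from typing import Any, Dict, Generator


def iterate_events(conn: sqlite3.Connection, query: str) -> Generator[Dict[str, Any], None, None]:
    """Hypothetical stand-in for an event-data generator that owns its connection."""
    cursor = conn.cursor()
    try:
        for row in cursor.execute(query):
            # Pair each value with its column name from the cursor description.
            yield {column[0]: value for column, value in zip(cursor.description, row)}
    finally:
        # Runs when the generator is exhausted, and also when close() aborts it early.
        cursor.close()
        conn.close()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, samplerate REAL)")
conn.executemany("INSERT INTO events (samplerate) VALUES (?)", [(1e6,), (1e6,), (2.5e5,)])
events = iterate_events(conn, "SELECT id, samplerate FROM events ORDER BY id")
first = next(events)
print(first)    # {'id': 1, 'samplerate': 1000000.0}
events.close()  # abort early instead of exhausting the generator
```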
- MetaDatabaseLoader.load_metadata(columns: List[str], conditions: str | None = None, experiments_and_channels: Dict[str, List[int] | None] | None = None) → DataFrame
Fetch the specified columns from the metadata database for entries matching the given conditions.
Will always include experiment_id, channel_id, and event_id in the dataframe in addition to requested columns.
- Parameters:
columns (List[str]) – List of column names to retrieve.
conditions (Optional[str]) – Optional filter condition for query.
experiments_and_channels (Optional[Dict[str, Optional[List[int]]]]) – a dict with experiment names as keys and lists of channels to include as values. Can be None, and individual channel lists can be None to include all channels for that experiment.
- Returns:
pandas dataframe containing retrieved data
- Return type:
pd.DataFrame
- MetaDatabaseLoader.query_database_directly(query: str) → DataFrame | None
Run a given query on the DB after basic validation.
- Parameters:
query (str) – query to run on the database
- Returns:
A pandas DataFrame containing the retrieved data, or None on failure.
- Return type:
Optional[pd.DataFrame]
- MetaDatabaseLoader.query_database_directly_and_get_generator(query: str) → Generator[DataFrame, bool, None]
Run a given query on the DB after basic validation and return a generator that yields one row at a time as a single-row DataFrame.
Private Methods
Abstract Methods
These methods must be implemented by subclasses.
- abstractmethod MetaDatabaseLoader._init() → None
Purpose: Perform generic class construction operations.
This is called immediately at the start of class creation and is used to do whatever is required to set up your loader. Note that no app settings are available when this is called, so this function should be used only for generic class construction operations. Most loaders simply implement this function as a bare pass.
- abstractmethod MetaDatabaseLoader._load_event_data(query: str) → Generator[Dict[str, int | float | ndarray[tuple[int, ...], dtype[float64]]], bool, None]
Load data and return a generator that yields one dict per row returned by the query. Make sure you exhaust the generator, or else connections will remain open. You can assume that the query was generated by self.construct_event_data_query() and will have 10 columns: event_id, channel_id, experiment_id, data_format, baseline, stdev, padding_before, padding_after, samplerate, and data, where data is a bytes object to be interpreted using data_format.
- Parameters:
query (str) – a valid SQL query, checked in the calling function for validity
- Returns:
a generator that returns a dict with id, event_id, channel_id, experiment_id, samplerate, padding_before, padding_after, and numpy array with event data for raw, filtered, and fitted data
- Return type:
Generator[Dict[str, Union[int, float, npt.NDArray[np.float64]]], bool, None]
- abstractmethod MetaDatabaseLoader._load_metadata(query: str) → DataFrame | None
- Parameters:
query (str) – a valid SQL query, checked in the calling function for validity
- Returns:
A DataFrame containing the requested data as columns, or None on failure
- Return type:
Optional[pd.DataFrame]
Purpose: Load and return the data specified by a valid SQL query, or None on failure
The data should be formatted as a pandas DataFrame object.
- abstractmethod MetaDatabaseLoader._load_metadata_generator(query: str) → Generator[DataFrame, None, None]
- Parameters:
query (str) – query to run on the database
- Returns:
A generator that yields one row at a time in the form of a single-row DataFrame
- Return type:
Generator[pd.DataFrame, None, None]
Purpose: Load and yield the data specified by a valid SQL query one row at a time. This is useful when _load_metadata() would return too much data to fit in memory. Data should be formatted as a pandas DataFrame, consistent with _load_metadata(). Make sure you exhaust the generator when done with it, or else database connections will remain open.
- abstractmethod MetaDatabaseLoader._validate_settings(settings: dict) → None
Validate that the settings dict contains the correct information for use by the subclass.
- Parameters:
settings (dict) – The loader settings to validate.
- Raises:
ValueError – If the settings dict does not contain the correct information.
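A sketch of the kind of check this method might perform, assuming a loader whose only required setting is an existing Input File. It is written as a free function (validate_loader_settings, hypothetical) rather than as self._validate_settings so that it runs without Poriscope.

```python
import os
import tempfile
from typing import Any, Dict


def validate_loader_settings(settings: Dict[str, Dict[str, Any]]) -> None:
    """Hypothetical validation for a loader that needs an existing 'Input File'."""
    if "Input File" not in settings:
        raise ValueError("settings must contain an 'Input File' entry")
    path = settings["Input File"].get("Value")
    if not isinstance(path, str) or not path:
        raise ValueError("'Input File' must be a non-empty string")
    if not os.path.isfile(path):
        raise ValueError(f"Input File {path!r} does not exist")


with tempfile.NamedTemporaryFile(suffix=".sqlite3") as tmp:
    validate_loader_settings({"Input File": {"Type": str, "Value": tmp.name}})  # passes
```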
Concrete Methods
- MetaDatabaseLoader.__init__(settings: dict | None = None)
Initialize and set up the plugin, if settings are available at this stage.
- MetaDatabaseLoader._finalize_initialization()
Purpose: Apply application-specific settings to the plugin, if needed.
If additional initialization operations are required beyond the defaults provided in BaseDataPlugin or MetaDatabaseLoader that must occur after settings have been applied to the loader instance, you can override this function to add those operations.
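The ordering described above (settings applied first, then connection setup) can be sketched with a stand-in class. SketchLoader is hypothetical and does not inherit from the real base class; in particular, calling _finalize_initialization() from __init__ is an assumption made so the example runs on its own. It also shows a close_resources() that ignores channel because one connection serves every channel.

```python
import sqlite3
from typing import Any, Dict, Optional


class SketchLoader:
    """Hypothetical stand-in, not a real MetaDatabaseLoader subclass."""

    def __init__(self, settings: Optional[Dict[str, Dict[str, Any]]] = None) -> None:
        self.conn: Optional[sqlite3.Connection] = None
        self.settings = settings
        if settings is not None:
            self._finalize_initialization()  # assumption: called once settings exist

    def _finalize_initialization(self) -> None:
        # Settings are available here, so the database path can be read safely.
        path = self.settings["Input File"]["Value"]
        self.conn = sqlite3.connect(path)

    def close_resources(self, channel: Optional[int] = None) -> None:
        # One connection serves all channels, so channel is ignored.
        if self.conn is not None:
            self.conn.close()
            self.conn = None


loader = SketchLoader({"Input File": {"Type": str, "Value": ":memory:"}})
print(loader.conn.execute("SELECT 1").fetchone())  # (1,)
loader.close_resources()
```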