Data Loaders

Functions for loading spike train data from various file formats, including pickle, NWB, and Neo-compatible formats.

Lightweight loaders that convert common neurophysiology formats into spikedata.SpikeData objects.

Supported inputs (best-effort, optional deps):

HDF5 (generic): spike times, (indices,times), or raster matrices
NWB: reads Units table spike_times (via pynwb if available, else h5py)
KiloSort/Phy outputs: spike_times.npy + spike_clusters.npy (+ optional TSV)
SpikeInterface: from a SortingExtractor
IBL (International Brain Laboratory): via ONE API + brainwidemap

Times are converted to milliseconds to match SpikeData conventions. These helpers avoid hard dependencies: optional libraries are imported lazily.
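
Example

The lazy-import pattern mentioned above can be sketched as follows. This illustrates the general technique only and is not SpikeLab's internal code; the helper names optional_import and load_example are hypothetical.

```python
import importlib
import importlib.util


def optional_import(module_name, purpose):
    """Import `module_name` only when it is needed (hypothetical helper).

    Raises ImportError with an actionable message if the package is missing.
    """
    if importlib.util.find_spec(module_name) is None:
        raise ImportError(
            f"{module_name!r} is required for {purpose}; "
            "install it to use this loader."
        )
    return importlib.import_module(module_name)


def load_example(path):
    # The optional dependency is resolved inside the function body, so
    # importing the enclosing module never fails when h5py is absent.
    h5py = optional_import("h5py", "reading HDF5 files")
    return h5py.File(path, "r")
```

With this arrangement a missing package only raises an error when the loader that needs it is actually called.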

spikelab.data_loaders.data_loaders.load_spikedata_from_hdf5(filepath, *, raster_dataset=None, raster_bin_size_ms=None, spike_times_dataset=None, spike_times_index_dataset=None, spike_times_unit='s', fs_Hz=None, group_per_unit=None, group_time_unit='s', idces_dataset=None, times_dataset=None, times_unit='s', raw_dataset=None, raw_time_dataset=None, raw_time_unit='s', length_ms=None, metadata=None)[source]

Load spike trains from a generic HDF5 file using one of four supported input styles.

Exactly one input style must be specified. The four styles are: raster matrix, ragged arrays, group-per-unit, and paired arrays.

Parameters:

filepath (str) – Path to the HDF5 file.
raster_dataset (str | None) – Dataset path for a 2D raster/counts matrix (units x time). Activates raster style.
raster_bin_size_ms (float | None) – Bin width in milliseconds. Required for raster style.
spike_times_dataset (str | None) – Dataset path for flat concatenated spike times. Activates ragged style (requires spike_times_index_dataset).
spike_times_index_dataset (str | None) – Dataset path for cumulative end-of-unit indices into the flat spike times array.
spike_times_unit (str) – Time unit for ragged spike times (‘s’, ‘ms’, or ‘samples’).
fs_Hz (float | None) – Sampling frequency in Hz. Required when any time unit is ‘samples’.
group_per_unit (str | None) – HDF5 group path containing one dataset per unit. Activates group-per-unit style.
group_time_unit (str) – Time unit for group-per-unit datasets (‘s’, ‘ms’, or ‘samples’).
idces_dataset (str | None) – Dataset path for unit index array. Activates paired-arrays style (requires times_dataset).
times_dataset (str | None) – Dataset path for spike times array (paired with idces_dataset).
times_unit (str) – Time unit for paired spike times (‘s’, ‘ms’, or ‘samples’).
raw_dataset (str | None) – Dataset path for optional raw analog data.
raw_time_dataset (str | None) – Dataset path for the raw data time vector.
raw_time_unit (str) – Time unit for the raw time vector (‘s’, ‘ms’, or ‘samples’).
length_ms (float | None) – Recording duration in milliseconds. If not provided, inferred from the latest spike time.
metadata (Mapping | None) – Additional metadata to attach to the resulting SpikeData.

Returns:

The loaded spike train data.

Return type:

SpikeData

Raises:

ValueError – If not exactly one input style is specified, or if required arguments are missing.
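
Example

The ragged and paired-arrays layouts described above can be illustrated in pure Python. This is a sketch of the data layouts and unit conversion, not the loader's implementation; the function names are hypothetical, and plain lists stand in for HDF5 datasets.

```python
def to_ms(value, unit, fs_hz=None):
    """Convert one time value to milliseconds."""
    if unit == "ms":
        return float(value)
    if unit == "s":
        return value * 1000.0
    if unit == "samples":
        if fs_hz is None:
            raise ValueError("fs_hz is required when unit is 'samples'")
        return value * 1000.0 / fs_hz
    raise ValueError(f"unknown time unit: {unit!r}")


def ragged_to_trains_ms(flat_times, end_indices, unit="s", fs_hz=None):
    """Split a flat spike-time array into per-unit trains.

    end_indices[i] is the exclusive end position of unit i in flat_times
    (cumulative end-of-unit indices).
    """
    trains, start = [], 0
    for end in end_indices:
        trains.append([to_ms(t, unit, fs_hz) for t in flat_times[start:end]])
        start = end
    return trains


def paired_to_trains_ms(unit_indices, times, n_units, unit="s", fs_hz=None):
    """Group (unit index, spike time) pairs into per-unit trains."""
    trains = [[] for _ in range(n_units)]
    for u, t in zip(unit_indices, times):
        trains[u].append(to_ms(t, unit, fs_hz))
    return trains


def infer_length_ms(trains):
    """Latest spike time across all units (0.0 if there are no spikes)."""
    return max((t for train in trains for t in train), default=0.0)
```

For instance, ragged_to_trains_ms([0.5, 1.5, 0.25], [2, 3]) yields two units: the first owns the first two spike times, the second owns the third.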

spikelab.data_loaders.data_loaders.load_spikedata_from_hdf5_raw_thresholded(filepath, dataset, *, fs_Hz, threshold_sigma=5.0, filter=True, hysteresis=True, direction='both')[source]

Threshold-and-detect spikes from an HDF5 dataset of raw traces.

Parameters:

filepath (str) – Path to HDF5 file.
dataset (str) – HDF5 dataset path containing raw traces shaped (channels, time).
fs_Hz (float) – Sampling frequency in Hz.
threshold_sigma (float) – Threshold in units of per-channel standard deviation.
filter (dict | bool) – If True, apply default Butterworth bandpass; if dict, pass to filter; if False, no filtering.
hysteresis (bool) – Use rising-edge detection if True.
direction (str) – ‘both’ | ‘up’ | ‘down’.

Returns:

The detected spike train data.

Return type:

SpikeData
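
Example

A minimal sketch of sigma-based threshold detection with rising-edge logic for a single channel. It assumes the threshold is applied to the mean-subtracted trace and uses the population standard deviation; the loader's actual filtering, noise estimate, and edge handling may differ, and the function name is hypothetical. The bandpass filtering step is omitted.

```python
import statistics


def detect_threshold_crossings(trace, fs_hz, threshold_sigma=5.0, direction="both"):
    """Return threshold-crossing times in milliseconds for one channel."""
    if direction not in ("both", "up", "down"):
        raise ValueError(f"unknown direction: {direction!r}")

    mean = statistics.fmean(trace)
    threshold = threshold_sigma * statistics.pstdev(trace)

    def is_above(sample):
        deviation = sample - mean
        if direction == "up":
            return deviation > threshold
        if direction == "down":
            return deviation < -threshold
        return abs(deviation) > threshold

    times_ms = []
    previous = False
    for i, sample in enumerate(trace):
        current = is_above(sample)
        # Rising edge: report only the first sample of each supra-threshold run.
        if current and not previous:
            times_ms.append(i * 1000.0 / fs_hz)
        previous = current
    return times_ms
```

Reporting only the rising edge keeps a spike that spans several samples from being counted more than once.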

spikelab.data_loaders.data_loaders.load_spikedata_from_nwb(filepath, *, prefer_pynwb=True, length_ms=None)[source]

Load spike trains from an NWB file’s Units table.

Parameters:

filepath (str) – Path to the NWB file.
prefer_pynwb (bool) – If True, try pynwb first, falling back to h5py if pynwb is unavailable; if False, try h5py.
length_ms (float | None) – Recording duration in milliseconds.

Returns:

The loaded spike train data.

Return type:

SpikeData

spikelab.data_loaders.data_loaders.load_spikedata_from_kilosort(folder, *, fs_Hz, spike_times_file='spike_times.npy', spike_clusters_file='spike_clusters.npy', cluster_info_tsv=None, time_unit='samples', include_noise=False, length_ms=None, channel_map_file='channel_map.npy', channel_positions_file='channel_positions.npy')[source]

Load KiloSort/Phy outputs into SpikeData.

Parameters:

folder (str) – Path to the KiloSort/Phy output directory.
fs_Hz (float) – Sampling frequency in Hz.
spike_times_file (str) – Path to the spike_times.npy file.
spike_clusters_file (str) – Path to the spike_clusters.npy file.
cluster_info_tsv (str | None) – Path to the cluster info TSV file.
time_unit (str) – Unit of the spike times (‘samples’, ‘s’, or ‘ms’).
include_noise (bool) – If True, include noise clusters.
length_ms (float | None) – Recording duration in milliseconds.
channel_map_file (str) – Filename of the channel map file relative to folder. Expected format: 1D numpy array mapping cluster indices to channel numbers.
channel_positions_file (str) – Filename of the channel positions file relative to folder. Expected format: 2D numpy array of shape (channels, 3) containing channel positions.

Returns:

The loaded spike train data.

Return type:

SpikeData

Notes

This loader does not extract or include waveform data; only spike times and cluster assignments are loaded.
Reads spike_times.npy (in samples by default; see time_unit) and spike_clusters.npy, groups spike times per cluster, and converts them to milliseconds, using fs_Hz when the times are in samples.
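
Example

The grouping step described in the notes can be sketched in pure Python. Plain lists stand in for the contents of spike_times.npy and spike_clusters.npy, and the function name is hypothetical; this is an illustration, not the loader's code.

```python
from collections import defaultdict


def group_spikes_by_cluster(spike_samples, spike_clusters, fs_hz):
    """Group sample-indexed spike times per cluster and convert to ms."""
    if len(spike_samples) != len(spike_clusters):
        raise ValueError("spike_samples and spike_clusters must have equal length")

    trains = defaultdict(list)
    for sample, cluster in zip(spike_samples, spike_clusters):
        # samples -> milliseconds: divide by the sampling rate, times 1000.
        trains[int(cluster)].append(sample * 1000.0 / fs_hz)

    # Return clusters in ascending ID order with sorted spike times.
    return {cluster: sorted(times) for cluster, times in sorted(trains.items())}
```

At fs_hz=30000.0, a spike at sample 30 lands at 1.0 ms.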

spikelab.data_loaders.data_loaders.load_spikedata_from_spikeinterface(sorting, *, sampling_frequency=None, unit_ids=None, segment_index=0)[source]

Convert a SpikeInterface SortingExtractor-like object to SpikeData.

Parameters:

sorting (object) – Exposes get_unit_ids(), get_sampling_frequency(), get_unit_spike_train(…).
sampling_frequency (float | None) – Optional override for sampling frequency (Hz).
unit_ids (Sequence | None) – Optional subset of unit IDs to include.
segment_index (int) – Segment index for multi-segment sortings.

Returns:

The converted spike train data.

Return type:

SpikeData
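
Example

Because the sorting argument is duck-typed, any object exposing the three methods listed above can be converted. The sketch below uses a stand-in class (FakeSorting, not a SpikeInterface class) and a hypothetical helper to show the sample-to-millisecond conversion; it does not call SpikeLab or SpikeInterface.

```python
class FakeSorting:
    """Stand-in exposing the three methods the loader expects."""

    def __init__(self, trains_in_samples, fs_hz):
        self._trains = trains_in_samples
        self._fs_hz = fs_hz

    def get_unit_ids(self):
        return list(self._trains)

    def get_sampling_frequency(self):
        return self._fs_hz

    def get_unit_spike_train(self, unit_id, segment_index=0):
        return self._trains[unit_id]


def sorting_to_trains_ms(sorting, sampling_frequency=None, unit_ids=None, segment_index=0):
    """Convert per-unit spike trains from samples to milliseconds."""
    fs_hz = sampling_frequency
    if fs_hz is None:
        fs_hz = sorting.get_sampling_frequency()
    ids = list(unit_ids) if unit_ids is not None else list(sorting.get_unit_ids())
    return {
        uid: [
            s * 1000.0 / fs_hz
            for s in sorting.get_unit_spike_train(unit_id=uid, segment_index=segment_index)
        ]
        for uid in ids
    }
```

Passing sampling_frequency overrides the value reported by the sorting object, mirroring the optional override in the signature above.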

spikelab.data_loaders.data_loaders.load_spikedata_from_spikeinterface_recording(recording, *, segment_index=0, threshold_sigma=5.0, filter=False, hysteresis=True, direction='both')[source]

Convert a SpikeInterface BaseRecording-like object into SpikeData.

Parameters:

recording (object) – Exposes get_traces(segment_index=…), get_sampling_frequency(), get_num_channels().
segment_index (int) – Segment index for multi-segment recordings.
threshold_sigma (float) – Threshold in units of per-channel standard deviation.
filter (dict | bool) – If True, apply default Butterworth bandpass; if dict, pass to filter; if False, no filtering.
hysteresis (bool) – Use rising-edge detection if True.
direction (str) – ‘both’ | ‘up’ | ‘down’.

Returns:

The converted spike train data.

Return type:

SpikeData

spikelab.data_loaders.data_loaders.load_spikedata_from_pickle(filepath, *, aws_access_key_id=None, aws_secret_access_key=None, aws_session_token=None, region_name=None)[source]

Load a SpikeData object from a pickle file.

Warning

Only load pickle files from trusted sources. Pickle deserialization can execute arbitrary code and should never be used with untrusted data. The file is deserialized before type checking — malicious payloads execute regardless of the subsequent isinstance check.

Parameters:

filepath (str) – Path to the pickle file, or an S3 URL (s3://bucket/key).
aws_access_key_id (str | None) – AWS access key ID for S3 downloads.
aws_secret_access_key (str | None) – AWS secret access key for S3 downloads.
aws_session_token (str | None) – AWS session token for temporary credentials.
region_name (str | None) – AWS region name for S3 access.

Returns:

The deserialized SpikeData object.

Return type:

SpikeData
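
Example

Two details above lend themselves to a short sketch: splitting an s3://bucket/key URL, and why the type check cannot protect against a malicious pickle. Both helpers are hypothetical and use only the standard library; how SpikeLab itself parses URLs or downloads from S3 is not shown here.

```python
import pickle
from urllib.parse import urlparse


def split_s3_url(url):
    """Split 's3://bucket/key' into (bucket, key)."""
    parsed = urlparse(url)
    key = parsed.path.lstrip("/")
    if parsed.scheme != "s3" or not parsed.netloc or not key:
        raise ValueError(f"not a valid S3 URL: {url!r}")
    return parsed.netloc, key


def load_trusted_pickle(path, expected_type):
    """Load a pickle from a trusted local file and verify its type."""
    with open(path, "rb") as fh:
        # Any code embedded in the pickle runs during this call,
        # before the isinstance check below is ever reached.
        obj = pickle.load(fh)
    if not isinstance(obj, expected_type):
        raise TypeError(
            f"expected {expected_type.__name__}, got {type(obj).__name__}"
        )
    return obj
```

The type check only guards against loading the wrong kind of object by mistake; it is not a security boundary.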

spikelab.data_loaders.data_loaders.load_spikedata_from_ibl(eid, pid, *, length_ms=None)[source]

Load spike trains for a single IBL probe into SpikeData.

Authenticates against the public IBL server automatically. Only units labelled as good (label == 1) in the Brain-Wide Map unit table are included. Trial event times are stored in SpikeData.metadata as individual numpy arrays, all in milliseconds.

Parameters:

eid (str) – IBL experiment ID (UUID string).
pid (str) – IBL probe ID (UUID string).
length_ms (float | None) – Recording duration in milliseconds. If not provided, the maximum spike time across all units is used.

Returns:

Loaded spike train data. neuron_attributes contains {"region": <Beryl atlas region>} per unit. metadata contains eid, pid, n_trials, trial_start_times, trial_end_times, stim_on_times, stim_off_times, go_cue_times, response_times, feedback_times, first_movement_times, choice, feedback_type, contrast_left, contrast_right, and probability_left. All time arrays are in milliseconds.

Return type:

SpikeData

Notes

Requires one-api and brainwidemap packages (optional dependencies).
Spike times are converted from seconds (IBL convention) to milliseconds.
Trial times are converted from seconds to milliseconds.
Probe collection is inferred from the PID suffix; falls back through alf/probe00/pykilosort, alf/probe01/pykilosort, and alf.

spikelab.data_loaders.data_loaders.query_ibl_probes(target_regions=None, *, min_units=0, min_fraction_in_target=0.0)[source]

Search the IBL Brain-Wide Map database for probes matching given criteria.

Authenticates against the public IBL server automatically. Filters probes by brain region and unit count. Returns matching (eid, pid) pairs alongside a per-probe statistics DataFrame.

Parameters:

target_regions (list[str] | None) – Beryl atlas region names to filter by (e.g. ["MOs", "MOp"]). If None, no region filter is applied.
min_units (int) – Minimum number of good units required per probe. Default 0 (no minimum).
min_fraction_in_target (float) – Minimum fraction (0–1) of good units that must fall within target_regions. Ignored when target_regions is None. Default 0.0.

Returns:

probes (list[tuple[str, str]]) – List of (eid, pid) pairs for probes that pass all filters, sorted by descending good unit count.
stats (pd.DataFrame) – One row per matching probe with columns eid, pid, n_good_units, and (when target_regions is not None) n_in_target and fraction_in_target.

Return type:

tuple (probes, stats)

Notes

Requires one-api and brainwidemap packages (optional dependencies).
bwm_units() fetches the full Brain-Wide Map unit table from the IBL server; this may take several seconds on first call.
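
Example

The filtering criteria can be sketched in pure Python. The input here is a hypothetical list of (eid, pid, region) tuples, one per good unit, and the statistics are returned as a list of dicts rather than a DataFrame. Treating a probe with no units in target_regions as passing when min_fraction_in_target is 0.0 is an assumption; the actual function may behave differently.

```python
def filter_probes(units, target_regions=None, *, min_units=0, min_fraction_in_target=0.0):
    """Filter probes by good-unit count and fraction of units in target regions."""
    counts = {}
    for eid, pid, region in units:
        n_units, n_in_target = counts.get((eid, pid), (0, 0))
        in_target = target_regions is not None and region in target_regions
        counts[(eid, pid)] = (n_units + 1, n_in_target + int(in_target))

    stats = []
    for (eid, pid), (n_units, n_in_target) in counts.items():
        if n_units < min_units:
            continue
        row = {"eid": eid, "pid": pid, "n_good_units": n_units}
        if target_regions is not None:
            fraction = n_in_target / n_units
            if fraction < min_fraction_in_target:
                continue
            row["n_in_target"] = n_in_target
            row["fraction_in_target"] = fraction
        stats.append(row)

    # Sort by descending good-unit count, matching the documented ordering.
    stats.sort(key=lambda row: row["n_good_units"], reverse=True)
    probes = [(row["eid"], row["pid"]) for row in stats]
    return probes, stats
```

The region-specific columns are only added when target_regions is given, matching the column description above.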

spikelab.data_loaders.data_loaders.load_spikedata_from_spikelab_sorted_npz(filepath, *, length_ms=None)[source]

Load a SpikeLab compiled sorting result (.npz) into SpikeData.

These .npz files are produced by sort_with_kilosort2()’s compile_results step and contain per-unit spike trains, electrode locations, waveform templates, and quality metrics.

Parameters:

filepath (str) – Path to the .npz file.
length_ms (float | None) – Recording duration in milliseconds. Inferred from the latest spike time when None.

Returns:

The loaded spike train data with neuron attributes (unit_id, location, electrode, template, amplitudes, etc.).

Return type:

SpikeData