Loading Data

SpikeLab supports loading spike train data from a variety of common electrophysiology formats. All loaders return a SpikeData object and convert spike times to milliseconds where possible, since all analyses assume this unit.

The main loader functions live in spikelab.data_loaders.data_loaders. In addition, SpikeData provides static constructors for building spike data directly from arrays or raw traces.

From pickle files

If you have a previously saved SpikeData object in a pickle file, you can load it with the standard library or with the SpikeLab convenience loader.

Using the standard library:

import pickle

with open("my_data.pkl", "rb") as f:
    sd = pickle.load(f)

Using the SpikeLab loader (which also supports S3 URLs):

from spikelab.data_loaders.data_loaders import load_spikedata_from_pickle

sd = load_spikedata_from_pickle("my_data.pkl")

# Load from S3
sd = load_spikedata_from_pickle(
    "s3://my-bucket/my_data.pkl",
    aws_access_key_id="...",
    aws_secret_access_key="...",
)

From HDF5 files

HDF5 is the most flexible format. SpikeLab supports four different storage styles within an HDF5 file:

  • Raster – an (N, T) spike count matrix stored as a single dataset.

  • Ragged – a flat array of spike times plus an index array that marks the boundaries of each unit (NWB-like layout).

  • Group – one HDF5 group per unit, each containing a 1-D array of spike times.

  • Paired – two parallel 1-D arrays: unit indices and spike times.

You choose which style to load by setting the corresponding parameters. Exactly one style must be specified per call.

from spikelab.data_loaders.data_loaders import load_spikedata_from_hdf5

# Raster style
sd = load_spikedata_from_hdf5(
    "recording.h5",
    raster_dataset="raster",
    raster_bin_size_ms=1.0,
)

# Ragged style (spike_times + spike_times_index)
sd = load_spikedata_from_hdf5(
    "recording.h5",
    spike_times_dataset="spike_times",
    spike_times_index_dataset="spike_times_index",
    spike_times_unit="s",       # times in the file are in seconds
)

# Group-per-unit style
sd = load_spikedata_from_hdf5(
    "recording.h5",
    group_per_unit="units",
    group_time_unit="s",
)

# Paired style (idces + times)
sd = load_spikedata_from_hdf5(
    "recording.h5",
    idces_dataset="idces",
    times_dataset="times",
    times_unit="ms",
)
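To make these layouts concrete, here is a small NumPy sketch (illustrative only, not SpikeLab code) encoding the same two-unit spike train in the ragged, paired, and raster styles:

```python
import numpy as np

# Two units with spike times in ms
unit_times = [np.array([1.0, 5.0, 9.0]), np.array([2.0, 7.0])]

# Ragged: flat times plus an index array marking the end of each unit's block
spike_times = np.concatenate(unit_times)                      # [1, 5, 9, 2, 7]
spike_times_index = np.cumsum([len(t) for t in unit_times])   # [3, 5]

# Paired: parallel unit-index and time arrays
idces = np.repeat(np.arange(len(unit_times)), [len(t) for t in unit_times])
times = spike_times

# Raster: (N, T) spike counts with 1 ms bins
raster = np.zeros((2, 10), dtype=int)
for i, t in enumerate(unit_times):
    np.add.at(raster[i], t.astype(int), 1)

# Recover unit 0 from the ragged layout
recovered = np.split(spike_times, spike_times_index[:-1])[0]
```

The group style is simply one HDF5 group per unit, each holding one of the arrays in unit_times.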

The spike_times_unit parameter

For the ragged style, the spike_times_unit parameter (default 's') tells the loader what unit the times in the file are stored in. The loader converts them to milliseconds automatically. Set this to 'ms' if your file already stores times in milliseconds.

Analogous parameters exist for the other styles: group_time_unit for group style and times_unit for paired style.
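The conversion itself is just a scale factor; a minimal sketch of the assumed behavior, shown with plain NumPy (the mapping dict is illustrative, not SpikeLab's internals):

```python
import numpy as np

UNIT_TO_MS = {"s": 1000.0, "ms": 1.0}  # assumed unit-to-factor mapping

times_in_file = np.array([0.5, 1.25, 2.0])   # stored in seconds
times_ms = times_in_file * UNIT_TO_MS["s"]   # 500.0, 1250.0, 2000.0
```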

From NWB files

SpikeLab can load spike trains from NWB (Neurodata Without Borders) files. The loader reads the /units group and populates neuron_attributes with unit_id, electrode (or electrode_group/channel), location, and electrode positions (x, y, z) when these are present in the file.

from spikelab.data_loaders.data_loaders import load_spikedata_from_nwb

sd = load_spikedata_from_nwb("session.nwb")

# Optionally control the backend and recording length
sd = load_spikedata_from_nwb(
    "session.nwb",
    prefer_pynwb=True,    # try pynwb first, fall back to h5py
    length_ms=600000.0,   # truncate to first 10 minutes
)

The NWB loader tries pynwb first by default. If pynwb is not installed it falls back to reading the HDF5 file directly with h5py.
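The fallback follows the usual optional-dependency pattern; a sketch of the idea (not the actual SpikeLab implementation):

```python
def pick_nwb_backend(prefer_pynwb=True):
    """Return the name of the backend that would be used to read the file."""
    if prefer_pynwb:
        try:
            import pynwb  # noqa: F401
            return "pynwb"
        except ImportError:
            pass
    return "h5py"
```

Passing prefer_pynwb=False skips the pynwb attempt entirely and reads with h5py.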

From KiloSort/Phy output

KiloSort and Phy produce a folder of .npy files. SpikeLab reads spike_times.npy and spike_clusters.npy and optionally filters clusters using the TSV cluster info file produced by Phy (keeping good and mua labels by default).

The fs_Hz parameter is required: it specifies the sampling frequency so that spike times stored as sample indices can be converted to milliseconds.
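The conversion is samples / fs_Hz * 1000; for example, at 30 kHz:

```python
import numpy as np

fs_Hz = 30000
spike_samples = np.array([30000, 45000, 60000])  # sample indices, as in spike_times.npy
spike_times_ms = spike_samples / fs_Hz * 1000.0  # 1000.0, 1500.0, 2000.0
```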

from spikelab.data_loaders.data_loaders import load_spikedata_from_kilosort

sd = load_spikedata_from_kilosort(
    "path/to/kilosort_output/",
    fs_Hz=30000,
)

# With cluster filtering and custom file names
sd = load_spikedata_from_kilosort(
    "path/to/kilosort_output/",
    fs_Hz=30000,
    cluster_info_tsv="cluster_info.tsv",
    include_noise=False,
    spike_times_file="spike_times.npy",
    spike_clusters_file="spike_clusters.npy",
)

The loader populates neuron_attributes with unit_id, electrode, and location when the corresponding information is available in the KiloSort output.

From SpikeInterface

If you are using SpikeInterface for spike sorting, you can convert a SortingExtractor directly into a SpikeData:

from spikelab.data_loaders.data_loaders import load_spikedata_from_spikeinterface

# sorting is any SpikeInterface SortingExtractor-like object
sd = load_spikedata_from_spikeinterface(
    sorting,
    sampling_frequency=30000,   # override if not set on the object
    unit_ids=None,              # load all units
    segment_index=0,
)
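Conceptually, the conversion iterates over units and rescales sample indices to milliseconds. A rough sketch using a stand-in object (DummySorting is hypothetical; get_unit_ids and get_unit_spike_train mirror the SpikeInterface sorting API):

```python
import numpy as np

class DummySorting:
    """Minimal stand-in for a SpikeInterface SortingExtractor-like object."""
    def __init__(self, trains, fs):
        self._trains, self._fs = trains, fs
    def get_unit_ids(self):
        return list(self._trains)
    def get_unit_spike_train(self, unit_id, segment_index=0):
        return self._trains[unit_id]

def sorting_to_trains_ms(sorting, sampling_frequency):
    """Convert each unit's spike train from sample indices to milliseconds."""
    return {
        uid: sorting.get_unit_spike_train(uid) / sampling_frequency * 1000.0
        for uid in sorting.get_unit_ids()
    }

sorting = DummySorting({0: np.array([300, 600]), 1: np.array([150])}, fs=30000)
trains = sorting_to_trains_ms(sorting, 30000)   # unit 0: [10.0, 20.0] ms
```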

SpikeLab can also load from a SpikeInterface BaseRecording object by applying threshold detection to the raw traces:

from spikelab.data_loaders.data_loaders import load_spikedata_from_spikeinterface_recording

sd = load_spikedata_from_spikeinterface_recording(
    recording,
    segment_index=0,
    threshold_sigma=5.0,
    filter=False,
    hysteresis=True,
    direction="both",
)

From SpikeLab sorted .npz

The SpikeLab spike-sorting pipeline (see Spike Sorting and Curation) can compile its output into .npz files containing per-unit spike trains, electrode locations, waveform templates, and quality metrics. Load these with:

from spikelab.data_loaders.data_loaders import load_spikedata_from_spikelab_sorted_npz

sd = load_spikedata_from_spikelab_sorted_npz(
    "sorted_results.npz",
    length_ms=600000.0,  # optional; inferred from latest spike if omitted
)

The loader populates neuron_attributes with electrode positions, waveform templates, and quality metrics when available.
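Storing ragged per-unit spike trains in .npz is straightforward with a flat times array plus unit boundaries. A sketch of the idea (the key names here are hypothetical, not the pipeline's actual schema):

```python
import io
import numpy as np

# Two per-unit spike trains, saved in a ragged layout
unit_times = [np.array([1.0, 4.0]), np.array([2.5])]
buf = io.BytesIO()  # stands in for a file on disk
np.savez(
    buf,
    spike_times=np.concatenate(unit_times),
    unit_boundaries=np.cumsum([len(t) for t in unit_times]),
)
buf.seek(0)

# Reload and split back into per-unit arrays
npz = np.load(buf)
recovered = np.split(npz["spike_times"], npz["unit_boundaries"][:-1])
```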

From the IBL database

SpikeLab can load spike trains directly from the International Brain Laboratory public server. This requires the one-api and brainwidemap packages.

First, search for probes matching your criteria:

from spikelab.data_loaders.data_loaders import query_ibl_probes

probes, stats_df = query_ibl_probes(
    target_regions=["MOs", "MOp"],   # Beryl atlas region names
    min_units=20,                     # minimum good units per probe
    min_fraction_in_target=0.5,       # at least 50% of units in target regions
)

# probes is a list of (eid, pid) tuples
# stats_df is a pandas DataFrame with per-probe statistics

Then load a specific probe:

from spikelab.data_loaders.data_loaders import load_spikedata_from_ibl

eid, pid = probes[0]
sd = load_spikedata_from_ibl(eid, pid)

Only units labelled as good in the Brain-Wide Map unit table are included. Trial event times are stored in sd.metadata as numpy arrays in milliseconds.

From Neo SpikeTrains

If you have a list of Neo SpikeTrain objects, convert them directly using the static constructor:

from spikelab import SpikeData

# spiketrains is a list of neo.SpikeTrain objects
sd = SpikeData.from_neo_spiketrains(spiketrains)

The constructor converts spike times to milliseconds automatically using the units attached to each SpikeTrain.

From raw data

SpikeData provides two static constructors that build spike data directly from in-memory arrays, with no external file format required.

From threshold detection

If you have raw voltage traces as a NumPy array of shape (channels, time), you can detect spikes using a threshold crossing method:

from spikelab.spikedata.spike_data import SpikeData
import numpy as np

raw_traces = np.random.randn(64, 600000)  # 64 channels, 30 s at 20 kHz

sd = SpikeData.from_thresholding(
    raw_traces,
    fs_Hz=20000,
    threshold_sigma=5.0,
    filter=True,           # 300-6000 Hz Butterworth bandpass
    hysteresis=True,
    direction="both",      # detect both positive and negative crossings
)

The resulting SpikeData object has the original traces attached as raw_data and raw_time attributes.
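A minimal sketch of the kind of detection this performs (simplified: no bandpass filter or hysteresis, threshold taken from the trace's standard deviation):

```python
import numpy as np

rng = np.random.default_rng(0)
fs_Hz = 20000
trace = rng.normal(0.0, 1.0, 20000)   # one channel, 1 s of noise
trace[5000] = 12.0                    # inject an obvious "spike"

sigma = np.std(trace)
threshold = 5.0 * sigma

# Indices where |trace| first crosses the threshold (rising edge of the mask)
above = np.abs(trace) > threshold
crossings = np.flatnonzero(above[1:] & ~above[:-1]) + 1

spike_times_ms = crossings / fs_Hz * 1000.0   # sample 5000 -> 250.0 ms
```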

From a spike count raster

If you already have an (N, T) spike count raster, you can convert it directly:

from spikelab.spikedata.spike_data import SpikeData
import numpy as np

raster = np.random.poisson(0.01, size=(32, 100000))

sd = SpikeData.from_raster(raster, bin_size_ms=1.0)

Spikes are placed evenly within each bin.
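"Evenly" can be made concrete with a small sketch. One natural placement rule (assumed here for illustration, not necessarily SpikeLab's exact rule) spreads k spikes at fractions 1/(k+1), ..., k/(k+1) of the bin:

```python
import numpy as np

bin_size_ms = 1.0
counts = np.array([0, 3, 0, 2])   # one unit's row of the raster

# Bin b with count c contributes times (b + i/(c+1)) * bin_size_ms, i = 1..c
times_ms = np.concatenate([
    (b + (np.arange(c) + 1) / (c + 1)) * bin_size_ms
    for b, c in enumerate(counts) if c
])
# bin 1 -> 1.25, 1.5, 1.75; bin 3 -> 3.33..., 3.66...
```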