Loading Data
SpikeLab supports loading spike train data from a variety of common
electrophysiology formats. All loaders return a
SpikeData object and convert spike
times to milliseconds where possible, since all analyses assume this unit.
The main loader functions live in
spikelab.data_loaders.data_loaders. In addition,
SpikeData provides static constructors
for building spike data directly from arrays or raw traces.
From pickle files
If you have a previously saved SpikeData object in a pickle file, you can
load it with the standard library or with the SpikeLab convenience loader.
Using the standard library:
import pickle
with open("my_data.pkl", "rb") as f:
    sd = pickle.load(f)
Using the SpikeLab loader (which also supports S3 URLs):
from spikelab.data_loaders.data_loaders import load_spikedata_from_pickle
sd = load_spikedata_from_pickle("my_data.pkl")
# Load from S3
sd = load_spikedata_from_pickle(
    "s3://my-bucket/my_data.pkl",
    aws_access_key_id="...",
    aws_secret_access_key="...",
)
From HDF5 files
HDF5 is the most flexible format. SpikeLab supports four different storage styles within an HDF5 file:
Raster – a (N, T) spike count matrix stored as a single dataset.
Ragged – a flat array of spike times plus an index array that marks the boundaries of each unit (NWB-like layout).
Group – one HDF5 group per unit, each containing a 1-D array of spike times.
Paired – two parallel 1-D arrays: unit indices and spike times.
You choose which style to load by setting the corresponding parameters. Exactly one style must be specified per call.
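Of the four, the ragged layout is the least self-explanatory. The sketch below (plain NumPy, not SpikeLab code) shows how a flat spike-times array and its boundary index decode into per-unit trains, assuming the NWB convention that the index array stores each unit's end offset:

```python
import numpy as np

# Flat spike times for three units. Following the NWB convention,
# spike_times_index[i] holds the end offset of unit i's slice.
spike_times = np.array([0.1, 0.5, 0.9, 0.2, 1.4, 0.7])
spike_times_index = np.array([3, 5, 6])

# Each unit's slice runs from the previous boundary to its own.
starts = np.concatenate(([0], spike_times_index[:-1]))
trains = [spike_times[a:b] for a, b in zip(starts, spike_times_index)]
# trains[0] is [0.1, 0.5, 0.9]; trains[1] is [0.2, 1.4]; trains[2] is [0.7]
```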
from spikelab.data_loaders.data_loaders import load_spikedata_from_hdf5
# Raster style
sd = load_spikedata_from_hdf5(
    "recording.h5",
    raster_dataset="raster",
    raster_bin_size_ms=1.0,
)
# Ragged style (spike_times + spike_times_index)
sd = load_spikedata_from_hdf5(
    "recording.h5",
    spike_times_dataset="spike_times",
    spike_times_index_dataset="spike_times_index",
    spike_times_unit="s",  # times in the file are in seconds
)
# Group-per-unit style
sd = load_spikedata_from_hdf5(
    "recording.h5",
    group_per_unit="units",
    group_time_unit="s",
)
# Paired style (idces + times)
sd = load_spikedata_from_hdf5(
    "recording.h5",
    idces_dataset="idces",
    times_dataset="times",
    times_unit="ms",
)
The spike_times_unit parameter
For the ragged style, the spike_times_unit parameter (default 's')
tells the loader which unit the spike times in the file are stored in; the
loader converts them to milliseconds automatically. Set this to 'ms' if your
file already stores times in milliseconds.
Analogous parameters exist for the other styles: group_time_unit for group
style and times_unit for paired style.
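Conceptually these parameters all imply the same scale-factor conversion. The helper below is a hypothetical illustration of that mapping, not the library's actual code:

```python
# Hypothetical sketch of the unit conversion the loaders perform.
_TO_MS = {"s": 1000.0, "ms": 1.0}

def to_milliseconds(times, unit="s"):
    """Scale spike times into milliseconds given their on-disk unit."""
    return [t * _TO_MS[unit] for t in times]

to_milliseconds([0.5, 1.25], unit="s")   # -> [500.0, 1250.0]
to_milliseconds([500.0], unit="ms")      # -> [500.0]
```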
From NWB files
SpikeLab can load spike trains from NWB (Neurodata Without Borders) files. The
loader reads the /units group and populates neuron_attributes with
unit_id, electrode (or electrode_group/channel), location,
and electrode positions (x, y, z) when these are present in the
file.
from spikelab.data_loaders.data_loaders import load_spikedata_from_nwb
sd = load_spikedata_from_nwb("session.nwb")
# Optionally control the backend and recording length
sd = load_spikedata_from_nwb(
    "session.nwb",
    prefer_pynwb=True,   # try pynwb first, fall back to h5py
    length_ms=600000.0,  # truncate to first 10 minutes
)
The NWB loader tries pynwb first by default. If pynwb is not installed
it falls back to reading the HDF5 file directly with h5py.
From KiloSort/Phy output
KiloSort and Phy produce a folder of .npy files. SpikeLab reads
spike_times.npy and spike_clusters.npy and optionally filters clusters
using the TSV cluster info file produced by Phy (keeping good and mua
labels by default).
The fs_Hz parameter is required – it specifies the sampling frequency so
that sample indices can be converted to milliseconds.
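The conversion itself is simple arithmetic: a sample index divided by the sampling rate gives seconds, times 1000 gives milliseconds. A quick sanity check (plain Python, not SpikeLab code):

```python
# KiloSort stores spike times as sample indices.
fs_Hz = 30000
sample_index = 90000
time_ms = sample_index / fs_Hz * 1000.0
# 90000 / 30000 = 3 s, i.e. 3000.0 ms
```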
from spikelab.data_loaders.data_loaders import load_spikedata_from_kilosort
sd = load_spikedata_from_kilosort(
    "path/to/kilosort_output/",
    fs_Hz=30000,
)
# With cluster filtering and custom file names
sd = load_spikedata_from_kilosort(
    "path/to/kilosort_output/",
    fs_Hz=30000,
    cluster_info_tsv="cluster_info.tsv",
    include_noise=False,
    spike_times_file="spike_times.npy",
    spike_clusters_file="spike_clusters.npy",
)
The loader populates neuron_attributes with unit_id, electrode,
and location when the corresponding information is available in the
KiloSort output.
From SpikeInterface
If you are using SpikeInterface
for spike sorting, you can convert a SortingExtractor directly into a
SpikeData:
from spikelab.data_loaders.data_loaders import load_spikedata_from_spikeinterface
# sorting is any SpikeInterface SortingExtractor-like object
sd = load_spikedata_from_spikeinterface(
    sorting,
    sampling_frequency=30000,  # override if not set on the object
    unit_ids=None,             # load all units
    segment_index=0,
)
SpikeLab can also load from a SpikeInterface BaseRecording object by
applying threshold detection to its raw traces:
from spikelab.data_loaders.data_loaders import load_spikedata_from_spikeinterface_recording
sd = load_spikedata_from_spikeinterface_recording(
    recording,
    segment_index=0,
    threshold_sigma=5.0,
    filter=False,
    hysteresis=True,
    direction="both",
)
From SpikeLab sorted .npz
The SpikeLab spike-sorting pipeline (see Spike Sorting and Curation) can compile
its output into .npz files containing per-unit spike trains, electrode
locations, waveform templates, and quality metrics. Load these with:
from spikelab.data_loaders.data_loaders import load_spikedata_from_spikelab_sorted_npz
sd = load_spikedata_from_spikelab_sorted_npz(
    "sorted_results.npz",
    length_ms=600000.0,  # optional; inferred from latest spike if omitted
)
The loader populates neuron_attributes with electrode positions, waveform
templates, and quality metrics when available.
From the IBL database
SpikeLab can load spike trains directly from the International Brain
Laboratory public server. This
requires the ONE-api and brainwidemap packages.
First, search for probes matching your criteria:
from spikelab.data_loaders.data_loaders import query_ibl_probes
probes, stats_df = query_ibl_probes(
    target_regions=["MOs", "MOp"],  # Beryl atlas region names
    min_units=20,                   # minimum good units per probe
    min_fraction_in_target=0.5,     # at least 50% of units in target regions
)
# probes is a list of (eid, pid) tuples
# stats_df is a pandas DataFrame with per-probe statistics
Then load a specific probe:
from spikelab.data_loaders.data_loaders import load_spikedata_from_ibl
eid, pid = probes[0]
sd = load_spikedata_from_ibl(eid, pid)
Only units labelled as good in the Brain-Wide Map unit table are included.
Trial event times are stored in sd.metadata as numpy arrays in
milliseconds.
From Neo SpikeTrains
If you have a list of Neo SpikeTrain
objects, convert them directly using the static constructor:
from spikelab import SpikeData
# spiketrains is a list of neo.SpikeTrain objects
sd = SpikeData.from_neo_spiketrains(spiketrains)
The constructor converts spike times to milliseconds automatically using the
units attached to each SpikeTrain.
From raw data
SpikeData provides two static constructors for building spike data
directly from arrays, with no external file format required.
From threshold detection
If you have raw voltage traces as a NumPy array of shape (channels, time),
you can detect spikes using a threshold crossing method:
from spikelab.spikedata.spike_data import SpikeData
import numpy as np
raw_traces = np.random.randn(64, 600000) # 64 channels, 30 s at 20 kHz
sd = SpikeData.from_thresholding(
    raw_traces,
    fs_Hz=20000,
    threshold_sigma=5.0,
    filter=True,       # 300-6000 Hz Butterworth bandpass
    hysteresis=True,
    direction="both",  # detect both positive and negative crossings
)
The resulting SpikeData object has the original traces attached as
raw_data and raw_time attributes.
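As a rough illustration of the detection step, here is a simplified sketch (not SpikeLab's implementation: no filtering or hysteresis, negative crossings only) that finds where a trace drops below a multiple of a robust noise estimate:

```python
import numpy as np

def detect_crossings(trace, fs_Hz, threshold_sigma=5.0):
    """Return spike times (ms) where the trace falls below -threshold_sigma * sigma."""
    # Median absolute deviation based noise estimate, standard for spike detection.
    sigma = np.median(np.abs(trace)) / 0.6745
    below = trace < -threshold_sigma * sigma
    # Keep only the first sample of each contiguous crossing.
    onsets = np.flatnonzero(below & ~np.roll(below, 1))
    return onsets / fs_Hz * 1000.0
```

Real detectors (including the `filter` and `hysteresis` options above) add bandpass filtering and refractory handling on top of this basic idea.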
From a spike count raster
If you already have a (N, T) spike count raster, you can convert it
directly:
from spikelab.spikedata.spike_data import SpikeData
import numpy as np
raster = np.random.poisson(0.01, size=(32, 100000))
sd = SpikeData.from_raster(raster, bin_size_ms=1.0)
Spikes are placed evenly within each bin.
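For instance, with bin_size_ms=1.0 a bin containing three spikes yields three times spread across that 1 ms interval. One plausible placement rule is sketched below (the midpoint convention here is an assumption for illustration, not necessarily SpikeLab's exact rule):

```python
import numpy as np

def bin_to_times(count, bin_start_ms, bin_size_ms):
    """Spread `count` spikes evenly across one bin, at its evenly spaced midpoints."""
    offsets = (np.arange(count) + 0.5) / count * bin_size_ms
    return bin_start_ms + offsets

bin_to_times(3, bin_start_ms=10.0, bin_size_ms=1.0)
# three spikes spread across the bin [10.0, 11.0)
```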