Spike Sorting

The spikelab.spike_sorting sub-package provides a full spike-sorting pipeline: loading raw recordings, running a sorter backend (Kilosort2, Kilosort4, or RT-Sort), extracting waveforms, curating units, and compiling results into SpikeData objects.

See the Spike Sorting and Curation guide for usage examples and environment setup instructions.

Entry Points

spikelab.spike_sorting.sort_recording(recording_files, config=None, sorter='kilosort2', intermediate_folders=None, results_folders=None, **kwargs)

Run spike sorting on one or more recordings using any registered backend.

This is the primary entry point for the modular sorting pipeline.

Parameters:
  • recording_files (list) – Paths to recording files or directories. Each entry is sorted independently. Directories have their contents concatenated before sorting and split back into per-file SpikeData afterward.

  • config (SortingPipelineConfig or None) – Pre-built configuration. When provided, **kwargs are applied as overrides via config.override(). When None, a fresh config is built from sorter + **kwargs. Preset configs are available in spikelab.spike_sorting.config (e.g. KILOSORT2).

  • sorter (str) – Registered sorter backend name. Only used when config is None. Available: "kilosort2", "kilosort4", "rt_sort" (the name used by the RT-Sort presets in spikelab.spike_sorting.config).

  • intermediate_folders (list or None) – Intermediate result directories, one per recording. Auto-generated if None.

  • results_folders (list or None) – Output directories, one per recording. Auto-generated if None.

  • **kwargs – Override individual config fields (e.g. snr_min=5.0, use_docker=True, fr_min=0.05). See spikelab.spike_sorting.config for all available parameters, grouped by: RecordingConfig, SorterConfig, WaveformConfig, CurationConfig, CompilationConfig, FigureConfig, ExecutionConfig.

Returns:

One SpikeData per original recording file. For directory inputs, the concatenated recording is split back into per-file SpikeData objects.

Return type:

results (list[SpikeData])
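The per-file split for directory inputs can be pictured with the sketch below. It is an illustration only, not the spikelab implementation: the function name split_by_file, the use of seconds, and the plain-list representation of spike times are all assumptions made for the example.

```python
from bisect import bisect_right
from itertools import accumulate


def split_by_file(spike_times_s, file_durations_s):
    """Assign spike times from a concatenated recording back to source files.

    Returns one list per file, with times relative to that file's start.
    """
    ends = list(accumulate(file_durations_s))  # cumulative end time of each file
    starts = [0.0] + ends[:-1]                 # start time of each file
    per_file = [[] for _ in file_durations_s]
    for t in spike_times_s:
        idx = bisect_right(ends, t)            # first file whose end lies after t
        if idx < len(ends):                    # ignore times past the last file
            per_file[idx].append(t - starts[idx])
    return per_file


# Two hypothetical files of 10 s and 5 s that were concatenated before sorting.
print(split_by_file([1.0, 9.5, 10.0, 12.5], [10.0, 5.0]))
# → [[1.0, 9.5], [0.0, 2.5]]
```

A spike that falls exactly on a file boundary is assigned to the later file in this sketch.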

Notes

  • Pickle files (sorted_spikedata_curated.pkl and optionally sorted_spikedata.pkl) are saved to each results folder.

  • hdf5_plugin_path (passed via config or kwargs) sets os.environ['HDF5_PLUGIN_PATH'] before any recording is loaded. This is needed for Maxwell .h5 files and applies to all backends.
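A minimal usage sketch, assuming spikelab is installed and a sorter backend is available. The recording and results paths are hypothetical placeholders. curated_pickle_path and run_sorting are illustrative helpers, not part of spikelab; the first only builds the path of the curated pickle named in the notes above.

```python
from pathlib import Path


def curated_pickle_path(results_folder):
    """Path of the curated pickle that sort_recording saves in a results folder."""
    return Path(results_folder) / "sorted_spikedata_curated.pkl"


def run_sorting(recording_files, results_folders):
    """Sort each recording; returns one SpikeData per original recording file."""
    # Imported lazily so this module can be loaded without spikelab installed.
    from spikelab.spike_sorting import sort_recording

    return sort_recording(
        recording_files,
        sorter="kilosort2",               # only used because config is None
        results_folders=results_folders,  # one output directory per recording
        snr_min=5.0,                      # flat override of a config field
        fr_min=0.05,
    )


# Hypothetical paths; replace them with real recordings before calling run_sorting.
recordings = ["data/rec_a.raw.h5", "data/rec_b.raw.h5"]
outputs = ["results/rec_a", "results/rec_b"]
print([str(curated_pickle_path(folder)) for folder in outputs])
```

Calling run_sorting(recordings, outputs) would then start the pipeline; the curated pickles are expected at the printed locations once it finishes.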

spikelab.spike_sorting.sort_multistream(recording, stream_ids, config=None, sorter='kilosort2', **kwargs)

Sort a multi-stream recording across multiple stream IDs.

Calls sort_recording once per stream ID, routing each stream to its own intermediate and results folders. Validates that the requested stream IDs exist in the recording file before sorting.

Parameters:
  • recording (str or Path) – Path to a single multi-stream recording file (e.g. MaxTwo .raw.h5) or a directory of such files. When a directory is given, all files are concatenated per stream.

  • stream_ids (list of str) – Stream identifiers to sort, e.g. ["well000", "well001", "well002"].

  • config (SortingPipelineConfig or None) – Pre-built configuration. When provided, **kwargs are applied as overrides.

  • sorter (str) – Registered sorter backend name (default "kilosort2"). Only used when config is None.

  • **kwargs

    Override individual config fields. The following must not be provided:

    • intermediate_folders and results_folders are auto-generated per stream.

    • stream_id is set automatically per iteration.

Returns:

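A dict of the form {stream_id: list[SpikeData]}, with one entry per requested stream ID; each value is the list returned by sort_recording for that stream.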

Return type:

results (dict)

Notes

  • Stream ID validation uses SpikeInterface’s extractor for the recording format and currently supports only Maxwell .h5 files. For other formats, validation is skipped, so invalid stream IDs produce errors at loading time.

  • When recording is a directory, its files are concatenated per stream before sorting. Channel count and sampling frequency must match across files, otherwise ValueError is raised; mismatched channel IDs or locations only produce warnings.
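The sketch below shows one way to call sort_multistream while guarding against the keyword arguments that must not be provided. check_multistream_kwargs and sort_wells are hypothetical helpers written for this example, and raising ValueError for a reserved key is this sketch's choice, not documented spikelab behavior.

```python
# Keys that sort_multistream sets itself, per the parameter notes above.
RESERVED_KEYS = ("intermediate_folders", "results_folders", "stream_id")


def check_multistream_kwargs(kwargs):
    """Return a copy of kwargs, raising if it contains a reserved key."""
    clashes = sorted(key for key in kwargs if key in RESERVED_KEYS)
    if clashes:
        raise ValueError(f"Do not pass {clashes}; these are set per stream.")
    return dict(kwargs)


def sort_wells(recording, stream_ids, **kwargs):
    """Sort several streams; returns {stream_id: list[SpikeData]}."""
    overrides = check_multistream_kwargs(kwargs)
    # Imported lazily so this module can be loaded without spikelab installed.
    from spikelab.spike_sorting import sort_multistream

    return sort_multistream(recording, stream_ids, sorter="kilosort2", **overrides)


# Ordinary config overrides pass the check unchanged.
print(check_multistream_kwargs({"snr_min": 5.0, "use_docker": True}))
```

A call such as sort_wells("plate.raw.h5", ["well000", "well001"], snr_min=5.0) uses a hypothetical file name and requires spikelab plus a working sorter backend.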

Configuration

Configuration dataclass for the spike sorting pipeline.

Replaces the ~80 module-level globals in kilosort2.py with a single typed, inspectable configuration object that is passed explicitly to every pipeline function.

class spikelab.spike_sorting.config.RecordingConfig(stream_id=None, hdf5_plugin_path=None, first_n_mins=None, mea_y_max=None, gain_to_uv=None, offset_to_uv=None, rec_chunks=<factory>, rec_chunks_s=<factory>, start_time_s=None, end_time_s=None, freq_min=300, freq_max=6000)

Bases: object

Parameters for recording loading and preprocessing.

stream_id: Optional[str] = None
hdf5_plugin_path: Optional[str] = None
first_n_mins: Optional[float] = None
mea_y_max: Optional[int] = None
gain_to_uv: Optional[float] = None
offset_to_uv: Optional[float] = None
rec_chunks: List[Tuple[int, int]]
rec_chunks_s: List[Tuple[float, float]]
start_time_s: Optional[float] = None
end_time_s: Optional[float] = None
freq_min: int = 300
freq_max: int = 6000

class spikelab.spike_sorting.config.SorterConfig(sorter_name='kilosort2', sorter_path=None, sorter_params=None, use_docker=False)

Bases: object

Parameters for the spike sorter itself.

sorter_name: str = 'kilosort2'
sorter_path: Optional[str] = None
sorter_params: Optional[Dict[str, Any]] = None
use_docker: bool = False

class spikelab.spike_sorting.config.RTSortConfig(model_path=None, probe='mea', device='cuda', num_processes=None, recording_window_ms=None, save_rt_sort_pickle=True, delete_inter=False, verbose=True, params=None, detection_window_s=None)

Bases: object

Parameters for the RT-Sort detection and sorting backend.

RT-Sort is an action-potential-propagation-based spike sorter using a deep learning detection model followed by codetection clustering and template matching. See van der Molen, Lim et al. 2024 (PLOS ONE, DOI: 10.1371/journal.pone.0312438) for algorithmic details.

Parameters:
  • model_path (str or None) – Path to a folder containing init_dict.json and state_dict.pt for a pretrained ModelSpikeSorter. When None, the bundled model corresponding to probe is loaded.

  • probe (str) – Which bundled pretrained model to use when model_path is None. "mea" or "neuropixels".

  • device (str) – PyTorch device for inference. "cuda" or "cpu".

  • num_processes (int or None) – Number of worker processes for parallel detection/clustering stages. None selects an automatic value based on CPU count.

  • recording_window_ms (tuple or None) – (start_ms, end_ms) window of the recording to process. None processes the entire recording.

  • save_rt_sort_pickle (bool) – If True, serialize the final RTSort object to the sorter output folder so the trained sequences can be re-used in Phase 2 stim-aware sorting.

  • delete_inter (bool) – If True, delete the intermediate cache directory after sorting completes.

  • verbose (bool) – Print progress messages during sorting.

  • params (dict or None) – Override dictionary merged into the RT-Sort parameter set. Takes precedence over the preset defaults; useful for one-off tuning without editing a preset. Keys must match detect_sequences parameter names.

  • detection_window_s (float or None) – If set, run sequence detection on only the first detection_window_s seconds of the recording (the heavy GPU + clustering phase), then apply the resulting sequences to the full recording during sort_offline. Decouples the detection-phase memory ceiling from total recording length. None uses the full window for both phases (legacy behavior).
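The precedence rule for params can be sketched with a plain dictionary merge. This illustrates only the "overrides win" behavior described above; how spikelab merges the dictionaries internally is not shown in this reference. The default values are borrowed from the RT_SORT_NEUROPIXELS preset purely as example numbers, and the override value is illustrative.

```python
# Example defaults (values taken from the RT_SORT_NEUROPIXELS preset params).
default_params = {
    "stringent_thresh": 0.175,
    "loose_thresh": 0.075,
    "max_latency_diff_spikes": 2.5,
}

# One-off tuning, as would be passed via RTSortConfig(params=...).
user_params = {"loose_thresh": 0.1}

# Later keys win, so user overrides take precedence over the defaults.
effective = {**default_params, **user_params}
print(effective)
# → {'stringent_thresh': 0.175, 'loose_thresh': 0.1, 'max_latency_diff_spikes': 2.5}
```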

model_path: Optional[str] = None
probe: str = 'mea'
device: str = 'cuda'
num_processes: Optional[int] = None
recording_window_ms: Optional[Any] = None
save_rt_sort_pickle: bool = True
delete_inter: bool = False
verbose: bool = True
params: Optional[Dict[str, Any]] = None
detection_window_s: Optional[float] = None

class spikelab.spike_sorting.config.WaveformConfig(ms_before=2.0, ms_after=2.0, pos_peak_thresh=2.0, max_waveforms_per_unit=300, compiled_ms_before=2.0, compiled_ms_after=2.0, scale_compiled_waveforms=True, std_at_peak=True, std_over_window_ms_before=0.5, std_over_window_ms_after=1.5, streaming=True, save_waveform_files=True)

Bases: object

Parameters for waveform extraction and template computation.

Memory-budget note: the non-streaming extractor (streaming=False) pre-allocates one (n_spikes, nsamples, num_channels) .npy memmap per unit before extraction begins. For high-unit-count sorters on high-density MEAs this grows to tens of GB (e.g. 400 units × 1018 channels = ~39 GB). When that would exceed host RAM, keep streaming=True (the default in the signature above) to use a one-unit-at-a-time path that discards each unit’s waveforms after templates and metrics are computed; peak RAM is then one unit’s buffer (~100 MB for MaxOne) regardless of total unit count. Waveform files are only written when save_waveform_files=True.
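The figures in the memory-budget note can be reproduced with a short calculation. The sketch assumes a 20 kHz sampling rate, 4-byte (float32) samples, and every unit reaching the max_waveforms_per_unit cap of 300; none of these is stated in the note itself, but together with the default 2.0 ms windows they give the quoted numbers.

```python
def waveform_buffer_bytes(n_units, n_spikes, n_samples, n_channels, bytes_per_value=4):
    """Size of (n_spikes, n_samples, n_channels) buffers summed over n_units."""
    return n_units * n_spikes * n_samples * n_channels * bytes_per_value


sampling_rate_hz = 20_000      # assumption, not stated in the note
ms_before, ms_after = 2.0, 2.0  # WaveformConfig defaults
n_samples = int((ms_before + ms_after) * sampling_rate_hz / 1000)  # 80 samples

all_units = waveform_buffer_bytes(400, 300, n_samples, 1018)
one_unit = waveform_buffer_bytes(1, 300, n_samples, 1018)

print(f"{all_units / 1e9:.1f} GB for 400 units, {one_unit / 1e6:.1f} MB for one unit")
# → 39.1 GB for 400 units, 97.7 MB for one unit
```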

ms_before: float = 2.0
ms_after: float = 2.0
pos_peak_thresh: float = 2.0
max_waveforms_per_unit: int = 300
compiled_ms_before: float = 2.0
compiled_ms_after: float = 2.0
scale_compiled_waveforms: bool = True
std_at_peak: bool = True
std_over_window_ms_before: float = 0.5
std_over_window_ms_after: float = 1.5
streaming: bool = True
save_waveform_files: bool = True

class spikelab.spike_sorting.config.CurationConfig(curate_first=True, curate_second=True, curation_epoch=None, fr_min=0.05, isi_viol_max=0.01, isi_violation_method='percent', snr_min=5.0, spikes_min_first=30, spikes_min_second=50, std_norm_max=1.0)

Bases: object

Parameters for unit quality-control curation.

curate_first: bool = True
curate_second: bool = True
curation_epoch: Optional[int] = None
fr_min: Optional[float] = 0.05
isi_viol_max: Optional[float] = 0.01
isi_violation_method: str = 'percent'
snr_min: Optional[float] = 5.0
spikes_min_first: Optional[int] = 30
spikes_min_second: Optional[int] = 50
std_norm_max: Optional[float] = 1.0

class spikelab.spike_sorting.config.CompilationConfig(compile_single_recording=True, compile_to_mat=False, compile_to_npz=True, compile_waveforms=False, save_electrodes=True, save_spike_times=True, save_raw_pkl=False, save_dl_data=False)

Bases: object

Parameters for result compilation and export.

compile_single_recording: bool = True
compile_to_mat: bool = False
compile_to_npz: bool = True
compile_waveforms: bool = False
save_electrodes: bool = True
save_spike_times: bool = True
save_raw_pkl: bool = False
save_dl_data: bool = False

class spikelab.spike_sorting.config.FigureConfig(create_figures=False, create_unit_figures=False, dpi=None, font_size=12, bar_x_label='Recording', bar_y_label='Number of Units', bar_label_rotation=0, bar_total_label='First Curation', bar_selected_label='Selected Curation', scatter_std_max_units_per_recording=None, scatter_recording_colors=<factory>, scatter_recording_alpha=1.0, scatter_x_label='Number of Spikes', scatter_y_label='avg. STD / amplitude', scatter_x_max_buffer=300.0, scatter_y_max_buffer=0.2, templates_color_curated='#000000', templates_color_failed='#FF0000', templates_per_column=50, templates_y_spacing=50.0, templates_y_lim_buffer=10.0, templates_window_ms_before=5.0, templates_window_ms_after=5.0, templates_line_ms_before=1.0, templates_line_ms_after=4.0, templates_x_label='Time Rel. to Peak (ms)')

Bases: object

Parameters for QC figure generation.

create_figures: bool = False
create_unit_figures: bool = False
dpi: Optional[int] = None
font_size: int = 12
bar_x_label: str = 'Recording'
bar_y_label: str = 'Number of Units'
bar_label_rotation: int = 0
bar_total_label: str = 'First Curation'
bar_selected_label: str = 'Selected Curation'
scatter_std_max_units_per_recording: Optional[int] = None
scatter_recording_colors: List[str]
scatter_recording_alpha: float = 1.0
scatter_x_label: str = 'Number of Spikes'
scatter_y_label: str = 'avg. STD / amplitude'
scatter_x_max_buffer: float = 300.0
scatter_y_max_buffer: float = 0.2
templates_color_curated: str = '#000000'
templates_color_failed: str = '#FF0000'
templates_per_column: int = 50
templates_y_spacing: float = 50.0
templates_y_lim_buffer: float = 10.0
templates_window_ms_before: float = 5.0
templates_window_ms_after: float = 5.0
templates_line_ms_before: Optional[float] = 1.0
templates_line_ms_after: Optional[float] = 4.0
templates_x_label: str = 'Time Rel. to Peak (ms)'

class spikelab.spike_sorting.config.ExecutionConfig(n_jobs=8, total_memory='16G', use_parallel_processing_for_raw_conversion=True, save_script=False, out_file='sort_with_kilosort2.out', random_seed=1, recompute_recording=False, recompute_sorting=False, reextract_waveforms=False, recurate_first=False, recurate_second=False, recompile_single_recording=False, delete_inter=True)

Bases: object

Parameters for pipeline execution control.

n_jobs: int = 8
total_memory: str = '16G'
use_parallel_processing_for_raw_conversion: bool = True
save_script: bool = False
out_file: str = 'sort_with_kilosort2.out'
random_seed: int = 1
recompute_recording: bool = False
recompute_sorting: bool = False
reextract_waveforms: bool = False
recurate_first: bool = False
recurate_second: bool = False
recompile_single_recording: bool = False
delete_inter: bool = True

class spikelab.spike_sorting.config.SortingPipelineConfig(recording=<factory>, sorter=<factory>, rt_sort=<factory>, waveform=<factory>, curation=<factory>, compilation=<factory>, figures=<factory>, execution=<factory>)

Bases: object

Complete configuration for a spike sorting pipeline run.

Groups all parameters into typed sub-configs. Passed explicitly to every pipeline function, replacing module-level globals.

Parameters:
recording: RecordingConfig
sorter: SorterConfig
rt_sort: RTSortConfig
waveform: WaveformConfig
curation: CurationConfig
compilation: CompilationConfig
figures: FigureConfig
execution: ExecutionConfig
classmethod from_kwargs(**kwargs)

Build a config from flat keyword arguments.

Maps the flat parameter names used by sort_with_kilosort2() to the nested sub-config fields. Unknown keys raise TypeError.

Parameters:

**kwargs – Flat keyword arguments matching sort_with_kilosort2() parameter names.

Returns:

Populated configuration.

Return type:

config (SortingPipelineConfig)

override(**kwargs)

Return a copy of this config with selected fields overridden.

Accepts the same flat keyword arguments as from_kwargs(). Unspecified fields retain their current values.

Parameters:

**kwargs – Flat keyword arguments to override.

Returns:

New config with overrides.

Return type:

config (SortingPipelineConfig)
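The flat-to-nested mapping behind from_kwargs() and override() can be sketched with small stand-in dataclasses. The classes below are simplified stand-ins written for this example, not the spikelab classes, and they carry only a few of the documented fields. Raising TypeError for unknown keys follows the from_kwargs() description; applying the same rule to override() is an assumption of the sketch.

```python
from dataclasses import dataclass, field, fields, replace


@dataclass
class SorterStandIn:
    sorter_name: str = "kilosort2"
    use_docker: bool = False


@dataclass
class CurationStandIn:
    snr_min: float = 5.0
    fr_min: float = 0.05


@dataclass
class PipelineStandIn:
    sorter: SorterStandIn = field(default_factory=SorterStandIn)
    curation: CurationStandIn = field(default_factory=CurationStandIn)

    def override(self, **kwargs):
        """Return a copy with flat keyword overrides routed to the sub-configs."""
        remaining = dict(kwargs)
        updates = {}
        for group in fields(self):
            sub = getattr(self, group.name)
            names = {f.name for f in fields(sub)}
            # Pull out the keys that belong to this sub-config.
            picked = {k: remaining.pop(k) for k in list(remaining) if k in names}
            updates[group.name] = replace(sub, **picked)  # copy, never mutate
        if remaining:
            raise TypeError(f"Unknown config keys: {sorted(remaining)}")
        return replace(self, **updates)


base = PipelineStandIn()
docker = base.override(use_docker=True, snr_min=6.0)
print(docker.sorter.use_docker, docker.curation.snr_min, base.sorter.use_docker)
# → True 6.0 False
```

The original object is left unchanged, which matches the "return a copy" wording above. The sketch does not handle a field name that appears in more than one sub-config.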

spikelab.spike_sorting.config.KILOSORT2 = SortingPipelineConfig(recording=RecordingConfig(stream_id=None, hdf5_plugin_path=None, first_n_mins=None, mea_y_max=None, gain_to_uv=None, offset_to_uv=None, rec_chunks=[], rec_chunks_s=[], start_time_s=None, end_time_s=None, freq_min=300, freq_max=6000), sorter=SorterConfig(sorter_name='kilosort2', sorter_path=None, sorter_params=None, use_docker=False), rt_sort=RTSortConfig(model_path=None, probe='mea', device='cuda', num_processes=None, recording_window_ms=None, save_rt_sort_pickle=True, delete_inter=False, verbose=True, params=None, detection_window_s=None), waveform=WaveformConfig(ms_before=2.0, ms_after=2.0, pos_peak_thresh=2.0, max_waveforms_per_unit=300, compiled_ms_before=2.0, compiled_ms_after=2.0, scale_compiled_waveforms=True, std_at_peak=True, std_over_window_ms_before=0.5, std_over_window_ms_after=1.5, streaming=True, save_waveform_files=True), curation=CurationConfig(curate_first=True, curate_second=True, curation_epoch=None, fr_min=0.05, isi_viol_max=0.01, isi_violation_method='percent', snr_min=5.0, spikes_min_first=30, spikes_min_second=50, std_norm_max=1.0), compilation=CompilationConfig(compile_single_recording=True, compile_to_mat=False, compile_to_npz=True, compile_waveforms=False, save_electrodes=True, save_spike_times=True, save_raw_pkl=False, save_dl_data=False), figures=FigureConfig(create_figures=False, create_unit_figures=False, dpi=None, font_size=12, bar_x_label='Recording', bar_y_label='Number of Units', bar_label_rotation=0, bar_total_label='First Curation', bar_selected_label='Selected Curation', scatter_std_max_units_per_recording=None, scatter_recording_colors=['#f74343', '#fccd56', '#74fc56', '#56fcf6', '#1e1efa', '#fa1ed2'], scatter_recording_alpha=1.0, scatter_x_label='Number of Spikes', scatter_y_label='avg. STD / amplitude', scatter_x_max_buffer=300.0, scatter_y_max_buffer=0.2, templates_color_curated='#000000', templates_color_failed='#FF0000', templates_per_column=50, templates_y_spacing=50.0, templates_y_lim_buffer=10.0, templates_window_ms_before=5.0, templates_window_ms_after=5.0, templates_line_ms_before=1.0, templates_line_ms_after=4.0, templates_x_label='Time Rel. to Peak (ms)'), execution=ExecutionConfig(n_jobs=8, total_memory='16G', use_parallel_processing_for_raw_conversion=True, save_script=False, out_file='sort_with_kilosort2.out', random_seed=1, recompute_recording=False, recompute_sorting=False, reextract_waveforms=False, recurate_first=False, recurate_second=False, recompile_single_recording=False, delete_inter=True))

Default configuration for Kilosort2. Parameters are compatible with Maxwell MEA and other probe types. Hardware-specific presets can be created by overriding parameters.

spikelab.spike_sorting.config.KILOSORT2_DOCKER = SortingPipelineConfig(recording=RecordingConfig(stream_id=None, hdf5_plugin_path=None, first_n_mins=None, mea_y_max=None, gain_to_uv=None, offset_to_uv=None, rec_chunks=[], rec_chunks_s=[], start_time_s=None, end_time_s=None, freq_min=300, freq_max=6000), sorter=SorterConfig(sorter_name='kilosort2', sorter_path=None, sorter_params=None, use_docker=True), rt_sort=RTSortConfig(model_path=None, probe='mea', device='cuda', num_processes=None, recording_window_ms=None, save_rt_sort_pickle=True, delete_inter=False, verbose=True, params=None, detection_window_s=None), waveform=WaveformConfig(ms_before=2.0, ms_after=2.0, pos_peak_thresh=2.0, max_waveforms_per_unit=300, compiled_ms_before=2.0, compiled_ms_after=2.0, scale_compiled_waveforms=True, std_at_peak=True, std_over_window_ms_before=0.5, std_over_window_ms_after=1.5, streaming=True, save_waveform_files=True), curation=CurationConfig(curate_first=True, curate_second=True, curation_epoch=None, fr_min=0.05, isi_viol_max=0.01, isi_violation_method='percent', snr_min=5.0, spikes_min_first=30, spikes_min_second=50, std_norm_max=1.0), compilation=CompilationConfig(compile_single_recording=True, compile_to_mat=False, compile_to_npz=True, compile_waveforms=False, save_electrodes=True, save_spike_times=True, save_raw_pkl=False, save_dl_data=False), figures=FigureConfig(create_figures=False, create_unit_figures=False, dpi=None, font_size=12, bar_x_label='Recording', bar_y_label='Number of Units', bar_label_rotation=0, bar_total_label='First Curation', bar_selected_label='Selected Curation', scatter_std_max_units_per_recording=None, scatter_recording_colors=['#f74343', '#fccd56', '#74fc56', '#56fcf6', '#1e1efa', '#fa1ed2'], scatter_recording_alpha=1.0, scatter_x_label='Number of Spikes', scatter_y_label='avg. STD / amplitude', scatter_x_max_buffer=300.0, scatter_y_max_buffer=0.2, templates_color_curated='#000000', templates_color_failed='#FF0000', templates_per_column=50, templates_y_spacing=50.0, templates_y_lim_buffer=10.0, templates_window_ms_before=5.0, templates_window_ms_after=5.0, templates_line_ms_before=1.0, templates_line_ms_after=4.0, templates_x_label='Time Rel. to Peak (ms)'), execution=ExecutionConfig(n_jobs=8, total_memory='16G', use_parallel_processing_for_raw_conversion=True, save_script=False, out_file='sort_with_kilosort2.out', random_seed=1, recompute_recording=False, recompute_sorting=False, reextract_waveforms=False, recurate_first=False, recurate_second=False, recompile_single_recording=False, delete_inter=True))

Kilosort2 with Docker (no local MATLAB needed).

spikelab.spike_sorting.config.KILOSORT4 = SortingPipelineConfig(recording=RecordingConfig(stream_id=None, hdf5_plugin_path=None, first_n_mins=None, mea_y_max=None, gain_to_uv=None, offset_to_uv=None, rec_chunks=[], rec_chunks_s=[], start_time_s=None, end_time_s=None, freq_min=300, freq_max=6000), sorter=SorterConfig(sorter_name='kilosort4', sorter_path=None, sorter_params=None, use_docker=False), rt_sort=RTSortConfig(model_path=None, probe='mea', device='cuda', num_processes=None, recording_window_ms=None, save_rt_sort_pickle=True, delete_inter=False, verbose=True, params=None, detection_window_s=None), waveform=WaveformConfig(ms_before=2.0, ms_after=2.0, pos_peak_thresh=2.0, max_waveforms_per_unit=300, compiled_ms_before=2.0, compiled_ms_after=2.0, scale_compiled_waveforms=True, std_at_peak=True, std_over_window_ms_before=0.5, std_over_window_ms_after=1.5, streaming=True, save_waveform_files=True), curation=CurationConfig(curate_first=True, curate_second=True, curation_epoch=None, fr_min=0.05, isi_viol_max=0.01, isi_violation_method='percent', snr_min=5.0, spikes_min_first=30, spikes_min_second=50, std_norm_max=1.0), compilation=CompilationConfig(compile_single_recording=True, compile_to_mat=False, compile_to_npz=True, compile_waveforms=False, save_electrodes=True, save_spike_times=True, save_raw_pkl=False, save_dl_data=False), figures=FigureConfig(create_figures=False, create_unit_figures=False, dpi=None, font_size=12, bar_x_label='Recording', bar_y_label='Number of Units', bar_label_rotation=0, bar_total_label='First Curation', bar_selected_label='Selected Curation', scatter_std_max_units_per_recording=None, scatter_recording_colors=['#f74343', '#fccd56', '#74fc56', '#56fcf6', '#1e1efa', '#fa1ed2'], scatter_recording_alpha=1.0, scatter_x_label='Number of Spikes', scatter_y_label='avg. STD / amplitude', scatter_x_max_buffer=300.0, scatter_y_max_buffer=0.2, templates_color_curated='#000000', templates_color_failed='#FF0000', templates_per_column=50, templates_y_spacing=50.0, templates_y_lim_buffer=10.0, templates_window_ms_before=5.0, templates_window_ms_after=5.0, templates_line_ms_before=1.0, templates_line_ms_after=4.0, templates_x_label='Time Rel. to Peak (ms)'), execution=ExecutionConfig(n_jobs=8, total_memory='16G', use_parallel_processing_for_raw_conversion=True, save_script=False, out_file='sort_with_kilosort2.out', random_seed=1, recompute_recording=False, recompute_sorting=False, reextract_waveforms=False, recurate_first=False, recurate_second=False, recompile_single_recording=False, delete_inter=True))

Default configuration for Kilosort4. Kilosort4 is pure Python (PyTorch) — no MATLAB required. Default parameters are tuned for Neuropixels probes but work for other probe types. Hardware-specific presets (e.g. for Maxwell MEAs) can be created by overriding detection/filtering parameters.

spikelab.spike_sorting.config.KILOSORT4_DOCKER = SortingPipelineConfig(recording=RecordingConfig(stream_id=None, hdf5_plugin_path=None, first_n_mins=None, mea_y_max=None, gain_to_uv=None, offset_to_uv=None, rec_chunks=[], rec_chunks_s=[], start_time_s=None, end_time_s=None, freq_min=300, freq_max=6000), sorter=SorterConfig(sorter_name='kilosort4', sorter_path=None, sorter_params=None, use_docker=True), rt_sort=RTSortConfig(model_path=None, probe='mea', device='cuda', num_processes=None, recording_window_ms=None, save_rt_sort_pickle=True, delete_inter=False, verbose=True, params=None, detection_window_s=None), waveform=WaveformConfig(ms_before=2.0, ms_after=2.0, pos_peak_thresh=2.0, max_waveforms_per_unit=300, compiled_ms_before=2.0, compiled_ms_after=2.0, scale_compiled_waveforms=True, std_at_peak=True, std_over_window_ms_before=0.5, std_over_window_ms_after=1.5, streaming=True, save_waveform_files=True), curation=CurationConfig(curate_first=True, curate_second=True, curation_epoch=None, fr_min=0.05, isi_viol_max=0.01, isi_violation_method='percent', snr_min=5.0, spikes_min_first=30, spikes_min_second=50, std_norm_max=1.0), compilation=CompilationConfig(compile_single_recording=True, compile_to_mat=False, compile_to_npz=True, compile_waveforms=False, save_electrodes=True, save_spike_times=True, save_raw_pkl=False, save_dl_data=False), figures=FigureConfig(create_figures=False, create_unit_figures=False, dpi=None, font_size=12, bar_x_label='Recording', bar_y_label='Number of Units', bar_label_rotation=0, bar_total_label='First Curation', bar_selected_label='Selected Curation', scatter_std_max_units_per_recording=None, scatter_recording_colors=['#f74343', '#fccd56', '#74fc56', '#56fcf6', '#1e1efa', '#fa1ed2'], scatter_recording_alpha=1.0, scatter_x_label='Number of Spikes', scatter_y_label='avg. STD / amplitude', scatter_x_max_buffer=300.0, scatter_y_max_buffer=0.2, templates_color_curated='#000000', templates_color_failed='#FF0000', templates_per_column=50, templates_y_spacing=50.0, templates_y_lim_buffer=10.0, templates_window_ms_before=5.0, templates_window_ms_after=5.0, templates_line_ms_before=1.0, templates_line_ms_after=4.0, templates_x_label='Time Rel. to Peak (ms)'), execution=ExecutionConfig(n_jobs=8, total_memory='16G', use_parallel_processing_for_raw_conversion=True, save_script=False, out_file='sort_with_kilosort2.out', random_seed=1, recompute_recording=False, recompute_sorting=False, reextract_waveforms=False, recurate_first=False, recurate_second=False, recompile_single_recording=False, delete_inter=True))

Kilosort4 with Docker.

spikelab.spike_sorting.config.RT_SORT_MEA = SortingPipelineConfig(recording=RecordingConfig(stream_id=None, hdf5_plugin_path=None, first_n_mins=None, mea_y_max=None, gain_to_uv=None, offset_to_uv=None, rec_chunks=[], rec_chunks_s=[], start_time_s=None, end_time_s=None, freq_min=300, freq_max=6000), sorter=SorterConfig(sorter_name='rt_sort', sorter_path=None, sorter_params=None, use_docker=False), rt_sort=RTSortConfig(model_path=None, probe='mea', device='cuda', num_processes=None, recording_window_ms=None, save_rt_sort_pickle=True, delete_inter=False, verbose=True, params=None, detection_window_s=None), waveform=WaveformConfig(ms_before=2.0, ms_after=2.0, pos_peak_thresh=2.0, max_waveforms_per_unit=300, compiled_ms_before=2.0, compiled_ms_after=2.0, scale_compiled_waveforms=True, std_at_peak=True, std_over_window_ms_before=0.5, std_over_window_ms_after=1.5, streaming=True, save_waveform_files=True), curation=CurationConfig(curate_first=True, curate_second=True, curation_epoch=None, fr_min=0.05, isi_viol_max=0.01, isi_violation_method='percent', snr_min=5.0, spikes_min_first=30, spikes_min_second=50, std_norm_max=1.0), compilation=CompilationConfig(compile_single_recording=True, compile_to_mat=False, compile_to_npz=True, compile_waveforms=False, save_electrodes=True, save_spike_times=True, save_raw_pkl=False, save_dl_data=False), figures=FigureConfig(create_figures=False, create_unit_figures=False, dpi=None, font_size=12, bar_x_label='Recording', bar_y_label='Number of Units', bar_label_rotation=0, bar_total_label='First Curation', bar_selected_label='Selected Curation', scatter_std_max_units_per_recording=None, scatter_recording_colors=['#f74343', '#fccd56', '#74fc56', '#56fcf6', '#1e1efa', '#fa1ed2'], scatter_recording_alpha=1.0, scatter_x_label='Number of Spikes', scatter_y_label='avg. STD / amplitude', scatter_x_max_buffer=300.0, scatter_y_max_buffer=0.2, templates_color_curated='#000000', templates_color_failed='#FF0000', templates_per_column=50, templates_y_spacing=50.0, templates_y_lim_buffer=10.0, templates_window_ms_before=5.0, templates_window_ms_after=5.0, templates_line_ms_before=1.0, templates_line_ms_after=4.0, templates_x_label='Time Rel. to Peak (ms)'), execution=ExecutionConfig(n_jobs=8, total_memory='16G', use_parallel_processing_for_raw_conversion=True, save_script=False, out_file='sort_with_kilosort2.out', random_seed=1, recompute_recording=False, recompute_sorting=False, reextract_waveforms=False, recurate_first=False, recurate_second=False, recompile_single_recording=False, delete_inter=True))

RT-Sort with the bundled MEA detection model. Uses the propagation-based RT-Sort algorithm (van der Molen, Lim et al. 2024, PLOS ONE) with the pretrained model tuned for Maxwell multi-electrode arrays.

spikelab.spike_sorting.config.RT_SORT_NEUROPIXELS = SortingPipelineConfig(recording=RecordingConfig(stream_id=None, hdf5_plugin_path=None, first_n_mins=None, mea_y_max=None, gain_to_uv=None, offset_to_uv=None, rec_chunks=[], rec_chunks_s=[], start_time_s=None, end_time_s=None, freq_min=300, freq_max=6000), sorter=SorterConfig(sorter_name='rt_sort', sorter_path=None, sorter_params=None, use_docker=False), rt_sort=RTSortConfig(model_path=None, probe='neuropixels', device='cuda', num_processes=None, recording_window_ms=None, save_rt_sort_pickle=True, delete_inter=False, verbose=True, params={'stringent_thresh': 0.175, 'loose_thresh': 0.075, 'inference_scaling_numerator': 15.4, 'min_amp_dist_p': 0.1, 'max_latency_diff_spikes': 2.5, 'max_amp_median_diff_spikes': 0.45, 'max_latency_diff_sequences': 2.5, 'max_amp_median_diff_sequences': 0.45, 'max_root_amp_median_std_sequences': 2.5}, detection_window_s=None), waveform=WaveformConfig(ms_before=2.0, ms_after=2.0, pos_peak_thresh=2.0, max_waveforms_per_unit=300, compiled_ms_before=2.0, compiled_ms_after=2.0, scale_compiled_waveforms=True, std_at_peak=True, std_over_window_ms_before=0.5, std_over_window_ms_after=1.5, streaming=True, save_waveform_files=True), curation=CurationConfig(curate_first=True, curate_second=True, curation_epoch=None, fr_min=0.05, isi_viol_max=0.01, isi_violation_method='percent', snr_min=5.0, spikes_min_first=30, spikes_min_second=50, std_norm_max=1.0), compilation=CompilationConfig(compile_single_recording=True, compile_to_mat=False, compile_to_npz=True, compile_waveforms=False, save_electrodes=True, save_spike_times=True, save_raw_pkl=False, save_dl_data=False), figures=FigureConfig(create_figures=False, create_unit_figures=False, dpi=None, font_size=12, bar_x_label='Recording', bar_y_label='Number of Units', bar_label_rotation=0, bar_total_label='First Curation', bar_selected_label='Selected Curation', scatter_std_max_units_per_recording=None, scatter_recording_colors=['#f74343', '#fccd56', '#74fc56', '#56fcf6', '#1e1efa', '#fa1ed2'], scatter_recording_alpha=1.0, scatter_x_label='Number of Spikes', scatter_y_label='avg. STD / amplitude', scatter_x_max_buffer=300.0, scatter_y_max_buffer=0.2, templates_color_curated='#000000', templates_color_failed='#FF0000', templates_per_column=50, templates_y_spacing=50.0, templates_y_lim_buffer=10.0, templates_window_ms_before=5.0, templates_window_ms_after=5.0, templates_line_ms_before=1.0, templates_line_ms_after=4.0, templates_x_label='Time Rel. to Peak (ms)'), execution=ExecutionConfig(n_jobs=8, total_memory='16G', use_parallel_processing_for_raw_conversion=True, save_script=False, out_file='sort_with_kilosort2.out', random_seed=1, recompute_recording=False, recompute_sorting=False, reextract_waveforms=False, recurate_first=False, recurate_second=False, recompile_single_recording=False, delete_inter=True))

RT-Sort with the bundled Neuropixels detection model. Uses Neuropixels-tuned detection thresholds and merge parameters.

Backend Registry

Spike sorter backend registry.

Maps sorter names to their backend classes. Backends are imported lazily to avoid requiring all sorter dependencies at import time.

spikelab.spike_sorting.backends.get_backend_class(sorter_name)[source]

Look up and import the backend class for a sorter name.

Parameters:

sorter_name (str) – Registered sorter name (e.g. "kilosort2").

Returns:

The SorterBackend subclass.

Return type:

cls

Raises:

ValueError – If the sorter name is not registered.

spikelab.spike_sorting.backends.list_sorters()[source]

Return the list of registered sorter names.

Returns:

Available sorter names.

Return type:

sorters (list of str)
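The lazy-import behaviour described above can be illustrated with a small, self-contained sketch. It is not SpikeLab's implementation: the registry contents are hypothetical, and standard-library classes stand in for real backend classes so the example runs without any sorter installed. Only the two function names and the ValueError on unknown names are taken from the reference above.

```python
import importlib

# Hypothetical registry: sorter name -> "module:ClassName".
# The targets are standard-library stand-ins, not real backend classes.
_REGISTRY = {
    "kilosort2": "collections:OrderedDict",
    "kilosort4": "collections:Counter",
}


def list_sorters():
    """Return the registered sorter names in sorted order."""
    return sorted(_REGISTRY)


def get_backend_class(sorter_name):
    """Import and return the class registered under sorter_name."""
    try:
        target = _REGISTRY[sorter_name]
    except KeyError:
        raise ValueError(
            f"Unknown sorter {sorter_name!r}; available: {list_sorters()}"
        ) from None
    module_name, class_name = target.split(":")
    # The module is imported only here, at lookup time, so registering a
    # sorter never requires its dependencies to be installed.
    module = importlib.import_module(module_name)
    return getattr(module, class_name)
```

Storing import paths as strings is what defers the dependency cost: a missing optional dependency only surfaces when that particular sorter is requested.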

class spikelab.spike_sorting.backends.base.SorterBackend(config)[source]

Bases: ABC

Interface that each spike sorter backend must implement.

Parameters:

config (SortingPipelineConfig) – Full pipeline configuration. Backends read their relevant sub-configs (config.recording, config.sorter, config.waveform, config.execution).

__init__(config)[source]

abstract load_recording(rec_path)[source]

Load and preprocess a single recording.

Handles format-specific loading (Maxwell .h5, NWB, etc.), gain/offset scaling, and bandpass filtering.

Parameters:

rec_path (Any) – Path to a recording file, a directory of files to concatenate, or a pre-loaded BaseRecording object.

Returns:

A SpikeInterface BaseRecording ready for sorting (scaled, filtered, single-segment).

Return type:

recording

abstract sort(recording, rec_path, recording_dat_path, output_folder)[source]

Run the spike sorter on a preprocessed recording.

Parameters:
  • recording – SpikeInterface BaseRecording from load_recording.

  • rec_path – Original recording file path (for binary conversion or metadata).

  • recording_dat_path (Path) – Path for the binary .dat file (used by sorters that require pre-converted input).

  • output_folder (Path) – Directory for sorter output files.

Returns:

A SpikeInterface BaseSorting with detected units and spike trains.

Return type:

sorting

abstract extract_waveforms(recording, sorting, waveforms_folder, curation_folder, rec_path=None, rng=None)[source]

Extract per-unit waveforms and compute templates.

Parameters:
  • recording – SpikeInterface BaseRecording.

  • sorting – SpikeInterface BaseSorting from sort.

  • waveforms_folder (Path) – Root directory for waveform storage.

  • curation_folder (Path) – Directory for initial unit list and metadata.

Returns:

An object providing at minimum:

  • sorting — the sorting object (possibly with centered spike times)

  • recording — the recording object

  • sampling_frequency — float

  • peak_ind — int (peak sample index in template)

  • chans_max_all — dict or array mapping unit_id to max-amplitude channel index

  • use_pos_peak — dict or array mapping unit_id to bool (polarity)

  • get_computed_template(unit_id, mode) — returns (n_samples, n_channels) template array

  • ms_to_samples(ms) — time conversion

  • root_folder — Path to waveform files

This can be the custom WaveformExtractor (Kilosort2 backend) or a wrapper around SpikeInterface’s WaveformExtractor (future backends).

Return type:

waveform_extractor

write_recording(recording, dat_path)[source]

Convert a recording to the binary format needed by the sorter.

Not all sorters need this (some read recordings directly via SpikeInterface). The default implementation is a no-op.

Parameters:
  • recording (Any) – SpikeInterface BaseRecording.

  • dat_path (Path) – Output binary file path.

Return type:

None
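As a rough illustration of the interface, the sketch below defines a local stand-in for SorterBackend with the documented method signatures, plus a hypothetical EchoBackend that returns placeholder dictionaries instead of SpikeInterface objects. The call order in run_backend and the folder names are assumptions made for the example, not a description of the pipeline's internals.

```python
from abc import ABC, abstractmethod
from pathlib import Path


class SorterBackendSketch(ABC):
    """Local stand-in mirroring the documented SorterBackend signatures."""

    def __init__(self, config):
        self.config = config

    @abstractmethod
    def load_recording(self, rec_path):
        """Load and preprocess one recording."""

    @abstractmethod
    def sort(self, recording, rec_path, recording_dat_path, output_folder):
        """Run the sorter and return a sorting object."""

    @abstractmethod
    def extract_waveforms(self, recording, sorting, waveforms_folder,
                          curation_folder, rec_path=None, rng=None):
        """Extract per-unit waveforms."""

    def write_recording(self, recording, dat_path):
        # Default is a no-op, as documented for sorters that read directly.
        return None


class EchoBackend(SorterBackendSketch):
    """Hypothetical backend returning placeholders, for illustration only."""

    def load_recording(self, rec_path):
        return {"source": str(rec_path)}

    def sort(self, recording, rec_path, recording_dat_path, output_folder):
        return {"units": [], "output": str(output_folder)}

    def extract_waveforms(self, recording, sorting, waveforms_folder,
                          curation_folder, rec_path=None, rng=None):
        return {"recording": recording, "sorting": sorting,
                "root_folder": Path(waveforms_folder)}


def run_backend(backend, rec_path, work_dir):
    """Call the backend methods in one plausible order (an assumption)."""
    work_dir = Path(work_dir)
    recording = backend.load_recording(rec_path)
    dat_path = work_dir / "recording.dat"
    backend.write_recording(recording, dat_path)
    sorting = backend.sort(recording, rec_path, dat_path,
                           work_dir / "sorter_output")
    return backend.extract_waveforms(
        recording, sorting, work_dir / "waveforms", work_dir / "curation",
        rec_path=rec_path,
    )
```

Because the three core methods are abstract, instantiating the base class directly raises TypeError, while a subclass only needs to override write_recording when its sorter requires pre-converted binary input.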

Classified Exceptions

When a sort fails, SpikeLab can classify the failure into one of three categories so that callers can implement skip/retry/stop policies without parsing generic error messages.

Classified spike-sorting exceptions shared across runners and curation.

Failures from Kilosort2, Kilosort4, and the downstream curation/waveform code are grouped into three categories so callers can implement retry / skip / hard-stop policies without parsing generic Exception messages:

  • BiologicalSortFailure — the recording itself cannot be sorted (too silent, all channels bad, no waveforms to compute metrics on). Recommended policy: mark the target as not-sortable, move on, do not retry.

  • EnvironmentSortFailure — the host environment or container runtime is misconfigured. Recommended policy: hard stop and surface to the operator; retrying without intervention will loop.

  • ResourceSortFailure — the job exhausted a machine resource (GPU memory today; disk/CPU in future). Recommended policy: retry with reduced parameters rather than skip or hard-stop.

Classifiers in _classifier inspect sorter logs and exception chains to re-raise generic failures as one of the specific types below. The classes are also usable directly from non-classifier paths (e.g. curation code that already knows the exact condition).
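The three recommended policies can be sketched as a single dispatch function. The exception classes below are local stand-ins so the example runs on its own; real code would import them from spikelab.spike_sorting._exceptions. The decide function, the failing_with helper, and the policy strings are hypothetical names introduced for illustration.

```python
# Local stand-ins for the documented hierarchy (not the library classes).
class SpikeSortingClassifiedError(RuntimeError):
    pass


class BiologicalSortFailure(SpikeSortingClassifiedError):
    pass


class EnvironmentSortFailure(SpikeSortingClassifiedError):
    pass


class ResourceSortFailure(SpikeSortingClassifiedError):
    pass


def decide(run_sort):
    """Run run_sort() and map a classified failure to a policy string."""
    try:
        run_sort()
    except BiologicalSortFailure:
        return "skip"            # not sortable: move on, do not retry
    except EnvironmentSortFailure:
        return "stop"            # operator must fix the host or container
    except ResourceSortFailure:
        return "retry_reduced"   # retry with smaller parameters
    return "ok"


def failing_with(exc):
    """Build a zero-argument callable that raises exc (test helper)."""
    def runner():
        raise exc
    return runner
```

Unclassified exceptions are deliberately not caught here, so anything the classifiers did not recognise still propagates to the caller.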

exception spikelab.spike_sorting._exceptions.SpikeSortingClassifiedError[source]

Bases: RuntimeError

Base class for all classified sort-pipeline failures.

Catch this when you want to treat any identified failure uniformly. Prefer catching the more specific categorical bases (BiologicalSortFailure, EnvironmentSortFailure, ResourceSortFailure) when the policy differs by category.

exception spikelab.spike_sorting._exceptions.BiologicalSortFailure[source]

Bases: SpikeSortingClassifiedError

Failure caused by the recording itself (too little signal).

exception spikelab.spike_sorting._exceptions.EnvironmentSortFailure[source]

Bases: SpikeSortingClassifiedError

Failure caused by host or container environment misconfiguration.

exception spikelab.spike_sorting._exceptions.ResourceSortFailure[source]

Bases: SpikeSortingClassifiedError

Failure caused by exhausting a machine resource.

exception spikelab.spike_sorting._exceptions.InsufficientActivityError(message, *, sorter, threshold_crossings=None, units_at_failure=None, nspks_at_failure=None, log_path=None)[source]

Bases: BiologicalSortFailure

Sorting crashed because the recording has too little spiking activity.

Kilosort2, Kilosort4, and RT-Sort all fail on near-silent recordings, but in different ways:

  • Kilosort2: mex kernels launch with degenerate grid/block configurations when template counts and per-batch spike counts approach zero. Pre-Blackwell GPUs tolerated these launches; newer architectures (compute capability ≥ 12) reject them with CUDA error: invalid configuration argument.

  • Kilosort4: sklearn’s TruncatedSVD rejects an empty feature matrix, or KMeans fails the n_samples >= n_clusters check, when the initial spike-detection pass finds essentially no events.

  • RT-Sort: detect_sequences produces zero propagation sequences when the recording lacks sufficient spiking activity for clustering. It then returns None, which causes an AttributeError when sort_offline is subsequently called.

threshold_crossings

KS2 only; count of detected threshold crossings parsed from kilosort2.log. None for KS4 / RT-Sort.

units_at_failure

KS2 template count at the crash, or KS4 n_samples when KMeans complained. None when the log did not expose the value.

nspks_at_failure

KS2 only; spikes-per-batch at the failing template-optimization step.

log_path

Sorter log file carrying the full trace when located.

sorter

Short identifier of the sorter that raised ("kilosort2", "kilosort4", "rt_sort").
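Since most diagnostic attributes may be None depending on the sorter, a caller that logs the failure will usually want to skip the missing ones. The sketch below uses a local stand-in class with the documented keyword-only signature; the describe helper is a hypothetical name, not part of SpikeLab.

```python
class BiologicalSortFailure(RuntimeError):
    """Local stand-in for the documented category base class."""


class InsufficientActivityError(BiologicalSortFailure):
    """Local stand-in mirroring the documented constructor signature."""

    def __init__(self, message, *, sorter, threshold_crossings=None,
                 units_at_failure=None, nspks_at_failure=None, log_path=None):
        super().__init__(message)
        self.sorter = sorter
        self.threshold_crossings = threshold_crossings
        self.units_at_failure = units_at_failure
        self.nspks_at_failure = nspks_at_failure
        self.log_path = log_path


def describe(err):
    """Build a one-line summary, omitting attributes that are None."""
    names = ("threshold_crossings", "units_at_failure",
             "nspks_at_failure", "log_path")
    details = [
        f"{name}={getattr(err, name)}"
        for name in names
        if getattr(err, name) is not None
    ]
    suffix = f" ({', '.join(details)})" if details else ""
    return f"[{err.sorter}] {err}{suffix}"
```

For example, a Kilosort2 failure that exposed only the threshold-crossing count would be summarised with that single field, while a Kilosort4 failure with no parsed values would reduce to the sorter name and message.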

__init__(message, *, sorter, threshold_crossings=None, units_at_failure=None, nspks_at_failure=None, log_path=None)[source]

exception spikelab.spike_sorting._exceptions.NoGoodChannelsError(message, *, sorter, total_channels=None, bad_channels=None, log_path=None)[source]

Bases: BiologicalSortFailure

All channels were flagged as bad by the sorter’s good-channel check.

Distinct from InsufficientActivityError: signal may be present (possibly noisy), but no channel passes the sorter’s minfr_goodchannels (or equivalent) firing-rate threshold.

total_channels

Total channel count in the recording, when parsed.

bad_channels

Channels flagged as bad.

log_path

Sorter log file carrying the full trace when located.

sorter

Short identifier of the sorter that raised.

__init__(message, *, sorter, total_channels=None, bad_channels=None, log_path=None)[source]

exception spikelab.spike_sorting._exceptions.SaturatedSignalError(message, *, channels_saturated=None, total_channels=None)[source]

Bases: BiologicalSortFailure

Recording appears flat or rail-saturated across all channels.

Typical causes: disconnected electrodes, loss of fluid contact, broken amplifier front-end, or a saved recording that never received real data. Distinct from InsufficientActivityError because it reflects a hardware/acquisition fault rather than biology.

The sort-time log signatures are ambiguous with near-silent biology, so this class is currently intended to be raised by dedicated pre-sort validators (e.g. per-channel variance / rail-clip checks) rather than by the post-failure classifiers. Callers that already know the condition may raise it directly.
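A pre-sort validator of the kind mentioned above might look like the following sketch. It is an illustration only: the function names, the pure-Python trace representation (one list of samples per channel), and every threshold are assumptions, and the exception class is a local stand-in with the documented signature.

```python
from statistics import pvariance


class SaturatedSignalError(RuntimeError):
    """Local stand-in mirroring the documented constructor signature."""

    def __init__(self, message, *, channels_saturated=None, total_channels=None):
        super().__init__(message)
        self.channels_saturated = channels_saturated
        self.total_channels = total_channels


def count_saturated_channels(traces, rail_low, rail_high,
                             min_variance=1e-9, max_rail_fraction=0.5):
    """Count channels that are flat or sit at a rail for most samples.

    traces is a list of per-channel sample lists; thresholds are illustrative.
    """
    saturated = 0
    for samples in traces:
        flat = pvariance(samples) <= min_variance
        at_rail = sum(1 for s in samples if s <= rail_low or s >= rail_high)
        clipped = at_rail / len(samples) >= max_rail_fraction
        if flat or clipped:
            saturated += 1
    return saturated


def validate_not_saturated(traces, rail_low, rail_high):
    """Raise SaturatedSignalError only when every channel is saturated."""
    n_bad = count_saturated_channels(traces, rail_low, rail_high)
    if traces and n_bad == len(traces):
        raise SaturatedSignalError(
            "recording is flat or rail-saturated on all channels",
            channels_saturated=n_bad,
            total_channels=len(traces),
        )
    return n_bad
```

Running such a check before sorting avoids relying on post-failure log parsing, which, as noted above, cannot reliably separate a hardware fault from near-silent biology.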

channels_saturated

Number of channels identified as saturated, when the caller provides this.

total_channels

Total channel count in the recording.

__init__(message, *, channels_saturated=None, total_channels=None)[source]

exception spikelab.spike_sorting._exceptions.EmptyWaveformMetricsError(message, *, metric_name=None)[source]

Bases: BiologicalSortFailure, ValueError

Waveform metrics (SNR, std-norm) cannot be computed.

Raised when curation requests a waveform-based metric but no precomputed values exist and raw_data on the SpikeData is empty, so there is nothing to extract waveforms from.

This is biology-adjacent: it typically means the upstream sorter produced units that have no usable waveform evidence attached, or that the pipeline skipped the waveform-extraction stage. Callers should treat it as “cannot curate this target” rather than retry.

Inherits from both BiologicalSortFailure (for category-aware handling) and ValueError (for backward compatibility with callers that historically caught ValueError from this site).
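The effect of the dual inheritance can be shown with local stand-in classes (the real ones live in spikelab.spike_sorting._exceptions). The two caller functions are hypothetical and exist only to show that both an older `except ValueError` handler and a category-aware handler catch the same exception.

```python
class SpikeSortingClassifiedError(RuntimeError):
    """Local stand-in for the documented base class."""


class BiologicalSortFailure(SpikeSortingClassifiedError):
    """Local stand-in for the documented category class."""


class EmptyWaveformMetricsError(BiologicalSortFailure, ValueError):
    """Local stand-in with the documented bases and signature."""

    def __init__(self, message, *, metric_name=None):
        super().__init__(message)
        self.metric_name = metric_name


def legacy_caller(curate):
    """Older code path that only knows about ValueError."""
    try:
        curate()
    except ValueError as err:
        return f"ValueError: {err}"
    return "ok"


def category_caller(curate):
    """Newer code path that handles the biological category."""
    try:
        curate()
    except BiologicalSortFailure as err:
        return f"cannot curate (metric: {err.metric_name})"
    return "ok"


def failing_curation():
    """Simulate curation requesting SNR with no waveform data available."""
    raise EmptyWaveformMetricsError("no waveforms available", metric_name="snr")
```

Both handlers catch the same raised instance, which is what allows the category to be introduced without breaking callers written against the earlier ValueError behaviour.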

metric_name

The metric that could not be computed.

__init__(message, *, metric_name=None)[source]

exception spikelab.spike_sorting._exceptions.HDF5PluginMissingError(message, *, configured_path=None)[source]

Bases: EnvironmentSortFailure

HDF5 filter plugin is missing or the plugin path is misconfigured.

Typical signatures in the underlying exception chain: h5py / HDF5 errors about being unable to open a compressed dataset, or the inherited HDF5_PLUGIN_PATH environment variable pointing to a non-existent directory.

Recommended remediation (operator, not the library): set HDF5_PLUGIN_PATH to a directory containing the compression plugin required by the recording’s HDF5 build before any h5py import. The exact directory and plugin name are deployment-specific.
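An operator-side pre-flight check along these lines can catch the misconfiguration before h5py is imported. This is a minimal sketch using only the standard library; the function name is hypothetical, and it assumes HDF5_PLUGIN_PATH holds a single directory rather than a list of directories.

```python
import os
from pathlib import Path


def check_hdf5_plugin_path(environ=None):
    """Return (ok, configured_path) for the HDF5_PLUGIN_PATH setting.

    ok is True only when the variable is set and names an existing
    directory. Whether that directory holds the right plugin is
    deployment-specific and is not checked here.
    """
    environ = os.environ if environ is None else environ
    configured = environ.get("HDF5_PLUGIN_PATH")
    if not configured:
        return False, None
    return Path(configured).is_dir(), configured
```

The second element of the result corresponds to what the exception documents as configured_path, so a caller could pass it through when reporting the problem.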

configured_path

The value of HDF5_PLUGIN_PATH at failure time, if known.

__init__(message, *, configured_path=None)[source]

exception spikelab.spike_sorting._exceptions.DockerEnvironmentError(message, *, reason)[source]

Bases: EnvironmentSortFailure

Docker daemon, client library, or image is unusable for sorting.

The reason string narrows the failure mode so callers can render better diagnostics or choose different remediations without catching sub-exceptions.

Recognized reason values:

  • "daemon_down" — Cannot connect to the Docker daemon.

  • "client_missing" — The Python docker client library is not installed in the sorting env.

  • "image_pull_failed" — Image pull returned an error (network, auth, or manifest-not-found).

  • "permission_denied" — Socket permission denied; user not in the docker group or equivalent.

  • "other" — Docker is broken in a way that did not match any known signature; inspect __cause__ for details.

reason

One of the strings above.
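Because the failure mode is carried in reason rather than in separate subclasses, callers can branch on it with a plain lookup. In the sketch below the exception classes are local stand-ins, and the hint texts and the remediation_hint helper are illustrative assumptions derived from the descriptions above, not messages produced by SpikeLab.

```python
class EnvironmentSortFailure(RuntimeError):
    """Local stand-in for the documented category class."""


class DockerEnvironmentError(EnvironmentSortFailure):
    """Local stand-in mirroring the documented constructor signature."""

    def __init__(self, message, *, reason):
        super().__init__(message)
        self.reason = reason


# Illustrative operator hints keyed by the documented reason values.
_HINTS = {
    "daemon_down": "Start the Docker daemon, then rerun the sort.",
    "client_missing": "Install the Python docker client in the sorting environment.",
    "image_pull_failed": "Check network access, registry credentials, and the image name.",
    "permission_denied": "Give the current user access to the Docker socket.",
}


def remediation_hint(err):
    """Map err.reason to a hint; fall back to the chained cause for 'other'."""
    fallback = f"Unrecognized Docker failure; inspect the cause: {err.__cause__!r}"
    return _HINTS.get(err.reason, fallback)
```

Unknown or "other" reasons fall through to the chained exception, matching the advice to inspect __cause__.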

__init__(message, *, reason)[source]

exception spikelab.spike_sorting._exceptions.ModelLoadingError(message, *, sorter='rt_sort', model_path=None)[source]

Bases: EnvironmentSortFailure

Detection model could not be loaded or is unusable.

Raised when RT-Sort’s ModelSpikeSorter.load() fails — typically because PyTorch is missing, weights are corrupt, the model folder does not exist, or the architecture parameters do not match the saved state dict.

model_path

Path that was attempted, when known.

sorter

Short identifier of the sorter that raised.

__init__(message, *, sorter='rt_sort', model_path=None)[source]

exception spikelab.spike_sorting._exceptions.GPUOutOfMemoryError(message, *, sorter, log_path=None)[source]

Bases: ResourceSortFailure

The sorter exhausted GPU memory.

Raised when either a PyTorch CUDA out of memory error (KS4) or a MATLAB/mex CUDA_ERROR_OUT_OF_MEMORY diagnostic (KS2) appears in the exception chain or sorter log.

Recommended remediation: reduce batch size / NT / nPCs, split the recording into shorter segments, or run on a larger-memory GPU. Retrying the same command unchanged will loop.
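A retry-with-reduced-parameters policy could be sketched as follows. The exception classes are local stand-ins; batch_size is a hypothetical parameter name standing for whichever setting controls GPU memory use in the chosen sorter, and fake_sort simulates a sorter so the example runs without a GPU.

```python
class ResourceSortFailure(RuntimeError):
    """Local stand-in for the documented category class."""


class GPUOutOfMemoryError(ResourceSortFailure):
    """Local stand-in mirroring the documented constructor signature."""

    def __init__(self, message, *, sorter, log_path=None):
        super().__init__(message)
        self.sorter = sorter
        self.log_path = log_path


def sort_with_backoff(run_sort, batch_size, min_batch_size=1024):
    """Call run_sort(batch_size), halving batch_size after each GPU OOM.

    Returns (result, batch_size_used). Re-raises the last error once the
    next batch size would fall below min_batch_size.
    """
    while True:
        try:
            return run_sort(batch_size), batch_size
        except GPUOutOfMemoryError:
            next_size = batch_size // 2
            if next_size < min_batch_size:
                raise
            batch_size = next_size


def fake_sort(batch_size, limit=20000):
    """Simulated sorter that runs out of memory above limit samples."""
    if batch_size > limit:
        raise GPUOutOfMemoryError("CUDA out of memory", sorter="kilosort4")
    return "sorted"
```

The lower bound matters: without it, a recording that cannot fit at any setting would retry indefinitely, which is the loop the remediation note warns about.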

sorter

Short identifier of the sorter that raised.

log_path

Sorter log file carrying the full trace when located.

__init__(message, *, sorter, log_path=None)[source]

Post-Failure Classifiers

The classifier module inspects sorter logs and exception chains to produce specific SpikeSortingClassifiedError subclasses from generic failures.

spikelab.spike_sorting._classifier.classify_ks2_failure(output_folder, exc)[source]

Return a classified exception for a Kilosort2 failure, or None.

Priority: environment → resource → biology. Environment and resource errors can appear on any recording, so they take precedence over biology signatures that would otherwise be consistent with them.

Return type:

Optional[SpikeSortingClassifiedError]

spikelab.spike_sorting._classifier.classify_ks4_failure(output_folder, exc)[source]

Return a classified exception for a Kilosort4 failure, or None.

Priority mirrors KS2. KS4 does not expose a distinct “all channels bad” diagnostic the same way KS2 does, so only the generic biology classifier (insufficient activity) is applied.

Return type:

Optional[SpikeSortingClassifiedError]
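The environment → resource → biology priority can be pictured as an ordered chain of checks in which the first match wins. The sketch below is a simplification under stated assumptions: it matches substrings in a log string rather than reading files from output_folder, the exception classes are local stand-ins, and the function names are hypothetical. The matched strings are the signatures quoted in the exception descriptions above.

```python
class SpikeSortingClassifiedError(RuntimeError):
    """Local stand-in for the documented base class."""


class EnvironmentSortFailure(SpikeSortingClassifiedError):
    pass


class ResourceSortFailure(SpikeSortingClassifiedError):
    pass


class BiologicalSortFailure(SpikeSortingClassifiedError):
    pass


# Ordered (signature, exception class) pairs: environment, resource, biology.
_CHECKS = (
    ("cannot connect to the docker daemon", EnvironmentSortFailure),
    ("out of memory", ResourceSortFailure),
    ("invalid configuration argument", BiologicalSortFailure),
)


def classify_failure(log_text, exc):
    """Return the first matching classified exception, or None."""
    text = f"{log_text}\n{exc}".lower()
    for signature, exc_class in _CHECKS:
        if signature in text:
            return exc_class(f"matched signature: {signature}")
    return None


def run_and_classify(run_sort, log_text=""):
    """Run run_sort(); re-raise generic failures as classified ones."""
    try:
        return run_sort()
    except SpikeSortingClassifiedError:
        raise  # already classified, pass through unchanged
    except Exception as exc:
        classified = classify_failure(log_text, exc)
        if classified is None:
            raise
        raise classified from exc
```

Chaining with `raise ... from exc` keeps the original traceback available on __cause__, and returning None for unrecognised failures lets the original exception propagate untouched.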