========================== Spike Sorting and Curation ========================== SpikeLab includes a spike-sorting pipeline in the ``spikelab.spike_sorting`` sub-package. It supports three sorting algorithms — Kilosort2 (Pachitariu et al. 2016), Kilosort4 (Pachitariu et al. 2024), and RT-Sort (Van der Molen et al. 2024) — behind a unified interface built on SpikeInterface (Buccino et al. 2020). The pipeline returns curated :class:`~spikelab.SpikeData` objects ready for downstream analysis. SpikeLab also provides standalone curation methods that can be used on any ``SpikeData`` object, whether it came from the sorting pipeline or from an external source. **References:** - Pachitariu, M., Steinmetz, N., Kadir, S., Carandini, M. & Harris, K. D. Kilosort: realtime spike-sorting for extracellular electrophysiology with hundreds of channels. *bioRxiv* (2016). - Pachitariu, M., Sridhar, S., Pennington, J. & Stringer, C. Spike sorting with Kilosort4. *Nature Methods* 21, 914--921 (2024). - Van der Molen, T., Lim, M., Bartram, J. et al. RT-Sort: An action potential propagation-based algorithm for real time spike detection and sorting with millisecond latencies. *PLoS ONE* 19, e0312438 (2024). - Buccino, A. P., Hurwitz, C. L., Garcia, S. et al. SpikeInterface, a unified framework for spike sorting. *eLife* 9, e61834 (2020). .. contents:: On this page :local: :depth: 2 Spike Sorting ------------- Prerequisites ^^^^^^^^^^^^^ The sorting pipeline requires external dependencies that are **not** installed with SpikeLab by default: - **Kilosort2** requires MATLAB (R2019b+) and the `Kilosort2 repository `_. A Docker variant is available that removes the MATLAB requirement. - **Kilosort4** is pure Python but requires ``torch`` and ``kilosort``. A Docker variant is also available. - **RT-Sort** requires ``torch`` for its neural-network spike detection model. For Maxwell Biosystems ``.h5`` files the HDF5 decompression plugin must also be installed; follow the instructions printed by the loader if the plugin is missing. Basic usage ^^^^^^^^^^^ The main entry point is :func:`~spikelab.spike_sorting.sort_recording`, which accepts a list of recording files and returns a list of :class:`~spikelab.SpikeData` objects: .. code-block:: python from spikelab.spike_sorting import sort_recording results = sort_recording( recording_files=["session1.raw.h5"], sorter="kilosort4", ) sd = results[0] print(sd.N, "units") print(sd.length / 1000, "seconds") The ``sorter`` parameter selects the algorithm: ``"kilosort2"``, ``"kilosort4"``, or ``"rt_sort"``. Configuration and presets ^^^^^^^^^^^^^^^^^^^^^^^^^ The pipeline is configured via a :class:`~spikelab.spike_sorting.config.SortingPipelineConfig` dataclass composed of sub-configs for recording I/O, sorting, curation, waveform extraction, and execution. Pre-built presets provide sensible defaults: .. code-block:: python from spikelab.spike_sorting.config import KILOSORT4 results = sort_recording( recording_files=["session1.raw.h5"], config=KILOSORT4, ) To customise a preset, use the ``override`` method: .. code-block:: python config = KILOSORT4.override( fr_min=0.1, # stricter minimum firing rate (Hz) snr_min=6.0, # stricter SNR threshold n_jobs=16, total_memory="32G", ) results = sort_recording(["session1.raw.h5"], config=config) Individual parameters can also be passed directly as keyword arguments to ``sort_recording``, which builds a config internally: .. code-block:: python results = sort_recording( recording_files=["session1.raw.h5"], sorter="kilosort2", kilosort_path="/opt/Kilosort2", fr_min=0.1, n_jobs=8, ) Multi-stream recordings ^^^^^^^^^^^^^^^^^^^^^^^ For multi-stream files (e.g. Maxwell multi-well ``.raw.h5``), use :func:`~spikelab.spike_sorting.sort_multistream`: .. code-block:: python from spikelab.spike_sorting import sort_multistream stream_results = sort_multistream( recording="multiwell.raw.h5", stream_ids=["well000", "well001"], sorter="kilosort4", ) for stream_id, sds in stream_results.items(): print(f"{stream_id}: {sds[0].N} units") This returns a dict mapping stream IDs to lists of ``SpikeData`` objects. Reusing intermediate results ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ The pipeline caches intermediate files (binary recordings, sorting output, waveforms). Control which stages are re-run with the ``recompute_*`` flags: .. code-block:: python results = sort_recording( recording_files=["session1.raw.h5"], sorter="kilosort4", recompute_recording=False, # reuse existing binary recompute_sorting=True, # force re-sort reextract_waveforms=True, # force waveform re-extraction ) See the :doc:`API reference ` for the full configuration options. Stimulation Artifact Removal ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ When sorting recordings with electrical stimulation, stim artifacts must be removed before spike detection. SpikeLab provides :func:`~spikelab.spike_sorting.stim_sorting.artifact_removal.remove_stim_artifacts` for this purpose. Two methods are available: - **polynomial** (default) — fits a low-order polynomial to the post-saturation artifact tail and subtracts it, preserving neural spikes that ride on the artifact decay. Saturated samples are blanked (zeroed). - **blank** — zeros the entire artifact window. Simpler but discards any spikes within the window. Stim times logged by the hardware may not align exactly with the artifact in the recording. Use :func:`~spikelab.spike_sorting.stim_sorting.recentering.recenter_stim_times` to find the true artifact onset: .. code-block:: python from spikelab.spike_sorting.stim_sorting.recentering import recenter_stim_times from spikelab.spike_sorting.stim_sorting.artifact_removal import remove_stim_artifacts # Recenter stim times to the actual artifact onset corrected_times = recenter_stim_times( traces, # (channels, samples) raw voltage stim_times_ms, # logged stim times in ms fs_Hz=20000, peak_mode="down_edge", # alignment mode for biphasic pulses ) # Remove artifacts cleaned, blanked_mask = remove_stim_artifacts( traces, corrected_times, fs_Hz=20000, method="polynomial", artifact_window_ms=10.0, # max tail duration after desaturation poly_order=3, # cubic polynomial (default) ) The ``peak_mode`` parameter controls how each artifact is aligned: - ``"abs_max"`` — largest absolute voltage (monophasic pulses) - ``"down_edge"`` — positive-to-negative zero-crossing (biphasic anodic-first) - ``"up_edge"`` — negative-to-positive zero-crossing (biphasic cathodic-first) - ``"pos_peak"`` / ``"neg_peak"`` — largest positive or most negative voltage When multiple stim events occur in rapid succession (e.g. burst or paired-pulse protocols), the module automatically detects whether the signal has returned to baseline between events and extends the blanking region across the entire burst if needed. The polynomial detrend approach is related to SALPA, adapted for offline use: - Wagenaar, D. A. & Potter, S. M. Real-time multi-channel stimulus artifact suppression by local curve fitting. *J Neurosci Methods* 120, 113--120 (2002). Unit Curation ------------- SpikeLab provides curation methods that filter units by quality metrics. These work on any :class:`~spikelab.SpikeData` object — not just output from the sorting pipeline. Each curation method returns a tuple ``(sd_curated, result_dict)`` where ``result_dict`` contains: - ``"metric"`` — ``np.ndarray (N,)`` with the per-unit metric for **all** original units. - ``"passed"`` — ``np.ndarray (N,)`` boolean mask of units that passed. Individual criteria ^^^^^^^^^^^^^^^^^^^ Apply a single quality criterion at a time: .. code-block:: python # Remove units with fewer than 50 spikes sd_curated, res = sd.curate_by_min_spikes(min_spikes=50) # Remove units below 0.1 Hz firing rate sd_curated, res = sd.curate_by_firing_rate(min_rate_hz=0.1) # Remove units with > 1% ISI violations sd_curated, res = sd.curate_by_isi_violations( max_violation=1.0, threshold_ms=1.5, ) # Remove units with low SNR sd_curated, res = sd.curate_by_snr(min_snr=5.0) # Remove units with inconsistent waveforms sd_curated, res = sd.curate_by_std_norm(max_std_norm=1.0) SNR and waveform consistency (``curate_by_snr``, ``curate_by_std_norm``) require that the ``SpikeData`` object has ``raw_data`` attached. If the metrics have not been pre-computed, call :meth:`~spikelab.SpikeData.compute_waveform_metrics` first: .. code-block:: python sd, metrics = sd.compute_waveform_metrics() sd_curated, res = sd.curate_by_snr(min_snr=5.0) Merge-based deduplication ^^^^^^^^^^^^^^^^^^^^^^^^^ When spike sorting produces duplicate units for the same neuron recorded on adjacent electrodes, you can merge them using :meth:`~spikelab.SpikeData.curate_by_merge_duplicates`. This finds spatially nearby unit pairs, filters by ISI violation rate and waveform cosine similarity, then greedily merges accepted pairs: .. code-block:: python sd_merged, res = sd.curate_by_merge_duplicates( dist_um=24.8, # max inter-electrode distance in um cosine_threshold=0.5, # min waveform similarity max_isi_increase=0.04, # reject merge if ISI violations rise too much ) n_absorbed = (~res["passed"]).sum() print(f"Merged {n_absorbed} duplicate units") This requires ``neuron_attributes`` with position and ``avg_waveform`` entries, which are set automatically when the ``SpikeData`` comes from the sorting pipeline. Batch curation ^^^^^^^^^^^^^^ To apply multiple criteria in one call, use :meth:`~spikelab.SpikeData.curate`. Only criteria with non-``None`` values are applied: .. code-block:: python sd_curated, results = sd.curate( min_spikes=50, min_rate_hz=0.1, isi_max=1.0, min_snr=5.0, ) # results contains per-criterion entries for criterion, res in results.items(): n_removed = (~res["passed"]).sum() print(f"{criterion}: removed {n_removed} units") Curation history ^^^^^^^^^^^^^^^^ For reproducibility, build a serialisable record of what was removed and why: .. code-block:: python history = sd.build_curation_history( sd_original=sd_raw, sd_curated=sd_curated, results=results, ) The returned dict is JSON-serialisable and can be stored in a workspace or saved alongside the curated data. Sorting Concatenated Recordings -------------------------------- When a directory containing multiple recording files is passed to ``sort_recording``, the pipeline concatenates them into a single recording for sorting. The returned ``SpikeData`` objects are automatically split back into per-recording epochs. When a list of recording paths is passed instead, each file is processed sequentially without concatenation. If you need to re-split a concatenated ``SpikeData`` manually (e.g. after loading a saved pickle that was not yet split), use :meth:`~spikelab.SpikeData.split_epochs`. This requires ``rec_chunks_ms`` in ``metadata`` (set automatically by the sorting pipeline) and time-shifts each epoch to start at t=0: .. code-block:: python epoch_sds = sd.split_epochs() for i, epoch_sd in enumerate(epoch_sds): print(f"Epoch {i}: {epoch_sd.N} units, {epoch_sd.length:.0f} ms") Handling Sort Failures ---------------------- When a sorting run fails, SpikeLab can classify the failure into one of three categories so that batch scripts can implement skip/retry/stop policies without parsing generic error messages. The three categories are: - **BiologicalSortFailure** -- the recording itself cannot be sorted (too silent, all channels bad, no waveforms to compute metrics on). Recommended policy: mark the target as not-sortable and move on. - **EnvironmentSortFailure** -- the host environment or container runtime is misconfigured (Docker down, HDF5 plugin missing). Recommended policy: stop and fix the environment. - **ResourceSortFailure** -- the job exhausted a machine resource (GPU out of memory). Recommended policy: retry with reduced parameters. All three inherit from :class:`~spikelab.spike_sorting._exceptions.SpikeSortingClassifiedError`, which in turn inherits from ``RuntimeError``. Each sorter has a dedicated classifier that inspects both sorter logs and exception chains to identify known failure signatures: - ``classify_ks2_failure`` — Kilosort2 - ``classify_ks4_failure`` — Kilosort4 - ``classify_rt_sort_failure`` — RT-Sort .. code-block:: python from spikelab.spike_sorting._classifier import ( classify_ks2_failure, classify_ks4_failure, classify_rt_sort_failure, ) from spikelab.spike_sorting._exceptions import ( BiologicalSortFailure, EnvironmentSortFailure, ResourceSortFailure, ) # Pick the classifier matching your sorter classify = classify_ks4_failure # or classify_ks2_failure, classify_rt_sort_failure try: results = sort_recording(["session1.raw.h5"], sorter="kilosort4") except Exception as exc: classified = classify(output_folder, exc) if classified is not None: if isinstance(classified, BiologicalSortFailure): print(f"Skipping (biology): {classified}") elif isinstance(classified, EnvironmentSortFailure): raise # stop the batch elif isinstance(classified, ResourceSortFailure): print(f"Retry with smaller batch: {classified}") else: raise # unrecognised failure Specific exception classes carry diagnostic attributes. For example, :class:`~spikelab.spike_sorting._exceptions.InsufficientActivityError` exposes ``threshold_crossings``, ``units_at_failure``, and ``nspks_at_failure`` parsed from sorter logs. RT-Sort additionally raises :class:`~spikelab.spike_sorting._exceptions.ModelLoadingError` when the detection model cannot be loaded. See the :doc:`API reference ` for the full exception hierarchy.