Curation

Unit-curation helpers for SpikeData objects. Each function takes a SpikeData as its first argument and returns a tuple (SpikeData, result_dict) where result_dict contains the per-unit metric and a boolean mask of units that passed the criterion. These functions are also bound as methods on SpikeData (e.g. sd.curate_by_snr(...)) and can be applied in combination via curate().

Unit curation methods for SpikeData objects.

Each public function accepts a SpikeData as its first argument and returns (SpikeData, result_dict) where result_dict always contains:

metric — np.ndarray (N,) with the per-unit metric value (computed over all original units).
passed — np.ndarray (N,) boolean mask indicating which units passed the curation criterion.

The returned SpikeData contains only the passing units (via subset).

These functions are bound as methods on SpikeData by spikedata.py so they can be called as sd.curate_by_*(…).

spikelab.spikedata.curation.curate_by_min_spikes(sd, min_spikes=30)

Remove units with fewer than min_spikes spikes.

Parameters:

sd (SpikeData) – Source spike data.
min_spikes (int) – Minimum spike count threshold.

Returns:

sd_out (SpikeData) – SpikeData with only passing units.
result (dict) – {"metric": (N,) spike counts, "passed": (N,) bool mask}.

Return type:

tuple of (SpikeData, dict)
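The criterion above can be sketched without the library. This is a minimal illustration, not the spikelab implementation: it assumes spike trains are plain lists of spike times, and the helper name min_spikes_mask is hypothetical.

```python
def min_spikes_mask(trains, min_spikes=30):
    """Return (counts, passed) for a list of spike-time lists.

    A unit passes when it has at least `min_spikes` spikes, i.e. units
    with fewer spikes are removed.
    """
    counts = [len(train) for train in trains]
    passed = [count >= min_spikes for count in counts]
    return counts, passed


# Three toy units with 3, 1 and 4 spikes.
trains = [[1.0, 5.0, 9.0], [2.0], [0.5, 1.5, 2.5, 3.5]]
counts, passed = min_spikes_mask(trains, min_spikes=3)

# Keep only the passing units, analogous to subsetting a SpikeData.
kept = [train for train, ok in zip(trains, passed) if ok]
```

The metric is reported for every original unit, while the filtered list contains only the units whose mask entry is True.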

spikelab.spikedata.curation.curate_by_firing_rate(sd, min_rate_hz=0.05)

Remove units whose firing rate is below min_rate_hz.

Parameters:

sd (SpikeData) – Source spike data.
min_rate_hz (float) – Minimum firing rate in Hz.

Returns:

sd_out (SpikeData) – SpikeData with only passing units.
result (dict) – {"metric": (N,) firing rates in Hz, "passed": (N,) bool mask}.

Return type:

tuple of (SpikeData, dict)
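A firing-rate criterion of this kind can be sketched as spike count divided by recording duration. The sketch below assumes spike times and the recording length are given in milliseconds; the helper name firing_rates_hz and the length_ms argument are hypothetical and not taken from the library.

```python
def firing_rates_hz(trains, length_ms):
    """Mean firing rate of each unit in Hz (spikes per second)."""
    duration_s = length_ms / 1000.0  # assumed unit: milliseconds
    return [len(train) / duration_s for train in trains]


# Two toy units recorded for 2 s.
trains = [[10.0, 500.0, 1500.0, 1900.0], [100.0]]
rates = firing_rates_hz(trains, length_ms=2000.0)

# Units whose rate is below the threshold are removed.
min_rate_hz = 1.0  # illustrative value, not the library default
passed = [rate >= min_rate_hz for rate in rates]
```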

spikelab.spikedata.curation.curate_by_isi_violations(sd, max_violation=0.01, threshold_ms=1.5, min_isi_ms=0.0, method='percent')

Remove units with excessive inter-spike-interval violations.

Two methods are available:

"percent" — violation count divided by total spike count, expressed as a fraction in [0, 1] (e.g. 0.01 means 1 % of spikes are ISI violations).
"hill" — violation rate ratio from Hill et al. (2011) J Neurosci 31:8699-8705. Values above 1 indicate highly contaminated units.

Parameters:

sd (SpikeData) – Source spike data.
max_violation (float) – Maximum allowed metric. With method="percent" this is a fraction in [0, 1] (default 0.01 = 1 % of spikes). With method="hill" it is a contamination ratio.
threshold_ms (float) – Refractory period threshold in ms.
min_isi_ms (float) – Minimum possible ISI enforced by hardware or post-processing, in ms.
method (str) – "percent" or "hill".

Returns:

sd_out (SpikeData) – SpikeData with only passing units.
result (dict) – {"metric": (N,) ISI violation metric, "passed": (N,) bool mask}.

Return type:

tuple of (SpikeData, dict)
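The "percent" method described above (violation count divided by total spike count) can be sketched as follows. This is an illustration only: it assumes spike times in milliseconds, counts an ISI as a violation when it is strictly shorter than threshold_ms, and does not model min_isi_ms or the "hill" method. The helper name is hypothetical.

```python
def isi_violation_fraction(spike_times_ms, threshold_ms=1.5):
    """Fraction of spikes involved in a refractory-period violation.

    Computed as (number of ISIs shorter than threshold_ms) / (number of spikes).
    """
    n_spikes = len(spike_times_ms)
    if n_spikes < 2:
        return 0.0  # no intervals, so no violations
    times = sorted(spike_times_ms)
    violations = sum(
        1 for prev, nxt in zip(times, times[1:]) if (nxt - prev) < threshold_ms
    )
    return violations / n_spikes


# Ten spikes; the intervals 1.0 ms and 0.5 ms are violations.
train = [0.0, 1.0, 10.0, 20.0, 20.5, 40.0, 60.0, 80.0, 100.0, 120.0]
fraction = isi_violation_fraction(train, threshold_ms=1.5)

# With max_violation=0.01 this unit would be removed.
passed = fraction <= 0.01
```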

spikelab.spikedata.curation.curate_by_snr(sd, min_snr=5.0, ms_before=1.0, ms_after=2.0)

Remove units whose signal-to-noise ratio is below min_snr.

SNR is defined as peak_amplitude / noise_level where peak amplitude is the absolute maximum of the average waveform on the channel with the largest amplitude, and noise level is estimated via the median absolute deviation (MAD) of the raw trace on that channel.

The method first checks for a precomputed "snr" value in neuron_attributes. If not found, it computes SNR from raw_data (using get_waveform_traces). If neither is available a ValueError is raised.

Parameters:

sd (SpikeData) – Source spike data.
min_snr (float) – Minimum SNR threshold.
ms_before (float) – ms before spike for waveform extraction (only used when computing from raw_data).
ms_after (float) – ms after spike for waveform extraction (only used when computing from raw_data).

Returns:

sd_out (SpikeData) – SpikeData with only passing units.
result (dict) – {"metric": (N,) per-unit SNR, "passed": (N,) bool mask}.

Return type:

tuple of (SpikeData, dict)
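The SNR definition above (peak amplitude over a MAD-based noise level on the largest-amplitude channel) can be sketched in plain Python. This is a sketch under assumptions, not the library code: waveforms and traces are nested lists indexed as [channel][sample], the MAD is used unscaled (some conventions divide it by 0.6745), and the helper names are hypothetical.

```python
from statistics import median


def mad(values):
    """Median absolute deviation around the median (unscaled)."""
    center = median(values)
    return median([abs(v - center) for v in values])


def snr_from_waveform(avg_waveform, traces):
    """Return (snr, best_channel).

    avg_waveform: per-channel average waveform, [channel][sample].
    traces: per-channel trace used for the noise estimate, [channel][sample].
    """
    # Peak amplitude per channel is the absolute maximum of the average waveform.
    peaks = [max(abs(s) for s in channel) for channel in avg_waveform]
    best = max(range(len(peaks)), key=peaks.__getitem__)
    noise = mad(traces[best])
    if noise == 0:
        return float("inf"), best  # flat trace: SNR is unbounded
    return peaks[best] / noise, best


avg_waveform = [[0, -2, 1], [0, -12, 4]]
traces = [[0.5, -0.5, 1.0, -1.0, 0.0], [1, -1, 2, -2, 3, -3, 0]]
snr, best_channel = snr_from_waveform(avg_waveform, traces)
```

Here channel 1 has the largest peak (12) and a MAD of 2, so the unit would pass the default min_snr of 5.0.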

spikelab.spikedata.curation.curate_by_std_norm(sd, max_std_norm=1.0, at_peak=True, window_ms_before=0.5, window_ms_after=1.5, ms_before=1.0, ms_after=2.0)

Remove units whose normalized waveform standard deviation exceeds max_std_norm.

Normalized STD is |std| / |amplitude| on the channel with the largest amplitude. When at_peak is True, STD is measured at the single peak sample; otherwise it is averaged over a window around the peak.

The method first checks for a precomputed "std_norm" value in neuron_attributes. If not found, it computes the metric from raw_data. If neither is available a ValueError is raised.

Parameters:

sd (SpikeData) – Source spike data.
max_std_norm (float) – Maximum allowed normalized STD.
at_peak (bool) – Measure STD at peak sample only.
window_ms_before (float) – Window before peak for averaging STD (only used when at_peak is False).
window_ms_after (float) – Window after peak for averaging STD (only used when at_peak is False).
ms_before (float) – ms before spike for waveform extraction (only used when computing from raw_data).
ms_after (float) – ms after spike for waveform extraction (only used when computing from raw_data).

Returns:

sd_out (SpikeData) – SpikeData with only passing units.
result (dict) – {"metric": (N,) normalized STD, "passed": (N,) bool mask}.

Return type:

tuple of (SpikeData, dict)
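The normalized STD described above can be illustrated on a handful of per-spike waveforms from a single channel. The sketch makes several assumptions that the text does not confirm: it uses the population standard deviation across spikes, expresses the averaging window in samples rather than milliseconds, and takes the amplitude to be the average waveform at its peak sample. The function name and the before/after arguments are hypothetical.

```python
from statistics import mean, pstdev


def std_norm(waveforms, at_peak=True, before=1, after=1):
    """Normalized waveform STD on one channel.

    waveforms: list of per-spike waveforms (equal-length lists of samples).
    before/after: window around the peak, in samples (assumed unit).
    """
    n_samples = len(waveforms[0])
    # Average waveform and across-spike STD at every sample.
    avg = [mean(w[i] for w in waveforms) for i in range(n_samples)]
    std = [pstdev([w[i] for w in waveforms]) for i in range(n_samples)]
    peak = max(range(n_samples), key=lambda i: abs(avg[i]))
    if at_peak:
        spread = std[peak]
    else:
        lo = max(0, peak - before)
        hi = min(n_samples, peak + after + 1)
        spread = mean(std[lo:hi])
    return spread / abs(avg[peak])


waveforms = [[0, -8, 2], [0, -12, 2], [0, -10, 2]]
at_peak_value = std_norm(waveforms, at_peak=True)
windowed_value = std_norm(waveforms, at_peak=False, before=1, after=1)
```

In this toy case only the peak sample varies across spikes, so averaging over a window lowers the metric relative to the at-peak value.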

spikelab.spikedata.curation.compute_waveform_metrics(sd, ms_before=1.0, ms_after=2.0, at_peak=True, window_ms_before=0.5, window_ms_after=1.5, freq_min=300, freq_max=6000)

Compute average waveforms, SNR, and normalized STD for every unit.

Results are stored in neuron_attributes under the keys "snr" and "std_norm". Average waveforms are stored by get_waveform_traces (called internally with store=True).

Waveforms are extracted from bandpass-filtered data (freq_min–freq_max Hz) and noise is estimated on the same filtered band. This matches SpikeInterface’s quality-metrics convention and avoids inflating the SNR denominator with LFP energy.

Parameters:

sd (SpikeData) – Source spike data. Must have non-empty raw_data.
ms_before (float) – ms before spike for waveform extraction.
ms_after (float) – ms after spike for waveform extraction.
at_peak (bool) – Measure STD at peak sample only.
window_ms_before (float) – Window before peak for averaging STD (only used when at_peak is False).
window_ms_after (float) – Window after peak for averaging STD (only used when at_peak is False).
freq_min (float) – Low-cut frequency for bandpass filter (Hz). Defaults to 300 Hz (matches _globals.FREQ_MIN).
freq_max (float) – High-cut frequency for bandpass filter (Hz). Defaults to 6000 Hz (matches _globals.FREQ_MAX).

Returns:

sd (SpikeData) – The same SpikeData object, modified in place with updated neuron_attributes.
metrics (dict) – Dict with keys "snr" and "std_norm", each mapping to an np.ndarray of shape (N,).

Return type:

tuple of (SpikeData, dict)

spikelab.spikedata.curation.curate(sd, min_spikes=None, min_rate_hz=None, isi_max=None, isi_threshold_ms=1.5, isi_min_ms=0.0, isi_method='percent', min_snr=None, max_std_norm=None, std_at_peak=True, std_window_ms_before=0.5, std_window_ms_after=1.5, snr_ms_before=1.0, snr_ms_after=2.0)

Apply multiple curation criteria in sequence. A unit is kept only if it passes every applied criterion (intersection).

Only criteria whose threshold is not None are applied. Returns the filtered SpikeData and a dict of per-criterion results.

Parameters:

sd (SpikeData) – Source spike data.
min_spikes (int or None) – Minimum spike count.
min_rate_hz (float or None) – Minimum firing rate in Hz.
isi_max (float or None) – Maximum ISI violation metric.
isi_threshold_ms (float) – Refractory period for ISI check.
isi_min_ms (float) – Minimum possible ISI for ISI check.
isi_method (str) – "percent" or "hill" for ISI check.
min_snr (float or None) – Minimum SNR.
max_std_norm (float or None) – Maximum normalized STD.
std_at_peak (bool) – Measure STD at peak only.
std_window_ms_before (float) – Window before peak for STD averaging.
std_window_ms_after (float) – Window after peak for STD averaging.
snr_ms_before (float) – ms before spike for waveform extraction.
snr_ms_after (float) – ms after spike for waveform extraction.

Returns:

sd_out (SpikeData) – SpikeData with only units passing all criteria.
results (dict) – Mapping from criterion name to {"metric": (N,), "passed": (N,)}.

Return type:

tuple of (SpikeData, dict)
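The intersection semantics can be sketched by AND-ing the per-criterion masks. This is an illustration, not the library's code: it assumes every mask is indexed over the same original N units, and the criterion names "min_spikes" and "snr" as well as the helper name are placeholders.

```python
def combine_criteria(results, n_units):
    """AND together the "passed" masks of all applied criteria.

    results: {criterion_name: {"metric": [...], "passed": [...]}}
    n_units: number of original units N (needed when no criterion is applied).
    """
    final = [True] * n_units
    for criterion in results.values():
        final = [keep and ok for keep, ok in zip(final, criterion["passed"])]
    return final


results = {
    "min_spikes": {"metric": [50, 10, 80], "passed": [True, False, True]},
    "snr": {"metric": [6.2, 7.0, 3.1], "passed": [True, True, False]},
}
final_mask = combine_criteria(results, n_units=3)

# With no criteria applied, every unit is kept.
untouched = combine_criteria({}, n_units=3)
```

Only the first unit passes both criteria, so it is the only one retained.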

spikelab.spikedata.curation.build_curation_history(sd_original, sd_curated, results, parameters=None)

Translate curation results into a serializable history dict.

The output format mirrors the curation history produced by the Kilosort2 pipeline, making it suitable for saving as JSON.

Parameters:

sd_original (SpikeData) – The SpikeData before curation.
sd_curated (SpikeData) – The SpikeData after curation.
results (dict) – Results dict returned by curate() or assembled manually from individual curate_by_* calls. Keys are criterion names, values are dicts with "metric" and "passed" arrays.
parameters (dict or None) – Curation parameter values to record. If None, an empty dict is stored.

Returns:

history (dict) – Serializable curation history with keys curation_parameters, initial, curations, curated, failed, metrics, and curated_final.

Return type:

dict
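As a rough illustration of what "serializable" requires, the sketch below converts a results dict into plain Python lists that the json module can encode. It does not reproduce the layout of the history keys listed above, which is not specified here; the helper name to_jsonable is hypothetical.

```python
import json


def to_jsonable(results):
    """Convert per-criterion results into JSON-friendly lists.

    The same conversion applies to NumPy arrays, which json cannot
    encode directly.
    """
    return {
        name: {
            "metric": [float(value) for value in criterion["metric"]],
            "passed": [bool(value) for value in criterion["passed"]],
        }
        for name, criterion in results.items()
    }


results = {"snr": {"metric": (6.2, 3.1), "passed": (True, False)}}
encoded = json.dumps(to_jsonable(results))
decoded = json.loads(encoded)
```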

spikelab.spikedata.curation.curate_by_merge_duplicates(sd, dist_um=24.8, max_violation_rate=0.04, isi_threshold_ms=1.5, cosine_threshold=0.5, max_lag=10, delta_ms=0.4, max_isi_increase=0.04, verbose=False)

Remove duplicate units by merging nearby pairs with similar waveforms.

Runs the full merge-based deduplication pipeline:

Find spatially nearby unit pairs within dist_um.
Discard pairs where either unit exceeds the ISI violation threshold.
Compute pairwise cosine waveform similarity.
Discard pairs below cosine_threshold.
Greedily merge accepted pairs; a merge is rejected if the ISI violation fraction increases by more than max_isi_increase.

Requires neuron_attributes with position and avg_waveform entries. Unlike other curate_by_* functions this merges spike trains rather than simply removing units.
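The first and third pipeline steps (finding nearby pairs, then comparing waveforms) can be sketched as below. This is a simplified illustration under assumptions: positions are (x, y) tuples in µm, waveforms are single-channel lists of equal length, alignment is done by sliding one waveform against the other by up to max_lag samples, and lags that leave less than half the waveform overlapping are skipped. The helper names are hypothetical and the library's alignment may differ.

```python
import math


def nearby_pairs(positions, dist_um=24.8):
    """All unit index pairs (i, j), i < j, no farther apart than dist_um."""
    pairs = []
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            if math.dist(positions[i], positions[j]) <= dist_um:
                pairs.append((i, j))
    return pairs


def cosine(a, b):
    """Cosine similarity of two equal-length sequences (0.0 if either is flat)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def best_lagged_cosine(a, b, max_lag=10):
    """Highest cosine similarity over relative shifts of up to max_lag samples."""
    n = len(a)
    limit = min(max_lag, n - 1)
    best = -1.0
    for lag in range(-limit, limit + 1):
        if lag >= 0:
            x, y = a[lag:], b[:n - lag]
        else:
            x, y = a[:n + lag], b[-lag:]
        if 2 * len(x) >= n:  # skip very short overlaps (assumed safeguard)
            best = max(best, cosine(x, y))
    return best


positions = [(0.0, 0.0), (17.5, 0.0), (100.0, 0.0)]
pairs = nearby_pairs(positions, dist_um=24.8)

# The second waveform is the first one shifted by a single sample.
wave_a = [0, 0, -5, 2, 0, 0]
wave_b = [0, -5, 2, 0, 0, 0]
similarity = best_lagged_cosine(wave_a, wave_b, max_lag=2)
```

Without the lag search the two waveforms would have a negative similarity at zero shift, which is why alignment matters before applying cosine_threshold.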

Parameters:

sd (SpikeData) – Source spike data.
dist_um (float) – Maximum inter-electrode distance in µm to consider a pair as candidate duplicates.
max_violation_rate (float) – Maximum ISI violation rate (fraction, not percent) for a unit to participate in a merge.
isi_threshold_ms (float) – Refractory period threshold in ms.
cosine_threshold (float) – Minimum cosine similarity to merge a pair.
max_lag (int) – Maximum lag in samples for cosine similarity alignment.
delta_ms (float) – Spike deduplication window in ms when merging trains.
max_isi_increase (float) – Maximum allowable absolute increase in ISI violation fraction after merging.
verbose (bool) – Print per-pair merge decisions.

Returns:

sd_out (SpikeData) – SpikeData with merged units.
result (dict) – {"metric": (N,) cosine similarity to merge partner (0 if unmerged), "passed": (N,) bool mask of retained units}.

Return type:

tuple of (SpikeData, dict)
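The roles of delta_ms and max_isi_increase in the merge step can be sketched as follows. The details are assumptions made for illustration: spikes closer than or equal to delta_ms to the previously kept spike are treated as duplicates, the ISI violation fraction is violations divided by spike count, and the baseline for the increase is the worse of the two input units. The helper names are hypothetical.

```python
def merge_trains(a, b, delta_ms=0.4):
    """Merge two spike trains (ms), dropping spikes within delta_ms of the last kept spike."""
    merged = []
    for t in sorted(a + b):
        if not merged or t - merged[-1] > delta_ms:
            merged.append(t)
    return merged


def violation_fraction(train, threshold_ms=1.5):
    """ISIs shorter than threshold_ms, divided by the number of spikes."""
    if len(train) < 2:
        return 0.0
    violations = sum(1 for p, q in zip(train, train[1:]) if q - p < threshold_ms)
    return violations / len(train)


def try_merge(a, b, delta_ms=0.4, threshold_ms=1.5, max_isi_increase=0.04):
    """Return (accepted, merged_train) for a candidate pair."""
    merged = merge_trains(a, b, delta_ms)
    # Assumed baseline: the worse of the two units before merging.
    before = max(violation_fraction(a, threshold_ms), violation_fraction(b, threshold_ms))
    after = violation_fraction(merged, threshold_ms)
    return (after - before) <= max_isi_increase, merged


# Same neuron detected twice: near-coincident spikes collapse into one.
ok_dup, merged_dup = try_merge([10.0, 50.0, 90.0], [10.2, 50.1, 130.0])

# Different neurons: merging creates refractory violations, so it is rejected.
ok_diff, merged_diff = try_merge([10.0, 50.0], [11.0, 51.0])
```

A true duplicate adds few new violations once coincident spikes are collapsed, whereas merging two distinct neurons typically raises the violation fraction well above max_isi_increase.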