Curation
Unit-curation helpers for SpikeData objects. Each function
takes a SpikeData as its first argument and returns a tuple
(SpikeData, result_dict) where result_dict contains the per-unit metric
and a boolean mask of units that passed the criterion. These functions are also
bound as methods on SpikeData (e.g. sd.curate_by_snr(...))
and can be applied in combination via curate().
Each public function accepts a SpikeData as its first argument and returns
(SpikeData, result_dict) where result_dict always contains:
- metric – np.ndarray (N,) with the per-unit metric value (computed over all original units).
- passed – np.ndarray (N,) boolean mask indicating which units passed the curation criterion.
The returned SpikeData contains only the passing units (via subset).
These functions are bound as methods on SpikeData by
spikedata.py so they can be called as sd.curate_by_*(…).
- spikelab.spikedata.curation.curate_by_min_spikes(sd, min_spikes=30)[source]
Remove units with fewer than min_spikes spikes.
- spikelab.spikedata.curation.curate_by_firing_rate(sd, min_rate_hz=0.05)[source]
Remove units whose firing rate is below min_rate_hz.
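The two count-based criteria above can be illustrated with plain NumPy. This is a minimal sketch, not the library's implementation: the helper names, the list-of-arrays input, and the explicit `duration_s` argument are assumptions made for the example. It follows the documented convention of returning a per-unit metric and a boolean mask.

```python
import numpy as np

def min_spikes_mask(trains, min_spikes=30):
    """Hypothetical helper: spike count per unit and a pass mask."""
    metric = np.array([len(t) for t in trains], dtype=float)
    # Units with fewer than min_spikes spikes fail.
    return metric, metric >= min_spikes

def firing_rate_mask(trains, duration_s, min_rate_hz=0.05):
    """Hypothetical helper: mean firing rate per unit and a pass mask."""
    metric = np.array([len(t) for t in trains], dtype=float) / duration_s
    # Units whose rate is below min_rate_hz fail.
    return metric, metric >= min_rate_hz

# Toy data: spike times in ms for three units over a 100 s recording.
trains = [
    np.linspace(0, 100_000, 50),
    np.array([10.0, 20.0]),
    np.linspace(0, 100_000, 6),
]
count, passed_count = min_spikes_mask(trains, min_spikes=30)
rate, passed_rate = firing_rate_mask(trains, duration_s=100.0, min_rate_hz=0.05)
```

In this toy example the third unit passes the rate criterion (6 spikes in 100 s) but fails the spike-count criterion.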
- spikelab.spikedata.curation.curate_by_isi_violations(sd, max_violation=0.01, threshold_ms=1.5, min_isi_ms=0.0, method='percent')[source]
Remove units with excessive inter-spike-interval violations.
Two methods are available:
- "percent" – violation count divided by total spike count, expressed as a fraction in [0, 1] (e.g. 0.01 means 1 % of spikes are ISI violations). The "percent" name is kept for backward compatibility with prior versions; the value is now a fraction, not a percentage.
- "hill" – violation rate ratio from Hill et al. (2011) J Neurosci 31:8699-8705. Values above 1 indicate highly contaminated units.
Deprecated since version 0.105: With method="percent", max_violation is now a fraction (0.01 = 1 % of spikes) instead of a percent value (1.0 = 1 %). Passing a value >= 1.0 with method="percent" emits a DeprecationWarning and is auto-converted by dividing by 100. The legacy default 1.0 is therefore treated as 0.01. This compatibility shim will be removed in a future release.
- Parameters:
sd (SpikeData) – Source spike data.
max_violation (float) – Maximum allowed metric. With method="percent" this is a fraction in [0, 1] (default 0.01 = 1 % of spikes). With method="hill" it is a contamination ratio.
threshold_ms (float) – Refractory period threshold in ms.
min_isi_ms (float) – Minimum possible ISI enforced by hardware or post-processing, in ms.
method (str) – "percent" or "hill".
- Returns:
sd_out (SpikeData) – SpikeData with only passing units.
result (dict) – {"metric": (N,) ISI violation metric, "passed": (N,) bool mask}.
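The "percent" metric and the documented legacy conversion can be sketched as follows. This is an illustrative sketch under stated assumptions, not the library's code: both helper names are hypothetical, a violation is assumed to be an ISI strictly below threshold_ms, and min_isi_ms and the "hill" method are not modeled.

```python
import warnings
import numpy as np

def isi_violation_fraction(spike_times_ms, threshold_ms=1.5):
    """Hypothetical helper: violation count divided by total spike count."""
    times = np.sort(np.asarray(spike_times_ms, dtype=float))
    if times.size == 0:
        return 0.0
    isis = np.diff(times)
    # Assumption: an ISI strictly below the threshold counts as a violation.
    n_violations = int(np.sum(isis < threshold_ms))
    return n_violations / times.size

def normalize_max_violation(max_violation):
    """Hypothetical mimic of the documented shim for method="percent"."""
    if max_violation >= 1.0:
        warnings.warn(
            "max_violation >= 1.0 is treated as a legacy percent value",
            DeprecationWarning,
            stacklevel=2,
        )
        return max_violation / 100.0
    return max_violation

# Ten spikes, two of which follow the previous spike by less than 1.5 ms.
train = np.array([0.0, 1.0, 10.0, 20.0, 20.5, 40.0, 60.0, 80.0, 100.0, 120.0])
frac = isi_violation_fraction(train)
limit = normalize_max_violation(0.01)
passed = frac <= limit
```

Here `frac` is 0.2 (2 violations over 10 spikes), so the unit fails a limit of 0.01.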
- spikelab.spikedata.curation.curate_by_snr(sd, min_snr=5.0, ms_before=1.0, ms_after=2.0)[source]
Remove units whose signal-to-noise ratio is below min_snr.
SNR is defined as peak_amplitude / noise_level, where peak amplitude is the absolute maximum of the average waveform on the channel with the largest amplitude, and noise level is estimated via the median absolute deviation (MAD) of the raw trace on that channel.
The method first checks for a precomputed "snr" value in neuron_attributes. If not found, it computes SNR from raw_data (using get_waveform_traces). If neither is available, a ValueError is raised.
- Parameters:
- Returns:
sd_out (SpikeData) – SpikeData with only passing units.
result (dict) – {"metric": (N,) per-unit SNR, "passed": (N,) bool mask}.
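The SNR definition above can be sketched in NumPy. This is a sketch under assumptions rather than the library's implementation: the helper names and array layouts are hypothetical, and the MAD is used unscaled (whether the library applies a Gaussian scaling factor is not stated here).

```python
import numpy as np

def mad(trace):
    """Median absolute deviation of a 1-D trace (unscaled; an assumption)."""
    trace = np.asarray(trace, dtype=float)
    return np.median(np.abs(trace - np.median(trace)))

def unit_snr(avg_waveforms, raw_traces):
    """Hypothetical helper.

    avg_waveforms: (n_channels, n_samples) average waveform of one unit.
    raw_traces:    (n_channels, n_raw_samples) raw recording.
    """
    avg_waveforms = np.asarray(avg_waveforms, dtype=float)
    raw_traces = np.asarray(raw_traces, dtype=float)
    # Peak amplitude per channel, then pick the largest-amplitude channel.
    peak_per_channel = np.max(np.abs(avg_waveforms), axis=1)
    best = int(np.argmax(peak_per_channel))
    # Noise level is the MAD of the raw trace on that same channel.
    return peak_per_channel[best] / mad(raw_traces[best])

avg_waveforms = [[0.0, -6.0, 2.0], [0.0, -3.0, 1.0]]
raw_traces = [[-2.0, -1.0, 0.0, 1.0, 2.0], [-4.0, -2.0, 0.0, 2.0, 4.0]]
snr = unit_snr(avg_waveforms, raw_traces)
```

With this toy input the first channel has the largest peak (6) and a MAD of 1, giving an SNR of 6.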
- spikelab.spikedata.curation.curate_by_std_norm(sd, max_std_norm=1.0, at_peak=True, window_ms_before=0.5, window_ms_after=1.5, ms_before=1.0, ms_after=2.0)[source]
Remove units whose normalized waveform standard deviation exceeds max_std_norm.
Normalized STD is |std| / |amplitude| on the channel with the largest amplitude. When at_peak is True, STD is measured at the single peak sample; otherwise it is averaged over a window around the peak.
The method first checks for a precomputed "std_norm" value in neuron_attributes. If not found, it computes the metric from raw_data. If neither is available, a ValueError is raised.
- Parameters:
sd (SpikeData) – Source spike data.
max_std_norm (float) – Maximum allowed normalized STD.
at_peak (bool) – Measure STD at peak sample only.
window_ms_before (float) – Window before peak for averaging STD (only used when at_peak is False).
window_ms_after (float) – Window after peak for averaging STD (only used when at_peak is False).
ms_before (float) – ms before spike for waveform extraction (only used when computing from raw_data).
ms_after (float) – ms after spike for waveform extraction (only used when computing from raw_data).
- Returns:
sd_out (SpikeData) – SpikeData with only passing units.
result (dict) – {"metric": (N,) normalized STD, "passed": (N,) bool mask}.
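A minimal NumPy sketch of the normalized STD described above, for a single unit on its largest-amplitude channel. The helper name is hypothetical, the window is given in samples (n_before, n_after) rather than ms to keep the example self-contained, and the population standard deviation is assumed; none of these details are confirmed by the reference text.

```python
import numpy as np

def normalized_std(spike_waveforms, at_peak=True, n_before=1, n_after=1):
    """Hypothetical helper.

    spike_waveforms: (n_spikes, n_samples) snippets from one channel.
    """
    w = np.asarray(spike_waveforms, dtype=float)
    mean_wf = w.mean(axis=0)
    std_wf = w.std(axis=0)  # population std across spikes (an assumption)
    peak = int(np.argmax(np.abs(mean_wf)))
    amplitude = np.abs(mean_wf[peak])
    if at_peak:
        # STD at the single peak sample.
        spread = std_wf[peak]
    else:
        # STD averaged over a window around the peak, clipped to the snippet.
        lo = max(peak - n_before, 0)
        hi = min(peak + n_after + 1, w.shape[1])
        spread = std_wf[lo:hi].mean()
    return spread / amplitude

snippets = [[0.0, -4.0, 0.0], [0.0, -6.0, 0.0]]
std_norm_peak = normalized_std(snippets, at_peak=True)
std_norm_window = normalized_std(snippets, at_peak=False)
```

For these two snippets the mean peak is -5 with a standard deviation of 1, so the at-peak value is 0.2; averaging over the three-sample window lowers it because the flanking samples do not vary.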
- spikelab.spikedata.curation.compute_waveform_metrics(sd, ms_before=1.0, ms_after=2.0, at_peak=True, window_ms_before=0.5, window_ms_after=1.5)[source]
Compute average waveforms, SNR, and normalized STD for every unit.
Results are stored in neuron_attributes under the keys "snr" and "std_norm". Average waveforms are stored by get_waveform_traces (called internally with store=True).
- Parameters:
sd (SpikeData) – Source spike data. Must have non-empty raw_data.
ms_before (float) – ms before spike for waveform extraction.
ms_after (float) – ms after spike for waveform extraction.
at_peak (bool) – Measure STD at peak sample only.
window_ms_before (float) – Window before peak for averaging STD (only used when at_peak is False).
window_ms_after (float) – Window after peak for averaging STD (only used when at_peak is False).
- Returns:
sd (SpikeData) – The same SpikeData object (modified in place with updated neuron_attributes).
metrics (dict) – Dict with keys "snr" and "std_norm", each mapping to an np.ndarray of shape (N,).
- spikelab.spikedata.curation.curate(sd, min_spikes=None, min_rate_hz=None, isi_max=None, isi_threshold_ms=1.5, isi_min_ms=0.0, isi_method='percent', min_snr=None, max_std_norm=None, std_at_peak=True, std_window_ms_before=0.5, std_window_ms_after=1.5, snr_ms_before=1.0, snr_ms_after=2.0)[source]
Apply multiple curation criteria in sequence (intersection).
Only criteria whose threshold is not None are applied. Returns the filtered SpikeData and a dict of per-criterion results.
- Parameters:
sd (SpikeData) – Source spike data.
min_spikes (int or None) – Minimum spike count.
min_rate_hz (float or None) – Minimum firing rate in Hz.
isi_max (float or None) – Maximum ISI violation metric.
isi_threshold_ms (float) – Refractory period for ISI check.
isi_min_ms (float) – Minimum possible ISI for ISI check.
isi_method (str) – "percent" or "hill" for ISI check.
min_snr (float or None) – Minimum SNR.
max_std_norm (float or None) – Maximum normalized STD.
std_at_peak (bool) – Measure STD at peak only.
std_window_ms_before (float) – Window before peak for STD averaging.
std_window_ms_after (float) – Window after peak for STD averaging.
snr_ms_before (float) – ms before spike for waveform extraction.
snr_ms_after (float) – ms after spike for waveform extraction.
- Returns:
sd_out (SpikeData) – SpikeData with only units passing all criteria.
results (dict) – Mapping from criterion name to {"metric": (N,), "passed": (N,)}.
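Because the criteria combine as an intersection, the set of retained units can be recovered from a results dict by AND-ing the per-criterion masks. The sketch below uses a hand-built dict in the documented {"metric", "passed"} shape; the criterion names and values are made up for illustration.

```python
import numpy as np

# Hand-built results in the documented shape (names and values are illustrative).
results = {
    "min_spikes": {
        "metric": np.array([50.0, 2.0, 40.0, 35.0]),
        "passed": np.array([True, False, True, True]),
    },
    "snr": {
        "metric": np.array([7.1, 6.0, 3.2, 5.5]),
        "passed": np.array([True, True, False, True]),
    },
}

# A unit is kept only if it passes every applied criterion.
combined = np.logical_and.reduce([r["passed"] for r in results.values()])
kept_units = np.flatnonzero(combined)
```

Only units 0 and 3 pass both criteria in this example.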
- spikelab.spikedata.curation.build_curation_history(sd_original, sd_curated, results, parameters=None)[source]
Translate curation results into a serializable history dict.
The output format mirrors the curation history produced by the Kilosort2 pipeline, making it suitable for saving as JSON.
- Parameters:
sd_original (SpikeData) – The SpikeData before curation.
sd_curated (SpikeData) – The SpikeData after curation.
results (dict) – Results dict returned by curate() or assembled manually from individual curate_by_* calls. Keys are criterion names, values are dicts with "metric" and "passed" arrays.
parameters (dict or None) – Curation parameter values to record. If None, an empty dict is stored.
- Returns:
history (dict) – Serializable curation history with keys: curation_parameters, initial, curations, curated, failed, metrics, curated_final.
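NumPy arrays are not JSON-serializable as-is, which is why a translation step is needed before saving. The sketch below shows only that general conversion on a results dict; it does not reproduce the structure of the history dict, whose per-key contents are not specified here. The helper name is hypothetical.

```python
import json
import numpy as np

def results_to_jsonable(results):
    """Hypothetical helper: convert metric/passed arrays to plain lists."""
    return {
        name: {
            "metric": np.asarray(r["metric"], dtype=float).tolist(),
            "passed": np.asarray(r["passed"], dtype=bool).tolist(),
        }
        for name, r in results.items()
    }

example_results = {
    "min_spikes": {
        "metric": np.array([50.0, 2.0]),
        "passed": np.array([True, False]),
    }
}
text = json.dumps(results_to_jsonable(example_results))
roundtrip = json.loads(text)
```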
- spikelab.spikedata.curation.curate_by_merge_duplicates(sd, dist_um=24.8, max_violation_rate=0.04, isi_threshold_ms=1.5, cosine_threshold=0.5, max_lag=10, delta_ms=0.4, max_isi_increase=0.04, verbose=False)[source]
Remove duplicate units by merging nearby pairs with similar waveforms.
Runs the full merge-based deduplication pipeline:
1. Find spatially nearby unit pairs within dist_um.
2. Discard pairs where either unit exceeds the ISI violation threshold.
3. Compute pairwise cosine waveform similarity.
4. Discard pairs below cosine_threshold.
5. Greedily merge accepted pairs; a merge is rejected if the ISI violation fraction increases by more than max_isi_increase.
Requires neuron_attributes with position and avg_waveform entries. Unlike other curate_by_* functions this merges spike trains rather than simply removing units.
- Parameters:
sd (SpikeData) – Source spike data.
dist_um (float) – Maximum inter-electrode distance in µm to consider a pair as candidate duplicates.
max_violation_rate (float) – Maximum ISI violation rate (fraction, not percent) for a unit to participate in a merge.
isi_threshold_ms (float) – Refractory period threshold in ms.
cosine_threshold (float) – Minimum cosine similarity to merge a pair.
max_lag (int) – Maximum lag in samples for cosine similarity alignment.
delta_ms (float) – Spike deduplication window in ms when merging trains.
max_isi_increase (float) – Maximum allowable absolute increase in ISI violation fraction after merging.
verbose (bool) – Print per-pair merge decisions.
- Returns:
sd_out (SpikeData) – SpikeData with merged units.
result (dict) – {"metric": (N,) cosine similarity to merge partner (0 if unmerged), "passed": (N,) bool mask of retained units}.
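The waveform-similarity step of the pipeline can be sketched as a cosine similarity maximized over integer sample lags up to max_lag. This is one plausible reading of "cosine similarity alignment", not the library's confirmed algorithm; the helper name and the choice to compare only the overlapping samples at each lag are assumptions.

```python
import numpy as np

def max_lagged_cosine(a, b, max_lag=10):
    """Hypothetical helper: best cosine similarity over lags in [-max_lag, max_lag]."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    best = -1.0
    for lag in range(-max_lag, max_lag + 1):
        # Shift one waveform against the other and keep the overlapping part.
        if lag >= 0:
            x, y = a[lag:], b[: len(b) - lag]
        else:
            x, y = a[: len(a) + lag], b[-lag:]
        n = min(len(x), len(y))
        x, y = x[:n], y[:n]
        denom = np.linalg.norm(x) * np.linalg.norm(y)
        if n == 0 or denom == 0.0:
            continue  # no overlap or an all-zero segment
        best = max(best, float(x @ y / denom))
    return best

# The second waveform is the first one delayed by a single sample.
wf_a = np.array([0.0, 0.0, 1.0, -5.0, 2.0, 0.0, 0.0, 0.0])
wf_b = np.array([0.0, 0.0, 0.0, 1.0, -5.0, 2.0, 0.0, 0.0])
sim_aligned = max_lagged_cosine(wf_a, wf_b, max_lag=10)
sim_unaligned = max_lagged_cosine(wf_a, wf_b, max_lag=0)
```

Without alignment the two waveforms look dissimilar (cosine of -0.5), while allowing a one-sample lag recovers a similarity of about 1, which would clear the default cosine_threshold of 0.5.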