============
Loading Data
============

SpikeLab supports loading spike train data from a variety of common
electrophysiology formats. All loaders return a
:class:`~spikelab.spikedata.spike_data.SpikeData` object and convert spike
times to **milliseconds** where possible, since all analyses assume this
unit.

The main loader functions live in ``spikelab.data_loaders.data_loaders``. In
addition, :class:`~spikelab.spikedata.spike_data.SpikeData` provides static
constructors for building spike data directly from arrays or raw traces.

From pickle files
-----------------

If you have a previously saved ``SpikeData`` object in a pickle file, you can
load it with the standard library or with the SpikeLab convenience loader.

Using the standard library:

.. code-block:: python

    import pickle

    with open("my_data.pkl", "rb") as f:
        sd = pickle.load(f)

Using the SpikeLab loader (which also supports S3 URLs):

.. code-block:: python

    from spikelab.data_loaders.data_loaders import load_spikedata_from_pickle

    sd = load_spikedata_from_pickle("my_data.pkl")

    # Load from S3
    sd = load_spikedata_from_pickle(
        "s3://my-bucket/my_data.pkl",
        aws_access_key_id="...",
        aws_secret_access_key="...",
    )

From HDF5 files
---------------

HDF5 is the most flexible format. SpikeLab supports four different storage
styles within an HDF5 file:

- **Raster** -- a ``(N, T)`` spike count matrix stored as a single dataset.
- **Ragged** -- a flat array of spike times plus an index array that marks
  the boundaries of each unit (NWB-like layout).
- **Group** -- one HDF5 group per unit, each containing a 1-D array of
  spike times.
- **Paired** -- two parallel 1-D arrays: unit indices and spike times.

You choose which style to load by setting the corresponding parameters.
Exactly one style must be specified per call.

.. code-block:: python

    from spikelab.data_loaders.data_loaders import load_spikedata_from_hdf5

    # Raster style
    sd = load_spikedata_from_hdf5(
        "recording.h5",
        raster_dataset="raster",
        raster_bin_size_ms=1.0,
    )

    # Ragged style (spike_times + spike_times_index)
    sd = load_spikedata_from_hdf5(
        "recording.h5",
        spike_times_dataset="spike_times",
        spike_times_index_dataset="spike_times_index",
        spike_times_unit="s",  # times in the file are in seconds
    )

    # Group-per-unit style
    sd = load_spikedata_from_hdf5(
        "recording.h5",
        group_per_unit="units",
        group_time_unit="s",
    )

    # Paired style (idces + times)
    sd = load_spikedata_from_hdf5(
        "recording.h5",
        idces_dataset="idces",
        times_dataset="times",
        times_unit="ms",
    )

The ``spike_times_unit`` parameter
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

For the ragged style, the ``spike_times_unit`` parameter (default ``'s'``)
tells the loader what unit the times in the file are stored in. The loader
converts them to milliseconds automatically. Set this to ``'ms'`` if your
file already stores times in milliseconds. Analogous parameters exist for
the other styles: ``group_time_unit`` for group style and ``times_unit``
for paired style.

From NWB files
--------------

SpikeLab can load spike trains from NWB (Neurodata Without Borders) files.
The loader reads the ``/units`` group and populates ``neuron_attributes``
with ``unit_id``, ``electrode`` (or ``electrode_group``/``channel``),
``location``, and electrode positions (``x``, ``y``, ``z``) when these are
present in the file.

.. code-block:: python

    from spikelab.data_loaders.data_loaders import load_spikedata_from_nwb

    sd = load_spikedata_from_nwb("session.nwb")

    # Optionally control the backend and recording length
    sd = load_spikedata_from_nwb(
        "session.nwb",
        prefer_pynwb=True,   # try pynwb first, fall back to h5py
        length_ms=600000.0,  # truncate to first 10 minutes
    )

The NWB loader tries ``pynwb`` first by default. If ``pynwb`` is not
installed, it falls back to reading the HDF5 file directly with ``h5py``.
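Both the HDF5 ragged style and the NWB ``/units`` group store spikes as a
flat times array plus a boundary index. As a point of reference, the
following NumPy sketch (independent of SpikeLab, with made-up data) splits
such a pair into per-unit trains and applies the same seconds-to-milliseconds
conversion the loaders perform:

```python
import numpy as np

def split_ragged(spike_times_s, spike_times_index):
    """Split a flat spike-time array into per-unit trains, converting s -> ms.

    spike_times_index[i] is the end offset of unit i's spikes in the flat
    array, as in the NWB-like ragged layout.
    """
    trains = []
    start = 0
    for end in spike_times_index:
        trains.append(np.asarray(spike_times_s[start:end]) * 1000.0)  # s -> ms
        start = end
    return trains

# Two units: unit 0 spikes at 0.01 s and 0.5 s, unit 1 spikes at 0.2 s
times = np.array([0.01, 0.5, 0.2])
index = np.array([2, 3])
trains = split_ragged(times, index)
```

This is only an illustration of the layout; the actual loader handles this
internally, so you never need to unpack the arrays yourself.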
From KiloSort/Phy output
------------------------

KiloSort and Phy produce a folder of ``.npy`` files. SpikeLab reads
``spike_times.npy`` and ``spike_clusters.npy`` and optionally filters
clusters using the TSV cluster info file produced by Phy (keeping ``good``
and ``mua`` labels by default). The ``fs_Hz`` parameter is required -- it
specifies the sampling frequency so that sample indices can be converted to
milliseconds.

.. code-block:: python

    from spikelab.data_loaders.data_loaders import load_spikedata_from_kilosort

    sd = load_spikedata_from_kilosort(
        "path/to/kilosort_output/",
        fs_Hz=30000,
    )

    # With cluster filtering and custom file names
    sd = load_spikedata_from_kilosort(
        "path/to/kilosort_output/",
        fs_Hz=30000,
        cluster_info_tsv="cluster_info.tsv",
        include_noise=False,
        spike_times_file="spike_times.npy",
        spike_clusters_file="spike_clusters.npy",
    )

The loader populates ``neuron_attributes`` with ``unit_id``, ``electrode``,
and ``location`` when the corresponding information is available in the
KiloSort output.

From SpikeInterface
-------------------

If you are using SpikeInterface for spike sorting, you can convert a
``SortingExtractor`` directly into a
:class:`~spikelab.spikedata.spike_data.SpikeData`:

.. code-block:: python

    from spikelab.data_loaders.data_loaders import load_spikedata_from_spikeinterface

    # sorting is any SpikeInterface SortingExtractor-like object
    sd = load_spikedata_from_spikeinterface(
        sorting,
        sampling_frequency=30000,  # override if not set on the object
        unit_ids=None,             # load all units
        segment_index=0,
    )

SpikeLab also supports loading from a SpikeInterface ``BaseRecording``
object, which applies threshold detection to the raw traces:

.. code-block:: python

    from spikelab.data_loaders.data_loaders import load_spikedata_from_spikeinterface_recording

    sd = load_spikedata_from_spikeinterface_recording(
        recording,
        segment_index=0,
        threshold_sigma=5.0,
        filter=False,
        hysteresis=True,
        direction="both",
    )

From SpikeLab sorted .npz
-------------------------

The SpikeLab spike-sorting pipeline (see :doc:`spike_sorting`) can compile
its output into ``.npz`` files containing per-unit spike trains, electrode
locations, waveform templates, and quality metrics. Load these with:

.. code-block:: python

    from spikelab.data_loaders.data_loaders import load_spikedata_from_spikelab_sorted_npz

    sd = load_spikedata_from_spikelab_sorted_npz(
        "sorted_results.npz",
        length_ms=600000.0,  # optional; inferred from latest spike if omitted
    )

The loader populates ``neuron_attributes`` with electrode positions,
waveform templates, and quality metrics when available.

From the IBL database
---------------------

SpikeLab can load spike trains directly from the International Brain
Laboratory public server. This requires the ``one-api`` and
``brainwidemap`` packages.

First, search for probes matching your criteria:

.. code-block:: python

    from spikelab.data_loaders.data_loaders import query_ibl_probes

    probes, stats_df = query_ibl_probes(
        target_regions=["MOs", "MOp"],  # Beryl atlas region names
        min_units=20,                   # minimum good units per probe
        min_fraction_in_target=0.5,     # at least 50% of units in target regions
    )

    # probes is a list of (eid, pid) tuples
    # stats_df is a pandas DataFrame with per-probe statistics

Then load a specific probe:

.. code-block:: python

    from spikelab.data_loaders.data_loaders import load_spikedata_from_ibl

    eid, pid = probes[0]
    sd = load_spikedata_from_ibl(eid, pid)

Only units labelled as good in the Brain-Wide Map unit table are included.
Trial event times are stored in ``sd.metadata`` as numpy arrays in
milliseconds.
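Because trial event times and spike times are both in milliseconds, aligning
spikes to events takes only a few lines of NumPy. The sketch below is
independent of SpikeLab and uses synthetic arrays (it does not assume any
particular ``sd.metadata`` key name); it collects each unit's spikes in a
window around every event:

```python
import numpy as np

def peri_event_spikes(spike_times_ms, event_times_ms, window_ms=(-100.0, 200.0)):
    """For each event, return spike times relative to that event (ms).

    spike_times_ms must be sorted, as spike trains normally are.
    """
    spikes = np.asarray(spike_times_ms)
    aligned = []
    for t0 in np.asarray(event_times_ms):
        lo = np.searchsorted(spikes, t0 + window_ms[0], side="left")
        hi = np.searchsorted(spikes, t0 + window_ms[1], side="right")
        aligned.append(spikes[lo:hi] - t0)
    return aligned

# Synthetic example: one unit's spike train and two trial events (all in ms)
spike_times = np.array([50.0, 120.0, 480.0, 510.0, 900.0])
events = np.array([100.0, 500.0])
aligned = peri_event_spikes(spike_times, events)
```

Using ``np.searchsorted`` on the sorted spike train keeps this fast even for
long recordings with many trials.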
From Neo SpikeTrains
--------------------

If you have a list of Neo ``SpikeTrain`` objects, convert them directly
using the static constructor:

.. code-block:: python

    from spikelab import SpikeData

    # spiketrains is a list of neo.SpikeTrain objects
    sd = SpikeData.from_neo_spiketrains(spiketrains)

The constructor converts spike times to milliseconds automatically using the
units attached to each ``SpikeTrain``.

From raw data
-------------

:class:`~spikelab.spikedata.spike_data.SpikeData` provides two static
constructors for building spike data from in-memory arrays, with no
external file format required.

From threshold detection
^^^^^^^^^^^^^^^^^^^^^^^^

If you have raw voltage traces as a NumPy array of shape
``(channels, time)``, you can detect spikes using a threshold crossing
method:

.. code-block:: python

    from spikelab.spikedata.spike_data import SpikeData
    import numpy as np

    raw_traces = np.random.randn(64, 600000)  # 64 channels, 30 s at 20 kHz

    sd = SpikeData.from_thresholding(
        raw_traces,
        fs_Hz=20000,
        threshold_sigma=5.0,
        filter=True,       # 300-6000 Hz Butterworth bandpass
        hysteresis=True,
        direction="both",  # detect both positive and negative crossings
    )

The resulting ``SpikeData`` object has the original traces attached as
``raw_data`` and ``raw_time`` attributes.

From a spike count raster
^^^^^^^^^^^^^^^^^^^^^^^^^

If you already have a ``(N, T)`` spike count raster, you can convert it
directly:

.. code-block:: python

    from spikelab.spikedata.spike_data import SpikeData
    import numpy as np

    raster = np.random.poisson(0.01, size=(32, 100000))
    sd = SpikeData.from_raster(raster, bin_size_ms=1.0)

Spikes are placed evenly within each bin.
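To make the even-placement idea concrete, here is a minimal NumPy sketch of
one plausible convention (an illustration only, not necessarily SpikeLab's
exact implementation): a bin holding ``k`` spikes contributes ``k`` times
at the centers of ``k`` equal sub-intervals of that bin.

```python
import numpy as np

def counts_to_times(counts, bin_size_ms=1.0):
    """Spread the spike counts of one unit evenly across each bin (ms)."""
    times = []
    for b, k in enumerate(counts):
        if k > 0:
            # k spikes at the centers of k equal sub-intervals of bin b
            offsets = (np.arange(k) + 0.5) / k * bin_size_ms
            times.append(b * bin_size_ms + offsets)
    return np.concatenate(times) if times else np.array([])

# One unit with 2 spikes in bin 0 and 1 spike in bin 3, using 1 ms bins
times = counts_to_times(np.array([2, 0, 0, 1]), bin_size_ms=1.0)
```

With this convention, a bin with a single spike places it at the bin center,
and no two spikes from the same bin coincide.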