API for scripting

Scripting with tridesclous avoids unnecessary clicking through the GUI parameter dialogs when you need to process many files. Almost everything can be done with 3 classes:

  • DataIO for configuration of the dataset, format selection, PRB file assignment, …
  • CatalogueConstructor runs all steps to construct the catalogue: signal processing, features, clustering, …
  • Peeler runs the template matching engine

Of course everything done by script can still be checked and modified with the GUI (MainWindow, CatalogueWindow and PeelerWindow).

The best way to get started is to read the examples in the git repo.

Classes API

class tridesclous.dataio.DataIO(dirname='test')[source]

Class to access the dataset (raw data, processed, catalogue, spikes) in read/write mode.

All operations on the dataset are done through this class.

The dataio:
  • works in a path. Almost everything is persistent.
  • is needed by CatalogueConstructor and Peeler
  • has a datasource member that accesses the raw data
  • stores/loads processed signals
  • stores/loads spikes
  • stores/loads the catalogue
  • deals with several channel groups given a PRB file
  • deals with several segments of recording (aka several files for raw data)
  • exports the results (labeled spikes) to different formats

The underlying data storage is a simple tree on the file system. Everything is organised as simply as possible in sub folders (by channel group, then segment index). In each folder:

  • arrays.json describes the list of numpy arrays (name, dtype, shape)
  • XXX.raw are the raw numpy arrays, loaded with a simple memmap.
  • some arrays are structured arrays (aka arrays of structs)
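
As a sketch of how this direct access works (the folder layout, array name and json keys below are illustrative, not a guaranteed on-disk convention):

import json
import numpy as np

# hypothetical folder inside the working dir: channel group 0, segment 0
folder = '/path/to/a/working/dir/channel_group_0/segment_0'

# arrays.json lists each array with its name, dtype and shape
with open(folder + '/arrays.json') as f:
    arrays_spec = json.load(f)

spec = arrays_spec['processed_signals']   # illustrative array name
sigs = np.memmap(folder + '/processed_signals.raw', mode='r',
                 dtype=spec['dtype'], shape=tuple(spec['shape']))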

The datasource system is based on neo.rawio, so all formats supported by neo.rawio are available in tridesclous. neo.rawio is able to read chunks of signals indexed on the time axis and channel axis.

The raw dataset does not need to be inside the working directory; it can be somewhere outside. The info.json describes the link to the datasource (raw data).

Many raw datasets are saved by the acquisition device with an underlying int16 dtype. DataIO saves the processed signals as float32 by default, which doubles the size (2 bytes per sample for int16 vs 4 bytes for float32). So if you have a 10 GB raw dataset, tridesclous will need at least 20 GB more for storage of the processed signals.

Usage:

# initialize a directory
dataio = DataIO(dirname='/path/to/a/working/dir')

# set a data source
filenames = ['file1.raw', 'file2.raw']
dataio.set_data_source(type='RawData', filenames=filenames,
                       sample_rate=10000, total_channel=16, dtype='int16')

# set a PRB file
dataio.set_probe_file('/path/to/a/file.prb')
# or download it
dataio.download_probe('kampff_128', origin='spyking-circus')

# check length and channel groups
print(dataio)
append_spikes(seg_num=0, chan_grp=0, spikes=None)[source]

Append spikes.

channel_group_label(chan_grp=0)[source]

Label of channel for a group.

download_probe(probe_name, origin='kwikteam')[source]

Download a prb file from github into the working dir.

The spyking-circus and kwikteam projects propose lists of prb files. See their github repositories.

probe_name: str
the name of the file in the github repository
origin: 'kwikteam' or 'spyking-circus'
the github project
export_spikes(export_path=None, split_by_cluster=False, use_cell_label=True, formats=None)[source]

Export spikes to other formats (csv, matlab, excel, …)

export_path: str or None
export path. If None (default), the export is done inside the working dir.
split_by_cluster: bool (default False)
Whether each cluster is split into a different file.
use_cell_label: bool (default True)
If True, cell_label is used; if False, cluster_label is used.
formats: 'csv' or 'mat' or 'xlsx'
The output format.
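
For example (a small sketch, reusing the dataio object from the usage above):

# export all spikes as csv files, one file per cluster, inside the working dir
dataio.export_spikes(formats='csv', split_by_cluster=True, use_cell_label=True)
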
flush_processed_signals(seg_num=0, chan_grp=0)[source]

Flush the underlying memmap for processed signals.

flush_spikes(seg_num=0, chan_grp=0)[source]

Flush underlying memmap for spikes.

get_geometry(chan_grp=0)[source]

Get the geometry for a given channel group as a numpy array.

get_segment_length(seg_num)[source]

Segment length (in samples) for a given segment index.

get_segment_shape(seg_num, chan_grp=0)[source]

Segment shape for a given segment index and channel group.

get_signals_chunk(seg_num=0, chan_grp=0, i_start=None, i_stop=None, signal_type='initial')[source]

Get a chunk of signal for a given segment index and channel group.

The signal can be the 'initial' (aka raw, unfiltered) signal or the 'processed' signal.

seg_num: int
segment index
chan_grp: int
channel group key
i_start: int or None
start index (included)
i_stop: int or None
stop index (not included)
signal_type: str
‘initial’ or ‘processed’
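
For example (a small sketch, assuming the 10 kHz sample rate of the usage above):

# read the first 2 seconds of processed signals from segment 0, channel group 0
sigs = dataio.get_signals_chunk(seg_num=0, chan_grp=0,
                                i_start=0, i_stop=20000, signal_type='processed')
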
get_some_waveforms(seg_num=0, chan_grp=0, spike_indexes=None, n_left=None, n_right=None)[source]

Extract some waveforms given spike_indexes.

get_spikes(seg_num=0, chan_grp=0, i_start=None, i_stop=None)[source]

Read spikes

iter_over_chunk(seg_num=0, chan_grp=0, i_stop=None, chunksize=1024, **kargs)[source]

Create an iterable on signals. (‘initial’ or ‘processed’)

for ind, sig_chunk in dataio.iter_over_chunk(seg_num=0, chan_grp=0, chunksize=1024, signal_type='processed'):
    do_something_on_chunk(sig_chunk)
load_catalogue(name='initial', chan_grp=0)[source]

Load the catalogue dict.

nb_channel(chan_grp=0)[source]

Number of channels for a channel group.

reset_processed_signals(seg_num=0, chan_grp=0, dtype='float32')[source]

Reset processed signals.

reset_spikes(seg_num=0, chan_grp=0, dtype=None)[source]

Reset spikes.

save_catalogue(catalogue, name='initial')[source]

Save the catalogue made by CatalogueConstructor and needed by Peeler inside the working dir.

Note that you can construct several catalogues for the same dataset to compare them: just change the name, and each one is saved in a different folder.

set_channel_groups(channel_groups, probe_filename='channels.prb')[source]

Manually set the channel groups dictionary.
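
The dictionary follows the PRB convention. A small sketch with an illustrative 4-channel layout:

channel_groups = {
    0: {
        'channels': [0, 1, 2, 3],
        'geometry': {0: (0., 0.), 1: (0., 50.), 2: (50., 0.), 3: (50., 50.)},
    },
}
dataio.set_channel_groups(channel_groups, probe_filename='channels.prb')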

set_data_source(type='RawData', **kargs)[source]

Set the datasource. Must be done only once, otherwise an error is raised.

type: str (‘RawData’, ‘Blackrock’, ‘Neuralynx’, …)
The name of the neo.rawio class used to open the dataset.
kargs:
Depend on the class used. They are transmitted to the neo class, so see the neo documentation for the kargs.
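
For example (a sketch; the 'Blackrock' type and its kargs are illustrative, check the neo.rawio class you use for the exact arguments):

dataio.set_data_source(type='Blackrock', filenames=['/path/to/file.ns5'])
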
set_probe_file(src_probe_filename)[source]

Set the probe file. The probe file is copied inside the working dir.

set_signals_chunk(sigs_chunk, seg_num=0, chan_grp=0, i_start=None, i_stop=None, signal_type='processed')[source]

Set a signal chunk (only for ‘processed’)

class tridesclous.catalogueconstructor.CatalogueConstructor(dataio, chan_grp=None, name='catalogue_constructor')[source]

The goal of CatalogueConstructor is to construct a catalogue of templates (centroids) for the Peeler.

To do so, the CatalogueConstructor will:
  • preprocess a duration of data
  • detect peaks
  • extract some waveforms
  • compute some features from these waveforms
  • find clusters
  • compute some metrics on these clusters
  • enable some manual trash/merge/split

At the end we can call make_catalogue_for_peeler, which constructs everything useful for the Peeler: the centroids (median waveform of each cluster), their first and second derivatives, the median and MAD of the noise, …

You need to have one CatalogueConstructor for each channel_group.

Since all operations are more or less slow, each internal array is by default persistent on disk (memory_mode='memmap'). With this, when you instantiate a CatalogueConstructor you load all the existing arrays that were already computed. For that, tridesclous mainly uses numpy structured arrays (arrays of structs) with memmap. So hacking the internal arrays of the CatalogueConstructor should be easy, even outside python and tridesclous.

You can explore/save/reload the internal state by browsing this directory: path_of_dataset/channel_group_XXX/catalogue_constructor

Usage:

from tridesclous import DataIO, CatalogueConstructor
dirname = '/path/to/dataset'
dataio = DataIO(dirname=dirname)
cc = CatalogueConstructor(dataio, chan_grp=0)

# preprocessing
cc.set_preprocessor_params(chunksize=1024,
    highpass_freq=300.,
    lowpass_freq=5000.,
    common_ref_removal=True,
    lostfront_chunksize=64,
    peak_sign='-', 
    relative_threshold=5.5,
    peak_span_ms=0.3,
    )
cc.estimate_signals_noise(seg_num=0, duration=10.)
cc.run_signalprocessor(duration=60.)

# waveform/feature/cluster
cc.extract_some_waveforms(n_left=-25, n_right=40, mode='rand')
cc.clean_waveforms(alien_value_threshold=55.)
cc.project(method='global_pca', n_components=7)
cc.find_clusters(method='kmeans', n_clusters=7)

# manual stuff
cc.trash_small_cluster(n=5)
cc.order_clusters()

# do this before peeler
cc.make_catalogue_for_peeler()

For more, see the examples in the git repo.

Persistent attributes:

CatalogueConstructor has a mechanism to store/load some attributes in the dirname of dataio. This is useful to continue previous work on a catalogue.

The attributes are almost all numpy arrays and are stored using the numpy.memmap mechanism. So prefer fast local storage like an SSD.

Here is the list of these attributes with their shape and dtype. N is the total number of detected peaks. M is the number of peaks selected for waveform/feature/cluster. C is the number of clusters.

  • all_peaks (N, ) dtype = [(‘index’, ‘int64’), (‘cluster_label’, ‘int64’), (‘segment’, ‘int64’)]
  • signals_medians (nb_sample, nb_channel, ) float32
  • signals_mads (nb_sample, nb_channel, ) float32
  • clusters (C, ) dtype= [(‘cluster_label’, ‘int64’), (‘cell_label’, ‘int64’), (‘max_on_channel’, ‘int64’), (‘max_peak_amplitude’, ‘float64’), (‘waveform_rms’, ‘float64’), (‘nb_peak’, ‘int64’), (‘tag’, ‘U16’), (‘annotations’, ‘U32’), (‘color’, ‘uint32’)]
  • some_peaks_index (M) int64
  • some_waveforms (M, width, nb_channel) float32
  • some_features (M, nb_feature) float32
  • channel_to_features (nb_chan, nb_component) bool
  • some_noise_snippet (nb_noise, width, nb_channel) float32
  • some_noise_index (nb_noise, ) int64
  • some_noise_features (nb_noise, nb_feature) float32
  • centroids_median (C, width, nb_channel) float32
  • centroids_mad (C, width, nb_channel) float32
  • centroids_mean (C, width, nb_channel) float32
  • centroids_std (C, width, nb_channel) float32
  • spike_waveforms_similarity (M, M) float32
  • cluster_similarity (C, C) float32
  • cluster_ratio_similarity (C, C) float32
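
Once computed, these attributes are directly accessible on the CatalogueConstructor instance, for example:

print(cc.all_peaks.shape, cc.all_peaks.dtype)
print(cc.clusters['cluster_label'])
print(cc.centroids_median.shape)   # (C, width, nb_channel)
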
auto_merge_high_similarity(threshold=0.95)[source]

Recursively merge all pairs with similarity higher than a given threshold.

clean_waveforms(alien_value_threshold=100.0)[source]

Detect bad waveforms (artefacts, …) and tag them with the alien label (-9).

compute_spike_waveforms_similarity(method='cosine_similarity', size_max=10000000.0)[source]

This computes the similarity spike by spike.

create_savepoint()[source]

This creates a copy of the entire catalogue_constructor subdir. Useful for the UI when the user wants to snapshot and try a tricky merge/split.

estimate_signals_noise(seg_num=0, duration=10.0)[source]

This estimates the median and MAD on the processed signals over a short duration. This is necessary for normalisation in the next steps.

Note that if the noise is stable, even a short duration is OK.

seg_num: int
segment index
duration: float
duration in seconds
extract_some_features(method='global_pca', selection=None, **params)[source]

Extract feature from waveforms.

extract_some_noise(nb_snippet=300)[source]

Find some snippets of signal that do not overlap with peak waveforms.

Useful to project this noise with the same transform as the real waveforms and see the distinction between waveforms and noise in the subspace.

extract_some_waveforms(n_left=None, n_right=None, wf_left_ms=None, wf_right_ms=None, index=None, mode='rand', nb_max=10000, align_waveform=False, subsample_ratio=20)[source]

Extract waveform snippet for a subset of peaks (already detected).

Note that this operation is slow.

After this, the attribute some_peaks_index will contain the indexes in all_peaks that have waveforms.

n_left: int
Left sweep in samples; must be negative.
n_right: int
Right sweep in samples.
wf_left_ms:
Left sweep in ms; must be negative.
wf_right_ms:
Right sweep in ms.
index: None (by default) or numpy array of int
If mode is None then the user can give a selection of peak indexes to extract waveforms for.
mode: 'rand' (default) or 'all' or None
'rand' randomly selects some peaks to extract waveforms from. If mode is None, then index must not be None.
nb_max: int
When mode is 'rand', this is the number of selected waveforms.
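
For example, to extract waveforms for an explicit selection of peaks instead of a random subset (a sketch; the index array is illustrative):

import numpy as np

my_peak_index = np.arange(5000)   # hypothetical selection of indexes into all_peaks
cc.extract_some_waveforms(wf_left_ms=-1.5, wf_right_ms=2.5,
                          mode=None, index=my_peak_index)
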
find_clusters(method='kmeans', selection=None, **kargs)[source]

Find clusters for peaks that have a waveform and feature.

find_good_limits(mad_threshold=1.1, channel_percent=0.3, extract=True, min_left=-5, max_right=5)[source]

Find good limits for the waveform, i.e. where the MAD is above the noise level (=1.).

The technique consists in finding continuous samples where the MAD is at least 10% above the background noise for at least 30% of the channels.

Parameters

mad_threshold: (default 1.1)
threshold relative to the noise
channel_percent: (default 0.3)
fraction of channels above this noise threshold

flush_info()[source]

Flush info (mainly parameters) to json files.

make_catalogue_for_peeler()[source]

Make and save catalogue in the working dir for the Peeler.

order_clusters(by='waveforms_rms')[source]

This reorders the labels from highest RMS to lowest RMS. The higher the RMS, the smaller the label. Negative labels are not reassigned.

project(method='global_pca', selection=None, **params)

Extract feature from waveforms.

re_detect_peak(peakdetector_engine='numpy', peak_sign='-', relative_threshold=7, peak_span_ms=0.3)[source]

Peaks are detected during run_signalprocessor. But in some cases, for example to test another threshold, peaks can be re-detected without re-running the signal processing.

peakdetector_engine: ‘numpy’ or ‘opencl’
Engine for peak detection.
peak_sign: '-' or '+'
Sign of the peak.
relative_threshold: int default 7
Threshold for peak detection. The preprocessed signals have units expressed in MAD (robust STD), so 7 means 7*MAD.
peak_span_ms: float default 0.3
Peak span to avoid double detection. In milliseconds.
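
For example (a small sketch):

# try a lower threshold and a larger span without re-running the signal processing
cc.re_detect_peak(peakdetector_engine='numpy', peak_sign='-',
                  relative_threshold=6, peak_span_ms=0.5)
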
reload_data()[source]

Reload persistent arrays.

run_signalprocessor(duration=60.0, detect_peak=True)[source]

This runs, chunk by chunk, the signal preprocessing chain on all segments.

The duration can be clipped for very long recordings. For catalogue construction the user must have an intuition of how much signal is needed to get enough spikes to detect clusters. If the duration is too short, clusters will not be visible. If too long, the processing time will be unacceptable. This totally depends on the dataset (number of channels, spike rate, …).

This also detects peaks, to avoid useless access to the storage.

duration: float
duration in seconds for each segment
detect_peak: bool (default True)
Also detect peak.
set_preprocessor_params(chunksize=1024, memory_mode='memmap', internal_dtype='float32', signalpreprocessor_engine='numpy', highpass_freq=300.0, lowpass_freq=None, smooth_size=0, common_ref_removal=False, lostfront_chunksize=None, peakdetector_engine='numpy', peak_sign='-', relative_threshold=7, peak_span_ms=0.3)[source]

Set parameters for the preprocessor engine

chunksize: int. default 1024
Length of each chunk for processing. For real time, the latency is between chunksize and 2*chunksize.
memory_mode: ‘memmap’ or ‘ram’ default ‘memmap’
By default all arrays are persistent with memmap but you can also have everything in ram.
internal_dtype: 'float32' (or 'float64')
Internal dtype for signals/waveforms/features. Support of integer signals and waveforms is planned one day and should speed things up.
signalpreprocessor_engine: 'numpy' or 'opencl'
If you have pyopencl and a correct ICD installed, you can try 'opencl': for high channel counts, some critical parts of the processing are done on the GPU.
highpass_freq: float default 300
High-pass cutoff frequency of the filter. Can be None if the raw dataset is already filtered.
lowpass_freq: float default None
Low-pass cutoff frequency of the filter. None by default. For high sampling rates, a low pass at 5~8 kHz can be a good idea to smooth the waveforms a bit and avoid high-frequency noise.
smooth_size: int default 0
A simple smoothing convolution kernel. It more or less acts like a low-pass filter. Can be used instead of lowpass_freq.
common_ref_removal: bool. False by default.
Remove the median of all channels, sample by sample.
lostfront_chunksize: int. default None
Size in samples of the margin at the front edge of each chunk, to avoid border effects in the backward filter. If you don't know, put None and lostfront_chunksize will be int(sample_rate/highpass_freq)*3, which is quite robust (<5% error) compared to a true offline filtfilt.
peakdetector_engine: 'numpy' or 'opencl'
Engine for peak detection.
peak_sign: '-' or '+'
Sign of the peak.
relative_threshold: int default 7
Threshold for peak detection. The preprocessed signals have units expressed in MAD (robust STD), so 7 means 7*MAD.
peak_span_ms: float default 0.3
Peak span to avoid double detection. In millisecond.
split_cluster(label, method='kmeans', **kargs)[source]

This splits one cluster by applying a new clustering method only on that subset.
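
For example (a sketch; the cluster label and the kwargs forwarded to the clustering method are illustrative):

# re-cluster the peaks labeled 3 into 2 sub-clusters
cc.split_cluster(3, method='kmeans', n_clusters=2)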

tag_same_cell(labels_to_group)[source]

In some situations, during spike bursts, the amplitude changes abruptly. In these cases we can end up with 2 distinct clusters, but the user suspects that they come from the same cell. In such cases the user must not merge the 2 clusters, because the centroids would represent nothing and the Peeler would fail.

Instead we tag the 2 different clusters as "same cell", i.e. they share the same cell_label.

This is a manual action.
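
For example (a sketch; the labels are illustrative):

# tag clusters 2 and 3 as coming from the same cell
cc.tag_same_cell([2, 3])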

class tridesclous.peeler.Peeler(dataio)[source]

The Peeler is the core of the spike sorting itself. It basically does template matching on the signals.

This class needs a catalogue constructed by CatalogueConstructor. The computation is then applied chunk by chunk on the raw signal itself.

So this class is the same for both offline and online computing.

At each chunk, the algorithm is basically:
  1. apply the processing chain (filter, normalize, …)
  2. detect peaks
  3. try to classify each peak and estimate its jitter
  4. with the labeled peaks, create a prediction for the chunk
  5. subtract the prediction from the processed signals
  6. go back to 2 until there are no peaks left, or only peaks that cannot be labeled
  7. return the labeled spikes from this or previous chunks and the processed signals (for display or recording)

The main difficulty in the implementation is to deal with the edges, because spike waveforms can spread across 2 chunks.

Note that the global latency depends on these 2 parameters:
  • lostfront_chunksize
  • chunksize
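
Usage (a minimal sketch following the examples in the git repo; the exact keyword arguments may vary between versions):

from tridesclous import DataIO, Peeler

dataio = DataIO(dirname='/path/to/a/working/dir')
catalogue = dataio.load_catalogue(name='initial', chan_grp=0)

peeler = Peeler(dataio)
peeler.change_params(catalogue=catalogue)
peeler.run()

# inspect/export the result
spikes = dataio.get_spikes(seg_num=0, chan_grp=0)
dataio.export_spikes(formats='csv')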