The standard datasource used to get training and test splits of data.
ds_basic(
  binned_data,
  var_to_decode,
  num_cv_splits,
  use_count_data = FALSE,
  num_label_repeats_per_cv_split = 1,
  label_levels_to_use = NULL,
  num_resample_sites = NULL,
  site_IDs_to_use = NULL,
  site_IDs_to_exclude = NULL,
  randomly_shuffled_labels_before_running = FALSE,
  create_simultaneously_recorded_populations = 0
)
binned_data | A string specifying the path to a file containing data in binned format, or a data frame that is in binned format. |
var_to_decode | A string specifying the name of the labels that should be decoded. This label must be one of the columns in the binned data that starts with 'label.' |
num_cv_splits | A number specifying how many cross-validation splits should be used. |
use_count_data | If the binned data was created from neural spiking data, then setting use_count_data = TRUE will convert the data into integer spike counts. This is useful for classifiers that operate on spike count data, e.g., the poisson_naive_bayes_CL. |
num_label_repeats_per_cv_split | A number specifying how many times each label should be repeated in each cross-validation split. |
label_levels_to_use | A vector of strings specifying specific label levels that should be used. If this is set to NULL then all label levels available will be used. |
num_resample_sites | The number of sites that should be randomly selected when constructing training and test vectors. This number must be less than or equal to the number of available sites that have num_cv_splits * num_label_repeats_per_cv_split repeats. |
site_IDs_to_use | A vector of integers specifying which sites should be used. |
site_IDs_to_exclude | A vector of integers specifying which sites should be excluded. |
randomly_shuffled_labels_before_running | A Boolean specifying whether the labels should be shuffled before the get_data() method is called. This is used to create a null distribution for assessing whether decoding results are above chance. |
create_simultaneously_recorded_populations | If the data from all sites was recorded simultaneously, then setting this variable to 1 will cause the get_data() function to return simultaneous populations rather than pseudo-populations. |
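Several of the arguments above can be combined in a single call. The sketch below (using the example data file shipped with the package and illustrative site IDs) shows converting the data to spike counts for a Poisson naive Bayes analysis, excluding a couple of sites, and shuffling the labels to build a null distribution:

```r
library(NDTr)

data_file <- system.file("extdata/ZD_150bins_50sampled.Rda", package = "NDTr")

# Spike counts for poisson_naive_bayes_CL, two (illustrative) sites excluded,
# and labels shuffled so that decoding accuracy estimates chance performance.
ds_null <- ds_basic(data_file, "stimulus_ID", 18,
  use_count_data = TRUE,
  site_IDs_to_exclude = c(1, 2),
  randomly_shuffled_labels_before_running = TRUE
)
```

Running the full decoding analysis several times with such a shuffled datasource yields a null distribution against which the real decoding results can be compared.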
This 'basic' datasource is the one most commonly used for analyses. It can generate training and test sets for data that was recorded simultaneously, or pseudo-populations for data that was not recorded simultaneously.
Like all datasources, this datasource takes binned format data and has a get_data() method that is called by a cross-validation object to get training and testing splits of data that can be passed to a classifier.
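To make the role of the datasource concrete, here is a sketch of how it plugs into a full decoding analysis (the cv_standard() cross-validator, max_correlation_CL() classifier, and fp_zscore() preprocessor are other NDTr components; their argument order may differ across package versions):

```r
library(NDTr)

data_file <- system.file("extdata/ZD_150bins_50sampled.Rda", package = "NDTr")

ds <- ds_basic(data_file, "stimulus_ID", 18)   # the datasource
cl <- max_correlation_CL()                     # a classifier
fps <- list(fp_zscore())                       # feature preprocessors

# The cross-validator calls the datasource's get_data() method internally
# on each resample run; one never needs to call it directly.
cv <- cv_standard(ds, cl, fps)
# results <- run_decoding(cv)   # runs the full analysis (can be slow)
```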
Other datasources:
ds_generalization()
# A typical example of creating a datasource to be passed to a cross-validator
data_file <- system.file("extdata/ZD_150bins_50sampled.Rda", package = "NDTr")
ds <- ds_basic(data_file, "stimulus_ID", 18)

# If one has many repeats of each label, decoding can be faster if one
# uses fewer CV splits and repeats each label multiple times in each split.
ds <- ds_basic(data_file, "stimulus_ID", 6,
  num_label_repeats_per_cv_split = 3
)

# One can specify a subset of label levels to be used in decoding. Here
# we do a three-way decoding analysis between "car", "hand" and "kiwi".
ds <- ds_basic(data_file, "stimulus_ID", 18,
  label_levels_to_use = c("car", "hand", "kiwi")
)

# One never explicitly calls the get_data() function; rather, this is
# done by the cross-validator. However, to illustrate what this function
# does, we can call it explicitly here to get training and test data:
all_cv_data <- NDTr:::get_data(ds)
names(all_cv_data)
#> [1] "train_labels" "test_labels"  "time_bin"     "site_0001"    "site_0002"
#> (the site columns continue through "site_0132", followed by the
#>  cross-validation split indicator columns "CV_1" through "CV_18")
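The site-selection arguments can also be illustrated with the same example data. The sketch below subsamples sites on each resample run and, alternatively, restricts the analysis to a particular set of sites (the numbers chosen here are illustrative, not a recommendation):

```r
library(NDTr)

data_file <- system.file("extdata/ZD_150bins_50sampled.Rda", package = "NDTr")

# Randomly select 50 of the available sites on each resample run.
ds_sub <- ds_basic(data_file, "stimulus_ID", 18, num_resample_sites = 50)

# Or restrict the analysis to an explicit set of site IDs.
ds_sites <- ds_basic(data_file, "stimulus_ID", 18, site_IDs_to_use = 1:50)
```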