The standard datasource used to get training and test splits of data.

ds_basic(
  binned_data,
  labels,
  num_cv_splits,
  use_count_data = FALSE,
  num_label_repeats_per_cv_split = 1,
  label_levels = NULL,
  num_resample_sites = NULL,
  site_IDs_to_use = NULL,
  site_IDs_to_exclude = NULL,
  randomly_shuffled_labels = FALSE,
  create_simultaneous_populations = 0
)

Arguments

binned_data

A string specifying a path to a file containing data in binned format, or a data frame of data that is in binned format.

labels

A string specifying the name of the labels that should be decoded. This label must correspond to one of the columns in the binned data that starts with 'labels.'. For example, if the binned data file contained a column called labels.stimulus_ID that you wanted to decode, then you would set this argument to "stimulus_ID".

num_cv_splits

A number specifying how many cross-validation splits should be used.

use_count_data

If the binned data was created from neural spike counts, then setting use_count_data = TRUE will convert the data back into integer spike counts. This is useful for classifiers that require spike count data, e.g., the poisson_naive_bayes_CL.

num_label_repeats_per_cv_split

A number specifying how many times each label should be repeated in each cross-validation split.

label_levels

A vector of strings specifying specific label levels that should be used. If this is set to NULL then all label levels available will be used.

num_resample_sites

The number of sites that should be randomly selected when constructing training and test vectors. This number needs to be less than or equal to the number of sites available that have num_cv_splits * num_label_repeats_per_cv_split repeats.

site_IDs_to_use

A vector of integers specifying which sites should be used. If this is NULL (default value), then all sites that have num_cv_splits * num_label_repeats_per_cv_split repeats will be used, and a message about how many sites are used will be displayed.

site_IDs_to_exclude

A vector of integers specifying which sites should be excluded.

randomly_shuffled_labels

A Boolean specifying whether the labels should be shuffled prior to running an analysis (i.e., prior to the first call to the get_data() method). This is used when one wants to create a null distribution for assessing whether decoding results are above chance.

create_simultaneous_populations

If the data from all sites was recorded simultaneously, then setting this variable to 1 will cause the get_data() function to return simultaneous populations rather than pseudo-populations.

Value

This constructor creates an NDR datasource object with the class ds_basic. Like all NDR datasource objects, this datasource will be used by the cross-validator to generate training and test data sets.

Details

This 'basic' datasource is the one most commonly used for analyses. It can generate training and test sets for data that was recorded simultaneously, or pseudo-populations for data that was not recorded simultaneously.

Like all datasources, this datasource takes data in binned format and has a get_data() method. This method is never called explicitly by the user of the package; rather, it is called internally by a cross-validator object to get training and test splits of data that can be passed to a classifier.

See also

Other datasource: ds_generalization()

Examples

# A typical example of creating a datasource to be passed to a cross-validator
data_file <- system.file(file.path("extdata", "ZD_150bins_50sampled.Rda"), package = "NeuroDecodeR")
ds <- ds_basic(data_file, "stimulus_ID", 18)
#> Automatically selecting sites_IDs_to_use. Since num_cv_splits = 18 and num_label_repeats_per_cv_split = 1, all sites that have 18 repetitions have been selected. This yields 132 sites that will be used for decoding (out of 132 total).
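
# If all sites in a binned data file were recorded simultaneously, setting
# create_simultaneous_populations = 1 will cause get_data() to return
# simultaneous populations rather than pseudo-populations. (The file name
# below is hypothetical; this option only makes sense for data that really
# was recorded simultaneously.)
ds_simul <- ds_basic("my_simultaneously_recorded_data.Rda", "stimulus_ID", 18,
  create_simultaneous_populations = 1
)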

# If one has many repeats of each label, decoding can be faster if one
# uses fewer CV splits and repeats each label multiple times in each split.
ds <- ds_basic(data_file, "stimulus_ID", 6,
  num_label_repeats_per_cv_split = 3
)
#> Automatically selecting sites_IDs_to_use. Since num_cv_splits = 6 and num_label_repeats_per_cv_split = 3, all sites that have 18 repetitions have been selected. This yields 132 sites that will be used for decoding (out of 132 total).
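
# If the binned data was created from spike counts, setting
# use_count_data = TRUE keeps the data as integer spike counts, which is
# needed for classifiers such as the poisson_naive_bayes_CL.
ds <- ds_basic(data_file, "stimulus_ID", 18, use_count_data = TRUE)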

# One can specify a subset of label levels to be used in decoding. Here
# we just do a three-way decoding analysis between "car", "hand" and "kiwi".
ds <- ds_basic(data_file, "stimulus_ID", 18,
  label_levels = c("car", "hand", "kiwi")
)
#> Automatically selecting sites_IDs_to_use. Since num_cv_splits = 18 and num_label_repeats_per_cv_split = 1, all sites that have 18 repetitions have been selected. This yields 132 sites that will be used for decoding (out of 132 total).
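
# One can also control which sites are included by passing vectors of
# site IDs. Here we use only sites 1 to 100 and additionally exclude
# site 3 (the particular site IDs chosen here are just for illustration).
ds <- ds_basic(data_file, "stimulus_ID", 18,
  site_IDs_to_use = 1:100,
  site_IDs_to_exclude = 3
)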

# One never explicitly calls the get_data() function, but rather this is
# called by the cross-validator. However, to illustrate what this function
# does, we can call it explicitly here to get training and test data:
all_cv_data <- get_data(ds)
names(all_cv_data)
#>   [1] "train_labels" "test_labels"  "time_bin"     "site_0001"    "site_0002"   
#>   [6] "site_0003"    "site_0004"    "site_0005"    "site_0006"    "site_0007"   
#>  [11] "site_0008"    "site_0009"    "site_0010"    "site_0011"    "site_0012"   
#>  [16] "site_0013"    "site_0014"    "site_0015"    "site_0016"    "site_0017"   
#>  [21] "site_0018"    "site_0019"    "site_0020"    "site_0021"    "site_0022"   
#>  [26] "site_0023"    "site_0024"    "site_0025"    "site_0026"    "site_0027"   
#>  [31] "site_0028"    "site_0029"    "site_0030"    "site_0031"    "site_0032"   
#>  [36] "site_0033"    "site_0034"    "site_0035"    "site_0036"    "site_0037"   
#>  [41] "site_0038"    "site_0039"    "site_0040"    "site_0041"    "site_0042"   
#>  [46] "site_0043"    "site_0044"    "site_0045"    "site_0046"    "site_0047"   
#>  [51] "site_0048"    "site_0049"    "site_0050"    "site_0051"    "site_0052"   
#>  [56] "site_0053"    "site_0054"    "site_0055"    "site_0056"    "site_0057"   
#>  [61] "site_0058"    "site_0059"    "site_0060"    "site_0061"    "site_0062"   
#>  [66] "site_0063"    "site_0064"    "site_0065"    "site_0066"    "site_0067"   
#>  [71] "site_0068"    "site_0069"    "site_0070"    "site_0071"    "site_0072"   
#>  [76] "site_0073"    "site_0074"    "site_0075"    "site_0076"    "site_0077"   
#>  [81] "site_0078"    "site_0079"    "site_0080"    "site_0081"    "site_0082"   
#>  [86] "site_0083"    "site_0084"    "site_0085"    "site_0086"    "site_0087"   
#>  [91] "site_0088"    "site_0089"    "site_0090"    "site_0091"    "site_0092"   
#>  [96] "site_0093"    "site_0094"    "site_0095"    "site_0096"    "site_0097"   
#> [101] "site_0098"    "site_0099"    "site_0100"    "site_0101"    "site_0102"   
#> [106] "site_0103"    "site_0104"    "site_0105"    "site_0106"    "site_0107"   
#> [111] "site_0108"    "site_0109"    "site_0110"    "site_0111"    "site_0112"   
#> [116] "site_0113"    "site_0114"    "site_0115"    "site_0116"    "site_0117"   
#> [121] "site_0118"    "site_0119"    "site_0120"    "site_0121"    "site_0122"   
#> [126] "site_0123"    "site_0124"    "site_0125"    "site_0126"    "site_0127"   
#> [131] "site_0128"    "site_0129"    "site_0130"    "site_0131"    "site_0132"   
#> [136] "CV_1"         "CV_2"         "CV_3"         "CV_4"         "CV_5"        
#> [141] "CV_6"         "CV_7"         "CV_8"         "CV_9"         "CV_10"       
#> [146] "CV_11"        "CV_12"        "CV_13"        "CV_14"        "CV_15"       
#> [151] "CV_16"        "CV_17"        "CV_18"
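
# Setting randomly_shuffled_labels = TRUE shuffles the labels prior to the
# first call to get_data(). Running the full analysis several times with
# shuffled labels yields a null distribution that the real decoding
# results can be compared to.
ds_shuff <- ds_basic(data_file, "stimulus_ID", 18,
  randomly_shuffled_labels = TRUE
)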