This object runs a decoding analysis where a classifier is repeatedly trained and tested using cross-validation.

cv_standard(
  ndr_container = NULL,
  datasource = NULL,
  classifier = NULL,
  feature_preprocessors = NULL,
  result_metrics = NULL,
  num_resample_runs = 50,
  run_TCD = TRUE,
  num_parallel_cores = NULL,
  parallel_outfile = NULL
)

Arguments

ndr_container

This argument exists so that the cv_standard constructor works with the magrittr pipe (%>%) operator; it is almost always set implicitly at the end of a sequence of piped operations that includes a datasource and a classifier. Alternatively, one can leave this set to NULL and pass objects directly via the datasource and classifier arguments (one would almost never use both approaches). See the examples.

datasource

A datasource (DS) object that will generate the training and test data.

classifier

A classifier (CL) object that will learn parameters based on the training data and will generate predictions based on the test data.

feature_preprocessors

A list of feature preprocessor (FP) objects that learn preprocessing parameters from the training data and apply preprocessing to both the training and test data based on these parameters.

result_metrics

A list of result metric (RM) objects that are used to evaluate the classification performance. If this is set to NULL, then the rm_main_results() and rm_confusion_matrix() result metrics will be used.

num_resample_runs

The number of times the cross-validation should be run (i.e., "resample runs"), where on each run, new training and test sets are generated. If pseudo-populations are used (e.g., with the ds_basic), then new pseudo-populations will be generated on each resample run as well.

run_TCD

A Boolean indicating whether a temporal cross-decoding (TCD) analysis should be run, in which the classifier is trained and tested at all pairs of time points. Setting this to FALSE causes the classifier to be tested only at the same time point it was trained on, which can speed up the analysis and save memory, at the cost of not calculating the temporal cross-decoding results.
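For instance, one could skip the full temporal cross-decoding as follows (a sketch assuming ds, cl, and fps have been created as in the Examples section below):

```r
# Sketch: train and test only at matching time points to save time and memory
cv_fast <- cv_standard(datasource = ds,
                       classifier = cl,
                       feature_preprocessors = fps,
                       run_TCD = FALSE)
```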

num_parallel_cores

An integer specifying the number of parallel cores to use when executing the resample runs in the analysis. The default (NULL) is to use half of the cores detected on the system. If this is set to a value less than 1, then the code will be run serially and messages will be printed showing how long each CV split took to run, which is useful for debugging.

parallel_outfile

A string specifying the name of a file where the output from running the code in parallel is written (this argument is ignored if num_parallel_cores < 1). By default the parallel output is written to /dev/null, so it is not accessible. If this is set to an empty string, the output will be written to the screen; otherwise it will be written to the specified file. See parallel::makeCluster for more details.
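These two arguments can be combined, for example, as follows (a sketch assuming ds and cl exist as in the Examples section; the log file name is an arbitrary choice for illustration):

```r
# Sketch: run the resample runs on 2 cores, logging parallel output to a file
cv_par <- cv_standard(datasource = ds,
                      classifier = cl,
                      num_parallel_cores = 2,
                      parallel_outfile = "cv_parallel_log.txt")

# Setting num_parallel_cores to a value < 1 runs serially with timing messages,
# which is useful for debugging
cv_debug <- cv_standard(datasource = ds,
                        classifier = cl,
                        num_parallel_cores = 0)
```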

Value

This constructor creates an NDR cross-validator object with the class cv_standard. Like all NDR cross-validator objects, one should use the run_decoding() method to run a decoding analysis.

Details

A cross-validator object takes a datasource (DS), a classifier (CL), feature preprocessor (FP), and result metric (RM) objects, and runs multiple cross-validation cycles where:

  1. A datasource (DS) generates training and test data splits of the data

  2. Feature preprocessors (FPs) do preprocessing of the data

  3. A classifier (CL) is trained and predictions are generated on a test set

  4. Result metrics (RMs) assess the accuracy of the predictions and compile the results.
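The steps above can be sketched as the following loop. This is simplified pseudocode, not the actual NeuroDecodeR internals; the function names (get_data, preprocess_data, get_predictions, aggregate_results) are loosely modeled on the package's generics, and num_cv_splits, fps, cl, and ds are assumed to exist:

```r
# Simplified sketch of one resample run (pseudocode, not the real implementation)
for (split in seq_len(num_cv_splits)) {

  # 1. the datasource generates training and test splits of the data
  data_split <- get_data(ds)

  # 2. each feature preprocessor learns parameters on the training data
  #    and applies them to both the training and test data
  for (fp in fps) {
    data_split <- preprocess_data(fp, data_split)
  }

  # 3. the classifier is trained and predictions are generated on the test set
  predictions <- get_predictions(cl, data_split)

  # 4. the result metrics assess prediction accuracy and compile the results
  results <- aggregate_results(predictions)
}
```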

Examples

data_file <- system.file("extdata/ZD_150bins_50sampled.Rda",
  package = "NeuroDecodeR")

ds <- ds_basic(data_file, "stimulus_ID", 18)
#> Automatically selecting sites_IDs_to_use. Since num_cv_splits = 18 and num_label_repeats_per_cv_split = 1, all sites that have 18 repetitions have been selected. This yields 132 sites that will be used for decoding (out of 132 total).
fps <- list(fp_zscore())
cl <- cl_max_correlation()

cv <- cv_standard(datasource = ds,
                  classifier = cl,
                  feature_preprocessors = fps,
                  num_resample_runs = 2)  # better to use more resample runs (default is 50)
#> Warning: The result_metrics argument is NULL in the cv_standard constructor. Setting the result_metrics to default values of rm_main_results and rm_confusion_matrix.


# \donttest{

# alternatively, one can also use the pipe (|>) to do an analysis
data_file2 <- system.file("extdata/ZD_500bins_500sampled.Rda",
  package = "NeuroDecodeR")
  
DECODING_RESULTS <- data_file2 |>
    ds_basic('stimulus_ID', 18) |>
    cl_max_correlation() |>
    fp_zscore() |>
    rm_main_results() |>
    rm_confusion_matrix() |>
    cv_standard(num_resample_runs = 2) |>
    run_decoding()
#> Automatically selecting sites_IDs_to_use. Since num_cv_splits = 18 and num_label_repeats_per_cv_split = 1, all sites that have 18 repetitions have been selected. This yields 132 sites that will be used for decoding (out of 132 total).
#> (a text progress bar is printed, advancing from 0% to 50% to 100% as the two resample runs complete)

# }