Simple regression example

The proposed regression example uses the Santa Fe Laser dataset [1], a chaotic time series commonly used as a regression benchmark. The task is one-step-ahead prediction.

The dataset can be generated with the get_laser_data.py script in the “examples” folder, which can be run with:

python3 get_laser_data.py

The script formats the data as required by the EchoBay framework.

In particular, after downloading the dataset, it rescales the data and splits it into training, validation, and test sets with a 55%/20%/25% proportion.

At the end of the execution, 7 files will be generated:

  • FullData.csv, which contains the entire time-series

  • TrainData.csv

  • TrainLabel.csv

  • ValData.csv

  • ValLabel.csv

  • TestData.csv

  • TestLabel.csv
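To make this preparation step concrete, the following Python sketch rescales a series, builds one-step-ahead (data, label) pairs, and splits them chronologically 55%/20%/25%. It is not the content of get_laser_data.py: the function name split_one_step, the min-max rescaling to [0, 1], and the convention that each label is the next sample of the series are assumptions made for illustration.

```python
import numpy as np


def split_one_step(series, fractions=(0.55, 0.20, 0.25)):
    """Hypothetical sketch: rescale, pair each sample with the next one,
    and split the pairs chronologically into Train/Val/Test."""
    x = np.asarray(series, dtype=float)
    # Assumption: min-max rescaling to [0, 1]; the real script may differ.
    x = (x - x.min()) / (x.max() - x.min())
    # One-step-ahead pairs: label[t] is the sample that follows data[t].
    data, label = x[:-1], x[1:]
    n = len(data)
    n_train = int(round(fractions[0] * n))
    n_val = int(round(fractions[1] * n))
    a, b = n_train, n_train + n_val
    return {
        "Train": (data[:a], label[:a]),
        "Val": (data[a:b], label[a:b]),
        "Test": (data[b:], label[b:]),  # remainder (about 25%)
    }
```

Each split could then be written with, for example, np.savetxt(f"{name}Data.csv", data, delimiter=","), plus the matching Label file; whether EchoBay expects exactly this layout is not stated here, so compare against the files the script actually produces.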

EchoBay will use the validation set to evaluate the performance achieved by each configuration of hyperparameters. After choosing the best set of hyperparameters, EchoBay will merge the training and the validation sets, and evaluate the performance of the network on the test set.

The laser.yml file provides a ready-to-use optimization strategy for the task.

The template.yml file shows how to customize the YAML file, while the section "Hyperparams basic configuration" describes the role of each hyperparameter and its typical optimization range.

The optimization process can be started using the following command:

./echobay train laser.yml LaserTrain

EchoBay first performs a random sampling of the hyperparameter space. The number of random sampling iterations depends on the number of hyperparameters to be optimized, OptDim, and is determined as follows:

Iterations = 10 + 3 * (OptDim-3)

If OptDim is less than 3, 10 random sampling iterations will be performed.
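The same rule, written as a small Python function (the name random_sampling_iterations is illustrative and not part of EchoBay):

```python
def random_sampling_iterations(opt_dim: int) -> int:
    """10 + 3 * (OptDim - 3), with a floor of 10 when OptDim < 3."""
    return 10 + 3 * max(0, opt_dim - 3)
```

For instance, 2 or 3 optimized hyperparameters give 10 iterations, 5 give 16, and 7 give 22.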

After the random sampling, the Bayesian optimization starts. In this phase, the number of iterations is independent of the number of optimizable hyperparameters and is set to 40. The Bayesian optimization stops before the 40th iteration if the fitness achieved by one of the configurations falls below the early_stop threshold set in the YAML file.
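The stopping rule can be sketched as follows. This is a minimal illustration, not EchoBay's implementation: the evaluate callback is a hypothetical stand-in for proposing and validating one configuration, and, as the early_stop description implies, a lower fitness is treated as better.

```python
from typing import Callable, List


def bayesian_phase(evaluate: Callable[[int], float], early_stop: float,
                   max_iterations: int = 40) -> List[float]:
    """Run up to max_iterations iterations; stop right after the first
    fitness value that falls below early_stop."""
    history: List[float] = []
    for it in range(max_iterations):
        fitness = evaluate(it)  # hypothetical: train + validate one configuration
        history.append(fitness)
        if fitness < early_stop:
            break  # early stop: good enough configuration found
    return history
```

For example, with early_stop = 0.1 and fitness values 0.5, 0.3, 0.05, ..., the loop ends after the third iteration.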

At the end of the optimization process, EchoBay saves the output files in the LaserTrain1 folder. The numeric suffix of the folder name is added to distinguish the results when multiple datasets are optimized.

The output files generated by EchoBay fall into two categories: optimization results and optimization structures.

[1] Gershenfeld, Neil A., and Andreas S. Weigend. The Future of Time Series. No. XEROX-SPL-93-057. Xerox Corporation, Palo Alto Research Center, 1993.

Optimization Results Files

They consist of:

  • samples.dat

  • aggregated_observation.dat

  • outputLabel_k.csv

  • tSamples.dat

samples.dat: this file has OptDim + 1 columns. The leftmost column indicates the iteration number of the Bayesian optimization process, where -1 denotes the random sampling phase. The other columns contain the tested hyperparameter configuration, in the range [0,1], ordered by the index field in the YAML configuration file. The last row contains the configuration used for the test set.
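A minimal reader for samples.dat could separate the random-sampling rows, the Bayesian rows, and the final test row. The sketch assumes the file holds whitespace-separated numbers with one iteration per row, which this section does not state explicitly; split_samples is a hypothetical helper.

```python
import numpy as np


def split_samples(source):
    """Hypothetical reader for samples.dat (path or list of text lines).

    Each row is [iteration, x_1, ..., x_OptDim]. Returns
    (random_rows, bayesian_rows, test_row)."""
    table = np.atleast_2d(np.loadtxt(source))
    search, test_row = table[:-1], table[-1]  # last row: test-set configuration
    random_rows = search[search[:, 0] == -1]  # -1 marks random sampling
    bayesian_rows = search[search[:, 0] != -1]
    return random_rows, bayesian_rows, test_row
```

Calling split_samples("LaserTrain1/samples.dat") would then give the random and Bayesian phases as separate arrays.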

aggregated_observation.dat: this file has 2 columns. The leftmost one has the same structure as the one in samples.dat. The second column contains the fitness value achieved at that iteration. When multiple guesses are performed for each configuration, the fitness is the average of the values achieved by the individual guesses. The last row contains the fitness achieved on the test set, taking the worst case among all the guesses.
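The two aggregation rules can be expressed in a few lines of Python. The helper names are hypothetical, and the worst-case rule assumes that lower fitness is better (consistent with early_stop acting as a lower threshold), so the worst guess is the one with the largest fitness.

```python
from statistics import mean
from typing import Sequence


def validation_fitness(guesses: Sequence[float]) -> float:
    """Fitness recorded for a configuration: average over the guesses."""
    return mean(guesses)


def worst_case_fitness(guesses: Sequence[float]) -> float:
    """Test-set fitness: worst guess, assuming lower is better."""
    return max(guesses)
```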

outputLabel_k.csv: there are k outputLabel files, where k is the number of guesses performed for each configuration. Each file contains the output of the network: the predicted value at each time step in the regression case, or the predicted class in the classification case. Despite the .csv extension, these files are stored in binary format and must be converted before they can be read. This can be done with the converter executable, using the following command:

./converter outputLabel_1.csv outputConverted_1.csv
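Once converted, the predictions can be compared with the reference labels. The sketch below assumes that outputConverted_1.csv holds one predicted value per time step, aligned with TestLabel.csv, with values separated by commas or newlines; the NRMSE shown is one common definition and is not necessarily the fitness metric EchoBay reports.

```python
import numpy as np


def read_values(path):
    """Read every number in a comma- and/or newline-separated text file."""
    with open(path) as f:
        text = f.read().replace(",", " ")
    return np.array(text.split(), dtype=float)


def nrmse(y_true, y_pred):
    """RMSE normalized by the standard deviation of the targets."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if y_true.shape != y_pred.shape:
        raise ValueError("prediction and label lengths differ")
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)) / np.std(y_true))
```

Usage: nrmse(read_values("TestLabel.csv"), read_values("outputConverted_1.csv")).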

Optimization Structures Files

They consist of:

  • optimal.yml

  • Win_eigenx.dat

  • Wr_eigenx.dat

  • State_eigenx.dat

  • Wout_eigen.dat

optimal.yml: the first part of the file is a copy of the original YAML file used for the optimization. The vector x, holding the optimal value of each hyperparameter in the range [0,1], is appended at the end of the file, following the order of the index field of the optimizable hyperparameters.
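Interpreting optimal.yml requires mapping each entry of x from [0,1] back to the hyperparameter's own range. This section does not describe how EchoBay performs that mapping, so the sketch below is an assumption: a linear mapping by default, with an optional log-scale variant; denormalize is a hypothetical helper.

```python
import math


def denormalize(x: float, low: float, high: float, log_scale: bool = False) -> float:
    """Map a normalized value x in [0, 1] back to [low, high] (assumed mapping)."""
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must lie in [0, 1]")
    if log_scale:
        # Interpolate linearly in log space (requires low, high > 0).
        return math.exp(math.log(low) + x * (math.log(high) - math.log(low)))
    return low + x * (high - low)
```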

Win_eigenx.dat: these files contain the input weight matrix Win. There is one file per layer of the ESN, so the number of files equals the number of layers.

Wr_eigenx.dat: these files contain the reservoir weight matrix Wr, one file per layer.

State_eigenx.dat: these files contain the final state of the ESN after the training procedure. Storing it avoids an unnecessary reset of the ESN state.

Wout_eigen.dat: this file contains the trained readout weight matrix Wout.
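To show how these structures fit together, the following NumPy sketch runs a single-layer ESN forward pass from a stored state. The matrix shapes, the bias terms, and the leaky tanh update are standard ESN conventions assumed here, not details taken from EchoBay; loading the actual .dat files depends on EchoBay's storage format, which this section does not cover, so random matrices are used instead.

```python
import numpy as np


def esn_run(Win, Wr, Wout, state, inputs, leak=1.0):
    """Hypothetical ESN forward pass.

    Assumed shapes: Win (N, 1 + n_in) incl. bias column, Wr (N, N),
    Wout (n_out, 1 + N) incl. bias, state (N,), inputs (T,) or (T, n_in)."""
    x = np.asarray(state, dtype=float).copy()
    u_seq = np.asarray(inputs, dtype=float).reshape(len(inputs), -1)
    outputs = []
    for u in u_seq:  # one time step per row
        pre = Win @ np.concatenate(([1.0], u)) + Wr @ x
        x = (1.0 - leak) * x + leak * np.tanh(pre)  # leaky-integrator update
        outputs.append(Wout @ np.concatenate(([1.0], x)))  # linear readout
    return np.array(outputs), x  # x plays the role of State_eigen


rng = np.random.default_rng(0)
N = 50
Win = rng.uniform(-0.5, 0.5, (N, 2))
Wr = rng.uniform(-0.5, 0.5, (N, N))
Wr *= 0.9 / max(abs(np.linalg.eigvals(Wr)))  # spectral radius 0.9
Wout = rng.uniform(-0.1, 0.1, (1, 1 + N))     # untrained placeholder
y, final_state = esn_run(Win, Wr, Wout, np.zeros(N), np.sin(0.1 * np.arange(100)))
```

Passing final_state as the initial state of a later call continues the sequence without resetting the reservoir, which is the purpose the State files serve.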

These files are used when running EchoBay in test mode, which is started with the following command:

./echobay test trainingFolderName outputFolderName

Here, trainingFolderName corresponds to LaserTrain1. In test mode, EchoBay loads the structures already stored in the folder to evaluate the performance on the test set, and no optimization is performed.
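The full train-then-test workflow for this example can be scripted, for instance from Python. The helper names and the output folder name LaserTest are hypothetical; the commands are the ./echobay train and ./echobay test invocations described in this section, and the suffix 1 assumes a single-dataset optimization.

```python
import subprocess
from typing import List


def make_train_command(yml: str, name: str, exe: str = "./echobay") -> List[str]:
    return [exe, "train", yml, name]


def make_test_command(train_name: str, output_folder: str,
                      dataset_index: int = 1, exe: str = "./echobay") -> List[str]:
    # The training folder gets a numeric suffix, e.g. LaserTrain -> LaserTrain1.
    return [exe, "test", f"{train_name}{dataset_index}", output_folder]


def run_laser_example(output_folder: str = "LaserTest") -> None:
    """Train on laser.yml, then evaluate the saved structures in test mode."""
    subprocess.run(make_train_command("laser.yml", "LaserTrain"), check=True)
    subprocess.run(make_test_command("LaserTrain", output_folder), check=True)
```

Using check=True makes the script stop if either step exits with an error.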