Training set
A training set can be made easily, directly from the time series. A certain number of measured values is used as the inputs, and the value to be predicted (i.e., the value in the future, at some chosen distance after these input values) is used as the required output. The input part of the time series is called the window; the output part is the predicted value. The items of the training set are made by shifting the window over the time series (see figure 4). It is advisable to leave part of the time series for testing, i.e., not to use this part during learning, but to use it to test how successfully the network has learned to predict the data.
The training set obtained in this way can then be adjusted to the needs of a particular neural network. For example, it may be necessary to rescale the values to a certain interval, such as (0, 1).
Figure 4 - Creating training set
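The following short Python sketch (not part of the original text; it assumes NumPy, and the function and parameter names are only illustrative) shows one possible way of creating such a training set by shifting a window over the series:

    import numpy as np

    def make_training_set(series, window_size, horizon=1):
        # Build (input window, required output) pairs by shifting a window
        # over the series; the output lies `horizon` steps after the window.
        inputs, targets = [], []
        for start in range(len(series) - window_size - horizon + 1):
            end = start + window_size
            inputs.append(series[start:end])           # window of measured values
            targets.append(series[end + horizon - 1])  # value to be predicted
        return np.array(inputs), np.array(targets)

    # Example: predict the value immediately following a window of 4 measurements
    series = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
    X, y = make_training_set(series, window_size=4, horizon=1)
    # X[0] = [1, 2, 3, 4], y[0] = 5;  X[1] = [2, 3, 4, 5], y[1] = 6; ...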
Available data are often divided into three sets: a learning set, a validation set and a testing set. These sets can overlap (see figure 5) and do not have to be continuous. The learning set is the sequence that is shown to the neural network during the learning phase; the network is adapted to it so as to achieve the required outputs (in other words, the weights in the network are changed based on this set). The difference from the required output is measured on the validation set, and this difference is used to decide whether the learning of the network can be finished. The last set, the testing set, is then used to test whether the network is also able to work on data that were not used in the previous process.
To summarize, the learning set is used for creating the model, the validation set is used for verifying the model, and the testing set is used for testing the usability of the model.
Figure 5 - Validation, learning and testing sets of data
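As a rough illustration (a sketch added here, not code from the original article; the fractions are arbitrary assumptions), a simple chronological split of the windowed items X and y from the previous example into the three sets could look like this:

    def split_data(items, learn_frac=0.7, valid_frac=0.15):
        # Chronological split: the first part is used for learning, the next
        # part for validation (deciding when to stop learning), and the rest
        # for testing on data the network has never seen.
        n = len(items)
        learn_end = int(n * learn_frac)
        valid_end = int(n * (learn_frac + valid_frac))
        return items[:learn_end], items[learn_end:valid_end], items[valid_end:]

    learn_X, valid_X, test_X = split_data(X)
    learn_y, valid_y, test_y = split_data(y)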
Data preprocessing is important as well. For example, it can be useful to remove the trend and other components (such as seasonal components) - of course only if we are able to detect such components. An overview of time series decomposition can be found in the references.
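For instance, a simple linear trend can be removed by fitting a straight line and subtracting it (a minimal sketch, assuming NumPy as above; the trend has to be added back to the network's predictions afterwards):

    def remove_linear_trend(series):
        # Fit a straight line to the series and subtract it; the returned
        # coefficients allow the trend to be added back to the predictions.
        t = np.arange(len(series))
        slope, intercept = np.polyfit(t, series, 1)
        trend = slope * t + intercept
        return series - trend, (slope, intercept)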
Especially for neural networks whose outputs are limited to a certain interval, it is important to realize that values outside of this interval cannot be predicted. Data normalization is then required for the network to be able to produce meaningful outputs.
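A common approach (sketched here as an assumption, not taken from the original text) is a simple min-max normalization into the interval (0, 1), together with the inverse transformation for converting the network's outputs back to the original units:

    def normalize(series):
        # Linearly rescale the series into [0, 1]; values larger or smaller
        # than those seen here cannot be represented, which is exactly the
        # limitation described above.
        s_min, s_max = series.min(), series.max()
        return (series - s_min) / (s_max - s_min), (s_min, s_max)

    def denormalize(scaled, s_min, s_max):
        # Inverse transformation: convert network outputs back to original units.
        return s_min + scaled * (s_max - s_min)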