In this section we analyze the results of two distinct series of runs performed with two different approaches: a standard linear BJ filter (labelled L, for linear) and an autoregressive NN (labelled N, for non-linear). The data set used for both series was previously standardized (i.e. rescaled to zero mean and unit variance); the training data set covers the period April-May 1993 and the validation data set the period May-June 1993.
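As an illustration of the standardization step, the following is a minimal sketch and not the procedure actually used for the runs. It assumes that both sets are rescaled with the mean and standard deviation of the training set, which the text does not specify; the function name `standardize` is hypothetical.

```python
import math

def standardize(train, valid):
    """Rescale both series with the training-set mean and standard deviation.

    Assumption: training statistics are reused for the validation set.
    """
    n = len(train)
    mean = sum(train) / n
    # Population standard deviation of the training set
    std = math.sqrt(sum((x - mean) ** 2 for x in train) / n)

    def scale(values):
        return [(x - mean) / std for x in values]

    return scale(train), scale(valid), mean, std
```

The returned `mean` and `std` allow a prediction to be converted back to degrees Celsius by computing `z * std + mean`.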

We suggest that, in this particular kind of analysis, both training and validation sets should be short and adjacent series: trends and variability features with a strong seasonal behaviour may then be reproduced more easily (Aussem 1995), whereas such seasonal features may become blurred in series that are too long. On the other hand, this approach is weak in forecasting "out of trend" events, such as strong pressure variations related to the approach of climatic fronts, which are difficult to predict with a purely statistical approach.

Table 1 reports the results of the L-runs. A standard least squares algorithm was used to refine the fitting parameters on the training set; note that the reported standard errors are evaluated on the validation data set. The results of run L1 seem quite promising, confirming the strong autocorrelation component contained in the temperature time series. Runs L2 (Fig. 3) and L3 represent an attempt to forecast temperature with a time lag of 6 hours: note that the purely autoregressive model gives better performance than the ARMAX one. In order to justify this fact, two explanations are possible:

Figure 3: Binned distribution of differences between observed and predicted temperatures for L2 run

**Table 1:** Parameters and errors in validation sets of L-runs; regrs stands for regressor type, pts is the number of samples in training and validation set, while RMS is the evaluated standard error

| Run | Type | k_T/k_P | lag (hours) | regrs | pts | RMS (°C) |
|---|---|---|---|---|---|---|
| L1 | AR | 1 | 1 | T | 1320 | 0.7 |
| L2 | AR | 1 | 6 | T | 220 | 2.2 |
| L3 | ARMAX | 2/2 | 6 | T/P | 220 | 3.1 |
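The purely autoregressive runs with a single regressor (k_T = 1, as in L1 and L2) can be sketched as a least squares fit of x(t + lag) against x(t). This is a minimal sketch under stated assumptions, not the code used for the runs: the intercept term and the function names `fit_ar1` and `rms_error` are hypothetical, and the series is assumed to be sampled at regular intervals with `lag` given in samples.

```python
import math

def fit_ar1(series, lag):
    """Least squares fit of x[t + lag] ~ a * x[t] + b (lag >= 1, one regressor)."""
    x = series[:-lag]   # regressors
    y = series[lag:]    # targets, lag samples ahead
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b

def rms_error(series, lag, a, b):
    """Root-mean-square difference between observed and predicted values."""
    errors = [series[t + lag] - (a * series[t] + b)
              for t in range(len(series) - lag)]
    return math.sqrt(sum(e * e for e in errors) / len(errors))
```

Following the text, the coefficients would be fitted on the training set with `fit_ar1` and the error evaluated on the validation set with `rms_error`.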

The first run of N type is an NLAR attempt to forecast with a time lag of 1 hour. Note that we found better performance using few units in the hidden layer (only one in the case shown).
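To make the role of the number of lagged inputs and of hidden units concrete, the forward pass of a small NLAR network can be sketched as follows. This is only an illustrative sketch: the tanh hidden units with a linear output are an assumption, the function names are hypothetical, and the weights would have to be obtained by training, which is not shown.

```python
import math

def lagged_inputs(series, t, k):
    """Return the k most recent values up to index t, newest first (t >= k - 1)."""
    return [series[t - i] for i in range(k)]

def nlar_predict(inputs, hidden_weights, hidden_biases, output_weights, output_bias):
    """One hidden layer of tanh units feeding a single linear output unit."""
    hidden = [
        math.tanh(sum(w * x for w, x in zip(weights, inputs)) + bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    return sum(v * h for v, h in zip(output_weights, hidden)) + output_bias
```

A configuration such as run N1 corresponds to five lagged inputs and a single hidden unit, i.e. `hidden_weights` holding one list of five weights.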

In runs N2 to N5 we checked the NN capabilities with a lag of 6 hours. Runs N2, N3 and N4 use NLAR, NLARMA and NLARMAX schemes respectively, while run N5 differs slightly from the previous ones: we used 8 units in the hidden layer and then refined the network architecture with a pruning strategy called "Optimal Brain Surgeon" (Nørgaard 1995). After this procedure about 70% of the unit connections had been pruned, confirming that in the present case a low number of units in the hidden layer increases performance and network stability. Only the results of run N2 are shown (Fig. 4), being representative of the subset with a time lag of 6 hours.

Figure 4: Binned distribution of differences between observed and predicted temperatures for N2 run

**Table 2:** Parameters and errors in validation sets of N-series; regrs stands for regressor type, pts is the number of samples in training and validation set, RMS is the evaluated standard error, and nhu is the number of hidden units. The label (p) means that a pruning algorithm has been used (see text). Symbol (*) refers to NN topology before pruning

| Run | nhu | Type | k_T/k_P | lag (hours) | regrs | pts | RMS (°C) |
|---|---|---|---|---|---|---|---|
| N1 | 1 | NLAR | 5 | 1 | T | 1320 | 0.6 |
| N2 | 2 | NLAR | 10 | 6 | T | 220 | 1.3 |
| N3 | 1 | NLARMA | 5 | 6 | T | 220 | 1.4 |
| N4 | 1 | NLARMAX | 5/5 | 6 | T/P | 220 | 1.4 |
| N5 | 8(*) | NLARMAX(p) | 5/5 | 6 | T/P | 220 | 1.5 |
| N6 | 1 | NLAR | 4 | 12 | T | 110 | 2.1 |

Table 2 summarises the characteristics of the six runs and the estimation errors for the validation data set. The prediction performance of BJ and NN may be compared with predictions obtained with the Carbon Copy technique, which assumes that the value of T at time t is equal to T at time t-24 (hours).
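The Carbon Copy baseline is simple enough to be written down directly. The sketch below assumes hourly sampling, so that 24 hours correspond to 24 samples; the function name `carbon_copy_rms` is hypothetical.

```python
import math

def carbon_copy_rms(series, period=24):
    """RMS error of the Carbon Copy forecast x[t] = x[t - period]."""
    errors = [series[t] - series[t - period]
              for t in range(period, len(series))]
    return math.sqrt(sum(e * e for e in errors) / len(errors))
```

A perfectly periodic daily cycle gives a zero error, so the value returned for a real series measures how much the temperature departs from the behaviour of the previous day.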

The estimated confidence level of the Carbon Copy analysis we carried out is about 2.1 °C: this value limits the actual prediction capability to a time interval of 12 hours (Table 2). Figure 5 gives a different criterion for evaluating predictions in the AR and NLAR approaches, showing the correlation coefficients between the validation set and the corresponding models versus forecast time step. Figure 5 also shows that both models are highly correlated with the observed series, confirming the correctness of the choice of an autoregressive model. Moreover, NN shows greater correlation values than BJ.

Figure 5: Correlation coefficient between actual and predicted values as a function of prediction time in AR and NLAR cases
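A curve of this kind can be obtained by computing one correlation coefficient per forecast time step. The sketch below assumes the usual Pearson coefficient, which the text does not state explicitly; the function name `correlation` is hypothetical.

```python
import math

def correlation(observed, predicted):
    """Pearson correlation coefficient between two series of equal length."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p)
              for o, p in zip(observed, predicted))
    norm_o = math.sqrt(sum((o - mean_o) ** 2 for o in observed))
    norm_p = math.sqrt(sum((p - mean_p) ** 2 for p in predicted))
    return cov / (norm_o * norm_p)
```

Applying the function to the validation series and to the model output at each forecast step yields one point of the curve per step.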

4. Results