Temperature Time Series Forecasting in The Optimal Challenges in Irrigation (TO CHAIR)

Predicting and forecasting weather time series has always been a difficult field of research analysis with a very slow progress rate over the years. The main challenge in this project – The Optimal Challenges in Irrigation (TO CHAIR) – is to study how to manage irrigation problems as an optimal control problem: the daily irrigation problem of minimizing water consumption. For that it is necessary to estimate and forecast weather variables in real time in each monitoring area of irrigation. These time series present strong trends and highfrequency seasonality. How to best model and forecast these patterns has been a long-standing issue in time series analysis. This study presents a comparison of the forecasting performance of TBATS (Trigonometric Seasonal, Box-Cox Transformation, ARMA errors, Trend and Seasonal Components) and regression with correlated errors models. These methods are chosen due to their ability to model trend and seasonal fluctuations present in weather data, particularly in dealing with time series with complex seasonal patterns (multiple seasonal patterns). The forecasting performance is demonstrated through a case study of weather time series: minimum air temperature.


Introduction
In a world where climate change and increasing social conflicts are a reality, a proper management of the existing scarce resources is vital. Thus, we will try to find the best technical solutions to improve the efficiency of their use in response to environmental concerns. Most irrigation systems on sale in the market are based on control with no prediction techniques. The excess of water in the soil, which is frequently a result of these techniques, is responsible for significant water waste. Understanding the behaviour of humidity in the soil by mathematical/statistical modeling allows, among others, an efficient planning of water use via irrigation systems [1].
According to IPMA, the Portuguese Institute for the Ocean and Atmosphere (September 30, 2017), about 81% of Portugal's mainland was in severe drought, 7.4% in extreme drought, 10.7% in moderate drought and 0.8% in weak drought. 2017 was an extremely dry year and, considering the data from January 1st, 2017 to December 27 th , 2017, it will be among the 4 driest years since 1931 (all occurred after 2000), and the average annual total precipitation will be about 60% of what is deemed normal. The period from April to December, with persistently negative precipitation abnormalites, will be deemed the driest of the last 87 years. In the media, several news mention the various problems that Portugal has to face, such as producing animal feed, supplying water to the population and, very important, the lack of water for agriculture purposes. Water resources are mainly used in agriculture: about 70% of freshwater is used in agriculture. Consequently, there is much that can be done to save water, and this is of the utmost importance for our planet. Therefore, the main challenge in project The Optimal Challenges in Irrigation (TO CHAIR) is to study how to manage irrigation problems as an optimal control problem: the daily irrigation problem of minimizing water consumption. For that it is necessary to estimate and forecast weather variables (like minimum air temperature) in real time in each irrigation area, in order to determine, in particular, the evapotranspiration (related to the irrigation planning problem). Our data source are the records of the variable minimum air temperature observed in a farm located in Vila Real County, in northern Portugal, in the field of agriculture irrigation, registered in the period from January 23rd, 2015 to August 11th, 2018 on a daily basis. The main goal is to forecast these environmental variables at a location (in this case, at the farm), where there are historical observations but current measurements are not available (including various steps for forecasting (i.e., 7 days)).

Methodology
A time series is an ordered sequence of values of a variable at equally spaced time intervals, in this case daily minimum air temperature at a weather station. Time series forecasting is an important area in which past observations of the same variable are collected and analyzed to develop a model describing the underlying relationship. The model is then used to extrapolate the time series into the future. Forecasting methods are a key tool in decision-making processes in many areas, such as economics, agriculture, management or environment. There are several approaches to modeling time series, but we decided to study and to compare the accuracy of the TBATS model and the regression with correlated errors model for forecasting weather / meteorological time series, because both models can increase the chance of capturing the proprieties and the dynamics in the data and improving accurate forecasts. Both methods have the ability to deal with time series with high-frequency seasonality.
The time series analysis (of both processes) was carried out using the statistical software R programming language and specialized packages for modeling and forecasting [2].

TBATS
TBATS model is a time series model for series demonstrating multiple/complex seasonality. The TBATS model was introduced by De Livera, Hyndman and Snyder (2011, JASA). TBATS is an abbreviation denoting its salient types: T for trigonometric regressors to model multiple seasonality, B for Box-Cox transformations, A for ARMA errors (autoregressive moving average), T for trend and S for seasonality ( [3] and [4]). The model including a Box-Cox transformation (the notation ! (#) is used to represent Box-Cox transformed observations with the parameter , where ! is the observation at time ), ARMA errors, and T seasonal patterns is as follows: where % , … , & denote the seasonal periods, ℓ ! is the local level in period , is the long-run trend, ! is the short-run trend in period , ! (') represents the th seasonal component at time , ! denotes an ARMA ( , ) process and ! is a Gaussian white noise process with zero mean and constant variance ( . The smoothing parameters are given by , and ' for = 1, … , . is the damping parameter representing the damped trend (the damping factor is included in the level and measurement equations as well as the trend equation).
To introduce the trigonometric representation of seasonal components based on Fourier series (trigonometric representation of seasonal components based on Fourier series), [5] and [6], the ! (') can be rewritten as follows: for even values of ' , and when ' = ( ' − 1) 2 ⁄ for odd values of ' . It is anticipated that most seasonal components will require fewer harmonics, thus reducing the number of parameters to be estimated. A deterministic representation of the seasonal components can be obtained by setting the smoothing parameters equal to zero.
A new class of innovations state space models is obtained by replacing the seasonal component ! (') in equations (1) and (2) by the trigonometric seasonal formulation, and the measurement equation by This class is designated by TBATS, the initial T connoting "trigonometric". To provide more details about their structure, this identifier is supplemented with relevant arguments to give the designation A TBATS model requires the estimation of 2( % + ( + ⋯ + & ) initial seasonal values.

Regression model with correlated errors
An important approach in forecasting time series, particularly meteorological time series, involves fitting regression models (RM) to time series including trend and seasonality components. The RM models are originally based on linear modeling, but they also allow parameters such as trend and season to be added to the data. In this study, the trend parameter will be fitted with polynomial function, and the season parameter will be estimated with Fourier series. But for the RM to be validated, the error terms must be a sequence of uncorrelated and Gaussian (with mean 0 and variance constant).
Also, one of the most popular ways of time series modeling is autoregressive integrated moving average (ARIMA) modeling introduced by Box and Jenkins in 1960s to forecast time series [7]. An ARIMA( , , ) model can account for temporal dependence in several ways. Firstly, the time series is -differenced to render it stationary. If = 0, the observations are modeled directly, and if = 1, the differences between consecutive observations are modeled. Secondly, the time dependence of the stationary process { ! } is modeled by including autoregressive models. Thirdly, are moving average terms, in addition to any time-varying covariates. It takes the observation of previous errors. Finally, by combining these three models, we get the ARIMA model. Thus, the general form of the ARIMA models is given by: where y , is a stationary stochastic process, is the constant that determines the level of the time series, ! is the error or white noise disturbance term, ' means autoregressive coefficient and ) is the moving average coefficient. For a seasonal time series, these steps can be repeated according to the period of the cycle, whatever time interval.
A regression model (RM) with correlated errors in which are incorporated external regressors in the form of Fourier terms (to account for the seasonal behavior) are added to an ARIMA ( , , ) model [8]. These models are regression models (RM) which include a correction for autocorrelated errors, [9] and [10]. Hence, we can add ARIMA terms to the regression model to eliminate the autocorrelation. To do this, we re-fit the regression model as an ARIMA ( , , ) model with regressors, and specify the appropriate AR ( ) or MA ( ) terms to fit the pattern of autocorrelation we observed in the original residuals.
So, in this regression model we apply the Fourier series to model seasonal pattern by using Fourier terms with short-term time series dynamics allowed in the error, and we consider the following model: where ! is an ARIMA process, -andare Fourier coefficients and is a length of period. The value of is chosen by minimizing forecast error measures [11].

Forecast error measures
Let's denote the actual observation for time period by ! and the estimated or forecasted value for the same period by N ! , and is the total number of observations. The most commonly used forecast error measures are the mean error (ME), the root mean squared error (RMSE), and the mean absolute error (MAE). They are defined by the following formulas, respectively: When comparing the performance of forecast methods on a single dataset, the MAE is interesting for it is easy to understand, but the RMSE is more valuable because it is more sensitive than other measures to the occasional large error (the squaring process gives disproportionate weight to very large errors).
The MASE was proposed by Hyndman and Koehler (2006) for comparing forecast accuracies. The MASE is given by the formula:

MASE = MAE Q
where Q is a scaling statistic. For a seasonal time series, a scaling statistic can be defined using the seasonal naïve forecasts: where the seasonal naïve method accounts for seasonality by setting each prediction to be equal to the last observed value of the same season. When comparing forecasting methods, the method with lowest ME, RMSE, MAE, or MASE is the preferred one. Frequently, different accuracy measures will lead to different results as to which forecast method is the best, [12] and [13].

Data
In the present study, we focus on a minimum temperature dataset. Fig. 1 shows the time series distribution in the total observed period: between January 23rd, 2015 to August 11th, 2018 (1327 days). The graphical representation clearly shows that time series exhibits seasonal behaviour, as is expected due to the environmental nature of the data. The daily data exhibits a strong annual seasonality (a period of 365 days, because we have excluded the days 29 th of February during the period under observation) with extreme values in cold seasons. Moreover, the variation seems to be also larger in cold seasons than in warm ones.  Table 1 presents descriptive of statistics for the minimum temperature time series during the observed period by month. As expected, the minimum temperature is higher in the summer months and presents lower values in the winter months. The monthly standard deviations (SD) indicate a larger variability during the months of November, December, January and February. Minimum air temperature is characterised by a symmetric distribution (presenting values near zero) by month.  Fig. 2 presents box-plots of daily minimum temperature by month. The boxplots are able to identify some moderate outliers in some months (April, May, August, and September).

Results
The results obtained from the application of TBATS and RM with correlated errors methods are reported in this section. The methods considered in this study are applied to two sets: training data (in-sample data) and testing data (out-of-sample data) in order to testify the accuracy of the proposed forecasting models. The selected training period was from January 23rd, 2015 to January 22nd, 2018 (first 1095 observations) and was used in order to fit the models to data, and the test period with the last 232 observations (period between January 23rd, 2018 to August 11th, 2018) was used to forecast. This approach gives the ability to compare the effectiveness of different methods of prediction.

TBATS
The minimum air temperature data are observed daily and show a strong annual seasonal pattern, so the length of seasonality of the time series is % = 365. The time series exhibits an upward additive trend and an additive seasonal pattern, that is, a pattern for which the variation does change with the level of the time series.
As a second step, an ARMA model was fitted to the residuals with ( , ) combinations, and it was discovered that the TBATS (1, 1, 0, 4, {365,3}) model minimizes the AIC. AIC is known as the Akaike's information criteria. The estimated parameters for the TBATS model are shown in Table 2. No Box-Cox transformation is necessary for this time series (so, = 1). The estimated values of 0 for and 1 for imply a purely deterministic growth rate with no damping effect. The model also implies that the irregular component of the series is correlated and can be described by an ARMA (0,4) process, and that a strong transformation is not necessary to handle nonlinearities in the series. In the final model, = 2.3109 and the AIC is 9533.843. In Fig. 3 are represented the original values of the air minimum temperature, the estimates in the modeling period (training period), the forecasts in the forecasting period (testing period) and the forecast intervals for a confidence level of 90% and 95% by applying the TBATS model. The model validation was assessed by means of the residuals analysis (Fig. 4). The independency assumption was assessed by estimating the autocorrelation and the partial autocorrelation functions of residuals and the assumption that the residuals are identically normally distributed was also verified (by performing the Kolmogorov-Smirnov test).

Regression model with correlated errors
In this study, we use the regression model in the basic form ! = ! + ! + ! , where ! and ! represent the trend and the seasonal components of the time series at time , respectively. We apply the Fourier series to model the seasonal component (equation (3)). The value of K can be chosen by minimizing predictions errors (minimizing AIC).
We consider the model with data having a long seasonal period (365 for daily data, i.e., = 365). To choose the best RM, we ran the model by varying K, and the smallest forecast errors was when = 2.
The analysis of residuals indicates the existence of a temporal correlation structure in the residuals. The minimum of AIC where e , is an ARIMA (p, , ) process, in this case an ARIMA (2,0,0), i.e., an AR (2) process. Table 3 presents the estimated parameters and the respective standard errors, for the RM with AR (2) errors.
In the final model, =2.3130 and the AIC is 4953.53. Table 3 Estimated parameters for the application of the RM with correlated errors, and the correspondent standard errors.
In Fig. 5 are represented the original values of the air minimum temperature, the estimates in the modeling period (training period), the forecasts in the forecasting period (testing period) and the forecast intervals for a confidence level of 90% and 95% by applying the regression model with correlated errors (AR(2)).

Models performance
The residuals performance in both processes modeling is consistent with the white noise process as seen in Fig. 4 and Fig. 7, so we can conclude the validity and adequacy of the two fitted models. Table 4 shows the result of the accuracy measures calculated for training and testing periods for the two methods applied to the time series under study. The performance comparisons of the competing models (TBATS and RM with correlated errors) were evaluated using ME, RMSE, MAE, and MASE. The results obtained showed that the regression model with correlated errors, which requires fewer parameters to be estimated, is more accurate than TBATS, and performs better for all period times (training and test periods).
From the two models performed, we selected the most adequate model which has the lowest forecast error when comparing predicted data using a suitable test set: regression model with correlated errors. Therefore, RM with correlated errors can more efficiently capture the dynamic behaviour of the weather property, minimum air temperature, compared to TBATS.

Conclusions
In this study, we have shown that both TBATS and RM with correlated errors (for forecasting time series with complex seasonal patterns) can efficiently capture the behaviour of air temperature in the studied site. The obtained results show that the application of TBATS and RM with correlated errors methods to the minimum air temperature provides valuable insights into the studied data structures and their components, being a good basis for accurate estimations and forecasts. However we have to further explore the features of the two models, and we need to investigate more. Our preliminary findings show that, in this case (minimum air temperature at this farm) RM with correlated errors is better than the TBATS model for forecasting, because within the scope of TO CHAIR project we intend to obtain accurate forecasts of this weather variable for the following 7 days in the farm in order to solve the irrigation problems.