Statistical Modelling of Counts with a Simple Integer-Valued Bilinear Process

The aim of this work is the statistical modelling of counts that assume low values and exhibit sudden, large bursts occurring randomly in time. It is well known that bilinear processes capture this kind of phenomenon. In this work the integer-valued bilinear INBL(1,0,1,1) model is discussed and some of its properties are reviewed. Classical and Bayesian methodologies are considered and compared through simulation studies, namely to obtain estimates of the model parameters and to calculate point and interval predictions. Finally, an empirical application to real epidemiological count data is presented to attest to its practical applicability in data analysis.


Introduction
In the analysis of stationary integer-valued time series the class of INARMA models plays a central role. However, such models are unlikely to provide a sufficiently broad class capable of accurately capturing features often exhibited by data sets, such as sudden bursts of large values. For that purpose, and using the concept of thinning operator introduced by [12], conventional bilinear models can be adapted to the integer case, leading to the class of integer-valued bilinear models. Doukhan et al. [3] proposed the first-order INBL(1,0,1,1) model, where the thinning operators "α•" and "β•"¹ are mutually independent and {ε_t}_{t∈Z} is a sequence of i.i.d. non-negative integer-valued random variables with finite mean and finite variance, independent of the operators. Doukhan et al. [3] derived conditions guaranteeing strict and second-order stationarity of the INBL(1,0,1,1) model. Drost et al. [4] also provided sufficient conditions for the existence of higher-order moments of X_t, considering the superdiagonal INBL(p, q, m, n). One step towards the application of bilinear models to real data sets is the estimation of parameters.
Considering Poisson thinning operators, [3] obtained moment estimators and derived their asymptotic distribution. In contrast, Bayesian analysis of INBL models has not received much attention in the literature, nor has diagnostic analysis.
In this paper we consider the INBL(1,0,1,1) model given in (1), with the following assumptions: the operators "α•" and "β•" are mutually independent binomial thinning operators, so that β•(ε_{t−1}X_{t−1}), conditional on ε_{t−1}X_{t−1}, follows a Bi(ε_{t−1}X_{t−1}, β) distribution, and {ε_t}_{t∈Z} is a sequence of i.i.d. Poisson random variables with mean λ, independent of the operators.
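For concreteness, the model just described can be sketched in code. The sketch below assumes the usual reading of (1), X_t = α•X_{t−1} + β•(X_{t−1}ε_{t−1}) + ε_t, which equation this text does not reproduce; the function names (`binom_thin`, `rpoisson`, `simulate_inbl`) are ours, not the paper's.

```python
import random
from math import exp

def binom_thin(phi, x, rng):
    """Binomial thinning phi o x: sum of x i.i.d. Bernoulli(phi) variables."""
    return sum(1 for _ in range(x) if rng.random() < phi)

def rpoisson(lam, rng):
    """Poisson draw via Knuth's product method (adequate for small lam)."""
    limit, k, prod = exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def simulate_inbl(n, alpha, beta, lam, seed=0, burn=200):
    """Simulate X_t = alpha o X_{t-1} + beta o (X_{t-1}*eps_{t-1}) + eps_t
    with independent binomial thinnings and Poisson(lam) innovations."""
    rng = random.Random(seed)
    x, eps, xs = 0, rpoisson(lam, rng), []
    for t in range(n + burn):
        eps_new = rpoisson(lam, rng)
        x = binom_thin(alpha, x, rng) + binom_thin(beta, x * eps, rng) + eps_new
        eps = eps_new
        if t >= burn:          # discard a burn-in stretch to approach stationarity
            xs.append(x)
    return xs
```

The cross term `x * eps` is what produces the occasional large bursts: a big innovation feeds back multiplicatively into the next observation.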
The class of stationary models defined in (1) is useful for representing time series that assume low values with high probability and exhibit sudden bursts of large values occurring randomly in time; hence it can produce heavy-tailed data. As an illustration of this kind of data we present in Fig. 1 two time series of count data in epidemiology, originally studied by [3]. Data consist of the weekly numbers of E. coli infections and of meningitis cases, both starting in January 1990 and comprising 143 observations per series. Counts are typically small and skewed, and both series contain a large number of zeros. Hereafter the weekly number of E. coli infections and the weekly number of meningitis cases are denoted the E. coli data and the meningitis data, respectively.
In time series analysis we are usually interested in estimating the underlying model and in the predictive capabilities of that model. Thus, the aim of this study is to establish a comparison between classical and Bayesian approaches in order to conduct inference for the model parameters and to obtain predictions for future values. The rest of the paper is organized as follows. Classical and Bayesian methodologies to obtain estimates of the model parameters are presented in Sect. 2, and forecasting is addressed in Sect. 3. The performance of the above procedures is illustrated through a simulation study in Sect. 4. Section 5 provides applications of this model to real data sets. Finally, Sect. 6 contains some concluding remarks.

¹ The Steutel and van Harn operator "φ•" is defined by φ•X = Σ_{i=1}^{X} Y_i, where {Y_i}, i = 1, …, X, is a sequence of independent and identically distributed (i.i.d.) counting random variables with mean φ and X is a non-negative integer-valued random variable, independent of {Y_i}. If Y_i is a Bernoulli random variable, we have the binomial thinning operator.

Parameter Estimation
One step towards the application of INBL models in practice is the estimation of their parameters θ = (θ_1, θ_2, θ_3) = (α, β, λ). The classical estimators studied are grouped into two broad categories: regression-based and likelihood-based estimators. Furthermore, Bayesian estimation is also considered. In any estimation procedure it is necessary to estimate the random variables ε_t, since they are not observable. From (1), the innovations ε_t can be approximated recursively for t = 2, …, n, using an initial value for ε_1.
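A minimal sketch of that recursion, under the assumption (ours, since the paper's formula is not reproduced here) that each thinning operator is replaced by its conditional mean, so that ε̂_t = x_t − α x_{t−1} − β x_{t−1} ε̂_{t−1}:

```python
def estimate_innovations(xs, alpha, beta, eps1=0.0):
    """Recursively approximate the unobservable innovations:
        eps_t = x_t - alpha*x_{t-1} - beta*x_{t-1}*eps_{t-1},  t = 2, ..., n,
    starting from a chosen initial value eps_1 (here 0 by default)."""
    eps = [eps1]
    for t in range(1, len(xs)):
        eps.append(xs[t] - alpha * xs[t - 1] - beta * xs[t - 1] * eps[t - 1])
    return eps
```

The resulting ε̂_t are real-valued rather than integer counts; rounding (or truncation at zero) may be needed before plugging them into a count likelihood.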

Conditional Least Squares Estimators
The CLS-estimators of θ are obtained by minimizing the sum of squared deviations of X_t from its conditional expectation, which yields closed-form expressions for the parameter estimators.
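Since the closed-form CLS expressions are not reproduced in this text, the sketch below minimizes the conditional sum of squares numerically instead, over a crude parameter grid; the one-step predictor α x_{t−1} + β x_{t−1} ε_{t−1} + λ and the initialization ε_1 = λ are our assumptions, and `cls_objective` / `cls_grid` are hypothetical names.

```python
def cls_objective(alpha, beta, lam, xs):
    """Conditional sum of squares for the INBL(1,0,1,1) one-step predictor
    alpha*x_{t-1} + beta*x_{t-1}*eps_{t-1} + lam, with the innovations
    reconstructed recursively (eps_1 initialized at its mean lam)."""
    eps, q = lam, 0.0
    for t in range(1, len(xs)):
        pred = alpha * xs[t - 1] + beta * xs[t - 1] * eps + lam
        q += (xs[t] - pred) ** 2
        eps = xs[t] - pred + lam   # eps_t = x_t - alpha*x_{t-1} - beta*x_{t-1}*eps_{t-1}
    return q

def cls_grid(xs, grid=20):
    """Crude grid search over (alpha, beta) in (0,1) and lam > 0 -- a
    stand-in for the paper's closed-form CLS estimators."""
    best, arg = float('inf'), None
    for i in range(1, grid):
        for j in range(1, grid):
            for k in range(1, grid):
                a, b, l = i / grid, j / grid, k * 0.25
                q = cls_objective(a, b, l, xs)
                if q < best:
                    best, arg = q, (a, b, l)
    return arg
```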

Conditional Maximum Likelihood Estimators
For fixed values of x_1 and ε_1, the conditional log-likelihood function for the INBL(1,0,1,1) model is built from the convolution of the two binomial thinning distributions, with parameters α and β respectively, and the Poisson distribution with parameter λ. The CML-estimators are obtained by maximizing the conditional log-likelihood function. Due to the complexity of the log-likelihood expression it is not possible to give explicit forms for the CML-estimators of α, β, and λ, so numerical procedures are necessary. The initial estimates required by such procedures can be obtained by the method of least squares or from the moment estimators of [3].
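The convolution structure can be made concrete as follows. Assuming, as above, that X_t is the sum of a Bi(x_{t−1}, α) count, a Bi(ε_{t−1}x_{t−1}, β) count, and a Poisson(λ) innovation (with integer-valued ε estimates), the conditional pmf is a finite double sum; `cond_pmf` and `cond_loglik` are our names for the pieces.

```python
from math import comb, exp, factorial, log

def binom_pmf(k, n, p):
    return comb(n, k) * p ** k * (1 - p) ** (n - k) if 0 <= k <= n else 0.0

def pois_pmf(k, lam):
    return exp(-lam) * lam ** k / factorial(k)

def cond_pmf(x, x_prev, eps_prev, alpha, beta, lam):
    """P(X_t = x | x_{t-1}, eps_{t-1}): convolution of Bi(x_{t-1}, alpha),
    Bi(eps_{t-1}*x_{t-1}, beta) and Poisson(lam)."""
    m = eps_prev * x_prev
    return sum(binom_pmf(i, x_prev, alpha) * binom_pmf(j, m, beta)
               * pois_pmf(x - i - j, lam)
               for i in range(min(x, x_prev) + 1)
               for j in range(min(x - i, m) + 1))

def cond_loglik(alpha, beta, lam, xs, eps):
    """Conditional log-likelihood given x_1 and eps_1 (eps integer-valued)."""
    return sum(log(cond_pmf(xs[t], xs[t - 1], eps[t - 1], alpha, beta, lam))
               for t in range(1, len(xs)))
```

Maximizing `cond_loglik` with a generic numerical optimizer, started from CLS or moment estimates, gives the CML fit.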

Bayesian Approach
To implement the Bayesian version of the INBL(1,0,1,1) model we need to specify prior distributions for the parameters. Thus, for the parameters 0 < α, β < 1 we choose Beta priors with hyperparameters (a, b) and (c, d), respectively, while for the positive parameter λ we choose a Gamma prior with hyperparameters (e, f). These priors are traditionally used for the PoINAR(1) model, as in [11]. Given a particular sample x_n, the updated information about θ is expressed through Bayes' theorem by the posterior distribution π(θ|x_n) ∝ L(x_n; θ) π(θ), with π(θ) representing the prior distribution.
Under the assumption of prior independence between the parameters, the posterior distribution factorizes accordingly. Given the complexity of the posterior distribution, Markov chain Monte Carlo (MCMC) techniques are required for sampling purposes. For the simulations we need the full conditional distribution of each parameter θ_i, denoted by π(θ_i|θ_{−i}, x_n), which is the posterior distribution of θ_i conditional on all the other parameters and the data x_n. The full conditional distribution of λ, for instance, is π(λ|α, β, x_n) ∝ λ^{e−1} e^{−fλ} L(x_n; θ|x_1, ε_1), with analogous expressions for α and β.
From the above expressions it is easy to conclude that the full conditional distributions are not standard distributions, and therefore a componentwise Metropolis-Hastings algorithm is used, in particular Adaptive Rejection Metropolis Sampling (ARMS), as described in [7]. After generating samples θ^(1), …, θ^(N), sample central tendency measures are used to estimate the model parameters.
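The componentwise sweep can be illustrated without implementing ARMS itself: the sketch below uses a plain random-walk Metropolis step per coordinate (a simpler stand-in for the ARMS step of [7]). The Beta/Gamma log prior matches the choices above; hyperparameters default to 1 for illustration, and the likelihood enters as a user-supplied callable.

```python
import random
from math import log, inf

def log_prior(alpha, beta, lam, a=1, b=1, c=1, d=1, e=1, f=1):
    """Beta(a,b) x Beta(c,d) x Gamma(e,f) log prior (up to constants)."""
    if not (0 < alpha < 1 and 0 < beta < 1 and lam > 0):
        return -inf
    return ((a - 1) * log(alpha) + (b - 1) * log(1 - alpha)
            + (c - 1) * log(beta) + (d - 1) * log(1 - beta)
            + (e - 1) * log(lam) - f * lam)

def mh_sweep(loglik, theta0, n_iter=500, scale=0.1, seed=1):
    """Componentwise random-walk Metropolis: each sweep updates alpha,
    beta and lam one at a time from their full conditionals."""
    rng = random.Random(seed)
    theta = list(theta0)
    cur = log_prior(*theta) + loglik(*theta)
    draws = []
    for _ in range(n_iter):
        for i in range(3):                 # alpha, beta, lam in turn
            prop = theta[:]
            prop[i] += rng.gauss(0.0, scale)
            lp = log_prior(*prop)
            new = lp + loglik(*prop) if lp > -inf else -inf
            if new > -inf and log(rng.random()) < new - cur:
                theta, cur = prop, new
        draws.append(tuple(theta))
    return draws
```

With `loglik` set to the conditional log-likelihood of Sect. 2 this produces posterior draws; burn-in and chain thinning are then applied as in Sect. 4.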

Prediction of Future Observations
In this section we consider the problem of predicting the value of X_{n+h}, h ∈ N, for the INBL(1,0,1,1) process based on the observed series up to time n. The usual way of producing forecasts is via the conditional predictive distribution, and the most common procedure for obtaining predictions in time series models is to use conditional expectations, since we intend to minimize the mean square error. Throughout this section we consider B_n = {X_1, …, X_n; ε_1, …, ε_n}.
We can easily derive the conditional expectation predictor in closed form. Since predictors based on the conditional expectation hardly ever produce integer-valued forecasts, we can alternatively use the median of the h-step-ahead conditional distribution of X_{n+h}|B_n, denoted by M̂_{n+h}, to obtain coherent predictions of X_{n+h}, as suggested in [5].

Bayesian Predictions
The Bayesian predictive probability function is based on the assumption that both the future observation X_{n+h} and θ are unknown. The conditional distribution of X_{n+h} given B_n, which can be viewed as containing all the accumulated information about the future, represents the h-step-ahead Bayesian posterior predictive distribution. It is obtained by integrating f(x_{n+h}|B_n; θ), the (classical) predictive distribution, against the posterior density π(θ|x_n) over the parameter space Θ, with θ ∈ Θ being the vector of unknown parameters.

Point Predictions for the Future Observation
In the particular case of h = 1, the one-step-ahead Bayesian predictive distribution follows directly. The Bayesian predictor of X_{n+1} can be obtained from any location measure of the predictive distribution. Its complexity does not allow working with it directly; however, we can adapt the Tanner composition method (as reported in [13]) to the integer case to get an estimate of X_{n+h} from the sample mean or sample median of the generated values (X_{n+h,1}, …, X_{n+h,m}). Similarly to the classical case we can use the recursive expression with α̂_B, β̂_B, and λ̂_B, the Bayesian estimates of the parameters. It is worth mentioning that there is no need for any plug-in, as happened in the classical approach.
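A sketch of the composition idea: each posterior draw of (α, β, λ) generates one forward simulation of the model, so the resulting sample marginalizes over the parameter uncertainty instead of plugging in a single estimate. The recursion again assumes the binomial-thinning reading of (1), and `tanner_predictive` is our name.

```python
import random
from math import exp

def tanner_predictive(posterior_draws, x_n, eps_n, h=1, seed=3):
    """Composition method: for each posterior draw (alpha, beta, lam),
    simulate the INBL recursion h steps ahead; the sample
    (X_{n+h,1}, ..., X_{n+h,m}) approximates the Bayesian predictive
    distribution, and its mean or median gives the point forecast."""
    rng = random.Random(seed)
    def thin(phi, x):
        return sum(1 for _ in range(x) if rng.random() < phi)
    def pois(lam):
        limit, k, prod = exp(-lam), 0, rng.random()
        while prod > limit:
            k += 1
            prod *= rng.random()
        return k
    sample = []
    for alpha, beta, lam in posterior_draws:
        x, eps = x_n, eps_n
        for _ in range(h):
            e = pois(lam)
            x = thin(alpha, x) + thin(beta, x * eps) + e
            eps = e
        sample.append(x)
    return sample
```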

HPD Predictive Intervals
We use an adaptive generalization of the method used to obtain Highest Posterior Density (HPD) intervals for the model parameters, considering the predictive distribution instead of the posterior. Hence the 100γ% HPD predictive interval for X_{n+1} is defined as the region R(γ) of values whose predictive probability is at least K_γ, with K_γ being the largest constant such that P[X_{n+1} ∈ R(γ)] ≥ γ.
We can obtain an approximation to the HPD predictive interval for X_{n+1} using the algorithm developed by [1]. After computing the 100γ% credible intervals (X_{(j)}, X_{(j+[mγ])}), j = 1, …, m − [mγ], where [mγ] is the integer part of mγ and X_{(j)} denotes the jth order statistic of the simulated sample, the 100γ% HPD interval, denoted by R̂(γ), is the one with the smallest amplitude.
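That smallest-amplitude search is short enough to state in full; `hpd_interval` is a hypothetical name for the routine operating on the simulated predictive sample.

```python
def hpd_interval(sample, gamma=0.95):
    """Approximate 100*gamma% HPD interval from a simulated sample: among
    all empirical credible intervals (x_(j), x_(j+[m*gamma])),
    j = 1, ..., m - [m*gamma], return the one with smallest amplitude."""
    s = sorted(sample)
    m = len(s)
    k = int(m * gamma)                       # [m*gamma], integer part
    j_best = min(range(m - k), key=lambda j: s[j + k] - s[j])
    return s[j_best], s[j_best + k]
```

For a unimodal predictive sample this picks the shortest interval containing roughly a γ fraction of the draws, discarding a distant tail observation in favour of a tighter bound.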

Simulation Study
In this section we study the performance of the above classical and Bayesian procedures on count time series simulated from the INBL(1,0,1,1) model, choosing various combinations of the parameters under stationarity conditions.

Inference
Through the simulation study we want to highlight the following issues: (a) how the results depend on the underlying bilinear parameter β; (b) what the impact of the sample size on the results is; and (c) what the influence of the variance of the innovation process is.
We simulated samples from the INBL(1,0,1,1) model of length n = 50, 100, and 500, with 100 independent replicates.² In the absence of prior information we consider non-informative priors, setting the hyperparameters equal to 0.0001. The MCMC algorithm was started from the CLS-estimates and run for 31,000 iterations in total; the initial 11,000 burn-in iterations were discarded and only every 20th value of the remaining iterations was kept, to reduce the autocorrelation within the chain. Nevertheless, the stationarity of the chain and the convergence of the algorithm were duly checked with the usual diagnostic tests, respectively the [8] and [6] tests, which are available in the package CODA. Figure 2 displays the boxplots of the biases of the CLS-estimates, CML-estimates, and Bayesian estimates of θ, for each model and sample size. Concerning the estimation of α, a closer look at the figure reveals that the classical estimators tend to overestimate the autoregressive parameter, in particular for models A and D. On the other hand, β is underestimated by every methodology in models A, B, and D. Nevertheless, among all the parameters, β is the one estimated most accurately. A comparison of the dispersion of the classical and Bayesian estimators shows their similarity for both small and large sample sizes. An important conclusion is that the value of the underlying bilinear parameter does not seem to interfere with the quality of the point estimates for this model. However, the estimates related to the variance of the innovation process have large biases, which increase when the theoretical value of the λ parameter rises, showing a significant degree of variability. As expected, both the bias and the skewness are reduced when the sample size increases.

Prediction
To compare and analyze the different h-step-ahead predictors mentioned in Sect. 3 we simulated samples of sizes n = 50, 100, 200 from model (1). In order to obtain point or interval forecasts from the classical approach, the CML-estimates were plugged into (2) or into the predictive probability functions. To obtain Bayesian predictions, we used expression (4) and the Tanner algorithm [13]. It is worth noticing that the forecast performance depends, on the one hand, on the difference between x_n and x_{n+h}, h ≥ 1, similarly to what happens in the INAR(1) model as described by [11], and, on the other hand, on the prediction errors e_n. This situation is illustrated in Fig. 3, where the h-step-ahead predictions, considering two particular sets of parameters and n = 100, are plotted. These plots indicate that the results obtained for the predictions using the classical approach with CML-estimates and the Bayesian methodology are very similar. It must be emphasized that these predictions are close to the limit values given by (3), corresponding to 5.33 for θ = (0.1, 0.6, 1) and to 12 for θ = (0.7, 0.2, 1). Figures 4 and 5 represent, respectively, the mean amplitudes of the prediction intervals or HPD predictive intervals for the future value, and the frequencies with which the simulated X_{n+1} falls inside the prediction interval. We observe that, in general, classical prediction intervals based on CML-estimates present smaller mean amplitudes than their Bayesian counterparts.

Application to Real Data
In this section, we illustrate the modelling procedure with the motivating examples presented in Fig. 1. It should be pointed out that both data sets are asymmetric, with significant overdispersion in the E. coli data, whose empirical mean and variance are 2.3 and 13.03, respectively.
We should check the adequacy of the distributional assumptions of the model. For this purpose we use the nonrandomized version of the PIT histogram, proposed by [2] (see [9] for further model evaluation based on predictive performance). The graphical tools represented in Fig. 6 are the PIT histograms and the mean PIT charts applied to the data sets. From left to right, the PIT histograms are U-shaped and uniform, indicating underdispersed and well-calibrated predictive distributions, respectively. These plots indicate that the probability structure ascribed to the INBL(1,0,1,1) is misspecified for the E. coli data, even though the Pearson residuals exhibit mean 0.0002, variance 0.9999, and no significant serial correlation. Results of the parameter estimates for the meningitis data are presented in Table 1. However, the bilinear component β of the model seems to be very small, which may call into question its relevance in the model.
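For a discrete predictive distribution the nonrandomized PIT of [2] replaces each probability integral transform value by a uniform ramp between P(X_t < x_t | past) and P(X_t ≤ x_t | past). A sketch, where `cdf_pairs` holds those two conditional cdf values per observation (our interface, not the paper's):

```python
def pit_histogram(cdf_pairs, bins=10):
    """Nonrandomized PIT histogram: each observation contributes a uniform
    ramp between u1 = P(X < x_t | past) and u2 = P(X <= x_t | past).
    Bin heights near 1 suggest calibration; a U shape suggests an
    underdispersed predictive distribution."""
    def ramp(u, lo, hi):
        if u <= lo:
            return 0.0
        if u >= hi:
            return 1.0
        return (u - lo) / (hi - lo)
    n = len(cdf_pairs)
    mean_cdf = [sum(ramp(b / bins, lo, hi) for lo, hi in cdf_pairs) / n
                for b in range(bins + 1)]
    return [bins * (mean_cdf[b + 1] - mean_cdf[b]) for b in range(bins)]
```

Feeding in the INBL(1,0,1,1) conditional cdf evaluated at each observed count reproduces the kind of diagnostic shown in Fig. 6.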

Concluding Remarks
In this work classical and Bayesian approaches to time series analysis and forecasting were applied to the INBL(1,0,1,1) model. However, much of the work on INBL processes remains to be done; some issues are still open questions, namely the invertibility conditions and the probabilistic structure of the process. This class of models, due to the cross term, can generate extreme observations and hence is suitable for modelling series of counts exhibiting heavy-tailed phenomena. However, these features increase the difficulty of obtaining good predictions. Throughout this work we have seen that statistical modelling of INBL processes leads to likelihood functions based on convolutions. The difficulty of computing these functions exactly points towards the development of likelihood estimation by saddlepoint approximation, as suggested by [10], and towards the improvement of MCMC algorithms.