Bivariate models for time series of counts: A comparison study between PBINAR models and dynamic factor models

Abstract The aim of this work is to assess the modeling performance of two bivariate models for time series of counts, within the context of a forest fires analysis in two counties of Portugal. The first model is a periodic bivariate integer-valued autoregressive (PBINAR), easily interpreted due to the PINAR description of each component. The alternative model is a bivariate dynamic factor (BDF) that has a flexible structure, with the dynamics described through the mean value of each component that is a function of latent factors. The results reveal that BDF model exhibits a better ability to capture the dependence structure.


Introduction
While in the last three decades the development of univariate count time series models has been a research topic of active interest Alzaid 1987, 1993;McKenzie 2003;Ferland, Latour, and Oraichi 2006;Fokianos 2012;Davis and Liu 2016), the research activity in modeling multivariate time series of counts is now beginning to grow (Jung, Liesenfeld, and Richard 2011;Pedeli and Karlis 2011;Scotto, Weiß, and Gouveia 2015;Davis, Holan, and Ravishanker 2016). The existing models for multivariate time series of counts can be classified into two main groups, namely the parameterdriven (PD) class in which serial correlation is introduced via some latent dynamic process (MacDonald and Zucchini 1997;Kedem and Fokianos 2002), and the observationdriven (OD) class that uses the information collected from previous observations to induce time correlation (Fokianos, Rahbek, and Tjøstheim 2009;Ferland, Latour, and Oraichi 2006;Khan, Sunecher, and Jowaheer 2016). Models belonging to the OD class include the integer-valued autoregressive moving average (INARMA) models based on thinning operators (e.g., Scotto, Weiß, and Gouveia 2015;Weiß 2018), and the integervalued generalized autoregressive conditional heteroscedastic (INGARCH) models (e.g., Tjøstheim 2012). In the class of INGARCH models, the conditional distribution of observed counts given past outcomes or latent process is usually assumed to come from some well-known discrete distributions, such as Poisson, negative binomial (NB), generalized Poisson (GP), double Poisson (DP) and ConwayMaxwellPoisson (COM-Poisson) distributions (Ferland, Latour, and Oraichi 2006;Zhu 2011a,b;Fokianos, Rahbek, and Tjøstheim 2009;Zhu 2012). The class of univariate PD extends the generalized linear models (GLM) by incorporating into the conditional mean function a latent process which evolves independently of the past observed counts. The autocorrelation as well as over-dispersion are introduced in the model via such process (see Fokianos (2012) and references therein).
Turning now to bivariate and multivariate count time series, Pedeli and Karlis (2011) introduced the BINAR(1) model in which the cross-correlation is induced through the innovation process by assuming either a bivariate Poisson distribution or a bivariate negative binomial distribution. The periodic version of Pedeli and Karlis' model, referred to as PBINAR, was first proposed by Monteiro, Pereira, and Scotto (2015). A full BINAR model with bivariate Poisson innovations was introduced and studied by Pedeli and Karlis (2013). Furthermore, multivariate INGARCH-type models have also been proposed in the literature. Heinen and Rengifo (2007) introduced the so-called multivariate autoregressive conditional double Poisson in which it is assumed that, conditionally on past observations, the means follow an autoregressive vector using copulas to introduce contemporaneous correlation. Liu (2012) proposed a bivariate Poisson integer-valued GARCH (BINGARCH) model which has the capability of modeling the serial dependence between two time series of counts. More recently, Bracher and Held (2017) introduced a class of periodic INGARCH-type models.
Within the PD class, Jorgensen et al. (1999) proposed a state-space multivariate Poisson model while Wedel, Bockenholt, and Wagner (2003) developed a general class of models with analytic factors, for the analysis of truncated multivariate count data. An autoregressive structure for conditional means was proposed by Held, Hohle, and Hofmann (2005) and applied to the multivariate infectious disease surveillance data. Jung, Liesenfeld, and Richard (2011) proposed a model with latent factor for multivariate count time series which generalizes previous works (Jorgensen et al. 1999;Wedel, Bockenholt, and Wagner 2003) and applied it to the number of trades in 5-min intervals for five New York Stock Exchange (NYSE) stocks from two industrial sectors. Serhiyenko, Ravishanker, and Venkatesan (2015) developed zero-inflated Poisson models for multivariate time series of counts. Finally, Ravishanker, Venkatesan, and Hu (2015) studied finite mixtures of multivariate Poisson time series.
The rest of the paper is organized as follows: Section 2 provides models description and the parameter estimation approach used in each model. Section 3 presents the comparative analysis in the context of monthly number of fires modeling with the discussion of the models adequacy. Finally, some concluding remarks are given in Sec. 4.

Models specification
In this section, the PBINAR model and the periodic BDF model, both with period S, are introduced and explained in detail. The PBINAR model exhibits a structure of easy interpretation, allowing cross-correlation only through the bivariate innovation process. Furthermore, it is an over-parameterized model containing 5 Â S parameters. The BDF model presented can be viewed as an alternative model, having the advantage of containing far less parameters.

PBINAR(1) model
The periodic bivariate integer-valued autoregressive of order one, say ðY t Þ; proposed by Monteiro, Pereira, and Scotto (2015) is an extension of the bivariate model introduced by Pedeli and Karlis (2011), assuming periodic time-varying parameters and periodic bivariate sequences of innovations, where Y t admits the representation with / j;t ¼ a j;i ; for t ¼ i þ kT ði ¼ 1; :::; TÞ; j ¼ 1, 2, and k 2 N 0 : Usually, the process ðY t Þ is observed during N years in periods of S seasons. Thus, in order to facilitate models' interpretation, we will denote throughout Y t Y s;n for t ¼ 1; 2; :::; T; n ¼ 1; 2; :::; N and s ¼ 1; 2; :::; S; where n is the year associated with time t and s is the respective season. Note that, when t corresponds to the first season of the year n then the previous time, which corresponds to the season S of the year n -1, will be denoted as season 0 of the year n, i.e., Y 0;n Y S;nÀ1 : Therefore, the model in (1) can be rewritten as where s ¼ 1; :::; S; emphasizing the periodic nature of the model. Within this framework, the (binomial) thinning operator "" is defined as a s;j Y sÀ1;n;j ¼ d X Y sÀ1;n;j m¼1 U m;s;j ; where ðU m;s;j Þ is a periodic sequence of independent Bernoulli random variables with success probability PðU m;s;j ¼ 1Þ ¼ a s;j 2 ð0; 1Þ: Attending to the properties of the binomial thinning operator, each component of the model can be written as Y s;n;j ¼ a s;j Y sÀ1;n;j þ Z s;n;j : j ¼ 1; 2: Moreover, it is assumed that Z s;n ¼ ðZ s;n;1 ; Z i;n;2 Þ is a periodic sequence of independent random vectors with mean E½Z s;n ¼ k s :¼ ½k s;1 k s;2 0 and covariance matrix R s with r 2 j;s ¼ t s;j k s;j ; for t s;j > 0; and r 12;s ¼: u s : Furthermore, Z s;n;j is independent of Y sÀ1;n;j and of a s;j Y sÀ1;n;j : The distributions under consideration for the innovations bivariate process are the bivariate Poisson, Z s;n $ BPoiðk s;1 ; k s;2 ; u s Þ with t s;j ¼ 1; and the bivariate negative binomial Z s;n $ BNegBinðk s;1 ; k s;2 ; b s Þ with t s;j ¼ 1 þ b s k s;j : Further properties of this model can be found in Monteiro, Pereira, and Scotto (2015).
The most common approach to estimate models' parameters is the conditional maximum likelihood (CML). Consider N complete years ðy 1;1 ; y 2;1 ; :::; y S;N Þ and h the vector of unknown parameters. Without loss of generality, it is assumed that Y 0 ¼ y 0 : Note that the transition probabilities takes the form ln p s y s;n jy sÀ1;n À Á À Á : For periodic bivariate Poisson innovations the expression p s ðbjaÞ becomes ; minðk 1;s ; k 2;s Þ; and L :¼ minðb 1 Àm 1 ; b 2 Àm 2 Þ: In the case of periodic bivariate negative binomial innovations case, p s ðbjaÞ is given by

Bivariate dynamic factor model
In this section, the bivariate version of the dynamic factor (BDF) model of Jung, Liesenfeld, and Richard (2011) with the inclusion of a periodic component is presented.
Consider a bivariate vector of counts Y t ¼ ðY t;1 ; Y t;2 Þ observed at time tðt ¼ 1; :::; TÞ; where the dynamics is introduced at the level of the latent factors. Accordingly, counts are assumed to be conditionally independently distributed with an appropriate discrete distribution, with means k t;j considered as latent random variables. Furthermore, it will be assumed that the logarithm of the mean vector k t ¼ ðk t;1 ; k t;2 Þ admits the form where l is an intercept to which the seasonal contribution is added (through S t b) where b ¼ ðb 1 ; :::; b 12 Þ; and S t is a 2 Â 12 design matrix with S t;1 ¼ S t;2 ; and X t is a 3-dimensional vector of latent random factors which are assumed to be independent from each other, and C is the ð2 Â 3Þ loading factors matrix. The choice of the lnðÁÞ function is justified by the fact that it guarantees the positiveness of the mean vector without imposing further restrictions on the parameters included in the linear function. Turning now to the bivariate context, it will be assumed that X t ¼ ðZ t ; W t;1 ; W t;2 Þ; where Z t represents a common latent factor which incorporates temporal correlation and cross-correlation and two specific factors ðW t;1 ; W t;2 Þ related with each component which also incorporate temporal correlation. Furthermore, the matrix of loading factors is assumed to be of the form Serial and cross-correlation in the counts are introduced through the structure of the factors components. Following Jung, Liesenfeld, and Richard (2011), it will be considered that the components of X t follow independent Gaussian AR(1) processes, that is X t ¼ UX tÀ1 þ t with U ¼ diagð/; q 1 ; q 2 Þ and t follows a multivariate Gaussian distribution with zero-mean vector and covariance-matrix R ¼ diagðr 2 ; r 2 1 ; r 2 2 Þ: In order to ensure the stationarity of the factors, it is assumed that j/j < 1; jq j j < 1; for j ¼ 1, 2.
The BDF model is also a state-space model also known as partial observed Markov processes (POMP), being Y t ¼ ðY t;1 ; Y t;2 Þ the observation process and X t ¼ ðZ t ; W t;1 ; W t;2 Þ the state vector at time t. For this type of non-linear and non-Gaussian state-space models, the analytical study of the likelihood function is, in general, intractable since it is necessary to use high-dimension integration. Note that for the current model, by setting Y 1:T ¼ ðY 1 ; :::; Y T Þ and X 0:T ¼ ðX 0 ; :::; X T Þ; being the marginal probability function of Y 1:T obtained by multiple integration as Note that, for f Y t jX t ðy t jx t ; hÞ in (6) we can either consider the Poisson distribution or a more flexible distribution such as the negative binomial distribution, i.e., ; Neg: Bin: : The difficulty in obtaining a closed-form expression for the likelihood led to several developments of simulation-based methods in order to perform likelihood-based inferences (Liesenfeld and Richard 2003;Richard and Zhang 2007;Scharth and Kohn 2016). These methods are based on the Sequential Monte Carlo (SMC) algorithm (Andrieu, Doucet, and Holenstein 2010) also known as particle filter. This procedure consists in extending the model of interest by allowing its (time-invariant) parameters to become artificially random, following a random walk in time ðh t Þ: From an initial h 0 and successively diminishing the parameters artificial random perturbations the algorithm generates a sequence of updated parameter estimates h 1 ; h 2 ; :::; converging to the maximum likelihood estimates, under some regularity conditions on the likelihood surface and on the algorithmic parameters (for further details see Ionides et al. 2011Ionides et al. , 2006. Iterated filtering are implemented in this work using the software package POMP (Partially Observed Markov Processes) described in Ionides (2016, 2017) and available at the R statistical computing environment (R Development Core Team 2008). In order to apply the algorithm, it is necessary to define the number of iterations, the initial magnitude of the parameter perturbations and the correspondent decrease rate from one iteration to the next one, and the number of particles to use in the embedded SMC algorithm. In practice, the convergence of the IF algorithm is assessed via convergence diagnostic plots, since the conditions on the likelihood surface are rather technical (Bret o 2014).

PBINAR and BDF models: a comparative study for the analysis of bivariate time series of fire activity
In this section, monthly records of number of forest fires in O. Bairro and Vagos counties (Aveiro's district), collected from 1985 to 2014, are analyzed through PBINAR and BDF models. The bivariate time series were obtained from the Instituto de Conservação da Natureza e das Florestas (ICNF 2016). Figure 1 displays both time series. Note that both series exhibit a periodic structure in the sample autocorrelation function and in the sample cross-correlation (see Figure 2). The PBINAR (1) model of period 12 is initially fitted to the pair of bivariate time series by considering (i) a periodic bivariate Poisson (BPois.) distribution for the innovation process, and (ii) a bivariate periodic negative binomial (BNBin) as defined in (6). Table 1 resumes the analysis of Pearson residuals for both models. In both cases the sample means of Pearson residuals are close to zero although in the PBINAR model the sample variance differs quite significantly from one. Furthermore, the Ljung-Box test rejects the absence of autocorrelation at different lags. Hence, such model does not accurately captures the dependence structure exhibited by both time series.
In fitting the BDF model with innovations being either Poisson or negative binomially distributed, the IF algorithm was primary applied to two hundred initial candidates of the parameters from which were selected three initial parameter vectors that presented the highest values of the likelihood estimates. This search was computational time-consuming, using parallelization of the cores of a i7-4500U intel core computer with 8 GB of RAM. For these chosen three initial parameter vector values, the IF algorithm was applied with the following algorithmic parameters: 300 iterations, an exponential decay of perturbations of a ¼ 0:70 and 20000 particles. Furthermore, 70,000 particles are used to obtain the point estimates of the log-likelihood 'ðÁÞ; which result    from taking the average of ten likelihood evaluations from which we calculate the Monte Carlo standard error. Parameters standard errors are obtained via profile likelihoods (see Ionides et al. 2017). The convergence of the maximum likelihood estimates is assessed through convergence diagnostic plots (see Figures 3 and 4 for the Poisson case in the appendix). Such plots suggest an appropriate choice of the algorithmic parameters. Moreover, having into account the first search and also Figure 3 the IF method better identifies the parameters associated to the latent factors than the parameters in the l and b vectors, since plausible likelihoods are found with widely differing parameter values. Furthermore, Pearson residuals shows that both sample means and sample variances are close to zero and one, respectively. Additionally, the Pearson residuals did not presented significant autocorrelation (see Table 1).
In order to assess the adequacy of the distributional assumption, the probability integral transform (PIT) histogram for a discrete cumulative predictive distribution proposed by Czado, Gneiting, and Held (2009) is computed ( Figure 5). In PIT histograms approximate 95% confidence intervals, obtained from a standard v 2 goodness-of-fit test (being the null hypothesis that the J ¼ 10 bins of the histogram are drawn from a uniform distribution) were incorporated as in Jung, McCabe, and Tremaayne (2015). Figure 5 also shows cumulative predictive distribution against the uniform distribution.
Note that, for the Poisson distribution the p-values of the v 2 goodness-of-fit test do not reject, at any conventional significance level, the null hypotheses of uniform PIT histograms for both components of the series. On the contrary, these hypotheses are rejected for the negative binomial distribution, leading to conclude that Poisson distributional assumption is more suitable than the negative binomial distribution. From visual inspection of the PIT histograms it can be seen that for the Poisson case, with the exception of the last bin, the bins lie inside of the confidence lines, while in the negative binomial case, only a few bins are within such range due to the U-shape of the PIT histograms. Furthermore, the deviations from the identity function on the plots of the discrete cumulative predictive distribution against u 2 ½0; 1 become obvious in the case of the negative binomial.
Having into account that in most real case situations the negative binomial distribution outperforms the Poisson distribution, the parametric resampling procedure proposed by Tsay (1992) was applied to better understand why the results in this situation point in the opposite direction. To this end, 5000 realizations of each model (Poisson and negative binomial) were simulated being the true (unknown) parameters replaced by their corresponding point estimates obtained by the IF algorithm, in order to assess the overall reproducibility of the fitted models and also to check the autocorrelation and the cross-correlation of the processes. For each time t, the 10% and 90% quantiles of the 5000 realizations were computed to confront with the actual data (see Figure 6). Although data generally lie inside the bounds of the acceptance envelope for both models, the Poisson model exhibits 90% quantile boundaries considerably lower than those for the negative binomial model, which helps to justify the previous results. The resampling procedure was also applied to the autocorrelation and cross-correlation function and both models generally capture the structure of these functions with a slight advantage of the Poisson model. Taking into account that Poisson distributional assumption provides a better fit, in Table 2 are presented the parameters' point estimates and also the log-likelihood estimate (complete model). Note that, point estimates associated with the specific factor of O. Bairro are not significantly different from zero, pointing to a simplification of the model by dropping the factor W t;1 in O.Bairro component.
For the simplified model, the analysis of the convergence of the IF algorithm, the Pearson residuals analysis as well as the Poisson distribution adequacy analysis was also conducted. Parameter point estimates and the estimate of the log-likelihood are presented in Table 2. The log-likelihood estimate is close to the obtained in the complete model with the advantage of incorporating lesser parameters.

Conclusions
In this paper, a comparative study between two different types of bivariate time series models suitable for modeling periodic time series of counts has been presented. The PBINAR model is the periodic counterpart of the observation-driven BINAR model of Pedeli and Karlis (2011) in which the cross-correlation is only introduced through the periodic bivariate innovation process. In terms of estimation, it can be applied standard estimation methods as the conditional maximum likelihood with the application of standard numerical optimization packages. Although has, on the one hand, a large number of parameters to be estimated which makes difficult its generalization to higher order vectors and, on the other hand, the inability, for the considered case, to capture all the existing correlation in the bivariate data under study. The alternative BDF model presents a more flexible structure that incorporates temporal and cross-correlation through the latent factors. This model, being a non-linear and non-Gaussian state space model, poses some challenges in terms of parameter estimation, that are overcome using simulation-based methods that include the IF method used in this work. This model   seems to have captured the dynamics of the underlying process of the bivariate count data under analysis and has the advantage of being easier to extend to vectors of higher orders and also is more parsimonious in terms of the number of parameters in the model. For the monthly number of fires in the counties of O.Bairro and Vagos, the bivariate factor model with Poisson distribution revealed to be adequate since it was able to capture the main features of the bivariate count data. Having into account that the district of Aveiro has 19 counties, we can build a BDF model that describes the phenomenon under study for the entire district using common counties characteristics.
The IF method reveals that the BDF model might have some parameter identifiability issues associated to l and b vectors since there are several different values of these parameters with close values of log-likelihood estimates. Before using BDF model in a forecasting step or extending it to higher dimensions it is important to clarify the existence of this identifiability problem. The comparison with other estimation methods such as efficient importance sampling described in Richard and Zhang (2007) or a Bayesian approach, will allow to clarify this issue. Alternatively, it can also be included additional assumptions, scientifically reasonable, so that the vectors l and b can be better identified or, alternatively, the inclusion of covariates containing periodic features, such as monthly temperature and humidity. This will be a topic for future research.
As a final note we would like to stress that, although in this work the PINAR and the BDF were considered for comparison purposes, such comparison is certainly not limited to these specific families of models, and can be enlarged for other competitors. An instance of that is the bivariate integer-valued GARCH model proposed by Cui and Zhu (2018), with the necessary adaptations to the periodic case, can be an alternative since it has a flexible cross-correlation structure allowing for the presence of negative correlations.