6 Application to different data sets

6.1 TYT data

We detail here the code used to estimate a two-state Poisson HMM based on the tinnitus dataset available in data/tinnitus.RData.

Set a seed for randomness, and load files

set.seed(123)
library(TMB)
TMB::compile("code/poi_hmm.cpp")
## [1] 0
dyn.load(dynlib("code/poi_hmm"))
source("functions/utils.R")
load("data/tinnitus.RData")

Set initial parameters

# Parameters and covariates
m <- 2
gamma <- matrix(c(0.8, 0.2,
                  0.2, 0.8), nrow = m, ncol = m)
lambda <- seq(quantile(tinn_data, 0.1), quantile(tinn_data, 0.9), length.out = m)
delta <- stat.dist(gamma)

Transform them into working parameters

working_params <- pois.HMM.pn2pw(m, lambda, gamma)
TMB_data <- list(x = tinn_data, m = m)

Estimate the parameters via a function

# Build the TMB object
obj_tmb <- MakeADFun(TMB_data, working_params,
                     DLL = "poi_hmm", silent = TRUE)

# Optimize
mod_tmb <- nlminb(start = obj_tmb$par,
                  objective = obj_tmb$fn,
                  gradient = obj_tmb$gr,
                  hessian = obj_tmb$he)

# Check convergence
mod_tmb$convergence == 0
## [1] TRUE

# Results
summary(sdreport(obj_tmb), "report")
##          Estimate Std. Error
## lambda 1.63641100 0.27758296
## lambda 5.53309576 0.31876147
## gamma  0.94980209 0.04374676
## gamma  0.02592204 0.02088688
## gamma  0.05019791 0.04374676
## gamma  0.97407796 0.02088688
## delta  0.34054200 0.23056437
## delta  0.65945800 0.23056437

For the code used to generate coverage probabilities and acceleration results, please take a look at code/poi_hmm_tinn.R.

6.2 Lamb data

We detail here the code used to estimate a two-state Poisson HMM based on the lamb dataset available in data/fetal-lamb.RData

Set a seed for randomness, and load files

set.seed(123)
library(TMB)
TMB::compile("code/poi_hmm.cpp")
## [1] 0
dyn.load(dynlib("code/poi_hmm"))
source("functions/utils.R")
load("data/fetal-lamb.RData")
lamb_data <- lamb
rm(lamb)

Set initial parameters

# Parameters and covariates
m <- 2
gamma <- matrix(c(0.8, 0.2,
                  0.2, 0.8), nrow = m, ncol = m)
lambda <- seq(0.3, 4, length.out = m)
delta <- stat.dist(gamma)

Transform them into working parameters

working_params <- pois.HMM.pn2pw(m, lambda, gamma)
TMB_data <- list(x = lamb_data, m = m)

Estimate the parameters via a function

# Build the TMB object
obj_tmb <- MakeADFun(TMB_data, working_params,
                     DLL = "poi_hmm", silent = TRUE)

# Optimize
mod_tmb <- nlminb(start = obj_tmb$par,
                  objective = obj_tmb$fn,
                  gradient = obj_tmb$gr,
                  hessian = obj_tmb$he)

# Check convergence
mod_tmb$convergence == 0
## [1] TRUE

# Results
summary(sdreport(obj_tmb), "report")
##          Estimate Std. Error
## lambda 0.25636541 0.04016445
## lambda 3.11475432 1.02131182
## gamma  0.98872128 0.01063571
## gamma  0.31033853 0.18468648
## gamma  0.01127872 0.01063571
## gamma  0.68966147 0.18468648
## delta  0.96493123 0.03181445
## delta  0.03506877 0.03181445

For the code used to generate coverage probabilities and acceleration results, please take a look at code/poi_hmm_lamb.R.

On a minor note, when comparing our estimation results to those reported by Leroux and Puterman (1992), some non-negligible differences can be noted. The reasons for this are difficult to determine, but some likely explanations are given in the following. First, differences in the parameter estimates may result e.g. from the optimizing algorithms used and related setting (e.g. convergence criterion, number of steps, optimization routines used in 1992, etc). Moreover, Leroux and Puterman (1992) seem to base their calculations on an altered likelihood, which is reduced by removing the constant term \(\sum_{i=1}^{T} \log(x_{i}!)\) from the log-likelihood. This modification may also possess an impact on the behavior of the optimization algorithm, as e.g. relative convergence criteria and step size could be affected.

The altered likelihood becomes apparent when computing it on a one-state HMM. A one-state Poisson HMM is a Poisson regression model, for which the log-likelihood has the expression \[\begin{align*} l(\lambda) &= \log \left(\prod_{i=1}^{T} \frac{\lambda^{x_i} e^{-\lambda}}{x_{i}!} \right)\\ &= - T \lambda + \log(\lambda) \left( \sum_{i=1}^{T} x_i \right) - \sum_{i=1}^{T} \log(x_{i}!). \end{align*}\] The authors find a ML estimate \(\lambda = 0.3583\) and a log-likelihood of -174.26. In contrast, calculating the log-likelihood explicitly shows a different result.

x <- lamb_data
# We use n instead of T in R code
n <- length(x)
l <- 0.3583
- n * l + log(l) * sum(x) - sum(log(factorial(x)))
## [1] -201.0436

The log-likelihood is different, but when the constant \(- \sum_{i=1}^{T} \log(x_{i}!)\) is removed, it matches our result.

- n * l + log(l) * sum(x)
## [1] -174.2611

6.3 Simulated data

We detail here the code used to simulate two datasets from two-states Poisson HMMs, one of size 2000 and one of size 5000. Then, using the same procedure as above, we estimate a model using different initial parameters.

Set initial parameters (data size and HMM parameters)

DATA_SIZE_SIMU <- 2000
m <- 2
# Generate parameters
lambda <- seq(10, 14, length.out = m)
# Create the transition probability matrix with 0.8 on its diagonal
gamma <- matrix(c(0.8, 0.2,
                  0.2, 0.8), nrow = m, ncol = m)
delta <- stat.dist(gamma)

The stat.dist function computes the stationary distribution.

Generate data with one of the functions defined in Generating data

simu_data <- pois.HMM.generate.sample(ns = DATA_SIZE_SIMU,
                                      mod = list(m = m,
                                                 lambda = lambda,
                                                 gamma = gamma,
                                                 delta = delta))$data

Set initial parameters

# Parameters and covariates
m <- 2
gamma <- matrix(c(0.6, 0.4,
                  0.4, 0.6), nrow = m, ncol = m)
lambda <- seq(quantile(simu_data, 0.1), quantile(simu_data, 0.9), length.out = m)
delta <- stat.dist(gamma)
# Display Poisson means
lambda
## [1]  7 17

Transform them into working parameters

working_params <- pois.HMM.pn2pw(m, lambda, gamma)
TMB_data <- list(x = simu_data, m = m)

Estimate the parameters via a function

# Build the TMB object
obj_tmb <- MakeADFun(TMB_data, working_params,
                     DLL = "poi_hmm", silent = TRUE)

# Optimize
mod_tmb <- nlminb(start = obj_tmb$par,
                  objective = obj_tmb$fn,
                  gradient = obj_tmb$gr,
                  hessian = obj_tmb$he)

# Check convergence
mod_tmb$convergence == 0
## [1] TRUE

# Results
summary(sdreport(obj_tmb), "report")
##          Estimate Std. Error
## lambda  9.7345726 0.23644285
## lambda 13.8098676 0.22273984
## gamma   0.8068975 0.03331223
## gamma   0.1564875 0.02671332
## gamma   0.1931025 0.03331223
## gamma   0.8435125 0.02671332
## delta   0.4476315 0.05201103
## delta   0.5523685 0.05201103

For the code used to generate coverage probabilities and acceleration results, please take a look at code/poi_hmm_simu1.R, code/poi_hmm_simu2.R, code/poi_hmm_simu3.R and code/poi_hmm_simu4.R.