6 Application to different data sets
6.1 TYT data
We detail here the code used to estimate a two-state Poisson HMM based on the tinnitus dataset available in data/tinnitus.RData.
- Set a seed for randomness, and load files
set.seed(123)
library(TMB)
::compile("code/poi_hmm.cpp")
TMB## [1] 0
dyn.load(dynlib("code/poi_hmm"))
source("functions/utils.R")
load("data/tinnitus.RData")
- Set initial parameters
# Parameters and covariates
<- 2
m <- matrix(c(0.8, 0.2,
gamma 0.2, 0.8), nrow = m, ncol = m)
<- seq(quantile(tinn_data, 0.1), quantile(tinn_data, 0.9), length.out = m)
lambda <- stat.dist(gamma) delta
- Transform them into working parameters
<- pois.HMM.pn2pw(m, lambda, gamma)
working_params <- list(x = tinn_data, m = m) TMB_data
- Estimate the parameters via a function
# Build the TMB object
<- MakeADFun(TMB_data, working_params,
obj_tmb DLL = "poi_hmm", silent = TRUE)
# Optimize
<- nlminb(start = obj_tmb$par,
mod_tmb objective = obj_tmb$fn,
gradient = obj_tmb$gr,
hessian = obj_tmb$he)
# Check convergence
$convergence == 0
mod_tmb## [1] TRUE
# Results
summary(sdreport(obj_tmb), "report")
## Estimate Std. Error
## lambda 1.63641100 0.27758296
## lambda 5.53309576 0.31876147
## gamma 0.94980209 0.04374676
## gamma 0.02592204 0.02088688
## gamma 0.05019791 0.04374676
## gamma 0.97407796 0.02088688
## delta 0.34054200 0.23056437
## delta 0.65945800 0.23056437
For the code used to generate coverage probabilities and acceleration results, please take a look at code/poi_hmm_tinn.R.
6.2 Lamb data
We detail here the code used to estimate a two-state Poisson HMM based on the lamb dataset available in data/fetal-lamb.RData
- Set a seed for randomness, and load files
set.seed(123)
library(TMB)
::compile("code/poi_hmm.cpp")
TMB## [1] 0
dyn.load(dynlib("code/poi_hmm"))
source("functions/utils.R")
load("data/fetal-lamb.RData")
<- lamb
lamb_data rm(lamb)
- Set initial parameters
# Parameters and covariates
<- 2
m <- matrix(c(0.8, 0.2,
gamma 0.2, 0.8), nrow = m, ncol = m)
<- seq(0.3, 4, length.out = m)
lambda <- stat.dist(gamma) delta
- Transform them into working parameters
<- pois.HMM.pn2pw(m, lambda, gamma)
working_params <- list(x = lamb_data, m = m) TMB_data
- Estimate the parameters via a function
# Build the TMB object
<- MakeADFun(TMB_data, working_params,
obj_tmb DLL = "poi_hmm", silent = TRUE)
# Optimize
<- nlminb(start = obj_tmb$par,
mod_tmb objective = obj_tmb$fn,
gradient = obj_tmb$gr,
hessian = obj_tmb$he)
# Check convergence
$convergence == 0
mod_tmb## [1] TRUE
# Results
summary(sdreport(obj_tmb), "report")
## Estimate Std. Error
## lambda 0.25636541 0.04016445
## lambda 3.11475432 1.02131182
## gamma 0.98872128 0.01063571
## gamma 0.31033853 0.18468648
## gamma 0.01127872 0.01063571
## gamma 0.68966147 0.18468648
## delta 0.96493123 0.03181445
## delta 0.03506877 0.03181445
For the code used to generate coverage probabilities and acceleration results, please take a look at code/poi_hmm_lamb.R.
On a minor note, when comparing our estimation results to those reported by Leroux and Puterman (1992), some non-negligible differences can be noted. The reasons for this are difficult to determine, but some likely explanations are given in the following. First, differences in the parameter estimates may result e.g. from the optimizing algorithms used and related setting (e.g. convergence criterion, number of steps, optimization routines used in 1992, etc). Moreover, Leroux and Puterman (1992) seem to base their calculations on an altered likelihood, which is reduced by removing the constant term \(\sum_{i=1}^{T} \log(x_{i}!)\) from the log-likelihood. This modification may also possess an impact on the behavior of the optimization algorithm, as e.g. relative convergence criteria and step size could be affected.
The altered likelihood becomes apparent when computing it on a one-state HMM. A one-state Poisson HMM is a Poisson regression model, for which the log-likelihood has the expression \[\begin{align*} l(\lambda) &= \log \left(\prod_{i=1}^{T} \frac{\lambda^{x_i} e^{-\lambda}}{x_{i}!} \right)\\ &= - T \lambda + \log(\lambda) \left( \sum_{i=1}^{T} x_i \right) - \sum_{i=1}^{T} \log(x_{i}!). \end{align*}\] The authors find a ML estimate \(\lambda = 0.3583\) and a log-likelihood of -174.26. In contrast, calculating the log-likelihood explicitly shows a different result.
<- lamb_data
x # We use n instead of T in R code
<- length(x)
n <- 0.3583
l - n * l + log(l) * sum(x) - sum(log(factorial(x)))
## [1] -201.0436
The log-likelihood is different, but when the constant \(- \sum_{i=1}^{T} \log(x_{i}!)\) is removed, it matches our result.
- n * l + log(l) * sum(x)
## [1] -174.2611
6.3 Simulated data
We detail here the code used to simulate two datasets from two-states Poisson HMMs, one of size 2000 and one of size 5000. Then, using the same procedure as above, we estimate a model using different initial parameters.
- Set initial parameters (data size and HMM parameters)
<- 2000
DATA_SIZE_SIMU <- 2
m # Generate parameters
<- seq(10, 14, length.out = m)
lambda # Create the transition probability matrix with 0.8 on its diagonal
<- matrix(c(0.8, 0.2,
gamma 0.2, 0.8), nrow = m, ncol = m)
<- stat.dist(gamma) delta
The stat.dist
function computes the stationary distribution.
- Generate data with one of the functions defined in Generating data
<- pois.HMM.generate.sample(ns = DATA_SIZE_SIMU,
simu_data mod = list(m = m,
lambda = lambda,
gamma = gamma,
delta = delta))$data
- Set initial parameters
# Parameters and covariates
<- 2
m <- matrix(c(0.6, 0.4,
gamma 0.4, 0.6), nrow = m, ncol = m)
<- seq(quantile(simu_data, 0.1), quantile(simu_data, 0.9), length.out = m)
lambda <- stat.dist(gamma)
delta # Display Poisson means
lambda## [1] 7 17
- Transform them into working parameters
<- pois.HMM.pn2pw(m, lambda, gamma)
working_params <- list(x = simu_data, m = m) TMB_data
- Estimate the parameters via a function
# Build the TMB object
<- MakeADFun(TMB_data, working_params,
obj_tmb DLL = "poi_hmm", silent = TRUE)
# Optimize
<- nlminb(start = obj_tmb$par,
mod_tmb objective = obj_tmb$fn,
gradient = obj_tmb$gr,
hessian = obj_tmb$he)
# Check convergence
$convergence == 0
mod_tmb## [1] TRUE
# Results
summary(sdreport(obj_tmb), "report")
## Estimate Std. Error
## lambda 9.7345726 0.23644285
## lambda 13.8098676 0.22273984
## gamma 0.8068975 0.03331223
## gamma 0.1564875 0.02671332
## gamma 0.1931025 0.03331223
## gamma 0.8435125 0.02671332
## delta 0.4476315 0.05201103
## delta 0.5523685 0.05201103
For the code used to generate coverage probabilities and acceleration results, please take a look at code/poi_hmm_simu1.R, code/poi_hmm_simu2.R, code/poi_hmm_simu3.R and code/poi_hmm_simu4.R.