9 GitHub

For reading more easily, we recommend opening the file Data supplements.Rproj with R-Studio, which lets the user have the correct working path set up automatically. Other files can be opened directly by double-clicking on them, or via the Files tab in R-Studio.
Most of the code can be folded/collapsed into sections easily by clicking in the menu Edit->Collapse All Output.
Unfolding can be done by clicking on the arrows to the left of the folded sections, or in the menu Edit->Expand All Output.

9.1 Directory structure

For readability, folders are displayed with a dash at the end. The folder code will be at the section code/.

9.1.1 code/

Files required to compute the main results of the article: acceleration factors and coverage probabilities.

Folder on GitHub: https://github.com/timothee-bacri/HMM_with_TMB/blob/main/code

9.1.2 code/linreg.cpp

Specifies the function computing the negative log-likelihood of a linear regression in C++. Heavily inspired by https://github.com/kaskr/adcomp/blob/master/tmb_examples/linreg.cpp.

File on GitHub: https://github.com/timothee-bacri/HMM_with_TMB/blob/main/code/linreg.cpp
File: https://github.com/timothee-bacri/HMM_with_TMB/raw/main/code/linreg.cpp

9.1.3 code/linreg_extended.cpp

Similar to code/linreg.cpp, with some more complex code added to serve as an example. Used to compute negative log-likelihoods in R with TMB.

File on GitHub: https://github.com/timothee-bacri/HMM_with_TMB/blob/main/code/linreg_extended.cpp
File: https://github.com/timothee-bacri/HMM_with_TMB/raw/main/code/linreg_extended.cpp

9.1.4 code/main.R

Used to run procedures related to timing and comparisons presented in the article. By default, it will load results instead.

The first chunk of code it runs can be found in the file code/setup_parameters.R.
Afterwards, either computations are run and their results are stored in data/, or results are loaded to the current environment. The computations are written in code/poi_hmm_tinn.R, code/poi_hmm_lamb.R, code/poi_hmm_simu1.R, code/poi_hmm_simu2.R, code/poi_hmm_simu3.R, and code/poi_hmm_simu4.R.

NOTE: Our scripts were tested on a workstation with 4 Intel(R) Xeon(R) Gold 6134 processors (3.7 GHz) running under the Linux distribution Ubuntu 18.04.6 LTS (Bionic Beaver) with 384 GB, and took about a week of computing time. More details are available at individual files.

File on GitHub: https://github.com/timothee-bacri/HMM_with_TMB/blob/main/code/main.R
File: https://github.com/timothee-bacri/HMM_with_TMB/raw/main/code/main.R

9.1.5 code/mvnorm_hmm.cpp

Specifies the function computing the negative log-likelihood of a m-state multivariate Gaussian Hidden Markov Model in C++.

File on GitHub: https://github.com/timothee-bacri/HMM_with_TMB/blob/main/code/mvnorm_hmm.cpp
File: https://github.com/timothee-bacri/HMM_with_TMB/raw/main/code/mvnorm_hmm.cpp

9.1.6 code/norm_hmm.cpp

Specifies the function computing the negative log-likelihood of a m-state univariate Gaussian Hidden Markov Model in C++.

File on GitHub: https://github.com/timothee-bacri/HMM_with_TMB/blob/main/code/norm_hmm.cpp
File: https://github.com/timothee-bacri/HMM_with_TMB/raw/main/code/norm_hmm.cpp

9.1.7 code/packages.R

Automatically install/load necessary packages for running code/main.R with all simulations.

File on GitHub: https://github.com/timothee-bacri/HMM_with_TMB/blob/main/code/packages.R
File: https://github.com/timothee-bacri/HMM_with_TMB/raw/main/code/packages.R

9.1.8 code/poi_hmm.cpp

Specifies the function computing the negative log-likelihood of a m-state Poisson Hidden Markov Model in C++.

File on GitHub: https://github.com/timothee-bacri/HMM_with_TMB/blob/main/code/poi_hmm.cpp
File: https://github.com/timothee-bacri/HMM_with_TMB/raw/main/code/poi_hmm.cpp

9.1.9 code/poi_hmm_smoothing.cpp

Specifies the function computing the negative log-likelihood of a m-state Poisson Hidden Markov Model in C++. Additionally, returns smoothing probabilities.

File on GitHub: https://github.com/timothee-bacri/HMM_with_TMB/blob/main/code/poi_hmm_smoothing.cpp
File: https://github.com/timothee-bacri/HMM_with_TMB/raw/main/code/poi_hmm_smoothing.cpp

9.1.10 code/poi_hmm_smoothing_truncated.cpp

Specifies the function computing the negative log-likelihood of a m-state Poisson Hidden Markov Model in C++. Additionally, returns (optionally truncated) smoothing probabilities.

File on GitHub: https://github.com/timothee-bacri/HMM_with_TMB/blob/main/code/poi_hmm_smoothing_truncated.cpp
File: https://github.com/timothee-bacri/HMM_with_TMB/raw/main/code/poi_hmm_smoothing_truncated.cpp

9.1.11 code/poi_hmm_lamb.R

File on GitHub: https://github.com/timothee-bacri/HMM_with_TMB/blob/main/code/poi_hmm_lamb.R
File: https://github.com/timothee-bacri/HMM_with_TMB/raw/main/code/poi_hmm_lamb.R

On our machine, this file took approximately 2h to compute.

9.1.12 code/poi_hmm_simu1.R

File on GitHub: https://github.com/timothee-bacri/HMM_with_TMB/blob/main/code/poi_hmm_simu1.R
File: https://github.com/timothee-bacri/HMM_with_TMB/raw/main/code/poi_hmm_simu1.R

On our machine, this file took approximately 3h30 to compute.

9.1.13 code/poi_hmm_simu2.R

File on GitHub: https://github.com/timothee-bacri/HMM_with_TMB/blob/main/code/poi_hmm_simu2.R
File: https://github.com/timothee-bacri/HMM_with_TMB/raw/main/code/poi_hmm_simu2.R

On our machine, this file took approximately 23h to compute.

9.1.14 code/poi_hmm_simu3.R

Unlike in the other simulations, the coverage probabilities computation code does not ensure that profile CIs converge. This is solved in the fourth simulation with a larger dataset size, and shows that care must be taken when dealing with profile CIs because they can fail to produce results.
File on GitHub: https://github.com/timothee-bacri/HMM_with_TMB/blob/main/code/poi_hmm_simu3.R
File: https://github.com/timothee-bacri/HMM_with_TMB/raw/main/code/poi_hmm_simu3.R

On our machine, this file took approximately 26h30 to compute.

9.1.15 code/poi_hmm_simu4.R

File on GitHub: https://github.com/timothee-bacri/HMM_with_TMB/blob/main/code/poi_hmm_simu3.R
File: https://github.com/timothee-bacri/HMM_with_TMB/raw/main/code/poi_hmm_simu3.R

On our machine, this file took approximately 63h to compute.

9.1.16 code/poi_hmm_tinn.R

Most of their code is common:

  • Parameters and covariates defines parameters and covariates in natural form. The simulation files generate data there.

  • Parameters & covariates for TMB transforms parameters and covariates to their working form.

  • Estimation computes estimates with and without TMB for further comparison. The estimates are stored in the data-frames conf_int_*.

  • Creating variables for the CIs creates necessary variables for to compute confidence intervals (CIs).

  • Benchmarks uses the microbenchmark (Mersmann 2021) package to accurately time the \(TMB_0\) \(TMB_G\) \(TMB_H\) and \(TMB_{GH}\) procedures. Their times are stored in the data-frames estim_benchmarks_df_*

  • Profiling the likelihood uses the TMB package’s tmbprofile function to determine a profile of the likelihood, then determines a CI based on it for the corresponding working parameter. Eventually, CIs are derived when possible for the natural parameters \(\hat{{\boldsymbol\lambda}}\) and \(\hat{{\boldsymbol\Gamma}}\). The CIs are stored in the data-frames conf_int_*.

  • Bootstrap derives a parametric bootstrap CI from the dataset, while checking that the generated data can be estimated without errors, after which we apply our label switching function to ensure that estimates of a parameter are only aggregated with estimates of the same parameter. The CIs are stored in the data-frames conf_int_*.

  • TMB confidence intervals computes CIs based on TMB-derived standard errors. The CIs are stored in the data-frames conf_int_*.

  • Coverage probabilities of the 3 CI methods simulates coverage samples based on true values when available or on estimates otherwise, then derives CIs using the three CI generation methods described above. The coverage probabilities are stored in the data-frames conf_int_*.

  • Fixes uses the label-switching algorithm on the estimates in the conf_int_* variable to ensure they are correctly ordered, and then reorders \(\hat{{\boldsymbol\Gamma}}\) in the conf_int_* variable to a row-wise order instead of the default column-wise order. In short this applies two fixes on the conf_int_* variable to make it easier to read.
    At the end, we reorder the profile likelihood based CIs in case they are not in ascending order.

A unique randomness seed is set in each of these four files before all time-consuming tasks involving some randomness. code/setup_parameters.R defines the random number generator’s version.

code/poi_hmm_tinn.R contains an extra section: Benchmark the same dataset many times to check the benchmark durations has low variance. As written in its title, its role is to time estimation with and without TMB. However, unlike in the section Benchmarks, estimation is done on the same dataset everytime. If everything goes well, the times should have a negligible variance. This allows us to check if normal background activity on the computer (e.g. the user opening a window or moving the mouse) affects estimation time in any noticeable way. If it affects estimation noticeably, then we would have to apply much stricter control on background processes in order to reliably compare estimation speeds with different optimizers. Luckily, a low variance was observed, making the acceleration evidenced in this project significant.

NOTE: to be faster, some code is parallelized:

  • In code/poi_hmm_simu1.R, bootstrapping is parallelized both in Bootstrap (accelerated from 83 seconds per sample to 29 seconds) and in Coverage probabilities of the 3 CI methods (likelihood profiling was slower with parallelization, likely due to loading TMB each time and the procedure being already fairly quick: from 2.8 seconds in total without to 3.5 seconds in total with).
  • In code/poi_hmm_simu2.R, bootstrapping and likelihood profiling are both parallelized in Bootstrap (accelerated from 64 seconds per bootstrap sample to 21), Profiling the likelihood, and in Coverage probabilities of the 3 CI methods.
  • code/poi_hmm_simu3.R and code/poi_hmm_simu4.R benefit from the same parallelization as code/poi_hmm_simu2.R.

Parallelization is handled with the foreach (Revolution Analytics and Weston, n.d.) package and the doParallel (Corporation and Weston 2022) package as a backend.

File on GitHub: https://github.com/timothee-bacri/HMM_with_TMB/blob/main/code/poi_hmm_tinn.R
File: https://github.com/timothee-bacri/HMM_with_TMB/raw/main/code/poi_hmm_tinn.R

9.1.17 code/setup_parameters.R

Sets up multiple variables either to define some global constants or to store useful results for easier maintenance, and compiles code/linreg.cpp and code/poi_hmm.cpp.

File on GitHub: https://github.com/timothee-bacri/HMM_with_TMB/blob/main/code/setup_parameters.R
File: https://github.com/timothee-bacri/HMM_with_TMB/raw/main/code/setup_parameters.R

9.1.18 data/

Files containing both datasets and the computational results. code/main.R stores important results in this folder, and only loads them by default.

Speed results are available after some post-processing that we leave to the reader.

Folder on GitHub: https://github.com/timothee-bacri/HMM_with_TMB/blob/main/data

9.1.19 data/fetal-lamb.RData

Contains lamb: an integer vector of with 240 data from Leroux and Puterman (1992).

File on GitHub: https://github.com/timothee-bacri/HMM_with_TMB/blob/main/data/fetal-lamb.RData
File: https://github.com/timothee-bacri/HMM_with_TMB/raw/main/data/fetal-lamb.RData

9.1.26 data/tinnitus.RData

Tinnitus data collected with the “Track Your Tinnitus” (TYT) mobile application on 87 successive days, provided by the University of Regensburg and European School for Interdisciplinary Tinnitus Research (ESIT), of which a detailed description is presented in Pryss, Reichert, Langguth, et al. (2015) and Pryss, Reichert, Herrmann, et al. (2015).

A plot is available in Figure 4.1.

File on GitHub: https://github.com/timothee-bacri/HMM_with_TMB/blob/main/data/tinnitus.RData
File: https://github.com/timothee-bacri/HMM_with_TMB/raw/main/data/tinnitus.RData

9.1.27 functions/

Utility functions used both in C++ files and in R files.

Folder on GitHub: https://github.com/timothee-bacri/HMM_with_TMB/blob/main/functions

9.1.28 functions/mvnorm_utils.cpp

Utility functions used in Multivariate Gaussian HMM to transform working parameters to their natural form, and to compute the hidden Markov chain’s stationary distribution when needed.

File on GitHub: https://github.com/timothee-bacri/HMM_with_TMB/blob/main/functions/mvnorm_utils.cpp
File: https://github.com/timothee-bacri/HMM_with_TMB/raw/main/functions/mvnorm_utils.cpp

9.1.29 functions/mvnorm_utils.R

Useful functions to estimate multivariate Gaussian HMMs with and without TMB and perform various pre-processing and post-processing.

File on GitHub: https://github.com/timothee-bacri/HMM_with_TMB/blob/main/functions/mvnorm_utils.R
File: https://github.com/timothee-bacri/HMM_with_TMB/raw/main/functions/mvnorm_utils.R

9.1.30 functions/norm_utils.cpp

Utility functions used in Gaussian HMM to transform working parameters to their natural form, and to compute the hidden Markov chain’s stationary distribution when needed.

File on GitHub: https://github.com/timothee-bacri/HMM_with_TMB/blob/main/functions/norm_utils.cpp
File: https://github.com/timothee-bacri/HMM_with_TMB/raw/main/functions/norm_utils.cpp

9.1.31 functions/norm_utils.R

Useful functions to estimate Gaussian HMMs with and without TMB and perform various pre-processing and post-processing.

File on GitHub: https://github.com/timothee-bacri/HMM_with_TMB/blob/main/functions/norm_utils.R
File: https://github.com/timothee-bacri/HMM_with_TMB/raw/main/functions/norm_utils.R

9.1.32 functions/utils.cpp

Utility functions used in code/poi_hmm.cpp to transform working parameters to their natural form, and to compute the hidden Markov chain’s stationary distribution when needed.

File on GitHub: https://github.com/timothee-bacri/HMM_with_TMB/blob/main/functions/utils.cpp
File: https://github.com/timothee-bacri/HMM_with_TMB/raw/main/functions/utils.cpp

9.1.33 functions/utils.R

Useful functions to estimate Poisson HMMs with and without TMB and perform various pre-processing and post-processing.

File on GitHub: https://github.com/timothee-bacri/HMM_with_TMB/blob/main/functions/utils.R
File: https://github.com/timothee-bacri/HMM_with_TMB/raw/main/functions/utils.R

9.1.34 functions/utils.R-explanation.R

Description of the function parameters and return objects defined in functions/utils.R.
The other utility files are adapted to the univariate and multivariate Gaussian setting but otherwise have identical functions.

Note that the article focuses on Poisson HMMs. Hence, some utility functions in R present for the Poisson HMM are not adapted to the Gaussian HMMs.

File on GitHub: https://github.com/timothee-bacri/HMM_with_TMB/blob/main/functions/utils.R-explanation.R
File: https://github.com/timothee-bacri/HMM_with_TMB/raw/main/functions/utils.R-explanation.R