8 Multivariate Gaussian HMM
Fitting multivariate Gaussian HMMs with TMB is a direct generalization of the univariate case from the previous section, so we focus on the main differences.
Most notably, the state-dependent means change from a vector (one mean per state) to a matrix (one mean vector per state), and each state now has its own covariance matrix instead of a single variance.
For the implementation, the main change is that the density function goes from dnorm to dmvnorm (and, for simulations, the random generation function from rnorm to rmvnorm), the multivariate versions coming from the mvtnorm package (Genz et al. 2021).
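For instance, evaluating the density of a single observation changes along these lines (a minimal illustration with arbitrary values, unrelated to the example below):
library(mvtnorm)
# Univariate Gaussian density of one observation
dnorm(0.5, mean = 0, sd = 1)
# Trivariate Gaussian density of one observation
dmvnorm(c(0.5, 0.2, -0.1), mean = rep(0, 3), sigma = diag(3))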
Many of the functions available in functions/norm_utils.R have been adapted to the multivariate Gaussian case, with the changes described above, in functions/mvnorm_utils.R.
As an example, the function to generate random data becomes
# Generate a random sample from a HMM
mvnorm.HMM.generate.sample <- function(ns, mod) {
  mvect <- 1:mod$m
  state <- numeric(ns)
  state[1] <- sample(mvect, 1, prob = mod$delta)
  for (i in 2:ns) {
    state[i] <- sample(mvect, 1, prob = mod$gamma[state[i - 1], ])
  }
  # Number of dimensions of the data
  p <- ncol(mod$mu)
  x <- matrix(NA, nrow = ns, ncol = p)
  for (i in 1:ns) {
    x[i, ] <- rmvnorm(n = 1,
                      mean = mod$mu[state[i], ],
                      sigma = mod$sigma[, , state[i]])
  }
  names(x) <- NULL
  return(list(data = x, state = state))
}
We detail below an example of the estimation of a two-state trivariate Gaussian HMM with TMB.
8.1 Dataset & Parameters
To avoid issues regarding the choice of data, we simulate our own.
Let two multivariate Gaussian distributions \(N({\boldsymbol\mu_1}, {\boldsymbol\Sigma_1})\) and \(N({\boldsymbol\mu_2}, {\boldsymbol\Sigma_2})\) be defined with (rather arbitrary) parameters as follows. \[\begin{equation*} {\boldsymbol\mu_1} = (6, 8, 9) \quad \text{and } {\boldsymbol\mu_2} = (1, 2, 3) \end{equation*}\]
\[\begin{equation*} {\boldsymbol\Sigma_1} = \begin{pmatrix} 1.0 & 0.5 & 0.5\\ 0.5 & 1.0 & 0.5\\ 0.5 & 0.5 & 1.0 \end{pmatrix} \quad \text{ and } {\boldsymbol\Sigma_2} = \begin{pmatrix} 2.0 & 1.5 & 1.5\\ 1.5 & 2.0 & 1.5\\ 1.5 & 1.5 & 2.0 \end{pmatrix}. \end{equation*}\]
Let us consider a stationary Markov chain with the following (also fairly arbitrary) TPM \[\begin{equation*} {\boldsymbol\Gamma} = \begin{pmatrix} 0.95 & 0.05\\ 0.15 & 0.85 \end{pmatrix}. \end{equation*}\]
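Since the chain is stationary, the initial distribution is the stationary distribution of \({\boldsymbol\Gamma}\), which solves \({\boldsymbol\delta}(\mathbf{I} - {\boldsymbol\Gamma} + \mathbf{U}) = \mathbf{1}\) with \(\mathbf{U}\) a matrix of ones (Zucchini et al., 2016); here \({\boldsymbol\delta} = (0.75, 0.25)\). This can be checked in R by solving the same linear system that stat_dist() solves in the TMB code below:
m <- 2
gamma <- matrix(c(0.95, 0.05,
                  0.15, 0.85), byrow = TRUE, nrow = m, ncol = m)
# Solve delta %*% (I - gamma + U) = 1 for the stationary distribution
solve(t(diag(m) - gamma + 1), rep(1, m))
## [1] 0.75 0.25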
We generate 1000 observations with the function above.
8.2 Likelihood function
The negative log-likelihood of the multivariate Gaussian HMM can be coded in TMB as follows
#include <TMB.hpp>
#include "../functions/mvnorm_utils.cpp"

// Likelihood for a multivariate normal hidden Markov model.
template<class Type>
Type objective_function<Type>::operator() ()
{
  // Data
  DATA_MATRIX(x);          // timeseries matrix (n rows * p cols)
  DATA_INTEGER(m);         // Number of states m

  // Parameters
  PARAMETER_MATRIX(tmu);     // m*p conditional mu's (matrix: m rows, p columns)
  PARAMETER_MATRIX(tsigma);  // m*p(p+1)/2 working parameters of covariance matrices (matrix: p(p+1)/2 rows, m columns)
  PARAMETER_VECTOR(tgamma);  // m(m-1) working parameters of TPM (vector of length m*m - m)
  // Uncomment only when using a non-stationary distribution
  //PARAMETER_VECTOR(tdelta);  // m-1 working parameters (vector)

  // Load namespace which contains the multivariate distributions
  using namespace density;

  // Get number of covariates (p)
  int p = x.cols();

  // Transform working parameters to natural parameters:
  matrix<Type> mu = tmu.transpose();  // We need rows (for the emission probability matrix) but TMB only supports easy retrieval of columns
  matrix<Type> gamma = gamma_w2n(m, tgamma);
  array<Type> sigma = sigma_w2n(m, p, tsigma);  // Construct m matrices of size p x p (array: p x p x m)

  // Construct stationary distribution
  vector<Type> delta = stat_dist(m, gamma);
  // If using a non-stationary distribution, use this instead
  //vector<Type> delta = delta_w2n(m, tdelta);

  // Get number of timesteps (n)
  int n = x.rows();

  // Evaluate conditional distribution: Put conditional
  // probabilities of observed x in n times m matrix
  // (one column for each state, one row for each datapoint):
  matrix<Type> emission_probs(n, m);
  matrix<Type> sigma_m(p, p);
  vector<Type> residual_vec(p);
  Type nll = 0;

  // Evaluate and store the conditional distributions column-wise
  bool NA_appears = false;
  for (int m_idx = 0; m_idx < m; m_idx++) {
    MVNORM_t<Type> neg_log_dmvnorm(sigma.col(m_idx).matrix());
    for (int i = 0; i < n; i++) {
      // Replace missing values (NA in R, NaN in C++) with 1
      NA_appears = false;
      for (int p_idx = 0; p_idx < p; p_idx++) {
        if (x(i, p_idx) != x(i, p_idx)) { // f != f returns true if and only if f is NaN.
          NA_appears = true;
        }
      }

      if (NA_appears) {
        emission_probs(i, m_idx) = 1;
      } else {
        // Fill the emission probability matrix
        residual_vec = vector<Type>(x.row(i));
        residual_vec -= vector<Type>(mu.col(m_idx));
        nll = neg_log_dmvnorm(residual_vec);
        // MVNORM_t returns the negative log-likelihood, we only want the likelihood
        emission_probs(i, m_idx) = exp(-nll);
      }
    }
  }

  // Corresponds to (Zucchini et al., 2016, p 333)
  matrix<Type> foo, P;
  Type mllk, sumfoo, lscale;

  foo = (delta * vector<Type>(emission_probs.row(0))).matrix();
  sumfoo = foo.sum();
  lscale = log(sumfoo);
  foo.transposeInPlace();
  foo /= sumfoo;
  for (int i = 2; i <= n; i++) {
    P = emission_probs.row(i - 1);
    foo = ((foo * gamma).array() * P.array()).matrix();
    sumfoo = foo.sum();
    lscale += log(sumfoo);
    foo /= sumfoo;
  }
  mllk = -lscale;

  // Undo the transpose done at the beginning
  mu.transposeInPlace();

  // Use adreport on variables for which we want standard errors
  ADREPORT(mu);
  ADREPORT(sigma);
  ADREPORT(gamma);
  ADREPORT(delta);
  ADREPORT(mllk);

  // Variables we need for local decoding and in a convenient format
  REPORT(mu);
  REPORT(sigma);
  REPORT(gamma);
  REPORT(delta);
  REPORT(n);

  return mllk;
}
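The loop at the end implements the scaled forward algorithm of Zucchini et al. (2016, p. 333) for the HMM likelihood \[\begin{equation*} L_n = {\boldsymbol\delta} \mathbf{P}(x_1) {\boldsymbol\Gamma} \mathbf{P}(x_2) \cdots {\boldsymbol\Gamma} \mathbf{P}(x_n) \mathbf{1}^\top, \quad \text{where } \mathbf{P}(x_t) = \text{diag}\big(p_1(x_t), \ldots, p_m(x_t)\big) \end{equation*}\] holds the state-dependent densities: the forward vector foo is rescaled to sum to one at each step, the logarithms of the scale factors accumulate in lscale, and the function returns the negative log-likelihood \(-\log L_n\).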
The objective function above requires the following utility functions.
// Function transforming working parameters in initial distribution
// to natural parameters
template<class Type>
vector<Type> delta_w2n(int m, vector<Type> tdelta) {

  vector<Type> delta(m);
  vector<Type> foo(m);

  if (m == 1)
    return Type(1);

  // Set first element to one.
  // Fill in the last m - 1 elements with working parameters
  // and take exponential
  foo << Type(1), tdelta.exp();

  // Normalize
  delta = foo / foo.sum();

  return delta;
}
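In other words, delta_w2n maps the unconstrained working vector \((\eta_1, \ldots, \eta_{m-1})\) to the probability simplex through a softmax with the first state as reference category: \[\begin{equation*} {\boldsymbol\delta} = \frac{\left(1, e^{\eta_1}, \ldots, e^{\eta_{m-1}}\right)}{1 + \sum_{j=1}^{m-1} e^{\eta_j}}. \end{equation*}\]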
// Function transforming the working parameters in TPM to
// natural parameters (w2n)
template<class Type>
matrix<Type> gamma_w2n(int m, vector<Type> tgamma) {

  // Construct m x m identity matrix
  matrix<Type> gamma(m, m);
  gamma.setIdentity();

  if (m == 1)
    return gamma;

  // Fill off-diagonal elements with working parameters column-wise:
  int idx = 0;
  for (int i = 0; i < m; i++) {
    for (int j = 0; j < m; j++) {
      if (j != i) {
        // Fill gamma according to mapping and take exponential
        gamma(j, i) = tgamma.exp()(idx);
        idx++;
      }
    }
  }

  // Normalize each row:
  vector<Type> cs = gamma.rowwise().sum();
  for (int i = 0; i < m; i++) gamma.row(i) /= cs[i];
return gamma;
}
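For example, with \(m = 2\) the two working parameters \(\eta_1\) and \(\eta_2\) fill the off-diagonal entries column-wise, and row normalization yields \[\begin{equation*} {\boldsymbol\Gamma} = \begin{pmatrix} \frac{1}{1 + e^{\eta_2}} & \frac{e^{\eta_2}}{1 + e^{\eta_2}}\\ \frac{e^{\eta_1}}{1 + e^{\eta_1}} & \frac{1}{1 + e^{\eta_1}} \end{pmatrix}, \end{equation*}\] so each off-diagonal probability is a logistic transform of one working parameter.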
// Function transforming the working parameters in the covariance matrices to
// natural parameters (w2n) (from the upper triangle, stored column-wise as a vector, to
// the original covariance matrix)
template<class Type>
array<Type> sigma_w2n(int m, int p, matrix<Type> tsigma) {

  // Construct m matrices of size p x p (p x p x m array)
  array<Type> sigma_array(p, p, m);

  matrix<Type> temporary_matrix(p, p);
  matrix<Type> sigma_matrix(p, p);

  // Fill upper triangular elements with working parameters column-wise
  int idx;
  for (int m_idx = 0; m_idx < m; m_idx++) {
    idx = 0;
    for (int j = 0; j < p; j++) { // tsigma is filled column-wise from sigma
      for (int i = 0; i < p; i++) {
        if (i == j) {
          // Diagonal: fill sigma_array according to mapping,
          // undo the log transformation
          sigma_array(i, j, m_idx) = exp(tsigma(idx, m_idx));
          idx++;
        } else if (i < j) {
          // Upper triangle: fill sigma_array according to mapping
          sigma_array(i, j, m_idx) = tsigma(idx, m_idx);
          idx++;
        } else {
          // Lower triangle: set to zero so the factor is upper triangular
          sigma_array(i, j, m_idx) = 0;
        }
      }
    }
    // Undo the Cholesky transformation
    sigma_matrix = sigma_array.col(m_idx).matrix(); // col() selects the last index, row() doesn't exist
    temporary_matrix = sigma_matrix.transpose() * sigma_matrix;
    sigma_array.col(m_idx) = temporary_matrix.array();
  }
return sigma_array;
}
// Function computing the stationary distribution of a Markov chain
template<class Type>
vector<Type> stat_dist(int m, matrix<Type> gamma) {

  // Construct stationary distribution
  matrix<Type> I(m, m);
  matrix<Type> U(m, m);
  matrix<Type> row1vec(1, m);
  U = U.setOnes();
  I = I.setIdentity();
  row1vec.setOnes();
  matrix<Type> A = I - gamma + U;
  matrix<Type> Ainv = A.inverse();
  matrix<Type> deltamat = row1vec * Ainv;
  vector<Type> delta = deltamat.row(0);
return delta;
}
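These w2n transformations have inverse counterparts on the R side: mvnorm.HMM.pn2pw, used in the estimation below, maps natural to working parameters before optimization. A minimal sketch of what such a function can look like, assuming the same log-Cholesky parametrization as sigma_w2n and the same column-wise orderings (the version shipped in functions/mvnorm_utils.R may differ in details); tdelta is omitted here because the stationary distribution is computed from the TPM:
# Sketch of a natural-to-working transformation (inverse of the w2n
# functions above); the version in functions/mvnorm_utils.R may differ.
mvnorm.HMM.pn2pw <- function(m, mu, sigma, gamma) {
  # Means are unconstrained: no transformation needed
  tmu <- mu
  # Covariances: upper-triangular Cholesky factor U with sigma = t(U) %*% U,
  # log-transformed diagonal, upper triangle stored column-wise,
  # one column per state (inverse of sigma_w2n)
  tsigma <- apply(sigma, 3, function(s) {
    U <- chol(s)
    diag(U) <- log(diag(U))
    U[upper.tri(U, diag = TRUE)]
  })
  # TPM: log-ratio of off-diagonal to diagonal entries, extracted
  # column-wise (inverse of gamma_w2n)
  tgamma <- log(gamma / diag(gamma))[!diag(m)]
  list(tmu = tmu, tsigma = tsigma, tgamma = tgamma)
}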
8.3 Estimation
The estimation procedure is similar to that of the previous section.
# Load packages and utility functions
source("code/packages.R")
source("functions/mvnorm_utils.R")

# Compile and load the objective function for TMB
TMB::compile("code/mvnorm_hmm.cpp")
## [1] 0
dyn.load(dynlib("code/mvnorm_hmm"))

set.seed(123)

# Two states
m <- 2
# Trivariate Normal distribution
p <- 3

# One row of means per state, one column per dimension of the data
mu <- matrix(c(6, 8, 9,
               1, 2, 3), nrow = m, ncol = p, byrow = TRUE)
# Two covariance matrices
sigma1 <- matrix(c(1.0, 0.5, 0.5,
                   0.5, 1.0, 0.5,
                   0.5, 0.5, 1.0), nrow = p, ncol = p, byrow = TRUE)
sigma2 <- matrix(c(2.0, 1.5, 1.5,
                   1.5, 2.0, 1.5,
                   1.5, 1.5, 2.0), nrow = p, ncol = p, byrow = TRUE)
# We store them in an array for convenience, making them
# easily distinguishable at a glance when displayed
sigma <- array(c(sigma1, sigma2), dim = c(p, p, m))

# TPM
gamma <- matrix(c(0.95, 0.05,
                  0.15, 0.85), byrow = TRUE, nrow = m, ncol = m)

# Similarly to the Poisson case, we can generate data.
# Here, we generate a trivariate Gaussian sample of size 1000.
mod <- list(m = m,
            mu = mu,
            sigma = sigma,
            gamma = gamma)
TMBdata <- mvnorm.HMM.generate.sample(1000, mod)

# Parameters & covariates for TMB
# TMB requires a list
TMB_data <- list(x = TMBdata$data,
                 m = m)
# TMB requires the parameters to be either vectors, matrices, or arrays.
# For simplicity, we pass the parameters as a list of vectors and matrices.
pw <- mvnorm.HMM.pn2pw(m = m, mu = mu, sigma = sigma, gamma = gamma)

# Estimation
obj_tmb <- MakeADFun(data = TMB_data,
                     parameters = pw,
                     DLL = "mvnorm_hmm",
                     silent = TRUE)
nlminb(start = obj_tmb$par,
       objective = obj_tmb$fn,
       gradient = obj_tmb$gr,
       hessian = obj_tmb$he)
## $par
##          tmu          tmu          tmu          tmu          tmu          tmu       tsigma
##  5.979331886  1.105791929  7.974662650  2.016963286  9.034963083  3.069442564 -0.005278076
##       tsigma       tsigma       tsigma       tsigma       tsigma       tsigma       tsigma
##  0.470025716 -0.161247055  0.455150038  0.312012980 -0.218999355  0.306537527  1.029377442
##       tsigma       tsigma       tsigma       tsigma       tgamma       tgamma
## -0.038685928  0.977222514  0.440952661 -0.128765538 -1.852673335 -2.813032886
##
## $objective
## [1] 4295.975
##
## $convergence
## [1] 0
##
## $iterations
## [1] 5
##
## $evaluations
## function gradient
## 6 5
##
## $message
## [1] "both X-convergence and relative convergence (5)"
8.4 Results
We show the estimates and their standard errors. Note that the matrix-valued results (for \({\boldsymbol\mu}\), \({\boldsymbol\delta}\), \({\boldsymbol\Gamma}\), and \({\boldsymbol\Sigma}\)) are displayed column-wise. This can be verified by comparing them with the more readable view of the estimates shown afterwards.
summary(sdreport(obj_tmb), "report")
## Estimate Std. Error
## mu 5.979332e+00 3.756368e-02
## mu 1.105792e+00 7.957753e-02
## mu 7.974663e+00 3.676989e-02
## mu 2.016963e+00 8.256826e-02
## mu 9.034963e+00 3.689365e-02
## mu 3.069443e+00 8.116536e-02
## sigma 9.894994e-01 5.285072e-02
## sigma 4.675514e-01 4.068925e-02
## sigma 4.527541e-01 4.048862e-02
## sigma 4.675514e-01 4.068925e-02
## sigma 9.452644e-01 5.066894e-02
## sigma 4.794808e-01 4.049568e-02
## sigma 4.527541e-01 4.048862e-02
## sigma 4.794808e-01 4.049568e-02
## sigma 9.498403e-01 5.124277e-02
## sigma 1.846100e+00 1.559273e-01
## sigma 1.398628e+00 1.430482e-01
## sigma 1.327764e+00 1.380429e-01
## sigma 1.398628e+00 1.430482e-01
## sigma 1.985164e+00 1.679836e-01
## sigma 1.430151e+00 1.455383e-01
## sigma 1.327764e+00 1.380429e-01
## sigma 1.430151e+00 1.455383e-01
## sigma 1.922361e+00 1.619135e-01
## gamma 9.433760e-01 8.707558e-03
## gamma 1.355593e-01 1.981831e-02
## gamma 5.662395e-02 8.707558e-03
## gamma 8.644407e-01 1.981831e-02
## delta 7.053648e-01 4.398108e-02
## delta 2.946352e-01 4.398108e-02
## mllk 4.295975e+03 3.549571e-13
# More readable estimates
report <- obj_tmb$report()
report
## $mu
## [,1] [,2] [,3]
## [1,] 5.979332 7.974663 9.034963
## [2,] 1.105792 2.016963 3.069443
##
## $sigma
## , , 1
##
## [,1] [,2] [,3]
## [1,] 0.9894994 0.4675514 0.4527541
## [2,] 0.4675514 0.9452644 0.4794808
## [3,] 0.4527541 0.4794808 0.9498403
##
## , , 2
##
## [,1] [,2] [,3]
## [1,] 1.846100 1.398628 1.327764
## [2,] 1.398628 1.985164 1.430151
## [3,] 1.327764 1.430151 1.922361
##
##
## $delta
## [1] 0.7055608 0.2944392
##
## $n
## [1] 1000
##
## $gamma
## [,1] [,2]
## [1,] 0.9434294 0.05657056
## [2,] 0.1355593 0.86444068
##
## $mllk
## [1] 4303.249
8.5 Bias
Since the data were simulated, the true parameters are known and the validity of the estimates can be checked. To do this, we look at the relative differences between the estimates and the true values.
(report$mu - mu) / mu
##              [,1]         [,2]        [,3]
## [1,] -0.003444686 -0.003167169 0.003884787
## [2,] 0.105791929 0.008481643 0.023147521
(report$sigma - sigma) / sigma
## , , 1
##
## [,1] [,2] [,3]
## [1,] -0.01050063 -0.06489716 -0.09449190
## [2,] -0.06489716 -0.05473563 -0.04103843
## [3,] -0.09449190 -0.04103843 -0.05015973
##
## , , 2
##
## [,1] [,2] [,3]
## [1,] -0.07695023 -0.067581365 -0.11482373
## [2,] -0.06758137 -0.007418231 -0.04656629
## [3,] -0.11482373 -0.046566292 -0.03881965
(report$gamma - gamma) / gamma
##              [,1]       [,2]
## [1,] -0.006916378 0.13141118
## [2,] -0.096271185 0.01698903
Relative differences are at most on the order of 10%, and the largest ones occur for parameters with small true values. This is acceptable.
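As a compact summary, the maximum absolute relative difference of each parameter group can be computed directly from these matrices:
# Maximum absolute relative difference for each parameter group
sapply(list(mu = (report$mu - mu) / mu,
            sigma = (report$sigma - sigma) / sigma,
            gamma = (report$gamma - gamma) / gamma),
       function(d) max(abs(d)))
##        mu     sigma     gamma
## 0.1057919 0.1148237 0.1314112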