2 Principles of using TMB
for MLE
2.1 Setup
In order to run the scripts and example code, you first need to set up TMB by going through the following steps:
- Install Rtools
- Install the R-package
TMB
(Kristensen 2022) by executing the following code in R:
install.packages("TMB")
- (Optional) Setup error debugging in
RStudio
by running the commandTMB:::setupRstudio()
Advanced use is detailed in https://kaskr.github.io/adcomp/_book/Tutorial.html
2.2 Linear regression example
Let \({\boldsymbol x}\) and \({\boldsymbol y}\) denote the predictor and response vector, respectively, both of length \(n\). For a simple linear regression model with intercept \(a\) and slope \(b\), the negative log-likelihood equals \[\begin{equation*} - \log L(a, b, \sigma^2) = - \sum_{i=1}^n \log(\phi(y_i; a + bx_i, \sigma^2)), \end{equation*}\] where \(\phi(\cdot; \mu, \sigma^2)\) corresponds to the density function of the univariate normal distribution with mean \(\mu\) and variance \(\sigma^2\).
The use of TMB
requires the (negative) log-likelihood function to be coded in C++ under a specific template, which is then loaded into R.
The minimization of this function and other post-processing procedures are all carried out in R
.
Therefore, we require two files.
The first file, named code/linreg.cpp, is written in C++ and defines the objective function, i.e. the negative log-likelihood (nll) function of the linear model.
#include <TMB.hpp> //import the TMB template
template<class Type>
<Type>::operator() ()
Type objective_function{
(y); // Data vector y passed from R
DATA_VECTOR(x); // Data vector x passed from R
DATA_VECTOR
(a); // Parameter a passed from R
PARAMETER(b); // Parameter b passed from R
PARAMETER(tsigma); // Parameter sigma (transformed, on log-scale)
PARAMETER// passed from R
// Transform tsigma back to natural scale
= exp(tsigma);
Type sigma
// Declare negative log-likelihood
= - sum(dnorm(y,
Type nll + b * x,
a ,
sigmatrue));
// Necessary for inference on sigma, not only tsigma
(sigma);
ADREPORT
return nll;
}
The second file needed is written in R
and serves for compiling the nll function defined above and carrying out the estimation procedure by numerical optimization of the nll function.
The .R file (shown below) carries out the compilation of the C++ file and minimization of the nll function:
# Loading TMB package
library(TMB)
# Compilation. The compiler returns 0 if the compilation of
# the cpp file was successful
::compile("code/linreg.cpp")
TMB## [1] 0
# Dynamic loading of the compiled cpp file
dyn.load(dynlib("code/linreg"))
# Generate the data for our test sample
set.seed(123)
<- list(y = rnorm(20) + 1:20, x = 1:20)
data <- list(a = 0, b = 0, tsigma = 0)
parameters # Instruct TMB to create the likelihood function
<- MakeADFun(data, parameters, DLL = "linreg",
obj_linreg silent = TRUE)
# Optimization of the objective function with nlminb
<- nlminb(obj_linreg$par, obj_linreg$fn,
mod_linreg $gr,
obj_linreg$he)
obj_linreg$par
mod_linreg## a b tsigma
## 0.31009251 0.98395536 -0.05814649
It is noteworthy that if one estimates multiple models (for example on the same data with different initial values of parameters), then mod_linreg$par
will output the latest optimized parameters.
If one wishes to access the best set of optimized parameters (in terms of likelihood) among the ones that were estimated, then obj_tmb$env$last.par.best
will output these.
Now that the optimization is taken care of, we can display the estimates with standard errors using the sdreport
function.
sdreport(obj_linreg)
## sdreport(.) result
## Estimate Std. Error
## a 0.31009251 0.43829087
## b 0.98395536 0.03658782
## tsigma -0.05814649 0.15811383
## Maximum gradient component: 1.300261e-10
If we use summary
on this object, we also get the variables we have passed to ADREPORT
in the code/linreg.cpp file.
In this example, this is only the residual standard deviation; sigma
.
summary(sdreport(obj_linreg))
## Estimate Std. Error
## a 0.31009251 0.43829087
## b 0.98395536 0.03658782
## tsigma -0.05814649 0.15811383
## sigma 0.94351172 0.14918225
The select
argument restricts the output to variables passed
by ADREPORT(variable_name);
in the cpp file.
As we will see in the Section Wald-type confidence intervals based on the Hessian, this lets us derive confidence intervals for these natural parameters easily.
summary(sdreport(obj_linreg), select = "report")
## Estimate Std. Error
## sigma 0.9435117 0.1491823
Certainly, you would not build a TMB model to fit a linear regression. We can use standard R functions for that. Therefore, we use stats::lm
on the same data and compare the estimates to those obtained by TMB
.
rbind(
"lm" = lm(y ~ x, data = data)$coef, # linear regression using R
"TMB" = mod_linreg$par[1:2] # intercept and slope from TMB fit
)## (Intercept) x
## lm 0.3100925 0.9839554
## TMB 0.3100925 0.9839554
As we can see, the parameter estimates are exactly the same.
2.3 Extending the C++
nll
Sometimes it is useful to write subroutines as a separate function that can be used within your TMB likelihood function.
This can increase readability of your code and reduce the number of lines of code in your main cpp file.
Writing extra files to define functions compatible with TMB
requires some care, as these must follow TMB
’s template.
To illustrate how this works, we add a separate function to the code/linreg.cpp example. The following function does not do anything meaningful, but is just meant to show you how you can add write an additional function and load it into your C++. We start by writing the subroutine function as a separate file called functions/utils_linreg_extended.cpp.
template<class Type>
<Type> function_example(matrix<Type> mat_example) {
matrix
// This function doesn't do anything meaningful
<Type> mat(2, 3);
matrix.setOnes();
mat.row(1) << 5, 5, 5;
mat(0, 2) = mat.row(1).sum();
mat
return mat;
}
template<class Type>
(Type tsigma) {
Type logistic
return 1 / (1 + exp (-tsigma));
}
We then import it into code/linreg_extended.cpp.
#include <TMB.hpp> //import the TMB template
#include "../functions/utils_linreg_extended.cpp"
template<class Type>
<Type>::operator() ()
Type objective_function{
(y); // Data vector y passed from R
DATA_VECTOR(x); // Data vector x passed from R
DATA_VECTOR
(a); // Parameter a passed from R
PARAMETER(b); // Parameter b passed from R
PARAMETER(tsigma); // Parameter sigma (transformed by logistic function)
PARAMETER// constrained on [0, 1]
// passed from R
// Transform tsigma back to natural scale with our external function
= logistic(tsigma);
Type sigma
// Declare negative log-likelihood
= - sum(dnorm(y,
Type nll + b * x,
a ,
sigmatrue));
// Necessary for inference on sigma, not only tsigma
(sigma);
ADREPORT
/* This is a useless example to show how to manipulate matrices
* in C++
* This creates a matrix of 2 rows and 3 columns.
* The Eigen library is used to manipulate vectors, arrays, matrices...
*/
<Type> mat_example(2, 3);
matrix<< 1, 2, 3,
mat_example 4, 5, 6;
.setOnes();
mat_example(1, 2) = sigma;
mat_example<Type> mat_example2 = function_example(mat_example);
matrix
// This lets us retrieve any variables in a nice format
(mat_example);
REPORT(mat_example2);
REPORT
return nll;
}
And eventually we can use it in R, as shown in this minimal example.
# Loading TMB package
library(TMB)
# Compilation. The compiler returns 0 if the compilation of
# the cpp file was successful
::compile("code/linreg_extended.cpp")
TMB## [1] 0
# Dynamic loading of the compiled cpp file
dyn.load(dynlib("code/linreg_extended"))
# Generate the data for our test sample
set.seed(123)
<- 0.6
sigma <- list(y = rnorm(20, sd = sigma) + 1:20, x = 1:20)
data <- log(sigma / (1 - sigma)) # Logit transform
tsigma <- list(a = 0, b = 0, tsigma = 0.1)
parameters # Instruct TMB to create the likelihood function
<- MakeADFun(data, parameters, DLL = "linreg_extended",
obj_linreg silent = TRUE)
# Optimization of the objective function with nlminb
<- nlminb(obj_linreg$par, obj_linreg$fn,
mod_linreg $gr,
obj_linreg$he)
obj_linreg# Objects returned by ADREPORT() in C++
summary(sdreport(obj_linreg), select = "report")
## Estimate Std. Error
## sigma 0.566107 0.08950934
# Object
$report()
obj_linreg## $mat_example
## [,1] [,2] [,3]
## [1,] 1 1 1.0000000
## [2,] 1 1 0.5658614
##
## $mat_example2
## [,1] [,2] [,3]
## [1,] 1 1 15
## [2,] 5 5 5
Note that when you are writing the C++
nll file, compiling the file again may lead to an error.
This is because the dynamic library (the .dll
file) is already loaded, and cannot be overwritten.
To prevent this, it is useful either to manually unload the file via the code
dyn.unload(dynlib("code/linreg_extended"))
or to restart the R
session via the menu Session->Restart R
(shortcut Ctrl+Shift+F10
on Windows).
Note that restarting the session unloads all packages from the session, and will require you to load the TMB
package again.