The default optimization algorithm for GMM estimation has been changed to fminalgo=8, which uses Matlab's lsqnonlin(). This should be faster and more reliable. The rest of this post is just background on why.
It turns out there are dedicated algorithms for minimizing 'least-squares residual' problems. One of the most popular is 'Levenberg-Marquardt', which is an option in Matlab's lsqnonlin(), although the default is 'trust-region-reflective'.
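(In lsqnonlin you choose the algorithm via optimoptions; a minimal sketch, with f and x0 as defined below:

options = optimoptions('lsqnonlin','Algorithm','levenberg-marquardt');
x = lsqnonlin(f,x0,[],[],options);  % the empty [] are the unused lb,ub bounds
)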
A least squares residual problem is,
\min_{x \in \mathbb{R}^m} \; ||f(x)||_2^2
for a function f:\mathbb{R}^m \to \mathbb{R}^n. Or rewriting the same thing,
\min_{x \in \mathbb{R}^m} f_1(x)^2+f_2(x)^2+\dots+f_n(x)^2
To implement this in Matlab you just define f:\mathbb{R}^m \to \mathbb{R}^n as a function and then call lsqnonlin(f,x0). Note that this means f outputs a column vector of length n.
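As a quick illustration of the mechanics, here is a small curve-fitting sketch (the data, model, and starting values are all made up for illustration):

t = (0:9)';                            % sample points
y = 2*exp(-0.5*t) + 0.05*randn(10,1);  % noisy observations of a*exp(b*t)
f = @(x) x(1)*exp(x(2)*t) - y;         % residuals, f maps R^2 to R^10, a column vector
x0 = [1; -1];                          % initial guess for [a; b]
xhat = lsqnonlin(f,x0);                % estimates of [a; b]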
Our GMM problem for a life-cycle model is,
\min_{\theta \in \mathbb{R}^m} [g_d-g_m(\theta)]'W[g_d-g_m(\theta)]
where g_d are the data moments, g_m(\theta) are the model moments, and g_d-g_m(\theta)=0 is the moment condition (so the moment condition is simply the difference between the data and model moments).
So if we ignore the weighting matrix for a moment, we can just set f(\theta)=g_d-g_m(\theta) as a vector-valued function, and then call lsqnonlin(f,theta0).
If the weighting matrix were just a vector of weights, w (that is, W=diag(w)), we could set f(\theta)=sqrt(w).*[g_d-g_m(\theta)] as a vector-valued function, so that squaring the residuals recovers the weights, and again call lsqnonlin(f,theta0). A sketch of both cases follows.
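A sketch of both cases, assuming g_d (an n-by-1 vector of data moments), a function g_m returning the n-by-1 model moments, the weights w, and an initial guess theta0 are already defined:

f = @(theta) g_d - g_m(theta);             % unweighted case
theta_hat = lsqnonlin(f,theta0);

f = @(theta) sqrt(w).*(g_d - g_m(theta));  % vector of weights w, i.e. W=diag(w)
theta_hat = lsqnonlin(f,theta0);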
But the weighting matrix is a matrix. Intuitively, we want something like W^{1/2} inside our least-squares residual function (the function is then squared, which gives us back W). The trick is to use a Cholesky decomposition of W (note that this requires W to be positive definite, but this is anyway standard for GMM weighting matrices). So to use a W matrix in lsqnonlin, we work with the upper Cholesky factor
W_{upperchol} = chol(W, 'upper');
and then define the function f(\theta)=W_{upperchol}*[g_d-g_m(\theta)] and call lsqnonlin(f,theta0).
[Note: our least-squares residuals problem is f(\theta)'f(\theta) = (W_{upperchol}[g_d-g_m(\theta)])'(W_{upperchol}[g_d-g_m(\theta)]) = [g_d-g_m(\theta)]'W_{upperchol}'W_{upperchol}[g_d-g_m(\theta)] = [g_d-g_m(\theta)]'W[g_d-g_m(\theta)], which is our GMM objective (the last step uses W_{upperchol}'W_{upperchol}=W, which is just the definition of the Cholesky decomposition). So we now have our GMM objective set up as a least-squares residuals problem.]
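Putting the pieces together, a sketch of the full setup (again assuming g_d, g_m, a positive definite n-by-n matrix W, and theta0 are already defined):

W_upperchol = chol(W,'upper');                 % so W_upperchol'*W_upperchol = W
f = @(theta) W_upperchol*(g_d - g_m(theta));   % n-by-1 residual vector
theta_hat = lsqnonlin(f,theta0);               % f(theta)'*f(theta) is the GMM objective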
That is it. We have rewritten our GMM problem as a least-squares residuals problem, so we can use dedicated least-squares algorithms like those in Matlab's lsqnonlin(). These algorithms appear to be meaningfully faster and more stable on this kind of problem.
(Not obvious from the notation, but it is of course important that f(\theta)=W_{upperchol}*[g_d-g_m(\theta)] is just a column vector of length n, which is what lsqnonlin expects.)