Thought I would write out the main idea from Zhang, O’Donoghue & Boyd (2020), as applied to transition paths.
Zhang et al. (2020) want to solve the fixed-point problem:
Find x\in \mathbb{R}^n such that x=f(x)
where f:\mathbb{R}^n \to \mathbb{R}^n is potentially nonsmooth. We assume that f is nonexpansive (in the \ell_2-norm), i.e., ||f(x)-f(y)||_2 \leq ||x-y||_2 for all x,y\in \mathbb{R}^n.
An Economics example is a standard heterogeneous agent incomplete markets model where we are solving for the general equilibrium transition path: n=T is the number of time periods of the transition path, and x is the general equilibrium price path (say the path of interest rates r_1,...,r_T). And f(x) will involve starting from a price path x and then (a) solving the value function problems backwards from T to 1, (b) iterating the agent distribution forwards using the policies found in (a), and (c) computing aggregate variables and evaluating the general eqm conditions to create a ‘new’ price path f(x).
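Just to fix ideas, here is a minimal sketch (my own illustration, not code from any particular package) of how such a mapping f might be wired up; the helper names solve_backward, iterate_forward and market_clearing_prices are hypothetical placeholders for the model-specific routines in steps (a)-(c).

```python
# Hypothetical sketch of the transition-path fixed-point mapping f.
# solve_backward, iterate_forward, market_clearing_prices are placeholders
# for model-specific routines; initial_dist is the period-1 agent distribution.
def make_f(solve_backward, iterate_forward, market_clearing_prices, initial_dist):
    def f(price_path):
        # (a) solve the value function problems backwards from T to 1, given prices
        policies = solve_backward(price_path)
        # (b) iterate the agent distribution forwards using the policies from (a)
        dists = iterate_forward(initial_dist, policies)
        # (c) aggregate and evaluate the general eqm conditions -> 'new' price path
        return market_clearing_prices(dists, policies)
    return f
```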
The standard approach to solving transition paths is to use a shooting algorithm. This works by first guessing some initial price path x^0, and then iterating to create updated price paths as x^{k+1}=\alpha x^k + (1-\alpha) f(x^k), where \alpha is the ‘damping’ hyperparameter. This shooting algorithm is referred to by Zhang et al. (2020) as the KM (Krasnosel’skii-Mann) algorithm.
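As a toy numerical illustration (my own, not from the paper), the damped KM/shooting loop for a black-box f looks like this; the linear map f(x)=Ax+b with ||A||_2<1 is just a cheap stand-in for the expensive backward/forward model evaluation.

```python
import numpy as np

# toy nonexpansive (in fact contractive) map as a stand-in for the model's f
rng = np.random.default_rng(0)
n = 50
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = 0.95 * Q                       # ||A||_2 = 0.95, so f is nonexpansive
b = rng.standard_normal(n)
f = lambda x: A @ x + b

def km_iterate(f, x0, damping=0.5, tol=1e-10, max_iter=100_000):
    """Shooting / KM update: x^{k+1} = damping * x^k + (1 - damping) * f(x^k)."""
    x = x0
    for k in range(max_iter):
        x_new = damping * x + (1.0 - damping) * f(x)
        if np.linalg.norm(x_new - x) < tol:
            return x_new, k
        x = x_new
    return x, max_iter

x_star, iters = km_iterate(f, np.zeros(n))
print(iters, np.linalg.norm(x_star - f(x_star)))   # residual ~ 0 at convergence
```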
If you knew that f(x) was continuously differentiable you might instead update based on \nabla f (the gradient/derivative of f); this would give a gradient-descent style of algorithm. Here, though, f is possibly nonsmooth; the only assumption of Zhang et al. (2020) is that f is nonexpansive.
The AA (Anderson Acceleration) method is based on improving on the KM (shooting) algorithm. Notice that KM only uses the most recent step to update the iteration. Anderson acceleration instead maintains a memory of previous steps and updates the iteration as a linear combination of the memory with dynamic weights. This concept is summed up by the following prototype algorithm,
Algorithm 2.1: Anderson acceleration prototype (AA)
1: Input: initial point $x_0$, fixed-point mapping $f:\mathbb{R}^n \to \mathbb{R}^n$.
2: for k=0,1,... do
3: Choose $m_k$ (e.g., $m_k=\min\{m,k\}$, for some integer $m\geq 0$)
4: Select weights $\alpha^k_j$ based on the last $m_k$ iterations satisfying $\sum_{j=0}^{m_k} \alpha_j^k=1$
5: $x^{k+1}=\sum_{j=0}^{m_k} \alpha_j^k f(x^{k-m_k+j})$
6: end for
Here k is counting the iterations, as before, and now we have m_k which is the memory in iteration k, and the next iterate is a linear combination of the images of the last m_k+1 iterates under the mapping f. Notice that the plain (undamped) fixed-point iteration x^{k+1}=f(x^k) is nested as the special case m_k=0; the damped KM (shooting) update differs only in that it also mixes in the current iterate x^k itself, not just its image under f.
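The prototype leaves the weight choice open. As an illustration only (this is the classical least-squares ‘Type-II’ Anderson-mixing rule with a small regularization, not the paper’s stabilized Type-I scheme), here is a sketch that picks the weights to minimize the \ell_2-norm of the combined residual f(x)-x subject to the weights summing to one.

```python
import numpy as np

def anderson_iterate(f, x0, m=5, reg=1e-10, tol=1e-10, max_iter=1000):
    """AA prototype with least-squares ('Type-II') weights -- illustration only."""
    xs, fxs = [x0], [f(x0)]
    for k in range(max_iter):
        mk = min(m, k)
        F = np.column_stack(fxs[-(mk + 1):])       # images f(x^{k-m_k+j})
        G = F - np.column_stack(xs[-(mk + 1):])    # residuals f(x) - x
        # weights: minimize ||G @ alpha||_2 subject to sum(alpha) = 1
        M = G.T @ G + reg * np.eye(mk + 1)
        z = np.linalg.solve(M, np.ones(mk + 1))
        alpha = z / z.sum()
        x_new = F @ alpha                          # step 5 of the prototype
        if np.linalg.norm(x_new - xs[-1]) < tol:
            return x_new, k
        xs.append(x_new)
        fxs.append(f(x_new))
    return xs[-1], max_iter

# usage on the same kind of toy nonexpansive map as before
rng = np.random.default_rng(0)
n = 50
A = 0.95 * np.linalg.qr(rng.standard_normal((n, n)))[0]
b = rng.standard_normal(n)
x_star, iters = anderson_iterate(lambda x: A @ x + b, np.zeros(n))
print(iters, np.linalg.norm(x_star - (A @ x_star + b)))   # residual ~ 0
```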
The key then is how to choose the weights \alpha_j^k, and Zhang et al. (2020) provide ‘stabilized Type-I Anderson acceleration’ as their Algorithm 3.1 on their page 3180. [See the paper for the full details.]
Zhang et al. (2020) make two contributions around this. First is an analytical proof of global convergence for the AA Type-I algorithm (assuming f is nonexpansive, but not requiring any smoothness or differentiability, nor uniqueness of the fixed point). Second is a large set of computational examples showing that their stabilized AA Type-I algorithm outperforms the KM algorithm.
So the general take-away is that anywhere you use shooting (KM) algorithms, you can probably do better by switching to the stabilized AA Type-I algorithm.
Three comments:
i) Note that this is for any fixed-point problem, and so you could use it for all sorts of other purposes, including finding a stationary general eqm, or solving an infinite-horizon value function problem (they discuss a bit of this in the paper).
ii) Looks like I should try this stabilized Type-I AA algorithm for transition paths, but also for stationary equilibrium.
iii) This setting gives a nice perspective for looking at the SSJ (Sequence-Space Jacobian) method. Essentially, if we assume that f is linear, then \nabla f(x) is independent of x. Then we can just find x based on our initial guess x_0 directly in a single iteration as x=\frac{f(x_0)-\nabla f(x_0) x_0}{1-\nabla f(x_0)} (written here in scalar form). This is the SSJ method. The reason the SSJ method is so fast is because you only need one iteration (at the comparatively minuscule cost of computing \nabla f(x_0) once). But you can also see why it is so restrictive, namely the assumption that f(x) is linear.
[\nabla f(x_0) is the “sequence-space Jacobian”, and x_0 is the “sequence”, which in practice is just that we stay in the stationary equilibrium with an epsilon-shock at a given time horizon. The above formula for x in a single iteration can be derived if you think about a function of a scalar: rearrange \nabla f(x_0)=\frac{f(x)-f(x_0)}{x-x_0} together with the fixed-point condition f(x)=x; note that the right-side of this is the slope from x_0 to x, the left-side is the derivative, and because the function is assumed to be linear the slope will equal the derivative.] [For detail on how to compute the SSJ itself, see ABRS2021. Note the linearity assumption does a bit extra, because as noted by BKM2018 it also means that a model with aggregate shocks can be simulated as just a weighted sum of a transition path (the weights being the sequence of past shocks).]
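As a toy numerical sketch (my own illustration, not ABRS2021’s implementation): in \mathbb{R}^n the scalar formula becomes the linear solve x=(I-\nabla f(x_0))^{-1}(f(x_0)-\nabla f(x_0)x_0), and when f is exactly linear, one finite-difference Jacobian plus one linear solve lands exactly on the fixed point.

```python
import numpy as np

# Toy sketch of the "linear f" one-shot solve. Here f(x) = A x + b is exactly
# linear, so the finite-difference Jacobian J = grad f(x0) is exact and a
# single linear solve recovers the fixed point x = f(x).
rng = np.random.default_rng(1)
n = 20
A = rng.standard_normal((n, n))
A = 0.5 * A / np.linalg.norm(A, 2)      # spectral norm 0.5, so I - A is invertible
b = rng.standard_normal(n)
f = lambda x: A @ x + b

x0 = np.zeros(n)                         # initial guess ("stay at the stationary eqm")
fx0 = f(x0)
eps = 1e-6
# column-by-column finite-difference Jacobian, the "sequence-space Jacobian" analogue
J = np.column_stack([(f(x0 + eps * e) - fx0) / eps for e in np.eye(n)])

# vector version of the scalar formula: x = (I - J)^{-1} (f(x0) - J x0)
x = np.linalg.solve(np.eye(n) - J, fx0 - J @ x0)
print(np.linalg.norm(x - f(x)))          # ~ 0: fixed point in a single "iteration"
```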