Transition in HA model: a fast method

I attach an interesting paper by Boar and Midrigan, “Efficient Redistribution”.

In the numerical appendix they explain the method they use for solving for the transition. They use a method from a math paper “Globally convergent AA-I for non-smooth non-expansive fixed-point iterations”. I quote from their paper:

“Computing the transition dynamics requires iterating over a sequence of interest
rates and lump-sum transfers during the transitions. This system is typically solved using
a fixed point method, which is notoriously slow and unreliable. We achieved a significant
speed gain and stability by employing the globally convergent Type-I Anderson acceleration
algorithm in Zhang et al. (2018).”

You can find the code (of the math paper, not Boar and Midrigan’s) here:

Looks worth a look.

1 Like

The toolkit currently does exactly what they describe: “Computing the transition dynamics requires iterating over a sequence of interest rates and lump-sum transfers during the transitions. This system is typically solved using a fixed point method, which is notoriously slow and unreliable.”

I am currently working on overhauling the OLG transition paths (putting in divide-and-conquer) so will definitely try and run some tests using this method.

I certainly agree with them about the notoriously unreliable part! Shooting algorithms regularly misfire for transition paths.

1 Like

For completeness, here is the code for Boar and Midrigan. This is the AER: Insights paper, not the JME one, but the algorithm is the same. The relevant folder is Model/nonlinear/andersen

https://www.openicpsr.org/openicpsr/project/178401/version/V1/view

Great! The Boar and Midrigan model is infinite horizon, but the Anderson acceleration algorithm is a generalization of the shooting method, so it should be useful for OLG transitions as well.

1 Like

Link to the published version:
Zhang et al (2020) - Globally Convergent Type-I Anderson Acceleration for Nonsmooth Fixed-Point Iterations
https://doi.org/10.1137/18M1232772

1 Like

Thought I would write out the main idea from Zhang, O’Donoghue & Boyd (2020), as applied to transition paths.

Zhang et al (2020) want to solve the fixed-point problem:

Find x\in \mathbb{R}^n such that x=f(x)

where f:\mathbb{R}^n \to \mathbb{R}^n is potentially nonsmooth. We assume that f is nonexpansive (in the \ell_2-norm), i.e., ||f(x)-f(y)||_2 \leq ||x-y||_2 for all x,y\in \mathbb{R}^n.

An economics example is a standard heterogeneous agent incomplete markets model where we are solving for the general equilibrium transition path: n=T is the number of time periods of the transition path, and x is the general equilibrium price path (say, the path of interest rates r_1,...,r_T). Evaluating f(x) then involves starting from a price path x and (a) solving the value function problems backwards from T to 1, (b) iterating the agent distribution forwards using the policies found in (a), and (c) computing aggregate variables and evaluating the general eqm conditions to create a ‘new’ price path f(x).

The standard approach to solving transition paths is to use a shooting algorithm. This works by first guessing some initial x^0, and then iterating, creating updated price paths as x^{k+1}=\alpha x^k + (1-\alpha) f(x^k), where \alpha is the ‘damping’ hyperparameter. This shooting algorithm is referred to by Zhang et al. (2020) as the KM (Krasnosel’skii-Mann) algorithm.
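For concreteness, here is a minimal sketch of the damped fixed-point (KM/shooting) update in Python; the mapping f (the backward-forward pass (a)-(c) above) is assumed to be supplied by the user, and the names and defaults are illustrative rather than toolkit code:

```python
import numpy as np

def km_shooting(f, x0, damping=0.9, tol=1e-6, max_iter=500):
    """Damped fixed-point (KM/shooting) iteration: x^{k+1} = a*x^k + (1-a)*f(x^k)."""
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        fx = f(x)                        # one backward-forward pass (a)-(c) over the path
        if np.max(np.abs(fx - x)) < tol:
            return x, k                  # converged: x is (approximately) a fixed point
        x = damping * x + (1 - damping) * fx
    return x, max_iter
```

Setting the damping close to 1 makes small, cautious updates; that is often needed for stability, and is exactly why the method can be so slow.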

If you knew that f(x) was continuously differentiable you might instead update based on \nabla f (the gradient/derivative of f); this would give a gradient-descent style of algorithm. Here, though, f is possibly nonsmooth; the only assumption of Zhang et al. (2020) is that f is nonexpansive.

The AA (Anderson Acceleration) method improves on the KM (shooting) algorithm. Notice that KM only uses the most recent step to update the iteration. Anderson acceleration instead maintains a memory of previous steps and updates the iteration as a linear combination over that memory, with dynamic weights. This idea is summed up by the following prototype algorithm:

Algorithm 2.1: Anderson acceleration prototype (AA)
1: Input: initial point $x_0$, fixed-point mapping $f:\mathbb{R}^n \to \mathbb{R}^n$.
2: for k=0,1,... do
3:    Choose $m_k$ (e.g., $m_k=\min\{m,k\}$ for some integer $m\geq 0$)
4:    Select weights $\alpha^k_j$ based on the last $m_k$ iterations satisfying $\sum_{j=0}^{m_k} \alpha_j^k=1$
5:    $x^{k+1}=\sum_{j=0}^{m_k} \alpha_j^k f(x^{k-m_k+j})$
6: end for

Here k counts the iterations, as before, and m_k is the memory at iteration k; the next iterate is a linear combination of the images under f of the last m_k+1 iterates. Notice that the KM (shooting) algorithm is nested as the m_k=1 case (with fixed weights).

The key, then, is how to choose the weights \alpha_j^k, and Zhang et al. (2020) provide ‘Stabilized Type-I Anderson acceleration’ as their Algorithm 3.1 on page 3180. [See the paper for the full details.]
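For intuition, here is a sketch of plain (unstabilized) Anderson acceleration in Python, using the standard least-squares choice of weights on recent residual differences. To be clear, this is the generic AA idea, not Zhang et al.’s stabilized Type-I Algorithm 3.1, which adds regularization and safeguarding on top of this:

```python
import numpy as np

def anderson_accel(f, x0, m=5, tol=1e-6, max_iter=500):
    """Plain Anderson acceleration (memory m) for the fixed point x = f(x)."""
    x = np.asarray(x0, dtype=float)
    g = f(x) - x                                   # residual g(x) = f(x) - x
    X_hist, G_hist = [x.copy()], [g.copy()]
    for k in range(max_iter):
        if np.max(np.abs(g)) < tol:
            return x, k
        mk = min(m, len(X_hist) - 1)               # memory actually available so far
        if mk == 0:
            x = x + g                              # first step: plain fixed-point update
        else:
            dX = np.column_stack([X_hist[-i] - X_hist[-i-1] for i in range(1, mk + 1)])
            dG = np.column_stack([G_hist[-i] - G_hist[-i-1] for i in range(1, mk + 1)])
            gamma = np.linalg.lstsq(dG, g, rcond=None)[0]  # least-squares weights
            x = x + g - (dX + dG) @ gamma          # Anderson-accelerated update
        g = f(x) - x
        X_hist.append(x.copy()); G_hist.append(g.copy())
        X_hist, G_hist = X_hist[-(m + 1):], G_hist[-(m + 1):]
    return x, max_iter
```

In practice one would also damp and safeguard this (e.g., fall back to the plain KM step whenever the accelerated step makes the residual much worse), which is essentially what the stabilized Type-I algorithm formalizes, together with a guarantee that doing so still converges globally.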

Zhang et al. (2020) make two contributions here. First, an analytical proof of global convergence for the AA Type-I algorithm (assuming f is nonexpansive, but not requiring smoothness, differentiability, or uniqueness of the fixed point). Second, lots of computational examples showing that their stabilized AA Type-I algorithm outperforms the KM algorithm.

So the general take-away is that anywhere you use a shooting (KM) algorithm, you can probably do better by switching to the stabilized AA Type-I algorithm.


Three comments:
i) note that this applies to any fixed-point problem, so you could use it for all sorts of other problems, including finding a stationary general eqm, or solving an infinite-horizon value function problem (they discuss a bit of this in the paper).
ii) looks like I should try this stabilized Type-I AA algorithm for transition paths, but also for stationary equilibrium.
iii) This setting gives a nice perspective for looking at the SSJ (Sequence-Space Jacobian) method. Essentially, if we assume that f is linear, then \nabla f(x) is independent of x. Then we can find x directly from our initial guess x_0 in a single iteration as x=\frac{f(x_0)-\nabla f(x_0) x_0}{1-\nabla f(x_0)}. This is the SSJ method. The reason the SSJ method is so fast is that you only need one iteration (at the comparatively minuscule cost of computing \nabla f(x_0) once). But you can also see why it is so restrictive, namely the assumption that f(x) is linear.
[\nabla f(x_0) is the “sequence-space Jacobian”, and x_0 is the “sequence”, which in practice is just that we stay in the stationary equilibrium with an epsilon-shock at a given time horizon. The above formula for x in a single iteration can be derived if you think about a function of a scalar: rearrange \nabla f(x_0)=\frac{f(x)-f(x_0)}{x-x_0} using f(x)=x at the fixed point; note that the right-hand side is the slope from x_0 to x, the left-hand side is the derivative, and because the function is assumed to be linear the slope equals the derivative.] [For detail on how to compute the SSJ itself, see ABRS2021. Note the linearity assumption does a bit extra: as noted by BKM2018, it also means that a model with aggregate shocks can be simulated as just a weighted sum of a transition path (the weights being the sequence of past shocks).]
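As a toy numerical illustration of the ‘one iteration’ property (with an artificial linear f, not an actual model, and a brute-force finite-difference Jacobian; this is not how the SSJ itself is computed, see ABRS2021 for that):

```python
import numpy as np

# Artificial linear map f(x) = A x + b standing in for the transition-path mapping.
T = 50
rng = np.random.default_rng(0)
A = 0.5 * rng.standard_normal((T, T)) / np.sqrt(T)    # spectral radius well below 1
b = rng.standard_normal(T)
f = lambda x: A @ x + b

# Jacobian of f at x0, built column-by-column with an epsilon-shock at each horizon.
x0 = np.zeros(T)
eps = 1e-6
J = np.column_stack([(f(x0 + eps * e) - f(x0)) / eps for e in np.eye(T)])

# Linearity means f(x) = f(x0) + J (x - x0), so x = f(x) is solved in one step:
x_star = np.linalg.solve(np.eye(T) - J, f(x0) - J @ x0)
assert np.max(np.abs(f(x_star) - x_star)) < 1e-6      # fixed point found in one iteration
```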

1 Like

Nice explanation, thanks!

Back thinking about transition paths again :slight_smile:

The steps in an OLG transition are roughly:

  1. for a given path
    i) iterate backwards in t solving (value fn problem) to get policy fn
    ii) iterate forwards in t to get agent dist
    iii) calculate aggregates (or model stats more generally)
    iv) evaluate general eqm eqns
  2. update path (based on results of 1iv)
    [in practice I do iii and iv in same forward iteration loop as ii]
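Schematically, one outer iteration of the above looks roughly like the following sketch; all the `model.*` pieces are placeholders for the model-specific steps, not toolkit functions:

```python
def evaluate_path(price_path, V_final, dist_initial, model):
    """One evaluation of the mapping f: price path in, implied ('new') price path out.

    `model` is a placeholder bundling the model-specific pieces (value-function
    step, distribution step, aggregation, general eqm conditions).
    """
    T = len(price_path)
    # 1i) backward in t: policy functions along the path (parallel over age j inside)
    V, policy = V_final, [None] * T
    for t in reversed(range(T)):
        V, policy[t] = model.value_fn_step(price_path[t], V)
    # 1ii)-1iii) forward in t: agent distribution and aggregates (Tan improvement inside)
    dist, aggregates = dist_initial, [None] * T
    for t in range(T):
        aggregates[t] = model.aggregate(dist, policy[t])
        dist = model.dist_step(dist, policy[t])
    # 1iv) evaluate the general eqm conditions to get the implied price path
    return model.implied_prices(aggregates)

# 2) update the path, e.g. with the linear (damped) rule, or Anderson/Newton/Broyden:
# price_path = phi * price_path + (1 - phi) * evaluate_path(price_path, V_final, dist_initial, model)
```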

Currently toolkit does these by:
1.
i) parallel over j, within t (fastOLG), and where possible also divide-and-conquer
ii) Tan improvement, parallel over j
iii) parallel the evaluations over j
iv) don’t do anything special
2. linear update

Over the past two to three years, there have been big improvements to (i), (ii) and (iii), making them all parallel over j.

Runtimes are currently dominated by two steps: (1i) and 2.

The Heer & Maussner textbook, Section 10.4, describes three ways to do step 2: linear, Newton, and Broyden. “Updating Schemes” on pg 586 (of the 2024 3rd ed.) describes updating a path for \{x_t\}_{t=1}^T as:

Linear: x_t^{i+1}=\phi x_t^i + (1-\phi) x_t^{i*}, where i is the iteration of the path (a counter for what was described above as step 2).
[This is essentially what VFI Toolkit does, but using the evaluation of the general eqm eqn rather than x_t^{i*} (x_t^{i*} can be understood as the value for x_t that we get in step 1iii, but using it would be difficult to do in a generic fashion, e.g. for things like gov. debt, or for prices rather than quantities)]

Newton-Raphson: \mathbf{x}^{i+1}=\mathbf{x}^i -\mathbf{J}(\mathbf{x}^i)^{-1} \mathbf{g}(\mathbf{x}^i)
where \mathbf{J}(\mathbf{x}^i) is the Jacobian of \mathbf{g}(\mathbf{x}) [it is a Jacobian in the sequence space], and \mathbf{g}(\mathbf{x}^i)=\mathbf{x}^i - \mathbf{x}^{i*}. [Note: the t subscript disappears as these are all vectors over the t dimension.]

Computing the Jacobian can be time consuming in larger models, and this leads to the third method,
Broyden: Same as Newton-Raphson, except you do not compute the Jacobian every iteration; instead you update an approximation to it as you go. I don’t spell it out here; see Heer & Maussner Section 15.3.2. Broyden takes more iterations than Newton-Raphson, but can save enough time on computing the Jacobian that it is still faster overall.
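For reference, here is a minimal sketch of Broyden’s method applied to the root-finding problem g(x)=0; the identity matrix as default initial Jacobian guess and the tolerances are illustrative, and the details differ from Heer & Maussner’s Section 15.3.2 treatment:

```python
import numpy as np

def broyden_solve(g, x0, B0=None, tol=1e-6, max_iter=200):
    """Broyden's method for g(x) = 0, where g(x) = x - x* is the path-update residual.

    B approximates the Jacobian J(x); it starts from B0 (identity if omitted) and
    is updated by a rank-one formula instead of being recomputed each iteration.
    """
    x = np.asarray(x0, dtype=float)
    B = np.eye(x.size) if B0 is None else np.asarray(B0, dtype=float).copy()
    gx = g(x)
    for k in range(max_iter):
        if np.max(np.abs(gx)) < tol:
            return x, k
        dx = -np.linalg.solve(B, gx)                        # Newton-type step with approximate Jacobian
        x_new = x + dx
        g_new = g(x_new)
        B += np.outer(g_new - gx - B @ dx, dx) / (dx @ dx)  # Broyden rank-one update
        x, gx = x_new, g_new
    return x, max_iter
```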

So where does this leave VFI Toolkit? Clearly there would be gains to be had from improving Step 2, whether that is by moving to Broyden or to Anderson. I feel, though, like there are even bigger gains to be had from further improving 1i, and I have some ideas on this, so I’m going to focus work there. Improving Step 2 is left for the future :smiley:


Note: the Heer & Maussner textbook considers updating T (the number of periods of the transition path) as part of the algorithm. VFI Toolkit does not (at least presently) offer this option. I feel like, in practice, it is easier to just solve once with a big T, eyeball the results, choose a new T that has a bit of cushion at the end but not too much, and be done.

Note: the Anderson algorithm is a version of the linear update, in which not just the previous iteration but all of the last m_k iterations are used.

Note: when I say ‘parallel over j’ in the Tan improvement, really I just mean adding age j as another dimension of the agent distribution within the iteration over t.

1 Like

Great explanation!

One question: what about transitions in infinite-horizon models? I was thinking about a model à la Guerrieri and Lorenzoni (QJE 2017), for example, or Bettina Brüggemann’s AEJ: Macro paper.

Another question: How do you parallelize over j (I guess j refers to age in an OLG model)? I think the loop over j cannot be parallelized because the iterations are not independent of each other. Instead, the loops over the other state variables (a, z, etc.) can of course be parallelized, either with parfor or implicitly through vectorization on the GPU/CPU.

Yet another question: Listening to an online presentation of the Boar-Midrigan paper, I recall Virgiliu saying that the method by Zhang et al “saved their life” not so much because it is faster but because it is more robust. He was saying that with the naive method (I think the “linear” method) sometimes the transition did not converge, and they had to compute hundreds of transition paths for many different values of taxes in a reliable way.

1 Like

For infinite-horizon models, the transition is just the basic steps. All the steps are the same [1i, ii, iii, iv, and 2], but obviously there is no parallelizing over j as there is no j (j indexes age/the finite horizon). So the only things really left are divide-and-conquer (and using the Tan improvement).

How to parallelize over j? When doing transition paths in an OLG model, you have ages j=1,…,J, and time periods t=1,…,T. You have to backward iterate over time periods t. But you can parallelize over ‘age’ j. I like to think of the transition, as in this image, as moving “diagonally up” (j,t today becomes j+1,t+1 tomorrow),


Notice that for all the ages in period 3, the ‘next period’ about which you have to form expectations is in period 4 (it is also +1 on age, but that is unimportant). So once we have the period 4 solution, we can do all the period 3 ages in parallel, since the expectations term is coming from period 4 (which we already solved), and not from a ‘different age in period 3’. That is, the problems for the different ages in period 3 are all completely independent of each other, and when problems are independent we can parallelize them :gorilla:. [Note: when we are just solving a life-cycle model, and not a transition path, your expectation is about j+1 and so you have to iterate over j (not parallelize), whereas here, while the expectation has a ‘j+1’ in it, it is better understood as being about t+1.]
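To make the dependency structure concrete, here is a sketch of the backward pass; `bellman_update` is a hypothetical user-supplied function that, given all of period t+1’s value functions (one row per age), returns period t’s values and policies for every age at once, which is exactly where the parallelization over j happens:

```python
def backward_transition(V_final, bellman_update, T):
    """Backward pass over the transition path: sequential in t, parallel over age j.

    V_final: array of shape (J, n_states) holding the period-T+1 (final stationary
    eqm) value functions, one row per age j.
    bellman_update(V_next, t): given all of period t+1's value functions, returns
    period t's value functions and policies for every age j at once (age j's
    continuation value is row j+1 of V_next; the oldest age has no continuation).
    """
    V_next = V_final
    policies = [None] * T
    for t in reversed(range(T)):       # must be sequential: period t needs period t+1
        V_next, policies[t] = bellman_update(V_next, t)   # all ages j handled in parallel here
    return policies
```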

Anderson: Yeah, that Anderson approach for Step 2 definitely looks like it should be more robust. It is a cool idea and something I would like to implement at some point, just that I feel like step 1i offers a lot of room for improvement and that I should improve 1i first.

1 Like