Transition in HA model: a fast method

I attach an interesting paper of Boar and Midrigan, “Efficient Redistribution”.

In the numerical appendix they explain the method they use for solving for the transition. They use a method from a math paper “Globally convergent AA-I for non-smooth non-expansive fixed-point iterations”. I quote from their paper:

“Computing the transition dynamics requires iterating over a sequence of interest
rates and lump-sum transfers during the transitions. This system is typically solved using
a fixed point method, which is notoriously slow and unreliable. We achieved a significant
speed gain and stability by employing the globally convergent Type-I Anderson acceleration
algorithm in Zhang et al. (2018).”

You can find the code (of the math paper, not Boar and Midrigan’s) here:

Looks worth it to look at.

1 Like

Toolkit currently does the “Computing the transition dynamics requires iterating over a sequence of interest rates and lump-sum transfers during the transitions. This system is typically solved using a fixed point method, which is notoriously slow and unreliable.”

I am currently working on overhauling the OLG transition paths (putting in divide-and-conquer) so will definitely try and run some tests using this method.

I certainly agree with them about the notoriously unreliable part! Shooting-algorithms regularly misfire for transition paths.

1 Like

For completeness, here is the code for Boar and Midrigan. This is the AER:Insights paper, not the JME but the algorithm is the same. The relevant folder is Model/nonlinear/andersen

https://www.openicpsr.org/openicpsr/project/178401/version/V1/view

Great! The Boar and Midrigan model is infinite horizon, but the Anderson acceleration algorithm is a generalization of the shooting method, so it should be useful for OLG transitions as well.

1 Like

Link to the published version:
Zhang et al (2020) - Globally Convergent Type-I Anderson Acceleration for Nonsmooth Fixed-Point Iterations
https://doi.org/10.1137/18M1232772

1 Like

Thought I would write out the main idea from Zhang, O’Donoghue & Boyd (2020), as applied to transition paths.

Zhang et al (2020) want to solve the fixed-point problem:

Find x\in \mathbb{R}^n such that x=f(x)

where f:\mathbb{R}^n \to \mathbb{R}^n is potentially nonsmooth. We assume that f is nonexpansive (in \mathcal{l}_2-norm), i.e., ||f(x)-f(y)||_2 \leq ||x-y||_2 for all x,y\in \mathbb{R}^n.

An Economics example is a standard heterogeneous agent incomplete markets model where we are solving for the general equilibrium transition path, n=T is the number of time periods of transition path, x is the general equilibrium price path (say the path on interest rates r_1,...,r_T). And f(x) will involve starting from a price path x and then (a) solve the value function problems backwards from T to 1, (b) iterate the agent distribution forwards using policy found in (a), (c) compute aggregate variables and evaluate the general eqm conditions to create a ‘new’ price path f(x).

The standard approach to solving transition paths is to use a shooting algorithm. This works by first guessing some initial x^0, and then iterating on this creating new updated price paths as x^{k+1}=\alpha x^k + (1-\alpha) f(x^k), where \alpha is the ‘damping’ hyperparameter. This shooting algorithm is referred to by Zhang et al. (2020) as the KM (Krasnosel’skii-Mann) algorithm.

If you knew that f(x) was continuously differentiable you might instead update based on \nabla f (the gradient/derivative of f), this would give a gradient descent style of algorithm. Here though, f is possibly nonsmooth, the only assumption of Zhang et al. (2020) is that f is nonexpansive.

The AA (Anderson Acceleration) method is based on improving on the KM (shooting) algorithm. Notice that KM only uses the most recent step to update the iteration. Anderson acceleration instead maintains a memory of previous steps and updates the iteration as a linear combination of the memory with dynamic weights. This concept is summed up by the following prototype algorithm,

Algorithm 2.1: Anderson acceleration prototype (AA)
1: Input: initial point $x_0$, fixed-point mapping $f:\mathbb{R}^n \to \mathbb{R}^n$.
2: for k=0,1,... do
3:    Choose $m_k$ (e.g., $m_k=min{m,k}, for some integer $m\geq0$)
4:    Select weights $\alpha^k_j$ based on the last $m_k$ iterations satisfying $\sum_{j=0}^{m_k} \alpha_j^k=1$
5:    $x^{k+1}=\sum_{j=0}^{m_k} \alpha_j^k f(x^{k-m_k+j})$
6: end for

Here k is counting the iterations, as before, and now we have m_k which is the memory in iteration k, and the next iterate is a linear combination of the images of the last m_k+1 iterates under the mapping f. Notice that the KM (shooting) algorithm is nested as m_k=1.

The key then is how to choose the weights \alpha_j^k and Zhang et al (2020) provide ‘Stabilised Type-I Anderson acceleration’ as their algorithm 3.1 on their page 3180. [See the paper for the full details.]

Zhang et al. (2020) make two contributions around this. First is an analytical proof of global convergence for the AA Type-I algorithm (assuming f is nonexpansive, but not requiring any smoothness, nor differentiability, nor uniqueness of the fixed-point). Second are lot’s of computational examples showing that their stabilized AA Type-I algorithm outperforms the KM algorithm.

So the general take-away is that anywhere you use shooting (KM) algorithms, you can probably do better by switching to the stabilized AA Type-I algorithm.


Three comments:
i) note that this is for any fixed-point, and so you could use it for all sorts of other problems, including to find a stationary general eqm, or to find the solution to an infinite-horizon value function problem (they discuss a bit of this in the paper).
ii) looks like I should try this stabilized type-I AA algorithm for transition paths, but also for stationary equilbrium
iii) This setting gives a nice perspective for looking at the SSJ (Sequence-Space-Jacobian) method. Essentially, if we assume that f is linear, then \nabla f(x) is independent of x. Then we can just find x based on our initial guess of x_0 directly in a single iteration as x=\frac{f(x_0)-\nabla f(x_0) x_0}{1-\nabla f(x_0)}. This is the SSJ method. The reason the SSJ method is so fast is because you only need one iteration (at the compartively miniscule cost of computing \nabla f(x_0) once). But you can also see why it is so restrictive, namely the assumption that f(x) is linear.
[\nabla f(x_0) is the “sequence-space jacobian”, and x_0 is the “sequence” which in practice is just that we stay in the stationary equilibrium with an epsilon-shock at a given time horizon. The above formula for x in a single iteration can be derived if you think about a function of a scalar, you get it from rearranging \nabla f(x_0)=\frac{f(x)-f(x_0)}{x-x_0}; note that the right-side of this is the slope from x_0 to x, the left-side is the derivative, and because the function is assumed to be linear the slope will equal the derivative.] [For detail on how to compute the SSJ itself, see ABRS2021. Note the linearity assumption does a bit extra, because as noted by BKM2018 it also means that a model with aggregate shocks can be simulated as just a weighted sum of a transition path (weights being the sequence of past shocks).]

1 Like

Nice explanarion, thanks!

Back thinking about transition paths again :slight_smile:

The steps in an OLG transition are roughly:

  1. for a given path
    i) iterate backwards in t solving (value fn problem) to get policy fn
    ii) iterate forwards in t to get agent dist
    iii) calculate aggregates (or model stats more generally)
    iv) evaluate general eqm eqns
  2. update path (based on results of 1iv)
    [in practice I do iii and iv in same forward iteration loop as ii]

Currently toolkit does these by:
1.
i) parallel over j, within t (fastOLG) and if we can also divide-and-conquer
ii) tan improvement, parallel over j
iii) parallel the evaluations over j
iv) don’t do anything special
2. linear update

Over past two-three years, there have been big improvements to (i), (ii) and (iii), making them all parallel over j.

Runtimes currently are largely about two steps, (1i) and 2.

Heer & Mausner textbook, Section 10.4 describe three ways to do step 2: linear, Newton & Broyden. “Updating Schemes” on pg 586 (of 2024 3rd ed.) describes updating a path for \{x_t\}_{t=1}^T as:

Linear: x_t^{i+1}=\phi x_t^i + (1-\phi) x_t^{i*}, where i is the iteration of the path (a counter for what was described above as step 2).
[This is essentially what VFI Toolkit does, but using the evaluation of the general eqm eqn, rather than x_t^{i*} (x_t^{i*} can be understood as value for x_t that we get in step 1iii, but would be difficult to do in a generic fashion, e.g. for things like gov. debt, or for prices rather than quantities)]

Newton-Raphson: \mathbf{x}^{i+1}=\mathbf{x}^i -\mathbf{J(x^i)}^{-1} \mathbf{g(x)}
where \mathbf{J(x^i)} is the Jacobian of \mathbf{g(x)} [it is the state-space jacobian], and \mathbf{g(x)}=\mathbf{x}^i - \mathbf{x}^{i*}. [note, the t subscript disappears as these are all vectors over the t dimension]

Computing the Jacobian can be time consuming in larger models, and these leads to the third method,
Broyden: Same as Newton-Raphson, except do not compute Jacobian every iteration, instead use an approximation for the update. I don’t spell it out here, see Heer & Mausner Section 15.3.2. Broyden takes more iterations than the Newton-Raphson, but can save enough time in computing the Jacobian that it is still faster.

So where does this leave VFI Toolkit? Clearly there would be gains to be had from improving Step 2, whether is is about moving to Broyden, or to Anderson. I feel though like there are even bigger gains to be had from further improving 1i, and I have some ideas on this so I’m going to focus work here. Improving Step 2 is left for the future :smiley:


Note: Heer & Mausner textbook consider updating T (the number of periods of transition path) as part of algorithm. VFI Toolkit does not (at least presently) offer this option. I feel like in practice, is easier to just solve once with big T, eyeball results, choose a new T that has a big of cushion on end but not to much, done.

Note: Anderson algorithm is a version of linear, in which not just the previous iteration, but all of the last m_k iterations are being used.

Note: I say parallel over j in Tan improvement, really I just mean add age j as another dimension of the agent distribution within the iteration over t.

1 Like

Great explanation!

One question: what about transitions in infinite-horizon models? I was thinking about a model a la Guerrieri and Lorenzoni (QJE 2017), for example, or Bettina Bruggeman AEJ:Macro.

Another question: How do you parallelize over j (I guess j refers to age in an OLG model)? I think the loop over j cannot be parallelize because the iterations are not independent from each other. Instead, the loops over the other state variables, (a, z, etc) can be parallelized of course, either with parfor or implicitly through vectorization on the gpu/cpu

Yet another question: Listening to an online presentation of the Boar-Midrigan paper, I recall Virgiliu saying that the method by Zhang et al “saved their life” not so much because it is faster but because it is more robust. He was saying that with the naive method (I think the “linear” method) sometimes the transition did not converge, and they had to compute hundreds of transition paths for many different values of taxes in a reliable way.

1 Like

Infinite-horizon models, the transition is just the basics. All the steps are the same [1 i,ii,iii,iv, and 2], but obviously no parallelizing over j as there is no j (j indexes age/finite-horizon). So the only thing really left is the divide-and-conquer (and using Tan improvement).

How to parallelize over j? When doing transition paths in an OLG model, you have ages j=1,…J, and time periods t=1,…,T. You have to backward iterate over time periods t. But you can parallelize over ‘age’ j. I like to think of the transition like in this image, as being moving “diagonally up” (j,t today becomes j+1,t+1 tomorrow),


Notice that for all the ages in period 3, the ‘next period’ about which you have to form expectations is in period 4 (it is also +1 on age, but that is unimportant). So once we have the period 4 solution, we can do all the period 3 ages in parallel, since the expectations term is coming from period 4 (which we already solved), and not from a ‘different age in period 3’. That is, the problems for the different ages in period 3 are all completely independent from each other, and when problems are independent we can parallelize them :gorilla:. [Note when we are just solving a life-cycle model, and not a transition path, your expectation is about the j+1 and so you have to iterate j (not parallelize), whereas here while the expectation has a ‘j+1’ in it, the expectation is better understood as being about t+1.]

Anderson: Yeah, that Anderson approach for Step 2 definitely looks like it should be more robust. It is a cool idea and something I would like to implement at some point, just that I feel like step 1i offers a lot of room for improvement and that I should improve 1i first.

2 Likes

I was thinking about this exact problem for my model, created the above diagram in my mind last night and this morning, and having found it I now wonder: is it actually implemented?

To wit: I have a few age-dependent statistics (kappa_j and s_j, as well as carservices_j, which models that cars are most valuable when agents are of an age when they have kids that need to be driven around). I also have time-dependent statistics like mew_j (survival age mass over time) and energy_cpi, which diverges from natural (and thus, to the model, invisible) inflation.

When I compute Policy_final, I get the optimal policy choices for agents at the end of time, which I believe. But I could compute Policy_init in one of two ways. I could compute it naively at the beginning of time, where the value of energy_cpi is zero. In that case, agents make their optimal choices with knowledge about how things will shake out over their lives (kappa_j shaping their income, s_j shaping their bequest strategies, and carservices_j informing their car-buying decisions). But at the start of time, they have no foresight into how energy inflation will affect them. Under the assumptions I’m making, an agent born at the start of time will see enormous energy inflation over time, and would therefore be wise to make investments that make them as energy-independent as they can be as soon as possible. But they cannot see that when we calculate V_init and Policy_init in the normal way. But they absolutely could if we calculated their trajectory along the diagonal red lines. Which we can do by casting energy_cpi as an age-like parameter and solving that VFI problem. Let’s call that policy solution Policy_diag.

So I’m thinking about how Policy_init and Policy_diag might differ for that agent of Age 1. And then I’m thinking how we might compose the square not by making (time) columns of (age) rows (iterating backward from T_final to T_init), but rather laying down the diagonal stripes of agents moving through their own age-time pathways, and then normalizing the agent distribution in the final row/column shapes to match up with our normal demographics and statistics.

Since this chart is already drawn, I suspect there are some magic options to do this that I just haven’t found. If so, please reveal them!

If I understand what you are saying I think the answer is that when you solve a transition path nothing about where you begin from in period 1 has to be any kind of stationary general eqm. All that period 1 requires is an initial agent distribution. [You can see this if you look at the inputs to the transition path commands in VFI Toolkit, as the only thing you use is the initial agent dist.]

In practice, people often solve an initial stationary general eqm, and use the agent distribution associated with this as the initial agent distribution for their transition path. But there is no need to do so.

The main alternative would be to build an initial agent dist based on real-world data on the distribution of households/firms/etc.

[I understand you as saying that Policy_init from an initial stationary eqm will differ hugely from PolicyPath(:,:,1) the policy in the first period of the transition. It will. This is fine. Depending on what you are doing with the model you may not have any interest in Policy_init anyway, and the sole use of it was to create an initial agent dist. My impression of your post is you are in a situtation where Policy_init is just something you use to build an initial agent dist, and you can then just throw Policy_init away as it is not useful for anything.]

PS. Internally the ‘fastOLG’ agent dist path is done entirely without age-weights, which are only added back in after the path is complete, which sounds like what you are saying. Actually if you use the main transition path commands the entire agent dist path is stored without age-weights, and then the age-weights are put back in whenever model stats are being computed.

If I understand what you are saying I think the answer is that when you solve a transition path nothing about where you begin from in period 1 has to be any kind of stationary general eqm.

Correct! The GE derives from the agents’ perfect foresight.

In practice, people often solve an initial stationary general eqm, and use the agent distribution associated with this as the initial agent distribution for their transition path. But there is no need to do so.

Yes! The insight of the red lines is that we can do VFI^2 to get a more optimal transition. We still have GEs to solve along each point of the transition (which we can do by full solution or by interpolation as we do getting L-infinity norm below 1). But in this case our agents are not trusting the implied linear interpolations for far-away GEs to iteratively find optimal policy; rather, they can use those interpolations to get a read-out from VFI to then find (in whatever manner) the transitional GE.

[I understand you as saying that Policy_init from an initial stationary eqm will differ hugely from PolicyPath(:,:,1) the policy in the first period of the transition. It will. This is fine. Depending on what you are doing with the model you may not have any interest in Policy_init anyway, and the sole use of it was to create an initial agent dist.

This is either the answer or an answer to my question that I need to digest. What I really want to make peace with is that when a young agent buys a petrol car because it is cheap, and then upgrades later to an EV because they can afford it, did they really know how expensive the petrol car would become due to fossil fuel inflation when they bought it? I.e., did the buyer of the petrol car know at what point they’d be making the switch (which they would if they had the VFI following the red line path), or did the agent wake up later in life, check the cost of fuel at that time in the transition and declare “I’m selling this piece of junk before it puts me in the poor house!”

If the current transition code really gives the equivalent of following the VFI logic, then grand (and I need to internalize that). If we can get better transition decisions by building the rectangle using diagonal lines, then Kia kaha tātou (let’s do it, team!).

Agents all use the diagonal line (this is their ‘personal expectations path’ if you will).

If you don’t want agents to see the future coming you need to use ‘multiple reveal’ transition paths (which in the limit becomes ‘repeated transition paths’ with a new path revealed each period.

That’s good news: I can worry less about what the toolkit does and worry more about my own understanding of its capabilities. :wink:

1 Like