I just overhauled divide-and-conquer, so it is a bit better now.
In a model with just aprime and a, nothing changes: between two points on a, the second layer considers all aprime in the range aprimelower to aprimeupper.
The difference is in how other variables (d,z,e) are handled. I will explain based on d, but z and e are handled analogously.
If you have d, then the bounds aprimelower(d) and aprimeupper(d) depend on d. On a CPU this is trivial: you just loop over d and use aprimelower(d) and aprimeupper(d) conditional on that d. But on a GPU this loop would be far too slow.
Previously, the code searched for aprime over the range min_d[aprimelower(d)] to max_d[aprimeupper(d)], which is obviously rather wasteful. Now the code searches for aprime over aprimelower(d)+[0:1:max_d[aprimeupper(d)-aprimelower(d)]]. So the number of aprime points being considered is now max_d[aprimeupper(d)-aprimelower(d)], where it used to be max_d[aprimeupper(d)]-min_d[aprimelower(d)].
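To make the old and new range sizes concrete, here is a small NumPy sketch (the toolkit itself is MATLAB; the per-d bounds below are made-up illustrative numbers, and I count ranges inclusively, so each size gets a +1):

```python
import numpy as np

# Hypothetical per-d bounds on the aprime search window (grid indices).
aprimelower = np.array([3, 5, 8])    # aprimelower(d) for d = 0, 1, 2
aprimeupper = np.array([9, 12, 14])  # aprimeupper(d)

# Old approach: one shared range for every d, from the min over d of the
# lower bound to the max over d of the upper bound.
old_n = aprimeupper.max() - aprimelower.min() + 1

# New approach: each d starts at its own aprimelower(d); the shared
# dimension only needs to span the widest per-d window.
width = (aprimeupper - aprimelower).max() + 1

# For a given d, the candidate aprime indices are aprimelower(d) + 0..width-1:
candidates_d0 = aprimelower[0] + np.arange(width)

print(old_n, width)  # 12 vs 8 here: same matrix shape for every d, fewer points
```

The point is that the matrix dimension over aprime is the same for every d (so it stays vectorized on GPU), but its size shrinks from the union of all the windows to just the widest single window.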
The runtime improvements are in the 10%-40% range, depending on the problem (mostly nearer 20%).
I also tried padding the aprime being considered with a lot of NaN (the idea was to use a range for aprime that differed for each d, and then make the matrix a uniform size by filling out the range with NaN, so that the size of that dimension of the matrix did not depend on d). But max() with lots of NaN and 'omitnan' seems to have the same runtime as plain max() without the NaN, so I didn't end up bothering.
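For anyone curious, this is the abandoned padding idea in a NumPy sketch (np.nanmax plays the role of MATLAB's max(...,'omitnan'); the candidate values are made up):

```python
import numpy as np

# Each d has a different number of candidate aprime values, so pad the
# short rows with NaN to get a rectangular matrix whose width is the same
# for every d.
values_by_d = [np.array([1.0, 4.0, 2.0]),  # 3 candidates for d = 0
               np.array([3.0, 0.5])]       # 2 candidates for d = 1
width = max(len(v) for v in values_by_d)
padded = np.full((len(values_by_d), width), np.nan)
for d, v in enumerate(values_by_d):
    padded[d, :len(v)] = v

# nanmax ignores the NaN padding, so the row-wise maximum equals the
# maximum over each d's true candidate set.
print(np.nanmax(padded, axis=1))  # [4. 3.]
```

It works, but since the NaN-ignoring max was no faster than a plain max over the fixed-offset range above, the simpler approach won.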
Anyway, from the user's perspective, the only difference is that it is now 10%-40% faster, depending on your problem. Nothing the user needs to do changes.
I also added that if you use vfoptions.verbose=1 together with vfoptions.divideandconquer=1, it will print a message telling you that suitably setting vfoptions.level1n can speed up your code. I expect most users won't want to play with this, but power users will, so printing it only when verbose is requested seemed a good compromise.