Divide and Conquer (solve one-endogenous-state models faster)

VFI Toolkit can now use a ‘divide and conquer’ algorithm, specifically two-level monotonicity, to solve problems where aprime(a) is a monotone function (i.e., where the next-period endogenous state is weakly increasing in the current-period endogenous state).

This is faster and uses less memory than the default (pure discretization). It is now available in the baseline settings (finite horizon, possibly with decision variables d, exogenous Markov states z, and exogenous i.i.d. states e).

You use it by setting
vfoptions.divideandconquer=1;
vfoptions.level1n=11;
The second of these controls how many points to use in the ‘first level’.

This brief pdf explains what two-level monotonicity does. It allows you to solve value function problems faster and with less memory. It applies to all models where aprime(a) is a weakly increasing function (conditional on any d, z or e), which covers almost all models in economics. Currently it only allows for one endogenous state.

This github repo has a few examples, which differ only in the combination of d, a, z in the model (they are largely copy-pasted from some examples in the Intro to Life-Cycle Models). Each solves the value function twice: once with the default (pure discretization) algorithm, and once with the divide-and-conquer algorithm. You can see that both give the same solution, but divide-and-conquer is faster.

In short, if you set
vfoptions.divideandconquer=1;
vfoptions.level1n=11;
then you can solve problems faster (and use bigger grids, as memory is less of a bottleneck). You will only get the correct solution if your problem is monotone, but almost all economics problems are (and you can always double-check by solving with the default and comparing answers).
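For example, a minimal version of that double-check (just a sketch, assuming the usual setup of grids, return function and parameters is already in place; V1/V2 are arbitrary names):

vfoptions.divideandconquer=0; % default algorithm
[V1,Policy1]=ValueFnIter_Case1_FHorz(n_d,n_a,n_z,N_j,d_grid,a_grid,z_grid,pi_z,ReturnFn,Params,DiscountFactorParamNames,[],vfoptions);
vfoptions.divideandconquer=1; % divide-and-conquer
vfoptions.level1n=11;
[V2,Policy2]=ValueFnIter_Case1_FHorz(n_d,n_a,n_z,N_j,d_grid,a_grid,z_grid,pi_z,ReturnFn,Params,DiscountFactorParamNames,[],vfoptions);
max(abs(V1(:)-V2(:))) % should be zero (or near zero) if the problem is monotone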

This divide-and-conquer algorithm, called ‘two-level monotonicity’, is basically just the ‘binary monotonicity’ idea of Gordon & Qiu (2018), but modified so that it performs well on a GPU [theirs is very serial]. Thanks to Alessandro Di Nola for suggesting GQ2018 to me :smiling_face:


This will be properly documented later this year (2024). For now it is buyer-beware: I ran some tests and everything seems to work, but it is not widely tested. It will be by late this year, once it gets full documentation, but the speed-ups and large grids it enables are very nice, so I wanted to make them available sooner rather than later.


Wow, that’s great! Does it work also for models with semi-exogenous shocks?

Not yet, just the basics for now. Hopefully I can do semi-exogenous shocks in the second half of this year (conceptually I can’t see any issue, it should work fine), but I am about to start teaching so it won’t happen this side of July.


I was playing around with the option divideandconquer (which is really great in terms of speed-up) and I also tried a life-cycle model with two earners (married couple); see the code IntroToLifeCycleModels/LifeCycleModel28.m at main · vfitoolkit/IntroToLifeCycleModels · GitHub

With the default options it works fine (though I have to set vfoptions.lowmemory=0, otherwise I run out of memory). I tried setting

vfoptions.divideandconquer=1;
vfoptions.level1n=11;

but unfortunately the code returns with the following error message:

vfoptions = 

  struct with fields:

                        n_e: [3 3]
                     e_grid: [9×2 gpuArray]
                       pi_e: [9×1 gpuArray]
                  lowmemory: 0
                   paroverz: 0
           divideandconquer: 1
                    level1n: 11
                    verbose: 1
                   parallel: 2
               returnmatrix: 2
            incrementaltype: 0
                polindorval: 1
                 outputkron: 0
    policy_forceintegertype: 0
          exoticpreferences: 'None'
                    dynasty: 0
            experienceasset: 0
           experienceassetu: 0
                 riskyasset: 0
              residualasset: 0
               e_gridvals_J: [9×2×81 gpuArray]
                     pi_e_J: [9×81 gpuArray]

Finite horizon: 80 of 81 
Error using  + 
Arrays have incompatible sizes for this operation.

Error in ValueFnIter_Case1_FHorz_DC1_e_raw (line 201)
            entireRHS_ii=ReturnMatrix_ii+DiscountFactorParamsVec*reshape(entireEV(daprimez(:)),[N_d,1,n_z]);  % Note that entireEV is independent of e

Error in ValueFnIter_Case1_FHorz_DC1 (line 29)
                    [VKron, PolicyKron]=ValueFnIter_Case1_FHorz_DC1_e_raw(n_d,n_a,n_z,  vfoptions.n_e, N_j, d_grid, a_grid, z_gridvals_J, vfoptions.e_gridvals_J, pi_z_J, vfoptions.pi_e_J, ReturnFn, Parameters, DiscountFactorParamNames, ReturnFnParamNames, vfoptions);

Error in ValueFnIter_Case1_FHorz (line 808)
    [V,Policy]=ValueFnIter_Case1_FHorz_DC1(n_d, n_a, n_z, N_j, d_grid, a_grid, z_gridvals_J, pi_z_J, ReturnFn, Parameters, DiscountFactorParamNames, ReturnFnParamNames, vfoptions);

Error in LifeCycleModel28 (line 162)
[V, Policy]=ValueFnIter_Case1_FHorz(n_d,n_a,n_z,N_j, d_grid, a_grid, z_grid, pi_z, ReturnFn, Params, DiscountFactorParamNames, [], vfoptions);

should be fixed, thanks for pointing this out :slight_smile:

[I had n_z where it should be N_z, since I had only tried out divide-and-conquer on one-dimensional z models this had worked fine, but this model has two-dimensional z so it errored]
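For anyone curious, a small illustration of the bug class (my reconstruction, not the actual toolkit code): with a scalar n_z, n_z and N_z=prod(n_z) coincide, so a reshape() using the wrong one still produces the intended array; with a vector n_z it produces a 4-D array instead, and the subsequent + throws the ‘incompatible sizes’ error above.

n_z=[7,5]; N_z=prod(n_z); N_d=21;
x=rand(N_d*N_z,1);
size(reshape(x,[N_d,1,n_z])) % 21x1x7x5, what the typo produced
size(reshape(x,[N_d,1,N_z])) % 21x1x35, what was intended
% with n_z=7 both reshapes give 21x1x7, so the typo was invisible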


Thanks a lot! Would you suggest setting vfoptions.level1n=7 or 11? Is there a default value?

No default; I have not done systematic tests to see what value of vfoptions.level1n is fastest, so I have no real idea.

I played with it a little in one model; there is in some sense a trade-off. The higher the value you set, the fewer points need to be evaluated, which makes it faster, but the more serial (instead of parallel) it is, which makes it slower. So there seems to be a value at which it is fastest. For the model I played with, 11 seemed reasonable (5 was slower, and so was 21), but I have no idea how general this is, or whether 9, 11 or 13 is better. The runtime differences between the various values of vfoptions.level1n were much smaller than the difference between using divide-and-conquer and not using it. [If GPU memory is a bottleneck, a higher value reduces memory usage.]
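So the practical advice is just to time a few values in your own model, something along these lines (a sketch; the candidate values are nothing special):

for level1n=[5,7,11,15,21]
    vfoptions.level1n=level1n;
    tic
    [V,Policy]=ValueFnIter_Case1_FHorz(n_d,n_a,n_z,N_j,d_grid,a_grid,z_grid,pi_z,ReturnFn,Params,DiscountFactorParamNames,[],vfoptions);
    fprintf('level1n=%i: %.2f seconds \n',level1n,toc)
end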


Got it, thanks for the explanation! Another question: when you have a model like this one with a static choice d, you apply the divide-and-conquer strategy conditional on d, exploiting the fact that the policy function a'=g(a,z|d) is increasing with respect to a for all possible d, correct? In this notation g(a,z|d) denotes the policy function for a' conditional on d.

In terms of pseudo-code: for each combination of state variables (a,z), you first loop over d; then, conditional on d, you exploit monotonicity over a and a' to find a'=g(a,z|d). Then you optimize over d by brute force and obtain a'=g*(a,z)=g(a,z,d*(a,z)).
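Something like the following MATLAB-flavoured sketch (F being the return array over (d,aprime,a,z) and EV the expected continuation value, both just placeholder names):

for z_c=1:N_z
    for d_c=1:N_d % loop over d, then exploit monotonicity conditional on d
        lb=1; % g(a,z|d) increasing in a: the search over aprime can start
        for a_c=1:N_a % where the previous a point's optimum was
            [W(d_c,a_c,z_c),ind]=max(F(d_c,lb:N_a,a_c,z_c)'+beta*EV(lb:N_a,z_c));
            lb=lb+ind-1; % index of g(a,z|d) on a_grid
            g_cond(d_c,a_c,z_c)=lb;
        end
    end
end
[V,dstar]=max(W,[],1); % then brute force over d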

In principle, what we want is, for each point d_i in d: first step, find a_lower(d_i) and a_upper(d_i); second step, just consider the aprime that lie between these two. Which is exactly what you are describing. But in practice I had to compromise to get the code vectorized so as to properly exploit parallelization.

This could easily be done by looping over the values of d (which is how the pseudo-code explains it). But in practice this is just too slow (the GPU is terrible at serial work).

So I do something that involves more points, but is faster. I cannot see how to vectorize the above directly (so that I can do all the d_i in d in parallel). So instead what I do is: first step, for all the d_i calculate a_lower(d_i) and a_upper(d_i); second step, consider the aprime between min_i[a_lower(d_i)] and max_i[a_upper(d_i)]. This I could write vectorized, and so it is fast on the GPU. Note that to avoid the serialization I end up looking at more values of aprime, but making it parallel mattered so much more for runtime than looking at fewer points that it was worthwhile anyway. [This was motivated by the fact that I couldn’t see how to vectorize the differing numbers of points between a_lower(d_i) and a_upper(d_i) across the various d_i, so I tried the less discriminating version, as I could see how to vectorize that.]
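In sketch form (aLower_d and aUpper_d being placeholder names for the vectors of conditional bounds over the d_i):

aLower=min(aLower_d); % min over all the d_i
aUpper=max(aUpper_d); % max over all the d_i
% the second step then evaluates the aprime indices aLower:aUpper for every d
% at once, as one big vectorized max(), rather than a separate range per d_i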

Note that the whole reason for a GPU version of divide-and-conquer is that the Gordon-Qiu version is way too serial for the GPU and so was of no use there. The GPU version of divide-and-conquer is essentially taking Gordon-Qiu and saying ‘more points, but parallel’, and what we are describing here is me saying, again, ‘more points, but parallel’. Parallel is just so crucial to decent GPU runtime that in both situations it was worth evaluating more points to get better parallelization.

Note also, I do the same thing across all the z (and e) as I do across the d; that is, I take the min of the lower limits and the max of the upper limits.

If you want a more precise pseudo-code for what I actually do when there are d/z/e, tell me and I will write one.

[I also treat specially the case where a_lower and a_upper are equal, using an if-else statement in the code, since in that case we can skip computing the aprime choice as the problem is trivial.]


Thanks for the quick reply!

How do you determine a_lower(d_i) and a_upper(d_i)? I thought it would simply be a_lower(d_i)=a_grid(1) and a_upper(d_i)=a_grid(end), for all d_i.

First step: for just vfoptions.level1n of the points in a_grid, call them a_level1_j, find aprime*(d_i,a_level1_j) for each d_i, where aprime* is the optimal aprime [which is what you denoted a'=g(a|d); I drop the z for simplicity].

Second step: for all the points from a_level1_j to a_level1_{j+1}, a_lower(d_i) is aprime*(d_i,a_level1_j), while a_upper(d_i) is aprime*(d_i,a_level1_{j+1}). Then I take the min of the lowers and the max of the uppers to eliminate the dependence on d_i [so a_lower=min_i a_lower(d_i) and a_upper=max_i a_upper(d_i)]. Now I can just use the a_lower to a_upper range as the aprime to consider for all the a points from a_level1_j to a_level1_{j+1} (allowing any choice of d, so the second step is a totally standard problem).
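As a minimal sketch of the two steps (one period, no d/z/e; F is the N_a-by-N_a return matrix over (aprime,a) and EV the continuation value, all placeholder names; the actual code vectorizes the inner loop, it is serial here purely for exposition):

level1ii=round(linspace(1,N_a,level1n)); % the a_level1_j points
% First step: all aprime, but just level1n of the a points
[~,g1]=max(F(:,level1ii)+beta*EV,[],1); % optimal aprime index at each level-1 point
V=zeros(N_a,1); Policy=zeros(N_a,1);
% Second step: for a between consecutive level-1 points, monotonicity says
% the optimal aprime lies between the two level-1 optima
for ii=1:level1n-1
    lb=g1(ii); ub=g1(ii+1);
    for a_c=level1ii(ii):level1ii(ii+1)
        if lb==ub % trivial case, no max() needed
            V(a_c)=F(lb,a_c)+beta*EV(lb); Policy(a_c)=lb;
        else
            [V(a_c),ind]=max(F(lb:ub,a_c)+beta*EV(lb:ub));
            Policy(a_c)=lb+ind-1;
        end
    end
end

With a d variable, lb and ub become the min over d_i of the lowers and the max over d_i of the uppers, as described above.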


Note that insofar as different d_i give highly different optimal aprime (conditional on a fixed a), taking a_lower=min_i a_lower(d_i) (and the corresponding max) will involve evaluating way more points than a more serialized divide-and-conquer would. But in practice, the range of aprime coming from the grid on aprime (which is just a_grid) tends to be way larger than the range of aprime driven by the different d_grid values, so we still end up considering far fewer aprime values than the basic pure discretization algorithm does.


Figured it might help to point to actual code. Here is code with d and a variables (but no z nor e).

Skip to line 121 to get to the loop over ages j (the part above is just the final period).
First, we do level 1, which involves considering all a_grid points for aprime, but just vfoptions.level1n of the a_grid points for a.
Line 139 creates the return function; this is slightly different from normal, as the output is (d,aprime,a) whereas normally it would be (d×aprime,a) [normally the first dimension covers both d and aprime].
Then I do a max() on line 143, which gives me the index for d (note that this is still conditional on aprime). Another max() on line 147 gets the aprime.

So now I have the optimal policies for level 1.

Lines 155-159 are where I do the a_lower=min_i a_lower(d_i) and a_upper=max_i a_upper(d_i).

And then lines 163 to 180 implement level 2, so they are about considering the possible aprime in the range a_lower to a_upper. [The if-else is just there for speed in the trivial case where a_lower=a_upper.]


Is divide-and-conquer implemented for models with two z shocks z1 and z2 but no d variable? I tried to run the following code (copied in large part from IntroToLifeCycleModels/LifeCycleModel11A.m at main · vfitoolkit/IntroToLifeCycleModels · GitHub, but with labor fixed at one and n_d=0) and it returns an error.

Here is my code:

clear 
clc
close all

toolkit_folder = 'C:\Users\aledi\Documents\GitHub\VFIToolkit-matlab';
addpath(genpath(toolkit_folder))

%% Begin setting up to use VFI Toolkit to solve
% Lets model agents from age 20 to age 100, so 81 periods

Params.agejshifter=19; % Age 20 minus one. Makes keeping track of actual age easy in terms of model age
Params.J=100-Params.agejshifter; % =81, Number of period in life-cycle

% Grid sizes to use
n_d    = 0; % No decision variable (labor is fixed at one)
d_grid = 0;
n_a    = 301; % Endogenous asset holdings
n_z    = [7,5]; % Exogenous labor productivity units shocks, persistent and transitiory
N_j    = Params.J; % Number of periods in finite horizon

%% Parameters

% Discount rate
Params.beta = 0.96;
% Preferences
Params.crra = 2; % Coeff of relative risk aversion (curvature of consumption)

% Prices
Params.w=1; % Wage
Params.r=0.05; % Interest rate (0.05 is 5%)

% Demographics
Params.agej=1:1:Params.J; % Is a vector of all the agej: 1,2,3,...,J
Params.Jr=46;

% Pensions
Params.pension=0.3;

% Age-dependent labor productivity units
Params.kappa_j=[linspace(0.5,2,Params.Jr-15),linspace(2,1,14),zeros(1,Params.J-Params.Jr+1)];
% persistent AR(1) process on idiosyncratic labor productivity units
rho_z1=0.9;
sigma_epsilon_z1=0.02;
% transitory iid normal process on idiosyncratic labor productivity units
sigma_epsilon_z2=0.2; % Implicitly, rho_z2=0

% Conditional survival probabilities: sj is the probability of surviving to be age j+1, given alive at age j
% Most countries have calculations of these (as they are used by the government departments that oversee pensions)
% In fact I will here get data on the conditional death probabilities, and then survival is just 1-death.
% Here I just use them for the US, taken from "National Vital Statistics Report, volume 58, number 10, March 2010."
% I took them from first column (qx) of Table 1 (Total Population)
% Conditional death probabilities
Params.dj=[0.006879, 0.000463, 0.000307, 0.000220, 0.000184, 0.000172, 0.000160, 0.000149, 0.000133, 0.000114, 0.000100, 0.000105, 0.000143, 0.000221, 0.000329, 0.000449, 0.000563, 0.000667, 0.000753, 0.000823,...
    0.000894, 0.000962, 0.001005, 0.001016, 0.001003, 0.000983, 0.000967, 0.000960, 0.000970, 0.000994, 0.001027, 0.001065, 0.001115, 0.001154, 0.001209, 0.001271, 0.001351, 0.001460, 0.001603, 0.001769, 0.001943, 0.002120, 0.002311, 0.002520, 0.002747, 0.002989, 0.003242, 0.003512, 0.003803, 0.004118, 0.004464, 0.004837, 0.005217, 0.005591, 0.005963, 0.006346, 0.006768, 0.007261, 0.007866, 0.008596, 0.009473, 0.010450, 0.011456, 0.012407, 0.013320, 0.014299, 0.015323,...
    0.016558, 0.018029, 0.019723, 0.021607, 0.023723, 0.026143, 0.028892, 0.031988, 0.035476, 0.039238, 0.043382, 0.047941, 0.052953, 0.058457, 0.064494,...
    0.071107, 0.078342, 0.086244, 0.094861, 0.104242, 0.114432, 0.125479, 0.137427, 0.150317, 0.164187, 0.179066, 0.194979, 0.211941, 0.229957, 0.249020, 0.269112, 0.290198, 0.312231, 1.000000]; 
% dj covers Ages 0 to 100
Params.sj=1-Params.dj(21:101); % Conditional survival probabilities
Params.sj(end)=0; % In the present model the last period (j=J) value of sj is actually irrelevant

figure
plot(Params.agej,Params.kappa_j)
hold on
plot(Params.agej,Params.pension*ones(N_j,1))


%% Grids
% The ^3 means that there are more points near 0 and near 10. We know from
% theory that the value function will be more 'curved' near zero assets,
% and putting more points near curvature (where the derivative changes the most) increases accuracy of results.
a_min = 0;
a_max = 10;
a_grid=a_min+(a_max-a_min)*(linspace(0,1,n_a).^3)'; % The ^3 means most points are near zero, which is where the derivative of the value fn changes most.

% First, the AR(1) process z1
Tauchen_q = 3.0;
[z1_grid,pi_z1]=discretizeAR1_Tauchen(0.0,rho_z1,sigma_epsilon_z1,n_z(1),Tauchen_q);
z1_grid = exp(z1_grid);

% Second, the transitory process. Since z2 is purely iid, all rows of the
% transition matrix pi_z2 are the same, i.e. Prob(z2,z2') does not depend
% on z2.
[z2_grid,pi_z2]=discretizeAR1_Tauchen(0.0,0.0,sigma_epsilon_z2,n_z(2),Tauchen_q);
z2_grid = exp(z2_grid);
pi_e = pi_z2(1,:)'; % Distribution of z2=e, transitory shock
mean_z2 = dot(pi_e,z2_grid);
z2_grid = z2_grid/mean_z2;

% Combine transition matrix for two shocks. z1 is ordered first, Kronecker
% in reverse order.
z_grid = [z1_grid;z2_grid];
pi_z = kron(pi_z2,pi_z1);

%% Set up return function

DiscountFactorParamNames={'beta','sj'};

% input z1 and z2
ReturnFn=@(aprime,a,z1,z2,agej,Jr,w,kappa_j,r,pension,crra) ...
    Model_ReturnFn(aprime,a,z1,z2,agej,Jr,w,kappa_j,r,pension,crra);

%% Now solve the value function iteration problem, just to check that things are working before we go to General Equilbrium
disp('Test ValueFnIter')
vfoptions=struct(); % Just using the defaults.
vfoptions.verbose = 1;
vfoptions.divideandconquer = 0;
tic;
[V, Policy]=ValueFnIter_Case1_FHorz(n_d,n_a,n_z,N_j,d_grid,a_grid,z_grid,pi_z,ReturnFn,Params,DiscountFactorParamNames,[],vfoptions);
toc

% V is now (a,z1,z2,j). One dimension for each state variable.
% Compare
size(V)
% with
[n_a,n_z(1),n_z(2),N_j]
% these are the same.
% Policy is
size(Policy)
% which is the same as
[length(n_a),n_a,n_z(1),n_z(2),N_j]
% The n_a,n_z(1),n_z(2),N_j represent the states on which the decisions/policies
% depend, and there is one decision for each decision variable 'd' and each
% endogenous state variable 'a' (here n_d=0, so just the one for 'a').

If I set vfoptions.divideandconquer = 0, or if I leave the default options, all good. But if I set vfoptions.divideandconquer = 1, then I get the error:

Error using CreateReturnFnMatrix_Case1_Disc_DC1_nod_Par2
Too many input arguments.

Error in ValueFnIter_Case1_FHorz_DC1_nod_raw (line 24)
    ReturnMatrix_ii=CreateReturnFnMatrix_Case1_Disc_DC1_nod_Par2(ReturnFn, n_z, a_grid, a_grid(level1ii), z_gridvals_J(:,:,N_j), ReturnFnParamsVec,1);

Should be fixed (divide-and-conquer with z and with no d variables). Thanks for spotting! (It was a minor typo.)


Divide-and-conquer now works with semi-exogenous states.


Perfect, thanks a lot! I will run our code for the model with married couples and search effort, and test whether the policy functions are the same.

Just as a reminder to myself, the model has

  • One endogenous state variable a, asset holdings.
  • Two decision variables, d1 and d2, which represent search effort of husband and wife.
  • Two semi-exogenous shocks, z1 and z2, which are the employment statuses of the two members (the transition rates depend on d).
  • One aggregate shock z3, which is business cycle conditions (expansion vs recession).

Note: The aggregate shock z3 is purely exogenous but it affects the transition rates of z1 and z2. For this reason in the code we classify it as semi-exogenous.
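For reference, the semi-exogenous block of our setup looks roughly like the sketch below (the field names match the struct printout further down; the grid values and the SemiExoStateFn signature are placeholders from memory, check the toolkit documentation for the exact convention):

vfoptions.n_semiz=[2,2,2]; % employment of husband, of wife, and aggregate state
vfoptions.semiz_grid=[0;1;0;1;0;1]; % the three two-point grids stacked (placeholder values)
vfoptions.numd_semiz=2; % the two d variables (search efforts) drive the transitions
% transition probabilities as a function of this period's semiz, next
% period's semiz, and the relevant d (f_SemiExoStateFn is a placeholder)
vfoptions.SemiExoStateFn=@(z1,z2,z3,z1prime,z2prime,z3prime,d1,d2,lambda) ...
    f_SemiExoStateFn(z1,z2,z3,z1prime,z2prime,z3prime,d1,d2,lambda);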

UPDATE
The code (see our shared folder, Codes\v3\main_v3) runs well without divide and conquer, but if I set

vfoptions.divideandconquer=1;
vfoptions.level1n=11;

then I get the following error message:

vfoptions = 

  struct with fields:

                    verbose: 1
                    n_semiz: [2 2 2]
                 semiz_grid: [6×1 double]
                 numd_semiz: 2
             SemiExoStateFn: [function_handle]
           divideandconquer: 1
                    level1n: 11
                   parallel: 2
               returnmatrix: 2
                  lowmemory: 0
                   paroverz: 1
            incrementaltype: 0
                polindorval: 1
                 outputkron: 0
    policy_forceintegertype: 0
          exoticpreferences: 'None'
                    dynasty: 0
            experienceasset: 0
           experienceassetu: 0
                 riskyasset: 0
              residualasset: 0

Finite horizon: 611 of 612 
Error using gpuArray/arrayfun
@(sm,sf,aprime,a,lm,lf,za,agej,Jr,w_m,w_f,mu_n,r,pen_m,pen_f,sigma,kappa0_m,kappa1_m,kappa0_f,kappa1_f)f_ReturnFn(sm,sf,aprime,a,lm,lf,za,agej,Jr,w_m,w_f,mu_n,r,pen_m,pen_f,sigma,kappa0_m,kappa1_m,kappa0_f,kappa1_f)
takes at least 20 inputs. 19 were supplied.
For more information see Tips.


Error in CreateReturnFnMatrix_Case1_Disc_DC1_Par2 (line 64)
        Fmatrix=arrayfun(ReturnFn, d1vals, shiftdim(aprime_grid,-l_d), shiftdim(a_grid,-l_d-1), shiftdim(z_gridvals(:,1),-l_d-2), shiftdim(z_gridvals(:,2),-l_d-2), shiftdim(z_gridvals(:,3),-l_d-2), ParamCell{:});

Error in ValueFnIter_Case1_FHorz_SemiExo_DC1_nod1_noz_raw (line 179)
        ReturnMatrix_d2ii=CreateReturnFnMatrix_Case1_Disc_DC1_Par2(ReturnFn, 1, n_semiz, d2_grid(d2_c), a_grid, a_grid(level1ii), semiz_gridvals_J(:,:,jj), ReturnFnParamsVec,1);

Error in ValueFnIter_Case1_FHorz_SemiExo_DC1 (line 18)
                [VKron, PolicyKron]=ValueFnIter_Case1_FHorz_SemiExo_DC1_nod1_noz_raw(n_d2,n_a,n_semiz, N_j, d2_grid, a_grid, semiz_gridvals_J, pi_semiz_J, ReturnFn, Parameters, DiscountFactorParamNames, ReturnFnParamNames, vfoptions);

Error in ValueFnIter_Case1_FHorz (line 740)
        [V,Policy]=ValueFnIter_Case1_FHorz_SemiExo_DC1(n_d1, n_d2, n_a, n_z, vfoptions.n_semiz, N_j, d1_grid, d2_grid, a_grid, z_gridvals_J, semiz_gridvals_J, pi_z_J, pi_semiz_J, ReturnFn, Parameters, DiscountFactorParamNames, ReturnFnParamNames, vfoptions);

Error in main_v3 (line 208)
[V2,Policy2]=ValueFnIter_Case1_FHorz(n_d,n_a,n_z,N_j,d_grid,a_grid,z_grid,pi_z,ReturnFn,Params,DiscountFactorParamNames,ReturnFnParamNames,vfoptions);

Related to this, I also tested the publicly available code LifeCycleModel30 (based on Sommer 2016) and found a discrepancy between standard VFI and VFI with divide-and-conquer.
Specifically, standard VFI gives that V has size

 151     2     4    11    81

while vfi with divideandconquer gives

151    11     2     4    81 

This is a model with both z and semiz variables (unlike our code, which has only semiz):

n_semiz=[2,4]; % number of infants, number of children
n_z=11; % exogenous labor productivity units shock

Hence it seems that standard VFI orders the state variables as (a,semiz,z) while VFI with divide-and-conquer follows (a,z,semiz). Perhaps one of the two orderings is better for the Tan improvement?

Fixed Life-Cycle Model 30.

I had just done the reshape() of the answer wrong. I missed it when running my own tests, as I did max(abs(V2(:)-V(:))), which undoes the reshape, so I didn’t notice it was wrong. It was the correct answer, just in the wrong shape :rofl:
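A tiny illustration of why that test is blind to shape bugs: reshape() never reorders the underlying data, so the (:) vectors are identical even when the size vector is wrong.

V=rand(151,2,4,11,81);
V2=reshape(V,[151,11,2,4,81]); % wrong shape, same underlying data
max(abs(V2(:)-V(:)))           % = 0, so this test cannot see the bug
isequal(size(V),size(V2))      % = false, a size check does catch it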
