In most algorithms used by the toolkit, we want to precompute the Return Matrix before doing the iterations, to improve performance. Let me explain in the context of an example.
We are solving the income fluctuation problem with discrete VFI:

$$ V(a,z) = \max_{a'} \left\{ F(a',a,z) + \beta \sum_{z'} \pi(z,z')\, V(a',z') \right\} $$

where

$$ F(a',a,z) = \log(c), \qquad c = (1+r)a + wz - a', \qquad c > 0 $$

is the "static" return function we want to precompute. Conceptually, we write a function that, given the inputs (a',a,z) and the other parameters, returns the payoff F. So we first write a function like this one:
function F = ifp_returnfn(aprime,a,z,r,w)
% Static payoff F(a',a,z): log utility of consumption, -Inf if infeasible
c = (1+r)*a+w*z-aprime;  % budget constraint
F = -Inf;                % default: infeasible choice
if c>0
    F = log(c);
end
end %end function
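Just as a quick sanity check, we can call this function with some made-up numbers (r=0.03 and w=1 here are purely illustrative, not toolkit values):
r = 0.03; w = 1;
ifp_returnfn(1.5, 2, 1, r, w)   % feasible: c = 1.03*2 + 1*1 - 1.5 = 1.56 > 0, returns log(1.56)
ifp_returnfn(5.0, 2, 1, r, w)   % infeasible: c = 1.03*2 + 1*1 - 5 < 0, returns -Inf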
Then we compute the F matrix as an array with 3 dimensions:
F_matrix = zeros(n_a,n_a,n_z);  % dimensions are (a',a,z)
tic
for z_c=1:n_z
    z_val = z_grid(z_c);
    for a_c=1:n_a
        a_val = a_grid(a_c);
        for ap_c=1:n_a
            aprime_val = a_grid(ap_c);
            F = ifp_returnfn(aprime_val,a_val,z_val,r,w);
            F_matrix(ap_c,a_c,z_c) = F;
        end
    end
end
toc
In the toolkit, this is done in a different way depending on whether the user is on the CPU or the GPU.
On the CPU, the array F_matrix is built by the function CreateReturnFnMatrix_Case1_Disc. The implementation is similar to the nested loops I wrote above, and sometimes it uses a parfor; a rough sketch of the parfor idea is shown below.
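Something along these lines (this is only my sketch of the parfor idea, not the actual code inside CreateReturnFnMatrix_Case1_Disc):
% Sketch of a parfor version (illustrative only): parallelize over the z dimension,
% so each worker fills its own (a',a) slice independently.
F_matrix = zeros(n_a,n_a,n_z);
parfor z_c = 1:n_z
    z_val = z_grid(z_c);
    F_slice = zeros(n_a,n_a);
    for a_c = 1:n_a
        a_val = a_grid(a_c);
        for ap_c = 1:n_a
            F_slice(ap_c,a_c) = ifp_returnfn(a_grid(ap_c),a_val,z_val,r,w);
        end
    end
    F_matrix(:,:,z_c) = F_slice;  % sliced output variable
end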
On the GPU, the array F_matrix is built by the function CreateReturnFnMatrix_Case1_Disc_Par2 using a very different method, based on arrayfun for gpuArray inputs. First we move the relevant grids to the GPU and reshape them:
a_grid_gpu = gpuArray(a_grid);
z_grid_gpu = gpuArray(z_grid);
aprime_vals = reshape(a_grid_gpu,[n_a,1,1]);  % a' varies along dimension 1
a_vals      = reshape(a_grid_gpu,[1,n_a,1]);  % a  varies along dimension 2
z_vals      = reshape(z_grid_gpu,[1,1,n_z]);  % z  varies along dimension 3
Then we call arrayfun:
ReturnFn = @(aprime,a,z,r,w) ifp_returnfn(aprime,a,z,r,w);
tic
F_matrix_gpu = arrayfun(ReturnFn, aprime_vals,a_vals,z_vals,r,w);
toc
The important thing to note here is that we are using automatic (implicit) expansion: the arrays for a', a and z need to have compatible sizes, but they do not need to have the same size, since singleton dimensions are expanded automatically.
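Just to see what this compatibility buys us: with implicit expansion, even ordinary element-wise operations on these reshaped arrays expand the singleton dimensions automatically (this snippet is only my own illustration, not toolkit code):
% Combining arrays of size [n_a,1,1], [1,n_a,1] and [1,1,n_z] element-wise
% produces an array of size [n_a,n_a,n_z].
c_all = (1+r)*a_vals + w*z_vals - aprime_vals;  % consumption at every (a',a,z)
size(c_all)   % [n_a, n_a, n_z]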
Is it possible to do the same arrayfun on the CPU, so the toolkit could use the same functions?!
It turns out that arrayfun on the CPU behaves differently from the GPU version, which is quite annoying and a source of confusion: arrayfun on the CPU wants all input arguments to have the same size, which is of course inefficient (we have to materialize full-size copies of every input). So we can do this (it works):
aprime_vals = reshape(a_grid,[n_a,1,1]);
a_vals      = reshape(a_grid,[1,n_a,1]);
z_vals      = reshape(z_grid,[1,1,n_z]);
% Expand everything (including the scalars r and w) to full [n_a,n_a,n_z] arrays
aprime_vals = repmat(aprime_vals,[1,n_a,n_z]);
a_vals      = repmat(a_vals,[n_a,1,n_z]);
z_vals      = repmat(z_vals,[n_a,n_a,1]);
r_vals      = r*ones(n_a,n_a,n_z);
w_vals      = w*ones(n_a,n_a,n_z);
tic
F_matrix_cpu = arrayfun(ReturnFn, aprime_vals,a_vals,z_vals,r_vals,w_vals);
toc
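As a quick check (my own, not part of the toolkit), the three constructions should deliver the same array:
% The nested loops and the CPU arrayfun should agree exactly; the GPU result
% should agree up to (at most) floating-point rounding.
isequal(F_matrix, F_matrix_cpu)          % nested loops vs CPU arrayfun
isequal(F_matrix, gather(F_matrix_gpu))  % nested loops vs GPU arrayfun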
But it is more verbose and also slower than the nested loops (which is weird, since with arrayfun we are telling MATLAB that the loop iterations are independent), presumably because the CPU version still evaluates the function handle one element at a time without any multithreading.
In my humble opinion, MATLAB should spend some time optimizing arrayfun for the CPU!