Today I installed the new Matlab R2025a. Since I'm interested in GPU computing and the VFI toolkit relies heavily on gpuArrays, let's see what's new in this respect.
Most of the information is available at the link below; in this post I'll write a (subjective) summary:
https://www.mathworks.com/help/parallel-computing/release-notes.html?s_tid=CRUX_lftnav#mw_9adf403d-16d1-4ca6-982f-2241b814c464
- You can now reshape sparse GPU arrays: reshape now works even if A is a sparse gpuArray, whereas before it worked only if A was a dense gpuArray. In the context of the toolkit, this is useful for computing the stationary distribution. See for example these lines of code:
% First step of Tan improvement
StationaryDistKron=reshape(Gammatranspose*StationaryDistKron,[N_a,N_z]); %No point checking distance every single iteration. Do 100, then check.
% Second step of Tan improvement
StationaryDistKron=reshape(StationaryDistKron*pi_z,[N_a*N_z,1]);
Moving the Tan improvement to the GPU would in principle save some time, but it would increase the memory requirements of the algorithm (the GPU typically has less memory than the CPU; e.g. my laptop has 4GB of GPU RAM and 32GB of CPU RAM). In practice, the Tan improvement is already very fast. I would also like to test how efficient reshape of a sparse gpuArray is: since it is a new functionality, it might not have been optimized yet.
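A quick way to run that test would be to time reshape of a sparse array on the CPU against the same operation on a sparse gpuArray. A minimal sketch (the matrix size, density, and target shape are arbitrary choices of mine):

```matlab
% Hypothetical micro-benchmark: reshape of a sparse array, CPU vs GPU (R2025a)
A  = sprand(1e4, 1e4, 1e-3);   % random sparse matrix, density 0.1%
Ag = gpuArray(A);              % sparse gpuArray
t_cpu = timeit(@() reshape(A, 1e2, 1e6));
t_gpu = gputimeit(@() reshape(Ag, 1e2, 1e6));
fprintf('reshape: CPU %.2e s, GPU %.2e s\n', t_cpu, t_gpu);
```

Note that gputimeit takes care of GPU synchronization, so the timings are reliable without manual wait calls.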
- The cell2mat function now has gpuArray support. This might be useful when generating sparse matrices for the Howard improvement or for the distribution. For example:
G = cell(n_z,1);
Qmat = cell(n_z,1);
for z_c=1:n_z
    % each G{z_c} has size [n_a,n_a]: row a has a 1 in column Policy_a(a,z_c)
    G{z_c} = sparse((1:n_a)',Policy_a(:,z_c),ones(n_a,1),n_a,n_a);
end
for z_c=1:n_z
    % each Qmat{z_c} has size [n_a,n_a*n_z]
    Qmat{z_c} = kron(pi_z(z_c,:),G{z_c});
end
% Similar to vertcat:
Qmat = cell2mat(Qmat);
This is not strictly required though, since it may be faster to avoid the loop over z entirely (the code above may not be the best way of doing this).
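For completeness, here is one possible loop-free construction; this is a sketch I have not benchmarked, and it assumes Policy_a(a,z) stores the index of the optimal a':

```matlab
% Sketch: build Q of size [n_a*n_z, n_a*n_z] in a single sparse() call.
% Nonzeros: Q((a,z),(a',z')) = pi_z(z,z') if a' = Policy_a(a,z), 0 otherwise.
[~, zz] = ndgrid(1:n_a, 1:n_z);                        % zz(:) = z index of each state
rows = repmat((1:n_a*n_z)', n_z, 1);                   % each state (a,z) hits n_z columns
cols = repmat(Policy_a(:), n_z, 1) ...
     + kron((0:n_z-1)'*n_a, ones(n_a*n_z,1));          % column index of (a',z')
vals = pi_z(zz(:), :);                                 % row z of pi_z for each state (a,z)
vals = vals(:);                                        % stacked by z', matching cols
Q = sparse(rows, cols, vals, n_a*n_z, n_a*n_z);
```

Whether this beats the loop in practice would have to be tested; sparse() with index triplets is usually fast, but the index bookkeeping costs memory.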
- Single-precision sparse GPU arrays. I'm not sure how useful this is for economists, since we typically need double precision (i.e. real(8) in Fortran notation) for our models.
- The function histcounts has improved performance on the GPU. This function does something similar to discretize. Example: suppose I have the asset grid
a_grid = [0.0,0.5,1.0];
a_opt = 0.6; % Optimal choice a*
and I want to find the index pos such that a_grid(pos) <= a_opt < a_grid(pos+1). Then I can do
[~,~,pos] = histcounts(a_opt,a_grid)
which gives pos=2 as expected. I've seen histcounts being used by Pontus in his code here
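To illustrate the equivalence with discretize, and where the R2025a speed-up would actually matter (namely with gpuArray inputs), a small sketch:

```matlab
a_grid = [0.0, 0.5, 1.0];
a_opt  = 0.6;                              % optimal choice a*
[~,~,pos1] = histcounts(a_opt, a_grid);    % pos1 = 2
pos2 = discretize(a_opt, a_grid);          % pos2 = 2, same answer
% The R2025a performance improvement applies when the data is on the GPU:
[~,~,pos3] = histcounts(gpuArray(a_opt), a_grid);
```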
In terms of performance, I ran this test on my PC (solving a dense linear system with gpuArrays), and it turned out that R2024b is slightly faster:
clear
clc
close all
% dummy random matrix
n = 1000;
A = rand(n)+2*eye(n);
b = rand(n,1);
A = gpuArray(A);
b = gpuArray(b);
gd = gpuDevice;
time_vec = zeros(100,1);
for ii=1:100
    tic
    x = A \ b; % Solve the linear system Ax = b
    wait(gd)   % GPU calls are asynchronous: wait before stopping the timer
    time_vec(ii)=toc;
end
mean(time_vec)
It would be nice to run other tests, for example based on arrayfun.
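A minimal sketch of such a test (the elementwise function here is an arbitrary choice of mine):

```matlab
% Hypothetical benchmark: arrayfun on the GPU vs standard vectorized code
n  = 1e7;
xg = rand(n, 1, 'gpuArray');
f  = @(x) exp(-x.^2) + sin(3*x);              % arbitrary elementwise function
t_arrayfun = gputimeit(@() arrayfun(f, xg));  % fused into a single GPU kernel
t_vector   = gputimeit(@() f(xg));            % separate kernels per operation
fprintf('arrayfun %.2e s, vectorized %.2e s\n', t_arrayfun, t_vector);
```

On the GPU, arrayfun fuses the whole elementwise expression into one kernel, so it can beat the vectorized version when the expression involves several operations.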
The other major innovation is Matlab Copilot. It's good, but not as good as the Copilot in Visual Studio Code.