In a previous post, there was an interesting discussion on how to do the Howard step in an infinite-horizon model.
I have created a new repo where I compare two methods: the first uses interp1 and the second uses precomputation of interpolation weights and indices. Interestingly, I was able to create the repo and its contents using codex in VSCode, which feels amazing! The documentation and explanations are really well done, better than I could have done myself. For example, codex generated a super clear README file:
Howard VFI GPU Benchmark (MATLAB)
This repository benchmarks two versions of value function iteration (VFI) with Howard policy improvement for an Aiyagari-style problem on the GPU:
fun_vfi_howard_interp1.m: uses interp1 in Howard policy evaluation.
fun_vfi_howard_manual_precomp.m: uses manual interpolation with precomputed indices/weights.
The main script, main.m, builds grids and parameters, moves data to GPU, runs warm-up calls, times both solvers with gputimeit, compares solution differences, and plots the policy function.
Requirements
MATLAB with Parallel Computing Toolbox (GPU support)
Update the addpath(...) line in main.m if your VFIToolkit location differs.
Run:
main
What main.m reports
GPU runtime of both methods (seconds)
Speed ratio (interp1/manual)
Max absolute difference between value/policy outputs
A policy plot for low and high productivity states
Notes
Timing uses gputimeit (appropriate for GPU workloads).
The script includes warm-up runs to reduce first-call/JIT timing bias.
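For readers who have not used the precomputation trick before, the core idea is that on a fixed grid the bracketing index and linear weight of each query point can be computed once and then reused in every Howard evaluation step. A minimal NumPy sketch of that idea (purely illustrative; it is not the repository's MATLAB code and all names are made up):

```python
import numpy as np

# Coarse grid on which the value function lives (names are illustrative)
a_grid = np.linspace(0.0, 10.0, 6)
# Query points (e.g. the policy a'(a,z)) held fixed during Howard steps
a_query = np.array([0.5, 3.7, 9.1])

# Precompute once: lower bracketing index and left weight per query point
ind = np.clip(np.searchsorted(a_grid, a_query) - 1, 0, len(a_grid) - 2)
omega = (a_grid[ind + 1] - a_query) / (a_grid[ind + 1] - a_grid[ind])

# Every Howard iteration then reuses (ind, omega); only V changes
V = a_grid**2                      # stand-in for a value function on the grid
V_interp = omega * V[ind] + (1 - omega) * V[ind + 1]

# Sanity check: identical to library linear interpolation
assert np.allclose(V_interp, np.interp(a_query, a_grid, V))
```

The payoff is that the search step (the expensive part of interpolation) is paid once rather than once per Howard iteration.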
@robertdkirkby I would definitely recommend using Matlab from VSCode+codex to document semi-automatically many toolkit features and repositories. It is a boring and repetitive task at which LLMs nowadays excel.
Regarding the comparison of interp1 vs manual interpolation, this is the output of main.m:
Benchmarking VFI with interp1 in Howard (GPU)...
Benchmarking VFI with manual interpolation in Howard (GPU)...
Time VFI with interp1 in Howard: 1.506464 s
Time VFI with manual interpolation in Howard: 0.578993 s
Speed ratio (interp1/manual): 2.602
Difference b/w two methods: 1.776357e-15
I am not sure you are measuring runtimes correctly in your test. Matlab suggests using gputimeit, not tic/toc.
For the manual interpolation, I use discretize to locate the interpolation indices, but it would be faster to use an algebraic formula since the fine grid is equally spaced (as you do in your implementation). This means that the advantage of manual interpolation over interp1 would be even more substantial.
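On the equally spaced point: when the fine grid is uniform, the bracketing index follows from a closed-form expression, so no search (discretize, binary search) is needed at all. A NumPy sketch of the comparison (grid bounds and query points are made up for illustration):

```python
import numpy as np

# Uniform fine grid: index location reduces to algebra
a_min, a_max, n = 0.0, 10.0, 1001
step = (a_max - a_min) / (n - 1)
grid = a_min + step * np.arange(n)

q = np.array([0.1234, 5.5055, 9.9876])   # query points strictly inside the grid

# Algebraic formula: truncation equals floor for the positive values used here
ind_alg = np.clip(((q - a_min) / step).astype(int), 0, n - 2)

# Search-based alternative (rough analogue of MATLAB's discretize)
ind_search = np.clip(np.searchsorted(grid, q, side='right') - 1, 0, n - 2)

assert np.array_equal(ind_alg, ind_search)
```

The algebraic version is O(1) per point versus O(log n) for a search, and it reduces to a single elementwise expression, which vectorizes well on the GPU.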
Hi @aledinola, I was able to run your files and I got similar timings: the version without interp1 is three times faster. In your file fun_vfi_howard_manual_precomp.m I see that on line 84 you still use interp1. Is that intentional?
Thanks for confirming my results. Yes, I use interp1 in the maximization block of the VFI in both versions. This was intentional: I wanted to focus on the Howard block. In the maximization step, the policy function a'(a,z) changes for every iteration, so there aren't many advantages of precomputing the interpolation indices and weights.
Also, please bear in mind that the codes in this repo are purely for illustration, to make the point that it is faster to do Howard in a certain way; for any other purpose you should use the VFI Toolkit commands, which are pretty good in terms of performance and generality.
OK, I compared your version with the manual interpolation and also with the toolkit command ValueFnIter_Case1, and they have almost the same run time. I think you can make your version faster if you remove this loop:
for z_c=1:n_z
    % EV_h_interp is [n_a,1]
    EV_h_interp = omega(:,z_c).*EV_h(ind(:,z_c),z_c) + (1-omega(:,z_c)).*EV_h(ind(:,z_c)+1,z_c);
    V_h(:,z_c) = U_pol(:,z_c) + beta*EV_h_interp;
end
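One way to remove that loop, sketched here in NumPy rather than MATLAB (array names follow the snippet above, but sizes and values are made up), is to pair the precomputed row indices with a broadcast column index:

```python
import numpy as np

n_a, n_z, beta = 5, 3, 0.96
rng = np.random.default_rng(0)
EV_h = rng.standard_normal((n_a + 1, n_z))    # expected value on the grid
U_pol = rng.standard_normal((n_a, n_z))       # utility at the fixed policy
ind = rng.integers(0, n_a, size=(n_a, n_z))   # precomputed lower indices
omega = rng.random((n_a, n_z))                # precomputed left weights

# Looped version, mirroring the MATLAB snippet
V_loop = np.empty((n_a, n_z))
for z_c in range(n_z):
    EV_interp = (omega[:, z_c] * EV_h[ind[:, z_c], z_c]
                 + (1 - omega[:, z_c]) * EV_h[ind[:, z_c] + 1, z_c])
    V_loop[:, z_c] = U_pol[:, z_c] + beta * EV_interp

# Loop-free version: advanced indexing with a broadcast column index
cols = np.arange(n_z)                          # shape (n_z,), broadcasts over rows
EV_all = omega * EV_h[ind, cols] + (1 - omega) * EV_h[ind + 1, cols]
V_vec = U_pol + beta * EV_all

assert np.allclose(V_loop, V_vec)
```

In MATLAB the analogous move is to convert the (ind, z) pairs to linear indices once with sub2ind, outside the iteration loop, so the whole Howard update becomes a single array expression.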
I made a minor modification to loop 100 times over the two solution methods, and then report the mean of the runtimes. I get:
Time VFI with interp1 in Howard: 0.294313 s
Time VFI with manual interpolation in Howard: 0.367992 s
Speed ratio (interp1/manual): 0.800
Difference b/w two methods: 1.332268e-15
I guess this leaves the question of why interp1 is faster on my computer but slower on both of yours.
Do you want to try with repeated runs? Just copy-paste the following modification to lines 63-67 of main.m.
T=100;
time_interp=zeros(1,T);
time_manual=zeros(1,T);
for tt=1:T
disp('Benchmarking VFI with interp1 in Howard (GPU)...')
time_interp(tt) = gputimeit(f_interp);
disp('Benchmarking VFI with manual interpolation in Howard (GPU)...')
time_manual(tt) = gputimeit(f_manual);
end
and then change the print out (lines 77-78) to be
fprintf('Time VFI with interp1 in Howard: %.6f s\n', mean(time_interp))
fprintf('Time VFI with manual interpolation in Howard: %.6f s\n', mean(time_manual))
PS. When you do an operation just once you often get slightly misleading runtime results because of things "warming up", like if you need to open a parallel pool to run a parfor loop on cpu. [Not claiming this happened here, just as a general rule for looking at runtimes.]
PPS. This is roughly the same runtime ratios as I was getting in my own tests.
PPPS. I got a new desktop last month with an NVIDIA RTX 4000 Ada, hence my fast runtimes. I'm running Matlab R2025b.
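The "warming up" caveat is not MATLAB-specific; the same warm-up-then-repeat pattern looks like this in Python (purely illustrative, unrelated to the repo's code):

```python
import timeit

def work():
    """Stand-in for whatever is being benchmarked."""
    return sum(i * i for i in range(10_000))

work()  # warm-up call: absorbs first-call overhead (JIT, caches, pool start-up)

# Time repeatedly and summarize, rather than trusting a single run
times = timeit.repeat(work, number=100, repeat=5)
print(f"mean {sum(times) / len(times):.4f} s, min {min(times):.4f} s")
```

For CPU microbenchmarks the minimum over repeats is often the most informative summary; gputimeit plays a similar role for GPU code by handling device synchronization internally.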
Thanks for running the test on your machine. I will modify the code as you suggest and run (1) using R2025b on the same GPU card and (2) using R2025b on a more powerful GPU. It is interesting that run times vary across machines/Matlab versions.
Regarding averaging run times: I agree with you, but in main.m I already do a warm-up, see lines 54-61. However, your method is more accurate.
g = gpuDevice;
wait(g);
% Warm-up to avoid first-call/JIT overhead dominating the benchmark
f_interp();
wait(g);
f_manual();
wait(g);
The 20 GB of GPU RAM vs my previous 8 has taken two-endogenous-state models from "can be run" to "easy to run even with quite a few exogenous shocks", which is nice.
@robertdkirkby On R2024b but on a more powerful GPU (NVIDIA GeForce RTX 5080), manual interpolation is still faster (0.85 vs 0.91 seconds) but the difference is much smaller. I report below the results of my test.
A side thing I learnt from this comparison is that on my RTX 5080, R2025b is faster than R2024b.
Manual interpolation is a bit faster than interp1, by 18% (mean comparison, but the median is very similar). My earlier results showing a three-fold advantage of manual interpolation were probably biased by not averaging the run times (but I was also on a different GPU, so boh…).
@robertdkirkby What could explain the difference between my results and yours? Maybe the operating system? I assume you are on Linux while I am on win64 (see the results files on GitHub). Also, we are using different GPUs. In any case, the advantage of replacing interp1 with precomputation of indices and weights is not as clear-cut as I had hoped.
Yeah, I'm on linux. Apparently it can make a small difference:
"MATLAB performance is similar on Windows, macOS, and Linux, although differences can occur among platforms for the following reasons: MathWorks builds its products with a different compiler on each platform, and each has its own performance characteristics."
Not sure if that is what the difference is.
I suspect there is also a lesson here about high-level programming languages: more complicated built-in functions can still perform well on runtimes if their internals are carefully written in lower-level languages. I'm guessing that this is the key to why the runtime difference between the two is negligible.
My overall takeaway: manual interpolation is probably marginally faster than using interp1(), but the difference is small enough to be largely irrelevant. Given the whole toolkit is already coded around interp1(), I just don't think any advantage from doing manual interpolation is worth re-coding anything.