Low memory improvement

aledinola · July 6, 2025, 10:06pm

I have a suggestion to improve performance in infinite horizon models with refinement when lowmemory=1 (and probably lowmemory=2) is used.

In ValueFnIter_Case1_Refine, in the branch vfoptions.lowmemory==1 around line 32, the code preallocates ReturnMatrix as a double precision array on the CPU:

ReturnMatrix=zeros(N_a,N_a,N_z); % 'refined' return matrix

This should be changed to

ReturnMatrix=zeros(N_a,N_a,N_z,'gpuArray'); % 'refined' return matrix

This small change alone cuts the runtime roughly in half!

The reason is that in the current code ReturnMatrix is not a GPU array, and this creates two inefficiencies:

(1) In ValueFnIter_Case1_Refine, in the loop over z around line 42, ReturnMatrix_z is a 2-dimensional GPU array, but it is stored into a 3-dimensional array on the CPU. I suspect this is slow.
(2) ReturnMatrix is passed to ValueFnIter_Case1_NoD_Par2_raw, which converts it from CPU to GPU many times during the VFI, in this summation:

entireRHS=ReturnMatrix+DiscountFactorParamsVec*EV;

Note that EV is on the GPU while ReturnMatrix is on the CPU, so ReturnMatrix is converted to GPU before being added to EV to form entireRHS on the GPU.
Indeed, after the suggested change, the runtime of creating the return matrix improves slightly, but the runtime of the VFI improves by a factor of 4-5. So (2) was the real bottleneck.
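
For anyone who wants to see effect (2) in isolation, here is a minimal timing sketch. It is not the toolkit code: the sizes, the iteration count, the value of beta and the shape of EV are all made up for illustration, and it assumes the Parallel Computing Toolbox with a supported GPU. It only compares adding a CPU array to a GPU array inside a loop against doing the same sum with both arrays already on the GPU.

```matlab
% Sketch only: hypothetical sizes and parameters, not taken from VFI Toolkit.
N_a = 500; N_z = 7;      % grid sizes (assumed)
nIter = 50;              % stand-in for the number of VFI iterations (assumed)
beta = 0.96;             % stand-in for DiscountFactorParamsVec (assumed)

EV = rand(N_a,1,N_z,'gpuArray');                % expected value, on the GPU (shape assumed)
ReturnMatrix_cpu = rand(N_a,N_a,N_z);           % CPU double array, as before the fix
ReturnMatrix_gpu = gpuArray(ReturnMatrix_cpu);  % same data, already on the GPU

dev = gpuDevice;

% Case A: ReturnMatrix on the CPU, so it is moved to the GPU on every pass
tic
for ii = 1:nIter
    entireRHS_A = ReturnMatrix_cpu + beta*EV;
end
wait(dev);               % make sure GPU work has finished before stopping the clock
t_cpu = toc;

% Case B: ReturnMatrix already on the GPU, no transfer inside the loop
tic
for ii = 1:nIter
    entireRHS_B = ReturnMatrix_gpu + beta*EV;
end
wait(dev);
t_gpu = toc;

fprintf('CPU ReturnMatrix: %.3f s, GPU ReturnMatrix: %.3f s\n', t_cpu, t_gpu);
```

Both cases return the same gpuArray; only where ReturnMatrix lives differs. The actual ratio will depend on the GPU, the grid sizes and how many iterations the VFI needs, so treat the printed times as indicative only.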

robertdkirkby · July 7, 2025, 7:11am

Fixed: create on gpu · vfitoolkit/VFIToolkit-matlab@f7bc38f · GitHub

Nicely spotted! Easy to fix; it was essentially a typo, but as it doesn’t create any kind of error it is not an easy one to spot.