Taking advantage of automatic array expansion in Matlab

This is a somewhat technical post :slight_smile:

I was looking into the toolkit function VFIToolkit-matlab/ValueFnIter/FHorz/SemiExo/ValueFnIter_Case1_FHorz_SemiExo_nod1_noz_raw.m at master · vfitoolkit/VFIToolkit-matlab · GitHub, which is relevant to solve finite-horizon models with semi-exogenous shocks. In a few places the following lines of codes appear:

EV=VKronNext_j.*shiftdim(pi_semiz',-1);
EV(isnan(EV))=0; %multilications of -Inf with 0 gives NaN, this replaces them with zeros (as the zeros come from the transition probabilites)

EV=sum(EV,2); % sum over z', leaving a singular second dimension
entireRHS=ReturnMatrix_d2+DiscountFactorParamsVec*repmat(EV,1,N_a,1);

Note that in the last line ReturnMatrix_d2 is a 3-D array over (NA,NA,NZ), while EV is also a 3-D array with dimensions (NA,1,NZ). The code replicates the array EV along the second dimension to make it compatible with ReturnMatrix_d2. This is however not necessary since Matlab R2016, because of a feature known as automatic expansion, also used by numpy in Python. You can read about automatic expansion and its speed benefits here.

Why is it faster is explained very well in the blog post linked above.
I have edited the function ValueFnIter_Case1_FHorz_SemiExo_nod1_noz_raw.m and the code is about 50% faster!
For example, the line of code above is replaced by

% EV is expanded from (NA,1,NZ) to (NA,NA,NZ)
entireRHS=ReturnMatrix_d2+DiscountFactorParamsVec*EV

Also, it worth it to set vfoptions.paroverz=1, this also improved speed signifincantly.

I post here my version of ValueFnIter_Case1_FHorz_SemiExo_nod1_noz_raw.m. Note that also the line

EV_z=VKronNext_j.*(ones(N_a,1,'gpuArray')*pi_semiz(z_c,:));

can be replaced by

EV_z=VKronNext_j.*pi_semiz(z_c,:); %this will be (NA,NZ)=(a',z)

for the same reason

function [V,Policy3]=ValueFnIter_Case1_FHorz_SemiExo_nod1_noz_raw(n_d2,n_a,n_semiz,N_j, d2_grid, a_grid, semiz_gridvals_J, pi_semiz_J, ReturnFn, Parameters, DiscountFactorParamNames, ReturnFnParamNames, vfoptions)

N_d2=prod(n_d2);
N_a=prod(n_a);
N_semiz=prod(n_semiz);

V=zeros(N_a,N_semiz,N_j,'gpuArray');
% For semiz it turns out to be easier to go straight to constructing policy that stores d,d2,aprime seperately
Policy3=zeros(2,N_a,N_semiz,N_j,'gpuArray'); % just d2 and aprime

%%
d2_grid=gpuArray(d2_grid);
a_grid=gpuArray(a_grid);

l_d2=length(n_d2);
if l_d2==1
    d2_gridvals=d2_grid;
elseif l_d2==2
    d2_gridvals=[kron(ones(n_d2(2),1),d2_grid(1:n_d2(1))), kron(d2_grid(n_d2(1)+1:end),ones(n_d2(1),1))];
end
    
if vfoptions.lowmemory>0
    special_n_semiz=ones(1,length(n_semiz));
end

% Preallocate
V_ford2_jj=zeros(N_a,N_semiz,N_d2,'gpuArray');
Policy_ford2_jj=zeros(N_a,N_semiz,N_d2,'gpuArray');


%% j=N_j

% Create a vector containing all the return function parameters (in order)
ReturnFnParamsVec=CreateVectorFromParams(Parameters, ReturnFnParamNames,N_j);

if ~isfield(vfoptions,'V_Jplus1')
    if vfoptions.lowmemory==0

        ReturnMatrix=CreateReturnFnMatrix_Case1_Disc_Par2(ReturnFn, n_d2, n_a, n_semiz, d2_grid, a_grid, semiz_gridvals_J(:,:,N_j), ReturnFnParamsVec);
        %Calc the max and it's index
        [Vtemp,maxindex]=max(ReturnMatrix,[],1);
        V(:,:,N_j)=Vtemp;
        Policy3(1,:,:,N_j)=shiftdim(rem(maxindex-1,N_d2)+1,-1);
        Policy3(2,:,:,N_j)=shiftdim(ceil(maxindex/N_d2),-1);

    elseif vfoptions.lowmemory==1

        for z_c=1:N_semiz
            z_val=semiz_gridvals_J(z_c,:,N_j);
            ReturnMatrix_z=CreateReturnFnMatrix_Case1_Disc_Par2(ReturnFn, n_d2, n_a, special_n_semiz, d2_grid, a_grid, z_val, ReturnFnParamsVec);
            %Calc the max and it's index
            [Vtemp,maxindex]=max(ReturnMatrix_z,[],1);
            V(:,z_c,N_j)=Vtemp;
            Policy3(1,:,z_c,N_j)=shiftdim(rem(maxindex-1,N_d2)+1,-1);
            Policy3(2,:,z_c,N_j)=shiftdim(ceil(maxindex/N_d2),-1);
        end

    end
else
    % Using V_Jplus1
    V_Jplus1=reshape(vfoptions.V_Jplus1,[N_a,N_semiz]);    % First, switch V_Jplus1 into Kron form

    DiscountFactorParamsVec=CreateVectorFromParams(Parameters, DiscountFactorParamNames,N_j);
    DiscountFactorParamsVec=prod(DiscountFactorParamsVec);
    
    if vfoptions.lowmemory==0
        for d2_c=1:N_d2
            % Note: By definition V_Jplus1 does not depend on d (only aprime)
            pi_semiz=pi_semiz_J(:,:,d2_c,N_j);
            d2_val=d2_gridvals(d2_c,:)';

            ReturnMatrix_d2=CreateReturnFnMatrix_Case1_Disc_Par2(ReturnFn, ones(1,l_d2), n_a, n_semiz, d2_val, a_grid, semiz_gridvals_J(:,:,N_j), ReturnFnParamsVec);
            % (d,aprime,a,z)

            if vfoptions.paroverz==1

                EV=V_Jplus1.*shiftdim(pi_semiz',-1);
                EV(isnan(EV))=0; %multilications of -Inf with 0 gives NaN, this replaces them with zeros (as the zeros come from the transition probabilites)
                EV=sum(EV,2); % sum over z', leaving a singular second dimension

                % entireEV=kron(EV,ones(N_d1,1));
                %             entireEV=repelem(EV,N_d,1,1); % I tried this instead but appears repelem() is slower than kron()
                entireRHS=ReturnMatrix_d2+DiscountFactorParamsVec*repmat(EV,1,N_a,1);

                %Calc the max and it's index
                [Vtemp,maxindex]=max(entireRHS,[],1);

                V_ford2_jj(:,:,d2_c)=shiftdim(Vtemp,1);
                Policy_ford2_jj(:,:,d2_c)=shiftdim(maxindex,1);


            elseif vfoptions.paroverz==0

                for z_c=1:N_semiz
                    ReturnMatrix_d2z=ReturnMatrix_d2(:,:,z_c);

                    %Calc the condl expectation term (except beta), which depends on z but not on control variables
                    EV_z=V_Jplus1.*(ones(N_a,1,'gpuArray')*pi_semiz(z_c,:));
                    EV_z(isnan(EV_z))=0; %multilications of -Inf with 0 gives NaN, this replaces them with zeros (as the zeros come from the transition probabilites)
                    EV_z=sum(EV_z,2);

                    % entireEV_z=kron(EV_z,ones(N_d1,1));
                    entireRHS_z=ReturnMatrix_d2z+DiscountFactorParamsVec*EV_z*ones(1,N_a,1);

                    %Calc the max and it's index
                    [Vtemp,maxindex]=max(entireRHS_z,[],1);
                    V_ford2_jj(:,z_c,d2_c)=Vtemp;
                    Policy_ford2_jj(:,z_c,d2_c)=maxindex;
                end
            end
        end
        % Now we just max over d2, and keep the policy that corresponded to that (including modify the policy to include the d2 decision)
        [V_jj,maxindex]=max(V_ford2_jj,[],3); % max over d2
        V(:,:,N_j)=V_jj;
        Policy3(1,:,:,N_j)=shiftdim(maxindex,-1); % d2 is just maxindex
        maxindex=reshape(maxindex,[N_a*N_semiz,1]); % This is the value of d that corresponds, make it this shape for addition just below
        Policy3(2,:,:,N_j)=reshape(Policy_ford2_jj((1:1:N_a*N_semiz)'+(N_a*N_semiz)*(maxindex-1)),[1,N_a,N_semiz]);
        
    elseif vfoptions.lowmemory==1
        for d2_c=1:N_d2
            % Note: By definition V_Jplus1 does not depend on d2 (only aprime)
            pi_semiz=pi_semiz_J(:,:,d2_c,N_j);
            d2_val=d2_gridvals(d2_c,:)';

            for z_c=1:N_semiz
                z_val=semiz_gridvals_J(z_c,:,N_j);
                ReturnMatrix_d2z=CreateReturnFnMatrix_Case1_Disc_Par2(ReturnFn, ones(1,l_d2), n_a, special_n_semiz, d2_val, a_grid, z_val, ReturnFnParamsVec);

                %Calc the condl expectation term (except beta), which depends on z but
                %not on control variables
                EV_z=V_Jplus1.*(ones(N_a,1,'gpuArray')*pi_semiz(z_c,:));
                EV_z(isnan(EV_z))=0; %multilications of -Inf with 0 gives NaN, this replaces them with zeros (as the zeros come from the transition probabilites)
                EV_z=sum(EV_z,2);

                % entireEV_z=kron(EV_z,ones(N_d1,1));
                entireRHS_z=ReturnMatrix_d2z+DiscountFactorParamsVec*EV_z*ones(1,N_a,1);

                %Calc the max and it's index
                [Vtemp,maxindex]=max(entireRHS_z,[],1);
                V_ford2_jj(:,z_c,d2_c)=Vtemp;
                Policy_ford2_jj(:,z_c,d2_c)=maxindex;
            end
        end
        % Now we just max over d2, and keep the policy that corresponded to that (including modify the policy to include the d2 decision)
        [V_jj,maxindex]=max(V_ford2_jj,[],3); % max over d2
        V(:,:,N_j)=V_jj;
        Policy3(1,:,:,N_j)=shiftdim(maxindex,-1); % d2 is just maxindex
        maxindex=reshape(maxindex,[N_a*N_semiz,1]); % This is the value of d that corresponds, make it this shape for addition just below
        Policy3(2,:,:,N_j)=reshape(Policy_ford2_jj((1:1:N_a*N_semiz)'+(N_a*N_semiz)*(maxindex-1)),[1,N_a,N_semiz]);

    end
end

%% Iterate backwards through j.
for reverse_j=1:N_j-1
    jj=N_j-reverse_j;

    if vfoptions.verbose==1
        fprintf('Finite horizon: %i of %i \n',jj, N_j)
    end
    
    % Create a vector containing all the return function parameters (in order)
    ReturnFnParamsVec=CreateVectorFromParams(Parameters, ReturnFnParamNames,jj);
    DiscountFactorParamsVec=CreateVectorFromParams(Parameters, DiscountFactorParamNames,jj);
    DiscountFactorParamsVec=prod(DiscountFactorParamsVec);
    
    VKronNext_j=V(:,:,jj+1);

    if vfoptions.lowmemory==0
        for d2_c=1:N_d2
            % Note: By definition V_Jplus1 does not depend on d2 (only aprime)
            pi_semiz=pi_semiz_J(:,:,d2_c,jj);
            d2_val=d2_gridvals(d2_c,:)';

            ReturnMatrix_d2=CreateReturnFnMatrix_Case1_Disc_Par2(ReturnFn, ones(1,l_d2), n_a, n_semiz, d2_val, a_grid, semiz_gridvals_J(:,:,jj), ReturnFnParamsVec);
            % (aprime,a,z) for given d

            if vfoptions.paroverz==1

                EV=VKronNext_j.*shiftdim(pi_semiz',-1);
                % EV(a',z',z)
                EV(isnan(EV))=0; %multilications of -Inf with 0 gives NaN, this replaces them with zeros (as the zeros come from the transition probabilites)
                EV=sum(EV,2); % sum over z', leaving a singular second dimension
                % EV(a',1,z)
                % entireEV=kron(EV,ones(N_d1,1));
                %             entireEV=repelem(EV,N_d,1,1); % I tried this instead but appears repelem() is slower than kron()
                % ReturnMatrix_d2(a',a,z), EV is (a',1,z)
                % Ev is automatically expanded, no need to use repmat
                %entireRHS=ReturnMatrix_d2+DiscountFactorParamsVec*repmat(EV,1,N_a,1);
                entireRHS=ReturnMatrix_d2+DiscountFactorParamsVec*EV;

                %Calc the max and it's index
                [Vtemp,maxindex]=max(entireRHS,[],1);

                V_ford2_jj(:,:,d2_c)=shiftdim(Vtemp,1);
                Policy_ford2_jj(:,:,d2_c)=shiftdim(maxindex,1);

            elseif vfoptions.paroverz==0

                for z_c=1:N_semiz
                    ReturnMatrix_d2z=ReturnMatrix_d2(:,:,z_c);

                    %Calc the condl expectation term (except beta), which depends on z but not on control variables
                    %EV_z=VKronNext_j.*(ones(N_a,1,'gpuArray')*pi_semiz(z_c,:));
                    EV_z=VKronNext_j.*pi_semiz(z_c,:);
                    EV_z(isnan(EV_z))=0; %multilications of -Inf with 0 gives NaN, this replaces them with zeros (as the zeros come from the transition probabilites)
                    EV_z=sum(EV_z,2);

                    % ReturnMatrix_d2z is (a',a), EV_z is
                    % (a',1)==>Automatic expansion to (a',a)
                    entireRHS_z=ReturnMatrix_d2z+DiscountFactorParamsVec*EV_z;

                    %Calc the max and it's index
                    [Vtemp,maxindex]=max(entireRHS_z,[],1);
                    V_ford2_jj(:,z_c,d2_c)=Vtemp;
                    Policy_ford2_jj(:,z_c,d2_c)=maxindex;
                end
            end
        end
        % Now we just max over d2, and keep the policy that corresponded to that (including modify the policy to include the d2 decision)
        [V_jj,maxindex]=max(V_ford2_jj,[],3); % max over d2
        V(:,:,jj)=V_jj;
        Policy3(1,:,:,jj)=shiftdim(maxindex,-1); % d2 is just maxindex
        maxindex=reshape(maxindex,[N_a*N_semiz,1]); % This is the value of d that corresponds, make it this shape for addition just below
        Policy3(2,:,:,jj)=reshape(Policy_ford2_jj((1:1:N_a*N_semiz)'+(N_a*N_semiz)*(maxindex-1)),[1,N_a,N_semiz]);

    elseif vfoptions.lowmemory==1
        for d2_c=1:N_d2
            % Note: By definition V_Jplus1 does not depend on d2 (only aprime)
            pi_semiz=pi_semiz_J(:,:,d2_c,jj);
            d2_val=d2_gridvals(d2_c,:)';

            for z_c=1:N_semiz
                z_val=semiz_gridvals_J(z_c,:,jj);
                ReturnMatrix_d2z=CreateReturnFnMatrix_Case1_Disc_Par2(ReturnFn, ones(1,l_d2), n_a, special_n_semiz, d2_val, a_grid, z_val, ReturnFnParamsVec);

                %Calc the condl expectation term (except beta), which depends on z but
                %not on control variables
                EV_z=VKronNext_j.*(ones(N_a,1,'gpuArray')*pi_semiz(z_c,:));
                EV_z(isnan(EV_z))=0; %multilications of -Inf with 0 gives NaN, this replaces them with zeros (as the zeros come from the transition probabilites)
                EV_z=sum(EV_z,2);

                % entireEV_z=kron(EV_z,ones(N_d1,1));
                entireRHS_z=ReturnMatrix_d2z+DiscountFactorParamsVec*EV_z*ones(1,N_a,1);

                %Calc the max and it's index
                [Vtemp,maxindex]=max(entireRHS_z,[],1);
                V_ford2_jj(:,z_c,d2_c)=Vtemp;
                Policy_ford2_jj(:,z_c,d2_c)=maxindex;
            end
        end
        % Now we just max over d2, and keep the policy that corresponded to that (including modify the policy to include the d2 decision)
        [V_jj,maxindex]=max(V_ford2_jj,[],3); % max over d2
        V(:,:,jj)=V_jj;
        Policy3(1,:,:,jj)=shiftdim(maxindex,-1); % d2 is just maxindex
        maxindex=reshape(maxindex,[N_a*N_semiz,1]); % This is the value of d that corresponds, make it this shape for addition just below
        Policy3(2,:,:,jj)=reshape(Policy_ford2_jj((1:1:N_a*N_semiz)'+(N_a*N_semiz)*(maxindex-1)),[1,N_a,N_semiz]);
        
    end
end


end

I’ve gone and removed all the repmat() and the like for putting the ReturnMatrix together with the EV terms in the value function commands for semiz with no z so you can get the full benefits effective immediately :smiley:

I will go do it for all the other value fn commands in early January. There are over 100 sub-functions for value fn iteration nowadays (different combinations of shocks, asset types, and preferences) and I just don’t have the heart to go do it right now (I am trying to take a break until christmas from the technical innards of the toolkit, do some research instead).

I have made vfoptions.paroverz=1 the default (it was previously default when prod(n_z)>50, but now is just always the default). A bit of background, back when I was doing pre-version-1 of the toolkit it wasn’t even obvious to me how to parallel over z when calculating E[V], so I just did it as a loop. Nowadays parallel over z when calculating E[V] is easy and obvious to me but I just added it as an alternative, rather than overwriting the original completely. Realistically, there is vfoptions.lowmemory, so I should probably just eliminate the whole vfoptions.paroverz=0 completely as if you get out of memory errors you can just switch to vfoptions.lowmemory=1 anyway.

Thanks for this! Faster is always nice :smiley_cat:

1 Like

Have now implemented automatic array expansion in all the value fn iteration commands :smiley:

1 Like

That’s awesome thanks! And it should not be too bad for backward compatibility since automatic expansion was introduces in R2016a :slight_smile: