Bug in StationaryDist_FHorz_Iteration_SemiExo_noz_raw

I think there is a bug in the way this function indexes the finite-horizon semi-exogenous transition matrix. This is the function:

Relevant object

The input pi_semiz_J is a 4-dimensional array with size


[N_semiz, N_semiz, N_dsemiz, N_j]

where the dimensions are:

  1. current semi-exogenous state

  2. next semi-exogenous state

  3. decision affecting the semi-exogenous transition

  4. age

So in a finite-horizon model, the transition matrix is allowed to vary with age.

Relevant code

In StationaryDist_FHorz_Iteration_SemiExo_noz_raw.m:


Policy_dsemiexo=reshape(Policy_dsemiexo,[N_a*N_semiz,1,N_j]);

semizindex=repelem(gpuArray(1:1:N_semiz)',N_a,1) ...

+ N_semiz*gpuArray(0:1:N_semiz-1) ...

+ (N_semiz*N_semiz)*(Policy_dsemiexo-1);

and later:


semiztransitions=gather(pi_semiz_J(semizindex(:,:,jj)));

Why this looks wrong

The array semizindex only encodes linear indexing for the first 3 dimensions of pi_semiz_J:

  • current semiz

  • next semiz

  • decision

It does not include any offset for the 4th dimension, age.

In particular, there is no term of the form


(N_semiz*N_semiz*N_dsemiz)*(jj-1)

in the construction of semizindex.

Therefore, semizindex(:,:,jj) is an index for a 3-dimensional array of size


[N_semiz, N_semiz, N_dsemiz]

but it is then being used to index the 4-dimensional array pi_semiz_J.

Why this implies a bug

If pi_semiz_J varies with age, then the code should select the age-jj slice of the transition matrix before applying the linear index over (semiz, semiz', d).

But the current code does:


pi_semiz_J(semizindex(:,:,jj))

instead of first restricting to age jj.

So the age dimension is not being indexed correctly.

What I think the code should do instead

It seems the safe way is:


pi_semiz_jj = pi_semiz_J(:,:,:,jj);

semiztransitions = gather(pi_semiz_jj(semizindex(:,:,jj)));

because semizindex(:,:,jj) is exactly an index for the 3-dimensional object


pi_semiz_jj(semiz, semizprime, dsemiz).

Bottom line

So the issue is:

  • pi_semiz_J is 4-dimensional

  • semizindex only indexes 3 dimensions

  • the current code applies the 3-dimensional linear index directly to the 4-dimensional array

This appears to ignore the finite-horizon age dimension of the semi-exogenous transition matrix.

Note: I generated this post with the help of AI.

@robertdkirkby, @MichaelTiemann Please let me know what you think about this. I’m having strange results in a project of mine that uses the toolkit and with my proposed fix everything looks fine.

@robertdkirkby I created a PR here: Fix FHorz semiexo PType distribution and lifecycle profile helpers by aledinola · Pull Request #79 · vfitoolkit/VFIToolkit-matlab · GitHub This contains two fixes: the distribution command and the lifecycle command.

1 Like

Merged the pull request. Looks right to me (I didn’t test it, just read the code in the pull request).

1 Like

@aledinola any idea if this was just the _noz or all the semiexo StationaryDist_FHorz_Iteration_SemiExo codes?

I guess I am going to go look at all the others now. Maybe I will set up some tests on it.

1 Like

I don’t know. The model I am testing has only semi-exogenous shocks but no exogenous z shocks. I also don’t have any d1 variables (i.e. I have only one d variable which affects the semiz transitions, which in the toolkit notation should be known as d2, right?).

Regarding the fix: Codex put a gather, but is it really necessary? If all the inputs to the function are already on cpu, it should be redundant.

I have shared my code on a public repo. I hope it helps for your tests. This is the workflow that I’ve followed:

  • I started with a toy model coded with the VFI-toolkit. The model is explained in the accompanying pdf.
  • Then I asked an AI (Codex) to recode the model using simple Matlab code on the cpu.
  • Test for discrepancies b/w the toolkit results and the manual results.
  • If there are discrepancies, investigate the source.

This is how I spotted the bugs in the toolkit. I used to do this myself in the past but now with Codex is much faster.

By the way, the bug in the stationary distribution was subtle because the toolkit function did not return an error message. The original code was just always using the pi_semiz_J for age=1.

Hmm, it was a problem in every StationaryDist command when semi-exogenous shocks probability transitions depend on age. Glad you found this!!!

As you say original code was just always using the pi_semiz_J for age=1.

I ended up doing a minorly different implementation. Runtime between this and your version were effectively identical, but this version fits more cleanly with the nProbs codes, letting the with and without nProbs versions look more alike.

Just pushed the fix: fix age-conditional semi-exo shock transition probabilities · vfitoolkit/VFIToolkit-matlab@cfc5f19 · GitHub

2 Likes

This error was bigger than anything that should occur. Clearly I need to add ‘age-dependent shocks’ to the core toolkit tests, maybe I should build something that tests these, as well as ptype dependence in shocks and initial dist.

Thanks for spotting this! I feel like this is something I should have seen much much earlier.

1 Like

I am always happy to improve the toolkit :blush:

I think it would help other users (as well as our future selves) to have a simple example with semi-exogenous, age-dependent shocks. This could be placed either in https://github.com/vfitoolkit/VFItoolkit-matlab-examples/tree/master/FiniteHorizon or in https://github.com/vfitoolkit/IntroToLifeCycleModels.

Would this problem also affect the panel simulation? For my project I have to simulate panel data as well, for some complicated moments.

Don’t think it will impact the Panel data commands, they operate in a quite different way.

For example, you can see on line 19 of

that it is indexing the j-dimension of cumsumpi_semiz_J, which is the relevant version of pi_semiz_J.

Easy test, write a model with pi_semiz_J that differs by J. Then create a FnsToEvaluate that just returns the value of ‘semiz’. Then compare values in AgeConditionalStats (run stationary dist command, followed by lifecycleprofiles command) with a Panel Data simulation that takes the mean over the cross-section to get the age-conditional means. You should get the same answer both ways (up to the additional random noise errors in the panel data).

1 Like

Good suggestion! I am planning to do the test you suggest, conditional on permanent types. So I will compare the following moments between sim panel simulation and distribution:

  • Average of semiz by age
  • Average of semiz by age and by ptype

The two sets of moments must be very close (not exactly the same because simulation has noise but the distance should get smaller as I increase Nsim).

Does this make sense?

1 Like

Makes sense to me. Testing both the j and i dimensions.

1 Like