Grid Interpolation Layer - Improvement

Found a minor bug in grid interpolation layer that triggered in a model I am using, but was not noticeable in any other model I had used previously. I am mid-fix now, just posting for anyone who is interested.

When I wrote the grid interpolation layer code, I thought carefully about what happens if one of the two coarse grid points we want to interpolate between involves a next period V was a minus -Inf, and made sure the linear interpolation is based on conservatively assuming the region in between this point and the next is all -Inf. I also thought about when ReturnFn was -Inf at one of the two coarse grid points, but since I evaluate ReturnFn exactly (rather than interpolate) we can just ignore these when. We get the most accurate Policy with this approach, while still being conservative and avoiding anything that should perhaps be a -Inf.

The agent distribution iteration then involves moving all agents back onto the coarse grid, and this was done by turning the fine index into a weight and using those weights to move agents to the two course grid points either side of the fine index.

Individually these are both perfectly good choices, but the combination of the two causes an issue in some niche cases.

Specifically, I realised that because of how the ReturnFn being -Inf is handled, interacting this with the simulation/iteration taking place on the coarse grid causes an error that results in agents leaking into infeasible situations. Specifically, when you have that a neighbouring coarse ReturnFn point is -Inf and you choose a point on the fine grid, while the Policy is fine, I was then putting a non-zero weight on the neighbouring coarse grid point during simulation/interpolation, which was like saying that with a non-zero probability you had chosen a -Inf ReturnFn.

One solution would simply to be to rule out the Policy from using fine grid if the neighbouring grid point is ReturnFn of -Inf. But this is rather crude, as we loose a lot of accuracy from Policy. Instead, I think it is more accurate that I keep the fine grid index when next to a ReturnFn of -Inf, but we record a flag to say that the the probability weight assigned to the neighbouring point will be zero during simulation/iteration.

In this way, we can keep the fine grid in Policy, giving more accurate plots for Policy, and more accurate stats when we use FnsToEvaluate, while at the same time eliminating any ‘leakage’ from the agent distribution into these ‘infeasible states’.

To implement this, I am adding an additional row to Policy whenever grid interpolation layer is being used. This row contains the flag: 2 is the usual use of fine index to create the weights, 1 is all weight to lower grid point, 3 is all weight to upper grid point.

As an end user this should make no impact at all to you, with two exceptions. First, on the outside chance you happen to have a model where this kind of leakage was occuring, in which case it will be corrected, you will see a change in results and the new ones are the correct ones. Second, if you have any saved files containing Policy from the old behaviour, these will no longer be usable as they will contain the wrong number of rows.

One way to think about this change is how it handles constraints that tip from being not-binding to being binding in between course grid points. You have to be fuzzy on one side or another of the tipping point, either you allow some fuzziness by breaching the constraint as far as the next grid point, or you allow some fuzziness by enforcing the constraint as far as the previous grid point. The old version was to allow fuzziness by breaching the constraint as far as the next grid point (I did not realise this at the time). Simply ruling out the fine index in these situations would involve fuzziness by enforcing the constraint as far as the previous grid point. The approach taken in the new version is to not impose any fuzziness on the Policy, and to impose fuzziness by enforcing the constraint as far as the previous grid point when doing simulation/iteration of the agent distribution.

Of course, as the number of coarse grid points increases, these differences between the approaches essentially disappear. The question is to find a method that works well when the coarse grid is truly coarse. I think the new approach treads a better balance than the old approach did.

The change to toolkit github will occur later this week, first I have a lot of coding/testing to do. I will post a reply here once the change is done. I will update the pdf on grid interpolation then too.

4 Likes

Good point. Just to make sure if I understand: let a_star be the optimal index on the coarse grid. Usually, you build the fine grid in the interval between a_star-1 and a_star+1. But if a_star+1 gives a return of -inf (infeasible point), this creates problems. The optimal point on the finer grid might be between a_star and a_star+1, so that a_star+1 gets some positive probability in simulation. Is my understanding correct? I think adding a few equations would clarify this issue.

Typo: on the outside change should be on the outside chance.

1 Like

Once the code is done I will update the pdf, that version will have the exact equations/description. Just wanted to get the idea down.

and fixed the typo

2 Likes

See pdf: GridInterpolationLayer/GridInterpLayer.pdf at main ¡ robertdkirkby/GridInterpolationLayer ¡ GitHub

On page 8 it discusses the problem and lists four possible approaches. I have ruled out one and two. If anyone has a view on which of three and four seems best please let me know.
[Intuition at pg 4 introduces the figures used on page 8]

the subsection after that about how to code it and giving precise algorithm and equation details is still very preliminary and incomplete, largely because I haven’t decided which of possibilities three and four to follow

1 Like

I will read the document! Meanwhile, I was thinking that an obvious way of fixing this problem is to let the distribution be defined on the finer grid, but that would be impractical.
(The optimal point on the finer grid is by construction feasible).

Moreover, how likely it is this problem to show up in practice? If I see that something is strange when I simulate or compute the distribution, I would just increase the number of points on the coarse grid and the problem would go away.
It would be interesting if you could describe the model or the situation where this numerical issue showed up.

1 Like

Typo: convering → converging

The pseudocode introducing a_grid does so without protecting the underscore within the name, creating a bad g subscript (a_grid) instead of a\_grid

A thought off the top of my head on the questions the paper asks: what if the toolkit could help users create better grids in the first place. Many example codes cleverly use cubic interpolation near points that are known to be important for maximizing accuracy (when 101 points between -1 and 1 are cubed, we get a lot more points closer in value to zero than closer to -1 or 1).

As a writer of return functions, I don’t necessarily know how my binding constraints are going to show up on the grid(s), but if the toolkit told me that there’s an infinity at a=98 that carries a lot of probability across to a=90, I could adjust my grid to place additional points between 90 and 100. This of course would result in a report that there’s an infinity between a=97 and a=98, but a corresponding tolerance value would say at what point do I have the accuracy I want in terms of range based on where infinities show up in my domain.

I do think there’s a point at which you don’t want the toolkit itself going too far off the grid (so to speak) in search of optima, but giving the grid-writer feedback on their chosen grids might be both practical and welcomed.

2 Likes

Setting up some adaptive grids is definitely on the list of longer-term goals for the toolkit. You are right that even just having a diagnostic that tells the user this —without an automated adaptation— would be a useful tool in itself.

Actually that does suggest one immediate idea. Both the third and fourth options for what to do with -Inf in grid interpolation layer will create flags for where this occurs. Those could be the basis for a simple diagnostic for where more points would help.

In the shower this morning I had a related thought. One advantage of the third is that when you ask ‘how many people find this constraint binding’, the third (and the second) give a good answer. Whereas first and fourth kind of resolve the constraint by steering people well wide of it. This has me leaning towards using the third, and then I can reuse that flag as a diagnostic for where more points would help the grid interpolation layer.

PS. One of the main challenges with the adaptive grids is that this is a value fn problem, not a static problem. So we don’t just need grids that give an accurate solution for today, we also need grids that we can interpolate for tommorow, and because we will be doing that a huge number of times that interpolation has to be fast. A lot of tricks for creating clever adaptive sparse grids are great at approximating the function, but they tend not to have been designed with ‘fast to interpolate’ in mind. A more immediate problem is that most of them are highly serial and not created with gpu parallelization in mind

1 Like

As the coarse (Layer 1) grid gets more points, all four of the possible grid interpolation layer approaches to dealing with -Inf will give essentially the same. Intuitively, if the interpolation is just splitting hairs across a tiny range, then how you do it end up being fairly irrelevant. Also, as the coarse (Layer 1) grid gets more points, the interpolation is adding very little and you are getting back to something close to pure discretization which we analytically know solves absolutely everything correctly.

1 Like

I had a model with two endogenous states. A collateral constraint says a1prime>=-\lambda*a2prime (a1 was savings, and can only be negative if collateralized against a2). The issue was the interpolation meant some agents ended up implicitly breaching this (due to the interpolation back onto the grid during iteration of the agent dist). The leakage was not large, but I did not like it.

1 Like

Really? I think Simon Scheidegger has several tutorials on github about adaptive sparse grid and the algorithms exploit parallelism.
I think in the long run the toolkit should move in this direction, trying to alleviate the curse of dimensionality.

With this type of models sometimes you can do a transformation to simplify the borrowing constraint:

a' \geq -\lambda h'

becomes
w' \geq 0,
where w \equiv a+\lambda h. Then instead of (a,h) you keep track of (w,h)

This post by Julien Pascal is a nice explanation of the adaptive sparse grids used by Brumm & Scheidegger (2017), for anyone interested

1 Like

Nice reference, thanks.

I found another paper that solves discrete VFI with tensor train decomposition (something similar to adapative sparse grids?), by Richard Dennis. The title of the paper is ambitious: Value Function Iteration Without the Curse of Dimensionality.

P.S. This is not really related to the original post about grid-interpolation-layer-improvement, so apologies for going off topic. I can create a separate post about “VFI toolkit new solution methods: a wishlist” :slight_smile:

I had codex analyse the document and it found some issues:

Substantive Issues

  • GridInterpLayer.tex: “based on monotonicity” is probably not the right condition for concluding the true optimum lies in ((aprime_{i-1}, aprime_{i+1})). You need something like concavity/quasi-concavity or single-peakedness of the objective in (a’), not monotonicity alone. AGREE

  • GridInterpLayer.tex: the second-layer point list repeats (a_{i-1,1},\dots,a_{i-1,n2}) on both sides of (aprime_i). The second block should be something like (a_{i,1},\dots,a_{i,n2}), i.e. points between (aprime_i) and (aprime_{i+1}). AGREE

  • GridInterpLayer.tex: aL2_grid=linspace(g_j(a,z)-1,g_j(a,z)+1,3+2*n2) creates an index grid, but later it is passed to (F(\cdot,a,z)) as if it were asset values. It should either be clearly an index grid mapped through interp1, or written directly as asset values. AGREE

  • GridInterpLayer.tex: the fine-grid index formula looks wrong. It says
    fine grid index = n*(lower index - 1) + second layer index,
    but it should be based on the number of fine intervals per coarse interval:
    (n2+1)(\text{lower index}-1)+\text{second layer index}.
    The same issue appears again at line 196.

  • GridInterpLayer.tex: the text says the stored second-layer index runs from (1) to (n2+2), but the pseudocode’s second-layer maximization at line 159 returns an index from (1) to (3+2n2). The conversion from centered-index form to lower-grid-point/local-index form needs to be spelled out.

I think the first one of these is wrong.

We are not talking about montonicity of a function (which is what the AI sounds like it is describing). We are talking about monotonicity of the return fn and implicitly also of the next period value fn. Since the ReturnFn is monotone dec in aprime, and E[V] is monotone inc in aprime, this is enough.

Intuitively, because on fn decreases and the other increases, you are looking for where they ‘cross’ (by which I mean where the marginal utilities are equal magnitude but with opposite sign, if they were continuous fns). The further you go from the point on the (coarse, layer 1) grid where this ‘occurs’ (closest to holding exactly, given stuck on grid) this can only get further away from being the optimal point.

Where this ‘occurs’ on the coarse (layer 1) grid has to be ‘next to’ where it ‘occurs’ in the fine (layer 2) grid.

Fixed the other four, updated pdf. Thanks!

1 Like

Just one random comment about adaptive sparse grids, tensor trains, etc. These mainly focus on representing the value fn V, as being the source of the curse of dimensionality.

This is half true. But in practice the real binding part of the curse of dimensionality tends to be the action-space, not the state-space. It is the dimensionality of trying to solve the max() operator, not trying to represent V, that is the real bottleneck in most models. [This is why reinforcement learning is likely a much better idea than deep learning for value fn problems.]

[The only model that comes to mind that does not fit this is Keane & Wolpin (1997), their action space is just a 4-valued single dimension, but the state-space is 20^4-ish. But this is the exception that proves the rule.]

1 Like

I made a final decision on how to handle “-Inf in the ReturnFn at a neighbouring coarse grid point” when doing the grid interpolation layer. It has been implemented throughout all the FHorz commands (InfHorz will be done next week, I need a rest from it). I’ve updated the grid interpolation layer pdf to explain exactly what being done.

Note that this is something that impacts very few models. I must have solved almost 100 models before I came across one where this issue arose.

1 Like