Conditional restrictions on agent distribution stats

Consider a textbook Aiyagari model and suppose I’d like to compute median wealth. I can do the following:

First, I define a function that gives me “wealth”, and I call it “A”:

FnsToEvaluate.A = @(aprime,a,eps,theta) a; 

Then, after computing the Stationary Distribution, I call the function:

AllStats=EvalFnOnAgentDist_AllStats_Case1(StationaryDist,Policy,FnsToEvaluate,...
    Params,[],n_d,n_a,n_z,d_grid,a_grid,z_grid,simoptions);

Median wealth is then given by

AllStats.A.Median

So far so good. But suppose I have a slightly more complicated model, say Aiyagari with an occupational choice (entrepreneur vs worker), and I would like to compute median wealth conditional on being a worker, and separately conditional on being an entrepreneur.

I used the following, but it does not work (it always gives zero as a result):

% Assets owned by entrepreneurs
FnsToEvaluate.A_entre = @(aprime,a,eps,theta,r,w,vi,delta,lambda,gamma) a*f_Entre(aprime,a,eps,theta,r,w,vi,delta,lambda,gamma);

% Assets owned by workers
FnsToEvaluate.A_work = @(aprime,a,eps,theta,r,w,vi,delta,lambda,gamma) a*(1-f_Entre(aprime,a,eps,theta,r,w,vi,delta,lambda,gamma));

Note that in this case f_Entre is a function that returns 1 if the individual is an entrepreneur and 0 if a worker.

I think I understand what is going on: in the code above I set wealth equal to zero whenever the individual is NOT an entrepreneur. Since only 5-10% of individuals in the economy are entrepreneurs, most wealth values are set to 0 and therefore the median is also 0.
What I would like to do instead is to set the distribution, not the wealth values, equal to 0 whenever the individual is NOT an entrepreneur, and then renormalize. I did this manually and it works well, but it would be great to do it automatically with the toolkit, if possible.

I had a look at the toolkit function EvalFnOnAgentDist_AllStats_Case1 and I was wondering if the part on “conditional restrictions”, around line 140, might help.

My manual code:

% Replicate the asset grid across the two z dimensions, then stack into a vector
a_vec = repmat(a_grid,[1,n_z(1),n_z(2)]);
a_vec = a_vec(:);
% Zero out the distribution at non-entrepreneur points, then renormalize
StationaryDist_Entre = (pol_e==1).*StationaryDist;
StationaryDist_Entre = StationaryDist_Entre/sum(StationaryDist_Entre,"all");
% Same for workers
StationaryDist_Work = (pol_e==0).*StationaryDist;
StationaryDist_Work = StationaryDist_Work/sum(StationaryDist_Work,"all");
% Medians of the two conditional distributions
A_entre_Median = quantili(a_vec(:),StationaryDist_Entre(:),0.50);
A_work_Median = quantili(a_vec(:),StationaryDist_Work(:),0.50);

and the function quantili (not included here but available on request) does the standard task of computing percentiles.
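
For reference, here is a minimal sketch of the idea behind quantili (my own illustration, not the actual code):

function q = quantili(values,weights,p)
% Weighted quantile by inverse CDF: returns the smallest grid value whose
% cumulative probability mass reaches p
% (values are the grid points, weights are their probability masses)
[sortedvals,ind] = sort(values(:));
cumw = cumsum(weights(ind))/sum(weights(:));
q = sortedvals(find(cumw>=p,1,'first'));
end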


The ‘conditional restrictions’ option is for exactly this purpose, but I am not sure it really handles AllStats at the moment (it is a vestigial piece from AggVars, where it did work). I’ll take a look this week and get it working properly.

The idea is that you can put in a ‘conditional restriction’ via simoptions. The conditional restriction must be a function that evaluates to 1 (or 0), and then only the points that satisfy this restriction (that evaluate to 1) will be used as the basis for computing the statistics.

I’ll post again once I clean it up and check that it does work as intended.


Great, thanks! What I can do now is to compute e.g. the entrepreneurs’ share of total wealth as

AllStats.A_entre.Mean/(AllStats.A_entre.Mean+AllStats.A_work.Mean);

The reason is that the command AllStats.A_entre.Mean computes the integral of assets with respect to the population distribution, let’s call it D, and assets are recoded as 0 if the individual is not an entrepreneur. So I get the total assets owned by entrepreneurs.

But if I wanted the average amount of assets conditional on being an entrepreneur, I would need to restrict the distribution to entrepreneurs only. Let’s call this new distribution D_E. Then the average assets of entrepreneurs are equal to the integral of assets with respect to this new distribution, D_E.
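
In symbols, with $x$ indexing points of the state space:

$$D_E(x) = \frac{\mathbf{1}\{\text{entre}(x)\}\,D(x)}{\sum_{x'} \mathbf{1}\{\text{entre}(x')\}\,D(x')}, \qquad \mathbb{E}[a \mid \text{entre}] = \sum_x a(x)\,D_E(x).$$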

I’m sure you already know this of course; I just write it here for other users and for my future self, otherwise I forget. :)

Here is an example of how to do conditional restrictions in the Aiyagari (1994) model. The restriction in this example is rather contrived, but the point is just to show how it is done.

Say we already have the function we want to evaluate,
FnsToEvaluate.Assets=@(aprime,a,z) a;
and we want to evaluate assets conditional on the shock z being greater than 1.

Then we just set up
simoptions.conditionalrestrictions.bigz=@(aprime,a,z) (z>1);
and then run AllStats
AllStats=EvalFnOnAgentDist_AllStats_Case1(StationaryDist,Policy,FnsToEvaluate,...
    Params,[],n_d,n_a,n_z,d_grid,a_grid,z_grid,simoptions);

Then the conditional mean is in
AllStats.bigz.Assets.Mean
and the conditional median is in
AllStats.bigz.Assets.Median
etc.

We can double-check that this is doing the same thing by taking advantage of the fact that the conditional mean is easy to calculate in an alternative manner. Specifically, set up
FnsToEvaluate.test1=@(aprime,a,z) a*(z>1);
FnsToEvaluate.test2=@(aprime,a,z) (z>1);
Then run AllStats again, and calculate
AllStats.test1.Mean/AllStats.test2.Mean
and we can see that this gives the exact same answer as when we used the simoptions.conditionalrestrictions approach. Obviously, though, this alternative only really works for the conditional mean; for things like the conditional median we need to take the simoptions.conditionalrestrictions approach.
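
(In symbols, the ratio-of-means trick works because

$$\frac{\sum_x a(x)\,\mathbf{1}\{z(x)>1\}\,D(x)}{\sum_x \mathbf{1}\{z(x)>1\}\,D(x)} = \mathbb{E}[a \mid z>1],$$

whereas there is no analogous shortcut for the median or other quantiles.)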

Note, if you want to do something like assets for people in the lowest decile of income, you can run AllStats twice: the first time to get the cutoff for the income decile, then use this to create a conditional restriction and call AllStats a second time.


Hi Robert,

your explanation is very helpful, thanks! Regarding your last point:

“Note, if you want to do something like assets for people in the lowest decile of income, you can run AllStats twice: the first time to get the cutoff for the income decile, then use this to create a conditional restriction and call AllStats a second time.”

I would like to do something similar, namely to compute the average tax rate for the top 1 percent of individuals in the income distribution. Then, following your suggestions, I do

% Average tax rates for top percentiles of the income distribution
% Step 1: compute p99,p97,p95,p90 of the IncomePreTax distribution
% (with the default 20 ventiles, QuantileCutoffs(20) is the 95th-percentile cutoff)
Params.p95 = AllStats.IncomePreTax.QuantileCutoffs(20);

% Step 2: set a conditional restriction for IncomePreTax>=p95 (and similarly
% for p99, etc.) and run the AllStats command again
simoptions.conditionalrestrictions.p95 = @(aprime,a,eps,theta,K_to_L,alpha,vi,delta,lambda,gamma,lam_hsv,tau_hsv,p95) ...
    f_IncomePreTax(aprime,a,eps,theta,K_to_L,alpha,vi,delta,lambda,gamma,lam_hsv,tau_hsv)>=p95;

AllStats=EvalFnOnAgentDist_AllStats_Case1(StationaryDist,Policy,FnsToEvaluate,...
    Params,[],n_d,n_a,n_z,d_grid,a_grid,z_grid,simoptions);

% Step 3: compute the aggregates of PerIncTax and IncomePreTax under the restriction,
% then take the ratio between the two aggregates to get the average tax rate
atr_top5 = AllStats.p95.PerIncTax.Mean/AllStats.p95.IncomePreTax.Mean;

However, I have a question regarding Step 1 of the above method. How do I find the p99 in QuantileCutoffs? I’ve seen that QuantileCutoffs has 21 points. I would like to compute the average tax rate for the top 1, top 3, top 5, and top 10 percent.

Thanks!

The default is 20 ventiles, but you can switch to percentiles with just
simoptions.nquantiles=100;

(With the default 20 ventiles you get 20 quantile means. Between the ventiles are 19 cut-offs, and the output also reports the min and max alongside the cut-offs, hence why QuantileCutoffs has 21 entries.)

Once you have the cut-offs you will be able to set up all four conditional restrictions and then make a single call to AllStats.

To be clear, multiple conditional restrictions can be set up like,
simoptions.conditionalrestrictions.bigz=@(aprime,a,z) (z>1);
simoptions.conditionalrestrictions.midz=@(aprime,a,z) (z==1);
simoptions.conditionalrestrictions.smallz=@(aprime,a,z) (z<1);
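
For example, here is a sketch of the full two-call procedure for your top 1/3/5/10 (reusing your f_IncomePreTax and its parameter list; the cutoff indices assume that with simoptions.nquantiles=100 QuantileCutoffs is ordered [min, p1, ..., p99, max], so do double-check them against your output):

% First call: get the percentile cutoffs of pre-tax income
simoptions.nquantiles=100;
AllStats=EvalFnOnAgentDist_AllStats_Case1(StationaryDist,Policy,FnsToEvaluate,...
    Params,[],n_d,n_a,n_z,d_grid,a_grid,z_grid,simoptions);
Params.p90 = AllStats.IncomePreTax.QuantileCutoffs(91);  % 90th percentile
Params.p95 = AllStats.IncomePreTax.QuantileCutoffs(96);  % 95th percentile
Params.p97 = AllStats.IncomePreTax.QuantileCutoffs(98);  % 97th percentile
Params.p99 = AllStats.IncomePreTax.QuantileCutoffs(100); % 99th percentile
% Set up all four conditional restrictions
simoptions.conditionalrestrictions.top10 = @(aprime,a,eps,theta,K_to_L,alpha,vi,delta,lambda,gamma,lam_hsv,tau_hsv,p90) ...
    f_IncomePreTax(aprime,a,eps,theta,K_to_L,alpha,vi,delta,lambda,gamma,lam_hsv,tau_hsv)>=p90;
simoptions.conditionalrestrictions.top5 = @(aprime,a,eps,theta,K_to_L,alpha,vi,delta,lambda,gamma,lam_hsv,tau_hsv,p95) ...
    f_IncomePreTax(aprime,a,eps,theta,K_to_L,alpha,vi,delta,lambda,gamma,lam_hsv,tau_hsv)>=p95;
simoptions.conditionalrestrictions.top3 = @(aprime,a,eps,theta,K_to_L,alpha,vi,delta,lambda,gamma,lam_hsv,tau_hsv,p97) ...
    f_IncomePreTax(aprime,a,eps,theta,K_to_L,alpha,vi,delta,lambda,gamma,lam_hsv,tau_hsv)>=p97;
simoptions.conditionalrestrictions.top1 = @(aprime,a,eps,theta,K_to_L,alpha,vi,delta,lambda,gamma,lam_hsv,tau_hsv,p99) ...
    f_IncomePreTax(aprime,a,eps,theta,K_to_L,alpha,vi,delta,lambda,gamma,lam_hsv,tau_hsv)>=p99;
% Second call: a single run evaluates all four restrictions
AllStats=EvalFnOnAgentDist_AllStats_Case1(StationaryDist,Policy,FnsToEvaluate,...
    Params,[],n_d,n_a,n_z,d_grid,a_grid,z_grid,simoptions);
atr_top1 = AllStats.top1.PerIncTax.Mean/AllStats.top1.IncomePreTax.Mean;
% (and likewise atr_top3, atr_top5, atr_top10 from the top3/top5/top10 restrictions)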


Thanks! The code works as intended, but the call to AllStats takes a large amount of time. To give an idea: solving the model in partial equilibrium takes about 12 seconds, of which the VFI and the distribution take only about 1 second. Essentially all the rest is spent in this call:

AllStats=EvalFnOnAgentDist_AllStats_Case1(StationaryDist,Policy,FnsToEvaluate,...
    Params,[],n_d,n_a,n_z,d_grid,a_grid,z_grid,simoptions);

Using the MATLAB profiler I discovered that the bottleneck is the function StatsFromWeightedGrid, especially the computation of quantiles (whichstats(6)=1), and this loop in particular:

for ll=2:nquantiles-1
    QuantileMeans(ll)=(sum(WeightedSortedValues(quantilecutoffindexes(ll-1)+1:quantilecutoffindexes(ll))) ...
        -SortedValues(quantilecutoffindexes(ll))*(CumSumSortedWeights(quantilecutoffindexes(ll))-ll/nquantiles) ...
        +SortedValues(quantilecutoffindexes(ll-1))*(CumSumSortedWeights(quantilecutoffindexes(ll-1))-(ll-1)/nquantiles))*nquantiles;
end
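
(For anyone reading along: as far as I can tell, this loop computes the conditional mean within each quantile bin,

$$\text{QuantileMeans}(\ell) = n_q \int_{Q_{(\ell-1)/n_q}}^{Q_{\ell/n_q}} a\, dD(a),$$

where $n_q$ is nquantiles and $Q_p$ is the $p$-quantile; the subtraction/addition terms correct for the grid points whose probability mass straddles the two cutoffs.)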

It’s quite good that the toolkit allows the user to set simoptions.whichstats, which tells the code to compute only a subset of all possible statistics. In my case I set whichstats(6)=1 since I need the quantiles. But I only need the cutoffs, not the means, and the means take a lot of time (see above). So I was wondering whether it is possible to either (1) speed up the quantiles part of StatsFromWeightedGrid, (2) add an option to compute only the quantile cutoffs and not the quantile means, or (3) let the user choose which quantiles to compute via simoptions, instead of computing all 20 (in my case, for example, I need only 3 or 4).

Option (3) would allow the user to pass something like qvec = [0.99, 0.95, 0.90] as a field of simoptions.
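
In the meantime, one workaround along the lines of option (3) would be to compute just the cutoffs I need with the manual approach from my first post (a sketch; here a_vec stands for whatever variable the cutoffs refer to, evaluated on the grid):

qvec = [0.99, 0.95, 0.90];   % only the quantiles I actually need
cutoffs = zeros(size(qvec));
for ii = 1:numel(qvec)
    % reuse the weighted-quantile helper from my first post
    cutoffs(ii) = quantili(a_vec(:),StationaryDist(:),qvec(ii));
end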

Thanks in advance!

P.S. I ask this because I’d like to do some parameter estimation. I run AllStats only after solving the model, of course, but I still have to run it before computing the estimation targets…

I’m going to sit down sometime early next week and overhaul conditional restrictions: get them faster, working the same way across FHorz and InfHorz, and working with PTypes. I’ll report back once done. :D


I’ve improved the speed for quantiles: to get just the quantile cutoffs (without the quantile means), set simoptions.whichstats(6)=2.
(How to cut runtimes for life-cycle profiles or 'AllStats' when you don't need all the stats (simoptions.whichstats) - #5 by robertdkirkby)

Next is to get conditional restrictions working with PTypes and FHorz.


simoptions.conditionalrestrictions now works in AllStats. With and without PType. For both FHorz and InfHorz.

I still want to add it to AgeConditionalStats.


simoptions.conditionalrestrictions now works with LifeCycleProfiles_FHorz_Case1 and the PType version.

Actually I ended up overhauling a lot of how FnsToEvaluate are done behind the scenes.


Note: I deliberately left it so that conditionalrestrictions do not work with AggVars (so that it is clear they cannot currently be used as part of general equilibrium conditions). Obviously the actual AggVars values are generated as part of AllStats anyway, if you are interested in them.


Average tax rates for top percentiles of income distribution

I came back to this old post since I am again calculating average tax rates by income percentile. The method outlined above works well. I have come across another method (based on the files you shared with me via email) that computes the average tax rate in the top 1 percent of income as follows:

AllStats2.IncomeTaxRevenue.QuantileMeans(100)/AllStats2.Income.QuantileMeans(100);

I am not sure this method works as intended, though. The top 1 percent is computed separately for the distribution of income tax revenues (the numerator) and for the distribution of income (the denominator). This is incorrect, I think. The correct way is to identify the top 1 percent of the income distribution, then compute the sum of income taxes paid by this group and divide it by the sum of taxable income earned by this group.

I think this is the correct approach:

Params.p99 = AllStats.Income.QuantileCutoffs(99);
% (with simoptions.nquantiles=100, QuantileCutoffs has 101 entries including
% the min and max, so check which index corresponds to p99)
simoptions.conditionalrestrictions.top1 = @(aprime,a,z,p99) f_Income(aprime,a,z)>p99;
% Call AllStats again, now with the conditional restriction
AllStats=EvalFnOnAgentDist_AllStats_Case1(StationaryDist,Policy,FnsToEvaluate,...
    Params,[],n_d,n_a,n_z,d_grid,a_grid,z_grid,simoptions);
atr_top1_inc = AllStats.top1.IncomeTaxRevenue.Mean/AllStats.top1.Income.Mean;

The first one does not calculate what you want.

It calculates ‘Mean(income tax revenue | in top 1% of households by income tax paid)’ divided by ‘Mean(income | in top 1% of households by income)’.

The second approach in your post calculates ‘Mean(income tax revenue | in top 1% of households by income)’ divided by ‘Mean(income | in top 1% of households by income)’.

Note that a third possibility, not in your codes, would be to calculate ‘Mean(income tax revenue/income | in top 1% of households by income tax paid)’.

The second approach, the one you describe as ‘I think this is the correct approach’, is likely the one you want, but of course that depends on exactly what it is you want to measure.
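
In symbols, with $T$ denoting income tax paid and $y$ income, the three are

$$\text{(1)}\;\frac{\mathbb{E}[T \mid T \ge T_{99}]}{\mathbb{E}[y \mid y \ge y_{99}]}, \qquad \text{(2)}\;\frac{\mathbb{E}[T \mid y \ge y_{99}]}{\mathbb{E}[y \mid y \ge y_{99}]}, \qquad \text{(3)}\;\mathbb{E}\!\left[\frac{T}{y} \,\middle|\, T \ge T_{99}\right],$$

where $T_{99}$ and $y_{99}$ are the 99th percentiles of tax paid and of income, respectively.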


Thanks for the answer!

My intent is to replicate Table 5 of Brüggemann (2021). The table footnote states that “Average tax rates are computed by dividing total income tax by taxable income.”

What do you think would be the correct approach in this instance?

I expect she did the second. That is the one most people would do.


Before becoming an amateur economist, I found myself becoming an accidental data scientist. After four years of working with Pandas dataframes (on climate-related calculations), I came to appreciate the deep logic of this Split-Apply-Combine diagram:

https://www.makerluis.com/content/images/size/w1600/2024/05/Data-aggregations-concept-split-apply-and-combine.png

Maintaining clarity (and conceptual separation) about how primary data can be split into groups, how the groups can be aggregated, and how the aggregations can be recombined is fundamental. The Pandas dataframe model makes it very easy to be explicit about how the splitting (i.e. data selection) is done before aggregation and recombination. I think that trying to do the splitting anywhere other than the first step almost always leads to tears.

1 Like