We can characterize the Ramsey problem as:
\max_{\tau} \, Welfare(model(\tau)) \quad \text{s.t.} \quad TaxRevenue(\tau) \geq R
where \tau is the tax rate, model(\tau) is understood as the stationary general eqm solution of the model (given \tau), and Welfare() is some measure of social welfare (say, behind-the-veil utilitarian). TaxRevenue(\tau) is the tax revenue raised (in model eqm) by the tax, and R is some ‘minimum’ amount of revenue we must raise (for unmodelled reasons). So the Ramsey problem is to find the tax rate that maximizes social welfare subject to the constraint that we have to raise a certain amount of revenue.
As you say, there are essentially two ways to solve this: (i) try lots of different \tau and pick the best one, or (ii) set it up as an optimization problem.
Conesa, Kitao and Krueger (2009) actually do (i). The following is a copy-paste of a comment from my (not yet public) replication:
% The original CKK2009 codes just try out 55 different tax rates:
% In particular (from their GRIDTAX.f90) they try out
%    one point on na0, which has value 0.23,
%    five points on na1, which range from 6 to 8,
%    11 points on tauk, which range from 0.3 to 0.4
% (so 1 x 5 x 11 = 55 points on the cross-product grid).
% Their EVALUATE.f90 loops over these grids, solving for general eqm at each point on the cross-product grid.
So they just do the same thing as in the example code you give, on a grid (over three taxes) of 55 points. (It is possible they tried some others; the above is what is left in their codes.) Which is method (i): set a grid on \tau, loop over the grid solving the stationary general eqm and evaluating welfare at each point, then pick the tax associated with the highest welfare.
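To make method (i) concrete, here is a minimal Matlab-style sketch. solve_stationary_eqm() and evaluate_welfare() are hypothetical placeholders for however you solve the stationary general eqm and compute your welfare measure (they are not actual VFI Toolkit commands), and the grid and R are just made-up numbers:

% Method (i): grid search over the tax rate tau
R = 0.15;                               % hypothetical minimum revenue requirement
tau_grid = linspace(0.2,0.5,31);        % hypothetical grid of tax rates
W = -Inf(size(tau_grid));               % welfare at each grid point
for ii = 1:length(tau_grid)
    eqm = solve_stationary_eqm(tau_grid(ii));   % placeholder: stationary general eqm given tau
    if eqm.TaxRevenue >= R                      % skip grid points that violate the Ramsey constraint
        W(ii) = evaluate_welfare(eqm);          % placeholder: e.g. behind-the-veil utilitarian welfare
    end
end
[~,bestind] = max(W);
tau_star = tau_grid(bestind);           % grid point with the highest welfare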
If you want to set it up as an optimization like in (ii), then since optimization routines are typically set up as minimization, you need to either minimize -Welfare() [just take the negative of welfare], or replace welfare with a measure of welfare loss and minimize that. If you do go for (ii) then you can either set it up as a joint optimization or a nested optimization (you have two optimizations: one is the welfare maximization, one is the general eqm). When using VFI Toolkit you typically want the nested optimization, because of how it handles the general eqm and then getting the welfare.
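As a rough sketch of what the nested optimization looks like (same hypothetical placeholders as above): the outer layer is the minimization over \tau, and each evaluation of the objective solves the stationary general eqm internally,

% Method (ii), nested optimization: outer minimization over tau,
% the stationary general eqm is solved inside the objective.
negWelfareFn = @(tau) -evaluate_welfare(solve_stationary_eqm(tau));  % minimize -Welfare
tau0 = 0.3;                                 % initial guess, e.g. the best point from a rough grid search
tau_star = fminsearch(negWelfareFn, tau0);

A joint optimization would instead search over \tau and the general eqm prices together in a single problem, but as I say, with VFI Toolkit the nested version is the natural one.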
If you do choose to do (ii), it is often still worth doing (i) on rough grids first, then using the solution from (i) as the initial guess for (ii) [plus it provides a good double-check on your codes].
In everything I have described so far, I have largely ignored the Ramsey constraint TaxRevenue(\tau) >= R, but you typically just handle this by adjusting one of your tax rates so that TaxRevenue(\tau) = R holds as a general eqm condition (you know from analytics/theory that the welfare optimum will not raise more revenue than necessary, since in the model setup there is no real point in doing so).
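For example (just a sketch, with hypothetical names and values): hold the tax you are optimizing over fixed at its candidate value, and let some other instrument, say a labor income tax tauL, adjust inside the general eqm solve so that the government budget balances,

% Sketch: impose TaxRevenue=R as a general eqm condition by letting a labor tax
% tauL (hypothetical) adjust to balance the government budget at the candidate tau.
tau = 0.3;  R = 0.15;                                     % hypothetical candidate tax rate and required revenue
govBudgetGap = @(tauL) tax_revenue_given(tau,tauL) - R;   % placeholder fn; equals zero at the eqm tauL
tauL_star = fzero(govBudgetGap, 0.2);                     % 0.2 is just a hypothetical initial guess

In practice tax_revenue_given() would itself sit alongside the rest of the general eqm conditions, so tauL gets found jointly with the other eqm prices.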
In my experience replicating Conesa, Kitao & Krueger (2009), setting up (ii) as a nested optimization is a tough problem to solve. fminsearch() was not up to it and I ended up using CMA-ES to solve it. But in principle, any kind of minimization routine (fminsearch, fminunc, fmincon, or something boutique) could be used to try to solve it. Because it involved solving the stationary general eqm of the model a large number of times, I switched to using a shooting algorithm for the general eqm to make that as fast as possible [see the Appendix of the Intro to OLG Models for how to set up the shooting algorithm for stationary general eqm in VFI Toolkit].
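In case it helps to picture what the shooting approach amounts to: rather than handing the general eqm conditions to a nonlinear-equation solver, the basic idea is roughly a dampened fixed-point update on the eqm price(s). A bare-bones sketch, where implied_interest_rate() is a hypothetical placeholder that would solve the household problem and aggregate the stationary distribution at the current guess (see the Appendix of the Intro to OLG Models for the actual VFI Toolkit setup),

% Shooting-style (dampened fixed-point) iteration for the stationary general eqm interest rate.
r = 0.04;        % initial guess for the interest rate
damp = 0.5;      % dampening factor
for iter = 1:500
    r_implied = implied_interest_rate(r);    % placeholder: interest rate implied by aggregates at current r
    if abs(r_implied - r) < 1e-6
        break                                % converged
    end
    r = damp*r_implied + (1-damp)*r;         % dampened update towards the implied value
end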
My impression from reading papers is that doing ‘rough grids (i)’ and then feeding it into (ii) is a good combo. But plenty of papers just do ‘rough grids (i)’ and stop there.