I just overhauled divide-and-conquer, so it is a bit better now.
In a model with just aprime and a, nothing changes: between two points on a, the second layer considers all aprime in the range aprimelower to aprimeupper.
The difference is in how other variables (d,z,e) are handled. I will explain based on d, but z and e are handled analogously.
If you have d, then the bounds aprimelower(d) and aprimeupper(d) depend on d. On a CPU this is trivial: you just loop over d and use aprimelower(d) and aprimeupper(d) conditional on that d. But on a GPU this loop would be far too slow.
Previously, the code searched for aprime over the range min_d[aprimelower(d)] to max_d[aprimeupper(d)], which is obviously rather wasteful. Now the code searches for aprime over aprimelower(d)+[0:1:max_d[aprimeupper(d)-aprimelower(d)]]. So the number of aprime points being considered is now max_d[aprimeupper(d)-aprimelower(d)], where it used to be max_d[aprimeupper(d)]-min_d[aprimelower(d)].
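To make the old and new range sizes concrete, here is a small NumPy sketch (the toolkit itself is MATLAB; the per-d bounds below are made-up illustrative numbers, and I count ranges inclusively, so each size gets a +1):

```python
import numpy as np

# Hypothetical per-d bounds on the aprime search window (grid indices).
aprimelower = np.array([3, 5, 8])    # aprimelower(d) for d = 0, 1, 2
aprimeupper = np.array([9, 12, 14])  # aprimeupper(d)

# Old approach: one shared range for every d, from the min over d of the
# lower bound to the max over d of the upper bound.
old_n = aprimeupper.max() - aprimelower.min() + 1

# New approach: each d starts at its own aprimelower(d); the shared
# dimension only needs to span the widest per-d window.
width = (aprimeupper - aprimelower).max() + 1

# For a given d, the candidate aprime indices are aprimelower(d) + 0..width-1:
candidates_d0 = aprimelower[0] + np.arange(width)

print(old_n, width)  # 12 vs 8 here: same matrix shape for every d, fewer points
```

The point is that the matrix dimension over aprime is the same for every d (so it stays vectorized on GPU), but its size shrinks from the union of all the windows to just the widest single window.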
The runtime improvements are in the 10%-40% range, depending on the problem (mostly nearer 20%).
I also tried padding the aprime being considered with a lot of NaN (the idea was to use a range for aprime that differed for each d, and then make the matrix a uniform size by filling out the range with NaN, so that the size of that dimension of the matrix did not depend on d). But max() with lots of NaN and 'omitnan' seems to have the same runtime as plain max() without the NaN, so I didn't end up bothering.
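For anyone curious, this is the abandoned padding idea in a NumPy sketch (np.nanmax plays the role of MATLAB's max(...,'omitnan'); the candidate values are made up):

```python
import numpy as np

# Each d has a different number of candidate aprime values, so pad the
# short rows with NaN to get a rectangular matrix whose width is the same
# for every d.
values_by_d = [np.array([1.0, 4.0, 2.0]),  # 3 candidates for d = 0
               np.array([3.0, 0.5])]       # 2 candidates for d = 1
width = max(len(v) for v in values_by_d)
padded = np.full((len(values_by_d), width), np.nan)
for d, v in enumerate(values_by_d):
    padded[d, :len(v)] = v

# nanmax ignores the NaN padding, so the row-wise maximum equals the
# maximum over each d's true candidate set.
print(np.nanmax(padded, axis=1))  # [4. 3.]
```

It works, but since the NaN-ignoring max was no faster than a plain max over the fixed-offset range above, the simpler approach won.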
Anyway, from the user's perspective, the only difference is that it is now 10%-40% faster, depending on your problem. Nothing the user needs to do changes.
I also added that if you use vfoptions.verbose=1 together with vfoptions.divideandconquer=1, it will print a message telling you that suitably setting vfoptions.level1n can speed up your code. I expect most users won't want to play with this, but power users will, so printing it only when verbose is requested seemed a good compromise.