Mini-Batches
In Barzilai [1] we split a small non-singular training problem into two mini-batches. The only solution of this problem is the vector (1, 2, 3, 4, 5, 6), while the solutions of the mini-batches are (3.3, 6.2, 8.4, 9.3, 8.3) and (-1.3, -2.2, -2.3, -1.4, 0.7, 3.6). The mini-batch problems are singular, and their solutions resemble neither each other nor the solution of the full problem. Their general solutions are y1 = x1 + v1 and y2 = x2 + v2, where x1, x2 are particular solutions and v1, v2 are arbitrary vectors in different subspaces. Splitting a non-singular problem with a single solution into batches thus produces singular batches, each with infinitely many solutions, and the effect is destructive: the batch solutions x1 and x2 differ from the solution of the total cost already in the first digit.
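The exact systems appear in the SIAM News note; the sketch below is not Barzilai's data but a minimal reconstruction of the phenomenon, using a randomly generated full-rank 6x6 least-squares problem: the full problem has exactly one solution, yet each row batch is rank-deficient with an infinite solution set of the form yi = xi + vi.

```python
import numpy as np

rng = np.random.default_rng(0)

# A full-rank 6x6 system: the full problem has exactly one solution.
A = rng.normal(size=(6, 6))
x_true = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
b = A @ x_true

# Split the rows into two mini-batches of 3 equations each.
A1, b1 = A[:3], b[:3]
A2, b2 = A[3:], b[3:]

print(np.linalg.matrix_rank(A))   # 6: unique solution
print(np.linalg.matrix_rank(A1))  # 3: rank-deficient, infinitely many solutions
print(np.linalg.matrix_rank(A2))  # 3

# Particular (minimum-norm) solutions x1, x2 of the two batches:
x1, *_ = np.linalg.lstsq(A1, b1, rcond=None)
x2, *_ = np.linalg.lstsq(A2, b2, rcond=None)
print(x1)  # generally resembles neither x2 nor x_true
print(x2)

# The general batch solution is y1 = x1 + v1 with v1 any vector in
# the null space of A1 (a 3-dimensional subspace here).
null1 = np.linalg.svd(A1)[2][3:].T          # basis of null(A1)
v1 = null1 @ rng.normal(size=3)             # an arbitrary null-space vector
print(np.allclose(A1 @ (x1 + v1), b1))      # True: still solves batch 1
```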
Even when the cost function has a single minimal value, this value may be attained by infinitely many solutions, that is, by many sets of weights. Moreover, since typical large-scale training problems are singular (and the mini-batch procedure adds to their level of singularity), they have many local minima. The local minimum that is reached may depend on the algorithm's starting point and stopping criterion, on the batch strategy used, and on the hardware environment in which the training is performed.
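As a hedged illustration of the starting-point dependence (a synthetic singular least-squares cost, not a neural network): gradient descent on such a cost converges to a minimizer that depends on where it starts, so different starting points yield different weight vectors attaining the same minimal cost.

```python
import numpy as np

rng = np.random.default_rng(1)

# Singular least-squares cost: 3 equations, 6 unknowns.
A = rng.normal(size=(3, 6))
b = A @ rng.normal(size=6)

def gradient_descent(x0, steps=20000, lr=0.03):
    x = x0.copy()
    for _ in range(steps):
        x -= lr * A.T @ (A @ x - b)   # gradient of (1/2)||Ax - b||^2
    return x

xa = gradient_descent(np.zeros(6))
xb = gradient_descent(rng.normal(size=6))

# Both reach the same (zero) minimal cost but are different minimizers:
# each run keeps the null-space component of its starting point.
print(np.linalg.norm(A @ xa - b), np.linalg.norm(A @ xb - b))  # both ~0
print(np.linalg.norm(xa - xb))                                 # not 0
```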
In the general case, denote batch number i by Bi and suppose that the overall procedure terminates when batch Bl is minimized. Denoting the solution of Bi by yi = xi + vi as above, we note that, in infinite precision, minimizing Bi is unrelated to minimizing Bj, and the overall solution is yl = xl + vl. To emphasize: minimizing one batch is unrelated to minimizing another, and the solution of the last batch is taken as the solution of the entire problem, so all batches but the last have no effect on the procedure's output.
It also follows that the output depends on the arbitrary ordering of the batches and of their elements. Since these orderings depend on how the problem's input is ordered, the solution depends on that order: reordering the input produces different outputs, i.e. different solutions. Note that a proper solution of a problem cannot depend on the problem's description.
In practical computation the procedure is cyclical and employs a limited number of steps. Under these conditions each batch's solution is affected by the arbitrary vectors generated by the other batches, and the overall solution is strongly affected by the last batch's arbitrary vector.
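A minimal sketch of the order dependence, under assumed step counts and synthetic data: cyclically taking a limited number of gradient steps on each batch, once in the order B1, B2 and once reversed, ends at different weight vectors for the same problem.

```python
import numpy as np

rng = np.random.default_rng(2)

A = rng.normal(size=(6, 6))
b = A @ np.arange(1.0, 7.0)            # full problem: unique solution (1,...,6)
batches = [(A[:3], b[:3]), (A[3:], b[3:])]

def train(batch_order, epochs=5, inner_steps=300, lr=0.03):
    """Cycle over the batches, taking a limited number of
    gradient steps on one batch at a time."""
    x = np.zeros(6)
    for _ in range(epochs):
        for Ai, bi in batch_order:
            for _ in range(inner_steps):
                x -= lr * Ai.T @ (Ai @ x - bi)
    return x

x12 = train(batches)          # order B1, B2
x21 = train(batches[::-1])    # order B2, B1

print(x12)
print(x21)
print(np.linalg.norm(x12 - x21))  # typically nonzero: the output depends on batch order
```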
These issues are not addressed in the literature. They imply that output-based metrics do not characterize a training algorithm's performance. A better indicator is the number of steps required to reduce the error by a fixed factor, which depends on the algorithm's convergence rate but not on the problem's size.
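One way to compute such an indicator, sketched under assumed definitions (error measured as the distance to a known solution x*): count the iterations an algorithm needs to cut the initial error by a fixed factor, say 10.

```python
import numpy as np

def steps_to_reduce_error(step_fn, x0, x_star, factor=10.0, max_steps=10**6):
    """Count iterations until ||x_k - x*|| drops below ||x_0 - x*|| / factor."""
    x = x0.copy()
    e0 = np.linalg.norm(x0 - x_star)
    for k in range(1, max_steps + 1):
        x = step_fn(x)
        if np.linalg.norm(x - x_star) <= e0 / factor:
            return k
    return max_steps

# Example: gradient descent on a well-conditioned quadratic cost.
rng = np.random.default_rng(3)
A = rng.normal(size=(6, 6))
A = A.T @ A + np.eye(6)                   # positive definite
x_star = rng.normal(size=6)
b = A @ x_star
step = lambda x: x - 0.01 * (A @ x - b)   # gradient of (1/2)x^T A x - b^T x

print(steps_to_reduce_error(step, np.zeros(6), x_star))
```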
References
- Jonathan Barzilai, "On Neural-network Accuracy and Validation," SIAM News, December 3, 2021. View
- Jonathan Barzilai, "A Note on Neural-Network Mini-Batches," SIAM News, October 27, 2021. View
- Jonathan Barzilai, "On Neural-Network Training Algorithms," in Human-Machine Shared Contexts, William Lawless, Ranjeev Mittu, and Donald Sofge (Eds.), Elsevier, pp. 307—313, 2020. View preprint
- Jonathan Barzilai, "A Spectrum Algorithm and Optimization Notes," Pure and Applied Functional Analysis, Volume 3, Number 4, pp. 533—536, 2018.
- Jonathan Barzilai and Jonathan M. Borwein, "Two-Point Step Size Gradient Methods," IMA Journal of Numerical Analysis, Vol. 8, pp. 141—148, 1988.
- Jonathan Barzilai and Michael A.H. Dempster, "Measuring Rates of Convergence of Numerical Algorithms," Journal of Optimization Theory and Applications, Vol. 78, No. 1, pp.109—125, 1993.
- Jonathan Barzilai and Aharon Ben-Tal, "Nonpolynomial and Inverse Interpolation for Line Search: Synthesis and Convergence Rates," SIAM J. Numer. Anal., Vol. 19, pp. 1263—1277, 1982.