Mini-Batches
In Barzilai [1] we split a small non-singular training problem into two mini-batches. The only solution of this problem is the vector (1, 2, 3, 4, 5, 6), while the solutions of the mini-batches are (3.3, 6.2, 8.4, 9.3, 8.3) and (-1.3, -2.2, -2.3, -1.4, 0.7, 3.6). The mini-batch problems are singular, and their solutions resemble neither each other nor the solution of the full problem. Their general solutions are y1 = x1 + v1 and y2 = x2 + v2, where x1, x2 are particular solutions and v1, v2 are arbitrary vectors in different subspaces. Splitting a non-singular problem with a single solution into batches thus produces singular batches, each with infinitely many solutions, and the effect is destructive: the batch solutions x1 and x2 differ from the solution of the total cost already in the first digit.
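The exact systems appear in the SIAM News note; the sketch below is not Barzilai's data but a minimal reconstruction of the phenomenon, using a randomly generated full-rank 6x6 least-squares problem: the full problem has exactly one solution, yet each row batch is rank-deficient with an infinite solution set of the form yi = xi + vi.

```python
import numpy as np

rng = np.random.default_rng(0)

# A full-rank 6x6 system: the full problem has exactly one solution.
A = rng.normal(size=(6, 6))
x_true = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
b = A @ x_true

# Split the rows into two mini-batches of 3 equations each.
A1, b1 = A[:3], b[:3]
A2, b2 = A[3:], b[3:]

print(np.linalg.matrix_rank(A))   # 6: unique solution
print(np.linalg.matrix_rank(A1))  # 3: rank-deficient, infinitely many solutions
print(np.linalg.matrix_rank(A2))  # 3

# Particular (minimum-norm) solutions x1, x2 of the two batches:
x1, *_ = np.linalg.lstsq(A1, b1, rcond=None)
x2, *_ = np.linalg.lstsq(A2, b2, rcond=None)
print(x1)  # generally resembles neither x2 nor x_true
print(x2)

# The general batch solution is y1 = x1 + v1 with v1 any vector in
# the null space of A1 (a 3-dimensional subspace here).
null1 = np.linalg.svd(A1)[2][3:].T          # basis of null(A1)
v1 = null1 @ rng.normal(size=3)             # an arbitrary null-space vector
print(np.allclose(A1 @ (x1 + v1), b1))      # True: still solves batch 1
```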
Even when the cost function has a single minimal value, this value may be attained by infinitely many solutions, that is, by many sets of weights. Moreover, since typical large-scale training problems are singular (and the mini-batch procedure adds to their level of singularity), they have many local minima. The local minimum that is reached may depend on the algorithm's starting point and stopping criterion, on the batch strategy used, and on the hardware environment in which the training is performed.
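As a hedged illustration of the starting-point dependence (a synthetic singular least-squares cost, not a neural network): gradient descent on such a cost converges to a minimizer that depends on where it starts, so different starting points yield different weight vectors attaining the same minimal cost.

```python
import numpy as np

rng = np.random.default_rng(1)

# Singular least-squares cost: 3 equations, 6 unknowns.
A = rng.normal(size=(3, 6))
b = A @ rng.normal(size=6)

def gradient_descent(x0, steps=20000, lr=0.03):
    x = x0.copy()
    for _ in range(steps):
        x -= lr * A.T @ (A @ x - b)   # gradient of (1/2)||Ax - b||^2
    return x

xa = gradient_descent(np.zeros(6))
xb = gradient_descent(rng.normal(size=6))

# Both reach the same (zero) minimal cost but are different minimizers:
# each run keeps the null-space component of its starting point.
print(np.linalg.norm(A @ xa - b), np.linalg.norm(A @ xb - b))  # both ~0
print(np.linalg.norm(xa - xb))                                 # not 0
```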
In the general case, denote batch number i by Bi and suppose that the overall procedure terminates when batch Bl is minimized. Denoting the solution of Bi by yi = xi + vi as above, we note that, in infinite precision, minimizing Bi is unrelated to minimizing Bj, and the overall solution is yl = xl + vl. To emphasize: minimizing one batch is unrelated to minimizing another, and the solution of the last batch is taken as the solution of the entire problem, so all batches but the last have no effect on the procedure's output.
It also follows that the output depends on the arbitrary ordering of the batches and of their elements. Since these orderings depend on how the problem's input is ordered, the solution depends on that order: reordering the input produces different outputs, i.e. different solutions. Note that a proper solution of a problem cannot depend on the problem's description.
In practical computation the procedure is cyclical and employs a limited number of steps. Under these conditions each batch's solution is affected by the arbitrary vectors generated by the other batches, and the overall solution is strongly affected by the last batch's arbitrary vector.
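A minimal sketch of the order dependence, under assumed step counts and synthetic data: cyclically taking a limited number of gradient steps on each batch, once in the order B1, B2 and once reversed, ends at different weight vectors for the same problem.

```python
import numpy as np

rng = np.random.default_rng(2)

A = rng.normal(size=(6, 6))
b = A @ np.arange(1.0, 7.0)            # full problem: unique solution (1,...,6)
batches = [(A[:3], b[:3]), (A[3:], b[3:])]

def train(batch_order, epochs=5, inner_steps=300, lr=0.03):
    """Cycle over the batches, taking a limited number of
    gradient steps on one batch at a time."""
    x = np.zeros(6)
    for _ in range(epochs):
        for Ai, bi in batch_order:
            for _ in range(inner_steps):
                x -= lr * Ai.T @ (Ai @ x - bi)
    return x

x12 = train(batches)          # order B1, B2
x21 = train(batches[::-1])    # order B2, B1

print(x12)
print(x21)
print(np.linalg.norm(x12 - x21))  # typically nonzero: the output depends on batch order
```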
These issues are not addressed in the literature. They imply that output-based metrics do not characterize a training algorithm's performance. A better indicator is the number of steps required to reduce the error by a fixed factor, which depends on the algorithm's convergence rate but not on the problem's size.
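One way to compute such an indicator, sketched under assumed definitions (error measured as the distance to a known solution x*): count the iterations an algorithm needs to cut the initial error by a fixed factor, say 10.

```python
import numpy as np

def steps_to_reduce_error(step_fn, x0, x_star, factor=10.0, max_steps=10**6):
    """Count iterations until ||x_k - x*|| drops below ||x_0 - x*|| / factor."""
    x = x0.copy()
    e0 = np.linalg.norm(x0 - x_star)
    for k in range(1, max_steps + 1):
        x = step_fn(x)
        if np.linalg.norm(x - x_star) <= e0 / factor:
            return k
    return max_steps

# Example: gradient descent on a well-conditioned quadratic cost.
rng = np.random.default_rng(3)
A = rng.normal(size=(6, 6))
A = A.T @ A + np.eye(6)                   # positive definite
x_star = rng.normal(size=6)
b = A @ x_star
step = lambda x: x - 0.01 * (A @ x - b)   # gradient of (1/2)x^T A x - b^T x

print(steps_to_reduce_error(step, np.zeros(6), x_star))
```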
References
- Jonathan Barzilai, "On Neural-network Accuracy and Validation," SIAM News, December 3, 2021. View
- Jonathan Barzilai, "A Note on Neural-Network Mini-Batches," SIAM News, October 27, 2021. View
- Jonathan Barzilai, "On Neural-Network Training Algorithms," in Human-Machine Shared Contexts, William Lawless, Ranjeev Mittu, and Donald Sofge (Eds.), Elsevier, pp. 307—313, 2020. View preprint
- Jonathan Barzilai, "A Spectrum Algorithm and Optimization Notes," Pure and Applied Functional Analysis, Volume 3, Number 4, pp. 533—536, 2018.
- Jonathan Barzilai and Jonathan M. Borwein, "Two-Point Step Size Gradient Methods," IMA Journal of Numerical Analysis, Vol. 8, pp. 141—148, 1988.
- Jonathan Barzilai and Michael A.H. Dempster, "Measuring Rates of Convergence of Numerical Algorithms," Journal of Optimization Theory and Applications, Vol. 78, No. 1, pp.109—125, 1993.
- Jonathan Barzilai and Aharon Ben-Tal, "Nonpolynomial and Inverse Interpolation for Line Search: Synthesis and Convergence Rates," SIAM J. Numer. Anal., Vol. 19, pp. 1263—1277, 1982.