Motivated by applications to distributed optimization over networks and large-scale data processing in machine learning, we analyze the deterministic incremental aggregated gradient method for minimizing a finite sum of smooth functions where the sum is strongly convex. This method processes the functions one at a time in a deterministic order and incorporates a memory of previous gradient values to accelerate convergence. Empirically it performs well in practice; however, no theoretical analysis with explicit rate results was previously given in the literature to our knowledge, in particular most of the recent efforts concentrated on the randomized versions. In this paper, we show that this deterministic algorithm has global linear convergence and we characterize the convergence rate. We also consider an aggregated method with momentum and demonstrate its linear convergence. Our proofs rely on a careful choice of a Lyapunov function that offers insight into the algorithm’s behavior and simplifies the proofs considerably.
Read More: https://epubs.siam.org/doi/abs/10.1137/15M1049695