Improved Convergence in Parameter-Agnostic Error Feedback through Momentum

Nov 1, 2025
Abdurakhmon Sadiev, Yury Demidovich, Igor Sokolov, Grigory Malinovsky, Sarit Khirirat, Peter Richtárik
Abstract
Communication compression is essential for scalable distributed training of modern machine learning models, but it often degrades convergence due to the noise it introduces. Error Feedback (EF) mechanisms are widely adopted to mitigate this issue in distributed compression algorithms. Despite their popularity and training efficiency, existing distributed EF algorithms often require prior knowledge of problem parameters (e.g., smoothness constants) to fine-tune stepsizes. This limits their practical applicability, especially in large-scale neural network training. In this paper, we study normalized error feedback algorithms that combine EF with normalized updates, various momentum variants, and parameter-agnostic, time-varying stepsizes, thus eliminating the need for problem-dependent tuning. We analyze the convergence of these algorithms for minimizing smooth functions and establish parameter-agnostic complexity bounds that are close to the best-known bounds obtained with carefully tuned, problem-dependent stepsizes. Specifically, we show that normalized EF21 achieves a convergence rate of nearly $\mathcal{O}(1/T^{1/4})$ with Polyak's heavy-ball momentum, $\mathcal{O}(1/T^{2/7})$ with Implicit Gradient Transport (IGT), and $\mathcal{O}(1/T^{1/3})$ with STORM and Hessian-corrected momentum. Our results hold with decreasing stepsizes and small minibatches. Finally, our empirical experiments confirm our theoretical insights.
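To make the abstract's ingredients concrete, here is a minimal single-worker sketch of an EF21-style estimator combined with a normalized update, Polyak's heavy-ball momentum, and a decreasing, parameter-agnostic stepsize. The Top-K compressor, the momentum weight, and the stepsize exponent are illustrative assumptions, not the exact choices analyzed in the paper.

```python
# Sketch of normalized EF21 with heavy-ball momentum (single worker, deterministic
# gradients). Compressor, momentum weight, and stepsize schedule are assumptions.
import numpy as np

def top_k(v, k):
    """Top-K sparsification: keep the k largest-magnitude entries
    (a standard contractive compressor, used here for illustration)."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

def normalized_ef21_hb(grad, x0, T, k=10, beta=0.9, c0=1.0):
    """grad: function returning the gradient at x.
    gamma_t = c0 / (t + 1)**0.75 is an illustrative parameter-agnostic,
    decreasing stepsize; the paper's schedule may differ."""
    x = x0.copy()
    g = grad(x)                  # EF21 gradient-estimator state
    m = np.zeros_like(x)         # heavy-ball momentum buffer
    for t in range(T):
        m = beta * m + (1.0 - beta) * g               # Polyak momentum on the estimator
        gamma = c0 / (t + 1) ** 0.75                  # decreasing, problem-independent
        x = x - gamma * m / (np.linalg.norm(m) + 1e-12)   # normalized update
        g = g + top_k(grad(x) - g, k)                 # EF21 error-feedback correction
    return x

if __name__ == "__main__":
    # Toy quadratic example.
    A = np.diag(np.linspace(1.0, 10.0, 50))
    grad = lambda x: A @ x
    x_final = normalized_ef21_hb(grad, x0=np.ones(50), T=2000)
    print(np.linalg.norm(grad(x_final)))
```

Swapping the momentum line for an IGT-, STORM-, or Hessian-corrected update would yield the other variants discussed in the paper; only the estimator recursion changes, while the normalized step and the error-feedback correction stay the same.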
Type
Publication
arXiv preprint arXiv:2511.14501