Authors | Mahsa Soheil Shamaee, Sajad Fathi Hafshejani |
---|---|

Conference Title | 4th International Conference on Computational Algebra, Computational Number Theory and Applications |

Holding Date of Conference | 2023-07-04 - 2023-07-06 |

Event Place | Kashan |

Presented by | University of Kashan |

Presentation | SPEECH |

Conference Level | International Conferences |

## Abstract

Decay step sizes are widely recognized as among the most effective ways of setting the learning rate in stochastic gradient descent (SGD). While several decay schedules have been proposed, this paper focuses on improving the practical performance of the cosine step size. To this end, we propose a convex combination of the cosine and $\frac{1}{\sqrt{t}}$ step sizes. For this convex combination, we establish an $O(\frac{1}{\sqrt{T}})$ convergence rate for smooth non-convex functions without assuming the Polyak-Łojasiewicz condition. On the numerical side, we run SGD on the FashionMNIST, CIFAR10, and CIFAR100 datasets and show that the new step size improves both the accuracy and the loss relative to the cosine step size.
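The abstract does not give the exact formula for the combined schedule, but its ingredients are standard: cosine decay is commonly written as $\frac{\eta_0}{2}\left(1+\cos\frac{\pi t}{T}\right)$, and the second component scales as $\frac{1}{\sqrt{t}}$. A minimal sketch of a convex combination of the two, assuming a mixing weight `alpha` and a shift of $t+1$ to avoid division by zero (both illustrative choices, not taken from the paper):

```python
import math

def combined_step_size(t, T, eta0=0.1, alpha=0.5):
    """Illustrative convex combination of a cosine and a 1/sqrt(t) step size.

    t     : current iteration (0-based)
    T     : total number of iterations
    eta0  : initial step size (hypothetical default)
    alpha : mixing weight in [0, 1]; alpha=1 recovers pure cosine decay,
            alpha=0 recovers the pure 1/sqrt(t) schedule.
    """
    # Standard cosine decay term, normalized to start at 1 and end at 0.
    cosine = 0.5 * (1.0 + math.cos(math.pi * t / T))
    # 1/sqrt(t) term, shifted by 1 so it is finite at t = 0.
    inv_sqrt = 1.0 / math.sqrt(t + 1)
    return eta0 * (alpha * cosine + (1.0 - alpha) * inv_sqrt)
```

At `t = 0` both components equal 1, so the schedule starts at `eta0`; unlike pure cosine decay, the $\frac{1}{\sqrt{t}}$ term keeps the step size strictly positive at `t = T`.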

tags: stochastic gradient descent, decay step size, convergence rate