A convex combination of decay step sizes for SGD

Authors: Mahsa Soheil Shamaee, Sajad Fathi Hafshejani
Conference: Fourth International Conference on Computational Algebra, Computational Number Theory and Applications
Conference dates: 2023-07-04 - 2023-07-06
Conference venue: Kashan
Presented on behalf of: University of Kashan
Presentation type: Oral presentation
Conference level: International

Abstract

The decay step size is widely recognized as one of the most efficient methods for determining the learning rate in stochastic gradient descent (SGD). While several decay step sizes have been proposed, this paper focuses on enhancing the practical efficiency of the cosine step size. To achieve this, we propose a convex combination of the cosine and $\frac{1}{\sqrt{t}}$ step sizes. For this convex combination, we establish an $O(\frac{1}{\sqrt{T}})$ convergence rate for smooth non-convex functions without the Polyak-Łojasiewicz condition. For the numerical results, we run SGD on the FashionMNIST, CIFAR10, and CIFAR100 datasets and show that the new step size improves the accuracy and reduces the loss compared with the cosine step size.
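A minimal sketch of the proposed schedule, assuming the standard cosine decay form $\frac{\eta_0}{2}(1+\cos(\pi t/T))$ and a mixing coefficient $\alpha$; the base step size `eta0`, the value of `alpha`, and the exact cosine formulation are illustrative placeholders, since the abstract does not specify them.

```python
import math

def convex_decay_step_size(t, T, eta0=0.1, alpha=0.5):
    """Convex combination of the cosine and 1/sqrt(t) decay step sizes.

    t     : current iteration, 1-indexed (t >= 1 avoids division by zero)
    T     : total number of iterations
    eta0  : base (initial) step size -- placeholder value
    alpha : mixing weight in [0, 1] -- placeholder value
    """
    cosine_part = 0.5 * eta0 * (1.0 + math.cos(math.pi * t / T))  # cosine decay
    sqrt_part = eta0 / math.sqrt(t)                               # 1/sqrt(t) decay
    return alpha * cosine_part + (1.0 - alpha) * sqrt_part        # convex combination


# Example: print the step size over a short run of T = 10 iterations.
if __name__ == "__main__":
    T = 10
    for t in range(1, T + 1):
        print(t, convex_decay_step_size(t, T))
```

Because the combination is convex ($\alpha \in [0,1]$), the resulting step size is always sandwiched between the cosine and $\frac{1}{\sqrt{t}}$ schedules at every iteration.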

Keywords: stochastic gradient descent, decay step size, convergence rate