Authors | Mahsa Soheil Shamaee, Sajad Fathi Hafshejani |
---|---|
Conference Title | The 4th International Conference on Computational Algebra, Computational Number Theory and Applications |
Holding Date of Conference | 2023-07-04 - 2023-07-06 |
Event Place | 1 - Kashan |
Presented by | University of Kashan |
Presentation | SPEECH |
Conference Level | International Conferences |
Abstract
Decaying step sizes are widely regarded as one of the most efficient ways to set the learning rate in stochastic gradient descent (SGD). Several decay schedules have been proposed; this paper focuses on improving the practical efficiency of the cosine step size. To this end, we propose a convex combination of the cosine and $\frac{1}{\sqrt{t}}$ step sizes. For this convex combination, we establish an $O(\frac{1}{\sqrt{T}})$ rate of convergence for smooth non-convex functions, without assuming the Polyak-Łojasiewicz condition. For the numerical results, we run SGD on the FashionMNIST, CIFAR10, and CIFAR100 datasets and show that the new step size achieves better accuracy and loss values than the cosine step size alone.
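As a rough illustration of the idea, the sketch below blends the two schedules named in the abstract. The abstract does not give the exact formula, so everything specific here is an assumption: the cosine schedule is taken in its common form $\frac{\eta_0}{2}(1 + \cos\frac{t\pi}{T})$, the second schedule as $\frac{\eta_0}{\sqrt{t}}$, and the mixing weight `alpha`, the initial step size `eta0`, and the function names are hypothetical. The paper's actual definition may differ.

```python
import math


def cosine_step(t: int, T: int, eta0: float) -> float:
    """Common cosine schedule: decays from about eta0 to 0 at t = T (assumed form)."""
    return 0.5 * eta0 * (1.0 + math.cos(math.pi * t / T))


def inv_sqrt_step(t: int, eta0: float) -> float:
    """1/sqrt(t) schedule: eta0 at t = 1, decaying slowly afterwards (assumed form)."""
    return eta0 / math.sqrt(t)


def combined_step(t: int, T: int, eta0: float = 0.1, alpha: float = 0.5) -> float:
    """Convex combination of the two schedules above.

    `alpha` is a hypothetical mixing weight in [0, 1]:
    alpha = 1 gives the pure cosine schedule, alpha = 0 the pure 1/sqrt(t) one.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1] for a convex combination")
    if not 1 <= t <= T:
        raise ValueError("t must satisfy 1 <= t <= T")
    return alpha * cosine_step(t, T, eta0) + (1.0 - alpha) * inv_sqrt_step(t, eta0)


if __name__ == "__main__":
    T = 100
    # Print the three schedules side by side at a few iterations.
    for t in (1, 25, 50, 75, 100):
        print(
            t,
            round(cosine_step(t, T, 0.1), 5),
            round(inv_sqrt_step(t, 0.1), 5),
            round(combined_step(t, T, 0.1, 0.5), 5),
        )
```

Under these assumptions, the pure cosine schedule reaches zero at $t = T$, whereas the combined step size stays positive there because of the $\frac{1}{\sqrt{t}}$ term. In a training loop, `combined_step(t, T)` would simply replace the learning rate used at iteration `t`.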
tags: stochastic gradient descent, decay step size, convergence rate