Authors | Mahsa Soheil Shamaee, Sajad Fathi Hafshejani |
---|---|

Conference Title | 4th International Conference on Computational Algebra, Computational Number Theory and Applications |

Holding Date of Conference | 2023-07-04 - 2023-07-06 |

Event Place | Kashan |

Presented by | University of Kashan |

Presentation | SPEECH |

Conference Level | International Conferences |

## Abstract

Decay step sizes are widely recognized as among the most effective ways of setting the learning rate in stochastic gradient descent (SGD). While several decay schedules have been proposed, this paper focuses on improving the practical performance of the cosine step size. To this end, we propose a convex combination of the cosine and $\frac{1}{\sqrt{t}}$ step sizes. For this convex combination, we establish an $O(\frac{1}{\sqrt{T}})$ convergence rate for smooth non-convex functions without assuming the Polyak-Łojasiewicz condition. On the numerical side, we run SGD on the FashionMNIST, CIFAR10, and CIFAR100 datasets and show that the new step size improves both the accuracy and the loss relative to the cosine step size.
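The abstract does not give the exact formula for the combined schedule, but its ingredients are standard: cosine decay is commonly written as $\frac{\eta_0}{2}\left(1+\cos\frac{\pi t}{T}\right)$, and the second component scales as $\frac{1}{\sqrt{t}}$. A minimal sketch of a convex combination of the two, assuming a mixing weight `alpha` and a shift of $t+1$ to avoid division by zero (both illustrative choices, not taken from the paper):

```python
import math

def combined_step_size(t, T, eta0=0.1, alpha=0.5):
    """Illustrative convex combination of a cosine and a 1/sqrt(t) step size.

    t     : current iteration (0-based)
    T     : total number of iterations
    eta0  : initial step size (hypothetical default)
    alpha : mixing weight in [0, 1]; alpha=1 recovers pure cosine decay,
            alpha=0 recovers the pure 1/sqrt(t) schedule.
    """
    # Standard cosine decay term, normalized to start at 1 and end at 0.
    cosine = 0.5 * (1.0 + math.cos(math.pi * t / T))
    # 1/sqrt(t) term, shifted by 1 so it is finite at t = 0.
    inv_sqrt = 1.0 / math.sqrt(t + 1)
    return eta0 * (alpha * cosine + (1.0 - alpha) * inv_sqrt)
```

At `t = 0` both components equal 1, so the schedule starts at `eta0`; unlike pure cosine decay, the $\frac{1}{\sqrt{t}}$ term keeps the step size strictly positive at `t = T`.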

tags: stochastic gradient descent, decay step size, convergence rate