How to Implement Momentum and Decay Correctly – SGD

Introduction

Stochastic Gradient Descent (SGD) is a widely used optimization algorithm in machine learning. While simple and effective, SGD can sometimes struggle with oscillations and slow convergence. To address these issues, momentum and decay are commonly used techniques that enhance SGD’s performance.

Understanding Momentum

What is Momentum?

Momentum introduces a “memory” to the gradient updates: each step accumulates a fraction of the previous update and adds it to the current gradient. This helps the optimizer keep moving in a consistent direction, accelerating learning along that direction and smoothing out oscillations.

Implementing Momentum

Here’s how to implement momentum in SGD:

  1. Initialize the velocity (v) to 0.
  2. For each iteration:
    • Calculate the gradient (g).
    • Update the velocity: v = βv + g (where β is the momentum coefficient).
    • Update the parameters: θ = θ – αv (where α is the learning rate).

Typical values for the momentum coefficient (β) range from 0.5 to 0.99, with 0.9 being a common default. Higher values give the updates more “inertia”, which can speed up convergence but can also overshoot optimal points.
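To make the loop concrete, here is a minimal NumPy sketch of SGD with momentum (weight decay comes later). The toy quadratic loss f(θ) = 0.5·‖θ‖², whose gradient is simply θ, is made up purely for illustration, and the learning rate and momentum values are likewise arbitrary.

import numpy as np

# Toy problem: minimize f(theta) = 0.5 * ||theta||^2, whose gradient is theta.
theta = np.array([5.0, -3.0])   # parameters
v = np.zeros_like(theta)        # velocity, initialized to 0
alpha, beta = 0.1, 0.9          # learning rate and momentum coefficient

for step in range(100):
    g = theta                   # gradient of the toy loss at the current theta
    v = beta * v + g            # v = beta * v + g
    theta = theta - alpha * v   # theta = theta - alpha * v

print(theta)                    # approaches the minimum at [0, 0]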

Understanding Weight Decay

What is Weight Decay?

Weight decay (closely related to L2 regularization) penalizes large weights during training. This encourages the model to find solutions with smaller, more evenly distributed weights, reducing overfitting.

Implementing Weight Decay

Incorporating weight decay into SGD is straightforward:

  1. At each iteration, in addition to the usual gradient step, scale the weights by a factor of (1 – λ), where λ is the weight decay coefficient.

The weight decay coefficient (λ) is a small value, typically around 1e-4 or 1e-5. Larger values increase the penalty on large weights, potentially leading to smaller, simpler models.
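For a sense of scale, the snippet below applies one decay step to a hypothetical weight vector; the values are arbitrary and only illustrate how gently each step shrinks the weights toward zero. Note that some implementations fold the learning rate into this step and scale by (1 – αλ) instead; either convention applies the shrinkage at every update.

import numpy as np

# Hypothetical weights, used only to illustrate the (1 - lambda) scaling step.
w = np.array([0.5, -1.2, 3.0])
weight_decay = 1e-4

w = (1 - weight_decay) * w      # each weight shrinks slightly toward zero
print(w)                        # approximately [0.49995, -1.19988, 2.9997]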

Combining Momentum and Decay

Momentum and weight decay combine naturally in SGD. The combined algorithm updates the parameters as follows:

  1. Initialize the velocity (v) to 0.
  2. For each iteration:
    • Calculate the gradient (g).
    • Update the velocity: v = βv + g.
    • Update the parameters: θ = (1 – λ)θ – αv.

Example Implementation in Python

Code

import numpy as np

def sgd_momentum_decay(params, grads, velocity, lr, momentum, decay):
    """
    Performs one SGD update with momentum and weight decay.

    Args:
        params: Dictionary of parameter arrays.
        grads: Dictionary of gradient arrays, keyed like params.
        velocity: Dictionary of velocity arrays; pass the same dictionary
            back in on every call so momentum accumulates across iterations.
        lr: Learning rate (alpha).
        momentum: Momentum coefficient (beta).
        decay: Weight decay coefficient (lambda).

    Returns:
        The updated parameters and velocity.
    """
    for name in params:
        # Lazily initialize the velocity to zero on the first update.
        if name not in velocity:
            velocity[name] = np.zeros_like(params[name])
        # v = beta * v + g
        velocity[name] = momentum * velocity[name] + grads[name]
        # theta = (1 - lambda) * theta - alpha * v
        params[name] = (1 - decay) * params[name] - lr * velocity[name]
    return params, velocity

Usage

# Example usage:
params = {'w1': np.array([1.0, 2.0]), 'w2': np.array([3.0, 4.0])}
grads = {'w1': np.array([0.1, 0.2]), 'w2': np.array([0.3, 0.4])}
velocity = {}
lr = 0.01
momentum = 0.9
decay = 1e-4

params, velocity = sgd_momentum_decay(params, grads, velocity, lr, momentum, decay)
print(params)

Output

{'w1': array([0.9989, 1.9978]), 'w2': array([2.9967, 3.9956])}
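Because the velocity dictionary is returned and passed back in, the function can be called repeatedly inside a training loop and the momentum accumulates across iterations. A minimal sketch, reusing the function and import above with placeholder gradients (in practice the gradients would be recomputed from the data at every step):

params = {'w1': np.array([1.0, 2.0]), 'w2': np.array([3.0, 4.0])}
velocity = {}
for step in range(10):
    grads = {'w1': np.array([0.1, 0.2]), 'w2': np.array([0.3, 0.4])}  # placeholder gradients
    params, velocity = sgd_momentum_decay(params, grads, velocity,
                                          lr=0.01, momentum=0.9, decay=1e-4)
print(params)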

Conclusion

Implementing momentum and decay correctly in SGD can significantly improve the convergence speed and stability of the optimization process. These techniques are valuable additions to your machine learning toolkit for training neural networks and other models.
