Why Use Tanh for Activation Function of MLP?
In the realm of Multilayer Perceptrons (MLPs), the choice of activation function plays a pivotal role in shaping the network’s learning capabilities. Among the various options, the hyperbolic tangent function (tanh) stands out as a popular and effective choice.
Understanding the Tanh Function
The tanh function, mathematically represented as:
tanh(x) = (e^x - e^-x) / (e^x + e^-x)
is a sigmoidal (S-shaped) function that squashes its input values into the open interval (-1, 1). This property makes it particularly advantageous in MLPs, as explained below.
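As a quick illustration, here is a minimal sketch (assuming NumPy, which the later sketches also use) that evaluates tanh directly from its definition and checks it against the built-in implementation:

```python
import numpy as np

def tanh(x):
    # Direct evaluation of the definition: (e^x - e^-x) / (e^x + e^-x)
    return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))

x = np.linspace(-5, 5, 11)
print(tanh(x))      # values approach -1 and 1 but never reach them
print(np.tanh(x))   # matches NumPy's built-in tanh
```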
Advantages of Tanh in MLPs
1. Zero-Centered Output
Unlike ReLU and the logistic sigmoid, whose outputs are never negative, tanh's output is centered around zero. This zero-centered output helps in:
- Faster Convergence: Because the activations feeding each layer have roughly zero mean, gradient updates are less systematically biased in one direction, which tends to speed up convergence during training.
- Improved Learning: The zero-centered range lets neurons express both positive and negative relationships more effectively, as the sketch after this list illustrates.
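A minimal sketch of this effect, assuming NumPy and standard-normal pre-activations (both are assumptions of this illustration, not part of the text above): the mean of tanh outputs stays near zero, while ReLU outputs always have a positive mean, which pushes the inputs of the next layer away from zero.

```python
import numpy as np

rng = np.random.default_rng(0)
pre_activations = rng.standard_normal(100_000)   # zero-mean inputs to the activation

tanh_out = np.tanh(pre_activations)
relu_out = np.maximum(pre_activations, 0)

print(f"mean of tanh outputs: {tanh_out.mean():+.4f}")   # close to 0
print(f"mean of ReLU outputs: {relu_out.mean():+.4f}")   # close to +0.40 (always positive)
```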
2. Gradient Stability
The tanh function has a smooth gradient that is nonzero everywhere, in contrast to ReLU, whose gradient is exactly zero for all negative inputs (the "dying ReLU" problem). This contributes to:
- Smoother Learning: Every unit receives some gradient signal regardless of the sign of its input, which gives a smoother learning process.
- Mitigating Vanishing Gradients (relative to the logistic sigmoid): tanh's derivative peaks at 1 at the origin, compared with 0.25 for the logistic sigmoid, so gradients shrink less from layer to layer. Note that tanh still saturates for large |x|, so very deep tanh networks can still suffer from vanishing gradients; the sketch after this list compares the two derivatives.
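To make the derivative comparison concrete, here is a small sketch (NumPy assumed) computing the peak gradients of tanh and the logistic sigmoid on a grid of inputs:

```python
import numpy as np

x = np.linspace(-6, 6, 1201)

tanh_grad = 1.0 - np.tanh(x) ** 2           # derivative of tanh: 1 - tanh(x)^2
sigmoid = 1.0 / (1.0 + np.exp(-x))
sigmoid_grad = sigmoid * (1.0 - sigmoid)    # derivative of the logistic sigmoid

print(f"max tanh gradient:    {tanh_grad.max():.2f}")     # 1.00
print(f"max sigmoid gradient: {sigmoid_grad.max():.2f}")  # 0.25
```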
3. Non-Linearity
Like other activation functions, tanh introduces non-linearity into the MLP, which is crucial for learning complex patterns and relationships in data.
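A brief sketch of why this matters (NumPy assumed; the weight matrices here are arbitrary illustrations): two stacked linear layers without an activation collapse into a single linear map, while inserting tanh between them does not.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, W2 = rng.standard_normal((4, 3)), rng.standard_normal((2, 4))
x = rng.standard_normal(3)

linear_stack = W2 @ (W1 @ x)                  # two linear layers, no activation
collapsed = (W2 @ W1) @ x                     # a single equivalent linear layer
print(np.allclose(linear_stack, collapsed))   # True: no extra expressive power

with_tanh = W2 @ np.tanh(W1 @ x)              # the non-linearity breaks the collapse
print(np.allclose(with_tanh, collapsed))      # False
```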
Comparison with Other Activation Functions
| Activation Function | Range | Zero-Centered | Gradient Behavior |
|---|---|---|---|
| ReLU | [0, ∞) | No | Zero gradient for negative inputs ("dying ReLU") |
| Sigmoid | (0, 1) | No | Max derivative 0.25; prone to vanishing gradients |
| Tanh | (-1, 1) | Yes | Max derivative 1; saturates for large inputs |
Code Example
A small Keras MLP that uses tanh in its hidden layers:

```python
import tensorflow as tf

# Define the MLP with tanh activation in the hidden layers
model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation='tanh', input_shape=(10,)),
    tf.keras.layers.Dense(64, activation='tanh'),
    tf.keras.layers.Dense(10, activation='softmax')   # output layer for 10 classes
])

# Compile and train the model
# (x_train and y_train are placeholders for your own training data:
#  features of shape (n_samples, 10) and integer class labels)
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(x_train, y_train, epochs=10)
```
Conclusion
The tanh activation function offers compelling advantages in MLPs due to its zero-centered output, its smooth, everywhere-nonzero gradient, and the non-linearity it introduces. These properties support faster convergence and smoother learning, particularly compared with the logistic sigmoid. When selecting an activation function, tanh should be considered a strong candidate for its proven effectiveness in a wide range of machine learning tasks.