Why Use Tanh for Activation Function of MLP?

In the realm of Multilayer Perceptrons (MLPs), the choice of activation function plays a pivotal role in shaping the network’s learning capabilities. Among the various options, the hyperbolic tangent function (tanh) stands out as a popular and effective choice.

Understanding the Tanh Function

The tanh function, mathematically represented as:

tanh(x) = (e^x - e^-x) / (e^x + e^-x)

is an S-shaped (sigmoidal) function that squashes its input to the open interval (-1, 1). This property makes it particularly advantageous in MLPs, as explained below.
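As a quick sanity check of the formula and its range, the following NumPy sketch (purely illustrative; NumPy is not otherwise assumed by this post) compares the exponential definition with the library implementation:

import numpy as np

x = np.linspace(-5, 5, 11)

# tanh from the exponential definition above
manual = (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))

# NumPy's built-in tanh agrees to floating-point precision
print(np.allclose(manual, np.tanh(x)))   # True
print(manual.min(), manual.max())        # about -0.9999 and 0.9999: always inside (-1, 1)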

Advantages of Tanh in MLPs

1. Zero-Centered Output

Unlike ReLU or the logistic sigmoid, whose outputs are never negative, tanh produces output centered around zero. This zero-centered output helps in (see the sketch after this list):

  • Faster Convergence: Because activations can be both positive and negative, the weight gradients in a layer are not all forced to share the same sign, which typically speeds up convergence.
  • Improved Learning: Neurons can represent both positive and negative relationships symmetrically instead of being pushed toward one side.
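A minimal sketch of the zero-centering effect, assuming only zero-mean random pre-activations (the distribution and sample size are arbitrary choices for illustration):

import numpy as np

rng = np.random.default_rng(0)
z = rng.normal(0.0, 1.0, size=100_000)   # zero-mean pre-activations

relu = np.maximum(z, 0.0)
sigmoid = 1.0 / (1.0 + np.exp(-z))
tanh = np.tanh(z)

# tanh outputs average near 0; ReLU and sigmoid outputs are always >= 0,
# so their means are pushed away from zero
print(f"mean(relu)    = {relu.mean():.3f}")     # ~0.4
print(f"mean(sigmoid) = {sigmoid.mean():.3f}")  # ~0.5
print(f"mean(tanh)    = {tanh.mean():.3f}")     # ~0.0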

2. Gradient Stability

Compared to the logistic sigmoid, tanh has a stronger gradient: its derivative, 1 - tanh^2(x), peaks at 1, while the sigmoid's derivative peaks at 0.25. Compared to ReLU, its gradient is smooth and never exactly zero, so units cannot "die". This contributes to (see the sketch after this list):

  • Smoother Learning: The gradient varies continuously with the input, so weight updates do not jump abruptly as a pre-activation crosses zero, as they do with ReLU.
  • Fewer Vanishing Gradients than Sigmoid: The larger peak derivative means gradients shrink more slowly through stacked layers than with the logistic sigmoid, though tanh still saturates for large |x|, so very deep networks can still suffer from vanishing gradients.
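A short sketch of these derivatives at a few arbitrary sample points, using the closed forms tanh'(x) = 1 - tanh^2(x), sigmoid'(x) = sigmoid(x)(1 - sigmoid(x)), and ReLU'(x) = 1 for x > 0 and 0 otherwise:

import numpy as np

x = np.array([-4.0, -1.0, 0.0, 1.0, 4.0])

tanh_grad = 1.0 - np.tanh(x) ** 2        # peaks at 1 at x = 0
sig = 1.0 / (1.0 + np.exp(-x))
sigmoid_grad = sig * (1.0 - sig)         # peaks at 0.25 at x = 0
relu_grad = (x > 0).astype(float)        # exactly 0 for all negative inputs

for name, g in [("tanh", tanh_grad), ("sigmoid", sigmoid_grad), ("relu", relu_grad)]:
    print(name, np.round(g, 3))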

3. Non-Linearity

Like other activation functions, tanh introduces non-linearity into the MLP. This is essential: without a non-linear activation, any stack of fully connected layers collapses into a single linear transformation and cannot model complex patterns and relationships in data (demonstrated below).
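The sketch below demonstrates this collapse with arbitrary random weights; the layer sizes are illustrative only:

import numpy as np

rng = np.random.default_rng(1)
W1, b1 = rng.normal(size=(10, 8)), rng.normal(size=8)
W2, b2 = rng.normal(size=(8, 3)), rng.normal(size=3)
x = rng.normal(size=(5, 10))

# Without an activation, two Dense layers collapse into one linear map...
no_act = (x @ W1 + b1) @ W2 + b2
collapsed = x @ (W1 @ W2) + (b1 @ W2 + b2)
print(np.allclose(no_act, collapsed))      # True: stacking added no expressive power

# ...whereas inserting tanh between the layers breaks the equivalence
with_tanh = np.tanh(x @ W1 + b1) @ W2 + b2
print(np.allclose(with_tanh, collapsed))   # False: the network is genuinely non-linear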

Comparison with Other Activation Functions

Activation Function | Output Range | Zero-Centered | Gradient Behavior
ReLU                | [0, ∞)       | No            | 1 for x > 0, exactly 0 for x < 0 (units can "die")
Sigmoid             | (0, 1)       | No            | Smooth but small; peaks at 0.25 (prone to vanishing gradients)
Tanh                | (-1, 1)      | Yes           | Smooth; peaks at 1, saturates only for large |x|

Code Example

import numpy as np
import tensorflow as tf

# Toy stand-in data: the original snippet assumes x_train / y_train already exist
# (here, 1,000 samples with 10 features and labels from 10 classes)
x_train = np.random.rand(1000, 10).astype("float32")
y_train = np.random.randint(0, 10, size=(1000,))

# Define the MLP with tanh activations in the hidden layers
model = tf.keras.Sequential([
  tf.keras.layers.Dense(128, activation='tanh', input_shape=(10,)),
  tf.keras.layers.Dense(64, activation='tanh'),
  tf.keras.layers.Dense(10, activation='softmax')  # softmax output for 10-way classification
])

# Compile and train the model
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.fit(x_train, y_train, epochs=10)

Conclusion

The tanh activation function offers compelling advantages in MLPs due to its zero-centered output, smooth gradient, and non-linearity. These properties support faster convergence and stable learning, especially in shallow to moderately deep networks where saturation is less of a concern. When selecting an activation function, tanh should be considered a strong candidate given its long track record across machine learning tasks.

