Zero Initialiser for Biases using get_variable in TensorFlow
Introduction
In TensorFlow, initialising variables with appropriate values is crucial for efficient model training. Biases, which are typically added to the output of a layer, are commonly initialised to zero. This article explains how to do that using the get_variable function with a zero initialiser.
Understanding the Importance of Bias Initialisation
Bias terms in neural networks have a significant impact on model performance. Here’s why proper initialisation matters:
- Symmetry: Zero biases are safe because symmetry between neurons is broken by the random initialisation of the weights; starting the biases at zero does not stop neurons from learning unique features during training.
- Activation Function Impact: Biases shift the input to the activation function. Initialising them to zero ensures that activations are not skewed in either direction at the start, aiding gradient-based learning.
- Avoiding Dead Neurons: With activation functions like ReLU, a strongly negative bias can produce “dead neurons” (neurons that never activate). Zero is a safe, neutral starting point; some practitioners go further and use a small positive bias for ReLU layers, as shown in the sketch after this list.
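If you want that small positive starting value for ReLU layers instead of zero, tf.constant_initializer can be used in place of the zero initialiser. A minimal sketch (the name relu_bias and the shape [128] are illustrative assumptions, not part of the original example):

import tensorflow as tf

# A small positive bias is sometimes preferred for ReLU layers so that
# units start in the active region of the activation function.
relu_bias = tf.get_variable(
    'relu_bias',                               # illustrative name
    shape=[128],                               # one bias per output unit
    initializer=tf.constant_initializer(0.1),  # small positive constant
    dtype=tf.float32
)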
Implementing Zero Initialiser with get_variable
TensorFlow’s get_variable function provides a convenient way to create and initialise variables. Here’s a step-by-step approach to using a zero initialiser:
1. Import Necessary Libraries
import tensorflow as tf
2. Define the Zero Initialiser
zero_initializer = tf.zeros_initializer()
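As an optional sanity check, the initialiser object is itself callable with a shape and returns a tensor of zeros. A minimal sketch, continuing from the snippet above:

with tf.Session() as sess:
    # Calling the initialiser directly with a shape yields a zero tensor.
    print(sess.run(zero_initializer(shape=[3])))  # [0. 0. 0.]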
3. Create the Bias Variable
bias = tf.get_variable(
    'bias',
    shape=[output_size],
    initializer=zero_initializer,
    dtype=tf.float32
)
Explanation:
- 'bias': The name of the variable.
- shape=[output_size]: The shape of the bias tensor (one value per output unit).
- initializer=zero_initializer: The initialiser to use (here, zero initialisation).
- dtype=tf.float32: The data type of the bias variable.
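A key benefit of get_variable over constructing variables directly is variable sharing: calling it again under the same scope with reuse enabled returns the existing variable rather than creating a new one. A minimal sketch, assuming an illustrative scope name 'dense1':

import tensorflow as tf

with tf.variable_scope('dense1'):
    # First call creates the variable.
    b1 = tf.get_variable('bias', shape=[5], initializer=tf.zeros_initializer())

with tf.variable_scope('dense1', reuse=True):
    # Second call retrieves the same underlying variable.
    b2 = tf.get_variable('bias', shape=[5])

print(b1 is b2)  # True: both names refer to one shared variable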
4. Use the Bias in Your Model
Once created, the bias can be added to the output of a layer in your model:
output = tf.matmul(input, weights) + bias
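In practice, the weight and bias creation is often wrapped in a small helper so every layer follows the same pattern. A sketch with a hypothetical dense_layer function (not part of TensorFlow itself):

import tensorflow as tf

def dense_layer(x, units, name):
    # Fully connected layer with random-normal weights and zero biases.
    in_dim = x.get_shape().as_list()[-1]
    with tf.variable_scope(name):
        weights = tf.get_variable(
            'weights',
            shape=[in_dim, units],
            initializer=tf.random_normal_initializer()
        )
        bias = tf.get_variable(
            'bias',
            shape=[units],
            initializer=tf.zeros_initializer()
        )
        return tf.matmul(x, weights) + bias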
Example: A Simple Neural Network with Zero Bias Initialisation
import tensorflow as tf

# Define model parameters
input_size = 10
output_size = 5

# Input placeholder
input_placeholder = tf.placeholder(tf.float32, shape=[None, input_size])

# Weights
weights = tf.get_variable(
    'weights',
    shape=[input_size, output_size],
    initializer=tf.random_normal_initializer()
)

# Zero initialiser for bias
zero_initializer = tf.zeros_initializer()
bias = tf.get_variable(
    'bias',
    shape=[output_size],
    initializer=zero_initializer
)

# Model output
output = tf.matmul(input_placeholder, weights) + bias

# Session and execution
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    input_data = [[1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]]
    result = sess.run(output, feed_dict={input_placeholder: input_data})
    # Inspect the freshly initialised bias values
    print(sess.run(bias))
Output:
[0. 0. 0. 0. 0.]
The printed bias values are all zero immediately after initialisation, as expected. (The layer output result itself varies from run to run, because the weights are drawn from a random normal distribution.) These zero biases are then updated during subsequent optimisation steps.
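Note that get_variable, placeholder, and Session belong to the TensorFlow 1.x API and were removed from the top-level namespace in TensorFlow 2. If you are on TensorFlow 2, the same pattern is available through the compatibility module. A minimal sketch, under the assumption that you want graph-mode semantics:

import tensorflow as tf

tf.compat.v1.disable_eager_execution()  # restore graph-mode execution

bias = tf.compat.v1.get_variable(
    'bias',
    shape=[5],
    initializer=tf.zeros_initializer()
)

with tf.compat.v1.Session() as sess:
    sess.run(tf.compat.v1.global_variables_initializer())
    print(sess.run(bias))  # [0. 0. 0. 0. 0.]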
Conclusion
Using a zero initialiser for biases with get_variable in TensorFlow is a common and effective practice. It gives biases a neutral starting value, lets the model learn appropriate bias values during training, and contributes to robust, well-performing neural networks.