Restricted Boltzmann Machines and Free Energy
Restricted Boltzmann Machines (RBMs) are a type of generative stochastic neural network. They are used for learning probability distributions over a set of variables, and have found applications in areas like collaborative filtering, image recognition, and natural language processing.
A key concept in RBMs is the **free energy**. It is a function of the visible units, obtained by summing out the hidden units' contribution to the energy, and it plays a crucial role in understanding and training the model.
The Free Energy Equation and Its Approximation
The free energy of a configuration of visible units v is defined as follows:
Equation
The free energy, denoted as F(v), is given by:
F(v) = -log Σ_h exp(-E(v, h))
where:
- v: Vector representing the visible units.
- h: Vector representing the hidden units.
- E(v,h): Energy function of the RBM.
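To make the definition concrete, here is a minimal sketch that evaluates the sum directly for a small RBM with binary units. It assumes the standard energy function E(v, h) = -b·v - c·h - h·W·v with a weight matrix W of shape (n_hidden, n_visible) and bias vectors b and c; these names and shapes are illustrative, and brute-force enumeration is only feasible when the number of hidden units is small.

```python
import itertools
import numpy as np

def energy(v, h, W, b, c):
    """Standard RBM energy: E(v, h) = -b·v - c·h - h·W·v (assumed form)."""
    return -np.dot(b, v) - np.dot(c, h) - np.dot(h, np.dot(W, v))

def free_energy_exact(v, W, b, c):
    """F(v) = -log Σ_h exp(-E(v, h)), enumerating every binary hidden vector."""
    n_hidden = len(c)
    total = 0.0
    for h in itertools.product([0, 1], repeat=n_hidden):
        total += np.exp(-energy(v, np.array(h), W, b, c))
    return -np.log(total)
```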
Interpretation
The free energy is the negative logarithm of the unnormalized probability of v. The sum inside the logarithm runs over all possible configurations of the hidden units, adding up the Boltzmann factor exp(-E(v, h)) for each one; dividing exp(-F(v)) by the partition function (the same sum taken over all visible and hidden configurations) gives the marginal probability p(v).
The free energy can also be interpreted as the expected energy of the system under the conditional distribution of the hidden units given v, minus the entropy of that distribution; visible configurations with low free energy are the ones the model considers probable.
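As a quick check on this interpretation, writing p(h | v) for the conditional distribution over hidden units and H(·) for its entropy, the definition above can be rearranged into the standard identity below (the second equation restates the relationship to the partition function Z):

```latex
F(v) \;=\; \mathbb{E}_{p(h \mid v)}\!\left[ E(v, h) \right] \;-\; H\!\left( p(h \mid v) \right),
\qquad
p(v) \;=\; \frac{e^{-F(v)}}{Z}, \quad Z = \sum_{v'} e^{-F(v')}.
```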
Approximation
The exact calculation of free energy involves summing over all possible hidden unit configurations, which can be computationally expensive. In practice, we use an **approximation**:
F(v) ≈ -log( exp(-E(v, h*)) + Σ_j exp(-E(v, h*_j)) )
where:
- h*: The most likely configuration of hidden units given the visible units, obtained by computing each hidden unit's activation probability with the sigmoid function and rounding it to 0 or 1.
- h*_j: The configuration obtained from h* by flipping its j-th hidden unit.
Code Example (Python)
```python
import numpy as np

def free_energy(v, W, b, c):
    """
    Calculates the approximate free energy of a visible vector.

    Args:
        v: Vector of visible units, shape (n_visible,).
        W: Weight matrix, shape (n_hidden, n_visible).
        b: Visible bias vector, shape (n_visible,).
        c: Hidden bias vector, shape (n_hidden,).

    Returns:
        Approximate free energy (a scalar).
    """
    def energy(h):
        # E(v, h) = -b·v - c·h - h·W·v
        return -np.dot(b, v) - np.dot(c, h) - np.dot(h, np.dot(W, v))

    h_probs = sigmoid(np.dot(W, v) + c)    # P(h_j = 1 | v) for each hidden unit
    h_star = (h_probs > 0.5).astype(int)   # Most likely hidden configuration h*

    # Boltzmann factor of the most likely configuration ...
    terms = [np.exp(-energy(h_star))]

    # ... plus the Boltzmann factors of the single-flip configurations h*_j
    for j in range(len(h_star)):
        h_flipped = h_star.copy()
        h_flipped[j] = 1 - h_flipped[j]    # Flip only hidden unit j
        terms.append(np.exp(-energy(h_flipped)))

    # Approximate free energy: -log of the truncated sum
    return -np.log(np.sum(terms))

def sigmoid(x):
    return 1 / (1 + np.exp(-x))
```
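As an illustration, the function above might be called like this on a tiny, randomly initialized RBM; the sizes and the random binary visible vector are arbitrary and only serve to show the expected shapes.

```python
rng = np.random.default_rng(0)
n_visible, n_hidden = 6, 4

W = rng.normal(scale=0.1, size=(n_hidden, n_visible))  # weights
b = np.zeros(n_visible)                                # visible biases
c = np.zeros(n_hidden)                                 # hidden biases

v = rng.integers(0, 2, size=n_visible)                 # a random binary visible vector
print(free_energy(v, W, b, c))
```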
Uses of Free Energy in RBMs
The free energy is essential for various aspects of RBM training and inference:
- **Training:** The contrastive divergence (CD) algorithm, a popular method for training RBMs, relies on the free energy: it effectively lowers the free energy of the training data while raising the free energy of configurations generated by the model.
- **Inference:** The free energy gives the unnormalized log-probability of a visible configuration, so differences in free energy can be used to compare how likely different configurations are under the model (see the sketch after this list).
- **Generative Modeling:** RBMs can generate new data by sampling from the model's probability distribution, which is proportional to exp(-F(v)).
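As a minimal sketch of the inference use, the snippet below compares two visible vectors by their free energies. Because p(v) = exp(-F(v)) / Z, the intractable partition function cancels in the ratio, so F(v2) - F(v1) = log p(v1) - log p(v2) (exactly for the exact free energy, approximately when the approximation above is used). It reuses the free_energy function and the illustrative W, b, c parameters from the earlier examples.

```python
# Compare two visible configurations: log p(v1) - log p(v2) = F(v2) - F(v1)
v1 = np.array([1, 0, 1, 1, 0, 0])
v2 = np.array([0, 1, 0, 0, 1, 1])

log_ratio = free_energy(v2, W, b, c) - free_energy(v1, W, b, c)
print(f"log p(v1) - log p(v2) ≈ {log_ratio:.3f}")
print("More probable under the model:", "v1" if log_ratio > 0 else "v2")
```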
Conclusion
The free energy plays a fundamental role in restricted Boltzmann machines. While its exact computation is challenging, efficient approximations are available that allow us to train and utilize RBMs for various applications.