Boltzmann Machine and Boltzmann Learning
Definition
A Boltzmann Machine (BM) is a type of stochastic recurrent neural network that can learn internal representations and perform combinatorial optimisation. Boltzmann Learning refers to the algorithm used to train Boltzmann Machines by adjusting their weights so that the model's distribution matches the distribution of the observed data as closely as possible.
Key Concepts
- Stochastic Neural Network: A network where neuron activations are probabilistic rather than deterministic.
- Energy Function: A scalar function that assigns an energy to every configuration of the network; the network tends to settle into low-energy configurations.
- Visible and Hidden Units: Visible units correspond to observed data, while hidden units help capture complex dependencies in the data.
- Gibbs Sampling: A Markov Chain Monte Carlo method used to sample from the network's distribution.
- Contrastive Divergence: A training algorithm that approximates the gradient of the likelihood function to update network weights.
Detailed Explanation
Boltzmann Machines are powerful tools for modelling complex distributions and solving optimisation problems. They consist of visible and hidden units with weighted connections.
Network Structure:
- Visible Units: Nodes that correspond to the input data.
- Hidden Units: Nodes that capture the dependencies and structure in the data.
- Weights: Symmetric connections between units, representing the interaction strength.
Energy Function: The energy $E$ of a joint state $(\mathbf{v}, \mathbf{h})$ (where $\mathbf{v}$ are the visible units and $\mathbf{h}$ are the hidden units) is given by:
$$ E(\mathbf{v}, \mathbf{h}) = -\sum_{i \in \text{visible}} \sum_{j \in \text{hidden}} w_{ij} v_i h_j - \sum_{i \in \text{visible}} b_i v_i - \sum_{j \in \text{hidden}} c_j h_j $$
where $w_{ij}$ are the weights, and $b_i$ and $c_j$ are the biases of the visible and hidden units, respectively.
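As a quick illustration, the energy of a particular configuration can be evaluated directly from this expression. The sketch below is a minimal NumPy version, assuming binary state vectors and the bipartite (visible-hidden only) connectivity used above; all names are illustrative.

```python
import numpy as np

def energy(v, h, W, b, c):
    """Energy E(v, h) of a joint configuration.

    v : (n_visible,) binary visible states
    h : (n_hidden,)  binary hidden states
    W : (n_visible, n_hidden) weights w_ij
    b : (n_visible,) visible biases b_i
    c : (n_hidden,)  hidden biases c_j
    """
    # Interaction term  -sum_ij w_ij v_i h_j  plus the two bias terms
    return -(v @ W @ h) - (b @ v) - (c @ h)
```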
Training (Boltzmann Learning):
- Objective: Minimize the difference between the probability distribution of the observed data and the model distribution.
- Steps:
- Positive Phase: Compute the expected value of the product of visible and hidden units using the observed data.
- Negative Phase: Compute the expected value of the product of visible and hidden units using samples from the model's distribution.
- Weight Update: Adjust weights using the difference between these expectations.
- Gradient Descent: Weights are updated to minimize the negative log-likelihood of the observed data.
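Putting the two phases together gives the standard Boltzmann learning rule (the learning rate $\eta$ is introduced here for clarity):
$$ \Delta w_{ij} = \eta \left( \langle v_i h_j \rangle_{\text{data}} - \langle v_i h_j \rangle_{\text{model}} \right) $$
with analogous bias updates $\Delta b_i = \eta \left( \langle v_i \rangle_{\text{data}} - \langle v_i \rangle_{\text{model}} \right)$ and $\Delta c_j = \eta \left( \langle h_j \rangle_{\text{data}} - \langle h_j \rangle_{\text{model}} \right)$.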
Contrastive Divergence (CD):
- Approximation Method: Instead of running the Markov Chain until convergence, CD performs a few iterations (often just one) to approximate the gradient.
- Steps:
- Initialise: Start with the observed data.
- Gibbs Sampling: Perform Gibbs sampling for a few steps to obtain a sample from the model's distribution.
- Update: Use the difference between data distribution and model distribution to update weights.
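For the bipartite structure assumed by the energy function above (i.e. a restricted Boltzmann Machine), the conditionals factorise as $p(h_j = 1 \mid \mathbf{v}) = \sigma\left(\sum_i w_{ij} v_i + c_j\right)$ and $p(v_i = 1 \mid \mathbf{h}) = \sigma\left(\sum_j w_{ij} h_j + b_i\right)$, and each Gibbs step samples from these. The sketch below shows one CD-1 update in NumPy; function and variable names are illustrative rather than taken from these notes.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v0, W, b, c, lr=0.1):
    """One CD-1 weight update from a batch of observed visible vectors.

    v0 : (batch, n_visible) binary data
    W  : (n_visible, n_hidden) weights; b, c : visible / hidden biases
    Returns updated copies of W, b, c.
    """
    # Positive phase: hidden probabilities and samples given the data
    ph0 = sigmoid(v0 @ W + c)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)

    # One Gibbs step: reconstruct visibles, then hidden probabilities again
    pv1 = sigmoid(h0 @ W.T + b)
    v1 = (rng.random(pv1.shape) < pv1).astype(float)
    ph1 = sigmoid(v1 @ W + c)

    # Contrastive divergence update: data statistics minus model statistics
    batch = v0.shape[0]
    dW = (v0.T @ ph0 - v1.T @ ph1) / batch
    db = (v0 - v1).mean(axis=0)
    dc = (ph0 - ph1).mean(axis=0)
    return W + lr * dW, b + lr * db, c + lr * dc
```

In practice the update is applied over many mini-batches and epochs; using the hidden probabilities rather than sampled binary states for the negative statistics (as above) is a common choice to reduce sampling noise.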
Applications:
- Pattern Recognition: Recognising complex patterns in data, such as handwritten characters or faces.
- Dimensionality Reduction: Reducing the dimensionality of data while preserving its structure.
- Feature Learning: Learning meaningful features from unlabelled data.
Diagrams
Links to Resources
Notes and Annotations
Summary of key points:
- Boltzmann Machines are stochastic neural networks used for learning complex data distributions and optimisation.
- They consist of visible and hidden units connected by symmetric weights.
- Training involves minimising the difference between observed and model distributions using algorithms like Contrastive Divergence.
Personal annotations and insights:
- Understanding Boltzmann Machines provides insights into the foundations of deep learning and unsupervised learning.
- The probabilistic nature of Boltzmann Machines allows them to model uncertainty and capture complex dependencies in data.
- Real-world applications highlight the versatility of Boltzmann Machines in tasks requiring pattern recognition and feature learning.
- Exploring advanced training techniques, such as improved sampling methods, can enhance the performance of Boltzmann Machines in practical scenarios.
Backlinks
- Linked from Unit III Associative Learning