What is weight initialization in a neural network?
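The aim of weight initialization is to prevent layer activation outputs from exploding or vanishing during a forward pass through a deep neural network. Matrix multiplication is the essential math operation of a neural network, and a deep network applies one weight matrix after another. If each multiplication scales the activations up or down even slightly, the effect compounds exponentially with depth.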
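To make the compounding concrete, here is a small NumPy sketch. It assumes purely linear layers (no activation functions) and Gaussian weights with standard deviation `scale / sqrt(n)`. The function name `final_norm` and the particular sizes are illustrative choices, not taken from any library.

```python
import numpy as np

rng = np.random.default_rng(0)
n, depth = 100, 50
x = rng.standard_normal(n)  # input vector, norm roughly sqrt(n) = 10

def final_norm(scale):
    # Push x through `depth` random linear layers (no activations, for simplicity).
    # With entries of std scale/sqrt(n), each layer multiplies the squared norm
    # of the activations by scale**2 in expectation.
    h = x.copy()
    for _ in range(depth):
        W = rng.standard_normal((n, n)) * scale / np.sqrt(n)
        h = W @ h
    return np.linalg.norm(h)

for scale in (0.5, 1.0, 1.5):
    print(scale, final_norm(scale))
```

With `scale = 0.5` the output collapses toward zero, and with `scale = 1.5` it grows by many orders of magnitude, even though each individual layer changes the norm by only a constant factor.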
What is an orthogonal matrix? Give an example.
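A square matrix with real entries is called an orthogonal matrix if its transpose is equal to its inverse. In other words, the product of an orthogonal matrix and its transpose is always the identity matrix. Suppose A is a square matrix with real entries, of order n × n. Then A is orthogonal if and only if AᵀA = AAᵀ = I, where I is the n × n identity matrix. For example, the 2 × 2 rotation matrix R = [[cos θ, −sin θ], [sin θ, cos θ]] is orthogonal for every angle θ, because RᵀR = I follows from cos²θ + sin²θ = 1.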
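The definition can be checked numerically. The sketch below builds the rotation matrix for an arbitrarily chosen angle and, as a contrast, a shear matrix that is not orthogonal. The angle `0.7` and the shear matrix are illustrative choices.

```python
import numpy as np

theta = 0.7  # any angle works
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

print(np.allclose(R.T @ R, np.eye(2)))      # True: R^T R = I
print(np.allclose(np.linalg.inv(R), R.T))   # True: the inverse is the transpose

# A shear matrix is invertible but not orthogonal: S^T S = [[1, 1], [1, 2]]
S = np.array([[1.0, 1.0],
              [0.0, 1.0]])
print(np.allclose(S.T @ S, np.eye(2)))      # False
```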
Can a weight matrix be initialized to an orthogonal matrix?
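In practice, initializing the weight matrix of a dense layer to a random orthogonal matrix is fairly straightforward, for example by taking the Q factor of the QR decomposition of a random Gaussian matrix. For a convolutional layer, the weights are not strictly a matrix but typically a four-dimensional tensor (output channels, input channels, and the two kernel dimensions), so we need to think more carefully about what orthogonality means.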
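Here is a minimal NumPy sketch of the dense-layer case. It assumes the common QR-based construction. The helper name `orthogonal_init` and its `gain` parameter are hypothetical and do not come from any library. For a non-square layer the result can only be semi-orthogonal, meaning that whichever of its rows or columns are fewer are orthonormal.

```python
import numpy as np

def orthogonal_init(rows, cols, gain=1.0, rng=None):
    """Hypothetical helper: return a (rows, cols) (semi-)orthogonal weight matrix."""
    rng = np.random.default_rng() if rng is None else rng
    # Sample a tall Gaussian matrix so the reduced QR gives orthonormal columns.
    a = rng.standard_normal((max(rows, cols), min(rows, cols)))
    q, r = np.linalg.qr(a)  # reduced QR: q has shape (max, min)
    # Flip column signs to match diag(r), which makes q uniformly distributed
    # over orthogonal matrices (orthogonality holds with or without this step).
    q = q * np.sign(np.diag(r))
    if rows < cols:
        q = q.T  # wide layer: make the rows orthonormal instead
    return gain * q

W = orthogonal_init(256, 784, rng=np.random.default_rng(0))
print(W.shape, np.allclose(W @ W.T, np.eye(256)))  # (256, 784) True
```

The `gain` factor simply rescales the matrix, so WᵀW (or WWᵀ) equals gain² times the identity rather than the identity itself.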
Why is orthogonal initialization used in convolutional layers?
Orthogonal initialization has been shown to provide benefits for training deep neural networks, chiefly by keeping the magnitudes of activations and gradients stable across many layers. In a dense layer it is easy to see which vectors should be orthogonal to one another (the rows or columns of the weight matrix). In a convolutional layer it is less obvious where this orthogonality should apply, because the weight matrix is no longer really a matrix.
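Two common answers can be sketched in NumPy. Both assume a kernel layout of (out_channels, in_channels, kernel_height, kernel_width). The first flattens each filter into a row vector and makes those rows orthonormal, which, as far as we know, is what PyTorch's `torch.nn.init.orthogonal_` does for tensors with more than two dimensions. The second follows the idea behind the "delta-orthogonal" scheme of Xiao et al. (2018): every kernel tap is zero except the spatial center, which holds an orthogonal channel-mixing matrix. All function names here are hypothetical.

```python
import numpy as np

def semi_orthogonal(rows, cols, rng):
    # Q factor of a Gaussian matrix, sign-corrected; rows or columns orthonormal
    a = rng.standard_normal((max(rows, cols), min(rows, cols)))
    q, r = np.linalg.qr(a)
    q = q * np.sign(np.diag(r))
    return q.T if rows < cols else q

def flattened_orthogonal_kernel(out_ch, in_ch, kh, kw, rng):
    # Treat each output filter (in_ch * kh * kw numbers) as one row vector and
    # make those rows orthonormal (fully achievable when out_ch <= in_ch*kh*kw).
    flat = semi_orthogonal(out_ch, in_ch * kh * kw, rng)
    return flat.reshape(out_ch, in_ch, kh, kw)

def delta_orthogonal_kernel(out_ch, in_ch, kh, kw, rng):
    # Zero everywhere except the spatial center, which holds an (out_ch x in_ch)
    # matrix with orthonormal columns (assumes in_ch <= out_ch). At initialization
    # the convolution then acts like an orthogonal dense layer at every pixel.
    kernel = np.zeros((out_ch, in_ch, kh, kw))
    kernel[:, :, kh // 2, kw // 2] = semi_orthogonal(out_ch, in_ch, rng)
    return kernel

rng = np.random.default_rng(0)
k = flattened_orthogonal_kernel(16, 3, 3, 3, rng)
d = delta_orthogonal_kernel(32, 16, 3, 3, rng)
print(k.shape, d.shape)  # (16, 3, 3, 3) (32, 16, 3, 3)
```

The flattened version is simple and works for any kernel shape. The delta version makes a stride-1 convolution with zero padding behave like the orthogonal dense case at initialization.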
What are the properties of an orthogonal matrix?
Orthogonal matrices have many interesting properties, but the most important for us is that all the eigenvalues of an orthogonal matrix have absolute value 1. More directly, an orthogonal matrix Q preserves lengths, so ‖Qx‖ = ‖x‖ for every vector x, and a product of orthogonal matrices is again orthogonal. This means that no matter how many times we repeat the matrix multiplication, the result neither explodes nor vanishes.
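These properties can be verified numerically. The sketch below draws a random 100 × 100 orthogonal matrix using the QR construction, checks its eigenvalues, raises it to the 200th power, and compares vector norms. The size, power, and seed are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
# Random orthogonal matrix from the QR decomposition of a Gaussian matrix
q, r = np.linalg.qr(rng.standard_normal((n, n)))
Q = q * np.sign(np.diag(r))

eigvals = np.linalg.eigvals(Q)            # complex in general
print(np.allclose(np.abs(eigvals), 1.0))  # True: all eigenvalues lie on the unit circle

P = np.linalg.matrix_power(Q, 200)        # Q multiplied by itself 200 times
print(np.allclose(P.T @ P, np.eye(n)))    # True: the product is still orthogonal

x = rng.standard_normal(n)
print(np.linalg.norm(x), np.linalg.norm(P @ x))  # equal up to rounding
```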
How is orthogonal initialization used together with gradient clipping?
Orthogonal initialization is a simple yet relatively effective way of combatting exploding and vanishing gradients, especially when paired with other methods such as gradient clipping and more advanced architectures.
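The following NumPy sketch shows how the two can combine. It assumes a deep stack of purely linear, orthogonally initialized layers and a squared-error loss. Backpropagation through these layers leaves the gradient norm unchanged. Clipping by global norm, written here as a hypothetical helper `clip_by_global_norm`, then bounds the size of the weight update. Real networks with nonlinearities and trained weights will not preserve norms exactly, and that is where clipping earns its keep.

```python
import numpy as np

def orthogonal(n, rng):
    # Q factor of a Gaussian matrix, sign-corrected so it is uniformly distributed
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    return q * np.sign(np.diag(r))

def clip_by_global_norm(grads, max_norm):
    # Rescale every gradient by the same factor if their joint L2 norm exceeds max_norm
    total = float(np.sqrt(sum(np.sum(g * g) for g in grads)))
    scale = min(1.0, max_norm / (total + 1e-12))
    return [g * scale for g in grads], total

rng = np.random.default_rng(0)
n, depth = 64, 30
weights = [orthogonal(n, rng) for _ in range(depth)]

# Forward pass through a deep linear stack (no activations, for simplicity)
x = rng.standard_normal(n)
x /= np.linalg.norm(x)
target = rng.standard_normal(n)
hs = [x]
for W in weights:
    hs.append(W @ hs[-1])

# Backward pass for the loss 0.5 * ||y - target||^2
dy = hs[-1] - target
g = dy
weight_grads = [None] * depth
for i in reversed(range(depth)):
    weight_grads[i] = np.outer(g, hs[i])  # dL/dW_i
    g = weights[i].T @ g                  # dL/dh_i; norm unchanged since W_i is orthogonal

clipped, total_norm = clip_by_global_norm(weight_grads, max_norm=1.0)
print(np.linalg.norm(g), np.linalg.norm(dy))  # equal up to rounding
print(total_norm, float(np.sqrt(sum(np.sum(c * c) for c in clipped))))
```

Because every layer here preserves norms, each weight gradient has norm ‖dy‖·‖x‖, so the global norm grows only as the square root of the depth. Clipping then caps it at `max_norm`.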