Neural Network
Introduction
- The network is composed of an input layer, an output layer, and optionally a series of hidden layers.
- The input layer does not carry out any operation; the hidden and output layers consist of functional neurons, i.e., neurons equipped with activation functions.
The McCulloch-Pitts (M-P) neuron model
\begin{align*}
y = f(\mathbf{W} \mathbf{x} - \theta)
\end{align*}
where \(f\) is termed the activation function and \(\theta\) is the neuron's threshold.
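The M-P model above can be sketched directly: a weighted sum of the inputs, minus the threshold, passed through an activation function. A minimal sketch in Python, using the step function as \(f\) (the weights, inputs, and threshold are arbitrary illustrations):

```python
# M-P neuron: y = f(w·x - theta), with the step function as f.
def mp_neuron(w, x, theta):
    s = sum(wi * xi for wi, xi in zip(w, x)) - theta  # weighted sum minus threshold
    return 1 if s >= 0 else 0  # step activation

# Example: with w = [1, 1] and theta = 1.5, the neuron implements logical AND.
print(mp_neuron([1, 1], [1, 1], 1.5))  # 1 (fires)
print(mp_neuron([1, 1], [1, 0], 1.5))  # 0 (does not fire)
```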
Activation function
The activation function introduces non-linearity into the network, compensating for the limited expressiveness of purely linear models.
Sign
\begin{align*}
\text{sign}(x) = \begin{cases}
1, & x \ge 0; \\
0, & x < 0.
\end{cases}
\end{align*}
Sigmoid
\begin{align*}
\text{sigmoid}(x) = \frac{1}{1 + e^{-x}}.
\end{align*}
Tanh
\begin{align*}
\text{tanh}(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}.
\end{align*}
ReLU
\begin{align*}
\text{ReLU}(x) = \max(x, 0)
\end{align*}
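The four activation functions defined above can be written in a few lines of Python, using only the standard library:

```python
import math

def sign(x):
    return 1 if x >= 0 else 0  # step function, as defined above

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))  # squashes to (0, 1)

def tanh(x):
    return math.tanh(x)  # equals (e^x - e^-x) / (e^x + e^-x), range (-1, 1)

def relu(x):
    return max(x, 0)  # zero for negatives, identity otherwise
```

Note that sigmoid and tanh saturate for large |x|, while ReLU is unbounded above; this difference matters for gradient flow in deep networks.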
Softmax
The softmax function transforms scores/logits into a valid probability distribution.
\begin{align*}
\text{softmax}(x_i) = \frac{e^{x_i}}{\sum_j e^{x_j}}
\end{align*}
Cost function
Cross-entropy
\begin{align*}
\mathcal{H}_{p}(p^{\prime}) = \sum_{i}p_{i}\log\frac{1}{p^{\prime}_i}
\end{align*}
where
- \(p\) is the true probability distribution.
- \(p^\prime\) is the predicted probability distribution.
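Softmax and cross-entropy are typically used together: when \(p\) is one-hot, the loss reduces to the negative log-probability of the correct class. A sketch in Python (the logits are arbitrary; the max-subtraction is a standard numerical-stability trick, not part of the definition):

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max before exponentiating, for numerical stability
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]

def cross_entropy(p, p_pred):
    # H_p(p') = sum_i p_i * log(1 / p'_i); terms with p_i = 0 contribute nothing.
    return sum(pi * math.log(1.0 / qi) for pi, qi in zip(p, p_pred) if pi > 0)

probs = softmax([2.0, 1.0, 0.1])
# With a one-hot target, the loss is -log of the probability of the true class.
loss = cross_entropy([1, 0, 0], probs)
```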
Multi-layer feedforward neural network
- Full connection between neurons of adjacent layers.
- No connections between neurons within the same layer.
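A forward pass through such a fully connected network is just a chain of weighted sums, thresholds, and activations, one layer at a time. A minimal sketch (the 2-3-1 architecture and all weights below are arbitrary, for illustration only):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer(W, x, theta):
    # One fully connected layer: y_j = f(sum_i W[j][i] * x[i] - theta[j]).
    return [sigmoid(sum(w * xi for w, xi in zip(row, x)) - t)
            for row, t in zip(W, theta)]

def forward(x, layers):
    # layers: list of (W, theta) pairs; each layer's output feeds the next.
    for W, theta in layers:
        x = layer(W, x, theta)
    return x

# A 2-3-1 network with arbitrary weights and thresholds.
net = [([[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]], [0.1, 0.0, -0.1]),
       ([[0.7, -0.5, 0.2]], [0.0])]
y = forward([1.0, 0.5], net)
```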
Convolutional neural network (CNN)
- The operation of a convolutional layer is to apply a filter to the input.
- The weights of the filter are termed the kernel.
- In fact, a convolutional layer is a simplified fully connected layer, with the following specialties.
- \(\mathbf{W}\) is a sparse matrix: each output neuron connects only to a local region of the input.
- The non-zero elements in different rows are equal (weight sharing).
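Concretely, a 1-D convolution slides the same kernel over local windows of the input, which is exactly a linear map with local connectivity and shared weights. A minimal sketch (no padding, stride 1):

```python
def conv1d(x, kernel):
    k = len(kernel)
    # Each output position sees only a local window (sparsity),
    # and every window uses the same weights (weight sharing).
    return [sum(kernel[j] * x[i + j] for j in range(k))
            for i in range(len(x) - k + 1)]

# The kernel [1, 0, -1] acts as a simple edge/difference detector.
print(conv1d([1, 2, 3, 4, 5], [1, 0, -1]))  # -> [-2, -2, -2]
```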
Pooling
- A convolutional layer is usually followed by a pooling layer to
- Reduce the size of the convolutional layer's output.
- Tolerate subtle variations/shifts in the image position.
- Popular pooling layers
- Maximum
- Average
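Both popular pooling operations reduce each window of the input to a single value; only the reduction function differs. A 1-D sketch with non-overlapping windows:

```python
def pool1d(x, size, op=max):
    # Slide non-overlapping windows of the given size over x and
    # reduce each window with op (max pooling by default).
    return [op(x[i:i + size]) for i in range(0, len(x) - size + 1, size)]

def avg(window):
    return sum(window) / len(window)

print(pool1d([1, 3, 2, 8, 5, 4], 2))       # max pooling -> [3, 8, 5]
print(pool1d([1, 3, 2, 8, 5, 4], 2, avg))  # average pooling -> [2.0, 5.0, 4.5]
```

Max pooling keeps the strongest response in each window, so small shifts of a feature within a window leave the output unchanged.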
Recurrent neural network (RNN)
- A learned/trained function takes both an input vector and a state vector as input, and yields an output and an updated state vector.
- For long sequences, an ordinary/standard RNN suffers from vanishing and exploding gradients. To mitigate these problems, variants have been proposed, e.g., the long short-term memory (LSTM).
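The recurrence described above can be sketched as a single step function mapping (input, state) to (output, new state), applied repeatedly over a sequence. The tiny dimensions and all weight values below are arbitrary illustrations:

```python
import math

def rnn_step(x, h, Wxh, Whh, Why):
    # New state mixes the current input and the previous state through tanh;
    # the output is a linear readout of the new state.
    h_new = [math.tanh(sum(wx * xi for wx, xi in zip(row_x, x)) +
                       sum(wh * hi for wh, hi in zip(row_h, h)))
             for row_x, row_h in zip(Wxh, Whh)]
    y = [sum(w * hi for w, hi in zip(row, h_new)) for row in Why]
    return y, h_new

# Unroll over a short sequence: the state h carries information forward.
Wxh = [[0.5], [-0.3]]            # input -> hidden (1 -> 2)
Whh = [[0.1, 0.2], [0.0, 0.4]]   # hidden -> hidden (2 -> 2)
Why = [[1.0, -1.0]]              # hidden -> output (2 -> 1)
h = [0.0, 0.0]
for x in [[1.0], [0.5], [-1.0]]:
    y, h = rnn_step(x, h, Wxh, Whh, Why)
```

The same weights are reused at every time step; the vanishing/exploding-gradient problem arises when this step is composed many times during backpropagation through time.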