Neural Network

Introduction

  • The network is composed of an input layer, an output layer, and optionally a series of hidden layers.
  • The input layer carries out no computation; the hidden and output layers consist of functional neurons, i.e., neurons with activation functions.

McCulloch and Pitts (M-P) model for neurons

\begin{align*} y = f(\mathbf{W} \mathbf{x} - \theta) \end{align*}

where \(f\) is termed the activation function and \(\theta\) is the neuron's threshold.
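
As a concrete illustration, here is a minimal NumPy sketch of a single M-P neuron computing \(f(\mathbf{W}\mathbf{x} - \theta)\); the names (mp_neuron, weights, theta) and the choice of a step activation are assumptions made for this example, not part of the original model description.

  import numpy as np

  def mp_neuron(x, weights, theta, f):
      # y = f(w . x - theta): weighted sum of the inputs, shifted by the
      # threshold, passed through the activation function f.
      return f(np.dot(weights, x) - theta)

  # Example: with a step activation, suitable weights and threshold realize a logical AND.
  step = lambda z: 1 if z >= 0 else 0
  x = np.array([1, 1])
  w = np.array([1.0, 1.0])
  print(mp_neuron(x, w, theta=1.5, f=step))  # -> 1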

Activation function

The activation function introduces non-linearity, compensating for the limited expressive power of purely linear models.

Sign

\begin{align*} \text{sign}(x) = \begin{cases} 1, & x \ge 0; \\ 0, & x < 0. \end{cases} \end{align*}

Sigmoid

\begin{align*} \text{sigmoid}(x) = \frac{1}{1 + e^{-x}}. \end{align*}

Tanh

\begin{align*} \text{tanh}(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}. \end{align*}

ReLU

\begin{align*} \text{ReLU}(x) = \max(x, 0) \end{align*}
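
The four activations above translate directly into NumPy; the following is a minimal sketch (the function names are assumptions for illustration).

  import numpy as np

  def sign(x):
      # 1 for x >= 0, 0 otherwise, as defined above.
      return np.where(x >= 0, 1, 0)

  def sigmoid(x):
      return 1.0 / (1.0 + np.exp(-x))

  def tanh(x):
      return np.tanh(x)

  def relu(x):
      return np.maximum(x, 0)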

Softmax

The softmax function transforms scores/logits into a valid probability distribution.

\begin{align*} \text{softmax}(x_i) = \frac{e^{x_i}}{\sum_j e^{x_j}} \end{align*}
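
A minimal softmax sketch follows; subtracting the maximum logit before exponentiating is a common numerical-stability trick and does not change the result, since softmax is invariant to adding a constant to every logit.

  import numpy as np

  def softmax(logits):
      shifted = logits - np.max(logits)  # stability: avoid overflow in exp
      exp = np.exp(shifted)
      return exp / np.sum(exp)

  print(softmax(np.array([2.0, 1.0, 0.1])))  # entries are positive and sum to 1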

Cost function

Cross-entropy

\begin{align*} \mathcal{H}_{p}(p^{\prime}) = \sum_{i}p_{i}\log\frac{1}{p^{\prime}_i} \end{align*}

where

  • \(p\) is the true probability distribution.
  • \(p^\prime\) is the predicted probability distribution.
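
A minimal sketch of this definition (the small eps term guards against log(0) and is an implementation detail, not part of the formula):

  import numpy as np

  def cross_entropy(p, p_hat, eps=1e-12):
      # H_p(p') = sum_i p_i * log(1 / p'_i) = -sum_i p_i * log(p'_i)
      return -np.sum(p * np.log(p_hat + eps))

  p = np.array([0.0, 1.0, 0.0])      # true (one-hot) distribution
  p_hat = np.array([0.1, 0.8, 0.1])  # predicted distribution
  print(cross_entropy(p, p_hat))     # -log(0.8) ~ 0.223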

Multi-layer feedforward neural network

  • Neurons in adjacent layers are fully connected (see the forward-pass sketch after this list).
  • There are no connections between neurons within the same layer.
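
A minimal forward-pass sketch of such a network with one hidden layer; the layer sizes, the sigmoid activation, and the bias form \(\mathbf{W}\mathbf{x} + \mathbf{b}\) (equivalent to the threshold form above with \(\mathbf{b} = -\theta\)) are assumptions for illustration.

  import numpy as np

  def sigmoid(x):
      return 1.0 / (1.0 + np.exp(-x))

  def forward(x, W1, b1, W2, b2):
      # The input layer performs no computation; hidden and output layers do.
      h = sigmoid(W1 @ x + b1)  # hidden layer: full connection + activation
      y = sigmoid(W2 @ h + b2)  # output layer
      return y

  rng = np.random.default_rng(0)
  x = rng.normal(size=4)                          # 4 input features
  W1, b1 = rng.normal(size=(5, 4)), np.zeros(5)   # 4 inputs -> 5 hidden units
  W2, b2 = rng.normal(size=(3, 5)), np.zeros(3)   # 5 hidden units -> 3 outputs
  print(forward(x, W1, b1, W2, b2))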

Convolutional neural network (CNN)

  • The operation of a convolutional layer is to apply a filter to the input.
    • The weights of the filter are termed the kernel.
  • In fact, a convolutional layer can be viewed as a simplified fully connected layer with the following special structure (see the convolution sketch after this list).
    • \(\mathbf{W}\) is sparse: each output neuron is connected only to a local region of the input.
    • The non-zero elements in different rows are equal (weight sharing).
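
A minimal sketch of the convolution operation for a single 2-D kernel (stride 1, no padding); as in most deep-learning libraries, this is technically cross-correlation, i.e. the kernel is not flipped.

  import numpy as np

  def conv2d(image, kernel):
      # Slide the kernel over the image and take a weighted sum at each position.
      H, W = image.shape
      kH, kW = kernel.shape
      out = np.zeros((H - kH + 1, W - kW + 1))
      for i in range(out.shape[0]):
          for j in range(out.shape[1]):
              out[i, j] = np.sum(image[i:i + kH, j:j + kW] * kernel)
      return out

  image = np.arange(16.0).reshape(4, 4)
  kernel = np.array([[1.0, 0.0],
                     [0.0, -1.0]])
  print(conv2d(image, kernel))  # 3x3 feature map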

Pooling

  • A convolutional layer is usually followed by a pooling layer to
    • Reduce the size of the convolutional layer's output.
    • Make the representation insensitive to small shifts in image position.
  • Popular pooling operations (see the sketch after this list)
    • Maximum (max pooling)
    • Average (average pooling)
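
A minimal sketch of max pooling over non-overlapping 2x2 windows (the window size is an assumption for illustration); average pooling would replace np.max with np.mean.

  import numpy as np

  def max_pool(feature_map, size=2):
      # Keep only the maximum within each size x size window.
      H, W = feature_map.shape
      out = np.zeros((H // size, W // size))
      for i in range(out.shape[0]):
          for j in range(out.shape[1]):
              window = feature_map[i * size:(i + 1) * size,
                                   j * size:(j + 1) * size]
              out[i, j] = np.max(window)
      return out

  fm = np.arange(16.0).reshape(4, 4)
  print(max_pool(fm))  # 2x2 output; small shifts within a window leave it unchanged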

Recurrent neural network (RNN)

  • A learned/trained function takes both an input vector and a state vector as input, and yields an output together with an updated state vector.
  • On long sequences, an ordinary/standard RNN suffers from vanishing and exploding gradients; to mitigate these problems, variants such as long short-term memory (LSTM) have been proposed. A minimal recurrent step is sketched after this list.
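
A minimal sketch of a single (untrained) recurrent step applied over a sequence; the weight names, the tanh activation, and the sequence length are assumptions for illustration.

  import numpy as np

  def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
      # New state from the current input and the previous state.
      # An output could additionally be read out from h, e.g. y_t = W_hy @ h.
      return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

  rng = np.random.default_rng(0)
  input_dim, hidden_dim = 3, 4
  W_xh = rng.normal(size=(hidden_dim, input_dim))
  W_hh = rng.normal(size=(hidden_dim, hidden_dim))
  b_h = np.zeros(hidden_dim)

  h = np.zeros(hidden_dim)
  for x_t in rng.normal(size=(5, input_dim)):  # a sequence of 5 input vectors
      h = rnn_step(x_t, h, W_xh, W_hh, b_h)    # the state carries information forward
  print(h)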

Long short-term memory (LSTM)