Neural Network

Introduction

  • The network is composed of an input layer, an output layer, and optionally a series of hidden layers.
  • The input layer carries out no computation; the hidden and output layers consist of functional neurons, i.e., neurons with activation functions.

McCulloch and Pitts (M-P) model for neurons

\begin{align*} y = f(\mathbf{W} \mathbf{x} - \theta) \end{align*}

where \(f\) is termed the activation function and \(\theta\) is the neuron's threshold.
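
As a concrete illustration, here is a minimal NumPy sketch of a single M-P neuron computing \(f(\mathbf{W}\mathbf{x} - \theta)\); the names (mp_neuron, weights, theta) and the choice of a step activation are assumptions made for this example, not part of the original model description.

  import numpy as np

  def mp_neuron(x, weights, theta, f):
      # y = f(w . x - theta): weighted sum of the inputs, shifted by the
      # threshold, passed through the activation function f.
      return f(np.dot(weights, x) - theta)

  # Example: with a step activation, suitable weights and threshold realize a logical AND.
  step = lambda z: 1 if z >= 0 else 0
  x = np.array([1, 1])
  w = np.array([1.0, 1.0])
  print(mp_neuron(x, w, theta=1.5, f=step))  # -> 1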

Activation function

The activation function introduces non-linearity, compensating for the limited expressive power of purely linear models.

Sign

\begin{align*} \text{sign}(x) = \begin{cases} 1, & x \ge 0; \\ 0, & x < 0. \end{cases} \end{align*}

Sigmoid

\begin{align*} \text{sigmoid}(x) = \frac{1}{1 + e^{-x}}. \end{align*}

Tanh

\begin{align*} \text{tanh}(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}. \end{align*}

ReLU

\begin{align*} \text{ReLU}(x) = \max(x, 0) \end{align*}
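
The four activations above translate directly into NumPy; the following is a minimal sketch (the function names are assumptions for illustration).

  import numpy as np

  def sign(x):
      # 1 for x >= 0, 0 otherwise, as defined above.
      return np.where(x >= 0, 1, 0)

  def sigmoid(x):
      return 1.0 / (1.0 + np.exp(-x))

  def tanh(x):
      return np.tanh(x)

  def relu(x):
      return np.maximum(x, 0)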

Softmax

The softmax function transforms scores/logits into a valid probability distribution.

\begin{align*} \text{softmax}(x_i) = \frac{e^{x_i}}{\sum_j e^{x_j}} \end{align*}
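
A minimal softmax sketch follows; subtracting the maximum logit before exponentiating is a common numerical-stability trick and does not change the result, since softmax is invariant to adding a constant to every logit.

  import numpy as np

  def softmax(logits):
      shifted = logits - np.max(logits)  # stability: avoid overflow in exp
      exp = np.exp(shifted)
      return exp / np.sum(exp)

  print(softmax(np.array([2.0, 1.0, 0.1])))  # entries are positive and sum to 1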

Cost function

Cross-entropy

\begin{align*} \mathcal{H}_{p}(p^{\prime}) = \sum_{i}p_{i}\log\frac{1}{p^{\prime}_i} \end{align*}

where

  • \(p\) is the true probability distribution.
  • \(p^\prime\) is the predicted probability distribution.
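
A minimal sketch of this definition (the small eps term guards against log(0) and is an implementation detail, not part of the formula):

  import numpy as np

  def cross_entropy(p, p_hat, eps=1e-12):
      # H_p(p') = sum_i p_i * log(1 / p'_i) = -sum_i p_i * log(p'_i)
      return -np.sum(p * np.log(p_hat + eps))

  p = np.array([0.0, 1.0, 0.0])      # true (one-hot) distribution
  p_hat = np.array([0.1, 0.8, 0.1])  # predicted distribution
  print(cross_entropy(p, p_hat))     # -log(0.8) ~ 0.223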

Multi-layer feedforward neural network

  • Neurons in adjacent layers are fully connected (see the forward-pass sketch after this list).
  • There are no connections between neurons within the same layer.
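
A minimal forward-pass sketch of such a network with one hidden layer; the layer sizes, the sigmoid activation, and the bias form \(\mathbf{W}\mathbf{x} + \mathbf{b}\) (equivalent to the threshold form above with \(\mathbf{b} = -\theta\)) are assumptions for illustration.

  import numpy as np

  def sigmoid(x):
      return 1.0 / (1.0 + np.exp(-x))

  def forward(x, W1, b1, W2, b2):
      # The input layer performs no computation; hidden and output layers do.
      h = sigmoid(W1 @ x + b1)  # hidden layer: full connection + activation
      y = sigmoid(W2 @ h + b2)  # output layer
      return y

  rng = np.random.default_rng(0)
  x = rng.normal(size=4)                          # 4 input features
  W1, b1 = rng.normal(size=(5, 4)), np.zeros(5)   # 4 inputs -> 5 hidden units
  W2, b2 = rng.normal(size=(3, 5)), np.zeros(3)   # 5 hidden units -> 3 outputs
  print(forward(x, W1, b1, W2, b2))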

Convolutional neural network (CNN)

  • The operation of a convolutional layer is to apply a filter to the input.
    • The weights of the filter are termed the kernel.
  • In fact, a convolutional layer can be viewed as a simplified fully connected layer with the following special structure (see the convolution sketch after this list).
    • \(\mathbf{W}\) is sparse: each output neuron is connected only to a local region of the input.
    • The non-zero elements in different rows are equal (weight sharing).
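
A minimal sketch of the convolution operation for a single 2-D kernel (stride 1, no padding); as in most deep-learning libraries, this is technically cross-correlation, i.e. the kernel is not flipped.

  import numpy as np

  def conv2d(image, kernel):
      # Slide the kernel over the image and take a weighted sum at each position.
      H, W = image.shape
      kH, kW = kernel.shape
      out = np.zeros((H - kH + 1, W - kW + 1))
      for i in range(out.shape[0]):
          for j in range(out.shape[1]):
              out[i, j] = np.sum(image[i:i + kH, j:j + kW] * kernel)
      return out

  image = np.arange(16.0).reshape(4, 4)
  kernel = np.array([[1.0, 0.0],
                     [0.0, -1.0]])
  print(conv2d(image, kernel))  # 3x3 feature map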

Pooling

  • A convolutional layer is usually followed by a pooling layer to
    • Reduce the size of the convolutional layer's output.
    • Make the representation insensitive to small shifts in image position.
  • Popular pooling operations (see the sketch after this list)
    • Maximum (max pooling)
    • Average (average pooling)
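
A minimal sketch of max pooling over non-overlapping 2x2 windows (the window size is an assumption for illustration); average pooling would replace np.max with np.mean.

  import numpy as np

  def max_pool(feature_map, size=2):
      # Keep only the maximum within each size x size window.
      H, W = feature_map.shape
      out = np.zeros((H // size, W // size))
      for i in range(out.shape[0]):
          for j in range(out.shape[1]):
              window = feature_map[i * size:(i + 1) * size,
                                   j * size:(j + 1) * size]
              out[i, j] = np.max(window)
      return out

  fm = np.arange(16.0).reshape(4, 4)
  print(max_pool(fm))  # 2x2 output; small shifts within a window leave it unchanged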

Recurrent neural network (RNN)

  • A learned/trained function takes both an input vector and a state vector as input, and yields an output together with an updated state vector.
  • On long sequences, an ordinary/standard RNN suffers from vanishing and exploding gradients; to mitigate these problems, variants such as long short-term memory (LSTM) have been proposed. A minimal recurrent step is sketched after this list.
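
A minimal sketch of a single (untrained) recurrent step applied over a sequence; the weight names, the tanh activation, and the sequence length are assumptions for illustration.

  import numpy as np

  def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
      # New state from the current input and the previous state.
      # An output could additionally be read out from h, e.g. y_t = W_hy @ h.
      return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

  rng = np.random.default_rng(0)
  input_dim, hidden_dim = 3, 4
  W_xh = rng.normal(size=(hidden_dim, input_dim))
  W_hh = rng.normal(size=(hidden_dim, hidden_dim))
  b_h = np.zeros(hidden_dim)

  h = np.zeros(hidden_dim)
  for x_t in rng.normal(size=(5, input_dim)):  # a sequence of 5 input vectors
      h = rnn_step(x_t, h, W_xh, W_hh, b_h)    # the state carries information forward
  print(h)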

Long short-term memory (LSTM)