Kuang · Algorithm Engineer & Data Mining Engineer

# Understanding LSTM Networks

2018-08-01
Kuang

### Recurrent Neural Network

Recurrent Neural Networks have loops

An unrolled recurrent neural network

LSTMs are a very special kind of recurrent neural network that outperform standard RNNs on most tasks. They are discussed below.

### The Problem of Long-Term Dependencies

One appeal of RNNs is that the present task can be optimized using previous information. In a video task, for example, information from earlier frames can help in understanding the current frame. But can RNNs actually handle such tasks as well as we would hope?

## LSTM Network

LSTMs are a special kind of RNN capable of handling long-term dependencies; remembering information over long periods is practically their default behavior. All recurrent neural networks consist of a chain of repeating neural network modules. In standard RNNs, this repeating module has a very simple structure, such as a single tanh layer.

The repeating module in a standard RNN contains a single layer
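As a concrete sketch of this repeating module, here is a minimal NumPy implementation of one vanilla RNN step (the dimensions, weight names, and random initialization below are illustrative assumptions, not code from the original article):

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One step of a vanilla RNN: the repeating module is just a tanh layer."""
    return np.tanh(x_t @ W_xh + h_prev @ W_hh + b_h)

# Toy dimensions: 3-dimensional input, 4-dimensional hidden state.
rng = np.random.default_rng(0)
W_xh = rng.normal(size=(3, 4))
W_hh = rng.normal(size=(4, 4))
b_h = np.zeros(4)

h = np.zeros(4)
for x in rng.normal(size=(5, 3)):  # unroll over 5 time steps
    h = rnn_step(x, h, W_xh, W_hh, b_h)
```

The same `rnn_step` is applied at every time step with shared weights, which is exactly the "unrolled" chain shown above.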

LSTMs also have this chain-like structure, but the repeating module is different. Instead of the single layer of a standard RNN, each repeating unit of an LSTM contains four layers, connected in a very particular way.

The repeating module in an LSTM contains four interacting layers.

## The Core Idea Behind LSTMs

The key contribution of LSTMs is the clever idea of introducing self-loops, which create paths along which gradients can flow for long durations. The key to LSTMs is the cell state, the arrow running from left to right in the diagram.

The cell state is like a conveyor belt. It runs through the entire chain, connected by only a few simple linear operations, so information flowing along the cell state remains largely unchanged.

The output of the sigmoid layer is a number between 0 and 1, describing how much of each component should be let through: 0 means "let nothing through," and 1 means "let everything through."
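This gating mechanism can be sketched in a few lines of NumPy (the specific input values here are made-up illustrations):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# A gate acts elementwise: each sigmoid output in (0, 1) scales one component.
values = np.array([2.0, -3.0, 5.0])
gate = sigmoid(np.array([-10.0, 0.0, 10.0]))  # ≈ [0, 0.5, 1]
gated = gate * values  # first entry blocked, second halved, third passed through
```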

## Step-by-Step LSTM Walk Through

The first step of the LSTM is to decide what information should be forgotten, a decision made by a sigmoid layer called the "forget gate." It takes $h_{t-1}$ and $x_t$ as input and outputs a number between 0 and 1 for each number in the cell state $C_{t-1}$, where 1 means "completely keep this information" and 0 means "completely discard this information."
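A minimal sketch of the forget gate, computing $f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$ (the shapes and random weights below are illustrative assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forget_gate(h_prev, x_t, W_f, b_f):
    """f_t = sigmoid(W_f · [h_{t-1}, x_t] + b_f): one value in (0, 1)
    per entry of the cell state."""
    return sigmoid(np.concatenate([h_prev, x_t]) @ W_f + b_f)

rng = np.random.default_rng(1)
h_prev, x_t = rng.normal(size=4), rng.normal(size=3)
W_f, b_f = rng.normal(size=(7, 4)), np.zeros(4)

f_t = forget_gate(h_prev, x_t, W_f, b_f)
C_kept = f_t * rng.normal(size=4)  # scale each entry of C_{t-1} elementwise
```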

The next step of the LSTM is to decide what new information to store in the cell state, a decision made by the "input gate." The input gate has two parts: first, a sigmoid layer decides which values will be updated; next, a tanh layer produces a vector of new candidate values, $\tilde{C}_t$. The sigmoid output determines how much of each value in $\tilde{C}_t$ is added to the cell state. Finally, combining the sigmoid output with the tanh output yields the updated cell state.
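Putting the forget gate and input gate together, the cell-state update $C_t = f_t * C_{t-1} + i_t * \tilde{C}_t$ can be sketched as follows (again an illustrative NumPy sketch with assumed shapes and weight names, not the article's own code):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cell_state_update(h_prev, x_t, C_prev, W_f, b_f, W_i, b_i, W_c, b_c):
    hx = np.concatenate([h_prev, x_t])
    f_t = sigmoid(hx @ W_f + b_f)        # forget gate: what to discard from C_{t-1}
    i_t = sigmoid(hx @ W_i + b_i)        # input gate: which candidates to admit
    C_tilde = np.tanh(hx @ W_c + b_c)    # candidate values, in (-1, 1)
    return f_t * C_prev + i_t * C_tilde  # C_t

rng = np.random.default_rng(2)
h_prev, x_t, C_prev = rng.normal(size=4), rng.normal(size=3), rng.normal(size=4)
W_f, W_i, W_c = [rng.normal(size=(7, 4)) for _ in range(3)]
b_f = b_i = b_c = np.zeros(4)

C_t = cell_state_update(h_prev, x_t, C_prev, W_f, b_f, W_i, b_i, W_c, b_c)
```

Both gates and the candidate layer read the same concatenated $[h_{t-1}, x_t]$; only their weights differ.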

## Variants on Long Short Term Memory

Clockwork RNNs by Koutnik, et al. (2014).

## Conclusion

Earlier, I mentioned the remarkable results people are achieving with RNNs. Essentially all of these are achieved using LSTMs. They really work a lot better for most tasks!

Written down as a set of equations, LSTMs look pretty intimidating. Hopefully, walking through them step by step in this essay has made them a bit more approachable.

LSTMs were a big step in what we can accomplish with RNNs. It’s natural to wonder: is there another big step? A common opinion among researchers is: “Yes! There is a next step and it’s attention!” The idea is to let every step of an RNN pick information to look at from some larger collection of information. For example, if you are using an RNN to create a caption describing an image, it might pick a part of the image to look at for every word it outputs. In fact, Xu, et al. (2015) do exactly this – it might be a fun starting point if you want to explore attention! There’s been a number of really exciting results using attention, and it seems like a lot more are around the corner…

Attention isn’t the only exciting thread in RNN research. For example, Grid LSTMs by Kalchbrenner, et al. (2015) seem extremely promising. Work using RNNs in generative models – such as Gregor, et al. (2015), Chung, et al. (2015), or Bayer & Osendorfer (2015) – also seems very interesting. The last few years have been an exciting time for recurrent neural networks, and the coming ones promise to only be more so!