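Up until now, we have focused on defining networks consisting of a sequence input, a single hidden RNN layer, and an output layer. Even though there is just one hidden layer between the input at any time step and the corresponding output, there is a sense in which these networks are deep. Inputs from the first time step can influence the outputs at the final time step $T$ (often hundreds or thousands of steps later). These inputs pass through $T$ applications of the recurrent layer before reaching the final output. However, we often also wish to retain the ability to express complex relationships between the inputs at a given time step and the outputs at that same time step. Thus we often construct RNNs that are deep not only in the time direction but also in the input-to-output direction. This is precisely the notion of depth that we have already encountered in our development of MLPs and deep CNNs.
The standard method for building this sort of deep RNN is strikingly simple: we stack the RNNs on top of each other. Given a sequence of length $T$, the first RNN produces a sequence of outputs, also of length $T$. These, in turn, constitute the inputs to the next RNN layer. In this short section, we illustrate this design pattern and present a simple example of how to code up such stacked RNNs. Fig. 10.3.1 illustrates a deep RNN with $L$ hidden layers. Each hidden layer operates on a sequential input and produces a sequential output. Moreover, any RNN cell (white box in Fig. 10.3.1) at each time step depends on both the same layer's value at the previous time step and the previous layer's value at the same time step.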
Fig. 10.3.1 Architecture of a deep RNN.
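Formally, suppose that we have a minibatch input $\mathbf{X}_t \in \mathbb{R}^{n \times d}$ (number of examples: $n$, number of inputs in each example: $d$) at time step $t$. At the same time step, let the hidden state of the $l^\textrm{th}$ hidden layer ($l = 1, \ldots, L$) be $\mathbf{H}_t^{(l)} \in \mathbb{R}^{n \times h}$ (number of hidden units: $h$) and the output layer variable be $\mathbf{O}_t \in \mathbb{R}^{n \times q}$ (number of outputs: $q$). Setting $\mathbf{H}_t^{(0)} = \mathbf{X}_t$, the hidden state of the $l^\textrm{th}$ hidden layer that uses the activation function $\phi_l$ is calculated as follows: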
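$$\mathbf{H}_t^{(l)} = \phi_l\left(\mathbf{H}_t^{(l-1)} \mathbf{W}_{xh}^{(l)} + \mathbf{H}_{t-1}^{(l)} \mathbf{W}_{hh}^{(l)} + \mathbf{b}_h^{(l)}\right), \tag{10.3.1}$$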
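where the weights $\mathbf{W}_{xh}^{(l)} \in \mathbb{R}^{h \times h}$ and $\mathbf{W}_{hh}^{(l)} \in \mathbb{R}^{h \times h}$, together with the bias $\mathbf{b}_h^{(l)} \in \mathbb{R}^{1 \times h}$, are the model parameters of the $l^\textrm{th}$ hidden layer. (For the first hidden layer, whose input is $\mathbf{X}_t$, the input weight has shape $\mathbf{W}_{xh}^{(1)} \in \mathbb{R}^{d \times h}$.)
Finally, the calculation of the output layer is based only on the hidden state of the final $L^\textrm{th}$ hidden layer: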
$$\mathbf{O}_t = \mathbf{H}_t^{(L)} \mathbf{W}_{hq} + \mathbf{b}_q, \tag{10.3.2}$$
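where the weight $\mathbf{W}_{hq} \in \mathbb{R}^{h \times q}$ and the bias $\mathbf{b}_q \in \mathbb{R}^{1 \times q}$ are the model parameters of the output layer.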
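To make the recurrence concrete, here is a minimal NumPy sketch of one time step of a deep RNN, written directly from (10.3.1) and (10.3.2). The function name `deep_rnn_step`, the tanh activation, the sizes, and the random inputs are all assumptions chosen for illustration; this is not the d2l implementation.

```python
import numpy as np

def deep_rnn_step(X_t, H_prev, params, W_hq, b_q, phi=np.tanh):
    """One time step of a deep RNN, following (10.3.1) and (10.3.2).

    X_t:    (n, d) minibatch input at time step t
    H_prev: list of L arrays of shape (n, h), hidden states at step t-1
    params: list of L tuples (W_xh, W_hh, b_h), one per hidden layer
    """
    H_below = X_t  # H_t^{(0)} = X_t
    H_new = []
    for (W_xh, W_hh, b_h), H_prev_l in zip(params, H_prev):
        # Same layer at the previous step + layer below at the same step
        H_below = phi(H_below @ W_xh + H_prev_l @ W_hh + b_h)
        H_new.append(H_below)
    # The output layer reads only the top hidden layer
    O_t = H_new[-1] @ W_hq + b_q
    return O_t, H_new

# Hypothetical sizes, chosen only for this example
n, d, h, q, L, T = 4, 5, 8, 3, 2, 6
rng = np.random.default_rng(0)
params = []
for l in range(L):
    in_dim = d if l == 0 else h  # first layer sees the d-dimensional input
    params.append((rng.normal(0, 0.01, (in_dim, h)),
                   rng.normal(0, 0.01, (h, h)),
                   np.zeros((1, h))))
W_hq = rng.normal(0, 0.01, (h, q))
b_q = np.zeros((1, q))

H = [np.zeros((n, h)) for _ in range(L)]  # initial hidden states
outputs = []
for t in range(T):
    X_t = rng.normal(size=(n, d))
    O_t, H = deep_rnn_step(X_t, H, params, W_hq, b_q)
    outputs.append(O_t)
outputs = np.stack(outputs)
print(outputs.shape)  # (6, 4, 3), i.e. (T, n, q)
```

Note that only the first layer's input weight depends on $d$; every deeper layer maps $h$-dimensional hidden states to $h$-dimensional hidden states.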
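Just as with MLPs, the number of hidden layers $L$ and the number of hidden units $h$ are hyperparameters that we can tune. Common RNN layer widths ($h$) are in the range $(64, 2056)$, and common depths ($L$) are in the range $(1, 8)$. In addition, we can easily get a deep gated RNN by replacing the hidden state computation in (10.3.1) with that from an LSTM or a GRU.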
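As a rough illustration of how $h$ and $L$ drive model size, the following sketch counts the parameters implied by (10.3.1) and (10.3.2). The helper `deep_rnn_param_count` is hypothetical, and the vocabulary size of 28 is an assumed value used only to have concrete numbers.

```python
def deep_rnn_param_count(d, h, q, L):
    """Count the parameters of a deep (vanilla) RNN with L hidden layers.

    d: input size, h: hidden units per layer, q: output size.
    """
    first = d * h + h * h + h              # W_xh^(1), W_hh^(1), b_h^(1)
    rest = (L - 1) * (h * h + h * h + h)   # layers 2..L take h-dim inputs
    out = h * q + q                        # W_hq, b_q
    return first + rest + out

# Assumed sizes: 28 input/output tokens, 32 hidden units, 2 layers
print(deep_rnn_param_count(d=28, h=32, q=28, L=2))  # 4956
```

Each extra layer adds $2h^2 + h$ parameters, so the count grows linearly in $L$ but quadratically in $h$.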
# PyTorch
import torch
from torch import nn
from d2l import torch as d2l

# MXNet
from mxnet import np, npx
from mxnet.gluon import rnn
from d2l import mxnet as d2l

npx.set_np()

# JAX
import jax
from flax import linen as nn
from jax import numpy as jnp
from d2l import jax as d2l

# TensorFlow
import tensorflow as tf
from d2l import tensorflow as d2l
10.3.1. Implementation from Scratch
To implement a multilayer RNN from scratch, we can treat each layer as an RNNScratch instance with its own learnable parameters.
# PyTorch
class StackedRNNScratch(d2l.Module):
    def __init__(self, num_inputs, num_hiddens, num_layers, sigma=0.01):
        super().__init__()
        self.save_hyperparameters()
        # Layer 0 consumes the inputs; deeper layers consume hidden states
        self.rnns = nn.Sequential(*[d2l.RNNScratch(
            num_inputs if i==0 else num_hiddens, num_hiddens, sigma)
                                    for i in range(num_layers)])

# MXNet and TensorFlow (identical code)
class StackedRNNScratch(d2l.Module):
    def __init__(self, num_inputs, num_hiddens, num_layers, sigma=0.01):
        super().__init__()
        self.save_hyperparameters()
        # Layer 0 consumes the inputs; deeper layers consume hidden states
        self.rnns = [d2l.RNNScratch(num_inputs if i==0 else num_hiddens,
                                    num_hiddens, sigma)
                     for i in range(num_layers)]

# JAX
class StackedRNNScratch(d2l.Module):
    num_inputs: int
    num_hiddens: int
    num_layers: int
    sigma: float = 0.01

    def setup(self):
        # Layer 0 consumes the inputs; deeper layers consume hidden states
        self.rnns = [d2l.RNNScratch(self.num_inputs if i==0 else self.num_hiddens,
                                    self.num_hiddens, self.sigma)
                     for i in range(self.num_layers)]
The multilayer forward computation simply performs forward computation layer by layer.
# PyTorch
@d2l.add_to_class(StackedRNNScratch)
def forward(self, inputs, Hs=None):
    outputs = inputs
    if Hs is None: Hs = [None] * self.num_layers
    for i in range(self.num_layers):
        # Each layer returns a list of per-step outputs and its final state
        outputs, Hs[i] = self.rnns[i](outputs, Hs[i])
        outputs = torch.stack(outputs, 0)
    return outputs, Hs

# MXNet
@d2l.add_to_class(StackedRNNScratch)
def forward(self, inputs, Hs=None):
    outputs = inputs
    if Hs is None: Hs = [None] * self.num_layers
    for i in range(self.num_layers):
        outputs, Hs[i] = self.rnns[i](outputs, Hs[i])
        outputs = np.stack(outputs, 0)
    return outputs, Hs

# JAX
@d2l.add_to_class(StackedRNNScratch)
def forward(self, inputs, Hs=None):
    outputs = inputs
    if Hs is None: Hs = [None] * self.num_layers
    for i in range(self.num_layers):
        outputs, Hs[i] = self.rnns[i](outputs, Hs[i])
        outputs = jnp.stack(outputs, 0)
    return outputs, Hs

# TensorFlow
@d2l.add_to_class(StackedRNNScratch)
def forward(self, inputs, Hs=None):
    outputs = inputs
    if Hs is None: Hs = [None] * self.num_layers
    for i in range(self.num_layers):
        outputs, Hs[i] = self.rnns[i](outputs, Hs[i])
        outputs = tf.stack(outputs, 0)
    return outputs, Hs
As an example, we train a deep RNN model on The Time Machine dataset (same as in Section 9.5). To keep things simple, we set the number of layers to 2.
# PyTorch, MXNet, and JAX (identical code)
data = d2l.TimeMachine(batch_size=1024, num_steps=32)
rnn_block = StackedRNNScratch(num_inputs=len(data.vocab),
                              num_hiddens=32, num_layers=2)
model = d2l.RNNLMScratch(rnn_block, vocab_size=len(data.vocab), lr=2)
trainer = d2l.Trainer(max_epochs=100, gradient_clip_val=1, num_gpus=1)
trainer.fit(model, data)

# TensorFlow
data = d2l.TimeMachine(batch_size=1024, num_steps=32)
with d2l.try_gpu():
    rnn_block = StackedRNNScratch(num_inputs=len(data.vocab),
                                  num_hiddens=32, num_layers=2)
    model = d2l.RNNLMScratch(rnn_block, vocab_size=len(data.vocab), lr=2)
trainer = d2l.Trainer(max_epochs=100, gradient_clip_val=1)
trainer.fit(model, data)
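10.3.2. Concise Implementation
Fortunately, many of the logistical details required to implement multiple layers of an RNN are readily available in high-level APIs. Our concise implementation will use such built-in functionalities. The code generalizes the one we used previously in Section 10.2, allowing the number of layers to be specified explicitly rather than defaulting to a single layer.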
# PyTorch
class GRU(d2l.RNN):  #@save
    """The multi-layer GRU model."""
    def __init__(self, num_inputs, num_hiddens, num_layers, dropout=0):
        d2l.Module.__init__(self)
        self.save_hyperparameters()
        self.rnn = nn.GRU(num_inputs, num_hiddens, num_layers,
                          dropout=dropout)
# MXNet
class GRU(d2l.RNN):  #@save
    """The multi-layer GRU model."""
    def __init__(self, num_hiddens, num_layers, dropout=0):
        d2l.Module.__init__(self)
        self.save_hyperparameters()
        self.rnn = rnn.GRU(num_hiddens, num_layers, dropout=dropout)
Flax takes a minimalistic approach while implementing RNNs. Defining the number of layers in an RNN or combining it with dropout is not available out of the box. Our concise implementation will use all built-in functionalities and add num_layers and dropout features on top. The code generalizes the one we used previously in Section 10.2, allowing specification of the number of layers explicitly rather than picking the default of a single layer.
# JAX
class GRU(d2l.RNN):  #@save
    """The multi-layer GRU model."""
    num_hiddens: int
    num_layers: int
    dropout: float = 0

    @nn.compact
    def __call__(self, X, state=None, training=False):
        new_state = []
        if state is None:
            batch_size = X.shape[1]
            state = [nn.GRUCell.initialize_carry(jax.random.PRNGKey(0),
                     (batch_size,), self.num_hiddens)] * self.num_layers

        GRU = nn.scan(nn.GRUCell, variable_broadcast="params",
                      in_axes=0, out_axes=0, split_rngs={"params": False})

        # Introduce a dropout layer after every GRU layer except last.
        # Each layer consumes X, the output of the layer below it.
        for i in range(self.num_layers - 1):
            layer_i_state, X = GRU()(state[i], X)
            new_state.append(layer_i_state)
            X = nn.Dropout(self.dropout, deterministic=not training)(X)

        # Final GRU layer without dropout
        out_state, X = GRU()(state[-1], X)
        new_state.append(out_state)
        return X, jnp.array(new_state)
# TensorFlow
class GRU(d2l.RNN):  #@save
    """The multi-layer GRU model."""
    def __init__(self, num_hiddens, num_layers, dropout=0):
        d2l.Module.__init__(self)
        self.save_hyperparameters()
        # One GRUCell per layer; Keras stacks a list of cells
        gru_cells = [tf.keras.layers.GRUCell(num_hiddens, dropout=dropout)
                     for _ in range(num_layers)]
        self.rnn = tf.keras.layers.RNN(gru_cells, return_sequences=True,
                                       return_state=True, time_major=True)

    def forward(self, X, state=None):
        outputs, *state = self.rnn(X, state)
        return outputs, state
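The architectural decisions such as choosing hyperparameters are very similar to those of Section 10.2. We pick the same number of inputs and outputs as we have distinct tokens, i.e., vocab_size. The number of hidden units is still 32. The only difference is that we now select a nontrivial number of hidden layers by specifying the value of num_layers.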
# PyTorch
gru = GRU(num_inputs=len(data.vocab), num_hiddens=32, num_layers=2)
model = d2l.RNNLM(gru, vocab_size=len(data.vocab), lr=2)
trainer.fit(model, data)
model.predict('it has', 20, data.vocab, d2l.try_gpu())
'it has a small the time tr'
# MXNet
gru = GRU(num_hiddens=32, num_layers=2)
model = d2l.RNNLM(gru, vocab_size=len(data.vocab), lr=2)

# Running takes > 1h (pending fix from MXNet)
# trainer.fit(model, data)
# model.predict('it has', 20, data.vocab, d2l.try_gpu())
# JAX
gru = GRU(num_hiddens=32, num_layers=2)
model = d2l.RNNLM(gru, vocab_size=len(data.vocab), lr=2)
trainer.fit(model, data)
model.predict('it has', 20, data.vocab, trainer.state.params)
'it has wo mean the time tr'
# TensorFlow
gru = GRU(num_hiddens=32, num_layers=2)
with d2l.try_gpu():
    model = d2l.RNNLM(gru, vocab_size=len(data.vocab), lr=2)
trainer.fit(model, data)
model.predict('it has', 20, data.vocab)
'it has and the time travel'
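10.3.3. Summary
In deep RNNs, the hidden state information is passed to the next time step of the current layer and to the current time step of the next layer. There exist many different flavors of deep RNNs, such as LSTMs, GRUs, or vanilla RNNs. Conveniently, these models are all available as parts of the high-level APIs of deep learning frameworks. Initialization of models requires care. Overall, deep RNNs require a considerable amount of work (such as tuning the learning rate and gradient clipping) to ensure proper convergence.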
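The training code in this section passes gradient_clip_val=1 to the trainer. As a rough illustration of what clipping by the global gradient norm does, here is a plain-Python sketch. The helper `clip_by_global_norm` and the toy gradient values are assumptions for illustration, not the d2l trainer's code.

```python
import math

def clip_by_global_norm(grads, clip_val):
    """Rescale all gradients so their joint L2 norm is at most clip_val.

    grads: list of gradient vectors (lists of floats), one per parameter.
    """
    norm = math.sqrt(sum(g * g for grad in grads for g in grad))
    if norm > clip_val:
        scale = clip_val / norm
        return [[g * scale for g in grad] for grad in grads]
    # Gradients that are already small are left unchanged
    return [list(grad) for grad in grads]

# Toy gradients with global norm 5
grads = [[3.0, 0.0], [0.0, 4.0]]
clipped = clip_by_global_norm(grads, clip_val=1.0)
clipped_norm = math.sqrt(sum(g * g for grad in clipped for g in grad))
print(clipped, clipped_norm)
```

Because every gradient is scaled by the same factor, the update direction is preserved and only its length is capped.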
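10.3.4. Exercises
Replace the GRU by an LSTM and compare the accuracy and training speed.
Increase the training data to include multiple books. How low can you go on the perplexity scale?
Would you want to combine sources of different authors when modeling text? Why is this a good idea? What could go wrong?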