As with word similarity and analogy tasks, we can also apply pretrained word vectors to sentiment analysis. Since the IMDb review dataset in Section 16.1 is not very big, using text representations that were pretrained on large-scale corpora may reduce overfitting of the model. As a specific example illustrated in Fig. 16.2.1, we will represent each token using the pretrained GloVe model, and feed these token representations into a multilayer bidirectional RNN to obtain the text sequence representation, which will be transformed into sentiment analysis outputs (Maas et al., 2011). For the same downstream application, we will consider a different architectural choice later.
Fig. 16.2.1 This section feeds pretrained GloVe to an RNN-based architecture for sentiment analysis.
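The code below refers to `vocab`, `train_iter`, and `test_iter`, which come from the IMDb data pipeline of Section 16.1. The setup is not reproduced in this excerpt; the following is a minimal sketch, assuming the book's `d2l` package and its `load_data_imdb` helper (the batch size of 64 is an illustrative choice):

import torch
from torch import nn
from d2l import torch as d2l

# Load the IMDb review dataset from Section 16.1
batch_size = 64
train_iter, test_iter, vocab = d2l.load_data_imdb(batch_size)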
16.2.1. Representing Single Text with RNNs
In text classification tasks, such as sentiment analysis, a varying-length text sequence will be transformed into fixed-length categories. In the following BiRNN class, while each token of a text sequence gets its individual pretrained GloVe representation via the embedding layer (self.embedding), the entire sequence is encoded by a bidirectional RNN (self.encoder). More concretely, the hidden states (at the last layer) of the bidirectional LSTM at both the initial and final time steps are concatenated as the representation of the text sequence. This single text representation is then transformed into output categories by a fully connected layer (self.decoder) with two outputs ("positive" and "negative").
class BiRNN(nn.Module):
    def __init__(self, vocab_size, embed_size, num_hiddens,
                 num_layers, **kwargs):
        super(BiRNN, self).__init__(**kwargs)
        self.embedding = nn.Embedding(vocab_size, embed_size)
        # Set `bidirectional` to True to get a bidirectional RNN
        self.encoder = nn.LSTM(embed_size, num_hiddens, num_layers=num_layers,
                               bidirectional=True)
        self.decoder = nn.Linear(4 * num_hiddens, 2)

    def forward(self, inputs):
        # The shape of `inputs` is (batch size, no. of time steps). Because
        # LSTM requires its input's first dimension to be the temporal
        # dimension, the input is transposed before obtaining token
        # representations. The output shape is (no. of time steps, batch size,
        # word vector dimension)
        embeddings = self.embedding(inputs.T)
        self.encoder.flatten_parameters()
        # Returns hidden states of the last hidden layer at different time
        # steps. The shape of `outputs` is (no. of time steps, batch size,
        # 2 * no. of hidden units)
        outputs, _ = self.encoder(embeddings)
        # Concatenate the hidden states at the initial and final time steps as
        # the input of the fully connected layer. Its shape is (batch size,
        # 4 * no. of hidden units)
        encoding = torch.cat((outputs[0], outputs[-1]), dim=1)
        outs = self.decoder(encoding)
        return outs
Let's construct a bidirectional RNN with two hidden layers to represent single text for sentiment analysis.
embed_size, num_hiddens, num_layers, devices = 100, 100, 2, d2l.try_all_gpus()
net = BiRNN(len(vocab), embed_size, num_hiddens, num_layers)

def init_weights(module):
    # Initialize the weight matrices of linear and LSTM layers with
    # Xavier uniform initialization
    if type(module) == nn.Linear:
        nn.init.xavier_uniform_(module.weight)
    if type(module) == nn.LSTM:
        for param in module._flat_weights_names:
            if "weight" in param:
                nn.init.xavier_uniform_(module._parameters[param])
net.apply(init_weights);
16.2.2. Loading Pretrained Word Vectors
Below we load the pretrained 100-dimensional (needs to be consistent with embed_size) GloVe embeddings for tokens in the vocabulary, and print the shape of the vectors for all the tokens in the vocabulary.
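The code for this step is not included in the excerpt above; the following is a minimal sketch, assuming the d2l.TokenEmbedding helper and the 'glove.6b.100d' embedding name used in earlier chapters of the book:

# Load pretrained 100-dimensional GloVe vectors and look up every token
# in the vocabulary; `embeds` has shape (vocab size, 100)
glove_embedding = d2l.TokenEmbedding('glove.6b.100d')
embeds = glove_embedding[vocab.idx_to_token]
embeds.shape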
We use these pretrained word vectors to represent tokens in the reviews, and these vectors will not be updated during training; a sketch of this step follows.
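One way to do this, assuming the embeds tensor from the sketch above, is to copy the pretrained vectors into the embedding layer and turn off its gradient:

# Use the pretrained GloVe vectors as the (frozen) embedding weights
net.embedding.weight.data.copy_(embeds)
net.embedding.weight.requires_grad = False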
16.2.3. Training and Evaluating the Model
Now we can train the bidirectional RNN for sentiment analysis.
lr, num_epochs = 0.01, 5
trainer = torch.optim.Adam(net.parameters(), lr=lr)
loss = nn.CrossEntropyLoss(reduction="none")
d2l.train_ch13(net, train_iter, test_iter, loss, trainer, num_epochs, devices)
loss 0.311, train acc 0.872, test acc 0.850
574.5 examples/sec on [device(type='cuda', index=0), device(type='cuda', index=1)]
We define the following function to predict the sentiment of a text sequence using the trained model net.
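The function itself is not reproduced in this excerpt; the following is a minimal sketch consistent with the described behavior, assuming the vocab object from the data pipeline and the d2l.try_gpu helper:

def predict_sentiment(net, vocab, sequence):
    """Predict the sentiment of a text sequence."""
    # Map tokens to indices and add a batch dimension of 1
    sequence = torch.tensor(vocab[sequence.split()], device=d2l.try_gpu())
    label = torch.argmax(net(sequence.reshape(1, -1)), dim=1)
    return 'positive' if label == 1 else 'negative'

For example, predict_sentiment(net, vocab, 'this movie is so great') should return 'positive' for a well-trained model.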