
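While AlexNet offered empirical evidence that deep CNNs can achieve good results, it did not provide a general template to guide subsequent researchers in designing new networks. In the following sections, we introduce several heuristic concepts commonly used to design deep networks.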

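Progress in this field mirrors that of VLSI (very large scale integration) in chip design, where engineers moved from placing transistors to logical elements to logic blocks (Mead, 1980). Similarly, the design of neural network architectures has grown progressively more abstract, with researchers moving from thinking in terms of individual neurons to whole layers, and now to blocks, that is, repeating patterns of layers. A decade later, this has progressed to researchers taking entire trained models and repurposing them for different, albeit related, tasks. Such large pretrained models are typically called foundation models (Bommasani et al., 2021).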

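Back to network design. The idea of using blocks first emerged from the Visual Geometry Group (VGG) at Oxford University, in their eponymously named VGG network (Simonyan and Zisserman, 2014). These repeated structures are easy to implement in code with any modern deep learning framework by using loops and subroutines.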

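# The four import groups below are framework alternatives. Run only one of
# them: each group rebinds the names `nn` and `d2l`.

# PyTorch
import torch
from torch import nn
from d2l import torch as d2l

# MXNet
from mxnet import init, np, npx
from mxnet.gluon import nn
from d2l import mxnet as d2l

npx.set_np()

# JAX
import jax
from flax import linen as nn
from d2l import jax as d2l

# TensorFlow
import tensorflow as tf
from d2l import tensorflow as d2l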

8.2.1. VGG Blocks

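The basic building block of CNNs is a sequence of the following: (i) a convolutional layer with padding to maintain the resolution, (ii) a nonlinearity such as a ReLU, (iii) a pooling layer such as max-pooling to reduce the resolution. One of the problems with this approach is that the spatial resolution decreases quite rapidly. In particular, this imposes a hard limit of $\log_2 d$ convolutional layers on the network before all dimensions ($d$) are used up. For instance, in the case of ImageNet, it would be impossible to have more than 8 convolutional layers in this way.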
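To make the $\log_2 d$ limit concrete, here is a minimal sketch in plain Python. The helper name `max_halvings` is our own (it is not part of any library), and it assumes that every pooling step halves the side length and rounds down.

```python
import math


def max_halvings(d):
    """Count how often a side length d can be halved before reaching 1 pixel."""
    count = 0
    while d > 1:
        d //= 2  # a 2x2 max-pooling with stride 2 halves d, rounding down
        count += 1
    return count


for d in (28, 224, 256):
    # Compare the counted halvings with log2(d)
    print(d, max_halvings(d), round(math.log2(d), 2))
```

With one pooling step after every convolution, an input of side length 224 or 256 is exhausted after 7 or 8 layers respectively, which is why grouping several convolutions between pooling steps is needed to go deeper.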

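The key idea of Simonyan and Zisserman (2014) was to use multiple convolutions in between downsampling via max-pooling, in the form of a block. They were primarily interested in whether deep or wide networks perform better. For instance, the successive application of two $3 \times 3$ convolutions touches the same pixels as a single $5 \times 5$ convolution does. At the same time, the latter uses approximately as many parameters ($25 \cdot c^2$) as three $3 \times 3$ convolutions do ($3 \cdot 9 \cdot c^2$). In a rather detailed analysis they showed that deep and narrow networks significantly outperform their shallow counterparts. This set deep learning on a quest for ever deeper networks, with over 100 layers for typical applications. Stacking $3 \times 3$ convolutions has become the gold standard in later deep networks (a design decision revisited only recently by Liu et al. (2022)). Consequently, fast implementations of small convolutions have become a staple on GPUs (Lavin and Gray, 2016).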
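The receptive-field and parameter arithmetic above can be checked with a few lines of plain Python. This is only an illustrative sketch: the helper names are hypothetical, biases are ignored, and every layer is assumed to have $c$ input and $c$ output channels.

```python
def stacked_receptive_field(num_convs, kernel_size=3):
    """Side length of the input patch seen by a stack of stride-1 convolutions."""
    # Each additional stride-1 convolution widens the field by kernel_size - 1
    return 1 + num_convs * (kernel_size - 1)


def conv_weights(num_convs, kernel_size, c):
    """Weights (no biases) in num_convs layers with c input and c output channels."""
    return num_convs * kernel_size * kernel_size * c * c


c = 64  # assumed channel count, chosen only for illustration
print("two 3x3 convs:  field", stacked_receptive_field(2), "weights", conv_weights(2, 3, c))
print("one 5x5 conv:   field", stacked_receptive_field(1, 5), "weights", conv_weights(1, 5, c))
print("three 3x3 convs: field", stacked_receptive_field(3), "weights", conv_weights(3, 3, c))
```

Under these assumptions, two $3 \times 3$ layers cover the same $5 \times 5$ patch with $18 \cdot c^2$ weights, while three $3 \times 3$ layers cover a $7 \times 7$ patch with $27 \cdot c^2$ weights, close to the $25 \cdot c^2$ of a single $5 \times 5$ layer.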

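Back to VGG: a VGG block consists of a sequence of convolutions with $3 \times 3$ kernels and padding of 1 (keeping height and width), followed by a $2 \times 2$ max-pooling layer with stride of 2 (halving height and width after each block). In the code below, we define a function called vgg_block to implement one VGG block.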

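The function below takes two arguments, corresponding to the number of convolutional layers num_convs and the number of output channels num_channels (named out_channels in the PyTorch and JAX versions).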

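# PyTorch
def vgg_block(num_convs, out_channels):
  layers = []
  for _ in range(num_convs):
    # 3x3 convolution with padding 1 keeps height and width unchanged
    layers.append(nn.LazyConv2d(out_channels, kernel_size=3, padding=1))
    layers.append(nn.ReLU())
  # 2x2 max-pooling with stride 2 halves height and width
  layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
  return nn.Sequential(*layers)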

# MXNet
def vgg_block(num_convs, num_channels):
  blk = nn.Sequential()
  for _ in range(num_convs):
    blk.add(nn.Conv2D(num_channels, kernel_size=3,
             padding=1, activation='relu'))
  blk.add(nn.MaxPool2D(pool_size=2, strides=2))
  return blk

# JAX
def vgg_block(num_convs, out_channels):
  layers = []
  for _ in range(num_convs):
    layers.append(nn.Conv(out_channels, kernel_size=(3, 3), padding=(1, 1)))
    layers.append(nn.relu)
  layers.append(lambda x: nn.max_pool(x, window_shape=(2, 2), strides=(2, 2)))
  return nn.Sequential(layers)

# TensorFlow
def vgg_block(num_convs, num_channels):
  blk = tf.keras.models.Sequential()
  for _ in range(num_convs):
    blk.add(
      tf.keras.layers.Conv2D(num_channels, kernel_size=3,
                  padding='same', activation='relu'))
  blk.add(tf.keras.layers.MaxPool2D(pool_size=2, strides=2))
  return blk
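As a framework-free sanity check of what one VGG block does to the spatial size, the sketch below applies the standard output-size formula $\lfloor (n + 2p - k) / s \rfloor + 1$ for a layer with kernel size $k$, padding $p$, and stride $s$. The helper names `conv_out_size` and `vgg_block_out_size` are hypothetical and exist only for this illustration.

```python
def conv_out_size(n, kernel_size, padding=0, stride=1):
    """Output side length of a convolution or pooling layer on an input of side n."""
    return (n + 2 * padding - kernel_size) // stride + 1


def vgg_block_out_size(n, num_convs):
    """Side length after num_convs 3x3 convolutions and one 2x2 max-pooling."""
    for _ in range(num_convs):
        # 3x3 kernel with padding 1: the size stays the same
        n = conv_out_size(n, kernel_size=3, padding=1)
    # 2x2 pooling with stride 2: the size is halved (rounded down)
    return conv_out_size(n, kernel_size=2, stride=2)


print(vgg_block_out_size(224, num_convs=2))
```

However many convolutions the block contains, only the pooling layer changes the resolution, which is what allows blocks to be made deeper without using up spatial dimensions.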

8.2.2. VGG Network

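Like AlexNet and LeNet, the VGG network can be partitioned into two parts: the first consists mostly of convolutional and pooling layers, and the second consists of fully connected layers identical to those in AlexNet. The key difference is that the convolutional layers are grouped into nonlinear transformations that leave the dimensionality unchanged, followed by a resolution-reduction step, as depicted in Fig. 8.2.1.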

Fig. 8.2.1 From AlexNet to VGG. The key difference is that VGG consists of blocks of layers, whereas AlexNet's layers are all designed individually.

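The convolutional part of the network connects several VGG blocks from Fig. 8.2.1 (also defined in the vgg_block function) in succession. This grouping of convolutions is a pattern that has remained almost unchanged over the past decade, although the specific choice of operations has undergone considerable modification. The variable arch consists of a list of tuples (one per block), where each tuple contains two values: the number of convolutional layers and the number of output channels, which are precisely the arguments required to call the vgg_block function. As such, VGG defines a family of networks rather than just one specific instance. To build a specific network, we simply iterate over arch to compose the blocks.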

# PyTorch
class VGG(d2l.Classifier):
  def __init__(self, arch, lr=0.1, num_classes=10):
    super().__init__()
    self.save_hyperparameters()
    conv_blks = []
    for (num_convs, out_channels) in arch:
      conv_blks.append(vgg_block(num_convs, out_channels))
    self.net = nn.Sequential(
      *conv_blks, nn.Flatten(),
      nn.LazyLinear(4096), nn.ReLU(), nn.Dropout(0.5),
      nn.LazyLinear(4096), nn.ReLU(), nn.Dropout(0.5),
      nn.LazyLinear(num_classes))
    self.net.apply(d2l.init_cnn)

# MXNet
class VGG(d2l.Classifier):
  def __init__(self, arch, lr=0.1, num_classes=10):
    super().__init__()
    self.save_hyperparameters()
    self.net = nn.Sequential()
    for (num_convs, num_channels) in arch:
      self.net.add(vgg_block(num_convs, num_channels))
    self.net.add(nn.Dense(4096, activation='relu'), nn.Dropout(0.5),
           nn.Dense(4096, activation='relu'), nn.Dropout(0.5),
           nn.Dense(num_classes))
    self.net.initialize(init.Xavier())

# JAX
class VGG(d2l.Classifier):
  arch: list
  lr: float = 0.1
  num_classes: int = 10
  training: bool = True

  def setup(self):
    conv_blks = []
    for (num_convs, out_channels) in self.arch:
      conv_blks.append(vgg_block(num_convs, out_channels))

    self.net = nn.Sequential([
      *conv_blks,
      lambda x: x.reshape((x.shape[0], -1)), # flatten
      nn.Dense(4096), nn.relu,
      nn.Dropout(0.5, deterministic=not self.training),
      nn.Dense(4096), nn.relu,
      nn.Dropout(0.5, deterministic=not self.training),
      nn.Dense(self.num_classes)])

# TensorFlow
class VGG(d2l.Classifier):
  def __init__(self, arch, lr=0.1, num_classes=10):
    super().__init__()
    self.save_hyperparameters()
    self.net = tf.keras.models.Sequential()
    for (num_convs, num_channels) in arch:
      self.net.add(vgg_block(num_convs, num_channels))
    self.net.add(
      tf.keras.models.Sequential([
      tf.keras.layers.Flatten(),
      tf.keras.layers.Dense(4096, activation='relu'),
      tf.keras.layers.Dropout(0.5),
      tf.keras.layers.Dense(4096, activation='relu'),
      tf.keras.layers.Dropout(0.5),
      tf.keras.layers.Dense(num_classes)]))

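The original VGG network had five convolutional blocks, of which the first two have one convolutional layer each and the last three have two convolutional layers each. The first block has 64 output channels and each subsequent block doubles the number of output channels until that number reaches 512. Since this network uses eight convolutional layers and three fully connected layers, it is often called VGG-11.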

# PyTorch (input layout: batch, channels, height, width)
VGG(arch=((1, 64), (1, 128), (2, 256), (2, 512), (2, 512))).layer_summary(
  (1, 1, 224, 224))

Sequential output shape:   torch.Size([1, 64, 112, 112])
Sequential output shape:   torch.Size([1, 128, 56, 56])
Sequential output shape:   torch.Size([1, 256, 28, 28])
Sequential output shape:   torch.Size([1, 512, 14, 14])
Sequential output shape:   torch.Size([1, 512, 7, 7])
Flatten output shape:    torch.Size([1, 25088])
Linear output shape:     torch.Size([1, 4096])
ReLU output shape:  torch.Size([1, 4096])
Dropout output shape:    torch.Size([1, 4096])
Linear output shape:     torch.Size([1, 4096])
ReLU output shape:  torch.Size([1, 4096])
Dropout output shape:    torch.Size([1, 4096])
Linear output shape:     torch.Size([1, 10])

# MXNet (input layout: batch, channels, height, width)
VGG(arch=((1, 64), (1, 128), (2, 256), (2, 512), (2, 512))).layer_summary(
  (1, 1, 224, 224))

Sequential output shape:   (1, 64, 112, 112)
Sequential output shape:   (1, 128, 56, 56)
Sequential output shape:   (1, 256, 28, 28)
Sequential output shape:   (1, 512, 14, 14)
Sequential output shape:   (1, 512, 7, 7)
Dense output shape: (1, 4096)
Dropout output shape:    (1, 4096)
Dense output shape: (1, 4096)
Dropout output shape:    (1, 4096)
Dense output shape: (1, 10)

# JAX (input layout: batch, height, width, channels)
VGG(arch=((1, 64), (1, 128), (2, 256), (2, 512), (2, 512)),
  training=False).layer_summary((1, 224, 224, 1))

Sequential output shape:   (1, 112, 112, 64)
Sequential output shape:   (1, 56, 56, 128)
Sequential output shape:   (1, 28, 28, 256)
Sequential output shape:   (1, 14, 14, 512)
Sequential output shape:   (1, 7, 7, 512)
function output shape:    (1, 25088)
Dense output shape: (1, 4096)
custom_jvp output shape:   (1, 4096)
Dropout output shape:    (1, 4096)
Dense output shape: (1, 4096)
custom_jvp output shape:   (1, 4096)
Dropout output shape:    (1, 4096)
Dense output shape: (1, 10)

# TensorFlow (input layout: batch, height, width, channels)
VGG(arch=((1, 64), (1, 128), (2, 256), (2, 512), (2, 512))).layer_summary(
  (1, 224, 224, 1))

Sequential output shape:   (1, 112, 112, 64)
Sequential output shape:   (1, 56, 56, 128)
Sequential output shape:   (1, 28, 28, 256)
Sequential output shape:   (1, 14, 14, 512)
Sequential output shape:   (1, 7, 7, 512)
Sequential output shape:   (1, 10)

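As you can see, we halve the height and width at each block, finally reaching a height and width of 7 before flattening the representation for processing by the fully connected part of the network. Simonyan and Zisserman (2014) described several other variants of VGG. In fact, it has become the norm to propose families of networks with different speed-accuracy trade-offs when introducing a new architecture.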
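The shapes printed above can be reproduced from the arch tuples alone. The following sketch uses plain Python and a hypothetical helper `summarize_arch`; it assumes that every block halves the side length and that three fully connected layers follow the convolutional part, as in the VGG class above.

```python
def summarize_arch(arch, input_size=224, num_fc=3):
    """Derive the layer count and flattened feature size from an arch tuple."""
    size = input_size
    num_conv_layers = 0
    channels = None
    for num_convs, out_channels in arch:
        num_conv_layers += num_convs
        channels = out_channels
        size //= 2  # the max-pooling at the end of each block halves the size
    return {
        "weight_layers": num_conv_layers + num_fc,
        "final_size": size,
        "flattened": channels * size * size,
    }


vgg11 = ((1, 64), (1, 128), (2, 256), (2, 512), (2, 512))
print(summarize_arch(vgg11))
```

For the VGG-11 configuration this gives 11 weight layers, a final feature map of side 7, and a flattened size of 25088, matching the Flatten line of the PyTorch summary.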

8.2.3. Training

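Since VGG-11 is computationally more demanding than AlexNet, we construct a network with a smaller number of channels. This is more than sufficient for training on Fashion-MNIST. The model training process is similar to that of AlexNet in Section 8.1. Again, observe the close match between validation and training loss, which suggests only a small amount of overfitting.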

# PyTorch
model = VGG(arch=((1, 16), (1, 32), (2, 64), (2, 128), (2, 128)), lr=0.01)
trainer = d2l.Trainer(max_epochs=10, num_gpus=1)
data = d2l.FashionMNIST(batch_size=128, resize=(224, 224))
model.apply_init([next(iter(data.get_dataloader(True)))[0]], d2l.init_cnn)
trainer.fit(model, data)

# MXNet
model = VGG(arch=((1, 16), (1, 32), (2, 64), (2, 128), (2, 128)), lr=0.01)
trainer = d2l.Trainer(max_epochs=10, num_gpus=1)
data = d2l.FashionMNIST(batch_size=128, resize=(224, 224))
trainer.fit(model, data)

# JAX
model = VGG(arch=((1, 16), (1, 32), (2, 64), (2, 128), (2, 128)), lr=0.01)
trainer = d2l.Trainer(max_epochs=10, num_gpus=1)
data = d2l.FashionMNIST(batch_size=128, resize=(224, 224))
trainer.fit(model, data)

# TensorFlow
trainer = d2l.Trainer(max_epochs=10)
data = d2l.FashionMNIST(batch_size=128, resize=(224, 224))
with d2l.try_gpu():
  model = VGG(arch=((1, 16), (1, 32), (2, 64), (2, 128), (2, 128)), lr=0.01)
  trainer.fit(model, data)
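The narrower network trained above uses the VGG-11 layout with every channel count divided by four. The sketch below derives that configuration and estimates how much smaller the convolutional part becomes. Both helpers (`scale_arch`, `conv_weight_count`) are hypothetical names introduced for this illustration; the count ignores biases and assumes a single-channel input, as in Fashion-MNIST.

```python
def scale_arch(arch, factor):
    """Divide the output channels of every block by an integer factor."""
    return tuple((num_convs, max(1, out_channels // factor))
                 for num_convs, out_channels in arch)


def conv_weight_count(arch, in_channels=1, kernel_size=3):
    """Total number of convolution weights (no biases) for an arch tuple."""
    total = 0
    for num_convs, out_channels in arch:
        for _ in range(num_convs):
            total += kernel_size * kernel_size * in_channels * out_channels
            in_channels = out_channels  # the next layer sees these channels
    return total


vgg11 = ((1, 64), (1, 128), (2, 256), (2, 512), (2, 512))
small = scale_arch(vgg11, 4)
print(small)
print(conv_weight_count(vgg11), conv_weight_count(small))
```

Because both the input and output channels of most layers shrink by a factor of four, the convolutional weights shrink by roughly a factor of sixteen, which is what makes training on Fashion-MNIST fast.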

8.2.4. Summary

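One might argue that VGG is the first truly modern convolutional neural network. While AlexNet introduced many of the components that make deep learning effective at scale, it is arguably VGG that introduced key properties such as blocks of multiple convolutions and a preference for deep and narrow networks. It is also the first network that is actually an entire family of similarly parametrized models, giving the practitioner ample trade-off between complexity and speed. This is also where modern deep learning frameworks shine. It is no longer necessary to generate XML configuration files to specify a network; instead, the network is assembled through simple Python code.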

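More recently, ParNet (Goyal et al., 2021) demonstrated that competitive performance can be achieved with a much shallower architecture through a large number of parallel computations. This is an exciting development, and there is hope that it will influence architecture design in the future. For the remainder of this chapter, though, we will follow the path of scientific progress over the past decade.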

8.2.5. Exercises

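1. Compared with AlexNet, VGG is much slower in terms of computation, and it also needs more GPU memory.

   1. Compare the number of parameters needed for AlexNet and VGG.

   2. Compare the number of floating point operations used in the convolutional layers and in the fully connected layers.

   3. How could you reduce the computational cost created by the fully connected layers?

2. When displaying the dimensions associated with the various layers of the network, we only see the information associated with eight blocks (plus some auxiliary transforms), even though the network has 11 layers. Where did the remaining three layers go?

3. Use Table 1 in the VGG paper (Simonyan and Zisserman, 2014) to construct other common models, such as VGG-16 or VGG-19.

4. Upsampling the resolution in Fashion-MNIST eight-fold from $28 \times 28$ to $224 \times 224$ dimensions is very wasteful. Try modifying the network architecture and resolution conversion, e.g., to 56 or to 84 dimensions for its input instead. Can you do so without reducing the accuracy of the network? Consult the VGG paper (Simonyan and Zisserman, 2014) for ideas on adding more nonlinearities prior to downsampling.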

PyTorch Tutorial 8.2: Networks Using Blocks (VGG)
