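Recall the convolution example in Fig. 7.2.1. The input had a height and width of 3 and the convolution kernel had a height and width of 2, yielding an output of shape $2 \times 2$. Assuming that the input shape is $n_h \times n_w$ and the convolution kernel shape is $k_h \times k_w$, the output shape will be $(n_h - k_h + 1) \times (n_w - k_w + 1)$: we can only shift the convolution kernel until it runs out of pixels to apply the convolution to.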
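The shape rule above can be checked with a few lines of plain Python. This is a minimal sketch: the function names `valid_output_shape` and `count_window_positions` are illustrative and do not come from any library.

```python
def valid_output_shape(n_h, n_w, k_h, k_w):
    """Output shape of a cross-correlation with no padding and stride 1."""
    return (n_h - k_h + 1, n_w - k_w + 1)


def count_window_positions(n_h, n_w, k_h, k_w):
    """Count the kernel placements that fit entirely inside the input."""
    rows = sum(1 for i in range(n_h) if i + k_h <= n_h)
    cols = sum(1 for j in range(n_w) if j + k_w <= n_w)
    return (rows, cols)


# A 3x3 input with a 2x2 kernel, as in the recalled example
print(valid_output_shape(3, 3, 2, 2))      # (2, 2)
print(count_window_positions(3, 3, 2, 2))  # (2, 2)
```

Counting placements directly and applying the closed-form expression give the same answer, which is all the formula states.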

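In the following we will explore a number of techniques, including padding and strided convolutions, that offer more control over the size of the output. As motivation, note that since kernels generally have width and height greater than $1$, after applying many successive convolutions we tend to wind up with outputs that are considerably smaller than our input. If we start with a $240 \times 240$ pixel image, ten layers of $5 \times 5$ convolutions reduce the image to $200 \times 200$ pixels, slicing off about $30\%$ of the image and with it obliterating any interesting information on the boundaries of the original image. Padding is the most popular tool for handling this issue. In other cases, we may want to reduce the dimensionality drastically, e.g., if we find the original input resolution to be unwieldy. Strided convolutions are a popular technique that can help in these instances.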
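The shrinkage figures quoted above follow from applying the unpadded shape rule once per layer. The sketch below reproduces the arithmetic; the helper name `shrink` is hypothetical.

```python
def shrink(n, k, layers):
    """Side length after `layers` unpadded, stride-1 convolutions of size k."""
    for _ in range(layers):
        n = n - k + 1
    return n


side = shrink(240, 5, 10)
lost = 1 - side ** 2 / 240 ** 2  # fraction of the original pixel area removed
print(side)            # 200
print(f"{lost:.1%}")   # 30.6%
```

Each $5 \times 5$ layer removes 4 pixels per dimension, so ten layers remove 40.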

# PyTorch (the four framework variants in this section are alternatives, not
# one script: each rebinds the name nn)
import torch
from torch import nn

# MXNet
from mxnet import np, npx
from mxnet.gluon import nn

npx.set_np()

# JAX
import jax
from flax import linen as nn
from jax import numpy as jnp
from d2l import jax as d2l

# TensorFlow
import tensorflow as tf

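7.3.1. Padding

As described above, one tricky issue when applying convolutional layers is that we tend to lose pixels on the perimeter of our image. Consider Fig. 7.3.1, which depicts the pixel utilization as a function of the convolution kernel size and the position within the image. The pixels in the corners are hardly used at all.

Fig. 7.3.1 Pixel utilization for convolutions of size $1 \times 1$, $2 \times 2$, and $3 \times 3$, respectively.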
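To make the notion of pixel utilization concrete, the following sketch counts how many kernel windows cover each pixel of a small image without padding. The function name `pixel_utilization` and the $5 \times 5$ image size are assumptions chosen for illustration, not values taken from the figure.

```python
def pixel_utilization(n, k):
    """For an n x n image, count how many k x k windows (no padding,
    stride 1) cover each pixel."""
    counts = [[0] * n for _ in range(n)]
    for i in range(n - k + 1):          # top row of the window
        for j in range(n - k + 1):      # left column of the window
            for a in range(k):
                for b in range(k):
                    counts[i + a][j + b] += 1
    return counts


for row in pixel_utilization(5, 3):
    print(row)
# [1, 2, 3, 2, 1]
# [2, 4, 6, 4, 2]
# [3, 6, 9, 6, 3]
# [2, 4, 6, 4, 2]
# [1, 2, 3, 2, 1]
```

With a $3 \times 3$ kernel a corner pixel contributes to a single output, while the center pixel contributes to nine. With a $1 \times 1$ kernel every pixel is used exactly once.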

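Since we typically use small kernels, for any given convolution we might only lose a few pixels, but this can add up as we apply many successive convolutional layers. One straightforward solution to this problem is to add extra pixels of filler around the boundary of our input image, thus increasing the effective size of the image. Typically, we set the values of the extra pixels to zero. In Fig. 7.3.2, we pad a $3 \times 3$ input, increasing its size to $5 \times 5$. The corresponding output then increases to a $4 \times 4$ matrix. The shaded portions are the first output element as well as the input and kernel tensor elements used for the output computation: $0\times0+0\times1+0\times2+0\times3=0$.

Fig. 7.3.2 Two-dimensional cross-correlation with padding.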
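The padded cross-correlation can be worked through in plain Python. The sketch below assumes an input holding the values 0 to 8 and a kernel holding the values 0 to 3, which is consistent with the arithmetic quoted above but is not stated explicitly in the text. The helper names `zero_pad` and `corr2d` are illustrative.

```python
def zero_pad(X, pad):
    """Surround a 2D list with `pad` rows and columns of zeros on every side."""
    width = len(X[0]) + 2 * pad
    top = [[0] * width for _ in range(pad)]
    body = [[0] * pad + list(row) + [0] * pad for row in X]
    bottom = [[0] * width for _ in range(pad)]
    return top + body + bottom


def corr2d(X, K):
    """Two-dimensional cross-correlation with stride 1 and no padding."""
    k_h, k_w = len(K), len(K[0])
    out_h = len(X) - k_h + 1
    out_w = len(X[0]) - k_w + 1
    return [
        [
            sum(X[i + a][j + b] * K[a][b]
                for a in range(k_h) for b in range(k_w))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


X = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]  # assumed input values
K = [[0, 1], [2, 3]]                   # assumed kernel values
Y = corr2d(zero_pad(X, 1), K)
for row in Y:
    print(row)
# [0, 3, 8, 4]
# [9, 19, 25, 10]
# [21, 37, 43, 16]
# [6, 7, 8, 0]
```

The top-left output is 0 because its window overlaps only one input pixel, which is multiplied by the kernel entry 3 and itself holds 0.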

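In general, if we add a total of $p_h$ rows of padding (roughly half on top and half on bottom) and a total of $p_w$ columns of padding (roughly half on the left and half on the right), the output shape will be

$$(n_h - k_h + p_h + 1) \times (n_w - k_w + p_w + 1). \tag{7.3.1}$$

This means that the height and width of the output will increase by $p_h$ and $p_w$, respectively.

In many cases, we will want to set $p_h = k_h - 1$ and $p_w = k_w - 1$ to give the input and output the same height and width. This makes it easier to predict the output shape of each layer when constructing the network. Assuming that $k_h$ is odd here, we will pad $p_h/2$ rows on both sides of the height. If $k_h$ is even, one possibility is to pad $\lceil p_h/2 \rceil$ rows on the top of the input and $\lfloor p_h/2 \rfloor$ rows on the bottom. We will pad both sides of the width in the same way.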
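The padding rule for one dimension can be sketched as follows. The helper names `same_padding` and `padded_output_size` are hypothetical, and the input size of 8 is just an example.

```python
import math


def same_padding(k):
    """Split the total padding p = k - 1 into (before, after) amounts,
    putting the extra row or column first when p is odd."""
    p = k - 1
    return (math.ceil(p / 2), math.floor(p / 2))


def padded_output_size(n, k, p):
    """Output size along one dimension for total padding p and stride 1."""
    return n - k + p + 1


for k in (2, 3, 4, 5):
    before, after = same_padding(k)
    # The output size matches the input size for every kernel size
    assert padded_output_size(8, k, before + after) == 8
    print(k, (before, after))
# 2 (1, 0)
# 3 (1, 1)
# 4 (2, 1)
# 5 (2, 2)
```

Odd kernel sizes give a symmetric split, while even ones leave one side with an extra row or column.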

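CNNs commonly use convolution kernels with odd height and width values, such as 1, 3, 5, or 7. Choosing odd kernel sizes has the benefit that we can preserve the dimensionality while padding with the same number of rows on top and bottom, and the same number of columns on left and right.

Moreover, this practice of using odd kernels and padding to precisely preserve dimensionality offers a clerical benefit. For any two-dimensional tensor X, when the kernel's size is odd and the number of padding rows and columns on all sides is the same, thereby producing an output with the same height and width as the input, we know that the output Y[i, j] is calculated by cross-correlation of the input and convolution kernel with the window centered on X[i, j].

In the following example, we create a two-dimensional convolutional layer with a height and width of 3 and apply 1 pixel of padding on all sides. Given an input with a height and width of 8, we find that the height and width of the output is also 8.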

# PyTorch
# We define a helper function to calculate convolutions. It initializes the
# convolutional layer weights and performs corresponding dimensionality
# elevations and reductions on the input and output
def comp_conv2d(conv2d, X):
  # (1, 1) indicates that batch size and the number of channels are both 1
  X = X.reshape((1, 1) + X.shape)
  Y = conv2d(X)
  # Strip the first two dimensions: examples and channels
  return Y.reshape(Y.shape[2:])

# 1 row and column is padded on either side, so a total of 2 rows or columns
# are added
conv2d = nn.LazyConv2d(1, kernel_size=3, padding=1)
X = torch.rand(size=(8, 8))
comp_conv2d(conv2d, X).shape

torch.Size([8, 8])

# MXNet
# We define a helper function to calculate convolutions. It initializes
# the convolutional layer weights and performs corresponding dimensionality
# elevations and reductions on the input and output
def comp_conv2d(conv2d, X):
  conv2d.initialize()
  # (1, 1) indicates that batch size and the number of channels are both 1
  X = X.reshape((1, 1) + X.shape)
  Y = conv2d(X)
  # Strip the first two dimensions: examples and channels
  return Y.reshape(Y.shape[2:])

# 1 row and column is padded on either side, so a total of 2 rows or columns are added
conv2d = nn.Conv2D(1, kernel_size=3, padding=1)
X = np.random.uniform(size=(8, 8))
comp_conv2d(conv2d, X).shape

(8, 8)

# JAX
# We define a helper function to calculate convolutions. It initializes
# the convolutional layer weights and performs corresponding dimensionality
# elevations and reductions on the input and output
def comp_conv2d(conv2d, X):
  # Add a leading batch dimension and a trailing channel dimension, both of
  # size 1 (Flax expects channels-last input)
  key = jax.random.PRNGKey(d2l.get_seed())
  X = X.reshape((1,) + X.shape + (1,))
  Y, _ = conv2d.init_with_output(key, X)
  # Strip the batch and channel dimensions
  return Y.reshape(Y.shape[1:3])

# With 'SAME' padding and a 3x3 kernel, 1 row and column is padded on either
# side, so a total of 2 rows or columns are added
conv2d = nn.Conv(1, kernel_size=(3, 3), padding='SAME')
X = jax.random.uniform(jax.random.PRNGKey(d2l.get_seed()), shape=(8, 8))
comp_conv2d(conv2d, X).shape

(8, 8)

# TensorFlow
# We define a helper function to calculate convolutions. It initializes
# the convolutional layer weights and performs corresponding dimensionality
# elevations and reductions on the input and output
def comp_conv2d(conv2d, X):
  # Add a leading batch dimension and a trailing channel dimension, both of
  # size 1 (Keras expects channels-last input by default)
  X = tf.reshape(X, (1, ) + X.shape + (1, ))
  Y = conv2d(X)
  # Strip the batch and channel dimensions
  return tf.reshape(Y, Y.shape[1:3])

# With 'same' padding and a 3x3 kernel, 1 row and column is padded on either
# side, so a total of 2 rows or columns are added
conv2d = tf.keras.layers.Conv2D(1, kernel_size=3, padding='same')
X = tf.random.uniform(shape=(8, 8))
comp_conv2d(conv2d, X).shape

TensorShape([8, 8])

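When the height and width of the convolution kernel are different, we can make the output and input have the same height and width by setting different padding numbers for height and width.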

# PyTorch
# We use a convolution kernel with height 5 and width 3. The padding on either
# side of the height and width is 2 and 1, respectively
conv2d = nn.LazyConv2d(1, kernel_size=(5, 3), padding=(2, 1))
comp_conv2d(conv2d, X).shape

torch.Size([8, 8])

# MXNet
# We use a convolution kernel with height 5 and width 3. The padding on
# either side of the height and width is 2 and 1, respectively
conv2d = nn.Conv2D(1, kernel_size=(5, 3), padding=(2, 1))
comp_conv2d(conv2d, X).shape

(8, 8)

# JAX
# We use a convolution kernel with height 5 and width 3. The padding on
# either side of the height and width is 2 and 1, respectively
conv2d = nn.Conv(1, kernel_size=(5, 3), padding=(2, 1))
comp_conv2d(conv2d, X).shape

(8, 8)

# TensorFlow
# We use a convolution kernel with height 5 and width 3. With 'same' padding,
# 2 rows and 1 column are padded on either side of the height and width
conv2d = tf.keras.layers.Conv2D(1, kernel_size=(5, 3), padding='same')
comp_conv2d(conv2d, X).shape

TensorShape([8, 8])

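7.3.2. Stride

When computing the cross-correlation, we start with the convolution window at the upper-left corner of the input tensor, and then slide it over all locations both down and to the right. In the previous examples, we defaulted to sliding one element at a time. However, sometimes, either for computational efficiency or because we wish to downsample, we move our window more than one element at a time, skipping the intermediate locations. This is particularly useful if the convolution kernel is large, since it captures a large area of the underlying image.

We refer to the number of rows and columns traversed per slide as the stride. So far, we have used strides of 1, both for height and width. Sometimes, we may want to use a larger stride. Fig. 7.3.3 shows a two-dimensional cross-correlation operation with a stride of 3 vertically and 2 horizontally. The shaded portions are the output elements as well as the input and kernel tensor elements used for the output computation: $0\times0+0\times1+1\times2+2\times3=8$, $0\times0+6\times1+0\times2+0\times3=6$. We can see that when the second element of the first column is generated, the convolution window slides down three rows. The convolution window slides two columns to the right when the second element of the first row is generated. When the convolution window continues to slide two columns to the right on the input, there is no output because the input elements cannot fill the window (unless we add another column of padding).

Fig. 7.3.3 Cross-correlation with strides of 3 and 2 for height and width, respectively.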
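The strided example can be reproduced with a small pure-Python routine. As a sketch, it assumes the same input (values 0 to 8), kernel (values 0 to 3), and one pixel of zero padding per side as in the padding figure. These values match the two sums quoted above but are otherwise an assumption, and the function name `corr2d_strided` is illustrative.

```python
def corr2d_strided(X, K, stride=(1, 1), pad=(0, 0)):
    """Cross-correlate 2D lists X and K with per-side zero padding `pad`
    and strides `stride`, both given as (height, width)."""
    s_h, s_w = stride
    p_h, p_w = pad
    k_h, k_w = len(K), len(K[0])
    width = len(X[0]) + 2 * p_w
    # Zero-pad: p_h rows above and below, p_w columns left and right
    padded = (
        [[0] * width for _ in range(p_h)]
        + [[0] * p_w + list(row) + [0] * p_w for row in X]
        + [[0] * width for _ in range(p_h)]
    )
    # Number of window positions that fit when stepping by the stride
    out_h = (len(padded) - k_h) // s_h + 1
    out_w = (width - k_w) // s_w + 1
    return [
        [
            sum(padded[i * s_h + a][j * s_w + b] * K[a][b]
                for a in range(k_h) for b in range(k_w))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


X = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]  # assumed input values
K = [[0, 1], [2, 3]]                   # assumed kernel values
Y = corr2d_strided(X, K, stride=(3, 2), pad=(1, 1))
print(Y)  # [[0, 8], [6, 8]]
```

The entries 8 (first row, second column) and 6 (second row, first column) are the two sums written out in the text. A third window position along the width would need columns 4 and 5 of the padded input, and column 5 does not exist.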

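In general, when the stride for the height is $s_h$ and the stride for the width is $s_w$, the output shape is

$$\lfloor (n_h - k_h + p_h + s_h)/s_h \rfloor \times \lfloor (n_w - k_w + p_w + s_w)/s_w \rfloor. \tag{7.3.2}$$

If we set $p_h = k_h - 1$ and $p_w = k_w - 1$, then the output shape can be simplified to $\lfloor (n_h + s_h - 1)/s_h \rfloor \times \lfloor (n_w + s_w - 1)/s_w \rfloor$. Going a step further, if the input height and width are divisible by the strides on the height and width, then the output shape will be $(n_h/s_h) \times (n_w/s_w)$.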
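The output-shape formula with stride translates directly into integer arithmetic. The sketch below uses a hypothetical helper `conv_output_size`. Note that its `p` argument is the total padding along a dimension, as in the formulas, whereas the `padding` arguments in this section's PyTorch and MXNet examples give the amount per side, so `padding=1` there corresponds to `p=2` here.

```python
def conv_output_size(n, k, p=0, s=1):
    """Output size along one dimension.

    n: input size, k: kernel size, p: TOTAL padding (both sides), s: stride.
    """
    return (n - k + p + s) // s


# 8x8 input, 3x3 kernel, 1 pixel of padding per side (p = 2), stride 2
print(conv_output_size(8, 3, p=2, s=2))  # 4

# 8x8 input, kernel (3, 5), per-side padding (0, 1), stride (3, 4)
print(conv_output_size(8, 3, p=0, s=3),
      conv_output_size(8, 5, p=2, s=4))  # 2 2

# Same kernel and stride along the width, but with no padding
print(conv_output_size(8, 5, p=0, s=4))  # 1
```

With $p = k - 1$ and an input size divisible by the stride, the result reduces to $n/s$, as in the first call.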

Below, we set the strides on both the height and width to 2, thus halving the input height and width.

# PyTorch
conv2d = nn.LazyConv2d(1, kernel_size=3, padding=1, stride=2)
comp_conv2d(conv2d, X).shape

torch.Size([4, 4])

# MXNet
conv2d = nn.Conv2D(1, kernel_size=3, padding=1, strides=2)
comp_conv2d(conv2d, X).shape

(4, 4)

# JAX
conv2d = nn.Conv(1, kernel_size=(3, 3), padding=1, strides=2)
comp_conv2d(conv2d, X).shape

(4, 4)

# TensorFlow
conv2d = tf.keras.layers.Conv2D(1, kernel_size=3, padding='same', strides=2)
comp_conv2d(conv2d, X).shape

TensorShape([4, 4])

Let's look at a slightly more complicated example.

# PyTorch
conv2d = nn.LazyConv2d(1, kernel_size=(3, 5), padding=(0, 1), stride=(3, 4))
comp_conv2d(conv2d, X).shape

torch.Size([2, 2])

# MXNet
conv2d = nn.Conv2D(1, kernel_size=(3, 5), padding=(0, 1), strides=(3, 4))
comp_conv2d(conv2d, X).shape

(2, 2)

# JAX
conv2d = nn.Conv(1, kernel_size=(3, 5), padding=(0, 1), strides=(3, 4))
comp_conv2d(conv2d, X).shape

(2, 2)

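# TensorFlow
# Keras Conv2D accepts only 'valid' or 'same' padding, so the (0, 1) padding
# used in the other frameworks is not applied here and the output width is 1
conv2d = tf.keras.layers.Conv2D(1, kernel_size=(3, 5), padding='valid',
                strides=(3, 4))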
comp_conv2d(conv2d, X).shape

TensorShape([2, 1])

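7.3.3. Summary and Discussion

Padding can increase the height and width of the output. This is often used to give the output the same height and width as the input to avoid undesirable shrinkage of the output. Moreover, it ensures that all pixels are used equally frequently. Typically we pick symmetric padding on both sides of the input height and width. In this case we refer to $(p_h, p_w)$ padding. Most commonly we set $p_h = p_w$, in which case we simply state that we choose padding $p$.

A similar convention applies to strides. When the height stride $s_h$ and the width stride $s_w$ match, we simply talk about stride $s$. The stride can reduce the resolution of the output, for example reducing the height and width of the output to only $1/n$ of the height and width of the input for $n > 1$. By default, the padding is 0 and the stride is 1.

So far all padding that we discussed simply extended images with zeros. This has a significant computational benefit since it is trivial to accomplish. Moreover, operators can be engineered to take advantage of this padding implicitly without the need to allocate additional memory. At the same time, it allows CNNs to encode implicit position information within an image, simply by learning where the "whitespace" is. There are many alternatives to zero-padding. Alsallakh et al. (2020) provided an extensive overview of those (albeit without a clear case for when to use nonzero paddings unless artifacts occur).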

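7.3.4. Exercises

Given the final code example in this section with kernel size $(3, 5)$, padding $(0, 1)$, and stride $(3, 4)$, calculate the output shape to check if it is consistent with the experimental result.

For audio signals, what does a stride of 2 correspond to?

Implement mirror padding, i.e., padding where the border values are simply mirrored to extend tensors.

What are the computational benefits of a stride larger than 1?

What might be the statistical benefits of a stride larger than 1?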

How would you implement a stride of $\frac{1}{2}$? What does it correspond to? When would this be useful?
