
PyTorch Tutorial 14.9: Semantic Segmentation and the Dataset

2023-06-05

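When discussing object detection tasks in Sections 14.3 through 14.8, rectangular bounding boxes were used to label and predict objects in images. This section discusses the problem of semantic segmentation, focusing on how to divide an image into regions belonging to different semantic classes. Unlike object detection, semantic segmentation recognizes and understands image content at the pixel level: its labeling and prediction of semantic regions are per pixel. Fig. 14.9.1 shows the labels of the dog, cat, and background of an image in semantic segmentation. Compared with object detection, the pixel-level borders labeled in semantic segmentation are clearly more fine-grained.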

https://file.elecfans.com/web2/M00/A9/CD/poYBAGR9O9WAJnnkAAdSBrW48yA985.svg

Fig. 14.9.1 Labels of the dog, cat, and background of the image in semantic segmentation.
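To make the contrast with bounding boxes concrete, here is a minimal sketch in plain Python. The 4x6 label map, the class ids (0 = background, 1 = dog, 2 = cat), and the helper names are all made up for illustration; they are not part of the dataset or of d2l.

```python
# Toy label map (hypothetical): 0 = background, 1 = dog, 2 = cat.
mask = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 2, 0],
    [0, 1, 0, 0, 2, 2],
    [0, 0, 0, 0, 0, 0],
]


def bounding_box(mask, cls):
    """Return (row_min, col_min, row_max, col_max) enclosing all pixels of cls."""
    coords = [(r, c) for r, row in enumerate(mask)
              for c, value in enumerate(row) if value == cls]
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    return min(rows), min(cols), max(rows), max(cols)


def box_area(box):
    """Number of pixels covered by an inclusive (r0, c0, r1, c1) box."""
    r0, c0, r1, c1 = box
    return (r1 - r0 + 1) * (c1 - c0 + 1)


def pixel_count(mask, cls):
    """Number of pixels labeled with class cls."""
    return sum(value == cls for row in mask for value in row)


dog_box = bounding_box(mask, 1)
dog_box_area = box_area(dog_box)
dog_pixels = pixel_count(mask, 1)
print(dog_box, dog_box_area, dog_pixels)
```

In this toy map the dog's box covers 4 pixels while only 3 of them are labeled as dog, which is the extra precision that pixel-level labels provide.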

14.9.1. Image Segmentation and Instance Segmentation

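Two other important tasks in computer vision resemble semantic segmentation: image segmentation and instance segmentation. We briefly distinguish them from semantic segmentation as follows.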

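Image segmentation divides an image into several constituent regions. Methods for this type of problem usually exploit the correlation between pixels in the image. They do not need label information about image pixels during training, and they cannot guarantee that the segmented regions will have the semantics we hope for at prediction time. Taking the image in Fig. 14.9.1 as input, image segmentation may divide the dog into two regions: one covering the mouth and eyes, which are mainly black, and the other covering the rest of the body, which is mainly yellow.
Instance segmentation is also called simultaneous detection and segmentation. It studies how to recognize the pixel-level regions of each object instance in an image. Unlike semantic segmentation, instance segmentation needs to distinguish not only semantics but also different object instances. For example, if there are two dogs in the image, instance segmentation needs to determine which of the two dogs a pixel belongs to.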
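The difference between a semantic label map and an instance label map can be illustrated with a small sketch. It uses connected-component labeling as a stand-in for instance segmentation, which is a deliberate simplification: real instance segmentation models are learned, and this stand-in fails as soon as two objects of the same class touch. The toy map and the function name `label_instances` are hypothetical.

```python
from collections import deque

# Toy semantic map (hypothetical): 0 = background, 1 = dog.
# Both dogs share the same class id, so semantics alone cannot separate them.
semantic = [
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 1, 1],
    [0, 0, 0, 0, 1, 1],
]


def label_instances(semantic, cls):
    """Give each 4-connected blob of class cls its own id (1, 2, ...)."""
    height, width = len(semantic), len(semantic[0])
    instances = [[0] * width for _ in range(height)]
    next_id = 0
    for r in range(height):
        for c in range(width):
            if semantic[r][c] != cls or instances[r][c]:
                continue
            # Found an unvisited pixel of the class: flood-fill its blob.
            next_id += 1
            instances[r][c] = next_id
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and semantic[ny][nx] == cls
                            and not instances[ny][nx]):
                        instances[ny][nx] = next_id
                        queue.append((ny, nx))
    return instances, next_id


instances, count = label_instances(semantic, 1)
for row in instances:
    print(row)
```

Every dog pixel has the value 1 in `semantic`, whereas `instances` assigns 1 to the pixels of the first dog and 2 to those of the second.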

14.9.2. The Pascal VOC2012 Semantic Segmentation Dataset

One of the most important semantic segmentation datasets is Pascal VOC2012. Below, we take a look at this dataset.

%matplotlib inline
import os
import torch
import torchvision
from d2l import torch as d2l

							 

# MXNet version of the imports
%matplotlib inline
import os
from mxnet import gluon, image, np, npx
from d2l import mxnet as d2l

npx.set_np()

							 

The tar file of the dataset is about 2 GB, so downloading it may take a while. The extracted dataset is located at ../data/VOCdevkit/VOC2012.

#@save
d2l.DATA_HUB['voc2012'] = (d2l.DATA_URL + 'VOCtrainval_11-May-2012.tar',
              '4e443f8a2eca6b1dac8a6c57641b67dd40621a49')

voc_dir = d2l.download_extract('voc2012', 'VOCdevkit/VOC2012')

Downloading ../data/VOCtrainval_11-May-2012.tar from http://d2l-data.s3-accelerate.amazonaws.com/VOCtrainval_11-May-2012.tar...
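The second element of the `DATA_HUB` tuple is a 40-character hexadecimal string, which is the length of a SHA-1 digest, so it presumably serves to check the integrity of the downloaded archive. The following is a minimal sketch of such a check using only the standard library. The helper names `sha1_of_file` and `is_intact` are hypothetical, and the demo hashes a small temporary file rather than the 2 GB archive.

```python
import hashlib
import os
import tempfile


def sha1_of_file(path, chunk_size=1 << 20):
    """Compute the SHA-1 hex digest of a file, reading it in 1 MiB chunks."""
    sha1 = hashlib.sha1()
    with open(path, 'rb') as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            sha1.update(chunk)
    return sha1.hexdigest()


def is_intact(path, expected_sha1):
    """Return True if the file's SHA-1 digest matches the expected one."""
    return sha1_of_file(path) == expected_sha1


# Demo on a small temporary file (a stand-in for the real archive).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, 'demo.bin')
    with open(demo_path, 'wb') as f:
        f.write(b'hello')
    demo_ok = is_intact(demo_path, hashlib.sha1(b'hello').hexdigest())
    demo_bad = is_intact(demo_path, '0' * 40)
print(demo_ok, demo_bad)
```

Reading in chunks keeps memory use flat regardless of file size, which matters for an archive of this size.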

						


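After entering the path ../data/VOCdevkit/VOC2012, we can see the different components of the dataset. The ImageSets/Segmentation path contains text files that specify the training and test samples, while the JPEGImages and SegmentationClass paths store the input image and label for each example, respectively. The label here is also in image format, with the same size as the input image it labels. Besides, pixels with the same color in any label image belong to the same semantic class. The following defines the read_voc_images function, which reads all input images and labels into memory.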

#@save
def read_voc_images(voc_dir, is_train=True):
  """Read all VOC feature and label images."""
  txt_fname = os.path.join(voc_dir, 'ImageSets', 'Segmentation',
               'train.txt' if is_train else 'val.txt')
  mode = torchvision.io.image.ImageReadMode.RGB
  with open(txt_fname, 'r') as f:
    images = f.read().split()
  features, labels = [], []
  for i, fname in enumerate(images):
    features.append(torchvision.io.read_image(os.path.join(
      voc_dir, 'JPEGImages', f'{fname}.jpg')))
    labels.append(torchvision.io.read_image(os.path.join(
      voc_dir, 'SegmentationClass', f'{fname}.png'), mode))
  return features, labels

train_features, train_labels = read_voc_images(voc_dir, True)

							 

# MXNet version of read_voc_images
#@save
def read_voc_images(voc_dir, is_train=True):
  """Read all VOC feature and label images."""
  txt_fname = os.path.join(voc_dir, 'ImageSets', 'Segmentation',
               'train.txt' if is_train else 'val.txt')
  with open(txt_fname, 'r') as f:
    images = f.read().split()
  features, labels = [], []
  for i, fname in enumerate(images):
    features.append(image.imread(os.path.join(
      voc_dir, 'JPEGImages', f'{fname}.jpg')))
    labels.append(image.imread(os.path.join(
      voc_dir, 'SegmentationClass', f'{fname}.png')))
  return features, labels

train_features, train_labels = read_voc_images(voc_dir, True)

							 

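We draw the first five input images and their labels. In the label images, white and black represent borders and background, respectively, while the other colors correspond to different classes.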

n = 5
imgs = train_features[:n] + train_labels[:n]
imgs = [img.permute(1,2,0) for img in imgs]
d2l.show_images(imgs, 2, n);

							 

https://file.elecfans.com/web2/M00/A9/00/poYBAGR4YpiAUiS-AAFQfESlL94544.png

# MXNet version (images are already in height x width x channel layout)
n = 5
imgs = train_features[:n] + train_labels[:n]
d2l.show_images(imgs, 2, n);

							 

Next, we enumerate the RGB color values and class names for all the labels in this dataset.

#@save
VOC_COLORMAP = [[0, 0, 0], [128, 0, 0], [0, 128, 0], [128, 128, 0],
        [0, 0, 128], [128, 0, 128], [0, 128, 128], [128, 128, 128],
        [64, 0, 0], [192, 0, 0], [64, 128, 0], [192, 128, 0],
        [64, 0, 128], [192, 0, 128], [64, 128, 128], [192, 128, 128],
        [0, 64, 0], [128, 64, 0], [0, 192, 0], [128, 192, 0],
        [0, 64, 128]]

#@save
VOC_CLASSES = ['background', 'aeroplane', 'bicycle', 'bird', 'boat',
        'bottle', 'bus', 'car', 'cat', 'chair', 'cow',
        'diningtable', 'dog', 'horse', 'motorbike', 'person',
        'potted plant', 'sheep', 'sofa', 'train', 'tv/monitor']
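The 21 colors above are not arbitrary. They coincide with the first entries of the bit-interleaving palette commonly used for Pascal VOC label images, in which the bits of the class index are spread across the R, G, and B channels from the most significant bit down. The sketch below regenerates them; the function name `voc_palette` is hypothetical and is not part of d2l.

```python
def voc_palette(n=21):
    """Generate n RGB colors by interleaving the bits of each class index."""
    palette = []
    for index in range(n):
        r = g = b = 0
        c = index
        # Consume the index three bits at a time; each group sets one bit
        # of R, G, and B, starting from the most significant bit (shift 7).
        for shift in range(7, -1, -1):
            r |= (c & 1) << shift
            g |= ((c >> 1) & 1) << shift
            b |= ((c >> 2) & 1) << shift
            c >>= 3
        palette.append([r, g, b])
    return palette


palette = voc_palette()
for index, color in enumerate(palette):
    print(index, color)
```

Under the same scheme, index 255 yields [224, 224, 192], an off-white color that plausibly corresponds to the border pixels in the label images.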

							 


							 

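With the two constants defined above, we can conveniently find the class index for each pixel in a label. We define the voc_colormap2label function to build the mapping from the above RGB color values to class indices, and the voc_label_indices function to map any RGB values to their class indices in this Pascal VOC2012 dataset.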
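The underlying idea can be sketched in plain Python, independently of the PyTorch implementation: each RGB triple is packed into a single integer key, (R * 256 + G) * 256 + B, which is then looked up to obtain the class index. This sketch makes several assumptions: it uses a dictionary instead of a table with 256 ** 3 entries, the helper names `build_color_to_index` and `label_to_indices` are hypothetical, and colors that are not in the colormap (such as the border color) fall back to index 0, as they would with a zero-initialized lookup table.

```python
VOC_COLORMAP = [[0, 0, 0], [128, 0, 0], [0, 128, 0], [128, 128, 0],
                [0, 0, 128], [128, 0, 128], [0, 128, 128], [128, 128, 128],
                [64, 0, 0], [192, 0, 0], [64, 128, 0], [192, 128, 0],
                [64, 0, 128], [192, 0, 128], [64, 128, 128], [192, 128, 128],
                [0, 64, 0], [128, 64, 0], [0, 192, 0], [128, 192, 0],
                [0, 64, 128]]


def rgb_key(r, g, b):
    """Pack an RGB triple into one integer in [0, 256 ** 3)."""
    return (r * 256 + g) * 256 + b


def build_color_to_index(colormap):
    """Map each packed RGB color to its position in the colormap."""
    return {rgb_key(r, g, b): i for i, (r, g, b) in enumerate(colormap)}


def label_to_indices(label_rgb, color_to_index, unknown=0):
    """Convert a height x width grid of RGB triples into class indices."""
    return [[color_to_index.get(rgb_key(r, g, b), unknown)
             for (r, g, b) in row]
            for row in label_rgb]


color_to_index = build_color_to_index(VOC_COLORMAP)
# Toy 2x2 label image: background, aeroplane, tv/monitor, and a border pixel.
toy_label = [[(0, 0, 0), (128, 0, 0)],
             [(0, 64, 128), (224, 224, 192)]]
toy_indices = label_to_indices(toy_label, color_to_index)
print(toy_indices)
```

A dictionary keeps the sketch light on memory; a dense tensor indexed by the packed key trades memory for vectorized lookups over whole images.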

#@save
def voc_colormap2label():
  """Build the mapping from RGB to class indices for VOC labels."""
  colormap2label = torch.zeros(256 ** 3, dtype=torch.long)
  for i