How to build your own dataset with PyTorch

01-30

Building your own dataset with PyTorch lets you handle the following four steps concisely:

store your own data
transform
batch (group samples into batches)
shuffle
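
The four steps above can be sketched with a toy example. This is only a minimal illustration, not the tutorial's dataset: `SquaresDataset` and `halve` are hypothetical names invented here, and the data is a small in-memory tensor rather than images on disk.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class SquaresDataset(Dataset):
    """Hypothetical toy dataset: sample i is the pair (i, i * i)."""

    def __init__(self, n, transform=None):
        # Step 1: store the data (here, just a tensor held in memory).
        self.x = torch.arange(n, dtype=torch.float32)
        self.transform = transform

    def __len__(self):
        return len(self.x)

    def __getitem__(self, idx):
        sample = {"x": self.x[idx], "y": self.x[idx] ** 2}
        # Step 2: apply the optional transform to each sample.
        if self.transform is not None:
            sample = self.transform(sample)
        return sample


def halve(sample):
    """Hypothetical transform: scale the input by 0.5, leave the target alone."""
    return {"x": sample["x"] * 0.5, "y": sample["y"]}


dataset = SquaresDataset(10, transform=halve)

# Steps 3 and 4: DataLoader groups samples into batches and shuffles them.
loader = DataLoader(dataset, batch_size=4, shuffle=True, num_workers=0)

batch_sizes = [batch["x"].shape[0] for batch in loader]
print(batch_sizes)  # 10 samples in batches of 4 -> [4, 4, 2]
```

The dataset only knows how to return one sample; batching and shuffling are left entirely to `DataLoader`, which is the same division of labour the tutorial code below uses.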

Video notes (concise version) (full, longer version: part1, part2, part3, part4)

# -*- coding: utf-8 -*-
"""
Data Loading and Processing Tutorial
====================================
**Author**: `Sasank Chilamkurthy <https://chsasank.github.io>`_

A lot of effort in solving any machine learning problem goes into
preparing the data. PyTorch provides many tools to make data loading
easy and, hopefully, to make your code more readable. In this tutorial,
we will see how to load and preprocess/augment data from a non-trivial
dataset.

To run this tutorial, please make sure the following packages are
installed:

-  ``scikit-image``: For image io and transforms
-  ``pandas``: For easier csv parsing

"""

from __future__ import print_function, division
import os
import torch
import pandas as pd
from skimage import io, transform
import numpy as np
import matplotlib.pyplot as plt
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms, utils

# Ignore warnings
import warnings
warnings.filterwarnings("ignore")

plt.ion()   # interactive mode

######################################################################
# The dataset we are going to deal with is that of facial pose.
# This means that a face is annotated like this:
#
# .. figure:: /_static/img/landmarked_face2.png
#    :width: 400
#
# Overall, 68 different landmark points are annotated for each face.
#
# .. note::
#     Download the dataset from `here <https://download.pytorch.org/tutorial/faces.zip>`_
#     so that the images are in a directory named faces/.
#     This dataset was actually
#     generated by applying excellent `dlib's pose
#     estimation <http://blog.dlib.net/2014/08/real-time-face-pose-estimation.html>`__
#     on a few images from imagenet tagged as face.
#
# Dataset comes with a csv file with annotations which looks like this:
#
# ::
#
#     image_name,part_0_x,part_0_y,part_1_x,part_1_y,part_2_x, ... ,part_67_x,part_67_y
#     0805personali01.jpg,27,83,27,98, ... 84,134
#     1084239450_e76e00b7e7.jpg,70,236,71,257, ... ,128,312
#
# Let's quickly read the CSV and get the annotations in an (N, 2) array where N
# is the number of landmarks.
#

landmarks_frame = pd.read_csv('/Users/Natsume/Desktop/data/faces/face_landmarks.csv')

i = 65
# ``.iloc`` and ``.to_numpy()`` replace ``.ix`` and ``.as_matrix()``,
# which were removed in pandas 1.0.
img_name = landmarks_frame.iloc[i, 0]
landmarks = landmarks_frame.iloc[i, 1:].to_numpy().astype(float)
landmarks = landmarks.reshape(-1, 2)

print('Image name: {}'.format(img_name))
print('Landmarks shape: {}'.format(landmarks.shape))
print('First 4 Landmarks: {}'.format(landmarks[:4]))


######################################################################
# Let's write a simple helper function to show an image and its landmarks
# and use it to show a sample.
#

def show_landmarks(image, landmarks):
    """Show image with landmarks"""
    plt.imshow(image)
    plt.scatter(landmarks[:, 0], landmarks[:, 1], s=10, marker='.', c='r')
    plt.pause(0.001)  # pause a bit so that plots are updated

plt.figure()
show_landmarks(io.imread(os.path.join('/Users/Natsume/Desktop/data/faces/', img_name)),
               landmarks)
plt.show()


######################################################################
# Dataset class
# -------------
#
# ``torch.utils.data.Dataset`` is an abstract class representing a
# dataset.
# Your custom dataset should inherit ``Dataset`` and override the following
# methods:
#
# -  ``__len__`` so that ``len(dataset)`` returns the size of the dataset.
# -  ``__getitem__`` to support the indexing such that ``dataset[i]`` can
#    be used to get :math:`i`\ th sample
#
# Let's create a dataset class for our face landmarks dataset. We will
# read the csv in ``__init__`` but leave the reading of images to
# ``__getitem__``. This is memory efficient because all the images are not
# stored in the memory at once but read as required.
#
# A sample of our dataset will be a dict
# ``{'image': image, 'landmarks': landmarks}``. Our dataset will take an
# optional argument ``transform`` so that any required processing can be
# applied on the sample. We will see the usefulness of ``transform`` in the
# next section.
#

class FaceLandmarksDataset(Dataset):
    """Face Landmarks dataset."""

    def __init__(self, csv_file, root_dir, transform=None):
        """
        Args:
            csv_file (string): Path to the csv file with annotations.
            root_dir (string): Directory with all the images.
            transform (callable, optional): Optional transform to be applied
                on a sample.
        """
        self.landmarks_frame = pd.read_csv(csv_file)
        self.root_dir = root_dir
        self.transform = transform

    def __len__(self):
        return len(self.landmarks_frame)

    def __getitem__(self, idx):
        img_name = os.path.join(self.root_dir, self.landmarks_frame.iloc[idx, 0])
        image = io.imread(img_name)
        landmarks = self.landmarks_frame.iloc[idx, 1:].to_numpy().astype(float)
        landmarks = landmarks.reshape(-1, 2)
        sample = {'image': image, 'landmarks': landmarks}

        if self.transform:
            sample = self.transform(sample)

        return sample


######################################################################
# Let's instantiate this class and iterate through the data samples. We
# will print the sizes of first 4 samples and show their landmarks.
#

face_dataset = FaceLandmarksDataset(csv_file='/Users/Natsume/Desktop/data/faces/face_landmarks.csv',
                                    root_dir='/Users/Natsume/Desktop/data/faces/')

fig = plt.figure()

for i in range(len(face_dataset)):
    sample = face_dataset[i]

    print(i, sample['image'].shape, sample['landmarks'].shape)

    ax = plt.subplot(1, 4, i + 1)
    plt.tight_layout()
    ax.set_title('Sample #{}'.format(i))
    ax.axis('off')
    show_landmarks(**sample)

    if i == 3:
        plt.show()
        break


######################################################################
# Transforms
# ----------
#
# One issue we can see from the above is that the samples are not of the
# same size. Most neural networks expect the images of a fixed size.
# Therefore, we will need to write some preprocessing code.
# Let's create three transforms:
#
# -  ``Rescale``: to scale the image
# -  ``RandomCrop``: to crop from image randomly. This is data
#    augmentation.
# -  ``ToTensor``: to convert the numpy images to torch images (we need to
#    swap axes).
#
# We will write them as callable classes instead of simple functions so
# that parameters of the transform need not be passed every time it's
# called. For this, we just need to implement ``__call__`` method and
# if required, ``__init__`` method. We can then use a transform like this:
#
# ::
#
#     tsfm = Transform(params)
#     transformed_sample = tsfm(sample)
#
# Observe below how these transforms had to be applied both on the image and
# landmarks.
#

class Rescale(object):
    """Rescale the image in a sample to a given size.

    Args:
        output_size (tuple or int): Desired output size. If tuple, output is
            matched to output_size. If int, smaller of image edges is matched
            to output_size keeping aspect ratio the same.
    """

    def __init__(self, output_size):
        assert isinstance(output_size, (int, tuple))
        self.output_size = output_size

    def __call__(self, sample):
        image, landmarks = sample['image'], sample['landmarks']

        h, w = image.shape[:2]
        if isinstance(self.output_size, int):
            if h > w:
                new_h, new_w = self.output_size * h / w, self.output_size
            else:
                new_h, new_w = self.output_size, self.output_size * w / h
        else:
            new_h, new_w = self.output_size

        new_h, new_w = int(new_h), int(new_w)

        img = transform.resize(image, (new_h, new_w))

        # h and w are swapped for landmarks because for images,
        # x and y axes are axis 1 and 0 respectively
        landmarks = landmarks * [new_w / w, new_h / h]

        return {'image': img, 'landmarks': landmarks}


class RandomCrop(object):
    """Crop randomly the image in a sample.

    Args:
        output_size (tuple or int): Desired output size. If int, square crop
            is made.
    """

    def __init__(self, output_size):
        assert isinstance(output_size, (int, tuple))
        if isinstance(output_size, int):
            self.output_size = (output_size, output_size)
        else:
            assert len(output_size) == 2
            self.output_size = output_size

    def __call__(self, sample):
        image, landmarks = sample['image'], sample['landmarks']

        h, w = image.shape[:2]
        new_h, new_w = self.output_size

        # randint's upper bound is exclusive; the + 1 keeps the call valid
        # when the image is exactly as large as the crop.
        top = np.random.randint(0, h - new_h + 1)
        left = np.random.randint(0, w - new_w + 1)

        image = image[top: top + new_h,
                      left: left + new_w]

        landmarks = landmarks - [left, top]

        return {'image': image, 'landmarks': landmarks}


class ToTensor(object):
    """Convert ndarrays in sample to Tensors."""

    def __call__(self, sample):
        image, landmarks = sample['image'], sample['landmarks']

        # swap color axis because
        # numpy image: H x W x C
        # torch image: C X H X W
        image = image.transpose((2, 0, 1))
        return {'image': torch.from_numpy(image),
                'landmarks': torch.from_numpy(landmarks)}


######################################################################
# Compose transforms
# ~~~~~~~~~~~~~~~~~~
#
# Now, we apply the transforms on a sample.
#
# Let's say we want to rescale the shorter side of the image to 256 and
# then randomly crop a square of size 224 from it, i.e., we want to compose
# ``Rescale`` and ``RandomCrop`` transforms.
# ``torchvision.transforms.Compose`` is a simple callable class which allows us
# to do this.
#

scale = Rescale(256)
crop = RandomCrop(128)
composed = transforms.Compose([Rescale(256),
                               RandomCrop(224)])

# Apply each of the above transforms on sample.
fig = plt.figure()
sample = face_dataset[65]
for i, tsfrm in enumerate([scale, crop, composed]):
    transformed_sample = tsfrm(sample)

    ax = plt.subplot(1, 3, i + 1)
    plt.tight_layout()
    ax.set_title(type(tsfrm).__name__)
    show_landmarks(**transformed_sample)

plt.show()


######################################################################
# Iterating through the dataset
# -----------------------------
#
# Let's put this all together to create a dataset with composed
# transforms.
# To summarize, every time this dataset is sampled:
#
# -  An image is read from the file on the fly
# -  Transforms are applied on the read image
# -  Since one of the transforms is random, data is augmented on
#    sampling
#
# We can iterate over the created dataset with a ``for i in range``
# loop as before.
#

transformed_dataset = FaceLandmarksDataset(csv_file='/Users/Natsume/Desktop/data/faces/face_landmarks.csv',
                                           root_dir='/Users/Natsume/Desktop/data/faces/',
                                           transform=transforms.Compose([
                                               Rescale(256),
                                               RandomCrop(224),
                                               ToTensor()
                                           ]))

for i in range(len(transformed_dataset)):
    sample = transformed_dataset[i]

    print(i, sample['image'].size(), sample['landmarks'].size())

    if i == 3:
        break


######################################################################
# However, we are losing a lot of features by using a simple ``for`` loop to
# iterate over the data. In particular, we are missing out on:
#
# -  Batching the data
# -  Shuffling the data
# -  Loading the data in parallel using ``multiprocessing`` workers.
#
# ``torch.utils.data.DataLoader`` is an iterator which provides all these
# features. Parameters used below should be clear. One parameter of
# interest is ``collate_fn``. You can specify how exactly the samples need
# to be batched using ``collate_fn``. However, default collate should work
# fine for most use cases.
#

# On platforms that start workers with ``spawn`` (Windows, recent macOS),
# run this under ``if __name__ == '__main__':`` or set ``num_workers=0``.
dataloader = DataLoader(transformed_dataset, batch_size=4,
                        shuffle=True, num_workers=4)


# Helper function to show a batch
def show_landmarks_batch(sample_batched):
    """Show image with landmarks for a batch of samples."""
    images_batch, landmarks_batch = \
        sample_batched['image'], sample_batched['landmarks']
    batch_size = len(images_batch)
    im_size = images_batch.size(2)

    grid = utils.make_grid(images_batch)
    plt.imshow(grid.numpy().transpose((1, 2, 0)))

    for i in range(batch_size):
        plt.scatter(landmarks_batch[i, :, 0].numpy() + i * im_size,
                    landmarks_batch[i, :, 1].numpy(),
                    s=10, marker='.', c='r')

        plt.title('Batch from dataloader')

for i_batch, sample_batched in enumerate(dataloader):
    print(i_batch, sample_batched['image'].size(),
          sample_batched['landmarks'].size())

    # observe 4th batch and stop.
    if i_batch == 3:
        plt.figure()
        show_landmarks_batch(sample_batched)
        plt.axis('off')
        plt.ioff()
        plt.show()
        break

######################################################################
# Afterword: torchvision
# ----------------------
#
# In this tutorial, we have seen how to write and use datasets, transforms
# and dataloader. ``torchvision`` package provides some common datasets and
# transforms. You might not even have to write custom classes. One of the
# more generic datasets available in torchvision is ``ImageFolder``.
# It assumes that images are organized in the following way: ::
#
#     root/ants/xxx.png
#     root/ants/xxy.jpeg
#     root/ants/xxz.png
#     .
#     .
#     .
#     root/bees/123.jpg
#     root/bees/nsdf3.png
#     root/bees/asd932_.png
#
# where 'ants', 'bees' etc. are class labels. Similarly generic transforms
# which operate on ``PIL.Image`` like ``RandomHorizontalFlip``, ``Resize``,
# are also available. (``Resize`` and ``RandomResizedCrop`` are the current
# names of the older ``Scale`` and ``RandomSizedCrop``.) You can use these
# to write a dataloader like this: ::
#
#   import torch
#   from torchvision import transforms, datasets
#
#   data_transform = transforms.Compose([
#           transforms.RandomResizedCrop(224),
#           transforms.RandomHorizontalFlip(),
#           transforms.ToTensor(),
#           transforms.Normalize(mean=[0.485, 0.456, 0.406],
#                                std=[0.229, 0.224, 0.225])
#       ])
#   hymenoptera_dataset = datasets.ImageFolder(root='hymenoptera_data/train',
#                                              transform=data_transform)
#   dataset_loader = torch.utils.data.DataLoader(hymenoptera_dataset,
#                                                batch_size=4, shuffle=True,
#                                                num_workers=4)
#
# For an example with training code, please see
# :doc:`transfer_learning_tutorial`.
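
The tutorial mentions `collate_fn` without showing one. As a rough sketch of when it matters: if the images were not rescaled and cropped to a common size, the default collate could not stack them into one tensor. The function `list_collate` and the stand-in samples below are hypothetical, made up for illustration; they are not part of the tutorial's code.

```python
import torch
from torch.utils.data import DataLoader


def list_collate(samples):
    """Hypothetical collate_fn: keep images in a list, stack the landmarks."""
    return {
        "image": [s["image"] for s in samples],
        "landmarks": torch.stack([s["landmarks"] for s in samples]),
    }


# Stand-in samples: image heights differ, but each has 68 (x, y) landmarks.
samples = [
    {"image": torch.zeros(3, 10 + i, 12), "landmarks": torch.zeros(68, 2)}
    for i in range(5)
]

# A plain list works as a map-style dataset (it has __len__ and __getitem__).
loader = DataLoader(samples, batch_size=2, collate_fn=list_collate)
batches = list(loader)

print(len(batches))                   # 5 samples in batches of 2 -> 3 batches
print(batches[0]["landmarks"].shape)  # torch.Size([2, 68, 2])
```

Keeping variable-size images in a list is only one option; padding them to a common size inside the collate function is another, at the cost of extra memory.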
