Assignment 5
Zahra Sheikhbahaee
Zeou Hu & Colin Vandenhof
April 2020
1 Convolutional Neural Networks Basics
Consider the following CNN architecture:
input: 256 × 256 × 3
conv: 100 filters, size 3 × 3, stride 1
conv: 50 filters, size 5 × 5, stride 3
pooling: size 2 × 2
conv: 30 filters, size 3 × 3, stride 1
pooling: size 2 × 4
fully connected: 10 outputs (flattened)
In all layers we do not add the bias term (for simplicity), and the nonlinear activations are omitted throughout. Note that the third dimension of each conv filter is omitted, as it can be inferred from the previous layer, while pooling (with each pixel appearing exactly once in all pooling windows) is performed on each channel/feature map separately. Answer the following questions. [For (b), (c), (d), you will receive partial credit if you correctly derive some of the intermediate results.]
- (5 pts) What can we do to achieve the same output size (excluding the dimension of channels/feature maps) as the input size for each convolutional layer? Explain using a filter of size (2w+1) × (2h+1) and stride 1 on an input image of size W × H. (A short shape check follows this list.)
- (5 pts) Without padding, what is the output size of each intermediate layer (except the first input layer and the last fully connected layer)?
- (5 pts) What is the receptive field of a single (output) neuron on the last convolutional layer (ignoring any boundary issues)?
- (5 pts) How many parameters does the above architecture have? For this exercise we assume the output image size of each convolutional layer is the same as its input image size (except for the number of channels, of course).
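The following shape check, a minimal sketch assuming PyTorch is installed, illustrates the padding question above: with stride 1, a (2w+1) × (2h+1) filter and padding (w, h) maps a W × H input to a W × H output.

import torch
from torch import nn

w, h = 1, 2                       # filter is (2*1+1) x (2*2+1) = 3 x 5
conv = nn.Conv2d(3, 8, kernel_size=(2*w + 1, 2*h + 1), stride=1, padding=(w, h))
x = torch.randn(1, 3, 64, 48)     # batch of one 64 x 48 RGB image
print(conv(x).shape)              # torch.Size([1, 8, 64, 48]): spatial size preserved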
2 CNN Implementation
Note: Please mention your Python version (and ideally the versions of all other packages) in the code. In this exercise you are going to run some experiments involving CNNs. You need to know Python and install the following libraries: PyTorch, NumPy, and all their dependencies. You can find detailed instructions and tutorials for each of these libraries on their respective websites. For all experiments, running on a CPU is sufficient; you do not need to run the code on GPUs. Before starting, we suggest you review what we learned about each layer in a CNN, and read at least this tutorial.
- Train a VGG11 net on the MNIST dataset. VGG11 was an earlier version of VGG16 and can be found as model A in Table 1 of this paper, whose Section 2.1 also gives you all the details about each layer. The goal is to get as close to 0 loss as possible. Note that our input dimension is different from the VGG paper: you need to resize each image in MNIST from its original size 28 × 28 to 32 × 32 [why?], and it might be necessary to change at least one other layer of VGG11. A minimal data-pipeline sketch follows at the end of this section. [This experiment can take up to 1 hour on a CPU, so please be mindful of your time. If this running time is not bearable, you may cut the training set to 1/10 of its size, so you have ∼600 images per class instead of the regular ∼6000.]
- Once you’ve done the above, the next goal is to inspect the training process. Create the following plots:
(a) test accuracy vs the number of epochs (say 3∼5)
(b) training accuracy vs the number of epochs
(c) test loss vs the number of epochs
(d) training loss vs the number of epochs
[If running more than 1 epoch is computationally infeasible, simply run 1 epoch and try to record the accuracy/loss every few minibatches.]
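Below is a minimal sketch of the data pipeline and the two adaptations mentioned above, assuming PyTorch and torchvision are installed. It uses torchvision's built-in vgg11 purely as a stand-in; for the assignment you should build model A from Table 1 of the paper yourself, with the same two changes (1 input channel, 10 outputs).

import torch
from torch import nn, optim
from torchvision import datasets, transforms, models

transform = transforms.Compose([
    transforms.Resize(32),   # 28 x 28 -> 32 x 32, so five 2 x 2 max-pools reduce the map to 1 x 1
    transforms.ToTensor(),
])
train_set = datasets.MNIST("data", train=True, download=True, transform=transform)
loader = torch.utils.data.DataLoader(train_set, batch_size=64, shuffle=True)

net = models.vgg11(num_classes=10)                             # 10 digit classes
net.features[0] = nn.Conv2d(1, 64, kernel_size=3, padding=1)   # MNIST is grayscale (1 channel)

loss_fn = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.01, momentum=0.9)
for x, y in loader:                                            # one epoch; record loss/accuracy as needed
    optimizer.zero_grad()
    loss = loss_fn(net(x), y)
    loss.backward()
    optimizer.step()

Recent torchvision versions insert an adaptive average-pooling layer before the classifier, which lets this run on 32 × 32 inputs unchanged; if you implement the paper's architecture directly, the classifier's input size is the layer you will most likely need to change.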
3 Recurrent Neural Network Implementation
In this question, you will implement a recurrent neural network to classify Twitter messages as positive or negative sentiment. You are provided two files, tweetstrain.csv and tweetstest.csv, which contain the training data and test data. Each line of the files consists of a tweet and its sentiment, either negative (0) or positive (1).
To classify the tweets, implement an LSTM using PyTorch. Each tweet should be treated as a sequence of ASCII characters, and the input to the LSTM at each step should be a one-hot encoding of the ASCII character. Since there are 128 ASCII characters, this should be a vector of size 128. The size of the hidden layer should also be 128. Use a linear layer to map the final hidden layer to the output (size 2). Finally, apply the softmax function to obtain class probabilities.

Use negative log likelihood as the loss function, with stochastic gradient descent or any related optimization algorithm. Train on the training set using a batch size of 32 and a learning rate of 0.0001 for at least 100000 iterations. Plot curves for (a) the test set accuracy vs. the number of iterations and (b) the training set accuracy vs. the number of iterations. You may compute the accuracy every 1000 iterations to speed up the process.
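A minimal model sketch matching the specification above (PyTorch assumed; data loading and the batching of variable-length tweets are left out). Note that log-softmax paired with NLLLoss is the numerically stable way to combine the softmax and the negative log likelihood described above.

import torch
from torch import nn

class TweetLSTM(nn.Module):
    def __init__(self, vocab_size=128, hidden_size=128, num_classes=2):
        super().__init__()
        self.lstm = nn.LSTM(vocab_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, x):              # x: (batch, seq_len, 128) one-hot floats
        _, (h_n, _) = self.lstm(x)     # h_n: (1, batch, hidden) final hidden state
        return torch.log_softmax(self.fc(h_n[-1]), dim=1)

def one_hot_tweet(tweet):
    # Encode a tweet as a (seq_len, 128) one-hot float tensor;
    # characters outside ASCII are clamped to index 127 as a simple guard.
    idx = torch.tensor([min(ord(c), 127) for c in tweet])
    return torch.nn.functional.one_hot(idx, num_classes=128).float()

model = TweetLSTM()
loss_fn = nn.NLLLoss()                 # pairs with the log-softmax output
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)  # use batch size 32 in your loader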
4 Variational Auto-Encoder
In a variational autoencoder, we optimize the evidence lower bound
L(θ, φ; x) = −D_KL[q_φ(z|x) ‖ p_θ(z)] + E_{q_φ(z|x)}[log p_θ(x|z)]
where q_φ(z|x) is the variational distribution with variational parameter φ, which approximates the posterior p(z|x).
- Given z ∈ R^1, p(z) ∼ N(0, 1) and q(z|x) ∼ N(μ_z, σ_z^2), write down D_KL[q_φ(z|x) ‖ p_θ(z)] in terms of μ_z and σ_z.
- Assuming q_φ(z|x) is a Gaussian, the encoder network computes its mean μ_z and its variance σ_z^2. Why do we model σ_z^2 in log space in the neural network, instead of modeling σ_z^2 directly? Why do we need the reparameterization trick, instead of directly sampling from the latent distribution N(μ_z, σ_z^2)? (A short sketch follows this list.)
- For the decoder, we sometimes use a multi-layer perceptron with either Bernoulli (in the case of binary data) or Gaussian (in the case of real-valued data) outputs. The expected reconstruction error (cross-entropy term) E_{q_φ(z|x^(i))}[log p_θ(x^(i)|z)] can be estimated by sampling, that is,

E_{q_φ(z|x^(i))}[log p_θ(x^(i)|z)] ≈ (1/L) Σ_{l=1}^{L} log p_θ(x^(i)|z^(l)),

where the z^(l) are samples drawn from q_φ(z|x^(i)). If the data x given z follows a multivariate Bernoulli of dimension D, what should this reconstruction loss term look like?
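A short sketch of the reparameterization trick referenced in the second question above (PyTorch assumed): predicting log σ^2 keeps the variance positive without constraints and is numerically stable, while writing z as a deterministic function of (μ, log σ^2) plus external noise keeps z differentiable with respect to the encoder outputs.

import torch

def reparameterize(mu, log_var):
    std = torch.exp(0.5 * log_var)   # sigma = exp(log(sigma^2) / 2) is always positive
    eps = torch.randn_like(std)      # noise is sampled outside the computation graph
    return mu + eps * std            # gradients flow through mu and log_var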
5 The Final Small Project
This part of the assignment should be done by the students who decided to attend the final exam. Collaboration policy: this assignment should be done individually.
5.1 Implementation of VAE
In the penultimate lecture, we learned about approximating posteriors with variational inference, using the reparameterization trick for VI, and deep generative models for images using variational autoencoders. In the final lecture we also learned more about convolutional neural networks and their applications in computer vision. In this project, you will bring both constructs together to train a model that can generate images of different classes of galaxies. The data is available here. The training images are JPG images of 61578 galaxies. The probability distributions over the classifications for each of the training images are given in solutionstrainingrev1. There are 37 different classes of galaxy types, and at the end of the training process you should be able to generate these categories and compare them with the original images. You’ll need to install PyTorch to use the starter code. The objective is to minimize the reconstruction error using cross-entropy and the Kullback–Leibler divergence. Below is a code snippet that you can start from and complete to make it work for this dataset.
import torch
import torch.utils.data
from torch import nn, optim
from torch.nn import functional as F
from torchvision import datasets, transforms
from torchvision.utils import save_image


class VAE(nn.Module):
    def __init__(self, input_channel=3, z_dim=512, image_size=424):
        super(VAE, self).__init__()
        self.z_dim = z_dim

        # encoder part
        self.encoder_conv1 = nn.Conv2d(input_channel, z_dim // 16,
                                       kernel_size=4, stride=2, padding=1)
        self.encoder_bn1 = nn.BatchNorm2d(z_dim // 16)
        # You must extend this part

        # decoder part
        self.decoder_conv1 = nn.ConvTranspose2d(z_dim, z_dim // 2,
                                                kernel_size=4, stride=1, padding=0)
        self.decoder_bn1 = nn.BatchNorm2d(z_dim // 2)
        # You must extend this part

    def encode(self, x):
        x1 = F.leaky_relu(self.encoder_bn1(self.encoder_conv1(x)), negative_slope=0.2)
        # Fill this part with some CNN architecture

    def decode(self, z):
        ...  # complete this part

    def reparameterize(self, z_means, z_log_var):
        # z = z_means + epsilon * std, where std = exp(0.5 * z_log_var)
        ...  # write down the appropriate code

    def forward(self, x):
        # encode, reparameterize, then decode
        return x_reconst, mu, log_var


model = VAE()

# Loss and Optimizer
optimizer = ...  # fill in this part

# Start training
for epoch in range(num_epochs):   # num_epochs: set your number of training epochs
    for i, data in enumerate(training_data_loader):
        # Forward pass
        x = data
        x_reconst, mu, log_var = model(x)

        # reconstruction loss between x_reconst and x
        reconst_loss = ...  # write the appropriate code

        # kl divergence
        kl_div = ...  # write down the correct objective function

        # Backprop and optimize
        loss = reconst_loss + kl_div
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
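As a hint for the placeholders, one standard choice is sketched below. It is not the only valid objective; it assumes the decoder ends in a sigmoid so that x_reconst lies in [0, 1], and the Adam learning rate is only an assumed starting point (adjust it as required below).

# a possible optimizer
optimizer = optim.Adam(model.parameters(), lr=1e-3)

# possible loss terms, matching the ELBO from Section 4
reconst_loss = F.binary_cross_entropy(x_reconst, x, reduction='sum')
kl_div = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp())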
First, we want to train the decoder parameters θ and encoder parameters φ to obtain accurate reconstructions. Second, we wish to build a probabilistic model on top of an autoencoder, so that we can reason about our uncertainty over the code space.
- For goal 1, we will simply produce a point estimate of the encoder and decoder parameters φ and θ (following the principle of minimizing reconstruction error). We wish to find the decoder and encoder parameters that, for the training data at hand, minimize the reconstruction error, meaning that a good autoencoder must have a low binary cross-entropy (you can change the reconstruction loss). You should adjust the learning rate lr, and potentially other keyword arguments as well. Then train for at least 200 epochs (or more if you don’t see convergence).
- For goal 2, we’d like to be ”more Bayesian”, so we’ll assume a full generative model for our data.
Plot the reconstruction error (y-axis) on both the train and test sets versus the number of training iterations (x-axis). Show performance with 3 different VAE encoder/decoder architectures: first, a deep feed-forward neural network with 512 hidden units; then an architecture with five convolutional layers, starting with 32 filters and doubling that number at each layer; finally, build a ResNet block from scratch (a minimal block sketch follows below) and add five ResNet blocks to your VAE model as well. For a further example you can take a look at this paper.
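A minimal residual block sketch (one common formulation; details such as normalization placement and the handling of changing channel counts vary across papers):

import torch
from torch import nn

class ResBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = torch.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return torch.relu(out + x)   # skip connection: add the input back before the final ReLU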
- Compare sampled images of a particular class of galaxy with their original images.
- Illustrate the location of different categories of galaxies within the latent space. One way to understand what is represented in the latent space is to consider where it encodes elements of the data. Here you will produce a scatter plot in the latent space, where each point in the plot is the mean vector of the distribution q(z|x) given by the encoder; further, colour each point in the plot by the class label of the input data (a plotting sketch is given at the end of this section). You must also include a visualization of the weight matrices for the different layers.
- For this assignment, any new idea you try that is not mentioned in the description and leads to a significant improvement in results, or that incorporates the idea of this paper, will receive an extra bonus.
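A possible sketch for the latent-space scatter plot described above (matplotlib assumed; test_loader is an assumed name, and model.encode is assumed to return the mean and log-variance of q(z|x)). Hard class labels are taken here as the argmax of each image’s 37-dimensional probability vector.

import matplotlib.pyplot as plt
import torch

means, labels = [], []
with torch.no_grad():
    for x, probs in test_loader:          # probs: per-image 37-dim class probabilities
        mu, log_var = model.encode(x)
        means.append(mu.view(mu.size(0), -1))
        labels.append(probs.argmax(dim=1))
means = torch.cat(means)
labels = torch.cat(labels)

# For a high-dimensional latent space, project to 2-D first (e.g. with PCA);
# the first two coordinates are plotted here purely for illustration.
plt.scatter(means[:, 0], means[:, 1], c=labels, cmap='tab20', s=5)
plt.xlabel('z[0]')
plt.ylabel('z[1]')
plt.title('Encoder means of q(z|x), coloured by galaxy class')
plt.show()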