
An Accessible Deep Dive: The Differences and Connections Between VAE and GAN, Explained Through Example Code


Previously, we introduced the VAE-GAN paper, Autoencoding beyond pixels using a learned similarity metric, along with its accompanying video.

This article walks through VAE-GAN in code. Its highlights:

1. Multi-GPU training

2. Dynamically adjusted learning rates

3. Latent space visualization

4. Latent vector algebra

5. Neuron activation visualization

6. Fast training

Results:

Code formatting is hard to read in WeChat; see the original article at: https://github.com/timsainb/Tensorflow-MultiGPU-VAE-GAN

Tensorflow Multi-GPU VAE-GAN implementation

  • This is an implementation of the VAE-GAN based on the implementation described in the paper Autoencoding beyond pixels using a learned similarity metric (see the paper and accompanying video)
  • I implement a few useful things, like:
    • Visualizing movement through z-space
    • Latent space algebra
    • Spike-triggered-average-style receptive fields (visualizing which inputs activate a neuron)

How does a VAE-GAN work?

  • We have three networks, an Encoder, a Generator, and a Discriminator.
    • The Encoder learns to map input x onto z space (latent space)
    • The Generator learns to generate x from z space
    • The Discriminator learns to discriminate whether an input image is real or generated

Diagram of basic network input and output

l_x_tilde and l_x here are layers of high-level features that the discriminator learns.

  • We train the network to minimize the difference between the high-level features of x and x_tilde (see the sketch below)
  • This is essentially an autoencoder that works on high-level features rather than pixels
  • Adding this autoencoder to a GAN helps to stabilize the GAN
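As a minimal sketch of that idea (tensor names follow the inference() function later in this post), the reconstruction term compares discriminator features rather than raw pixels:

# Hedged sketch: reconstruct in the discriminator's feature space rather than
# pixel space. l_x and l_x_tilde come from inference() further below.
pixel_loss = tf.reduce_mean(tf.square(x - x_tilde))        # plain-VAE style term (not used here)
feature_loss = tf.reduce_mean(tf.square(l_x - l_x_tilde))  # VAE-GAN reconstruction term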

Training


Train Encoder on minimization of:

  • kullback_leibler_loss(z_x, gaussian)
  • mean_squared_error(l_x_tilde_, l_x)

Train Generator on minimization of:

  • kullback_leibler_loss(z_x, gaussian)
  • mean_squared_error(l_x_tilde_, l_x)
  • -1*log(d_x_p)

Train Discriminator on minimization of:

  • -1*(log(d_x) + log(1 - d_x_p))
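Putting those objectives together, a hedged sketch of the per-network totals (loss names like KL_loss are illustrative; definitions appear in the Loss section below):

E_obj = KL_loss + LL_loss            # encoder: prior term + feature reconstruction
G_obj = KL_loss + LL_loss + G_loss   # generator: also tries to fool the discriminator
D_obj = D_loss                       # discriminator: classify real vs. generated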

Which GPUs are we using?

  • Set gpus to a list of the GPUs you're using. The network will then split the work between those GPUs
import os

gpus = [2] # Here I set CUDA to only see one GPU
os.environ["CUDA_VISIBLE_DEVICES"] = ','.join([str(i) for i in gpus])
num_gpus = len(gpus) # number of GPUs to use

Reading the dataset from HDF5 format

  • open `makedataset.ipynb` for instructions on how to build the dataset; a loading sketch follows
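A minimal sketch of loading such a dataset with h5py (the file name and dataset keys here are illustrative assumptions, not the notebook's exact choices):

import h5py
import numpy as np

# Hypothetical file and keys for illustration; see makedataset.ipynb for the real ones.
with h5py.File('faces.hdf5', 'r') as f:
    faces = np.asarray(f['images'])   # e.g. (N, 12288) flattened 64x64x3 images
    labels = np.asarray(f['labels'])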

A data iterator for batching (drawn up by Luke Metz)

  • https://indico.io/blog/tensorflow-data-inputs-part1-placeholders-protobufs-queues/
iter_ = data_iterator()
#face_batch, label_batch
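A hedged sketch of such an iterator, following the pattern in the linked post (faces and labels are the arrays loaded from HDF5 above; batch_size is assumed to be defined):

def data_iterator():
    """Endlessly yield shuffled (face_batch, label_batch) pairs."""
    while True:
        idxs = np.arange(0, len(faces))
        np.random.shuffle(idxs)
        for batch_idx in range(0, len(faces) - batch_size + 1, batch_size):
            cur_idxs = idxs[batch_idx:batch_idx + batch_size]
            yield faces[cur_idxs], labels[cur_idxs]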

Draw out the architecture of our network

  • Each of these functions represents the Encoder, Generator, or Discriminator described above.
  • It would be interesting to try implementing the inception architecture to do the same thing next time around:
  • They describe how to implement inception in prettytensor here: https://github.com/google/prettytensor
Configuration of each network:
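As one hedged example of what these functions might look like in prettytensor (layer sizes, kernel widths, and the 64x64x3 input shape are illustrative assumptions, not the repo's exact architecture):

def encoder(x):
    # Map flattened input images to the mean and log-variance of q(z|x).
    net = (pt.wrap(x)
           .reshape([batch_size, 64, 64, 3])
           .conv2d(5, 64, stride=2)
           .conv2d(5, 128, stride=2)
           .conv2d(5, 256, stride=2)
           .flatten())
    z_mean = net.fully_connected(hidden_size, activation_fn=None)
    z_log_sigma_sq = net.fully_connected(hidden_size, activation_fn=None)
    return z_mean, z_log_sigma_sq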

Defining the forward pass through the network

  • This function is based on the inference function from TensorFlow's cifar10 tutorial
    • https://github.com/tensorflow/tensorflow/blob/r0.10/tensorflow/models/image/cifar10/cifar10.py
  • Notice I use with tf.variable_scope("enc"). This way, we can reuse these variables with reuse=True. We can also choose which variables each loss function trains, based on the scope label enc
def inference(x):
    """
    Run the models. Called inference because it does the same thing as tensorflow's cifar tutorial
    """
    z_p =  tf.random_normal((batch_size, hidden_size), 0, 1) # normal dist for GAN
    eps = tf.random_normal((batch_size, hidden_size), 0, 1) # normal dist for VAE

    with pt.defaults_scope(activation_fn=tf.nn.elu,
                               batch_normalize=True,
                               learned_moments_update_rate=0.0003,
                               variance_epsilon=0.001,
                               scale_after_normalization=True):

        with tf.variable_scope("enc"):         
                z_x_mean, z_x_log_sigma_sq = encoder(x) # get z from the input      
        with tf.variable_scope("gen"):
            z_x = tf.add(z_x_mean, 
                tf.mul(tf.sqrt(tf.exp(z_x_log_sigma_sq)), eps)) # grab our actual z
            x_tilde = generator(z_x)  
        with tf.variable_scope("dis"):   
            _, l_x_tilde = discriminator(x_tilde)
        with tf.variable_scope("gen", reuse=True):         
            x_p = generator(z_p)    
        with tf.variable_scope("dis", reuse=True):
            d_x, l_x = discriminator(x)  # positive examples              
        with tf.variable_scope("dis", reuse=True):
            d_x_p, _ = discriminator(x_p)  
        return z_x_mean, z_x_log_sigma_sq, z_x, x_tilde, l_x_tilde, x_p, d_x, l_x, d_x_p, z_p

Ref: the variables computed above correspond to the diagram below.

Loss - define our various loss functions

  • SSE - we don't actually use this loss (it's really the MSE); it's just to see how close x is to x_tilde
  • KL loss - our VAE Gaussian-distribution loss.
    • See https://arxiv.org/abs/1312.6114
  • D_loss - our discriminator loss: how good the discriminator is at telling whether something is real
  • G_loss - essentially the opposite of D_loss: how good the generator is at tricking the discriminator
  • Notice we clip our values to make sure learning rates don't explode (a sketch follows)
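A hedged sketch of these losses (clipping bounds are illustrative; tensor names follow inference() above):

# KL divergence between q(z|x) and a unit Gaussian (Kingma & Welling 2013).
KL_loss = -0.5 * tf.reduce_sum(1 + z_x_log_sigma_sq
                               - tf.square(z_x_mean)
                               - tf.exp(z_x_log_sigma_sq), 1)
# Reconstruction in the discriminator's feature space.
LL_loss = tf.reduce_sum(tf.square(l_x - l_x_tilde))
# Clip discriminator outputs so the logs can't blow up.
D_loss = tf.reduce_mean(-tf.log(tf.clip_by_value(d_x, 1e-5, 1.0))
                        - tf.log(tf.clip_by_value(1.0 - d_x_p, 1e-5, 1.0)))
G_loss = tf.reduce_mean(-tf.log(tf.clip_by_value(d_x_p, 1e-5, 1.0)))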

Average the gradients between towers

  • This function is taken directly from
    • https://github.com/tensorflow/tensorflow/blob/r0.10/tensorflow/models/image/cifar10/cifar10_multi_gpu_train.py
  • Basically, we take a list of gradients from each tower and average them together; a condensed sketch follows
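A condensed sketch of that function, following the cifar10 multi-GPU tutorial (note tf.concat uses the old r0.10 argument order):

def average_gradients(tower_grads):
    """Average the gradients for each shared variable across all towers."""
    average_grads = []
    for grad_and_vars in zip(*tower_grads):
        # grad_and_vars is ((grad0, var0), ..., (gradN, varN)) for one variable.
        grads = [tf.expand_dims(g, 0) for g, _ in grad_and_vars]
        grad = tf.reduce_mean(tf.concat(0, grads), 0)
        # The variable is shared across towers, so take it from the first tower.
        average_grads.append((grad, grad_and_vars[0][1]))
    return average_grads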

Plot network output

  • This is just my ugly function for regularly plotting the output of my network - TensorBoard would probably be a better option for this

(code omitted)

With your graph, define what a step is (needed for multi-GPU), and define the optimizers for each of your networks

The learning rates are dynamic; they are generated on the fly during training (see below).
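A hedged sketch of per-network optimizers with placeholder learning rates (the names lr_E/lr_G/lr_D, and the choice of Adam, are assumptions for illustration):

with graph.as_default():
    # One placeholder per network so each learning rate can change every batch.
    lr_E = tf.placeholder(tf.float32, shape=[])
    lr_G = tf.placeholder(tf.float32, shape=[])
    lr_D = tf.placeholder(tf.float32, shape=[])
    opt_E = tf.train.AdamOptimizer(lr_E)
    opt_G = tf.train.AdamOptimizer(lr_G)
    opt_D = tf.train.AdamOptimizer(lr_D)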

Run all of the functions we defined above

  • tower_grads_e holds the encoder's gradients for each tower
  • For each GPU, we grab the parameters belonging to each network, calculate the gradients, and add them to the towers to be averaged (a sketch follows)
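A hedged sketch of that loop body (scope prefixes follow the enc/gen/dis labels in inference(); loss and optimizer names follow the earlier sketches):

# Select each network's variables by the scope prefix given in inference().
t_vars = tf.trainable_variables()
E_params = [v for v in t_vars if v.name.startswith('enc')]
G_params = [v for v in t_vars if v.name.startswith('gen')]
D_params = [v for v in t_vars if v.name.startswith('dis')]

# Per-tower gradients, collected for averaging across GPUs.
tower_grads_e.append(opt_E.compute_gradients(KL_loss + LL_loss, var_list=E_params))
tower_grads_g.append(opt_G.compute_gradients(KL_loss + LL_loss + G_loss, var_list=G_params))
tower_grads_d.append(opt_D.compute_gradients(D_loss, var_list=D_params))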

Now let's actually run our session

with graph.as_default():

    # Start the Session
    init = tf.initialize_all_variables()
    saver = tf.train.Saver() # initialize network saver
    sess = tf.InteractiveSession(graph=graph,config=tf.ConfigProto(allow_soft_placement=True, log_device_placement=True))
    sess.run(init)

Get some example data to do visualizations with

example_data, _ = iter_.next()
np.shape(example_data)
(32, 12288)

Initialize our epoch number; to restore a saved network, uncomment the saver.restore line below

epoch = 0
# saver.restore(sess, 'models/faces_multiGPU_64_0000.tfmod')

Now we actually run the network

  • Importantly, notice how we define the learning rates
    • We take the sigmoid of a running measure of how each network has been performing and use it to squash that network's learning rate. So if the discriminator has been winning, its learning rate on the next batch will be low, and likewise for the generator when it has been winning; this keeps either network from overpowering the other.
    • e_current_lr = e_learning_rate*sigmoid(np.mean(d_real),-.5,10)
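A hedged sketch of a squashing function consistent with that line (the exact shift/steepness semantics are an assumption):

import numpy as np

def sigmoid(x, shift, mult):
    """Squash x into (0, 1); shift moves the midpoint, mult sets the steepness."""
    return 1.0 / (1.0 + np.exp(-(x + shift) * mult))

# Example: scale the encoder's learning rate by recent discriminator performance.
e_current_lr = e_learning_rate * sigmoid(np.mean(d_real), -.5, 10)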

Training code:

Generated images:

Viewing the images that correspond to points in latent space

'Spike Triggered Average' style receptive fields.

(code omitted)
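One hedged way to compute such a receptive field, in the spirit of a spike-triggered average (all names here are illustrative, not the repo's exact code): weight each input image by the activation it evokes in a chosen latent unit, then average.

# Activations of every latent unit for a batch of images.
acts = sess.run(z_x_mean, feed_dict={all_input: images})  # shape (N, hidden_size)

unit = 0                                   # which latent 'neuron' to visualize
w = acts[:, unit] - acts[:, unit].mean()   # centered activation strengths
rf = (w[:, None] * images).sum(0) / np.abs(w).sum()  # activation-weighted average image
rf_image = rf.reshape(64, 64, 3)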

Now let's try some latent space algebra

(code omitted)

Adding blond hair

(code omitted)
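A hedged sketch of the idea (variable names are illustrative): average the z-vectors of faces with the attribute and without, take the difference, and shift another face's latent code along that direction before decoding.

# Attribute vector: mean z of blond faces minus mean z of non-blond faces.
blond_vector = z_blond_faces.mean(0) - z_other_faces.mean(0)

# Shift a face's latent code and decode it by feeding the z_x tensor directly.
# (Shown for a single vector; in practice feed a full batch of size batch_size.)
new_z = z_example + blond_vector
blonder_face = sess.run(x_tilde, feed_dict={z_x: new_z[None, :]})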

This implementation is based on a few other things:

  • Autoencoding beyond pixels (Github)
  • VAE and GAN implementations in prettytensor/tensorflow (Github)
  • Tensorflow VAE tutorial
  • DCGAN (Github)
  • Torch GAN tutorial (Github)
  • Open AI improving GANS (Github)
  • Other things which I am probably forgetting...