Adversarial and Generative

By LI Haoyang 2020.12.7

Content

- Adversarial Example for Generative Purpose
  - Image Synthesis with a Single (Robust) Classifier - 2019
    - Approach
    - Realistic Image Generation
    - Image Inpainting
    - Image-to-Image Translation
    - Super-Resolution
    - Interactive Image Manipulation
      - Sketch-to-image
      - Feature Painting
    - Inspirations
  - Adversarial Robustness as a Prior for Learned Representations - 2019
    - Limitations of Standard Representations
    - Properties and Applications of Robust Representations
      - Robust representations are (approximately) invertible out of the box
      - Representation proximity seems to entail semantic similarity
      - Inversion of out-of-distribution inputs
      - Interpolation between arbitrary inputs
      - Direct feature visualization
    - Inspiration
- Generative Approach for Adversarial Robustness
  - Analysis by Synthesis (a Bayesian Classifier for MNIST) - 2018
    - Analysis by Synthesis
    - Tight estimate of the lower bound for adversarial examples
    - Experiments
    - Inspirations

Adversarial Example for Generative Purpose

Image Synthesis with a Single (Robust) Classifier - 2019

Code: https://git.io/robust-apps

Shibani Santurkar, Dimitris Tsipras, Brandon Tran, Andrew Ilyas, Logan Engstrom, Aleksander Madry. Image Synthesis with a Single (Robust) Classifier. arXiv preprint 2019. arXiv:1906.09453

This paper demonstrates the possibility of using an adversarially trained classifier to turn adversarial examples into semantically meaningful image synthesis.

In contrast to other state-of-the-art approaches, the toolkit we develop is rather minimal: it uses a single, off-the-shelf classifier for all these tasks. The crux of our approach is that we train this classifier to be adversarially robust.

In this work, we demonstrate that basic classification tools alone suffice to tackle various image synthesis tasks. These tasks include (cf. Figure 1): generation, inpainting, image-to-image translation, super-resolution, and interactive image manipulation.

This is surprising yet expected: since the gradients of a robust classifier capture semantic information, the classifier should be able to handle semantic manipulation tasks.

Approach

Our approach is remarkably simple: all the applications are performed using gradient ascent on class scores derived from the same robustly trained classifier.

Basically, they conduct an adversarial attack on an adversarially trained classifier, i.e. a classifier trained using the robust optimization objective

$$\min_\theta \; \mathbb{E}_{(x, y) \sim D}\left[\max_{\delta \in \Delta} \mathcal{L}_\theta(x + \delta, y)\right],$$

where the perturbation set $\Delta$ captures imperceptible changes, e.g. an $\ell_2$-ball $\{\delta : \|\delta\|_2 \le \varepsilon\}$.

This process can be viewed as encoding priors into the model, preventing it from relying on imperceptible features of the input.

Indeed, the findings of Tsipras et al. [Tsi+19] are aligned with this viewpoint—by encouraging the model to be invariant to small perturbations, robust training ensures that changes in the model’s predictions correspond to salient input changes.

it turns out that this phenomenon also emerges when we maximize the probability of a specific class (targeted attacks) for a robust model.
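
As a concrete illustration of this primitive, here is a minimal PyTorch sketch of targeted, $\ell_2$-constrained PGD against a robust classifier. The function name, radius, step size and iteration count are illustrative placeholders, not the authors' exact settings.

```python
import torch
import torch.nn.functional as F

def targeted_pgd_l2(model, x, target, eps=40.0, step_size=2.0, steps=60):
    """Gradient ascent on the target-class score within an L2 ball around x.

    model  : a (robustly trained) classifier mapping images to logits
    x      : input batch of shape (N, C, H, W), values in [0, 1]
    target : desired class labels of shape (N,)
    """
    x_orig = x.clone().detach()
    x_adv = x.clone().detach().requires_grad_(True)
    for _ in range(steps):
        # maximizing the target-class score = minimizing cross-entropy to `target`
        loss = -F.cross_entropy(model(x_adv), target)
        grad, = torch.autograd.grad(loss, x_adv)
        # normalized gradient ascent step (standard L2 PGD)
        grad_norm = grad.flatten(1).norm(dim=1).clamp_min(1e-12).view(-1, 1, 1, 1)
        x_adv = x_adv.detach() + step_size * grad / grad_norm
        # project back onto the L2 ball of radius eps around the original input
        delta = x_adv - x_orig
        delta_norm = delta.flatten(1).norm(dim=1).clamp_min(1e-12).view(-1, 1, 1, 1)
        delta = delta * (eps / delta_norm).clamp(max=1.0)
        x_adv = (x_orig + delta).clamp(0, 1).requires_grad_(True)
    return x_adv.detach()
```

Each of the applications below reduces to this loop with a different starting point, loss, or constraint.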

Realistic Image Generation

The purpose of realistic image generation is:

Given a set of example inputs, we would like to learn a model that can produce novel perceptually-plausible inputs.

Many of these methods, however, can be tricky to train and properly tune. They are also fairly computationally intensive, and often require fine-grained performance optimizations.

In contrast, we demonstrate that robust classifiers, without any special training or auxiliary networks, can be a powerful tool for synthesizing realistic natural images.

To generate a sample of class $y$, they sample a seed $x_0$ and minimize the loss of label $y$, i.e.

$$x = \arg\min_{\|x' - x_0\|_2 \le \varepsilon} \mathcal{L}(x', y), \qquad x_0 \sim \mathcal{G}_y,$$

where $\mathcal{G}_y$ is some class-conditional seed distribution.

Here they use a simple choice of this seed distribution, a multivariate normal fit to each class, i.e.

$$\mathcal{G}_y = \mathcal{N}(\mu_y, \Sigma_y), \qquad \mu_y = \mathbb{E}_{x \sim D_y}[x], \quad \Sigma_y = \mathbb{E}_{x \sim D_y}\!\left[(x - \mu_y)(x - \mu_y)^\top\right],$$

where $D_y$ is the distribution of natural inputs conditioned on the label $y$.
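
A sketch of this generation procedure, reusing the `targeted_pgd_l2` routine above; the multivariate-normal fit follows the description here, while the regularization constant and sample count are placeholders.

```python
import torch

def fit_class_seed_distribution(images, reg=1e-3):
    """Fit a multivariate normal G_y to the training images of one class.

    images : tensor of shape (N, C, H, W) holding the class-y images
    Returns a distribution over flattened images (tractable for small images;
    a diagonal covariance would be needed for large ones).
    """
    flat = images.flatten(1)                      # (N, D)
    mu = flat.mean(dim=0)                         # empirical class mean
    cov = torch.cov(flat.T)                       # empirical covariance, (D, D)
    cov = cov + reg * torch.eye(cov.shape[0])     # regularize for positive-definiteness
    return torch.distributions.MultivariateNormal(mu, covariance_matrix=cov)

def generate_samples(model, seed_dist, target_class, image_shape, n=4):
    """Sample seeds x0 ~ G_y and push them toward class y with the robust model."""
    x0 = seed_dist.sample((n,)).view(n, *image_shape).clamp(0, 1)
    y = torch.full((n,), target_class, dtype=torch.long)
    return targeted_pgd_l2(model, x0, y)          # the primitive sketched earlier
```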

Image Inpainting

Image inpainting is the task of recovering images with large corrupted regions.

Given an image $x$, corrupted in a region corresponding to a binary mask $m$, the goal of inpainting is to recover the missing pixels in a manner that is perceptually plausible with respect to the rest of the image.

For this purpose, they optimize the image to maximize the score of the underlying true class $y$, while also forcing it to be consistent with the original in the uncorrupted region, i.e.

$$x_{I} = \arg\min_{x'} \; \mathcal{L}(x', y) + \lambda \left\|(x - x') \odot (1 - m)\right\|_2,$$

where $\mathcal{L}$ is the cross-entropy loss, $\odot$ denotes element-wise multiplication and $\lambda$ is an appropriately chosen constant.

Interestingly, even when this approach fails (reconstructions differ from the original), the resulting images do tend to be perceptually plausible to a human, as shown in Appendix Figure 12.
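
A sketch of this inpainting objective, again assuming a pretrained robust `model`; the weight `lam`, optimizer and step count are illustrative, not the paper's settings.

```python
import torch
import torch.nn.functional as F

def inpaint(model, x, y, mask, lam=10.0, lr=0.01, steps=500):
    """Recover the corrupted region by descending the class loss while anchoring
    the uncorrupted region to the original image.

    x    : corrupted images, shape (N, C, H, W)
    y    : underlying true labels, shape (N,)
    mask : binary mask, shape (N, 1, H, W); 1 marks the corrupted region
    """
    x_hat = x.clone().detach().requires_grad_(True)
    opt = torch.optim.SGD([x_hat], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        class_loss = F.cross_entropy(model(x_hat), y)
        # penalize deviation from the original outside the corrupted region
        consistency = ((x_hat - x) * (1 - mask)).flatten(1).norm(dim=1).mean()
        (class_loss + lam * consistency).backward()
        opt.step()
        with torch.no_grad():
            x_hat.clamp_(0, 1)
    return x_hat.detach()
```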

Image-to-Image Translation

The goal of image-to-image translation is to translate an image from a source to a target domain in a semantic manner.

The key is to (robustly) train a classifier to distinguish between the source and target domain. Conceptually, such a classifier will extract salient characteristics of each domain in order to make accurate predictions. We can then translate an input from the source domain by directly maximizing the predicted score of the target domain.

For overly simple tasks, models might extract little salient information (e.g., by relying on backgrounds instead of objects) in which case our approach would not lead to meaningful translations.

In contrast, our method operates in the unpaired setting, where samples from the source and target domain are provided without an explicit pairing.

A robust discriminator can replace the generator, aha!

Super-Resolution

Super-resolution refers to the task of recovering high-resolution images given their low resolution version. While this goal is underspecified, our aim is to produce a high-resolution image that is consistent with the input and plausible to a human.

They cast super-resolution as the task of accentuating the salient features of low-resolution images, achieved by maximizing the score predicted by a robust classifier trained on the original high-resolution dataset for the underlying classes.

To ensure the structure and high-level content are preserved, they also penalize large deviations from the original low-resolution image. The problem solved here is formulated as

$$\hat{x}_{H} = \arg\min_{\|x' - \uparrow(x_L)\|_2 \le \varepsilon} \mathcal{L}(x', y),$$

where $x_L$ is the low-resolution input and $\uparrow(\cdot)$ denotes the up-sampling operation based on nearest neighbors.

In general, our approach produces high-resolution samples that are substantially sharper, particularly in regions of the image that contain salient class information.
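
A sketch of this procedure, reusing the constrained class-score maximization from the first example; the scale factor and radius are placeholders, and the classifier is assumed to accept images at the upsampled resolution.

```python
import torch
import torch.nn.functional as F

def super_resolve(model, x_low, y, scale=8, eps=15.0):
    """Upsample with nearest neighbors, then sharpen salient class features by
    maximizing the robust classifier's score within an L2 ball around it.

    x_low : low-resolution images, shape (N, C, h, w)
    y     : underlying class labels, shape (N,)
    """
    # nearest-neighbor upsampling; the classifier is assumed to expect this resolution
    x_up = F.interpolate(x_low, scale_factor=scale, mode="nearest")
    return targeted_pgd_l2(model, x_up, y, eps=eps)   # constrained class-score ascent
```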

Interactive Image Manipulation

Sketch-to-image

Recent work has explored building deep learning–based interactive tools for image synthesis and manipulation. For example, GANs have been used to transform simple sketches [CH18; Par+19] into realistic images.

By performing PGD to maximize the probability of a chosen target class, we can use robust models to convert hand-drawn sketches to natural images.

Feature Painting

Generative model–based paint applications often allow the user to control more fine-grained features, as opposed to just the overall class. We now show that we can perform similar feature manipulation through a minor modification to our basic primitive of class score maximization.

Given an image $x$, to add a single feature corresponding to component $f$ of the representation vector $R(x)$ in the region corresponding to a binary mask $m$, they simply apply PGD to solve

$$x_{F} = \arg\max_{x'} \; R(x')_{f} - \lambda \left\|(x - x') \odot (1 - m)\right\|_2,$$

i.e. change the designated region to maximize the designated feature while keeping the rest of the image untouched.
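
A sketch of this feature-painting objective. It assumes access to a hypothetical `model_rep` function returning the penultimate-layer representation; the weight and step count are placeholders.

```python
import torch

def paint_feature(model_rep, x, feature_idx, mask, lam=10.0, lr=0.01, steps=200):
    """Amplify one representation component inside the masked region while
    penalizing changes to the rest of the image.

    model_rep   : function mapping images to penultimate-layer representations
    feature_idx : index f of the representation component to paint
    mask        : binary mask, shape (N, 1, H, W); 1 marks the region to repaint
    """
    x_new = x.clone().detach().requires_grad_(True)
    opt = torch.optim.SGD([x_new], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        feature = model_rep(x_new)[:, feature_idx].mean()
        off_region = ((x_new - x) * (1 - mask)).flatten(1).norm(dim=1).mean()
        (-feature + lam * off_region).backward()   # ascend the feature, stay put elsewhere
        opt.step()
        with torch.no_grad():
            x_new.clamp_(0, 1)
    return x_new.detach()
```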

Inspirations

More broadly, our findings suggest that adversarial robustness might be a property that is desirable beyond security and reliability contexts. Robustness may, in fact, offer a path towards building a more human-aligned machine learning toolkit.

They demonstrate the possibility of using discriminative models for generative tasks. This direction will be promising if the computational expense of obtaining a robust classifier can be reduced.

Adversarial Robustness as a Prior for Learned Representations - 2019

Code: https://git.io/robust-reps

Logan Engstrom, Andrew Ilyas, Shibani Santurkar, Dimitris Tsipras, Brandon Tran, Aleksander Madry. Adversarial Robustness as a Prior for Learned Representations. arXiv preprint 2019. arXiv:1906.00945

In particular, these representations are approximately invertible, while allowing for direct visualization and manipulation of salient input features.

They propose to use the robust optimization framework as a tool to enforce (user-specified) priors on features that models should learn.

They demonstrate that the resulting learned "robust representations" address many of the shortcomings affecting standard learned representations.

Their findings include:

- robust representations are (approximately) invertible out of the box;
- proximity in representation space entails semantic similarity;
- these inversion properties hold even for out-of-distribution inputs, enabling semantic interpolation between arbitrary inputs;
- direct feature visualization works without regularization or post-processing.

Limitations of Standard Representations

The representation $R(x)$ of a given input $x$ is defined as the activations of the penultimate layer of the network, viewed as a map $R: \mathbb{R}^n \to \mathbb{R}^k$ (where usually $k \ll n$).

The prediction of the network can be viewed as the output of a linear classifier on the representation $R(x)$.

They refer to the distance in representation space between two inputs $x_1, x_2$ as the distance between their representations, i.e. $\|R(x_1) - R(x_2)\|_2$.

They find that it's straightforward to construct pairs of images with nearly identical representations yet drastically different content as shown in Figure 2.

The two images are first sampled from the data distribution, i.e. $x_1, x_2 \sim D$, and then one of them is optimized to minimize its distance in representation space to the other, i.e. solving the following problem:

$$x' = \arg\min_{x} \left\|R(x) - R(x_1)\right\|_2, \qquad \text{starting from } x = x_2.$$

Note that if representations truly provided an encoding of any image into high-level features, finding images with similar representations should necessitate finding images with similar high-level features.

This indicates that although the learned features are linearly separable, they are not produced by an injective feature extractor: inputs with drastically different content can be mapped to nearly identical representations.

Another work also points out that excessive invariance brings adversarial vulnerability; the phenomenon demonstrated here arises because the network is too invariant to semantically meaningful information.

Specifically, models trained with objective (3) must be invariant to a set of perturbations $\Delta$. Thus, selecting $\Delta$ to be a set of perturbations that humans are robust to (e.g., small $\ell_2$-norm perturbations) results in models that share more invariances with (and thus are encouraged to use similar features to) human perception.

Properties and Applications of Robust Representations

Robust representations are (approximately) invertible out of the box

They use the same optimization,

$$\hat{x} = \arg\min_{x'} \left\|R(x') - R(x)\right\|_2,$$

which can be viewed as recovering an image that maps to the desired target representation, commonly referred to as representation inversion.

On a robust classifier, the recovered images share similar semantic information, as shown in Figure 3.
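
A minimal sketch of this inversion, assuming the same hypothetical `model_rep` access to penultimate-layer activations; optimizer and step count are placeholders.

```python
import torch

def invert_representation(model_rep, target_rep, x_init, lr=0.1, steps=1000):
    """Find an image whose representation matches a target representation.

    model_rep  : function mapping images to penultimate-layer representations
    target_rep : representation R(x) to invert, shape (N, k)
    x_init     : starting point, e.g. Gaussian noise of the input shape
    """
    x = x_init.clone().detach().requires_grad_(True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        (model_rep(x) - target_rep).norm(dim=1).mean().backward()
        opt.step()
        with torch.no_grad():
            x.clamp_(0, 1)
    return x.detach()
```

For a robust model the recovered image resembles the original; for a standard model it generally does not.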

Representation proximity seems to entail semantic similarity

On the other hand, for robust models, we cannot get close to the target representation while staying close to the source image—this is illustrated quantitatively in Figure 4.

We also find that even when δ is highly constrained (i.e. when we are forced to stay very close to the source image and thus cannot match the representation of the target well), the solution to the inversion problem still displays some salient features of the target image (c.f. Figure 5)

This indicates that robust representations faithfully capture the salient semantic content of their inputs.

Inversion of out-of-distribution inputs

We find that the inversion properties uncovered above hold even for out-of-distribution inputs, demonstrating that robust representations capture general features as opposed to features only relevant for the specific classification task.

This indicates that robust representations are invertible even well beyond the training distribution.

Does it mean that an adversarially trained model loses some domain-specific knowledge?

Interpolation between arbitrary inputs

Note that this ability to consistently invert representations into corresponding inputs also translates into the ability to semantically interpolate between any two inputs.

This is expected but still very surprising.

Direct feature visualization

Optimization-based feature visualization is a common technique for visualizing and understanding the representation function of a given network. It maximizes a specific feature in the representation with respect to the input to obtain insight into the role of the feature in classification.

Given an index $i$ denoting a component of the representation vector, one can use gradient descent to find an input that maximally activates it, i.e.

$$x_{\mathrm{vis}} = \arg\max_{x} \; R(x)_i,$$

where the optimization is started from various starting points $x_0$.

For standard networks, optimizing the objective (5) often yields unsatisfying results

For robust representations, however, we find that easily recognizable high-level features emerge from optimizing objective (5) directly, without any regularization or post-processing.

As a natural consequence, it's possible to use the same optimization to manipulate the feature as shown in Figure 11.
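
A compact sketch of objective (5), again assuming the hypothetical `model_rep` hook; the input shape, optimizer and step count are placeholders.

```python
import torch

def visualize_feature(model_rep, feature_idx, shape=(1, 3, 224, 224), lr=0.1, steps=400):
    """Start from random noise and ascend a single representation component."""
    x = torch.rand(shape, requires_grad=True)      # random starting point x0
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        (-model_rep(x)[:, feature_idx].mean()).backward()
        opt.step()
        with torch.no_grad():
            x.clamp_(0, 1)
    return x.detach()
```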

Inspiration

This paper shows some surprising yet expected properties of robust classifiers. The potential to use a classifier to generate images is very promising, and people will soon try to improve the quality of these generated images.

But I think the essence is the (approximately) bijective behavior of the feature extraction function; perhaps it is possible to enforce bijectivity directly to gain robustness.

Generative Approach for Adversarial Robustness

Analysis by Synthesis (a Bayesian Classifier for MNIST) - 2018

Lukas Schott, Jonas Rauber, Matthias Bethge, Wieland Brendel. Towards the first adversarially robust neural network model on MNIST. arXiv preprint 2018. arXiv:1805.09190

We present a novel robust classification model that performs analysis by synthesis using learned class-conditional data distributions.

The results suggest that our approach yields state-of-the-art robustness on MNIST against $L_0$, $L_2$ and $L_\infty$ perturbations and we demonstrate that most adversarial examples are strongly perturbed towards the perceptual boundary between the original and the adversarial class.

They train one generative model for each of the ten MNIST classes and then use Bayes' rule for classification. The resulting classifier is adversarially robust in a semantic sense.

Analysis by Synthesis

As shown in Figure 1, they train ten class-conditional generative models based on variational auto-encoders, and use them to classify inputs robustly.

The Bayesian classifier is formulated as follows:

Let $(x, y)$ with $x \in \mathbb{R}^n$ and $y \in \{1, \dots, K\}$ be an input-label datum. Instead of directly learning a posterior from inputs to labels, a Bayesian classifier learns the generative distributions $p(x \mid y)$ and classifies new inputs using Bayes' formula, i.e.

$$p(y \mid x) = \frac{p(x \mid y)\, p(y)}{\sum_{y'} p(x \mid y')\, p(y')}.$$

The label distribution $p(y)$ can be estimated from the training data, and they use variational autoencoders (VAEs) to learn the class-conditional sample distributions $p(x \mid y)$.

VAEs estimate the log-likelihood $\log p(x \mid y)$ by learning a probabilistic generative model $p_\theta(x \mid z, y)$ with latent variables $z$ and parameters $\theta$:

$$\log p(x \mid y) \ge \mathbb{E}_{z \sim q_\phi(z \mid x)}\left[\log p_\theta(x \mid z, y)\right] - D_{\mathrm{KL}}\!\left(q_\phi(z \mid x) \,\|\, p(z)\right),$$

where $p(z) = \mathcal{N}(0, \mathbf{1})$ is a simple normal prior and $q_\phi(z \mid x)$ is the variational posterior with parameters $\phi$.

The first term is a reconstruction error and the second term is the KL divergence between the variational posterior and the prior; the gap between the bound and the true log-likelihood is the mismatch between the variational and the true posterior. The right-hand side is the so-called evidence lower bound (ELBO) on the log-likelihood.

The structure of the proposed analysis by synthesis (ABS) model is shown in Figure 1.
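
A minimal sketch of the classification rule, assuming hypothetical per-class `encoders` and `decoders` with the interfaces noted in the comments; the actual ABS model replaces the amortized encoder with optimization over the latent at test time (see the next subsection) and includes further details omitted here.

```python
import torch

def abs_classify(encoders, decoders, log_prior, x, sigma=1.0):
    """Score an input under each class-conditional VAE and apply Bayes' rule.

    encoders, decoders : per-class VAE halves; encoders[c](x) -> (mu, logvar),
                         decoders[c](z)       -> reconstruction mean
    log_prior          : tensor of log p(y), estimated from label frequencies
    Returns log p(y | x), shape (N, K).
    """
    elbos = []
    for enc, dec in zip(encoders, decoders):
        mu, logvar = enc(x)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()    # reparameterization
        recon = dec(z)
        # Gaussian reconstruction log-likelihood (up to an additive constant)
        rec_ll = -((x - recon).flatten(1) ** 2).sum(dim=1) / (2 * sigma ** 2)
        # closed-form KL( q(z|x) || N(0, I) )
        kl = 0.5 * (mu ** 2 + logvar.exp() - 1 - logvar).sum(dim=1)
        elbos.append(rec_ll - kl)                               # ELBO estimate of log p(x|y)
    log_joint = torch.stack(elbos, dim=1) + log_prior           # log p(x|y) + log p(y)
    return log_joint - torch.logsumexp(log_joint, dim=1, keepdim=True)
```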

Tight estimate of the lower bound for adversarial examples

The optimization-based inference can be written as

$$\ell_y^*(x) = \max_z \left[-\frac{1}{2\sigma^2}\left\|x - G_y(z)\right\|_2^2 - \frac{1}{2}\|z\|_2^2\right] + C,$$

where the normalization constants of the Gaussians are absorbed into $C$ and $G_y(z)$ is the mean of $p(x \mid z, y)$.

Let $y^*$ be the ground-truth class and let $z^*$ be the optimal latent for the clean sample $x$ under class $y^*$.

The lower bound on $\ell_{y^*}^*(x + \delta)$ for a perturbation $\delta$ with size $\epsilon = \|\delta\|_2$ can be estimated by

$$\ell_{y^*}^*(x + \delta) \;\ge\; -\frac{1}{2\sigma^2}\left(\left\|x - G_{y^*}(z^*)\right\|_2 + \epsilon\right)^2 - \frac{1}{2}\|z^*\|_2^2 + C.$$

Likewise, the upper bound for all other classes is

$$\ell_{c}^*(x + \delta) \;\le\; \max_z \left[-\frac{1}{2\sigma^2}\left(\max\!\left(0,\, \left\|x - G_c(z)\right\|_2 - \epsilon\right)\right)^2 - \frac{1}{2}\|z\|_2^2\right] + C$$

for $c \ne y^*$.

By equating the lower bound of the true class with the upper bound of any other class and solving for $\epsilon$, one obtains a lower bound on the size of the smallest perturbation that can change the model's decision.

Experiments

As shown by the experiments, ABS is quite robust. Moreover, the adversarial examples found against it exhibit semantic tampering, which violates the imperceptibility requirement of adversarial examples, so it is still reasonable to call the model robust.

The inference cost, however, is impractically high.

Inspirations

Bayesian classifiers suffer from high inference cost, so the approach is not yet practical; however, I think it can be modified to become so.

This paper has received little attention even after two years, and I do not know why. Perhaps variational inference is not a hot topic, and the title restricts the dataset to MNIST, which suggests the approach may not generalize.