Are Adversarial Examples Inevitable?

By LI Haoyang 2020.12.8

Content

Are Adversarial Examples Inevitable? - ICLR 2019
  Notation
  Problem Setup
  Adversarial Examples on the Unit Sphere
  Adversarial Examples on the Unit Cube
  Sparse Adversarial Examples
  Existence of Adversarial Examples
  Can We Escape Fundamental Bounds?
  Experiments
  Are Adversarial Examples Inevitable?
  Inspirations

Are Adversarial Examples Inevitable? - ICLR 2019

Ali Shafahi, W. Ronny Huang, Christoph Studer, Soheil Feizi, Tom Goldstein. Are adversarial examples inevitable? ICLR 2019. arXiv:1809.02104

This paper analyzes adversarial examples from a theoretical perspective, and identifies fundamental bounds on the susceptibility of a classifier to adversarial attacks. We show that, for certain classes of problems, adversarial examples are inescapable.

If the image classes each occupy a sufficiently large fraction of the cube, then images exist that are susceptible to adversarial perturbations of small $\ell_p$-norm.

We present an example image class for which there is no fundamental link between dimensionality and robustness, and argue that the data distribution, and not dimensionality, is the primary cause of adversarial susceptibility.

Notation

The notations they use are as follows:

Problem Setup

Given data points that lie in a space $\Omega$, the classification problem is to classify them into different object classes. These classes are defined by probability density functions $\{\rho_c\}$, where $\rho_c : \Omega \to \mathbb{R}$ is the density of class $c$.

A "random" point from class $c$ is a random variable with density $\rho_c$, which is assumed to be upper bounded by $U_c = \sup_x \rho_c(x)$.

A "classifier" is a function $C$ that partitions $\Omega$ into disjoint measurable subsets, one for each class label.

Definition 1 Consider a point $x$ drawn from class $c$, a scalar $\epsilon \ge 0$, and a metric $d$. We say that $x$ admits an $\epsilon$-adversarial example in the metric $d$ if there exists a point $\hat{x}$ with $C(\hat{x}) \neq c$ and $d(x, \hat{x}) \le \epsilon$.

The $\epsilon$-neighborhood of $x$ expands across the classification boundary.

They consider $\ell_p$-norm adversarial examples and sparse adversarial examples, i.e. $\ell_0$-norm adversarial examples.
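
Definition 1 is easy to operationalize. Below is a minimal sketch of my own (not from the paper): it randomly searches the $\ell_p$ ball of radius $\epsilon$ around $x$ and reports whether it finds a point the classifier labels differently. The linear `toy_classifier` and the search budget are stand-in assumptions.

```python
import numpy as np

def toy_classifier(x, W):
    """Stand-in linear classifier: predicted class = argmax of the class scores."""
    return int(np.argmax(W @ x))

def admits_adversarial(x, label, W, eps, p=2, trials=2000, seed=0):
    """Randomly search the l_p ball of radius eps around x for a point that the
    classifier assigns to a different class (Definition 1). Returning False only
    means the random search failed, not that no adversarial example exists."""
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        delta = rng.normal(size=x.size)
        delta *= eps * rng.uniform() / np.linalg.norm(delta, ord=p)  # pull into the l_p ball
        x_hat = np.clip(x + delta, 0.0, 1.0)                         # stay inside the unit cube
        if np.linalg.norm(x_hat - x, ord=p) <= eps and toy_classifier(x_hat, W) != label:
            return True
    return False

rng = np.random.default_rng(1)
n, num_classes = 100, 3
W = rng.normal(size=(num_classes, n))
x = rng.uniform(size=n)
print(admits_adversarial(x, toy_classifier(x, W), W, eps=0.5, p=2))
```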

Adversarial Examples on the Unit Sphere

The idea is to show that, provided a class of data points takes up enough space, nearly every point in the class lies close to the class boundary.

Definition 2 The $\epsilon$-expansion of a subset $A \subset \Omega$ with respect to a distance metric $d$, denoted $A(\epsilon, d)$, contains all points that are at most $\epsilon$ units away from $A$. To be precise,

$$A(\epsilon, d) = \{x \in \Omega \mid d(x, y) \le \epsilon \ \text{for some } y \in A\}.$$

This "for some" is ambiguous.....

Lemma 1 (Isoperimetric inequality). Consider a subset $A$ of the sphere with normalized measure $\mu(A) \ge 1/2$. When using the geodesic metric, the $\epsilon$-expansion $A(\epsilon)$ is at least as large as the $\epsilon$-expansion of a half sphere.

This lemma indicates that when the error set has normalized measure larger than $1/2$, its $\epsilon$-neighborhood is at least as large as a half sphere expanded by an $\epsilon$-strip.

Lemma 2 ($\epsilon$-expansion of half sphere). The geodesic $\epsilon$-expansion of a half sphere has normalized measure at least

$$1 - \sqrt{\pi/8}\, e^{-\frac{n-2}{2}\epsilon^2}.$$

Lemmas 1 and 2 together can be taken to mean that, if a set is not too small, then in high dimensions almost all points on the sphere are reachable within a short jump from that set.

If it's not smaller than a half sphere, is it really appropriate to say that it's not too small? This is ambiguous.
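
A quick Monte Carlo illustration of my own of what Lemmas 1 and 2 are saying: for the half sphere $A = \{x : x_1 \le 0\}$, the geodesic distance from a uniform random point to $A$ is $\max(0, \arcsin x_1)$, and the fraction of the sphere within $\epsilon$ of $A$ rushes to 1 as the dimension grows.

```python
import numpy as np

def frac_within_eps_of_half_sphere(n, eps, samples=200_000, seed=0):
    """Estimate the normalized measure of the geodesic eps-expansion of the
    half sphere {x in S^{n-1} : x_1 <= 0} by sampling uniform sphere points.
    Only the first coordinate x_1 = g_1 / ||g|| is needed, so sample it directly."""
    rng = np.random.default_rng(seed)
    g1 = rng.normal(size=samples)
    rest_sq = rng.chisquare(df=n - 1, size=samples)   # squared norm of the remaining n-1 Gaussians
    x1 = g1 / np.sqrt(g1 ** 2 + rest_sq)
    geo_dist = np.maximum(0.0, np.arcsin(x1))         # geodesic distance to the half sphere
    return float(np.mean(geo_dist <= eps))

for n in (10, 100, 1_000, 10_000):
    print(n, frac_within_eps_of_half_sphere(n, eps=0.1))
```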

Despite its complex appearance, the result below is a consequence of the (relatively simple) isoperimetric inequality.

Theorem 1 (Existence of Adversarial Examples). Consider a classification problem whose object classes are each distributed over the unit sphere $S^{n-1}$ with density functions $\{\rho_c\}$. Choose a classifier function $C$ that partitions the sphere into measurable subsets. Define the following scalar constants:

Choose some class $c$ with $f_c \le 1/2$, where $f_c$ is the fraction of the sphere that $C$ assigns to class $c$. Sample a random data point $x$ from $\rho_c$. Then with probability at least

one of the following conditions holds:

  1. $x$ is misclassified by $C$, or
  2. $x$ admits an $\epsilon$-adversarial example in the geodesic distance

This theorem states that if the classification region of a class covers less than half of the sphere, then with high probability any data point inside it can be moved outside by a small geodesic step of size $\epsilon$.

Since for any two points $x$ and $\hat{x}$ on a sphere,

$$d_\infty(x, \hat{x}) \le d_2(x, \hat{x}) \le d_g(x, \hat{x}),$$

where $d_\infty$, $d_2$ and $d_g$ denote the $\ell_\infty$, Euclidean and geodesic distances respectively, any $\epsilon$-adversarial example in the geodesic metric is also adversarial in the other two metrics, and the bound in Theorem 1 holds regardless of which of the three metrics we choose.
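
A small numeric sanity check of my own for this chain of inequalities, using random pairs of points on the unit sphere:

```python
import numpy as np

rng = np.random.default_rng(0)
n, pairs = 50, 10_000
x = rng.normal(size=(pairs, n)); x /= np.linalg.norm(x, axis=1, keepdims=True)
y = rng.normal(size=(pairs, n)); y /= np.linalg.norm(y, axis=1, keepdims=True)

d_inf = np.max(np.abs(x - y), axis=1)                       # l_inf distance
d_2 = np.linalg.norm(x - y, axis=1)                         # Euclidean distance
d_geo = np.arccos(np.clip(np.sum(x * y, axis=1), -1, 1))    # geodesic (angular) distance

assert np.all(d_inf <= d_2 + 1e-12) and np.all(d_2 <= d_geo + 1e-12)
print("d_inf <= d_2 <= d_geo holds on all", pairs, "sampled pairs")
```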

Adversarial Examples on the Unit Cube

In a more typical situation, images will be scaled so that their pixels lie in [0, 1], and data lies inside a high-dimensional hypercube (but, unlike the sphere, data is not confined to its surface).

Fortunately, researchers have been able to derive "algebraic" isoperimetric inequalities that provide lower bounds on the size of the $\epsilon$-expansion of sets without identifying the shape that achieves this minimum.

Lemma 3 (Isoperimetric inequality on a cube). Consider a measurable subset $A$ of the cube $[0,1]^n$, and a $p$-norm distance metric $d(x, y) = \|x - y\|_p$. Let $\Phi$ denote the standard normal CDF, and let $\alpha$ be the scalar that satisfies $\Phi(\alpha) = \mathrm{vol}(A)$. Then

where $p^* = \min(p, 2)$. In particular, if $\mathrm{vol}(A) \ge 1/2$, then we simply have

This lemma states that if the subset occupies at least half of the space, its $\epsilon$-expansion occupies nearly the whole space (at least in high dimensions).

Using this result, we can show that most data samples in a cube admit adversarial examples, provided the data distribution is not excessively concentrated.

Theorem 2 (Adversarial examples on the cube). Consider a classification problem whose classes are each distributed over the unit hypercube $[0,1]^n$ with density functions $\{\rho_c\}$. Choose a classifier function $C$ that partitions the hypercube into disjoint measurable subsets. Define the following scalar constants:

Choose some class $c$ with $f_c \le 1/2$, and select an $\ell_p$-norm with $p > 0$. Define $p^* = \min(p, 2)$. Sample a random data point $x$ from the class distribution $\rho_c$. Then with probability at least

one of the following conditions holds:

  1. $x$ is misclassified by $C$, or
  2. $x$ has an adversarial example $\hat{x}$, with $\|x - \hat{x}\|_p \le \epsilon$.

For any $p \ge 2$, the bound becomes

It's because ...

As long as the class distribution is not overly concentrated, this bound guarantees adversarial examples whose $\epsilon$ is "small" relative to the norm of a typical vector.
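
To get a feel for "small relative to a typical vector": a uniform random point in $[0,1]^n$ has $\ell_2$ norm close to $\sqrt{n/3}$, so a constant-size perturbation becomes a vanishing fraction of it as $n$ grows. A quick check of my own (the CIFAR-sized $n = 3072$ is just an example):

```python
import numpy as np

n = 3072                                   # e.g. a CIFAR-10-sized input, chosen only for illustration
rng = np.random.default_rng(0)
x = rng.uniform(size=(2_000, n))           # "typical" vectors: uniform pixels in [0, 1]
typical = np.linalg.norm(x, axis=1).mean()

print("typical ||x||_2 ~", round(typical, 1))          # close to sqrt(n/3) = 32
print("sqrt(n/3)       =", round(np.sqrt(n / 3), 1))
eps = 2.0
print(f"an eps = {eps} perturbation is {100 * eps / typical:.1f}% of a typical norm")
```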

When using the $\ell_\infty$-norm, the bound is still equation 5, but the adversarial examples are only interesting if $\epsilon < 1$, a regime where equation 5 becomes vacuous for large $n$.

I think they want to say that when $n$ is large, equation 5 only approaches 1 as $\epsilon$ approaches 1.

They state that using the tighter bound of equation 2, equation 4 can be replaced with

for , where , and .

For this bound to be meaningful with $\epsilon < 1$, we need $f_c$ to be relatively small. This is realistic for some problems; ImageNet has 1000 classes, and so $f_c \le 1/1000$ for at least one class.

Interestingly, under $\ell_\infty$-norm attacks, guarantees of adversarial examples are much stronger on the sphere (Section 3) than on the cube.

One can construct examples of sets with $\epsilon$-expansions that nearly match the behavior of equation 5, and so our theorems in this case are actually quite tight.

I do not get the sense of the last few paragraphs of this section.

Sparse Adversarial Examples

Sparse adversarial examples refer to adversarial examples under the $\ell_0$ metric, i.e.

$$d(x, \hat{x}) = \|x - \hat{x}\|_0 = \#\{i : x_i \neq \hat{x}_i\}.$$

If a point $x$ has an $\epsilon$-adversarial example in this norm, then it can be perturbed into a different class by modifying at most $\epsilon$ pixels (in this case $\epsilon$ is taken to be a positive integer)
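
As a concrete toy illustration (my own construction, not the paper's attack): for a stand-in linear classifier, setting the $k$ most useful pixels to the end of $[0,1]$ that most increases a target class score is a simple sparse perturbation with $\|\hat{x} - x\|_0 \le k$.

```python
import numpy as np

def sparse_perturb(x, W, target, k):
    """Toy sparse attack on a linear classifier with weights W (rows = classes):
    modify at most k pixels, each set to the end of [0,1] that most increases
    the target-class score relative to the currently predicted class."""
    orig = int(np.argmax(W @ x))
    direction = W[target] - W[orig]                   # gradient of the score gap w.r.t. x
    best_vals = np.where(direction > 0, 1.0, 0.0)     # extreme value that helps the target class
    gain = np.abs(direction) * np.abs(best_vals - x)  # score-gap improvement per pixel
    idx = np.argsort(gain)[-k:]                       # the k most useful pixels
    x_hat = x.copy()
    x_hat[idx] = best_vals[idx]
    return x_hat

rng = np.random.default_rng(0)
n, classes, k = 784, 10, 5
W = rng.normal(size=(classes, n))
x = rng.uniform(size=n)
x_hat = sparse_perturb(x, W, target=(np.argmax(W @ x) + 1) % classes, k=k)
print("pixels changed:", int(np.sum(x_hat != x)), "<=", k)
print("prediction:", int(np.argmax(W @ x)), "->", int(np.argmax(W @ x_hat)))
```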

Theorem 2 is fairly tight for p = 1 or 2. However, the bound becomes quite loose for small p, and in particular it fails completely for the important case of p = 0.

Lemma 4 (Isoperimetric inequality on the cube: small $p$). Consider a measurable subset $A$ of the cube $[0,1]^n$, and a $p$-norm distance metric $d(x, y) = \|x - y\|_p$. We have

Theorem 3 (Sparse adversarial examples). Consider the problem setup of Theorem 2. Choose some class $c$ with $f_c \le 1/2$, and sample a random data point $x$ from the class distribution $\rho_c$. Then with probability at least

one of the following conditions holds:

  1. $x$ is misclassified by $C$, or
  2. $x$ can be adversarially perturbed by modifying at most $\epsilon$ pixels, while still remaining in the unit hypercube.

Using this equation, the probability that a uniformly distributed, MNIST-sized dataset admits a one-pixel adversarial example is more than 0.8.

I think it's misaligned with the experimental results.

Existence of Adversarial Examples

Tighter bounds can be obtained if we only guarantee that adversarial examples exist for some data points in a class, without bounding the probability of this event.

Theorem 4 (Condition for existence of adversarial examples). Consider the setup of Theorem 2. Choose a class $c$ that occupies a fraction $f_c$ of the cube. Pick an $\ell_p$-norm and the corresponding metric $d(x, y) = \|x - y\|_p$.

Let $\mathrm{supp}(\rho_c)$ denote the support of $\rho_c$. Then there is a point $x$ with $\rho_c(x) > 0$ that admits an $\epsilon$-adversarial example if

The bound for the case is valid only if .

From wikipedia:

The support of a distribution is the smallest closed interval/set whose complement has probability zero. It may be understood as the points or elements that are actual members of the distribution.

This theorem states that if the distribution is spread over a large enough support, an adversarial example is guaranteed to exist.

When the $\ell_\infty$-norm is used, the bound becomes

The $\ell_\infty$ diameter of the cube is 1, and the bound becomes active for

It can be seen that the bound is active whenever the size of the support satisfies

Remarkably, for large $n$ this holds whenever the support of class $c$ is larger than (or contains) a hypercube of sufficient side length.

It seems to say that as long as the data of this class is spread over a region larger than a small hypercube, an adversarial example is guaranteed to exist.

It is important to note, however, that the bound being "active" does not mean there are adversarial examples with a "small" $\epsilon$.

Luckily, if we only need to defend against adversarial examples with small $\epsilon$, the problem still seems solvable.

Can We Escape Fundamental Bounds?

There are a number of ways to escape the guarantees of adversarial examples made by Theorems 1-4. One potential escape is for the class density functions to take on extremely large values.

Experiments

It is commonly thought that high-dimensional classifiers are more susceptible to adversarial examples than low-dimensional classifiers. This perception is partially motivated by the observation that classifiers on high-resolution image distributions like ImageNet are more easily fooled than low-resolution classifiers on MNIST.

We will see below that high dimensional distributions may be more concentrated than their low-dimensional counterparts.

The theorem below shows that these fundamental limits do not depend in a non-trivial way on the dimensionality of the images in big MNIST, and so the relationship between dimensionality and susceptibility in Figure 4a results from the weakness of the training process.

Theorem 5 Suppose $\epsilon$ and $p$ are such that, for all MNIST classifiers, a random image from class $c$ has an $\epsilon$-adversarial example (in the $\ell_2$-norm) with probability at least $p$. Then for all classifiers on $b$-MNIST, with integer $b$, a random image from class $c$ has a $b\epsilon$-adversarial example with probability at least $p$.

Likewise, if all $b$-MNIST classifiers have $b\epsilon$-adversarial examples with probability $p$ for some $\epsilon$, then all classifiers on the original MNIST distribution have $\epsilon$-adversarial examples with probability $p$.

They want to state that the vulnerability stems from the data distribution and is only trivially related to dimensionality.

Theorem 5 predicts that the perturbation needed to fool all 56 × 56 classifiers is twice that needed to fool all 28 × 28 classifiers. This is reasonable since the $\ell_2$-norm of a 56 × 56 image is twice that of its 28 × 28 counterpart.
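
This factor of two is easy to sanity-check. A sketch of my own, assuming the 56 × 56 images are built by $b\times$ nearest-neighbor upsampling (each pixel replicated $b^2$ times), which is what the stated norm relation implies; replication multiplies the $\ell_2$-norm by exactly $b$:

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.uniform(size=(28, 28))                 # stand-in for a 28x28 MNIST image

for b in (2, 4):
    big = np.kron(img, np.ones((b, b)))          # b-times nearest-neighbor upsampling
    ratio = np.linalg.norm(big) / np.linalg.norm(img)
    print(f"{28*b}x{28*b}: ||big||_2 / ||img||_2 = {ratio:.3f} (expected {b})")
```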

As shown in Figure 4b:

As predicted by Theorem 5, the 112×112 classifier curve is twice as wide as the 56×56 curve, which in turn is twice as wide as the 28×28 curve. In addition, we see the kind of "phase transition" behavior predicted by Theorem 2, in which the classifier suddenly changes from being highly robust to being highly susceptible as $\epsilon$ passes a critical threshold.

For these reasons, it is reasonable to suspect that the adversarially trained classifiers in Figure 4b are operating near the fundamental limit predicted by Theorem 2.

This is a theoretical backup for adversarial training.

But then why are high-dimensional classifiers so easy to fool?

The smallest possible value of $U_c$ is 1, which only occurs when images are "spread out" with uniform, uncorrelated pixels. In contrast, adjacent pixels in MNIST (and especially big MNIST) are very highly correlated, and images are concentrated near simple, low-dimensional manifolds, resulting in highly concentrated image classes with large $U_c$.

$U_c$ is defined as the supremum of the density function $\rho_c$; how can it be larger than 1? (A density value is not a probability: e.g. a uniform density supported on a subcube of volume $V$ takes the value $1/V > 1$, so concentration drives $U_c$ up.)

As shown in Figure 4c:

We can reduce $U_c$ and dramatically increase susceptibility by choosing a more "spread out" dataset, like CIFAR-10, in which adjacent pixels are less strongly correlated and images appear to concentrate near complex, higher-dimensional manifolds.

The former problem lives in 3136 dimensions, while the latter lives in 3072, and both have 10 classes. Despite the structural similarities between these problems, the decreased concentration of CIFAR-10 results in vastly more susceptibility to attacks, regardless of whether adversarial training is used.
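
One way to make the "adjacent pixels are less strongly correlated" claim concrete is the simple proxy below (my own sketch; the smooth and noisy arrays are synthetic stand-ins, and loading real MNIST/CIFAR-10 images is not shown):

```python
import numpy as np

def adjacent_pixel_correlation(images):
    """Pearson correlation between each pixel and its right-hand neighbor,
    pooled over a batch of grayscale images of shape (N, H, W)."""
    left = images[:, :, :-1].ravel()
    right = images[:, :, 1:].ravel()
    return float(np.corrcoef(left, right)[0, 1])

rng = np.random.default_rng(0)
# Synthetic stand-ins: smooth blobs (MNIST-like) vs. near-i.i.d. noise (more "spread out").
t = np.linspace(-1, 1, 28)
blob = np.exp(-(t[None, :] ** 2 + t[:, None] ** 2) * 8)
smooth = blob[None] * rng.uniform(0.5, 1.0, size=(256, 1, 1))
noisy = rng.uniform(size=(256, 28, 28))

print("smooth images :", round(adjacent_pixel_correlation(smooth), 3))   # close to 1
print("noisy images  :", round(adjacent_pixel_correlation(noisy), 3))    # close to 0
```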

Informally, the concentration limit can be interpreted as a measure of image complexity. Image classes with smaller $U_c$ are likely concentrated near high-dimensional complex manifolds, have more intra-class variation, and thus more apparent complexity.

An informal interpretation of Theorem 2 is that “high complexity” image classes are fundamentally more susceptible to adversarial examples.

The interpretation is reasonable, but the meaning of $U_c$ seems to be ambiguous.

Are Adversarial Examples Inevitable?

The question of whether adversarial examples are inevitable is an ill-posed one. Clearly, any classification problem has a fundamental limit on robustness to adversarial attacks that cannot be escaped by any classifier. However, we have seen that these limits depend not only on fundamental properties of the dataset, but also on the strength of the adversary and the metric used to measure perturbations. This paper provides a characterization of these limits and how they depend on properties of the data distribution. Unfortunately, it is impossible to know the exact properties of real-world image distributions or the resulting fundamental limits of adversarial training for specific datasets. However, the analysis and experiments in this paper suggest that, especially for complex image classes in high-dimensional spaces, these limits may be far worse than our intuition tells us.

Inspirations

They give a comprehensive analysis of the existence of adversarial examples. They point out that the higher adversarial vulnerability of higher-dimensional data stems from the training process, and they use adversarially trained models to demonstrate that, fundamentally, the required adversarial perturbation increases as the dimension increases.

Thus, fundamentally, the adversarial vulnerability stems from data distribution, i.e. the concentration of data points from each class.

This explains why CIFAR-10 is much more difficult than MNIST for adversarial robustness.