Explanation for Robustness and Adversarial Example - 1

By LI Haoyang 2020.10

Explanation for Robustness and Adversarial Example - 1RobustnessRobustness of classifiers: from adversarial to random noise - NIPS 2016Definitions and notationsRobustness of clasifiersAffine classifiersGeneral classifiersCurvature of decision boundaryRobustness to random and semi-random noiseExperimentsConclusionRobustness of classifiers to universal perturbations - ICLR 2018Adversarial vulnerability for any classifier - NIPS 2018InspirationsAdversarially robust generalization requires more data - NIPS 2018Overfitting in CIFAR-10Gaussian modelBernoulli modelExperimentsInspirationsRobustness May Be at Odds with Accuracy - ICLR 2019The price of adversarial robustnessAdversarial robustness may be incompatible with standard accuracyThe importance of adversarial trainingUnexpected benefits of adversarial trainingInspirationsCause of Adversarial ExampleAdversarial spheres - 2018Adversarial examples are not bugs, they are features - NIPS 2019DefinitionsRobust featuresNon-robust featuresTheoretical frameworkInspirationsAdversarial Examples Are a Natural Consequence of Test Error in Noise - ICML 2019MotivationAdversarial and corruption robustnessErrors in Gaussian noise suggest adversarial examplesConcentration of measure for noisy imagesEvaluating corruption robustnessConclusionInspirations

Robustness

This direction germinates from the robustness analysis of machine learning algorithms, which is a domain with a long history.

Robustness of classifiers: from adversarial to random noise - NIPS 2016

Paper: https://dl.acm.org/doi/abs/10.5555/3157096.3157279

Alhussein Fawzi, Seyed-Mohsen Moosavi-Dezfooli, and Pascal Frossard. 2016. Robustness of classifiers: from adversarial to random noise. In Proceedings of the 30th International Conference on Neural Information Processing Systems (NIPS'16). Curran Associates Inc., Red Hook, NY, USA, 1632–1640. arXiv:1608.08967

$\sqrt{d}$ times the distance from the datapoint to the classification boundary (where d denotes the dimension of the data) provided the curvature of the decision boundary is sufficiently small.

$\sqrt{d/m}$ $m$ $m$ $d$ , it is still possible to find small perturbations that cause data misclassification.

Definitions and notations

$L$ $f:\Bbb{R}^d\to \Bbb{R}^L$ $x_0\in \Bbb{R}^d$ $\hat{k}(x_0)=\mathop{\arg\max}_kf_k(x_0)$ $\cal{S}$ $\Bbb{R}^d$ $m$ .

$r_\cal{S}^*$ $\cal{S}$ $f$ $x_0$ :

r_{\mathcal{S}}^*(x_0)=\mathop{\arg\min}\limits_{r\in\mathcal{S}}||r||_2\ s.t.\ \hat{k}(x_0+r)\ne \hat{k}(x_0)

It can also be equivalently written as:

r_{\mathcal{S}}^*(x_0)=\mathop{\arg\min}\limits_{r\in\mathcal{S}}||r||_2\ s.t.\ \exist k\ne \hat{k}(x_0):f_k(x_0+k)\ge f_{\hat{k}(x_0)}(x_0+r)

$\cal{S}=\Bbb{R}^d$ $r^*(x_0)=r_{\Bbb{R}^d}^*(x_0)$ is the generally defined adversarial perturbation.

$f$ $x_0$ $\cal{S}$ $||r_{\cal{S}}^*(x_0)||_2$ $\cal{S}$ corresponds to study in different regimes:

Random noise regime
The robustness quantity in this regime is defined as:
$\min_t|t|\ s.t.\ \exist k\ne \hat{k}(x_0),f_k(x_0+tv)\ge f_{\hat{k}(x_0)}(x_0+tv)$
$t$ $\cal{S}$ $v$ $\Bbb{S}^{d-1}$ $d$ dimensions.
Semi-random noise regime
$\cal{S}$ $m$ $m$ is one, it reduces to random noise regime.

$v$ $t$ $m$ .

Robustness of clasifiers

Affine classifiers

$f$ is an affine classifier, i.e.:

f(x)=W^Tx+b,\\ W=[w_1,\dots,w_L],b\in \Bbb{R}^L

Theorem 1 $\delta>0$ $\cal{S}$ $m$ $\Bbb{R}^d$ $f$ $L$ -class affine classifier. Let

\zeta_1(m,\delta)=\left(1+2\sqrt{\frac{\text{ln}(1/\delta)}{m}}+\frac{2\text{ln}(1/\delta)}{m}\right)^{-1},\\ \zeta_2(m,\delta)=\left(\max\left((1/e)\delta^{2/m},1-\sqrt{2(1-\delta^{2/m})}\right)\right)^{-1}

$||r_{\cal{S}}^*||2$ $||r^*||_2$ :

\sqrt{\zeta_1(m,\delta)}\sqrt{\frac{d}{m}}||r^*||_2\le||r_{\cal{S}}^*||_2\le\sqrt{\zeta_2(m,\delta)}\sqrt{\frac{d}{m}}||r^*||_2

$1-2(L+1)\delta$ .

$||r^*||_2$ $\sqrt{d/m}$ .

$||r^*||_2$ is small).

$m$ $m\ge 250$ $||r_{\cal{S}}^*||_2\approx\sqrt{d/m}||r^*||_2$ $\zeta_1(m,\delta)\approx\zeta_2(m,\delta)\approx 1$ $m$ .

General classifiers

Curvature of decision boundary

$\scr{B}_{i,j}$ $i$ $j$ is:

\mathscr{B}_{i,j}=\{x\in \Bbb{R}^d:f_i(x)-f_j(x)=0\}

$\cal{R}_i$ $\cal{R}_j$ $i$ $j$ :

\cal{R}_i=\{x\in\Bbb{R}^d:f_i(x)>f_j(x)\},\\ \cal{R}_j=\{x\in\Bbb{R}^d:f_j(x)>f_i(x)\}

$p\in \scr{B}_{i,j}$ $q_{i||j}(p)$ $\cal{R}_i$ $\scr{B}_{i,j}$ $p$ , i.e.:

q_{i||j}(p)=\sup_{z\in \Bbb{R}^d}\{||z-p||_2:\Bbb{B}(z,||z-p||_2)\sube \cal{R}_i\}

$\Bbb{B}(z,||z-p||_2)$ $\Bbb{R}^d$ $z$ $||z-p||_2$ .

$p$ on the boundary, while the ball itself falls completely in one side of the boundary.

$q_{i,j}(p)$ is defined to be the radius of the worst-case ball inscribed in any of the two regions:

q_{i,j}(p)=\min(q_{i||j}(p),q_{j||i}(p))

$p$ to the entire boundary, the global curvature is defined as the worst case radius over the entire boundary :

q(\mathscr{B}_{i,j})=\inf_{p\in\mathscr{B}_{i,j}}q_{i,j}(p),\\ \kappa(\mathscr{B}_{i,j})=\frac{1}{q(\mathscr{B}_{i,j})}

$\kappa(\scr{B}_{i,j})$ $p$ on the decision boundary.

$\kappa(\scr{B}_{i,j})=0$ , as it's possible to inscribe balls of infinite radius inside each region of the space.

$\kappa(\scr{B}_{i,j})$ provides an intuitive way of describing the nonlinearity of the decision boundary by fitting balls inside the classification regions.

Robustness to random and semi-random noise

$\hat{k}$ $k\in\{1,\dots,L\}\backslash\{\hat{k}\}$ $\scr{B}_k:=\scr{B}_{k,\hat{k}}$ $k$ $\hat{k}$ , the semi-random robustness and adversarial robustness defined above can be re-written as:

r_{\cal{S}}^k=\mathop{\arg\min}\limits_{r\in\cal{S}}||r||_2\ s.t.\ f_k(x_0+r)\ge f_{\hat{k}}(x_0+r),\\ r^k=\mathop{\arg\min}_{r}||r||_2\ s.t.\ f_k(x_0+r)\ge f_{\hat{k}}(x_0+r)

Theorem 2 $\cal{S}$ $m$ $\Bbb{R}^d$ $\kappa:=\kappa(\scr{B}_k)$ . Assuming that the curvature satisfies

\kappa\le \frac{C}{\zeta_2(m,\delta)||r^k||_2}\frac{m}{d}

$||r_{\cal{S}}^k||_2$ $||r^k||_2$ :

\left(1-C_1||r^k||_2\kappa\zeta_2(m,\delta)\frac{d}{m}\right)\sqrt{\zeta_1(m,\delta)}\sqrt{\frac{d}{m}}\le\frac{||r_{\cal{S}}^k||_2}{||r^k||_2}\le \left(1+C_2||r^k||_2\kappa\zeta_2(m,\delta)\frac{d}{m}\right)\sqrt{\zeta_2(m,\delta)}\sqrt{\frac{d}{m}}

$1-4\delta$ $C=0.2$ $C_1=0.625$ $C_2=2.25$ .

$\kappa(\scr{B}_k)$ is sufficiently small.

$A$ $k$ $||r^k||_2$ is large:

A=\{k:||r^k||_2\ge1.45\sqrt{\zeta_2(m,\delta)}\sqrt{\frac{d}{m}}||r^*||_2\}

$\hat{k}$ .

$||r_{\cal{S}}^*||_2$ $||r^*||_2$ .

Corollary 1 $\cal{S}$ $m$ $\Bbb{R}^d$ $k\notin A$ , we have

\kappa(\mathscr{B}_k)||r^k||_2\le \frac{0.2}{\zeta_2(m,\delta)}\frac{m}{d}

Then, we have

0.875 \sqrt{\zeta_{1}(m, \delta)} \sqrt{\frac{d}{m}}\left\|\boldsymbol{r}^{*}\right\|_{2} \leq\left\|\boldsymbol{r}_{\mathcal{S}}^{*}\right\|_{2} \leq 1.45 \sqrt{\zeta_{2}(m, \delta)} \sqrt{\frac{d}{m}}\left\|\boldsymbol{r}^{*}\right\|_{2}

$1-4(L+2)\delta$ .

$||r_{\cal{S}}^*||_2$ $||r^*||_2$ $\sqrt{d/m}$ .

$m=1$ $\sqrt{d}$ , and shows that in high dimensional classification problems, classifiers with sufficiently flat boundaries are much more robust to random noise than to adversarial noise.

$m$ ).

Experiments

$\beta(f;m)$ is defined as:

\beta(f ; m)=\sqrt{m / d} \frac{1}{|\mathscr{D}|} \sum_{\boldsymbol{x} \in \mathscr{D}} \frac{\left\|\boldsymbol{r}_{\mathcal{S}}^{*}(\boldsymbol{x})\right\|_{2}}{\left\|\boldsymbol{r}^{*}(\boldsymbol{x})\right\|_{2}}

$\scr{D}$ $\cal{S}$ $\boldsymbol{x}$ .

$\sqrt{d/m}||r^*(x)||_2$ $m$ .

$\beta$ $1$ $m$ $d$ . This shows that our quantitative analysis provide very accurate estimates of the robustness to semi-random noise.

$\kappa(\scr{B}_k)$ , as this is a key assumption of our theoretical findings.

This shows that imperceptibly small structured messages can be added to an image causing data misclassification.

Conclusion

Our results show, in particular, that when the decision boundary has a small curvature, classifiers are robust to random noise in high dimensional classification problems (even if the robustness to adversarial perturbations is relatively small).
Moreover, for semi-random noise that is mostly random and only mildly adversarial (i.e., the subspace dimension is small), our results show that state-of-the-art classifiers remain vulnerable to such perturbations.
To improve the robustness to semi-random noise, our analysis encourages to impose geometric constraints on the curvature of the decision boundary, as we have shown the existence of an intimate relation between the robustness of classifiers and the curvature of the decision boundary.

Robustness of classifiers to universal perturbations - ICLR 2018

Paper: https://openreview.net/forum?id=ByrZyglCb

Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, Omar Fawzi, Pascal Frossard, Stefano Soatto. Robustness of Classifiers to Universal Perturbations: A Geometric Perspective. ICLR 2018.

Adversarial vulnerability for any classifier - NIPS 2018

Paper: http://papers.nips.cc/paper/7394-adversarial-vulnerability-for-any-classifier

Alhussein Fawzi, Hamza Fawzi, Omar Fawzi. Adversarial vulnerability for any classifier. NIPS 2018.

In this paper, we study the phenomenon of adversarial perturbations under the assumption that the data is generated with a smooth generative model. We derive fundamental upper bounds on the robustness to perturbations of any classification function, and prove the existence of adversarial perturbations that transfer well across different classifiers with small risk.

For details, see Adversarial Vulnerability.

Inspirations

From the general idea, this paper suggests the following potentials

Higher-dimensional data and more classes introduce less robustness.
Unconstrained robustness (general robustness) can be increased by increasing in-distribution robustness (It seems to suggest that an accurate model is also important in terms of robustness?)
A classifier is more robust if the boundary in latent space is more linear.

Adversarially robust generalization requires more data - NIPS 2018

Ludwig Schmidt, Shibani Santurkar, Dimitris Tsipras, Kunal Talwar, Aleksander Madry. Adversarially Robust Generalization Requires More Data. NIPS 2018. arXiv:1804.11285

We show that already in a simple natural data model, the sample complexity of robust learning can be significantly larger than that of “standard” learning.

As hinted in the title, they show that adversarially robust generalization requires more data. They address the following question:

How does the sample complexity of standard generalization compare to that of adversarially robust generalization?

Put differently, we ask if a dataset that allows for learning a good classifier also suffices for learning a robust one.

Overfitting in CIFAR-10

However, CIFAR10 offers a different picture. Here, the model (a wide residual network [61]) is still able to fully fit the training set even against an adversary, but the generalization gap is significantly larger.

As shown in Figure 1, on MNIST, adversarial training can achieve both high robust accuracy in training and test set while on CIFAR-10, there appears overfitting in adversarial accuracy.

Gaussian model

The Gaussian model is acquired by the following steps:

$\theta^*\in\R^d$ $\sigma>0$ $(\theta^*,\sigma)$ $(x,y)\in\R^d\times\{+1,-1\}$ .

$y\in\{+1,-1\}$ uniformly at random.
$x\in\R^d$ $\mathcal{N}(y\cdot\theta^*,\sigma^2I)$ .

$\theta^*$ $\sqrt{d}$ $\sigma^2$ , which controls the amount of overlap between the two classes.

The Classification error is defined as:

$\mathcal{P}:\R^d\times\{+1,-1\}\to\R$ $\beta$ $f:\R^d\to\{+1,-1\}$ is defined as

\beta=\Bbb{P}_{(x,y)\sim\mathcal{P}}[f(x)\neq y]

The Robust classification error is defined as:

$\mathcal{P}:\R^d\times\{+1,-1\}\to\R$ $\mathcal{B}:\R^d\to\mathscr{P}(\R^d)$ $\mathcal{B}$ $\beta$ $f:\R^d\to\{+1,-1\}$ is defined as

\beta=\Bbb{P}_{(x,y)\sim\mathcal{P}}[\exist x^\prime\in\mathcal{B}(x:f(x^\prime)\neq y)]

$\ell_{\infty}$ -robustness, i.e.

\mathcal{B}_{\infty}^{\epsilon}(x)=\{x^\prime\in\R^d|\ ||x^\prime-x||_{\infty}\le\varepsilon\}

Using a linear classifier defined as:

f_w(x)=\text{sgn}(\left<w,x\right>):\R^d\to\{+1,-1\}

They prove the following theorem:

Theorem 4 $(x,y)$ $(\theta^*,\sigma)$ $||\theta^*||_2=\sqrt{d}$ $\sigma\le c\cdot d^{1/4}$ $c$ $\hat{w}\in\R^d$ $\hat{w}=y\cdot x$ $f_{\hat{w}}$ has classification error at most 1%.

To minimize the number of parameters in our bounds, we have set the error probability to 1%.

This theorem states that as long as the variance is small enough, a simple linear classifier they consider can perform very well.

$\ell_{\infty}$ -robust classification error requires significantly more samples.

Theorem 5 $(x_1,y_1),\dots,(x_n,y_n)$ $(\theta^*,\sigma)$ $||\theta^*||_2=\sqrt{d}$ $\sigma\le c_1\cdot d^{1/4}$ $\hat{w}\in\R^d$ $\hat{w}=\frac{1}{n}\sum_{i=1}^n y_i x_i$ $f_{\hat{w}}$ $\ell_{\infty}^{\epsilon}$ -robust classification error at most 1% if

n\ge\begin{cases} 1&\text{for }\epsilon\le\frac{1}{4}d^{-1/4}\\ c_2\epsilon^2\sqrt{d}&\text{for }\frac{1}{4}d^{-1/4}\le\epsilon\le\frac{1}{4} \end{cases}

$c_1$ $c_2$ are two universal constants.

$\ell_{\infty}^\epsilon$ $\epsilon$ is bounded by a small constant and we have a large number of samples.

Theorem 6 $g_n$ $n$ $f_n$ $\sigma=c_1\cdot d^{1/4}$ $\epsilon\ge 0$ $\theta\in\R^d$ $\mathcal{N}(0,I)$ $n$ $(\theta,\sigma)$ $\ell_{\infty}^\epsilon$ $f_n$ $(1-1/d)\frac{1}{2}$ (typo?) if

n\le c_2\frac{\epsilon^2\sqrt{d}}{\log d}

Comparing Theorem 5Theorem 6 $n$ required for robust generalization is bounded as

c\frac{\epsilon^2\sqrt{d}}{\log d}\le n\le c^\prime\epsilon^2\sqrt{d}

This shows that for high-dimensional problems, adversarial robustness can provably require a significantly larger number of samples.

As a result, the lower bound provides transferable adversarial examples and applies to worst-case distribution shifts without a classifier-adaptive adversary.

Bernoulli model

When sampling a datapoint for a given class, we flip each bit of the corresponding class vertex with a certain probability. This data model is inspired by the MNIST dataset because MNIST images are close to binary (many pixels are almost fully black or white).

The Bernoulli model is acquired as follows

$\theta^*\in\{+1,-1\}^d$ $\tau >0$ $(\theta^*,\tau)$ $(x,y)\in\{+1,-1\}^d\times\{+1,-1\}$ :

$y\in\{+1,-1\}$ uniformly at random from its domain.
$x\in\{+1,-1\}^d$ $x_i$ from the distribution
$x_i=\begin{cases} y\cdot \theta^*_i&\text{ with probability }1/2+\tau\\ -y\cdot\theta^*_i&\text{ with probability }1/2-\tau \end{cases}$

$\tau$ makes the samples less correlated with their respective class vectors, leading to a harder classification problem.

This model is similar to that of MNIST, since most pixels in a sample of MNIST are either near 0 or near 1.

Similarly, in standard generalization, there is

Theorem 8 $(x,y)$ $(\theta^*,\tau)$ $\tau\ge c\cdot d^{-1/4}$ $c$ $\hat{w}\in\R^d$ $\hat{w}=y\cdot x$ $f_{\hat{w}}$ has classification error at most 1%.

As well as in robust generalization, there is

Theorem 9 $g_n$ $n$ $f_n$ $\theta^*$ $\{+1,-1\}^d$ $n$ $(\theta^*,\tau)$ $\tau=c_1\cdot d^{-1/4}$ $\epsilon<3\tau$ $0<\gamma<1/2$ $\ell_{\infty}^\epsilon$ $f_n$ $\frac{1}{2}-\gamma$ if

n\le c_2\frac{\epsilon^2\gamma^2 d}{\log d/\gamma}

They then show that non-linear classifiers can achieve a significantly improved robustness in this setting.

$T:\R^d\to\R^d$ which is defined element-wise as

T(x)_i=\begin{cases} +1&\text{ if }x_i\ge 0\\ -1&\text{ otherwise} \end{cases}

$\epsilon< 1$ $\ell_{\infty}$ $T(\mathcal{B}_{\infty}^\epsilon(x))=\{x\},\forall x\in\{+1,-1\}^d$ .

Hence the following theorem:

Theorem 10 $(x,y)$ $(\theta^*,\tau)$ $\tau\ge c\cdot d^{-1/4}$ $c$ $\hat{w}\in\R^d$ $\hat{w}=yx$ $f_{\hat{w}}\circ T$ $\ell_{\infty}^\epsilon$ $\epsilon<1$ .

A binarization can defend well in this setting.

This discrepancy provides evidence that robust generalization requires a more nuanced understanding of the data distribution than standard generalization.

Experiments

We consider standard convolutional neural networks and train models on datasets of varying complexity. Specifically, we study the MNIST [34], CIFAR-10 [33], and SVHN [40] datasets.

We use a simple convolutional architecture for MNIST, a standard ResNet model [23] for CIFAR-10, and a wider ResNet [61] for SVHN.

The plots demonstrate the need for more data to achieve adversarially robust generalization. For any fixed test set accuracy, the number of samples needed is significantly higher for robust generalization.

$\ell_{\infty}$ -adversary on MNIST.

$ReLU(x-\epsilon_{filter})$ $ReLU(1-x-\epsilon_{filter})$ .

As predicted by our theory, the networks achieve good adversarially robust generalization with significantly fewer samples when thresholding filters are added.

It is also worth noting that the thresholding filters could have been learned by the original network architecture, and that this modification only decreases the capacity of the model.

Our findings emphasize network architecture as a crucial factor for learning adversarially robust networks from a limited number of samples.

We also experimented with thresholding filters on the CIFAR10 dataset, but did not observe any significant difference from the standard architecture.

Inspirations

Theoretically and empirically, this paper demonstrates that robust generalization requires more data, and by the way show the difference between MNIST and CIFAR-10.

It's interesting that a simple thresholding can improve the robustness of classifier on MNIST. Although they fail to do the same on CIFAR-10, I think it's worth further exploration.

Robustness May Be at Odds with Accuracy - ICLR 2019

Dimitris Tsipras, Shibani Santurkar, Logan Engstrom, Alexander Turner, Aleksander Madry. Robustness May Be at Odds with Accuracy. ICLR 2019. arXiv:1805.12152

Specifically, training robust models may not only be more resource-consuming, but also lead to a reduction of standard accuracy.

The representations learned by robust models tend to align better with salient data characteristics and human perception.

They try to answer the following question:

Why does there seem to be a trade-off between standard and adversarially robust accuracy?

The price of adversarial robustness

In canonical classification setting, the goal is to train models that have low expected loss (also known as population risk):

\Bbb{E}_{(x,y)\sim\mathcal{D}}[\mathcal{L}(x,y;\theta)]

For adversarial robustness, the goal is to train models with low expected adversarial loss:

\Bbb{E}_{(x,y)\sim\mathcal{D}}\left[\max_{\delta\in\Delta}\mathcal{L}(x+\delta,y;\theta)\right]

$\Delta$ $l_p$ $\Delta=\{\delta\in\R^d|||\delta||_p\le\varepsilon\}$ ).

Adversarial training incorporates the two goals above, and aims at the following problem:

\min_{\theta}\Bbb{E}_{(x,y)\sim\tilde{\mathcal{D}}}\left[\max_{\delta\in\mathcal{S}}\mathcal{L}(x+\delta,y;\theta)\right]

However, the robustness comes with a drop in accuracy and an increase in the training time.

Our point of start is a popular view of adversarial training as the “ultimate” form of data augmentation.

$\delta$ corresponds to augmenting the training data in the “most confusing” and thus also “most helpful” manner.

But as shown in experiments, in most cases, adversarial training has a side effect to standard accuracy.

Adversarial robustness may be incompatible with standard accuracy

$(x,y)$ $\mathcal{D}$

y\mathop{\sim}\limits^{u.a.r}\{-1,+1\}, x_1=\begin{cases} +y,\ w.p.\ p\\ -y,\ w.p.\ 1-p^{'} \end{cases},\ x_2,\dots,x_{d+1}\mathop{\sim}^{i.i.d}\mathcal{N}(\eta y,1)

$\mathcal{N}(\mu,\sigma^2)$ $\mu$ $\sigma^2$ $p\ge 0.5$ $\eta$ $\eta=\Theta(1/\sqrt{d})$ ) such that a simple classifier attains high standard accuracy (>99%).

$p$ $x_1$ $p=0.95$ for simplicity).

In short, they construct a dataset of two high dimensional Gaussian spheres but with one dimension correlated with the label.

A standard classification of the dataset is easy, e.g. a natural linear classifier

f_{avg}(x):=\text{sign}(w^\top_{unif}x),w_{unif}:=\left[0,\frac{1}{d},\cdots,\frac{1}{d}\right]

$d$ .

The accuracy, i.e.

\operatorname{Pr}\left[f_{\text {avg }}(x)=y\right]=\operatorname{Pr}\left[\operatorname{sign}\left(w_{\text {unif }} x\right)=y\right]=\operatorname{Pr}\left[\frac{y}{d} \sum_{i=1}^{d} \mathcal{N}(\eta y, 1)>0\right]=\operatorname{Pr}\left[\mathcal{N}\left(\eta, \frac{1}{d}\right)>0\right]

$\eta\ge 3/\sqrt{d}$ .

The standard classifier can utilize any feature that is even slightly correlated with the label.

But for adversarially trained model, this analogy breaks.

$\varepsilon=2\eta$ $-y$ .

$x^{'}$ $x^{'}_2,\dots,x_{d+1}^{'}$ $\mathcal{N}(-\eta y,1)$ .

Now, the accuracy achieved based on adversarial perturbed features is no better than 1% (reversed):

\min _{\|\delta\|_{\infty} \leq \varepsilon} \operatorname{Pr}[\operatorname{sign}(x+\delta)=y] \leq \operatorname{Pr}\left[\mathcal{N}\left(\eta, \frac{1}{d}\right)-\varepsilon>0\right]=\operatorname{Pr}\left[\mathcal{N}\left(-\eta, \frac{1}{d}\right)>0\right]

$x_1$ $x_2,\dots,x_{d+1}$ ) that arises in the adversarial setting.

Theorem 2.1 $1-\delta$ $\mathcal{D}$ $\frac{p}{1-p}\delta$ $l_{\infty}$ $\varepsilon\ge2\eta$ .

In short, a very strong adversary used in adversarial training will poison the accuracy.

Here, the trade-off between standard and adversarial accuracy is an inherent trait of the data distribution itself and not due to having insufficient samples.

Note, however, that humans can have lower accuracy in certain benchmarks compared to ML models (Karpathy, 2011, 2014; He et al., 2015; Gastaldi, 2017) potentially because ML models rely on brittle features that humans themselves are naturally invariant to.

The importance of adversarial training

Theorem 2.2 $\eta\ge 4/\sqrt{d}$ $p\le 0.975$ $>99\%$ $<1\%$ $l_{\infty}$ $\varepsilon\ge 2\eta$ $p$ $\varepsilon<1$ .

This theorem shows that if our focus is on robust models, adversarial training is necessary to achieve non-trivial adversarial accuracy in this setting.

An interesting implication of our analysis is that standard training produces classifiers that rely on features that are weakly correlated with the correct label. (leading to the transferability of adversarial examples)

Further, we find that it is possible to obtain a robust classifier by directly training a standard model using only features that are relatively well-correlated with the label (without adversarial training).

Unexpected benefits of adversarial training

A model that achieves small loss for all perturbations in the set ∆, will necessarily have learned representations that are invariant to such perturbations. Thus, robust training can be viewed as a method to embed certain invariances in a model.

The adversarially trained model has a loss gradient better aligned with human perceptron.

$\Delta$ , adversarial training alone might be sufficient to yield more human-aligned gradients.

The adversarial examples generated for robust models appear to exhibit salient data charateristics. And consiquently

By linearly interpolating between the original image and the image produced by PGD we can produce a smooth, “perceptually plausible” interpolation between classes (Figure 4) .

Inspirations

This is truly a fruitful paper, theoretically, it points out the relation between robustness and accuracy, and empirically, it points out the better gradients manifested by robust models. Based on this paper, there are several directions for potential exploratories:

Since the gradients are more interpretable, can one regularize the gradients to substitute adversarial training? (It has been achieved by JARN).
Is it possible to use adversarially trained model it manipulate images instead of GAN?
Theorem 2.1 $p=1-\delta$ $p\ge0.5$ for the theorem to stand, thus the model is equally robust and accurate with accuracy of 50%. It seems to coincide the reported robustness accuracy of C&W attack.
Theorem 2.1 $\varepsilon\ge 2\eta$ will tamper the accuracy when involved in adversarial training, it seem that one can improve adversarial training by restricting the ability of adversary involved. (It has been done partially be Friendly Adversarial Training)

Cause of Adversarial Example

Adversarial spheres - 2018

Justin Gilmer, Luke Metz, Fartash Faghri, Samuel S. Schoenholz, Maithra Raghu, Martin Wattenberg, Ian Goodfellow. The Relationship Between High-Dimensional Geometry and Adversarial Examples. 2018. arXiv:1801.02774v3

We hypothesize that this counterintuitive behavior is a result of the high-dimensional geometry of the data manifold, and explore this hypothesis on a simple high-dimensional dataset.

Adversarial examples are not bugs, they are features - NIPS 2019

Andrew Ilyas, Shibani Santurkar, Dimitris Tsipras, Logan Engstrom, Brandon Tran, and Aleksander Madry. Adversarial examples are not bugs, they are features. In Advances in Neural Information Processing Systems, pages 125–136, 2019. arXiv:1905.02175v3

They claim that

Adversarial vulnerability is a direct result of our models’ sensitivity to well-generalizing features in the data.

In fact, we find that standard ML datasets do admit highly predictive yet imperceptible features.

Since any two models are likely to learn similar non-robust features, perturbations that manipulate such features will apply to both.

To corroborate their claim, they designed two training dataset to show that it's possible to disentangle robust and non-robust features from the standard image classification dataset.

Definitions

They consider a binary classification scenario.

$(x,y)\in\cal{X}\times\{\pm 1\}$ $\cal{D}$ $C:\cal{X}\to\{\pm 1\}$ $y$ $x$ .

feature $\cal{X}$ $\mathcal{F}=\{f:\mathcal{X}\to\Bbb{R}\}$ $\Bbb{E}_{(x,y)\sim\mathcal{D}}[f(x)]=0$ $\Bbb{E}_{(x,y)\sim\mathcal{D}}[f(x)^2]=1$ .

The features are categorized in the following manner:

$\rho$ -useful features
$\cal{D}$ $f$ $\rho$ useful $\rho>0$ ) if it is correlated with the true label in experience, i.e.
$\Bbb{E}_{(x,y)\sim\mathcal{D}}[y\cdot f(x)]\ge \rho$
$\rho$ $f$ $\rho$ $\cal{D}$ $\rho_{\cal{D}}(f)$ .
$\gamma$ -robustly useful features
$\rho$ $f$ $f$ $\gamma$ $\delta\in \Delta(x)$ , i.e.
$\Bbb{E}_{(x,y)\sim\cal{D}}\left[\inf_{\delta\in\Delta(x)}y\cdot f(x+\delta)\right]\ge \gamma$
Useful, non-robust features
useful, non-robust feature $\rho$ $\rho$ $\gamma$ $\gamma\ge 0$ .

In short, a feature that still correlates with the corresponding label under permitted adversarial perturbations is defined as a robust feature here.

$x$ $C=(F,w,b)$ $y$ as

C(x)=\text{sgn}\left(b+\sum_{f\in F}w_f\cdot f(x)\right)

The standard training is performed by minimizing a loss function, e.g.

\Bbb{E}_{(x,y)\sim\mathcal{D}}[\mathcal{L}_{\theta}(x,y)]=-\Bbb{E}_{(x,y)\sim\mathcal{D}}\left[y\cdot\left(b+\sum_{f\in F}w_f\cdot f(x)\right)\right]

$\rho$ -usefulness.

The robust training involves an adversary making any useful but non-robust features anti-correlated with the true label, i.e.

\Bbb{E}_{(x,y)\sim\mathcal{D}}\left[\max_{\delta\in \Delta(x)}\mathcal{L}_{\theta}(x+\delta,y)\right]

The process can be viewed as explicitly preventing the classifier from learning a useful but non-robust combination of features.

Robust features

$C$ $\hat{\cal{D}}_R$ should suffices

\Bbb{E}_{(x,y)\sim\hat{\cal{D}}_R}[f(x)\cdot y]=\begin{cases} \Bbb{E}_{(x,y)\sim\mathcal{D}}[f(x)\cdot y]&if\ f\in F_C\\ 0&otherwise \end{cases}

$F_C$ $C$ .

$x\mapsto x_r$ $\hat{\cal{D}}_R$ by the following optimization:

\min_{x_r}||g(x_r)-g(x)||_2

$g$ $x$ to the representation layer.

$f\notin F_C$ $x_0$ $\cal{D}$ $x$ $f(x_0)\cdot y\simeq 0$ )

$C$ $\hat{\cal{D}}_{NR}$ .

As shown by Figure 2

By filtering out non-robust features from the dataset (e.g. by restricting the set of available features to those used by a robust model), one can train a robust model using standard training.

Non-robust features

$x$ $(x,y)$ $x_{adv}$ $t$ , i.e.

x_{adv}=\mathop{\arg\min}_{||x^{'}-x||\le \epsilon}L_{C}(x^{'},t)

$(x_{adv},t)$ $\hat{\cal{D}}_{rand}$ such that

\Bbb{E}_{(x,y)\sim\hat{\cal{D}}_{rand}}[y\cdot f(x)]=\begin{cases} > 0&\text{if }f\text{ non-robustly useful under }\mathcal{D},\\ =0 & \text{otherwise} \end{cases}

$t$ $t$ $y$ .

$t$ $y$ $\hat{\cal{D}}_{det}$ such that

\mathbb{E}_{(x, y) \sim \widehat{\mathcal{D}}_{\text {det}}}[y \cdot f(x)]\left\{\begin{array}{ll} >0 & \text { if } f \text { non-robustly useful under } \mathcal{D} \\ <0 & \text { if } f \text { robustly useful under } \mathcal{D} \\ \in \mathbb{R} & \text { otherwise }(f \text { not useful under } \mathcal{D}) \end{array}\right.

Models trained on these dataset show a surprisingly generalization ability.

Given that such features are inherent to the data distribution, different classifiers trained on independent samples from that distribution are likely to utilize similar non-robust features. Consequently, an adversarial example constructed by exploiting the non-robust features learned by one classifier will transfer to any other classifier utilizing these features in a similar manner.

Theoretical framework

They study a simple problem of maximum likelihood classification between two Gaussian distributions.

$(x,y)$ $\cal{D}$ according to

y\stackrel{u.a.x.}{\sim}\{-1,+1\},x\sim\mathcal{N}(y,\mu_*,\Sigma_*)

$\Theta=(\mu,\Sigma)$ such that

\Theta=\arg\min\limits_{\mu,\Sigma}\Bbb{E}_{(x,y)\sim\mathcal{D}}[l(x;y\cdot \mu,\Sigma)]

$l(x;\mu,\Sigma)$ represents the Gaussian negative log-likelihood (NLL) function, i.e. (?)

l(x;\mu,\Sigma)=x^T\Sigma^{-1}\mu

$\mu,\Sigma$ which maximize the likelihood of the sampled data under the given model.

$x$ $y$ is calculated by

y=\arg\max_{y}l(x;y\cdot\mu,\Sigma)=\text{sign}(x^T\Sigma^{-1}\mu)

The robust analogue of this problem (avdersarial training) is

\Theta_r=\arg\min_{\mu,\Sigma}\Bbb{E}_{(x,y)\sim\mathcal{D}}\left[\max_{||\delta||_2\le\varepsilon}l(x+\delta;y\cdot\mu,\Sigma)\right]

Theorem 1 (Adversarial vulnerability from misalignment). Consider an adversary whose perturbation is determined by the "Lagrangian penalty" form, i.e.

\max_{\delta}l(x+\delta;y\cdot\mu,\Sigma)-C\cdot||\delta||_2

$C\ge \frac{1}{\sigma_{min}(\Sigma_*)}$ $\cal{L}_{adv}$ $(\mu,\Sigma)$ is given by

\mathcal{L}_{adv}(\Theta)-\mathcal{L}(\Theta)=tr\left[(1+(C\cdot\Sigma_*-1)^{-1})^2\right]-d

$tr(\Sigma_*)=k$ $\Sigma_*=\frac{k}{d}I$ .

Theorem 2Robustly learned parameters $\mu_r=\mu^*$ $\Sigma_r$ $\varepsilon_0>0$ $\varepsilon\in[0,\varepsilon_0)$ ,

\Sigma_r=\frac{1}{2}\Sigma_*+\frac{1}{\lambda}\cdot I+\sqrt{\frac{1}{\lambda}\cdot\Sigma_*+\frac{1}{4}\Sigma_*^2},\\ \Omega\left(\frac{1+\varepsilon^{1/2}}{\varepsilon^{1/2}+\varepsilon^{3/2}}\right)\le\lambda\le O\left(\frac{1+\varepsilon^{1/2}}{\varepsilon^{1/2}}\right)

Theorem 3Gradient alignment $f(x)$ $f_r(x)$ $l_2$ -bounded maximum likelihood classification, respectively. The maximum angle formed between the gradient of the classifier (w.r.t input) and the vector connecting the classes can be smaller for the robust model, i.e.

\min_{\mu}\frac{\left<\mu,\nabla_xf_r(x)\right>}{||\mu||\cdot||\nabla_xf_r(x)||}>\min_{\mu}\frac{\left<\mu,\nabla_xf(x)\right>}{||\mu||\cdot||\nabla_xf(x)||}

Our theoretical analysis suggests that rather than offering any quantitative classification benefits, a natural way to view the role of robust optimization is as enforcing a prior over the features learned by the classifier.

Inspirations

Based on the theorems proposed in this paper, it's possible that

One can achieve adversarial robustness by filtering out the non-robust features in dataset.
One can improve adversarial robustness by aligning the Jacobians with the inputs (It has been achieved by JARN).

Adversarial Examples Are a Natural Consequence of Test Error in Noise - ICML 2019

Nic Ford, Justin Gilmer, Nicolas Carlini, Dogus Cubuk. Adversarial Examples Are a Natural Consequence of Test Error in Noise. arXiv:1901.10513

This paper finds connection between adversarial robustness and corruption robustness and indicates that increasing adversarial robustness can also increase corruption robustness.

In this paper we provide both empirical and theoretical evidence that these are two manifestations of the same underlying phenomenon, establishing close connections between the adversarial robustness and corruption robustness research programs.

Based on our results we recommend that future adversarial defenses consider evaluating the robustness of their methods to distributional shift with benchmarks such as Imagenet-C.

Motivation

Several recent papers (Gilmer et al., 2018b; Mahloujifar et al., 2018; Dohmatob, 2018; Fawzi et al., 2018a) use concentration of measure to prove rigorous upper bounds on adversarial robustness for certain distributions in terms of test error, suggesting non-zero test error may imply the existence of adversarial perturbations. ( $r_{unc}(f)\ge \frac{1}{2}r_{in}(x)$ )

This may seem in contradiction with empirical observations that increasing small perturbation robustness tends to reduce model accuracy (Tsipras et al., 2018). (i.e. Adversarial Robustness May be at Odds with Accuracy.)

It could be the case that hard bounds on adversarial robustness in terms of test error exist, but current classifiers have yet to approach these hard bounds.

$\varepsilon=2\eta$ ).

Adversarial and corruption robustness

$E$ $p$ , the two robustness is defined as

corruption robustness
$q$ , it's defined as
$\Bbb{P}_{x\sim q}[x\not\in E]$
$q$ is not an error. (Higher is better.)
adversarial robustness
$x$ $d$ $x$ $E$ $d(x,E)$ ), it's defined as
$\Bbb{P}_{x\sim p}[d(x,E)>\epsilon]$
$p$ $\epsilon$ of some point in the error set.
$\epsilon$ $\epsilon$ .

Errors in Gaussian noise suggest adversarial examples

For linear models, the error rate in Gaussian noise exactly determines the distance to the decision boundary.

$x_0$ $\mathcal{N}(x_0;\sigma^2 I)$ $\mu$ $\sigma(x_0,\mu)$ $\sigma$ $\mu$ , i.e.

\Bbb{E}_{x\sim\mathcal{N}(x_0;\sigma^2 I)}[x\in E]=\mu

$\mu$ $x$ $\mathcal{N}(x_0;\sigma^2 I)$ falls into the error set (crosses the boundary).

$l_2$ $d=||\cdot||_2$ , there is

d(x_0,E)=-\sigma(x_0,\mu)\Phi^{-1}(\mu)

In which,

\Phi(t)=\frac{1}{2\pi}\int_{-\infty}^t exp(-x^2/2)dx

is the cdf of the univariate standard normal distribution.

$\mathcal{N}(x_0;\sigma^2I)$ $x_0$ $\sigma\sqrt{n}$ $n$ is large.

$\sigma\sqrt{n}$ $\mu$ $\mu$ .

It indicates that even the error rate is ralatively small for random Gaussian perturbations, the feasible nearest perturbation is still very close to the original image, as shown in Figure 10.

$d(x_0,E)$ $\sigma(x_0,\mu)$ $\mu=0.01$ $\sigma(x_0,0.01)$ $d(x_0,E)$ , the latter is acquired using PGD.

$\sigma(x_0,0.01)$ $d(x_0,E)$ on average.

$\sigma$ $d$ .

They further draw two-dimensional slices in image space through three points to prove the correctness of half-space model.

Concentration of measure for noisy images

$x_0$ $\epsilon^*_q(E)$ .

$\epsilon$ for which

\Bbb{P}_{x\sim q}[d(x,E)\le\epsilon]=\frac{1}{2}

TheoremGaussian Isoperimetric Inequality $q=\mathcal{N}(0;\sigma^2 I)$ $\R^n$ $\sigma^2I$ $E\sube \R^n$ $\mu=\Bbb{P}_{x\sim q}[x\in E]$ $\Phi$ $\mu\ge\frac{1}{2}$ $\epsilon^*_q(E)=0.$ $\epsilon^*_q(E)\le-\sigma\Phi^{-1}(\mu)$ $E$ is a half space.

$\Bbb{P}_{x\sim q}[x\in E]$ , the most robust are the ones whose error set is a half space (as shown in Figure 1).

$\epsilon$ -band of error sets reaches its minimum if the error rate, i.e. the proportion of the red area is fixed.

They further evaluate the theorem on CIFAR-10 and ImageNet test sets.

$\epsilon^*_p$ using PGD with 200 steps one each sample and reported the median.

This shows that improved adversarial robustness results in improved robustness to large random perturbations, as the isoperimetric inequality says it must.

The adversarial training pushes the boundary behind and consequently reduces the proportion of the shaded area in Figure 1.

Evaluating corruption robustness

Gaussian data augmentation and adversarial training both improve the overall benchmark1, which requires averaging the performance across all corruptions, and the results were quite close.

Interestingly, both methods performed much worse than the clean model on the fog and contrast corruptions. (It's also reported on Fourier Perspective)

As shown in Figure 6, the Gaussian augmentation also helps with the adversarial robustness, although only a little. And as shown in Figure 7, they discover that those methods relying on gradient masking also fail to help with Gaussian noise.

Conclusion

The nearby errors we can find show up at the same distance scales we would expect from a linear model with the same corruption robustness.

Concentration of measure shows that a non-zero error rate in Gaussian noise logically implies the existence of small adversarial perturbations of noisy images.

Finally, training procedures designed to improve adversarial robustness also improve many types of corruption robustness, and training on Gaussian noise moderately improves adversarial robustness.

Inspirations

This is a fruitful paper. They build a bridge between the research of adversarial robustness and corruption robustness, and empirically show that the improvement of adversarial robustness also helps with the improvement of Gaussian noise corruption robustness.

Through the theorems they form, the purpose of these two robustness coincide with each other and they suggest further collaboration of these two communities of researchers.

They also demenstrate empirically that those defense methods reported to be relying on gradient masking also fail to help with Gaussian noise corruption, which is a new meter for defense methods to self evaluate.

The most amazing although not surprising is the result of the curse of dimensionality in robustness of model, i.e. a very small tolaratable error rate of Gaussian noise error entails the existence of very subtle adversarial perturbations.

It indicates that probably some noise augmentation can do the work for the heavy adversarial training.