By LI Haoyang 2020.10.27
Adversarial Vulnerability

Contents:

- Adversarial vulnerability for any classifier - NIPS 2018
  - Setting
    - Generator and classifier
    - Robustness
  - Upper bounds on robustness
  - Relation between in-distribution robustness and unconstrained robustness
  - Transferability of perturbations
  - Approximate generative model
  - Experiments
  - Inspirations
  - Derivation of theorems
    - Gaussian isoperimetric inequality
      - Theorem
    - Theorem 1
      - Theorem
      - Derivation
    - Theorem 2
      - Theorem
      - Proof
Paper: http://papers.nips.cc/paper/7394-adversarial-vulnerability-for-any-classifier
Alhussein Fawzi, Hamza Fawzi, Omar Fawzi. Adversarial vulnerability for any classifier. NIPS 2018.
In this paper, we study the phenomenon of adversarial perturbations under the assumption that the data is generated with a smooth generative model. We derive fundamental upper bounds on the robustness to perturbations of any classification function, and prove the existence of adversarial perturbations that transfer well across different classifiers with small risk.
A generative model $g$ maps latent vectors $z \in \mathbb{R}^d$ into the space of images of $n$ pixels, i.e. $g: \mathbb{R}^d \to \mathbb{R}^n$.

To generate an image according to the distribution of natural images, a random vector $z$ is sampled from the standard Gaussian distribution $\mathcal{N}(0, I_d)$ and then mapped to the image space, i.e. $x = g(z)$.

A classifier $f$ maps images in $\mathbb{R}^n$ to discrete labels $\{1, \dots, K\}$, i.e. $f: \mathbb{R}^n \to \{1, \dots, K\}$.

Pulled back through $g$, the classifier partitions the latent space $\mathbb{R}^d$ into sets $C_i = \{z \in \mathbb{R}^d : f(g(z)) = i\}$, each of which corresponds to a different predicted label.

The relative proportion of points in class $i$ is equal to $\mu(C_i)$, i.e. the Gaussian measure of $C_i$ in $\mathbb{R}^d$, where $\mu = \mathcal{N}(0, I_d)$.

The goal of this paper is to study the robustness of $f$ to additive perturbations, under the assumption that the data is generated according to $x = g(z)$, $z \sim \mathcal{N}(0, I_d)$.
They define two notions of robustness.
In-distribution robustness
For $x = g(z)$, the in-distribution robustness is defined as

$$r_{\text{in}}(z) = \min_{r \in \mathbb{R}^d} \left\|g(z + r) - g(z)\right\| \quad \text{s.t.} \quad f(g(z + r)) \neq f(g(z)).$$

In short, it is the norm of the smallest perturbation of the original image that changes the classification result while keeping the perturbed image in the support of the image distribution (i.e. in the range of $g$).
Unconstrained robustness
The unconstrained robustness is the usual definition of adversarial perturbations, i.e.

$$r_{\text{unc}}(x) = \min_{r \in \mathbb{R}^n} \|r\| \quad \text{s.t.} \quad f(x + r) \neq f(x).$$

It is easy to see that this robustness is never larger than the in-distribution robustness, i.e. $r_{\text{unc}}(g(z)) \le r_{\text{in}}(z)$, because every in-distribution perturbation $g(z + r) - g(z)$ is in particular an admissible unconstrained perturbation, so the unconstrained minimum is taken over a larger feasible set.
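Written out with the definitions above, this is just an inclusion of feasible sets:

$$\left\{\, \|r\| : r \in \mathbb{R}^n,\; f(x + r) \neq f(x) \,\right\} \;\supseteq\; \left\{\, \left\|g(z + r') - g(z)\right\| : r' \in \mathbb{R}^d,\; f(g(z + r')) \neq f(g(z)) \,\right\} \quad\Longrightarrow\quad r_{\text{unc}}(g(z)) \le r_{\text{in}}(z).$$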
They assume that the generative model $g$ is smooth in the sense that it satisfies a modulus of continuity property:

Assumption 1. We assume that $g$ admits a monotone invertible modulus of continuity $\omega$, i.e. for all $z_1, z_2 \in \mathbb{R}^d$,

$$\left\|g(z_1) - g(z_2)\right\| \le \omega\left(\|z_1 - z_2\|\right).$$

Note that the above assumption is milder than assuming Lipschitz continuity. In fact, the Lipschitz property corresponds to choosing $\omega$ to be a linear function of its argument. In particular, the above assumption does not require that $\omega(t) \to 0$ as $t \to 0$, which potentially allows us to model distributions with disconnected support.
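For example (the instantiation used in the interpretation below): if $g$ is $L$-Lipschitz, one may take

$$\omega(r) = L r, \qquad \omega^{-1}(\eta) = \frac{\eta}{L},$$

so an image-space perturbation budget $\eta$ corresponds to a latent-space radius $\eta / L$.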
Theorem 1. Let $f$ be an arbitrary classification function defined on the image space. Then, the fraction of datapoints having robustness less than $\eta$ satisfies

$$\mathbb{P}_z\left(r_{\text{in}}(z) \le \eta\right) \ge 1 - \sum_{i=1}^{K} \Phi\left(a_i - \omega^{-1}(\eta)\right),$$

where $\Phi$ is the cdf (cumulative distribution function) of the standard Gaussian $\mathcal{N}(0, 1)$, i.e. $\Phi(t) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{t} e^{-s^2/2}\, ds$, and $a_i = \Phi^{-1}\left(\mu(C_i)\right)$.

In particular, if $\mu(C_i) \le 1/2$ for all $i$ (the classes are not too unbalanced), we have

$$\mathbb{P}_z\left(r_{\text{in}}(z) \le \eta\right) \ge 1 - \frac{K}{2}\, e^{-\frac{1}{2}\, \omega^{-1}(\eta)^2}.$$

To see the dependence on the number of classes more explicitly, consider the setting where the classes are equiprobable, i.e. $\mu(C_i) = 1/K$ for all $i$; then $a_i = \Phi^{-1}(1/K)$ and the bound reads

$$\mathbb{P}_z\left(r_{\text{in}}(z) \le \eta\right) \ge 1 - K\, \Phi\!\left(\Phi^{-1}(1/K) - \omega^{-1}(\eta)\right),$$

whose right-hand side increases with $K$ at fixed $\eta$ (the deficit $K\, \Phi\!\left(\Phi^{-1}(1/K) - \omega^{-1}(\eta)\right)$ shrinks as $K$ grows).
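To get a feel for the numbers, here is a small script of my own (not from the paper) that evaluates the equiprobable-classes form of the bound for a Lipschitz generator, i.e. $\omega(r) = Lr$; the Lipschitz constant and the perturbation budgets are made-up values.

```python
from scipy.stats import norm

def vulnerable_fraction_lower_bound(eta, K, L):
    """Theorem 1 bound with equiprobable classes, assuming an L-Lipschitz
    generator so that omega(r) = L * r and omega^{-1}(eta) = eta / L."""
    a = norm.ppf(1.0 / K)                    # a_i = Phi^{-1}(mu(C_i)) = Phi^{-1}(1/K)
    return 1.0 - K * norm.cdf(a - eta / L)

# Hypothetical numbers: L = 2 and a few (K, eta) pairs.
for K in (2, 10, 1000):
    for eta in (1.0, 2.0, 4.0):
        bound = vulnerable_fraction_lower_bound(eta, K, L=2.0)
        print(f"K={K:5d}  eta={eta:3.1f}  P(r_in <= eta) >= {max(bound, 0.0):.3f}")
```

The bound becomes informative at smaller budgets as $K$ grows, which is the dependence on the number of classes discussed next.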
Based on Theorem 1, the paper makes the following observations.
Interpretation with a Lipschitz generator

Our result shows that when $d$ is large and $g$ is smooth (i.e. $\omega(r) = L r$, where $L$ is the Lipschitz constant of $g$), there exist small adversarial perturbations that can fool arbitrary classifiers $f$.
Dependence on the number of classes $K$
In other words, it is easier to find adversarial perturbations in the setting where the number of classes is large, than for a binary classification task.
Classification-agnostic bound
Our bounds hold for any classification function $f$.
How tight is the upper bound?
It's possible to construct a family of classifiers for which the bound of Theorem 1 is essentially attained: classifiers that split the latent space with hyperplane (half-space) boundaries. This suggests that classifiers are maximally robust when the induced classification boundaries in the latent space are linear.
For a given classifier $f$ in the image space, define the classifier $\hat{f}$ constructed with a nearest-neighbor strategy:

$$\hat{f}(x) = f\left(g(z^\star)\right), \quad \text{where } z^\star \in \arg\min_{z \in \mathbb{R}^d} \left\|g(z) - x\right\|.$$

Theorem 2. For the classifier $\hat{f}$, we have $r_{\text{unc}}^{\hat{f}}(g(z)) \ge \frac{1}{2}\, r_{\text{in}}^{f}(z)$ for all $z$ (the superscript indicates which classifier the robustness refers to).
This result shows that if a classifier $f$ has in-distribution robustness $r$, then we can construct a classifier $\hat{f}$ with unconstrained robustness at least $r/2$, through a simple modification of the original classifier $f$.
This suggests that the unconstrained robustness can be increased by increasing in-distribution robustness.
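Below is a toy numerical sketch of this construction (my own illustration: the generator `g_toy`, the base classifier `f_base`, and the nearest-neighbor search over pre-sampled latent points are all made-up choices; only the projection rule defining $\hat{f}$ comes from the construction above).

```python
import numpy as np

rng = np.random.default_rng(0)

def g_toy(z):
    """A made-up smooth generator mapping latent z in R^2 to an 'image' in R^3."""
    z = np.atleast_2d(z)
    return np.stack([np.tanh(z[:, 0]), np.tanh(z[:, 1]), 0.5 * z[:, 0] * z[:, 1]], axis=1)

def f_base(x):
    """A made-up base classifier on the image space (two classes: 0 / 1)."""
    return int(x[0] + x[1] > 0)

# Approximate the projection onto the range of g by a nearest-neighbor search
# over a dense set of pre-sampled generated points g(z).
Z_GRID = rng.normal(size=(20000, 2))
X_GRID = g_toy(Z_GRID)

def f_hat(x):
    """Nearest-neighbor-on-the-manifold classifier: classify the closest generated point."""
    idx = np.argmin(np.linalg.norm(X_GRID - x, axis=1))
    return f_base(X_GRID[idx])

# On generated data, f_hat agrees with f_base; for an off-manifold perturbation of x,
# the label can only change once the *projection* of x + r crosses the decision boundary.
z0 = rng.normal(size=2)
x0 = g_toy(z0)[0]
print(f_base(x0), f_hat(x0), f_hat(x0 + 0.05 * rng.normal(size=3)))
```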
Theorem 3 (Transferability of perturbations). Let $f$ and $h$ be two classifiers. Assume that $\mathbb{P}_z\left(f(g(z)) \neq h(g(z))\right) \le \delta$ (e.g. satisfied if $f$ and $h$ each have a risk bounded by $\delta/2$ on the data distribution generated by $g$). In addition, assume that $\mu(C_i) \le 1/2$ for all $i$. Then the fraction of datapoints for which a single perturbation of norm at most $\eta$ changes the decisions of both classifiers simultaneously is lower-bounded by a bound of the same form as in Theorem 1, minus an extra term controlled by $\delta$.

Compared to Theorem 1, which bounds the robustness of a single classifier to adversarial perturbations, the extra price to pay here to find transferable adversarial perturbations is the $\delta$-dependent term, which is small if the risk of both classifiers is small.

This suggests that as long as two classifiers are individually easy to fool and mostly agree on the data (i.e. both have small risk), it is not much harder to find a single adversarial example that transfers between them.
Theorem 4. We use the same notation as in Theorem 1. Assume that the generator $g$ provides a $\delta$-approximation of the true data distribution $\rho$ in the $1$-Wasserstein sense on the metric space $(\mathbb{R}^n, \|\cdot\|)$; that is, $W_1(\rho, g_\sharp \nu) \le \delta$, where $g_\sharp \nu$ is the pushforward of the latent Gaussian $\nu = \mathcal{N}(0, I_d)$ via $g$. Then, provided $\omega$ is concave, the expected unconstrained robustness in the image space, $\mathbb{E}_{x \sim \rho}\left[r_{\text{unc}}(x)\right]$, is upper-bounded by $\delta$ plus a term of the same nature as the bound of Theorem 1; in particular, for equiprobable classes, this second term shrinks as the number of classes $K$ grows.

In words, when the data is defined according to a distribution which can be approximated by a smooth, high-dimensional generative model, our results show that arbitrary classifiers will have small adversarial examples in expectation.

We also note that as $K$ grows, this bound decreases and even goes to zero under the sole condition that $\omega$ is continuous at $0$. Note however that the decrease is slow, as it is only logarithmic.

This suggests that for classification problems with a very large number of classes, the expected size of adversarial perturbations can be very small.
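One way to see where the approximation term enters (a sketch under the stated assumptions, not necessarily the paper's exact argument): couple $x \sim \rho$ with a generated point $y = g(z)$, $z \sim \nu$, so that $\mathbb{E}\|x - y\| \le \delta$. If $f(x) \neq f(y)$, moving $x$ onto $y$ already changes the label; otherwise one can append a minimal adversarial perturbation of $y$. Hence

$$r_{\text{unc}}(x) \le \|x - y\| + r_{\text{unc}}(y) \quad\Longrightarrow\quad \mathbb{E}_{x \sim \rho}\left[r_{\text{unc}}(x)\right] \le W_1(\rho, g_\sharp \nu) + \mathbb{E}_{z \sim \nu}\left[r_{\text{unc}}(g(z))\right] \le \delta + \mathbb{E}_{z \sim \nu}\left[r_{\text{in}}(z)\right],$$

and the last expectation is then controlled by integrating the bound of Theorem 1 over $\eta$.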
I admit that I don't completely understand this paper.
From the general idea, this paper suggests the following potentials
The standard Gaussian distribution on $\mathbb{R}^d$ is $\mathcal{N}(0, I_d)$, with density $p(z) = (2\pi)^{-d/2} e^{-\|z\|^2 / 2}$.

Theorem (Gaussian isoperimetric inequality). Let $\mu$ be the standard Gaussian measure on $\mathbb{R}^d$. Let $A \subseteq \mathbb{R}^d$ and let $\epsilon > 0$, and denote by $A_\epsilon = \{z \in \mathbb{R}^d : d(z, A) \le \epsilon\}$ the $\epsilon$-neighborhood of $A$ in the Euclidean distance. If $\mu(A) = \Phi(a)$, then $\mu(A_\epsilon) \ge \Phi(a + \epsilon)$.

In words, $A_\epsilon$ is the subset of points of $\mathbb{R}^d$ that have at least one neighbor in $A$ closer than $\epsilon$ in terms of the $\ell_2$ distance; the theorem says that if the probability of sampling a point of $A$ is $\Phi(a)$, then the probability of sampling a point of $A_\epsilon$ is at least $\Phi(a + \epsilon)$.
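The canonical case where the inequality is tight is a half-space, which is also why linear boundaries in the latent space are the most robust configuration mentioned above:

$$A = \{z \in \mathbb{R}^d : z_1 \le a\}, \quad \mu(A) = \Phi(a), \qquad A_\epsilon = \{z : z_1 \le a + \epsilon\}, \quad \mu(A_\epsilon) = \Phi(a + \epsilon).$$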
Theorem 1 (restated). Let $f$ be an arbitrary classification function defined on the image space. Then, the fraction of datapoints having robustness less than $\eta$ satisfies

$$\mathbb{P}_z\left(r_{\text{in}}(z) \le \eta\right) \ge 1 - \sum_{i=1}^{K} \Phi\left(a_i - \omega^{-1}(\eta)\right),$$

where $\Phi$ is the cdf of the standard Gaussian and $a_i = \Phi^{-1}\left(\mu(C_i)\right)$; the special cases for not-too-unbalanced and for equiprobable classes are as stated above.
First, define the in-distribution robustness measured in the latent space:

$$\hat{r}(z) = \min_{r \in \mathbb{R}^d} \|r\|_2 \quad \text{s.t.} \quad f(g(z + r)) \neq f(g(z)).$$

In short, $\hat{r}(z)$ is the distance between $z$ and the closest latent point classified differently from $z$; therefore $\{z : \hat{r}(z) \le \omega^{-1}(\eta)\}$ is the set of latent points whose distance to points of other classes is at most $\omega^{-1}(\eta)$, i.e. the outer-most band of each class region.

Then, define the following sets in the $z$-space:

$$B_i = \left\{ z \in C_i : \hat{r}(z) \le \omega^{-1}(\eta) \right\}, \qquad i = 1, \dots, K.$$

Recall that $\omega^{-1}$ maps the upper bound $\eta$ on image-space perturbations into the $z$-space: by Assumption 1, $r_{\text{in}}(z) \le \omega\left(\hat{r}(z)\right)$, so $\hat{r}(z) \le \omega^{-1}(\eta)$ implies $r_{\text{in}}(z) \le \eta$.

It's easy to verify that $\bigcup_{i} B_i \subseteq \{z : r_{\text{in}}(z) \le \eta\}$.
Thus, we have

$$\mathbb{P}_z\left(r_{\text{in}}(z) \le \eta\right) \ge \mu\left(\bigcup_{i=1}^{K} B_i\right).$$

Since $\{z : r_{\text{in}}(z) \le \eta\}$ is a superset of $\bigcup_i B_i$, the probability of the former is at least that of the latter.
Recall the definition of $\hat{r}$: for $z \in C_i$, $\hat{r}(z) = d(z, C_i^c)$, the distance from $z$ to the set of latent points assigned to other classes. We have

$$B_i = C_i \cap \left(C_i^c\right)_{\omega^{-1}(\eta)}.$$

Recall again that $B_i$ is the set of points of $C_i$ falling into its outer-most band, and $\hat{r}(z)$ is the minimal distance needed to move a point from the region of its own class into a region assigned to some other class, hence the equation above.

Note that $\left(C_i^c\right)_{\omega^{-1}(\eta)}$ is the set of points that are at distance at most $\omega^{-1}(\eta)$ from $C_i^c$ (by definition of the $\epsilon$-neighborhood $A_\epsilon$).
By the Gaussian isoperimetric inequality applied with $A = C_i^c$ and $\epsilon = \omega^{-1}(\eta)$ (note that $\mu(C_i^c) = 1 - \mu(C_i) = \Phi(-a_i)$), we have

$$\mu\left(\left(C_i^c\right)_{\omega^{-1}(\eta)}\right) \ge \Phi\left(\omega^{-1}(\eta) - a_i\right), \qquad \text{hence} \qquad \mu(B_i) = \mu\left(\left(C_i^c\right)_{\omega^{-1}(\eta)}\right) - \mu\left(C_i^c\right) \ge \Phi\left(\omega^{-1}(\eta) - a_i\right) - \Phi(-a_i).$$
As the $B_i$ are disjoint for different $i$, we have

$$\mathbb{P}_z\left(r_{\text{in}}(z) \le \eta\right) \ge \sum_{i=1}^{K} \mu(B_i) \ge \sum_{i=1}^{K} \left[\Phi\left(\omega^{-1}(\eta) - a_i\right) - \Phi(-a_i)\right].$$

And, since $\Phi(-a_i) = 1 - \mu(C_i)$, $\sum_i \mu(C_i) = 1$ and $\Phi\left(\omega^{-1}(\eta) - a_i\right) = 1 - \Phi\left(a_i - \omega^{-1}(\eta)\right)$, this yields the first inequality of Theorem 1:

$$\mathbb{P}_z\left(r_{\text{in}}(z) \le \eta\right) \ge 1 - \sum_{i=1}^{K} \Phi\left(a_i - \omega^{-1}(\eta)\right).$$
Intuitively, it says that the probability of sampling a latent point whose distance to the classification boundary is at most $\omega^{-1}(\eta)$ is at least the total probability mass gained when each complement $C_i^c$ is expanded by $\omega^{-1}(\eta)$.
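As a quick sanity check of the derivation (my own toy experiment, not from the paper): take $d = 2$, let $g$ be the identity map so that $\omega(r) = r$ and $r_{\text{in}}(z)$ is just the latent distance to the decision boundary, and split the latent space into two classes with the half-space $z_1 \le t$. For this linear partition the bound should hold with equality, up to Monte-Carlo error.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
d, t, eta, n = 2, 0.5, 0.8, 1_000_000

z = rng.normal(size=(n, d))
# Two classes: C_1 = {z_1 <= t}, C_2 = {z_1 > t}. With g = identity, r_in(z) is the
# distance from z to the hyperplane z_1 = t, i.e. |z_1 - t|.
empirical = np.mean(np.abs(z[:, 0] - t) <= eta)

# Theorem 1: P(r_in <= eta) >= 1 - sum_i Phi(a_i - eta), with a_1 = t and a_2 = -t,
# since mu(C_1) = Phi(t) and mu(C_2) = Phi(-t).
bound = 1.0 - (norm.cdf(t - eta) + norm.cdf(-t - eta))

print(f"empirical P(r_in <= eta) = {empirical:.4f}")
print(f"Theorem 1 lower bound    = {bound:.4f}")
```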
For a given classifier $f$ in the image space, recall the classifier $\hat{f}$ constructed with a nearest-neighbor strategy:

$$\hat{f}(x) = f\left(g(z^\star)\right), \quad \text{where } z^\star \in \arg\min_{z \in \mathbb{R}^d} \left\|g(z) - x\right\|.$$

Theorem 2 (restated). For the classifier $\hat{f}$, we have $r_{\text{unc}}^{\hat{f}}(g(z)) \ge \frac{1}{2}\, r_{\text{in}}^{f}(z)$ for all $z$.

The classifier $\hat{f}$ classifies $x$ as the class of its nearest neighbor on the image manifold $g(\mathbb{R}^d)$.

Proof. Let $x = g(z)$ and let $r$ be such that $\hat{f}(x + r) \neq \hat{f}(x)$. Let $z'$ be such that $g(z')$ is a nearest point to $x + r$ on the manifold, so that $\hat{f}(x + r) = f(g(z'))$. By definition of $\hat{f}$, we have $\|g(z') - (x + r)\| \le \|g(z) - (x + r)\| = \|r\|$, and $f(g(z')) \neq f(g(z))$, hence $\|g(z') - g(z)\| \ge r_{\text{in}}(z)$. As such, using the triangle inequality, we get

$$r_{\text{in}}(z) \le \left\|g(z') - g(z)\right\| \le \left\|g(z') - (x + r)\right\| + \left\|(x + r) - g(z)\right\| \le 2\|r\|.$$

Taking the minimum over all $r$ such that $\hat{f}(x + r) \neq \hat{f}(x)$, we obtain $r_{\text{unc}}^{\hat{f}}(g(z)) \ge \frac{1}{2}\, r_{\text{in}}(z)$. $\square$