Provable Defense

By LI Haoyang 2020.11.6

Content

Verification

A Convex Relaxation Barrier to Tight Robustness Verification of Neural Networks - NIPS 2019

Code: http://github.com/Hadisalman/robust-verify-benchmark

Hadi Salman, Greg Yang, Huan Zhang, Cho-Jui Hsieh, Pengchuan Zhang. A Convex Relaxation Barrier to Tight Robustness Verification of Neural Networks. NIPS 2019. arXiv:1902.08722

Verification algorithms fall into two categories: exact verifiers that run in exponential time and relaxed verifiers that are efficient but incomplete.

$l_{\infty}$ adversarial distortion, the optimal layer-wise convex relaxation only slightly improves the lower bound found by Wong and Kolter [2018], especially when compared with the upper bound provided by the PGD attack, which is consistently 1.5 to 5 times larger;

(ii) in terms of upper bounding the robust error, the optimal layer-wise convex relaxation does not significantly close the gap between the PGD lower bound (or MILP exact answer) and the upper bound from Wong and Kolter [2018]

Problem

$f:\R^n\to\R^K$ $f_i(x)$ $i$ adversarially robust $x$ $\mathcal{S}_{in}(x)$ if

\min_{x^\prime\in\mathcal{S}_{in}(x),i\neq i^*}f_{i^*}(x)-f_i(x^\prime)>0,\text{ where }i^*=\arg\max_{j}f_j(x)

It ensures that for the most perturbed example in the attack space, the second largest logit is still smaller than the largest, i.e. correct, logit.

$\mathcal{S}_{in}(x^{nom})$ $\mathcal{S}_{in}(x^{nom})=\{x:||x-x^{nom}||_\infty\le\epsilon\}$ $l_{\infty}$ $f(x)$ $L$ -layer feedforward NN.

$\{0,1,\dots,L-1\}$ $[L]$ $\{x^{(0)},x^{(1)},x^{(L-1)}\}$ $x^{[L]}$ $f(x)$ is defined as

x^{(l+1)}=\sigma^{(l)}(\bold{W}^{(l)}x^{(l)}+b^{(l)}),\ \forall l\in[L],\\ f(x):=z^{(L)}=\bold{W}^{(L)}x^{(L)}+b^{(L)}

$x^{(l)}\in\R^{n^{(l)}},z^{(l)}\in\R^{n_z^{(l)}},x^{(0)}:=x\in\R^{n^{(0)}}$ $\bold{W}^{(l)}\in\R^{n_z^{(l)}\times n^{(l)}}$ $b^{(l)}\in\R^{n_z^{(l)}}$ $l$ $\sigma^{(l)}:\R^{n_z^{(l)}}\to\R^{n^{(l+1)}}$ is a non-linear activation function.

Convex relaxation from the primal view

$\mathcal{O}(c,c_0,L,\underline{z}^{[L]},\overline{z}^{[L]})$ :

\min_{(x^{[L+1]},z^{[L]})\in\mathcal{D}}c^\top x^{(L)}+c_0\\ \text{s.t. }z^{(l)}=\bold{W}^{(l)}x^{(l)}+b^{(l)},l\in[L]\\ x^{(l+1)}=\sigma^{(l)}(z^{(l)}),l\in[L]

$\mathcal{D}$ is

\mathcal{D}=\{(x^{[L+1]},z^{[L]}):x^{(0)}\in\mathcal{S}_{in}(x^{nom}), \underline{z}^{(l)}\le z^{(l)}\le\overline{z}^{(l)},l\in[L]\}

i.e. the set of activations and preactivations satisfying the bounds.

$c^\top=\bold{W}_{i^{nom}}^{(L)}-\bold{W}_i^{(L)},c_0=b_{i^{nom}}^{(L)}-b_i^{(L)},\overline{z}^{[L]}=\infty$ $\mathcal{O}$ $p_{\mathcal{O}}^*$ .

$z^{(l)}$ , we can significantly narrow donw the optimization domain and achieve tighter solutions when we relax the nonlinearities:

Obtaining lower and upper bounds by solving sub-problems
It's widely used but not efficient.
Convex relaxation in the primal space
$x^{(l+1)}=\sigma^{(l)}(z^{(l)})$ $\mathcal{C}$ :
$\min_{(x^{[L+1]},z^{[L]})\in\mathcal{D}}c^\top x^{(L)}+c_0\\ \text{s.t. }z^{(l)}=\bold{W}^{(l)}x^{(l)}+b^{(l)},\underline{\sigma}^{(l)}(z^{(l)})\le x^{(l+1)}\le\overline{\sigma}^{(l)}(z^{(l)}),\forall l\in[L]$
$\overline{\sigma}^{(l)}(z)$ $\underline{\sigma}^{(l)}(z)$ $\underline{\sigma}^{(l)}(z^{(l)})\le \sigma^{(l)}(z^{(l)})\le\overline{\sigma}^{(l)}(z^{(l)})$ $\underline{z}^{(l)}\le z\le \overline{z}^{(l)}$ $\mathcal{C}$ $p_{\mathcal{C}}^*$ $p_{\mathcal{C}}^*\le p_{\mathcal{O}}^*$ .
The optimal layer-wise convex relaxation
The optimal layer-wise convex relaxation, i.e.
- $\underline{\sigma}_{opt}(z)$ $\sigma$ .
- $\overline{\sigma}_{opt}(z)$ $\sigma$ .
$\mathcal{C}_{opt}$ $p_{\mathcal{C}_{opt}}^*$ .
Greedily solving the primal with linear bounds
$p_{\mathcal{C}_{opt}}^*$ $\mathcal{C}$ , i.e.
$\overline{\sigma}^{(l)}(z^{(l)}):=\overline{a}^{(l)}z^{(l)}+\overline{b}^{(l)},\underline{\sigma}^{(l)}(z^{(l)}):=\underline{a}^{(l)}z^{(l)}+\underline{b}^{(l)}$
$z^{(L)}$ $x^{(0)}$ can be recursively acquired.

Intuitively, the convex relaxation expands the attack space to include those for networks with activations in between the higher and lower activation bounds.

Convex relaxation from the dual view

Strong duality for the convex relaxed problem

$\mathcal{C}$ with dual variables introduced is

gc(\mu^{[L]},\underline{\lambda}^{[L]},\overline{\lambda}^{[L]}):=\min_{(x^{[L+1]},z^{[L]})\in\mathcal{D}}c^\top x^{(L)}+c_0+\sum_{l=0}^{L-1}\mu^{(l)\top}(z^{(l)}-\bold{W}^{(l)}x^{(l)}-b^{(l)})\\ -\sum_{l=0}^{L-1}\underline{\lambda}^{(l)\top}(x^{(l+1)}-\underline{\sigma}^{(l)}(z^{(l)}))+\sum_{l=0}^{L-1}\overline{\lambda}^{(l)\top}(x^{(l+1)}-\overline{\sigma}^{(l)}(z^{(l)}))

by weak duality [Boyd and Vandenberghe, 2004]

d_{\mathcal{C}}^*=\max_{\mu^{[L]},\underline{\lambda}^{[L]}\ge 0,\overline{\lambda}^{[L]}\ge 0}gc(\mu^{[L]},\underline{\lambda}^{[L]},\overline{\lambda}^{[L]})\le p_{\mathcal{C}}^*

Theorem 4.1 $p_{\mathcal{C}}^*=d_{\mathcal{C}}^*$ $\underline{\sigma}^{(l)}$ $\overline{\sigma}^{(l)}$ $[\underline{z}^{(l)},\overline{z}^{(l)}]$ $l\in [L]$ $\mathcal{C}$ and the weak duality.

The dual form is the standard Lagrangian dual form for the relaxed problem.

The optimal layer-wise dual relaxation.

$\mathcal{O}$ is (first proposed by Dvijotham et al. [2018b])

go(\mu^{[L]},\lambda^{[L]}):=\min_{(x^{[L+1]},z^{[L]})\in\mathcal{D}}c^\top x^{(L)}+c_0\\ +\sum_{l=0}^{L-1}\mu^{(l)\top}(z^{(l)}-\bold{W}^{(l)}x^{(l)}-b^{(l)})+\sum_{l=0}^{L-1}\lambda^{(l)\top}(x^{(l+1)}-\sigma^{(l)}(z^{(l)}))

by weak duality

d_{\mathcal{O}}^*:=\max_{\mu^{[L]},\lambda^{[L]}}go(\mu^{[L]},\lambda^{[L]})\le p_{\mathcal{O}}^*

Theorem 4.2 $d_{\mathcal{O}}^*=d_{\mathcal{C}_{opt}}^*$ $\sigma^{(l)}$ $\underline{\sigma}_{opt}^{(l)}$ $\overline{\sigma}_{opt}^{(l)}$ $d_{\mathcal{C}_{opt}}^*$ $d_{\mathcal{O}}^*$ provided by the dual of the original problem are the same.

$\mathcal{C}$ ) and the two kinds of dual relaxations, (9) and (11), are all blocked by the same barrier.

Corollary 4.3 $p_{\mathcal{C}_{opt}}^*=d_{\mathcal{O}}^*$ $\sigma^{(l)}$ $l\in[L]$ $\underline{\sigma}_{opt}^{(l)}$ $\overline{\sigma}_{opt}^{(l)}$ $p_{\mathcal{C}_{opt}}^*$ $\mathcal{C}$ $d_{\mathcal{O}}^*$ provided by the dual relaxation (11) are the same.

Greedily solving the dual with linear bounds

$\underline{\sigma}$ $\overline{\sigma}$ are linear, the dual objective can be lower bounded as below:

p_{\mathcal{C}}^*=d_{\mathcal{C}}^*\ge\sum_{l=0}^{L-1}\left(\overline{b}^{(l)\top}(\lambda^{(l)})_{+}-\underline{b}^{(l)\top}(\lambda^{(l)})_{-}-b^{(l)\top}\mu^{(l)}\right)+c_0-\sup_{x\in\mathcal{S}_{in}(x^{nom})}(\bold{W}^{(0)\top}\mu^{(0)})^\top x

The dual variables are determined by a backward propagation:

\lambda^{(L-1)}=-c,\mu^{(l)}=\overline{a}^{(l)}(\lambda^{(l)})_{+}+\underline{a}^{(l)}(\lambda^{(l)})_{-},\lambda^{(l-1)}=\bold{W}^{(l)\top}\mu^{(l)}\ \forall l\in [L-1]

I don't get it....

Optimal LP-relaxed Verification

$p_{\mathcal{C}}^*$ ,that limits all such algorithms. Is this just theoretical babbling or is this barrier actually problematic in practice?

$p_{\mathcal{C}}^*$ ,thus it's a barrier.

To exactly solve the tightest LP-relaxed verification problem of a ReLU network, two steps are required:

Obtaining pre-activation bounds
Obtaining the tightest pre-activation upper and lower bounds of all the neurons in the NN, excluding those in the last layer.
$\mathcal{C}$ $L_0$ $l_0\in[L_0]$ $\underline{z}_{j}^{(l_0)}$ $\overline{z}_{j}^{(l_0)}$ $j\in[n^{(l_0)}]$ $\mathcal{C}$ :
$L\gets l_0,c^\top\gets \bold{W}^{(l_0)}_{j,:}(\text{resp. } c^\top\gets -\bold{W}_{j,:}^\top),c_0\gets b_j^{(l_0)}(\text{resp. }c_0\gets -b_{j}^{(l_0)})$
and the exact optimum is computed.
This is solved in parallel.
Indeed, we design a scheduler to do so on a cluster with 1000 CPU-nodes.
Solving the LP-relaxed problem for the last layer
Solving the LP-relaxed verification problem exactly for the last layer of the NN.
$\mathcal{C}$ $j\in[n^{(L_0)}]\backslash \{j^{nom}\}$ can be obtained by
$L\gets L_0,c^{\top}\gets\bold{W}_{j^{nom},:}^{(L_0)}-\bold{W}_{j,:}^{(L_0)},c_0\gets b_{j^{nom}}^{(L_0)}-b_{j}^{(L_0)}$
and the exact minimum is computed.
$j^{nom}$ $x^{nom}$ .
$x^{nom}$ iff the solutions of all such LPs are positive, i.e. we cannot make the true class logit lower than any other logits.

Experiments

We conduct two experiments to assess the tightness of LP-ALL:
1) finding certified upper bounds on the robust error of several NN classifiers,
$\epsilon$ using different algorithms.
All experiments are conducted on MNIST and/or CIFAR-10 datasets.

We conduct experiments on a range of ReLU-activated feedforward networks. MLP-A and MLP-B refer to multilayer perceptrons: MLP-A has 1 hidden layer with 500 neurons, and MLP-B has 2 hidden layers with 100 neurons each.

We run experiments on a cluster with 1000 CPU-nodes. The total run time amounts to more than 22 CPU-years.

It's computational expensive....

For all NORmally and ADV-trained networks, we see that the certified upper bounds using LP-GREEDY and LP-ALL are very loose when we compare the gap between them to the lower bounds found by PGD and MILP.

$\epsilon$ $l_{\infty}$ ball in which no adversarial examples can be crafted.

Since solving LP-ALL is expensive, we find the -bounds only for ten samples of the MNIST and CIFAR-10 datasets.

On MNIST, the results show that for all networks trained NORmally or via ADV, the certified lower bounds on are 1.5 to 5 times smaller than the upper bound found by PGD

This indicates that although PGD cannot find an adversarial example in the smaller radius but the adversarial examples exist in a smaller radius, verified by the robust verification.

Then we performed extensive experiments to show that even the optimal convex relaxation for ReLU networks in this framework cannot obtain tight bounds on the robust error in all cases we consider here. Thus any method will face a convex relaxation barrier as soon as it can be described by our framework

In so far as the ultimate goal of robustness verification is to construct a training method to lower certified error, this barrier is not necessarily problematic — some such method could still produce networks for which convex relaxation as described by our framework produces accurate robust error bounds.

Inspirations

This paper provides a large amount of mathematical equations and notations, which are a little overwhelmed, a smaller portion of them will be favored.

The conculsion indicates that all the methods with convex relaxation are barriered by the optimal convex relaxation, thus none of them succeeds in finding the real lower bound, but in practice, this gap so far still provide accurate robust error bounds.

It's clear that the method proposed can be improved computationally, otherwise it will never be a prevailing method.

Semantify-NN - CVPR 2020

Code: https://github.com/JeetMo/SemantifyNN

Jeet Mohapatra, Tsui-Wei (Lily)Weng, Pin-Yu Chen, Sijia Liu, Luca Daniel. Towards Verifying Robustness of Neural Networks Against A Family of Semantic Perturbations. CVPR 2020. arXiv:1912.09533

To bridge this gap, we propose Semantify-NN, a model-agnostic and generic robustness verification approach against semantic perturbations for neural networks.

They propose a Semantic Perturbation layer that can be inserted into the input layer of any given model, which is capable of verifying the robustness of model under semantic perturbations.

$x$ and a trained DNN, the primary goal of verification tools is to provide a “robustness certificate” for verifying its properties in a specified threat model.

Robustness certification

$\mathcal{A}$ attack space $x$ $\Omega_\mathcal{A}(x)$ $d_{\mathcal{A}}$ robustness certification $\delta$ such that

\min_{x^{'}\in\Omega_{\mathcal{A}}(x),d_{\mathcal{A}}(x,x^{'})\le\delta}\left(\min_{j\neq c}f_c(x^{'})-f_j(x^{'})\right)>0

$f$ $K$ $f_j(\cdot)$ $j$ $j\in\{1,2,\dots,K\}$ $c$ $x$ .

In short, it aims to find the largest magnitude bound under a certain threat model that every perturbation permitted fails to change the classification result.

Semantic perturbation

In general, semantic adversarial attacks craft adversarial examples by tuning a set of parameters governing semantic manipulations of data samples, which are either explicitly specified (e.g. rotation angle) or implicitly learned (e.g. latent representations of generative models).

Continuously parameterized semantic perturbations cannot be enumerated.

This name is tricking, the semantic perturbations are basically manipulations of the image that preserve the semantic meaning of the image. Actually a few years ago, this may be seen as a method to test the generalization of models.

$\Omega_{\mathcal{A}}^k$ , there exists a function

g:X\times(I_1\times I_2\times\cdots\times I_k)\to X

such that

\Omega_{\mathcal{A}}^k(x)=\{g(x,\epsilon_1,\dots,\epsilon_k)|\epsilon_i\in I_i\}\\ d_{\mathcal{A}}(g(x,\epsilon_1,\dots,\epsilon_k),x)=||(\epsilon_1,\dots,\epsilon_k)||_p

In which

$X$ is the pixel space (i.e. the raw image)
$I_i$ denotes a set of feasible semantic operations
$\epsilon_i$ $I_i$
$k$ denotes the dimension of semantic attack
$\epsilon^k=(\epsilon_1,\epsilon_2,\dots,\epsilon_k)\\ I^k=I_1\times\cdots\times I_k$

$\epsilon_i$ can describe some human-interpretable characteristic of an image, such as translations shift, rotation angle, etc.

Semantic attacks can be divided into two categories

Discretely parameterized perturbations
(translation, occlusions, etc.)
Continuously parameterized perturbations
(color changes, brightness, contrast, spatial transformations, etc.)

Discretely Parameterized Semantic Perturbation

Translation
$I_1=\{0,1,\dots,r\},I_2=\{0,1,\dots,t\}$ $r,t$ .
Occlusion
It can be expressed by a 3-dimensional attack with three parameters, i.e. the coordinates of the left-uppermost pixel of the occlusion patch and the size of the patch.

SP-layers (Continuously Parameterized Semantic Perturbation)

$I^k\sub \R^k$ .

They propose to add semantic perturbation layers (SP-layers) to the input layer for efficient robustness verification.

$g_x(\epsilon^k)=g(x,\epsilon^k)$ , the verification problem becomes

\min_{\epsilon^k\in I^k,d_{\mathcal{A}}(g_x(\epsilon^k),x)\le\delta} \left(\min_{j\neq c}f_c(g_x(\epsilon^k))-f_j(g_x(\epsilon^k))\right)>0

$f^{sem}=f\circ g_x$ , it further becomes

\min_{\epsilon^k\in I^k,||\epsilon^k||_p\le\delta} \left(\min_{j\neq c}f_c^{sem}(\epsilon^k)-f_j^{sem}(\epsilon^k)\right)>0

$l_p$ $I^k$ .

Overall, the SP-layers work as a parameterized input transformation function from the semantic space to RGB space used for further predictions.

Instantiations of SP-layers

Hue

Two colors with the same hue are generally considered as different shades of a color, like blue and light blue.

$0-360^\circ$ $[0,6]$ , thus there is

\begin{array}{rl} g(R,G,B,\epsilon_h)&=\left(d \cdot \phi_{R}^{h}\left(h^{\prime}\right)+m, d \cdot \phi_{G}^{h}\left(h^{\prime}\right)+m, d \cdot \phi_{B}^{h}\left(h^{\prime}\right)+m\right) \end{array}

In which,

\begin{array}{rl} d&=(1-|2l-1|)s,\\ m&=1-\frac{d}{2},\\ h^{\prime}&=(h+\epsilon_h)\mod {6} \end{array}

and

\left(\phi_{R}^{h}\left(h^{\prime}\right), \phi_{G}^{h}\left(h^{\prime}\right), \phi_{B}^{h}\left(h^{\prime}\right)\right)=\left\{\begin{array}{ll} (1, V, 0) & 0 \leq h^{\prime} \leq 1 \\ (V, 1,0) & 1 \leq h^{\prime} \leq 2 \\ (0,1, V) & 2 \leq h^{\prime} \leq 3 \\ (0, V, 1) & 3 \leq h^{\prime} \leq 4 \\ (V, 0,1) & 4 \leq h^{\prime} \leq 5 \\ (1,0, V) & 5 \leq h^{\prime} \leq 6 \end{array}\right.

$V=(1-|(h^\prime\mod 2)-1|)$ .

$0\le h^{\prime}\le 6$ $\sigma_i(x)=\text{ReLU}(x-i)$ to a hidden layer connecting from hue space to original RGB space:

\begin{aligned} \phi_{R}^{h}\left(h^{\prime}\right) &=1+\sigma_{2}\left(h^{\prime}\right)+\sigma_{4}\left(h^{\prime}\right)-\left(\sigma_{5}\left(h^{\prime}\right)+\sigma_{1}\left(h^{\prime}\right)\right) \\ \phi_{G}^{h}\left(h^{\prime}\right) &=\sigma_{0}\left(h^{\prime}\right)+\sigma_{4}\left(h^{\prime}\right)-\left(\sigma_{1}\left(h^{\prime}\right)+\sigma_{3}\left(h^{\prime}\right)\right) \\ \phi_{B}^{h}\left(h^{\prime}\right) &=\sigma_{2}\left(h^{\prime}\right)+\sigma_{6}\left(h^{\prime}\right)-\left(\sigma_{5}\left(h^{\prime}\right)+\sigma_{3}\left(h^{\prime}\right)\right) \end{aligned}

Intriguing, thus it becomes a layer of networks.

Saturation

This value corresponds to the colorfulness of the picture, with a saturation of 0, the picture becomes gray-scale.

g(R,G,B,\epsilon_s)=(d_R\cdot\phi^s(s^\prime)+l,d_G\cdot\phi^s(s^\prime)+l,d_B\cdot\phi^s(s^\prime)+l)

In which

\begin{array}{rl} s^\prime&=s+\epsilon_s\\ d_R&=\frac{R-l}{s}\\ d_G&=\frac{G-l}{s}\\ d_B&=\frac{B-l}{s} \end{array}

and

\phi^{s}\left(s^{\prime}\right)=\min \left(\max \left(s^{\prime}, 0\right), 1\right)=\sigma_{0}\left(s^{\prime}\right)-\sigma_{1}\left(s^{\prime}\right)

Hence it also becomes a layer of network.

Lightness

This property corresponds to the perceived brightness of the image where a lightness of 1 gives us white and a lightness of 0 gives us black images.

g(R,G,B,\epsilon_l)=(d_R\cdot\phi_1^l(l^\prime)+\phi_2^l(l^\prime),d_G\cdot\phi_1^l(l^\prime)+\phi_2^l(l^\prime),d_B\cdot\phi_1^l(l^\prime)+\phi_2^l(l^\prime))

In which

\begin{array}{rl} l^\prime&=l+\epsilon_l\\ d_R&=\frac{R-l}{1-|2l-1|}\\ d_G&=\frac{G-l}{1-|2l-1|}\\ d_B&=\frac{B-l}{1-|2l-1|} \end{array}

and

\begin{array}{rl} \phi_1^l(l^\prime)&=1-|2\cdot\min(\max(l^\prime,0),1)-1|\\ &=-\sigma_0(2\cdot l^\prime)-\sigma_2(2\cdot l^\prime)+2\cdot\sigma_1(2\cdot l^\prime)+1\\ \phi_2^l(l^\prime)&=\min(\max(l^\prime,0),1)=\sigma_0(l^\prime)-\sigma_1(l^\prime) \end{array}

Another layer of network!

Brightness and contrast

$\epsilon_b$ $\epsilon_c$ , this layer is

\begin{array}{rl} g(x,\epsilon_b,\epsilon_c)&=\min(\max((1+\epsilon_c)\cdot x+\epsilon_b,0),1)\\ &=\sigma_0((1+\epsilon_c)\cdot x+\epsilon_b)-\sigma_1((1+\epsilon_c)\cdot x+\epsilon_b) \end{array}

Rotation

$\theta$ . They consider rotations at the center of the image with the boundaries being extended to the area outside the image.

$(i,j)$ $x_{i,j}^\prime$ $\theta$ , it's acquired by

x^\prime_{i,j}=\frac{\sum_{k,l}x_{k,l}\cdot\max(0,1-\sqrt{(k-i^\prime)^2+(l-j^\prime)^2})} {\sum_{k,l}\max(0,1-\sqrt{(k-i^\prime)^2+(l-j^\prime)^2})}

In which

\begin{array}{rl} i^\prime&=i\cos\theta-j\sin\theta\\ j^\prime&=j\cos\theta+i\sin\theta \end{array}

$k,l$ range over all possible values.

This part is a little weird.

$(k,l)$ $(i,j)$ is given by

m_{(k,l)(i,j)}(\theta)=\frac{\max(0,1-\sqrt{(k-i^\prime)^2+(l-j^\prime)^2})} {\sum_{k^\prime,l^\prime}\max(0,1-\sqrt{(k^\prime-i^\prime)^2+(l^\prime-j^\prime)^2})}

It's highly non-linear and $\theta$ $\theta$ , it takes non-zero values which can go up to 1.

This makes naive verification infeasible and one idea is to use Explicit Input Splitting (split the range of angles into small parts), but it's still computationally infeasible for a large number of splitsim.

To balance this trade-off, we propose a new refinement technique named as implicit input splitting.

Input space refinement for Sematify-NN

Theorem 3.1. $x$ $S$ $x^\prime$ $x^\prime$ $S$ is also correctly classified, where the convex hull in the pixel space.

Here one certification cycle means one pass through the certification algorithm sharing the same linear relaxation values.

As a result, we can split the original interval into smaller parts and certify each of them separately in order to certify the larger interval.

Intuitively, since the minimizer of the output space for the union of intervals is one of the minimizers for each intervals, thus the minimum requirement still holds for the union if it holds for every interval.

We find that we are unable to get good verification results for datasets like MNIST and CIFAR-10 without increasing the number of partitions to very large values (≈ 40, 000).

Experiments

Inspirations

The so-called semantic perturbation seems familiar and some years ago, they were used to test the generalization of models in face of corruptions, translations and rotations. Specially, CNN used to be claimed to be translational invariant and rotational invariant.

This paper reformulates the semantic perturbations into the form of an handcrafted neural network that transforms the parameter space into the image space, defining the attack space implicitly. This structure and intuition sounds very familiar, if we refer the parameter space as latent space and view the Semantify-NN as a generator and the classifier as a discriminator, the whole thing becomes a Generative Adversarial Nets, i.e. GAN, but with a handcrafted generator and a multi-class discriminator.

Based on this resembling, I think someone will soon use a GAN to solve this problem.

Adversarial Training + Verification

COLT (Convex Layerwise Adversarial Training) - ICLR 2020

Code: https://github.com/eth-sri/colt

Paper: https://openreview.net/forum?id=SJxSDxrKDr

Mislav Balunovic, Martin Vechev. Adversarial Training and Provable Defenses: Bridging the Gap. ICLR 2020.

They propose to train the network with both a verifier and an adversary.

The overall procedure is:

The verifier produces a convex relaxation of all possible intermediate vector outputs in the neural network to certify a property (e.g. robustness) of the network.
An adversary searches over this (intermediate) convex region in order to find a latent adversarial example, i.e. a concrete intermediate input contained in the relaxation that when propagated through the network causes a misclassification which prevents verification.
The resulting latent adversarial examples are now incorporated into our training scheme using adversarial training.

Threat model and settings

$\bold{x}\in\R^{d_0}$ $\Bbb{S}_0(\bold{x})\sube\R^{d_0}$ .

A convex set is a set that for every two elements in this set, the weighted combination of them is still an element in the set.

$\Bbb{S}_0$ can capture a wide range of specifications, including:

$L_p$ perturbations
Geometric transformations
Semantic perturbations
Camera imaging

$L_{\infty}$ perturbations, this specification of convex set is

\Bbb{S}_0(\bold{x})=\{\bold{x}^\prime\in\R^{d_0},||\bold{x}-\bold{x}^\prime||_{\infty}<\epsilon\}

$k$ $\theta$ is

h_{\theta}=h_{\theta}^k\circ h_{\theta}^{k-1}\circ\cdots\circ h_{\theta}^1

$h_{\theta}^i:\R^{d_{i-1}}\to\R^{d_i}$ $i$ $i$ $k$ is denoted as

h_{\theta}^{i:k}=h_{\theta}^k\circ h_{\theta}^{k-1}\circ\cdots\circ h_{\theta}^i

The goal of verification is to prove a property on the output of the neural network, encoded via a linear constraint:

c^\top h_{\theta}(\bold{x}^\prime)+d<0,\forall\bold{x}^\prime\in\Bbb{S}_0(\bold{x})

$c$ $d$ are property specific vector and scalar values respectively.

$h_{\theta}$ $\mathcal{L}$ and solve the following min-max optimization problem:

\min_{\theta}\Bbb{E}_{(x,y)\sim D}\max_{\bold{x}^\prime\in\Bbb{S}_0(\bold{x})}\mathcal{L}(h_{\theta}(\bold{x}^\prime),y)

Based on the approximation used to solve the inner maximization, there are two families of techniques:

Adversarial Training
It replaces the maximum loss with a lower bound obtained using an adversarial attack. The PGD-based adversarial training is empirically robust but without guarantees.
Provable Defenses
It replaces the maximum loss with a computed upper bound, hence providing guarantees on the robustness of the resulting network inside the threat model.
But the upper bound is computed using relaxation, which is typically not tight, and the way the bound is computed is also significantly more complex.

Certification via convex relaxations

$i$ , i.e.

\Bbb{S}_i(\bold{x})=h_{\theta}^i(\Bbb{S}_{i-1}(\bold{x}))\sube \R^{d_i}

$\bold{x}\prime\in\Bbb{S}_0(\bold{x})$ through the network layer by layer.

$\Bbb{S}_i(\bold{x})$ $\Bbb{C}_i(\bold{x})$ $\Bbb{C}_0(\bold{x})=\Bbb{S}_0(\bold{x})$ , since it's already convex.

$\Bbb{D}\sube\R^{d_{i-1}}$ $i$ $g_{\theta}^i(\Bbb{D})$ $h_{\theta}^i(\Bbb{D})\sube g_{\theta}^i(\Bbb{D})$ .

So a convex relaxation of the intermediate vectors can be recursively defined as

\Bbb{C}_i(\bold{x})=g_{\theta}^i(\Bbb{C}_{i-1}(\bold{x}))\sube\R^{d_i}

$\Bbb{S}_k(\bold{x})\sube \Bbb{C}_k(\bold{x})$ $\Bbb{C}_k(\bold{x})$ $\Bbb{S}_k(\bold{x})$ satisfy the constraint.

$\bold{x}_i^\prime$ $i$ $c^\top h_{\theta}^{i+1:k}(\bold{x}_i^\prime)+d\ge 0,\bold{x}_i^\prime\in\Bbb{C}_i(\bold{x})\backslash \Bbb{S}_i(\bold{x})$ , leading the failure of certification.

It's possible that the original set is satisfactory while the convex relaxation violates the property.

These points are referred as latent adversarial examples.

Convex Layerwise Adversarial Training

Our key observation is that the two families of defense methods described earlier are in fact different ends of the same spectrum: $\Bbb{C}_0(\bold{x})$ $\Bbb{C}_k(\bold{x})$ .

They propose COLT as a combination, the concrete steps are as follows:

$\Bbb{C}_0(\bold{x})$ .
$\Bbb{C}_1(\bold{x})$ to maximize the loss and then backpropagate the update the weights from the second layer to the last one.
From the second to the last layer, iteratively conduct the same procedure in step 2.

$l$ -th step, COLT solves the following min-max optimization problem:

\min_{\theta^{l+1:k}}\Bbb{E}_{(x,y)\sim D}\max_{\bold{x}_l^\prime\in\Bbb{C}_l(\bold{x})}\mathcal{L}(h_{\theta}^{l+1:k}(\bold{x}_l^\prime),y,\theta)

$l=0$ $\Bbb{C}_0(\bold{x})=\Bbb{S}_0(\bold{x})$ .

Calculate convex approximation using linear relaxation

$\Bbb{S}_l(\bold{x})$ can be represented by a set

\Bbb{C}_l(\bold{x})=\{\bold{a}_l+A_l\bold{e}|\bold{e}\in[-1,1]^{m_i}\}

$\bold{a}_l$ $A_l$ $[-1,1]^{m_i}$ . This representation is also known as zonotope abstraction.

A convex relaxation can be represented by an affine transformed and shifted standard hypercube.

$\bold{a}_0=\bold{x}$ $A_0=\epsilon I_{d_o}$ $A_l\bold{e}$ can be computed by performing a chain of matrix-vector multiplications from right to left instead of matrix-matrix multiplications.

$\bold{x}^\prime$ $\bold{e}$ . The Line 7 of the above algorithm is instantiated with the following operations:

\bold{e}_j\gets clip(\bold{e}_j+\alpha A_l^\top\nabla_{\bold{x}_j^\prime}\mathcal{L}(\bold{x}_j^\prime,y_j),-1,1)\\ \bold{x}_j^\prime\gets \bold{a}_l+A_l \bold{e}_j

clip $clip(t,-1,1):=\min(\max(t,-1),1)$ .

Experiments

The certification of the neural networks is conducted using exactly solved convex hulls for each intermediate features.

$\bold{x}^\prime\in\Bbb{C}_l(\bold{x})$ which, if propagated through the network, violates the correctness property in Equation 1.

We remark that this combination of convex relaxation and exact bound propagation does not fall under the recently introduced convex barrier to certification Salman et al. (2019b).

They evaluate on two architectures (both very small):

A 4-layer convolutional network: first 3 layers are convolutional layers with filter sizes 32, 32, 128, kernel sizes 3, 4, 4 and strides 1, 2, 2, respectively.
A 3-layer convolutional network: first 2 layers are convolutional layers with filter sizes 32 and 128, kernel sizes 5 and 4, strides 2 and 2, respectively.

We always compare to the best reported and reproducible results in the literature on any architecture.

Larger architecture produces the results in Table 1 and smaller architecture produces the results in Table 2.

They also evaluate the latent adversarial robustness.

In this experiment, we are interested in robustness of a model against latent adversarial attacks during each stage of the training. To perform this experiment, after each stage of the training we stored the intermediate model and ran latent adversarial attack on these models, on each of the layers.

Using convex layerwise adversarial training, the model progressively becomes robust to perturbations in the deeper layers.

It seems that the deeper layer is always less robust than the more shallow layer.

They also evaluate on MNIST and SVHN, both using the smaller architecture.

We believe that instantiating our method with a convex relaxation that is more memory friendly than what we used would likely yield better results in this experiment.

Inspirations

This paper presents COLT, a new method combining the verification and adversarial training. To me, it's basically layerwise adversarial training on feature map, but with a convex relaxation of the set of feature maps, But it's two computationally expensive, and it's expected to see a work trying to fasten it.

Another inspiration is that, since adversarial training from feature maps to the predictions works, is it possible to conduct adversarial training from input to feature maps? (Adversarial Logit Pairing conduct it from input to logits and TRADES combine standard training with Adversarial Logit Pairing, it seems that no one has tried to do it.)