By LI Haoyang 2020.10
The Landscape of Adversarial Example
Contents: Problems · Map · Attacks · Defense · Explanation · Interpretation · Benchmark · False positive noise · Adversarial augmentation
Adversarial examples are examples that 1) look no different to human eyes, yet 2) fool the targeted system into malfunctioning. Many deep learning methods have been found vulnerable to adversarial examples, along with many methods to generate them.
For a target system $f$ estimated from an original mapping $f^*$ and a target input $x$, the goal is to generate an adversarial example $x' = x + \rho$ with a minimal perturbation $\rho$, such that the prediction flips:

$$\min_{\rho}\ \|\rho\| \quad \text{s.t.}\quad f(x+\rho) \neq f^*(x).$$
Based on the attacker's knowledge of the system, these methods are divided into white-box and black-box attacks: the former assumes the attacker has full knowledge of the system, while the latter makes no such assumption. This makes white-box attacks easier and more popular in the literature but less meaningful in practice, and black-box attacks more difficult and less explored in the literature but more meaningful in practice. Between black and white, the scenario in which the attacker has only limited knowledge is sometimes called gray-box.
The problem of generating adversarial examples is a reversed optimization problem: the parameters of the targeted model are fixed, while the input is tuned, under certain constraints, to mislead the model. White-box methods generally utilize the gradients of the model, while black-box methods either exploit the transferability of adversarial examples or solve the problem with zeroth-order optimization algorithms (e.g. evolutionary algorithms, reinforcement learning, etc.).
Most white-box attacks (i.e. the most prevalent methods) reformulate the problem as reversed training, in which a loss is maximized with respect to the input rather than minimized with respect to the parameters.
Defending against adversarial attacks is more difficult, and no method so far withstands all of them. Theoretically, adversarial training formulates a min-max optimization to train a robust model.
The inner maximization searches for an optimal perturbation within the set of allowed perturbations, and the outer minimization searches for model parameters that minimize the loss on the perturbed examples.
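Written out, the adversarial-training objective takes the familiar min-max form (standard notation, with model parameters $\theta$, loss $\ell$, data distribution $\mathcal{D}$ and allowed perturbation set $\mathcal{S}$):

$$\min_{\theta}\; \mathbb{E}_{(x,y)\sim\mathcal{D}}\Big[\max_{\delta\in\mathcal{S}} \ell\big(f_\theta(x+\delta),\, y\big)\Big]$$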
Based on the same minimax problem, while adversarial training aims to find a lower bound for the inner maximization, efforts in provable defense attempt to find an upper bound for the inner maximization. The latter is also known as verification of robustness.
A Map of Adversarial Example Research
Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, Rob Fergus. Intriguing properties of neural networks. ICLR 2014. arXiv:1312.6199
This is the initial commit of adversarial examples; the authors assume they are caused by gradients exploding along minor perturbations.
Ian Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. ICLR 2015. (AT with FGSM)
They propose to defend against adversarial attacks using adversarial training, i.e. augmenting the training data with adversarial examples generated online; for this purpose, they also propose the Fast Gradient Sign Method (FGSM), a fast one-step attack that exploits local linearity.
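As a minimal sketch (PyTorch-style; the model, loss function and $\epsilon$ are placeholders supplied by the caller), the single-step sign update looks like:

```python
import torch

def fgsm(model, loss_fn, x, y, eps):
    """One-step FGSM: move eps along the sign of the input gradient of the loss."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss_fn(model(x_adv), y).backward()
    x_adv = x_adv + eps * x_adv.grad.sign()
    return x_adv.clamp(0, 1).detach()  # keep pixels in a valid range
```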
Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, Pascal Frossard. DeepFool: A Simple and Accurate Method to Fool Deep Neural Networks. CVPR 2016. arXiv:1511.04599 (DeepFool)
They give a comprehensive analysis of adversarial examples from a geometric perspective and propose to use the perturbation orthogonal to the nearest classification boundary as the adversarial perturbation, i.e. the DeepFool attack.
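In the simplest setting of an affine binary classifier $f(x) = w^\top x + b$, the closed-form step that DeepFool applies (and iterates, extending to the multi-class case by picking the nearest boundary) is the projection onto the decision hyperplane:

$$r_*(x) = -\frac{f(x)}{\|w\|_2^2}\, w$$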
Nicholas Carlini, David Wagner. Towards Evaluating the Robustness of Neural Networks. SSP 2017. arXiv:1608.04644 (Carlini&Wagner Attack)
They give a comprehensive investigation of white-box gradient-based adversarial attacks, experiment with different loss functions, and propose empirically optimal attacks for the $\ell_0$, $\ell_2$ and $\ell_\infty$ norms, i.e. the Carlini & Wagner attacks.
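For reference, the $\ell_2$ variant of the attack solves the following (with $Z$ the logits, $t$ the target class, $\kappa$ a confidence margin and $c$ a constant found by binary search; the box constraint is additionally handled by a change of variables in the paper):

$$\min_{\delta}\; \|\delta\|_2^2 + c\cdot\max\!\Big(\max_{i\neq t} Z(x+\delta)_i - Z(x+\delta)_t,\; -\kappa\Big)$$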
Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, Omar Fawzi, Pascal Frossard. Universal adversarial perturbations. CVPR 2017. arXiv:1610.08401 (UAP)
They investigate a special type of adversarial perturbation, the universal adversarial perturbation: a single perturbation that fools the classifier in an input-agnostic manner. They also explain the existence of universal adversarial perturbations by the geometric correlation of classification boundaries.
Nicolas Papernot, Patrick McDaniel, Ian Goodfellow, Somesh Jha, Z. Berkay Celik, Ananthram Swami. Practical Black-Box Attacks against Machine Learning. CCS 2017. arXiv:1602.02697 (Substitute Attack)
They propose to train a substitute model that approximates the decision boundary of the black-box model: start with a few representative examples, label them by querying the black-box model, augment them with examples generated by a white-box attack against the substitute model, and iterate, as sketched below.
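A rough sketch of that loop, assuming a hypothetical `query_blackbox` oracle that returns the black-box model's hard labels, and a caller-supplied substitute model and optimizer:

```python
import torch
import torch.nn.functional as F

def train_substitute(substitute, optimizer, seed_x, query_blackbox,
                     rounds=5, epochs=10, lam=0.1):
    # `query_blackbox(x)` is a placeholder oracle; rounds/epochs/lam are illustrative.
    x = seed_x
    for _ in range(rounds):
        y = query_blackbox(x)                          # label the current set via the black box
        for _ in range(epochs):                        # fit the substitute on (x, y)
            optimizer.zero_grad()
            F.cross_entropy(substitute(x), y).backward()
            optimizer.step()
        # FGSM-style augmentation against the substitute: step each point along
        # the sign of the substitute's input gradient and add it to the pool.
        x_aug = x.clone().detach().requires_grad_(True)
        F.cross_entropy(substitute(x_aug), y).backward()
        x = torch.cat([x, (x_aug + lam * x_aug.grad.sign()).clamp(0, 1).detach()])
    return substitute
```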
Alexey Kurakin, Ian Goodfellow, Samy Bengio. Adversarial example in the physical world. ICLR 2017. arXiv:1607.02533
They first demonstrate the existence of physical adversarial examples, e.g. a printed adversarial example recaptured by a camera can still fool the classifier.
Adversarial Example in Object Detection
Mahmood Sharif, Sruti Bhagavatula, Lujo Bauer, Michael K. Reiter. Accessorize to a Crime: Real and Stealthy Attacks on State-of-the-Art Face Recognition. CCS 2016, pp. 1528-1540. DOI: 10.1145/2976749.2978392. Paper: https://www.cs.cmu.edu/~sbhagava/papers/face-rec-ccs16.pdf (Adversarial Glasses)
They craft adversarial eyeglass frames that can fool a face recognition system.
Cihang Xie, Jianyu Wang, Zhishuai Zhang, Yuyin Zhou, Lingxi Xie, Alan Yuille. Adversarial Examples for Semantic Segmentation and Object Detection. ICCV 2017. arXiv:1703.08603
They craft adversarial examples for object detection and semantic segmentation; in particular, they can dictate the prediction of the semantic segmentation network.
Kevin Eykholt, Ivan Evtimov, Earlence Fernandes, Bo Li, Amir Rahmati, Chaowei Xiao, Atul Prakash, Tadayoshi Kohno, Dawn Song. Robust Physical-World Attacks on Deep Learning Models. CVPR 2018 arXiv:1707.08945
They craft robust physical adversarial perturbations on traffic signs.
Simen Thys, Wiebe Van Ranst, Toon Goedemé. Fooling automated surveillance cameras: adversarial patches to attack person detection. CVPR workshop 2019 arXiv:1904.08653 (Adversarial Patch)
They craft an adversarial patch; a person carrying it in front of their body can evade the YOLO object detector.
Kaidi Xu, Gaoyuan Zhang, Sijia Liu, Quanfu Fan, Mengshu Sun, Hongge Chen, Pin-Yu Chen, Yanzhi Wang, Xue Lin. Adversarial T-shirt! Evading Person Detectors in A Physical World. ECCV 2020. arXiv:1910.11099 (Adversarial T-shirt)
They craft an adversarial T-shirt, wearing which one can become "invisible" in the eyes of YOLO.
Zuxuan Wu, Ser-Nam Lim, Larry Davis, Tom Goldstein. Making an Invisibility Cloak: Real World Adversarial Attacks on Object Detectors. ECCV 2020. arXiv:1910.14667
Haichao Zhang, Jianyu Wang. Towards Adversarially Robust Object Detection. ICCV 2019.
They propose adversarial training for robust detection, choosing at each batch the stronger adversarial example crafted either for the classification task or for the localization task.
Defenses against Adversarial Attacks
Regularization
Moustapha Cisse, Piotr Bojanowski, Edouard Grave, Yann Dauphin, Nicolas Usunier. Parseval Networks: Improving Robustness to Adversarial Examples. ICML 2017. arXiv:1704.08847
They propose, as a defense, to regularize the weights of the network by keeping each weight matrix approximately orthonormal ($W^\top W \approx I$; such a matrix is also known as a Parseval tight frame), thereby controlling the network's Lipschitz constant.
Chongli Qin, James Martens, Sven Gowal, Dilip Krishnan, Krishnamurthy Dvijotham, Alhussein Fawzi, Soham De, Robert Stanforth, and Pushmeet Kohli. Adversarial robustness through local linearization. NIPS 2019. arXiv:1907.02610 (LLR)
They propose to penalize the loss with a local linearization term, i.e. enforcing the loss function to be well approximated by its first-order Taylor expansion. Besides the local linearity term, they also add a magnitude term that restricts the magnitude of the gradient of the loss along the found perturbation.
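The local linearity measure they penalize can be written as follows (with $\ell$ the loss and $\epsilon$ the perturbation radius); the full regularizer also adds a term of the form $\mu\,|\delta^\top \nabla_x \ell(x)|$ at the maximizing $\delta$ to restrict the gradient magnitude:

$$\gamma(\epsilon, x) = \max_{\|\delta\|\le\epsilon}\Big|\,\ell(x+\delta) - \ell(x) - \delta^\top\nabla_x \ell(x)\,\Big|$$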
Alvin Chan, Yi Tay, Yew Soon Ong, Jie Fu. Jacobian Adversarially Regularized Networks for Robustness. ICLR 2020. arXiv:1912.10185 (JARN)
They use a GAN to enforce that the Jacobian of the loss with respect to the input resembles the input image, motivated by the empirical observation that adversarially trained networks have input gradients that look similar to the input image.
Adversarial Training
Harini Kannan, Alexey Kurakin, Ian Goodfellow. Adversarial Logit Pairing. arXiv preprint 2018 arXiv:1803.06373 (ALP)
They propose to enforce the logits activated by a clean image and by its adversarial counterpart to be similar, by adding a penalty term to the objective. It is in fact a type of adversarial training regularized on the logits.
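A minimal sketch of such a pairing loss (the weight `lam` and the way `x_adv` is generated are placeholders, not the paper's exact settings):

```python
import torch.nn.functional as F

def alp_loss(model, x, x_adv, y, lam=0.5):
    """Adversarial cross-entropy plus a term pulling clean and adversarial logits together."""
    logits_clean, logits_adv = model(x), model(x_adv)
    pairing = F.mse_loss(logits_adv, logits_clean)        # logit pairing penalty
    return F.cross_entropy(logits_adv, y) + lam * pairing
```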
Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, Adrian Vladu. Towards Deep Learning Models Resistant to Adversarial Attacks. ICLR 2018. arXiv:1706.06083 (AT with PGD)
They argue that Projected Gradient Descent (PGD) is the strongest first-order white-box attack and propose to use it for adversarial training; they also invoke Danskin's theorem to show that once the inner maximization is solved, a gradient step on the outer minimization makes progress on the minimax problem of adversarial training.
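A minimal sketch of one PGD adversarial-training step, assuming an $\ell_\infty$ ball of radius `eps` and caller-supplied hyperparameters:

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps, alpha, steps):
    """Projected Gradient Descent inside an l_inf ball of radius eps."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)   # random start
    for _ in range(steps):
        x_adv = x_adv.clone().detach().requires_grad_(True)
        F.cross_entropy(model(x_adv), y).backward()
        x_adv = x_adv + alpha * x_adv.grad.sign()                       # ascent step on the loss
        x_adv = (x + (x_adv - x).clamp(-eps, eps)).clamp(0, 1)          # project back, keep valid pixels
    return x_adv.detach()

def adversarial_training_step(model, optimizer, x, y, eps=8/255, alpha=2/255, steps=7):
    x_adv = pgd_attack(model, x, y, eps, alpha, steps)    # inner maximization
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)               # outer minimization on perturbed data
    loss.backward()
    optimizer.step()
    return loss.item()
```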
Ali Shafahi, Mahyar Najibi, Amin Ghiasi, Zheng Xu, John Dickerson, Christoph Studer, Larry S. Davis, Gavin Taylor, Tom Goldstein. Adversarial Training for Free! NIPS 2019. arXiv:1904.12843v2 (AT for free)
They modify adversarial training to accelerate it: instead of launching K steps of PGD to generate a batch of adversarial examples and then training the model on them once, they replay the same batch m times, taking a single gradient step per replay that updates the perturbation and the model parameters simultaneously, as sketched below.
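A rough sketch of the replay loop under these assumptions (hyperparameters are illustrative only):

```python
import torch
import torch.nn.functional as F

def free_adversarial_training(model, optimizer, loader, eps=8/255, m=4):
    # Sketch: the perturbation `delta` is carried across minibatches and updated
    # with the same backward pass that updates the model parameters.
    delta = None
    for x, y in loader:
        if delta is None or delta.shape != x.shape:
            delta = torch.zeros_like(x)
        for _ in range(m):                                 # minibatch replay
            delta.requires_grad_(True)
            loss = F.cross_entropy(model((x + delta).clamp(0, 1)), y)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()                               # outer minimization step
            # FGSM-style ascent step on the perturbation, reusing the same gradients
            delta = (delta + eps * delta.grad.sign()).clamp(-eps, eps).detach()
```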
Dinghuai Zhang, Tianyuan Zhang, Yiping Lu, Zhanxing Zhu, Bin Dong. You Only Propagate Once: Accelerating Adversarial Training via Maximal Principle. NIPS 2019. Paper: http://papers.nips.cc/paper/8316-you-only-propagate-once-accelerating-adversarial-training-via-maximal-principle (YOPO)
They utilize Pontryagin's Maximum Principle to accelerate adversarial training.
Jonathan Uesato, Jean-Baptiste Alayrac, Po-Sen Huang, Robert Stanforth, Alhussein Fawzi, Pushmeet Kohli. Are Labels Required for Improving Adversarial Robustness?. NIPS 2019. arXiv:1905.13725 (UAT)
They propose to use unlabeled data along with labeled data to augment adversarial training. For the unlabeled data, they use either the hard labels predicted online by the model or the logits it produces. This follows from decomposing the loss objective used in adversarial training.
Jingfeng Zhang, Xilie Xu, Bo Han, Gang Niu, Lizhen Cui, Masashi Sugiyama, Mohan Kankanhalli. Attacks Which Do Not Kill Training Make Adversarial Learning Stronger. ICML 2020. arXiv:2002.11242 (FAT)
They point out that a very strong adversary can flip the data distribution, making the labels completely uncorrelated with the inputs and thus hindering adversarial training. They therefore propose friendly adversarial training and implement it by early-stopping the adversarial attack.
Hongyang Zhang, Yaodong Yu, Jiantao Jiao, Eric P. Xing, Laurent El Ghaoui, Michael I. Jordan. Theoretically Principled Trade-off between Robustness and Accuracy. ICML 2019. arXiv:1901.08573 (TRADES)
They analyze the trade-off between robustness and accuracy, then propose to add a robustness regularization term to the objective. The regularization term enforces the outputs of an adversarial example and of the corresponding clean example to be similar, much like Adversarial Logit Pairing.
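The TRADES objective can be written as (with $\beta$ a trade-off hyperparameter and the inner maximization solved by PGD on the KL term):

$$\min_{\theta}\; \mathbb{E}_{(x,y)}\Big[\mathrm{CE}\big(f_\theta(x), y\big) + \beta\max_{\|x'-x\|\le\epsilon}\mathrm{KL}\big(f_\theta(x)\,\|\,f_\theta(x')\big)\Big]$$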
Cihang Xie, Yuxin Wu, Laurens van der Maaten, Alan Yuille, Kaiming He. Feature Denoising for Improving Adversarial Robustness. CVPR 2019. arXiv:1812.03411 (Feature Denoising)
They observe that the feature maps activated by adversarial examples appear much noisier than those of clean examples, and hence propose a feature denoising block (structurally similar to a residual block). It should be used together with adversarial training.
Morgane Goibert, Elvis Dohmatob. Adversarial Robustness via Label-Smoothing. arXiv preprint 2019. arXiv:1906.11567 (ALS)
Eric Wong, Leslie Rice, J. Zico Kolter. Fast is better than free: Revisiting adversarial training. ICLR 2020. Paper: https://openreview.net/forum?id=BJx040EFvH (AT with FGSM RS)
They revisit adversarial training with FGSM and find that using a large step size induces a phenomenon they name catastrophic overfitting. They propose FGSM with a random start to mitigate it. FGSM with a random start had been proposed before, but for ensemble adversarial training.
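A sketch of the FGSM-with-random-start step (step size `alpha` and radius `eps` are placeholders):

```python
import torch
import torch.nn.functional as F

def fgsm_rs(model, x, y, eps, alpha):
    """FGSM with random start: uniform init in [-eps, eps], one signed step, project back."""
    delta = torch.empty_like(x).uniform_(-eps, eps).requires_grad_(True)
    F.cross_entropy(model(x + delta), y).backward()
    delta = (delta + alpha * delta.grad.sign()).clamp(-eps, eps)
    return (x + delta).clamp(0, 1).detach()
```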
Maksym Andriushchenko, Nicolas Flammarion. Understanding and Improving Fast Adversarial Training. NIPS 2020. Paper: https://infoscience.epfl.ch/record/278914 (AT with FGSM GradAlign)
They carefully revisit adversarial training with FGSM RS and point out that it in fact reduces the effective step size and still suffers from catastrophic overfitting. Besides, they discover that adversarial training for free and adversarial training with two steps of PGD also suffer from catastrophic overfitting.
They further discover that when catastrophic overfitting appears, the gradient alignment, i.e. the cosine similarity between the gradient of the loss at the clean example and at its adversarial counterpart, drops simultaneously, making FGSM ineffective. They propose to add a gradient-alignment term to the objective so that adversarial training with FGSM works again.
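The gradient-alignment regularizer they add is, up to an expectation over a random perturbation $\eta$:

$$\Omega(x, y) = 1 - \cos\!\big(\nabla_x \ell(x, y),\; \nabla_x \ell(x+\eta, y)\big), \qquad \eta\sim\mathcal{U}([-\epsilon,\epsilon]^d)$$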
Cihang Xie, Mingxing Tan, Boqing Gong, Alan Yuille, Quoc V. Le. Smooth Adversarial Training. arXiv preprint 2020 arXiv:2006.14536
They propose to use a smooth activation function for better backpropagation, applied only in the PGD attack steps of adversarial training.
Amirreza Shaeiri, Rozhin Nobahari, Mohammad Hossein Rohban. Towards Deep Learning Models Resistant to Large Perturbations. arXiv preprint 2020. arXiv:2003.13370 (Iterative AT)
They discover that adversarial training with PGD fails when started directly with a large perturbation budget, and propose to initialize the network with weights adversarially trained under a smaller budget before training with the large one. Based on this idea, they also propose an iterative adversarial training that gradually increases the perturbation budget as training progresses.
Provable Defenses/Verification
Hadi Salman, Greg Yang, Huan Zhang, Cho-Jui Hsieh, Pengchuan Zhang. A Convex Relaxation Barrier to Tight Robustness Verification of Neural Networks. NIPS 2019. arXiv:1902.08722
They connect all the existing convex relaxations for robustness verification and point out that there is a convex relaxation barrier (a gap caused by the relaxation) that prevents these relaxations from tightly verifying robustness.
Jeet Mohapatra, Tsui-Wei (Lily) Weng, Pin-Yu Chen, Sijia Liu, Luca Daniel. Towards Verifying Robustness of Neural Networks Against A Family of Semantic Perturbations. CVPR 2020. arXiv:1912.09533 (Semantify-NN)
They incorporate multiple semantic perturbations into layers of the neural network, and propose to use this augmented network to verify the robustness of the model against semantic perturbations.
Mislav Balunovic, Martin Vechev. Adversarial Training and Provable Defenses: Bridging the Gap. ICLR 2020. Paper: https://openreview.net/forum?id=SJxSDxrKDr (COLT)
They propose Convex Layerwise Adversarial Training, combining verification and adversarial training. It can be seen as layerwise adversarial training in latent space from the first layer to the last, with the attack space of each latent layer replaced by a convex relaxation.
NAS + defense
Minghao Guo, Yuzhe Yang, Rui Xu, Ziwei Liu, Dahua Lin. When NAS Meets Robustness: In Search of Robust Architectures against Adversarial Attacks. CVPR 2020. arXiv:1911.10695
They design a NAS procedure to search for the most robust architecture and discover that densely connected networks are more robust.
Defense at Inference
Yang Song, Taesup Kim, Sebastian Nowozin, Stefano Ermon, Nate Kushman. PixelDefend: Leveraging Generative Models to Understand and Defend against Adversarial Examples. ICLR 2018. arXiv:1710.10766 (PixelDefend)
They propose to purify adversarial examples before feeding them to the model by projecting them back onto the data manifold, i.e. restoring the data distribution, using a PixelCNN. It was later breached and labeled as an obfuscated-gradient defense.
Tianyu Pang, Kun Xu, Jun Zhu. Mixup Inference: Better Exploiting Mixup to Defend Adversarial Attacks. ICLR 2020. Paper: https://openreview.net/forum?id=ByxtC2VtPB (MI)
They analyze mixup as a defense at inference time, i.e. choosing a clean example and mixing it (a weighted sum) with the fed example (potentially adversarial), and conclude that it is an effective defense under the assumption that the network behaves linearly between instances. It works well together with Interpolated Adversarial Training.
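A minimal sketch of mixup at inference time, assuming a pool of clean reference examples; the mixing weight and number of samples are illustrative, not the paper's settings:

```python
import torch

def mixup_inference(model, x, clean_pool, lam=0.6, n_samples=30):
    """Average softmax predictions over mixups of the input with sampled clean examples."""
    probs = 0.0
    with torch.no_grad():
        for _ in range(n_samples):
            idx = torch.randint(0, clean_pool.size(0), (x.size(0),))
            x_mix = lam * x + (1 - lam) * clean_pool[idx]   # weighted sum with a clean example
            probs = probs + torch.softmax(model(x_mix), dim=1)
    return probs / n_samples
```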
Ensemble
Florian Tramèr, Alexey Kurakin, Nicolas Papernot, Ian Goodfellow, Dan Boneh, Patrick McDaniel. Ensemble Adversarial Training: Attacks and Defenses. ICLR 2018. arXiv:1705.07204 (R+FGSM ensemble)
They propose ensemble adversarial training, augmenting training with single-step attacks started from random points and with adversarial examples transferred from an ensemble of pre-trained models.
Huanrui Yang, Jingyang Zhang, Hongliang Dong, Nathan Inkawhich, Andrew Gardner, Andrew Touchet, Wesley Wilkes, Heath Berry, Hai Li. DVERGE: Diversifying Vulnerabilities for Enhanced Robust Generation of Ensembles. NIPS 2020. arXiv:2009.14720
Breach Defense
Evaluation
LIVING DOCUMENT
This direction germinates from the robustness analysis of machine learning algorithms, which is a domain with a long history.
Explanation of Robustness and Adversarial Example
Robustness Analysis
Alhussein Fawzi, Seyed-Mohsen Moosavi-Dezfooli, Pascal Frossard. Robustness of classifiers: from adversarial to random noise. NIPS 2016. arXiv:1608.08967
Alhussein Fawzi, Hamza Fawzi, Omar Fawzi. Adversarial vulnerability for any classifier. NIPS 2018. Paper: http://papers.nips.cc/paper/7394-adversarial-vulnerability-for-any-classifier (Single NOTE)
They combine a generator and a discriminator to analyze adversarial vulnerability and relate in-distribution robustness (defined as the magnitude of the smallest perturbation that stays inside the data distribution, which is in fact a measure of generalization) to unconstrained robustness (adversarial robustness). They also give an upper bound on the robustness.
Daniel Cullina, Arjun Nitin Bhagoji, Prateek Mittal. PAC-learning in the presence of evasion adversaries. NIPS 2018. arXiv:1806.01471 (SINGLE NOTE)
They evaluate the sample complexity in the framework of PAC learning, concluding that the sample complexity in the presence of an evasion adversary can be smaller than, similar to, or larger than that of the standard scenario.
Not very useful for engineers.
Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, Omar Fawzi, Pascal Frossard, Stefano Soatto. Robustness of Classifiers to Universal Perturbations: A Geometric Perspective. ICLR 2018. (*)
Ludwig Schmidt, Shibani Santurkar, Dimitris Tsipras, Kunal Talwar, Aleksander Madry. Adversarially Robust Generalization Requires More Data. NIPS 2018. arXiv:1804.11285
They observe that adversarial training overfits on CIFAR-10 but not on MNIST. They further study a Bernoulli model, which resembles MNIST, and find that partial binarization (thresholding) of the inputs can effectively improve the defense against adversarial attacks on MNIST.
They also examine the relationship between training-set size and robust accuracy and conclude that, for the same perturbation level, a larger training set is always better; to reach the same robust accuracy under a larger allowed perturbation, a larger training set is required. Hence they conclude that adversarially robust generalization requires more data.
Carl-Johann Simon-Gabriel, Yann Ollivier, Léon Bottou, Bernhard Schölkopf, David Lopez-Paz. First-order Adversarial Vulnerability of Neural Networks and Input Dimension. ICML 2019. arXiv:1802.01421 (SINGLE NOTE)
They prove that when the perturbation is small enough that the loss can be approximated by its first-order Taylor expansion, adversarial training is equivalent to regularizing the dual norm of the input gradient with respect to the norm used for the perturbation. In the case of the $\ell_2$ norm, they draw a link to the double backpropagation proposed decades ago for the purpose of increasing accuracy.
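Concretely, to first order, an $\epsilon$-bounded $\ell_p$ adversary can raise the loss by at most $\epsilon\,\|\nabla_x\ell\|_q$, so small-$\epsilon$ adversarial training behaves like the penalized objective built from this dual norm:

$$\max_{\|\delta\|_p\le\epsilon} \delta^\top\nabla_x\ell(x,y) = \epsilon\,\|\nabla_x\ell(x,y)\|_q, \qquad \tfrac{1}{p}+\tfrac{1}{q}=1$$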
They also show that, with current weight initializations that preserve variance from layer to layer, adversarial vulnerability grows like $\sqrt{d}$ with the input dimension $d$. They argue that for the model to be robust to any type of attack, the average absolute value of the coefficients of the gradient should grow slower than $1/d$.
Empirically, they also show that proper gradient regularization can match adversarial augmentation. They further show that, as the error rate drops over the course of training, the adversarial error rate first drops and then grows, and that PGD adversarial training outperforms down-sampling.
Dimitris Tsipras, Shibani Santurkar, Logan Engstrom, Alexander Turner, Aleksander Madry. Robustness May Be at Odds with Accuracy. ICLR 2019. arXiv:1805.12152
They prove on a constructed distribution that a strong adversary can flip the non-robust features, so that robustness comes at the cost of standard accuracy. They also show that the gradients of adversarially trained models are more interpretable.
Colin Wei, Tengyu Ma. Improved Sample Complexities for Deep Neural Networks and Robust Classification via an All-Layer Margin. ICLR 2020. Paper: https://openreview.net/forum?id=HJe_yR4Fwr
Aditi Raghunathan, Sang Michael Xie, Fanny Yang, John Duchi, Percy Liang. Understanding and Mitigating the Tradeoff Between Robustness and Accuracy. ICML 2020. arXiv:2002.10716
Guillermo Ortiz-Jimenez, Apostolos Modas, Seyed-Mohsen Moosavi-Dezfooli, Pascal Frossard. Hold me tight! Influence of discriminative features on deep network boundaries. NIPS 2020. arXiv:2002.06349
What is Adversarial Example?
Justin Gilmer, Luke Metz, Fartash Faghri, Samuel S. Schoenholz, Maithra Raghu, Martin Wattenberg, Ian Goodfellow. The Relationship Between High-Dimensional Geometry and Adversarial Examples. arXiv preprint 2018. arXiv:1801.02774v3
Andrew Ilyas, Shibani Santurkar, Dimitris Tsipras, Logan Engstrom, Brandon Tran, and Aleksander Madry. Adversarial examples are not bugs, they are features. NIPS 2019. arXiv:1905.02175v3
They discover that it is possible to train a relatively robust model with standard training using the features utilized by a robust model; conversely, it is also possible to train a very accurate but highly non-robust model using a dataset crafted so that only the non-robust features utilized by a classifier are correlated with the label.
Therefore, they conclude that adversarial examples are not bugs, they are features.
Nic Ford, Justin Gilmer, Nicolas Carlini, Dogus Cubuk. Adversarial Examples Are a Natural Consequence of Test Error in Noise. arXiv preprint 2019 arXiv:1901.10513
They use a half-space model to demonstrate that when the error rate under Gaussian noise is relatively low, an imperceptible adversarial example is still likely to exist. They also use the Gaussian isoperimetric inequality to show that the boundary with the fewest adversarially attackable points should be linear.
These are some empirical discoveries and some unique interpretations for adversarial examples and the robustness of models.
Interpretation for Robustness and Adversarial Example
Dong Yin, Raphael Gontijo Lopes, Jonathon Shlens, Ekin D. Cubuk, Justin Gilmer. A Fourier Perspective on Model Robustness in Computer Vision. NIPS 2019. arXiv:1906.08988
They observe that adversarially robust models focus more on low-frequency components, as do models trained with Gaussian augmentation, and both suffer from low-frequency corruptions such as fog. They also observe that adding certain Fourier basis vectors with a large norm can craft adversarial examples.
Tianyuan Zhang, Zhanxing Zhu. Interpreting Adversarially Trained Convolutional Neural Networks. ICML 2019. arXiv:1905.09797
They empirically show that adversarially trained networks focus more on shape information rather than texture, and point out that one can potentially enhance robustness by forcing the model to focus more on global features. This can partially explain the performance of feature denoising.
Cihang Xie, Alan Yuille. Intriguing Properties of Adversarial Training at Scale. ICLR 2020. Paper: https://openreview.net/forum?id=HyxJhCEFDS
They discover that Batch Normalization has a negative effect on robustness and observe that although standard accuracy increases marginally as the depth of model grows, the robust accuracy increases significantly.
Shivam Garg, Vatsal Sharan, Brian Hu Zhang, Gregory Valiant. A Spectral View of Adversarially Robust Features. NIPS 2018. arXiv:1811.06609 (*)
Leslie Rice, Eric Wong, J. Zico Kolter. Overfitting in adversarially robust deep learning. arXiv preprint 2020 arXiv:2002.11569 (AT with PGD + Early Stop)
They observe that overfitting occurs in adversarial training even though it hardly appears in standard training; they therefore propose to early-stop adversarial training, which surprisingly matches state-of-the-art results.
These are some benchmark datasets and some methods proposed to benchmark the performance.
Benchmark Adversarial Defenses
Dan Hendrycks, Thomas G. Dietterich. Benchmarking Neural Network Robustness to Common Corruptions and Surface Variations. ICLR 2019. arXiv:1807.01697 (ImageNet-C and ImageNet-P)
They propose two corrupted ImageNet datasets: ImageNet-C with multiple corruption types and ImageNet-P with perturbations of various magnitudes.
Francesco Croce, Maksym Andriushchenko, Vikash Sehwag, Nicolas Flammarion, Mung Chiang, Prateek Mittal, Matthias Hein. RobustBench: a standardized adversarial robustness benchmark. arXiv preprint 2020. arXiv:2010.09670 (RobustBench)
They propose to use AutoAttack to benchmark the performance of different defense methods, and maintain a leaderboard.
Dan Hendrycks, Kevin Zhao, Steven Basart, Jacob Steinhardt, Dawn Song. Natural Adversarial Examples. arXiv preprint 2020. arXiv:1907.07174 (ImageNet-A and ImageNet-O)
They point out the existence of natural adversarial examples, i.e. unmodified natural images that are easily misclassified, and collect them into ImageNet-A and ImageNet-O.
Francesco Croce, Matthias Hein. Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks. ICML 2020. arXiv:2003.01690 (AutoAttack)
Images that are meaningless to human eyes, but meaningful to classifiers.
False Positive Noise
This is a very fresh direction with potential to grow.
Adversarial Augmentation
Cihang Xie, Mingxing Tan, Boqing Gong, Jiang Wang, Alan Yuille, Quoc V. Le. Adversarial Examples Improve Image Recognition. CVPR 2020. arXiv:1911.09665v2
They use an auxiliary Batch Normalization branch to incorporate adversarial examples into training, improving clean accuracy.