Defenses against Adversarial Attacks
By LI Haoyang 2020.10.20 | 2020.11.15
Content
Regularization
A number of methods try to increase model robustness through regularization. The idea of regularization germinated in the very first paper to pose the problem of adversarial examples, i.e. Intriguing properties of neural networks.
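As a minimal sketch of the idea (not taken from any specific paper below; it assumes a PyTorch classifier, and the penalty form and `reg_weight` are illustrative choices), a typical robustness regularizer penalizes the sensitivity of the loss to the input:

```python
import torch
import torch.nn.functional as F

def gradient_regularized_loss(model, x, y, reg_weight=0.1):
    """Cross-entropy loss plus a penalty on the input-gradient norm.

    Penalizing d(loss)/d(input) encourages the loss surface to be locally
    flat around clean inputs, one common regularization route to robustness.
    reg_weight is an illustrative hyperparameter, not from the cited papers.
    """
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    # Gradient of the loss w.r.t. the input; create_graph=True keeps the
    # penalty differentiable w.r.t. the model parameters.
    grad_x, = torch.autograd.grad(loss, x, create_graph=True)
    penalty = grad_x.flatten(start_dim=1).norm(p=2, dim=1).mean()
    return loss + reg_weight * penalty
```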
Adversarial Defense by Regularization
- Moustapha Cisse, Piotr Bojanowski, Edouard Grave, Yann Dauphin, Nicolas Usunier. Parseval Networks: Improving Robustness to Adversarial Examples. ICML 2017. arXiv:1704.08847
- Chongli Qin, James Martens, Sven Gowal, Dilip Krishnan, Krishnamurthy Dvijotham, Alhussein Fawzi, Soham De, Robert Stanforth, and Pushmeet Kohli. Adversarial robustness through local linearization. In NeurIPS, 2019. arXiv:1907.02610
- Alvin Chan, Yi Tay, Yew Soon Ong, Jie Fu. Jacobian Adversarially Regularized Networks for Robustness. ICLR 2020. arXiv:1912.10185
Adversarial Training
The prevailing method to defend against adversarial attacks is adversarial training: adversarial examples generated online during training are used to train the model into a more robust version. It was proposed along with the Fast Gradient Sign Method (FGSM).
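A minimal sketch of one training step, assuming a PyTorch model with inputs in [0, 1]; the single-step FGSM attack, the 50/50 clean/adversarial mix, and the budget `eps` are illustrative assumptions, and the papers below differ in attack strength and loss:

```python
import torch
import torch.nn.functional as F

def fgsm_adv_training_step(model, optimizer, x, y, eps=8/255):
    """One adversarial-training step: craft FGSM examples online,
    then update the model on a mix of clean and adversarial inputs."""
    # 1. Single-step FGSM attack against the current model.
    x_adv = x.clone().detach().requires_grad_(True)
    attack_loss = F.cross_entropy(model(x_adv), y)
    grad, = torch.autograd.grad(attack_loss, x_adv)
    x_adv = (x_adv + eps * grad.sign()).clamp(0.0, 1.0).detach()

    # 2. Train on clean and adversarial examples (50/50 mix here).
    optimizer.zero_grad()
    loss = 0.5 * F.cross_entropy(model(x), y) + 0.5 * F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```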
Adversarial Defense by Adversarial Training
- Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, Adrian Vladu. Towards Deep Learning Models Resistant to Adversarial Attacks. ICLR 2018. arXiv:1706.06083
- Ali Shafahi, Mahyar Najibi, Amin Ghiasi, Zheng Xu, John Dickerson, Christoph Studer, Larry S. Davis, Gavin Taylor, Tom Goldstein. Adversarial Training for Free! arXiv:1904.12843v2
- Dinghuai Zhang, Tianyuan Zhang, Yiping Lu, Zhanxing Zhu, Bin Dong. You Only Propagate Once: Accelerating Adversarial Training via Maximal Principle. NeurIPS 2019. Paper: http://papers.nips.cc/paper/8316-you-only-propagate-once-accelerating-adversarial-training-via-maximal-principle
- Jonathan Uesato, Jean-Baptiste Alayrac, Po-Sen Huang, Robert Stanforth, Alhussein Fawzi, Pushmeet Kohli. Are Labels Required for Improving Adversarial Robustness? NeurIPS 2019. arXiv:1905.13725
- Jingfeng Zhang, Xilie Xu, Bo Han, Gang Niu, Lizhen Cui, Masashi Sugiyama, Mohan Kankanhalli. Attacks Which Do Not Kill Training Make Adversarial Learning Stronger. ICML 2020. arXiv:2002.11242
- Cihang Xie, Mingxing Tan, Boqing Gong, Alan Yuille, Quoc V. Le. Smooth Adversarial Training. arXiv:2006.14536
- Hongyang Zhang, Yaodong Yu, Jiantao Jiao, Eric P. Xing, Laurent El Ghaoui, Michael I. Jordan. Theoretically Principled Trade-off between Robustness and Accuracy. ICML 2019. arXiv:1901.08573
- Cihang Xie, Yuxin Wu, Laurens van der Maaten, Alan Yuille, Kaiming He. Feature Denoising for Improving Adversarial Robustness. CVPR 2019. arXiv:1812.03411
- Eric Wong, Leslie Rice, J. Zico Kolter. Fast is Better than Free: Revisiting Adversarial Training. ICLR 2020. Paper: https://openreview.net/forum?id=BJx040EFvH&noteId=BJx040EFvH
- Maksym Andriushchenko, Nicolas Flammarion. Understanding and Improving Fast Adversarial Training. NeurIPS 2020. Paper: https://infoscience.epfl.ch/record/278914
- Ludwig Schmidt, Shibani Santurkar, Dimitris Tsipras, Kunal Talwar, Aleksander Madry. Adversarially Robust Generalization Requires More Data. NeurIPS 2018. arXiv:1804.11285
Robust Structure
Adversarial Defense with Robust Structure
- Minghao Guo, Yuzhe Yang, Rui Xu, Ziwei Liu, Dahua Lin. When NAS Meets Robustness: In Search of Robust Architectures against Adversarial Attacks. CVPR 2020. arXiv:1911.10695
Defense at Inference
Adversarial Defense at Inference
- Yang Song, Taesup Kim, Sebastian Nowozin, Stefano Ermon, Nate Kushman. PixelDefend: Leveraging Generative Models to Understand and Defend against Adversarial Examples. ICLR 2018. arXiv:1710.10766
- Tianyu Pang, Kun Xu, Jun Zhu. Mixup Inference: Better Exploiting Mixup to Defend Adversarial Attacks. ICLR 2020. Paper: https://openreview.net/forum?id=ByxtC2VtPB
Ensemble
Adversarial Defense with Ensemble
- Huanrui Yang, Jingyang Zhang, Hongliang Dong, Nathan Inkawhich, Andrew Gardner, Andrew Touchet, Wesley Wilkes, Heath Berry, Hai Li. DVERGE: Diversifying Vulnerabilities for Enhanced Robust Generation of Ensembles. NeurIPS 2020. arXiv:2009.14720
Breach
Breaching Adversarial Defenses
Nicholas Carlini, David Wagner. Adversarial Examples Are Not Easily Detected: Bypassing Ten Detection Methods. AISec 2017. arXiv:1705.07263
Breached:
Warren He, James Wei, Xinyun Chen, Nicholas Carlini, Dawn Song. Adversarial Example Defenses: Ensembles of Weak Defenses are not Strong. 2017. arXiv:1706.04701
Breached:
Anish Athalye, Nicholas Carlini, David Wagner. Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples. ICML 2018. arXiv:1802.00420
Un-breached:
Adversarial Training
- Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, Adrian Vladu. Towards Deep Learning Models Resistant to Adversarial Attacks. ICLR 2018. URL: https://openreview.net/forum?id=rJzIBfZAb (Adversarial Training with PGD)
- Taesik Na, Jong Hwan Ko, Saibal Mukhopadhyay. Cascade Adversarial Machine Learning Regularized with a Unified Embedding. ICLR 2018. URL: https://openreview.net/forum?id=HyRVBzap- (Cascade Adversarial Training)
Breached: